Product updates

Explore the latest features and improvements in Y42.

Introducing CData connectors for data ingestion (Public Beta)

March 21, 2024

sources

cdata

The CData connectors enable ingestion from databases & data warehouses, cloud & SaaS applications, flat files, and unstructured data sources, among others.

ℹ️ Please note that the CData connectors feature is currently in Beta and available for use. As we continue to refine and improve this feature, we encourage users to provide feedback on their experience. Keep in mind that during the beta phase, you may encounter unexpected behavior or changes.

We are rapidly expanding our data source coverage with the CData Connectors. The following are currently supported and ready to use:

Anomaly detection tests (Public Beta)

March 12, 2024

data quality

tests

anomaly detection

You can now perform anomaly detection tests directly within your dbt assets through Y42's integration with Elementary, an open-source anomaly detection package designed to improve data quality. This integration enables the discovery of data issues, such as data freshness, volume, or column-level anomalies.

Setting up anomaly detection tests is similar to configuring dbt built-in tests, package tests (like dbt-expectations), or custom tests.

Supported anomaly tests: Column-level anomalies

ℹ️ Please note that the Anomaly detection tests feature is currently in Beta and available for use. As we continue to refine and improve this feature, we encourage users to provide feedback on their experience. Keep in mind that during the beta phase, you may encounter unexpected behavior or changes.

Continuous Integration (CI) checks for GitHub and GitLab

March 7, 2024

CI

pull requests

Protect your main branch with Y42 out-of-the-box CI checks. The Y42 CI Check, triggered by a pull request (PR) or merge request (MR), generates a status report on affected assets, helping to prevent the merging of problematic code changes.

Use the Y42 CI check to prevent faulty code merges from going live.

Python actions

February 29, 2024

python

automation

Python Actions allow you to sync your data pipelines with workflows in Asana, Jira, HubSpot, or any other tool you prefer. This feature supports a wide range of use cases, such as reverse ETL into Salesforce, task creation in Asana, notifications in Slack, data generation in Google Sheets, or data entry into operational databases.

Python Actions provide Zapier- or Make-like functionality, with the added benefits of keeping all your processing in one place, within one data lineage, and with documentation integrated into your team's existing workflow.

🔄 Learn how you can trigger Census or Hightouch syncs with Python actions.

Fivetran packages

February 22, 2024

sources

fivetran

Use dbt packages for Fivetran to streamline source configuration for your models:

packages:
  - package: fivetran/package_name
    version: x.xx.x

Find out how →

Managing git identities

February 20, 2024

git

You can control which identity performs Git actions in each space.

sql_header and set_sql_header

February 15, 2024

dbt

SQL headers allow you to manipulate the current session settings, such as roles and timezones, or create functions, directly within your asset's run session.

{{ config(
sql_header="alter session set timezone = 'Europe/London';"
) }}

{% call set_sql_header(config) %}
create or replace function ssn_mask(ssn STRING)
returns STRING
language SQL
AS '
    REGEXP_REPLACE(ssn, ''[0-9]'', ''X'') /* 123-45-6789 -> XXX-XX-XXXX */
';
{%- endcall %}

select ssn_mask(ssn) from {{ ref('model_name') }}

Dive deeper →

Python assets preview

February 13, 2024

python

data preview

You can now preview the output and logs of your Python scripts before committing.

❗Behavior change - Source tables include the metadata timestamp field _y42_extracted_at

February 8, 2024

sources

behavior change

This field indicates when a value was extracted at the source or, if not available, when it was written on Y42's side. There is no required action. Any full import after the apiVersion upgrade will automatically add the _y42_extracted_at column to your tables.

Read more about the impact on the Behavior changes page.

Asset Health integration in Lineage and List modes

February 8, 2024

devex

observability

The Asset Health is now embedded within both the Lineage and List views to better understand the health of your space.

Toggle the Asset Health widget in the List view.

Column-level lineage links

February 6, 2024

devex

observability

Find column usage with hover-over links, showing how a column participates in operations like JOIN, WHERE, GROUP, or a combination of these.

Connect your own GitHub / GitLab repository

January 24, 2024

git

When setting up a new space in Y42, you have the option to link your own repository hosted on GitHub Cloud or GitLab Cloud, or opt for the Y42-managed repository hosted on Y42's GitLab instance.

Connect your own repository to Y42.

Swap git repository

January 23, 2024

git

You can now swap a space's Git repository from the Space settings > Integrations page.

Swap git repository.

Python Ingest - logging, and incremental loads

January 18, 2024

source

python

Python Ingest assets now support logging and incremental loads.

You can record messages using the logging module for your Python ingest assets. Access these logs by navigating to the Logs tab in the asset's Build History.

Visualize Python asset logs in the Build History tab.

Incremental loads

Y42 has introduced support for incremental data loading through the context.state variable in Python-based data ingestion assets. context.state is a dictionary that gets updated with each asset refresh and can store any key-value pair for your data process.

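from y42.v1.decorators import data_loader

from datetime import datetime

import requests
import pandas as pd
import json

@data_loader
def todos(context) -> pd.DataFrame:

    # State saved by the previous refresh; empty on the first run
    prev_state = context.state.get()

    if prev_state:
        last_update = prev_state['last_update']
        # perform incremental activities, such as filtering the dataset by last_update variable
        df = pd.DataFrame()  # placeholder: replace with the incrementally fetched rows
    else:
        # perform full refresh activities
        df = pd.DataFrame()  # placeholder: replace with the full dataset

    # Store the new watermark for the next refresh
    context.state.set({"last_update": datetime.utcnow()})

    return df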
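
The state pattern above can be tried out without the Y42 runtime. The following is a minimal sketch, assuming a hypothetical in-memory stand-in (`FakeState`) that mimics only the `get()` and `set()` calls shown above. The row layout and the `updated_at` field are illustrative assumptions, not part of the Y42 API.

```python
from datetime import datetime, timezone


class FakeState:
    """Hypothetical in-memory stand-in for context.state (get/set only)."""

    def __init__(self):
        self._value = None

    def get(self):
        return self._value

    def set(self, value):
        self._value = value


def load_rows(state, rows, now):
    """Return only rows newer than the stored watermark, then advance it."""
    prev_state = state.get()
    if prev_state:
        # Incremental run: keep rows updated after the previous watermark.
        last_update = prev_state["last_update"]
        selected = [row for row in rows if row["updated_at"] > last_update]
    else:
        # First run: no state yet, so load everything.
        selected = list(rows)
    state.set({"last_update": now})
    return selected


# Usage: the first call loads all rows, the second only the newer one.
state = FakeState()
day1 = datetime(2024, 1, 1, tzinfo=timezone.utc)
day2 = datetime(2024, 1, 2, tzinfo=timezone.utc)
day3 = datetime(2024, 1, 3, tzinfo=timezone.utc)
rows = [{"id": 1, "updated_at": day1}]
first_run = load_rows(state, rows, now=day2)
rows.append({"id": 2, "updated_at": day3})
second_run = load_rows(state, rows, now=day3)
```

Passing `now` in explicitly keeps the sketch deterministic; in a real asset the timestamp would come from the clock, as in the example above.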

CTE autocompletion

January 13, 2024

developer experience

Y42 now supports autocompletion for Common Table Expressions (CTEs). You can:

- Retrieve the names of CTEs referenced within the SQL model.
- Obtain a complete list of columns for any referenced CTE.
- Access specific columns from a CTE.

Custom schemas and databases

January 11, 2024

dbt

custom schemas

By default, all assets are built in the schema/dataset specified in the Branch environment. Each branch can be configured to write to its own schema and dataset.

You can now build assets in schemas and databases other than the default schema and database set in the Branch environments settings page.

{{ config(schema='staging_schema') }}
select ...

models:
  y42_project:
    staging:
      +materialized: view
      +schema: staging # Assets in `models/staging/` will be built in the "<branch_name>_staging" schema.
    mart:
      +schema: mart # Assets in `models/mart/` will be built in the "<branch_name>_mart" schema.
    # All other assets will be built in the "<branch_name>" schema.
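
The naming rule described in the comments above can be expressed as a small function. This is only an illustrative sketch: `resolve_schema` is a hypothetical helper, not a Y42 or dbt function, and it assumes the target schema is the branch name followed by an underscore and the custom schema.

```python
from typing import Optional


def resolve_schema(branch_name: str, custom_schema: Optional[str] = None) -> str:
    """Return the schema an asset is built in for a given branch."""
    if not custom_schema:
        # No custom schema configured: use the branch's default schema.
        return branch_name
    # Custom schema configured: append it to the branch name.
    return f"{branch_name}_{custom_schema.strip()}"


staging_schema = resolve_schema("feature_x", "staging")
default_schema = resolve_schema("feature_x")
```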

Data preview enhancements - Group by, pivot, and warehouse selection

January 9, 2024

query preview

In the data preview section, you can now also perform the following operations:

- Group by
- Pivot
- Warehouse selection (Snowflake only)

Group by

You can aggregate results and group them by specific columns in the preview. Click on the three dots of any column for more actions, navigate to the second tab with three horizontal lines (aggregations), and select Group By <column_name>.

Group by the resultset.

Pivot

You can pivot the preview result set. Click on the <n>/<m> columns label and toggle Pivot on. You can now select the Row Groups, Values, and Column Labels by dragging columns from the list.

Pivot the resultset.

Warehouse selection for previewing data (Snowflake only)

For Snowflake-connected spaces, you can select from the available warehouses to preview data.

Select warehouse to use when previewing data.

❗Behavior change - Auto-capitalize column and table names on Snowflake

January 9, 2024

snowflake

behavior change

We've capitalized Snowflake table and column names to be compatible with the default Snowflake behaviour on unquoted columns. This leads to better autocomplete in BI tools and within Y42’s own SQL editor.

Read more about the impact on the Behavior changes page.

Warehouse selection for previewing data

January 4, 2024

query preview

For Snowflake spaces, you can select any available warehouse to preview data.

Asset catalog view

December 28, 2023

ux

A comprehensive read-only view of all your space assets. At a glance, see how many assets you have by type and quickly identify how many are unhealthy. The Asset Catalog is synchronized with the development modes, and offers detailed asset-level information like documented columns, lineage, asset queries, and more.

Learn more about the Asset catalog →

Python Ingest (Public Beta)

December 21, 2023

sources

python

Y42 simplifies data integration from external APIs using Python. Our platform handles the infrastructure and eliminates the need for boilerplate code for loading data into the Data Warehouse (DWH).

Your main task is writing Python logic to fetch data into a DataFrame, with each DataFrame representing a unique source table. These can then be modeled and processed downstream like any other source. Furthermore, Python ingest assets are subject to version control and comply with the Virtual Data Builds mechanism. This ensures consistency and reliability in pipelines that use Python Ingest assets as sources.

from y42.v1.decorators import data_loader

import os
import requests
import json
import pandas as pd

@data_loader
def pizza_status(context) -> pd.DataFrame:
    url = "https://database.supabase.co/rest/v1/orders?select=status"
    api_key = context.secrets.get("SUPABASE_PIZZA_POSTGRES")
    headers = {
        'apikey': api_key,
        'Authorization': 'Bearer ' + api_key
    }
    response = requests.get(url, headers=headers)
    data = response.json()
    df = pd.DataFrame(data)

    return df

Read more on how you can run your python scripts in Y42 →

Orchestrated Fivetran syncs

December 21, 2023

sources

fivetran

Trigger and manage your Fivetran Connectors directly within Y42.

The Fivetran Source feature acts as a bridge, seamlessly linking your Fivetran Connectors with the tables they produce in your data warehouse.

Find out how you can trigger Fivetran jobs from Y42 →

Public API

December 20, 2023

api

You can use the public API for more advanced integrations and customizations. Capabilities include retrieving the manifest content, triggering runs by command or asset, and retrieving run information by id or conditions.

curl --request POST \
    --url https://api.y42.dev/api/4/orchestrations/org_slug/space_slug \
    --header 'accept: application/json' \
    --header 'content-type: application/json'
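
The same request can be prepared from Python. The sketch below only builds the request object and does not send it. The URL and headers are taken from the curl example above, while the request body and any authentication header are not shown in this changelog, so `payload` is a placeholder assumption and `build_orchestration_request` is a hypothetical helper name.

```python
import json
import urllib.request

BASE_URL = "https://api.y42.dev/api/4/orchestrations"


def build_orchestration_request(org_slug, space_slug, payload=None):
    """Build (but do not send) a POST request for the orchestrations endpoint."""
    # The body schema is not documented here; an empty JSON object is a placeholder.
    body = json.dumps(payload or {}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/{org_slug}/{space_slug}",
        data=body,
        method="POST",
        headers={
            "accept": "application/json",
            "content-type": "application/json",
        },
    )


request = build_orchestration_request("org_slug", "space_slug")
# To send it, pass the object to urllib.request.urlopen(request).
```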

API Documentation ↗

UI for configuring snapshots

December 19, 2023

dbt

snapshots

ux

You can now configure your snapshots via the UI. For those who prefer a more hands-on approach, direct manipulation of the underlying code is still available via our VS Code IDE integration.

Learn how to set up snapshots →

Automatically generate staging models

December 19, 2023

automation

Generate staging models from seeds and sources with a simple right-click.

Variables in the Y42 build command

December 14, 2023

orchestration

You can now provide custom variables in your Y42 build command for more flexibility when building data pipelines.

y42 build -s +my_exposure --vars '{key: value, date: 20180101}'
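
When the variables come from a script, quoting them by hand is error-prone. The following sketch assembles the same command in Python. It assumes the command also accepts a JSON object for `--vars` (JSON is valid YAML, which the example above uses); `build_command` is a hypothetical helper, and the command is only constructed here, not executed.

```python
import json
import shlex


def build_command(selector, variables):
    """Return the argv list for a `y42 build` call with custom variables."""
    return ["y42", "build", "-s", selector, "--vars", json.dumps(variables)]


argv = build_command("+my_exposure", {"key": "value", "date": 20180101})
# Shell-safe, single-string form of the same command.
command_line = shlex.join(argv)
```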

Explore all the various ways to construct your DAG →

Branch update notifications and one-click merge

December 12, 2023

git

Introducing a new notification system for branches. When your branch is behind the main branch, a yellow notification icon alerts you that it's time to sync changes from the main branch into your branch. If this merge results in any conflicts, the icon changes to red, signaling the need for conflict resolution. The one-click merge feature simplifies the process of updating your branch with the latest changes from main.
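
The icon logic described above amounts to a small decision rule. The sketch below is only an illustration of that rule; `branch_icon` and its parameters are hypothetical names, and it assumes no icon is shown when the branch is up to date.

```python
def branch_icon(commits_behind_main, has_conflicts):
    """Return the notification colour for a branch, or None if it is up to date."""
    if commits_behind_main <= 0:
        return None
    # Behind main: red if merging main would conflict, yellow otherwise.
    return "red" if has_conflicts else "yellow"


up_to_date = branch_icon(0, has_conflicts=False)
needs_sync = branch_icon(3, has_conflicts=False)
needs_resolution = branch_icon(3, has_conflicts=True)
```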

Learn more about resolving conflicts and using one-click merge here →

Y42-hosted sandbox

December 7, 2023

sandbox

If you need a temporary data warehouse, you can request a Y42-hosted sandbox, ready within 5 minutes. Sandboxes are ideal for testing, and expire after 14 days.

Discover the different ways to set up a space →

Incremental predicates support for incremental models

December 5, 2023

dbt

incremental models

incremental_predicates are an advanced option for incremental models whose data volume is large enough to justify additional performance tuning. This configuration accepts a list of valid SQL expressions. Note that Y42 does not verify the syntax of these SQL statements.

{{
  config(
    materialized = 'incremental',
    unique_key = 'id',
    cluster_by = ['session_start'],  
    incremental_strategy = 'merge',
    incremental_predicates = [
      "target_alias.session_start > dateadd(day, -7, current_date)"
    ]
  )
}}
select ...

The above configuration will generate the following MERGE command:

merge into <existing_table> target_alias
    using <temp_table_with_new_records> source_alias
    on
        -- unique key
        target_alias.id = source_alias.id
        and
        -- custom predicate: limits data scan in the "old" data / existing table
        target_alias.session_start > dateadd(day, -7, current_date)
    when matched then update ...
    when not matched then insert ...
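
To make the relationship between the configuration and the generated statement concrete, the sketch below composes the ON condition from a unique key and a list of predicates. It is an illustration only, not the code Y42 or dbt uses; `build_merge_condition` is a hypothetical helper, and the alias names follow the example above.

```python
def build_merge_condition(unique_key, incremental_predicates=()):
    """Join the unique-key match and any custom predicates with AND."""
    conditions = [f"target_alias.{unique_key} = source_alias.{unique_key}"]
    # Predicates are appended verbatim; their SQL syntax is not validated.
    conditions.extend(incremental_predicates)
    return " and ".join(conditions)


condition = build_merge_condition(
    "id",
    ["target_alias.session_start > dateadd(day, -7, current_date)"],
)
```

Because the predicates are passed through unchanged, a typo in a predicate only surfaces when the warehouse runs the MERGE.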

Learn more about incremental models →

Column-level lineage

November 11, 2023

dbt

observability

Now you can view lineage at the column level, offering more granular insights into your data relationships.

Learn more about column-level lineage →

Monitoring new features

November 9, 2023

monitoring

Asset Health Dashboard: Keep track of both your assets' and project's health in real-time.

Asset Health History Dashboard: Trace back the historical health data of your assets and projects.

Snapshots

November 4, 2023

dbt

snapshots

Capture and analyze historical data changes with snapshots.

{{
    config(
      target_database='analytics',
      target_schema='snapshots',
      unique_key='customer_id',
      strategy='check',
      check_cols=['column1', 'name', 'birthdate'],
    )
}}

select * from {{ source('jaffle_shop', 'orders') }}
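
With the check strategy, a new snapshot record is created when any of the columns listed in check_cols changes. The sketch below illustrates that comparison on plain dictionaries. It is a conceptual illustration with hypothetical helper names, not how dbt implements snapshots (dbt performs the comparison in SQL).

```python
def changed_columns(previous_row, current_row, check_cols):
    """Return the checked columns whose values differ between two versions."""
    return [
        col for col in check_cols
        if previous_row.get(col) != current_row.get(col)
    ]


def needs_new_snapshot_row(previous_row, current_row, check_cols):
    """A new snapshot row is needed when at least one checked column changed."""
    return bool(changed_columns(previous_row, current_row, check_cols))


check_cols = ["column1", "name", "birthdate"]
before = {"customer_id": 1, "column1": "a", "name": "Ana", "birthdate": "1990-01-01", "city": "Berlin"}
after = {"customer_id": 1, "column1": "a", "name": "Anna", "birthdate": "1990-01-01", "city": "Munich"}
diff = changed_columns(before, after, check_cols)
```

Note that the change to `city` is ignored because it is not listed in `check_cols`.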

Learn how to set up snapshots →

Partitioning & clustering for BigQuery assets

October 24, 2023

dbt

partitioning

Optimize your BigQuery storage and query performance through partitioning and clustering.

{{ config(
    materialized='table',
    partition_by={
      "field": "orderdate",
      "data_type": "date",
      "granularity": "month"
    },
    cluster_by=['customerid', 'orderid']
) }}

with orders AS (
  select 
    orderid, 
    customerid, 
    employeeid, 
    orderdate, 
    price
  from {{ source('mdm-prod', 'orders') }}
)
select * from orders

Learn about partitioning and clustering configurations →

Seeds

October 19, 2023

sources

seeds

You can upload and reference CSV files in your Y42 pipelines.

Add a CSV seed file.

Source data freshness checks

October 17, 2023

sources

freshness

Configure source data freshness tests to halt the execution of downstream assets if the source asset is stale.

version: 2

sources:
  - name: pizza_shop
    database: raw

    freshness: # default freshness
      error_after: {count: 24, period: hour}

    loaded_at_field: _etl_loaded_at

    tables:
      - name: customers # this will use the freshness defined above

      - name: orders # this will use the more specific freshness below
        freshness: # make this a little more strict
          error_after: {count: 12, period: hour}
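
The error_after rule boils down to comparing the age of the latest loaded record against a threshold. The sketch below shows that comparison for the two thresholds configured above. It is an illustration with a hypothetical `is_stale` helper; the timestamps are made-up example values.

```python
from datetime import datetime, timedelta, timezone

# Period names as used in the freshness configuration.
PERIODS = {
    "minute": timedelta(minutes=1),
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
}


def is_stale(loaded_at, now, count, period):
    """Return True when the newest record is older than count * period."""
    return now - loaded_at > count * PERIODS[period]


now = datetime(2023, 10, 17, 12, 0, tzinfo=timezone.utc)
loaded_at = now - timedelta(hours=13)  # example: last load was 13 hours ago

orders_stale = is_stale(loaded_at, now, count=12, period="hour")
customers_stale = is_stale(loaded_at, now, count=24, period="hour")
```

With a 13-hour-old load, the stricter 12-hour threshold on `orders` is exceeded while the default 24-hour threshold on `customers` is not.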

Set up source data freshness tests →

Data preview for each job run

October 12, 2023

query preview

You can preview data from the current or any previous materialization of an asset.

Partial SQL query preview and compile, and new keyboard shortcuts

October 10, 2023

query preview

You can now preview and compile either the entire SQL query or parts of it.

Alternatively, you can use CMD/CTRL + ENTER to preview data, or CMD/CTRL + SHIFT + ENTER to compile queries.

Find out how →

Published assets - Zero-copy clones for materialized tables

October 6, 2023

publishing

With published assets, you can turn your branches into data warehouse environments. This feature is especially useful for teams who need to rapidly test changes in isolated environments before merging into main.

For materialized tables, views are now replaced by a more efficient zero-copy clone (in Snowflake) or table clone (in BigQuery). If the asset is materialized as a view, the published asset will remain a view.

With clones, you can leverage BigQuery's wildcard table feature to query multiple tables simultaneously using a single SQL statement. This enables a more efficient way to handle datasets that span across multiple tables.

Discover more about publishing assets →

Customizable SQLFluff configurations

October 3, 2023

dbt

linting

sqlfluff

You can customize SQLFluff rules by adding a .sqlfluff file at the root level of your project. Here's an example:

[sqlfluff:rules]
allow_scalar = True
single_table_references = consistent
unquoted_identifiers_policy = all

[sqlfluff:rules:capitalisation.keywords]
capitalisation_policy = upper

Explore the SQLFluff integration →

Exposures

September 28, 2023

dbt

exposures

You can now define exposures to group relevant upstream assets together, outlining the data required for external use, including dashboards, notebooks, data apps, or ML use cases.

Asset health status indicator

September 21, 2023

observability

Gain a deeper understanding of your assets with the new health status indicators. Learn how the health status is derived to streamline your pipeline development process.

The Stale status is triggered when the asset configurations are changed, impacting downstream assets linked to it. The stale status serves as a notification that these linked models may now contain outdated information due to recent changes.

You can use the following command to build all stale assets in a space:

y42 build --stale

Asset Tiers

September 19, 2023

observability

Assign different tier levels to your assets to prioritize and manage them more effectively. These tiers assist in distinguishing the criticality and importance of each asset, thereby aiding in optimal resource allocation and focus.

❗ It's important to note that assigning an asset to a particular tier does not influence its health status. The tier levels are primarily for organizational and prioritization purposes, allowing for a more structured approach to asset management.

Full data query preview

September 18, 2023

ux

query preview

In addition to previewing the top 100 rows, you can now materialize the preview query as a table inside the Data Warehouse with a 24-hour time-to-live period.

With a full data preview you can filter across all your asset rows, not just the first 100. This is handy for verifying specific row data before committing changes and building your asset.

Refresh package dependencies

September 14, 2023

dbt

packages

The command menu now facilitates downloading or refreshing of package content to be displayed in the Y42 UI. The content is stored in the dbt_packages folder within the file system, and it can be parsed and displayed across various UI modes including catalog, lineage, and code.

dbt Analyses

September 11, 2023

dbt

analyses

dbt analyses serve as a tool to manage SQL statements that aren't intended to be materialized in your data warehouse, promoting version control of analytical SQL files within your dbt project. They can be accessed in code editor mode to preview data.

New Local and Remote tabs in the bottom drawer

September 7, 2023

ux

Y42 now separates the bottom drawer into two tabs, Local and Remote, making it clearer that previews are based on local changes, whereas each build is executed based on the state of the current commit.

Keyboard shortcuts in the help drawer

September 4, 2023

shortcuts

You can find all keyboard shortcuts within the help drawer now.

dbt hooks

August 30, 2023

dbt

hooks

Y42 now supports dbt pre- and post-hooks, primarily used for data warehouse administration tasks like masking sensitive columns. These hooks run immediately before and after the main query and are treated as a single operation.

If a hook fails, Y42 will not update your latest valid job, preventing downstream models from referencing tables created by the failed job. This design choice adds a layer of security and control, especially for tasks like masking sensitive customer data.

{{ config(
    pre_hook="SQL-statement" | ["SQL-statement"],
    post_hook="SQL-statement" | ["SQL-statement"],
) }}

select ...
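
The failure behavior described above can be pictured as follows. This is a conceptual sketch only, not Y42's implementation: `run_with_hooks` and `next_valid_job` are hypothetical names, and `execute` stands in for whatever runs a SQL statement against the warehouse.

```python
def run_with_hooks(execute, pre_hooks, main_query, post_hooks):
    """Run hooks and the main query as one unit; report success or failure."""
    try:
        for statement in [*pre_hooks, main_query, *post_hooks]:
            execute(statement)
    except Exception:
        # Any failing statement, including a hook, fails the whole operation.
        return False
    return True


def next_valid_job(latest_valid_job, new_job, succeeded):
    """Only a fully successful run replaces the latest valid job."""
    return new_job if succeeded else latest_valid_job


# Usage with a stand-in executor that records the statements it receives.
executed = []
ok = run_with_hooks(executed.append, ["pre-hook"], "select 1", ["post-hook"])
```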

Linting with SQLFluff

August 28, 2023

dbt

linting

sqlfluff

SQLFluff is a SQL linter that improves your SQL code quality and development workflow. It finds issues in your SQL code, helping you enforce a consistent code style and detect errors early.

You can apply an auto-fix to most issues the linter detects.

Unpublishing assets

August 24, 2023

publishing

Users can now unpublish specific assets from virtual branch environments within Y42, giving precise control over the assets' visibility for downstream consumption.

Enhanced published assets

August 24, 2023

publishing

The latest update to Y42 allows you to easily convert specific branches, not just the main branch, into datasets (BigQuery) or schemas (Snowflake) for virtual environments. Naming conventions have also been standardized, offering more control and consistency.

Multi-branch orchestration

August 21, 2023

orchestration

Orchestration allows you to automatically build assets and pipelines on a schedule. By default, scheduled builds and alerts only work on the main branch. If you need orchestrations to run on multiple branches, especially useful when you have multiple virtual environments, you can enable this feature from here. Set up which branch will be enabled for build schedules to run automatically.

Auto-delete unused tables

August 16, 2023

governance

Y42 keeps the materialization of each physical UUID table within the DWH, allowing you to easily roll back using the Virtual Data Builds mechanism. By default, these tables are kept for 30 days in your data warehouse. You can customize the timeframe after which the physical UUID tables are deleted.

The expiration logic for deleting tables is based on two criteria:

1. The job is older than 30 days or older than what the user has configured

2. It is not the latest valid job run on any existing branch's head
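
The two criteria above can be combined into a single check. The sketch below assumes that both criteria must hold for a table to be deleted; `is_expired` and its parameters are hypothetical names used for illustration, not part of Y42.

```python
from datetime import datetime, timedelta, timezone


def is_expired(job_finished_at, now, is_latest_valid_on_branch_head, retention_days=30):
    """Return True when a job's table is old enough and no longer needed."""
    # Criterion 1: the job is older than the retention window (30 days by default).
    too_old = now - job_finished_at > timedelta(days=retention_days)
    # Criterion 2: the job is not the latest valid run on any branch head.
    return too_old and not is_latest_valid_on_branch_head


now = datetime(2023, 8, 16, tzinfo=timezone.utc)
old_job = now - timedelta(days=31)

deletable = is_expired(old_job, now, is_latest_valid_on_branch_head=False)
protected = is_expired(old_job, now, is_latest_valid_on_branch_head=True)
```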

New Airbyte sources

August 8, 2023

sources

We've added two new Airbyte sources:

- MSSQL

- S3

Incremental models

July 24, 2023

dbt

incremental

Incremental models allow you to configure models to load only the latest rows during each run, resulting in faster runtimes for your data pipelines. Incremental models also help save costs, such as Snowflake processing costs, because less data means less time spent running the Virtual Warehouse.

{{ config(materialized='incremental') }}

select
    *,
    my_slow_function(my_column)
from raw_app_data.events

{% if is_incremental() %}

  -- this filter will only be applied on an incremental run
  where event_time > (select max(event_time) from {{ this }})

{% endif %}

New Airbyte sources

July 18, 2023

sources

We've added several new Airbyte sources:

- Google Analytics 4

- Google Ads

- Salesforce

- HubSpot

- Amazon Ads

- Shopify
