
Control how DataChonk generates materialization configs for different model types and use cases.
DataChonk applies sensible defaults based on model type:
| Chonk Type | Default Materialization | Rationale |
|---|---|---|
| Source | source (no materialization) | Raw data, no transformation |
| Staging | view | Light transforms, always current |
| Entity (Dimension) | table | Stable data, fast lookups |
| Fact | incremental | Large datasets, append-heavy |
| Metric | table | Pre-aggregated for BI tools |
```yaml
# datachonk.config.yml
materializations:
  staging:
    type: view
  entity:
    type: table
  fact:
    type: incremental
    incremental_strategy: merge
    unique_key: "{{ primary_key }}"
  metric:
    type: table

# Warehouse-specific overrides
warehouses:
  bigquery:
    materializations:
      fact:
        incremental_strategy: insert_overwrite
        partition_by:
          field: event_date
          data_type: date
  snowflake:
    materializations:
      entity:
        type: table
        cluster_by: ["customer_id"]
```

Override defaults for specific models in the UI or via the Brain:
Prompt example:
"Generate fct_orders as an incremental model with merge strategy, partitioned by order_date, clustered by customer_id"
The `append` strategy is the simplest: it only adds new rows, with no updates or deletes. Best for immutable event data.

```sql
{{
  config(
    materialized='incremental',
    incremental_strategy='append'
  )
}}

SELECT *
FROM {{ source('events', 'page_views') }}

{% if is_incremental() %}
WHERE event_timestamp > (SELECT MAX(event_timestamp) FROM {{ this }})
{% endif %}
```

The `merge` strategy updates existing rows and inserts new ones. Best for mutable data.
```sql
{{
  config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='order_id'
  )
}}

SELECT
    order_id,
    customer_id,
    status,
    amount,
    updated_at
FROM {{ source('shopify', 'orders') }}

{% if is_incremental() %}
WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}
```

The `delete+insert` strategy deletes matching rows, then inserts the new batch. Useful when `merge` isn't available on your warehouse.
```sql
{{
  config(
    materialized='incremental',
    incremental_strategy='delete+insert',
    unique_key='date_day'
  )
}}

SELECT
    date_trunc('day', created_at) AS date_day,
    count(*) AS order_count,
    sum(amount) AS total_revenue
FROM {{ ref('fct_orders') }}

{% if is_incremental() %}
-- Reprocess a 3-day window to pick up late-arriving orders
WHERE date_day >= (SELECT MAX(date_day) - INTERVAL '3 days' FROM {{ this }})
{% endif %}
GROUP BY 1
```

The `insert_overwrite` strategy overwrites entire partitions. Most efficient for partitioned BigQuery tables.
```sql
{{
  config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partition_by={
      'field': 'event_date',
      'data_type': 'date',
      'granularity': 'day'
    },
    partitions=['current_date', 'date_sub(current_date, interval 1 day)']
  )
}}
```

Beyond incremental strategies, each warehouse has its own table-layout options.

Snowflake clustering keys:

```sql
{{
  config(
    materialized='table',
    cluster_by=['customer_id', 'order_date']
  )
}}
```

BigQuery partitioning and clustering:

```sql
{{
  config(
    materialized='table',
    partition_by={
      'field': 'created_date',
      'data_type': 'date'
    },
    cluster_by=['customer_id', 'product_category']
  )
}}
```

Redshift distribution and sort keys:

```sql
{{
  config(
    materialized='table',
    dist='customer_id',
    sort=['order_date', 'order_id']
  )
}}
```

Databricks Delta tables with partitioning and liquid clustering:

```sql
{{
  config(
    materialized='table',
    file_format='delta',
    partition_by=['event_date'],
    liquid_clustered_by=['customer_id']
  )
}}
```

DataChonk can generate models that use your custom materializations:
```yaml
# datachonk.config.yml
custom_materializations:
  - name: slowly_changing_dimension
    package: your_dbt_package
    config_template: |
      {{
        config(
          materialized='slowly_changing_dimension',
          unique_key='{{ unique_key }}',
          valid_from='valid_from',
          valid_to='valid_to',
          strategy='type_2'
        )
      }}
```

Then prompt the Brain:

Prompt example:
"Create dim_customers as a Type 2 SCD using the slowly_changing_dimension materialization, tracking changes to email, address, and tier"
| Strategy | Best For | Trade-offs |
|---|---|---|
| view | Small data, always-current needs | Slow for large queries |
| table | Stable data, complex transforms | Full rebuild each run |
| incremental | Large datasets, append-heavy | Complex to get right |
| ephemeral | Intermediate CTEs | No direct querying |
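Of these, `ephemeral` is the only strategy not shown above. An ephemeral model isn't built in the warehouse at all; dbt compiles it into downstream models as a CTE (standard dbt behavior, shown here with a hypothetical intermediate model):

```sql
-- models/int_orders_enriched.sql (hypothetical model name)
{{ config(materialized='ephemeral') }}

SELECT
    o.order_id,
    o.customer_id,
    c.region
FROM {{ ref('stg_orders') }} o
JOIN {{ ref('stg_customers') }} c USING (customer_id)
```

Any model that `ref`s this one gets it inlined as a CTE at compile time, which is why ephemeral models can't be queried directly in the warehouse.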
Create reusable materialization templates:

```yaml
# templates/fact_incremental.yml
name: fact_incremental
description: Standard incremental fact table config
config:
  materialized: incremental
  incremental_strategy: merge
  unique_key: "{{ model.unique_key }}"
  on_schema_change: append_new_columns

# Warehouse-specific
snowflake:
  cluster_by: ["{{ model.cluster_key }}"]
bigquery:
  partition_by:
    field: "{{ model.partition_field }}"
    data_type: date
```

Tip: Start with views for development, then switch to tables or incremental models for production. DataChonk can auto-adjust materializations based on your target environment.