
Leverage Delta Lake, Unity Catalog, and Databricks SQL with DataChonk for lakehouse analytics.
```yaml
version: 1
warehouse: databricks
connections:
  prod:
    type: databricks
    host: dbc-abc123-xyz.cloud.databricks.com
    http_path: /sql/1.0/warehouses/abcd1234
    catalog: main  # Unity Catalog
    schema: analytics
    # Token read from the DATABRICKS_TOKEN environment variable
  dev:
    type: databricks
    host: dbc-abc123-xyz.cloud.databricks.com
    http_path: /sql/1.0/warehouses/dev5678
    catalog: dev
    schema: analytics_dev
```

All tables are Delta by default. Use liquid clustering for automatic optimization:
```sql
{{ config(
    materialized='table',
    file_format='delta',
    liquid_clustered_by=['customer_id', 'order_date'],
    tblproperties={
        'delta.autoOptimize.optimizeWrite': 'true',
        'delta.autoOptimize.autoCompact': 'true'
    }
) }}

select
    order_id,
    customer_id,
    order_date,
    total_amount
from {{ source('raw', 'orders') }}
```

Delta Lake's MERGE is highly optimized, and DataChonk generates efficient merge patterns for incremental models:
```sql
{{ config(
    materialized='incremental',
    unique_key='event_id',
    incremental_strategy='merge',
    merge_update_columns=['status', 'updated_at'],
    file_format='delta'
) }}

select
    event_id,
    user_id,
    status,
    updated_at
from {{ source('raw', 'events') }}

{% if is_incremental() %}
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```

Use Delta Live Tables or streaming tables for real-time pipelines:
```sql
{{ config(
    materialized='streaming_table',
    file_format='delta'
) }}

-- Streaming read via Auto Loader
select
    event_id,
    event_timestamp,
    event_data
from stream({{ source('streaming', 'raw_events') }})
```

Use three-level namespacing (catalog.schema.table) with Unity Catalog:
```sql
-- Model pinned to an explicit catalog
{{ config(
    database='prod_catalog',  -- maps to the Unity Catalog name
    schema='analytics'
) }}

-- Or set the catalog once via the generate_database_name macro
-- (profiles.yml: database: prod_catalog)
-- Resulting table: prod_catalog.analytics.dim_customers

select * from {{ ref('stg_customers') }}
```

Liquid clustering automatically reorganizes data as it changes, which makes it a better fit than Z-ordering for evolving workloads.
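Clustering can also be applied to an existing Delta table directly in SQL, outside of DataChonk's model config. A minimal sketch, assuming an illustrative table `main.analytics.orders`:

```sql
-- Switch an existing Delta table to liquid clustering
ALTER TABLE main.analytics.orders
CLUSTER BY (customer_id, order_date);

-- Trigger a re-clustering pass over recently written data
OPTIMIZE main.analytics.orders;
```

With liquid clustering enabled, `OPTIMIZE` needs no `ZORDER BY` clause; it clusters incrementally on the declared keys.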
Photon typically provides a 2-8x speedup for SQL queries. Use a SQL Warehouse Pro or Serverless warehouse to get it.
Set delta.autoOptimize.optimizeWrite to produce fewer, larger files during writes, and delta.autoOptimize.autoCompact to compact small files after writes.
Many small files hurt performance. Enable auto-compaction or run OPTIMIZE regularly.
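For tables not managed through a DataChonk model config, the same properties can be set with plain SQL. A sketch, with `main.analytics.events` as an illustrative table name:

```sql
-- Enable optimized writes and auto-compaction on an existing table
ALTER TABLE main.analytics.events SET TBLPROPERTIES (
  'delta.autoOptimize.optimizeWrite' = 'true',
  'delta.autoOptimize.autoCompact'   = 'true'
);

-- One-off compaction of small files already accumulated
OPTIMIZE main.analytics.events;
```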
- **Cause:** Too many small files have accumulated, or OPTIMIZE is run too frequently.
  **Solution:** Enable auto-compaction instead of manual OPTIMIZE, and use liquid clustering.
- **Cause:** Missing grants at the catalog, schema, or table level.
  **Solution:** Ensure your service principal has USE CATALOG, USE SCHEMA, and CREATE TABLE grants.
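The required grants can be issued with standard Unity Catalog SQL. A sketch, assuming a hypothetical service principal named `sp-datachonk` and the `main.analytics` schema from the profile above:

```sql
-- Minimum grants for creating tables in main.analytics
GRANT USE CATALOG ON CATALOG main TO `sp-datachonk`;
GRANT USE SCHEMA ON SCHEMA main.analytics TO `sp-datachonk`;
GRANT CREATE TABLE ON SCHEMA main.analytics TO `sp-datachonk`;

-- Reading and writing existing tables additionally needs:
GRANT SELECT, MODIFY ON SCHEMA main.analytics TO `sp-datachonk`;
```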
- **Cause:** The warehouse auto-stop timeout is shorter than the dbt run duration.
  **Solution:** Increase the auto-stop timeout, or use a Serverless warehouse, which starts and scales automatically.