
Optimize your dbt projects for Snowflake with DataChonk-specific patterns and best practices.
Configure your Snowflake connection in `.datachonk.yml`:

```yaml
version: 1
warehouse: snowflake

connections:
  prod:
    type: snowflake
    account: xy12345.us-east-1  # Account locator
    warehouse: ANALYTICS_WH
    database: ANALYTICS
    role: ANALYST_ROLE
    # Password stored securely - not in config
  dev:
    type: snowflake
    account: xy12345.us-east-1
    warehouse: DEV_WH
    database: ANALYTICS_DEV
    role: DEVELOPER_ROLE
```

Use transient tables for staging models to reduce Time Travel storage costs:
```sql
{{ config(
    materialized='table',
    transient=true,
    cluster_by=['loaded_date']
) }}

select
    id,
    customer_name,
    loaded_date::date as loaded_date
from {{ source('raw', 'customers') }}
```

DataChonk generates efficient merge statements for incremental models:
```sql
{{ config(
    materialized='incremental',
    unique_key='order_id',
    incremental_strategy='merge',
    merge_update_columns=['status', 'updated_at']
) }}

select
    order_id,
    customer_id,
    status,
    updated_at
from {{ source('raw', 'orders') }}

{% if is_incremental() %}
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```

Use Snowflake Dynamic Tables for near-real-time transformations:
```sql
{{ config(
    materialized='dynamic_table',
    target_lag='1 minute',
    snowflake_warehouse='STREAMING_WH'
) }}

select
    date_trunc('minute', event_time) as event_minute,
    count(*) as event_count
from {{ source('streaming', 'events') }}
group by 1
```

Tables over 1TB benefit from clustering. DataChonk analyzes your JOIN and WHERE clauses to suggest optimal clustering keys.
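To check whether an existing table would benefit, you can inspect its clustering depth directly in Snowflake (the table and column names below are illustrative):

```sql
-- Returns clustering depth and overlap statistics as JSON.
-- Deep, heavily overlapping micro-partitions on a frequently
-- filtered column suggest a clustering key would help.
select system$clustering_information('ANALYTICS.PUBLIC.ORDERS', '(order_date)');
```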
Snowflake serves repeated identical queries from its result cache for 24 hours at no warehouse compute cost, provided the underlying data has not changed. Structure your models to take advantage of this free caching.
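The cache only matches byte-identical query text over unchanged data, so runtime-evaluated functions defeat it. A sketch (table name is illustrative):

```sql
-- A repeated execution of this exact text can be served from the result cache
select count(*) from analytics.orders
where order_date >= '2024-01-01';

-- current_timestamp() is evaluated at run time, so this query
-- cannot be satisfied from the result cache
select count(*) from analytics.orders
where order_date >= dateadd(day, -30, current_timestamp());
```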
Use separate warehouses for different workloads (loading, transformation, BI) so they don't compete for compute. DataChonk suggests optimal warehouse sizes based on query complexity.
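dbt's Snowflake adapter lets you route an individual model to a dedicated warehouse with the `snowflake_warehouse` config (the warehouse and model names here are illustrative):

```sql
{{ config(
    materialized='table',
    snowflake_warehouse='TRANSFORM_XL'  -- heavy aggregation runs on its own warehouse
) }}

select
    customer_id,
    sum(amount) as lifetime_value
from {{ ref('stg_orders') }}
group by 1
```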
VARIANT columns are flexible, but overusing them hurts query performance and partition pruning. DataChonk suggests when to flatten semi-structured data into typed columns.
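When downstream queries repeatedly extract the same paths from a VARIANT column, materializing those paths as typed columns is usually worthwhile. A sketch using Snowflake's LATERAL FLATTEN (the source and field names are illustrative):

```sql
select
    e.event_id,
    f.value:sku::varchar as sku,
    f.value:qty::number  as quantity
from {{ source('raw', 'events') }} e,
    lateral flatten(input => e.payload:items) f
```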
**Slow queries on large tables**

Cause: Warehouse too small or missing clustering.
Solution: Scale up the warehouse temporarily, or add clustering on frequently filtered columns.

**High Time Travel storage costs**

Cause: Default 1-day retention on all tables.
Solution: Use transient tables for staging/intermediate models. Set `DATA_RETENTION_TIME_IN_DAYS = 0` for ephemeral tables.

**Degrading incremental merge performance**

Cause: Micro-partitions accumulating from many small merges.
Solution: Periodically run `dbt run --full-refresh` or enable automatic clustering.