Warehouse Guide

Amazon Redshift

Optimize your dbt models for Redshift with distribution keys, sort keys, and columnar storage patterns.

Distribution Keys

DataChonk analyzes JOINs to recommend optimal distribution keys for collocating data.

Sort Keys

Compound and interleaved sort key recommendations based on your filter patterns.

Concurrency Scaling

Guidance on workload management and concurrency scaling for dbt runs.

Spectrum Integration

Query S3 data lakes directly with Redshift Spectrum external tables.

Connection Setup

.datachonk.yml

version: 1
warehouse: redshift

connections:
  prod:
    type: redshift
    host: my-cluster.abc123.us-east-1.redshift.amazonaws.com
    port: 5439
    database: analytics
    # Use IAM or username/password
    
  serverless:
    type: redshift
    host: my-workgroup.123456789.us-east-1.redshift-serverless.amazonaws.com
    port: 5439
    database: dev

Redshift Serverless

Redshift Serverless scales compute automatically, which suits variable dbt workloads because there is no cluster capacity to manage.

Redshift-Specific Patterns

Distribution and Sort Keys

Distribution and sort keys are critical for Redshift performance. DataChonk analyzes your data model to suggest optimal keys:

{# dist: distribute on customer_id to collocate with dim_customers #}
{# sort_type: 'compound' or 'interleaved' #}
{{ config(
    materialized='table',
    dist='customer_id',
    sort=['order_date', 'customer_id'],
    sort_type='compound'
) }}

select
    order_id,
    customer_id,
    order_date,
    total_amount
from {{ source('raw', 'orders') }}

Late Binding Views

Use late binding views for Spectrum tables and cross-database queries:

{# bind=False creates a late binding view #}
{{ config(
    materialized='view',
    bind=False
) }}

-- Query external Spectrum table
select
    event_date,
    user_id,
    event_type
from spectrum_schema.raw_events
where event_date >= current_date - 7
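
For reference, a late binding view corresponds to Redshift's WITH NO SCHEMA BINDING clause. The sketch below shows roughly what the resulting DDL looks like; the view name analytics.recent_events is hypothetical, and the exact statement dbt generates may differ:

```sql
-- Hypothetical view name; dbt derives the real one from the model file.
-- Tables referenced in a late binding view must be schema-qualified.
create view analytics.recent_events as
select
    event_date,
    user_id,
    event_type
from spectrum_schema.raw_events
where event_date >= current_date - 7
with no schema binding;
```

Because the view is not bound to the underlying table, the external table can be dropped and recreated without breaking the view definition.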

Incremental Strategy

delete+insert is the long-standing incremental strategy on Redshift, which only gained a native MERGE command in 2023:

{{ config(
    materialized='incremental',
    unique_key='event_id',
    incremental_strategy='delete+insert',
    dist='user_id',
    sort='event_timestamp'
) }}

select
    event_id,
    user_id,
    event_timestamp,
    event_data
from {{ source('raw', 'events') }}

{% if is_incremental() %}
where event_timestamp > (
    select max(event_timestamp) from {{ this }}
)
{% endif %}
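
To make the delete+insert strategy concrete, here is a simplified illustration of the two steps it performs on each incremental run. This is a sketch, not the exact SQL dbt generates; fct_events and events_tmp are hypothetical names for the target table and the temporary table holding the new batch:

```sql
-- Simplified illustration only; names are hypothetical.
begin;

-- Step 1: remove existing rows whose unique_key appears in the new batch.
delete from fct_events
where event_id in (select event_id from events_tmp);

-- Step 2: insert the new batch in full.
insert into fct_events
select * from events_tmp;

commit;
```

The delete step only marks rows as deleted, so tables updated this way accumulate reclaimable space between vacuums.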

Performance Optimization

Use DISTKEY for Large Fact Tables

Distribute on your most frequently joined key. This collocates matching rows on the same slice, so joins on that key avoid redistributing data across nodes.

DISTSTYLE ALL for Small Dimensions

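As a rule of thumb, small and rarely updated dimension tables (under roughly 3M rows) should use DISTSTYLE ALL, which replicates the table to every node for faster joins at the cost of extra storage and slower loads.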
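
A minimal sketch of a small dimension model using DISTSTYLE ALL, assuming a hypothetical raw customers source and column names:

```sql
{# dist='all' replicates the full table to every node #}
{{ config(
    materialized='table',
    dist='all',
    sort='customer_id'
) }}

-- Source and column names below are hypothetical.
select
    customer_id,
    customer_name,
    customer_segment
from {{ source('raw', 'customers') }}
```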

Compound Sort Keys for Range Queries

Use compound sort keys when you frequently filter by date ranges. Put the most-filtered column first.

Run VACUUM Regularly

Redshift runs automatic VACUUM DELETE in the background, but it can fall behind when incremental runs delete many rows. Schedule VACUUM DELETE ONLY after large incremental runs.
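
A scheduled maintenance step might look like the following sketch, assuming a hypothetical table named analytics.fct_events:

```sql
-- Hypothetical table name.
-- VACUUM cannot run inside a transaction block, so run it with autocommit on.
vacuum delete only analytics.fct_events;

-- Refresh planner statistics after the row counts have changed.
analyze analytics.fct_events;
```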

Common Issues

Queries slow after many incrementals

Cause: Table has many unsorted regions and deleted rows.

Solution: Run VACUUM FULL tablename or schedule regular maintenance.
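
To find which tables need attention before running a full vacuum, one option is to query the SVV_TABLE_INFO system view. The thresholds below are illustrative assumptions, not recommended values:

```sql
-- "schema" and "table" are quoted because they are reserved words.
-- unsorted: percent of rows in the unsorted region
-- stats_off: how stale the planner statistics are (0 is current, 100 is out of date)
select
    "schema",
    "table",
    tbl_rows,
    unsorted,
    stats_off
from svv_table_info
where unsorted > 20      -- illustrative threshold
   or stats_off > 10     -- illustrative threshold
order by unsorted desc;
```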

High network traffic in query plan

Cause: Data shuffling due to mismatched distribution keys.

Solution: Ensure the tables being joined are both distributed on the join column, or use DISTSTYLE ALL for small tables.
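
One way to confirm the cause is to inspect the join step in the query plan. The sketch below assumes hypothetical tables analytics.fct_orders and analytics.dim_customers:

```sql
-- Hypothetical table and column names.
explain
select
    o.order_id,
    c.customer_name
from analytics.fct_orders as o
join analytics.dim_customers as c
    on o.customer_id = c.customer_id;

-- In the plan output, check the redistribution label on the join step:
--   DS_DIST_NONE, DS_DIST_ALL_NONE  no redistribution needed
--   DS_BCAST_INNER                  inner table is broadcast to all nodes
--   DS_DIST_BOTH                    both tables are redistributed
```

Labels such as DS_BCAST_INNER or DS_DIST_BOTH on large tables usually indicate that the distribution keys do not match the join.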

Disk space alert

Cause: Too many staging tables or deleted rows not vacuumed.

Solution: Use ephemeral models for intermediate CTEs, drop unused tables, run VACUUM.
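
A minimal sketch of an ephemeral intermediate model, which dbt inlines as a CTE into downstream models instead of creating a table or view. The filter is a hypothetical example:

```sql
{{ config(materialized='ephemeral') }}

-- No object is created in Redshift for this model, so it uses no disk space.
select
    order_id,
    customer_id,
    total_amount
from {{ source('raw', 'orders') }}
where total_amount > 0  -- hypothetical filter
```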

Pro Tips

WLM Configuration

Create a dedicated WLM queue for dbt with higher memory allocation and concurrency. This prevents dbt runs from competing with ad-hoc queries.
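
One possible way to route dbt queries to such a queue is to set a query group before each model runs. This sketch assumes a WLM queue has been configured to match the query group 'dbt', which is a hypothetical label:

```sql
{# Assumes a WLM queue is configured to match query group 'dbt' (hypothetical label) #}
{{ config(
    materialized='table',
    pre_hook="set query_group to 'dbt'"
) }}

select
    order_id,
    customer_id,
    order_date,
    total_amount
from {{ source('raw', 'orders') }}
```

Routing by a dedicated database user or user group is an alternative that avoids adding a hook to every model.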