
Loading...
Optimize your dbt models for Redshift with distribution keys, sort keys, and columnar storage patterns.
version: 1
warehouse: redshift
connections:
prod:
type: redshift
host: my-cluster.abc123.us-east-1.redshift.amazonaws.com
port: 5439
database: analytics
# Use IAM or username/password
serverless:
type: redshift
host: my-workgroup.123456789.us-east-1.redshift-serverless.amazonaws.com
port: 5439
database: devCritical for Redshift performance - DataChonk analyzes your data model to suggest optimal keys:
{{ config(
materialized='table',
dist='key',
dist_key='customer_id', -- Collocate with dim_customers
sort=['order_date', 'customer_id'],
sort_type='compound' -- or 'interleaved'
) }}
select
order_id,
customer_id,
order_date,
total_amount
from {{ source('raw', 'orders') }}Use late binding views for Spectrum tables and cross-database queries:
{{ config(
materialized='view',
bind=false -- Late binding view
) }}
-- Query external Spectrum table
select
event_date,
user_id,
event_type
from spectrum_schema.raw_events
where event_date >= current_date - 7Redshift works best with delete+insert for incrementals (no native MERGE until recently):
{{ config(
materialized='incremental',
unique_key='event_id',
incremental_strategy='delete+insert',
dist='user_id',
sort='event_timestamp'
) }}
select
event_id,
user_id,
event_timestamp,
event_data
from {{ source('raw', 'events') }}
{% if is_incremental() %}
where event_timestamp > (
select max(event_timestamp) from {{ this }}
)
{% endif %}Distribute on your most frequently joined key. This collocates data and eliminates data shuffling.
Tables under 3M rows should use DISTSTYLE ALL to replicate across all nodes for faster joins.
Use compound sort keys when you frequently filter by date ranges. Put the most-filtered column first.
Redshift doesn't auto-vacuum deleted rows. Schedule VACUUM DELETE ONLY after incremental runs.
Cause: Table has many unsorted regions and deleted rows.
Solution: Run VACUUM FULL tablename or schedule regular maintenance.
Cause: Data shuffling due to mismatched distribution keys.
Solution: Ensure tables being joined have the same DISTKEY, or use DISTSTYLE ALL for small tables.
Cause: Too many staging tables or deleted rows not vacuumed.
Solution: Use ephemeral models for intermediate CTEs, drop unused tables, run VACUUM.