
Optimize dbt for PostgreSQL with indexing strategies, partitioning, and performance tuning.
```yaml
version: 1
warehouse: postgres
connections:
  prod:
    type: postgresql
    host: db.example.com
    port: 5432
    database: analytics
    schema: public
    # SSL recommended for production
    sslmode: require
  # Also works with managed PostgreSQL
  supabase:
    type: postgresql
    host: db.xxx.supabase.co
    port: 5432
    database: postgres
    schema: analytics
```

Create indexes in a post-hook for optimal query performance:
```sql
{{ config(
    materialized='table',
    post_hook=[
        "CREATE INDEX IF NOT EXISTS idx_{{ this.name }}_customer_id ON {{ this }} (customer_id)",
        "CREATE INDEX IF NOT EXISTS idx_{{ this.name }}_order_date ON {{ this }} (order_date)"
    ]
) }}

select
    order_id,
    customer_id,
    order_date,
    total_amount
from {{ source('raw', 'orders') }}
```

Use declarative partitioning for large time-series tables:
The partitioned parent table must exist before dbt writes to it; create it in a migration or manually, then point an incremental model at it.
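A minimal sketch of that parent-table DDL, assuming monthly range partitions on `event_timestamp` (schema, table, and column names are illustrative and should match your source):

```sql
-- Run via a migration tool or psql, not dbt
create table if not exists analytics.events (
    event_id        bigint,
    event_timestamp timestamptz not null,
    user_id         bigint,
    event_data      jsonb
) partition by range (event_timestamp);

-- One partition per month; rows are routed here automatically
create table if not exists analytics.events_2024_01
    partition of analytics.events
    for values from ('2024-01-01') to ('2024-02-01');
```

New partitions must be created ahead of incoming data (by hand, a cron job, or an extension such as pg_partman); an insert with no matching partition fails.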
```sql
{{ config(
    materialized='incremental',
    unique_key='event_id',
    incremental_strategy='append'
) }}

-- dbt inserts into the partitioned table;
-- PostgreSQL automatically routes each row to the correct partition
select
    event_id,
    event_timestamp,
    user_id,
    event_data
from {{ source('raw', 'events') }}
{% if is_incremental() %}
where event_timestamp > (select max(event_timestamp) from {{ this }})
{% endif %}
```

Use materialized views for expensive aggregations:
```sql
{{ config(
    materialized='materialized_view'
) }}

-- Refreshing with CONCURRENTLY requires a unique index;
-- add one in a post_hook: CREATE UNIQUE INDEX ON {{ this }} (day, product_id)
select
    date_trunc('day', created_at) as day,
    product_id,
    sum(quantity) as total_quantity,
    sum(amount) as total_revenue
from {{ ref('fct_orders') }}
group by 1, 2

-- Refresh with: REFRESH MATERIALIZED VIEW CONCURRENTLY {{ this }}
```

PostgreSQL has powerful window functions and JSON support:
```sql
select
    customer_id,
    order_id,
    order_date,
    total_amount,
    -- Window functions
    row_number() over (partition by customer_id order by order_date) as order_sequence,
    sum(total_amount) over (partition by customer_id) as customer_lifetime_value,
    lag(order_date) over (partition by customer_id order by order_date) as previous_order_date,
    -- JSON aggregation
    jsonb_agg(
        jsonb_build_object('product', product_name, 'qty', quantity)
    ) as order_items
from {{ ref('stg_orders') }}
left join {{ ref('stg_order_items') }} using (order_id)
group by 1, 2, 3, 4
```

- PostgreSQL doesn't auto-index foreign keys; always add indexes on columns used in JOINs and WHERE clauses.
- Run `ANALYZE tablename` after bulk inserts to update query-planner statistics.
- For filtered queries, use partial indexes: `CREATE INDEX ... WHERE status = 'active'`.
- PostgreSQL has a limited number of connections; set dbt's `threads` appropriately and consider connection pooling (PgBouncer).
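The indexing tips above can be sketched in plain SQL (table, column, and index names are illustrative):

```sql
-- Index a foreign key used in joins; PostgreSQL does not create this for you
create index if not exists idx_orders_customer_id
    on analytics.orders (customer_id);

-- Partial index: smaller and faster when queries always filter on status
create index if not exists idx_orders_active
    on analytics.orders (order_date)
    where status = 'active';

-- Refresh planner statistics after a bulk load
analyze analytics.orders;
```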
**Slow queries**
Cause: missing index or outdated statistics.
Solution: add appropriate indexes and run `ANALYZE`; check the query plan with `EXPLAIN ANALYZE`.
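To confirm whether an index is actually used, `EXPLAIN ANALYZE` shows the executed plan (the query and index name here are illustrative):

```sql
explain analyze
select *
from analytics.orders
where customer_id = 42;
-- Look for "Index Scan using idx_orders_customer_id" rather than "Seq Scan"
```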
**Table bloat**
Cause: PostgreSQL's MVCC leaves dead tuples behind after UPDATE/DELETE operations.
Solution: rely on autovacuum (on by default), or run `VACUUM FULL` during a maintenance window (it takes an exclusive lock and rewrites the table).
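Dead-tuple counts are visible in `pg_stat_user_tables`, which helps decide whether a manual vacuum is worthwhile:

```sql
select relname,
       n_live_tup,
       n_dead_tup,
       last_vacuum,
       last_autovacuum
from pg_stat_user_tables
order by n_dead_tup desc
limit 10;
```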
**Lock contention**
Cause: long-running queries blocking dbt's DDL operations.
Solution: schedule dbt runs during low-traffic periods, or use views/materialized views instead of full table swaps.
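To see which session is doing the blocking, `pg_blocking_pids()` (PostgreSQL 9.6+) can be checked against `pg_stat_activity`; a sketch:

```sql
select pid,
       pg_blocking_pids(pid) as blocked_by,
       state,
       query
from pg_stat_activity
where cardinality(pg_blocking_pids(pid)) > 0;
```

Each row is a blocked session, with `blocked_by` listing the PIDs holding the conflicting locks.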