
Optimize dbt for PostgreSQL with indexing strategies, partitioning, and performance tuning.
```yaml
version: 1
warehouse: postgres
connections:
  prod:
    type: postgresql
    host: db.example.com
    port: 5432
    database: analytics
    schema: public
    # SSL recommended for production
    sslmode: require
  # Also works with managed PostgreSQL
  supabase:
    type: postgresql
    host: db.xxx.supabase.co
    port: 5432
    database: postgres
    schema: analytics
```

Create indexes in a post-hook for optimal query performance:
```sql
{{ config(
    materialized='table',
    post_hook=[
        "CREATE INDEX IF NOT EXISTS idx_{{ this.name }}_customer_id ON {{ this }} (customer_id)",
        "CREATE INDEX IF NOT EXISTS idx_{{ this.name }}_order_date ON {{ this }} (order_date)"
    ]
) }}

select
    order_id,
    customer_id,
    order_date,
    total_amount
from {{ source('raw', 'orders') }}
```

Use declarative partitioning for large time-series tables:
The partitioned parent table must exist before dbt writes to it; create it in a migration or manually, then point an incremental model at it.
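A minimal sketch of that parent-table DDL, assuming monthly range partitions on `event_timestamp` (schema, table, and column names are illustrative and should match your source):

```sql
-- Run via a migration tool or psql, not dbt
create table if not exists analytics.events (
    event_id        bigint,
    event_timestamp timestamptz not null,
    user_id         bigint,
    event_data      jsonb
) partition by range (event_timestamp);

-- One partition per month; rows are routed here automatically
create table if not exists analytics.events_2024_01
    partition of analytics.events
    for values from ('2024-01-01') to ('2024-02-01');
```

New partitions must be created ahead of incoming data (by hand, a cron job, or an extension such as pg_partman); an insert with no matching partition fails.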
```sql
{{ config(
    materialized='incremental',
    unique_key='event_id',
    incremental_strategy='append'
) }}

-- dbt inserts into the partitioned table;
-- PostgreSQL automatically routes each row to the correct partition
select
    event_id,
    event_timestamp,
    user_id,
    event_data
from {{ source('raw', 'events') }}
{% if is_incremental() %}
where event_timestamp > (select max(event_timestamp) from {{ this }})
{% endif %}
```

Use materialized views for expensive aggregations:
```sql
{{ config(
    materialized='materialized_view'
) }}

-- Refreshing with CONCURRENTLY requires a unique index;
-- add one in a post_hook: CREATE UNIQUE INDEX ON {{ this }} (day, product_id)
select
    date_trunc('day', created_at) as day,
    product_id,
    sum(quantity) as total_quantity,
    sum(amount) as total_revenue
from {{ ref('fct_orders') }}
group by 1, 2

-- Refresh with: REFRESH MATERIALIZED VIEW CONCURRENTLY {{ this }}
```

PostgreSQL has powerful window functions and JSON support:
```sql
select
    customer_id,
    order_id,
    order_date,
    total_amount,
    -- Window functions
    row_number() over (partition by customer_id order by order_date) as order_sequence,
    sum(total_amount) over (partition by customer_id) as customer_lifetime_value,
    lag(order_date) over (partition by customer_id order by order_date) as previous_order_date,
    -- JSON aggregation
    jsonb_agg(
        jsonb_build_object('product', product_name, 'qty', quantity)
    ) as order_items
from {{ ref('stg_orders') }}
left join {{ ref('stg_order_items') }} using (order_id)
group by 1, 2, 3, 4
```

- PostgreSQL doesn't auto-index foreign keys; always add indexes on columns used in JOINs and WHERE clauses.
- Run `ANALYZE tablename` after bulk inserts to update query-planner statistics.
- For filtered queries, use partial indexes: `CREATE INDEX ... WHERE status = 'active'`.
- PostgreSQL has a limited number of connections; set dbt's `threads` appropriately and consider connection pooling (PgBouncer).
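The indexing tips above can be sketched in plain SQL (table, column, and index names are illustrative):

```sql
-- Index a foreign key used in joins; PostgreSQL does not create this for you
create index if not exists idx_orders_customer_id
    on analytics.orders (customer_id);

-- Partial index: smaller and faster when queries always filter on status
create index if not exists idx_orders_active
    on analytics.orders (order_date)
    where status = 'active';

-- Refresh planner statistics after a bulk load
analyze analytics.orders;
```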
**Slow queries**
Cause: missing index or outdated statistics.
Solution: add appropriate indexes and run `ANALYZE`; check the query plan with `EXPLAIN ANALYZE`.
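To confirm whether an index is actually used, `EXPLAIN ANALYZE` shows the executed plan (the query and index name here are illustrative):

```sql
explain analyze
select *
from analytics.orders
where customer_id = 42;
-- Look for "Index Scan using idx_orders_customer_id" rather than "Seq Scan"
```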
**Table bloat**
Cause: PostgreSQL's MVCC leaves dead tuples behind after UPDATE/DELETE operations.
Solution: rely on autovacuum (on by default), or run `VACUUM FULL` during a maintenance window (it takes an exclusive lock and rewrites the table).
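Dead-tuple counts are visible in `pg_stat_user_tables`, which helps decide whether a manual vacuum is worthwhile:

```sql
select relname,
       n_live_tup,
       n_dead_tup,
       last_vacuum,
       last_autovacuum
from pg_stat_user_tables
order by n_dead_tup desc
limit 10;
```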
**Lock contention**
Cause: long-running queries blocking dbt's DDL operations.
Solution: schedule dbt runs during low-traffic periods, or use views/materialized views instead of full table swaps.
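To see which session is doing the blocking, `pg_blocking_pids()` (PostgreSQL 9.6+) can be checked against `pg_stat_activity`; a sketch:

```sql
select pid,
       pg_blocking_pids(pid) as blocked_by,
       state,
       query
from pg_stat_activity
where cardinality(pg_blocking_pids(pid)) > 0;
```

Each row is a blocked session, with `blocked_by` listing the PIDs holding the conflicting locks.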