DocsWarehouse GuidesBigQuery

Snowflake

Redshift

DocsWarehouse GuidesBigQuery

Warehouse Guide

BigQuery

Leverage BigQuery's serverless architecture with DataChonk-optimized patterns for cost-efficient analytics.

Partitioning

DataChonk automatically suggests partitioning strategies based on your data patterns and query usage.

BI Engine

Optimize models for BI Engine acceleration with recommended materializations.

Slot Management

Guidance on flat-rate vs on-demand pricing and slot reservation strategies.

Column Security

Generate models with column-level security policies for sensitive data.

Connection Setup

Configure your BigQuery connection in .datachonk.yml:

.datachonk.yml

version: 1
warehouse: bigquery

connections:
  prod:
    type: bigquery
    project: my-analytics-project
    dataset: analytics
    location: US  # or EU, asia-east1, etc.
    # Uses Application Default Credentials
    
  dev:
    type: bigquery
    project: my-analytics-project-dev
    dataset: analytics_dev
    keyFile: ./service-account-key.json  # Optional

Authentication

Use gcloud auth application-default login for local development, or service account keys for CI/CD.

BigQuery-Specific Patterns

Partitioned Tables

Use partitioning to reduce query costs and improve performance:

{{ config(
    materialized='table',
    partition_by={
      "field": "event_date",
      "data_type": "date",
      "granularity": "day"
    },
    cluster_by=["user_id", "event_type"],
    require_partition_filter=true
) }}

select
    date(event_timestamp) as event_date,
    user_id,
    event_type,
    event_data
from {{ source('raw', 'events') }}

Incremental with Partition Pruning

DataChonk generates efficient incremental models using partition filters:

{{ config(
    materialized='incremental',
    unique_key='event_id',
    incremental_strategy='merge',
    partition_by={"field": "event_date", "data_type": "date"}
) }}

select
    event_id,
    event_date,
    user_id,
    amount
from {{ source('raw', 'transactions') }}

{% if is_incremental() %}
where event_date >= date_sub(current_date(), interval 3 day)
  and event_date > (select max(event_date) from {{ this }})
{% endif %}

Materialized Views

Use BigQuery materialized views for frequently-accessed aggregations:

{{ config(
    materialized='materialized_view',
    enable_refresh=true,
    refresh_interval_minutes=30
) }}

select
    date_trunc(event_date, month) as month,
    product_category,
    sum(revenue) as total_revenue,
    count(distinct user_id) as unique_users
from {{ ref('fct_orders') }}
group by 1, 2

Working with Nested Data

DataChonk handles BigQuery's nested and repeated fields:

-- Flatten nested arrays
select
    order_id,
    customer_id,
    item.product_id,
    item.quantity,
    item.unit_price
from {{ source('raw', 'orders') }},
unnest(line_items) as item

-- Keep nested structure for analytics
select
    user_id,
    array_agg(struct(
        event_type,
        event_timestamp,
        event_data
    )) as user_events
from {{ ref('stg_events') }}
group by user_id

Cost Optimization

Always Partition Large Tables

BigQuery charges by bytes scanned. Partitioning can reduce costs by 90%+ by limiting scanned data.

Use Clustering with Partitioning

Cluster by columns you frequently filter on. BigQuery automatically maintains the clustering.

Enable BI Engine for Dashboards

Reserve BI Engine capacity for frequently-queried mart tables. Sub-second response times, lower costs.

Avoid SELECT * in Production

BigQuery is columnar - only select the columns you need. DataChonk warns about overly broad selects.

Common Issues

Query exceeds bytes billed limit

Cause: Table not partitioned or partition filter missing.

Solution: Add partitioning and ensure queries include partition filters. Use require_partition_filter=true.

Slot quota exceeded

Cause: Too many concurrent queries or complex queries consuming slots.

Solution: Consider flat-rate pricing with reserved slots, or schedule dbt runs during off-peak hours.

Merge statement too large

Cause: Incremental model merging too many rows at once.

Solution: Use incremental_strategy='insert_overwrite' for partition-based incrementals.

Pro Tips

Use Editions

BigQuery Editions (Standard, Enterprise, Enterprise Plus) offer different pricing models. For consistent dbt workloads, Enterprise with reserved slots is often more cost-effective than on-demand.