
Leverage BigQuery's serverless architecture with DataChonk-optimized patterns for cost-efficient analytics.
Configure your BigQuery connection in .datachonk.yml:
```yaml
version: 1
warehouse: bigquery

connections:
  prod:
    type: bigquery
    project: my-analytics-project
    dataset: analytics
    location: US  # or EU, asia-east1, etc.
    # Uses Application Default Credentials
  dev:
    type: bigquery
    project: my-analytics-project-dev
    dataset: analytics_dev
    keyFile: ./service-account-key.json  # Optional
```

Use `gcloud auth application-default login` for local development, or service account keys for CI/CD.

Use partitioning to reduce query costs and improve performance:
```sql
{{ config(
    materialized='table',
    partition_by={
      "field": "event_date",
      "data_type": "date",
      "granularity": "day"
    },
    cluster_by=["user_id", "event_type"],
    require_partition_filter=true
) }}

select
    date(event_timestamp) as event_date,
    user_id,
    event_type,
    event_data
from {{ source('raw', 'events') }}
```

DataChonk generates efficient incremental models using partition filters:
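With `require_partition_filter=true`, BigQuery rejects any query against the table that does not constrain the partition column. A sketch of a compliant downstream query (the model name `fct_events` is illustrative):

```sql
-- Scans only the last 7 daily partitions instead of the full table
select
    event_type,
    count(*) as events
from {{ ref('fct_events') }}
where event_date >= date_sub(current_date(), interval 7 day)
group by event_type
```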
```sql
{{ config(
    materialized='incremental',
    unique_key='event_id',
    incremental_strategy='merge',
    partition_by={"field": "event_date", "data_type": "date"}
) }}

select
    event_id,
    event_date,
    user_id,
    amount
from {{ source('raw', 'transactions') }}

{% if is_incremental() %}
where event_date >= date_sub(current_date(), interval 3 day)
  and event_date > (select max(event_date) from {{ this }})
{% endif %}
```

Use BigQuery materialized views for frequently-accessed aggregations:
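Under the `merge` strategy, the generated statement behaves roughly like the sketch below; the fully-qualified table names are illustrative, and the SQL DataChonk actually emits may differ in detail:

```sql
merge into `my-analytics-project.analytics.fct_transactions` as target
using (
  -- the model's select, restricted by the is_incremental() filter above
  select event_id, event_date, user_id, amount
  from `my-analytics-project.analytics.transactions`
  where event_date >= date_sub(current_date(), interval 3 day)
) as source
on target.event_id = source.event_id
when matched then
  update set event_date = source.event_date,
             user_id    = source.user_id,
             amount     = source.amount
when not matched then
  insert (event_id, event_date, user_id, amount)
  values (source.event_id, source.event_date, source.user_id, source.amount)
```

The three-day lookback bounds how many rows the merge has to compare, which keeps both runtime and bytes scanned in check.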
```sql
{{ config(
    materialized='materialized_view',
    enable_refresh=true,
    refresh_interval_minutes=30
) }}

select
    date_trunc(event_date, month) as month,
    product_category,
    sum(revenue) as total_revenue,
    count(distinct user_id) as unique_users
from {{ ref('fct_orders') }}
group by 1, 2
```

DataChonk handles BigQuery's nested and repeated fields:
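BigQuery can automatically rewrite eligible queries against the base table to read from a matching materialized view, so consumers of `fct_orders` can benefit without referencing the view directly. A sketch:

```sql
-- BigQuery's optimizer may serve this from the materialized view
-- rather than rescanning fct_orders
select
    date_trunc(event_date, month) as month,
    sum(revenue) as total_revenue
from {{ ref('fct_orders') }}
group by 1
```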
```sql
-- Flatten nested arrays
select
    order_id,
    customer_id,
    item.product_id,
    item.quantity,
    item.unit_price
from {{ source('raw', 'orders') }},
  unnest(line_items) as item
```

```sql
-- Keep nested structure for analytics
select
    user_id,
    array_agg(struct(
      event_type,
      event_timestamp,
      event_data
    )) as user_events
from {{ ref('stg_events') }}
group by user_id
```

BigQuery charges by bytes scanned. Partitioning can reduce costs by 90%+ by limiting scanned data.
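Repeated fields can also be filtered without flattening, which avoids duplicating parent rows. A sketch using a correlated `unnest` inside `exists` (the `'SKU-123'` literal is illustrative):

```sql
-- Orders containing a specific product, one row per order
select
    order_id,
    customer_id
from {{ source('raw', 'orders') }}
where exists (
  select 1
  from unnest(line_items) as item
  where item.product_id = 'SKU-123'
)
```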
Cluster by columns you frequently filter on; BigQuery maintains the clustering automatically.

Reserve BI Engine capacity for frequently queried mart tables to get sub-second response times at lower cost.

BigQuery is columnar, so select only the columns you need. DataChonk warns about overly broad selects.
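For example, selecting explicit columns, or trimming wide ones with BigQuery's `select * except`, keeps scanned bytes down (model names here are illustrative):

```sql
-- Bills only for the three columns read, not the whole row
select event_date, user_id, amount
from {{ ref('fct_transactions') }}
where event_date >= date_sub(current_date(), interval 1 day)
```

```sql
-- If a broad select is unavoidable, exclude the heaviest columns
select * except (event_data)
from {{ ref('stg_events') }}
```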
**Queries scanning too many bytes.** Cause: table not partitioned, or partition filter missing. Solution: add partitioning and ensure queries include partition filters; set `require_partition_filter=true`.

**Slow or queued queries.** Cause: too many concurrent queries, or complex queries consuming slots. Solution: consider flat-rate pricing with reserved slots, or schedule dbt runs during off-peak hours.

**Slow incremental runs.** Cause: incremental model merging too many rows at once. Solution: use `incremental_strategy='insert_overwrite'` for partition-based incrementals.
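A minimal sketch of the `insert_overwrite` variant of the incremental model above; it replaces whole partitions instead of merging row by row, so no `unique_key` is needed (this mirrors dbt's BigQuery adapter options — verify against DataChonk's own reference):

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partition_by={"field": "event_date", "data_type": "date"}
) }}

select
    event_id,
    event_date,
    user_id,
    amount
from {{ source('raw', 'transactions') }}

{% if is_incremental() %}
-- Only the partitions selected here are overwritten in the target
where event_date >= date_sub(current_date(), interval 3 day)
{% endif %}
```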