Model Format
Rocky models define SQL transformations and their materialization behavior. Each model is a SQL query paired with configuration that tells Rocky how to materialize it, what it depends on, and where to write the output.
Rocky supports two model formats: sidecar (recommended) and inline (legacy).
Sidecar Format (Recommended)
Section titled “Sidecar Format (Recommended)”The sidecar format keeps SQL and configuration in separate files with matching names:
models/├── fct_orders.sql <- pure SQL├── fct_orders.toml <- configuration├── stg_customers.sql├── stg_customers.toml├── dim_products.sql└── dim_products.tomlThis separation keeps SQL files clean and editable by any SQL tool without needing to understand Rocky-specific syntax.
SQL File
Section titled “SQL File”The .sql file contains a plain SQL query. No templating, no Jinja, no special markers.
-- models/fct_orders.sqlSELECT o.order_id, o.customer_id, o.order_date, o.total_amount, c.customer_name, c.segmentFROM analytics.staging.orders AS oJOIN analytics.staging.customers AS c ON o.customer_id = c.customer_idWHERE o.order_date >= '2024-01-01'TOML Config File
Section titled “TOML Config File”The .toml file specifies the model name, dependencies, materialization strategy, and target table.
Fields:
| Field | Type | Required | Description |
|---|---|---|---|
name | string | Yes | Model identifier. Must be unique across all models. |
depends_on | list of strings | No | Names of upstream models that must run before this one. Defaults to []. |
retention | string | No | Data retention policy for this model. Grammar ^\d+[dy]$ — e.g. "90d" or "1y". See Retention. |
[strategy] — Materialization configuration:
| Field | Type | Default | Description |
|---|---|---|---|
type | string | "full_refresh" | Materialization type. One of "full_refresh", "incremental", "merge", "time_interval", "ephemeral", "delete_insert", "microbatch", "content_addressed". |
timestamp_column | string | Column used as the incremental watermark. Required when type = "incremental" or type = "microbatch". | |
unique_key | list of strings | Key columns for merge matching. Required when type = "merge". | |
update_columns | list of strings | Columns to update on merge match. Defaults to all non-key columns if omitted. | |
partition_by | list of strings | Column(s) identifying the partition to delete. Required when type = "delete_insert". | |
time_column | string | Partition column for time-interval processing. Required when type = "time_interval". | |
granularity | string | "hour" (microbatch) | Partition granularity: "hour", "day", "month", or "year". Required when type = "time_interval"; optional default for "microbatch". |
lookback | integer | 0 | Number of past partitions to reprocess. Optional for "time_interval". |
batch_size | integer | 1 | Max partitions per batch. Optional for "time_interval". |
first_partition | string | Earliest partition key (e.g., "2024-01-01"). Optional for "time_interval". | |
storage_prefix | string | Object-store key prefix that holds _delta_log/ + Parquet files for the target table (e.g. "s3://bucket/path/table"). Required when type = "content_addressed". | |
partition_columns | list of strings | [] | Logical partition columns for content-addressed tables. Empty for unpartitioned tables. Optional for "content_addressed". |
[target] — Output table:
| Field | Type | Required | Description |
|---|---|---|---|
catalog | string | Yes | Target catalog name. |
schema | string | Yes | Target schema name. |
table | string | Yes | Target table name. |
[[sources]] — Input tables (optional, for documentation and lineage):
| Field | Type | Required | Description |
|---|---|---|---|
catalog | string | Yes | Source catalog name. |
schema | string | Yes | Source schema name. |
table | string | Yes | Source table name. |
Environment variables
Section titled “Environment variables”Sidecar .toml files (and models/_defaults.toml) go through the same ${VAR} / ${VAR:-default} substitution as rocky.toml. This lets an orchestrator inject per-model [target] values via subprocess env without templating the sidecar:
[target]catalog = "${ROCKY_TARGET_CATALOG:-warehouse}"schema = "${ROCKY_TARGET_SCHEMA:-marts}"table = "${ROCKY_TABLE_OVERRIDE:-customer_facts}"See Environment Variables for the canonical syntax reference and examples/playground/pocs/00-foundations/07-config-layering/ for a runnable three-layer example.
[classification]
Section titled “[classification]”Per-column classification tags. Keys are column names, values are free-form classification strings. Rocky resolves each value against [mask] / [mask.<env>] in rocky.toml to pick the masking strategy, then applies both the column tag and the mask via the governance adapter after a successful DAG.
| Key pattern | Value type | Description |
|---|---|---|
<column_name> | string | Free-form classification tag (e.g. "pii", "confidential", "internal"). Matched case-insensitively against [mask] keys in rocky.toml. Tags without a matching strategy emit the W004 compiler warning unless listed in [classifications] allow_unmasked. |
name = "customers"
[classification]email = "pii"phone = "pii"ssn = "confidential"Tags are free-form strings — no enum — so teams can coin new classifications without touching the engine. See Governance for the end-to-end story (classify → mask → audit → compliance rollup) and [mask] for the resolver semantics.
Retention
Section titled “Retention”Top-level retention key on the sidecar declares a data-retention policy for the model. Parsed at load time into a typed RetentionPolicy { duration_days: u32 }.
| Field | Type | Default | Description |
|---|---|---|---|
retention | string | null | null (disabled) | Grammar ^\d+[dy]$. "d" = days verbatim, "y" = years flattened at 365 days per year (no leap-year semantics). Zero ("0d", "0y") is rejected — use null to disable. |
name = "fct_orders"retention = "90d"
[strategy]type = "incremental"timestamp_column = "_fivetran_synced"
[target]catalog = "analytics"schema = "warehouse"table = "fct_orders"Applied by GovernanceAdapter::apply_retention_policy after a successful DAG run:
| Adapter | SQL emitted |
|---|---|
| Databricks (Delta) | ALTER TABLE ... SET TBLPROPERTIES ('delta.logRetentionDuration' = '{N} days', 'delta.deletedFileRetentionDuration' = '{N} days') — both keys written together. |
| Snowflake | ALTER TABLE ... SET DATA_RETENTION_TIME_IN_DAYS = {N}. |
| BigQuery / DuckDB | Default-unsupported — those warehouses lack a first-class retention knob at the config level. |
Garbage inputs ("abc", "90", "-3d", "1.5d", leading signs, exponents) are rejected at sidecar parse time with a ModelError::InvalidRetention diagnostic naming the offending value. Inspect resolved policies + warehouse state with rocky retention-status.
Inline Format (Legacy)
Section titled “Inline Format (Legacy)”The inline format embeds TOML configuration directly in the SQL file using a ---toml / --- fenced block at the top of the file:
---tomlname = "stg_orders"depends_on = []
[target]catalog = "analytics"schema = "staging"table = "orders"---
SELECT order_id, customer_id, order_date, total_amountFROM raw_catalog.src__acme__us_west__shopify.ordersThe inline format uses the same fields as the sidecar TOML file. The SQL query follows the closing --- marker.
The frontmatter block supports the same ${VAR} / ${VAR:-default} substitution as sidecar .toml files (see Environment Variables); the SQL body below the closing --- is not substituted, so any ${VAR} token in the query stays literal.
This format is supported for backward compatibility. The sidecar format is preferred because it keeps SQL files portable and free of non-SQL syntax.
Strategy Examples
Section titled “Strategy Examples”Full Refresh
Section titled “Full Refresh”Drops and recreates the target table on every run. Use this for small dimension tables or when you need a clean rebuild.
SQL (models/dim_products.sql):
SELECT product_id, product_name, category, price, is_activeFROM raw_catalog.src__acme__us_west__shopify.productsWHERE _fivetran_deleted = falseConfig (models/dim_products.toml):
name = "dim_products"depends_on = []
[strategy]type = "full_refresh"
[target]catalog = "analytics"schema = "warehouse"table = "dim_products"
[[sources]]catalog = "raw_catalog"schema = "src__acme__us_west__shopify"table = "products"Generated SQL:
CREATE OR REPLACE TABLE analytics.warehouse.dim_products ASSELECT product_id, product_name, category, price, is_activeFROM raw_catalog.src__acme__us_west__shopify.productsWHERE _fivetran_deleted = falseIncremental
Section titled “Incremental”Appends only new rows based on a watermark column. Use this for large fact tables where full refresh is too slow.
SQL (models/fct_orders.sql):
SELECT order_id, customer_id, order_date, total_amount, _fivetran_syncedFROM raw_catalog.src__acme__us_west__shopify.ordersConfig (models/fct_orders.toml):
name = "fct_orders"depends_on = ["dim_products"]
[strategy]type = "incremental"timestamp_column = "_fivetran_synced"
[target]catalog = "analytics"schema = "warehouse"table = "fct_orders"
[[sources]]catalog = "raw_catalog"schema = "src__acme__us_west__shopify"table = "orders"Generated SQL (on incremental runs):
INSERT INTO analytics.warehouse.fct_ordersSELECT order_id, customer_id, order_date, total_amount, _fivetran_syncedFROM raw_catalog.src__acme__us_west__shopify.ordersWHERE _fivetran_synced > ( SELECT COALESCE(MAX(_fivetran_synced), TIMESTAMP '1970-01-01') FROM analytics.warehouse.fct_orders)On the first run (when the target table does not exist), Rocky performs a full refresh automatically.
Upserts rows based on a unique key. Matching rows are updated; non-matching rows are inserted. Use this for slowly changing dimensions or tables with late-arriving updates.
SQL (models/dim_customers.sql):
SELECT customer_id, customer_name, email, segment, lifetime_value, updated_atFROM raw_catalog.src__acme__us_west__shopify.customersWHERE _fivetran_deleted = falseConfig (models/dim_customers.toml):
name = "dim_customers"depends_on = []
[strategy]type = "merge"unique_key = ["customer_id"]update_columns = ["customer_name", "email", "segment", "lifetime_value", "updated_at"]
[target]catalog = "analytics"schema = "warehouse"table = "dim_customers"
[[sources]]catalog = "raw_catalog"schema = "src__acme__us_west__shopify"table = "customers"Generated SQL:
MERGE INTO analytics.warehouse.dim_customers AS targetUSING ( SELECT customer_id, customer_name, email, segment, lifetime_value, updated_at FROM raw_catalog.src__acme__us_west__shopify.customers WHERE _fivetran_deleted = false) AS sourceON target.customer_id = source.customer_idWHEN MATCHED THEN UPDATE SET target.customer_name = source.customer_name, target.email = source.email, target.segment = source.segment, target.lifetime_value = source.lifetime_value, target.updated_at = source.updated_atWHEN NOT MATCHED THEN INSERT *When update_columns is omitted, Rocky updates all non-key columns.
Ephemeral
Section titled “Ephemeral”An ephemeral model is never materialized — Rocky inlines it as a CTE in every downstream consumer. Useful for lightweight intermediate transformations you don’t want to persist.
Config (models/stg_recent_orders.toml):
name = "stg_recent_orders"depends_on = []
[strategy]type = "ephemeral"
[target]catalog = "analytics"schema = "staging"table = "stg_recent_orders"No DDL runs for ephemeral models. The SQL body is injected as a WITH stg_recent_orders AS (…) CTE wherever the model is referenced.
Delete + Insert
Section titled “Delete + Insert”Deletes matching rows by partition key, then inserts fresh data. A lower-overhead alternative to merge when the partition key identifies the rows being rewritten.
Config (models/fct_daily_activity.toml):
name = "fct_daily_activity"depends_on = []
[strategy]type = "delete_insert"partition_by = ["activity_date"]
[target]catalog = "analytics"schema = "warehouse"table = "fct_daily_activity"Microbatch
Section titled “Microbatch”An alias for time_interval with hour-granularity defaults. dbt-compatible naming for partition-based incremental processing.
Config (models/fct_hourly_events.toml):
name = "fct_hourly_events"depends_on = []
[strategy]type = "microbatch"timestamp_column = "event_at" # TIMESTAMP column on the model output# granularity = "hour" # optional — defaults to hour
[target]catalog = "analytics"schema = "warehouse"table = "fct_hourly_events"Content-Addressed
Section titled “Content-Addressed”Writes the model’s SELECT result to a Delta UniForm table as content-addressed Parquet (blake3-hashed file names) plus a Delta log commit. Designed for cross-engine reads from DuckDB, Trino, Spark, and any Iceberg-compatible reader — Rocky owns the writer, the consumers read directly from the object store. See Content-Addressed Materialization for the why and when.
Config (models/fct_events.toml):
name = "fct_events"depends_on = []
[strategy]type = "content_addressed"storage_prefix = "s3://${ROCKY_BUCKET}/marts/fct_events"partition_columns = ["event_date"]
[target]catalog = "analytics"schema = "marts"table = "fct_events"The runtime executes the model SQL, converts the result to Arrow, hashes the Parquet bytes, uploads to storage_prefix, and emits a Delta log commit. partition_columns may be omitted for unpartitioned tables. Backed by the rocky-iceberg writer (shipped in engine v1.30.0 across Phases 1–5: discover, write, sync, partitioned, rowTracking, schema evolution).
Time Interval
Section titled “Time Interval”Partition-keyed materialization for time-series data. The model SQL uses @start_date and @end_date placeholders that the runtime substitutes per partition.
SQL (models/fct_daily_events.sql):
SELECT event_date, event_type, COUNT(*) AS event_countFROM raw_catalog.events.page_viewsWHERE event_date >= @start_date AND event_date < @end_dateGROUP BY event_date, event_typeConfig (models/fct_daily_events.toml):
name = "fct_daily_events"depends_on = []
[strategy]type = "time_interval"time_column = "event_date"granularity = "day"lookback = 3first_partition = "2024-01-01"
[target]catalog = "analytics"schema = "warehouse"table = "fct_daily_events"CLI flags for time-interval models. As of engine v1.33 every flag below is accepted on both rocky plan and the legacy rocky run alias; rocky run continues to work and emits a one-line [deprecated] notice to stderr (silence with ROCKY_SUPPRESS_DEPRECATION=1). The canonical form is rocky plan followed by rocky apply <plan-id>.
# Process a specific partitionrocky plan --partition 2026-04-01 && rocky apply <plan-id>
# Process a date rangerocky plan --from 2026-03-01 --to 2026-04-01 && rocky apply <plan-id>
# Process the latest partitionrocky plan --latest && rocky apply <plan-id>
# Discover and process missing partitionsrocky plan --missing && rocky apply <plan-id>
# Set lookback windowrocky plan --lookback 7 && rocky apply <plan-id>
# Parallelize partition processingrocky plan --parallel 4 && rocky apply <plan-id>Per-partition state is tracked in the state store. The --missing flag consults stored partition records to discover gaps.
DAG Resolution
Section titled “DAG Resolution”Rocky automatically resolves the execution order of models based on their depends_on declarations. Models are executed in topological order, meaning every upstream dependency runs before its downstream dependents.
During rocky validate, the DAG is checked for cycles. If a cycle is detected (e.g., model A depends on B, B depends on A), validation fails with an error listing the cycle.
!! dag_validation — cycle detected: fct_orders -> dim_customers -> fct_ordersModels with no dependencies run first. Models at the same depth in the DAG may run concurrently in future versions.