Bronze Layer

The bronze layer is Rocky’s config-driven replication within the warehouse. No SQL files needed — Rocky discovers what tables are available, generates the SQL, and copies data from the ingestion catalog into structured target catalogs/schemas.

rocky discover → rocky plan → rocky run
  1. Discover — Finds what schemas and tables are available for processing. For fivetran adapters, calls the Fivetran REST API to list connectors and enabled tables. For duckdb adapters, queries information_schema. For manual adapters, reads inline schema/table definitions. Discovery is metadata-only — it identifies what exists, it does not extract data.
  2. Plan — Parses source schema names, resolves target catalogs/schemas, generates SQL statements. Shows what will happen without executing.
  3. Run — Executes the plan: creates catalogs/schemas, copies data, runs quality checks, updates watermarks.

Source schemas follow a naming convention. Rocky parses these into structured components using a configurable pattern:

src__acme__us_west__shopify
│    │     │        │
│    │     │        └── source (connector name)
│    │     └── regions (variable-length)
│    └── tenant
└── prefix (stripped)

The pattern is defined under the pipeline source in rocky.toml:

[pipeline.bronze.source.schema_pattern]
prefix = "src__"
separator = "__"
components = ["tenant", "regions...", "source"]

Given src__acme__us_west__shopify, Rocky extracts:

  • tenant = "acme"
  • regions = ["us_west"]
  • source = "shopify"
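This parse can be sketched in a few lines of Python. The function name and internals here are assumptions for illustration, not Rocky's actual code; the key idea is that a component ending in `...` is variable-length and absorbs the middle parts, while fixed components bind from the front and the back:

```python
# Hypothetical sketch of schema-pattern parsing (not Rocky's real code).
def parse_schema(name, prefix="src__", sep="__",
                 components=("tenant", "regions...", "source")):
    if not name.startswith(prefix):
        raise ValueError(f"{name!r} does not start with prefix {prefix!r}")
    parts = name[len(prefix):].split(sep)

    # Split the component spec into fixed-before / variadic / fixed-after.
    fixed_before, variadic, fixed_after = [], None, []
    for comp in components:
        if comp.endswith("..."):
            variadic = comp[:-3]
        elif variadic is None:
            fixed_before.append(comp)
        else:
            fixed_after.append(comp)

    result = dict(zip(fixed_before, parts))
    tail = parts[len(fixed_before):]
    if variadic is not None:
        cut = len(tail) - len(fixed_after)
        result[variadic] = tail[:cut]   # variable-length middle, e.g. ["us_west"]
        tail = tail[cut:]
    result.update(zip(fixed_after, tail))
    return result

print(parse_schema("src__acme__us_west__shopify"))
# {'tenant': 'acme', 'regions': ['us_west'], 'source': 'shopify'}
```

Because `regions` is variadic, `src__acme__us__eu__shopify` would parse the same way with `regions = ["us", "eu"]`.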

Templates on the pipeline target determine where data lands:

[pipeline.bronze.target]
adapter = "prod"
catalog_template = "warehouse"
schema_template = "stage__{source}"

Using the parsed components:

  • warehouse is a static catalog name (no variable substitution)
  • stage__{source} resolves to stage__shopify

So fivetran_catalog.src__acme__us_west__shopify.orders is copied to warehouse.stage__shopify.orders.
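Template resolution is plain variable substitution. A minimal sketch, assuming Python-style `{name}` placeholders as shown in the templates above (the function name is illustrative, not Rocky's API):

```python
# Illustrative sketch of target resolution (not Rocky's actual API).
# Static templates like "warehouse" pass through unchanged; {name}
# placeholders are filled from the parsed schema components.
def resolve_target(components, catalog_template, schema_template):
    return (catalog_template.format(**components),
            schema_template.format(**components))

parsed = {"tenant": "acme", "regions": ["us_west"], "source": "shopify"}
catalog, schema = resolve_target(parsed, "warehouse", "stage__{source}")
print(f"{catalog}.{schema}.orders")  # warehouse.stage__shopify.orders
```

The same sketch covers the multi-tenant case: `resolve_target(parsed, "{tenant}_warehouse", "stage__{source}")` yields `acme_warehouse.stage__shopify`.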

For multi-tenant setups where each tenant gets its own catalog, see Schema Patterns for the {tenant}_warehouse + components = ["tenant", "regions...", "source"] pattern.

When auto_create_catalogs = true and auto_create_schemas = true, Rocky creates target catalogs and schemas before copying data:

CREATE CATALOG IF NOT EXISTS warehouse;
CREATE SCHEMA IF NOT EXISTS warehouse.stage__shopify;

Catalogs are tagged (e.g., managed_by = "rocky") so Rocky can later discover which catalogs it manages.

On the first run (no watermark), Rocky performs a full refresh. On subsequent runs, it copies only rows whose timestamp column exceeds the last known watermark. (The example below uses the multi-tenant target templates described under Schema Patterns, e.g. {tenant}_warehouse.)

INSERT INTO acme_warehouse.staging__us_west__shopify.orders
SELECT *, CAST(NULL AS STRING) AS _loaded_by
FROM fivetran_catalog.src__acme__us_west__shopify.orders
WHERE _fivetran_synced > (
  SELECT COALESCE(MAX(_fivetran_synced), TIMESTAMP '1970-01-01')
  FROM acme_warehouse.staging__us_west__shopify.orders
)

The _fivetran_synced column is Fivetran’s built-in timestamp that records when each row was synced. Rocky uses it as the watermark column by default (configurable via timestamp_column).

If schema drift is detected (column type mismatch between source and target), Rocky falls back to a full refresh: it drops the target table and recreates it.
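The drift check amounts to comparing column types between the two tables. A sketch, assuming column-name-to-type mappings are already in hand (how Rocky actually reads source and target metadata is not shown here):

```python
# Illustrative drift check (not Rocky's actual implementation).
def has_type_drift(source_cols: dict, target_cols: dict) -> bool:
    # Compare only columns present on both sides; a type mismatch on
    # any shared column triggers the full-refresh fallback.
    return any(source_cols[col] != target_cols[col]
               for col in source_cols.keys() & target_cols.keys())

src = {"id": "BIGINT", "total": "DECIMAL(10,2)"}
tgt = {"id": "BIGINT", "total": "DOUBLE"}
print(has_type_drift(src, tgt))  # True
```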

Rocky can add metadata columns to replicated tables. They are declared on the pipeline alongside strategy and timestamp_column:

[pipeline.bronze]
type = "replication"
strategy = "incremental"
timestamp_column = "_fivetran_synced"
metadata_columns = [
  { name = "_loaded_by", type = "STRING", value = "NULL" }
]

These are appended to the SELECT: SELECT *, CAST(NULL AS STRING) AS _loaded_by.
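Generating that SELECT list from the config can be sketched as follows (a hypothetical helper; the dict shape mirrors the rocky.toml entries above):

```python
# Illustrative sketch of building the SELECT list from metadata_columns.
def build_select(metadata_columns):
    extras = [f"CAST({m['value']} AS {m['type']}) AS {m['name']}"
              for m in metadata_columns]
    return "SELECT " + ", ".join(["*"] + extras)

cols = [{"name": "_loaded_by", "type": "STRING", "value": "NULL"}]
print(build_select(cols))  # SELECT *, CAST(NULL AS STRING) AS _loaded_by
```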

Scope execution to a specific tenant:

rocky run --config rocky.toml --filter tenant=acme

This processes only schemas where the parsed tenant component matches acme.
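The filter semantics can be sketched as an equality match against the parsed components (function and argument names here are illustrative only):

```python
# Minimal sketch of --filter matching against parsed schema components.
def matches_filter(parsed, filters):
    # Every key=value pair in the filter must equal the parsed component.
    return all(parsed.get(key) == value for key, value in filters.items())

parsed = {"tenant": "acme", "regions": ["us_west"], "source": "shopify"}
print(matches_filter(parsed, {"tenant": "acme"}))    # True
print(matches_filter(parsed, {"tenant": "globex"}))  # False
```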

In dbt, you write one staging model per source table:

-- models/staging/shopify/stg_orders.sql
SELECT * FROM {{ source('shopify', 'orders') }}

Multiply that by every table, every source, every tenant. For a multi-tenant setup with 50 connectors and 20 tables each, that’s 1,000 SQL files that all look the same.

In Rocky, the entire bronze layer is config-driven. Zero SQL files. The rocky.toml configuration handles all source-to-target mapping, and Rocky generates the appropriate SQL at runtime.