Bronze Layer

The bronze layer is Rocky’s config-driven replication within the warehouse. No SQL files needed — Rocky discovers what tables are available, generates the SQL, and copies data from the ingestion catalog into structured target catalogs/schemas.

rocky discover → rocky plan → rocky run
  1. Discover — Finds what schemas and tables are available for processing. For fivetran adapters, calls the Fivetran REST API to list connectors and enabled tables. For duckdb adapters, queries information_schema. For manual adapters, reads inline schema/table definitions. Discovery is metadata-only — it identifies what exists, it does not extract data.
  2. Plan — Parses source schema names, resolves target catalogs/schemas, generates SQL statements. Shows what will happen without executing.
  3. Run — Executes the plan: creates catalogs/schemas, copies data, runs quality checks, updates watermarks.

Source schemas follow a naming convention. Rocky parses these into structured components using a configurable pattern:

src__acme__us_west__shopify
│    │     │        │
│    │     │        └── source (connector name)
│    │     └── regions (variable-length)
│    └── tenant
└── prefix (stripped)

The pattern is defined under the pipeline source in rocky.toml:

[pipeline.bronze.source.schema_pattern]
prefix = "src__"
separator = "__"
components = ["tenant", "regions...", "source"]

Given src__acme__us_west__shopify, Rocky extracts:

  • tenant = "acme"
  • regions = ["us_west"]
  • source = "shopify"
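This parse can be sketched in a few lines of Python. The function name and internals here are assumptions for illustration, not Rocky's actual code; the key idea is that a component ending in `...` is variable-length and absorbs the middle parts, while fixed components bind from the front and the back:

```python
# Hypothetical sketch of schema-pattern parsing (not Rocky's real code).
def parse_schema(name, prefix="src__", sep="__",
                 components=("tenant", "regions...", "source")):
    if not name.startswith(prefix):
        raise ValueError(f"{name!r} does not start with prefix {prefix!r}")
    parts = name[len(prefix):].split(sep)

    # Split the component spec into fixed-before / variadic / fixed-after.
    fixed_before, variadic, fixed_after = [], None, []
    for comp in components:
        if comp.endswith("..."):
            variadic = comp[:-3]
        elif variadic is None:
            fixed_before.append(comp)
        else:
            fixed_after.append(comp)

    result = dict(zip(fixed_before, parts))
    tail = parts[len(fixed_before):]
    if variadic is not None:
        cut = len(tail) - len(fixed_after)
        result[variadic] = tail[:cut]   # variable-length middle, e.g. ["us_west"]
        tail = tail[cut:]
    result.update(zip(fixed_after, tail))
    return result

print(parse_schema("src__acme__us_west__shopify"))
# {'tenant': 'acme', 'regions': ['us_west'], 'source': 'shopify'}
```

Because `regions` is variadic, `src__acme__us__eu__shopify` would parse the same way with `regions = ["us", "eu"]`.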

Templates on the pipeline target determine where data lands:

[pipeline.bronze.target]
adapter = "prod"
catalog_template = "warehouse"
schema_template = "stage__{source}"

Using the parsed components:

  • warehouse is a static catalog name (no variable substitution)
  • stage__{source} resolves to stage__shopify

So fivetran_catalog.src__acme__us_west__shopify.orders is copied to warehouse.stage__shopify.orders.
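Template resolution is plain variable substitution. A minimal sketch, assuming Python-style `{name}` placeholders as shown in the templates above (the function name is illustrative, not Rocky's API):

```python
# Illustrative sketch of target resolution (not Rocky's actual API).
# Static templates like "warehouse" pass through unchanged; {name}
# placeholders are filled from the parsed schema components.
def resolve_target(components, catalog_template, schema_template):
    return (catalog_template.format(**components),
            schema_template.format(**components))

parsed = {"tenant": "acme", "regions": ["us_west"], "source": "shopify"}
catalog, schema = resolve_target(parsed, "warehouse", "stage__{source}")
print(f"{catalog}.{schema}.orders")  # warehouse.stage__shopify.orders
```

The same sketch covers the multi-tenant case: `resolve_target(parsed, "{tenant}_warehouse", "stage__{source}")` yields `acme_warehouse.stage__shopify`.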

For multi-tenant setups where each tenant gets its own catalog, see Schema Patterns for the {tenant}_warehouse + components = ["tenant", "regions...", "source"] pattern.

When auto_create_catalogs = true and auto_create_schemas = true, Rocky creates target catalogs and schemas before copying data:

CREATE CATALOG IF NOT EXISTS warehouse;
CREATE SCHEMA IF NOT EXISTS warehouse.stage__shopify;

Catalogs are tagged (e.g., managed_by = "rocky") so Rocky can later discover which catalogs it manages.

On the first run (no watermark), Rocky performs a full refresh. On subsequent runs, it copies only rows whose timestamp column exceeds the last known watermark. (The example below uses the multi-tenant target templates described under Schema Patterns, e.g. {tenant}_warehouse.)

INSERT INTO acme_warehouse.staging__us_west__shopify.orders
SELECT *, CAST(NULL AS STRING) AS _loaded_by
FROM fivetran_catalog.src__acme__us_west__shopify.orders
WHERE _fivetran_synced > (
  SELECT COALESCE(MAX(_fivetran_synced), TIMESTAMP '1970-01-01')
  FROM acme_warehouse.staging__us_west__shopify.orders
)

The _fivetran_synced column is Fivetran’s built-in timestamp that records when each row was synced. Rocky uses it as the watermark column by default (configurable via timestamp_column).

If schema drift is detected (column type mismatch between source and target), Rocky falls back to a full refresh: it drops the target table and recreates it.
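The drift check amounts to comparing column types between the two tables. A sketch, assuming column-name-to-type mappings are already in hand (how Rocky actually reads source and target metadata is not shown here):

```python
# Illustrative drift check (not Rocky's actual implementation).
def has_type_drift(source_cols: dict, target_cols: dict) -> bool:
    # Compare only columns present on both sides; a type mismatch on
    # any shared column triggers the full-refresh fallback.
    return any(source_cols[col] != target_cols[col]
               for col in source_cols.keys() & target_cols.keys())

src = {"id": "BIGINT", "total": "DECIMAL(10,2)"}
tgt = {"id": "BIGINT", "total": "DOUBLE"}
print(has_type_drift(src, tgt))  # True
```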

Rocky can add metadata columns to replicated tables. They are declared on the pipeline alongside strategy and timestamp_column:

[pipeline.bronze]
type = "replication"
strategy = "incremental"
timestamp_column = "_fivetran_synced"
metadata_columns = [
  { name = "_loaded_by", type = "STRING", value = "NULL" }
]

These are appended to the SELECT: SELECT *, CAST(NULL AS STRING) AS _loaded_by.
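Generating that SELECT list from the config can be sketched as follows (a hypothetical helper; the dict shape mirrors the rocky.toml entries above):

```python
# Illustrative sketch of building the SELECT list from metadata_columns.
def build_select(metadata_columns):
    extras = [f"CAST({m['value']} AS {m['type']}) AS {m['name']}"
              for m in metadata_columns]
    return "SELECT " + ", ".join(["*"] + extras)

cols = [{"name": "_loaded_by", "type": "STRING", "value": "NULL"}]
print(build_select(cols))  # SELECT *, CAST(NULL AS STRING) AS _loaded_by
```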

Scope execution to a specific tenant:

rocky run --config rocky.toml --filter tenant=acme

This processes only schemas where the parsed tenant component matches acme.
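The filter semantics can be sketched as an equality match against the parsed components (function and argument names here are illustrative only):

```python
# Minimal sketch of --filter matching against parsed schema components.
def matches_filter(parsed, filters):
    # Every key=value pair in the filter must equal the parsed component.
    return all(parsed.get(key) == value for key, value in filters.items())

parsed = {"tenant": "acme", "regions": ["us_west"], "source": "shopify"}
print(matches_filter(parsed, {"tenant": "acme"}))    # True
print(matches_filter(parsed, {"tenant": "globex"}))  # False
```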

In dbt, you write one staging model per source table:

-- models/staging/shopify/stg_orders.sql
SELECT * FROM {{ source('shopify', 'orders') }}

Multiply that by every table, every source, every tenant. For a multi-tenant setup with 50 connectors and 20 tables each, that’s 1,000 SQL files that all look the same.

In Rocky, the entire bronze layer is config-driven. Zero SQL files. The rocky.toml configuration handles all source-to-target mapping, and Rocky generates the appropriate SQL at runtime.