
Core Pipeline Commands

The core pipeline commands cover the full lifecycle of a Rocky pipeline: project initialization, configuration validation, source discovery, dry-run planning, execution, and state inspection.

Global Flags

These flags apply to all commands and are documented in the CLI Reference.

Flag | Short | Type | Default | Description
--config <PATH> | -c | PathBuf | rocky.toml | Path to the pipeline configuration file.
--output <FORMAT> | -o | string | json | Output format: json or table.
--state-path <PATH> | (none) | PathBuf | .rocky-state.redb | Path to the embedded state store for watermarks.

rocky init

Initialize a new Rocky project with starter configuration and directory structure.

rocky init [path] [flags]
Argument | Type | Default | Description
path | string | . | Directory where the project will be created.

Flag | Type | Default | Description
--template <NAME> | string | duckdb | Scaffold template. One of duckdb, databricks-fivetran, snowflake, bigquery, trino. Each template emits a runnable rocky.toml with the matching adapter wired up via ${VAR} env-var placeholders (never inline secrets), plus a models/welcome.{sql,toml} that compiles with no source tables.

Create a project in the current directory (default DuckDB template):

rocky init
Created rocky.toml
Created models/
Rocky project initialized.

Create a Trino-targeted project in a new directory:

rocky init acme-trino --template trino

The emitted rocky.toml wires the trino adapter to ${TRINO_HOST} / ${TRINO_USER} / ${TRINO_PASSWORD} (HTTP Basic) or ${TRINO_JWT} (JWT bearer), with inline TOML comments documenting both auth modes.
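The ${VAR} placeholders keep credentials out of rocky.toml. To illustrate the idea, here is a minimal Python sketch of placeholder expansion. The helper name expand_placeholders, the sample TOML key, and the choice to fail on an unset variable are all assumptions. None of them describes how Rocky resolves placeholders internally.

```python
import os
import re

# ${NAME} where NAME is a conventional environment-variable identifier.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_placeholders(text, env=None):
    """Replace every ${VAR} in `text` with its value from `env`.

    Hypothetical helper: an unset variable raises KeyError rather than
    expanding to an empty string (an assumed policy, not Rocky's).
    """
    env = os.environ if env is None else env

    def substitute(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(substitute, text)


# Hypothetical TOML line; real keys come from the generated rocky.toml.
print(expand_placeholders('host = "${TRINO_HOST}"', {"TRINO_HOST": "trino.internal"}))
```

Failing loudly on a missing variable is a common design choice for secrets: a silently empty password tends to surface later as a confusing authentication error.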


rocky validate

Check the pipeline configuration for correctness without connecting to any external APIs. Returns a non-zero exit code if any check fails.

rocky validate [flags]

No command-specific flags. Uses global flags only.

Check | Description
TOML syntax | The config file parses without errors as v2 (named adapters + named pipelines).
Adapters | Each [adapter.NAME] is a recognized type (databricks, snowflake, duckdb, fivetran, bigquery, trino, airbyte, iceberg, manual) with the required fields populated. For Databricks, at least one of token or client_id/client_secret must be set. The known-types list is driven directly by the adapter registry, so new first-party adapters are recognized without a follow-up edit.
Pipelines | Each [pipeline.NAME] references existing adapters for source, target, and (optional) discovery, and its schema_pattern parses.
DAG validation | If models/ exists, loads all models and checks for dependency cycles.
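At its core, the DAG validation check is cycle detection over model dependencies. The Python sketch below uses Kahn's algorithm. It assumes dependencies are available as a mapping from each model to the models it reads from. The function name and model names are hypothetical, and Rocky's own implementation may differ.

```python
from collections import deque


def find_cycle_members(deps):
    """Return the models that sit on, or downstream of, a dependency cycle.

    `deps` maps each model to the models it depends on. Kahn's algorithm
    repeatedly resolves models whose dependencies are all resolved;
    anything left over can never be scheduled, so the graph is not a DAG.
    """
    nodes = set(deps) | {d for ups in deps.values() for d in ups}
    # Number of unresolved upstream dependencies per model.
    remaining = {n: len(set(deps.get(n, ()))) for n in nodes}
    # Reverse edges: upstream model -> models that depend on it.
    downstream = {n: [] for n in nodes}
    for model, ups in deps.items():
        for up in set(ups):
            downstream[up].append(model)

    ready = deque(n for n, count in remaining.items() if count == 0)
    resolved = set()
    while ready:
        node = ready.popleft()
        resolved.add(node)
        for child in downstream[node]:
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)
    return nodes - resolved


print(sorted(find_cycle_members({"a": ["b"], "b": ["a"], "c": ["a"]})))
```

An empty result means the models form a valid DAG. A non-empty result corresponds to the situation in which validation would report a cycle.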

Validate the default config:

rocky validate
ok Config syntax valid (v2 format)
ok adapter.fivetran: fivetran
ok adapter.prod: databricks (auth configured)
ok pipeline.bronze: schema pattern parseable
ok pipeline.bronze: replication / incremental -> warehouse / stage__{source}
Validation complete.

Validate a specific config file:

rocky -c pipelines/prod.toml validate
ok Config syntax valid (v2 format)
ok adapter.fivetran: fivetran
!! adapter.prod: no auth configured (token or client_id/secret)
ok pipeline.bronze: schema pattern parseable
ok pipeline.bronze: replication / incremental -> warehouse / stage__{source}
Validation complete.

rocky discover

List available connectors and their tables from the configured source. This is a metadata-only operation: it identifies which schemas and tables exist but does not move any data.

rocky discover [flags]
Flag | Type | Default | Description
--pipeline <NAME> | string | (none) | Pipeline name (required if multiple pipelines are defined).

Discover all sources with JSON output:

rocky discover
{
"version": "1.6.0",
"command": "discover",
"sources": [
{
"id": "connector_abc123",
"components": { "tenant": "acme", "regions": ["us_west"], "source": "shopify" },
"source_type": "fivetran",
"last_sync_at": "2026-03-30T10:00:00Z",
"tables": [
{ "name": "orders", "row_count": null },
{ "name": "customers", "row_count": null }
]
},
{
"id": "connector_def456",
"components": { "tenant": "acme", "regions": ["eu_central"], "source": "stripe" },
"source_type": "fivetran",
"last_sync_at": "2026-03-29T22:15:00Z",
"tables": [
{ "name": "charges", "row_count": null },
{ "name": "refunds", "row_count": null }
]
}
]
}
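Because JSON is the default output format, discovery results are easy to post-process, for example from output saved with rocky discover > discover.json. This Python sketch rebuilds a connector / components / table-count summary from the JSON shape shown above. summarize_discover is a hypothetical helper. Joining list-valued components such as regions with a comma is an assumption, since the example only shows single-region lists.

```python
import json


def summarize_discover(raw):
    """Return (connector_id, components, table_count) rows from discover JSON."""
    payload = json.loads(raw)
    rows = []
    for source in payload["sources"]:
        parts = []
        for value in source["components"].values():
            # List-valued components (e.g. regions) are joined with ",";
            # that separator is an assumption for illustration.
            parts.append(",".join(value) if isinstance(value, list) else str(value))
        rows.append((source["id"], " / ".join(parts), len(source["tables"])))
    return rows


sample = json.dumps({"sources": [{
    "id": "connector_abc123",
    "components": {"tenant": "acme", "regions": ["us_west"], "source": "shopify"},
    "tables": [{"name": "orders"}, {"name": "customers"}],
}]})
for row in summarize_discover(sample):
    print(row)
```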

Discover with table output:

rocky -o table discover
connector_id | components | tables
------------------+-------------------------------------+-------
connector_abc123 | acme / us_west / shopify | 12
connector_def456 | acme / eu_central / stripe | 8

Discover a specific pipeline when multiple are defined:

rocky discover --pipeline shopify_us

Related commands:
  • rocky plan: generate SQL from discovered sources
  • rocky run: discover and execute in one step

rocky plan

Generate the SQL statements Rocky would execute without actually running them. Useful for auditing, previewing changes, and CI/CD approval workflows.

rocky plan --filter <key=value> [flags]
Flag | Type | Default | Description
--filter <key=value> | string | (required) | Filter sources by component value (e.g., client=acme).
--pipeline <NAME> | string | (none) | Pipeline name (required if multiple pipelines are defined).
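--filter takes a single key=value pair that is matched against each source's components. The following Python sketch shows one plausible matching rule. parse_filter and matches_filter are hypothetical names. The rule that a list-valued component matches when it contains the value is an assumption, not documented behavior.

```python
def parse_filter(expr):
    """Split 'key=value' into (key, value); reject malformed input."""
    key, sep, value = expr.partition("=")
    if not sep or not key.strip() or not value.strip():
        raise ValueError(f"expected key=value, got {expr!r}")
    return key.strip(), value.strip()


def matches_filter(components, expr):
    """True if a source's components satisfy the filter expression."""
    key, value = parse_filter(expr)
    actual = components.get(key)
    if isinstance(actual, list):
        # Assumed rule: list-valued components match if any element matches.
        return value in actual
    return actual == value


components = {"client": "acme", "regions": ["us_west"], "source": "shopify"}
print(matches_filter(components, "client=acme"))
print(matches_filter(components, "regions=eu_central"))
```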

Plan all SQL for a specific tenant:

rocky plan --filter client=acme
{
"version": "1.6.0",
"command": "plan",
"filter": "client=acme",
"statements": [
{
"purpose": "create_catalog",
"target": "acme_warehouse",
"sql": "CREATE CATALOG IF NOT EXISTS acme_warehouse"
},
{
"purpose": "create_schema",
"target": "acme_warehouse.staging__us_west__shopify",
"sql": "CREATE SCHEMA IF NOT EXISTS acme_warehouse.staging__us_west__shopify"
},
{
"purpose": "incremental_copy",
"target": "acme_warehouse.staging__us_west__shopify.orders",
"sql": "SELECT *, CAST(NULL AS STRING) AS _loaded_by FROM source_catalog.src__acme__us_west__shopify.orders WHERE _fivetran_synced > (SELECT COALESCE(MAX(_fivetran_synced), TIMESTAMP '1970-01-01') FROM acme_warehouse.staging__us_west__shopify.orders)"
}
]
}
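Because every planned statement carries a purpose, a CI/CD approval job can gate on the plan JSON. Here is a minimal Python sketch, assuming a team-defined allowlist of purposes that may proceed without human review. AUTO_APPROVED and needs_review are hypothetical. drop_table is a made-up purpose, used only to trigger the check.

```python
import json

# Hypothetical policy: purposes this CI job lets through without review.
AUTO_APPROVED = {"create_catalog", "create_schema", "incremental_copy"}


def needs_review(plan_json):
    """Return (purpose, target) for each planned statement outside the allowlist."""
    plan = json.loads(plan_json)
    return [
        (stmt["purpose"], stmt["target"])
        for stmt in plan["statements"]
        if stmt["purpose"] not in AUTO_APPROVED
    ]


plan_json = json.dumps({
    "command": "plan",
    "statements": [
        {"purpose": "create_schema",
         "target": "acme_warehouse.staging__us_west__shopify",
         "sql": "CREATE SCHEMA IF NOT EXISTS acme_warehouse.staging__us_west__shopify"},
        # "drop_table" is a made-up purpose used only to exercise the gate.
        {"purpose": "drop_table",
         "target": "acme_warehouse.staging__us_west__shopify.legacy_orders",
         "sql": "DROP TABLE acme_warehouse.staging__us_west__shopify.legacy_orders"},
    ],
})
flagged = needs_review(plan_json)
if flagged:
    print("manual approval required:", flagged)
```

In a real pipeline, a non-empty result would typically fail the job or route the plan to a human approver.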

Plan with table output and a custom config:

rocky -c pipelines/prod.toml -o table plan --filter client=acme
purpose | target | sql (truncated)
------------------+----------------------------------------------------+--------------------------
create_catalog | acme_warehouse | CREATE CATALOG IF NOT...
create_schema | acme_warehouse.staging__us_west__shopify | CREATE SCHEMA IF NOT...
incremental_copy | acme_warehouse.staging__us_west__shopify.orders | SELECT *, CAST(NULL...
incremental_copy | acme_warehouse.staging__us_west__shopify.customers | SELECT *, CAST(NULL...

Plan for a specific pipeline:

rocky plan --filter client=acme --pipeline shopify_us

rocky run

Note: as of engine v1.33, the canonical form is rocky plan followed by rocky apply <plan-id>. rocky run continues to work as an alias; it emits a one-line [deprecated] notice to stderr, which can be silenced by setting ROCKY_SUPPRESS_DEPRECATION=1.

Execute the full pipeline end-to-end: discover sources, detect schema drift, create catalogs/schemas, copy data, apply governance, and run quality checks.

rocky run --filter <key=value> [flags]
Flag | Type | Default | Description
--filter <key=value> | string | (required) | Filter sources by component value (e.g., client=acme).
--pipeline <NAME> | string | (none) | Pipeline name (required if multiple pipelines are defined).
--governance-override <JSON> | string | (none) | Additional governance config as inline JSON or @file.json, merged with defaults.
--models <PATH> | PathBuf | (none) | Models directory for transformation execution.
--all | bool | false | Execute both replication and compiled models.
--resume <RUN_ID> | string | (none) | Resume a specific previous run from its last checkpoint; mints a new run_id and records the prior one as resumed_from.
--resume-latest | bool | false | Resume the most recent failed run from its last checkpoint; mints a new run_id and records the prior one as resumed_from.
--shadow | bool | false | Run in shadow mode: write to shadow targets instead of production.
--shadow-suffix <SUFFIX> | string | _rocky_shadow | Suffix appended to table names in shadow mode.
--shadow-schema <NAME> | string | (none) | Override schema for shadow tables (mutually exclusive with --shadow-suffix).
--branch <NAME> | string | (none) | Execute against a named branch previously registered with rocky branch create. Applies the branch’s schema_prefix to every target (internally equivalent to --shadow --shadow-schema <branch.schema_prefix>). Mutually exclusive with --shadow / --shadow-schema.
--watch | bool | false | Wrap the run in a filesystem watcher: re-execute the pipeline on every change to rocky.toml or any file under models/, debounced to 200 ms so editor save bursts coalesce into a single re-run. Failed runs do not exit the loop; Ctrl-C exits cleanly between runs. v0 limitations: mutually exclusive with --dag, --resume, --resume-latest, --idempotency-key, and --model (rejected at parse time).

A run executes in five phases:
  1. Discover: enumerate sources and tables.
  2. Governance setup (sequential, per catalog/schema): create catalogs, apply tags, bind workspaces, grant permissions, create schemas.
  3. Parallel table processing (up to execution.concurrency): drift detection, incremental copy, tag application, watermark update.
  4. Batched checks: row count, column match, freshness.
  5. Retry: failed tables are retried sequentially (per execution.table_retries).
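The incremental_copy statement in the rocky plan example selects only source rows whose _fivetran_synced is newer than the target's current maximum. For an empty target, it falls back to TIMESTAMP '1970-01-01'. The in-memory Python sketch below mirrors that predicate and the watermark update that follows it. Rows are modeled as dicts. The sketch only illustrates the logic; it is not how Rocky executes the copy.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def incremental_copy(source_rows, target_rows, column="_fivetran_synced"):
    """Append source rows newer than the target's high-water mark.

    Mirrors WHERE col > COALESCE(MAX(target.col), TIMESTAMP '1970-01-01')
    and returns (rows_copied, new_watermark).
    """
    # COALESCE(MAX(...), epoch): an empty target starts from the epoch.
    watermark = max((r[column] for r in target_rows), default=EPOCH)
    new_rows = [r for r in source_rows if r[column] > watermark]
    target_rows.extend(new_rows)
    # Watermark update: the target's maximum after the copy.
    new_watermark = max((r[column] for r in target_rows), default=EPOCH)
    return len(new_rows), new_watermark


def ts(hour):
    return datetime(2026, 3, 30, hour, tzinfo=timezone.utc)


source = [{"id": 1, "_fivetran_synced": ts(8)}, {"id": 2, "_fivetran_synced": ts(10)}]
target = []
print(incremental_copy(source, target))  # first run copies both rows
print(incremental_copy(source, target))  # second run copies nothing new
```

Re-running the copy is therefore idempotent as long as the source does not produce rows with an older sync timestamp than the current maximum.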

Run the pipeline for a specific tenant:

rocky run --filter client=acme
{
"version": "1.6.0",
"command": "run",
"filter": "client=acme",
"duration_ms": 45200,
"tables_copied": 20,
"tables_failed": 0,
"materializations": [
{
"asset_key": ["fivetran", "acme", "us_west", "shopify", "orders"],
"rows_copied": null,
"duration_ms": 2300,
"metadata": {
"strategy": "incremental",
"watermark": "2026-03-30T10:00:00Z",
"target_table_full_name": "acme_warehouse.staging__us_west__shopify.orders"
}
}
],
"check_results": [],
"errors": [],
"excluded_tables": [],
"permissions": { "grants_added": 3, "grants_revoked": 0, "catalogs_created": 0, "schemas_created": 1 },
"drift": { "tables_checked": 20, "tables_drifted": 1, "actions_taken": [] },
"anomalies": [],
"partition_summaries": []
}
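A scheduler or CI step can decide pass/fail from the rocky run JSON. This Python sketch reads only fields shown in the example above. assess_run is a hypothetical helper. Treating drifted tables as a warning rather than a failure is an assumed policy.

```python
import json


def assess_run(raw):
    """Classify a rocky run JSON result as ("ok" | "warn" | "fail", reasons)."""
    run = json.loads(raw)
    failed = run.get("tables_failed", 0)
    errors = run.get("errors", [])
    if failed or errors:
        return "fail", [f"{failed} table(s) failed", f"{len(errors)} error(s)"]
    drifted = run.get("drift", {}).get("tables_drifted", 0)
    if drifted:
        # Assumed policy: drift is surfaced but does not fail the run.
        return "warn", [f"{drifted} table(s) drifted"]
    return "ok", []


sample = json.dumps({
    "command": "run", "tables_copied": 20, "tables_failed": 0, "errors": [],
    "drift": {"tables_checked": 20, "tables_drifted": 1, "actions_taken": []},
})
print(assess_run(sample))
```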

Run with a governance override file:

rocky run --filter client=acme --governance-override @overrides/acme.json

Run both replication and model transformations:

rocky run --filter client=acme --models models/ --all

Resume the most recent failed run from its last checkpoint:

rocky run --filter client=acme --resume-latest

Run in shadow mode (writes to *_rocky_shadow tables instead of production) so you can compare results before promoting:

rocky run --filter client=acme --shadow
rocky compare --filter client=acme
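--shadow-suffix and --shadow-schema place shadow output differently. The Python sketch below maps a production name to its shadow location, using the documented default suffix. shadow_target is hypothetical. This page does not confirm that --shadow-schema (and therefore --branch) keeps the original catalog and table name; that is an assumption.

```python
def shadow_target(full_name, suffix="_rocky_shadow", schema=None):
    """Map a production catalog.schema.table name to its shadow location.

    With `schema` set (the --shadow-schema / --branch case), the table moves
    to that schema; keeping the catalog and table name is an assumption.
    Otherwise the suffix is appended to the table name.
    """
    catalog, prod_schema, table = full_name.split(".")
    if schema is not None:
        return f"{catalog}.{schema}.{table}"
    return f"{catalog}.{prod_schema}.{table}{suffix}"


name = "acme_warehouse.staging__us_west__shopify.orders"
print(shadow_target(name))
print(shadow_target(name, schema="branch__fix-price"))
```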

Or run against a named branch — the persistent, named analogue of --shadow:

rocky branch create fix-price --description "testing reprice migration"
rocky run --filter client=acme --branch fix-price

Run in watch mode for the inner-loop developer workflow — every save re-materializes the pipeline against the local DuckDB warehouse:

rocky run --watch

--watch watches the parent directory of rocky.toml (filtered to rocky.toml itself) plus the resolved models/ directory recursively. The directory watch is FSEvents-safe on macOS — atomic-rename saves (vim’s :w, VSCode’s default) trigger correctly where a file-level watch can miss the new inode. Banner / “detected change” lines go to stderr so stdout stays parseable; with --output json, each iteration emits one RunOutput JSON object on stdout (newline-delimited).
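The 200 ms debounce turns a burst of save events into a single re-run. Here is a Python sketch of that coalescing as a pure function over event timestamps. It assumes trailing-edge debounce, where each event restarts the 200 ms timer. coalesce is a hypothetical name, and Rocky's exact timer semantics may differ.

```python
def coalesce(event_times_ms, window_ms=200):
    """Return the times (ms) at which debounced re-runs would fire.

    Trailing-edge debounce: an event arriving within `window_ms` of the
    previous one extends the current burst; a longer gap closes it. Each
    burst fires once, `window_ms` after its last event.
    """
    fires = []
    last = None
    for t in sorted(event_times_ms):
        if last is not None and t - last > window_ms:
            fires.append(last + window_ms)  # previous burst has gone quiet
        last = t
    if last is not None:
        fires.append(last + window_ms)  # flush the final burst
    return fires


# Several events from one editor save, then a separate save later on.
print(coalesce([0, 30, 60, 1000]))
```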


rocky state

Display stored watermarks from the embedded state file. Shows every tracked table with its last watermark value and the timestamp at which it was recorded.

rocky state [flags]

No command-specific flags. Uses global flags only.

Show watermarks with JSON output:

rocky state
{
"version": "1.6.0",
"command": "state",
"watermarks": [
{
"table": "acme_warehouse.staging__us_west__shopify.orders",
"last_value": "2026-03-30T10:00:00Z",
"updated_at": "2026-03-30T10:01:32Z"
},
{
"table": "acme_warehouse.staging__us_west__shopify.customers",
"last_value": "2026-03-30T09:55:00Z",
"updated_at": "2026-03-30T10:01:32Z"
}
]
}

Show watermarks with table output using a custom state path:

rocky -o table --state-path /var/rocky/state.redb state
table | last_value | updated_at
-----------------------------------------------------+---------------------------+---------------------------
acme_warehouse.staging__us_west__shopify.orders | 2026-03-30T10:00:00Z | 2026-03-30T10:01:32Z
acme_warehouse.staging__us_west__shopify.customers | 2026-03-30T09:55:00Z | 2026-03-30T10:01:32Z
acme_warehouse.staging__eu_central__stripe.charges | 2026-03-29T22:15:00Z | 2026-03-30T10:01:32Z
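The state output also lends itself to freshness monitoring. This Python sketch flags tables whose last watermark lags a reference time by more than a threshold. It assumes last_value is an ISO-8601 timestamp, as in the example above; watermarks on other column types would need different handling. stale_tables and the threshold are hypothetical.

```python
import json
from datetime import datetime, timedelta, timezone


def _parse_ts(value):
    # datetime.fromisoformat() accepts a trailing "Z" only on Python 3.11+.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def stale_tables(state_json, now, max_lag):
    """Return tables whose last watermark value lags `now` by more than max_lag."""
    state = json.loads(state_json)
    return [
        wm["table"]
        for wm in state["watermarks"]
        if now - _parse_ts(wm["last_value"]) > max_lag
    ]


state_json = json.dumps({"watermarks": [
    {"table": "acme_warehouse.staging__us_west__shopify.orders",
     "last_value": "2026-03-30T10:00:00Z", "updated_at": "2026-03-30T10:01:32Z"},
    {"table": "acme_warehouse.staging__us_west__shopify.customers",
     "last_value": "2026-03-30T09:55:00Z", "updated_at": "2026-03-30T10:01:32Z"},
]})
now = datetime(2026, 3, 30, 11, 0, tzinfo=timezone.utc)
print(stale_tables(state_json, now, timedelta(minutes=62)))
```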

rocky branch

Manage named virtual branches. A branch is the persistent, named analogue of --shadow mode: it records a schema_prefix in the state store, and every model target has that prefix applied when you invoke rocky plan --branch <name> followed by rocky apply <plan-id> (or the legacy rocky run --branch <name> alias). Schema-prefix branches work uniformly across every adapter today; warehouse-native clones (Delta SHALLOW CLONE, Snowflake zero-copy CLONE) are a planned follow-up.

rocky branch create <name> [--description <text>]
rocky branch delete <name>
rocky branch list
rocky branch show <name>
rocky branch compare <name> [--filter <key=value>]
rocky branch approve <name> [--message <text>] [--out <path>]
rocky branch promote <name> [--allow-breaking] [--base-ref <ref>]
[--models <path>] [--skip-approval]
[--filter <key=value>]
rocky branch promote <name> --plan <plan-id> # canonical: plan + apply

Branch names accept [A-Za-z0-9_.\-] up to 64 characters. The default schema prefix is branch__<name>. Deleting a branch removes the state-store entry but does not drop warehouse tables that were materialized under it.
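Here is a minimal Python sketch of the naming rules: it validates a branch name and derives the default schema prefix. default_schema_prefix is a hypothetical helper. This page states only the character set and the 64-character maximum, so rejecting empty names is an assumption.

```python
import re

# Letters, digits, underscore, dot, hyphen; 1 to 64 characters (minimum assumed).
_BRANCH_NAME = re.compile(r"[A-Za-z0-9_.\-]{1,64}")


def default_schema_prefix(name):
    """Validate a branch name and return its default schema prefix."""
    if not _BRANCH_NAME.fullmatch(name):
        raise ValueError(f"invalid branch name: {name!r}")
    return f"branch__{name}"


print(default_schema_prefix("fix-price"))
```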

rocky branch approve

Flag | Type | Default | Description
--message <text> | string | (none) | Optional free-form note persisted in the approval artifact.
--out <path> | PathBuf | ./.rocky/approvals/<branch>/<approval_id>.json | Override the artifact destination path.

Writes a content-addressed approval artifact that binds the approver’s git identity to the exact bytes of the branch’s models and project config. Editing, adding, or renaming any model after approval voids that approval, so rocky branch promote refuses to run unless the on-disk approvals still match the current state and satisfy the [branch.approval] policy.

Upgrade note (engine v1.43): approvals created before v1.43 bound to the project config only, not the model bytes. They no longer satisfy the gate after upgrading. Run rocky branch approve <name> once to re-sign each branch against its current model contents.
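Content addressing means the approval is keyed to a digest of exactly what was reviewed. The Python sketch below hashes the project config plus every model's path and bytes, so editing, adding, or renaming a model yields a different digest. The layout (SHA-256, sorted paths, length prefixes) is an assumption for illustration, not Rocky's artifact format.

```python
import hashlib


def approval_digest(config, models):
    """Digest over the project config bytes and every model (path -> bytes)."""
    digest = hashlib.sha256()

    def feed(chunk):
        # Length-prefix each chunk so that different byte splits cannot collide.
        digest.update(len(chunk).to_bytes(8, "big"))
        digest.update(chunk)

    feed(config)
    for path in sorted(models):     # stable order, independent of dict order
        feed(path.encode("utf-8"))  # renaming a model changes the digest
        feed(models[path])          # editing a model changes the digest
    return digest.hexdigest()


config = b"# contents of rocky.toml"
models = {"models/orders.sql": b"select * from raw.orders"}
approved = approval_digest(config, models)

# Any edit, rename, or addition no longer matches the approved digest.
edited = {"models/orders.sql": b"select id from raw.orders"}
renamed = {"models/orders_v2.sql": models["models/orders.sql"]}
added = {**models, "models/refunds.sql": b"select 1"}
for variant in (edited, renamed, added):
    print(approval_digest(config, variant) == approved)
```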

rocky branch promote

Note: as of engine v1.33, the canonical form is rocky plan promote <name> followed by rocky apply <plan-id> (or rocky branch promote <name> --plan <plan-id>). The bare rocky branch promote <name> form continues to work as an alias; it emits a one-line [deprecated] notice to stderr, which can be silenced by setting ROCKY_SUPPRESS_DEPRECATION=1.

Flag | Type | Default | Description
--allow-breaking | flag | off | Bypass the semantic breaking-change gate. Always emits a breaking_changes_allowed audit event so the override leaves a paper trail.
--base-ref <ref> | string | main | Git ref to diff against for the breaking-change gate.
--models <path> | PathBuf | models | Models directory used by the breaking-change gate.
--skip-approval | flag | off | Bypass the approval gate. Always emits an approval_skipped audit event so the bypass leaves a paper trail.
--pipeline <name> | string | (none) | Which pipeline to promote, in a multi-pipeline project. Optional when the project defines a single pipeline.
--filter <key=value> | string | (none) | Filter the promote targets. Replication pipelines filter sources by schema-pattern component (e.g. --filter client=acme); transformation pipelines filter models by table, model, catalog, or schema.

Enumerates the pipeline’s production targets and promotes each one. A replication pipeline discovers the source connector’s tables through the schema-pattern templates; a transformation pipeline walks the configured models glob and promotes one target per model, skipping ephemeral models. It then runs the optional [branch.approval] gate, runs the semantic breaking-change gate against --base-ref, and dispatches CREATE OR REPLACE TABLE prod.<x> AS SELECT * FROM branch__<name>.<x> per target. Quality and snapshot pipelines are not supported and return a clear error.
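For a transformation pipeline, the dispatch step amounts to one CREATE OR REPLACE TABLE ... AS SELECT per non-ephemeral model. The Python sketch below builds those statements, following the prod.<x> / branch__<name>.<x> shorthand above. promote_statements, the model dict shape, and the materialization key are hypothetical. Identifier quoting is omitted, as in that shorthand.

```python
def promote_statements(branch, models, prod_schema="prod"):
    """One CTAS per non-ephemeral model, copying branch output to production.

    Identifier quoting is omitted, mirroring the shorthand in the text.
    """
    branch_schema = f"branch__{branch}"  # the default schema prefix
    statements = []
    for model in models:
        if model.get("materialization") == "ephemeral":
            continue  # ephemeral models are skipped, as the text describes
        name = model["name"]
        statements.append(
            f"CREATE OR REPLACE TABLE {prod_schema}.{name} "
            f"AS SELECT * FROM {branch_schema}.{name}"
        )
    return statements


models = [
    {"name": "orders"},
    {"name": "orders_enriched_tmp", "materialization": "ephemeral"},
]
for sql in promote_statements("fix-price", models):
    print(sql)
```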

The breaking-change gate vetoes the promote (exit nonzero) when any finding has severity == "breaking" unless --allow-breaking is passed. Every gate decision — block, allow-via-override, or fail-open when the gate couldn’t run — is recorded in the audit trail. See rocky ci-diff --semantic to surface the same findings informationally on every PR.
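The gate's outcomes can be summarized as a small decision function. In this Python sketch, findings are dicts with a severity key, and gate_error stands in for "the gate couldn't run". breaking_changes_allowed is the audit event named above. The other labels (pass, block, fail_open) and the function name are hypothetical.

```python
def breaking_change_gate(findings, allow_breaking=False, gate_error=None):
    """Return (proceed, decision) for a promote.

    `findings`: list of dicts with a "severity" key.
    `gate_error`: set when the gate itself could not run.
    """
    if gate_error is not None:
        # Fail open: the promote proceeds, but the decision is still recorded.
        return True, "fail_open"
    if not any(f.get("severity") == "breaking" for f in findings):
        return True, "pass"
    if allow_breaking:
        # --allow-breaking: proceed and leave a breaking_changes_allowed event.
        return True, "breaking_changes_allowed"
    return False, "block"  # the promote exits nonzero


findings = [{"severity": "breaking", "detail": "column price dropped"}]
print(breaking_change_gate(findings))
print(breaking_change_gate(findings, allow_breaking=True))
```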

Create, list, run against, and delete a branch:

rocky branch create fix-price --description "testing reprice migration"
{
"version": "1.11.0",
"command": "branch create",
"branch": {
"name": "fix-price",
"schema_prefix": "branch__fix-price",
"created_by": "hugo",
"created_at": "2026-04-20T14:22:11+00:00",
"description": "testing reprice migration"
}
}

rocky branch list
{
"version": "1.11.0",
"command": "branch list",
"total": 2,
"branches": [
{ "name": "fix-price", "schema_prefix": "branch__fix-price", "created_by": "hugo", "created_at": "2026-04-20T14:22:11+00:00", "description": "testing reprice migration" },
{ "name": "ingest-v2", "schema_prefix": "branch__ingest-v2", "created_by": "ci", "created_at": "2026-04-18T09:05:00+00:00", "description": null }
]
}

rocky run --filter client=acme --branch fix-price
rocky branch delete fix-price

Diff a branch’s materialized tables against production (row counts + schemas):

rocky branch compare fix-price

Internally this is rocky compare pointed at the branch’s schema_prefix via ShadowConfig.schema_override — the same mechanism rocky run --branch uses for writes, so compare always hits exactly the tables the branch produced. Accepts the shared --filter flag.
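rocky branch compare reports row-count and schema differences. The Python sketch below diffs one production/branch table pair, given each table's row count and a column-name-to-type map. compare_table and the snapshot shape are hypothetical. This page does not show how Rocky fetches this metadata or formats its report.

```python
def compare_table(prod, branch):
    """Diff two table snapshots shaped like {"rows": int, "columns": {name: type}}."""
    diffs = []
    if prod["rows"] != branch["rows"]:
        diffs.append(f"row count {prod['rows']} -> {branch['rows']}")
    p_cols, b_cols = prod["columns"], branch["columns"]
    for col in sorted(p_cols.keys() - b_cols.keys()):
        diffs.append(f"column removed: {col}")
    for col in sorted(b_cols.keys() - p_cols.keys()):
        diffs.append(f"column added: {col}")
    for col in sorted(p_cols.keys() & b_cols.keys()):
        if p_cols[col] != b_cols[col]:
            diffs.append(f"type changed: {col} {p_cols[col]} -> {b_cols[col]}")
    return diffs


prod = {"rows": 100, "columns": {"id": "BIGINT", "price": "DECIMAL(10,2)"}}
branch = {"rows": 98, "columns": {"id": "BIGINT", "price": "DOUBLE", "currency": "STRING"}}
for line in compare_table(prod, branch):
    print(line)
```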

Related commands:
  • rocky run: execute a pipeline against a branch via rocky run --branch
  • rocky compare: diff an ad-hoc shadow against production (the generic form that rocky branch compare specialises)