CLI Reference
Rocky provides a single binary with subcommands for the full pipeline lifecycle. Commands are organized into categories:
- Core Pipeline:
init,validate,discover,plan,apply,state,branch(legacy alias:run) - Modeling:
compile,lineage,lineage-diff,test,ci,ci-diff,preview - Data:
seed,snapshot,docs - AI:
ai,ai-sync,ai-explain,ai-test - Development:
playground,shell,watch,fmt,list,serve,lsp,import-dbt,init-adapter,adapter,hooks,validate-migration,test-adapter - Administration:
history,replay,trace,metrics,optimize,compact,profile-storage,archive,compliance,retention-status - Diagnostics:
doctor,compare
See the dedicated command reference pages for detailed documentation of each category.
Global Flags
Section titled “Global Flags”These flags apply to all commands.
| Flag | Short | Default | Description |
|---|---|---|---|
--config <PATH> | -c | rocky.toml | Path to the pipeline configuration file. |
--output <FORMAT> | -o | json | Output format. Accepted values: json, table. |
--state-path <PATH> | resolved (see below) | Path to the embedded state store. When omitted, Rocky resolves to <models>/.rocky-state.redb (canonical) or a legacy CWD .rocky-state.redb (deprecated, warns on stderr). Passing the flag explicitly is always a hard override. See rocky state. | |
--cache-ttl <SECONDS> | [cache.schemas] ttl_seconds or 86400 | Override the DESCRIBE TABLE schema-cache TTL for this invocation. Precedence: --cache-ttl > rocky.toml > 86400 (24 h). --cache-ttl 0 treats every entry as instantly stale. To disable the cache entirely, set [cache.schemas] enabled = false in rocky.toml. Applies to the CLI read path only (rocky compile, rocky plan, rocky apply, rocky run, …); rocky lsp / rocky serve keep the config-derived TTL. |
# Example: use a custom config and table outputrocky -c pipelines/prod.toml -o table discover
# Force a fresh typecheck against warehouse metadatarocky --cache-ttl 0 compileCommands
Section titled “Commands”rocky init
Section titled “rocky init”Scaffolds a new Rocky project in the target directory.
rocky init [path]Arguments:
| Argument | Default | Description |
|---|---|---|
path | . (current directory) | Directory where the project will be created. |
Behavior:
- Creates a starter
rocky.tomlwith placeholder values. - Creates a
models/directory for SQL model files. - Fails with an error if
rocky.tomlalready exists in the target directory.
Example:
# Scaffold in current directoryrocky init
# Scaffold in a new directoryrocky init my-pipelinerocky validate
Section titled “rocky validate”Checks the pipeline configuration for correctness without connecting to any external APIs.
rocky validateChecks performed:
| Check | Description |
|---|---|
| TOML syntax | The config file parses without errors as v2 (named adapters + named pipelines). |
| Adapters | Each [adapter.NAME] is a recognized type (databricks, snowflake, duckdb, fivetran, manual) with the required fields populated. |
| Pipelines | Each [pipeline.NAME] references existing adapters for source, target, and (optional) discovery, and its schema_pattern parses. |
| DAG validation | If models/ exists, loads all models and checks for dependency cycles. |
Output:
Each check prints ok or !! followed by a short description. A non-zero exit code is returned if any check fails.
ok Config syntax valid (v2 format)ok adapter.fivetran: fivetranok adapter.prod: databricks (auth configured)ok pipeline.bronze: schema pattern parseableok pipeline.bronze: replication / incremental -> warehouse / stage__{source}rocky discover
Section titled “rocky discover”Lists available connectors and their tables from the configured source.
rocky discover [--pipeline NAME] [--with-schemas]Flags:
| Flag | Description |
|---|---|
--pipeline <NAME> | Pipeline name. Required when more than one [pipeline.NAME] is defined. |
--with-schemas | Warm the schema cache for every discovered source. For each (catalog, schema) pair reachable via the source adapter, issues one batch_describe_schema round-trip and persists the per-table columns to state.redb::schema_cache. Subsequent rocky compile / rocky lsp invocations pick up the entries instead of typechecking leaf models as Unknown. Errors on individual sources are logged and skipped. Setting this flag with [cache.schemas] enabled = false errors with a clear message rather than silently no-op-ing. DiscoverOutput.schemas_cached records the count. |
Behavior:
- For
fivetranadapters, calls the Fivetran REST API to list connectors and their enabled tables. Forduckdbadapters, queriesinformation_schema.{schemata,tables}. Formanualadapters, reads inline schema/table definitions. - This is a metadata-only operation — it identifies what schemas and tables exist, it does not extract or move data.
- Parses each source schema name using the pipeline’s
schema_patternto extract structured components (tenant, regions, source, etc.). - Returns structured data about every discovered source and its tables.
JSON output:
{ "version": "1.6.0", "command": "discover", "sources": [ { "id": "connector_abc123", "components": { "tenant": "acme", "regions": ["us_west"], "source": "shopify" }, "source_type": "fivetran", "last_sync_at": "2026-03-30T10:00:00Z", "tables": [{ "name": "orders", "row_count": null }] } ]}Table output:
connector_id | components | tables──────────────────┼─────────────────────────────────────┼───────connector_abc123 | acme / us_west / shopify | 12connector_def456 | acme / eu_central / stripe | 8rocky plan
Section titled “rocky plan”Generates the SQL statements Rocky would execute, without actually running them. Useful for auditing and previewing changes before a run.
rocky plan --filter <key=value> [--pipeline NAME]Flags:
| Flag | Required | Description |
|---|---|---|
--filter <key=value> | Yes | Filter sources by component. Example: --filter tenant=acme. |
--pipeline <NAME> | Pipeline name. Required when more than one pipeline is defined. |
Behavior:
- Runs discovery and drift detection.
- Generates all SQL statements (catalog creation, schema creation, incremental copy, permission grants) and returns them without execution.
JSON output:
{ "version": "1.6.0", "command": "plan", "filter": "tenant=acme", "statements": [ { "purpose": "create_catalog", "target": "acme_warehouse", "sql": "CREATE CATALOG IF NOT EXISTS acme_warehouse" }, { "purpose": "create_schema", "target": "acme_warehouse.staging__us_west__shopify", "sql": "..." }, { "purpose": "incremental_copy", "target": "acme_warehouse.staging__us_west__shopify.orders", "sql": "..." } ]}rocky run
Section titled “rocky run”Note: as of engine v1.33, the canonical form is
rocky planfollowed byrocky apply <plan-id>.rocky runcontinues to work and is now an alias; it emits a one-line[deprecated]notice to stderr that can be silenced withROCKY_SUPPRESS_DEPRECATION=1.
Executes the full pipeline end-to-end.
rocky run --filter <key=value> [flags]Flags:
| Flag | Required | Description |
|---|---|---|
--filter <key=value> | Yes | Filter sources by component. Example: --filter tenant=acme. |
--pipeline <NAME> | Pipeline name. Required when more than one pipeline is defined. | |
--governance-override <JSON> | Additional governance config as inline JSON or @file.json, merged with defaults. | |
--models <PATH> | Models directory for transformation execution. | |
--all | Execute both replication and compiled models. | |
--resume <RUN_ID> | Resume a specific previous run from its last checkpoint; mints a new run_id and records the prior one as resumed_from. | |
--resume-latest | Resume the most recent failed run from its last checkpoint; mints a new run_id and records the prior one as resumed_from. | |
--shadow | Run in shadow mode: write to shadow targets instead of production. | |
--shadow-suffix <SUFFIX> | Suffix appended to table names in shadow mode (default _rocky_shadow). | |
--shadow-schema <NAME> | Override schema for shadow tables (mutually exclusive with --shadow-suffix). | |
--branch <NAME> | Execute against a named branch created with rocky branch create. Mutually exclusive with --shadow / --shadow-schema. See rocky branch. | |
--idempotency-key <KEY> | Caller-supplied opaque key used to dedup this run against prior runs with the same key. Three outcomes: a prior run succeeded (or reached a terminal state under dedup_on = "any") → exit 0 with status = "skipped_idempotent" and the prior skipped_by_run_id; another caller currently holds the claim within in_flight_ttl_hours → exit 0 with status = "skipped_in_flight"; otherwise proceed normally. Rejected when combined with --resume / --resume-latest (resume is an explicit override). Stamps are stored verbatim — do not put secrets in the key. See [state.idempotency] for tuning. |
Pipeline stages (in order):
- Discover — enumerate sources and tables from the configured source adapter.
- Governance setup (sequential, per matching catalog/schema):
- Create catalog (if
auto_create_catalogs = true) - Apply catalog tags (
ALTER CATALOG SET TAGS) - Bind workspaces (Unity Catalog bindings API, if
governance.isolationconfigured) - Apply catalog-level grants (
GRANT ... ON CATALOG) - Create schema (if
auto_create_schemas = true) - Apply schema tags (
ALTER SCHEMA SET TAGS) - Apply schema-level grants (
GRANT ... ON SCHEMA)
- Create catalog (if
- Parallel table processing — for each table concurrently (up to
execution.concurrency):- Drift detection (compare column types between source and target)
- Copy data (incremental or full refresh SQL)
- Apply table tags
- Update watermark in state store
- Batched checks — row count, column match, freshness (batched with UNION ALL for efficiency)
- Retry — failed tables retried sequentially (configurable via
execution.table_retries)
JSON output:
{ "version": "1.6.0", "command": "run", "filter": "tenant=acme", "duration_ms": 45200, "tables_copied": 20, "tables_failed": 0, "materializations": [ { "asset_key": ["fivetran", "acme", "us_west", "shopify", "orders"], "rows_copied": null, "duration_ms": 2300, "metadata": { "strategy": "incremental", "watermark": "2026-03-30T10:00:00Z", "target_table_full_name": "acme_warehouse.staging__us_west__shopify.orders", "sql_hash": null } } ], "check_results": [], "errors": [], "excluded_tables": [], "permissions": { "grants_added": 3, "grants_revoked": 0, "catalogs_created": 0, "schemas_created": 1 }, "drift": { "tables_checked": 20, "tables_drifted": 1, "actions_taken": [] }, "anomalies": [], "partition_summaries": []}rocky doctor
Section titled “rocky doctor”Runs aggregate health checks on your Rocky project: config validation, state store health, adapter connectivity, pipeline consistency, state backend configuration, live state read/write, and auth.
rocky doctorChecks performed:
| Check | Name | Description |
|---|---|---|
| Config | config | Parses rocky.toml, validates adapters and pipelines |
| State | state | Verifies the local state store is readable and not corrupted |
| Adapters | adapters | Tests connectivity to configured adapters |
| Pipelines | pipelines | Validates schema patterns, templates, and governance config |
| State Sync | state_sync | Inspects the configured remote state backend (type only) |
| State RW | state_rw | Round-trips a marker object against the configured backend (put → get → delete). Surfaces IAM and reachability problems at cold start instead of end-of-run upload. No-op for local; tiered probes both legs. |
| Auth | auth, auth/<adapter> | Pings each warehouse and discovery adapter to verify credentials and connectivity |
JSON output:
{ "command": "doctor", "overall": "warning", "checks": [ { "name": "config", "status": "healthy", "message": "rocky.toml valid", "duration_ms": 4 }, { "name": "state", "status": "healthy", "message": "state store readable", "duration_ms": 2 }, { "name": "adapters", "status": "warning", "message": "adapter.fivetran: API key not set", "duration_ms": 120 } ], "suggestions": [ "Set FIVETRAN_API_KEY to enable the Fivetran discovery adapter." ]}Run a specific check:
rocky doctor --check authrocky doctor --check state_rw # live round-trip probe against the remote state backendVerbose mode (v1.20.0+):
rocky doctor --verbosePrints extra per-check context (config path, state file size, adapter type + credential signal, pipeline kind, state backend) under each check in human-readable output. The JSON output is unchanged unless --verbose is passed — each checks[] entry then carries a details array of [key, value] string pairs (omitted entirely when empty). Credential signal values: token, oauth_client, oauth_token, key_pair, password, service_account, adc, env, none.
rocky list
Section titled “rocky list”Inspect project contents: pipelines, adapters, models, sources, and dependency relationships.
rocky list pipelines # List all pipeline definitionsrocky list adapters # List all adapter configurationsrocky list models # List all transformation modelsrocky list sources # List replication source configurationsrocky list deps <model> # Show what a model depends onrocky list consumers <model> # Show what depends on a modelAll subcommands support --output json (via the parent -o json flag) for machine-readable output.
Example (table format):
$ rocky -o table list pipelinesNAME TYPE TARGET SOURCE DEPENDS ONplayground replication default default -Example (JSON format):
{ "version": "1.6.0", "command": "list_pipelines", "pipelines": [ { "name": "playground", "pipeline_type": "replication", "target_adapter": "default", "source_adapter": "default", "depends_on": [], "concurrency": "16" } ]}rocky seed
Section titled “rocky seed”Load static reference data from CSV files into the target warehouse. Rocky’s equivalent of dbt’s dbt seed.
rocky seed # Load all seeds from seeds/rocky seed --seeds data/seeds/ # Custom seeds directoryrocky seed --filter name=dim_date # Load specific seedSeeds are .csv files in the seeds/ directory. Rocky infers column types (STRING, BIGINT, DOUBLE, BOOLEAN, TIMESTAMP) from the data and creates/replaces the target tables. Optional .toml sidecars can override inferred types.
Sidecar example (seeds/dim_date.toml):
[target]catalog = "warehouse"schema = "reference"table = "dim_date"
[[columns]]name = "date_key"data_type = "DATE"JSON output:
{ "version": "1.6.0", "command": "seed", "seeds_dir": "seeds", "tables_loaded": 1, "tables_failed": 0, "tables": [ { "name": "dim_date", "target": "warehouse.reference.dim_date", "rows": 365, "columns": 4, "duration_ms": 42 } ], "duration_ms": 55}rocky compare
Section titled “rocky compare”Compare shadow tables against production tables. Used after rocky plan --shadow + rocky apply <plan-id> (or the legacy rocky run --shadow alias) to validate results before promoting shadow data to production.
rocky compare --filter <key=value> [flags]Flags:
| Flag | Required | Description |
|---|---|---|
--filter <key=value> | Yes | Filter tables by component. |
--pipeline <NAME> | Pipeline name. | |
--shadow-suffix <SUFFIX> | Shadow table suffix (default _rocky_shadow). | |
--shadow-schema <NAME> | Override schema for shadow tables. |
JSON output:
{ "version": "1.6.0", "command": "compare", "filter": "tenant=acme", "tables_compared": 1, "tables_passed": 1, "tables_warned": 0, "tables_failed": 0, "results": [ { "production_table": "warehouse.staging.orders", "shadow_table": "warehouse.staging.orders_rocky_shadow", "row_count_match": true, "production_count": 15000, "shadow_count": 15000, "row_count_diff_pct": 0.0, "schema_match": true, "schema_diffs": [], "verdict": "pass" } ], "overall_verdict": "pass"}rocky state
Section titled “rocky state”Inspect or manage the embedded state store. rocky state is a subcommand group; bare rocky state continues to display watermarks for backwards compatibility.
rocky state # show watermarks (default)rocky state show # same as bare `rocky state`rocky state clear-schema-cache [--dry-run] # flush the DESCRIBE cacheSubcommands:
| Subcommand | Description |
|---|---|
show (default) | Display stored watermarks. |
clear-schema-cache | Remove every entry from the SCHEMA_CACHE redb table. --dry-run reports what would be removed without touching the store. A missing state store is a no-op (CI-safe on ephemeral runners). Emits ClearSchemaCacheOutput. See rocky state clear-schema-cache. |
State-path resolution (v1.16.0):
When --state-path is not passed, Rocky resolves the state file via rocky_core::state::resolve_state_path:
<models>/.rocky-state.redb— canonical location for new projects; matches the LSP convention so inlay hints observe the same filerocky applywrites.- Legacy
.rocky-state.redbin CWD — still works; emits a one-time deprecation warning on stderr. - Both present — CWD wins (to preserve existing watermarks / branches / partitions); a louder warning asks you to reconcile. Merge is lossy.
- Neither present — fresh project lands on
<models>/.rocky-state.redbwhen amodels/directory exists, otherwise CWD.
Explicit --state-path <PATH> always overrides the resolver.
rocky state behavior (show):
- Reads the redb state store at the resolved path.
- Lists every tracked table with its last watermark value and the timestamp it was recorded.
JSON output:
{ "version": "1.6.0", "command": "state", "watermarks": [ { "table": "acme_warehouse.staging__us_west__shopify.orders", "last_value": "2026-03-30T10:00:00Z", "updated_at": "2026-03-30T10:01:32Z" } ]}Table output:
table | last_value | updated_at─────────────────────────────────────────────────────┼───────────────────────────┼───────────────────────────acme_warehouse.staging__us_west__shopify.orders | 2026-03-30T10:00:00Z | 2026-03-30T10:01:32Zacme_warehouse.staging__us_west__shopify.customers | 2026-03-30T09:55:00Z | 2026-03-30T10:01:32Zrocky snapshot
Section titled “rocky snapshot”Execute an SCD Type 2 snapshot pipeline. Generates and runs MERGE SQL that tracks historical changes to a source table, maintaining valid_from, valid_to, is_current, and snapshot_id columns in the target history table.
rocky snapshot # Run the snapshot pipelinerocky snapshot --dry-run # Preview generated SQL without executingrocky snapshot --pipeline customers_scd # Select a specific pipelineFlags:
| Flag | Description |
|---|---|
--pipeline <NAME> | Pipeline name. Required when more than one pipeline is defined. |
--dry-run | Show generated SQL without executing. |
Pipeline config (rocky.toml):
[pipeline.customers_history]type = "snapshot"unique_key = ["customer_id"]updated_at = "updated_at"invalidate_hard_deletes = true
[pipeline.customers_history.source]adapter = "prod"catalog = "main"schema = "raw"table = "customers"
[pipeline.customers_history.target]adapter = "prod"catalog = "warehouse"schema = "history"table = "customers_history"Strategies:
- Timestamp — detects changes by comparing the
updated_atcolumn between source and target. Efficient when the source maintains a reliable last-modified timestamp. - Check — detects changes by comparing specified columns between source and target. Used when there is no reliable timestamp.
Generated SQL steps:
- Initial load —
CREATE TABLE IF NOT EXISTSwith SCD2 columns added - Close changed rows — MERGE that sets
valid_toandis_current = FALSE - Insert new versions — INSERT for rows that were just closed
- Invalidate hard deletes (optional) — UPDATE rows missing from source
JSON output:
{ "version": "1.6.0", "command": "snapshot", "pipeline": "customers_history", "source": "main.raw.customers", "target": "warehouse.history.customers_history", "dry_run": false, "steps_total": 4, "steps_ok": 4, "steps": [ { "step": "initial_load", "sql": "...", "status": "ok", "duration_ms": 12 }, { "step": "merge_1", "sql": "...", "status": "ok", "duration_ms": 45 } ], "duration_ms": 120}rocky docs
Section titled “rocky docs”Generate project documentation as a single-page HTML catalog. Discovers models from the models directory and renders them with metadata, dependencies, and tests.
rocky docs # Generate to docs/catalog.htmlrocky docs --models models/ --output site/api.html # Custom pathsFlags:
| Flag | Default | Description |
|---|---|---|
--models <PATH> | models | Models directory to scan. |
--output <PATH> | docs/catalog.html | Output HTML file path. |
Behavior:
- Loads all
.sqland.rockymodel files with their TOML sidecars. - Extracts: name, description (from
intent), target table, strategy, dependencies, tests. - Renders a self-contained HTML page with dark theme, search, and model cards.
- No external dependencies — the HTML is fully self-contained.
JSON output:
{ "version": "1.6.0", "command": "docs", "output_path": "docs/catalog.html", "models_count": 12, "pipelines_count": 2, "duration_ms": 15}rocky shell
Section titled “rocky shell”Interactive SQL shell against the configured warehouse. Supports multi-line queries, .tables and .schema meta-commands, and command history.
rocky shell # Use default adapterrocky shell --pipeline prod # Use a specific pipeline's adapterFlags:
| Flag | Description |
|---|---|
--pipeline <NAME> | Pipeline name to select the warehouse adapter. |
Meta-commands:
| Command | Description |
|---|---|
.tables | List tables in the current catalog/schema. |
.schema <table> | Describe columns for a table. |
.quit / .exit | Exit the shell. |
Multi-line queries are supported — end a statement with ; to execute.
rocky watch
Section titled “rocky watch”Watch the models directory for file changes and auto-recompile. Useful during development to get instant feedback on model changes.
rocky watch # Watch models/ directoryrocky watch --models src/models/ # Custom directoryrocky watch --contracts contracts/ # Include contractsFlags:
| Flag | Default | Description |
|---|---|---|
--models <PATH> | models | Models directory to watch. |
--contracts <PATH> | Contracts directory (optional). |
Behavior:
- Uses filesystem notifications (platform-native) to detect changes.
- Debounces rapid changes (waits for writes to settle before recompiling).
- Runs
compileon each change and reports diagnostics to the terminal.
rocky fmt
Section titled “rocky fmt”Format .rocky DSL files. Normalizes indentation, trims trailing whitespace, and enforces consistent style.
rocky fmt # Format all .rocky files in current directoryrocky fmt models/ # Format a specific directoryrocky fmt --check # Check mode: exit non-zero if any file needs formattingFlags:
| Flag | Description |
|---|---|
--check | Check mode for CI — exits non-zero if any file would be reformatted. |
Arguments:
| Argument | Default | Description |
|---|---|---|
paths | . | Files or directories to format. |
rocky compliance
Section titled “rocky compliance”Governance rollup over classification sidecars plus the project [mask] policy. Answers: “are all classified columns masked wherever policy says they should be?” Static resolver — no warehouse calls.
rocky compliance [--env NAME] [--exceptions-only] [--fail-on exception]Flags:
| Flag | Default | Description |
|---|---|---|
--env <NAME> | (expand all) | Scope the report to a single environment (e.g. prod). When unset, the report expands across the defaults plus every [mask.<env>] override block. |
--exceptions-only | false | Filter per_column to rows that produced at least one exception. The exceptions list is unaffected. |
--fail-on <CONDITION> | Gate condition. The only supported value is exception — exits 1 when any exception is emitted. Useful as a CI gate to block merges that leave classified columns unmasked. | |
--models <PATH> | models | Models directory to scan for [classification] sidecars. |
Behavior:
- Walks every model’s
[classification]sidecar block and, for each(model, column, env)triple, resolves the masking strategy from[mask]/[mask.<env>]. MaskStrategy::Nonecounts as masked — an explicit-identity policy is a conscious decision, not an enforcement gap.- Tags listed under
[classifications] allow_unmaskedsuppress exception emission but still reportenforced = falsein the per-column breakdown. - JSON output is
ComplianceOutput(summary/per_column/exceptions).
rocky retention-status
Section titled “rocky retention-status”Report each model’s declared data-retention policy (retention = "<N>[dy]" in the model sidecar).
rocky retention-status [--model NAME] [--drift]Flags:
| Flag | Default | Description |
|---|---|---|
--model <NAME> | (all) | Scope the report to a single model. |
--drift | false | Stretch goal reserved for v2 — today this filters output to models with a declared policy and leaves warehouse_days null. The warehouse probe (SHOW TBLPROPERTIES / SHOW PARAMETERS) is deferred. |
--models <PATH> | models (via rocky.toml) | Models directory. |
Behavior:
- Compiles the project, then emits one
ModelRetentionStatusper model withconfigured_days,warehouse_days(alwaysNonein v1), andin_sync. - Models without a
retentionsidecar value reportconfigured_days = nullandin_sync = true. - JSON output is
RetentionStatusOutput.