JSON Output

Rocky’s JSON output is the interface contract between Rocky and orchestrators such as Dagster. The schema is versioned so that consumers can detect breaking changes. Every command that emits --output json is backed by a typed Rust struct deriving JsonSchema, with autogenerated Pydantic and TypeScript bindings.

Schema Version

Every JSON response includes a top-level version field that tracks the Rocky engine release. It’s set to env!("CARGO_PKG_VERSION") at compile time, so rocky --output json always reports the version of the binary producing the output. Examples on this page pin a representative version string for readability. It is illustrative only and does not indicate which release introduced a given field; fields such as failed_sources and job_ids carry their own "Available since" notes. Your actual output reports whichever engine version you have installed.

Compatibility contract:

Additive changes (new fields) ship in minor releases and are backward compatible.
Field removals or renames are breaking and only happen in a major release.
Orchestrators should parse defensively and ignore unknown fields.
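The contract above suggests a simple consumer pattern: gate on the major component of version (the only place removals or renames can happen) and otherwise read only the fields you need, so additive fields pass through harmlessly. The sketch below is a minimal standard-library illustration; SUPPORTED_MAJOR and parse_envelope are hypothetical consumer-side names, not part of rocky or its generated bindings.

```python
import json

# Hypothetical consumer-side constant: the engine major version this code targets.
SUPPORTED_MAJOR = 1


def parse_envelope(raw: str) -> dict:
    """Parse a rocky --output json payload, tolerating additive (minor) changes."""
    payload = json.loads(raw)
    version = payload.get("version", "")
    major_str = version.split(".", 1)[0]
    if not major_str.isdigit():
        raise ValueError(f"unrecognised version string: {version!r}")
    major = int(major_str)
    # A different major release may have removed or renamed fields.
    if major != SUPPORTED_MAJOR:
        raise ValueError(f"unsupported major version {major}, expected {SUPPORTED_MAJOR}")
    # Unknown fields stay in the dict and are simply never read.
    return payload
```

Rejecting only on a major mismatch, rather than on an exact version match, keeps the consumer working across the minor releases that the contract promises are backward compatible.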

For machine-readable schemas, the canonical source is schemas/*.schema.json in the repo, exported via rocky export-schemas. The Pydantic (dagster) and TypeScript (vscode) bindings are autogenerated from the same schemas; see the codegen pipeline.

Asset Key Format

Throughout the output, asset_key arrays follow the format:

[source_type, ...component_values, table_name]

For example, a Fivetran source with tenant acme, region us_west, connector shopify, and table orders produces:

["fivetran", "acme", "us_west", "shopify", "orders"]

This key is designed to map directly to orchestrator asset definitions (e.g., Dagster’s AssetKey).
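To show how the pieces line up, the sketch below rebuilds the example key from a discover source entry. It assumes that component values are taken in the order they appear in the components object and that list-valued components (such as regions) are flattened in place; that ordering is an assumption read off the example, and the asset_key arrays rocky itself emits in run output remain authoritative. asset_key_for is a hypothetical helper.

```python
from typing import Any


def asset_key_for(source: dict[str, Any], table_name: str) -> list[str]:
    """Rebuild [source_type, ...component_values, table_name] for one table.

    Assumption: components are emitted in schema-pattern order, and list
    values (e.g. "regions") contribute each element in turn.
    """
    key = [source["source_type"]]
    for value in source["components"].values():
        if isinstance(value, list):
            key.extend(value)
        else:
            key.append(value)
    key.append(table_name)
    return key


source = {
    "source_type": "fivetran",
    "components": {"tenant": "acme", "regions": ["us_west"], "source": "shopify"},
}
print(asset_key_for(source, "orders"))
# → ['fivetran', 'acme', 'us_west', 'shopify', 'orders']
```

A list of strings like this can be passed straight to Dagster's AssetKey constructor.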

Example payloads

discover (a read) and run (a write) are shown in full below as representative payloads. Every other command returns the same versioned envelope; see Every other command for where each one is documented.

`rocky discover`

Returns all discovered sources and their tables.

{
  "version": "1.6.0",
  "command": "discover",
  "sources": [
    {
      "id": "connector_abc123",
      "components": {
        "tenant": "acme",
        "regions": ["us_west"],
        "source": "shopify"
      },
      "source_type": "fivetran",
      "last_sync_at": "2026-03-30T10:00:00Z",
      "tables": [
        { "name": "orders", "row_count": null },
        { "name": "customers", "row_count": null },
        { "name": "products", "row_count": null }
      ]
    }
  ],
  "failed_sources": [
    {
      "id": "connector_flaky456",
      "schema": "src__acme__us_west__hubspot",
      "source_type": "fivetran",
      "error_class": "transient",
      "message": "schema fetch failed: 503 Service Unavailable"
    }
  ],
  "checks": {
    "freshness": { "threshold_seconds": 86400 }
  },
  "new_sources": ["src__acme__ca_central__shopify"],
  "collision_candidates": [
    {
      "external_object_id": "act_1234567890",
      "sources": ["src__acme__us_west__shopify", "src__acme__eu_central__shopify"]
    }
  ]
}

Field reference:

Field	Type	Description
`sources[].id`	string	Connector identifier from the source system.
`sources[].components`	object	Parsed schema pattern components.
`sources[].source_type`	string	Source type (e.g., `"fivetran"` or `"manual"`).
`sources[].last_sync_at`	string or null	ISO 8601 timestamp of the last successful sync. Null if unknown.
`sources[].tables`	array	List of tables in this source.
`sources[].tables[].name`	string	Table name.
`sources[].tables[].row_count`	integer or null	Row count if available, otherwise null.
`failed_sources`	array or absent	Sources the adapter attempted to fetch metadata for and failed on. Absent when empty. Distinct from `sources` (succeeded) and `excluded_tables` (filtered post-success).
`failed_sources[].id`	string	Connector or namespace identifier.
`failed_sources[].schema`	string	Source schema string (when known).
`failed_sources[].source_type`	string	Source type (`"fivetran"`, `"iceberg"`, etc.).
`failed_sources[].error_class`	string	One of `"transient"`, `"timeout"`, `"rate_limit"`, `"auth"`, `"unknown"`. Lets consumers branch on the failure class without parsing `message`.
`failed_sources[].message`	string	Free-form error detail for human inspection.
`checks`	object or absent	Pipeline-level check configuration, when `[checks]` is declared in `rocky.toml`.
`checks.freshness.threshold_seconds`	integer	Freshness threshold in seconds.
`new_sources`	array or absent	Source schemas seen for the first time since the prior snapshot. Present only when `[…source.discovery] report_new_sources = true`; absent (omitted) otherwise. The first discover of a pipeline establishes the baseline and reports none.
`collision_candidates`	array or absent	Cross-source collisions — the same external object onboarded under more than one schema. Present only when `[…source.discovery] on_collision` is `"warn"` or `"error"`; absent otherwise. With `"error"`, discover also exits non-zero.
`collision_candidates[].external_object_id`	string	The shared external object id (e.g. an ad-account id) found under more than one schema.
`collision_candidates[].sources`	array	The distinct source schemas that resolve to this object id.

Consumers diffing successive discover snapshots must treat ids that appear in failed_sources but not in sources as “unknown state, do not delete”: this is what distinguishes a fetch failure from a deletion. failed_sources is available since engine 1.17.4.
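The "unknown state, do not delete" rule can be expressed as a set difference over two discover payloads: an id counts as gone only if it was previously known, is not among the current sources, and is not among the current failed_sources. A minimal sketch; deletable_source_ids is a hypothetical helper name.

```python
def deletable_source_ids(previous: dict, current: dict) -> set[str]:
    """Source ids that disappeared between two discover payloads and may be treated as deleted."""
    prev_ids = {s["id"] for s in previous.get("sources", [])}
    still_present = {s["id"] for s in current.get("sources", [])}
    # failed_sources is omitted when empty, so default to an empty list.
    unknown = {s["id"] for s in current.get("failed_sources", [])}
    # Ids that failed to fetch are "unknown state, do not delete".
    return prev_ids - still_present - unknown
```

Defaulting failed_sources to an empty list matters because the field is absent, not an empty array, when every fetch succeeded.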

new_sources and collision_candidates are the discover-time signals for cross-source duplicate detection; both are opt-in and omitted from the payload when their feature is off.
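Because both signals are omitted when their feature is off, a consumer can read them with empty defaults and never special-case the missing keys. The sketch below turns them into report lines; discovery_report and the message wording are hypothetical.

```python
def discovery_report(payload: dict) -> list[str]:
    """Human-readable lines for the opt-in discover signals (empty if both are off)."""
    lines = [f"new source schema: {s}" for s in payload.get("new_sources", [])]
    # collision_candidates is present only when on_collision is "warn" or "error".
    for c in payload.get("collision_candidates", []):
        lines.append(f"collision on {c['external_object_id']}: " + ", ".join(c["sources"]))
    return lines
```

This treats "feature off" and "nothing to report" the same way; if you need to tell them apart, test for the presence of the key instead of defaulting it.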

`rocky run`

Note: the canonical, auditable form is rocky plan followed by rocky apply <plan-id>. The rocky run single-step alias fuses plan + apply into one invocation for local iteration and automation. The JSON output shape below is identical for apply and run; only the command field differs, reflecting which verb was invoked.

Returns a complete summary of the pipeline execution.

{
  "version": "1.6.0",
  "command": "run",
  "pipeline_type": "replication",
  "filter": "tenant=acme",
  "duration_ms": 45200,
  "tables_copied": 20,
  "tables_failed": 0,
  "materializations": [
    {
      "asset_key": ["fivetran", "acme", "us_west", "shopify", "orders"],
      "rows_copied": null,
      "duration_ms": 2300,
      "metadata": {
        "strategy": "incremental",
        "watermark": "2026-03-30T10:00:00Z",
        "target_table_full_name": "acme_warehouse.staging__us_west__shopify.orders",
        "sql_hash": "a1b2c3d4e5f67890",
        "column_count": 12,
        "compile_time_ms": 8
      }
    }
  ],
  "check_results": [
    {
      "asset_key": ["fivetran", "acme", "us_west", "shopify", "orders"],
      "checks": [
        {
          "name": "row_count",
          "passed": true,
          "source_count": 15000,
          "target_count": 15000
        },
        {
          "name": "column_match",
          "passed": true,
          "missing": [],
          "extra": []
        },
        {
          "name": "freshness",
          "passed": true,
          "lag_seconds": 300,
          "threshold_seconds": 86400
        }
      ]
    }
  ],
  "permissions": {
    "grants_added": 3,
    "grants_revoked": 0,
    "catalogs_created": 1,
    "schemas_created": 2
  },
  "drift": {
    "tables_checked": 20,
    "tables_drifted": 1,
    "actions_taken": [
      {
        "table": "acme_warehouse.staging__us_west__shopify.line_items",
        "action": "drop_and_recreate",
        "reason": "column 'status' changed STRING -> INT"
      }
    ]
  },
  "execution": {
    "concurrency": 8,
    "tables_processed": 20,
    "tables_failed": 0
  },
  "metrics": {
    "tables_processed": 20,
    "tables_failed": 0,
    "statements_executed": 45,
    "retries_attempted": 1,
    "retries_succeeded": 1,
    "anomalies_detected": 0,
    "table_duration_p50_ms": 1200,
    "table_duration_p95_ms": 4500,
    "table_duration_max_ms": 8200,
    "query_duration_p50_ms": 800,
    "query_duration_p95_ms": 3200,
    "query_duration_max_ms": 7100
  },
  "errors": [],
  "anomalies": []
}

Top-level fields:

Field	Type	Description
`pipeline_type`	string or absent	Pipeline type executed (e.g., `"replication"`).
`filter`	string	The filter applied to this run. Empty string when no filter was set.
`duration_ms`	integer	Total pipeline execution time in milliseconds.
`tables_copied`	integer	Number of tables that were copied (full or incremental).
`tables_failed`	integer	Number of tables that failed during processing.
`tables_skipped`	integer	Number of tables skipped (omitted when 0).
`resumed_from`	string or absent	Run ID this run resumed from, if `--resume` was used.
`shadow`	boolean	True when running in shadow mode (omitted when false).
`errors`	array	Error details for tables that failed. Each entry has `asset_key`, `error`, and a typed `failure_kind` discriminator (kebab-case, e.g. `query-rejected`, `transient`, `compile-error`) so consumers can branch without parsing the free-form string. See Per-table error containment.
`execution`	object	Concurrency and throughput summary.
`metrics`	object or null	Counters and percentile histograms for the run.
`anomalies`	array	Row count anomalies detected by historical baseline comparison.
`partition_summaries`	array	Per-model partition execution summaries (present for `time_interval` models).
`cost_summary`	object or absent	Per-run cost rollup: `total_cost_usd` (float or null), `adapter_type` (string), `total_bytes_scanned` (integer or null), `total_duration_ms` (integer), and `per_model` (array of `{asset_key, duration_ms, cost_usd}`). Absent only for unbilled source adapters (`fivetran`/`airbyte`). Present for every other adapter, including DuckDB, which reports a `total_cost_usd` of `0`, and billed adapters that computed no cost, which report `total_cost_usd` as null. See `[budget]` for how cost limits are enforced.
`budget_breaches`	array	Populated when one or more `[budget]` limits are tripped. Each entry has `limit_type` (`"max_usd"` / `"max_duration_ms"` / `"max_bytes_scanned"`), `limit`, and `actual` (both floats). Empty array when the run is within budget or no limits are configured.
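The cost fields have three distinct "no number" cases: cost_summary absent (unbilled source adapter), total_cost_usd null (billed adapter, no cost computed), and a real value such as DuckDB's 0. A minimal sketch that keeps those cases apart and formats any breaches; run_cost_usd and budget_breach_messages are hypothetical helpers.

```python
from typing import Optional


def run_cost_usd(payload: dict) -> Optional[float]:
    """Total run cost in USD, or None when rocky reported no cost figure."""
    summary = payload.get("cost_summary")
    if summary is None:
        # Absent for unbilled source adapters (fivetran/airbyte).
        return None
    # Null for billed adapters that computed no cost; 0 for DuckDB.
    return summary.get("total_cost_usd")


def budget_breach_messages(payload: dict) -> list[str]:
    """One line per tripped [budget] limit; empty when within budget."""
    return [
        f"{b['limit_type']}: actual {b['actual']} exceeds limit {b['limit']}"
        for b in payload.get("budget_breaches", [])
    ]
```

Returning None rather than 0.0 for the first two cases avoids reporting an unknown cost as free.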

A transformation model that fails to compile during a run is now a counted failure rather than a silent skip: it is included in tables_failed, gets an errors[] entry with failure_kind: "compile-error" carrying the diagnostic, and the run reports overall status Failure (or PartialFailure when other models succeeded) with a non-zero exit code (1 or 2). Earlier engine versions skipped the model and still reported success.
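The failure_kind discriminator lets an orchestrator route failures without parsing the error string. The sketch below groups failed asset keys by kind and picks out the ones a scheduler might retry. The retry policy (only "transient") is a hypothetical choice for illustration, as are the helper names; the "unknown" fallback only guards against entries without a failure_kind.

```python
from collections import defaultdict

# Hypothetical policy: failure kinds worth an automatic retry.
RETRYABLE_KINDS = {"transient"}


def triage_errors(payload: dict) -> dict[str, list[list[str]]]:
    """Group failed asset keys from a run payload by failure_kind."""
    groups: dict[str, list[list[str]]] = defaultdict(list)
    for err in payload.get("errors", []):
        groups[err.get("failure_kind", "unknown")].append(err["asset_key"])
    return dict(groups)


def retryable_assets(payload: dict) -> list[list[str]]:
    """Asset keys whose failure kind is in RETRYABLE_KINDS."""
    groups = triage_errors(payload)
    return [key for kind in sorted(RETRYABLE_KINDS) for key in groups.get(kind, [])]
```

A compile-error, by contrast, will fail again on retry until the model is fixed, so it belongs in an alert rather than a retry queue.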

materializations[]:

Field	Type	Description
`asset_key`	array of strings	Unique asset identifier.
`rows_copied`	integer or null	Number of rows inserted. Null if the warehouse does not report this.
`duration_ms`	integer	Time spent copying this table in milliseconds.
`metadata.strategy`	string	Replication strategy used (`"incremental"` or `"full_refresh"`).
`metadata.watermark`	string or null	The watermark value after this copy. Null for full refresh.
`metadata.target_table_full_name`	string or absent	Fully-qualified target table (`catalog.schema.table`).
`metadata.sql_hash`	string or absent	16-char hex hash of the generated SQL.
`metadata.column_count`	integer or absent	Number of columns in the materialized table.
`metadata.compile_time_ms`	integer or absent	Compile time in milliseconds for derived models.
`cost_usd`	float or absent	Observed cost of this materialization in USD, computed post-hoc from the adapter’s cost formula. Rolls up into `cost_summary.total_cost_usd` at the run level.
`job_ids`	array of strings	Warehouse-side job IDs for the statements this materialization issued, accumulated alongside `bytes_scanned` / `bytes_written`. Lets orchestrators cross-check rocky-reported figures against the warehouse console (`bq show -j`, Snowflake query history, Databricks SQL warehouse history). Empty `[]` for adapters that don’t surface a job id. Available since engine `1.21.0`.
`partition`	object or absent	Partition window info for `time_interval` materializations.

Cross-checking BigQuery cost against `bq show -j`

job_ids lets operators reconcile rocky’s reported bytes_scanned against BigQuery’s own job statistics.

# Capture the first job id from a run.
rocky run --config rocky.toml --output json \
  | jq -r '.materializations[].job_ids[]' \
  | head -1
# → bquxjob_5f3c4e2a_19a1b6d3e21

# Fetch the same job via the BigQuery REST API and read the billed bytes.
bq show -j --location=EU --format=prettyjson bquxjob_5f3c4e2a_19a1b6d3e21 \
  | jq '.statistics.query.totalBytesBilled'
# → "10485760"

The number returned by bq show -j is the same value the BigQuery console displays under “Bytes billed” and matches materializations[].bytes_scanned in rocky’s JSON output. --location must match the dataset’s region (EU, US, us-east1, …): BigQuery jobs are region-scoped and bq show -j returns Not found: Job if the location is wrong.

check_results[]:

Field	Type	Description
`asset_key`	array of strings	The table this check applies to.
`checks[].name`	string	Check name: `"row_count"`, `"column_match"`, or `"freshness"`.
`checks[].passed`	boolean	Whether the check passed.

Additional fields vary by check type:

row_count: source_count (integer), target_count (integer)
column_match: missing (list of column names missing from target), extra (list of unexpected columns in target)
freshness: lag_seconds (integer), threshold_seconds (integer)
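Since the extra fields depend on checks[].name, a consumer typically dispatches on the name and falls back gracefully for check types it does not recognise. A minimal sketch that renders one line per failed check; failed_checks is a hypothetical helper, and joining asset_key with "." is only a display choice.

```python
def failed_checks(payload: dict) -> list[str]:
    """One line per failed check in a run payload's check_results."""
    lines = []
    for result in payload.get("check_results", []):
        table = ".".join(result["asset_key"])
        for check in result["checks"]:
            if check["passed"]:
                continue
            name = check["name"]
            if name == "row_count":
                detail = f"source={check['source_count']} target={check['target_count']}"
            elif name == "column_match":
                detail = f"missing={check['missing']} extra={check['extra']}"
            elif name == "freshness":
                detail = f"lag={check['lag_seconds']}s threshold={check['threshold_seconds']}s"
            else:
                # Check type unknown to this consumer: report it without details.
                detail = ""
            lines.append(f"{table} {name} failed {detail}".rstrip())
    return lines
```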

permissions:

Field	Type	Description
`grants_added`	integer	Number of GRANT statements executed.
`grants_revoked`	integer	Number of REVOKE statements executed.
`catalogs_created`	integer	Number of catalogs created during this run.
`schemas_created`	integer	Number of schemas created during this run.

drift:

Field	Type	Description
`tables_checked`	integer	Total tables inspected for schema drift.
`tables_drifted`	integer	Number of tables where drift was detected.
`actions_taken[].table`	string	Fully qualified table name.
`actions_taken[].action`	string	Action taken (e.g., `"drop_and_recreate"`).
`actions_taken[].reason`	string	Human-readable explanation of the drift.
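A drop_and_recreate action means the target table was rebuilt, which downstream consumers may want to hear about even when the run succeeded. The sketch below pulls those actions out of a run payload; destructive_drift_actions is a hypothetical helper, and filtering on this one action value is an illustrative choice.

```python
def destructive_drift_actions(payload: dict) -> list[str]:
    """Tables rocky dropped and recreated because of schema drift."""
    drift = payload.get("drift") or {}
    return [
        f"{a['table']}: {a['reason']}"
        for a in drift.get("actions_taken", [])
        if a["action"] == "drop_and_recreate"
    ]
```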

Every other command

Every other --output json command returns the same versioned envelope and is documented in two places:

Machine-readable schemas: schemas/*.schema.json, one per command, exported via rocky export-schemas. These generate the Dagster Pydantic models and the VS Code TypeScript types, so they are the contract to validate against.
Per-command examples: each command’s entry in the CLI Reference and the category pages under Reference → Commands shows a worked JSON example.

The shapes below are the ones consumers tend to get wrong; the full, authoritative shape for each still lives in the generated schemas.

`test` unit-test results

When a model declares a fixture-driven [[test]] block, rocky test --output json carries a unit_tests object alongside the top-level total / passed / failed counts. It’s present only when at least one model declares such a block, and is distinct from declarative (the [[tests]] summary, present only under --declarative).

Field	Type	Description
`unit_tests.total`	integer	Number of fixture-driven unit tests run.
`unit_tests.passed`	integer	Number that passed.
`unit_tests.failed`	integer	Number that failed.
`unit_tests.results`	array	Per-test outcomes.
`unit_tests.results[].model`	string	Model under test.
`unit_tests.results[].test`	string	Test name.
`unit_tests.results[].passed`	boolean	Whether the test passed.
`unit_tests.results[].error`	string or null	Failure message. Null when the test passed.
`unit_tests.results[].mismatches`	array	Row-level diffs between expected and actual output, for diagnosing a failure. Empty on pass. See the generated schema for the per-row shape.
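Because unit_tests is present only when some model declares a fixture-driven [[test]] block, a consumer should treat its absence as "nothing to report" rather than an error. A minimal sketch that lists failing unit tests with their mismatch counts; unit_test_failures is a hypothetical helper, and it only counts mismatch entries because the per-row shape lives in the generated schema.

```python
def unit_test_failures(payload: dict) -> list[str]:
    """One line per failed fixture-driven unit test in rocky test JSON output."""
    unit = payload.get("unit_tests")
    if unit is None:
        # No model declares a fixture-driven [[test]] block.
        return []
    return [
        f"{r['model']}/{r['test']}: {r['error']} (mismatches: {len(r['mismatches'])})"
        for r in unit["results"]
        if not r["passed"]
    ]
```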

`test` and `ci` failures

On both rocky test and rocky ci, the top-level failures field is an array of objects, each { "name": "...", "error": "..." }, not positional [name, error] tuples. JSON Schema can’t represent positional tuples cleanly, so the engine emits named fields. See test.schema.json and ci.schema.json for the exact shape.
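The object shape matters in Python in a subtle way: code written for tuples, such as for name, error in payload["failures"], does not raise on these objects. Unpacking a dict with exactly two keys yields its keys, so every line would read "name: error". Indexing by field name avoids that. A minimal sketch; failure_lines is a hypothetical helper.

```python
def failure_lines(payload: dict) -> list[str]:
    """Format the top-level failures array from rocky test / rocky ci JSON."""
    # Each entry is {"name": ..., "error": ...}; index by key, do not tuple-unpack.
    return [f"{f['name']}: {f['error']}" for f in payload.get("failures", [])]
```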

`compile` model tags

rocky compile --output json includes a models_detail[] array, one entry per compiled model. Each entry’s tags object carries the model’s resolved governance tags: the model’s own sidecar [tags] merged over its config-group [tags] baseline, with the sidecar winning per key. So a domain tag set once on a config group is visible in every member model’s models_detail[].tags without being repeated in each sidecar. The authoritative models_detail[] shape lives in compile.schema.json, and the tag-resolution rules are documented under Group tags.
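The "sidecar wins per key" rule is an ordinary shallow dictionary merge with the sidecar applied last. The sketch below reproduces the rule to help predict what models_detail[].tags will contain; it is an illustration of the documented behaviour, not rocky's implementation, and it assumes string tag values. resolve_tags is a hypothetical name.

```python
def resolve_tags(group_tags: dict[str, str], sidecar_tags: dict[str, str]) -> dict[str, str]:
    """Sidecar [tags] merged over the config-group [tags] baseline; sidecar wins per key."""
    # In a dict merge, later entries override earlier ones, so the sidecar goes last.
    return {**group_tags, **sidecar_tags}


print(resolve_tags({"domain": "sales", "tier": "gold"}, {"tier": "silver"}))
# → {'domain': 'sales', 'tier': 'silver'}
```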
