Introduction

dagster-rocky bridges Rocky’s Rust binary with Dagster orchestration. Rocky is the trust plane — typed compiler, compile-time contracts, column-level lineage, schema drift detection, branches + replay, per-model cost. Dagster is the orchestrator — scheduling, retries, alerts, the asset-centric UI. The guarantees Rocky enforces at compile time surface as native Dagster events (asset checks, materializations, metadata) so the asset graph reflects the same trust contract.

There are two ways to wire Rocky into Dagster. Start with the component, which auto-discovers your tables.

Option A — component (defs.yaml):

type: dagster_rocky.RockyComponent
attributes:
  binary_path: rocky
  config_path: config/rocky.toml
  models_dir: models

Option B — resource + asset:

import dagster as dg
from dagster_rocky import RockyResource

rocky = RockyResource(binary_path="rocky", config_path="config/rocky.toml")


@dg.asset
def acme_orders(rocky: RockyResource) -> dg.MaterializeResult:
    result = rocky.run(filter="tenant=acme")
    return dg.MaterializeResult(
        metadata={"tables_copied": result.tables_copied, "duration_ms": result.duration_ms},
    )


defs = dg.Definitions(assets=[acme_orders], resources={"rocky": rocky})

| Symbol | Purpose |
| --- | --- |
| RockyResource | ConfigurableResource wrapping the CLI; 25+ methods; three run modes (buffered, streaming, Pipes) |
| RockyComponent | State-backed component that caches discovery; dag_mode=True builds connected asset graphs |
| RockyDagsterTranslator | Customize asset keys, groups, tags, and metadata per Rocky table |
| load_rocky_assets() | Returns one AssetSpec per enabled Rocky table |
| emit_check_results() / emit_materializations() | Convert Rocky results into Dagster events |

The integration follows a simple pattern:

  1. Dagster calls the rocky binary via subprocess (e.g., rocky discover --output json).
  2. Rocky executes against your warehouse and sources, returning structured JSON.
  3. dagster-rocky parses that JSON into Pydantic models.
  4. The models are translated into Dagster events (asset materializations, check results, etc.).
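
The four steps above can be sketched in plain Python. This is an illustrative outline only, not dagster-rocky's actual code: the JSON shape, the `DiscoveredTable` record, and the helper names are all hypothetical, and standard-library dataclasses stand in for Pydantic models so the sketch has no third-party dependencies. Only the `rocky discover --output json` command comes from the description above.

```python
import json
import subprocess
from dataclasses import dataclass


@dataclass
class DiscoveredTable:
    # Hypothetical fields; the real discover output schema is not shown here.
    name: str
    enabled: bool


def run_json_command(argv: list[str]) -> dict:
    """Steps 1 and 2: invoke a CLI via subprocess and decode its stdout as JSON."""
    completed = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)


def parse_tables(payload: dict) -> list[DiscoveredTable]:
    """Step 3: turn raw JSON into typed records (dataclasses instead of Pydantic)."""
    return [
        DiscoveredTable(name=item["name"], enabled=bool(item["enabled"]))
        for item in payload["tables"]
    ]


def to_events(tables: list[DiscoveredTable]) -> list[dict]:
    """Step 4 stand-in: one plain dict per enabled table instead of a Dagster event."""
    return [
        {"asset_key": table.name, "event": "materialization"}
        for table in tables
        if table.enabled
    ]


# In a real run this string would come from
# run_json_command(["rocky", "discover", "--output", "json"]).
# A made-up payload is used here so the sketch runs without the binary.
SAMPLE_STDOUT = json.dumps(
    {
        "tables": [
            {"name": "orders", "enabled": True},
            {"name": "legacy_orders", "enabled": False},
        ]
    }
)

events = to_events(parse_tables(json.loads(SAMPLE_STDOUT)))
print(events)
```

Keeping the subprocess call, the parsing, and the event translation in separate functions mirrors the separation described above: each stage can be tested against a canned JSON payload without a warehouse or a Dagster run.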

Rocky handles the SQL transformation layer: DAG resolution, incremental logic, SQL generation, schema drift detection, and permission reconciliation. Dagster handles everything around it: scheduling, retries, alerting, lineage visualization, and operational monitoring.

Requirements:

  • dagster >= 1.13.0
  • pydantic >= 2.0
  • pygments >= 2.20.0
  • The rocky binary must be available on PATH (or configured via binary_path). For deployment, you can vendor the binary under a vendor/ directory and point binary_path to it.
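
One possible way to pick a value for binary_path is sketched below. `resolve_rocky_binary` is a hypothetical helper, not part of dagster-rocky, and the choice to prefer a vendored copy over PATH is an assumption made for illustration; it uses only the standard library.

```python
import shutil
from pathlib import Path
from typing import Optional


def resolve_rocky_binary(project_root: Path, name: str = "rocky") -> Optional[str]:
    """Return a candidate value for binary_path, or None if no binary is found.

    Hypothetical helper: checks <project_root>/vendor/<name> first,
    then falls back to searching PATH.
    """
    vendored = project_root / "vendor" / name
    if vendored.is_file():
        return str(vendored)
    # shutil.which returns the full path of an executable on PATH, or None.
    return shutil.which(name)
```

The result could then be passed as binary_path to RockyResource or set in defs.yaml; failing fast when it is None gives a clearer error at startup than a failed subprocess call during a run.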

RockyResource exposes one Python method per Rocky CLI command. The full set includes:

  • Core Pipeline: discover, plan, run, run_model, run_streaming, run_pipes, state, resume_run
  • DAG: dag (full unified DAG with enriched metadata)
  • Modeling: compile, lineage, test, ci
  • AI: ai, ai_sync, ai_explain, ai_test
  • Observability: history, metrics, optimize
  • Diagnostics: doctor, validate_migration, test_adapter
  • Hooks: hooks_list, hooks_test

See the RockyResource page for full method signatures and details.