Introduction
dagster-rocky bridges Rocky’s Rust binary with Dagster orchestration. Rocky is the trust plane — typed compiler, compile-time contracts, column-level lineage, schema drift detection, branches + replay, per-model cost. Dagster is the orchestrator — scheduling, retries, alerts, the asset-centric UI. The guarantees Rocky enforces at compile time surface as native Dagster events (asset checks, materializations, metadata) so the asset graph reflects the same trust contract.
Quick start
Section titled “Quick start”Two ways to wire Rocky into Dagster. Start with the component — it auto-discovers your tables.
Option A — component (defs.yaml):
type: dagster_rocky.RockyComponentattributes: binary_path: rocky config_path: config/rocky.toml models_dir: modelsOption B — resource + asset:
import dagster as dgfrom dagster_rocky import RockyResource
rocky = RockyResource(binary_path="rocky", config_path="config/rocky.toml")
@dg.assetdef acme_orders(rocky: RockyResource) -> dg.MaterializeResult: result = rocky.run(filter="tenant=acme") return dg.MaterializeResult( metadata={"tables_copied": result.tables_copied, "duration_ms": result.duration_ms}, )
defs = dg.Definitions(assets=[acme_orders], resources={"rocky": rocky})What it provides
Section titled “What it provides”| Symbol | Purpose |
|---|---|
RockyResource | ConfigurableResource wrapping the CLI; 25+ methods; three run modes (buffered, streaming, Pipes) |
RockyComponent | State-backed component that caches discovery; dag_mode=True builds connected asset graphs |
RockyDagsterTranslator | Customize asset keys, groups, tags, and metadata per Rocky table |
load_rocky_assets() | Returns one AssetSpec per enabled Rocky table |
emit_check_results() / emit_materializations() | Convert Rocky results into Dagster events |
Architecture
Section titled “Architecture”The integration follows a simple pattern:
- Dagster calls the
rockybinary via subprocess (e.g.,rocky discover --output json). - Rocky executes against your warehouse and sources, returning structured JSON.
dagster-rockyparses that JSON into Pydantic models.- The models are translated into Dagster events (asset materializations, check results, etc.).
Rocky handles the SQL transformation layer: DAG resolution, incremental logic, SQL generation, schema drift detection, and permission reconciliation. Dagster handles everything around it: scheduling, retries, alerting, lineage visualization, and operational monitoring.
Requirements
Section titled “Requirements”dagster >= 1.13.0pydantic >= 2.0pygments >= 2.20.0- The
rockybinary must be available onPATH(or configured viabinary_path). For deployment, you can vendor the binary under avendor/directory and pointbinary_pathto it.
CLI methods on the resource
Section titled “CLI methods on the resource”RockyResource exposes one Python method per Rocky CLI command. The full set includes:
- Core Pipeline —
discover,plan,run,run_model,run_streaming,run_pipes,state,resume_run - DAG —
dag(full unified DAG with enriched metadata) - Modeling —
compile,lineage,test,ci - AI —
ai,ai_sync,ai_explain,ai_test - Observability —
history,metrics,optimize - Diagnostics —
doctor,validate_migration,test_adapter - Hooks —
hooks_list,hooks_test
See the RockyResource page for full method signatures and details.