Introduction

Rocky is a compiled SQL transformation engine written in Rust. It replaces dbt’s core responsibilities — DAG resolution, incremental logic, SQL generation, and schema management — with a type-safe, config-driven approach.

No Jinja. No manifest. No parse step.

Rocky follows the ELT pattern — it operates on data already in your warehouse, landed by ingestion tools like Fivetran or Airbyte. Rocky handles the T (transformation and replication within the warehouse), not the E or L.

Traditional tools like dbt work well at small scale, but introduce significant overhead as pipelines grow:

  • Slow startup: dbt takes 30-60 seconds just to start, then 2-3 minutes to parse 500 models
  • Memory-hungry: A mid-size dbt project consumes 500MB-2GB of RAM
  • Jinja complexity: Business logic buried in template macros is hard to test and debug
  • Repetitive staging: Writing SELECT * FROM {{ source(...) }} for every source table

Rocky eliminates these problems by compiling to a single binary that starts in under 100ms, parses instantly, and uses pure SQL instead of Jinja templates. Its design rests on six principles:

  1. Adapter-based architecture: Source adapters (Fivetran, DuckDB, manual) handle discovery; warehouse adapters (Databricks, Snowflake, DuckDB) handle execution. The core engine is warehouse-agnostic.
  2. Config over code: The bronze layer (source replication) is driven entirely by rocky.toml — no SQL files needed for 1:1 copies
  3. Pure SQL: Transformation models use standard SQL with TOML configuration — no templating language
  4. Inline quality checks: Data checks run during replication, not as a separate step
  5. Structured output: All CLI output is versioned JSON, designed for orchestrator integration
  6. Embedded state: Watermarks stored in a local redb database, with optional remote sync to S3 or Valkey
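To make the config-over-code idea concrete, here is a hypothetical rocky.toml sketch. The [adapter.NAME] block shape and the idea of inline checks come from this page; every other key name below is an illustrative guess, not Rocky's actual schema.

```toml
# Hypothetical rocky.toml sketch. Keys marked "illustrative" are
# guesses for demonstration, not Rocky's real configuration schema.

# Warehouse connection ([adapter.NAME] blocks replace profiles.yml).
[adapter.local]                  # adapter name is user-chosen
type = "duckdb"                  # illustrative: which adapter to use
path = "warehouse.duckdb"        # illustrative: local database file

# Bronze layer: 1:1 source replication, no SQL files needed.
[bronze]                         # illustrative block name
source = "local"                 # illustrative: adapter that discovers tables
tables = ["orders", "customers"] # illustrative: tables to replicate

# Inline quality check, run during replication rather than as a
# separate test step.
[[check]]                        # illustrative array-of-tables name
table = "orders"
column = "order_id"
rule = "not_null"                # illustrative rule identifier
```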
The table below maps each dbt concept to its Rocky equivalent:

| Concept | dbt | Rocky |
| --- | --- | --- |
| Project config | dbt_project.yml + profiles.yml | rocky.toml |
| Connection config | profiles.yml | [adapter.NAME] blocks in rocky.toml |
| Sources | schema.yml with source() macro | Auto-discovered (Fivetran, DuckDB, manual) |
| Staging models | One .sql file per source table | Config-driven bronze layer (zero SQL) |
| Transformation models | SQL + Jinja {{ ref() }} | SQL + TOML config |
| Materialization config | {{ config(materialized='...') }} | [strategy] in TOML |
| Dependencies | {{ ref('model') }} | depends_on = ["model"] |
| Macros | Jinja macros | Not needed — pure SQL |
| Seeds | CSV files loaded as tables | Not supported (out of scope) |
| Snapshots | SCD Type 2 via snapshots | merge strategy with unique key |
| Tests | schema.yml tests | Inline checks in rocky.toml |
| Compile | dbt compile | rocky plan |
| Run | dbt run | rocky run |
| Test separately | dbt test | Built into rocky run |
| State | manifest.json + target/ | Embedded redb database |
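As a sketch of how a transformation model might be declared, the fragment below uses depends_on and a [strategy] block, both shown in the mapping above. The surrounding block and key names (model.NAME, sql, type, unique_key) are illustrative guesses, not the documented schema.

```toml
# Hypothetical model declaration. Only depends_on and [strategy]
# are taken from the docs; other key names are illustrative.

[model.daily_revenue]           # illustrative: one block per model
depends_on = ["orders"]         # upstream models, replacing {{ ref(...) }}
sql = "daily_revenue.sql"       # illustrative: plain SQL file, no Jinja

[model.daily_revenue.strategy]  # materialization, replacing {{ config(...) }}
type = "merge"                  # illustrative value; a merge strategy with
unique_key = "order_id"         # a unique key is Rocky's snapshot equivalent
```

The referenced .sql file would contain standard SQL with no templating, matching the "pure SQL" principle above.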

Rocky’s adapter model separates where data is discovered from where data is processed:

| Role | Adapter | Description |
| --- | --- | --- |
| Source | Fivetran | Calls the Fivetran REST API to discover connectors and enabled tables |
| Source | DuckDB | Lists schemas and tables from a local DuckDB database via information_schema |
| Source | Manual | Reads schema/table definitions from rocky.toml config |
| Warehouse | Databricks | Executes SQL via the SQL Statement API; manages Unity Catalog governance |
| Warehouse | Snowflake | Executes SQL via the Snowflake REST API; OAuth, key-pair JWT, and password auth |
| Warehouse | BigQuery | Executes SQL via the BigQuery API; service account and application default auth (Beta) |
| Warehouse | DuckDB | Local in-process execution; runs rocky validate/discover/plan/run end-to-end with no credentials |

Source adapters are metadata-only: they identify which schemas and tables exist, but they do not extract or move data. The actual data must already be in the warehouse, landed by an ingestion tool.

A single DuckDB adapter instance can act as both source and warehouse, which is how the playground and credential-free examples run end-to-end.
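A dual-role DuckDB adapter might be configured as in this sketch. All key names here are illustrative guesses for demonstration, not the real rocky.toml schema.

```toml
# Hypothetical: one DuckDB adapter serving as both source and
# warehouse, as in the credential-free playground examples.
# Every key name below is illustrative.

[adapter.playground]
type = "duckdb"                  # illustrative engine selector
path = "playground.duckdb"       # illustrative local database file
roles = ["source", "warehouse"]  # illustrative: discovery and execution
```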

Rocky is designed for extensibility — new source and warehouse adapters can be added through the Adapter SDK without modifying the core engine.
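To give a feel for what an Adapter SDK integration might involve, here is a toy Rust sketch. The trait and method names (WarehouseAdapter, dialect, execute) are hypothetical stand-ins, not the actual rocky-adapter-sdk traits; the real SDK also ships conformance tests, which this sketch omits.

```rust
// Hypothetical sketch of a warehouse adapter. Trait and method names
// are illustrative stand-ins, NOT the actual rocky-adapter-sdk API.

/// Minimal surface a warehouse adapter might expose: report its SQL
/// dialect and execute a statement, returning rows affected.
trait WarehouseAdapter {
    fn dialect(&self) -> &str;
    fn execute(&mut self, sql: &str) -> Result<u64, String>;
}

/// Toy in-memory adapter that just records statements, to show the
/// shape of an implementation against the trait.
struct EchoAdapter {
    statements: Vec<String>,
}

impl WarehouseAdapter for EchoAdapter {
    fn dialect(&self) -> &str {
        "generic"
    }

    fn execute(&mut self, sql: &str) -> Result<u64, String> {
        self.statements.push(sql.to_string());
        Ok(0) // no rows affected in this toy adapter
    }
}

fn main() {
    let mut adapter = EchoAdapter { statements: Vec::new() };
    adapter.execute("CREATE TABLE t (id INT)").unwrap();
    println!("{} statement(s) via {}", adapter.statements.len(), adapter.dialect());
}
```

The point of the trait boundary is that the core engine only sees WarehouseAdapter-shaped objects, so a new warehouse can be wired in without touching the engine itself.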

Rocky lives in a single monorepo with four subprojects:

| Path | Language | What it is |
| --- | --- | --- |
| engine/ | Rust | Core CLI + engine (20-crate Cargo workspace) |
| integrations/dagster/ | Python | Dagster integration (dagster-rocky package) |
| editors/vscode/ | TypeScript | VS Code extension (LSP client + syntax highlighting) |
| examples/playground/ | Config only | Self-contained POC catalog (28 POCs) + benchmark suite |

Rocky’s engine is a Cargo workspace with 20 crates:

  • rocky-core — Warehouse-agnostic transformation engine: IR, schema patterns, SQL generation, checks, state, hooks
  • rocky-sql — SQL parsing and validation (built on sqlparser-rs)
  • rocky-lang — Rocky DSL lexer and parser (.rocky files)
  • rocky-compiler — Type checking, semantic analysis, contract validation, diagnostics
  • rocky-adapter-sdk — Traits and conformance tests for building custom warehouse adapters
  • rocky-databricks — Databricks warehouse adapter: SQL Statement API, Unity Catalog governance, adaptive concurrency (AIMD)
  • rocky-snowflake — Snowflake warehouse adapter: REST API, OAuth + key-pair JWT + password auth
  • rocky-duckdb — DuckDB warehouse + discovery adapter, also used by rocky test and the bundled E2E suite
  • rocky-fivetran — Fivetran source adapter: REST API discovery (metadata only), schema config, sync detection
  • rocky-engine — Local query engine (DataFusion + Arrow) used for type inference and rocky test
  • rocky-server — HTTP API and LSP server (rocky serve / rocky lsp)
  • rocky-cache — Three-tier caching (memory, Valkey, API)
  • rocky-ai — AI intent layer (explain, sync, test, generate)
  • rocky-observe — Metrics, event bus, and structured JSON logging
  • rocky-bigquery — BigQuery warehouse adapter (connector, auth, dialect) (Beta)
  • rocky-airbyte — Airbyte source adapter (protocol integration)
  • rocky-iceberg — Apache Iceberg table format adapter (metadata + snapshot management)
  • rocky-cli — CLI commands, output formatting, Dagster Pipes protocol
  • rocky-wasm — WebAssembly exports for browser/edge execution
  • rocky — The binary crate

Rocky is licensed under Apache 2.0.