
All Features

| Warehouse | Status | Auth methods |
|---|---|---|
| Databricks | Production | Unity Catalog, SQL Statement API, OAuth M2M |
| Snowflake | Beta | Key-pair JWT, OAuth, password |
| BigQuery | Beta | Service account, application default |
| DuckDB | Production | Embedded (local dev/test, no credentials) |

Source adapters: Fivetran (REST API discovery), DuckDB (information_schema), Manual (config-defined).

| Strategy | Description |
|---|---|
| `full_refresh` | Drop and recreate table (default) |
| `incremental` | Append past a timestamp watermark |
| `merge` | Upsert via `MERGE INTO` with unique key |
| `time_interval` | Partition-keyed with per-partition state, lookback, parallel execution |
| `snapshot` | SCD Type 2 with `valid_from`/`valid_to` and hard delete tracking |
| `materialized_view` | Warehouse-managed materialized view (Databricks) |
| `dynamic_table` | Snowflake dynamic table with lag-based refresh |
| `ephemeral` | Inlined as a CTE in downstream queries; no table created |
| `microbatch` | Alias for `time_interval` with hourly defaults (dbt-compatible) |
| `delete_insert` | Delete by partition key, then insert fresh data |
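As a sketch, a model might select a strategy in its TOML config like this (the key names below are illustrative assumptions, not a confirmed Rocky config schema):

```toml
# Hypothetical model config: key names are assumptions for illustration.
[model]
name = "orders_daily"

[materialization]
strategy = "time_interval"    # any strategy name from the table above
partition_key = "order_date"  # relevant to time_interval / delete_insert
lookback = 2                  # recompute two prior partitions for late data
unique_key = "order_id"       # relevant to merge
```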

The time_interval strategy provides partition-keyed materialization with:

  • --partition KEY — run a single partition
  • --from / --to — run a date range
  • --latest — run the partition containing now()
  • --missing — discover and fill gaps from state store
  • --lookback N — recompute N previous partitions for late-arriving data
  • --parallel N — concurrent partition processing
  • Per-partition state tracking (Computed / Failed / InProgress)
  • Atomic writes: Databricks uses INSERT INTO ... REPLACE WHERE; Snowflake/DuckDB use transactional DELETE + INSERT

Compile-time type checking:

  • Static type inference across the full DAG at compile time
  • Column type tracking through JOINs, GROUP BYs, window functions
  • 35+ diagnostic codes (E001-E026, W001-W003, W010-W011, E010-E013, V001-V020) with actionable fix suggestions
  • Safe type widening detection: INT → BIGINT, FLOAT → DOUBLE, VARCHAR expansion — handled via zero-copy ALTER TABLE
  • NULL-safe equality: != compiles to IS DISTINCT FROM
  • SELECT * expansion with deduplication
  • Parallel type checking via rayon across DAG execution layers
  • Data contracts with required columns, protected columns (prevent removal), allowed type changes (widening whitelist), and nullability constraints

Column-level lineage:

  • Computed at compile time — no warehouse query, no catalog rebuild
  • Per-column trace through SQL and Rocky DSL transformations
  • Transform tracking: Direct, Cast, Aggregation, Expression
  • Output formats: JSON, Graphviz DOT, human-readable text
  • CLI: rocky lineage model.column

Schema drift handling:

  • Automatic detection at compile time
  • Safe type widening: ALTER TABLE for compatible changes (INT → BIGINT, FLOAT → DOUBLE, VARCHAR expansion, numeric → STRING)
  • Graduated response: ALTER for safe changes, full refresh for unsafe changes
  • Column addition/removal detected with recommended actions
  • Shadow mode: Write to _rocky_shadow tables for validation before production
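Combining the partition flags above with the documented run command, a backfill session might look like this (the flag names come from the list above; the exact invocation shape is an assumption):

```shell
rocky run my_model --partition 2025-01-15       # one partition
rocky run my_model --from 2025-01-01 --to 2025-01-31
rocky run my_model --latest                     # partition containing now()
rocky run my_model --missing --parallel 4       # fill gaps concurrently
rocky run my_model --lookback 3                 # recompute late-arriving data
```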
Data quality checks:

| Check | Description |
|---|---|
| Row count | Source vs. target row count match |
| Column match | Missing/extra column detection (case-insensitive, with exclusion list) |
| Freshness | Timestamp lag against a configurable threshold |
| Null rate | Per-column null percentage with `TABLESAMPLE` support |
| Custom SQL | Threshold-based checks with a `{target}` placeholder |
| Anomaly detection | Row count deviation from historical baselines |

All checks run inline during pipeline execution, not as a separate step.
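A hedged sketch of declaring these checks in config: `[checks.freshness]` is mentioned elsewhere in the docs, while the other section and key names here are assumptions:

```toml
[checks.freshness]
column = "updated_at"          # hypothetical key name
max_lag_minutes = 60           # hypothetical key name

[checks.row_count]             # hypothetical section name
tolerance_pct = 1.0

[[checks.custom_sql]]          # hypothetical section name
sql = "SELECT count(*) FROM {target} WHERE amount < 0"
max = 0
```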

Rocky DSL: pipeline-oriented syntax that compiles to standard SQL:

| Step | Description |
|---|---|
| `from` | Load from a source model or table |
| `where` | Filter rows |
| `derive` | Create computed columns |
| `group` | Aggregate with grouping |
| `select` | Project columns |
| `join` | Join another model |
| `sort` | Order rows |
| `take` | Limit row count |
| `distinct` | Remove duplicates |

Plus: window functions with PARTITION BY / ORDER BY / frame specs, match expressions (→ CASE WHEN), date literals (@2025-01-01), IS NULL / IS NOT NULL, IN lists.
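Putting the steps together, a pipeline might read as follows. The step names come from the table above, but the concrete syntax shown is an assumption, not confirmed Rocky DSL grammar:

```
from orders
where order_date >= @2025-01-01      # date literal
derive revenue = quantity * price    # computed column
group customer_id (
  total = sum(revenue)
)
sort total desc
take 100
```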

CLI commands:

init · validate · discover · plan · run · compare · state

compile · lineage · test · ci · export-schemas

ai (generate) · ai-sync · ai-explain · ai-test

playground · serve · lsp · init-adapter · test-adapter · import-dbt · validate-migration

doctor · history · metrics · optimize · compact · profile-storage · archive · bench · hooks list · hooks test

Full LSP implementation via rocky lsp:

  • Diagnostics (live compile errors/warnings)
  • Hover (column types and lineage)
  • Go to Definition
  • Find References
  • Completions (models, columns, functions)
  • Rename (across all files)
  • Code Actions (quick fixes)
  • Inlay Hints (inline type annotations)
  • Semantic Tokens (syntax highlighting)
  • Signature Help (function parameters)
  • Document Symbols (outline)

Published VS Code extension with TextMate grammar + semantic tokens.

Dagster integration:

  • dagster-rocky package with RockyResource and RockyComponent
  • Auto-discovery: Rocky discover → Dagster asset definitions
  • Dagster Pipes protocol: Hand-rolled emitter (no external dependency) — reports materializations, check results, drift observations, anomaly alerts
  • 28 typed JSON output schemas with auto-generated Pydantic v2 models and TypeScript interfaces via rocky export-schemas
  • Freshness policies auto-attached from [checks.freshness] config
  • Column lineage attached to derived model asset metadata

Warehouse governance:

  • Catalog lifecycle: Auto-create catalogs with configurable tags
  • Schema lifecycle: Auto-create schemas with tags
  • RBAC: 6 permission types with declarative GRANT management
  • Permission reconciliation: SHOW GRANTS → diff → GRANT/REVOKE
  • Workspace isolation: Databricks catalog binding (READ_WRITE, READ_ONLY)
  • Resource tagging: Component-derived tags (client, region, source) + static tags
  • Multi-tenant patterns: Schema pattern routing (src__client__region__connector)

Embedded redb state store (.rocky-state.redb) with:

| Table | Purpose |
|---|---|
| `WATERMARKS` | Per-table incremental progress |
| `PARTITIONS` | Per-partition lifecycle (Computed/Failed/InProgress) |
| `CHECK_HISTORY` | Row count snapshots for anomaly detection |
| `RUN_HISTORY` | Full run execution records |
| `QUALITY_HISTORY` | Check results and metrics |
| `DAG_SNAPSHOTS` | Compilation snapshots |
| `RUN_PROGRESS` | In-flight run progress (checkpoint/resume) |

Remote sync to S3 or Valkey/Redis (tiered backend). No manifest file.
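A sketch of how the store and its remote sync might be configured: the S3 and Valkey/Redis backends are documented, but these key names are assumptions:

```toml
[state]
path = ".rocky-state.redb"     # hypothetical key name

[state.sync]                   # hypothetical section name
backend = "s3"                 # or "valkey" for the tiered backend
bucket = "my-rocky-state"
prefix = "prod/"
```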

Configurable shell commands or webhooks triggered on:

  • Pipeline: start, discover_complete, compile_complete, complete, error
  • Table: before_materialize, after_materialize, materialize_error
  • Model: before_model_run, after_model_run, model_error
  • Checks: before_checks, check_result, after_checks
  • State: drift_detected, anomaly_detected, state_synced

Each hook supports command, timeout_ms, and on_failure behavior. Webhook hooks send POST with JSON templates.
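For illustration, a drift hook and a webhook might be declared like this. The `command`, `timeout_ms`, and `on_failure` fields are documented; the section layout is an assumption:

```toml
[hooks.drift_detected]               # hypothetical section layout
command = "notify-team --channel data-eng"
timeout_ms = 5000
on_failure = "warn"                  # documented field; value is an example

[hooks.after_materialize]
webhook = "https://example.com/rocky-events"  # POST with a JSON template
```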

Powered by Claude (via ANTHROPIC_API_KEY):

| Command | Description |
|---|---|
| `rocky ai <intent>` | Generate a model from natural language |
| `rocky ai-sync` | Detect schema changes and update models, guided by stored intent |
| `rocky ai-explain` | Generate an intent description from existing code |
| `rocky ai-test` | Generate test assertions from intent |

All AI commands are CLI-native, not cloud-dependent. Intent is stored as metadata in model TOML files.
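A typical session with the documented commands might look like this (the intent string is an example):

```shell
export ANTHROPIC_API_KEY=sk-...   # required, per the docs
rocky ai "daily revenue by region from the orders source"
rocky ai-explain                  # derive intent metadata from existing code
rocky ai-test                     # generate assertions from stored intent
```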

| Metric (10k models) | Value |
|---|---|
| Compile | 1.00 s |
| Per-model cost | 100 µs |
| Peak memory | 147 MB |
| Lineage | 0.84 s |
| Warm compile | 0.72 s |
| Startup | 14 ms |
| Config validation | 15 ms |
| SQL generation | 200 ms (50k tables/sec) |

Linear scaling verified from 1k to 50k models. See benchmarks for full comparison with dbt-core and dbt-fusion.