# Getting Started with the Playground
The Rocky playground creates a self-contained sample project that runs entirely on DuckDB. No warehouse credentials, no Fivetran account, no external services. It is the fastest way to experience Rocky’s compiler, type system, lineage engine, and AI features.
## 1. Create the Playground

```bash
rocky playground my-project
```

This creates a directory with everything you need:
```
my-project/
├── rocky.toml                          # DuckDB pipeline config
├── models/
│   ├── raw_orders.sql                  # SQL replication model
│   ├── raw_orders.toml                 # Model config
│   ├── customer_orders.rocky           # Rocky DSL transformation
│   ├── customer_orders.toml            # Model config
│   ├── revenue_summary.sql             # SQL transformation
│   └── revenue_summary.toml            # Model config
├── contracts/
│   └── revenue_summary.contract.toml   # Data contract
└── data/
    └── seed.sql                        # DuckDB seed data
```

The default template is `quickstart` (3 models). Two larger templates are available for exploring more features:
```bash
rocky playground my-project --template ecommerce   # 11 models, sources/staging/marts
rocky playground my-project --template showcase    # ecommerce + Rocky DSL + extra contracts
```

Enter the project directory:
```bash
cd my-project
```

## 2. Explore the Generated Files

### rocky.toml

The pipeline config uses a local DuckDB adapter instead of Databricks. No credentials are required:
```toml
[adapter.local]
type = "duckdb"
path = "playground.duckdb"

[pipeline.playground]
type = "replication"
strategy = "full_refresh"
timestamp_column = "_updated_at"

[pipeline.playground.source]
adapter = "local"

[pipeline.playground.source.discovery]
adapter = "local"

[pipeline.playground.source.schema_pattern]
prefix = "raw__"
separator = "__"
components = ["source"]

[pipeline.playground.target]
adapter = "local"
catalog_template = "playground"
schema_template = "staging__{source}"

[state]
backend = "local"
```

The same local DuckDB adapter serves discovery, the source, and the target: all three share the same `playground.duckdb` file, so writes from the warehouse side are visible to the discovery side. `rocky test` ignores `path` and runs models against an in-memory database with `data/seed.sql` auto-loaded; `rocky discover`/`plan`/`run` use the file at `path`, which you seed once with:
```bash
duckdb playground.duckdb < data/seed.sql
```

### raw_orders.sql + raw_orders.toml

The first model is a replication layer that selects raw order data:
raw_orders.sql:
```sql
SELECT
    order_id,
    customer_id,
    product_id,
    amount,
    status,
    order_date
FROM raw__orders.orders
```

The seed file in `data/seed.sql` creates the `raw__orders.orders` table; the schema name matches the pipeline's `prefix = "raw__"`, so `rocky discover` finds it.
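The prefix/template mechanics of `schema_pattern` can be sketched in a few lines of Python. This is an illustrative stand-in that mirrors the `rocky.toml` values above, not Rocky's actual implementation:

```python
from typing import Optional

# Values from the playground's rocky.toml (see the config above).
PREFIX = "raw__"                        # [pipeline.playground.source.schema_pattern]
TARGET_TEMPLATE = "staging__{source}"   # [pipeline.playground.target] schema_template

def parse_source_schema(schema: str) -> Optional[str]:
    """Return the `source` component if the schema matches the pattern, else None."""
    if not schema.startswith(PREFIX):
        return None
    return schema[len(PREFIX):]

def target_schema(source: str) -> str:
    """Render the target schema name from the template."""
    return TARGET_TEMPLATE.format(source=source)

assert parse_source_schema("raw__orders") == "orders"   # matched by discovery
assert parse_source_schema("analytics") is None         # ignored by discovery
assert target_schema("orders") == "staging__orders"     # replication target
```

So a seeded `raw__orders` schema is discovered as source `orders` and replicated into `staging__orders`.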
raw_orders.toml:
```toml
name = "raw_orders"

[strategy]
type = "full_refresh"

[target]
catalog = "playground"
schema = "staging"
table = "raw_orders"
```

### customer_orders.rocky

This model uses the Rocky DSL, a concise syntax for common aggregation patterns:
```
-- Customer orders aggregation (Rocky DSL)
from raw_orders
where status != "cancelled"
group customer_id {
    total_revenue: sum(amount),
    order_count: count(),
    first_order: min(order_date)
}
where total_revenue > 0
```

The Rocky DSL compiles to standard SQL. The compiler type-checks column references, validates aggregation semantics, and resolves the `raw_orders` dependency automatically.
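To make the compilation target concrete, here is hand-written SQL that approximates what this model might compile to, exercised against Python's built-in `sqlite3` as a stand-in warehouse. The SQL is a sketch of the semantics (filter, group, post-aggregation filter as `HAVING`), not the compiler's actual output:

```python
import sqlite3

# Hand-written approximation of the customer_orders model's compiled SQL
# (illustrative only; not Rocky's actual compiler output).
COMPILED_SQL = """
SELECT
    customer_id,
    SUM(amount)     AS total_revenue,
    COUNT(*)        AS order_count,
    MIN(order_date) AS first_order
FROM raw_orders
WHERE status <> 'cancelled'
GROUP BY customer_id
HAVING SUM(amount) > 0
"""

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE raw_orders (
        order_id INTEGER, customer_id INTEGER, product_id INTEGER,
        amount REAL, status TEXT, order_date TEXT
    )
""")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 100, 1, 25.0, "completed", "2025-01-01"),
        (2, 100, 2, 75.0, "completed", "2025-01-05"),
        (3, 200, 1, 50.0, "cancelled", "2025-01-02"),  # filtered out
    ],
)
for row in conn.execute(COMPILED_SQL):
    print(row)  # (100, 100.0, 2, '2025-01-01')
```

Note how the second `where` in the DSL (after `group`) lowers to `HAVING`, since it filters on an aggregate.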
### revenue_summary.sql

A standard SQL transformation that builds on `customer_orders`:
```sql
SELECT
    customer_id,
    total_revenue,
    order_count,
    total_revenue / order_count AS avg_order_value,
    first_order
FROM customer_orders
WHERE order_count >= 2
```

### revenue_summary.contract.toml

A data contract that enforces the output schema of `revenue_summary`:
```toml
# Loose contract suitable for the playground.
# Type checker can't infer non-null from `SELECT col FROM raw__orders.orders`
# (the source schema is unknown to the compiler), so columns are declared
# nullable here to keep `rocky compile --contracts contracts` clean.

[[columns]]
name = "customer_id"
type = "Int64"
nullable = true

[[columns]]
name = "total_revenue"
type = "Decimal"
nullable = true

[[columns]]
name = "order_count"
type = "Int64"
nullable = true

[rules]
required = ["customer_id", "total_revenue", "order_count"]
protected = ["customer_id"]
```

The contract declares two rules:
- `required`: These columns must exist with the specified types. The compiler fails if they are missing or have the wrong type.
- `protected`: These columns cannot be removed in future changes. The compiler warns if a protected column disappears from the model's output.
The columns are marked nullable = true because the upstream raw__orders model selects from a schema the compiler cannot introspect, so it cannot prove non-nullability. A strict-contract walkthrough that pins types and nullability lives in the dedicated POCs.
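The two rules can be sketched as a small validation routine. This is hypothetical code to illustrate the semantics; Rocky's real checker works on the compiler's typed output, and these names are not Rocky internals:

```python
# Hypothetical sketch of required/protected contract checks.
def check_contract(output, required_types, protected, previous_output):
    """output/previous_output: dicts of column name -> type string."""
    errors, warnings = [], []
    # required: column must exist with the declared type -> hard error.
    for name, expected in required_types.items():
        actual = output.get(name)
        if actual is None:
            errors.append(f"missing required column '{name}'")
        elif actual != expected:
            errors.append(f"'{name}': expected {expected}, got {actual}")
    # protected: column removed relative to the previous run -> warning.
    for name in protected:
        if name in previous_output and name not in output:
            warnings.append(f"protected column '{name}' was removed")
    return errors, warnings

required = {"customer_id": "Int64", "total_revenue": "Decimal", "order_count": "Int64"}
good = dict(required)

assert check_contract(good, required, ["customer_id"], good) == ([], [])

# Dropping columns: missing required -> error; removed protected -> warning.
errors, warnings = check_contract({"total_revenue": "Decimal"}, required,
                                  ["customer_id"], good)
assert "missing required column 'customer_id'" in errors
assert warnings == ["protected column 'customer_id' was removed"]
```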
## 3. Compile the Models

Run the compiler to type-check all models, resolve dependencies, and validate contracts:

```bash
rocky compile
```

Expected output:
```
✓ raw_orders       (6 columns)
✓ customer_orders  (4 columns)
✓ revenue_summary  (5 columns)

Compiled: 3 models, 0 errors, 0 warnings
```

The compiler performs several checks:
- Dependency resolution: Builds a DAG from model configs. `customer_orders` depends on `raw_orders`; `revenue_summary` depends on `customer_orders`.
- Type inference: Resolves column types through the chain. `amount` in `raw_orders` propagates through `sum(amount)` in `customer_orders` to `total_revenue / order_count` in `revenue_summary`.
- Contract validation: Checks that `revenue_summary` outputs `customer_id` (Int64), `total_revenue` (Decimal), and `order_count` (Int64), as required by the contract.
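The dependency-resolution step amounts to a topological sort of the model DAG, which Python's standard `graphlib` can demonstrate. The dependency map below mirrors the playground models; it is not Rocky's internal representation:

```python
from graphlib import TopologicalSorter

# Model -> set of upstream models it depends on (mirrors the playground DAG).
deps = {
    "raw_orders": set(),
    "customer_orders": {"raw_orders"},
    "revenue_summary": {"customer_orders"},
}

# static_order() yields each model only after all of its dependencies.
build_order = list(TopologicalSorter(deps).static_order())
print(build_order)  # ['raw_orders', 'customer_orders', 'revenue_summary']
```

`TopologicalSorter` also raises `CycleError` on circular dependencies, which is the same failure mode a model DAG compiler must reject.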
### Try introducing an error

Edit `revenue_summary.sql` and reference a column that does not exist:

```sql
SELECT
    customer_id,
    nonexistent_column, -- does not exist in customer_orders
    order_count
FROM customer_orders
```

Run `rocky compile` again:
```
✓ raw_orders       (6 columns)
✓ customer_orders  (4 columns)
✗ revenue_summary

error[E0001]: unknown column 'nonexistent_column'
  --> models/revenue_summary.sql:3:5
   |
 3 |     nonexistent_column,
   |     ^^^^^^^^^^^^^^^^^^ not found in customer_orders
   |
   = available columns: customer_id, total_revenue, order_count, first_order

Compiled: 3 models, 1 error, 0 warnings
```

Revert the change before continuing.
## 4. Run the Tests

Rocky can execute models locally using DuckDB without any warehouse connection:

```bash
rocky test
```

Expected output:
```
Testing 3 models...

All 3 models passed

Result: 3 passed, 0 failed
```

The test runner:
- Compiles all models
- Executes each model’s SQL against DuckDB in dependency order
- Validates contract assertions against actual output
- Reports pass/fail for each model
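The steps above can be sketched as a loop that materializes each model in dependency order and asserts on the output. This is an illustrative stand-in (Python's built-in `sqlite3` in place of DuckDB, simplified model SQL), not the real runner:

```python
import sqlite3

# Simplified stand-in models, already in dependency order.
MODELS = [
    ("raw_orders",
     "SELECT 1 AS customer_id, 25.0 AS amount UNION ALL SELECT 1, 75.0"),
    ("customer_orders",
     "SELECT customer_id, SUM(amount) AS total_revenue "
     "FROM raw_orders GROUP BY customer_id"),
]

conn = sqlite3.connect(":memory:")
passed = 0
for name, sql in MODELS:
    conn.execute(f"CREATE VIEW {name} AS {sql}")  # materialize the model
    conn.execute(f"SELECT * FROM {name}")         # fail fast on bad SQL
    passed += 1

# A contract-style assertion against the final model's actual output.
rows = conn.execute("SELECT * FROM customer_orders").fetchall()
assert rows == [(1, 100.0)]
print(f"{passed} passed, 0 failed")
```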
### Test a single model

```bash
rocky test --model revenue_summary
```

## 5. View Column Lineage

Rocky traces data flow at the column level. See the full lineage for a model:
```bash
rocky lineage revenue_summary
```

```
Model: revenue_summary
Upstream: customer_orders
Downstream:

Columns:
  customer_id      <- customer_orders.customer_id (Direct)
  total_revenue    <- customer_orders.total_revenue (Direct)
  order_count      <- customer_orders.order_count (Direct)
  avg_order_value  <- customer_orders.total_revenue, customer_orders.order_count (Derived)
  first_order      <- customer_orders.first_order (Direct)
```

### Trace a single column through the entire chain
```bash
rocky lineage revenue_summary --column avg_order_value
```

```
Column trace: revenue_summary.avg_order_value
  <- customer_orders.total_revenue (Derived)
  <- customer_orders.order_count (Derived)
```

### Generate Graphviz output
```bash
rocky lineage revenue_summary --format dot
```

```
digraph lineage {
  rankdir=LR;
  "customer_orders.customer_id" -> "revenue_summary.customer_id";
  "customer_orders.total_revenue" -> "revenue_summary.total_revenue";
  "customer_orders.order_count" -> "revenue_summary.order_count";
  "customer_orders.total_revenue" -> "revenue_summary.avg_order_value";
  "customer_orders.order_count" -> "revenue_summary.avg_order_value";
  "customer_orders.first_order" -> "revenue_summary.first_order";
}
```

Pipe this to Graphviz to generate an SVG:

```bash
rocky lineage revenue_summary --format dot | dot -Tsvg -o lineage.svg
```
## 6. Try AI Features

If you have an Anthropic API key, you can generate models from natural language:

```bash
export ANTHROPIC_API_KEY="sk-ant-..."
```

### Generate a new model
Section titled “Generate a new model”rocky ai "monthly revenue per customer from raw_orders, only completed orders"Rocky sends your intent to Claude, receives generated code, and compiles it to verify correctness. If compilation fails, it retries with the error context (up to 3 attempts).
### Add intent to existing models

```bash
rocky ai-explain --all --save
```

This reads each model's SQL, generates a plain-English description, and saves it to the model's TOML config as an `intent` field. The intent is used later by `ai-sync` to automatically update models when upstream schemas change.
### Generate tests from intent

```bash
rocky ai-test --all --save
```

Generates test assertions based on each model's SQL logic and intent description, and saves them to the `tests/` directory.
See the AI Features guide for a complete walkthrough.
## 7. Run CI Locally

The `ci` command combines compilation and testing into a single pass with an exit code suitable for CI pipelines:

```bash
rocky ci
```

```
Rocky CI Pipeline

Compile: PASS (3 models)
Test:    PASS (3 passed, 0 failed)

Exit code: 0
```

Exit code 0 means all checks passed. A non-zero exit code fails the CI job.
## 8. Explore the POC Catalog

The playground includes a curated catalog of 28 POCs that showcase Rocky's distinctive capabilities. Each POC is self-contained with its own `rocky.toml`, `models/`, and `run.sh`. Most run on local DuckDB with zero credentials.
### Foundations (5 POCs)

| POC | Feature |
|---|---|
| `00-playground-default` | Stock 3-model scaffold — baseline smoke test |
| `01-dsl-pipeline-syntax` | Every Rocky DSL operator in one file |
| `02-null-safe-operators` | `!=` lowering to `IS DISTINCT FROM` |
| `03-date-literals-and-match` | `@2025-01-01` date literals and `match { ... }` pattern matching |
| `04-window-functions` | DSL window syntax with partition + sort + frame |
### Quality (4 POCs)

| POC | Feature |
|---|---|
| `01-data-contracts-strict` | Every contract rule + deliberately broken diagnostics |
| `02-inline-checks` | Built-in checks (`row_count`, `column_match`, `freshness`, `null_rate`) |
| `03-anomaly-detection` | `rocky history` + `rocky metrics --alerts` driven by row count anomalies |
| `04-local-test-with-duckdb` | `rocky test` with passing and failing assertions |
### Performance (6 POCs)

| POC | Feature |
|---|---|
| `01-incremental-watermark` | Full load on run #1, watermark-filtered INSERT on run #2 |
| `02-merge-upsert` | Merge with `unique_key` + `update_columns` for SCD-1 |
| `03-partition-checksum` | Partition checksum incremental catching late-arriving corrections |
| `04-column-propagation` | Column-level lineage pruning — skip unchanged downstream models |
| `05-optimize-recommendations` | `rocky optimize` + `profile-storage` + `compact --dry-run` |
| `06-schema-drift-recover` | Drift detection, auto-widening, DROP+RECREATE |
### AI (4 POCs, ANTHROPIC_API_KEY required)

| POC | Feature |
|---|---|
| `01-model-generation` | `rocky ai "intent..."` with compile-verify retry loop |
| `02-ai-explain-bootstrap` | `rocky ai-explain --all --save` reverse-engineers intent |
| `03-ai-sync-schema-evolution` | `rocky ai-sync` proposes downstream updates |
| `04-ai-test-generation` | `rocky ai-test --all --save` generates SQL assertions |
### Governance (4 POCs, Databricks required)

| POC | Feature |
|---|---|
| `01-unity-catalog-grants` | Declarative RBAC with SHOW GRANTS before/after |
| `02-schema-patterns-multi-tenant` | Schema patterns with variadic `regions...` routing |
| `03-workspace-isolation` | Workspace bindings + ISOLATED catalog mode |
| `04-tagging-lifecycle` | Tags propagated via `ALTER ... SET TAGS` |
### Orchestration (4 POCs)

| POC | Feature |
|---|---|
| `01-shell-hooks` | Lifecycle hooks with stdin-JSON context |
| `02-webhook-slack-preset` | Webhook presets (Slack, PagerDuty, Datadog, Teams) |
| `03-remote-state-s3` | S3 state backend via MinIO docker-compose |
| `04-checkpoint-resume` | `rocky run --resume-latest` after mid-pipeline failure |
### Developer Experience (5 POCs)

| POC | Feature |
|---|---|
| `01-lineage-column-level` | Column-level lineage on a 4-model branching DAG |
| `02-rocky-serve-api` | `rocky serve --watch` HTTP API with curl examples |
| `03-import-dbt-validate` | `rocky import-dbt` + `rocky validate-migration` |
| `04-shadow-mode-compare` | `rocky compare` shadow vs production |
| `05-doctor-and-ci` | `rocky doctor` + `rocky ci --output json` |
### Adapters (4 POCs)

| POC | Feature |
|---|---|
| `01-snowflake-dynamic-table` | Dynamic tables with `target_lag` |
| `02-databricks-materialized-view` | Materialized views on Databricks |
| `03-fivetran-discover` | `rocky discover` against the Fivetran REST API |
| `04-custom-process-adapter` | Python adapter via JSON-RPC over stdio |
### Running a POC

```bash
cd examples/playground
./pocs/02-performance/01-incremental-watermark/run.sh
```

22 of 28 POCs run with no external credentials.
## 9. Benchmarks

The playground includes a benchmark suite comparing Rocky against dbt-core, dbt-fusion, and PySpark:

```bash
cd examples/playground/benchmarks
make bench
```

Headline (50k models): Rocky compiles in 10.5s, 15x faster than dbt-core, 21x faster than dbt-fusion, with 5.3x less memory.
## Next Steps

- Migrating from dbt — import an existing dbt project
- IDE Setup — install the VS Code extension for hover types, go-to-definition, and inline lineage
- CI/CD Integration — add Rocky to your GitHub Actions or GitLab CI pipeline
- AI Features — generate models, sync schema changes, and create tests with AI
- Data Governance — configure contracts, permissions, and quality checks