AI Features
Rocky includes an AI layer that uses Claude to generate models, explain existing code, propagate schema changes, and create test assertions. Every AI-generated artifact passes through the compiler before it can be used — the compile-verify loop ensures correctness regardless of LLM output quality.
1. Setup
Section titled “1. Setup”Rocky’s AI features require an Anthropic API key. Set it in your environment:
export ANTHROPIC_API_KEY="sk-ant-..."Add this to your shell profile (~/.zshrc, ~/.bashrc) for persistence:
echo 'export ANTHROPIC_API_KEY="sk-ant-..."' >> ~/.zshrcsource ~/.zshrcRocky uses claude-sonnet-4-6 by default. No additional configuration is needed.
Verify the setup
Section titled “Verify the setup”rocky ai "hello world model that selects 1 as id"If the API key is missing or invalid, you will see:
Error: ANTHROPIC_API_KEY not set. Set it to use `rocky ai`.2. Generate a Model from Intent
Section titled “2. Generate a Model from Intent”The rocky ai command generates a model from a natural language description:
rocky ai "monthly revenue by product category from orders and products, completed orders only"Output:
Generated model: monthly_category_revenue (rocky)Attempts: 1
from ordersjoin products on orders.product_id = products.product_idwhere status = "completed"group extract(month from order_date), category { month: extract(month from order_date), category: category, total_revenue: sum(quantity * unit_price), order_count: count()}Choose the output format
Section titled “Choose the output format”By default, Rocky generates Rocky DSL. Use --format sql for standard SQL:
rocky ai "monthly revenue by product category" --format sqlSELECT DATE_TRUNC('month', o.order_date) AS month, p.category, SUM(o.quantity * o.unit_price) AS total_revenue, COUNT(*) AS order_countFROM orders oJOIN products p ON o.product_id = p.product_idWHERE o.status = 'completed'GROUP BY DATE_TRUNC('month', o.order_date), p.categoryJSON output
Section titled “JSON output”For programmatic use:
rocky ai "monthly revenue by category" -o json{ "version": "0.3.0", "command": "ai", "intent": "monthly revenue by category", "format": "rocky", "name": "monthly_category_revenue", "source": "from orders\njoin products on ...", "attempts": 1}3. The Compile-Verify Loop
Section titled “3. The Compile-Verify Loop”This is Rocky’s key safety mechanism for AI-generated code. Every generated model is compiled before it is shown to you:
Intent | vLLM generates code | vCompiler type-checks ──> Pass? ──> Output to user | | Fail vError feedback sent to LLM | vLLM retries with error context | vCompiler type-checks again (up to 3 attempts)The compiler catches:
- Syntax errors: Invalid SQL or Rocky DSL syntax
- Type mismatches: Column used as wrong type (e.g., string compared to integer)
- Missing references: Column or table does not exist in the project
- Invalid functions: Unrecognized SQL functions or wrong argument counts
If all attempts fail, Rocky reports the best attempt and the remaining errors. No AI-generated code reaches your warehouse without passing the type checker.
Why this matters
Section titled “Why this matters”LLMs occasionally generate plausible-looking SQL that is semantically wrong — a column name that does not exist, a JOIN on mismatched types, or an aggregation without a GROUP BY. The compile-verify loop catches these errors before you see the output.
4. Add Intent to Existing Models
Section titled “4. Add Intent to Existing Models”Intent is a natural language description stored in the model’s TOML config. It serves two purposes:
- Documentation: Explains what the model does in business terms
- AI context: Powers
ai-syncandai-testwith understanding of the model’s purpose
Write intent manually
Section titled “Write intent manually”Add an intent field to any model’s TOML config:
name = "fct_daily_revenue"intent = """Calculate daily revenue by product category.Join orders with products, filter to completed orders only.Revenue is quantity * unit_price after discounts.Grain: one row per date per category."""depends_on = ["stg_orders", "dim_products"]
[strategy]type = "incremental"timestamp_column = "order_date"
[target]catalog = "analytics"schema = "warehouse"table = "fct_daily_revenue"Generate intent with AI
Section titled “Generate intent with AI”Instead of writing intent manually, let Rocky analyze your SQL and generate it:
# Explain a single modelrocky ai-explain fct_daily_revenue --saveSaved intent for fct_daily_revenue: Calculate daily revenue aggregated by productcategory. Joins staging orders with product dimension on product_id. Filters tocompleted orders only. Revenue computed as quantity * unit_price * (1 - discount).Grain: one row per order_date per category.Bulk-generate intent for all models
Section titled “Bulk-generate intent for all models”rocky ai-explain --all --save --models modelsThis processes every model that does not already have an intent field. Models that already have intent are skipped. The --save flag writes the generated description directly to each model’s .toml sidecar.
Review without saving
Section titled “Review without saving”Omit --save to preview the generated intent without modifying files:
rocky ai-explain --all --models modelsfct_daily_revenue: Calculate daily revenue aggregated by product category...dim_customers: Customer dimension with lifetime value and segment classification...stg_orders: Stage raw Shopify orders selecting order_id, customer, date, amount...Review and refine the descriptions before saving. The AI-generated intent is a starting point — you know your business domain better than the LLM.
5. Schema Change Sync
Section titled “5. Schema Change Sync”When upstream schemas change (columns renamed, types changed, new columns added), rocky ai-sync proposes updates to downstream models based on their stored intent.
Preview proposals (dry run)
Section titled “Preview proposals (dry run)”rocky ai-sync --models modelsModel: fct_daily_revenue (intent: "Calculate daily revenue by product category...")- unit_price -> unit_price_local revenue = quantity * unit_price * (1 - discount) revenue = quantity * unit_price_local * (1 - discount)
Run with --apply to update models.Apply proposals
Section titled “Apply proposals”rocky ai-sync --models models --applyUpdated: models/fct_daily_revenue.sqlUpdated: models/fct_monthly_summary.sqlFilter to a specific model
Section titled “Filter to a specific model”rocky ai-sync --models models --model fct_daily_revenueOnly process models with intent
Section titled “Only process models with intent”rocky ai-sync --models models --with-intentModels without intent are skipped because the AI has no context to make intelligent update proposals.
How sync works
Section titled “How sync works”- Rocky compiles the project and builds the semantic graph
- For each model with intent, it identifies upstream schema changes (renamed columns, type changes, new columns)
- The LLM receives the model’s SQL, its intent, and the upstream changes
- The LLM proposes a minimal diff that preserves the model’s intent while adapting to the schema change
- The proposed code is compiled to verify correctness
- With
--apply, the updated SQL is written back to the file
Example scenario
Section titled “Example scenario”Suppose an upstream model stg_orders renames unit_price to unit_price_local as part of a currency normalization effort:
-
rocky compilefails — downstream models referenceunit_pricewhich no longer exists -
rocky ai-sync --models modelsdetects the rename and proposes updates:--- models/fct_daily_revenue.sql+++ models/fct_daily_revenue.sql@@ -4,7 +4,7 @@SELECTo.order_date,p.category,- SUM(o.quantity * o.unit_price * (1 - o.discount)) as revenue,+ SUM(o.quantity * o.unit_price_local * (1 - o.discount)) as revenue,COUNT(*) as order_countFROM stg_orders o -
Review the diff and apply with
--apply -
rocky compilepasses again
6. Generate Test Assertions
Section titled “6. Generate Test Assertions”rocky ai-test generates test assertions based on each model’s SQL logic and intent:
rocky ai-test fct_daily_revenueTests for fct_daily_revenue: - grain_uniqueness: No duplicate rows per date and category - revenue_positive: Revenue should be non-negative for completed orders - no_future_dates: Order dates should not be in the futureSave tests to disk
Section titled “Save tests to disk”rocky ai-test fct_daily_revenue --saveSaved 3 tests for fct_daily_revenueTests are saved to the tests/ directory as SQL files that return 0 rows on success:
-- tests/fct_daily_revenue/grain_uniqueness.sql-- No duplicate rows per date and categorySELECT order_date, category, COUNT(*) as nFROM fct_daily_revenueGROUP BY order_date, categoryHAVING n > 1Generate tests for all models
Section titled “Generate tests for all models”rocky ai-test --all --save --models modelsRun the generated tests
Section titled “Run the generated tests”Tests run with rocky test (DuckDB) and are included in rocky ci:
rocky test --models modelsTesting 12 models...
All 12 models passed
Result: 12 passed, 0 failed7. Intent in the IDE
Section titled “7. Intent in the IDE”When using the VS Code extension, models with intent get enhanced IDE features:
- Hover: Shows the intent description above the column list when hovering over a model name
- Document Symbols: Intent appears as the first child of the model in the Outline panel
- Diagnostics: The compiler warns when intent mentions columns that do not exist in the model’s output schema
8. Best Practices for Intent Descriptions
Section titled “8. Best Practices for Intent Descriptions”Good intent descriptions make ai-sync and ai-test significantly more effective. Here are guidelines:
State the grain
Section titled “State the grain”Specify what one row represents. This is the most important piece of context:
Grain: one row per customer per monthName key columns and their business meaning
Section titled “Name key columns and their business meaning”Do not just list column names — explain what they mean:
customer_lifetime_value is the total revenue from all completed orders for this customerDescribe filters and their purpose
Section titled “Describe filters and their purpose”Explain why data is filtered, not just what the filter is:
Filter to completed orders only (exclude cancelled, pending, refunded) becauserevenue should only count fulfilled transactionsExplain aggregation logic precisely
Section titled “Explain aggregation logic precisely”Be specific about how calculated columns are computed:
Revenue is quantity * unit_price * (1 - discount_pct), aggregated per day per categoryMention source models being joined
Section titled “Mention source models being joined”Join stg_orders with dim_products on product_id to get category informationGood example
Section titled “Good example”intent = """Calculate daily revenue by product category.Join stg_orders with dim_products on product_id.Filter to completed orders only (exclude cancelled, pending).Revenue = quantity * unit_price * (1 - discount_pct).Grain: one row per order_date per product_category."""Bad example
Section titled “Bad example”intent = "Revenue model."This is too vague for ai-sync to make intelligent update proposals or for ai-test to generate meaningful assertions.
9. AI Commands Reference
Section titled “9. AI Commands Reference”| Command | Description |
|---|---|
rocky ai "<intent>" | Generate a model from natural language |
rocky ai "<intent>" --format sql | Generate as standard SQL instead of Rocky DSL |
rocky ai-explain <model> | Generate intent for a single model |
rocky ai-explain --all --save | Generate and save intent for all models without intent |
rocky ai-sync | Preview schema change proposals |
rocky ai-sync --apply | Apply proposed schema changes |
rocky ai-sync --model <name> | Sync a specific model |
rocky ai-sync --with-intent | Only process models that have intent |
rocky ai-test <model> | Generate tests for a single model |
rocky ai-test --all --save | Generate and save tests for all models |