The Rocky DSL
Rocky includes a domain-specific language (implemented in the rocky-lang crate) that provides a pipeline-oriented alternative to SQL for writing transformation models. Rocky DSL files (.rocky) are parsed into an AST and lowered to standard SQL before execution.
Why a DSL
Section titled “Why a DSL”SQL is powerful but its syntax works against readability for common data transformation patterns. A SELECT with multiple CTEs, window functions, and CASE expressions quickly becomes hard to follow. Rocky’s DSL addresses this with a top-to-bottom pipeline syntax where each step transforms the data flowing through it.
Rocky DSL is optional. You can write models in plain SQL and they work the same way. The DSL is for teams that want more readable, composable transformations.
Pipeline syntax
Section titled “Pipeline syntax”A Rocky file is a sequence of pipeline steps. Data flows from the top step downward:
from orderswhere status == "completed"derive { total: amount * quantity}group customer_id { revenue: sum(total), order_count: count()}sort revenue desctake 100This lowers to:
SELECT customer_id, SUM(amount * quantity) AS revenue, COUNT() AS order_countFROM ordersWHERE status = 'completed'GROUP BY customer_idORDER BY revenue DESCLIMIT 100Constructs
Section titled “Constructs”Every pipeline starts with from. It accepts a model name, a qualified table reference, or a table with an alias:
from ordersfrom catalog.schema.ordersfrom orders as oFilters rows. Multiple where steps combine with AND. A where step after group becomes a HAVING clause:
from orderswhere amount > 0where status != "cancelled"group customer_id { total: sum(amount)}where total > 1000 -- this becomes HAVINGGroups rows by one or more keys and defines aggregations in a block:
from ordersgroup customer_id { revenue: sum(amount), avg_order: avg(amount), first_order: min(order_date), cnt: count()}derive
Section titled “derive”Adds computed columns without removing existing ones:
from ordersderive { total: amount * quantity, is_large: amount > 1000}select
Section titled “select”Chooses which columns to keep. Replaces the current column set:
from ordersselect { id, customer_id, total }Supports * and qualified references:
from orders as ojoin customers as c on customer_id { keep c.name}select { o.id, c.name }Joins another model by key columns, with an optional keep clause to pull in specific columns:
from orders as ojoin customers as c on customer_id { keep c.name, c.email}This lowers to:
SELECT *, c.name, c.emailFROM orders oJOIN customers AS c ON o.customer_id = c.customer_idOrders results. Use desc for descending:
from orderssort order_date descLimits the number of rows:
from orderssort amount desctake 10distinct
Section titled “distinct”Deduplicates rows:
from ordersselect { customer_id }distinctPattern matching that lowers to CASE WHEN:
from ordersderive { tier: match amount { > 10000 => "enterprise", > 1000 => "mid-market", _ => "smb" }}Lowers to:
CASE WHEN amount > 10000 THEN 'enterprise' WHEN amount > 1000 THEN 'mid-market' ELSE 'smb' END AS tierreplicate
Section titled “replicate”Shorthand for SELECT * — used in bronze-layer models that pass data through unchanged:
from source.fivetran.ordersreplicateDate literals
Section titled “Date literals”The @ prefix creates date literals without quoting:
from orderswhere order_date >= @2025-01-01Lowers to WHERE order_date >= DATE '2025-01-01'.
Timestamps are also supported: @2025-01-01T00:00:00Z.
NULL-safe operators
Section titled “NULL-safe operators”Rocky’s != operator compiles to SQL’s IS DISTINCT FROM, not the standard !=. This means comparisons involving NULL behave intuitively:
| Rocky | SQL | NULL behavior |
|---|---|---|
a == b | a = b | NULL = NULL is NULL (standard SQL) |
a != b | a IS DISTINCT FROM b | NULL IS DISTINCT FROM NULL is FALSE |
This eliminates an entire class of bugs where != 'value' silently excludes NULL rows.
IS NULL and IS NOT NULL work as expected:
from orderswhere email is not nullWindow functions
Section titled “Window functions”Window functions use an over clause with partition, sort, and optional frame:
from ordersderive { rn: row_number() over (partition customer_id, sort -order_date), running_total: sum(amount) over (partition customer_id, sort order_date, rows unbounded..current), prev_amount: lag(amount, 1) over (sort order_date)}The - prefix on a sort column means descending. Frame bounds use .. syntax: unbounded..current, 3..current.
Comments
Section titled “Comments”Line comments use --, same as SQL:
from orders-- Filter to completed orders onlywhere status == "completed"How lowering works
Section titled “How lowering works”The rocky-lang crate processes DSL files in two phases:
-
Parse — A recursive descent parser (using the
logoslexer) produces a typed AST. Each pipeline step becomes an enum variant (From,Where,Group,Derive,Select,Join,Sort,Take,Distinct,Replicate). -
Lower — The AST is walked step by step, accumulating SQL clauses. The lowering context tracks the current
FROM, joins,WHEREclauses,SELECTcolumns,GROUP BY,HAVING,ORDER BY, andLIMIT. After all steps are processed, the context emits a single SQL string.
The lowered SQL is then handed to the compiler for type checking and dependency resolution, exactly as if it had been written by hand.
For the full language specification, see docs/rocky-lang-spec.md in the repository.