SQL Generation
Rocky generates all SQL from its internal IR (Intermediate Representation). No Jinja templates, no string concatenation with untrusted input. Every identifier is validated before it reaches a SQL statement.
Catalog Lifecycle
Section titled “Catalog Lifecycle”CREATE CATALOG IF NOT EXISTS <catalog>ALTER CATALOG <catalog> SET TAGS ('managed_by' = 'rocky', 'tenant' = 'acme')DESCRIBE CATALOG <catalog>Schema Lifecycle
Section titled “Schema Lifecycle”CREATE SCHEMA IF NOT EXISTS <catalog>.<schema>ALTER SCHEMA <catalog>.<schema> SET TAGS ('managed_by' = 'rocky', 'source' = 'shopify')SHOW SCHEMAS IN <catalog>Table Tagging
Section titled “Table Tagging”Governance tags are applied to each replicated table:
ALTER TABLE <catalog>.<schema>.<table> SET TAGS ('managed_by' = 'rocky')Incremental Copy
Section titled “Incremental Copy”The core replication operation. Copies only rows newer than the latest timestamp in the target:
INSERT INTO <target_catalog>.<target_schema>.<table>SELECT *, CAST(NULL AS STRING) AS _loaded_byFROM <source_catalog>.<source_schema>.<table>WHERE _fivetran_synced > ( SELECT COALESCE(MAX(_fivetran_synced), TIMESTAMP '1970-01-01') FROM <target_catalog>.<target_schema>.<table>)Full Refresh
Section titled “Full Refresh”Used when drift is detected or when explicitly configured:
CREATE OR REPLACE TABLE <target> AS SELECT * FROM <source>Merge (Upsert)
Section titled “Merge (Upsert)”For tables that require key-based deduplication:
MERGE INTO <target> AS tUSING (SELECT ... FROM <source>) AS sON t.key = s.keyWHEN MATCHED THEN UPDATE SET *WHEN NOT MATCHED THEN INSERT *Transformation SQL
Section titled “Transformation SQL”User-defined SQL is wrapped in the appropriate statement depending on the materialization strategy:
- Table:
CREATE OR REPLACE TABLE ... AS <user_sql> - Incremental:
INSERT INTO ... <user_sql> - Merge:
MERGE INTO ... USING (<user_sql>) ... - Materialized View:
CREATE OR REPLACE MATERIALIZED VIEW ... AS <user_sql>(Databricks) - Dynamic Table:
CREATE OR REPLACE DYNAMIC TABLE ... TARGET_LAG = '<lag>' AS <user_sql>(Snowflake) - Time Interval: Per-partition
INSERT OVERWRITEwith@start_date/@end_datesubstitution
Materialized View (Databricks)
Section titled “Materialized View (Databricks)”CREATE OR REPLACE MATERIALIZED VIEW <catalog>.<schema>.<table> AS<user_sql>Dynamic Table (Snowflake)
Section titled “Dynamic Table (Snowflake)”CREATE OR REPLACE DYNAMIC TABLE <catalog>.<schema>.<table>TARGET_LAG = '<lag>'WAREHOUSE = <warehouse>AS <user_sql>Time-Interval Partition Processing
Section titled “Time-Interval Partition Processing”Per-warehouse SQL for partition-keyed materialization:
Databricks (Delta, atomic):
INSERT INTO <target>REPLACE WHERE <time_column> >= '<start>' AND <time_column> < '<end>'<user_sql with @start_date/@end_date substituted>Snowflake (multi-statement transaction):
BEGIN;DELETE FROM <target> WHERE <time_column> >= '<start>' AND <time_column> < '<end>';INSERT INTO <target> <user_sql>;COMMIT;Drift Detection
Section titled “Drift Detection”DESCRIBE TABLE <catalog>.<schema>.<table>When drift is found:
DROP TABLE IF EXISTS <target_catalog>.<target_schema>.<table>Permission Reconciliation
Section titled “Permission Reconciliation”SHOW GRANTS ON CATALOG <catalog>SHOW GRANTS ON SCHEMA <catalog>.<schema>GRANT <PERMISSION> ON CATALOG <catalog> TO `<principal>`REVOKE <PERMISSION> ON CATALOG <catalog> FROM `<principal>`Data Quality Checks
Section titled “Data Quality Checks”Row count (batched):
SELECT '<catalog>', '<schema>', '<table>', COUNT(*)FROM <catalog>.<schema>.<table>UNION ALLSELECT '<catalog>', '<schema>', '<table>', COUNT(*)FROM <catalog>.<schema>.<table>-- ... up to 200 tables per batchFreshness:
SELECT MAX(<timestamp_column>) FROM <catalog>.<schema>.<table>Null rate (sampled):
SELECT ... FROM <catalog>.<schema>.<table> TABLESAMPLE (N PERCENT)Workspace Isolation
Section titled “Workspace Isolation”Rocky uses the Databricks REST API for workspace binding and isolation (not SQL):
PATCH /api/2.1/unity-catalog/bindings/catalog/{name}— Bind catalog to workspace IDsPATCH /api/2.1/unity-catalog/catalogs/{name}— Setisolation_mode: "ISOLATED"
Catalog Discovery
Section titled “Catalog Discovery”Find catalogs managed by Rocky using tags:
SELECT catalog_nameFROM system.information_schema.catalog_tagsWHERE tag_name = 'managed_by' AND tag_value = 'rocky'SQL Safety
Section titled “SQL Safety”All SQL generation follows strict safety rules:
- Identifiers (catalogs, schemas, tables, tenants, regions, sources) are validated against
^[a-zA-Z0-9_]+$ - Principal names are validated against
^[a-zA-Z0-9_ \-\.@]+$and always wrapped in backticks - Rocky never uses
format!()to interpolate untrusted input into SQL - All validation happens in
rocky-sql/validation.rsbefore any SQL is constructed