# Quickstart
This guide walks you through setting up a Rocky pipeline that replicates Fivetran-landed sources into Databricks. If you do not have warehouse credentials handy, the Playground guide does the same thing end-to-end against a local DuckDB file with no setup.
## 1. Initialize a Project
```sh
rocky init my-pipeline
cd my-pipeline
```

This creates:
```
my-pipeline/
├── rocky.toml   # Pipeline configuration (named adapters + named pipelines)
└── models/      # Transformation models (SQL + TOML)
```
## 2. Configure Your Pipeline

Edit `rocky.toml` to declare a Fivetran source adapter, a Databricks warehouse adapter, and a pipeline that wires them together:
```toml
[adapter.fivetran]
type = "fivetran"
destination_id = "${FIVETRAN_DESTINATION_ID}"
api_key = "${FIVETRAN_API_KEY}"
api_secret = "${FIVETRAN_API_SECRET}"

[adapter.prod]
type = "databricks"
host = "${DATABRICKS_HOST}"
http_path = "${DATABRICKS_HTTP_PATH}"
token = "${DATABRICKS_TOKEN}"

[pipeline.bronze]
type = "replication"
strategy = "incremental"
timestamp_column = "_fivetran_synced"
metadata_columns = [
  { name = "_loaded_by", type = "STRING", value = "NULL" },
]

[pipeline.bronze.source]
adapter = "fivetran"

[pipeline.bronze.source.schema_pattern]
prefix = "src__"
separator = "__"
components = ["tenant", "regions", "source"]

[pipeline.bronze.target]
adapter = "prod"
catalog_template = "{tenant}_warehouse"
schema_template = "staging__{regions}__{source}"

[pipeline.bronze.target.governance]
auto_create_catalogs = true
auto_create_schemas = true

[pipeline.bronze.checks]
enabled = true
row_count = true
column_match = true
freshness = { threshold_seconds = 86400 }

[state]
backend = "local"
```

The `[adapter.NAME]` blocks define connections; the `[pipeline.NAME]` block ties them together. You can declare additional adapters and pipelines in the same file and select between them with `--pipeline NAME`.
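The `${VAR}` placeholders in the config are resolved from the environment at load time. A minimal sketch of that substitution convention, assuming Rocky follows the common `${NAME}` style shown above (`expand_env` is a hypothetical helper, not Rocky's actual loader, which may handle unset variables differently):

```python
import os
import re

def expand_env(value: str) -> str:
    """Replace ${VAR} placeholders in a config value with environment values.

    Illustration only: a sketch of the convention, not Rocky's implementation.
    Raises KeyError for unset variables rather than silently leaving them blank.
    """
    def repl(match: re.Match) -> str:
        name = match.group(1)
        if name not in os.environ:
            raise KeyError(f"environment variable {name} is not set")
        return os.environ[name]
    return re.sub(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}", repl, value)

os.environ["DATABRICKS_TOKEN"] = "dapi-example"
print(expand_env("token = ${DATABRICKS_TOKEN}"))  # token = dapi-example
```

Failing loudly on unset variables (instead of substituting an empty string) is the safer design here, since a blank token would otherwise surface later as a confusing auth error.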
Set the environment variables:
```sh
export DATABRICKS_HOST="your-workspace.cloud.databricks.com"
export DATABRICKS_HTTP_PATH="/sql/1.0/warehouses/abc123"
export DATABRICKS_TOKEN="dapi..."
export FIVETRAN_DESTINATION_ID="your_destination_id"
export FIVETRAN_API_KEY="your_api_key"
export FIVETRAN_API_SECRET="your_api_secret"
```
## 3. Validate Your Config

```sh
rocky validate
```

Expected output:
```
ok  Config syntax valid (v2 format)
ok  adapter.fivetran: fivetran
ok  adapter.prod: databricks (auth configured)
ok  pipeline.bronze: schema pattern parseable
ok  pipeline.bronze: replication / incremental -> {tenant}_warehouse / staging__{regions}__{source}

Validation complete.
```

`rocky validate` only checks the config file; it does not call the Fivetran or Databricks APIs.
## 4. Discover Sources
```sh
rocky -o table discover
```

This calls the Fivetran API and lists all connectors that match the schema pattern:
```
connector_abc | tenant=acme   regions=[us_west] source=shopify | 12 tables
connector_def | tenant=acme   regions=[us_west] source=stripe  |  8 tables
connector_ghi | tenant=globex regions=[emea]    source=hubspot | 15 tables
```
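The `tenant`, `regions`, and `source` metadata in this listing comes from decomposing each connector's schema name using the `schema_pattern` in `rocky.toml`. A minimal sketch of that decomposition (`parse_schema` is a hypothetical helper; Rocky's matcher may be stricter):

```python
def parse_schema(name, prefix="src__", separator="__",
                 components=("tenant", "regions", "source")):
    """Split a source schema name into named pattern components.

    Sketch of the schema_pattern matching above, not Rocky's implementation.
    Returns None for names that don't fit the pattern, so non-matching
    connectors are simply skipped during discovery.
    """
    if not name.startswith(prefix):
        return None
    parts = name[len(prefix):].split(separator)
    if len(parts) != len(components):
        return None
    return dict(zip(components, parts))

print(parse_schema("src__acme__us_west__shopify"))
# {'tenant': 'acme', 'regions': 'us_west', 'source': 'shopify'}
```

These parsed components are what `--filter tenant=acme` matches against in the steps below.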
## 5. Preview the SQL

```sh
rocky plan --filter tenant=acme
```

This shows the SQL Rocky will generate without executing it:
```sql
-- create_catalog (acme_warehouse)
CREATE CATALOG IF NOT EXISTS acme_warehouse;

-- create_schema (acme_warehouse.staging__us_west__shopify)
CREATE SCHEMA IF NOT EXISTS acme_warehouse.staging__us_west__shopify;

-- incremental_copy (acme_warehouse.staging__us_west__shopify.orders)
INSERT INTO acme_warehouse.staging__us_west__shopify.orders
SELECT *, CAST(NULL AS STRING) AS _loaded_by
FROM source_catalog.src__acme__us_west__shopify.orders
WHERE _fivetran_synced > (
  SELECT COALESCE(MAX(_fivetran_synced), TIMESTAMP '1970-01-01')
  FROM acme_warehouse.staging__us_west__shopify.orders
);
```
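The catalog and schema names in this plan are the target templates rendered with the components parsed from each source schema. A sketch of that rendering, assuming format-style substitution (`render_target` is a hypothetical helper, not part of Rocky):

```python
def render_target(components: dict,
                  catalog_template: str = "{tenant}_warehouse",
                  schema_template: str = "staging__{regions}__{source}") -> tuple:
    """Render target catalog and schema names from parsed pattern components.

    Illustration only; the templates mirror the plan output above.
    """
    return (catalog_template.format(**components),
            schema_template.format(**components))

print(render_target({"tenant": "acme", "regions": "us_west", "source": "shopify"}))
# ('acme_warehouse', 'staging__us_west__shopify')
```

Note also the `COALESCE(..., TIMESTAMP '1970-01-01')` in the copy: with no prior watermark, the predicate matches every row, so the first run is effectively a full load.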
## 6. Run the Pipeline

```sh
rocky run --filter tenant=acme
```

This executes the full pipeline:
- Discovers sources from Fivetran
- Creates catalogs and schemas as needed and applies governance
- Detects schema drift between source and target
- Copies data incrementally (or full refresh if drift forces it)
- Runs data quality checks (row count, column match, freshness)
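The drift-detection step above can be pictured as a column-set comparison between source and target. This is a simplification (a real check would also compare column types; `detect_drift` is a hypothetical helper, not Rocky's implementation):

```python
def detect_drift(source_cols, target_cols):
    """Report columns added to or removed from the source relative to the target.

    Sketch only: ignores type changes and column order.
    """
    source, target = set(source_cols), set(target_cols)
    return {
        "added": sorted(source - target),    # new in source, missing in target
        "removed": sorted(target - source),  # present in target, dropped from source
    }

print(detect_drift(["id", "total", "_fivetran_synced"], ["id", "total"]))
# {'added': ['_fivetran_synced'], 'removed': []}
```

An empty result on both sides means the incremental copy can proceed as-is; a non-empty one is what forces the full refresh mentioned above.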
The JSON output includes materializations, check results, drift actions, and permissions:
```json
{
  "version": "0.1.0",
  "command": "run",
  "filter": "tenant=acme",
  "duration_ms": 45200,
  "tables_copied": 20,
  "materializations": [...],
  "check_results": [...],
  "permissions": { "catalogs_created": 1, "schemas_created": 2 },
  "drift": { "tables_checked": 20, "tables_drifted": 0 }
}
```

If a run fails partway through, you can resume from the last checkpoint instead of rerunning everything:
```sh
rocky run --filter tenant=acme --resume-latest
```
## 7. Check State

```sh
rocky state
```

This shows the stored watermarks for each table:
```
acme_warehouse.staging__us_west__shopify.orders    | 2026-03-30T10:00:00Z | 2026-03-30T10:01:32Z
acme_warehouse.staging__us_west__shopify.customers | 2026-03-30T10:00:00Z | 2026-03-30T10:01:35Z
```
## Next Steps

- Try the playground for a credential-free DuckDB version of this flow
- Learn about schema patterns to customize source-to-target mapping
- Add transformation models for custom SQL
- Configure data quality checks
- Set up permissions for RBAC
- Integrate with Dagster for orchestration