Schema Patterns
Rocky uses a configurable schema pattern system to parse source schema names into structured components and resolve target catalog/schema names using templates.
Purpose
Section titled “Purpose”In multi-tenant data platforms, source schemas follow naming conventions that encode information: which tenant owns the data, which region it came from, which source system produced it. Rocky’s schema pattern system extracts this information and uses it to determine where data should land in the target warehouse.
Configuration
Section titled “Configuration”The schema pattern lives on the pipeline source; the templates live on the pipeline target. Both reference the same component names:
[pipeline.bronze.source.schema_pattern]prefix = "src__"separator = "__"components = ["tenant", "regions...", "source"]
[pipeline.bronze.target]adapter = "prod"catalog_template = "{tenant}_warehouse"schema_template = "staging__{regions}__{source}"Fields
Section titled “Fields”| Field | Description |
|---|---|
prefix | String prefix to strip before parsing. Schemas that don’t start with this prefix are skipped. |
separator | Delimiter between components. |
components | Ordered list of named components to extract from the schema name. |
Component types
Section titled “Component types”Each entry in the components list defines a named component. The suffix determines how it matches:
Variable (single segment)
Section titled “Variable (single segment)”A plain name like "tenant" matches exactly one segment.
tenant → matches one segmentVariable-length (multi-segment)
Section titled “Variable-length (multi-segment)”A name with ... suffix like "regions..." matches one or more segments. Only one variable-length component is allowed per pattern, and it must not be the last component.
regions... → matches 1..N segmentsTerminal
Section titled “Terminal”The last component in the list always matches exactly one segment (the final segment of the schema name).
source → matches the last segmentParsing examples
Section titled “Parsing examples”Given the pattern prefix = "src__", separator = "__", components = ["tenant", "regions...", "source"]:
Single region
Section titled “Single region”src__acme__us_west__shopify │ │ │ │ │ └── source = "shopify" │ └── regions = ["us_west"] └── tenant = "acme"Multiple regions
Section titled “Multiple regions”src__acme__us_west__us_central__shopify │ │ │ │ │ │ │ └── source = "shopify" │ └────────┘ │ regions = ["us_west", "us_central"] └── tenant = "acme"Deep region hierarchy
Section titled “Deep region hierarchy”src__globex__emea__france__paris__zendesk │ │ │ │ │ │ │ │ │ └── source = "zendesk" │ └─────┴───────┘ │ regions = ["emea", "france", "paris"] └── tenant = "globex"Template resolution
Section titled “Template resolution”Templates use {component_name} placeholders that are replaced with parsed values:
[pipeline.bronze.target]adapter = "prod"catalog_template = "{tenant}_warehouse"schema_template = "staging__{regions}__{source}"Single-valued components
Section titled “Single-valued components”{tenant} is replaced with the parsed value directly:
{tenant}_warehouse → acme_warehouseMulti-valued components
Section titled “Multi-valued components”{regions} is replaced with all values joined by the separator:
staging__{regions}__{source}→ staging__us_west__shopify (single region)→ staging__us_west__us_central__shopify (multiple regions)Full resolution example
Section titled “Full resolution example”Source: src__acme__us_west__shopify
| Template | Result |
|---|---|
{tenant}_warehouse | acme_warehouse |
staging__{regions}__{source} | staging__us_west__shopify |
Target table: acme_warehouse.staging__us_west__shopify.<table_name>
Pinning the join separator at the use site
Section titled “Pinning the join separator at the use site”By default, multi-valued components ({regions}) are joined with the caller-supplied separator. Different call sites supply different separators: target rendering uses target.separator while metadata_columns.value uses pattern.separator. The same placeholder can therefore resolve to different strings depending on which TOML field it appears in — a footgun for templates that hash or compare the rendered value (RLS keys, audit hashes).
Use {name:SEP} to pin the join separator at the use site:
[pipeline.bronze]metadata_columns = [ { name = "permission_key", type = "STRING", value = "md5('fivetran_{client}_{regions:_}_{source}')" } # ^^^ join `regions` with "_" regardless of caller default]Grammar:
| Form | Behavior |
|---|---|
{name} | Bare form — multi-valued components join with the caller-supplied default separator. |
{name:SEP} | Explicit form — multi-valued components join with the literal string SEP (may be empty, single-, or multi-character). The closing } terminates SEP, so a literal } cannot appear inside it. |
:SEP is silently ignored when name resolves to a single-valued component, so swapping a component from single to variadic does not require updating every template.
Error handling
Section titled “Error handling”Rocky produces clear errors for invalid schemas:
| Condition | Error |
|---|---|
| Schema doesn’t start with prefix | Schema is skipped (not an error — it’s simply not a managed schema) |
| Not enough segments for all components | "Not enough segments: expected at least N components, got M" |
| Missing required component | "Missing component: tenant" |
Custom patterns
Section titled “Custom patterns”The schema pattern system is not limited to tenant/regions/source. You can define any components that match your naming convention:
[pipeline.bronze.source.schema_pattern]prefix = "raw__"separator = "__"components = ["environment", "department", "system"]This would parse raw__prod__finance__sap into:
environment = "prod"department = "finance"system = "sap"
And you could use templates like:
[pipeline.bronze.target]adapter = "prod"catalog_template = "{environment}_analytics"schema_template = "{department}__{system}"Filtering by parsed component
Section titled “Filtering by parsed component”Once your sources are parsed into components, you can scope rocky plan and rocky compare (and the legacy rocky run alias) to a subset via the --filter flag. The filter key is one of the component names you declared above (or the reserved id), and the value is matched against the parsed value — with containment semantics for multi-valued (...) components:
# Plan everything for tenant "acme" (then `rocky apply <plan-id>` to execute)rocky plan --filter tenant=acme
# Compare every source that touches us-west (works because `regions...` is multi-valued)rocky compare --filter regions=us_westSee the CLI Filters reference for the full syntax, grammar, and common patterns.