Schema Patterns

Rocky uses a configurable schema pattern system to parse source schema names into structured components and resolve target catalog/schema names using templates.

In multi-tenant data platforms, source schemas follow naming conventions that encode information: which tenant owns the data, which region it came from, which source system produced it. Rocky’s schema pattern system extracts this information and uses it to determine where data should land in the target warehouse.

The schema pattern lives on the pipeline source; the templates live on the pipeline target. Both reference the same component names:

[pipeline.bronze.source.schema_pattern]
prefix = "src__"
separator = "__"
components = ["tenant", "regions...", "source"]

[pipeline.bronze.target]
adapter = "prod"
catalog_template = "{tenant}_warehouse"
schema_template = "staging__{regions}__{source}"

| Field | Description |
| --- | --- |
| prefix | String prefix to strip before parsing. Schemas that don’t start with this prefix are skipped. |
| separator | Delimiter between components. |
| components | Ordered list of named components to extract from the schema name. |

Each entry in the components list defines a named component. Whether the name carries a ... suffix determines how many segments it matches:

A plain name like "tenant" matches exactly one segment.

tenant → matches one segment

A name with a ... suffix, like "regions...", matches one or more segments. Only one variable-length component is allowed per pattern, and it must not be the last component.

regions... → matches 1..N segments

The last component in the list always matches exactly one segment (the final segment of the schema name).

source → matches the last segment
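
The two constraints on variable-length components can be checked up front. The following is a minimal sketch, not Rocky's implementation: the function name validate_components and both error messages are hypothetical and only illustrate the rules stated above.

```python
def validate_components(components: list[str]) -> None:
    """Check the pattern rules described above (illustrative only)."""
    variable = [c for c in components if c.endswith("...")]

    # Rule 1: at most one variable-length ("...") component per pattern.
    if len(variable) > 1:
        raise ValueError(
            f"only one variable-length component is allowed, got {variable}"
        )

    # Rule 2: the last component must match exactly one segment,
    # so it cannot be the variable-length one.
    if components and components[-1].endswith("..."):
        raise ValueError("the last component must not be variable-length")
```

Calling validate_components(["tenant", "regions...", "source"]) returns without error, while ["tenant", "regions..."] is rejected because the variable-length component comes last.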

Given the pattern prefix = "src__", separator = "__", components = ["tenant", "regions...", "source"]:

src__acme__us_west__shopify
     │     │        │
     │     │        └── source = "shopify"
     │     └── regions = ["us_west"]
     └── tenant = "acme"

src__acme__us_west__us_central__shopify
     │     │        │           │
     │     │        │           └── source = "shopify"
     │     └────────┘
     │     regions = ["us_west", "us_central"]
     └── tenant = "acme"

src__globex__emea__france__paris__zendesk
     │       │     │       │      │
     │       │     │       │      └── source = "zendesk"
     │       └─────┴───────┘
     │       regions = ["emea", "france", "paris"]
     └── tenant = "globex"
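
The parsing walkthrough above can be sketched in Python. This is a minimal illustration, not Rocky's implementation: the function name parse_schema and its signature are hypothetical, and rejecting surplus segments when the pattern has no ... component is an assumption that the text does not confirm.

```python
from typing import Optional, Union

ParsedSchema = dict[str, Union[str, list[str]]]


def parse_schema(
    name: str, prefix: str, separator: str, components: list[str]
) -> Optional[ParsedSchema]:
    """Parse a schema name into named components (illustrative only)."""
    # Schemas without the prefix are unmanaged: skip them rather than fail.
    if not name.startswith(prefix):
        return None
    segments = name[len(prefix):].split(separator)

    # Every component, fixed or variable-length, needs at least one segment.
    if len(segments) < len(components):
        raise ValueError(
            f"Not enough segments: expected at least {len(components)} "
            f"components, got {len(segments)}"
        )

    extra = len(segments) - len(components)
    has_variable = any(c.endswith("...") for c in components)
    # Assumption: without a "..." component, surplus segments are an error.
    if extra > 0 and not has_variable:
        raise ValueError(f"Too many segments: got {len(segments)}")

    parsed: ParsedSchema = {}
    pos = 0
    for component in components:
        if component.endswith("..."):
            # The variable-length component absorbs all surplus segments.
            width = 1 + extra
            parsed[component[:-3]] = segments[pos:pos + width]
            pos += width
        else:
            parsed[component] = segments[pos]
            pos += 1
    return parsed
```

With the pattern from this page, parse_schema("src__acme__us_west__us_central__shopify", "src__", "__", ["tenant", "regions...", "source"]) returns {"tenant": "acme", "regions": ["us_west", "us_central"], "source": "shopify"}.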

Templates use {component_name} placeholders that are replaced with parsed values:

[pipeline.bronze.target]
adapter = "prod"
catalog_template = "{tenant}_warehouse"
schema_template = "staging__{regions}__{source}"

{tenant} is replaced with the parsed value directly:

{tenant}_warehouse → acme_warehouse

{regions} is replaced with all values joined by the separator:

staging__{regions}__{source}
→ staging__us_west__shopify (single region)
→ staging__us_west__us_central__shopify (multiple regions)

Source: src__acme__us_west__shopify

| Template | Result |
| --- | --- |
| {tenant}_warehouse | acme_warehouse |
| staging__{regions}__{source} | staging__us_west__shopify |

Target table: acme_warehouse.staging__us_west__shopify.<table_name>
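
Template resolution as described above amounts to placeholder substitution. The sketch below is a hedged illustration, not Rocky's code: the function name resolve_template is hypothetical, and raising the "Missing component" error when a template names a component the pattern never produced is an assumption about when that error occurs.

```python
import re
from typing import Union


def resolve_template(
    template: str,
    parsed: dict[str, Union[str, list[str]]],
    separator: str,
) -> str:
    """Replace {component} placeholders with parsed values (illustrative)."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in parsed:
            # Assumption: an unknown placeholder is reported this way.
            raise ValueError(f"Missing component: {name}")
        value = parsed[name]
        # Multi-valued components are joined with the pattern separator.
        return separator.join(value) if isinstance(value, list) else value

    return re.sub(r"\{(\w+)\}", substitute, template)


parsed = {"tenant": "acme", "regions": ["us_west"], "source": "shopify"}
catalog = resolve_template("{tenant}_warehouse", parsed, "__")
schema = resolve_template("staging__{regions}__{source}", parsed, "__")
target = f"{catalog}.{schema}"
```

Here target evaluates to acme_warehouse.staging__us_west__shopify, matching the worked example.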

Rocky produces clear errors for invalid schemas:

| Condition | Error |
| --- | --- |
| Schema doesn’t start with prefix | Schema is skipped (not an error; it is simply not a managed schema) |
| Not enough segments for all components | "Not enough segments: expected at least N components, got M" |
| Missing required component | "Missing component: tenant" |

The schema pattern system is not limited to tenant/regions/source. You can define any components that match your naming convention:

[pipeline.bronze.source.schema_pattern]
prefix = "raw__"
separator = "__"
components = ["environment", "department", "system"]

This would parse raw__prod__finance__sap into:

  • environment = "prod"
  • department = "finance"
  • system = "sap"

And you could use templates like:

[pipeline.bronze.target]
adapter = "prod"
catalog_template = "{environment}_analytics"
schema_template = "{department}__{system}"

With these templates, raw__prod__finance__sap lands in prod_analytics.finance__sap.<table_name>.

Once your sources are parsed into components, you can scope rocky plan, rocky run, and rocky compare to a subset with the --filter flag. The filter key is one of the component names you declared above (or the reserved id), and the value is matched against the parsed value. Multi-valued (...) components use containment semantics: the filter matches when the parsed list contains the value.

# Run everything for tenant "acme"
rocky run --filter tenant=acme
# Compare every source that touches us_west (works because `regions...` is multi-valued)
rocky compare --filter regions=us_west
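
The matching rule for a single key=value filter can be sketched as follows. This is an illustration under stated assumptions, not Rocky's filter engine: the function name matches_filter is hypothetical, only the key=value form shown above is handled, and the reserved id key is not modelled.

```python
from typing import Union


def matches_filter(
    parsed: dict[str, Union[str, list[str]]], expression: str
) -> bool:
    """Return True if a parsed schema satisfies a key=value filter."""
    key, _, wanted = expression.partition("=")
    if key not in parsed:
        return False
    value = parsed[key]
    # Multi-valued components match by containment, single values by equality.
    if isinstance(value, list):
        return wanted in value
    return value == wanted


parsed = {
    "tenant": "acme",
    "regions": ["us_west", "us_central"],
    "source": "shopify",
}
by_tenant = matches_filter(parsed, "tenant=acme")
by_region = matches_filter(parsed, "regions=us_west")
by_other_region = matches_filter(parsed, "regions=emea")
```

Here by_tenant and by_region are True and by_other_region is False, because emea is not among the parsed regions.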

See the CLI Filters reference for the full syntax, grammar, and common patterns.