State Management

Rocky uses an embedded key-value store to track watermarks and run history. No external database is required; state is stored in a local file that Rocky manages automatically.

Backend

Rocky uses redb, an embedded key-value store written in Rust. Think of it as SQLite for key-value data: a single file, no server process, ACID transactions, and zero configuration.

State file

By default, Rocky stores state in <models>/.rocky-state.redb (a legacy .rocky-state.redb in the current directory keeps working, with a one-time deprecation warning on stderr). You can override the location with the --state-path flag:

plan_id=$(rocky --config rocky.toml --state-path /var/lib/rocky/state.redb plan --output json | jq -r .plan_id)
rocky --state-path /var/lib/rocky/state.redb apply "$plan_id"

Per-namespace state files

redb permits one writer per state file. When you fan out one rocky run per pipeline or per client against the single global .rocky-state.redb, those independent runs serialize on one advisory lock, even though they touch unrelated watermarks. Namespacing gives each run its own state file so they proceed concurrently.

This is opt-in and default-off: with neither knob set, Rocky uses the single global state file, byte-identical to before.

Per invocation, route a run to its own state file with --state-namespace <key>:

rocky run --state-namespace acme       # writes/reads <models>/.rocky-state/acme.redb
rocky run --state-namespace globex      # independent file, independent lock — runs concurrently

<key> must be a SQL identifier (^[a-zA-Z0-9_]+$) because it becomes a path segment; anything else is rejected.
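The key check and the resulting path can be sketched as follows. This is a minimal illustration, not Rocky's implementation: the regular expression and the `<models>/.rocky-state/<key>.redb` layout come from the text above, while the function name `namespaced_state_path` and the `ValueError` are assumptions made for the example.

```python
import re
from pathlib import Path

# Pattern quoted in the text above; everything else here is illustrative.
NAMESPACE_RE = re.compile(r"^[a-zA-Z0-9_]+$")


def namespaced_state_path(models_dir: str, key: str) -> Path:
    """Return <models>/.rocky-state/<key>.redb, or raise on an invalid key.

    Hypothetical helper: the name and the exception type are assumptions.
    """
    # fullmatch is used because `$` alone also matches before a trailing newline.
    if not NAMESPACE_RE.fullmatch(key):
        raise ValueError(f"invalid state namespace: {key!r}")
    return Path(models_dir) / ".rocky-state" / f"{key}.redb"
```

Because only letters, digits, and underscores pass, a key such as `../prod` or `a/b` is rejected before it can reach the filesystem as a path segment.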

Or make each pipeline namespace itself by default in rocky.toml:

[state]
namespacing = "pipeline"   # each pipeline → <models>/.rocky-state/<pipeline>.redb

The per-invocation --state-namespace flag overrides the config (use it to fan out by client/tenant rather than by pipeline name). An explicit --state-path is a hard override that disables namespacing for that invocation. It always wins, so a mistyped --state-namespace value cannot fail a run whose state file is already pinned by the explicit path.
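The precedence described above can be summarised in a short sketch. The function `resolve_state_path` and its parameter names are hypothetical; only the ordering (explicit path, then namespace flag, then `namespacing = "pipeline"`, then the global file) is taken from the text. Key validation and the legacy current-directory fallback are deliberately left out.

```python
from pathlib import Path
from typing import Optional


def resolve_state_path(
    models_dir: str,
    pipeline: str,
    state_path: Optional[str] = None,       # value of --state-path, if given
    state_namespace: Optional[str] = None,  # value of --state-namespace, if given
    namespacing: Optional[str] = None,      # [state] namespacing from rocky.toml
) -> Path:
    """Pick the state file for one invocation (illustrative only)."""
    if state_path is not None:
        # Hard override: the namespace flag is not even inspected here.
        return Path(state_path)
    if state_namespace is not None:
        # The per-invocation flag beats the config default.
        return Path(models_dir) / ".rocky-state" / f"{state_namespace}.redb"
    if namespacing == "pipeline":
        return Path(models_dir) / ".rocky-state" / f"{pipeline}.redb"
    # Neither knob set: the single global state file.
    return Path(models_dir) / ".rocky-state.redb"
```

For example, `resolve_state_path("models", "bronze", state_namespace="acme", namespacing="pipeline")` selects `models/.rocky-state/acme.redb`, because the flag overrides the config.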

What it stores

Watermarks

Each table’s watermark tracks the last successfully replicated timestamp:

Key:   "acme_warehouse.staging__us_west__shopify.orders"
Value: {
    last_value: "2025-03-15T14:30:00Z",
    updated_at: "2025-03-15T14:35:12Z"
}

last_value — The maximum value of the timestamp column (e.g., _fivetran_synced) seen in the last successful run
updated_at — When the watermark was last written

Watermarks are keyed by the fully qualified table name: catalog.schema.table.

Check history

Historical row counts are stored for anomaly detection:

Key:   "acme_warehouse.staging__us_west__shopify.orders"
Value: [
    { count: 150432, timestamp: "2025-03-13T10:00:00Z" },
    { count: 151200, timestamp: "2025-03-14T10:00:00Z" },
    { count: 152100, timestamp: "2025-03-15T10:00:00Z" }
]

Watermark lifecycle

At the start of each table’s replication, Rocky reads the watermark from the state store.

No watermark (first run). Rocky performs a full refresh, copying all rows from the source.

Watermark exists (incremental run). Rocky generates an incremental query that copies only rows newer than the stored watermark:

SELECT *, CAST(NULL AS STRING) AS _loaded_by
FROM fivetran_catalog.src__acme__us_west__shopify.orders
WHERE _fivetran_synced > TIMESTAMP '2025-03-15T14:30:00Z'

Update. After a successful copy, Rocky advances the watermark: last_value becomes the maximum timestamp-column value seen in that run, and updated_at records when the watermark was written. The next run picks up from there.
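The branch between a full refresh and an incremental copy can be sketched like this. It is a simplified illustration under stated assumptions: `build_copy_query` is a hypothetical name, the extra `_loaded_by` column from the example above is omitted, and the string interpolation is for readability only (a real implementation would validate identifiers and values).

```python
from typing import Optional


def build_copy_query(
    source_table: str,
    timestamp_column: str,
    watermark: Optional[str],
) -> str:
    """Build the copy query for one table (illustrative only).

    watermark is the stored last_value, or None when no watermark exists.
    """
    base = f"SELECT * FROM {source_table}"
    if watermark is None:
        # First run: no filter, so every source row is copied.
        return base
    # Incremental run: only rows strictly newer than the stored watermark.
    return f"{base} WHERE {timestamp_column} > TIMESTAMP '{watermark}'"
```

The strict `>` comparison matches the example query above: rows whose timestamp equals the stored watermark were already copied by the previous run.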

Inspecting state

Use rocky state to view the current state:

rocky state

This displays all stored watermarks and their values, useful for debugging incremental runs.

Deleting watermarks

Clearing state causes Rocky to perform a full refresh on the next run — useful when you need to backfill data or recover from issues. There is no CLI command to remove a single table’s watermark; the practical options are:

Delete the state file to clear all watermarks (and run history) at once, then re-run:
```
rm <models>/.rocky-state.redb
```
Route the run to a fresh namespace so it starts from an empty state file without touching the global one:
```
rocky run --state-namespace backfill
```

For a scoped, review-gated re-run of specific models, use rocky backfill instead.

Anomaly detection

Rocky compares the current row count of each table against a historical moving average. If the deviation exceeds a configurable threshold (e.g., 50%), Rocky flags it as an anomaly in the run output.

This catches problems like:

A source table was truncated (count drops to near zero)
A bad sync duplicated data (count spikes dramatically)
A connector stopped syncing (count stays flat when it should be growing)

Checks are configured per pipeline in rocky.toml:

[pipeline.bronze.checks]
enabled = true
row_count = true
freshness = { threshold_seconds = 86400 }
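The row-count comparison can be sketched as a deviation from the mean of the stored history. This is a minimal sketch, not Rocky's algorithm: the function name `is_anomaly`, the use of a plain mean over all stored counts, and the handling of an empty or all-zero history are assumptions. The 50% default mirrors the example threshold mentioned above.

```python
from statistics import mean
from typing import Sequence


def is_anomaly(history: Sequence[int], current: int, threshold: float = 0.5) -> bool:
    """Flag `current` when it deviates from the historical mean by more than
    `threshold` (0.5 means 50%). Illustrative only."""
    if not history:
        # Assumption: with no history there is nothing to compare against.
        return False
    baseline = mean(history)
    if baseline == 0:
        # Assumption: any rows appearing in a previously empty table count as a deviation.
        return current != 0
    return abs(current - baseline) / baseline > threshold


# Using the check-history example from earlier in this page:
history = [150432, 151200, 152100]
print(is_anomaly(history, 152900))  # normal growth
print(is_anomaly(history, 1200))    # truncated source table
print(is_anomaly(history, 310000))  # duplicated data
```

A purely relative threshold like this one catches drops and spikes; it would not flag a count that merely stays flat, so that case needs a different signal than the one sketched here.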

Remote State Persistence

By default, state is stored locally on disk. In ephemeral environments (e.g., EKS pods, CI runners), the local file is lost between runs. Rocky supports remote state backends to persist watermarks across deployments.

Backends

Backend	Config	Use Case
`local`	Default	Development, persistent VMs
`s3`	`s3_bucket`	Durable storage, multi-region
`valkey`	`valkey_url`	Low-latency, shared state
`tiered`	`valkey_url` + `s3_bucket`	Valkey for speed, S3 for durability

Configuration

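# Pick ONE of the following [state] blocks; a rocky.toml can contain only one.

# Option 1: S3
[state]
backend = "s3"
s3_bucket = "${ROCKY_STATE_BUCKET}"
s3_prefix = "rocky/state/"        # default

# Option 2: Valkey
[state]
backend = "valkey"
valkey_url = "${VALKEY_URL}"
valkey_prefix = "rocky:state:"    # default

# Option 3: tiered (Valkey + S3)
[state]
backend = "tiered"
valkey_url = "${VALKEY_URL}"
s3_bucket = "${ROCKY_STATE_BUCKET}"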

How Tiered State Works

The tiered backend combines Valkey (fast) with S3 (durable):

Download: try Valkey first (sub-millisecond reads); on miss or error, fall back to S3.
Upload: write to both Valkey (best-effort) and S3 (required).
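The two rules above can be sketched with a pair of interchangeable stores. Everything named here is hypothetical: `MemoryStore` is an in-memory stand-in for a Valkey or S3 client, `TieredState` is an illustrative wrapper, and writing to the fast tier before the durable one is an assumption, since the text does not specify an order.

```python
class MemoryStore:
    """Hypothetical in-memory stand-in for a Valkey or S3 client."""

    def __init__(self):
        self.data = {}
        self.failing = False  # set to True to simulate an outage

    def get(self, key):
        if self.failing:
            raise ConnectionError("store unavailable")
        return self.data.get(key)  # None signals a miss

    def put(self, key, blob):
        if self.failing:
            raise ConnectionError("store unavailable")
        self.data[key] = blob


class TieredState:
    """Fast tier first on reads; durable tier mandatory on writes."""

    def __init__(self, fast, durable):
        self.fast = fast
        self.durable = durable

    def download(self, key):
        try:
            blob = self.fast.get(key)
            if blob is not None:
                return blob
        except Exception:
            pass  # an error in the fast tier is treated like a miss
        return self.durable.get(key)

    def upload(self, key, blob):
        try:
            self.fast.put(key, blob)
        except Exception:
            pass  # best-effort: a fast-tier failure does not fail the upload
        self.durable.put(key, blob)  # required: errors propagate to the caller
```

With this shape, losing the fast tier degrades read latency but never correctness, while losing the durable tier surfaces as an upload error.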

Sync Lifecycle

When backend is not local, Rocky syncs state automatically:

Before run: Download remote state → local .redb file
During run: Read/write from local .redb (fast, no network)
After run: Upload local .redb → remote storage

If download fails, Rocky logs a warning and starts fresh from target-table metadata. Upload failure behaviour is governed by the retry + failure policy below.
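The download, run, upload sequence can be sketched as a small wrapper. This is an illustration under assumptions: `run_with_remote_state` and its three callables are hypothetical, state is modelled as an opaque value, and a failed download is represented by passing `None` to the run. Re-deriving watermarks from target-table metadata and the upload retry policy are left to the callables.

```python
import logging

log = logging.getLogger("state_sync_sketch")


def run_with_remote_state(download, run, upload):
    """Wrap one run in the remote-state sync lifecycle (illustrative only).

    download() -> state or raises; run(state) -> new state; upload(state) -> None.
    """
    try:
        state = download()  # before run: remote state -> local copy
    except Exception as exc:
        # Download failure is not fatal: warn and start without prior state.
        log.warning("state download failed, starting fresh: %s", exc)
        state = None
    new_state = run(state)  # during run: work only against the local copy
    upload(new_state)       # after run: local copy -> remote storage
    return new_state
```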

Retry and Failure Policy

Every remote transfer (upload or download) runs inside a wall-clock budget with exponential-backoff retries and a three-state circuit breaker, the same machinery the Databricks and Snowflake adapters already use. The timeout and failure policy are set under [state], and the retry settings under [state.retry], in rocky.toml; the full field list is in the configuration reference.

[state]
backend = "s3"
s3_bucket = "${ROCKY_STATE_BUCKET}"
transfer_timeout_seconds = 300       # total wall-clock ceiling — retries share this budget
on_upload_failure = "skip"           # "skip" (default) or "fail"

[state.retry]
max_retries = 3                       # defaults shown; omit the block to use them
circuit_breaker_threshold = 5

on_upload_failure controls what happens when retries and the circuit breaker are both exhausted:

Mode	Behaviour	When to use
`"skip"` (default)	Log a warning, mark the run successful, leave remote state stale. The next run re-derives watermarks from target-table metadata.	Most callers — the de-facto pre-1.13 behaviour. Trades state durability for run liveness.
`"fail"`	Propagate a `StateSyncError::RetryBudgetExhausted` or `CircuitOpen` to the caller; the run fails.	Strict environments where re-deriving watermarks is prohibitively expensive (long-running backfills, multi-hour syncs).
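How the retry loop, the shared wall-clock budget, the breaker, and `on_upload_failure` interact can be sketched as below. This is a simplified model, not Rocky's code: the breaker here has only closed and open states (the half-open state of a three-state breaker is omitted), the exception names `BreakerOpen` and `TransferFailed`, the `base_delay` parameter, and the doubling schedule are assumptions, and the defaults simply echo the values in the config above.

```python
import time


class BreakerOpen(Exception):
    """Hypothetical: raised when the breaker refuses to attempt a transfer."""


class TransferFailed(Exception):
    """Hypothetical: raised when retries or the time budget run out."""


class Breaker:
    """Simplified two-state breaker: opens after `threshold` consecutive failures."""

    def __init__(self, threshold=5):
        self.threshold = threshold
        self.failures = 0

    @property
    def is_open(self):
        return self.failures >= self.threshold

    def record(self, ok):
        self.failures = 0 if ok else self.failures + 1


def transfer_with_retry(op, breaker, max_retries=3, timeout_seconds=300.0,
                        base_delay=0.5, sleep=time.sleep, clock=time.monotonic):
    deadline = clock() + timeout_seconds  # all attempts share this budget
    for attempt in range(max_retries + 1):
        if breaker.is_open:
            raise BreakerOpen("breaker open; transfer skipped")
        try:
            result = op()
        except Exception:
            breaker.record(False)
        else:
            breaker.record(True)
            return result
        delay = base_delay * (2 ** attempt)  # exponential backoff
        if attempt == max_retries or clock() + delay > deadline:
            break  # out of retries, or the next wait would exceed the budget
        sleep(delay)
    raise TransferFailed("retries or wall-clock budget exhausted")


def upload_state(op, breaker, on_upload_failure="skip", **retry_kwargs):
    try:
        transfer_with_retry(op, breaker, **retry_kwargs)
        return "ok"
    except (BreakerOpen, TransferFailed):
        if on_upload_failure == "fail":
            raise  # strict mode: the run fails
        return "skipped_after_failure"  # default: run succeeds, remote state stays stale
```

The `sleep` and `clock` parameters exist only so the sketch can be exercised without real waiting.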

Terminal outcomes are structured: every state.upload / state.download event carries an outcome field, so you can alert on state-layer health without matching log messages by regex.

`outcome`	Meaning
`ok`	Transfer completed successfully.
`absent`	Remote state was empty — first run against this backend.
`timeout`	Hit `transfer_timeout_seconds` wall-clock cap.
`error_then_fresh`	Existence check failed; Rocky started fresh.
`transient_exhausted`	`max_retries` exhausted on transient errors.
`budget_exhausted`	`max_retries_per_run` exhausted across transfers.
`circuit_open`	Breaker is open; transfer skipped without attempting.
`skipped_after_failure`	Upload failed, `on_upload_failure = "skip"` applied.
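An alerting rule over these outcomes might look like the following sketch. The event shape is an assumption: it supposes one JSON object per log line with `event` and `outcome` keys, which this page does not specify. Treating `ok` and `absent` as healthy and every other outcome as alert-worthy is likewise a choice made for the example.

```python
import json

# Assumption: these two outcomes need no attention; all others do.
HEALTHY = {"ok", "absent"}
STATE_EVENTS = {"state.upload", "state.download"}


def unhealthy_state_events(log_lines):
    """Return state-sync events whose outcome is not in HEALTHY (illustrative only)."""
    flagged = []
    for line in log_lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(event, dict):
            continue
        # A state event with a missing outcome is also flagged (None is not healthy).
        if event.get("event") in STATE_EVENTS and event.get("outcome") not in HEALTHY:
            flagged.append(event)
    return flagged
```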

Run rocky doctor --check state_rw at cold start to catch IAM / reachability problems before they show up as end-of-run upload failures.

State Per Environment

Each environment (dev, staging, prod) maintains its own state, with no coordination between them:

A fresh deployment starts with no watermarks (full refresh on first run)
Dev environments can be reset independently by deleting the state file
Remote backends allow state to survive pod restarts in ephemeral environments