Add sdp-quarantine-pattern asset (reference pipeline + companion agent skill) by vmariiechko · Pull Request #12 · vmariiechko/databricks-bundle-template

vmariiechko · 2026-06-06T11:58:04Z

Summary

Adds sdp-quarantine-pattern, a dual-door asset to the asset library: a validated reference Lakeflow Spark Declarative Pipeline that demonstrates the inverse-expectations quarantine pattern on the public samples.nyctaxi.trips dataset, plus a companion agent skill that adapts the pattern to the user's own dataset and verifies it. Critical (drop) expectations route violating rows to a quarantine_trips table while valid rows flow to clean_trips; advisory (warn) expectations annotate the event log without dropping.

The central lesson the asset teaches is the NULL trap: expect_all_or_drop keeps a row whose predicate evaluates to SQL NULL, so a naive inverse expectation double-counts NULL rows into both tables and breaks the partition. Every drop predicate is written NULL-safe so clean and quarantine partition the input exactly once.
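The NULL trap can be reproduced without Spark. The sketch below models SQL three-valued logic in plain Python, with `None` standing in for SQL NULL. It models `expect_all_or_drop` as "drop only when the predicate is FALSE", which matches the keep-on-NULL semantics described above. The column `trip_distance` and the `> 0` rule are illustrative assumptions, not the asset's actual expectations.

```python
from typing import Callable, Optional

# SQL three-valued logic modeled in Python: True, False, or None (SQL NULL).
TriBool = Optional[bool]


def sql_gt(value: Optional[float], threshold: float) -> TriBool:
    """Model `value > threshold`: a NULL operand yields NULL."""
    if value is None:
        return None
    return value > threshold


def sql_not(p: TriBool) -> TriBool:
    """Model SQL NOT: NOT NULL is still NULL."""
    return None if p is None else not p


def sql_and(*ps: TriBool) -> TriBool:
    """Model SQL AND: FALSE dominates, then NULL, then TRUE."""
    if any(p is False for p in ps):
        return False
    if any(p is None for p in ps):
        return None
    return True


def kept_by_drop(p: TriBool) -> bool:
    """Keep-on-NULL: a drop expectation removes a row only when the predicate is FALSE."""
    return p is not False


def route(rows: list, predicate: Callable[[dict], TriBool]) -> tuple:
    """Clean keeps rows passing `predicate`; quarantine keeps rows passing NOT(predicate)."""
    clean = [r for r in rows if kept_by_drop(predicate(r))]
    quarantine = [r for r in rows if kept_by_drop(sql_not(predicate(r)))]
    return clean, quarantine


def naive(r: dict) -> TriBool:
    # `trip_distance > 0` evaluates to NULL when trip_distance is NULL.
    return sql_gt(r["trip_distance"], 0)


def null_safe(r: dict) -> TriBool:
    # `trip_distance IS NOT NULL AND trip_distance > 0` never evaluates to NULL.
    return sql_and(r["trip_distance"] is not None, sql_gt(r["trip_distance"], 0))


rows = [{"trip_distance": 2.5}, {"trip_distance": 0.0}, {"trip_distance": None}]

clean, quarantine = route(rows, naive)
print(len(clean), len(quarantine))  # 2 2 -> the NULL row lands in both tables

clean, quarantine = route(rows, null_safe)
print(len(clean), len(quarantine))  # 1 2 -> exact partition of the 3 input rows
```

Rewriting the predicate so it can only return TRUE or FALSE is what makes the naive `NOT(...)` inverse an exact complement.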

Changes

New asset assets/sdp-quarantine-pattern/: pipeline notebook, declarative expectations.json, pure expectations.py helper (loads rules, derives the clean predicate and its inverse), DABs pipeline resource with a published event log and event_log_queries.sql, source parameterization via quarantine.source, and an in-bundle usage doc.
Offline unit-test suite shipped inside the asset (<target_dir>/tests/): a real local-Spark pytest that proves the partition invariant and NULL routing on crafted edge rows, modeling expect_all_or_drop keep-on-NULL semantics with IS NOT FALSE.
Companion agent skill at <skill_dir>/skills/sdp-quarantine-pattern/ (SKILL.md + references/adapt-the-pattern.md + references/self-verify.md): enforces NULL-safe drop predicates, splits work at a deploy/run trust boundary (Phase A adapt + offline unit tests with no workspace; Phase B gated deploy + live-verify, default human-in-the-middle), and ships a three-tier verification ladder.
Repo-level tests tests/assets/test_sdp_quarantine_pattern.py and config tests/configs/assets/sdp_quarantine_pattern.json.
Catalog and docs: ASSETS.md, ROADMAP.md, CHANGELOG.md ([1.9.0]), pyproject.toml (Google docstring convention).
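To show roughly what a helper like expectations.py does, the sketch below loads drop rules from a JSON document and derives a clean predicate and its inverse as SQL strings. The JSON shape (`{"drop": {...}, "warn": {...}}`), the rule names, and `build_predicates` are hypothetical, not the asset's actual schema or API. The sketch also substitutes a generic technique: it wraps each rule in `COALESCE(..., FALSE)` so that a NULL result counts as a violation, whereas the asset writes each drop predicate NULL-safe by hand.

```python
import json

# Hypothetical rules document; the asset's real expectations.json may differ.
RULES_JSON = """
{
  "drop": {
    "valid_distance": "trip_distance > 0",
    "valid_fare": "fare_amount >= 0"
  },
  "warn": {
    "reasonable_fare": "fare_amount < 500"
  }
}
"""


def build_predicates(rules: dict) -> tuple[str, str]:
    """Return (clean_predicate, quarantine_predicate) for the drop rules.

    Each rule is wrapped in COALESCE(..., FALSE), so a NULL result is treated
    as a violation. The combined clean predicate therefore never evaluates to
    NULL, and NOT(clean) is its exact complement.
    """
    drop_rules = rules.get("drop", {})
    if not drop_rules:
        raise ValueError("at least one drop rule is required")
    safe_terms = [f"COALESCE(({expr}), FALSE)" for expr in drop_rules.values()]
    clean = " AND ".join(safe_terms)
    return clean, f"NOT ({clean})"


rules = json.loads(RULES_JSON)
clean_pred, quarantine_pred = build_predicates(rules)
print(clean_pred)
# COALESCE((trip_distance > 0), FALSE) AND COALESCE((fare_amount >= 0), FALSE)
print(quarantine_pred)
```

Warn rules are left out of both strings because they only annotate the event log and never affect routing.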

Change Area

Asset Library (assets/<name>/)

Configuration Axes Affected

Unity Catalog / schemas
Template schema (databricks_template_schema.json)
Asset Library (new asset, asset schema, or framework changes)

Testing

All tests pass (pytest tests/ -v)
Manual template generation tested (databricks bundle init . --template-dir assets/sdp-quarantine-pattern)
New tests added for new functionality (if applicable)

Additionally, the offline unit-test suite was run with a real local Spark session (PySpark 4.x, JDK 17) and the pattern was validated live on serverless SDP: the partition invariant held exactly (21,847 clean + 85 quarantine = 21,932 raw at baseline) and a NULL drop column routed to quarantine without leaking into clean.
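The partition invariant behind these numbers can be checked mechanically. The sketch below assumes row identifiers can be collected for each table. It flags rows that appear in both outputs, in neither, or only in an output. The function names are hypothetical. Count equality alone, as in the live baseline, is necessary but not sufficient: one double-counted row and one lost row cancel out.

```python
def partition_violations(raw_ids: set, clean_ids: set, quarantine_ids: set) -> dict:
    """Return the row ids that break the clean/quarantine partition of the raw input."""
    routed = clean_ids | quarantine_ids
    return {
        "in_both": clean_ids & quarantine_ids,  # double-counted (the NULL trap)
        "in_neither": raw_ids - routed,  # silently lost
        "unexpected": routed - raw_ids,  # not traceable to the source
    }


def counts_consistent(raw: int, clean: int, quarantine: int) -> bool:
    """Cheap count-level check; necessary but not sufficient for a partition."""
    return clean + quarantine == raw


# Counts reported for the live baseline run in this PR.
print(counts_consistent(raw=21_932, clean=21_847, quarantine=85))  # True

# Crafted edge case: row 3 is double-counted and row 4 is lost, yet counts match.
raw_ids, clean_ids, quarantine_ids = {1, 2, 3, 4}, {1, 3}, {2, 3}
print(counts_consistent(len(raw_ids), len(clean_ids), len(quarantine_ids)))  # True
print(partition_violations(raw_ids, clean_ids, quarantine_ids))
# {'in_both': {3}, 'in_neither': {4}, 'unexpected': set()}
```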

Asset Changes (if applicable)

Asset installs standalone via databricks bundle init . --template-dir assets/<name> --output-dir <dir>
Asset is self-contained (no references to library/helpers.tmpl or other assets)
tests/configs/assets/<name>.json added
Asset appears in ASSETS.md catalog

Checklist

Go template syntax is valid (no unclosed {{ }} blocks)
No .tmpl files appear in generated output
Generated YAML files are valid
Documentation updated (if behavior changed)

Lakeflow SDP pipeline demonstrating the inverse-expectations quarantine pattern on samples.nyctaxi.trips: drop expectations route violators to a separate quarantine table, valid rows flow to silver, and warn expectations log to the event log. Drop predicates are NULL-safe, so silver and quarantine partition the input exactly once. Split tables across medallion schemas (bronze_trips in bronze; silver_trips and quarantine_trips in silver), published a queryable event log with parsing queries, and added cost/trace tags. Sanitized all external-course references. Full suite: 2352 passed / 163 skipped.
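The event-log parsing mentioned above can be sketched in plain Python. The sketch assumes that the `details` payload of a `flow_progress` event carries per-expectation counters under `flow_progress.data_quality.expectations`, with fields `name`, `dataset`, `passed_records`, and `failed_records`. That is the shape described in the Databricks pipeline event log documentation. The asset's event_log_queries.sql does this in SQL. The payloads, dataset name, and expectation names below are hypothetical.

```python
import json
from collections import defaultdict

# Hypothetical `details` payloads from two flow_progress events.
EVENT_DETAILS = [
    json.dumps({"flow_progress": {"data_quality": {"expectations": [
        {"name": "valid_distance", "dataset": "silver_trips",
         "passed_records": 90, "failed_records": 10},
        {"name": "reasonable_fare", "dataset": "silver_trips",
         "passed_records": 98, "failed_records": 2},
    ]}}}),
    # An event without a data_quality block contributes nothing.
    json.dumps({"flow_progress": {"metrics": {"num_output_rows": 100}}}),
]


def summarize_expectations(details_json: list[str]) -> dict:
    """Sum passed/failed records per (dataset, expectation) across events."""
    totals = defaultdict(lambda: {"passed": 0, "failed": 0})
    for raw in details_json:
        details = json.loads(raw)
        data_quality = details.get("flow_progress", {}).get("data_quality") or {}
        for exp in data_quality.get("expectations") or []:
            key = (exp["dataset"], exp["name"])
            totals[key]["passed"] += exp.get("passed_records", 0)
            totals[key]["failed"] += exp.get("failed_records", 0)
    return dict(totals)


for (dataset, name), counts in summarize_expectations(EVENT_DETAILS).items():
    print(dataset, name, counts)
```

Warn and drop expectations both report failure counts this way. The difference is that a failing row under a warn expectation is still written to the target table.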

Turn the asset into a dual-door deliverable: the validated reference pipeline plus a companion agent skill that adapts the inverse-expectations quarantine pattern to the user's own dataset and verifies it. Skill ships a three-tier verification ladder led by offline local-Spark unit tests (partition invariant + NULL-trap, no workspace), then live read-only audits, then an optional source-parameterized integration test. Workflow splits at a deploy/run trust boundary (Phase A autonomous incl. running the unit tests; Phase B gated, default human-in-the-middle) and delegates deploy/run/query mechanics to the runtime. Validated live by dogfooding on samples.bakehouse via Claude Code + Sonnet 4.6. Also: name all three example tables consistently (raw/clean/quarantine) with fully-qualified identifiers, parameterize the source via quarantine.source, rename the event log table, and ship an offline unit-test suite. New target_dir/tests/ and skill_dir prompt.

Under the asset's keep-on-NULL expect_all_or_drop routing, a NULL-producing drop predicate keeps the row in both the clean and quarantine tables (double-counted), not 'both or neither'. Correct the phrasing in the asset README and SKILL.md and cite the Databricks `is false` operator. Replace 'production' with 'live (main-target)' wording in the self-verify reference.

vmariiechko added 4 commits June 4, 2026 18:25

Set 1.9.0 changelog date to release day

f742ce9

vmariiechko merged commit d895f08 into main Jun 6, 2026
1 check passed

vmariiechko deleted the feature/sdp-quarantine-pattern-asset branch June 6, 2026 12:02
