Add blueprint repair loop by txmed82 · Pull Request #382 · txmed82/case-crawler

txmed82 · 2026-06-04T01:40:45Z

Summary

- Add a BYOK blueprint repair loop that judges, records repair requests, repairs, and re-judges up to max_repair_rounds
- Persist repaired blueprint lineage metadata and repair attempts
- Add atomic blueprint-plus-attempt storage for successful repairs

Tests

.venv/bin/python -m pytest tests/test_blueprint_repair_loop.py tests/test_blueprint_judge.py tests/test_blueprint_storage.py -q
.venv/bin/python -m ruff check src/casecrawler/generation/blueprint_repair.py src/casecrawler/storage/dataset_store.py tests/test_blueprint_repair_loop.py tests/test_blueprint_storage.py
.venv/bin/python -m pytest -q -m 'not optional_backend and not network and not slow'

Summary by CodeRabbit

New Features

Added automated blueprint repair workflow that iteratively evaluates and repairs clinical blueprints when validation fails.
System now persists blueprint repair attempts and results, tracking evaluation progress across multiple repair cycles.

Tests

Added comprehensive test coverage for blueprint repair functionality and data persistence.

coderabbitai · 2026-06-04T01:40:55Z

Walkthrough

This PR introduces an asynchronous blueprint repair workflow. It defines the BlueprintRepairLoop class, which evaluates a blueprint with a judge and, when the judge fails it, repairs it through an LLM call and re-judges, up to a maximum number of repair rounds. The PR also adds atomic persistence for a blueprint together with its generation attempt, plus tests covering repair after a failed judge report, pass-through when the judge passes on the first evaluation, and enforcement of the maximum repair rounds.
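The judge, repair, re-judge cycle described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from `blueprint_repair.py`: `LoopResult`, `run_repair_loop`, and the string-based blueprint are hypothetical stand-ins, and the real loop also persists attempt records, which is omitted here.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable


@dataclass
class LoopResult:
    """Hypothetical stand-in for BlueprintRepairResult."""

    final: str
    passed: bool
    rounds: int
    reports: list[bool] = field(default_factory=list)


async def run_repair_loop(
    blueprint: str,
    judge: Callable[[str], Awaitable[bool]],
    repair: Callable[[str], Awaitable[str]],
    max_repair_rounds: int = 2,
) -> LoopResult:
    """Judge, then repair and re-judge until a pass or the round budget is spent."""
    current = blueprint
    reports: list[bool] = []
    rounds = 0
    while True:
        passed = await judge(current)
        reports.append(passed)
        # Stop on a pass, or when no repair rounds remain.
        if passed or rounds >= max_repair_rounds:
            return LoopResult(current, passed, rounds, reports)
        current = await repair(current)
        rounds += 1


async def _demo() -> LoopResult:
    # Toy judge: a "blueprint" passes once it ends with two exclamation marks.
    async def judge(bp: str) -> bool:
        return bp.endswith("!!")

    # Toy repair: append one exclamation mark per round.
    async def repair(bp: str) -> str:
        return bp + "!"

    return await run_repair_loop("draft", judge, repair, max_repair_rounds=3)


demo_result = asyncio.run(_demo())
```

With a budget of N repair rounds the judge runs at most N + 1 times, so the final blueprint is always accompanied by a report for exactly that blueprint.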

Changes

Blueprint Repair Workflow

| Layer / File(s) | Summary |
| --- | --- |
| Blueprint Repair Result and Loop Orchestration `src/casecrawler/generation/blueprint_repair.py` | `BlueprintRepairResult` captures the original and final blueprints, judge reports, per-round repaired blueprints, round count, and pass status. `BlueprintRepairLoop.run()` orchestrates the iterative evaluation and repair cycle: judges the blueprint, returns early if passed, respects max repair rounds, calls the repair provider when needed, and returns the complete outcome. |
| Single Repair Round Execution and Helpers `src/casecrawler/generation/blueprint_repair.py` | `_repair_blueprint()` executes one repair round by building a repair prompt, computing a deterministic hash, calling the provider to generate a structured blueprint, and canonicalizing the result with fresh identifiers and metadata linking. `_canonicalize_blueprint()` constructs metadata connecting the repaired blueprint to its parent, repair round, and judge report. `_repair_prompt()` serializes the blueprint and judge report to deterministic JSON. `_prompt_hash()` and `_hash_payload()` create deterministic SHA-256 hashes for attempt correlation. `_REPAIR_SYSTEM_PROMPT` constrains the repair model to structured output only. |
| Generation Attempt Creation and Error Handling `src/casecrawler/generation/blueprint_repair.py` | `_repair_requested_attempt()` creates a generation attempt record when repair is initiated, including prompt hash and metadata linking to the judge report and round. `_repair_attempt()` creates terminal-status records (succeeded/failed) with optional token counts and error lists. `_save_failed_attempt_best_effort()` persists failed attempts while suppressing and logging any storage errors. |
| Blueprint and Attempt Atomic Persistence `src/casecrawler/storage/dataset_store.py` | `DatasetStore.save_blueprint_with_attempt()` atomically upserts both a clinical blueprint and its associated generation attempt in a single `_write_lock`-protected SQLite transaction, committing on success and rolling back with exception re-raising on failure. |
| Test Suite for Repair Loop and Persistence `tests/test_blueprint_repair_loop.py`, `tests/test_blueprint_storage.py` | Comprehensive async tests covering repair loop behavior: repair of failed judge reports with correct round tracking and metadata linkage, preservation of unchanged blueprints when judge passes initially, and enforcement of max repair rounds. Storage tests validate atomic upsert of blueprints and generation attempts. Tests use fake sequenced providers and verify attempt role/status recording and provider invocation details including temperature and prompt contents. |
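The deterministic hashing mentioned for `_prompt_hash()` and `_hash_payload()` could look roughly like the sketch below. The PR page does not show those helpers, so the function names, the choice of canonical JSON options, and the fields folded into the prompt hash (`provider`, `model`) are assumptions made for illustration only.

```python
import hashlib
import json
from typing import Any


def hash_payload(payload: dict[str, Any]) -> str:
    """SHA-256 over a canonical JSON encoding (hypothetical helper).

    Sorting keys and fixing the separators makes the byte string, and so
    the digest, independent of dict insertion order.
    """
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def prompt_hash(prompt: str, provider: str, model: str) -> str:
    """Hash a prompt together with the policy fields assumed to matter."""
    return hash_payload({"prompt": prompt, "provider": provider, "model": model})
```

Because the digest depends only on the payload contents, two records built from the same prompt and policy can be matched on `prompt_hash` alone.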

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

txmed82/case-crawler#381: The BlueprintRepairLoop depends on and consumes BlueprintJudge.evaluate results to drive the iterative fail→repair cycle based on judge reports.
txmed82/case-crawler#372: The new DatasetStore.save_blueprint_with_attempt() persistence method extends the broader SQLite blueprint artifact storage infrastructure for blueprints, attempts, and judge reports.
txmed82/case-crawler#376: The batch BlueprintPipeline relies on the atomic save_blueprint_with_attempt() method added here for persisting generated blueprints and their generation attempts.

Poem

🐰 A loop that judges, then repairs with care,
LLM whispers prompt hashes in the air,
Blueprints canonicalized, attempts take flight,
Atomic transactions lock them down tight! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Add blueprint repair loop' directly and clearly describes the main addition in this changeset: a new BlueprintRepairLoop class with associated result model, storage helper, and comprehensive tests. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/casecrawler/generation/blueprint_repair.py`:
- Around line 251-278: _repair_requested_attempt currently computes prompt_hash
from a local payload using _hash_payload, which diverges from the terminal
attempts that use _prompt_hash(prompt, policy); change the flow so the repair
prompt is hashed once (via _prompt_hash with the actual repair prompt and
policy) before creating/persisting the REPAIR_REQUESTED GenerationAttempt and
pass that same prompt_hash into _repair_requested_attempt (or add a prompt_hash
parameter) so both the REPAIR_REQUESTED and subsequent SUCCEEDED/FAILED attempts
use the identical prompt_hash value.

In `@src/casecrawler/storage/dataset_store.py`:
- Around line 297-333: Before performing the DB writes in
save_blueprint_with_attempt, validate that the provided GenerationAttempt
actually belongs to the ClinicalBlueprint: check attempt.artifact_id ==
blueprint.blueprint_id and attempt.dataset_id == blueprint.dataset_id (or any
other domain-specific linkage between attempt and blueprint), and raise a clear
exception (e.g., ValueError) if they do not match; perform this validation
before acquiring the write lock / before executing the INSERTs so mismatched
pairs are rejected prior to the transaction.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 1d6f782f-9c58-4fa0-a868-40b1fc244125

📥 Commits

Reviewing files that changed from the base of the PR and between a9fa1fd and 202b946.

📒 Files selected for processing (4)

src/casecrawler/generation/blueprint_repair.py
src/casecrawler/storage/dataset_store.py
tests/test_blueprint_repair_loop.py
tests/test_blueprint_storage.py

coderabbitai · 2026-06-04T01:45:24Z

+    def _repair_requested_attempt(
+        self,
+        *,
+        blueprint: ClinicalBlueprint,
+        policy: GenerationRolePolicy,
+        judge_report: JudgeReport,
+        repair_round: int,
+    ) -> GenerationAttempt:
+        payload = {
+            "artifact_id": blueprint.blueprint_id,
+            "judge_report_id": judge_report.report_id,
+            "repair_round": repair_round,
+            "status": GenerationAttemptStatus.REPAIR_REQUESTED.value,
+        }
+        return GenerationAttempt(
+            attempt_id=f"attempt-{uuid4()}",
+            dataset_id=blueprint.dataset_id,
+            role=GenerationRole.REPAIR,
+            status=GenerationAttemptStatus.REPAIR_REQUESTED,
+            provider=policy.provider,
+            model=policy.model,
+            prompt_hash=_hash_payload(payload),
+            artifact_id=blueprint.blueprint_id,
+            metadata={
+                "judge_report_id": judge_report.report_id,
+                "repair_round": repair_round,
+            },
+        )


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use the real repair prompt hash for REPAIR_REQUESTED.

_repair_requested_attempt() hashes a local payload instead of the actual repair prompt, while the later SUCCEEDED/FAILED attempt for the same round uses _prompt_hash(prompt, policy). That makes the preflight record impossible to correlate with its terminal record via prompt_hash, which weakens the attempt lineage this PR is adding. Compute the prompt/hash once before persisting the request row, then pass that same hash through both attempt builders.
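One way to read this suggestion is sketched below: compute the hash once per round and hand the same value to both attempt builders. The `Attempt` dataclass, `compute_prompt_hash`, `build_attempt`, and the status strings are hypothetical simplifications, not the project's `GenerationAttempt` model or its enum values.

```python
import hashlib
from dataclasses import dataclass
from uuid import uuid4


@dataclass(frozen=True)
class Attempt:
    """Hypothetical, trimmed-down stand-in for GenerationAttempt."""

    attempt_id: str
    status: str
    prompt_hash: str
    artifact_id: str


def compute_prompt_hash(prompt: str, model: str) -> str:
    # Assumption: the hash covers the prompt text plus the model name.
    return hashlib.sha256(f"{model}\n{prompt}".encode("utf-8")).hexdigest()


def build_attempt(status: str, prompt_hash: str, artifact_id: str) -> Attempt:
    # The builder receives the hash instead of deriving its own.
    return Attempt(f"attempt-{uuid4()}", status, prompt_hash, artifact_id)


def attempts_for_round(
    prompt: str, model: str, artifact_id: str, succeeded: bool
) -> tuple[Attempt, Attempt]:
    """Build the preflight and terminal records for one repair round."""
    shared_hash = compute_prompt_hash(prompt, model)  # computed exactly once
    requested = build_attempt("repair_requested", shared_hash, artifact_id)
    terminal = build_attempt(
        "succeeded" if succeeded else "failed", shared_hash, artifact_id
    )
    return requested, terminal
```

The two records keep distinct `attempt_id` values, so `prompt_hash` becomes the join key between the request row and its outcome.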

coderabbitai · 2026-06-04T01:45:24Z

+    def save_blueprint_with_attempt(
+        self,
+        blueprint: ClinicalBlueprint,
+        attempt: GenerationAttempt,
+    ) -> None:
+        with self._write_lock:
+            try:
+                self._conn.execute(
+                    """INSERT OR REPLACE INTO clinical_blueprints
+                    (blueprint_id, dataset_id, cohort_plan_id, archetype_name,
+                     blueprint_json)
+                    VALUES (?, ?, ?, ?, ?)""",
+                    (
+                        blueprint.blueprint_id,
+                        blueprint.dataset_id,
+                        blueprint.cohort_plan_id,
+                        blueprint.archetype_name,
+                        blueprint.model_dump_json(),
+                    ),
+                )
+                self._conn.execute(
+                    """INSERT OR REPLACE INTO generation_attempts
+                    (attempt_id, dataset_id, role, status, artifact_id, attempt_json)
+                    VALUES (?, ?, ?, ?, ?, ?)""",
+                    (
+                        attempt.attempt_id,
+                        attempt.dataset_id,
+                        attempt.role.value,
+                        attempt.status.value,
+                        attempt.artifact_id,
+                        attempt.model_dump_json(),
+                    ),
+                )
+                self._conn.commit()
+            except Exception:
+                self._conn.rollback()
+                raise


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject mismatched blueprint/attempt pairs before the transaction.

This method never validates that the attempt belongs to the blueprint being saved. A caller can currently commit a ClinicalBlueprint for one artifact and a GenerationAttempt for another, which silently corrupts repair lineage despite the write being "atomic".

Suggested fix

     def save_blueprint_with_attempt(
         self,
         blueprint: ClinicalBlueprint,
         attempt: GenerationAttempt,
     ) -> None:
+        if attempt.dataset_id != blueprint.dataset_id:
+            raise ValueError("Attempt dataset_id must match blueprint dataset_id.")
+        if attempt.artifact_id != blueprint.blueprint_id:
+            raise ValueError("Attempt artifact_id must match blueprint blueprint_id.")
         with self._write_lock:
             try:
                 self._conn.execute(
                     """INSERT OR REPLACE INTO clinical_blueprints

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

     def save_blueprint_with_attempt(
         self,
         blueprint: ClinicalBlueprint,
         attempt: GenerationAttempt,
     ) -> None:
+        if attempt.dataset_id != blueprint.dataset_id:
+            raise ValueError("Attempt dataset_id must match blueprint dataset_id.")
+        if attempt.artifact_id != blueprint.blueprint_id:
+            raise ValueError("Attempt artifact_id must match blueprint blueprint_id.")
         with self._write_lock:
             try:
                 self._conn.execute(
                     """INSERT OR REPLACE INTO clinical_blueprints
                     (blueprint_id, dataset_id, cohort_plan_id, archetype_name,
                      blueprint_json)
                     VALUES (?, ?, ?, ?, ?)""",
                     (
                         blueprint.blueprint_id,
                         blueprint.dataset_id,
                         blueprint.cohort_plan_id,
                         blueprint.archetype_name,
                         blueprint.model_dump_json(),
                     ),
                 )
                 self._conn.execute(
                     """INSERT OR REPLACE INTO generation_attempts
                     (attempt_id, dataset_id, role, status, artifact_id, attempt_json)
                     VALUES (?, ?, ?, ?, ?, ?)""",
                     (
                         attempt.attempt_id,
                         attempt.dataset_id,
                         attempt.role.value,
                         attempt.status.value,
                         attempt.artifact_id,
                         attempt.model_dump_json(),
                     ),
                 )
                 self._conn.commit()
             except Exception:
                 self._conn.rollback()
                 raise
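To see the suggested validation and the all-or-nothing write working together, here is a small self-contained sketch using the standard-library `sqlite3` module. `TinyStore`, its two-table schema, and the dict-based records are hypothetical simplifications of `DatasetStore`; only the validate-then-lock-then-commit-or-rollback shape is taken from the review comment.

```python
import sqlite3
import threading
from typing import Mapping, Optional


class TinyStore:
    """Hypothetical, reduced stand-in for DatasetStore."""

    def __init__(self) -> None:
        self._conn = sqlite3.connect(":memory:", check_same_thread=False)
        self._write_lock = threading.Lock()
        self._conn.execute(
            "CREATE TABLE blueprints ("
            "blueprint_id TEXT PRIMARY KEY NOT NULL, dataset_id TEXT NOT NULL)"
        )
        self._conn.execute(
            "CREATE TABLE attempts ("
            "attempt_id TEXT PRIMARY KEY NOT NULL, dataset_id TEXT NOT NULL, "
            "artifact_id TEXT NOT NULL)"
        )
        self._conn.commit()

    def save_pair(
        self,
        blueprint: Mapping[str, Optional[str]],
        attempt: Mapping[str, Optional[str]],
    ) -> None:
        # Reject mismatched pairs before taking the lock or touching the DB.
        if attempt["dataset_id"] != blueprint["dataset_id"]:
            raise ValueError("Attempt dataset_id must match blueprint dataset_id.")
        if attempt["artifact_id"] != blueprint["blueprint_id"]:
            raise ValueError("Attempt artifact_id must match blueprint blueprint_id.")
        with self._write_lock:
            try:
                self._conn.execute(
                    "INSERT OR REPLACE INTO blueprints VALUES (?, ?)",
                    (blueprint["blueprint_id"], blueprint["dataset_id"]),
                )
                self._conn.execute(
                    "INSERT OR REPLACE INTO attempts VALUES (?, ?, ?)",
                    (attempt["attempt_id"], attempt["dataset_id"], attempt["artifact_id"]),
                )
                self._conn.commit()
            except Exception:
                # Undo the blueprint row if the attempt row could not be written.
                self._conn.rollback()
                raise

    def counts(self) -> tuple[int, int]:
        blueprints = self._conn.execute("SELECT COUNT(*) FROM blueprints").fetchone()[0]
        attempts = self._conn.execute("SELECT COUNT(*) FROM attempts").fetchone()[0]
        return blueprints, attempts
```

Validating outside the lock keeps the critical section limited to the two writes, and a mismatched pair never opens a transaction at all.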

Add blueprint repair loop

202b946

coderabbitai Bot reviewed Jun 4, 2026

txmed82 merged commit bcdabe2 into master Jun 4, 2026
6 of 7 checks passed

txmed82 deleted the codex/blueprint-repair-loop branch June 4, 2026 01:58

txmed82 merged 1 commit into master from codex/blueprint-repair-loop
