Add blueprint repair loop #382

Merged
txmed82 merged 1 commit into master from codex/blueprint-repair-loop
Jun 4, 2026

Conversation

txmed82 (Owner) commented Jun 4, 2026

Summary

  • add a BYOK blueprint repair loop that judges, records repair requests, repairs, and re-judges up to max_repair_rounds
  • persist repaired blueprint lineage metadata and repair attempts
  • add atomic blueprint-plus-attempt storage for successful repairs

Tests

  • .venv/bin/python -m pytest tests/test_blueprint_repair_loop.py tests/test_blueprint_judge.py tests/test_blueprint_storage.py -q
  • .venv/bin/python -m ruff check src/casecrawler/generation/blueprint_repair.py src/casecrawler/storage/dataset_store.py tests/test_blueprint_repair_loop.py tests/test_blueprint_storage.py
  • .venv/bin/python -m pytest -q -m 'not optional_backend and not network and not slow'

Summary by CodeRabbit

New Features

  • Added automated blueprint repair workflow that iteratively evaluates and repairs clinical blueprints when validation fails.
  • System now persists blueprint repair attempts and results, tracking evaluation progress across multiple repair cycles.

Tests

  • Added comprehensive test coverage for blueprint repair functionality and data persistence.

coderabbitai Bot commented Jun 4, 2026

Walkthrough

This PR introduces an asynchronous blueprint repair workflow. It defines the BlueprintRepairLoop class that iteratively evaluates blueprints with a judge, and when failures occur, repairs them via LLM calls. The PR also adds atomic persistence for blueprints and generation attempts, and includes comprehensive tests covering all repair scenarios.
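
The control flow described in the walkthrough can be pictured with a minimal sketch. This is not the code in `blueprint_repair.py`: `run_repair_loop`, `LoopOutcome`, and the dict-shaped blueprints and reports are hypothetical stand-ins, and the fake judge and repair callables only simulate a provider. It assumes the loop judges first, stops as soon as a report passes, and otherwise repairs and re-judges until `max_repair_rounds` is spent.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable

# Hypothetical callable shapes; the real judge and provider APIs are not shown in this PR page.
JudgeFn = Callable[[dict], Awaitable[dict]]
RepairFn = Callable[[dict, dict, int], Awaitable[dict]]


@dataclass
class LoopOutcome:
    """Hypothetical stand-in for BlueprintRepairResult."""

    final_blueprint: dict
    reports: list = field(default_factory=list)
    repair_rounds: int = 0
    passed: bool = False


async def run_repair_loop(
    blueprint: dict, judge: JudgeFn, repair: RepairFn, max_repair_rounds: int = 2
) -> LoopOutcome:
    """Judge, then repair and re-judge until a report passes or the budget is spent."""
    outcome = LoopOutcome(final_blueprint=blueprint)
    current = blueprint
    while True:
        report = await judge(current)
        outcome.reports.append(report)
        if report["passed"]:
            outcome.passed = True
            break
        if outcome.repair_rounds >= max_repair_rounds:
            break  # budget exhausted: return the last blueprint as failed
        outcome.repair_rounds += 1
        current = await repair(current, report, outcome.repair_rounds)
    outcome.final_blueprint = current
    return outcome


async def fake_judge(bp: dict) -> dict:
    # Simulated judge: passes once the blueprint has been repaired twice.
    return {"passed": bp["quality"] >= 2}


async def fake_repair(bp: dict, report: dict, round_number: int) -> dict:
    # Simulated repair: each round improves the blueprint by one step.
    return {**bp, "quality": bp["quality"] + 1, "repair_round": round_number}


result = asyncio.run(run_repair_loop({"quality": 0}, fake_judge, fake_repair))
capped = asyncio.run(
    run_repair_loop({"quality": 0}, fake_judge, fake_repair, max_repair_rounds=1)
)
```

With these fakes, `result` passes after two repair rounds and three judge calls, while `capped` stops after one round and is returned as not passed.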

Changes

Blueprint Repair Workflow

| Layer / File(s) | Summary |
|---|---|
| Blueprint Repair Result and Loop Orchestration (`src/casecrawler/generation/blueprint_repair.py`) | BlueprintRepairResult captures the original and final blueprints, judge reports, per-round repaired blueprints, round count, and pass status. BlueprintRepairLoop.run() orchestrates the iterative evaluation and repair cycle: judges the blueprint, returns early if passed, respects max repair rounds, calls the repair provider when needed, and returns the complete outcome. |
| Single Repair Round Execution and Helpers (`src/casecrawler/generation/blueprint_repair.py`) | _repair_blueprint() executes one repair round by building a repair prompt, computing a deterministic hash, calling the provider to generate a structured blueprint, and canonicalizing the result with fresh identifiers and metadata linking. _canonicalize_blueprint() constructs metadata connecting the repaired blueprint to its parent, repair round, and judge report. _repair_prompt() serializes the blueprint and judge report to deterministic JSON. _prompt_hash() and _hash_payload() create deterministic SHA-256 hashes for attempt correlation. _REPAIR_SYSTEM_PROMPT constrains the repair model to structured output only. |
| Generation Attempt Creation and Error Handling (`src/casecrawler/generation/blueprint_repair.py`) | _repair_requested_attempt() creates a generation attempt record when repair is initiated, including prompt hash and metadata linking to the judge report and round. _repair_attempt() creates terminal-status records (succeeded/failed) with optional token counts and error lists. _save_failed_attempt_best_effort() persists failed attempts while suppressing and logging any storage errors. |
| Blueprint and Attempt Atomic Persistence (`src/casecrawler/storage/dataset_store.py`) | DatasetStore.save_blueprint_with_attempt() atomically upserts both a clinical blueprint and its associated generation attempt in a single _write_lock-protected SQLite transaction, committing on success and rolling back with exception re-raising on failure. |
| Test Suite for Repair Loop and Persistence (`tests/test_blueprint_repair_loop.py`, `tests/test_blueprint_storage.py`) | Comprehensive async tests covering repair loop behavior: repair of failed judge reports with correct round tracking and metadata linkage, preservation of unchanged blueprints when judge passes initially, and enforcement of max repair rounds. Storage tests validate atomic upsert of blueprints and generation attempts. Tests use fake sequenced providers and verify attempt role/status recording and provider invocation details including temperature and prompt contents. |
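
The "best effort" persistence mentioned for failed attempts can be illustrated roughly as follows. This is a sketch under assumptions, not the PR's `_save_failed_attempt_best_effort()`: the `save_generation_attempt` method name, the dict-shaped attempt, and both store classes are hypothetical. The idea is only that a storage error while recording a failure is logged and swallowed so it cannot mask the original repair error.

```python
import logging

logger = logging.getLogger("blueprint_repair_sketch")


def save_failed_attempt_best_effort(store, attempt: dict) -> bool:
    """Try to persist a failed attempt; log and suppress any storage error."""
    try:
        # Hypothetical store method; the real DatasetStore API may differ.
        store.save_generation_attempt(attempt)
    except Exception:
        # logger.exception records the traceback without re-raising.
        logger.exception(
            "Could not persist failed repair attempt %s", attempt.get("attempt_id")
        )
        return False
    return True


class BrokenStore:
    """Fake store that always fails, to exercise the suppression path."""

    def save_generation_attempt(self, attempt: dict) -> None:
        raise RuntimeError("simulated storage failure")


class MemoryStore:
    """Fake store that keeps attempts in a list."""

    def __init__(self) -> None:
        self.saved = []

    def save_generation_attempt(self, attempt: dict) -> None:
        self.saved.append(attempt)


memory = MemoryStore()
saved_ok = save_failed_attempt_best_effort(memory, {"attempt_id": "attempt-1"})
saved_broken = save_failed_attempt_best_effort(BrokenStore(), {"attempt_id": "attempt-2"})
```

Returning a boolean is a choice made for this sketch so callers and tests can observe the outcome; the source does not say what the real helper returns.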

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • txmed82/case-crawler#381: The BlueprintRepairLoop depends on and consumes BlueprintJudge.evaluate results to drive the iterative fail→repair cycle based on judge reports.
  • txmed82/case-crawler#372: The new DatasetStore.save_blueprint_with_attempt() persistence method extends the broader SQLite blueprint artifact storage infrastructure for blueprints, attempts, and judge reports.
  • txmed82/case-crawler#376: The batch BlueprintPipeline relies on the atomic save_blueprint_with_attempt() method added here for persisting generated blueprints and their generation attempts.

Poem

🐰 A loop that judges, then repairs with care,
LLM whispers prompt hashes in the air,
Blueprints canonicalized, attempts take flight,
Atomic transactions lock them down tight! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Add blueprint repair loop' directly and clearly describes the main addition in this changeset: a new BlueprintRepairLoop class with associated result model, storage helper, and comprehensive tests. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/casecrawler/generation/blueprint_repair.py`:
- Around line 251-278: _repair_requested_attempt currently computes prompt_hash
from a local payload using _hash_payload, which diverges from the terminal
attempts that use _prompt_hash(prompt, policy); change the flow so the repair
prompt is hashed once (via _prompt_hash with the actual repair prompt and
policy) before creating/persisting the REPAIR_REQUESTED GenerationAttempt and
pass that same prompt_hash into _repair_requested_attempt (or add a prompt_hash
parameter) so both the REPAIR_REQUESTED and subsequent SUCCEEDED/FAILED attempts
use the identical prompt_hash value.

In `@src/casecrawler/storage/dataset_store.py`:
- Around line 297-333: Before performing the DB writes in
save_blueprint_with_attempt, validate that the provided GenerationAttempt
actually belongs to the ClinicalBlueprint: check attempt.artifact_id ==
blueprint.blueprint_id and attempt.dataset_id == blueprint.dataset_id (or any
other domain-specific linkage between attempt and blueprint), and raise a clear
exception (e.g., ValueError) if they do not match; perform this validation
before acquiring the write lock / before executing the INSERTs so mismatched
pairs are rejected prior to the transaction.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 1d6f782f-9c58-4fa0-a868-40b1fc244125

📥 Commits

Reviewing files that changed from the base of the PR and between a9fa1fd and 202b946.

📒 Files selected for processing (4)
  • src/casecrawler/generation/blueprint_repair.py
  • src/casecrawler/storage/dataset_store.py
  • tests/test_blueprint_repair_loop.py
  • tests/test_blueprint_storage.py

Comment on lines +251 to +278
    def _repair_requested_attempt(
        self,
        *,
        blueprint: ClinicalBlueprint,
        policy: GenerationRolePolicy,
        judge_report: JudgeReport,
        repair_round: int,
    ) -> GenerationAttempt:
        payload = {
            "artifact_id": blueprint.blueprint_id,
            "judge_report_id": judge_report.report_id,
            "repair_round": repair_round,
            "status": GenerationAttemptStatus.REPAIR_REQUESTED.value,
        }
        return GenerationAttempt(
            attempt_id=f"attempt-{uuid4()}",
            dataset_id=blueprint.dataset_id,
            role=GenerationRole.REPAIR,
            status=GenerationAttemptStatus.REPAIR_REQUESTED,
            provider=policy.provider,
            model=policy.model,
            prompt_hash=_hash_payload(payload),
            artifact_id=blueprint.blueprint_id,
            metadata={
                "judge_report_id": judge_report.report_id,
                "repair_round": repair_round,
            },
        )

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Use the real repair prompt hash for REPAIR_REQUESTED.

_repair_requested_attempt() hashes a local payload instead of the actual repair prompt, while the later SUCCEEDED/FAILED attempt for the same round uses _prompt_hash(prompt, policy). That makes the preflight record impossible to correlate with its terminal record via prompt_hash, which weakens the attempt lineage this PR is adding. Compute the prompt/hash once before persisting the request row, then pass that same hash through both attempt builders.
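
A minimal sketch of the "hash once, pass it through" idea follows. It assumes the prompt is deterministic JSON and that the hash covers the prompt plus the provider and model; what `_prompt_hash(prompt, policy)` actually includes is not shown on this page, and `repair_prompt`, `prompt_hash`, `build_attempt`, and the example provider and model names are all hypothetical.

```python
import hashlib
import json


def repair_prompt(blueprint: dict, judge_report: dict) -> str:
    """Serialize inputs to deterministic JSON: sorted keys, fixed separators."""
    return json.dumps(
        {"blueprint": blueprint, "judge_report": judge_report},
        sort_keys=True,
        separators=(",", ":"),
    )


def prompt_hash(prompt: str, provider: str, model: str) -> str:
    """SHA-256 over the prompt and the (assumed) policy fields."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "provider": provider},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_attempt(status: str, prompt_hash_value: str, repair_round: int) -> dict:
    """Hypothetical attempt builder that receives the hash instead of computing one."""
    return {
        "status": status,
        "prompt_hash": prompt_hash_value,
        "repair_round": repair_round,
    }


# Compute the prompt and its hash once, before the request row is persisted.
prompt = repair_prompt({"blueprint_id": "bp-1"}, {"report_id": "report-1"})
shared_hash = prompt_hash(prompt, "example-provider", "example-model")

requested = build_attempt("repair_requested", shared_hash, 1)
terminal = build_attempt("succeeded", shared_hash, 1)
```

Because both records carry `shared_hash`, the request row and its terminal row for the same round can be joined on `prompt_hash`.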

Comment on lines +297 to +333
    def save_blueprint_with_attempt(
        self,
        blueprint: ClinicalBlueprint,
        attempt: GenerationAttempt,
    ) -> None:
        with self._write_lock:
            try:
                self._conn.execute(
                    """INSERT OR REPLACE INTO clinical_blueprints
                       (blueprint_id, dataset_id, cohort_plan_id, archetype_name,
                        blueprint_json)
                       VALUES (?, ?, ?, ?, ?)""",
                    (
                        blueprint.blueprint_id,
                        blueprint.dataset_id,
                        blueprint.cohort_plan_id,
                        blueprint.archetype_name,
                        blueprint.model_dump_json(),
                    ),
                )
                self._conn.execute(
                    """INSERT OR REPLACE INTO generation_attempts
                       (attempt_id, dataset_id, role, status, artifact_id, attempt_json)
                       VALUES (?, ?, ?, ?, ?, ?)""",
                    (
                        attempt.attempt_id,
                        attempt.dataset_id,
                        attempt.role.value,
                        attempt.status.value,
                        attempt.artifact_id,
                        attempt.model_dump_json(),
                    ),
                )
                self._conn.commit()
            except Exception:
                self._conn.rollback()
                raise

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Reject mismatched blueprint/attempt pairs before the transaction.

This method never validates that the attempt belongs to the blueprint being saved. A caller can currently commit a ClinicalBlueprint for one artifact and a GenerationAttempt for another, which silently corrupts repair lineage despite the write being "atomic".

Suggested fix
 def save_blueprint_with_attempt(
     self,
     blueprint: ClinicalBlueprint,
     attempt: GenerationAttempt,
 ) -> None:
+    if attempt.dataset_id != blueprint.dataset_id:
+        raise ValueError("Attempt dataset_id must match blueprint dataset_id.")
+    if attempt.artifact_id != blueprint.blueprint_id:
+        raise ValueError("Attempt artifact_id must match blueprint blueprint_id.")
     with self._write_lock:
         try:
             self._conn.execute(
                 """INSERT OR REPLACE INTO clinical_blueprints
📝 Committable suggestion

    def save_blueprint_with_attempt(
        self,
        blueprint: ClinicalBlueprint,
        attempt: GenerationAttempt,
    ) -> None:
        if attempt.dataset_id != blueprint.dataset_id:
            raise ValueError("Attempt dataset_id must match blueprint dataset_id.")
        if attempt.artifact_id != blueprint.blueprint_id:
            raise ValueError("Attempt artifact_id must match blueprint blueprint_id.")
        with self._write_lock:
            try:
                self._conn.execute(
                    """INSERT OR REPLACE INTO clinical_blueprints
                       (blueprint_id, dataset_id, cohort_plan_id, archetype_name,
                        blueprint_json)
                       VALUES (?, ?, ?, ?, ?)""",
                    (
                        blueprint.blueprint_id,
                        blueprint.dataset_id,
                        blueprint.cohort_plan_id,
                        blueprint.archetype_name,
                        blueprint.model_dump_json(),
                    ),
                )
                self._conn.execute(
                    """INSERT OR REPLACE INTO generation_attempts
                       (attempt_id, dataset_id, role, status, artifact_id, attempt_json)
                       VALUES (?, ?, ?, ?, ?, ?)""",
                    (
                        attempt.attempt_id,
                        attempt.dataset_id,
                        attempt.role.value,
                        attempt.status.value,
                        attempt.artifact_id,
                        attempt.model_dump_json(),
                    ),
                )
                self._conn.commit()
            except Exception:
                self._conn.rollback()
                raise
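
To see both behaviors together (rejection before the transaction, and rollback when a write fails midway), here is a self-contained sketch against an in-memory SQLite database. `SketchStore`, its two simplified tables, `save_pair`, and `counts` are hypothetical and much smaller than the real `DatasetStore` schema; only the validate, lock, write, commit-or-rollback shape follows the suggestion above.

```python
import sqlite3
import threading


class SketchStore:
    """Hypothetical, simplified stand-in for DatasetStore."""

    def __init__(self) -> None:
        self._conn = sqlite3.connect(":memory:")
        self._write_lock = threading.Lock()
        self._conn.execute(
            "CREATE TABLE blueprints ("
            "blueprint_id TEXT PRIMARY KEY NOT NULL, dataset_id TEXT NOT NULL)"
        )
        self._conn.execute(
            "CREATE TABLE attempts ("
            "attempt_id TEXT PRIMARY KEY NOT NULL, dataset_id TEXT NOT NULL, "
            "artifact_id TEXT NOT NULL)"
        )
        self._conn.commit()

    def save_pair(self, blueprint: dict, attempt: dict) -> None:
        # Validate linkage before taking the lock or touching the database.
        if attempt["dataset_id"] != blueprint["dataset_id"]:
            raise ValueError("Attempt dataset_id must match blueprint dataset_id.")
        if attempt["artifact_id"] != blueprint["blueprint_id"]:
            raise ValueError("Attempt artifact_id must match blueprint blueprint_id.")
        with self._write_lock:
            try:
                self._conn.execute(
                    "INSERT OR REPLACE INTO blueprints VALUES (?, ?)",
                    (blueprint["blueprint_id"], blueprint["dataset_id"]),
                )
                self._conn.execute(
                    "INSERT OR REPLACE INTO attempts VALUES (?, ?, ?)",
                    (attempt["attempt_id"], attempt["dataset_id"], attempt["artifact_id"]),
                )
                self._conn.commit()
            except Exception:
                # Undo the blueprint write if the attempt write failed.
                self._conn.rollback()
                raise

    def counts(self) -> tuple:
        blueprints = self._conn.execute("SELECT COUNT(*) FROM blueprints").fetchone()[0]
        attempts = self._conn.execute("SELECT COUNT(*) FROM attempts").fetchone()[0]
        return blueprints, attempts


store = SketchStore()
store.save_pair(
    {"blueprint_id": "bp-1", "dataset_id": "ds-1"},
    {"attempt_id": "attempt-1", "dataset_id": "ds-1", "artifact_id": "bp-1"},
)
```

In this sketch a mismatched pair raises `ValueError` without writing anything, and an attempt whose `attempt_id` is `None` violates the `NOT NULL` constraint, so the blueprint written just before it is rolled back.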

@txmed82 txmed82 merged commit bcdabe2 into master Jun 4, 2026
6 of 7 checks passed
@txmed82 txmed82 deleted the codex/blueprint-repair-loop branch June 4, 2026 01:58