Add blueprint artifact export by txmed82 · Pull Request #379 · txmed82/case-crawler

txmed82 · 2026-06-04T01:09:07Z

Summary

- Add JSONL export helpers for persisted ClinicalBlueprint artifacts, with optional CohortPlan context
- Add casecrawler export-blueprints with dataset and cohort-plan filters
- Add helper and CLI tests for blueprint artifact exports

Tests

.venv/bin/python -m pytest tests/test_blueprint_export.py tests/test_cli_synthetic.py::test_export_blueprints_command_writes_jsonl -q
.venv/bin/python -m ruff check src/casecrawler/export/blueprints.py src/casecrawler/cli.py tests/test_blueprint_export.py tests/test_cli_synthetic.py
.venv/bin/python -m pytest tests/test_blueprint_export.py tests/test_cli_synthetic.py tests/test_fine_tuning_export.py -q
.venv/bin/python -m pytest -q -m "not optional_backend and not network and not slow"

Summary by CodeRabbit

New Features
- Added export-blueprints CLI command to export clinical blueprint artifacts to JSONL format, with optional filtering by dataset and cohort plan.
- Validates blueprint existence before export and reports total count of exported artifacts.
Tests
- Added comprehensive tests for blueprint export functionality and CLI command integration.

coderabbitai · 2026-06-04T01:09:18Z

Walkthrough

This PR adds a new export-blueprints CLI command that exports persisted clinical blueprint artifacts to JSONL format. The implementation includes core serialization functions, a CLI entry point with DatasetStore integration, and comprehensive unit and integration tests validating the export pipeline.

Changes

Blueprint Export Feature

Layer / File(s)	Summary
Blueprint export serialization and unit tests `src/casecrawler/export/blueprints.py`, `tests/test_blueprint_export.py`	`export_blueprint_payload` constructs JSON payloads from `ClinicalBlueprint` with optional `CohortPlan` context, and `export_blueprints_jsonl` writes blueprints to NDJSON files with optional plan lookup. Unit tests validate payload inclusion of `artifact_type`, blueprint ID, and cohort plan fields, plus JSONL line count and JSON parsing.
CLI export-blueprints command and integration test `src/casecrawler/cli.py`, `tests/test_cli_synthetic.py`	`export-blueprints` command loads blueprints from `DatasetStore` with optional `--dataset-id` and `--cohort-plan-id` filters, validates blueprints exist, calls `export_blueprints_jsonl`, and prints exported count. Integration test seeds the store with a plan and blueprint, runs the command, and asserts JSONL output structure and payload fields.
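For readers who want the shape of the export without opening the diff, here is a minimal sketch of the two helpers as the walkthrough describes them. It is not the PR's code: ClinicalBlueprint and CohortPlan are hypothetical dataclass stand-ins for the real Pydantic models (so asdict() stands in for model_dump()), and the cohort_plan_id field used to look up a plan is an assumed name. Only the artifact_type value and the plan_lookup keyword come from the PR.

```python
from __future__ import annotations

import json
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Any, Callable, Iterable, Optional


@dataclass
class CohortPlan:
    """Hypothetical stand-in for the real CohortPlan model."""

    id: str
    dataset_id: str
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class ClinicalBlueprint:
    """Hypothetical stand-in; `cohort_plan_id` is an assumed field name."""

    id: str
    dataset_id: str
    cohort_plan_id: Optional[str] = None
    patient: dict[str, Any] = field(default_factory=dict)


def export_blueprint_payload(
    blueprint: ClinicalBlueprint,
    *,
    plan: Optional[CohortPlan] = None,
) -> dict[str, Any]:
    """Build one JSONL record; the plan is attached only when provided."""
    payload: dict[str, Any] = {
        "artifact_type": "casecrawler_clinical_blueprint",
        "blueprint": asdict(blueprint),  # the real code uses model_dump(...)
    }
    if plan is not None:
        payload["cohort_plan"] = asdict(plan)
    return payload


def export_blueprints_jsonl(
    blueprints: Iterable[ClinicalBlueprint],
    output: Path | str,
    plan_lookup: Optional[Callable[[str], Optional[CohortPlan]]] = None,
) -> int:
    """Write one JSON object per line and return the number of records written."""
    plan_cache: dict[str, Optional[CohortPlan]] = {}
    count = 0
    with Path(output).open("w", encoding="utf-8") as handle:
        for blueprint in blueprints:
            plan = None
            if plan_lookup is not None and blueprint.cohort_plan_id:
                # Look each plan up once, even if many blueprints share it.
                if blueprint.cohort_plan_id not in plan_cache:
                    plan_cache[blueprint.cohort_plan_id] = plan_lookup(blueprint.cohort_plan_id)
                plan = plan_cache[blueprint.cohort_plan_id]
            record = export_blueprint_payload(blueprint, plan=plan)
            handle.write(json.dumps(record) + "\n")
            count += 1
    return count


# Usage: a dict's .get method plays the role of store.get_cohort_plan.
plans = {"plan-1": CohortPlan(id="plan-1", dataset_id="ds-1")}
blueprints = [
    ClinicalBlueprint(id="bp-1", dataset_id="ds-1"),
    ClinicalBlueprint(id="bp-2", dataset_id="ds-1", cohort_plan_id="plan-1"),
]
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "blueprints.jsonl"
    count = export_blueprints_jsonl(blueprints, out, plan_lookup=plans.get)
    records = [json.loads(line) for line in out.read_text(encoding="utf-8").splitlines()]
print(count)  # 2
```

The per-run cache keeps the store from being queried once per blueprint when many blueprints share a cohort plan; whether the PR's implementation does this is not stated.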

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

txmed82/case-crawler#371: This PR's serialization logic depends on the ClinicalBlueprint and CohortPlan data structures introduced in #371.
txmed82/case-crawler#372: The export-blueprints CLI command depends on DatasetStore persistence and query methods (e.g., get_cohort_plan, blueprint listing) introduced in #372.

Poem

🐰 Blueprint bundles bundled neat,
JSONL lines, a data treat,
From store to export, the rabbit's feat,
Cohort plans and blueprints sweet! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 27.27% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title 'Add blueprint artifact export' directly and clearly describes the main change: adding export functionality for blueprint artifacts with JSONL serialization and a new CLI command.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (1)

src/casecrawler/export/blueprints.py (1)

16-21: ⚡ Quick win

Use model_dump(mode="json") for JSONL export payloads

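export_blueprints_jsonl() writes each payload with json.dumps(...), but ClinicalBlueprint and CohortPlan include several dict[str, Any] fields (patient, metadata, etc.). A default (Python-mode) model_dump() passes values such as datetime objects through unchanged, and json.dumps rejects them with a TypeError. Dumping in JSON mode converts them to JSON-safe primitives first, so the export does not fail if those Any values (or future schema fields) contain non-JSON-native objects.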

Proposed fix

 def export_blueprint_payload(
     blueprint: ClinicalBlueprint,
     *,
     plan: CohortPlan | None = None,
 ) -> dict[str, Any]:
     payload = {
         "artifact_type": "casecrawler_clinical_blueprint",
-        "blueprint": blueprint.model_dump(),
+        "blueprint": blueprint.model_dump(mode="json"),
     }
     if plan is not None:
-        payload["cohort_plan"] = plan.model_dump()
+        payload["cohort_plan"] = plan.model_dump(mode="json")
     return payload
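To see why JSON mode matters, the stdlib-only sketch below reproduces the failure: json.dumps raises TypeError on a datetime or set tucked inside a dict[str, Any] field, which is what a Python-mode model_dump() would hand it. to_json_safe is a hypothetical helper that only approximates what Pydantic's model_dump(mode="json") does for these few types; in the PR the one-argument change above is the whole remedy, and Pydantic's exact output format may differ from isoformat().

```python
import json
from datetime import date, datetime, timezone
from enum import Enum
from typing import Any
from uuid import UUID


def to_json_safe(value: Any) -> Any:
    """Hypothetical helper: recursively coerce common Python objects to JSON-native types."""
    if isinstance(value, dict):
        return {str(key): to_json_safe(item) for key, item in value.items()}
    if isinstance(value, (list, tuple, set, frozenset)):
        return [to_json_safe(item) for item in value]
    if isinstance(value, (datetime, date)):
        return value.isoformat()  # datetime is a date subclass, so both land here
    if isinstance(value, UUID):
        return str(value)
    if isinstance(value, Enum):
        return to_json_safe(value.value)
    return value


# A dict[str, Any] field such as `metadata` can legally hold non-JSON values.
metadata = {
    "generated_at": datetime(2026, 6, 4, tzinfo=timezone.utc),
    "tags": {"sepsis"},
}

try:
    json.dumps(metadata)
    python_mode_ok = True
except TypeError as exc:
    python_mode_ok = False
    print(f"json.dumps rejected the raw dict: {exc}")

safe = to_json_safe(metadata)
print(json.dumps(safe))
# {"generated_at": "2026-06-04T00:00:00+00:00", "tags": ["sepsis"]}
```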

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/casecrawler/export/blueprints.py` around lines 16 - 21, The payload uses
blueprint.model_dump() and plan.model_dump() which can emit non-JSON-native
types; in export_blueprints_jsonl() change these to
blueprint.model_dump(mode="json") and plan.model_dump(mode="json") so the
ClinicalBlueprint and CohortPlan nested dicts (e.g., patient, metadata) are
serialized into JSON-safe primitives before json.dumps writes the JSONL payload;
update the payload assignment where payload = {"artifact_type":
"casecrawler_clinical_blueprint", "blueprint": ...} and the conditional
payload["cohort_plan"] = ... accordingly.


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: fde7eaad-82a1-4cad-8181-4cbba855030e

📥 Commits

Reviewing files that changed from the base of the PR and between d16d9ef and 85cc423.

📒 Files selected for processing (4)

src/casecrawler/cli.py
src/casecrawler/export/blueprints.py
tests/test_blueprint_export.py
tests/test_cli_synthetic.py

coderabbitai · 2026-06-04T01:13:13Z

+    blueprints = store.list_blueprints(
+        dataset_id=dataset_id,
+        cohort_plan_id=cohort_plan_id,
+        limit=1_000_000,
+    )


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Avoid silently truncating exports at one million rows.

DatasetStore.list_blueprints() applies this limit directly in SQL, so this command will export only the first 1,000,000 matches and still report success. For an export path, that becomes silent data loss. Please page through results or add an iterator-based store API so large blueprint sets are exported completely.
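The paging approach can be sketched as a small generator. This assumes a store call that accepts an offset: the PR only shows dataset_id, cohort_plan_id and limit, so `offset` is hypothetical, and an in-memory list stands in for DatasetStore. Offset paging is only correct when the underlying query has a stable ORDER BY on a unique key; otherwise rows can be skipped or repeated between pages, and keyset pagination (WHERE id > last_seen_id) is the sturdier alternative.

```python
from __future__ import annotations

from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


def iter_pages(
    fetch_page: Callable[[int, int], list[T]],
    page_size: int = 1000,
) -> Iterator[T]:
    """Yield every row by requesting fixed-size pages until a short page arrives."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        if len(page) < page_size:
            return  # a short (or empty) page means there is nothing left
        offset += page_size


# In-memory stand-in for DatasetStore; `offset` support is an assumption.
rows = [f"bp-{i}" for i in range(2500)]
calls: list[int] = []


def fake_fetch(offset: int, limit: int) -> list[str]:
    calls.append(offset)
    return rows[offset : offset + limit]


exported = list(iter_pages(fake_fetch, page_size=1000))
print(len(exported), calls)  # 2500 [0, 1000, 2000]
```

If export_blueprints_jsonl consumed this iterator directly, the CLI's "blueprints exist" check would likely need to move after the export (for example, by checking the returned count), since a generator cannot be tested for emptiness without consuming it.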

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/casecrawler/cli.py` around lines 2076 - 2080, The current call to DatasetStore.list_blueprints in src/casecrawler/cli.py assigns a one-million hard limit to blueprints, which silently truncates exports; change the export to paginate instead of relying on a fixed large limit: either update DatasetStore.list_blueprints to provide an iterator/generator (e.g., implement a list_blueprints_iter method that yields pages) or loop calling list_blueprints with explicit offset/limit until no more rows, and replace the single call that sets blueprints = store.list_blueprints(...) with a paging loop that collects/yields all rows for export. Ensure you reference and update the call site where blueprints is used so the export consumes the paginated iterator or accumulated full result set.

coderabbitai · 2026-06-04T01:13:13Z

+    count = export_blueprints_jsonl(
+        blueprints,
+        output,
+        plan_lookup=store.get_cohort_plan,
+    )
+    click.echo(f"Exported {count} blueprint artifact(s) to {output}")


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Wrap export write failures in ClickException.

If the output path is unwritable, this command currently bubbles the raw OSError instead of returning a normal CLI error message.

Proposed fix

-    count = export_blueprints_jsonl(
-        blueprints,
-        output,
-        plan_lookup=store.get_cohort_plan,
-    )
+    try:
+        count = export_blueprints_jsonl(
+            blueprints,
+            output,
+            plan_lookup=store.get_cohort_plan,
+        )
+    except OSError as exc:
+        raise click.ClickException(
+            f"Failed to write blueprint export to {output}: {exc}"
+        ) from exc
     click.echo(f"Exported {count} blueprint artifact(s) to {output}")


🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/casecrawler/cli.py` around lines 2087 - 2092, The call to export_blueprints_jsonl in the CLI can raise raw OSError on unwritable output paths; wrap the call in a try/except that catches OSError (and optionally IOError) around the call to export_blueprints_jsonl(blueprints, output, plan_lookup=store.get_cohort_plan) and re-raise as click.ClickException with a clear message including the output path and the original error text (e.g., f"Failed to write export to {output}: {err}"), leaving successful behavior (click.echo of count) unchanged.
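The prompt mentions catching IOError as an option; in Python 3 that is redundant, because IOError has been an alias of OSError since Python 3.3, and FileNotFoundError, PermissionError and IsADirectoryError are all OSError subclasses. The stdlib-only sketch below shows the same wrap-and-re-raise pattern as the proposed fix, with a hypothetical ExportWriteError standing in for click.ClickException so it runs without Click; write_jsonl is likewise an illustrative helper, not the PR's export function.

```python
import json
import os
import tempfile


class ExportWriteError(Exception):
    """Hypothetical stand-in for click.ClickException in this sketch."""


def write_jsonl(rows, output):
    """Hypothetical writer: any OSError becomes a user-facing error, chained to the cause."""
    count = 0
    try:
        with open(output, "w", encoding="utf-8") as handle:
            for row in rows:
                handle.write(json.dumps(row) + "\n")
                count += 1
    except OSError as exc:  # also covers IOError, which is the same class
        raise ExportWriteError(
            f"Failed to write blueprint export to {output}: {exc}"
        ) from exc
    return count


print(IOError is OSError)  # True

failure = None
with tempfile.TemporaryDirectory() as tmp:
    written = write_jsonl([{"id": "bp-1"}], os.path.join(tmp, "out.jsonl"))
    try:
        # The parent directory does not exist, so open() raises FileNotFoundError.
        write_jsonl([{"id": "bp-1"}], os.path.join(tmp, "missing-dir", "out.jsonl"))
    except ExportWriteError as err:
        failure = err
        print(err)
```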

Add blueprint artifact export

85cc423

coderabbitai Bot reviewed Jun 4, 2026


txmed82 merged commit 7eb4d95 into master Jun 4, 2026
4 checks passed

txmed82 deleted the codex/blueprint-export branch June 4, 2026 01:13
