Add blueprint artifact export #379
📝 Walkthrough
This PR adds a new
Changes
Blueprint Export Feature
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 2
🧹 Nitpick comments (1)
src/casecrawler/export/blueprints.py (1)
16-21: ⚡ Quick win
Use `model_dump(mode="json")` for JSONL export payloads
`export_blueprints_jsonl()` writes `payload` via `json.dumps(...)`, while `ClinicalBlueprint`/`CohortPlan` include several `dict[str, Any]` fields (`patient`, `metadata`, etc.). Dumping in JSON mode prevents JSONL serialization failures if those `Any` values (or future schema fields) contain non-JSON-native objects.
Proposed fix
 def export_blueprint_payload(
     blueprint: ClinicalBlueprint,
     *,
     plan: CohortPlan | None = None,
 ) -> dict[str, Any]:
     payload = {
         "artifact_type": "casecrawler_clinical_blueprint",
-        "blueprint": blueprint.model_dump(),
+        "blueprint": blueprint.model_dump(mode="json"),
     }
     if plan is not None:
-        payload["cohort_plan"] = plan.model_dump()
+        payload["cohort_plan"] = plan.model_dump(mode="json")
     return payload
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/casecrawler/export/blueprints.py` around lines 16 - 21, The payload uses blueprint.model_dump() and plan.model_dump(), which can emit non-JSON-native types; in export_blueprint_payload() change these to blueprint.model_dump(mode="json") and plan.model_dump(mode="json") so the ClinicalBlueprint and CohortPlan nested dicts (e.g., patient, metadata) are serialized into JSON-safe primitives before export_blueprints_jsonl() writes the JSONL payload with json.dumps; update the payload assignment where payload = {"artifact_type": "casecrawler_clinical_blueprint", "blueprint": ...} and the conditional payload["cohort_plan"] = ... accordingly.
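For context on the nitpick above, here is a minimal stdlib-only sketch of the failure mode it describes. It does not use Pydantic or any casecrawler code: the `created_at` value and the `try_dump` helper are hypothetical, chosen only to show that `json.dumps` rejects a non-JSON-native object nested in a payload dict, while a primitives-only version of the same payload serializes cleanly.

```python
import json
from datetime import datetime, timezone

# Hypothetical payload: a nested dict holding a non-JSON-native value (datetime).
payload = {
    "artifact_type": "casecrawler_clinical_blueprint",
    "blueprint": {
        "metadata": {"created_at": datetime(2024, 1, 1, tzinfo=timezone.utc)}
    },
}


def try_dump(obj):
    """Return the JSON line, or None if json.dumps rejects the object."""
    try:
        return json.dumps(obj)
    except TypeError:
        return None


# json.dumps cannot serialize datetime, so this is None.
raw_line = try_dump(payload)

# The same payload reduced to JSON primitives by hand (assumption: this is
# roughly the kind of output a JSON-mode dump is meant to produce).
created_at = payload["blueprint"]["metadata"]["created_at"]
json_safe = {
    "artifact_type": payload["artifact_type"],
    "blueprint": {"metadata": {"created_at": created_at.isoformat()}},
}
safe_line = try_dump(json_safe)
```

The exact string format a JSON-mode dump emits for a given type may differ from `isoformat()`; the point is only that every value reaching `json.dumps` is already a JSON primitive.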
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/casecrawler/cli.py`:
- Around line 2076-2080: The current call to DatasetStore.list_blueprints in
src/casecrawler/cli.py assigns a one-million hard limit to blueprints, which
silently truncates exports; change the export to paginate instead of relying on
a fixed large limit: either update DatasetStore.list_blueprints to provide an
iterator/generator (e.g., implement a list_blueprints_iter method that yields
pages) or loop calling list_blueprints with explicit offset/limit until no more
rows, and replace the single call that sets blueprints =
store.list_blueprints(...) with a paging loop that collects/yields all rows for
export. Ensure you reference and update the call site where blueprints is used
so the export consumes the paginated iterator or accumulated full result set.
- Around line 2087-2092: The call to export_blueprints_jsonl in the CLI can
raise raw OSError on unwritable output paths; wrap the call in a try/except that
catches OSError (and optionally IOError) around the call to
export_blueprints_jsonl(blueprints, output, plan_lookup=store.get_cohort_plan)
and re-raise as click.ClickException with a clear message including the output
path and the original error text (e.g., f"Failed to write export to {output}:
{err}"), leaving successful behavior (click.echo of count) unchanged.
---
Nitpick comments:
In `@src/casecrawler/export/blueprints.py`:
- Around line 16-21: The payload uses blueprint.model_dump() and
plan.model_dump(), which can emit non-JSON-native types; in
export_blueprint_payload() change these to blueprint.model_dump(mode="json") and
plan.model_dump(mode="json") so the ClinicalBlueprint and CohortPlan nested
dicts (e.g., patient, metadata) are serialized into JSON-safe primitives before
export_blueprints_jsonl() writes the JSONL payload with json.dumps; update the
payload assignment where payload = {"artifact_type":
"casecrawler_clinical_blueprint", "blueprint": ...} and the conditional
payload["cohort_plan"] = ... accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: fde7eaad-82a1-4cad-8181-4cbba855030e
📒 Files selected for processing (4)
- src/casecrawler/cli.py
- src/casecrawler/export/blueprints.py
- tests/test_blueprint_export.py
- tests/test_cli_synthetic.py
    blueprints = store.list_blueprints(
        dataset_id=dataset_id,
        cohort_plan_id=cohort_plan_id,
        limit=1_000_000,
    )
Avoid silently truncating exports at one million rows.
DatasetStore.list_blueprints() applies this limit directly in SQL, so this command will export only the first 1,000,000 matches and still report success. For an export path, that becomes silent data loss. Please page through results or add an iterator-based store API so large blueprint sets are exported completely.
    count = export_blueprints_jsonl(
        blueprints,
        output,
        plan_lookup=store.get_cohort_plan,
    )
    click.echo(f"Exported {count} blueprint artifact(s) to {output}")
Wrap export write failures in ClickException.
If the output path is unwritable, this command currently bubbles the raw OSError instead of returning a normal CLI error message.
Proposed fix
-    count = export_blueprints_jsonl(
-        blueprints,
-        output,
-        plan_lookup=store.get_cohort_plan,
-    )
+    try:
+        count = export_blueprints_jsonl(
+            blueprints,
+            output,
+            plan_lookup=store.get_cohort_plan,
+        )
+    except OSError as exc:
+        raise click.ClickException(
+            f"Failed to write blueprint export to {output}: {exc}"
+        ) from exc
     click.echo(f"Exported {count} blueprint artifact(s) to {output}")
Summary
- `casecrawler export-blueprints` with dataset and cohort-plan filters
Tests
Summary by CodeRabbit
New Features
- `export-blueprints` CLI command to export clinical blueprint artifacts to JSONL format, with optional filtering by dataset and cohort plan.
Tests