Add blueprint generation API endpoint by txmed82 · Pull Request #377 · txmed82/case-crawler

txmed82 · 2026-06-04T00:34:15Z

Summary

- Add POST /api/datasets/blueprints/generate for model-driven BlueprintGenerationRequest payloads
- Wire the endpoint to BlueprintPipeline with generated dataset ids, persistence, max-count guard, and returned plan/blueprints
- Add API tests for request forwarding and count validation without live LLM calls

Tests

.venv/bin/python -m pytest tests/test_api_datasets.py::test_generate_blueprint_dataset_api_uses_model_driven_request tests/test_api_datasets.py::test_generate_blueprint_dataset_api_rejects_unbounded_counts -q
.venv/bin/python -m ruff check src/casecrawler/api/routes/datasets.py tests/test_api_datasets.py
.venv/bin/python -m pytest tests/test_api_datasets.py tests/test_blueprint_pipeline.py -q
.venv/bin/python -m pytest -q -m "not optional_backend and not network and not slow"

Summary by CodeRabbit

New Features
- Added API endpoint to generate blueprint-backed datasets with automatic dataset identification and configuration support.
Tests
- Added test coverage for the new dataset generation endpoint, validating request handling and configuration constraints.

coderabbitai · 2026-06-04T00:34:24Z

📝 Walkthrough


This PR adds a new FastAPI endpoint POST /datasets/blueprints/generate that enables blueprint-backed dataset generation. The route validates request bounds against configured limits, generates a unique dataset ID, invokes BlueprintPipeline, handles errors as HTTP 422, truncates results, and returns dataset/plan/blueprint identifiers. Tests validate endpoint behavior and error handling.
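The bounds check and truncation steps named in the walkthrough can be sketched without FastAPI. This is a minimal illustration, not the handler from the PR: `ApiLimits`, `CountOutOfBounds`, `check_target_count`, and `truncate_blueprints` are hypothetical names, the default limit values are made up, and the lower bound of 1 is an assumption. Only `max_api_generation_count` and `max_api_returned_records` come from the walkthrough.

```python
from dataclasses import dataclass


class CountOutOfBounds(ValueError):
    """Hypothetical error; the real route reports this case as HTTP 422."""


@dataclass(frozen=True)
class ApiLimits:
    # Field names follow the walkthrough; the default values are made up.
    max_api_generation_count: int = 50
    max_api_returned_records: int = 10


def check_target_count(target_count: int, limits: ApiLimits) -> int:
    """Reject counts outside 1..max_api_generation_count (lower bound assumed)."""
    if not 1 <= target_count <= limits.max_api_generation_count:
        raise CountOutOfBounds(
            f"target_count must be between 1 and {limits.max_api_generation_count}"
        )
    return target_count


def truncate_blueprints(blueprints: list, limits: ApiLimits) -> list:
    """Return at most max_api_returned_records items; the input list is not modified."""
    return blueprints[: limits.max_api_returned_records]
```

Truncation only limits what the response carries. Per the walkthrough, the pipeline persists its results through the store before the handler truncates.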

Changes

Blueprint Dataset Generation API

Layer / File(s)	Summary
Route handler and imports `src/casecrawler/api/routes/datasets.py`	Imports `uuid4` and `BlueprintPipeline`, implements `POST /datasets/blueprints/generate` to validate `target_count` bounds, create a unique dataset ID, invoke the blueprint pipeline with a shared `DatasetStore`, map `ValueError` to HTTP 422, truncate blueprints to `max_api_returned_records`, and return dataset/plan/blueprint identifiers.
Test helper and endpoint validation `tests/test_api_datasets.py`	Imports blueprint model types, adds `_blueprint_api_result()` helper to construct `BlueprintPipelineResult` with populated plans and blueprints, then tests that the endpoint processes model-driven requests correctly, passes `DatasetStore` to the pipeline, and rejects unbounded `target_count` values with HTTP 422.
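One way to test request forwarding without a live LLM call is to replace the pipeline with a stub that records its arguments. The sketch below uses only the standard library. `generate_route` and `run_forwarding_check` are hypothetical stand-ins, the `SimpleNamespace` objects are placeholders for `BlueprintGenerationRequest` and `BlueprintPipelineResult`, and none of this is the test code from `tests/test_api_datasets.py`.

```python
import asyncio
from types import SimpleNamespace
from unittest.mock import AsyncMock
from uuid import uuid4


async def generate_route(req, pipeline, store):
    """Hypothetical stand-in for the route body: forward req, a fresh id, and the store."""
    dataset_id = f"blueprint-ds-{uuid4()}"
    result = await pipeline.generate(req, dataset_id=dataset_id, store=store)
    return {"dataset_id": result.dataset_id, "total_blueprints": len(result.blueprints)}


def run_forwarding_check():
    store = object()  # placeholder for the shared DatasetStore
    req = SimpleNamespace(target_count=2)  # placeholder for BlueprintGenerationRequest

    async def fake_generate(request, *, dataset_id, store):
        # Echo the id back and return canned blueprints instead of calling a model.
        return SimpleNamespace(dataset_id=dataset_id, blueprints=["bp-1", "bp-2"])

    pipeline = SimpleNamespace(generate=AsyncMock(side_effect=fake_generate))
    body = asyncio.run(generate_route(req, pipeline, store))

    # The stub records how it was awaited, so forwarding can be asserted directly.
    pipeline.generate.assert_awaited_once()
    call = pipeline.generate.await_args
    assert call.args[0] is req
    assert call.kwargs["store"] is store
    return body, call
```

Calling `run_forwarding_check()` returns the response body and the recorded call, so a test can also compare the forwarded `dataset_id` against the one in the response.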

Sequence Diagram

sequenceDiagram
  participant Client
  participant FastAPI as POST /datasets/blueprints/generate
  participant BlueprintPipeline
  participant DatasetStore
  Client->>FastAPI: BlueprintGenerationRequest (target_count)
  FastAPI->>FastAPI: Validate target_count ≤ max_api_generation_count
  FastAPI->>FastAPI: Generate blueprint-ds-{uuid}
  FastAPI->>BlueprintPipeline: generate(request, DatasetStore)
  BlueprintPipeline->>DatasetStore: Persist results
  BlueprintPipeline-->>FastAPI: BlueprintPipelineResult
  FastAPI->>FastAPI: Truncate blueprints to max_api_returned_records
  FastAPI-->>Client: { dataset_id, plan, blueprints }

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

txmed82/case-crawler#373: The new POST /datasets/blueprints/generate endpoint directly depends on the BlueprintGenerationRequest contract introduced in #373, sharing the same request payload model and validation logic.
txmed82/case-crawler#376: The new endpoint directly invokes the BlueprintPipeline.generate() API and consumes the BlueprintPipelineResult introduced in #376.
txmed82/case-crawler#371: The test helper and endpoint responses depend on the blueprint planning model types (CohortPlan, ClinicalBlueprint) introduced in #371.

Poem

🐰 A blueprint springs to life so bright,
The datasets dance with data's might,
UUIDs bloom in garden rows,
And pipelines hum where knowledge flows,
New endpoints hop—the API grows! 🌱

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately summarizes the main change: adding a new blueprint generation API endpoint, which is the primary focus of both the code and test changes.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/casecrawler/api/routes/datasets.py`:
- Around line 213-225: The response currently returns result.dataset_id which
may diverge from the route-generated dataset_id; update the handler so that
after calling BlueprintPipeline().generate(...) you always use the locally
created dataset_id variable in the response (e.g., replace uses of
result.dataset_id with dataset_id) while keeping the pipeline call and error
handling intact—ensure BlueprintPipeline.generate(...) and any references to
returned_blueprints remain unchanged except for switching the response's
dataset_id to the route-generated dataset_id.


ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 4d9f7510-9f4c-44f2-ace6-8c4f2d93b30f

📥 Commits

Reviewing files that changed from the base of the PR and between 09ef5de and 5645192.

📒 Files selected for processing (2)

src/casecrawler/api/routes/datasets.py
tests/test_api_datasets.py

coderabbitai · 2026-06-04T00:38:07Z

+    dataset_id = f"blueprint-ds-{uuid4()}"
+    store = DatasetStore.shared()
+    try:
+        result = await BlueprintPipeline().generate(
+            req,
+            dataset_id=dataset_id,
+            store=store,
+        )
+    except ValueError as err:
+        raise HTTPException(status_code=422, detail=str(err)) from err
+    returned_blueprints = result.blueprints[:max_returned]
+    return {
+        "dataset_id": result.dataset_id,


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Return the route-generated dataset ID, not the pipeline-reported one.

This handler creates dataset_id on Line 213 and uses that ID for pipeline/store side effects, but the response on Line 225 trusts result.dataset_id. If those ever diverge, clients get an ID that does not match the persisted dataset and follow-up reads will break.

Suggested fix

     dataset_id = f"blueprint-ds-{uuid4()}"
     store = DatasetStore.shared()
     try:
         result = await BlueprintPipeline().generate(
             req,
             dataset_id=dataset_id,
             store=store,
         )
     except ValueError as err:
         raise HTTPException(status_code=422, detail=str(err)) from err
+    if result.dataset_id != dataset_id:
+        raise HTTPException(
+            status_code=500,
+            detail="blueprint pipeline returned a mismatched dataset_id",
+        )
     returned_blueprints = result.blueprints[:max_returned]
     return {
-        "dataset_id": result.dataset_id,
+        "dataset_id": dataset_id,
         "generated": result.generated_count,
         "total_blueprints": len(result.blueprints),
         "plan": result.plan.model_dump(),
         "blueprints": [blueprint.model_dump() for blueprint in returned_blueprints],
     }
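The mismatch guard from the suggested fix can be exercised in isolation. The following is a rough sketch under stated assumptions: `generate_with_guard`, `MismatchedDatasetId`, `honest_pipeline`, and `drifting_pipeline` are hypothetical names, and a plain exception stands in for the `HTTPException(status_code=500)` that the suggestion raises.

```python
import asyncio
from types import SimpleNamespace
from uuid import uuid4


class MismatchedDatasetId(RuntimeError):
    """Hypothetical error; the suggested fix raises HTTPException(status_code=500)."""


async def generate_with_guard(pipeline_generate, req):
    # The route owns the id, so the response always reports this value.
    dataset_id = f"blueprint-ds-{uuid4()}"
    result = await pipeline_generate(req, dataset_id=dataset_id)
    if result.dataset_id != dataset_id:
        raise MismatchedDatasetId("blueprint pipeline returned a mismatched dataset_id")
    return {"dataset_id": dataset_id}


async def honest_pipeline(req, *, dataset_id):
    # Stub that echoes the id it was given.
    return SimpleNamespace(dataset_id=dataset_id)


async def drifting_pipeline(req, *, dataset_id):
    # Stub that reports a different id than the one it was given.
    return SimpleNamespace(dataset_id="some-other-id")
```

With `honest_pipeline` the call returns the route-generated ID. With `drifting_pipeline` it fails loudly instead of handing the client an ID that does not match the persisted dataset.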



Add blueprint generation API

5645192

coderabbitai Bot reviewed Jun 4, 2026


txmed82 merged commit 1a4f903 into master Jun 4, 2026
4 checks passed

txmed82 deleted the codex/blueprint-api branch June 4, 2026 00:48

coderabbitai Bot mentioned this pull request Jun 4, 2026

Add blueprint generation CLI #378

Merged
