Add blueprint generation API endpoint #377

Merged
txmed82 merged 1 commit into master from codex/blueprint-api on Jun 4, 2026

@txmed82 txmed82 commented Jun 4, 2026

Owner

Summary

  • Add POST /api/datasets/blueprints/generate for model-driven BlueprintGenerationRequest payloads
  • Wire the endpoint to BlueprintPipeline with generated dataset ids, persistence, max-count guard, and returned plan/blueprints
  • Add API tests for request forwarding and count validation without live LLM calls
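The max-count guard mentioned above can be sketched roughly as follows. This is a minimal illustration and not the merged code: `ApiError` and `ApiLimits` are hypothetical stand-ins (the real route raises FastAPI's `HTTPException`), the limit value of 50 is invented, and only the setting name `max_api_generation_count` comes from the review walkthrough.

```python
from dataclasses import dataclass


class ApiError(Exception):
    """Hypothetical stand-in for fastapi.HTTPException."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


@dataclass(frozen=True)
class ApiLimits:
    # Setting name taken from the walkthrough; the default value is an assumption.
    max_api_generation_count: int = 50


def check_target_count(target_count: int, limits: ApiLimits) -> None:
    """Reject requests that ask for more blueprints than the API allows."""
    if target_count > limits.max_api_generation_count:
        raise ApiError(
            status_code=422,
            detail=(
                f"target_count {target_count} exceeds the API limit of "
                f"{limits.max_api_generation_count}"
            ),
        )
```

Running the check before any pipeline work means an oversized request fails fast, without spending an LLM call.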

Tests

  • .venv/bin/python -m pytest tests/test_api_datasets.py::test_generate_blueprint_dataset_api_uses_model_driven_request tests/test_api_datasets.py::test_generate_blueprint_dataset_api_rejects_unbounded_counts -q
  • .venv/bin/python -m ruff check src/casecrawler/api/routes/datasets.py tests/test_api_datasets.py
  • .venv/bin/python -m pytest tests/test_api_datasets.py tests/test_blueprint_pipeline.py -q
  • .venv/bin/python -m pytest -q -m "not optional_backend and not network and not slow"

Summary by CodeRabbit

  • New Features

    • Added API endpoint to generate blueprint-backed datasets with automatic dataset identification and configuration support.
  • Tests

    • Added test coverage for the new dataset generation endpoint, validating request handling and configuration constraints.

coderabbitai Bot commented Jun 4, 2026

Walkthrough

This PR adds a new FastAPI endpoint POST /datasets/blueprints/generate that enables blueprint-backed dataset generation. The route validates request bounds against configured limits, generates a unique dataset ID, invokes BlueprintPipeline, handles errors as HTTP 422, truncates results, and returns dataset/plan/blueprint identifiers. Tests validate endpoint behavior and error handling.

Changes

Blueprint Dataset Generation API

| Layer / File(s) | Summary |
| --- | --- |
| Route handler and imports (src/casecrawler/api/routes/datasets.py) | Imports uuid4 and BlueprintPipeline, implements POST /datasets/blueprints/generate to validate target_count bounds, create a unique dataset ID, invoke the blueprint pipeline with a shared DatasetStore, map ValueError to HTTP 422, truncate blueprints to max_api_returned_records, and return dataset/plan/blueprint identifiers. |
| Test helper and endpoint validation (tests/test_api_datasets.py) | Imports blueprint model types, adds _blueprint_api_result() helper to construct BlueprintPipelineResult with populated plans and blueprints, then tests that the endpoint processes model-driven requests correctly, passes DatasetStore to the pipeline, and rejects unbounded target_count values with HTTP 422. |
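One way to test request forwarding without live LLM calls is to swap the pipeline for a mock and assert on what it was awaited with. The sketch below only illustrates that pattern with the standard library; `handler` is a hypothetical stand-in for the route body, and the actual tests in tests/test_api_datasets.py may be structured differently.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock, sentinel


async def handler(request, pipeline, store):
    """Hypothetical stand-in for the route body: forward the request and store."""
    return await pipeline.generate(
        request, dataset_id="blueprint-ds-test", store=store
    )


def run_forwarding_check():
    # The fake pipeline records the call instead of contacting a model.
    fake_pipeline = MagicMock()
    fake_pipeline.generate = AsyncMock(return_value=sentinel.result)

    result = asyncio.run(handler(sentinel.request, fake_pipeline, sentinel.store))

    # Fails with AssertionError if the request or store was not passed through.
    fake_pipeline.generate.assert_awaited_once_with(
        sentinel.request, dataset_id="blueprint-ds-test", store=sentinel.store
    )
    return result
```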

Sequence Diagram

sequenceDiagram
  participant Client
  participant FastAPI as POST /datasets/blueprints/generate
  participant BlueprintPipeline
  participant DatasetStore
  Client->>FastAPI: BlueprintGenerationRequest (target_count)
  FastAPI->>FastAPI: Validate target_count ≤ max_api_generation_count
  FastAPI->>FastAPI: Generate blueprint-ds-{uuid}
  FastAPI->>BlueprintPipeline: generate(request, DatasetStore)
  BlueprintPipeline->>DatasetStore: Persist results
  BlueprintPipeline-->>FastAPI: BlueprintPipelineResult
  FastAPI->>FastAPI: Truncate blueprints to max_api_returned_records
  FastAPI-->>Client: { dataset_id, plan, blueprints }
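The steps in the diagram that follow validation (ID generation, persistence, truncation) can be sketched as below. Everything here is an assumption made for illustration: `FakePipeline` and `FakeResult` stand in for BlueprintPipeline and BlueprintPipelineResult, a plain dict stands in for DatasetStore, and the `max_returned` default is invented.

```python
import asyncio
from dataclasses import dataclass, field
from uuid import uuid4


@dataclass
class FakeResult:
    """Hypothetical stand-in for BlueprintPipelineResult."""

    dataset_id: str
    plan: dict
    blueprints: list = field(default_factory=list)


class FakePipeline:
    """Hypothetical stand-in for BlueprintPipeline; makes no LLM calls."""

    async def generate(self, request: dict, *, dataset_id: str, store: dict) -> FakeResult:
        blueprints = [{"index": i} for i in range(request["target_count"])]
        store[dataset_id] = blueprints  # the "Persist results" step
        return FakeResult(dataset_id, {"size": len(blueprints)}, blueprints)


async def generate_blueprint_dataset(request: dict, store: dict, *, max_returned: int = 3) -> dict:
    dataset_id = f"blueprint-ds-{uuid4()}"  # route-generated ID
    result = await FakePipeline().generate(request, dataset_id=dataset_id, store=store)
    returned = result.blueprints[:max_returned]  # cap the response size only
    return {
        "dataset_id": dataset_id,
        "total_blueprints": len(result.blueprints),
        "plan": result.plan,
        "blueprints": returned,
    }


store: dict = {}
response = asyncio.run(generate_blueprint_dataset({"target_count": 5}, store))
```

In this sketch the truncation affects only the response body; the store still holds all five blueprints, and `total_blueprints` tells the client how many exist beyond the returned slice.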

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

  • txmed82/case-crawler#373: The new POST /datasets/blueprints/generate endpoint directly depends on the BlueprintGenerationRequest contract introduced in this PR, sharing the same request payload model and validation logic.
  • txmed82/case-crawler#376: The new endpoint directly invokes the BlueprintPipeline.generate() API and consumes BlueprintPipelineResult introduced in this PR.
  • txmed82/case-crawler#371: The test helper and endpoint responses depend on blueprint planning model types (CohortPlan, ClinicalBlueprint) introduced in this PR.

Poem

🐰 A blueprint springs to life so bright,
The datasets dance with data's might,
UUID's bloom in garden rows,
And pipelines hum where knowledge flows,
New endpoints hop—the API grows! 🌱

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main change: adding a new blueprint generation API endpoint, which is the primary focus of both the code and test changes. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/casecrawler/api/routes/datasets.py`:
- Around line 213-225: The response currently returns result.dataset_id which
may diverge from the route-generated dataset_id; update the handler so that
after calling BlueprintPipeline().generate(...) you always use the locally
created dataset_id variable in the response (e.g., replace uses of
result.dataset_id with dataset_id) while keeping the pipeline call and error
handling intact—ensure BlueprintPipeline.generate(...) and any references to
returned_blueprints remain unchanged except for switching the response's
dataset_id to the route-generated dataset_id.
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 4d9f7510-9f4c-44f2-ace6-8c4f2d93b30f

📥 Commits

Reviewing files that changed from the base of the PR and between 09ef5de and 5645192.

📒 Files selected for processing (2)
  • src/casecrawler/api/routes/datasets.py
  • tests/test_api_datasets.py

Comment on lines +213 to +225
    dataset_id = f"blueprint-ds-{uuid4()}"
    store = DatasetStore.shared()
    try:
        result = await BlueprintPipeline().generate(
            req,
            dataset_id=dataset_id,
            store=store,
        )
    except ValueError as err:
        raise HTTPException(status_code=422, detail=str(err)) from err
    returned_blueprints = result.blueprints[:max_returned]
    return {
        "dataset_id": result.dataset_id,

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Return the route-generated dataset ID, not the pipeline-reported one.

This handler creates dataset_id on Line 213 and uses that ID for pipeline/store side effects, but the response on Line 225 trusts result.dataset_id. If those ever diverge, clients get an ID that does not match the persisted dataset and follow-up reads will break.

Suggested fix
     dataset_id = f"blueprint-ds-{uuid4()}"
     store = DatasetStore.shared()
     try:
         result = await BlueprintPipeline().generate(
             req,
             dataset_id=dataset_id,
             store=store,
         )
     except ValueError as err:
         raise HTTPException(status_code=422, detail=str(err)) from err
+    if result.dataset_id != dataset_id:
+        raise HTTPException(
+            status_code=500,
+            detail="blueprint pipeline returned a mismatched dataset_id",
+        )
     returned_blueprints = result.blueprints[:max_returned]
     return {
-        "dataset_id": result.dataset_id,
+        "dataset_id": dataset_id,
         "generated": result.generated_count,
         "total_blueprints": len(result.blueprints),
         "plan": result.plan.model_dump(),
         "blueprints": [blueprint.model_dump() for blueprint in returned_blueprints],
     }
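A regression check for the suggested guard might look like the sketch below. It is a simplified illustration under stated assumptions: `ApiError` is a hypothetical stand-in for `HTTPException`, the pipeline result is modelled as a plain dict, and `build_response` and `mismatch_status` are invented helper names.

```python
class ApiError(Exception):
    """Hypothetical stand-in for fastapi.HTTPException."""

    def __init__(self, status_code: int, detail: str) -> None:
        super().__init__(detail)
        self.status_code = status_code
        self.detail = detail


def build_response(route_dataset_id: str, result: dict) -> dict:
    """Mirror the suggested guard: fail loudly if the pipeline reports another ID."""
    if result["dataset_id"] != route_dataset_id:
        raise ApiError(500, "blueprint pipeline returned a mismatched dataset_id")
    return {
        "dataset_id": route_dataset_id,
        "total_blueprints": len(result["blueprints"]),
    }


def mismatch_status(route_dataset_id: str, result: dict) -> int:
    """Return the status a client would see: 200 on success, else the error code."""
    try:
        build_response(route_dataset_id, result)
    except ApiError as err:
        return err.status_code
    return 200
```

Calling `mismatch_status("blueprint-ds-a", {"dataset_id": "blueprint-ds-b", "blueprints": []})` yields 500, while matching IDs yield 200.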
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    dataset_id = f"blueprint-ds-{uuid4()}"
-    store = DatasetStore.shared()
-    try:
-        result = await BlueprintPipeline().generate(
-            req,
-            dataset_id=dataset_id,
-            store=store,
-        )
-    except ValueError as err:
-        raise HTTPException(status_code=422, detail=str(err)) from err
-    returned_blueprints = result.blueprints[:max_returned]
-    return {
-        "dataset_id": result.dataset_id,
+    dataset_id = f"blueprint-ds-{uuid4()}"
+    store = DatasetStore.shared()
+    try:
+        result = await BlueprintPipeline().generate(
+            req,
+            dataset_id=dataset_id,
+            store=store,
+        )
+    except ValueError as err:
+        raise HTTPException(status_code=422, detail=str(err)) from err
+    if result.dataset_id != dataset_id:
+        raise HTTPException(
+            status_code=500,
+            detail="blueprint pipeline returned a mismatched dataset_id",
+        )
+    returned_blueprints = result.blueprints[:max_returned]
+    return {
+        "dataset_id": dataset_id,

@txmed82 txmed82 merged commit 1a4f903 into master Jun 4, 2026
4 checks passed
@txmed82 txmed82 deleted the codex/blueprint-api branch June 4, 2026 00:48