Add blueprint generation API endpoint #377
📝 Walkthrough
This PR adds a new FastAPI endpoint
Changes
Blueprint Dataset Generation API
Sequence Diagram
sequenceDiagram
participant Client
participant FastAPI as POST /datasets/blueprints/generate
participant BlueprintPipeline
participant DatasetStore
Client->>FastAPI: BlueprintGenerationRequest (target_count)
FastAPI->>FastAPI: Validate target_count ≤ max_api_generation_count
FastAPI->>FastAPI: Generate blueprint-ds-{uuid}
FastAPI->>BlueprintPipeline: generate(request, DatasetStore)
BlueprintPipeline->>DatasetStore: Persist results
BlueprintPipeline-->>FastAPI: BlueprintPipelineResult
FastAPI->>FastAPI: Truncate blueprints to max_api_returned_records
FastAPI-->>Client: { dataset_id, plan, blueprints }
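The flow in the diagram can be sketched in plain Python as follows. This is a minimal illustration, not the PR's code: `FakePipeline`, `FakePipelineResult`, the dict used as a store, and the two limit values are assumptions standing in for `BlueprintPipeline`, `BlueprintPipelineResult`, `DatasetStore`, and the configured `max_api_generation_count` / `max_api_returned_records`.

```python
import asyncio
from dataclasses import dataclass, field
from uuid import uuid4

# Hypothetical limits; the real values come from the project's configuration.
MAX_API_GENERATION_COUNT = 50
MAX_API_RETURNED_RECORDS = 10


@dataclass
class FakePipelineResult:
    """Stand-in for BlueprintPipelineResult (field names are assumptions)."""
    dataset_id: str
    plan: dict
    blueprints: list = field(default_factory=list)


class FakePipeline:
    """Stand-in for BlueprintPipeline; a plain dict plays the DatasetStore."""

    async def generate(self, target_count, *, dataset_id, store):
        blueprints = [{"index": i} for i in range(target_count)]
        store[dataset_id] = blueprints  # "Persist results" in the diagram
        return FakePipelineResult(dataset_id, {"target_count": target_count}, blueprints)


async def generate_blueprints(target_count, store):
    # 1. Validate target_count against the generation limit.
    if target_count > MAX_API_GENERATION_COUNT:
        raise ValueError(f"target_count must be <= {MAX_API_GENERATION_COUNT}")
    # 2. Generate the dataset ID in the route.
    dataset_id = f"blueprint-ds-{uuid4()}"
    # 3. Run the pipeline, which persists everything it generates.
    result = await FakePipeline().generate(
        target_count, dataset_id=dataset_id, store=store
    )
    # 4. Truncate only the response payload, not the stored dataset.
    returned = result.blueprints[:MAX_API_RETURNED_RECORDS]
    return {"dataset_id": dataset_id, "plan": result.plan, "blueprints": returned}
```

Note that in this sketch truncation affects only what the client receives; the store still holds every generated blueprint.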
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/casecrawler/api/routes/datasets.py`:
- Around line 213-225: The response currently returns result.dataset_id which
may diverge from the route-generated dataset_id; update the handler so that
after calling BlueprintPipeline().generate(...) you always use the locally
created dataset_id variable in the response (e.g., replace uses of
result.dataset_id with dataset_id) while keeping the pipeline call and error
handling intact—ensure BlueprintPipeline.generate(...) and any references to
returned_blueprints remain unchanged except for switching the response's
dataset_id to the route-generated dataset_id.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 4d9f7510-9f4c-44f2-ace6-8c4f2d93b30f
📒 Files selected for processing (2)
src/casecrawler/api/routes/datasets.py
tests/test_api_datasets.py
dataset_id = f"blueprint-ds-{uuid4()}"
store = DatasetStore.shared()
try:
    result = await BlueprintPipeline().generate(
        req,
        dataset_id=dataset_id,
        store=store,
    )
except ValueError as err:
    raise HTTPException(status_code=422, detail=str(err)) from err
returned_blueprints = result.blueprints[:max_returned]
return {
    "dataset_id": result.dataset_id,
Return the route-generated dataset ID, not the pipeline-reported one.
This handler creates dataset_id on Line 213 and uses that ID for pipeline/store side effects, but the response on Line 225 trusts result.dataset_id. If those ever diverge, clients get an ID that does not match the persisted dataset and follow-up reads will break.
Suggested fix
  dataset_id = f"blueprint-ds-{uuid4()}"
  store = DatasetStore.shared()
  try:
      result = await BlueprintPipeline().generate(
          req,
          dataset_id=dataset_id,
          store=store,
      )
  except ValueError as err:
      raise HTTPException(status_code=422, detail=str(err)) from err
+ if result.dataset_id != dataset_id:
+     raise HTTPException(
+         status_code=500,
+         detail="blueprint pipeline returned a mismatched dataset_id",
+     )
  returned_blueprints = result.blueprints[:max_returned]
  return {
-     "dataset_id": result.dataset_id,
+     "dataset_id": dataset_id,
      "generated": result.generated_count,
      "total_blueprints": len(result.blueprints),
      "plan": result.plan.model_dump(),
      "blueprints": [blueprint.model_dump() for blueprint in returned_blueprints],
  }
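One way to exercise the mismatch guard is with stub pipelines, sketched below under stated assumptions. `EchoPipeline`, `DriftingPipeline`, `StubResult`, `handle`, and `PipelineMismatchError` are hypothetical names; the exception stands in for `HTTPException(status_code=500, ...)` so the example runs without FastAPI installed.

```python
import asyncio
from dataclasses import dataclass
from uuid import uuid4


class PipelineMismatchError(RuntimeError):
    """Hypothetical stand-in for HTTPException(status_code=500, ...)."""


@dataclass
class StubResult:
    """Carries only the field the guard inspects."""
    dataset_id: str


class EchoPipeline:
    """Well-behaved stub: reports the ID it was given."""

    async def generate(self, *, dataset_id):
        return StubResult(dataset_id=dataset_id)


class DriftingPipeline:
    """Misbehaving stub: reports a different ID than it was given."""

    async def generate(self, *, dataset_id):
        return StubResult(dataset_id=f"{dataset_id}-renamed")


async def handle(pipeline):
    # The route owns the ID and passes it to the pipeline.
    dataset_id = f"blueprint-ds-{uuid4()}"
    result = await pipeline.generate(dataset_id=dataset_id)
    # Fail loudly rather than return an ID that may not match the stored data.
    if result.dataset_id != dataset_id:
        raise PipelineMismatchError(
            "blueprint pipeline returned a mismatched dataset_id"
        )
    # Always answer with the route-generated ID.
    return {"dataset_id": dataset_id}
```

With `EchoPipeline` the handler returns the route-generated ID; with `DriftingPipeline` it raises instead of handing the client an ID that follow-up reads could not resolve.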