feat(sdk): ReasonerFailed exception for reporting reasoner failure with result #697
…with result

The async execution handler records an execution as `succeeded` whenever the reasoner returns a value; it never inspects the result. A reasoner whose own payload says `success: False` (e.g. a build that completed zero issues and merged nothing) therefore surfaces as green, which is easy to act on incorrectly.

Add `ReasonerFailed`, raised inside a reasoner to report that the work ran but failed. The handler maps it to `status="failed"` while still posting the structured `result`, so the control plane (which stores the result payload regardless of terminal status) keeps the rich outcome (debt, DAG state, any PR opened) instead of just a bare error string. `error_details` is carried through the existing generic path.

Refs Agent-Field/SWE-AF#82 (Gap 2, SDK half).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@AbirAbbas do we need to replicate for tsx/go or nay?
Performance
✓ No regressions detected
📊 Coverage gate
Thresholds from
✅ Gate passed
No surface regressed past the allowed threshold and the aggregate stayed above the floor.
📐 Patch coverage gate
Threshold: 80% on lines this PR touches vs
✅ Patch gate passed
Every surface whose lines were touched by this PR has patch coverage at or above the threshold.
santoshkumarradha
left a comment
Read through the SDK-side failure-path change and reran `uv run --extra dev pytest tests/test_async_execution.py -q` plus a quick compile check locally. The new `ReasonerFailed` path keeps the structured result on failed async executions without changing the plain exception path, and the added tests cover both cases well.
0.1.96 ships ReasonerFailed (Agent-Field/agentfield#697), which build() raises so an empty build reports `failed` (not `succeeded`) with its result preserved (#82 Gap 2). Bump the floor in all three dependency declarations — pip/Docker layer caching keys off the constraint string, so the floor must change to pull the new release rather than restore a cached pre-0.1.96 layer. With the floor satisfied the defensive ReasonerFailed import binds the real SDK class (verified: swe_af.app.ReasonerFailed resolves to agentfield.exceptions), so the failed-status callback now preserves the structured BuildResult. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… empty builds (#82) (#84)

* fix(build): add ephemeral build-db Postgres + DATABASE_URL_TEST wiring

DB-forcing integration checks (e.g. `REQUIRE_TEST_DB=1 npm run test:integration`) had no reachable database in the build container, so they failed at the connection layer (pg-pool ECONNREFUSED) and could never go green. For a foundation issue this fails unrecoverably and cascades: every dependent issue is skipped, nothing merges, and the build comes back empty.

Add a generic, ephemeral postgres:16 `build-db` service to both compose files and wire DATABASE_URL_TEST into swe-agent and swe-fast (gated on a healthy DB). It ships empty and the target repo's own integration global-setup migrates it, so the factory stays repo-agnostic. Throwaway by design (no volume); one shared DB means DB-dependent builds should run one at a time.

Closes part of #82 (Gap 1).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(build): report failed (not succeeded) for an empty build

A build that completes zero issues and merges nothing returned a BuildResult with success=False, but returning any value makes the control plane record the execution as `succeeded` (the async SDK handler only distinguishes return from raise). So a fully-failed build (e.g. a foundation issue fails and cascades, leaving only a cosmetic .gitignore diff) read as green.

Track a high-water mark of work shipped (ever_completed/ever_merged) across the original run and every fix cycle. The verify/fix loop reassigns dag_result, so the final state alone can't tell whether the build ever merged real code. When verification failed AND nothing was ever completed or merged, raise ReasonerFailed (carrying the BuildResult) so the execution reports `failed` while preserving the structured result. Partial builds that shipped at least one issue or branch still return normally.

ReasonerFailed is imported defensively so SWE-AF stays importable against agentfield SDKs that predate it (the shim still flips status to failed via the generic exception path; result-preservation needs the SDK release).

Closes part of #82 (Gap 2).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore(deps): require agentfield>=0.1.96 for ReasonerFailed

0.1.96 ships ReasonerFailed (Agent-Field/agentfield#697), which build() raises so an empty build reports `failed` (not `succeeded`) with its result preserved (#82 Gap 2). Bump the floor in all three dependency declarations: pip/Docker layer caching keys off the constraint string, so the floor must change to pull the new release rather than restore a cached pre-0.1.96 layer.

With the floor satisfied the defensive ReasonerFailed import binds the real SDK class (verified: swe_af.app.ReasonerFailed resolves to agentfield.exceptions), so the failed-status callback now preserves the structured BuildResult.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
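The SWE-AF commits above describe two mechanisms: a defensive import of `ReasonerFailed` and a high-water mark over every build cycle. A minimal sketch of how they might fit together follows. `CycleOutcome`, `finish_build`, and the fallback shim's constructor are hypothetical names invented for illustration; the actual SWE-AF code is not shown in this thread and may differ.

```python
from dataclasses import dataclass
from typing import Any, Iterable

try:
    # Real class once agentfield>=0.1.96 is installed.
    from agentfield.exceptions import ReasonerFailed
except ImportError:
    class ReasonerFailed(Exception):
        """Fallback shim (assumed shape) for SDKs that predate ReasonerFailed.

        Raising it still fails the execution via the generic exception path,
        but an older SDK will not forward `result`.
        """

        def __init__(self, message: str, result: Any = None, error_details: Any = None):
            super().__init__(message)
            self.result = result
            self.error_details = error_details


@dataclass
class CycleOutcome:
    """Hypothetical summary of one run or fix cycle."""
    completed_issues: int
    merged_branches: int


def finish_build(cycles: Iterable[CycleOutcome], verification_passed: bool, build_result: Any) -> Any:
    """Return the result normally unless the build never shipped anything."""
    ever_completed = False
    ever_merged = False
    # High-water mark: later cycles may overwrite the DAG state, so remember
    # whether ANY cycle completed an issue or merged a branch.
    for cycle in cycles:
        ever_completed = ever_completed or cycle.completed_issues > 0
        ever_merged = ever_merged or cycle.merged_branches > 0

    if not verification_passed and not (ever_completed or ever_merged):
        raise ReasonerFailed(
            "build completed no issues and merged nothing",
            result=build_result,
        )
    return build_result
```

In this sketch a partial build (at least one completed issue or merged branch in any cycle) still returns its result even when verification failed, matching the behavior the commit message describes.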
Summary
The async execution handler records an execution as `succeeded` whenever the reasoner returns a value; it never inspects the result. A reasoner whose own payload says `success: False` (e.g. a SWE-AF build that completed zero issues and merged nothing) therefore surfaces as green, which is easy to act on incorrectly.

This adds `ReasonerFailed`, raised inside a reasoner to report that the work ran but failed. The handler maps it to `status="failed"` while still posting the structured `result`, so the control plane, which stores the result payload regardless of terminal status, keeps the rich outcome (debt, DAG state, any PR opened) instead of just a bare error string.

This is the SDK half of Agent-Field/SWE-AF#82 (Gap 2). The SWE-AF side raises `ReasonerFailed` when a build ships nothing.

Changes Made
- `exceptions.py`: new `ReasonerFailed(AgentFieldError)` carrying `message` + optional `result` + `error_details`; exported.
- `__init__.py`: export `ReasonerFailed` (import + `__all__`).
- `agent.py`: in `_execute_async_with_callback`'s failure path, when the raised exception is a `ReasonerFailed` with a non-`None` `result`, attach `jsonable_encoder(result)` to the `status="failed"` payload. `error_details` already flows through the existing generic path.

Test Plan
- `tests/test_async_execution.py`: `ReasonerFailed(result=...)` → status callback is `failed`, `error`/`error_details` set, and `result` preserved.
- `Exception` → `failed` with no `result` key (regression guard: result-preservation must not leak into the ordinary failure path).
- `ruff check .` clean (full `sdk/python`).

Compatibility
Purely additive: existing reasoners that return values are unchanged. Only reasoners that explicitly `raise ReasonerFailed` get the new failed-with-result behavior.

🤖 Generated with Claude Code
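To make the failure-path behavior concrete, here is a rough, self-contained sketch of the exception and of how a failed-status payload might be assembled. It is not the SDK's code: `AgentFieldError` is a local stand-in, `build_failure_payload` is a hypothetical helper, the `getattr` lookup for `error_details` is an assumption about the generic path, and the real handler applies `jsonable_encoder` to the result where this sketch passes it through unchanged.

```python
from typing import Any, Optional


class AgentFieldError(Exception):
    """Local stand-in for the SDK's base error class (assumed shape)."""


class ReasonerFailed(AgentFieldError):
    """The reasoner ran to completion, but the work itself failed."""

    def __init__(self, message: str, result: Any = None,
                 error_details: Optional[dict] = None) -> None:
        super().__init__(message)
        self.message = message
        self.result = result
        self.error_details = error_details


def build_failure_payload(exc: Exception) -> dict:
    """Hypothetical helper mirroring the failure path described above."""
    payload = {"status": "failed", "error": str(exc)}

    # Assumed generic path: any exception may carry error_details.
    details = getattr(exc, "error_details", None)
    if details is not None:
        payload["error_details"] = details

    # Only ReasonerFailed with a non-None result keeps its structured result.
    # (The real handler encodes it with jsonable_encoder; omitted here.)
    if isinstance(exc, ReasonerFailed) and exc.result is not None:
        payload["result"] = exc.result
    return payload


failed = build_failure_payload(
    ReasonerFailed("build shipped nothing", result={"success": False})
)
plain = build_failure_payload(ValueError("boom"))
```

Under these assumptions `failed` carries a `result` key while `plain` does not, which is the distinction the regression guard in the test plan is meant to protect.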