Skip to content

feat(sdk): ReasonerFailed exception for reporting reasoner failure with result#697

Merged
AbirAbbas merged 1 commit into
mainfrom
fix/issue-82-reasoner-failed-status
Jun 29, 2026
Merged

feat(sdk): ReasonerFailed exception for reporting reasoner failure with result#697
AbirAbbas merged 1 commit into
mainfrom
fix/issue-82-reasoner-failed-status

Conversation

@AbirAbbas

Copy link
Copy Markdown
Contributor

Summary

The async execution handler records an execution as succeeded whenever the reasoner returns a value — it never inspects the result. A reasoner whose own payload says success: False (e.g. a SWE-AF build that completed zero issues and merged nothing) therefore surfaces as green, which is easy to act on incorrectly.

This adds ReasonerFailed, raised inside a reasoner to report that the work ran but failed. The handler maps it to status="failed" while still posting the structured result, so the control plane — which stores the result payload regardless of terminal status — keeps the rich outcome (debt, DAG state, any PR opened) instead of just a bare error string.

This is the SDK half of Agent-Field/SWE-AF#82 (Gap 2). The SWE-AF side raises ReasonerFailed when a build ships nothing.

Changes Made

  • exceptions.py: new ReasonerFailed(AgentFieldError) carrying message + optional result + error_details; exported.
  • __init__.py: export ReasonerFailed (import + __all__).
  • agent.py: in _execute_async_with_callback's failure path, when the raised exception is a ReasonerFailed with a non-None result, attach jsonable_encoder(result) to the status="failed" payload. error_details already flows through the existing generic path.

Test Plan

  • New unit tests in tests/test_async_execution.py:
    • ReasonerFailed(result=...) → status callback is failed, error/error_details set, and result preserved.
    • Plain Exceptionfailed with no result key (regression guard — result-preservation must not leak into the ordinary failure path).
  • ruff check . clean (full sdk/python).
  • Full SDK pytest suite passes.

Compatibility

Purely additive — existing reasoners that return values are unchanged. Only reasoners that explicitly raise ReasonerFailed get the new failed-with-result behavior.

🤖 Generated with Claude Code

…with result

The async execution handler records an execution as `succeeded` whenever the
reasoner returns a value — it never inspects the result. A reasoner whose own
payload says `success: False` (e.g. a build that completed zero issues and
merged nothing) therefore surfaces as green, which is easy to act on
incorrectly.

Add `ReasonerFailed`, raised inside a reasoner to report that the work ran but
failed. The handler maps it to `status="failed"` while still posting the
structured `result`, so the control plane (which stores the result payload
regardless of terminal status) keeps the rich outcome — debt, DAG state, any
PR opened — instead of just a bare error string. error_details is carried
through the existing generic path.

Refs Agent-Field/SWE-AF#82 (Gap 2, SDK half).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@santoshkumarradha

Copy link
Copy Markdown
Member

@AbirAbbas do we need to replicate for tsx/go or nay ?

@github-actions

Copy link
Copy Markdown
Contributor

Performance

SDK Memory Δ Latency Δ Tests Status
Python 9.4 KB +4% 0.32 µs -9%

✓ No regressions detected

@github-actions

Copy link
Copy Markdown
Contributor

📊 Coverage gate

Thresholds from .coverage-gate.toml: per-surface ≥ 84%, aggregate ≥ 85%, max per-surface regression ≤ 1.0 pp, max aggregate regression ≤ 0.50 pp.

Surface Current Baseline Δ
control-plane 87.00% 87.40% ↓ -0.40 pp 🟡
sdk-go 91.80% 92.00% ↓ -0.20 pp 🟢
sdk-python 93.87% 93.73% ↑ +0.14 pp 🟢
sdk-typescript 90.05% 90.42% ↓ -0.37 pp 🟢
web-ui 84.83% 84.79% ↑ +0.04 pp 🟡
aggregate 85.63% 85.75% ↓ -0.12 pp 🟡

✅ Gate passed

No surface regressed past the allowed threshold and the aggregate stayed above the floor.

@github-actions

Copy link
Copy Markdown
Contributor

📐 Patch coverage gate

Threshold: 80% on lines this PR touches vs origin/main (from .coverage-gate.toml:thresholds.min_patch).

Surface Touched lines Patch coverage Status
control-plane 0 ➖ no changes
sdk-go 0 ➖ no changes
sdk-python 0 ➖ no changes
sdk-typescript 0 ➖ no changes
web-ui 0 ➖ no changes

✅ Patch gate passed

Every surface whose lines were touched by this PR has patch coverage at or above the threshold.

@santoshkumarradha santoshkumarradha left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Read through the SDK-side failure-path change and reran uv run --extra dev pytest tests/test_async_execution.py -q plus a quick compile check locally. The new ReasonerFailed path keeps the structured result on failed async executions without changing the plain exception path, and the added tests cover both cases well.

@AbirAbbas AbirAbbas added this pull request to the merge queue Jun 29, 2026
Merged via the queue into main with commit 491460d Jun 29, 2026
32 checks passed
@AbirAbbas AbirAbbas deleted the fix/issue-82-reasoner-failed-status branch June 29, 2026 17:49
AbirAbbas added a commit to Agent-Field/SWE-AF that referenced this pull request Jun 29, 2026
0.1.96 ships ReasonerFailed (Agent-Field/agentfield#697), which build() raises
so an empty build reports `failed` (not `succeeded`) with its result preserved
(#82 Gap 2). Bump the floor in all three dependency declarations — pip/Docker
layer caching keys off the constraint string, so the floor must change to pull
the new release rather than restore a cached pre-0.1.96 layer.

With the floor satisfied the defensive ReasonerFailed import binds the real SDK
class (verified: swe_af.app.ReasonerFailed resolves to agentfield.exceptions),
so the failed-status callback now preserves the structured BuildResult.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
AbirAbbas added a commit to Agent-Field/SWE-AF that referenced this pull request Jun 29, 2026
… empty builds (#82) (#84)

* fix(build): add ephemeral build-db Postgres + DATABASE_URL_TEST wiring

DB-forcing integration checks (e.g. REQUIRE_TEST_DB=1 npm run test:integration)
had no reachable database in the build container, so they failed at the
connection layer (pg-pool ECONNREFUSED) and could never go green. For a
foundation issue this fails unrecoverably and cascades — every dependent issue
is skipped, nothing merges, and the build comes back empty.

Add a generic, ephemeral postgres:16 `build-db` service to both compose files
and wire DATABASE_URL_TEST into swe-agent and swe-fast (gated on a healthy DB).
It ships empty and the target repo's own integration global-setup migrates it,
so the factory stays repo-agnostic. Throwaway by design (no volume); one shared
DB means DB-dependent builds should run one at a time.

Closes part of #82 (Gap 1).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* fix(build): report failed (not succeeded) for an empty build

A build that completes zero issues and merges nothing returned a BuildResult
with success=False — but returning any value makes the control plane record the
execution as `succeeded` (the async SDK handler only distinguishes return from
raise). So a fully-failed build (e.g. a foundation issue fails and cascades,
leaving only a cosmetic .gitignore diff) read as green.

Track a high-water mark of work shipped (ever_completed/ever_merged) across the
original run and every fix cycle — the verify/fix loop reassigns dag_result, so
the final state alone can't tell whether the build ever merged real code. When
verification failed AND nothing was ever completed or merged, raise
ReasonerFailed (carrying the BuildResult) so the execution reports `failed`
while preserving the structured result. Partial builds that shipped at least one
issue or branch still return normally.

ReasonerFailed is imported defensively so SWE-AF stays importable against
agentfield SDKs that predate it (the shim still flips status to failed via the
generic exception path; result-preservation needs the SDK release).

Closes part of #82 (Gap 2).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* chore(deps): require agentfield>=0.1.96 for ReasonerFailed

0.1.96 ships ReasonerFailed (Agent-Field/agentfield#697), which build() raises
so an empty build reports `failed` (not `succeeded`) with its result preserved
(#82 Gap 2). Bump the floor in all three dependency declarations — pip/Docker
layer caching keys off the constraint string, so the floor must change to pull
the new release rather than restore a cached pre-0.1.96 layer.

With the floor satisfied the defensive ReasonerFailed import binds the real SDK
class (verified: swe_af.app.ReasonerFailed resolves to agentfield.exceptions),
so the failed-status callback now preserves the structured BuildResult.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants