fix(build): provide test DB for DB-forcing checks + report failed for empty builds (#82) #84
Merged
DB-forcing integration checks (e.g. REQUIRE_TEST_DB=1 npm run test:integration) had no reachable database in the build container, so they failed at the connection layer (pg-pool ECONNREFUSED) and could never go green. For a foundation issue this fails unrecoverably and cascades: every dependent issue is skipped, nothing merges, and the build comes back empty.

Add a generic, ephemeral postgres:16 `build-db` service to both compose files and wire DATABASE_URL_TEST into swe-agent and swe-fast (gated on a healthy DB). It ships empty and the target repo's own integration global-setup migrates it, so the factory stays repo-agnostic. Throwaway by design (no volume); one shared DB means DB-dependent builds should run one at a time.

Closes part of #82 (Gap 1).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
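To make the failure mode concrete, here is a minimal sketch of a reachability probe for the database named in `DATABASE_URL_TEST`. It is not part of this PR: the compose healthcheck is what actually gates the build nodes, and the function name `db_is_reachable` is an assumption made for illustration. A plain TCP connect only shows that something is listening on the port, not that Postgres is ready to accept queries.

```python
import socket
from urllib.parse import urlparse


def db_is_reachable(database_url: str, timeout: float = 1.0) -> bool:
    """Return True if the host and port in the URL accept a TCP connection.

    Hypothetical helper: a closed port fails the same way the integration
    checks did (connection refused), which is reported here as False.
    """
    parsed = urlparse(database_url)
    host = parsed.hostname or "localhost"
    port = parsed.port or 5432  # default PostgreSQL port when the URL omits one
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers ConnectionRefusedError (ECONNREFUSED), timeouts, and DNS errors.
        return False
```

A caller could read the URL with `os.environ.get("DATABASE_URL_TEST")` and skip or fail fast when the probe returns False, rather than letting every DB-forcing check fail at connect time.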
A build that completes zero issues and merges nothing returned a BuildResult with success=False, but returning any value makes the control plane record the execution as `succeeded` (the async SDK handler only distinguishes return from raise). So a fully-failed build (e.g. a foundation issue fails and cascades, leaving only a cosmetic .gitignore diff) read as green.

Track a high-water mark of work shipped (ever_completed/ever_merged) across the original run and every fix cycle: the verify/fix loop reassigns dag_result, so the final state alone can't tell whether the build ever merged real code. When verification failed AND nothing was ever completed or merged, raise ReasonerFailed (carrying the BuildResult) so the execution reports `failed` while preserving the structured result. Partial builds that shipped at least one issue or branch still return normally.

ReasonerFailed is imported defensively so SWE-AF stays importable against agentfield SDKs that predate it (the shim still flips status to failed via the generic exception path; result-preservation needs the SDK release).

Closes part of #82 (Gap 2).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
0.1.96 ships ReasonerFailed (Agent-Field/agentfield#697), which build() raises so an empty build reports `failed` (not `succeeded`) with its result preserved (#82 Gap 2). Bump the floor in all three dependency declarations: pip/Docker layer caching keys off the constraint string, so the floor must change to pull the new release rather than restore a cached pre-0.1.96 layer.

With the floor satisfied the defensive ReasonerFailed import binds the real SDK class (verified: swe_af.app.ReasonerFailed resolves to agentfield.exceptions), so the failed-status callback now preserves the structured BuildResult.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
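A defensive import of this kind might look like the sketch below. The module path `agentfield.exceptions` is taken from the commit message; the body of the fallback class and the helper `bound_to_sdk` are assumptions for illustration, and the real shim in `swe_af/app.py` may differ.

```python
try:
    # Binds the real class once the installed agentfield provides it.
    from agentfield.exceptions import ReasonerFailed
except ImportError:
    class ReasonerFailed(Exception):
        """Fallback shim (hypothetical body) for SDKs that predate the class.

        Raising it still fails the execution through the generic exception
        path, but the structured result is not preserved.
        """


def bound_to_sdk() -> bool:
    """Report whether ReasonerFailed resolved to the SDK rather than the shim."""
    return ReasonerFailed.__module__.startswith("agentfield")
```

Checking `ReasonerFailed.__module__` is one simple way to confirm which class was bound after a dependency bump, similar in spirit to the verification the commit message mentions.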
Summary
Fixes the two reporter-proposed gaps in #82.
Gap 1: DB-forcing build checks had no reachable database

The build container had no database, so a target-repo check that hard-requires Postgres (e.g. `REQUIRE_TEST_DB=1 npm run test:integration -- …migration….test.js`, or any test that errors-on-connect instead of skipping) failed at the connection layer (pg-pool ECONNREFUSED) and could never go green. For a foundation issue this fails unrecoverably and cascades: every dependent issue is skipped, nothing merges, and the build comes back empty.

Add a generic, ephemeral `postgres:16` `build-db` service to both compose files and wire `DATABASE_URL_TEST` into `swe-agent` and `swe-fast` (gated on a healthy DB). It ships empty and the target repo's own integration `global-setup` migrates it, so the factory stays repo-agnostic. Throwaway by design (no volume); one shared DB means DB-dependent builds should run one at a time (per-build DBs would be the next step).

Gap 2: a build that completes zero issues surfaced as `succeeded`

The control plane records an execution as `succeeded` whenever `build()` returns, even a `BuildResult` with `success=False`. So a fully-failed build (foundation issue fails → cascade → only a cosmetic `.gitignore` diff) read as green.

`build()` now tracks a high-water mark of work shipped (`ever_completed`/`ever_merged`) across the original run and every fix cycle (the verify/fix loop reassigns `dag_result`, so the final state alone can't tell whether real code ever merged). When verification failed and nothing was ever completed or merged, it raises `ReasonerFailed` (carrying the `BuildResult`) so the execution reports `failed` while preserving the structured result. Partial builds that shipped ≥1 issue or branch still return normally.

Validation Contract (Gap 2)
Changes Made

- `docker-compose.yml`, `docker-compose.local.yml`: add `build-db` (healthchecked) + `DATABASE_URL_TEST` wiring + `depends_on: service_healthy`.
- `.env.example`: document `DATABASE_URL_TEST` and `BUILD_DB_*`.
- `swe_af/app.py`: `_is_empty_build()` predicate; high-water-mark tracking; raise `ReasonerFailed` on empty build.
- `ReasonerFailed` import: SWE-AF stays importable against agentfield SDKs that predate it (shim still flips status to `failed` via the generic path; full result-preservation needs Agent-Field/agentfield#697, "feat(sdk): ReasonerFailed exception for reporting reasoner failure with result", released).

Test Plan

- `docker compose -f docker-compose.yml config` / `-f docker-compose.local.yml config` parse; rendered config shows `build-db`, healthcheck, and `DATABASE_URL_TEST` on both build nodes.
- `tests/test_empty_build_guard.py`: truth table + the exact reporter scenario + partial-build-not-empty.
- `make check` on py3.12 (CI's pinned version): 1013 passed, 1 skipped.

Dependency / follow-up

Full-fidelity Gap 2 (`failed` status with the structured result preserved) needs the SDK change in Agent-Field/agentfield#697. Once that releases, a follow-up should bump the `agentfield` floor in `requirements*.txt` / `pyproject.toml` (Docker layer caching keys off the constraint string, so the floor must change to pull the new release).

Closes #82 (Gaps 1 & 2). The codex empty-builds observation (Gap 3) is a separate, runtime-specific investigation that needs codex creds to reproduce end-to-end.
🤖 Generated with Claude Code