fix(build): provide test DB for DB-forcing checks + report failed for empty builds (#82) by AbirAbbas · Pull Request #84 · Agent-Field/SWE-AF

AbirAbbas · 2026-06-29T16:45:36Z

Summary

Fixes the two reporter-proposed gaps in #82.

Gap 1 — DB-forcing build checks had no reachable database

The build container had no database, so a target-repo check that hard-requires Postgres (e.g. REQUIRE_TEST_DB=1 npm run test:integration -- …migration….test.js, or any test that errors-on-connect instead of skipping) failed at the connection layer (pg-pool ECONNREFUSED) and could never go green. For a foundation issue this fails unrecoverably and cascades — every dependent issue is skipped, nothing merges, and the build comes back empty.

Add a generic, ephemeral postgres:16 build-db service to both compose files and wire DATABASE_URL_TEST into swe-agent and swe-fast (gated on a healthy DB). It ships empty and the target repo's own integration global-setup migrates it, so the factory stays repo-agnostic. Throwaway by design (no volume); one shared DB means DB-dependent builds should run one at a time (per-build DBs would be the next step).
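A quick way to confirm the wiring from inside a build node is to check that the host in DATABASE_URL_TEST accepts TCP connections. The following is a minimal sketch, not code from this PR: the helper names `parse_db_endpoint` and `is_reachable` are hypothetical, and it only tests the socket, not Postgres authentication or migrations.

```python
import os
import socket
from urllib.parse import urlparse


def parse_db_endpoint(url: str) -> tuple[str, int]:
    """Extract (host, port) from a Postgres URL, defaulting to port 5432."""
    parsed = urlparse(url)
    if not parsed.hostname:
        raise ValueError(f"no host in database URL: {url!r}")
    return parsed.hostname, parsed.port or 5432


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, DNS failure, and timeouts.
        return False


if __name__ == "__main__":
    # DATABASE_URL_TEST is the variable this PR wires into the build nodes.
    url = os.environ.get("DATABASE_URL_TEST", "")
    if not url:
        print("DATABASE_URL_TEST is not set")
    else:
        host, port = parse_db_endpoint(url)
        print(f"{host}:{port} reachable: {is_reachable(host, port)}")
```

A refused connection here corresponds to the pg-pool ECONNREFUSED failure described above.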

Gap 2 — a build that completes zero issues surfaced as `succeeded`

The control plane records an execution as succeeded whenever build() returns, even when it returns a BuildResult with success=False. So a fully-failed build (foundation issue fails → cascade → only a cosmetic .gitignore diff) read as green.

build() now tracks a high-water mark of work shipped (ever_completed/ever_merged) across the original run and every fix cycle (the verify/fix loop reassigns dag_result, so the final state alone can't tell whether real code ever merged). When verification failed and nothing was ever completed or merged, it raises ReasonerFailed (carrying the BuildResult) so the execution reports failed while preserving the structured result. Partial builds that shipped ≥1 issue or branch still return normally.
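The reason a high-water mark is needed can be illustrated with a small sketch. This is not the code in swe_af/app.py: `DagSnapshot` and `HighWaterMark` are hypothetical stand-ins, and treating ever_completed/ever_merged as counts (rather than booleans) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class DagSnapshot:
    """Hypothetical stand-in for the state a run or fix cycle leaves behind."""
    completed_issues: int
    merged_branches: int


@dataclass
class HighWaterMark:
    """Remembers the most work any single run or fix cycle ever shipped."""
    ever_completed: int = 0
    ever_merged: int = 0

    def observe(self, snapshot: DagSnapshot) -> None:
        # Only ever move upward, so a later empty cycle cannot erase earlier work.
        self.ever_completed = max(self.ever_completed, snapshot.completed_issues)
        self.ever_merged = max(self.ever_merged, snapshot.merged_branches)


mark = HighWaterMark()
# Original run ships two issues and one branch; the fix cycle ships nothing.
runs = [DagSnapshot(2, 1), DagSnapshot(0, 0)]
dag_result = None
for snapshot in runs:
    dag_result = snapshot  # the loop reassigns dag_result, as the PR describes
    mark.observe(dag_result)

print(dag_result)  # final state alone shows no work
print(mark)        # high-water mark still records the earlier work
```

In this sketch the final `dag_result` looks like an empty build, while the mark shows that real work shipped earlier, which is the distinction the guard depends on.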

Validation Contract (Gap 2)

0 completed and 0 merged and verification failed → reported failed (empty build).
≥1 issue completed or ≥1 branch merged → not empty, returns normally even if verification failed (real code + open PR worth surfacing).
Verification passed → never empty.
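The contract above reduces to a single boolean predicate. The sketch below is one way to write it; the function name and signature are assumptions and may differ from the actual `_is_empty_build()` in swe_af/app.py.

```python
def is_empty_build_sketch(
    verification_passed: bool, ever_completed: int, ever_merged: int
) -> bool:
    """Hypothetical predicate: True only when verification failed and nothing shipped."""
    if verification_passed:
        # Verification passed: never empty.
        return False
    # Verification failed: empty only if no issue completed and no branch merged.
    return ever_completed == 0 and ever_merged == 0


# Truth table for the three contract rows.
print(is_empty_build_sketch(False, 0, 0))  # empty build, should report failed
print(is_empty_build_sketch(False, 1, 0))  # partial build, returns normally
print(is_empty_build_sketch(True, 0, 0))   # verification passed, never empty
```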

Changes Made

docker-compose.yml, docker-compose.local.yml: add build-db (healthchecked) + DATABASE_URL_TEST wiring + depends_on: service_healthy.
.env.example: document DATABASE_URL_TEST and BUILD_DB_*.
swe_af/app.py: _is_empty_build() predicate; high-water-mark tracking; raise ReasonerFailed on empty build.
Defensive ReasonerFailed import: SWE-AF stays importable against agentfield SDKs that predate it. The shim still flips status to failed via the generic exception path; full result preservation needs agentfield#697 ("feat(sdk): ReasonerFailed exception for reporting reasoner failure with result") to be released.
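A defensive import of this kind is commonly written as a try/except around the import with a local fallback class. The sketch below assumes the class lives in `agentfield.exceptions` (as the later commit message reports) and that it accepts a message plus a `result` keyword; the constructor signature, the `HAVE_SDK_REASONER_FAILED` flag, and `fail_empty_build` are hypothetical.

```python
try:
    # Real class, available once the SDK release containing it is installed.
    from agentfield.exceptions import ReasonerFailed
    HAVE_SDK_REASONER_FAILED = True
except ImportError:
    HAVE_SDK_REASONER_FAILED = False

    class ReasonerFailed(Exception):
        """Fallback shim: still an exception, so the execution reports failed."""

        def __init__(self, message: str, result=None):
            super().__init__(message)
            # The shim keeps the result locally; an older SDK will not forward it.
            self.result = result


def fail_empty_build(result):
    """Raise instead of returning, so the control plane does not record success."""
    # Assumed constructor shape; check the SDK before relying on it.
    raise ReasonerFailed("empty build: nothing completed or merged", result=result)
```

With the shim in place the module imports cleanly on older SDKs, and the raise still takes the generic exception path; only the structured result is lost until the real class is available.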

Test Plan

docker compose -f docker-compose.yml config and docker compose -f docker-compose.local.yml config both parse; the rendered config shows build-db, its healthcheck, and DATABASE_URL_TEST on both build nodes.
New tests/test_empty_build_guard.py — truth table + the exact reporter scenario + partial-build-not-empty.
Full make check on py3.12 (CI's pinned version): 1013 passed, 1 skipped.

Dependency / follow-up

Full-fidelity Gap 2 (failed status with the structured result preserved) needs the SDK change in Agent-Field/agentfield#697. Once that releases, a follow-up should bump the agentfield floor in requirements*.txt / pyproject.toml (Docker layer caching keys off the constraint string, so the floor must change to pull the new release).

Closes #82 (Gaps 1 & 2). The codex empty-builds observation (Gap 3) is a separate, runtime-specific investigation that needs codex creds to reproduce end-to-end.

🤖 Generated with Claude Code

DB-forcing integration checks (e.g. REQUIRE_TEST_DB=1 npm run test:integration) had no reachable database in the build container, so they failed at the connection layer (pg-pool ECONNREFUSED) and could never go green. For a foundation issue this fails unrecoverably and cascades: every dependent issue is skipped, nothing merges, and the build comes back empty.

Add a generic, ephemeral postgres:16 `build-db` service to both compose files and wire DATABASE_URL_TEST into swe-agent and swe-fast (gated on a healthy DB). It ships empty and the target repo's own integration global-setup migrates it, so the factory stays repo-agnostic. Throwaway by design (no volume); one shared DB means DB-dependent builds should run one at a time.

Closes part of #82 (Gap 1).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

A build that completes zero issues and merges nothing returned a BuildResult with success=False, but returning any value makes the control plane record the execution as `succeeded` (the async SDK handler only distinguishes return from raise). So a fully-failed build (e.g. a foundation issue fails and cascades, leaving only a cosmetic .gitignore diff) read as green.

Track a high-water mark of work shipped (ever_completed/ever_merged) across the original run and every fix cycle. The verify/fix loop reassigns dag_result, so the final state alone can't tell whether the build ever merged real code. When verification failed AND nothing was ever completed or merged, raise ReasonerFailed (carrying the BuildResult) so the execution reports `failed` while preserving the structured result. Partial builds that shipped at least one issue or branch still return normally.

ReasonerFailed is imported defensively so SWE-AF stays importable against agentfield SDKs that predate it (the shim still flips status to failed via the generic exception path; result-preservation needs the SDK release).

Closes part of #82 (Gap 2).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

0.1.96 ships ReasonerFailed (Agent-Field/agentfield#697), which build() raises so an empty build reports `failed` (not `succeeded`) with its result preserved (#82 Gap 2).

Bump the floor in all three dependency declarations. pip/Docker layer caching keys off the constraint string, so the floor must change to pull the new release rather than restore a cached pre-0.1.96 layer.

With the floor satisfied the defensive ReasonerFailed import binds the real SDK class (verified: swe_af.app.ReasonerFailed resolves to agentfield.exceptions), so the failed-status callback now preserves the structured BuildResult.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

AbirAbbas and others added 2 commits June 29, 2026 12:23

AbirAbbas mentioned this pull request Jun 29, 2026

fix(codex): auth-aware default model — fixes codex empty builds (#82 Gap 3) #85

Merged

4 tasks

AbirAbbas merged commit 8b01e26 into main Jun 29, 2026
2 checks passed

AbirAbbas deleted the fix/issue-82-build-db-and-status branch June 29, 2026 19:34

AbirAbbas merged 3 commits into main from fix/issue-82-build-db-and-status
