
fix(build): provide test DB for DB-forcing checks + report failed for empty builds (#82) #84

Merged
AbirAbbas merged 3 commits into main from fix/issue-82-build-db-and-status on Jun 29, 2026

@AbirAbbas

Collaborator

Summary

Fixes the two reporter-proposed gaps in #82.

Gap 1 — DB-forcing build checks had no reachable database

The build container had no database, so a target-repo check that hard-requires Postgres (e.g. REQUIRE_TEST_DB=1 npm run test:integration -- …migration….test.js, or any test that errors-on-connect instead of skipping) failed at the connection layer (pg-pool ECONNREFUSED) and could never go green. For a foundation issue this fails unrecoverably and cascades — every dependent issue is skipped, nothing merges, and the build comes back empty.

Add a generic, ephemeral postgres:16 build-db service to both compose files and wire DATABASE_URL_TEST into swe-agent and swe-fast (gated on a healthy DB). It ships empty and the target repo's own integration global-setup migrates it, so the factory stays repo-agnostic. Throwaway by design (no volume); one shared DB means DB-dependent builds should run one at a time (per-build DBs would be the next step).
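
To make the "reachable database" requirement concrete, here is a minimal sketch of a pre-flight probe a build node could run. It is illustrative only and not part of this PR: the helper names `parse_db_target` and `is_reachable` are hypothetical, and the only name taken from the PR is the `DATABASE_URL_TEST` environment variable. It checks TCP reachability, not that Postgres is accepting queries.

```python
import os
import socket
from urllib.parse import urlsplit

DEFAULT_PG_PORT = 5432  # Postgres default, used when the URL omits a port


def parse_db_target(url: str) -> tuple[str, int]:
    """Extract (host, port) from a postgres-style connection URL."""
    parts = urlsplit(url)
    if not parts.hostname:
        raise ValueError(f"no host in database URL: {url!r}")
    return parts.hostname, parts.port or DEFAULT_PG_PORT


def is_reachable(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, DNS failures, and timeouts.
        return False


if __name__ == "__main__":
    url = os.environ.get("DATABASE_URL_TEST")
    if url is None:
        print("DATABASE_URL_TEST is not set")
    else:
        host, port = parse_db_target(url)
        print(f"{host}:{port} reachable: {is_reachable(host, port)}")
```

A refused connection here corresponds to the ECONNREFUSED failure described above; with the `build-db` service healthy, the probe would be expected to succeed.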

Gap 2 — a build that completes zero issues surfaced as succeeded

The control plane records an execution as succeeded whenever build() returns — even a BuildResult with success=False. So a fully-failed build (foundation issue fails → cascade → only a cosmetic .gitignore diff) read as green.

build() now tracks a high-water mark of work shipped (ever_completed/ever_merged) across the original run and every fix cycle (the verify/fix loop reassigns dag_result, so the final state alone can't tell whether real code ever merged). When verification failed and nothing was ever completed or merged, it raises ReasonerFailed (carrying the BuildResult) so the execution reports failed while preserving the structured result. Partial builds that shipped ≥1 issue or branch still return normally.
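
The high-water-mark idea can be sketched roughly as follows. This is a simplified illustration, not the code in swe_af/app.py: `CycleOutcome`, `run_build`, and `ReasonerFailedSketch` are hypothetical stand-ins, and the real `ReasonerFailed` constructor may take different arguments.

```python
from dataclasses import dataclass


@dataclass
class CycleOutcome:
    """Hypothetical per-cycle summary; not the project's real result type."""
    completed: int  # issues completed in this cycle's final state
    merged: int     # branches merged in this cycle's final state


class ReasonerFailedSketch(Exception):
    """Stand-in for the SDK's ReasonerFailed; the real signature may differ."""

    def __init__(self, message: str, result=None):
        super().__init__(message)
        self.result = result  # structured result carried with the failure


def run_build(cycles: list[CycleOutcome], verification_passed: bool) -> dict:
    ever_completed = 0
    ever_merged = 0
    last = None
    for outcome in cycles:
        # Each cycle overwrites the previous state, so the maxima are
        # tracked separately to remember whether anything ever shipped.
        last = outcome
        ever_completed = max(ever_completed, outcome.completed)
        ever_merged = max(ever_merged, outcome.merged)

    result = {
        "success": verification_passed,
        "ever_completed": ever_completed,
        "ever_merged": ever_merged,
        "final": last,
    }
    if not verification_passed and ever_completed == 0 and ever_merged == 0:
        # Raising (rather than returning) is what lets the caller report failure.
        raise ReasonerFailedSketch("empty build", result=result)
    return result
```

For example, a run whose first cycle completed two issues but whose last fix cycle ended at zero would still return normally in this sketch, because the maxima remember the earlier work.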

Validation Contract (Gap 2)

  • 0 completed and 0 merged and verification failed → reported failed (empty build).
  • ≥1 issue completed or ≥1 branch merged → not empty, returns normally even if verification failed (real code + open PR worth surfacing).
  • Verification passed → never empty.
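
The contract above reduces to a small predicate. The sketch below is one way to write it; the function name and parameters are assumptions for illustration and may not match the actual `_is_empty_build()` signature in swe_af/app.py.

```python
def is_empty_build(verification_passed: bool, ever_completed: int, ever_merged: int) -> bool:
    """Return True only when verification failed and nothing was ever shipped."""
    if verification_passed:
        # A verified build is never treated as empty.
        return False
    return ever_completed == 0 and ever_merged == 0


if __name__ == "__main__":
    # Print the truth table for a few representative inputs.
    for passed in (True, False):
        for completed, merged in ((0, 0), (1, 0), (0, 1)):
            empty = is_empty_build(passed, completed, merged)
            print(f"passed={passed} completed={completed} merged={merged} -> empty={empty}")
```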

Changes Made

  • docker-compose.yml, docker-compose.local.yml: add build-db (healthchecked) + DATABASE_URL_TEST wiring + depends_on: service_healthy.
  • .env.example: document DATABASE_URL_TEST and BUILD_DB_*.
  • swe_af/app.py: _is_empty_build() predicate; high-water-mark tracking; raise ReasonerFailed on empty build.
  • Defensive ReasonerFailed import: SWE-AF stays importable against agentfield SDKs that predate the exception. With an older SDK the shim still flips the status to failed via the generic exception path; preserving the full result requires a released SDK that includes agentfield#697 (feat(sdk): ReasonerFailed exception for reporting reasoner failure with result).
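
A defensive import of this kind is commonly written as a try/except around the import with a local fallback class. The sketch below assumes the class lives in `agentfield.exceptions` (the module the commit notes mention); the fallback's constructor is hypothetical and may not match the SDK's.

```python
try:
    # Assumed import path; available only in SDK releases that ship the class.
    from agentfield.exceptions import ReasonerFailed
except ImportError:
    class ReasonerFailed(Exception):
        """Fallback shim for older SDKs (hypothetical signature).

        Raising any exception still marks the execution as failed; only the
        structured-result handling depends on the real SDK class.
        """

        def __init__(self, message: str = "", result=None):
            super().__init__(message)
            self.result = result
```

Either branch leaves a usable `ReasonerFailed` name bound in the module, which is what keeps the import from failing on older SDKs.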

Test Plan

  • docker compose -f docker-compose.yml config and docker compose -f docker-compose.local.yml config both parse; the rendered config shows build-db, its healthcheck, and DATABASE_URL_TEST on both build nodes.
  • New tests/test_empty_build_guard.py — truth table + the exact reporter scenario + partial-build-not-empty.
  • Full make check on py3.12 (CI's pinned version): 1013 passed, 1 skipped.

Dependency / follow-up

Full-fidelity Gap 2 (failed status with the structured result preserved) needs the SDK change in Agent-Field/agentfield#697. Once that releases, a follow-up should bump the agentfield floor in requirements*.txt / pyproject.toml (Docker layer caching keys off the constraint string, so the floor must change to pull the new release).

Closes #82 (Gaps 1 & 2). The codex empty-builds observation (Gap 3) is a separate, runtime-specific investigation that needs codex creds to reproduce end-to-end.

🤖 Generated with Claude Code

AbirAbbas and others added 2 commits June 29, 2026 12:23
DB-forcing integration checks (e.g. REQUIRE_TEST_DB=1 npm run test:integration)
had no reachable database in the build container, so they failed at the
connection layer (pg-pool ECONNREFUSED) and could never go green. For a
foundation issue this fails unrecoverably and cascades — every dependent issue
is skipped, nothing merges, and the build comes back empty.

Add a generic, ephemeral postgres:16 `build-db` service to both compose files
and wire DATABASE_URL_TEST into swe-agent and swe-fast (gated on a healthy DB).
It ships empty and the target repo's own integration global-setup migrates it,
so the factory stays repo-agnostic. Throwaway by design (no volume); one shared
DB means DB-dependent builds should run one at a time.

Closes part of #82 (Gap 1).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

A build that completes zero issues and merges nothing returned a BuildResult
with success=False — but returning any value makes the control plane record the
execution as `succeeded` (the async SDK handler only distinguishes return from
raise). So a fully-failed build (e.g. a foundation issue fails and cascades,
leaving only a cosmetic .gitignore diff) read as green.

Track a high-water mark of work shipped (ever_completed/ever_merged) across the
original run and every fix cycle — the verify/fix loop reassigns dag_result, so
the final state alone can't tell whether the build ever merged real code. When
verification failed AND nothing was ever completed or merged, raise
ReasonerFailed (carrying the BuildResult) so the execution reports `failed`
while preserving the structured result. Partial builds that shipped at least one
issue or branch still return normally.

ReasonerFailed is imported defensively so SWE-AF stays importable against
agentfield SDKs that predate it (the shim still flips status to failed via the
generic exception path; result-preservation needs the SDK release).

Closes part of #82 (Gap 2).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

0.1.96 ships ReasonerFailed (Agent-Field/agentfield#697), which build() raises
so an empty build reports `failed` (not `succeeded`) with its result preserved
(#82 Gap 2). Bump the floor in all three dependency declarations — pip/Docker
layer caching keys off the constraint string, so the floor must change to pull
the new release rather than restore a cached pre-0.1.96 layer.

With the floor satisfied the defensive ReasonerFailed import binds the real SDK
class (verified: swe_af.app.ReasonerFailed resolves to agentfield.exceptions),
so the failed-status callback now preserves the structured BuildResult.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@AbirAbbas AbirAbbas merged commit 8b01e26 into main Jun 29, 2026
2 checks passed
@AbirAbbas AbirAbbas deleted the fix/issue-82-build-db-and-status branch June 29, 2026 19:34
Development

Successfully merging this pull request may close these issues.

Build factory: DB-forcing checks have no test DB (fix included); failed builds can report succeeded
