Skip to content

fix(health): consolidate all health gate fixes#3

Merged
jmcentire merged 19 commits into
jmcentire:mainfrom
vaskoevgen:fix/all-health-fixes
May 17, 2026
Merged

fix(health): consolidate all health gate fixes#3
jmcentire merged 19 commits into
jmcentire:mainfrom
vaskoevgen:fix/all-health-fixes

Conversation

@vaskoevgen
Copy link
Copy Markdown
Collaborator

Summary

Consolidates all 4 health gate fixes into a single branch (supersedes #2).

  • Skip phase_balance critical check for execution phases — execution phases don't produce generation tokens, causing false-positive balance failures
  • Skip ratio check when no generation tokens exist yet — prevents division/ratio errors before any tokens are generated
  • Add preflight to _PRE_ARTIFACT_PHASES — pre-artifact phases need preflight included to avoid false positives
  • Skip health ratio checks in post-build phases — post-build phases run after generation is complete; ratio checks are not meaningful there

Test plan

  • Run pact health gate against a contract that uses execution phases — no false-positive failures
  • Run pact health gate against a contract in an early phase with zero generation tokens — no ratio errors
  • Run pact health gate through a full pipeline including post-build phases — no spurious ratio failures

@vaskoevgen vaskoevgen force-pushed the fix/all-health-fixes branch from e464d44 to c76e7dc Compare April 30, 2026 17:10
vaskoevgen and others added 18 commits May 8, 2026 09:12
Two related fixes for spurious health gate pauses:

1. `_check_output_planning_ratio`: guard against `generation_tokens == 0`.
   When no code has been generated yet, the ratio is literally 0/N = 0.00x,
   which always triggers CRITICAL. The fix returns healthy (insufficient data)
   until at least one generation token exists — semantically correct since the
   ratio is undefined before any implementation work begins.

2. `_PRE_ARTIFACT_PHASES`: extend to cover post-build phases (integrate,
   arbiter, polish, retrospective, complete). These phases produce no new
   code, so the cumulative planning/generation ratio never recovers after
   implementation ends, causing 5–6 more spurious pauses after the build
   is already complete.

Without both fixes, a typical build requires 6–7 manual `pact resume .`
calls — one before implementation starts and one after each post-build
phase.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ositives

Preflight runs before any code is generated. The variance_reaches_target
check fires CRITICAL when generation_tokens is 0 and total_tokens > 5000,
blocking all builds that accumulate token spend in the decompose/contract
phases before reaching preflight.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The phase_balance check fires CRITICAL when implement dominates (~89%
of tokens in a unary single-component build). This is expected — the
check was designed to catch inverted architectures where interview/
decompose consume more than execution. Guard: only flag planning phases,
not implement/integrate/polish/retrospective/complete.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tion

- test_author.py: prompt now generates @pytest.mark.anyio instead of
  @pytest.mark.asyncio — anyio is a pact dependency, no extra install needed
- test_harness.py: add pact's site-packages to PYTHONPATH in test subprocess
  so anyio is always available

Fixes: generated async tests failing with "async def functions are not
natively supported. Install pytest-asyncio..."

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… names

Two bugs in test result parsing:

1. parse_pytest_output triggered a spurious "collection error" when a test
   name contained the substring "error" (e.g. test_valid_error), even though
   all tests simply skipped. Now checks for real pytest error markers:
   "ERROR collecting", "error during collection", "= ERRORS =", etc.

2. all_passed returned False when total==0 (all tests skipped). Skipped ≠
   failed — no failures and no errors should pass vacuously. Test-file-not-
   found still fails correctly because it sets errors=1.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Goodhart tests were generated with `from src.<cid> import *` but pact's
test runner sets PYTHONPATH=src/<cid>, so the importable package name is
<cid> directly — not src.<cid>. This caused every goodhart test to fail
with ModuleNotFoundError on collection.

Fix: generate `from <cid> import *` (matching how regular contract tests
import their components).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ttern to system prompt

Generated contract tests with backend/database fixtures were failing with connection
errors instead of skipping gracefully when infrastructure was absent. Added canonical
urllib.request connectivity-check pattern to TEST_SYSTEM so future generated fixtures
call pytest.skip() when the service is unreachable rather than letting the test fail.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The contract phase is a scheduler no-op (test authoring happens during
decompose). When the health check ran after the contract phase with
pre_artifact=False, it saw 0% generation tokens vs planning tokens from
decompose and falsely triggered variance_reaches_target CRITICAL → pause.

Adding decompose and contract to _PRE_ARTIFACT_PHASES suppresses
generation-ratio checks during phases where no implementation exists yet.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ance false positive

Goodhart system prompt: generated tests were using `requests` (not installed)
instead of `httpx`, and HTTP tests had no skip-when-unreachable guard. Now
requires httpx and adds a pytestmark_backend skip pattern for live-backend tests.

health.py: add decompose to _EXECUTION_PHASES so phase_balance doesn't fire
CRITICAL when decompose dominates token usage — it does contract+test authoring,
which is legitimate execution work, not coordination overhead.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… non-HTTP components

pact deploy was generating 0 edges because leaf components have empty
dependency declarations — only the root integration node declares deps,
and root is excluded from the deployed node list.

Fix edge inference by classifying components via their contract
side-effect types: database_* effects imply an API→DB edge, http_*
effects imply a client→API edge.

Also fix health check generation: components whose side-effects are
exclusively DDL/catalog operations (schema init, migrations) have no
HTTP server and should not receive a health_check URL.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds monitor-pact.sh — a portable bash script that watches a running
pact daemon and keeps it running without manual intervention:

- Polls pact status every N seconds (configurable, default 30s)
- Auto-runs `pact approve` on interview pauses
- Auto-runs `pact resume` on health gate pauses
- Restarts daemon if the process dies, then resumes
- Prints `pact log` tail on phase changes and every 5 polls
- Prints `pact components` during implement/integrate phases
- Exits 0 on complete/certified, exits 1 on failed

No hardcoded paths: pact binary found via script dir or PATH,
API key loaded from environment then walked up parent dirs.

Usage: bash monitor-pact.sh <project-dir> [interval-seconds]

Documents the script in README.md under a new "Monitoring the
Daemon" section with usage examples and sample output.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds silent-hang detection to monitor-pact.sh: if the daemon process
is alive and state=active but the pact audit log timestamp hasn't
advanced in STUCK_TIMEOUT seconds (default 600s / 10 min), the script
kills the daemon PID, restarts it, and runs pact resume.

This covers the case where a hanging API call keeps the process alive
but blocks all progress indefinitely — previously undetected.

STUCK_TIMEOUT is configurable via env var:
  STUCK_TIMEOUT=300 bash monitor-pact.sh . 15

Also documents the new scenario in README.md.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tection

Three bugs fixed:

1. Removed `set -euo pipefail` — grep returns exit 1 on no-match,
   which silently killed the script mid-loop (visible as one-poll-then-quit).
   Replaced with a `sgrep()` helper that always exits 0.

2. Logs now printed every poll instead of every 5 polls. Users were
   missing log output during active phases.

3. Silent-hang detection now guards against comparing against an empty
   LAST_AUDIT_TS on the first iteration (was triggering false restarts).
   Also logs a progress message showing silence duration vs threshold.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The core problem: print_log reprinted the same last-N entries every
poll, so the output looked like nothing was happening even when pact
was actively writing new entries.

New approach — incremental log tail:
- Seeds SEEN_LOG_LINES on startup to the current entry count
- Each poll counts total entries and compares to SEEN_LOG_LINES
- Only prints the delta (new entries since last poll)
- Announces "N new entries" header so it's clear when progress lands
- Shows nothing when there are no new entries (no noise)

Also removes the every-5-poll gate that was suppressing logs between
phase changes. Logs now check every poll but only print when new.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…e to || true

sgrep() was used in if-conditions where the exit code determines the branch.
Because sgrep wraps grep with || true, it always exits 0, so every condition
was always true: interview/health/dead-daemon/build-complete all fired on
every loop iteration.

Add cgrep() for if-conditions (plain grep, no || true). Keep sgrep() for
output-capturing where we don't want a no-match to abort the script.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rom audit trail

The audit trail in pact status output contains historic entries like
"Resuming from failed: Phase decompose timed out..." which caused the
bare `failed` pattern to false-trigger BUILD FAILED.

Similarly, health gate matched anywhere in output instead of the
Health: line.

Fixes:
- `failed`/`complete`/`certified`: anchor to state-line format `^[hex] state`
- Health gate: match `Health: CRITICAL|DEGRADED` (not arbitrary occurrence)
- Daemon dead: anchor to `Daemon: stopped/not running` prefix

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…sponse

When the LLM returns package.json or tsconfig.json as a parsed JSON object
(dict) instead of a serialized string, the CodeResponse validator was raising:
  Input should be a valid string [type=string_type, input_value={...}, input_type=dict]

Add a field_validator that auto-serializes any dict value to a JSON string
with 2-space indent. This recovers from the LLM's structured-data habit without
requiring prompt changes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…dict serialization

field_validator(mode='before') on dict[str, str] was not intercepting dict
values before pydantic validated them. Switched to model_validator(mode='before')
which preprocesses the entire model input dict before any field validation,
reliably serializing dict/list file values to JSON strings.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@vaskoevgen vaskoevgen force-pushed the fix/all-health-fixes branch from 84fe4cb to dcfa0c9 Compare May 8, 2026 14:14
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@jmcentire jmcentire merged commit 29ed47d into jmcentire:main May 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants