fix(health): consolidate all health gate fixes by vaskoevgen · Pull Request #3 · jmcentire/pact

vaskoevgen · 2026-04-30T16:52:57Z

Summary

Consolidates all 4 health gate fixes into a single branch (supersedes #2).

Skip phase_balance critical check for execution phases — execution phases don't produce generation tokens, causing false-positive balance failures
Skip ratio check when no generation tokens exist yet — prevents division/ratio errors before any tokens are generated
Add preflight to _PRE_ARTIFACT_PHASES — pre-artifact phases need preflight included to avoid false positives
Skip health ratio checks in post-build phases — post-build phases run after generation is complete; ratio checks are not meaningful there

Test plan

Run pact health gate against a contract that uses execution phases — no false-positive failures
Run pact health gate against a contract in an early phase with zero generation tokens — no ratio errors
Run pact health gate through a full pipeline including post-build phases — no spurious ratio failures

Two related fixes for spurious health gate pauses: 1. `_check_output_planning_ratio`: guard against `generation_tokens == 0`. When no code has been generated yet, the ratio is literally 0/N = 0.00x, which always triggers CRITICAL. The fix returns healthy (insufficient data) until at least one generation token exists — semantically correct since the ratio is undefined before any implementation work begins. 2. `_PRE_ARTIFACT_PHASES`: extend to cover post-build phases (integrate, arbiter, polish, retrospective, complete). These phases produce no new code, so the cumulative planning/generation ratio never recovers after implementation ends, causing 5–6 more spurious pauses after the build is already complete. Without both fixes, a typical build requires 6–7 manual `pact resume .` calls — one before implementation starts and one after each post-build phase. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ositives Preflight runs before any code is generated. The variance_reaches_target check fires CRITICAL when generation_tokens is 0 and total_tokens > 5000, blocking all builds that accumulate token spend in the decompose/contract phases before reaching preflight. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The phase_balance check fires CRITICAL when implement dominates (~89% of tokens in a unary single-component build). This is expected — the check was designed to catch inverted architectures where interview/ decompose consume more than execution. Guard: only flag planning phases, not implement/integrate/polish/retrospective/complete. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…tion - test_author.py: prompt now generates @pytest.mark.anyio instead of @pytest.mark.asyncio — anyio is a pact dependency, no extra install needed - test_harness.py: add pact's site-packages to PYTHONPATH in test subprocess so anyio is always available Fixes: generated async tests failing with "async def functions are not natively supported. Install pytest-asyncio..." Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… names Two bugs in test result parsing: 1. parse_pytest_output triggered a spurious "collection error" when a test name contained the substring "error" (e.g. test_valid_error), even though all tests simply skipped. Now checks for real pytest error markers: "ERROR collecting", "error during collection", "= ERRORS =", etc. 2. all_passed returned False when total==0 (all tests skipped). Skipped ≠ failed — no failures and no errors should pass vacuously. Test-file-not- found still fails correctly because it sets errors=1. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Goodhart tests were generated with `from src.<cid> import *` but pact's test runner sets PYTHONPATH=src/<cid>, so the importable package name is <cid> directly — not src.<cid>. This caused every goodhart test to fail with ModuleNotFoundError on collection. Fix: generate `from <cid> import *` (matching how regular contract tests import their components). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ttern to system prompt Generated contract tests with backend/database fixtures were failing with connection errors instead of skipping gracefully when infrastructure was absent. Added canonical urllib.request connectivity-check pattern to TEST_SYSTEM so future generated fixtures call pytest.skip() when the service is unreachable rather than letting the test fail. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The contract phase is a scheduler no-op (test authoring happens during decompose). When the health check ran after the contract phase with pre_artifact=False, it saw 0% generation tokens vs planning tokens from decompose and falsely triggered variance_reaches_target CRITICAL → pause. Adding decompose and contract to _PRE_ARTIFACT_PHASES suppresses generation-ratio checks during phases where no implementation exists yet. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…ance false positive Goodhart system prompt: generated tests were using `requests` (not installed) instead of `httpx`, and HTTP tests had no skip-when-unreachable guard. Now requires httpx and adds a pytestmark_backend skip pattern for live-backend tests. health.py: add decompose to _EXECUTION_PHASES so phase_balance doesn't fire CRITICAL when decompose dominates token usage — it does contract+test authoring, which is legitimate execution work, not coordination overhead. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

… non-HTTP components pact deploy was generating 0 edges because leaf components have empty dependency declarations — only the root integration node declares deps, and root is excluded from the deployed node list. Fix edge inference by classifying components via their contract side-effect types: database_* effects imply an API→DB edge, http_* effects imply a client→API edge. Also fix health check generation: components whose side-effects are exclusively DDL/catalog operations (schema init, migrations) have no HTTP server and should not receive a health_check URL. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Adds monitor-pact.sh — a portable bash script that watches a running pact daemon and keeps it running without manual intervention: - Polls pact status every N seconds (configurable, default 30s) - Auto-runs `pact approve` on interview pauses - Auto-runs `pact resume` on health gate pauses - Restarts daemon if the process dies, then resumes - Prints `pact log` tail on phase changes and every 5 polls - Prints `pact components` during implement/integrate phases - Exits 0 on complete/certified, exits 1 on failed No hardcoded paths: pact binary found via script dir or PATH, API key loaded from environment then walked up parent dirs. Usage: bash monitor-pact.sh <project-dir> [interval-seconds] Documents the script in README.md under a new "Monitoring the Daemon" section with usage examples and sample output. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Adds silent-hang detection to monitor-pact.sh: if the daemon process is alive and state=active but the pact audit log timestamp hasn't advanced in STUCK_TIMEOUT seconds (default 600s / 10 min), the script kills the daemon PID, restarts it, and runs pact resume. This covers the case where a hanging API call keeps the process alive but blocks all progress indefinitely — previously undetected. STUCK_TIMEOUT is configurable via env var: STUCK_TIMEOUT=300 bash monitor-pact.sh . 15 Also documents the new scenario in README.md. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…tection Three bugs fixed: 1. Removed `set -euo pipefail` — grep returns exit 1 on no-match, which silently killed the script mid-loop (visible as one-poll-then-quit). Replaced with a `sgrep()` helper that always exits 0. 2. Logs now printed every poll instead of every 5 polls. Users were missing log output during active phases. 3. Silent-hang detection now guards against comparing against an empty LAST_AUDIT_TS on the first iteration (was triggering false restarts). Also logs a progress message showing silence duration vs threshold. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The core problem: print_log reprinted the same last-N entries every poll, so the output looked like nothing was happening even when pact was actively writing new entries. New approach — incremental log tail: - Seeds SEEN_LOG_LINES on startup to the current entry count - Each poll counts total entries and compares to SEEN_LOG_LINES - Only prints the delta (new entries since last poll) - Announces "N new entries" header so it's clear when progress lands - Shows nothing when there are no new entries (no noise) Also removes the every-5-poll gate that was suppressing logs between phase changes. Logs now check every poll but only print when new. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…e to || true sgrep() was used in if-conditions where the exit code determines the branch. Because sgrep wraps grep with || true, it always exits 0, so every condition was always true: interview/health/dead-daemon/build-complete all fired on every loop iteration. Add cgrep() for if-conditions (plain grep, no || true). Keep sgrep() for output-capturing where we don't want a no-match to abort the script. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…rom audit trail The audit trail in pact status output contains historic entries like "Resuming from failed: Phase decompose timed out..." which caused the bare `failed` pattern to false-trigger BUILD FAILED. Similarly, health gate matched anywhere in output instead of the Health: line. Fixes: - `failed`/`complete`/`certified`: anchor to state-line format `^[hex] state` - Health gate: match `Health: CRITICAL|DEGRADED` (not arbitrary occurrence) - Daemon dead: anchor to `Daemon: stopped/not running` prefix Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…sponse When the LLM returns package.json or tsconfig.json as a parsed JSON object (dict) instead of a serialized string, the CodeResponse validator was raising: Input should be a valid string [type=string_type, input_value={...}, input_type=dict] Add a field_validator that auto-serializes any dict value to a JSON string with 2-space indent. This recovers from the LLM's structured-data habit without requiring prompt changes. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…dict serialization field_validator(mode='before') on dict[str, str] was not intercepting dict values before pydantic validated them. Switched to model_validator(mode='before') which preprocesses the entire model input dict before any field validation, reliably serializing dict/list file values to JSON strings. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

vaskoevgen force-pushed the fix/all-health-fixes branch from e464d44 to c76e7dc Compare April 30, 2026 17:10

vaskoevgen and others added 18 commits May 8, 2026 09:12

vaskoevgen force-pushed the fix/all-health-fixes branch from 84fe4cb to dcfa0c9 Compare May 8, 2026 14:14

fix(implementer): handle export async function in TS name extraction

fd567ac

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

jmcentire merged commit 29ed47d into jmcentire:main May 17, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(health): consolidate all health gate fixes#3

fix(health): consolidate all health gate fixes#3
jmcentire merged 19 commits into
jmcentire:mainfrom
vaskoevgen:fix/all-health-fixes

vaskoevgen commented Apr 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

vaskoevgen commented Apr 30, 2026

Summary

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants