fix(health): consolidate all health gate fixes#3
Merged
Conversation
e464d44 to
c76e7dc
Compare
Two related fixes for spurious health gate pauses: 1. `_check_output_planning_ratio`: guard against `generation_tokens == 0`. When no code has been generated yet, the ratio is literally 0/N = 0.00x, which always triggers CRITICAL. The fix returns healthy (insufficient data) until at least one generation token exists — semantically correct since the ratio is undefined before any implementation work begins. 2. `_PRE_ARTIFACT_PHASES`: extend to cover post-build phases (integrate, arbiter, polish, retrospective, complete). These phases produce no new code, so the cumulative planning/generation ratio never recovers after implementation ends, causing 5–6 more spurious pauses after the build is already complete. Without both fixes, a typical build requires 6–7 manual `pact resume .` calls — one before implementation starts and one after each post-build phase. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ositives Preflight runs before any code is generated. The variance_reaches_target check fires CRITICAL when generation_tokens is 0 and total_tokens > 5000, blocking all builds that accumulate token spend in the decompose/contract phases before reaching preflight. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The phase_balance check fires CRITICAL when implement dominates (~89% of tokens in a unary single-component build). This is expected — the check was designed to catch inverted architectures where interview/ decompose consume more than execution. Guard: only flag planning phases, not implement/integrate/polish/retrospective/complete. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tion - test_author.py: prompt now generates @pytest.mark.anyio instead of @pytest.mark.asyncio — anyio is a pact dependency, no extra install needed - test_harness.py: add pact's site-packages to PYTHONPATH in test subprocess so anyio is always available Fixes: generated async tests failing with "async def functions are not natively supported. Install pytest-asyncio..." Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… names Two bugs in test result parsing: 1. parse_pytest_output triggered a spurious "collection error" when a test name contained the substring "error" (e.g. test_valid_error), even though all tests simply skipped. Now checks for real pytest error markers: "ERROR collecting", "error during collection", "= ERRORS =", etc. 2. all_passed returned False when total==0 (all tests skipped). Skipped ≠ failed — no failures and no errors should pass vacuously. Test-file-not- found still fails correctly because it sets errors=1. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Goodhart tests were generated with `from src.<cid> import *` but pact's test runner sets PYTHONPATH=src/<cid>, so the importable package name is <cid> directly — not src.<cid>. This caused every goodhart test to fail with ModuleNotFoundError on collection. Fix: generate `from <cid> import *` (matching how regular contract tests import their components). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ttern to system prompt Generated contract tests with backend/database fixtures were failing with connection errors instead of skipping gracefully when infrastructure was absent. Added canonical urllib.request connectivity-check pattern to TEST_SYSTEM so future generated fixtures call pytest.skip() when the service is unreachable rather than letting the test fail. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The contract phase is a scheduler no-op (test authoring happens during decompose). When the health check ran after the contract phase with pre_artifact=False, it saw 0% generation tokens vs planning tokens from decompose and falsely triggered variance_reaches_target CRITICAL → pause. Adding decompose and contract to _PRE_ARTIFACT_PHASES suppresses generation-ratio checks during phases where no implementation exists yet. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ance false positive Goodhart system prompt: generated tests were using `requests` (not installed) instead of `httpx`, and HTTP tests had no skip-when-unreachable guard. Now requires httpx and adds a pytestmark_backend skip pattern for live-backend tests. health.py: add decompose to _EXECUTION_PHASES so phase_balance doesn't fire CRITICAL when decompose dominates token usage — it does contract+test authoring, which is legitimate execution work, not coordination overhead. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… non-HTTP components pact deploy was generating 0 edges because leaf components have empty dependency declarations — only the root integration node declares deps, and root is excluded from the deployed node list. Fix edge inference by classifying components via their contract side-effect types: database_* effects imply an API→DB edge, http_* effects imply a client→API edge. Also fix health check generation: components whose side-effects are exclusively DDL/catalog operations (schema init, migrations) have no HTTP server and should not receive a health_check URL. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds monitor-pact.sh — a portable bash script that watches a running pact daemon and keeps it running without manual intervention: - Polls pact status every N seconds (configurable, default 30s) - Auto-runs `pact approve` on interview pauses - Auto-runs `pact resume` on health gate pauses - Restarts daemon if the process dies, then resumes - Prints `pact log` tail on phase changes and every 5 polls - Prints `pact components` during implement/integrate phases - Exits 0 on complete/certified, exits 1 on failed No hardcoded paths: pact binary found via script dir or PATH, API key loaded from environment then walked up parent dirs. Usage: bash monitor-pact.sh <project-dir> [interval-seconds] Documents the script in README.md under a new "Monitoring the Daemon" section with usage examples and sample output. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Adds silent-hang detection to monitor-pact.sh: if the daemon process is alive and state=active but the pact audit log timestamp hasn't advanced in STUCK_TIMEOUT seconds (default 600s / 10 min), the script kills the daemon PID, restarts it, and runs pact resume. This covers the case where a hanging API call keeps the process alive but blocks all progress indefinitely — previously undetected. STUCK_TIMEOUT is configurable via env var: STUCK_TIMEOUT=300 bash monitor-pact.sh . 15 Also documents the new scenario in README.md. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tection Three bugs fixed: 1. Removed `set -euo pipefail` — grep returns exit 1 on no-match, which silently killed the script mid-loop (visible as one-poll-then-quit). Replaced with a `sgrep()` helper that always exits 0. 2. Logs now printed every poll instead of every 5 polls. Users were missing log output during active phases. 3. Silent-hang detection now guards against comparing against an empty LAST_AUDIT_TS on the first iteration (was triggering false restarts). Also logs a progress message showing silence duration vs threshold. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The core problem: print_log reprinted the same last-N entries every poll, so the output looked like nothing was happening even when pact was actively writing new entries. New approach — incremental log tail: - Seeds SEEN_LOG_LINES on startup to the current entry count - Each poll counts total entries and compares to SEEN_LOG_LINES - Only prints the delta (new entries since last poll) - Announces "N new entries" header so it's clear when progress lands - Shows nothing when there are no new entries (no noise) Also removes the every-5-poll gate that was suppressing logs between phase changes. Logs now check every poll but only print when new. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…e to || true sgrep() was used in if-conditions where the exit code determines the branch. Because sgrep wraps grep with || true, it always exits 0, so every condition was always true: interview/health/dead-daemon/build-complete all fired on every loop iteration. Add cgrep() for if-conditions (plain grep, no || true). Keep sgrep() for output-capturing where we don't want a no-match to abort the script. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…rom audit trail The audit trail in pact status output contains historic entries like "Resuming from failed: Phase decompose timed out..." which caused the bare `failed` pattern to false-trigger BUILD FAILED. Similarly, health gate matched anywhere in output instead of the Health: line. Fixes: - `failed`/`complete`/`certified`: anchor to state-line format `^[hex] state` - Health gate: match `Health: CRITICAL|DEGRADED` (not arbitrary occurrence) - Daemon dead: anchor to `Daemon: stopped/not running` prefix Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…sponse
When the LLM returns package.json or tsconfig.json as a parsed JSON object
(dict) instead of a serialized string, the CodeResponse validator was raising:
Input should be a valid string [type=string_type, input_value={...}, input_type=dict]
Add a field_validator that auto-serializes any dict value to a JSON string
with 2-space indent. This recovers from the LLM's structured-data habit without
requiring prompt changes.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…dict serialization field_validator(mode='before') on dict[str, str] was not intercepting dict values before pydantic validated them. Switched to model_validator(mode='before') which preprocesses the entire model input dict before any field validation, reliably serializing dict/list file values to JSON strings. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
84fe4cb to
dcfa0c9
Compare
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Consolidates all 4 health gate fixes into a single branch (supersedes #2).
phase_balancecritical check for execution phases — execution phases don't produce generation tokens, causing false-positive balance failurespreflightto_PRE_ARTIFACT_PHASES— pre-artifact phases need preflight included to avoid false positivesTest plan