Phase P4 of the orchestration × complexity matrix work.
New bench/quality/ package with per-language analyzers, registered via entry points:
- Python: ruff + mypy + radon cc → ArtifactQuality(score: float, lint_errors, type_errors, cyclomatic_max)
- TypeScript: eslint + tsc (follow-up)
- Go: go vet + staticcheck (follow-up)
DispatchResult round-trips artifact_quality through serialization. Pawbench JSON output includes dim5_artifact_quality.
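
The round trip can be illustrated roughly as follows. This is a sketch under assumptions: the real `DispatchResult` has more fields than shown, and `dispatch_id`, `to_dict`, and `from_dict` are hypothetical names used only to make the example self-contained.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class ArtifactQuality:
    score: float
    lint_errors: int
    type_errors: int
    cyclomatic_max: int


@dataclass
class DispatchResult:
    dispatch_id: str  # hypothetical stand-in for the real fields
    artifact_quality: Optional[ArtifactQuality] = None

    def to_dict(self) -> Dict[str, Any]:
        # asdict() recurses into the nested dataclass; None stays None.
        return asdict(self)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "DispatchResult":
        raw = data.get("artifact_quality")
        quality = ArtifactQuality(**raw) if raw is not None else None
        return cls(dispatch_id=data["dispatch_id"], artifact_quality=quality)


original = DispatchResult(
    dispatch_id="d-001",
    artifact_quality=ArtifactQuality(
        score=0.82, lint_errors=3, type_errors=0, cyclomatic_max=9
    ),
)
# Serialize to JSON text and back, then compare with the original.
restored = DispatchResult.from_dict(json.loads(json.dumps(original.to_dict())))
```

Keeping `artifact_quality` optional lets results recorded before this phase deserialize without the new field.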
DQS is unchanged in this phase: calibration data comes first (≥100 dispatches), and the formula change follows later.
Gate to P5: scores stable across 3 consecutive runs.
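
One possible reading of the gate, sketched as code. The PR does not define "stable", so the spread-based check, the `tolerance` default, and the `scores_stable` name are all assumptions made for illustration.

```python
from typing import Sequence


def scores_stable(
    run_scores: Sequence[float],
    runs: int = 3,            # "3 consecutive runs" from the gate above
    tolerance: float = 0.02,  # hypothetical threshold, not specified in the PR
) -> bool:
    """Return True if the last `runs` aggregate scores differ by at most `tolerance`."""
    if len(run_scores) < runs:
        return False  # not enough runs yet to evaluate the gate
    recent = run_scores[-runs:]
    return max(recent) - min(recent) <= tolerance
```

Here each entry in `run_scores` is assumed to be one run's aggregate artifact-quality score, oldest first.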
Why: catches the "passes tests, ships slop" failure mode. Most defensible secondary metric, orthogonal to AC pass and immune to test-overfitting.