spec 009 P4: artifact_quality analyzers (Python first) #10

Phase P4 of the orchestration × complexity matrix work.

New `bench/quality/` package with per-language analyzers, registered via entry points:

- Python: ruff + mypy + radon cc → `ArtifactQuality(score: float, lint_errors, type_errors, cyclomatic_max)`
- TypeScript: eslint + tsc (follow-up)
- Go: go vet + staticcheck (follow-up)
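
A rough sketch of what the Python analyzer and its registration could look like. Assumptions, none of which are settled by this issue: the field types other than `score`, the helper names, the entry-point group `bench.quality.analyzers`, and above all the scoring formula, which is a placeholder. The helpers take tool output as strings (from `ruff check --output-format json`, plain `mypy` stdout, and `radon cc -j`) so the parsing stays separate from how the tools are invoked.

```python
import json
from dataclasses import dataclass
from importlib.metadata import entry_points
from typing import Callable, Dict


@dataclass(frozen=True)
class ArtifactQuality:
    score: float
    lint_errors: int       # int is an assumption; the issue only types `score`
    type_errors: int
    cyclomatic_max: int


def count_ruff_errors(ruff_json: str) -> int:
    # `ruff check --output-format json` prints a JSON array, one object per diagnostic.
    return len(json.loads(ruff_json or "[]"))


def count_mypy_errors(mypy_stdout: str) -> int:
    # mypy reports problems as "path:line: error: message"; notes and the
    # trailing summary line do not contain ": error:" and are skipped.
    return sum(1 for line in mypy_stdout.splitlines() if ": error:" in line)


def max_cyclomatic(radon_json: str) -> int:
    # `radon cc -j` prints {path: [block, ...]}; a file it cannot parse maps
    # to an error object instead of a list, so non-list values are ignored.
    report = json.loads(radon_json or "{}")
    return max(
        (
            block["complexity"]
            for blocks in report.values()
            if isinstance(blocks, list)
            for block in blocks
        ),
        default=0,
    )


def analyze_python(ruff_json: str, mypy_stdout: str, radon_json: str) -> ArtifactQuality:
    lint = count_ruff_errors(ruff_json)
    types = count_mypy_errors(mypy_stdout)
    cc_max = max_cyclomatic(radon_json)
    # Placeholder formula (assumption): type errors weigh double, and
    # complexity is only penalized above 10. The result lies in (0, 1].
    penalty = lint + 2 * types + max(0, cc_max - 10)
    return ArtifactQuality(1.0 / (1.0 + penalty), lint, types, cc_max)


def load_analyzers(group: str = "bench.quality.analyzers") -> Dict[str, Callable]:
    # Hypothetical group name. The `group=` keyword needs Python 3.10+.
    return {ep.name: ep.load() for ep in entry_points(group=group)}
```

Under this layout each language package would declare its analyzer under the same group in its packaging metadata, so the TypeScript and Go follow-ups could be added without touching the loader.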

`DispatchResult` round-trips `artifact_quality` through serialization. Pawbench JSON output includes `dim5_artifact_quality`.
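
The round-trip requirement could be checked along these lines. This is a minimal sketch: the real `DispatchResult` has fields this issue does not list, so `dispatch_id` is a hypothetical stand-in, and `to_dict`, `from_dict`, and `pawbench_row` are illustrative names rather than the existing API. Only the keys `artifact_quality` and `dim5_artifact_quality` come from the text above.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class ArtifactQuality:
    score: float
    lint_errors: int
    type_errors: int
    cyclomatic_max: int


@dataclass(frozen=True)
class DispatchResult:
    dispatch_id: str  # hypothetical field, stands in for the real ones
    artifact_quality: Optional[ArtifactQuality] = None

    def to_dict(self) -> Dict[str, Any]:
        # asdict() recurses into the nested dataclass and leaves None as None.
        return asdict(self)

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "DispatchResult":
        aq = data.get("artifact_quality")
        return cls(
            dispatch_id=data["dispatch_id"],
            artifact_quality=ArtifactQuality(**aq) if aq is not None else None,
        )


def pawbench_row(result: DispatchResult) -> Dict[str, Any]:
    # Assumed output shape: the quality block is nested under the dim5 key,
    # and is null when no analyzer ran for this dispatch.
    aq = result.artifact_quality
    return {
        "dispatch_id": result.dispatch_id,
        "dim5_artifact_quality": asdict(aq) if aq is not None else None,
    }
```

Keeping `artifact_quality` optional means results recorded before this phase, or for languages without an analyzer yet, would still deserialize.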

**DQS unchanged in this phase.** Collect calibration data first (≥100 dispatches); change the formula later.

**Gate to P5:** scores stable across 3 consecutive runs.
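
One way the gate could be made mechanical, as a sketch only. The issue does not define "stable", so the choice of a max-minus-min spread over per-run aggregate scores, and the `tolerance` default of 0.02, are assumptions to be replaced once calibration data exists.

```python
from typing import Sequence


def is_stable(scores: Sequence[float], window: int = 3, tolerance: float = 0.02) -> bool:
    # The gate cannot pass before `window` runs have been recorded.
    if len(scores) < window:
        return False
    # Look only at the most recent `window` consecutive runs.
    recent = scores[-window:]
    return max(recent) - min(recent) <= tolerance
```

For example, `is_stable([0.40, 0.71, 0.70, 0.71])` looks only at the last three runs and ignores the earlier outlier.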

Why: this catches the "passes tests, ships slop" failure mode. It is the most defensible secondary metric because it is orthogonal to AC pass and immune to test-overfitting.

