
spec 009 P4: artifact_quality analyzers (Python first) #10

@zenprocess

Phase P4 of the orchestration × complexity matrix work.

New bench/quality/ package with per-language analyzers, registered via entry points:

  • Python: ruff + mypy + radon cc → ArtifactQuality(score: float, lint_errors, type_errors, cyclomatic_max)
  • TypeScript: eslint + tsc (follow-up)
  • Go: vet + staticcheck (follow-up)
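A minimal sketch of what the Python analyzer could look like, assuming the tools are driven through their CLIs (`ruff check --output-format json`, `mypy`, `radon cc -j`) and their output is parsed. The field types, the scoring formula, the entry-point group name `bench.quality.analyzers`, and every function name here are assumptions for illustration, not the package's actual API.

```python
from __future__ import annotations

import json
import subprocess
from dataclasses import dataclass
from importlib.metadata import entry_points


@dataclass(frozen=True)
class ArtifactQuality:
    score: float
    lint_errors: int
    type_errors: int
    cyclomatic_max: int


def count_ruff_errors(ruff_json: str) -> int:
    """`ruff check --output-format json` prints a JSON array of diagnostics."""
    return len(json.loads(ruff_json or "[]"))


def count_mypy_errors(mypy_stdout: str) -> int:
    """mypy prints one `path:line: error: message` line per error."""
    return sum(1 for line in mypy_stdout.splitlines() if ": error:" in line)


def max_cyclomatic(radon_json: str) -> int:
    """`radon cc -j` maps each file to a list of blocks with a `complexity` key."""
    report = json.loads(radon_json or "{}")
    values = [
        block["complexity"]
        for blocks in report.values()
        if isinstance(blocks, list)  # skip files reported as an error object
        for block in blocks
    ]
    return max(values, default=0)


def score_quality(lint_errors: int, type_errors: int, cyclomatic_max: int) -> float:
    # Placeholder formula (assumption): start at 1.0, subtract penalties, clamp at 0.
    penalty = (
        0.02 * lint_errors
        + 0.05 * type_errors
        + 0.01 * max(0, cyclomatic_max - 10)
    )
    return max(0.0, 1.0 - penalty)


def _run(cmd: list[str]) -> str:
    # Linters exit non-zero when they find problems, so check=True is not used.
    return subprocess.run(cmd, capture_output=True, text=True).stdout


def analyze_python(path: str) -> ArtifactQuality:
    lint = count_ruff_errors(_run(["ruff", "check", "--output-format", "json", path]))
    types = count_mypy_errors(_run(["mypy", path]))
    cc = max_cyclomatic(_run(["radon", "cc", "-j", path]))
    return ArtifactQuality(score_quality(lint, types, cc), lint, types, cc)


def load_analyzers(group: str = "bench.quality.analyzers") -> dict:
    # Hypothetical group name; entry_points(group=...) needs Python 3.10+.
    return {ep.name: ep.load() for ep in entry_points(group=group)}
```

Keeping the parsers separate from the subprocess calls means they can be unit-tested on captured tool output without ruff, mypy, or radon installed.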

DispatchResult round-trips artifact_quality through serialization. Pawbench JSON output includes dim5_artifact_quality.
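The round-trip requirement could be checked with something like the sketch below. `DispatchResult` is reduced to two stand-in fields (`task_id`, `ac_pass`) that are assumptions, as are the method names and the layout of the Pawbench record; only the `artifact_quality` and `dim5_artifact_quality` names come from the description above.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class ArtifactQuality:
    score: float
    lint_errors: int
    type_errors: int
    cyclomatic_max: int


@dataclass
class DispatchResult:
    task_id: str  # hypothetical field
    ac_pass: bool  # hypothetical field
    artifact_quality: Optional[ArtifactQuality] = None

    def to_json(self) -> str:
        # asdict() recurses into the nested dataclass, so the output is plain JSON.
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "DispatchResult":
        data = json.loads(raw)
        aq = data.pop("artifact_quality", None)
        quality = ArtifactQuality(**aq) if aq is not None else None
        return cls(**data, artifact_quality=quality)


def pawbench_record(result: DispatchResult) -> dict:
    # Assumed layout: the analyzer output is exposed under dim5_artifact_quality.
    quality = result.artifact_quality
    return {
        "task_id": result.task_id,
        "dim5_artifact_quality": asdict(quality) if quality is not None else None,
    }
```

Making `artifact_quality` optional keeps results from languages without an analyzer (TypeScript and Go in this phase) serializable.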

DQS is unchanged in this phase: collect calibration data first (≥100 dispatches), and change the formula later.

Gate to P5: scores stable across 3 consecutive runs.
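The gate does not define "stable", so the sketch below picks one possible reading: the spread (max minus min) of the last three aggregate scores stays within a tolerance. The function name and the `0.02` tolerance are assumptions and would need to be set from the calibration data.

```python
from __future__ import annotations

from typing import Sequence


def scores_stable(
    scores: Sequence[float],
    runs: int = 3,
    tolerance: float = 0.02,  # assumed threshold, not taken from the spec
) -> bool:
    """Return True when the last `runs` scores differ by at most `tolerance`."""
    if len(scores) < runs:
        return False  # not enough consecutive runs yet
    recent = scores[-runs:]
    return max(recent) - min(recent) <= tolerance
```

For example, `scores_stable([0.50, 0.80, 0.81, 0.80])` passes because only the last three runs are considered, while `scores_stable([0.80, 0.90, 0.80])` does not.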

Why: this catches the "passes tests, ships slop" failure mode. It is the most defensible secondary metric: orthogonal to AC pass and immune to test-overfitting.
