scorecard

A single-command code quality tool for Python projects. Runs 5 analysis tools, aggregates their output into a composite scorecard, detects regressions between runs, and emits prioritized remediation instructions.

$ python scorecard.py

 myproject scorecard                    score: 91/100 (A)
 8 files  3,219 SLOC  0 errors  0 type errors
 0 dead items  3 dup blocks (27 lines)  1 smell

 File                 MI    CC-max  CC-avg  Halstead   LOC  Issues
 db.py              A 33     B 8    A 2.1       412   245  —
 llm.py             A 50     A 5    A 1.8       318   270  —
 pipeline.py        B 18     B 10   A 4.7      1439   782  2 CC boundary, 1 smell
 batch.py           A 32     B 8    A 3.2       890   340  3 dup blocks

 Regressions since last run: none

 Remediation (3 items):
   1. [DUPLICATE] batch.py + pipeline.py: 3 duplicate blocks (27 lines)
      Action: Extract shared logic into pipeline helpers
   2. [SMELL] pipeline.py:run_hierarchy_refine  too-many-locals (25/15)
      Action: Extract variable clusters into NamedTuples
   3. [DEAD] pipeline.py:368 unused parameter 'batch_mode'
      Action: Remove parameter from signature

Why

Code quality tools exist (ruff, mypy, radon, pylint, vulture) but nothing ties them together. Running 5 commands, reading 5 outputs, and deciding what to fix first is tedious. This tool does it in one step and tells you exactly what to do.

It's also designed to be used by LLMs. The --json output and structured remediation items make it easy for Claude or other agents to self-assess code quality and act on the results. See program.md for LLM-specific instructions.

Install

Requires Python 3.11+ and uv.

git clone <repo-url> tools/tool-py-scorecard
cd tools/tool-py-scorecard
uv venv && uv pip install -e .

The tool installs its own dependencies (ruff, mypy, radon, pylint, vulture) into .venv/ so it doesn't pollute your project.

Usage

Run from your project root:

# Full scorecard
tools/tool-py-scorecard/.venv/bin/python tools/tool-py-scorecard/scorecard.py

# JSON output (for CI or programmatic use)
tools/tool-py-scorecard/.venv/bin/python tools/tool-py-scorecard/scorecard.py --json

# Only show regressions since last run
tools/tool-py-scorecard/.venv/bin/python tools/tool-py-scorecard/scorecard.py --diff

# Only show remediation items
tools/tool-py-scorecard/.venv/bin/python tools/tool-py-scorecard/scorecard.py --remediate

# Scope to specific files
tools/tool-py-scorecard/.venv/bin/python tools/tool-py-scorecard/scorecard.py --files src/db.py src/pipeline.py

# Run a single dimension
tools/tool-py-scorecard/.venv/bin/python tools/tool-py-scorecard/scorecard.py --dimension D7

By default, the scorecard scans all *.py files in the current directory (recursively), excluding test_*.py, .venv/, dev/, tools/, and scorecard.py itself.
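The default scan rule can be sketched roughly as follows. This is an illustration only, not the code in scorecard.py: the function name `discover_files` is hypothetical, and it assumes the excluded directories are matched by name anywhere in the relative path.

```python
from pathlib import Path

# Directory names excluded by default, per the paragraph above.
EXCLUDED_DIRS = {".venv", "dev", "tools"}


def discover_files(root: Path) -> list[Path]:
    """Return *.py files under root (relative paths), minus the default exclusions.

    Hypothetical helper: assumes a directory is excluded wherever it
    appears in the path, which the README does not spell out.
    """
    found = []
    for path in sorted(root.rglob("*.py")):
        rel = path.relative_to(root)
        if EXCLUDED_DIRS.intersection(rel.parts[:-1]):
            continue  # file lives inside .venv/, dev/, or tools/
        if rel.name.startswith("test_") or rel.name == "scorecard.py":
            continue  # test modules and the tool itself are skipped
        found.append(rel)
    return found
```

Calling `discover_files(Path.cwd())` from the project root would then list the files a default run is expected to cover.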

What it measures

Nine dimensions across three tiers, weighted by impact:

Tier 1 — Correctness (20%)

Dim	Name	Tool	Weight
D1	Lint errors	ruff	10%
D2	Type errors	mypy	10%

Tier 2 — Structure (30%)

Dim	Name	Tool	Weight
D3	Maintainability Index	radon mi	10%
D6	File size / raw metrics	radon raw	5%
D7	Duplicate code	pylint	15%

Tier 3 — Cognitive load (50%)

Dim	Name	Tool	Weight
D4	Cyclomatic complexity	radon cc	20%
D5	Halstead metrics	radon hal	15%
D8	Code smells	pylint	10%
D9	Dead code	vulture	5%

Complexity (D4) has the highest weight because the tool treats it as the strongest predictor of bug density. Duplicates (D7) are weighted heavily because they cause divergent bugs: a fix applied to one copy but not the other.

Scoring

Each dimension produces a 0–100 score. The composite is the weighted average of those scores, using the weights listed above, and maps to a letter grade:

Grade	Score	Meaning
A	90–100	Ship it
B	75–89	Acceptable, minor issues
C	50–74	Needs attention before new features
F	< 50	Stop and remediate

The exit code is 0 for scores >= 75 and 1 otherwise, so you can use it in CI gates.
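The scoring rules above can be sketched as below. The weights, grade bands, and exit-code threshold come from this README; the function names are hypothetical, and how the real tool rounds the composite is not stated here (SPEC.md has the actual formulas), so this sketch does not round.

```python
# Weights (percent) copied from the "What it measures" tables.
WEIGHTS = {
    "D1": 10, "D2": 10,                    # Tier 1: correctness
    "D3": 10, "D6": 5, "D7": 15,           # Tier 2: structure
    "D4": 20, "D5": 15, "D8": 10, "D9": 5, # Tier 3: cognitive load
}


def composite(scores: dict[str, float]) -> float:
    """Weighted average of per-dimension scores, each in the range 0-100."""
    total = sum(WEIGHTS[dim] * scores[dim] for dim in WEIGHTS)
    return total / sum(WEIGHTS.values())


def grade(score: float) -> str:
    """Map a composite score to the letter grades in the table above."""
    if score >= 90:
        return "A"
    if score >= 75:
        return "B"
    if score >= 50:
        return "C"
    return "F"


def exit_code(score: float) -> int:
    """0 for scores >= 75, 1 otherwise, as described for CI gates."""
    return 0 if score >= 75 else 1
```

For example, a project that scores 100 on every dimension except a 50 on D4 loses 20% of 50 points, giving a composite of 90.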

Regression detection

Each run saves a snapshot to results/.scorecard_history.json. The next run compares against it and flags:

Composite score dropped >= 5 points
Any dimension dropped >= 10 points
New file exceeding 500 SLOC
New duplicate blocks, new smells, or increased dead code
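A minimal sketch of the first three checks follows. The snapshot layout (keys `composite`, `dimensions`, `files`) is an assumption made for illustration; the real format of results/.scorecard_history.json is defined by the tool and SPEC.md. The sketch also reads "new file exceeding 500 SLOC" as any file that was not above 500 SLOC in the previous snapshot.

```python
def find_regressions(previous: dict, current: dict) -> list[str]:
    """Compare two snapshots and return human-readable regression flags.

    Assumed (hypothetical) snapshot shape:
      {"composite": float, "dimensions": {"D1": float, ...},
       "files": {"path.py": sloc, ...}}
    """
    flags = []

    # Composite score dropped >= 5 points.
    drop = previous["composite"] - current["composite"]
    if drop >= 5:
        flags.append(f"composite dropped {drop:g} points")

    # Any dimension dropped >= 10 points.
    for dim, old in previous["dimensions"].items():
        new = current["dimensions"].get(dim)
        if new is not None and old - new >= 10:
            flags.append(f"{dim} dropped {old - new:g} points")

    # A file that is now above 500 SLOC and was not before.
    for name, sloc in current["files"].items():
        if sloc > 500 and previous["files"].get(name, 0) <= 500:
            flags.append(f"{name} newly exceeds 500 SLOC ({sloc})")

    return flags
```

The fourth rule (new duplicate blocks, smells, or dead code) would follow the same pattern with per-category counts.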

Remediation

The remediation section is the actionable output. Items are sorted by category, highest impact first:

DUPLICATE — Extract shared code into helpers (highest bug risk)
SMELL — Refactor functions exceeding complexity thresholds
COMPLEXITY — Split or simplify high-CC functions
DEAD — Remove unused code
LINT — Fix style violations
TYPE — Fix type errors

Each item includes the file, line number, what's wrong, what to do, and the expected score impact.
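The ordering above amounts to a sort by category rank. The sketch below uses a hypothetical `Item` record with only a few of the fields mentioned; the real structure of a remediation item is defined by the tool's JSON output.

```python
from dataclasses import dataclass

# Category order from the list above, highest impact first.
PRIORITY = ("DUPLICATE", "SMELL", "COMPLEXITY", "DEAD", "LINT", "TYPE")
RANK = {category: position for position, category in enumerate(PRIORITY)}


@dataclass(frozen=True)
class Item:
    """Hypothetical remediation record, for illustration only."""
    category: str
    location: str
    action: str


def prioritize(items: list[Item]) -> list[Item]:
    """Sort items by category rank; sorted() is stable, so ties keep input order."""
    return sorted(items, key=lambda item: RANK[item.category])
```

Applied to the three items in the sample output at the top of this README, this yields the DUPLICATE, SMELL, DEAD order shown there.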

Graceful degradation

If a tool is missing, the scorecard skips that dimension and redistributes its weight across the remaining dimensions. It never crashes because of a missing dependency; it warns and continues.
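One way to redistribute weight is to scale the remaining weights proportionally so they still sum to 100. That proportional rule is an assumption; the README does not say whether the tool splits the missing weight proportionally or evenly, and `redistribute` is a hypothetical name.

```python
# Weights (percent) copied from the "What it measures" tables.
WEIGHTS = {
    "D1": 10, "D2": 10, "D3": 10, "D4": 20, "D5": 15,
    "D6": 5, "D7": 15, "D8": 10, "D9": 5,
}


def redistribute(weights: dict[str, float], missing: set[str]) -> dict[str, float]:
    """Drop the missing dimensions and rescale the rest to the original total.

    Assumption: redistribution is proportional to each dimension's weight.
    """
    kept = {dim: w for dim, w in weights.items() if dim not in missing}
    remaining = sum(kept.values())
    if remaining == 0:
        raise ValueError("no dimensions left to score")
    scale = sum(weights.values()) / remaining
    return {dim: w * scale for dim, w in kept.items()}
```

If vulture is not installed, for instance, `redistribute(WEIGHTS, {"D9"})` drops D9 and raises D4 from 20% to roughly 21.05%.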

Integration

With Claude / LLMs: Copy this repo into tools/tool-py-scorecard/ in your project. This follows a lightweight convention where LLM tools live under tools/, each with its own program.md (the interface doc the LLM reads), its own .venv, and --json support. The LLM bootstraps the venv on first use and discovers tools by looking for tools/*/program.md. See program.md for the full instructions.

As a pre-commit hook: Run scorecard.py --json and fail if the composite score is below your threshold or if D1/D2 report any errors.

In CI: Use --json output and check the exit code.
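A CI step can rely on the exit code alone, as in the sketch below. The `gate` helper is hypothetical and deliberately ignores the JSON body, because the output schema is defined in SPEC.md and not reproduced here; the command paths are the ones from the Usage section.

```python
import subprocess

# Command from the Usage section; adjust the paths to your checkout.
SCORECARD_CMD = [
    "tools/tool-py-scorecard/.venv/bin/python",
    "tools/tool-py-scorecard/scorecard.py",
    "--json",
]


def gate(command: list[str]) -> bool:
    """Run the given command and report whether it exited with code 0.

    For the scorecard, exit code 0 means the composite score is >= 75.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode == 0
```

A CI script would call `gate(SCORECARD_CMD)` and fail the job when it returns False.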

Project structure

scorecard.py          # The tool (single file, all 9 dimensions)
pyproject.toml        # Dependencies and build config
program.md            # LLM-facing usage instructions
SPEC.md               # Detailed technical specification
README.md             # This file

Full specification

See SPEC.md for the complete technical spec including scoring formulas, thresholds, tool invocation details, and output format definitions.
