Skip to content

zhouliang-sjtu/codespect-matrix

Repository files navigation

codespect-matrix

Multi-Agent Code Evolution Platform with Deep Taint Analysis

Version Python License Status Agents Tests Coverage

CPG Taint Analysis · 10-Stage Pipeline · Debate Review · Harness Engineering · Self-Evolution

🌐 中文


What is codespect-matrix?

Not just another linter. codespect-matrix is a virtual QA team — 24 specialized AI agents that conduct debate-style code review across a 10-stage pipeline, cross-validate every finding through a Harness verification engine, and perform deep taint analysis via Code Property Graph to track vulnerabilities across function boundaries.


Why codespect-matrix?

Traditional QA codespect-matrix
Single-dimension (linter / coverage) 24 agents in joint debate review
Rigid rules, heavy config AI agents auto-adapt to project characteristics
Surface-level pattern matching CPG Deep Taint Analysis — AST + Call Graph + Data Flow tracking
No runtime awareness Dynamic Analysis — DB schema, API contract, smoke testing
Reports issues without fixes Every issue includes remediation + auto-fix plan
One-and-done scan Convergence loop — repeats until no new findings
No quality verification gates Harness Engineering — constraint enforcement, drift detection, auto-recovery
Can't quantify quality changes --evolve health dashboard + --evolve-baseline trend tracking
Generic, domain-agnostic Built-in PHI / HIPAA / FHIR / DICOM / HL7 / CDS medical specialties
Starts fresh every time Dual memory — project memory + global knowledge base
No self-improvement SelfEvolver — learns from QA→Fix→ReQA cycles across projects

Pipeline: CPG Pre-scan → Agent Selection → Parallel Inspection → Harness Validation → Cross-Review → Harness Verification → Debate Arbitration → Convergence Detection → Drift Detection → Fix Generation → Evolution Report


Quick Start

# Install from source
git clone https://github.com/zhouliang-sjtu/codespect-matrix.git
cd codespect-matrix
pip install -e .

# Default multi-agent review (10-stage pipeline)
codespect-matrix

# CI gate
codespect-matrix --ci --json

# Code evolution analysis (health dashboard + roadmap)
codespect-matrix --evolve

# Save evolution baseline for trend tracking
codespect-matrix --evolve-baseline

# Full convergence cycle
codespect-matrix --max-rounds 10 --output report.md

Architecture

10-Stage Pipeline

Phase Stage Mechanism
0 CPG Pre-scan Code Property Graph: AST + Call Graph + Data Flow + Taint Analysis. Pure Python, zero dependencies.
1 Agent Selection Project profile + global KB → auto-select most relevant agents
2 Parallel Inspection All agents scan independently (rule+LLM for security/healthcare)
3 Harness Validation Constraint enforcement — consistency checks, evidence quality assessment
4 Cross-Review Each finding cross-validated by agents from different domains
5 Harness Verification Cross-phase verification, feedback routing, agent error recovery
6 Debate Arbitration Disputed findings → challenge → defense → Orchestrator ruling (max 3 rounds)
7 Convergence Detection 2 consecutive rounds with no new findings → auto-terminate
8 Drift Detection Quality trend analysis against historical baseline
9 Fix Generation Confirmed issues → auto-fix proposals with backup
10 Evolution Report Health score + technical debt + architecture + roadmap

CPG Deep Taint Analysis

The Code Property Graph pre-scan elevates codespect-matrix from pattern matching to semantic program analysis:

Component Capability
AST Parsing Extract all functions, classes, imports, variable definitions
Call Graph Map inter-function call dependencies
Data Flow Graph Track variable definition → use chains
Taint Analysis Trace untrusted input → dangerous sink across function boundaries

Detected vulnerability chains: SQL injection (user_input → execute), PHI leaks (patient_data → log/output), path traversal (user_input → open()), cross-function attack vectors.

Agent Roster (24 agents)

Agent Engine Focus
security Rule+LLM Vulnerabilities, injection, weak crypto, secrets
healthcare Rule+LLM HIPAA compliance, patient data protection
phi_protection Rule+LLM PHI detection, ID leaks, data masking
compliance Rule+LLM License compliance, GDPR audit
medical_data Rule+LLM ICD coding, blood pressure/SpO2 validation
fhir Rule+LLM FHIR R4 resource validation, SMART on FHIR auth
dicom Rule+LLM DICOM tag validation, PHI tag detection
hl7 Rule+LLM HL7 v2 message security, MLLP transport
cds Rule+LLM Clinical decision support rule safety, fallback checks
developer Pure LLM Type safety, error handling, code quality
architect Pure LLM Circular dependencies, module coupling, God module
performance Pure LLM N+1 queries, memory bombs, blocking I/O
devops Pure LLM Observability, health checks, graceful shutdown
testing Pure LLM Test coverage, testability
api Pure LLM REST conventions, rate limiting, auth
dependency Pure LLM Dependency versions, CVE scanning
concurrency Pure LLM Race conditions, deadlocks, thread safety
linter Subprocess+LLM Ruff, mypy, flake8 runner with LLM interpretation
datascience Pure LLM Statistical modeling, data integrity, overfitting
hardcode Pure LLM Hardcoded values, magic numbers, cross-file duplicates
db_compatibility Static SQL dialect incompatibilities (MySQL↔PostgreSQL↔SQLite)
db_schema Dynamic ORM model vs actual database schema consistency
api_contract Static+Dynamic OpenAPI schema validation, FastAPI parameter boundaries
smoke_test Dynamic Health check endpoints, API availability testing

Dynamic analysis agents auto-activate based on project characteristics — no manual configuration needed.

Harness Engineering

"Agent = Model + Harness" — Human Steer, Agent Execute

Feature Mechanism
Constraint Validation Severity alignment check, evidence quality assessment
Cross-Phase Verification Inspect → Review → Verify → Output consistency chain
Feedback Routing Review results feed back to improve agent inspection
Auto-Recovery Failed agent retry, fallback to rule-only mode
Drift Detection Compare results against baseline, alert quality degradation

Code Evolution Dashboard

codespect-matrix --evolve          # Full evolution analysis
codespect-matrix --evolve-baseline # Save as baseline for trend tracking
Dimension What It Measures
Health Score 0-100 weighted from agent findings (critical×100, high×50, ...)
Technical Debt TODO/FIXME/HACK density + oversized files + comment ratio
Architecture Import graph → coupling, cycles, God module detection
Test Coverage pytest --cov integration + fallback file counting
Evolution Trend Baseline comparison → improving / degrading / stable
Roadmap P0-P2 prioritized improvement items with effort estimates

Self-Evolution Engine

codespect-matrix --evolve-self     # See what the tool has learned

codespect-matrix learns from every QA cycle across projects:

Phase What Happens
1. Scan Run codespect-matrix on a project → findings
2. Fix Apply remediation → record_qa_cycle() captures fixes
3. Re-scan Verify health improvement → delta tracked
4. Learn Pattern confidence updated, agent weights adjusted
5. Evolve SelfEvolver.evolve() prunes low-confidence templates, promotes proven patterns to global KB

Over time, the tool gets more accurate: fewer false positives, faster fix suggestions, better agent selection per project type.


LLM Support

Adapter-based LLM integration. Falls back to rule-only mode when no LLM is configured:

Provider Config
OpenAI (GPT-4o) LLM_PROVIDER=openai
Anthropic (Claude) LLM_PROVIDER=anthropic
Google (Gemini) LLM_PROVIDER=google
Baidu (ERNIE) LLM_PROVIDER=baidu
Alibaba (Qwen) LLM_PROVIDER=tongyi
Zhipu (GLM) LLM_PROVIDER=zhipu
Hugging Face LLM_PROVIDER=huggingface

Healthcare Specialties

Check Details
PHI Privacy Scan Patient info leaks in logs, print, exceptions, exports, APIs
Medical Data Validation Blood pressure (60-260), SpO2 (60-100), ICD-10 format, Chinese ID checksum
Data Masking Verification Name→alias, ID→hash, export without de-identification

Severity Levels

Level Weight Description
critical (P0) 100 Service crash / PHI leak / build failure
high (P1) 50 Security vulnerability / data loss / missing HTTP timeout
medium (P2) 15 Best practice violation / missing rate limiting
low (P3) 3 Optimization suggestion

CI/CD Integration (GitHub Actions)

- name: codespect-matrix gate
  run: codespect-matrix --ci --json > qa-report.json

Python API

# Multi-agent review
from codespect_matrix.agents import AgentOrchestrator

orch = AgentOrchestrator(project_path="/path/to/project")
orch.initialize()
result = orch.run_full_cycle()
print(result["total_findings"], "issues found")

# Code evolution analysis
from codespect_matrix.evolution import EvolutionReporter

reporter = EvolutionReporter(project_path="/path/to/project")
report = reporter.generate_full_report()
print(f"Health: {report['health']['health_score']}/100")
# Self-evolution
from codespect_matrix.evolution import SelfEvolver

evolver = SelfEvolver()
evolver.record_qa_cycle(
    project_name="my-app",
    before_health=62.3,
    findings=[...],           # from agent scan
    fixes_applied=[...],      # what was fixed
    after_health=85.1,        # re-scan result
)
evolver.evolve()              # prune weak patterns, promote proven ones
print(evolver.get_evolution_summary())

Custom Agents

from codespect_matrix.agents import BaseAgent, AgentRole, Finding

class MyAgent(BaseAgent):
    def get_description(self): return "My custom check"
    def get_domain(self): return "custom"
    
    def inspect(self, files_context):
        return [Finding(
            check_name="my_check",
            severity="medium",
            message="A custom issue detected",
            remediation="Suggested fix",
            confidence=0.8
        )]

# Register and run
from codespect_matrix.agents import AgentOrchestrator
orch = AgentOrchestrator()
orch._add_agent("my_agent", MyAgent(name="my_agent", role=AgentRole.INSPECTOR))

CLI Reference

Flag Description
--path, -p Project path (default: current directory)
--max-rounds Max inspection rounds (default: 5)
--ci CI/CD gate mode — exit_code=1 when thresholds exceeded
--json JSON output
--evolve Code evolution analysis — health + debt + architecture + roadmap
--evolve-baseline Save evolution analysis as baseline for trend comparison
--evolve-self Self-evolution summary — what the tool has learned from past QA cycles
--fix-plan AI fix — step 1: generate plan
--fix-execute AI fix — step 2: execute
--fix-all Execute all fixes including high-risk ones (with --fix-execute)
--output Report output file path

Development

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/ -q

# Run tests with coverage
pytest tests/ --cov=codespect_matrix --cov-report=term

# Type checking
mypy codespect_matrix/

# Lint
ruff check codespect_matrix/

Free + Donate

codespect-matrix is free and open source. Support the project:


License

MIT License © 2026 Zhou Liang · Shanghai Jiao Tong University School of Medicine


Version: 1.0.0
Status: Stable — production-ready
Tests: 82 passed · 59% coverage
Architecture: Multi-Agent · Debate Review · Hybrid Engine · Code Evolution

About

Multi-Agent Code Evolution Platform — debate-style review with health scoring, tech debt tracking, architecture analysis, and improvement roadmaps

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors