Multi-Agent Code Evolution Platform with Deep Taint Analysis
CPG Taint Analysis · 10-Stage Pipeline · Debate Review · Harness Engineering · Self-Evolution
🌐 中文
Not just another linter. codespect-matrix is a virtual QA team — 24 specialized AI agents that conduct debate-style code review across a 10-stage pipeline, cross-validate every finding through a Harness verification engine, and perform deep taint analysis via Code Property Graph to track vulnerabilities across function boundaries.
| Traditional QA | codespect-matrix |
|---|---|
| Single-dimension (linter / coverage) | 24 agents in joint debate review |
| Rigid rules, heavy config | AI agents auto-adapt to project characteristics |
| Surface-level pattern matching | CPG Deep Taint Analysis — AST + Call Graph + Data Flow tracking |
| No runtime awareness | Dynamic Analysis — DB schema, API contract, smoke testing |
| Reports issues without fixes | Every issue includes remediation + auto-fix plan |
| One-and-done scan | Convergence loop — repeats until no new findings |
| No quality verification gates | Harness Engineering — constraint enforcement, drift detection, auto-recovery |
| Can't quantify quality changes | --evolve health dashboard + --evolve-baseline trend tracking |
| Generic, domain-agnostic | Built-in PHI / HIPAA / FHIR / DICOM / HL7 / CDS medical specialties |
| Starts fresh every time | Dual memory — project memory + global knowledge base |
| No self-improvement | SelfEvolver — learns from QA→Fix→ReQA cycles across projects |
Pipeline: CPG Pre-scan → Agent Selection → Parallel Inspection → Harness Validation → Cross-Review → Harness Verification → Debate Arbitration → Convergence Detection → Drift Detection → Fix Generation → Evolution Report
# Install from source
git clone https://github.com/zhouliang-sjtu/codespect-matrix.git
cd codespect-matrix
pip install -e .
# Default multi-agent review (10-stage pipeline)
codespect-matrix
# CI gate
codespect-matrix --ci --json
# Code evolution analysis (health dashboard + roadmap)
codespect-matrix --evolve
# Save evolution baseline for trend tracking
codespect-matrix --evolve-baseline
# Full convergence cycle
codespect-matrix --max-rounds 10 --output report.md| Phase | Stage | Mechanism |
|---|---|---|
| 0 | CPG Pre-scan | Code Property Graph: AST + Call Graph + Data Flow + Taint Analysis. Pure Python, zero dependencies. |
| 1 | Agent Selection | Project profile + global KB → auto-select most relevant agents |
| 2 | Parallel Inspection | All agents scan independently (rule+LLM for security/healthcare) |
| 3 | Harness Validation | Constraint enforcement — consistency checks, evidence quality assessment |
| 4 | Cross-Review | Each finding cross-validated by agents from different domains |
| 5 | Harness Verification | Cross-phase verification, feedback routing, agent error recovery |
| 6 | Debate Arbitration | Disputed findings → challenge → defense → Orchestrator ruling (max 3 rounds) |
| 7 | Convergence Detection | 2 consecutive rounds with no new findings → auto-terminate |
| 8 | Drift Detection | Quality trend analysis against historical baseline |
| 9 | Fix Generation | Confirmed issues → auto-fix proposals with backup |
| 10 | Evolution Report | Health score + technical debt + architecture + roadmap |
The Code Property Graph pre-scan elevates codespect-matrix from pattern matching to semantic program analysis:
| Component | Capability |
|---|---|
| AST Parsing | Extract all functions, classes, imports, variable definitions |
| Call Graph | Map inter-function call dependencies |
| Data Flow Graph | Track variable definition → use chains |
| Taint Analysis | Trace untrusted input → dangerous sink across function boundaries |
Detected vulnerability chains: SQL injection (user_input → execute), PHI leaks (patient_data → log/output), path traversal (user_input → open()), cross-function attack vectors.
| Agent | Engine | Focus |
|---|---|---|
| security | Rule+LLM | Vulnerabilities, injection, weak crypto, secrets |
| healthcare | Rule+LLM | HIPAA compliance, patient data protection |
| phi_protection | Rule+LLM | PHI detection, ID leaks, data masking |
| compliance | Rule+LLM | License compliance, GDPR audit |
| medical_data | Rule+LLM | ICD coding, blood pressure/SpO2 validation |
| fhir | Rule+LLM | FHIR R4 resource validation, SMART on FHIR auth |
| dicom | Rule+LLM | DICOM tag validation, PHI tag detection |
| hl7 | Rule+LLM | HL7 v2 message security, MLLP transport |
| cds | Rule+LLM | Clinical decision support rule safety, fallback checks |
| developer | Pure LLM | Type safety, error handling, code quality |
| architect | Pure LLM | Circular dependencies, module coupling, God module |
| performance | Pure LLM | N+1 queries, memory bombs, blocking I/O |
| devops | Pure LLM | Observability, health checks, graceful shutdown |
| testing | Pure LLM | Test coverage, testability |
| api | Pure LLM | REST conventions, rate limiting, auth |
| dependency | Pure LLM | Dependency versions, CVE scanning |
| concurrency | Pure LLM | Race conditions, deadlocks, thread safety |
| linter | Subprocess+LLM | Ruff, mypy, flake8 runner with LLM interpretation |
| datascience | Pure LLM | Statistical modeling, data integrity, overfitting |
| hardcode | Pure LLM | Hardcoded values, magic numbers, cross-file duplicates |
| db_compatibility | Static | SQL dialect incompatibilities (MySQL↔PostgreSQL↔SQLite) |
| db_schema | Dynamic | ORM model vs actual database schema consistency |
| api_contract | Static+Dynamic | OpenAPI schema validation, FastAPI parameter boundaries |
| smoke_test | Dynamic | Health check endpoints, API availability testing |
Dynamic analysis agents auto-activate based on project characteristics — no manual configuration needed.
"Agent = Model + Harness" — Human Steer, Agent Execute
| Feature | Mechanism |
|---|---|
| Constraint Validation | Severity alignment check, evidence quality assessment |
| Cross-Phase Verification | Inspect → Review → Verify → Output consistency chain |
| Feedback Routing | Review results feed back to improve agent inspection |
| Auto-Recovery | Failed agent retry, fallback to rule-only mode |
| Drift Detection | Compare results against baseline, alert quality degradation |
codespect-matrix --evolve # Full evolution analysis
codespect-matrix --evolve-baseline # Save as baseline for trend tracking| Dimension | What It Measures |
|---|---|
| Health Score | 0-100 weighted from agent findings (critical×100, high×50, ...) |
| Technical Debt | TODO/FIXME/HACK density + oversized files + comment ratio |
| Architecture | Import graph → coupling, cycles, God module detection |
| Test Coverage | pytest --cov integration + fallback file counting |
| Evolution Trend | Baseline comparison → improving / degrading / stable |
| Roadmap | P0-P2 prioritized improvement items with effort estimates |
codespect-matrix --evolve-self # See what the tool has learnedcodespect-matrix learns from every QA cycle across projects:
| Phase | What Happens |
|---|---|
| 1. Scan | Run codespect-matrix on a project → findings |
| 2. Fix | Apply remediation → record_qa_cycle() captures fixes |
| 3. Re-scan | Verify health improvement → delta tracked |
| 4. Learn | Pattern confidence updated, agent weights adjusted |
| 5. Evolve | SelfEvolver.evolve() prunes low-confidence templates, promotes proven patterns to global KB |
Over time, the tool gets more accurate: fewer false positives, faster fix suggestions, better agent selection per project type.
Adapter-based LLM integration. Falls back to rule-only mode when no LLM is configured:
| Provider | Config |
|---|---|
| OpenAI (GPT-4o) | LLM_PROVIDER=openai |
| Anthropic (Claude) | LLM_PROVIDER=anthropic |
| Google (Gemini) | LLM_PROVIDER=google |
| Baidu (ERNIE) | LLM_PROVIDER=baidu |
| Alibaba (Qwen) | LLM_PROVIDER=tongyi |
| Zhipu (GLM) | LLM_PROVIDER=zhipu |
| Hugging Face | LLM_PROVIDER=huggingface |
| Check | Details |
|---|---|
| PHI Privacy Scan | Patient info leaks in logs, print, exceptions, exports, APIs |
| Medical Data Validation | Blood pressure (60-260), SpO2 (60-100), ICD-10 format, Chinese ID checksum |
| Data Masking Verification | Name→alias, ID→hash, export without de-identification |
| Level | Weight | Description |
|---|---|---|
| critical (P0) | 100 | Service crash / PHI leak / build failure |
| high (P1) | 50 | Security vulnerability / data loss / missing HTTP timeout |
| medium (P2) | 15 | Best practice violation / missing rate limiting |
| low (P3) | 3 | Optimization suggestion |
- name: codespect-matrix gate
run: codespect-matrix --ci --json > qa-report.json# Multi-agent review
from codespect_matrix.agents import AgentOrchestrator
orch = AgentOrchestrator(project_path="/path/to/project")
orch.initialize()
result = orch.run_full_cycle()
print(result["total_findings"], "issues found")
# Code evolution analysis
from codespect_matrix.evolution import EvolutionReporter
reporter = EvolutionReporter(project_path="/path/to/project")
report = reporter.generate_full_report()
print(f"Health: {report['health']['health_score']}/100")# Self-evolution
from codespect_matrix.evolution import SelfEvolver
evolver = SelfEvolver()
evolver.record_qa_cycle(
project_name="my-app",
before_health=62.3,
findings=[...], # from agent scan
fixes_applied=[...], # what was fixed
after_health=85.1, # re-scan result
)
evolver.evolve() # prune weak patterns, promote proven ones
print(evolver.get_evolution_summary())from codespect_matrix.agents import BaseAgent, AgentRole, Finding
class MyAgent(BaseAgent):
def get_description(self): return "My custom check"
def get_domain(self): return "custom"
def inspect(self, files_context):
return [Finding(
check_name="my_check",
severity="medium",
message="A custom issue detected",
remediation="Suggested fix",
confidence=0.8
)]
# Register and run
from codespect_matrix.agents import AgentOrchestrator
orch = AgentOrchestrator()
orch._add_agent("my_agent", MyAgent(name="my_agent", role=AgentRole.INSPECTOR))| Flag | Description |
|---|---|
--path, -p |
Project path (default: current directory) |
--max-rounds |
Max inspection rounds (default: 5) |
--ci |
CI/CD gate mode — exit_code=1 when thresholds exceeded |
--json |
JSON output |
--evolve |
Code evolution analysis — health + debt + architecture + roadmap |
--evolve-baseline |
Save evolution analysis as baseline for trend comparison |
--evolve-self |
Self-evolution summary — what the tool has learned from past QA cycles |
--fix-plan |
AI fix — step 1: generate plan |
--fix-execute |
AI fix — step 2: execute |
--fix-all |
Execute all fixes including high-risk ones (with --fix-execute) |
--output |
Report output file path |
# Install dev dependencies
pip install -e ".[dev]"
# Run tests
pytest tests/ -q
# Run tests with coverage
pytest tests/ --cov=codespect_matrix --cov-report=term
# Type checking
mypy codespect_matrix/
# Lint
ruff check codespect_matrix/codespect-matrix is free and open source. Support the project:
- ⭐ Star this repo
- 💰 GitHub Sponsors
MIT License © 2026 Zhou Liang · Shanghai Jiao Tong University School of Medicine
Version: 1.0.0
Status: Stable — production-ready
Tests: 82 passed · 59% coverage
Architecture: Multi-Agent · Debate Review · Hybrid Engine · Code Evolution