ADR-056: Trust But Verify - Ensuring skill outputs are deterministic and trustworthy
AQE v3.4.2 introduces a 4-layer skill validation system with trust tiers that let you know how reliable each skill's output is.
Total: 97 skills
# View skill validation status
aqe eval status --skill security-testing
# Output:
# Skill: security-testing
# Trust Tier: 3 (Verified)
# Schema: ✓ schemas/output.json
# Validator: ✓ scripts/validate-skill.cjs
# Eval Suite: ✓ evals/security-testing.yaml (8 test cases)

# Run evaluation for a single skill
aqe eval run --skill security-testing --model claude-sonnet-4
# Run evaluations for all Tier 3 skills
aqe eval run-all --skills-tier 3 --models "claude-sonnet-4,claude-haiku"
# Generate evaluation report
aqe eval report --skill security-testing --format markdown

# Validate output against schema
cd .claude/skills/security-testing
node scripts/validate-skill.cjs output.json
# Output: PASS: All validations passed

# Aggregate results from multiple runs
aqe skill report --input results/ --output validation-report.md
# Quick summary
aqe skill summary --input results/
# Compare runs for regression detection
aqe skill compare --current results/ --baseline .baseline/ --threshold 0.05

These skills have complete validation infrastructure and are recommended for production use:
api-testing-patterns - API testing with contract validation
contract-testing - Consumer-driven contract testing
risk-based-testing - Risk-prioritized testing
shift-left-testing/shift-right-testing - Testing strategy
test-automation-strategy - Automation framework design
testability-scoring - Code testability assessment

security-testing - OWASP vulnerability scanning
accessibility-testing - WCAG 2.2 compliance
performance-testing - Load and stress testing
chaos-engineering-resilience - Fault injection
database-testing - Data integrity validation
mutation-testing - Test quality assessment
visual-testing-advanced - Visual regression

qe-test-generation - AI-powered test creation
qe-test-execution - Parallel test execution
qe-coverage-analysis - O(log n) gap detection
qe-quality-assessment - Quality gates
qe-security-compliance - Security auditing
qe-code-intelligence - Knowledge graph analysis
[Full list: .claude/skills/TRUST-TIERS.md]
┌─────────────────────────────────────────────────────┐
│ Layer 0: SKILL.md (Intent) │
│ - Human-readable instructions │
├─────────────────────────────────────────────────────┤
│ Layer 1: schemas/output.json (Structure) │
│ - JSON Schema validation │
├─────────────────────────────────────────────────────┤
│ Layer 2: scripts/validate-skill.cjs (Correctness) │
│ - Deterministic output verification │
├─────────────────────────────────────────────────────┤
│ Layer 3: evals/*.yaml (Behavior) │
│ - Test cases with expected behaviors │
└─────────────────────────────────────────────────────┘
Tier 0 → Tier 1 (Add Schema):
# Create schema
mkdir -p .claude/skills/my-skill/schemas
# Copy template and customize
cp .claude/skills/.validation/templates/skill-output.template.json \
.claude/skills/my-skill/schemas/output.json

Tier 1 → Tier 2 (Add Validator):
mkdir -p .claude/skills/my-skill/scripts
cp .claude/skills/.validation/templates/validate.template.sh \
.claude/skills/my-skill/scripts/validate-skill.cjs
chmod +x .claude/skills/my-skill/scripts/validate-skill.cjs

Tier 2 → Tier 3 (Add Evals):
mkdir -p .claude/skills/my-skill/evals
cp .claude/skills/.validation/templates/eval.template.yaml \
.claude/skills/my-skill/evals/my-skill.yaml

Add trust tier to your SKILL.md:
---
trust_tier: 3
validation:
  schema_path: schemas/output.json
  validator_path: scripts/validate-skill.cjs
  eval_path: evals/my-skill.yaml
---

Update the skill manifest:
npx tsx scripts/update-skill-manifest.ts

Add to your GitHub Actions workflow:
- name: Validate Skills
  run: |
    aqe eval run-all --skills-tier 3 --output results/
    aqe skill report --input results/ --output skill-validation.md
- name: Check for Regressions
  run: |
    aqe skill compare --current results/ --baseline .baseline/ --threshold 0.05

Part of AQE v3.4.2 - Trust But Verify