Implement evidence-based confidence calibration engine with uncertainty-aware policies and APIs by Copilot · Pull Request #81 · IzaacBaptista/GroundedOS-Lab

Copilot · 2026-06-04T15:54:59Z

Summary

This PR upgrades confidence from score heuristics to an evidence-calibrated model that accounts for coverage, agreement, contradiction risk, source diversity, and rerank stability, then selects uncertainty-aware actions. It also exposes confidence scoring/calibration/explain APIs and surfaces calibrated traces in Dev Mode.

Calibration engine
- Added ConfidenceCalibrationEngine + ConfidenceSignalCollector with typed outputs: ConfidenceScore, ConfidenceBreakdown, ConfidenceTrace, ConfidenceFactor, CalibrationProfile.
- Computes: overall confidence, retrieval/evidence/answer/citation confidence, contradiction risk, uncertainty level, uncertainty reasons, recommended action.
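To illustrate how the computed outputs listed above might relate to one another, here is a minimal sketch. The interface names, field names, cut-off values, and the combination rule (mean of the four component confidences, discounted by contradiction risk) are all assumptions made for illustration; the PR's actual ConfidenceBreakdown and ConfidenceScore types may look quite different.

```typescript
// Hypothetical shapes: field names and the combination rule are illustrative
// assumptions, not the PR's actual ConfidenceBreakdown / ConfidenceScore types.
type UncertaintyLevel = "low" | "medium" | "high";

interface BreakdownSketch {
  retrieval: number; // each component is assumed to lie in [0, 1]
  evidence: number;
  answer: number;
  citation: number;
  contradictionRisk: number; // 0 = no conflict detected, 1 = severe conflict
}

interface ScoreSketch {
  overallConfidence: number;
  uncertaintyLevel: UncertaintyLevel;
  uncertaintyReasons: string[];
}

const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

function combineBreakdown(b: BreakdownSketch): ScoreSketch {
  // Average the four component confidences, then discount by contradiction risk.
  const mean = (b.retrieval + b.evidence + b.answer + b.citation) / 4;
  const overallConfidence = clamp01(mean * (1 - clamp01(b.contradictionRisk)));

  // Collect human-readable reasons; the 0.5 and 0.3 cut-offs are placeholders.
  const uncertaintyReasons: string[] = [];
  if (b.evidence < 0.5) uncertaintyReasons.push("weak evidence support");
  if (b.citation < 0.5) uncertaintyReasons.push("weak citation support");
  if (b.contradictionRisk > 0.3) uncertaintyReasons.push("conflicting sources");

  const uncertaintyLevel: UncertaintyLevel =
    overallConfidence >= 0.75 ? "low" : overallConfidence >= 0.4 ? "medium" : "high";

  return { overallConfidence, uncertaintyLevel, uncertaintyReasons };
}
```

A multiplicative discount is used here so that high contradiction risk pulls the overall score down even when every component confidence is high; an additive penalty would be an equally plausible choice.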
Evidence and consistency analyzers
- Added EvidenceCoverageEstimator (QueryFacetExtractor, MissingEvidenceDetector, FacetCoverageResult).
- Added EvidenceAgreementAnalyzer (ClaimExtractor, ClaimSupportMapper, ConsensusScore, EvidenceCluster).
- Added ContradictionDetector (ConflictPair, ConflictSeverity, ConflictResolutionHint).
- Added SourceDiversityAnalyzer + provenance/independence scoring.
- Added RerankStabilityAnalyzer (RankingDelta, RankCorrelationScore, RetrievalStabilityReport).
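As one possible reading of the rerank stability signal, the sketch below computes a Spearman rank correlation between the chunk order before and after reranking. The PR does not state which correlation measure RankCorrelationScore uses, so the choice of Spearman, the function name, and the input shape (two arrays of chunk IDs) are assumptions.

```typescript
// Spearman rank correlation between two orderings of the same chunk IDs.
// Assumes no ties and identical ID sets; which measure the PR's
// RankCorrelationScore actually uses is not stated.
function spearmanRankCorrelation(before: string[], after: string[]): number {
  if (before.length !== after.length) {
    throw new Error("orderings must have the same length");
  }
  const n = before.length;
  if (n < 2) return 1; // zero or one item cannot be reordered

  if (new Set(before).size !== n) {
    throw new Error("duplicate IDs in original ordering");
  }

  // Map each ID to its position in the reranked list.
  const afterRank = new Map<string, number>();
  after.forEach((id, i) => afterRank.set(id, i));
  if (afterRank.size !== n) {
    throw new Error("duplicate IDs in reranked ordering");
  }

  // Sum of squared rank differences across all IDs.
  let sumSquaredDiff = 0;
  before.forEach((id, i) => {
    const j = afterRank.get(id);
    if (j === undefined) {
      throw new Error("ID " + id + " missing from reranked ordering");
    }
    sumSquaredDiff += (i - j) * (i - j);
  });

  // rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)); 1 = same order, -1 = reversed.
  return 1 - (6 * sumSquaredDiff) / (n * (n * n - 1));
}
```

A value near 1 means reranking barely changed the order (stable retrieval), while values near 0 or below suggest the initial retrieval order and the reranker disagree substantially.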
Policy layer
- Added ConfidencePolicyEngine with ConfidenceThresholds and action selection (answer_with_uncertainty, run_contradiction_check, refuse_due_to_insufficient_evidence, etc.).
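Threshold-based action selection could look roughly like the sketch below. Only the three action names quoted above are used; the PR mentions further actions that are not listed here, so they are left out. The threshold field names, default values, and rule ordering are hypothetical and do not reflect the actual ConfidenceThresholds type.

```typescript
// The three action names come from the PR description; everything else
// (threshold fields, defaults, rule order) is an illustrative assumption.
type PolicyAction =
  | "answer_with_uncertainty"
  | "run_contradiction_check"
  | "refuse_due_to_insufficient_evidence";

interface ThresholdsSketch {
  minConfidenceToAnswer: number; // below this, refuse
  maxContradictionRisk: number; // above this, check contradictions first
}

const DEFAULT_THRESHOLDS: ThresholdsSketch = {
  minConfidenceToAnswer: 0.4,
  maxContradictionRisk: 0.3,
};

function selectAction(
  overallConfidence: number,
  contradictionRisk: number,
  thresholds: ThresholdsSketch = DEFAULT_THRESHOLDS
): PolicyAction {
  // Conflicting evidence is handled first, regardless of the overall score.
  if (contradictionRisk > thresholds.maxContradictionRisk) {
    return "run_contradiction_check";
  }
  // Too little support to answer at all.
  if (overallConfidence < thresholds.minConfidenceToAnswer) {
    return "refuse_due_to_insufficient_evidence";
  }
  // Otherwise answer, flagging the remaining uncertainty.
  return "answer_with_uncertainty";
}
```

Checking contradiction risk before the confidence floor is a design choice in this sketch: a conflict between sources may be resolvable, whereas a refusal is terminal.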
Integration
- Extended existing calibrateConfidence path to preserve backward-compatible fields and append calibrated outputs.
- Wired query/chunks/rerank traces from RAG pipeline into calibration.
- Extended observability trace metadata with calibrated confidence details.
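The backward-compatible extension described above can be pictured as leaving the legacy fields untouched and attaching the calibrated outputs alongside them. In this sketch the legacy field names come from the rollback plan in this PR, but their types, the `calibrated` property name, and the helper function are assumptions.

```typescript
// Legacy field names are taken from the PR text; their types are guesses.
interface LegacyConfidence {
  confidenceScore: number;
  confidenceLevel: string;
  evidenceSignals: Record<string, number>;
  factors: string[];
}

// Hypothetical subset of the calibrated output.
interface CalibratedSketch {
  overallConfidence: number;
  recommendedAction: string;
  uncertaintyReasons: string[];
}

// Returns a new object: legacy fields are copied as-is, and the calibrated
// result is attached under a separate (hypothetical) `calibrated` key.
function appendCalibrated(
  legacy: LegacyConfidence,
  calibrated: CalibratedSketch
): LegacyConfidence & { calibrated: CalibratedSketch } {
  return { ...legacy, calibrated };
}
```

Existing consumers that read only `confidenceScore` or `confidenceLevel` would keep working under this arrangement, since no legacy field is renamed or retyped.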
API + UI
- Added endpoints: POST /confidence/score, /confidence/calibrate, /confidence/evaluate, /confidence/explain; GET /confidence/policies, /confidence/runs.
- Updated web API typings and reliability panel to show calibrated breakdown, policy action, and uncertainty reasons.
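For reference, the endpoints listed above can be captured in a small route table with a helper that builds a request description. The methods and paths come from this PR; the helper name, base URL handling, headers, and payload shape are assumptions, since the request and response contracts are not documented here.

```typescript
// Methods and paths are from the PR description; everything else is a sketch.
const CONFIDENCE_ROUTES = {
  score: { method: "POST", path: "/confidence/score" },
  calibrate: { method: "POST", path: "/confidence/calibrate" },
  evaluate: { method: "POST", path: "/confidence/evaluate" },
  explain: { method: "POST", path: "/confidence/explain" },
  policies: { method: "GET", path: "/confidence/policies" },
  runs: { method: "GET", path: "/confidence/runs" },
} as const;

type RouteName = keyof typeof CONFIDENCE_ROUTES;

interface RequestSketch {
  method: "GET" | "POST";
  url: string;
  headers: Record<string, string>;
  body?: string;
}

// Builds a plain request description; the payload shape is not specified
// by the PR, so it is accepted as `unknown` and serialized as JSON.
function buildConfidenceRequest(
  baseUrl: string,
  route: RouteName,
  payload?: unknown
): RequestSketch {
  const { method, path } = CONFIDENCE_ROUTES[route];
  const url = baseUrl.replace(/\/+$/, "") + path; // drop trailing slashes
  if (method === "GET") {
    return { method, url, headers: { Accept: "application/json" } };
  }
  return {
    method,
    url,
    headers: { Accept: "application/json", "Content-Type": "application/json" },
    body: JSON.stringify(payload ?? {}),
  };
}
```

The returned object can be passed to an HTTP client of choice, for example `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`.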

const calibrated = new ConfidenceCalibrationEngine().calibrate({
  query,
  diagnostics,
  chunks,
  rerankTrace,
  evals
});
// calibrated => { overallConfidence, label, breakdown, uncertaintyReasons, recommendedAction, trace }

Roadmap Scope

Phase(s) impacted:
- A (Confidence Calibration Engine), B (Confidence Signals), C (Coverage), D (Agreement), E (Contradictions), F (Source Diversity), G (Rerank Stability), H (Policies), L (Dev Mode), N (API), O (Tests)
Success criteria affected:
- Confidence now reflects evidence quality/coverage/consensus/stability and emits explainable uncertainty + policy action.
Evidence links (tests, artifacts, commands):
- apps/api/src/retrieval-reliability.test.ts
- apps/api/src/server.test.ts
- npm run build, npm test

Validation

Tests pass locally
Relevant commands were executed and verified

Documentation Checklist

I reviewed docs/documentation-governance.md
README.md is updated when roadmap status/scope changed
Impacted module README(s) are updated (apps/*, packages/*, experiments/*, infra/*)
Relevant docs/* contract/runbook/phase document is updated
If no docs update was needed, I explained why in this PR

No documentation files were changed in this PR: the scope is limited to implementation changes in the existing reliability/RAG modules, and the new endpoints are self-describing through the controller and type changes. Docs can be updated in a follow-up phase-status pass.

Risk & Rollback

Risk level:
- Medium (confidence behavior and response policy selection changed; backward-compatible shape retained for existing fields).
Rollback plan:
- Revert commit(s) touching retrieval-reliability.ts, rag-service.ts, and apps/api/src/rag/confidence.controller.ts to restore previous confidence path; keep legacy confidenceScore/confidenceLevel/evidenceSignals/factors contract unchanged.

Initial plan

68943b2

Copilot AI assigned Copilot and IzaacBaptista Jun 4, 2026

Copilot started work on behalf of IzaacBaptista June 4, 2026 15:55

Copilot AI linked an issue Jun 4, 2026 that may be closed by this pull request

Confidence Calibration Phase #64

Open

Add evidence-based confidence calibration engine and API

9905ba6

Copilot AI changed the title ~~[WIP] Implement evidence-based confidence calibration for RAG responses~~ Implement evidence-based confidence calibration engine with uncertainty-aware policies and APIs Jun 4, 2026

Copilot finished work on behalf of IzaacBaptista June 4, 2026 16:04

Copilot AI requested a review from IzaacBaptista June 4, 2026 16:04

IzaacBaptista marked this pull request as ready for review June 4, 2026 16:24

Copilot wants to merge 2 commits into main from copilot/confidence-calibration-phase
