fix: restore phase 0 qa trust by txmed82 · Pull Request #154 · txmed82/retrace

txmed82 · 2026-05-19T00:12:37Z

Summary

Restores default-branch trust after the tester/storage modularization and adds the premium QA architecture plan.

Changes

Restore deploy/source-map row mapping in IncidentRepository
Restore failure deploy correlation updates
Preserve deterministic evidence ID/time hooks used by tests
Restore tester package facade exports for _run_playwright_spec and _run_shell
Preserve monkeypatch compatibility for retrace.tester.run_spec
Restore native tester consensus details, retry/arbiter behavior, evidence payloads, and failure classification
Add docs/premium-qa-architecture-plan.md
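The monkeypatch-compatibility item above can be illustrated with a minimal sketch. It assumes the package facade re-binds its own `_run_shell` onto the internal harness on each call; the `facade` and `_harness` namespaces and the helper bodies here are hypothetical stand-ins, not the actual retrace modules.

```python
import types

# Hypothetical stand-in for the internal harness module.
_harness = types.SimpleNamespace()


def _default_run_shell(cmd):
    # Placeholder for the real shell runner.
    return f"real:{cmd}"


def _harness_run_spec(spec):
    # The harness resolves its own _run_shell at call time.
    return _harness._run_shell(f"run {spec}")


_harness._run_shell = _default_run_shell
_harness.run_spec = _harness_run_spec

# Hypothetical stand-in for the package-level namespace that tests patch.
facade = types.SimpleNamespace(_run_shell=_default_run_shell)


def run_spec(spec):
    # Copy whatever is currently bound on the facade into the harness,
    # so a test that patched facade._run_shell still takes effect.
    _harness._run_shell = facade._run_shell
    return _harness.run_spec(spec)


facade.run_spec = run_spec
```

With this shape, a test that replaces `facade._run_shell` with a fake sees the fake used on the next `facade.run_spec(...)` call, even though the work is delegated to the harness.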

Validation

uv run ruff check src tests
uv run pytest -q (969 passed, 4 skipped)
uv run pytest tests/test_tester_playwright.py -q (3 passed)
cd packages/browser && npm ci && npm test && npm run build (24 passed, TypeScript build passed)

Notes

This PR intentionally does not start Phase 1. It only restores a green base and documents the architecture direction.

Summary by CodeRabbit

Documentation
- Added a comprehensive architecture direction doc outlining the premium QA approach, product principles, and phased roadmap.
New Features
- Visual execution engine now auto-selected for visual exploratory specs.
- Enhanced consensus evaluation with richer outputs, arbiter support, better failure classification, and improved selector-drift detection.
Refactor
- Preserved tester entrypoint compatibility via a facade wrapper; standardized evidence metadata handling.
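As a rough illustration of the engine auto-selection described above, the sketch below returns `visual` only when a spec has exploratory goals and visual browser settings are enabled. The `Spec` and `BrowserSettings` dataclasses and the `"native"` fallback name are assumptions made for this example; only the `exploratory_goals` and `browser_settings.visual` fields are named in the PR.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BrowserSettings:
    # Hypothetical container; only the `visual` flag is named in the PR.
    visual: bool = False


@dataclass
class Spec:
    # Hypothetical spec shape for illustration.
    exploratory_goals: List[str] = field(default_factory=list)
    browser_settings: BrowserSettings = field(default_factory=BrowserSettings)


def select_execution_engine(spec: Spec, default: str = "native") -> str:
    # Both conditions must hold: there is something to explore,
    # and visual mode is switched on.
    if spec.exploratory_goals and spec.browser_settings.visual:
        return "visual"
    # The fallback engine name is an assumption for this sketch.
    return default
```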

coderabbitai · 2026-05-19T00:12:50Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 7b703284-63c3-49d4-b906-e05c50dcee59

📥 Commits

Reviewing files that changed from the base of the PR and between ac7937f and 2c07ef2.

📒 Files selected for processing (2)

docs/premium-qa-architecture-plan.md
src/retrace/storage/repositories/incidents.py

🚧 Files skipped from review as they are similar to previous changes (1)

src/retrace/storage/repositories/incidents.py

Walkthrough

Adds a premium QA architecture plan and implements storage updates for evidence and deploy linking. Refactors assertion consensus to accept response, evidence, and arbiter votes, with improved failure classification and selector-drift detection. Updates the tester entry points and execution-engine selection to support visual exploratory runs.
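One way to picture the consensus-with-arbiter flow is the sketch below. It is not the code in `src/retrace/tester/assertions.py`: the function name, the `"pass"`/`"fail"` vote strings, the majority fallback, and the retry rule are all assumptions chosen to show how an arbiter vote might break a disagreement.

```python
from collections import Counter


def evaluate_consensus(votes, arbiter_vote=None, evidence=None):
    """Hypothetical consensus helper returning a structured result."""
    counts = Counter(votes)
    passed = counts.get("pass", 0)
    failed = counts.get("fail", 0)
    disagreement = passed > 0 and failed > 0

    if not disagreement:
        # Unanimous (or empty) votes: pass only if there is at least one pass.
        decision = "pass" if passed else "fail"
    elif arbiter_vote in ("pass", "fail"):
        # On disagreement, a valid arbiter vote decides.
        decision = arbiter_vote
    else:
        # No arbiter available: fall back to a simple majority, ties fail.
        decision = "pass" if passed > failed else "fail"

    return {
        "decision": decision,
        "disagreement": disagreement,
        "vote_counts": {"pass": passed, "fail": failed},
        # Assumed rule: suggest a retry when models disagree and no arbiter ruled.
        "retry": disagreement and arbiter_vote is None,
        "evidence": evidence or {},
    }
```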

Changes

Premium QA Architecture & Implementation

| Layer / File(s) | Summary |
| --- | --- |
| Premium QA Architecture & Roadmap `docs/premium-qa-architecture-plan.md` | Adds the premium QA architecture doc: thesis, product principles, `qa_incidents` pipeline, external references, phased roadmap (0–7), launch criteria, non-goals, and immediate next moves. |
| Storage: Evidence & Failure Deployment Tracking `src/retrace/storage/core.py`, `src/retrace/storage/repositories/incidents.py` | `Storage._append_failure_evidence` uses instance wrapper methods for `evidence_id` and `created_at`. `IncidentRepository.update_failure_deploy(failure_id, deploy_sha)` links failures to deployments and raises on no-match. `_source_map_from_row` builds `SourceMapRow` parsing `source_map_json` and converting `uploaded_at`. Import grouping reformatted. |
| Assertion Evaluation: Consensus, Arbiter Voting & Failure Classification `src/retrace/tester/assertions.py` | `_evaluate_consensus_assertion` accepts optional `response`, `evidence`, and `arbiter_vote` and returns structured decision/disagreement/vote counts/retry/evidence. `_evaluate_model_backed_consensus_assertion` standardizes evidence handling, enforces prompt validation only when models exist, computes votes from models or collected votes, optionally calls an arbiter model on disagreement, and passes arbiter/evidence into consensus. `_coerce_arbiter_vote` added. `_classify_failure` expands auth-detection (401/403, JWT/auth keywords) before timeouts. `_failed_selector_assertion` matches structured assertion types for selector drift. |
| Test Execution Entry Point & Engine Routing `src/retrace/tester/__init__.py`, `src/retrace/tester/specs.py` | `tester.__init__.run_spec` is now a local facade that syncs `_harness._run_shell` before delegating to the harness; private helpers re-exported. `select_execution_engine` returns `visual` when `exploratory_goals` and `browser_settings.visual` are enabled. |
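The ordering noted for `_classify_failure` (auth detection before timeouts) matters because an expired session often surfaces as a timeout message that also mentions a 401 or 403. A minimal sketch of that ordering follows; the regexes, the category names, and the function name are assumptions, and the keyword match is deliberately a loose heuristic.

```python
import re

# Hypothetical patterns; the real keyword list is not shown in this PR page.
_AUTH_PATTERN = re.compile(
    r"\b(401|403)\b|jwt|unauthorized|forbidden|auth", re.IGNORECASE
)
_TIMEOUT_PATTERN = re.compile(r"timeout|timed out", re.IGNORECASE)


def classify_failure(message: str) -> str:
    # Auth signals are checked first so that "timed out ... 401" is
    # reported as an auth problem rather than a generic timeout.
    if _AUTH_PATTERN.search(message):
        return "auth"
    if _TIMEOUT_PATTERN.search(message):
        return "timeout"
    return "unknown"
```

Note that a bare `auth` keyword also matches words such as "author", so a production classifier would likely need tighter patterns.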

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

txmed82/retrace#10: Related updates to model-backed consensus evaluation and arbiter logic in src/retrace/tester/assertions.py.
txmed82/retrace#9: Overlapping changes to tester consensus/evidence handling and retry/vote behavior.
txmed82/retrace#153: Nearby refactor work touching tester facade and incident storage that intersects with these changes.

Poem

🐇 I found a bug in moonlit code tonight,

I gathered evidence by lantern-light,
An arbiter weighed the votes with care,
Storage kept the trace for all to share,
Now tests can wander, visual and bright.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 6.25% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix: restore phase 0 qa trust' directly addresses the main objective of the PR, which is to restore default-branch trust and stability after modularization changes. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/premium-qa-architecture-plan.md`:
- Line 219: The prose uses the lowercase platform name `.github/workflows`;
update that occurrence to use the consistent, capitalized platform name `GitHub`
(e.g., change `.github/workflows` in the sentence to `GitHub` while keeping the
repository path literal if intended, or write `GitHub` before the
`.github/workflows` path for clarity) so references to GitHub are capitalized
consistently across the document.

In `@src/retrace/storage/repositories/incidents.py`:
- Around line 1290-1301: The UPDATE in update_failure_deploy currently executes
without verifying that a row was changed; capture the result of conn.execute
(e.g., cursor = conn.execute(...)) and check cursor.rowcount after the
statement, and if it is 0 raise a clear exception (e.g., ValueError or a
domain-specific NotFoundError) that includes the provided failure_id so callers
know the deploy correlation failed because the failure wasn't found; keep the
existing deploy_sha.strip() and timestamp logic but ensure you raise the error
before exiting the with-block.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 0b996a1d-e6cc-441a-b819-c06807157780

📥 Commits

Reviewing files that changed from the base of the PR and between 00c0422 and ac7937f.

📒 Files selected for processing (6)

docs/premium-qa-architecture-plan.md
src/retrace/storage/core.py
src/retrace/storage/repositories/incidents.py
src/retrace/tester/__init__.py
src/retrace/tester/assertions.py
src/retrace/tester/specs.py

fix: restore phase 0 qa trust

ac7937f

coderabbitai Bot reviewed May 19, 2026

Comment thread docs/premium-qa-architecture-plan.md Outdated

Comment thread src/retrace/storage/repositories/incidents.py

fix: address phase 0 review comments

2c07ef2

txmed82 merged commit 8ceda5f into master May 19, 2026
11 checks passed

txmed82 deleted the audit-oss-qa-stack branch May 19, 2026 00:32

txmed82 merged 2 commits into master from audit-oss-qa-stack
