fix: restore phase 0 qa trust #154

Merged: txmed82 merged 2 commits into master from audit-oss-qa-stack on May 19, 2026
Conversation

@txmed82 txmed82 commented May 19, 2026

Owner

Summary

Restores default-branch trust after the tester/storage modularization and adds the premium QA architecture plan.

Changes

  • Restore deploy/source-map row mapping in IncidentRepository
  • Restore failure deploy correlation updates
  • Preserve deterministic evidence ID/time hooks used by tests
  • Restore tester package facade exports for _run_playwright_spec and _run_shell
  • Preserve monkeypatch compatibility for retrace.tester.run_spec
  • Restore native tester consensus details, retry/arbiter behavior, evidence payloads, and failure classification
  • Add docs/premium-qa-architecture-plan.md
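
The "deterministic evidence ID/time hooks" item can be pictured with a minimal sketch. It assumes the storage layer generates IDs and timestamps through overridable instance methods so that tests can pin them. The class name `EvidenceStore` and the hook names `_new_evidence_id` and `_now_iso` are hypothetical and are not taken from the Retrace codebase.

```python
import uuid
from datetime import datetime, timezone


class EvidenceStore:
    """Hypothetical stand-in for a storage class; not Retrace's real code."""

    def __init__(self) -> None:
        self.rows: list[dict[str, str]] = []

    # Overridable hooks: a test can replace these on one instance
    # to get stable IDs and timestamps.
    def _new_evidence_id(self) -> str:
        return uuid.uuid4().hex

    def _now_iso(self) -> str:
        return datetime.now(timezone.utc).isoformat()

    def append_failure_evidence(self, failure_id: str, payload: str) -> dict[str, str]:
        # Always go through the instance hooks, never call uuid/datetime directly,
        # so an override on the instance is honored.
        row = {
            "evidence_id": self._new_evidence_id(),
            "failure_id": failure_id,
            "payload": payload,
            "created_at": self._now_iso(),
        }
        self.rows.append(row)
        return row


# Usage: pin both hooks the way a test might.
store = EvidenceStore()
store._new_evidence_id = lambda: "ev-fixed"
store._now_iso = lambda: "2026-01-01T00:00:00+00:00"
row = store.append_failure_evidence("f-1", "screenshot.png")
```

Routing both values through instance methods is what keeps the hooks patchable; a direct `uuid.uuid4()` call inside the append method would bypass any override.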

Validation

  • uv run ruff check src tests
  • uv run pytest -q (969 passed, 4 skipped)
  • uv run pytest tests/test_tester_playwright.py -q (3 passed)
  • cd packages/browser && npm ci && npm test && npm run build (24 passed, TypeScript build passed)

Notes

This PR intentionally does not start Phase 1. It only restores a green base and documents the architecture direction.

Summary by CodeRabbit

  • Documentation

    • Added a comprehensive architecture direction doc outlining the premium QA approach, product principles, and phased roadmap.
  • New Features

    • Visual execution engine now auto-selected for visual exploratory specs.
    • Enhanced consensus evaluation with richer outputs, arbiter support, better failure classification, and improved selector-drift detection.
  • Refactor

    • Preserved tester entrypoint compatibility via a facade wrapper; standardized evidence metadata handling.

coderabbitai Bot commented May 19, 2026

Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 7b703284-63c3-49d4-b906-e05c50dcee59

📥 Commits

Reviewing files that changed from the base of the PR and between ac7937f and 2c07ef2.

📒 Files selected for processing (2)
  • docs/premium-qa-architecture-plan.md
  • src/retrace/storage/repositories/incidents.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/retrace/storage/repositories/incidents.py

Walkthrough

Adds a premium QA architecture plan and implements storage updates for evidence/deploy linking, refactors assertion consensus to support response/evidence and arbiter votes with improved failure/selector detection, and updates tester entry points and execution-engine selection to support visual exploratory runs.
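
As a rough illustration of consensus with an arbiter vote, the sketch below tallies boolean votes and returns a structured result. The decision rule is an assumption made for this example: an arbiter vote, when supplied, settles any disagreement; otherwise the majority wins and a tie fails closed. The function name and result keys are hypothetical and do not reproduce `_evaluate_consensus_assertion`.

```python
from typing import Optional


def evaluate_consensus(votes: list[bool], arbiter_vote: Optional[bool] = None) -> dict:
    """Tally votes; assumed rule: arbiter settles disagreement, ties fail closed."""
    passed = sum(1 for v in votes if v)
    failed = len(votes) - passed
    disagreement = passed > 0 and failed > 0

    if disagreement and arbiter_vote is not None:
        decision, decided_by = arbiter_vote, "arbiter"
    elif passed > failed:
        decision, decided_by = True, "majority"
    elif failed > passed:
        decision, decided_by = False, "majority"
    else:
        # Tie (or no votes) with no arbiter: treat as not passed.
        decision, decided_by = False, "unresolved"

    return {
        "decision": decision,
        "decided_by": decided_by,
        "disagreement": disagreement,
        "votes": {"pass": passed, "fail": failed},
    }


unanimous = evaluate_consensus([True, True, True])
split_with_arbiter = evaluate_consensus([True, False], arbiter_vote=True)
split_without_arbiter = evaluate_consensus([True, False])
```

Returning the vote counts and a `decided_by` field alongside the decision keeps the outcome auditable, which is the point of a structured result over a bare boolean.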

Changes

Premium QA Architecture & Implementation

| Layer / File(s) | Summary |
|---|---|
| Premium QA Architecture & Roadmap: `docs/premium-qa-architecture-plan.md` | Adds the premium QA architecture doc: thesis, product principles, `qa_incidents` pipeline, external references, phased roadmap (0–7), launch criteria, non-goals, and immediate next moves. |
| Storage: Evidence & Failure Deployment Tracking: `src/retrace/storage/core.py`, `src/retrace/storage/repositories/incidents.py` | `Storage._append_failure_evidence` uses instance wrapper methods for `evidence_id` and `created_at`. `IncidentRepository.update_failure_deploy(failure_id, deploy_sha)` links failures to deployments and raises on no-match. `_source_map_from_row` builds `SourceMapRow`, parsing `source_map_json` and converting `uploaded_at`. Import grouping reformatted. |
| Assertion Evaluation: Consensus, Arbiter Voting & Failure Classification: `src/retrace/tester/assertions.py` | `_evaluate_consensus_assertion` accepts optional `response`, `evidence`, and `arbiter_vote` and returns structured decision/disagreement/vote counts/retry/evidence. `_evaluate_model_backed_consensus_assertion` standardizes evidence handling, enforces prompt validation only when models exist, computes votes from models or collected votes, optionally calls an arbiter model on disagreement, and passes arbiter/evidence into consensus. `_coerce_arbiter_vote` added. `_classify_failure` expands auth detection (401/403, JWT/auth keywords) before timeouts. `_failed_selector_assertion` matches structured assertion types for selector drift. |
| Test Execution Entry Point & Engine Routing: `src/retrace/tester/__init__.py`, `src/retrace/tester/specs.py` | `tester.__init__.run_spec` is now a local facade that syncs `_harness._run_shell` before delegating to the harness; private helpers re-exported. `select_execution_engine` returns `visual` when `exploratory_goals` and `browser_settings.visual` are enabled. |
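
The ordering noted for `_classify_failure` (auth detection before timeouts) matters because an expired session often surfaces as a timeout. A minimal sketch of that precedence follows, assuming a failure is described by a message and an optional HTTP status. The patterns, category labels, and function name are illustrative only.

```python
import re
from typing import Optional

# Keyword lists are assumptions for this sketch, not the project's real patterns.
_AUTH_PATTERN = re.compile(
    r"\b(unauthorized|forbidden|jwt|auth|authentication|authorization)\b",
    re.IGNORECASE,
)
_TIMEOUT_PATTERN = re.compile(r"\btimed?\s?out\b", re.IGNORECASE)


def classify_failure(message: str, status: Optional[int] = None) -> str:
    # Auth is checked first: a step that waits on a page behind an expired
    # login usually reports a timeout, but the root cause is authentication.
    if status in (401, 403) or _AUTH_PATTERN.search(message):
        return "auth"
    if _TIMEOUT_PATTERN.search(message):
        return "timeout"
    return "unknown"


both = classify_failure("Timeout waiting for /login: 401 Unauthorized")
by_status = classify_failure("request timed out", status=403)
plain_timeout = classify_failure("Timeout 30000ms exceeded")
```

Swapping the two checks would label the first two cases as timeouts and hide the actionable cause.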

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • txmed82/retrace#10: Related updates to model-backed consensus evaluation and arbiter logic in src/retrace/tester/assertions.py.
  • txmed82/retrace#9: Overlapping changes to tester consensus/evidence handling and retry/vote behavior.
  • txmed82/retrace#153: Nearby refactor work touching tester facade and incident storage that intersects with these changes.

Poem

🐇 I found a bug in moonlit code tonight,

I gathered evidence by lantern-light,
An arbiter weighed the votes with care,
Storage kept the trace for all to share,
Now tests can wander, visual and bright.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 6.25% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'fix: restore phase 0 qa trust' directly addresses the main objective of the PR, which is to restore default-branch trust and stability after modularization changes. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/premium-qa-architecture-plan.md`:
- Line 219: The prose uses the lowercase platform name `.github/workflows`;
update that occurrence to use the consistent, capitalized platform name `GitHub`
(e.g., change `.github/workflows` in the sentence to `GitHub` while keeping the
repository path literal if intended, or write `GitHub` before the
`.github/workflows` path for clarity) so references to GitHub are capitalized
consistently across the document.

In `@src/retrace/storage/repositories/incidents.py`:
- Around line 1290-1301: The UPDATE in update_failure_deploy currently executes
without verifying that a row was changed; capture the result of conn.execute
(e.g., cursor = conn.execute(...)) and check cursor.rowcount after the
statement, and if it is 0 raise a clear exception (e.g., ValueError or a
domain-specific NotFoundError) that includes the provided failure_id so callers
know the deploy correlation failed because the failure wasn't found; keep the
existing deploy_sha.strip() and timestamp logic but ensure you raise the error
before exiting the with-block.
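
The row-count check requested for `update_failure_deploy` can be sketched as follows. This assumes a `sqlite3` connection, and the table and column names (`failures`, `id`, `deploy_sha`, `updated_at`) are hypothetical; the repository's actual schema is not shown in this thread.

```python
import sqlite3
from datetime import datetime, timezone


def update_failure_deploy(conn: sqlite3.Connection, failure_id: str, deploy_sha: str) -> None:
    """Link a failure to a deploy; raise if no failure row matched."""
    with conn:  # commits on success, rolls back if an exception escapes
        cursor = conn.execute(
            "UPDATE failures SET deploy_sha = ?, updated_at = ? WHERE id = ?",
            (deploy_sha.strip(), datetime.now(timezone.utc).isoformat(), failure_id),
        )
        # rowcount is the number of rows the UPDATE changed; 0 means no match.
        if cursor.rowcount == 0:
            raise ValueError(f"failure not found: {failure_id}")


# Usage against an in-memory database with an assumed schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE failures (id TEXT PRIMARY KEY, deploy_sha TEXT, updated_at TEXT)")
conn.execute("INSERT INTO failures (id) VALUES ('f-1')")
update_failure_deploy(conn, "f-1", " abc123 ")
```

Raising inside the `with` block means a missed correlation surfaces to the caller instead of passing silently as a no-op update.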

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 0b996a1d-e6cc-441a-b819-c06807157780

📥 Commits

Reviewing files that changed from the base of the PR and between 00c0422 and ac7937f.

📒 Files selected for processing (6)
  • docs/premium-qa-architecture-plan.md
  • src/retrace/storage/core.py
  • src/retrace/storage/repositories/incidents.py
  • src/retrace/tester/__init__.py
  • src/retrace/tester/assertions.py
  • src/retrace/tester/specs.py

Comment thread docs/premium-qa-architecture-plan.md Outdated
Comment thread src/retrace/storage/repositories/incidents.py
@txmed82 txmed82 merged commit 8ceda5f into master May 19, 2026
11 checks passed
@txmed82 txmed82 deleted the audit-oss-qa-stack branch May 19, 2026 00:32