refactor: modularize tester.py and incident storage layer#153
Conversation
- Convert src/retrace/tester.py into a package with domain-specific modules (models, specs, harness, assertions) - Extract incident logic from src/retrace/storage/core.py into IncidentRepository - Implement facade pattern in Storage class to maintain backward compatibility - Verified zero regressions with 61+ passing tests
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (4)
🚧 Files skipped from review as they are similar to previous changes (3)
📝 WalkthroughWalkthroughAdds an LLM-based candidate reranker and integrates optional LLM usage into repository scoring and fix-proposal flows; extracts tester models/specs/assertions into dedicated modules and updates harness to import them; implements storage helpers and a full IncidentRepository; adds unit and E2E tests for the new flows. ChangesAI reranking & CLI LLM plumbing
Tester refactor: models, specs, assertions
Storage helpers and IncidentRepository
Estimated code review effort🎯 5 (Critical) | ⏱️ ~120 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/retrace/tester/harness.py (2)
1630-1630:⚠️ Potential issue | 🔴 Critical | ⚡ Quick win
DEFAULT_APP_URLis undefined — will raiseNameErrorat runtime.Line 1630 uses
DEFAULT_APP_URL, but this constant is not imported from.models(lines 31–43), even though it is defined there. The import block includesDEFAULT_HARNESS_COMMANDbut notDEFAULT_APP_URL, causing a NameError when the code path at line 1630 executes.🐛 Proposed fix: add missing import
from .models import ( DEFAULT_HARNESS_COMMAND, + DEFAULT_APP_URL, FAILURE_CLASSIFICATIONS, SUITE_PROPOSAL_SCHEMA_VERSION, EngineSelection,🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/retrace/tester/harness.py` at line 1630, The code references DEFAULT_APP_URL when computing app_url in harness.py but that constant isn't imported, causing a NameError; update the import block that currently brings in DEFAULT_HARNESS_COMMAND from .models to also import DEFAULT_APP_URL (i.e., add DEFAULT_APP_URL to the tuple of symbols imported from .models) so DEFAULT_APP_URL is defined when used in the expression that sets app_url.
1830-1944:⚠️ Potential issue | 🟠 Major | 🏗️ Heavy liftRemove the local function definitions that shadow imported functions from
.assertions.The functions
_classify_failure,_assertion_text_for_classification,_failed_selector_assertion, and_flake_reason_from_classificationare imported from.assertions(lines 44–55) but also defined locally (lines 1830–1944), causing the imports to be shadowed.The implementations are substantially different, not just duplicates:
_classify_failure: Local version adds "test_bug" classification and more auth/selector checks; assertions.py is simpler_assertion_text_for_classification: Local version extracts more fields (assertion_type, step, assertion) with JSON serialization; assertions.py only extracts message/expected/actual_failed_selector_assertion: Local version checks assertion_type; assertions.py checks for keywords in message string_flake_reason_from_classification: Local version returns the classification itself; assertions.py returns human-readable messagesDetermine which version is correct for the codebase, then remove the other to eliminate duplication and confusion.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/retrace/tester/harness.py` around lines 1830 - 1944, The local definitions of _classify_failure, _assertion_text_for_classification, _failed_selector_assertion, and _flake_reason_from_classification are shadowing the versions imported from .assertions; remove these four local function definitions so the module uses the canonical implementations from .assertions (leave the import lines intact), and then run the test suite and adjust any call sites if necessary to match the .assertions signatures/behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/retrace/storage/helpers.py`:
- Around line 116-139: This block defines _string_values,
_normalize_github_review_run_status, and _normalize_app_error_incident_status
but the same functions are redefined later, making these implementations dead;
remove the stale duplicate definitions appearing later in the module so only
these correct implementations remain (ensure there is exactly one definition for
each of _string_values, _normalize_github_review_run_status, and
_normalize_app_error_incident_status), update any imports/usages if necessary,
and run tests to confirm comma-separated trace IDs split, unknown GitHub
statuses default to "queued", and "new" app-error status maps to "open".
In `@src/retrace/storage/repositories/incidents.py`:
- Around line 1409-1417: The names field in the source_map validator currently
only checks that names is a list but not that its items are strings; update the
validation in the block handling source_map (the variables source_map and names)
to iterate the list and ensure every element is an instance of str, and if any
element is not a string raise a ValueError like "source_map names must be a list
of strings when provided" so malformed entries (e.g., [1,2]) are rejected before
persisting.
- Around line 1217-1228: The get_deploy_marker method currently falls back to
matching sha (OR sha = ?), which is unsafe because sha is only unique within a
workspace; remove the sha branch from the SQL and the third bind parameter so
the query only checks id OR public_id, and rely on the existing
get_deploy_marker_by_sha() for scoped SHA lookups; update the tuple of
parameters passed to conn.execute and any related docstring/comments in the
get_deploy_marker function to reflect the change.
In `@src/retrace/tester/harness.py`:
- Around line 217-228: There is a duplicate import block that repeats the same
symbols (_assertion_result, _assertion_text_for_classification,
_classify_failure, _evaluate_consensus_assertion,
_evaluate_model_backed_consensus_assertion, _evaluate_native_assertion,
_failed_selector_assertion, _flake_reason_from_classification,
_redacted_response_headers, _response_assertion_evidence); remove the redundant
block so each symbol is only imported once. Locate the repeated import statement
and delete the second occurrence, leaving the original import (the one near the
top of the module) intact to avoid accidental breakage.
In `@src/retrace/tester/specs.py`:
- Around line 24-26: The import of _should_use_playwright inside a TYPE_CHECKING
block is wrong because TYPE_CHECKING is False at runtime; remove the
TYPE_CHECKING guard and any static-only import so that the runtime import
pattern used in _needs_playwright_runtime remains the single source of truth.
Specifically, delete or move the "from .harness import _should_use_playwright"
under the TYPE_CHECKING block in this file (src/retrace/tester/specs.py) and
ensure _needs_playwright_runtime continues to perform the runtime import of
_should_use_playwright (so no NameError occurs when _needs_playwright_runtime
calls _should_use_playwright).
---
Outside diff comments:
In `@src/retrace/tester/harness.py`:
- Line 1630: The code references DEFAULT_APP_URL when computing app_url in
harness.py but that constant isn't imported, causing a NameError; update the
import block that currently brings in DEFAULT_HARNESS_COMMAND from .models to
also import DEFAULT_APP_URL (i.e., add DEFAULT_APP_URL to the tuple of symbols
imported from .models) so DEFAULT_APP_URL is defined when used in the expression
that sets app_url.
- Around line 1830-1944: The local definitions of _classify_failure,
_assertion_text_for_classification, _failed_selector_assertion, and
_flake_reason_from_classification are shadowing the versions imported from
.assertions; remove these four local function definitions so the module uses the
canonical implementations from .assertions (leave the import lines intact), and
then run the test suite and adjust any call sites if necessary to match the
.assertions signatures/behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 2540eeb2-a1b6-42c6-a0ab-0c665cc27839
📒 Files selected for processing (10)
src/retrace/storage/core.pysrc/retrace/storage/helpers.pysrc/retrace/storage/repositories/__init__.pysrc/retrace/storage/repositories/base.pysrc/retrace/storage/repositories/incidents.pysrc/retrace/tester/__init__.pysrc/retrace/tester/assertions.pysrc/retrace/tester/harness.pysrc/retrace/tester/models.pysrc/retrace/tester/specs.py
| def _string_values(raw: object) -> list[str]: | ||
| if not raw: | ||
| return [] | ||
| if isinstance(raw, list): | ||
| return [str(v) for v in raw if v] | ||
| if isinstance(raw, str): | ||
| return [v.strip() for v in raw.split(",") if v.strip()] | ||
| return [str(raw)] | ||
|
|
||
| def _normalize_github_review_run_status(value: object) -> str: | ||
| GITHUB_REVIEW_RUN_STATUSES = ("queued", "running", "succeeded", "failed", "canceled") | ||
| s = str(value or "").strip().lower() | ||
| if s in GITHUB_REVIEW_RUN_STATUSES: | ||
| return s | ||
| return "queued" | ||
|
|
||
| def _normalize_app_error_incident_status(value: object) -> str: | ||
| APP_ERROR_INCIDENT_STATUSES = ("open", "triaged", "investigating", "resolved", "ignored") | ||
| s = str(value or "").strip().lower() | ||
| if s in APP_ERROR_INCIDENT_STATUSES: | ||
| return s | ||
| if s == "new": | ||
| return "open" | ||
| raise ValueError(f"invalid app-error incident status: {value!r}") |
There was a problem hiding this comment.
Remove the stale redefinitions below this block.
_string_values, _normalize_github_review_run_status, and _normalize_app_error_incident_status are defined again later in this module, so Python keeps the old implementations instead of these ones. That makes the new behavior dead code: comma-separated trace IDs still won't split, unknown GitHub review statuses will still raise, and "new" incident statuses will still fail normalization.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/retrace/storage/helpers.py` around lines 116 - 139, This block defines
_string_values, _normalize_github_review_run_status, and
_normalize_app_error_incident_status but the same functions are redefined later,
making these implementations dead; remove the stale duplicate definitions
appearing later in the module so only these correct implementations remain
(ensure there is exactly one definition for each of _string_values,
_normalize_github_review_run_status, and _normalize_app_error_incident_status),
update any imports/usages if necessary, and run tests to confirm comma-separated
trace IDs split, unknown GitHub statuses default to "queued", and "new"
app-error status maps to "open".
| def get_deploy_marker(self, deploy_id: str) -> Optional[DeployMarkerRow]: | ||
| with self._conn() as conn: | ||
| row = conn.execute( | ||
| """ | ||
| SELECT * | ||
| FROM deploy_markers | ||
| WHERE id = ? OR public_id = ? OR sha = ? | ||
| ORDER BY deployed_at_ms DESC | ||
| LIMIT 1 | ||
| """, | ||
| (deploy_id, deploy_id, deploy_id), | ||
| ).fetchone() |
There was a problem hiding this comment.
Don't fall back to raw SHA in this unscoped getter.
sha is only unique within a workspace. The OR sha = ? branch can return another project's/environment's deploy marker for the same commit, even though get_deploy_marker_by_sha() already provides the scoped lookup.
Suggested fix
def get_deploy_marker(self, deploy_id: str) -> Optional[DeployMarkerRow]:
with self._conn() as conn:
row = conn.execute(
"""
SELECT *
FROM deploy_markers
- WHERE id = ? OR public_id = ? OR sha = ?
- ORDER BY deployed_at_ms DESC
- LIMIT 1
+ WHERE id = ? OR public_id = ?
""",
- (deploy_id, deploy_id, deploy_id),
+ (deploy_id, deploy_id),
).fetchone()
return self._deploy_marker_from_row(row) if row is not None else None🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/retrace/storage/repositories/incidents.py` around lines 1217 - 1228, The
get_deploy_marker method currently falls back to matching sha (OR sha = ?),
which is unsafe because sha is only unique within a workspace; remove the sha
branch from the SQL and the third bind parameter so the query only checks id OR
public_id, and rely on the existing get_deploy_marker_by_sha() for scoped SHA
lookups; update the tuple of parameters passed to conn.execute and any related
docstring/comments in the get_deploy_marker function to reflect the change.
| names = source_map.get("names") | ||
| if names is not None and not isinstance(names, list): | ||
| raise ValueError("source_map names must be a list when provided") | ||
| source_root = source_map.get("sourceRoot") | ||
| if source_root is not None and not isinstance(source_root, str): | ||
| raise ValueError("source_map sourceRoot must be a string when provided") | ||
| file_value = source_map.get("file") | ||
| if file_value is not None and not isinstance(file_value, str): | ||
| raise ValueError("source_map file must be a string when provided") |
There was a problem hiding this comment.
Reject non-string entries in names.
This validator currently accepts names=[1, 2], so malformed source maps can still be persisted even though this method is the write-time schema gate.
Suggested fix
names = source_map.get("names")
- if names is not None and not isinstance(names, list):
- raise ValueError("source_map names must be a list when provided")
+ if names is not None:
+ if not isinstance(names, list):
+ raise ValueError("source_map names must be a list when provided")
+ if not all(isinstance(item, str) for item in names):
+ raise ValueError("source_map names must contain only strings")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/retrace/storage/repositories/incidents.py` around lines 1409 - 1417, The
names field in the source_map validator currently only checks that names is a
list but not that its items are strings; update the validation in the block
handling source_map (the variables source_map and names) to iterate the list and
ensure every element is an instance of str, and if any element is not a string
raise a ValueError like "source_map names must be a list of strings when
provided" so malformed entries (e.g., [1,2]) are rejected before persisting.
- Implement tests/e2e/test_killer_demo_loop.py for full pipeline validation - Introduce AI-enhanced candidate reranking in src/retrace/matching/ai_scorer.py - Update scorer.py and auto_fix.py to use LLM for more accurate culprit identification - Add unit tests for AI scorer
There was a problem hiding this comment.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/retrace/commands/qa.py (1)
176-190:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winKeep the loaded config instead of discarding it.
Line 176 binds
_, but Line 189 readscfg.llm, soretrace qa fixwill fail before fix generation starts.🐛 Proposed fix
- store, _ = _open_store(config_path) + store, cfg = _open_store(config_path)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/retrace/commands/qa.py` around lines 176 - 190, The call to _open_store currently discards the returned cfg by binding it to `_`, but later code uses cfg.llm (causing a crash); change the assignment so you capture both return values (e.g., store, cfg = _open_store(config_path)) or otherwise keep the config returned by _open_store; ensure the rest of the function (references to cfg, the LLMClient context manager, and propose_fix_for_incident) use that preserved cfg instead of the discarded value.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/retrace/matching/ai_scorer.py`:
- Around line 1-10: The module-level docstring must appear before any imports
and unused imports should be removed; move the triple-quoted string
"""AI-enhanced code candidate reranking.""" to the very top of
src/retrace/matching/ai_scorer.py, delete the unused stdlib imports json and
typing.Any, and ensure imports are ordered (stdlib -> third-party -> local) so
you only keep logging and the local imports retrace.llm.client.LLMClient and
retrace.matching.scorer.CodeCandidate.
- Around line 68-73: The loop that maps LLM-ranked paths to final_candidates can
append duplicate entries when ranked_paths contains repeats; update the code in
the block that iterates over ranked_paths to maintain a seen set (e.g.,
seen_paths) of cleaned paths (clean_path) and only append
path_to_candidate[clean_path] if clean_path is not in seen_paths, adding
clean_path to seen_paths when appended; this ensures deduplication of LLM-ranked
paths before populating final_candidates and prevents dropping unique candidates
from the top-N.
In `@src/retrace/matching/scorer.py`:
- Line 403: The function signature uses the types Optional and LLMClient (e.g.,
the parameter "llm: Optional[LLMClient] = None") but neither type is imported,
causing undefined-name errors; add the missing imports at module level—import
Optional from typing and import or reference the LLMClient type from its
defining module (or the correct package) and update imports so Optional and
LLMClient are available for the scorer function signature (e.g., ensure "from
typing import Optional" and "from <llm_module> import LLMClient" are present).
In `@tests/e2e/test_killer_demo_loop.py`:
- Around line 15-21: The file contains unused imports "os" and "pytest" (see the
top import block in tests/e2e/test_killer_demo_loop.py); remove those two import
statements so the module no longer triggers F401 from Ruff—keep the other
imports (uuid, HTTPConnection, Path, patch, MagicMock) intact.
In `@tests/test_matching_ai_scorer.py`:
- Around line 3-7: Remove the unused imports in
tests/test_matching_ai_scorer.py: delete the top-level imports of json and
pytest and keep only the needed unittest.mock import (MagicMock); ensure the
file only imports symbols that are referenced to satisfy Ruff F401.
---
Outside diff comments:
In `@src/retrace/commands/qa.py`:
- Around line 176-190: The call to _open_store currently discards the returned
cfg by binding it to `_`, but later code uses cfg.llm (causing a crash); change
the assignment so you capture both return values (e.g., store, cfg =
_open_store(config_path)) or otherwise keep the config returned by _open_store;
ensure the rest of the function (references to cfg, the LLMClient context
manager, and propose_fix_for_incident) use that preserved cfg instead of the
discarded value.
🪄 Autofix (Beta)
✅ Autofix completed
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: e3456ce1-73b8-44dd-81b4-f04a61ab3d2a
📒 Files selected for processing (6)
src/retrace/auto_fix.pysrc/retrace/commands/qa.pysrc/retrace/matching/ai_scorer.pysrc/retrace/matching/scorer.pytests/e2e/test_killer_demo_loop.pytests/test_matching_ai_scorer.py
|
Note Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it. Fixes Applied SuccessfullyFixed 12 file(s) based on 5 unresolved review comments. Files modified:
Commit: The changes have been pushed to the Time taken: |
Fixed 12 file(s) based on 5 unresolved review comments. Co-authored-by: CodeRabbit <noreply@coderabbit.ai>
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/retrace/tester/harness.py (2)
1612-1612:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winUndefined name
DEFAULT_APP_URLcauses runtimeNameError.
DEFAULT_APP_URLis used but never imported or defined. The constant should be imported from.modelsalongsideDEFAULT_HARNESS_COMMAND, or defined locally if it doesn't exist in the models module.🐛 Proposed fix: add import from .models
from .models import ( DEFAULT_HARNESS_COMMAND, + DEFAULT_APP_URL, SUITE_PROPOSAL_SCHEMA_VERSION, TesterArtifact,If
DEFAULT_APP_URLdoesn't exist in.models, define it locally:+DEFAULT_APP_URL = "http://localhost:3000" + logger = logging.getLogger(__name__)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/retrace/tester/harness.py` at line 1612, The NameError arises because DEFAULT_APP_URL is referenced in the expression that sets app_url (app_url = (app_url_override or spec.app_url or DEFAULT_APP_URL).strip()) but is not defined or imported; fix by importing DEFAULT_APP_URL from .models alongside DEFAULT_HARNESS_COMMAND (or, if .models doesn't export DEFAULT_APP_URL, add a local constant DEFAULT_APP_URL with the expected default value) and ensure the import/definition is placed near other constants so app_url has a valid fallback.
1812-1926:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winRemove redefined functions that are already imported from
.assertions.These four functions are imported from
.assertionsat lines 41-49 but are also defined locally here. This causes Ruff F811 redefinition errors and is a refactoring artifact that should be cleaned up.Remove the following local definitions:
_classify_failure(lines 1812-1882)_assertion_text_for_classification(lines 1885-1903)_failed_selector_assertion(lines 1906-1915)_flake_reason_from_classification(lines 1918-1926)🗑️ Proposed fix: remove redundant local definitions
-def _classify_failure( - *, - harness_log_path: Path, - error: str, - assertion_results: list[dict[str, Any]], - exit_code: int, -) -> str: - text = "" - try: - if harness_log_path.exists(): - text = harness_log_path.read_text(encoding="utf-8", errors="ignore").lower() - except Exception: - text = "" - failed_assertions = [ - item for item in assertion_results if not bool(item.get("ok", False)) - ] - merged = "\n".join( - [ - text, - str(error or ""), - _assertion_text_for_classification(failed_assertions), - ] - ).lower() - if any( - k in merged - for k in [ - "app did not become reachable", - "connection refused", - "econnrefused", - "net::err_connection_refused", - "failed to connect", - "could not connect", - "server unavailable", - ] - ): - return "environment_failure" - if any(k in merged for k in ["timeout", "timed out", "deadline exceeded"]): - return "timeout" - if any( - k in merged - for k in [ - "needs selector", - "unsupported_in_playwright", - "unsupported native assertion type", - "unsupported native step action", - "invalid_regex", - "unknown action", - "malformed", - ] - ): - return "test_bug" - if any(k in merged for k in ["401", "403", "unauthorized", "forbidden"]): - return "auth_failure" - if any(k in merged for k in ["authentication failed", "auth failure", "login failed"]): - return "auth_failure" - if _failed_selector_assertion(failed_assertions) or any( - k in merged - for k in [ - "element not found", - "stale element", - "waiting for selector", - "waiting for locator", - "strict mode violation", - ] - ): - return "selector_drift" - if failed_assertions: - return "app_bug" - if int(exit_code) != 0 or error: - return "unknown" - return "unknown" - - -def _assertion_text_for_classification(items: list[dict[str, Any]]) -> str: - chunks: list[str] = [] - for item in items: - for key in ( - "assertion_type", - "message", - "actual", - "expected", - "step", - "assertion", - ): - value = item.get(key) - if value is None: - continue - if isinstance(value, (dict, list)): - chunks.append(json.dumps(value, sort_keys=True, default=str)) - else: - chunks.append(str(value)) - return "\n".join(chunks) - - -def _failed_selector_assertion(items: list[dict[str, Any]]) -> bool: - for item in items: - assertion_type = str(item.get("assertion_type") or "").lower() - if assertion_type in { - "selector_visible", - "element_visible", - "visible", - }: - return True - return False - - -def _flake_reason_from_classification(failure_classification: str) -> str: - if failure_classification in { - "auth_failure", - "environment_failure", - "selector_drift", - "timeout", - }: - return failure_classification - return "" - - def load_run_summaries(runs_dir: Path, *, limit: int = 25) -> list[dict[str, Any]]:🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/retrace/tester/harness.py` around lines 1812 - 1926, The four functions _classify_failure, _assertion_text_for_classification, _failed_selector_assertion, and _flake_reason_from_classification are duplicates of the versions imported from .assertions; delete their local definitions in this file so the module uses the imported implementations (keep any references to these names elsewhere unchanged), and run tests/linter to confirm the Ruff F811 redefinition errors are resolved.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/retrace/commands/qa.py`:
- Around line 187-200: propose_fix_for_incident currently always receives an
LLMClient because LLMClient(cfg.llm) is unconditionally constructed in the CLI
handlers; to preserve the optional-LLM contract, only instantiate LLMClient when
LLM is configured and otherwise pass llm=None into propose_fix_for_incident.
Update both call sites that create LLMClient (the block using "with
LLMClient(cfg.llm) as llm" around propose_fix_for_incident and the similar block
at the other handler) to check cfg.llm (or an explicit enabled flag) before
creating the client, and call propose_fix_for_incident(..., llm=llm) or
propose_fix_for_incident(..., llm=None) accordingly so the function’s llm=None
path is reachable.
In `@src/retrace/matching/scorer.py`:
- Around line 446-456: The code currently passes the entire positive-scoring
list `scored` into `rerank_candidates_with_ai(...)`; instead, take a bounded
shortlist (e.g., select the top K entries by existing score) before calling the
LLM to avoid huge prompts—compute K (e.g., `ai_shortlist_size` or reuse
`max(top_n, N)`) and create `shortlist = scored[:K]` (or an explicit sort+slice
if `scored` isn’t pre-ordered), then call `rerank_candidates_with_ai(llm=llm,
candidates=shortlist, title=title, category=category,
evidence_text=evidence_text, top_n=top_n)`.
---
Outside diff comments:
In `@src/retrace/tester/harness.py`:
- Line 1612: The NameError arises because DEFAULT_APP_URL is referenced in the
expression that sets app_url (app_url = (app_url_override or spec.app_url or
DEFAULT_APP_URL).strip()) but is not defined or imported; fix by importing
DEFAULT_APP_URL from .models alongside DEFAULT_HARNESS_COMMAND (or, if .models
doesn't export DEFAULT_APP_URL, add a local constant DEFAULT_APP_URL with the
expected default value) and ensure the import/definition is placed near other
constants so app_url has a valid fallback.
- Around line 1812-1926: The four functions _classify_failure,
_assertion_text_for_classification, _failed_selector_assertion, and
_flake_reason_from_classification are duplicates of the versions imported from
.assertions; delete their local definitions in this file so the module uses the
imported implementations (keep any references to these names elsewhere
unchanged), and run tests/linter to confirm the Ruff F811 redefinition errors
are resolved.
🪄 Autofix (Beta)
✅ Autofix completed
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 14467e64-e00f-4445-8182-4c03d064aa63
📒 Files selected for processing (12)
src/retrace/commands/qa.pysrc/retrace/matching/ai_scorer.pysrc/retrace/matching/scorer.pysrc/retrace/storage/core.pysrc/retrace/storage/helpers.pysrc/retrace/storage/repositories/base.pysrc/retrace/storage/repositories/incidents.pysrc/retrace/tester/assertions.pysrc/retrace/tester/harness.pysrc/retrace/tester/specs.pytests/e2e/test_killer_demo_loop.pytests/test_matching_ai_scorer.py
🚧 Files skipped from review as they are similar to previous changes (7)
- src/retrace/storage/repositories/base.py
- tests/test_matching_ai_scorer.py
- src/retrace/matching/ai_scorer.py
- tests/e2e/test_killer_demo_loop.py
- src/retrace/storage/helpers.py
- src/retrace/tester/specs.py
- src/retrace/tester/assertions.py
|
Note Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it. Fixes Applied SuccessfullyFixed 4 file(s) based on 2 unresolved review comments. Files modified:
Commit: The changes have been pushed to the Time taken: |
Fixed 4 file(s) based on 2 unresolved review comments. Co-authored-by: CodeRabbit <noreply@coderabbit.ai>
Overview
This PR addresses three critical gaps identified in the Retrace codebase: architectural technical debt, minimal E2E coverage, and naive AI matching capabilities.
Key Changes
1. Architectural Technical Debt (Refactoring)
src/retrace/tester.pyinto a cohesive package (src/retrace/tester/). Logic is now distributed intomodels.py,specs.py,assertions.py, andharness.py.src/retrace/storage/core.pyinto a standaloneIncidentRepository. Established aBaseRepositorypattern and moved utility functions to a sharedhelpers.py.Storageand clean exports intester/__init__.py.2. E2E Testing (The "Killer Demo Loop")
tests/e2e/test_killer_demo_loop.pywhich exercises the entire pipeline: Ingest (Sentry compat) -> Incident Creation -> QA Bridge -> Auto-repro -> Fix Generation -> PR Creation.3. AI Capabilities (Semantic Code Search)
src/retrace/matching/ai_scorer.pywhich uses theLLMClientto rerank code candidates found via keyword search. This pass reasons about the relationship between symptoms (stack traces, console logs) and file paths, significantly improving fix accuracy.scorer.pyandauto_fix.pyto leverage this reranking during the fix proposal flow.Verification
Summary by CodeRabbit
New Features
Tests
Chores