fix(feedback): add transfer evidence cards #47

Open
rodboev wants to merge 7 commits into hexo-ai:main from rodboev:pr/38-transfer-evidence-cards

@rodboev rodboev commented Jun 21, 2026

Closes #38

Summary

Add a per-generation transfer_evidence.json artifact that separates reusable improvements from task-specific residue before the next feedback handoff consumes generation output. The new artifact becomes the structured reuse boundary for feedback-context and context.md, while improvement.md stays the human-authored analysis file.
Tighten the transfer-evidence heuristic so that higher-is-better metrics count as reusable only when they strictly improve, and percent-formatted scores are parsed instead of silently falling back to unscored reuse decisions.
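
The tightened heuristic can be illustrated with a minimal sketch. This is not the code in this PR: `parse_score` and `is_reusable_gain` are hypothetical names, and treating `"85%"` as `0.85` is an assumption about how percent-formatted scores are normalized.

```python
from typing import Optional


def parse_score(raw: object) -> Optional[float]:
    """Read a score given as a number or a percent-formatted string.

    Hypothetical helper. Returns None when the value cannot be parsed,
    which the caller treats as "unscored".
    """
    if isinstance(raw, bool):  # bool is an int subclass; never a score
        return None
    if isinstance(raw, (int, float)):
        return float(raw)
    if isinstance(raw, str):
        text = raw.strip()
        is_percent = text.endswith("%")
        if is_percent:
            text = text[:-1].strip()
        try:
            value = float(text)
        except ValueError:
            return None
        # Assumption: percent strings are normalized to a 0..1 fraction.
        return value / 100.0 if is_percent else value
    return None


def is_reusable_gain(previous: object, current: object) -> Optional[bool]:
    """Decide reuse for a higher-is-better metric.

    Returns None when either score is missing or unparseable (unscored),
    otherwise True only for a strict improvement.
    """
    prev, cur = parse_score(previous), parse_score(current)
    if prev is None or cur is None:
        return None
    return cur > prev  # an equal score is not a reusable gain
```

Under these assumptions, `is_reusable_gain("80%", "85%")` is a reusable gain, while an unchanged score such as `is_reusable_gain(0.8, 0.8)` is not, and an unparseable score yields `None` rather than a silent pass.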

Why

origin/main currently carries forward only raw execution and evaluation context plus a truncated carryover from improvement.md. That forces later generations to infer what was actually reusable, which can flatten score gains, formatting wins, task-specific residue, and unsupported claims into the same channel.

Scope

  • write transfer_evidence.json next to results.json after each generation
  • derive a conservative first-pass card from evaluation status, score delta when available, and bounded improvement.md bullets
  • render a deterministic TRANSFER EVIDENCE section in feedback context
  • project only bounded reusable guidance plus explicit cautions into context.md
  • update the harness feedback prompt and the focused goldens and tests that lock these text surfaces
  • keep the change narrow to the transfer-evidence contract; PR #36 may require a rebase because it touches the same files, but this PR does not expand into that separate leak-fix scope
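
The first three scope items can be sketched roughly as follows. The card fields (`evaluation_status`, `score_delta`, `reusable`, `task_specific`), the `"passed"` status value, the `MAX_BULLETS` bound, and the function names are all hypothetical; the actual schema and rendering in this PR may differ.

```python
import json
from pathlib import Path
from typing import Optional

MAX_BULLETS = 3  # hypothetical bound on bullets taken from improvement.md


def build_card(evaluation_status: str, score_delta: Optional[float],
               improvement_bullets: list) -> dict:
    """Derive a conservative card: bullets count as reusable only when the
    evaluation passed and the score strictly improved (assumed rule)."""
    bullets = [b.strip() for b in improvement_bullets if b.strip()][:MAX_BULLETS]
    reusable = (
        evaluation_status == "passed"
        and score_delta is not None
        and score_delta > 0
    )
    return {
        "evaluation_status": evaluation_status,
        "score_delta": score_delta,
        "reusable": bullets if reusable else [],
        "task_specific": [] if reusable else bullets,
    }


def write_card(results_path: Path, card: dict) -> Path:
    """Write transfer_evidence.json next to results.json."""
    out = results_path.with_name("transfer_evidence.json")
    # sort_keys keeps the file byte-stable across runs
    out.write_text(json.dumps(card, indent=2, sort_keys=True) + "\n",
                   encoding="utf-8")
    return out


def render_section(card: dict) -> str:
    """Render a deterministic TRANSFER EVIDENCE text section."""
    lines = [
        "TRANSFER EVIDENCE",
        f"status: {card['evaluation_status']}",
        f"score_delta: {card['score_delta']}",
    ]
    for key in ("reusable", "task_specific"):  # fixed order, no dict iteration
        lines.append(f"{key}:")
        lines.extend(f"  - {item}" for item in card[key])
    return "\n".join(lines)
```

Fixing the key order and sorting the JSON keys is one way to keep the rendered text stable enough for golden-file tests; whether the PR achieves determinism the same way is not stated here.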

Test plan

  • python -m pytest tests/test_orchestrator_helpers.py tests/test_feedback_context_golden.py tests/test_context_manager.py tests/test_context_golden.py tests/test_prompts_snapshot.py tests/test_generation_loop.py -q, passed, 31 passed in 0.17s; pytest emitted the recurring Windows temp-cleanup warning for C:\Users\Rod\AppData\Local\Temp\pytest-of-Rod\pytest-current after the successful exit
  • python -m ruff check sia/layout.py sia/results.py sia/orchestrator.py sia/context_manager.py sia/prompts.py tests/test_orchestrator_helpers.py tests/test_feedback_context_golden.py tests/test_context_manager.py tests/test_context_golden.py tests/test_prompts_snapshot.py tests/test_generation_loop.py, passed
  • python -m ruff format sia/layout.py sia/results.py sia/orchestrator.py sia/context_manager.py sia/prompts.py tests/test_orchestrator_helpers.py tests/test_feedback_context_golden.py tests/test_context_manager.py tests/test_context_golden.py tests/test_prompts_snapshot.py tests/test_generation_loop.py --check, passed, 11 files already formatted
  • ty check sia/, exited cleanly with pre-existing unresolved-import warnings in optional openhands and pydantic_ai integrations outside this change

Development

Successfully merging this pull request may close these issues.

Add transfer-evidence cards to separate reusable improvements from task-specific residue
