
Pr/evaluator tooling #342

Closed
dan-moncada wants to merge 14 commits into main from pr/evaluator-tooling

@dan-moncada (Contributor)

What type of PR is this? (check all applicable)

  • Refactor
  • Feature
  • Bug Fix
  • Optimization
  • Documentation Update
  • Infrastructure
  • Maintenance

Description

Related Tickets & Documents

  • Related Issue #
  • Closes #

QA Instructions, Screenshots, Recordings

Please replace this line with instructions on how to test your changes, a note on the devices and browsers this has been tested on, as well as any relevant images for UI changes.

Added/updated tests?

  • Yes
  • No, and this is why: small fix
  • I need help with writing tests: I am also not sure what sort of tests this would need

Documentation

  • If this PR changes the system architecture, Architecture.md has been updated

[optional] Are there any post-deployment tasks we need to perform?

yangm2 and others added 14 commits April 27, 2026 11:32
measure_evaluator_variance.py:
- run evaluator calls concurrently (--max-workers, default 4)
- --show-delta: fetch stored scores from LangSmith and show mean/sigma
  delta inline in the Per-Scenario Consistency table as 0.95(+0.04)
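The concurrency change above (evaluator calls fanned out over `--max-workers`, default 4) can be pictured with a thread pool. This is a minimal, hypothetical sketch, not the code in this PR: `evaluate_once`, `measure_variance`, `build_parser`, and the `--runs` flag are illustrative names, and `evaluate_once` returns fake scores in place of a real evaluator call. Threads are a reasonable fit here on the assumption that each evaluator call is I/O-bound (waiting on a network response).

```python
import argparse
from concurrent.futures import ThreadPoolExecutor
from statistics import mean, stdev


def evaluate_once(scenario: str, run: int) -> float:
    """Hypothetical stand-in for one evaluator call (e.g. an LLM-judge request).

    Returns a deterministic fake score so the sketch runs offline.
    """
    return 0.90 + 0.01 * (run % 5)


def measure_variance(scenarios, runs_per_scenario, max_workers=4):
    """Run all (scenario, run) evaluator calls on a thread pool.

    Returns {scenario: (mean, sigma)}, where sigma is the sample standard deviation.
    """
    jobs = [(s, r) for s in scenarios for r in range(runs_per_scenario)]
    scores = {s: [] for s in scenarios}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in submission order, so zip() pairs each
        # score with the job that produced it.
        results = pool.map(lambda job: evaluate_once(*job), jobs)
        for (scenario, _run), score in zip(jobs, results):
            scores[scenario].append(score)
    return {
        s: (mean(v), stdev(v) if len(v) > 1 else 0.0)
        for s, v in scores.items()
    }


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Measure evaluator variance")
    # Mirrors the --max-workers flag described in the commit message (default 4).
    parser.add_argument("--max-workers", type=int, default=4,
                        help="number of evaluator calls to run concurrently")
    # Hypothetical: repetitions per scenario.
    parser.add_argument("--runs", type=int, default=5,
                        help="evaluator runs per scenario")
    return parser
```

Because results are collected in submission order, the per-scenario statistics come out the same whether `max_workers` is 1 or 4; only wall-clock time changes.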

results_display.py:
- widen stat columns when baseline present to fit delta annotation
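The inline delta annotation (`0.95(+0.04)`) and the column widening can be illustrated with a small formatting helper. This is a hypothetical sketch rather than the PR's `results_display.py`: `format_with_delta`, `column_width`, and `render_column` are illustrative names, and the two-decimal precision is assumed from the example in the commit message.

```python
from typing import List, Optional, Sequence


def format_with_delta(value: float, baseline: Optional[float], precision: int = 2) -> str:
    """Render a statistic, appending the signed change vs. a stored baseline.

    With no baseline, only the value is shown (e.g. "0.95").
    """
    text = f"{value:.{precision}f}"
    if baseline is None:
        return text
    # The "+" format flag always prints the sign, so gains and regressions are explicit.
    return f"{text}({value - baseline:+.{precision}f})"


def column_width(header: str, cells: Sequence[str]) -> int:
    """Width needed so the header and every (possibly annotated) cell fit."""
    return max([len(header), *(len(c) for c in cells)])


def render_column(header: str, values: Sequence[float],
                  baselines: Optional[Sequence[Optional[float]]] = None) -> List[str]:
    """Right-align one stat column; it widens automatically when deltas are present."""
    if baselines is None:
        baselines = [None] * len(values)
    cells = [format_with_delta(v, b) for v, b in zip(values, baselines)]
    width = column_width(header, cells)
    return [header.rjust(width)] + [c.rjust(width) for c in cells]
```

For example, `render_column("Mean", [0.95])` keeps the column at 4 characters, while passing a baseline of `0.91` produces the cell `0.95(+0.04)` and widens the whole column to 11 characters.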

tone.md:
- add hedging-language and citation-completeness scoring criteria

EVALUATION.md:
- document --show-delta and --max-workers flags with example output
update skill so that it loads automatically in Claude Code
import logging
from pathlib import Path

from evaluate.results_display import ScenarioResult

HISTORY_DIR = Path(__file__).parent.parent / ".eval_history"
_logger = logging.getLogger(__name__)
yangm2 (Contributor) commented May 11, 2026

Please close this PR

yangm2 (Contributor) left a comment:

do not merge

yangm2 (Contributor) commented May 16, 2026

superseded by #346

yangm2 closed this on May 16, 2026

Labels

None yet

Projects

None yet


3 participants