v1 by mattpocock · Pull Request #240 · mattpocock/evalite

mattpocock · 2025-10-19T11:45:37Z

No description provided.

changeset-bot · 2025-10-19T11:45:41Z

🦋 Changeset detected

Latest commit: 2c55228

The changes in this PR will be included in the next version bump.

vercel · 2025-10-19T11:45:41Z

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment

Project	Deployment	Actions	Updated (UTC)
evalite	Ignored		Feb 20, 2026 1:09pm

pkg-pr-new · 2025-10-19T11:46:34Z

npm i https://pkg.pr.new/mattpocock/evalite@240

commit: 2c55228

mattpocock · 2025-10-19T11:55:29Z

@claude Remove the BACKEND_ONLY_CONSTANT import that's causing the error in CI

claude · 2025-10-19T11:55:47Z

Claude finished @mattpocock's task

Task completed

- Check CI status to understand the error
- Read modified files to locate the problematic import
- Remove the BACKEND_ONLY_CONSTANT imports
- Commit and push the fix

Fixed the CI error:

The error was caused by imports from evalite/backend-only-constants that referenced exports that no longer exist:

- test-utils.ts: Removed unused DB_LOCATION import
- files.test.ts: Replaced FILES_LOCATION import with a local constant "./node_modules/.evalite/files"

The changes have been committed and pushed to the v1 branch. CI should now pass!

Job Run | Branch: v1

- Add dotenv as a dependency
- Create env-setup-file module that imports dotenv/config
- Export env-setup-file as 'evalite/env-setup-file'
- Automatically prepend env-setup-file to setupFiles array
- Update documentation to reflect automatic .env loading
- Update example config to remove manual dotenv setup

Fixes #234

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: Matt Pocock <mattpocock@users.noreply.github.com>

… precedence

- Add loadVitestSetupFiles() to load setupFiles from vitest.config.ts
- Merge setupFiles from both configs with evalite.config.ts taking precedence
- Add tests for vitest.config.ts setupFiles support and precedence
- setupFiles execution order: env-setup-file -> vitest -> evalite

Co-authored-by: Matt Pocock <mattpocock@users.noreply.github.com>
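The execution order described in this commit message can be sketched as a plain array merge. This is only an illustration: `mergeSetupFiles` and the example paths are hypothetical names, not taken from the Evalite source, and the real `loadVitestSetupFiles()` may resolve paths differently.

```typescript
// Hypothetical sketch of the documented order: env-setup-file -> vitest -> evalite.
// Only the module specifier "evalite/env-setup-file" comes from the commit message.
const ENV_SETUP_FILE = "evalite/env-setup-file";

function mergeSetupFiles(
  vitestSetupFiles: readonly string[],
  evaliteSetupFiles: readonly string[],
): string[] {
  // Later files run last, so anything evalite.config.ts sets up
  // can override what vitest.config.ts or the .env loader did.
  return [ENV_SETUP_FILE, ...vitestSetupFiles, ...evaliteSetupFiles];
}

const merged = mergeSetupFiles(["./vitest.setup.ts"], ["./evalite.setup.ts"]);
console.log(merged);
```

Running the env loader first means both user-supplied setup files can already read variables from `.env`.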

* Add .editorconfig file
* Return vitest instance when returning with !shouldKeepRunning

  This fixes the TS errors.
* Introduce ESLint and add typecheck npm script
  - Include ESLint 9 as root dependency
  - Set up ESLint to lint the whole repo
  - Extend the root config and add a few package-specific plugins for Evalite UI
  - Add a consistent `typecheck` npm script for type checking across the repo

  You can now use `pnpm lint` in root and UI app and `pnpm typecheck` anywhere in the repo. Use `pnpm lint --fix` to attempt to fix the issues.
* Add missing break in switch case
* Fix CI

---------

Co-authored-by: Matt Pocock <mattpocockvoice@gmail.com>

* feat: add watchFiles support for Evalite watch mode
* refactor: rename watchFiles option to forceRerunTriggers
* Fixed errors and made forceRerunTriggers work the way it does in vitest
* Changeset

---------

Co-authored-by: Matt Pocock <mattpocockvoice@gmail.com>

- Add streaming JSON output with jq in afk.sh scripts
- Add RALPH commit history context for prior work awareness
- Filter issues to only those with 'ralph' label
- Update prompts: tracer bullet prioritization, RALPH: commit prefix, pnpm ci feedback loop
- Remove progress.txt dependency in favor of structured commit messages

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Fix table crash with double-width characters (emoji) in narrow terminals by adding try-catch fallback in renderTable. The `table` library crashes when wrapWord can't handle emoji characters that take 2 columns. Also enforce minimum column width of 3 to prevent negative widths.
- Fix export hang test timeout: increase from 1000ms to 10000ms since exportCommand legitimately takes >1s when running evals from scratch.
- Add disableServer: true to exportCommand's runEvalite call since the server is unnecessary when auto-running evals for export.
- Remove unused DB_LOCATION import in test-utils.ts (fixes typecheck).

Files changed:
- packages/evalite/src/reporter/rendering.ts
- packages/evalite/src/export-static.ts
- packages/evalite-tests/tests/export-static.test.ts
- packages/evalite-tests/tests/test-utils.ts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
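The first fix combines two guards: clamping column widths to a minimum of 3, and catching a render failure so output degrades instead of crashing. A minimal sketch of that pattern follows, assuming a pluggable `render` callback in place of the real `table` library call; `clampColumnWidths` and `renderWithFallback` are hypothetical names, not the functions in rendering.ts.

```typescript
// Hypothetical sketch; the actual renderTable in Evalite may differ.
const MIN_COLUMN_WIDTH = 3; // minimum stated in the commit message

type Rows = readonly (readonly string[])[];

// Narrow terminals can produce tiny or negative computed widths; clamp them.
function clampColumnWidths(widths: readonly number[]): number[] {
  return widths.map((w) => Math.max(MIN_COLUMN_WIDTH, Math.floor(w)));
}

function renderWithFallback(
  rows: Rows,
  widths: readonly number[],
  render: (rows: Rows, widths: number[]) => string,
): string {
  try {
    return render(rows, clampColumnWidths(widths));
  } catch {
    // Assumed fallback format: one plain line per row, no borders or wrapping.
    return rows.map((row) => row.join(" | ")).join("\n");
  }
}
```

The fallback format here is an assumption; the point is only that a wrapping failure on wide characters should not abort the whole report.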

* RALPH: Bump AI SDK deps to v6 and migrate core types (#379)

  Task: Foundational vertical slice for AI SDK v5→v6 migration.

  Key decisions:
  - Public types use version-agnostic aliases (LanguageModel, EmbeddingModel from "ai")
  - Internal middleware types use version-specific V3 types from @ai-sdk/provider
  - Usage shape adapted: inputTokens/outputTokens now objects with .total, totalTokens computed as sum
  - Middleware specificationVersion: 'v3' added per v6 requirement
  - Removed obsolete "media" content type check (replaced by "file" in v6)

  Files changed:
  - packages/evalite/package.json: ai ^5→^6, @ai-sdk/provider ^2→^3
  - packages/evalite/src/ai-sdk.ts: V2→V3 types, LanguageModel public API, usage shape migration
  - packages/evalite/src/types.ts: LanguageModelV2→LanguageModel, EmbeddingModelV2<string>→EmbeddingModel
  - packages/evalite-tests/package.json: ai ^5→^6, @ai-sdk/openai ^2→^3
  - packages/example/package.json: ai ^5→^6, @ai-sdk/openai ^2→^3, @ai-sdk/provider ^2→^3
  - apps/evalite-ui/package.json: ai ^5→^6
  - pnpm-lock.yaml: updated

  Blockers: #380 needs MockLanguageModelV2→V3 migration in test fixtures + scorer generateObject→generateText migration.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* RALPH: Migrate scorers to generateText + Output.object() and test fixtures to MockLanguageModelV3 (#380)

  Task: Migrate all v5 API usage to v6 patterns per PRD #378.

  Key decisions:
  - Scorers: generateObject() → generateText() + Output.object(), result.object → result.output
  - Mocks: MockLanguageModelV2 → MockLanguageModelV3 with plain object doGenerate (not function)
  - Usage shape: V3 nested objects { inputTokens: { total }, outputTokens: { total } }
  - FinishReason: V3 object shape { unified: "stop", raw: undefined }
  - Removed obsolete rawCall, providerMetadata, request, response from mock fixtures

  Files changed:
  - packages/evalite/src/scorers/utils/statement-evaluation.ts (3 call sites)
  - packages/evalite/src/scorers/answer-correctness.ts (1 call site)
  - packages/evalite/src/scorers/answer-relevancy.ts (1 call site)
  - packages/evalite/src/scorers/context-recall.ts (1 call site)
  - packages/evalite-tests/tests/fixtures/ai-sdk-traces/traces.eval.ts
  - packages/evalite-tests/tests/fixtures/ai-sdk-caching/caching.eval.ts
  - packages/evalite-tests/tests/fixtures/ai-sdk-caching-config-disabled/caching.eval.ts
  - packages/evalite-tests/tests/fixtures/ai-sdk-caching-config-precedence/caching.eval.ts
  - packages/example/src/fake-models.eval.ts

  Blockers: #381 (docs + changeset) is now unblocked.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* RALPH: Update documentation for AI SDK v6 and add changeset (#381)

  Task: Update docs to reflect v6 migration per PRD #378.

  Key decisions:
  - Type signature: LanguageModelV2 → LanguageModel (version-agnostic alias)
  - Structured output: generateObject/streamObject → generateText/streamText + Output.object()
  - tips/vercel-ai-sdk.mdx already v6-compatible, no changes needed
  - Minor version changeset for evalite package

  Files changed:
  - apps/evalite-docs/src/content/docs/api/ai-sdk.mdx
  - .changeset/0000-ai-sdk-v6.md

  Blockers: None. All #378 PRD tasks are now complete.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
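The usage-shape change noted in #379 and #380 (nested `{ total }` objects, with totalTokens computed as a sum) can be illustrated with local stand-in types. The interfaces below are assumptions written for this sketch rather than imports from `ai` or `@ai-sdk/provider`, and treating a missing total as 0 is also an assumption.

```typescript
// Local stand-ins for the nested usage shape described in the commit messages.
interface TokenCount {
  total: number | undefined;
}

interface NestedUsage {
  inputTokens: TokenCount;
  outputTokens: TokenCount;
}

interface FlatUsage {
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
}

// Hypothetical helper: flatten the nested shape and derive totalTokens.
function flattenUsage(usage: NestedUsage): FlatUsage {
  const inputTokens = usage.inputTokens.total ?? 0; // assumption: missing counts as 0
  const outputTokens = usage.outputTokens.total ?? 0;
  return { inputTokens, outputTokens, totalTokens: inputTokens + outputTokens };
}

const flat = flattenUsage({
  inputTokens: { total: 12 },
  outputTokens: { total: 30 },
});
console.log(flat.totalTokens);
```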

mattpocock mentioned this pull request Oct 19, 2025

feat: Support .env files by default via dotenv/config #243

Closed

mattpocock added this to the v1 milestone Oct 20, 2025

This was referenced Oct 21, 2025

Build a library of scorers #250

Closed

Move from React Markdown to Streamdown #256

Closed

mattpocock force-pushed the v1 branch from 9423bf2 to df9484b Compare October 23, 2025 09:32

mattpocock force-pushed the v1 branch 3 times, most recently from c69a19d to 4ae1080 Compare November 6, 2025 14:48

mattpocock mentioned this pull request Nov 8, 2025

Add 'copy' button to each page of the docs #307

Open

mattpocock force-pushed the v1 branch from a5f098c to 8c4667c Compare November 8, 2025 13:51

This was referenced Nov 9, 2025

Remove implicit reading of vitest.config.ts/vite.config.ts files #303

Closed

Add failing test for issue #95 (vitest workspace conflict) #245

Closed

Ability to specify which config files are read (--config) #296

Closed

mattpocock force-pushed the v1 branch from 9b843a9 to 08e62c2 Compare November 9, 2025 17:19

mattpocock and others added 11 commits November 10, 2025 17:39

Changed default storage to in-memory. SQLite still available via config.

93113dc

Remove problematic backend-only-constants imports

82ef941

- Remove unused DB_LOCATION import from test-utils.ts
- Replace FILES_LOCATION import with local constant in files.test.ts

Co-authored-by: Matt Pocock <mattpocock@users.noreply.github.com>

Fixed CI properly

e586e61

Huge move from evals -> suites, and results -> evals

fca9086

Added changeset

172f5e1

Removed streaming text support from tasks.

460a77e

Fixes after cherrypick

8d8ec99

Formatting

926e1b8

Docs updates

f8a928c

mattpocock and others added 16 commits November 15, 2025 14:36

Added vite-tsconfig-paths to config docs

8e305c0

Update

104c777

Fixes #306. Added variant to the table displayed in the CLI

3e0a32e

Bump version

d0bfdc8

Fixed linting errors

471888b

Various UI improvements to bring the data table further up the page.

c2e07c7

Bump version

0252b5e

Fixes #336. Fix types for viteConfig

ac7c109

Bump version

2156315

Fixed #323. Fixed confusing message when running Evalite regarding env-setup-file.

04bee19

Bump version

e8ce970

A further fix for the strange env-setup-file.ts error.

f2a8460

Bump version

b7449f9

Fixed #340. Fixed a bug where evalite export would hang until closed.

27401f7

Bump version

d59fade

mattpocock mentioned this pull request Nov 26, 2025

feat: support extra watch file globs in Evalite #351

Merged

mattpocock and others added 13 commits November 28, 2025 20:14

Add docs link to UI header. Fixes #355.

e0feff2

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Add repro for #357. Module error in one file + passing evals in another incorrectly reports success.

b9d4a40

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Added another failing test for 358

8f0c514

Fixed #357. Module errors now counted in failed tasks, preventing threshold success from overriding failures.

3e591c6

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
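The rule this commit describes, that a passing score threshold must not mask a module error, might look roughly like the following. `RunSummary` and `exitCodeFor` are hypothetical names for this sketch, and it assumes the score and threshold share the same scale; the actual reporter logic is not shown in this PR page.

```typescript
// Hypothetical summary shape; field names are assumptions for illustration.
interface RunSummary {
  failedTasks: number;
  moduleErrors: number; // files that failed to import or evaluate
  averageScore: number; // same scale as the threshold
}

function exitCodeFor(summary: RunSummary, threshold?: number): 0 | 1 {
  // Module errors count as failures, so they are checked before the threshold.
  const failures = summary.failedTasks + summary.moduleErrors;
  if (failures > 0) return 1;
  if (threshold !== undefined && summary.averageScore < threshold) return 1;
  return 0;
}
```

With this ordering, a run where one file throws at import time fails even if every eval that did run scored above the threshold.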

Fixed once.sh script

0510472

Improved prompt and added .pnpm-store to gitignore

63d7bde

fix: keep server enabled in exportCommand, only disable in tests

04f2bae

The server is needed in production for caching. Instead of hardcoding disableServer: true, expose it as an opt-in parameter so only tests disable it. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
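An opt-in flag of this kind usually comes down to a default of `false` that tests override. A minimal sketch, assuming hypothetical `ExportOptions` and `toRunOptions` names (only the `disableServer` property name comes from the commit message):

```typescript
// Hypothetical option types; not the real exportCommand signature.
interface ExportOptions {
  disableServer?: boolean;
}

interface RunOptions {
  disableServer: boolean;
}

function toRunOptions(opts: ExportOptions = {}): RunOptions {
  // Default keeps the server enabled, since production relies on it for caching.
  return { disableServer: opts.disableServer ?? false };
}

// Production call site: server stays on.
const production = toRunOptions();
// Test call site: explicitly opts out.
const inTests = toRunOptions({ disableServer: true });
console.log(production.disableServer, inTests.disableServer);
```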

Added install to the release script

7e833fa

Bump version

2c55228

mattpocock wants to merge 154 commits into main from v1

4 participants
