Skip to content

Update default models to Qwen 3.5 and fix prompt caching#136

Merged
neuromechanist merged 7 commits into
mainfrom
develop
Apr 1, 2026
Merged

Update default models to Qwen 3.5 and fix prompt caching#136
neuromechanist merged 7 commits into
mainfrom
develop

Conversation

@neuromechanist

Copy link
Copy Markdown
Member

Summary

Model updates

  • Evaluation: qwen/qwen3-235b-a22b-2507 -> qwen/qwen3.5-397b-a17b (most capable Qwen MoE, $0.39/M prompt)
  • Vision: qwen/qwen3-vl-30b-a3b-instruct -> qwen/qwen3-vl-32b-instruct (newer VL model, $0.10/M prompt)
  • Annotation: anthropic/claude-haiku-4.5 (unchanged)
  • Replaced all legacy gpt-oss-120b defaults across CLI, API, and utilities
  • Provider: OpenRouter auto-routes for Qwen models (high-throughput US providers)

Cache-friendly prompts (#129, #130)

  • Moved semantic hints from system prompt to user prompt
  • System prompt is now static per schema version, enabling prompt caching across requests

Test plan

cc @neuromechanist

neuromechanist and others added 5 commits March 30, 2026 03:45
* Move semantic hints from system prompt to user prompt

System prompt is now static per schema version, enabling prompt caching
across requests. Semantic hints (which change per image/description)
are placed in the user prompt instead. The system prompt includes a
pointer instructing the LLM to check the user message for hints.

Fixes #129

* Address review findings for cache-friendly prompts

- Rename _format_semantic_hints to format_semantic_hints (public API,
  used cross-module)
- Align header: system prompt pointer and actual section both say
  "SEMANTIC HINTS"
- Soften system prompt wording to "may include" (hints are optional)
- Skip hints with empty tag keys
- Add debug logging when hints are included in user prompt
- Add 10 tests: user prompt with/without hints, confidence bucketing,
  system prompt caching invariant
- Evaluation: qwen/qwen3-235b-a22b-2507 -> qwen/qwen3.5-397b-a17b
  (most capable Qwen MoE, $0.39/M prompt)
- Vision: qwen/qwen3-vl-30b-a3b-instruct -> qwen/qwen3-vl-32b-instruct
  (newer VL model, $0.10/M prompt)
- Annotation: keep anthropic/claude-haiku-4.5 (unchanged)
- Replace all legacy gpt-oss-120b references in defaults and docs
- Provider: let OpenRouter auto-route for Qwen models
@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Apr 1, 2026

Copy link
Copy Markdown

Deploying hedit with  Cloudflare Pages  Cloudflare Pages

Latest commit: 8ec9ebd
Status:⚡️  Build in progress...

View logs

@codecov

codecov Bot commented Apr 1, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 50.00000% with 7 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
src/api/main.py 25.00% 6 Missing ⚠️
src/scripts/process_feedback.py 0.00% 1 Missing ⚠️

📢 Thoughts on this report? Let us know!

@neuromechanist neuromechanist merged commit 5b1ed7b into main Apr 1, 2026
23 of 24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant