Move semantic hints to user prompt for cross-request caching #130
System prompt is now static per schema version, enabling prompt caching across requests. Semantic hints (which change per image/description) are placed in the user prompt instead. The system prompt includes a pointer instructing the LLM to check the user message for hints. Fixes #129
Deploying hedit with
| Latest commit: | 2a33a32 |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://c4a1b4e7.hedit.pages.dev |
| Branch Preview URL: | https://feature-issue-129-cache-frie.hedit.pages.dev |
Codecov Report
✅ All modified and coverable lines are covered by tests.
- Rename _format_semantic_hints to format_semantic_hints (public API, used cross-module)
- Align header: system prompt pointer and actual section both say "SEMANTIC HINTS"
- Soften system prompt wording to "may include" (hints are optional)
- Skip hints with empty tag keys
- Add debug logging when hints are included in user prompt
- Add 10 tests: user prompt with/without hints, confidence bucketing, system prompt caching invariant
PR Review Summary (3 agents: code-reviewer, silent-failure-hunter, test-analyzer)
Critical Issues (0 found)
None.
Important Issues (4 found, ALL FIXED in 2a33a32)
Suggestions (noted, not fixed)
All tests pass
* Move semantic hints to user prompt for cross-request caching (#130)
* Move semantic hints from system prompt to user prompt

  System prompt is now static per schema version, enabling prompt caching across requests. Semantic hints (which change per image/description) are placed in the user prompt instead. The system prompt includes a pointer instructing the LLM to check the user message for hints. Fixes #129
* Address review findings for cache-friendly prompts
  - Rename _format_semantic_hints to format_semantic_hints (public API, used cross-module)
  - Align header: system prompt pointer and actual section both say "SEMANTIC HINTS"
  - Soften system prompt wording to "may include" (hints are optional)
  - Skip hints with empty tag keys
  - Add debug logging when hints are included in user prompt
  - Add 10 tests: user prompt with/without hints, confidence bucketing, system prompt caching invariant
* Bump version to 0.7.6.dev3
* Update default models to latest Qwen and Anthropic
  - Evaluation: qwen/qwen3-235b-a22b-2507 -> qwen/qwen3.5-397b-a17b (most capable Qwen MoE, $0.39/M prompt)
  - Vision: qwen/qwen3-vl-30b-a3b-instruct -> qwen/qwen3-vl-32b-instruct (newer VL model, $0.10/M prompt)
  - Annotation: keep anthropic/claude-haiku-4.5 (unchanged)
  - Replace all legacy gpt-oss-120b references in defaults and docs
  - Provider: let OpenRouter auto-route for Qwen models
* Bump version to 0.7.6.dev4

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Summary
Problem
Prompt caching broke between requests because semantic hints (different per image) were embedded in the system prompt. Since Anthropic's caching uses prefix matching, any change to the hints invalidated the cached prefix, including the entire ~1000-tag vocabulary and rules section.
Solution
The system prompt now contains only static content (vocabulary, rules, patterns). A short pointer says "Check the user message for SEMANTIC HINTS." The actual hints are in the user prompt, which already changes per request.
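The split described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `format_semantic_hints` is the name used in the PR, but its signature here, the `build_system_prompt` and `build_user_prompt` helpers, the hint format, and the prompt text are all assumptions, and the confidence bucketing covered by the tests is not modeled.

```python
from typing import Dict, Optional

# Hypothetical stand-in for the static vocabulary, rules, and patterns.
STATIC_BODY = (
    "Annotate the description using only tags from the vocabulary.\n"
    "The user message may include a SEMANTIC HINTS section; "
    "check it before annotating.\n"
)


def build_system_prompt(schema_version: str) -> str:
    """Return a prompt that depends only on the schema version.

    No per-request data is interpolated, so the text is byte-identical
    across requests and can serve as a cacheable prefix.
    """
    return f"Schema version: {schema_version}\n{STATIC_BODY}"


def format_semantic_hints(hints: Dict[str, float]) -> str:
    """Render hints as a SEMANTIC HINTS section, skipping empty tag keys."""
    lines = [
        f"- {tag} ({score:.2f})"
        for tag, score in hints.items()
        if tag.strip()  # assumed behavior: drop hints whose tag key is empty
    ]
    if not lines:
        return ""
    return "SEMANTIC HINTS\n" + "\n".join(lines)


def build_user_prompt(
    description: str, hints: Optional[Dict[str, float]] = None
) -> str:
    """Attach the per-request hints to the user message, if there are any."""
    section = format_semantic_hints(hints or {})
    return f"{description}\n\n{section}" if section else description


if __name__ == "__main__":
    print(build_system_prompt("1.0"))
    print(build_user_prompt("A red ball on grass", {"Red": 0.9, "": 0.4}))
```

Because the system prompt never sees the hints, two requests with different images produce the same system prompt and differ only in the user message.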
For batch processing of 1000 images, the full system prompt cost is paid once on the first request; subsequent requests read it from the cache, provided they arrive within the 5-minute TTL.
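A rough way to estimate the effect on a batch is sketched below. The function name, the token count, and the price are hypothetical. The default multipliers (cache writes at 1.25x the base input price, cache reads at 0.1x) are assumptions that should be checked against the provider's current pricing, and the sketch assumes every request after the first hits the cache.

```python
def batch_system_prompt_cost(
    n_requests: int,
    prompt_tokens: int,
    price_per_mtok: float,
    cached: bool = True,
    write_mult: float = 1.25,  # assumed cache-write multiplier
    read_mult: float = 0.10,   # assumed cache-read multiplier
) -> float:
    """Estimate the system prompt's input cost over a batch of requests.

    Assumes each request after the first arrives within the cache TTL,
    so it is billed at the cache-read rate.
    """
    if n_requests <= 0:
        return 0.0
    base = prompt_tokens / 1_000_000 * price_per_mtok
    if not cached:
        return n_requests * base
    # One cache write on the first request, cache reads afterwards.
    return base * write_mult + (n_requests - 1) * base * read_mult


if __name__ == "__main__":
    # Hypothetical numbers: 10,000-token system prompt at $1.00 per million tokens.
    uncached = batch_system_prompt_cost(1000, 10_000, 1.00, cached=False)
    cached = batch_system_prompt_cost(1000, 10_000, 1.00, cached=True)
    print(f"uncached: ${uncached:.4f}, cached: ${cached:.4f}")
```

The estimate covers only the system prompt portion of the input; user messages, which now carry the hints, are billed at the normal rate either way.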
Test plan
Fixes #129