fix(agent): gate prompt caching by provider capability, not base_url/model string match #455

Open
sumleo wants to merge 1 commit into math-inc:main from sumleo:fix/prompt-caching-capability-gate

@sumleo sumleo commented Jun 17, 2026

What does this PR do?

Anthropic prompt caching was gated on a base_url substring check ("openrouter" in base_url). As a result, a Claude model served through a custom Anthropic-compatible endpoint (Z.ai, LiteLLM, a self-hosted proxy) silently lost caching, a ~75% input-token-cost regression on multi-turn runs for those users.

This replaces the fragile string match with a capability check driven by the runtime provider resolution, and applies it to both sites that compute the gate.

Related Issue

Fixes #454

Type of Change

  • 🐛 Bug fix (non-breaking change that fixes an issue)

Changes Made

  • gauss_cli/runtime_provider.py: add supports_anthropic_prompt_caching(api_mode, model, base_url). Returns True for the native Anthropic Messages API, and for any chat_completions endpoint serving a Claude model (OpenRouter or custom). Codex Responses and non-Claude chat-completions return False.
  • run_agent.py AIAgent.__init__ (L386-393): replace (is_openrouter and is_claude) or is_native_anthropic with the capability call.
  • run_agent.py provider-fallback recompute (L2650-2657): replace the duplicate ("openrouter" in fb_base_url ... and "claude" in fb_model ...) gate with the same capability call, so a Claude fallback on a custom endpoint also keeps caching.

self._use_prompt_caching remains the sole gate on the cache_control marker (L4338-4339), so both code paths now enable caching consistently.
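
As a rough illustration of the capability check described above, here is a minimal sketch. It is not the code from this PR: the api_mode string values ("anthropic_messages", "chat_completions", "codex_responses") and the case-insensitive "claude" substring test are assumptions made for the example. Only the function name, its three parameters, and the True/False outcomes come from the description.

```python
def supports_anthropic_prompt_caching(api_mode: str, model: str, base_url: str = "") -> bool:
    """Return True when Anthropic-style cache_control markers should be sent.

    Sketch only. The api_mode literals below are hypothetical placeholders
    for whatever values the runtime provider resolution actually produces.
    """
    mode = (api_mode or "").strip().lower()
    model_name = (model or "").lower()

    # Native Anthropic Messages API: caching is always available.
    if mode == "anthropic_messages":
        return True

    # Any chat-completions endpoint (OpenRouter or custom) serving a Claude
    # model. base_url is deliberately not inspected here: the point of the
    # fix is that the endpoint's hostname no longer decides the outcome.
    if mode == "chat_completions":
        return "claude" in model_name

    # Codex Responses and anything unrecognized: no Anthropic caching.
    return False


if __name__ == "__main__":
    # Hypothetical endpoint and model names, for illustration only.
    print(supports_anthropic_prompt_caching(
        "chat_completions", "claude-sonnet-4", "https://proxy.example.com/v1"))
```

Under these assumptions, both call sites in run_agent.py would assign the result of this one function to self._use_prompt_caching, which is what keeps the initial path and the fallback path from drifting apart again.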

Intentional test-expectation change

The old test_prompt_caching_non_openrouter asserted that caching was OFF for a Claude model on a non-OpenRouter endpoint, which encoded the bug as expected behavior. It is renamed to test_prompt_caching_claude_custom_endpoint and now asserts caching is ON, which is the corrected behavior. A new test_prompt_caching_non_claude_custom_endpoint keeps coverage that a non-Claude model on a custom endpoint stays OFF. The existing OpenRouter / non-Claude / native-Anthropic / fallback-to-anthropic tests are unchanged and still pass.
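
To make the expectation change concrete, the sketch below reconstructs the old gate from the expression quoted under Changes Made and compares it with the post-fix expectations for four scenarios. The URLs and model names are hypothetical, the lowercasing is an assumption, and is_native_anthropic is passed in directly because the description does not show how it was computed.

```python
def legacy_gate(base_url: str, model: str, is_native_anthropic: bool) -> bool:
    """Old behavior: (is_openrouter and is_claude) or is_native_anthropic."""
    is_openrouter = "openrouter" in base_url.lower()  # lowercasing is an assumption
    is_claude = "claude" in model.lower()
    return (is_openrouter and is_claude) or is_native_anthropic


# (label, base_url, model, is_native_anthropic, expected_after_fix)
# Endpoints and model names are illustrative placeholders.
CASES = [
    ("openrouter + claude", "https://openrouter.ai/api/v1", "anthropic/claude-sonnet-4", False, True),
    ("custom + claude", "https://proxy.example.com/v1", "claude-sonnet-4", False, True),
    ("custom + non-claude", "https://proxy.example.com/v1", "gpt-4o", False, False),
    ("native anthropic", "https://api.anthropic.com", "claude-sonnet-4", True, True),
]

# Scenarios where the old gate disagrees with the corrected expectation.
mismatches = [
    label
    for label, base_url, model, native, expected in CASES
    if legacy_gate(base_url, model, native) != expected
]

if __name__ == "__main__":
    print(mismatches)  # ['custom + claude']
```

Only the custom-endpoint Claude case differs, which matches the single test whose assertion was flipped.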

How to Test

  1. pip install -e . (or uv pip install -e .)
  2. python -m pytest tests/test_run_agent.py -q
  3. All 158 tests pass, including the 6 prompt-caching cases.

Checklist

Code

  • My commit messages follow Conventional Commits (fix(agent): ...)
  • My PR contains only changes related to this fix
  • I've run pytest tests/test_run_agent.py -q and all tests pass
  • I've added/updated tests for my changes
  • I've tested on my platform: macOS 15 (Python 3.11)

Documentation & Housekeeping

  • No config keys added/changed — N/A
  • No architecture/workflow docs affected — N/A

sumleo commented Jun 18, 2026

Hi @math-inc, gentle nudge on this when you have a moment. It's a small, self-contained prompt-caching fix, and I'm happy to rebase or tweak anything if that would make review easier. Thanks for the project and your time!

Development

Successfully merging this pull request may close these issues.

Prompt caching disabled for Claude on custom (non-OpenRouter) endpoints