fix(agent): gate prompt caching by provider capability, not base_url/model string match by sumleo · Pull Request #455 · math-inc/OpenGauss

sumleo · 2026-06-17T01:30:47Z

What does this PR do?

Anthropic prompt caching was gated on a base_url substring ("openrouter" in base_url). A Claude model served through a custom Anthropic-compatible endpoint (Z.ai, LiteLLM, self-hosted proxy) therefore silently lost caching — a ~75% input-token-cost regression on multi-turn runs for those users.

This replaces the fragile string match with a capability check driven by the runtime provider resolution, and applies it to both sites that compute the gate.

Related Issue

Fixes #454

Type of Change

🐛 Bug fix (non-breaking change that fixes an issue)

Changes Made

gauss_cli/runtime_provider.py: add supports_anthropic_prompt_caching(api_mode, model, base_url). Returns True for the native Anthropic Messages API, and for any chat_completions endpoint serving a Claude model (OpenRouter or custom). Codex Responses and non-Claude chat-completions return False.
run_agent.py AIAgent.__init__ (L386-393): replace (is_openrouter and is_claude) or is_native_anthropic with the capability call.
run_agent.py provider-fallback recompute (L2650-2657): replace the duplicate ("openrouter" in fb_base_url ... and "claude" in fb_model ...) gate with the same capability call, so a Claude fallback on a custom endpoint also keeps caching.

self._use_prompt_caching remains the sole gate on the cache_control marker (L4338-4339), so both code paths now enable caching consistently.
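A minimal sketch of the capability check described above, and of how both call sites can share it. The api_mode string values, the AgentSketch class and its switch_to_fallback method are assumptions made for illustration; the PR names the function signature and the three modes, but not these details, so the actual code in gauss_cli/runtime_provider.py and run_agent.py may differ.

```python
from typing import Optional

# Assumed api_mode labels; the PR names the modes but not their string values.
ANTHROPIC_MESSAGES = "anthropic_messages"
CHAT_COMPLETIONS = "chat_completions"
CODEX_RESPONSES = "codex_responses"


def supports_anthropic_prompt_caching(
    api_mode: str, model: str, base_url: Optional[str] = None
) -> bool:
    """Return True when the resolved provider can honor cache_control markers."""
    if api_mode == ANTHROPIC_MESSAGES:
        # Native Anthropic Messages API: always eligible.
        return True
    if api_mode == CHAT_COMPLETIONS:
        # Any chat-completions endpoint serving a Claude model is eligible.
        # base_url is accepted to match the signature but is not consulted
        # in this sketch, so custom endpoints behave like OpenRouter.
        return "claude" in (model or "").lower()
    # Codex Responses and anything unrecognized: no caching.
    return False


class AgentSketch:
    """Hypothetical stand-in for AIAgent, showing the two sites that set the gate."""

    def __init__(self, api_mode: str, model: str, base_url: str) -> None:
        # Site 1: initial provider resolution.
        self._use_prompt_caching = supports_anthropic_prompt_caching(
            api_mode, model, base_url
        )

    def switch_to_fallback(self, fb_api_mode: str, fb_model: str, fb_base_url: str) -> None:
        # Site 2: provider-fallback recompute uses the same capability call.
        self._use_prompt_caching = supports_anthropic_prompt_caching(
            fb_api_mode, fb_model, fb_base_url
        )
```

Routing both sites through one helper means a later change to the eligibility rule cannot leave the fallback path out of sync with the constructor.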

Intentional test-expectation change

The old test_prompt_caching_non_openrouter asserted that caching was OFF for a Claude model on a non-OpenRouter endpoint; in other words, it encoded the bug as expected behavior. It is renamed to test_prompt_caching_claude_custom_endpoint and now asserts caching is ON, which is the corrected behavior. A new test_prompt_caching_non_claude_custom_endpoint keeps coverage that a non-Claude model on a custom endpoint stays OFF. The existing OpenRouter / non-Claude / native-Anthropic / fallback-to-anthropic tests are unchanged and still pass.
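The expectation change can be illustrated by running the old and new gates over the same inputs. This is a sketch under stated assumptions, not the contents of tests/test_run_agent.py: the mode constants, model names, proxy URL and case labels are hypothetical, and legacy_gate reconstructs the old expression from the description in "Changes Made".

```python
# Hypothetical mode labels; the real values are not given in the PR text.
CHAT, ANTHROPIC = "chat_completions", "anthropic_messages"


def legacy_gate(api_mode: str, model: str, base_url: str) -> bool:
    """Old behavior: a Claude model only counts when the URL mentions OpenRouter."""
    is_openrouter = "openrouter" in base_url.lower()
    is_claude = "claude" in model.lower()
    is_native_anthropic = api_mode == ANTHROPIC
    return (is_openrouter and is_claude) or is_native_anthropic


def capability_gate(api_mode: str, model: str, base_url: str) -> bool:
    """New behavior as described in the PR: base_url no longer decides."""
    if api_mode == ANTHROPIC:
        return True
    return api_mode == CHAT and "claude" in model.lower()


# (label, api_mode, model, base_url, expected result under the new gate)
CASES = [
    ("claude_openrouter", CHAT, "anthropic/claude-sonnet-4", "https://openrouter.ai/api/v1", True),
    ("claude_custom_endpoint", CHAT, "claude-sonnet-4", "https://proxy.example.com/v1", True),
    ("non_claude_custom_endpoint", CHAT, "gpt-4o", "https://proxy.example.com/v1", False),
    ("non_claude_openrouter", CHAT, "openai/gpt-4o", "https://openrouter.ai/api/v1", False),
    ("native_anthropic", ANTHROPIC, "claude-sonnet-4", "https://api.anthropic.com", True),
]


def changed_cases() -> list:
    """Labels of the cases where the old and new gates disagree."""
    return [
        label
        for label, mode, model, url, _expected in CASES
        if legacy_gate(mode, model, url) != capability_gate(mode, model, url)
    ]
```

In this sketch the two gates disagree only on the Claude-on-custom-endpoint case, which matches the single expectation the PR flips.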

How to Test

pip install -e . (or uv pip install -e .)
python -m pytest tests/test_run_agent.py -q
All 158 tests pass, including the 6 prompt-caching cases.

Checklist

Code

My commit messages follow Conventional Commits (fix(agent): ...)
My PR contains only changes related to this fix
I've run pytest tests/test_run_agent.py -q and all tests pass
I've added/updated tests for my changes
I've tested on my platform: macOS 15 (Python 3.11)

Documentation & Housekeeping

No config keys added/changed — N/A
No architecture/workflow docs affected — N/A

sumleo · 2026-06-18T03:10:07Z

Hi @math-inc, gentle nudge on this when you have a moment. It's a small, self-contained prompt-caching fix, and I'm happy to rebase or tweak anything if that would make review easier. Thanks for the project and your time!

fix(agent): gate prompt caching by provider capability, not base_url/…

9d5ea22

sumleo wants to merge 1 commit into math-inc:main from sumleo:fix/prompt-caching-capability-gate
