feat: session affinity + prompt cache for multi-turn conversations by icebear0828 · Pull Request #242 · icebear0828/codex-proxy

icebear0828 · 2026-03-27T07:33:13Z

Summary

Session affinity: route all turns of a conversation to the same account via responseId → entryId mapping, fixing previous_response_id breaking across account rotation
prompt_cache_key: send per-conversation UUID to enable backend prompt caching — Turn 4+ sees 93% cache hit rate (was 0% before)
Missing fields: forward service_tier and include on the WebSocket path (both were silently dropped)
Request-level monitoring: affinity hit/miss, payload size, usage stats

Verified

Turn     in    out  cached  cache%
Turn 1    87   3698       0    0.0%
Turn 2  3772    157       0    0.0%
Turn 3  3948    146       0    0.0%
Turn 4  4111   1519    3840   93.4%  ← prompt cache hit
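
The cache% column is consistent with cached tokens divided by input tokens for the same turn. A minimal sketch of that arithmetic, assuming that definition (the helper name `cacheHitPercent` is hypothetical and not part of this PR):

```typescript
// Hypothetical helper: cache hit rate as cached input tokens over total input tokens.
function cacheHitPercent(inputTokens: number, cachedTokens: number): number {
  if (inputTokens <= 0) return 0; // avoid division by zero on an empty prompt
  return (cachedTokens / inputTokens) * 100;
}

// Turn 4 from the table above: 3840 cached out of 4111 input tokens.
const turn4Percent = cacheHitPercent(4111, 3840);
console.log(turn4Percent.toFixed(1) + "%"); // 93.4%
```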

Changes

File	Change
`session-affinity.ts`	New: `responseId → (entryId, conversationId)` map with 4h TTL
`proxy-handler.ts`	Affinity lookup/record, generate conversationId, set prompt_cache_key + include
`response-processor.ts`	Pass through `onResponseId` callback (was discarded)
`account-lifecycle.ts`	`acquire()` supports `preferredEntryId` hint
`codex-types.ts`	Add `prompt_cache_key`, `include` to request type
`ws-transport.ts`	Add `service_tier`, `prompt_cache_key`, `include` to WS message
`codex-api.ts`	Forward new fields on WS; stop stripping `service_tier` on HTTP
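
To illustrate the `account-lifecycle.ts` row: a minimal sketch of an `acquire()` that honors a `preferredEntryId` hint and otherwise falls back to rotation. The `AccountEntry` shape, the `AccountPool` class, and the round-robin policy are assumptions made for this example, not the PR's actual code.

```typescript
// Hypothetical account entry; the real pool likely tracks more state.
interface AccountEntry {
  entryId: string;
  available: boolean;
}

class AccountPool {
  private readonly entries: AccountEntry[];
  private cursor = 0;

  constructor(entries: AccountEntry[]) {
    this.entries = entries;
  }

  // Prefer the hinted account when it is usable; otherwise rotate as usual.
  acquire(preferredEntryId?: string): AccountEntry | undefined {
    if (preferredEntryId !== undefined) {
      const preferred = this.entries.find(
        (e) => e.entryId === preferredEntryId && e.available,
      );
      if (preferred) return preferred;
    }
    // Assumed fallback policy: round-robin scan starting at the cursor.
    for (let i = 0; i < this.entries.length; i++) {
      const index = (this.cursor + i) % this.entries.length;
      const candidate = this.entries[index];
      if (candidate && candidate.available) {
        this.cursor = (index + 1) % this.entries.length;
        return candidate;
      }
    }
    return undefined; // no account is currently available
  }
}
```

Treating the preferred ID as a hint rather than a requirement means a request still goes through when the original account is unavailable, at the cost of an affinity miss for that turn.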

Test plan

1394 tests pass
Session affinity: 10 unit tests (account mapping + conversationId tracking)
Account acquisition: preferredEntryId passthrough
E2E: 4-turn conversation → same account, affinity=hit, cached_tokens > 0

Route subsequent turns of a conversation to the same account that created the initial response. This fixes two issues caused by account rotation breaking conversation chains:

1. previous_response_id becoming invalid across accounts: the backend stores conversation state per-account, so switching accounts meant losing server-side history
2. Prompt cache misses: the cache is per-account on the backend, so rotating accounts forced full context reprocessing every turn

Implementation:
- SessionAffinityMap: responseId → entryId mapping with 4h TTL
- acquire() accepts preferredEntryId hint, falls back to normal rotation
- proxy-handler captures responseId from both streaming and non-streaming paths (onResponseId callback was previously discarded)
- Request logs now show affinity=hit/miss, payload size, and usage stats
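
The responseId → entryId mapping with a 4h TTL could look roughly like the following. This is a sketch under stated assumptions: the method names `record` and `lookup`, the lazy eviction on read, and the injectable clock are illustrative choices, not taken from `session-affinity.ts`.

```typescript
// Sketch of a responseId -> entryId map with expiry; not the PR's actual code.
const FOUR_HOURS_MS = 4 * 60 * 60 * 1000;

interface AffinityRecord {
  entryId: string;
  expiresAt: number; // epoch milliseconds
}

class SessionAffinityMap {
  private readonly records = new Map<string, AffinityRecord>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // `now` is injectable so expiry can be exercised without waiting (an assumption of this sketch).
  constructor(ttlMs: number = FOUR_HOURS_MS, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Remember which account produced a response.
  record(responseId: string, entryId: string): void {
    this.records.set(responseId, { entryId, expiresAt: this.now() + this.ttlMs });
  }

  // Return the account for a previous response, or undefined on a miss or after expiry.
  lookup(responseId: string): string | undefined {
    const found = this.records.get(responseId);
    if (!found) return undefined;
    if (found.expiresAt <= this.now()) {
      this.records.delete(responseId); // lazily evict the stale entry
      return undefined;
    }
    return found.entryId;
  }
}

// Usage: record after a response completes, look up when the next turn arrives.
const affinity = new SessionAffinityMap();
affinity.record("resp_1", "entry_A");
const hint = affinity.lookup("resp_1"); // "entry_A", usable as the preferredEntryId hint
```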

Send prompt_cache_key (per-conversation UUID) in every request to enable backend prompt caching. The conversation ID is inherited across the previous_response_id chain via SessionAffinityMap.

Also:
- Forward service_tier on both WebSocket and HTTP paths (was dropped)
- Send include: ["reasoning.encrypted_content"] when reasoning is active
- Extend SessionAffinityMap with conversationId tracking
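
A rough sketch of how the conversation ID might be inherited along the previous_response_id chain and attached to the outgoing request. The helper names `resolveConversationId` and `withCacheFields`, the plain `Map` standing in for SessionAffinityMap, and the injected `generateId` function are all hypothetical. Only the field names come from this PR.

```typescript
// Subset of request fields named in this PR; the real request type has more.
interface CodexRequestFields {
  previous_response_id?: string;
  service_tier?: string;
  prompt_cache_key?: string;
  include?: string[];
}

// Hypothetical helper: reuse the conversation ID recorded for the previous response,
// or create a new one on the first turn or when the mapping is gone.
function resolveConversationId(
  previousResponseId: string | undefined,
  conversationByResponse: Map<string, string>,
  generateId: () => string,
): string {
  const inherited =
    previousResponseId !== undefined
      ? conversationByResponse.get(previousResponseId)
      : undefined;
  return inherited ?? generateId();
}

// Hypothetical helper: set the cache key, and request encrypted reasoning
// content only when reasoning is active. Other fields pass through unchanged.
function withCacheFields(
  request: CodexRequestFields,
  conversationId: string,
  reasoningActive: boolean,
): CodexRequestFields {
  const out: CodexRequestFields = { ...request, prompt_cache_key: conversationId };
  if (reasoningActive) {
    out.include = ["reasoning.encrypted_content"];
  }
  return out;
}

// Turn 1 has no previous response, so a new ID is generated.
const conversations = new Map<string, string>();
const firstId = resolveConversationId(undefined, conversations, () => "conv-1");
conversations.set("resp_1", firstId); // recorded once turn 1 returns its response ID

// Turn 2 chains from resp_1 and inherits the same ID, so the cache key stays stable.
const secondId = resolveConversationId("resp_1", conversations, () => "conv-2");
console.log(secondId === firstId); // true
```

In practice `generateId` would be a UUID generator such as `randomUUID` from `node:crypto`. Fixed strings are used here only to keep the example deterministic.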

icebear0828 added 2 commits March 27, 2026 02:32

icebear0828 changed the title ~~feat: session affinity for multi-turn conversations~~ feat: session affinity + prompt cache for multi-turn conversations Mar 27, 2026

icebear0828 merged commit e069ef4 into master Mar 27, 2026
1 check passed

icebear0828 deleted the feat/session-affinity branch March 27, 2026 08:39

