feat: session affinity + prompt cache for multi-turn conversations#242

Merged
icebear0828 merged 2 commits into master from feat/session-affinity
Mar 27, 2026
Conversation

Owner

icebear0828 commented Mar 27, 2026

Summary

  • Session affinity: route all turns of a conversation to the same account via responseId → entryId mapping, fixing previous_response_id breaking across account rotation
  • prompt_cache_key: send per-conversation UUID to enable backend prompt caching — Turn 4+ sees 93% cache hit rate (was 0% before)
  • Missing fields: forward service_tier and include on the WebSocket path (both were silently dropped)
  • Request-level monitoring: affinity hit/miss, payload size, usage stats

Verified

| Turn   | in   | out  | cached | cache% |
|--------|------|------|--------|--------|
| Turn 1 | 87   | 3698 | 0      | 0.0%   |
| Turn 2 | 3772 | 157  | 0      | 0.0%   |
| Turn 3 | 3948 | 146  | 0      | 0.0%   |
| Turn 4 | 4111 | 1519 | 3840   | 93.4% ← prompt cache hit |

Changes

| File | Change |
|------|--------|
| session-affinity.ts | New: responseId → (entryId, conversationId) map with 4h TTL |
| proxy-handler.ts | Affinity lookup/record, generate conversationId, set prompt_cache_key + include |
| response-processor.ts | Pass through onResponseId callback (was discarded) |
| account-lifecycle.ts | acquire() supports preferredEntryId hint |
| codex-types.ts | Add prompt_cache_key, include to request type |
| ws-transport.ts | Add service_tier, prompt_cache_key, include to WS message |
| codex-api.ts | Forward new fields on WS; stop stripping service_tier on HTTP |

Test plan

  • 1394 tests pass
  • Session affinity: 10 unit tests (account mapping + conversationId tracking)
  • Account acquisition: preferredEntryId passthrough
  • E2E: 4-turn conversation → same account, affinity=hit, cached_tokens > 0

Route subsequent turns of a conversation to the same account that
created the initial response. This fixes two issues caused by account
rotation breaking conversation chains:

1. previous_response_id becoming invalid across accounts — the backend
   stores conversation state per-account, so switching accounts meant
   losing server-side history
2. Prompt cache misses — cache is per-account on the backend, rotating
   accounts forced full context reprocessing every turn

Implementation:
- SessionAffinityMap: responseId → entryId mapping with 4h TTL
- acquire() accepts preferredEntryId hint, falls back to normal rotation
- proxy-handler captures responseId from both streaming and non-streaming
  paths (onResponseId callback was previously discarded)
- Request logs now show affinity=hit/miss, payload size, and usage stats
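The affinity mechanism above can be sketched as follows. The class and field names (SessionAffinityMap, entryId, conversationId, the 4h TTL) come from this PR; the internals are illustrative, not the actual implementation.

```typescript
// Illustrative sketch of the responseId → account mapping with a 4h TTL.

interface AffinityEntry {
  entryId: string;         // account that produced the response
  conversationId: string;  // stable per-conversation UUID
  expiresAt: number;       // epoch ms; entries expire after TTL_MS
}

const TTL_MS = 4 * 60 * 60 * 1000; // 4 hours, per the PR description

class SessionAffinityMap {
  private map = new Map<string, AffinityEntry>();

  // Called when a response completes: remember which account served it.
  record(responseId: string, entryId: string, conversationId: string): void {
    this.map.set(responseId, {
      entryId,
      conversationId,
      expiresAt: Date.now() + TTL_MS,
    });
  }

  // Called on the next turn with previous_response_id: prefer the same account.
  lookup(responseId: string): AffinityEntry | undefined {
    const entry = this.map.get(responseId);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.map.delete(responseId); // lazily evict expired entries
      return undefined;
    }
    return entry;
  }
}
```

A lookup hit becomes the preferredEntryId hint passed to acquire(); a miss or an unavailable account falls back to normal rotation.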

Send prompt_cache_key (per-conversation UUID) in every request to enable
backend prompt caching. The conversation ID is inherited across the
previous_response_id chain via SessionAffinityMap.

Also:
- Forward service_tier on both WebSocket and HTTP paths (was dropped)
- Send include: ["reasoning.encrypted_content"] when reasoning is active
- Extend SessionAffinityMap with conversationId tracking
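A minimal sketch of how the forwarded fields might be attached to an outbound request. The field names (prompt_cache_key, include, service_tier, previous_response_id) are from this PR; the request type and helper function are assumptions for illustration.

```typescript
import { randomUUID } from "node:crypto";

// Assumed shape of the upstream request; only the fields this PR touches
// are spelled out.
interface CodexRequest {
  model: string;
  input: unknown;
  previous_response_id?: string;
  service_tier?: string;
  prompt_cache_key?: string;
  include?: string[];
}

function withCacheFields(
  req: CodexRequest,
  conversationId: string | undefined,
  reasoningActive: boolean,
): CodexRequest {
  return {
    ...req,
    // Reuse the conversation's UUID so every turn hits the same cache key;
    // mint a fresh UUID for a brand-new conversation.
    prompt_cache_key: conversationId ?? randomUUID(),
    // Only request encrypted reasoning content when reasoning is active.
    ...(reasoningActive ? { include: ["reasoning.encrypted_content"] } : {}),
  };
}
```

Because the conversationId is inherited across the previous_response_id chain via SessionAffinityMap, every turn of a conversation carries the same prompt_cache_key.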
@icebear0828 icebear0828 changed the title feat: session affinity for multi-turn conversations feat: session affinity + prompt cache for multi-turn conversations Mar 27, 2026
@icebear0828 icebear0828 merged commit e069ef4 into master Mar 27, 2026
1 check passed
@icebear0828 icebear0828 deleted the feat/session-affinity branch March 27, 2026 08:39