feat (L3 KVStore): prefetch and backup support by ehuohz · Pull Request #293 · lightseekorg/tokenspeed

ehuohz · 2026-05-28T07:03:32Z

Summary

Prefetch (L3 → Host)

On request submission, query Mooncake for existing KV pages. If hits exceed prefetch_threshold, take an async prefetch path (Submitted → Prefetching → PrefetchDone → Prefilling) instead of direct prefill. Completed pages are inserted into the radix tree's host layer with proper OwnedPages ownership transfer.
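
The path choice described above can be sketched as a small state function. This is only an illustration: the enum, the function name, and the strict `>` comparison are assumptions read off the wording "if hits exceed prefetch_threshold", not the scheduler's actual FSM code.

```python
from enum import Enum, auto


class State(Enum):
    SUBMITTED = auto()
    PREFETCHING = auto()
    PREFETCH_DONE = auto()
    PREFILLING = auto()


def next_state(state: State, l3_hit_pages: int, prefetch_threshold: int) -> State:
    """Return the state a request moves to on its next step (hypothetical helper)."""
    if state is State.SUBMITTED:
        # Take the async prefetch path only when enough pages were found in L3.
        if l3_hit_pages > prefetch_threshold:
            return State.PREFETCHING
        return State.PREFILLING  # direct prefill, no prefetch
    if state is State.PREFETCHING:
        return State.PREFETCH_DONE  # the async L3 -> host copy has completed
    if state is State.PREFETCH_DONE:
        return State.PREFILLING
    return state
```

With this shape, a request below the threshold never enters `PREFETCHING`, so the existing prefill path is unchanged for L3 misses.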

Backup (Host → L3)

On WriteBackDone, emit a fire-and-forget BackUpOperation to persist host pages to Mooncake. Backup metadata is captured at WriteBackOperation creation time while the Draining state's host node-ref is still alive.

Key changes

FSM: Add SchedulePrefillFirstChunkEvent::operator()(PrefetchDone&&) via templated applyFirstChunk() so prefetch-completed requests can enter prefill.
forward.cpp: Attempt schedulePrefetch for Submitted requests before falling through to prefill. Treat PrefetchDone with the same scheduling priority as Submitted.
outside_event_handler.cpp: Transfer host page ownership via OwnedPages into Insert(); uncompleted pages are freed by RAII. Add WriteBackDone hook to emit BackUpOperation.
scheduler.cpp: Capture L3 backup metadata (rolling hashes, host page IDs) in CacheOpSpec during newWriteBackOperation. Drain pending_prefetch_ops_ and pending_backup_ops_ in NextExecutionPlan().
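
As a rough picture of the last item, draining the two pending queues into a plan might look like the following. The Python class and field names are stand-ins for the C++ members `pending_prefetch_ops_` and `pending_backup_ops_`; the real `NextExecutionPlan()` is not shown in this PR page, so treat this as a sketch under those assumptions.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ExecutionPlan:
    ops: list = field(default_factory=list)


class Scheduler:
    def __init__(self):
        # Hypothetical mirrors of pending_prefetch_ops_ / pending_backup_ops_.
        self.pending_prefetch_ops = deque()
        self.pending_backup_ops = deque()

    def next_execution_plan(self) -> ExecutionPlan:
        plan = ExecutionPlan()
        # Move every queued L3 op into this plan; both queues are empty afterwards,
        # so an op is emitted exactly once.
        for queue in (self.pending_prefetch_ops, self.pending_backup_ops):
            while queue:
                plan.ops.append(queue.popleft())
        return plan
```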

Test Plan

Served kimi-k2.5 with Mooncake KVStore enabled in dev container
Sent a long-generation request (long prompt, max_tokens=262144), then sent multiple different requests to fill device/host cache and force eviction of the first request's KV pages to L3
Re-sent the same first request; verified L3 prefetch path activated (KV pages fetched back from Mooncake → host → device) and output matched the original response
Confirmed the L3 fetch via Mooncake batch_get_into logs (439-token prompt: 439/64 rounds down to 6 full pages, and 6 pages * TP4 = 24 items, matching Get Item=24/24 below):

Mooncake log:
 | Requests (Success/Total): PutStart=4/4, PutEnd=4/4, PutRevoke=0/0, Get=4/4, Exist=4/4, Del=0/0, DelAll=0/0, Ping=2228/2228, CopyStart=0/0, CopyEnd=0/0, CopyRevoke=0/0, MoveStart=0/0, MoveEnd=0/0, MoveRevoke=0/0, EvictDiskReplica=0/0 | Batch Requests (Req=Success/PartialSuccess/Total, Item=Success/Total): PutStart:(Req=14/0/51, Item=1369/5238), PutEnd:(Req=14/0/14, Item=1369/1369), PutRevoke:(Req=0/0/0, Item=0/0), Get:(Req=4/0/4, Item=24/24), ExistKey:(Req=64/0/64, Item=5524/5524), QueryIp:(Req=0/0/0, Item=0/0), Clear:(Req=0/0/0, Item=0/0), CreateMoveTask:(Req=0/0), CreateCopyTask:(Req=0/0), QueryTask=(Req=0/0), FetchTasks=(Req=2228/2228), MarkTaskToComplete= (Req=0/0),  | Eviction: Success/Attempts=0/0, keys=0, size=0 B | Discard: Released/Total=0/0, StagingSize=0 B | Snapshots: Success=0, Fail=0}, ha={HA Metrics Summary: last_seq=0, applied_seq=0, lag=0, pending=0, mutation_queue=0, batch_commits=0, sync_commits=0, skipped=0, checksum_fail=0, etcd_fail=0, watch_disconn=0, state=0}

ts log:
[ts] I0528 04:36:33.135007 2732203 real_client.cpp:3556] Time taken for batch_get_into: 9814us, read store: 0us, with memory key count: 6, offload key count: 0
[ts] I0528 04:36:33.135859 2732205 real_client.cpp:3556] Time taken for batch_get_into: 10103us, read store: 0us, with memory key count: 6, offload key count: 0
[ts] I0528 04:36:33.136945 2732206 real_client.cpp:3556] Time taken for batch_get_into: 11312us, read store: 0us, with memory key count: 6, offload key count: 0
[ts] I0528 04:36:33.147481 2732204 real_client.cpp:3556] Time taken for batch_get_into: 22004us, read store: 0us, with memory key count: 6, offload key count: 0
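
The counts in the logs above can be reproduced with a short calculation. It assumes a page size of 64 tokens (taken from the "439/64" in the test plan) and one key per full page per tensor-parallel rank, which is consistent with "memory key count: 6" appearing in each of the four `batch_get_into` lines; the function name is made up for illustration.

```python
def expected_get_items(prompt_tokens: int, page_size: int, tp_size: int) -> int:
    """Number of L3 items fetched if only full pages are stored, once per TP rank."""
    full_pages = prompt_tokens // page_size  # the trailing partial page is not counted
    return full_pages * tp_size


print(expected_get_items(439, 64, 4))  # 24
```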

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 487eba9340

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c707d29204

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e732341112

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 074c709457

Copilot

Pull request overview

This PR wires Mooncake L3 KV-store integration into the scheduler with two flows: (1) an asynchronous prefetch from L3→Host on request submission with proper page-ownership transfer into the radix tree, and (2) a fire-and-forget backup from Host→L3 triggered on WriteBackDone, with backup metadata captured at WriteBackOperation creation time while the Draining state's host node-ref is still alive. FSM support is extended so PrefetchDone can transition into prefill via a templated applyFirstChunk.

Changes:

Add async L3 prefetch path in newForwardOperation (Submitted → Prefetching → PrefetchDone → Prefilling), with host node locking before eviction and ownership transfer of host pages into Insert<Host>().
Add BackUpDone event + BackUpOperation emission on WriteBackDone, with CacheOpSpec carrying captured host page IDs and rolling hashes.
Drain pending_prefetch_ops_ / pending_backup_ops_ in NextExecutionPlan(); extend FSM to allow SchedulePrefillFirstChunkEvent from both Submitted and PrefetchDone.

Reviewed changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| tokenspeed-scheduler/csrc/scheduler/scheduler.h | Declares `BackUpDone` handler and pending L3 op queues. |
| tokenspeed-scheduler/csrc/scheduler/scheduler.cpp | Captures backup metadata in WriteBackOp; drains pending L3 ops into ExecutionPlan. |
| tokenspeed-scheduler/csrc/scheduler/request.h / request.cpp | Adds `TakeHostPages()` and `GetHostNode<Draining>()` accessors. |
| tokenspeed-scheduler/csrc/scheduler/outside_events/cache.h | Adds `BackUpDone` event and includes it in `CacheEvent` variant. |
| tokenspeed-scheduler/csrc/scheduler/outside_event_handler.cpp | Refactors PrefetchDone to transfer OwnedPages; emits backup op on WriteBackDone; adds (reserved) BackUpDone handler. |
| tokenspeed-scheduler/csrc/scheduler/operations/forward.cpp | Adds prefetch attempt for Submitted; treats PrefetchDone same as Submitted for prefill priority. |
| tokenspeed-scheduler/csrc/scheduler/operations/cache.cpp | Locks matched host node before eviction in `schedulePrefetch`. |
| tokenspeed-scheduler/csrc/resource/types.h | Adds backup fields to `CacheOpSpec`. |
| tokenspeed-scheduler/csrc/fsm/forward_states.h | Exposes `HostNode()` accessor on `Draining`. |
| tokenspeed-scheduler/csrc/fsm/forward_events.h / .cpp | Templated `applyFirstChunk` enables transition from PrefetchDone. |
| tokenspeed-scheduler/csrc/fsm/cache_events.h / .cpp | `SchedulePrefetchEvent` now owns a `HostNodeRef` (RAII lock). |
| tokenspeed-scheduler/bindings/python_module.cpp | Binds `BackUpDoneEvent`. |
| python/tokenspeed/runtime/engine/scheduler_utils.py | Registers BackUpDoneEvent and round-trips `completed_pages` in payloads. |

XucSh · 2026-06-08T09:19:13Z

Partial L2+L3 hits insert prefetched suffix pages under the wrong prefix. calc_l3_query_hashes() uses apply_match=True, so the C++ scheduler skips pages already matched in host cache and only returns hashes for token_pages[host_matched:]. However, the PrefetchDone handler still builds insert_token_pages from token_pages.begin(), without carrying the host-match offset. If host has the first h pages and L3 has the next n, the pages fetched from Mooncake are suffix pages but get inserted as the first n prompt pages, so later loadback can use corrupt KV. Please carry the host matched page offset through Prefetching/PrefetchDone and insert starting at that offset.
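
The requested fix can be illustrated with a minimal sketch. It assumes `token_pages` is the full per-page prompt list, `host_matched` is the number of leading pages already in the host cache, and `fetched_pages` are the pages returned from L3 in order; the helper name is hypothetical and this is not the PR's `PrefetchDone` handler.

```python
def pages_to_insert(token_pages, host_matched, fetched_pages):
    """Pair each fetched L3 page with the prompt page it actually belongs to."""
    start = host_matched  # skip the prefix that the host cache already holds
    end = start + len(fetched_pages)
    if end > len(token_pages):
        raise ValueError("more pages fetched than the prompt has after the host match")
    # Starting at token_pages[0] instead of token_pages[start] is the bug described
    # above: suffix KV would be stored under prefix tokens.
    return list(zip(token_pages[start:end], fetched_pages))
```

For example, with a host match of 2 pages and 2 pages fetched from L3, the fetched pages pair with prompt pages 2 and 3, not 0 and 1.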

XucSh · 2026-06-08T09:19:43Z

@@ -40,6 +40,7 @@
 _CACHE_EVENT_TYPES = {


Backup source host pages are not pinned until backup completes. WriteBackDone queues a BackUpOp with only raw host page ids, then immediately applies WriteBackDoneEvent, which releases the WritingBack node refs. The Python backup reads those host pages asynchronously later, so host eviction can reuse them before _run_backup() calls batch_set_v1(), storing unrelated KV under the captured hashes.
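
One way to close this window is to pin the source pages when the backup is queued and release them only when the backup completes. The sketch below uses a hypothetical pin table and hypothetical handler names; it does not reflect how the scheduler's node refs or the Python `_run_backup()` are actually implemented.

```python
from collections import Counter


class HostPagePins:
    """Hypothetical pin table: a page with a nonzero count must not be evicted."""

    def __init__(self):
        self._pins = Counter()

    def pin(self, page_ids):
        for page_id in page_ids:
            self._pins[page_id] += 1

    def unpin(self, page_ids):
        for page_id in page_ids:
            if self._pins[page_id] <= 0:
                raise ValueError(f"page {page_id} is not pinned")
            self._pins[page_id] -= 1

    def is_evictable(self, page_id) -> bool:
        return self._pins[page_id] == 0


def on_write_back_done(pins, backup_queue, host_page_ids, hashes):
    # Pin before the write-back refs are released, so there is no unprotected gap.
    pins.pin(host_page_ids)
    backup_queue.append((host_page_ids, hashes))


def on_backup_done(pins, host_page_ids):
    # The asynchronous copy to L3 has finished; the pages may be reused again.
    pins.unpin(host_page_ids)
```

Using a count rather than a boolean lets the same page be pinned by more than one in-flight operation.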

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f8e5fe234e

chatgpt-codex-connector · 2026-06-12T07:59:28Z

    std::vector<std::string> hashes(storage.rolling_hashes.begin(),
                                    storage.rolling_hashes.begin() + num_pages_to_fetch);


Align L3 hashes with the match used for prefetch

When a Submitted request waits while the L2 host match changes after admission (for example another request writes back or evicts part of the prefix), the stored rolling_hashes still start at the offset used by calc_l3_query_hashes(..., apply_match=True) at admission, not necessarily at the current match.host.DepthInPage() saved above. This slice always starts at begin(), and PrefetchDone inserts those pages at the current prefetch_start_page, so a changed host prefix downloads/stores the wrong page contents under the wrong token pages. Recompute the hashes against the current match or carry the admission-time start offset and adjust the slice before scheduling the prefetch.
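
The second option in the comment above (carry the admission-time start offset and adjust the slice) might look like this in outline. The parameter names `admission_start` and `current_start` are assumptions introduced for the sketch; only `rolling_hashes` and `num_pages_to_fetch` appear in the quoted code.

```python
def hashes_for_prefetch(rolling_hashes, admission_start, current_start, num_pages_to_fetch):
    """Select hashes for pages current_start.. when rolling_hashes[i] is the hash
    of prompt page admission_start + i. Returns None if a recompute is needed."""
    if current_start < admission_start:
        # The host prefix shrank: hashes for the newly missing pages were never stored.
        return None
    skip = current_start - admission_start  # pages the host gained since admission
    selected = rolling_hashes[skip:skip + num_pages_to_fetch]
    if len(selected) < num_pages_to_fetch:
        return None  # not enough stored hashes for the requested window
    return selected
```

Returning `None` instead of a shifted slice forces the caller to recompute against the current match rather than silently fetching misaligned pages.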

github-actions · 2026-06-27T00:27:54Z

This PR has been inactive for 14 days and is marked as stale. It will be closed in 3 days if there is no further activity.

ehuohz requested a review from a team as a code owner May 28, 2026 07:03

XucSh self-assigned this May 28, 2026

chatgpt-codex-connector Bot reviewed May 28, 2026

Comment thread tokenspeed-scheduler/csrc/scheduler/operations/forward.cpp

Comment thread tokenspeed-scheduler/csrc/scheduler/scheduler.cpp Outdated

chatgpt-codex-connector Bot reviewed May 28, 2026

Comment thread tokenspeed-scheduler/csrc/scheduler/outside_event_handler.cpp

ehuohz force-pushed the main branch 2 times, most recently from b6586ec to e732341 May 28, 2026 08:04

chatgpt-codex-connector Bot reviewed May 28, 2026

Comment thread python/tokenspeed/runtime/engine/event_loop.py

ehuohz force-pushed the main branch from e732341 to 074c709 May 28, 2026 08:26

chatgpt-codex-connector Bot reviewed May 28, 2026

Comment thread tokenspeed-scheduler/csrc/scheduler/operations/forward.cpp

XucSh requested a review from Copilot June 2, 2026 07:24

Copilot started reviewing on behalf of XucSh June 2, 2026 07:24

Copilot AI reviewed Jun 2, 2026

ehuohz changed the title from "(feat) L3 KVStore: prefetch and backup support" to "feat (L3 KVStore): prefetch and backup support" Jun 2, 2026

XucSh reviewed Jun 8, 2026

chatgpt-codex-connector Bot reviewed Jun 12, 2026

ehuohz and others added 4 commits June 12, 2026 16:01

add L3 Mooncake kvstore, support prefetch&backup

d8f5e4a

Signed-off-by: He Zhou <zhouhe2025@gmail.com> Signed-off-by: zhouhe2025 <670085873@qq.com>

fix(l3-prefetch): Insert<Host> page ownership, and PrefetchDone FSM transition

ff6160c

Signed-off-by: He Zhou <zhouhe2025@gmail.com> Signed-off-by: zhouhe2025 <670085873@qq.com>

bugfix: host prefix protect & backup hash leaf-to-root

58925a3

Signed-off-by: He Zhou <zhouhe2025@gmail.com> Signed-off-by: zhouhe2025 <670085873@qq.com>

preserve L3 prefetch offsets and backup pins

acd4a0f

Signed-off-by: zhouhe2025 <670085873@qq.com>

ehuohz force-pushed the main branch from f8e5fe2 to acd4a0f June 12, 2026 08:01

github-actions Bot added the inactive label Jun 27, 2026
