feat (L3 KVStore): prefetch and backup support #293
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 487eba9340
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c707d29204
b6586ec to e732341
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e732341112
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 074c709457
Pull request overview
This PR wires Mooncake L3 KV-store integration into the scheduler with two flows: (1) an asynchronous prefetch from L3→Host on request submission with proper page-ownership transfer into the radix tree, and (2) a fire-and-forget backup from Host→L3 triggered on WriteBackDone, with backup metadata captured at WriteBackOperation creation time while the Draining state's host node-ref is still alive. FSM support is extended so PrefetchDone can transition into prefill via a templated applyFirstChunk.
Changes:
- Add async L3 prefetch path in newForwardOperation (Submitted → Prefetching → PrefetchDone → Prefilling), with host node locking before eviction and ownership transfer of host pages into Insert<Host>().
- Add BackUpDone event + BackUpOperation emission on WriteBackDone, with CacheOpSpec carrying captured host page IDs and rolling hashes.
- Drain pending_prefetch_ops_ / pending_backup_ops_ in NextExecutionPlan(); extend FSM to allow SchedulePrefillFirstChunkEvent from both Submitted and PrefetchDone.
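The state flow in the list above can be modelled as a small transition table. This is only an illustrative Python sketch, not the C++ FSM under csrc/fsm: the `State` enum, the `TRANSITIONS` table, the `apply` helper, and the event name `PrefetchDoneEvent` are assumptions made for the example, while the state names and the other two event names follow the overview.

```python
from enum import Enum, auto


class State(Enum):
    SUBMITTED = auto()
    PREFETCHING = auto()
    PREFETCH_DONE = auto()
    PREFILLING = auto()


# (current state, event) -> next state.
# Hypothetical simplification: the real FSM carries per-state data
# (node refs, owned pages) that is omitted here.
TRANSITIONS = {
    (State.SUBMITTED, "SchedulePrefetchEvent"): State.PREFETCHING,
    (State.PREFETCHING, "PrefetchDoneEvent"): State.PREFETCH_DONE,
    # The first prefill chunk may start from either state.
    (State.SUBMITTED, "SchedulePrefillFirstChunkEvent"): State.PREFILLING,
    (State.PREFETCH_DONE, "SchedulePrefillFirstChunkEvent"): State.PREFILLING,
}


def apply(state: State, event: str) -> State:
    """Return the next state, or raise ValueError for an illegal transition."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {state.name} on {event}") from None
```

Keeping both prefill entries in one table makes it visible that a request still in Prefetching cannot be prefilled until the prefetch completes.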
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| tokenspeed-scheduler/csrc/scheduler/scheduler.h | Declares BackUpDone handler and pending L3 op queues. |
| tokenspeed-scheduler/csrc/scheduler/scheduler.cpp | Captures backup metadata in WriteBackOp; drains pending L3 ops into ExecutionPlan. |
| tokenspeed-scheduler/csrc/scheduler/request.h / request.cpp | Adds TakeHostPages() and GetHostNode<Draining>() accessors. |
| tokenspeed-scheduler/csrc/scheduler/outside_events/cache.h | Adds BackUpDone event and includes it in CacheEvent variant. |
| tokenspeed-scheduler/csrc/scheduler/outside_event_handler.cpp | Refactors PrefetchDone to transfer OwnedPages; emits backup op on WriteBackDone; adds (reserved) BackUpDone handler. |
| tokenspeed-scheduler/csrc/scheduler/operations/forward.cpp | Adds prefetch attempt for Submitted; treats PrefetchDone same as Submitted for prefill priority. |
| tokenspeed-scheduler/csrc/scheduler/operations/cache.cpp | Locks matched host node before eviction in schedulePrefetch. |
| tokenspeed-scheduler/csrc/resource/types.h | Adds backup fields to CacheOpSpec. |
| tokenspeed-scheduler/csrc/fsm/forward_states.h | Exposes HostNode() accessor on Draining. |
| tokenspeed-scheduler/csrc/fsm/forward_events.h / .cpp | Templated applyFirstChunk enables transition from PrefetchDone. |
| tokenspeed-scheduler/csrc/fsm/cache_events.h / .cpp | SchedulePrefetchEvent now owns a HostNodeRef (RAII lock). |
| tokenspeed-scheduler/bindings/python_module.cpp | Binds BackUpDoneEvent. |
| python/tokenspeed/runtime/engine/scheduler_utils.py | Registers BackUpDoneEvent and round-trips completed_pages in payloads. |
Partial L2+L3 hits insert prefetched suffix pages under the wrong prefix. calc_l3_query_hashes() uses apply_match=True, so the C++ scheduler skips pages already matched in host cache and only returns hashes for token_pages[host_matched:]. However, the PrefetchDone handler still builds insert_token_pages from token_pages.begin(), without carrying the host-match offset. If host has the first h pages and L3 has the next n, the pages fetched from Mooncake are suffix pages but get inserted as the first n prompt pages, so later loadback can use corrupt KV. Please carry the host matched page offset through Prefetching/PrefetchDone and insert starting at that offset.
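The offset handling requested in this comment might look like the following Python sketch. It models token pages as list items rather than the scheduler's C++ types; `pages_to_insert`, `host_matched`, and `num_fetched` are hypothetical names chosen for the illustration.

```python
def pages_to_insert(token_pages, host_matched, num_fetched):
    """Select the token pages that the prefetched KV pages belong to.

    token_pages:  all prompt pages, in order
    host_matched: number of leading pages already present in host cache (h)
    num_fetched:  number of pages fetched from L3 (n)
    Returns (start_page, pages), where pages covers [h, h + n).
    """
    if host_matched < 0 or num_fetched < 0:
        raise ValueError("offsets must be non-negative")
    end = host_matched + num_fetched
    if end > len(token_pages):
        raise ValueError("fetched range exceeds the prompt")
    # The behaviour described in the comment corresponds to
    # token_pages[:num_fetched], which ignores host_matched.
    return host_matched, token_pages[host_matched:end]
```

With six prompt pages, two matched in host and three fetched from L3, the sketch selects pages 2 to 4, whereas slicing from the beginning would select pages 0 to 2.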
@@ -40,6 +40,7 @@
_CACHE_EVENT_TYPES = {
Backup source host pages are not pinned until backup completes. WriteBackDone queues a BackUpOp with only raw host page ids, then immediately applies WriteBackDoneEvent, which releases the WritingBack node refs. The Python backup reads those host pages asynchronously later, so host eviction can reuse them before _run_backup() calls batch_set_v1(), storing unrelated KV under the captured hashes.
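One way to address this is to keep the source pages pinned until the backup reports completion. The Python sketch below shows the idea with a plain reference-count table. `HostPagePins` and its methods are hypothetical, and the scheduler itself would more likely hold a host node ref in the pending backup op, so treat this as a model of the lifetime rule rather than the actual fix.

```python
class HostPagePins:
    """Hypothetical pin table: a page is evictable only while unpinned."""

    def __init__(self):
        self._pins = {}  # page id -> number of outstanding pins

    def pin(self, page_ids):
        """Pin pages when the backup op is queued (on WriteBackDone)."""
        for pid in page_ids:
            self._pins[pid] = self._pins.get(pid, 0) + 1

    def unpin(self, page_ids):
        """Release pins when the backup finishes (for example on BackUpDone)."""
        for pid in page_ids:
            remaining = self._pins[pid] - 1
            if remaining:
                self._pins[pid] = remaining
            else:
                del self._pins[pid]

    def evictable(self, page_id):
        return page_id not in self._pins
```

In this model the pin is taken before the WritingBack node refs are released, and the reserved BackUpDone handler is a natural place to call `unpin`.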
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f8e5fe234e
std::vector<std::string> hashes(storage.rolling_hashes.begin(),
                                storage.rolling_hashes.begin() + num_pages_to_fetch);
Align L3 hashes with the match used for prefetch
When a Submitted request waits while the L2 host match changes after admission (for example another request writes back or evicts part of the prefix), the stored rolling_hashes still start at the offset used by calc_l3_query_hashes(..., apply_match=True) at admission, not necessarily at the current match.host.DepthInPage() saved above. This slice always starts at begin(), and PrefetchDone inserts those pages at the current prefetch_start_page, so a changed host prefix downloads/stores the wrong page contents under the wrong token pages. Recompute the hashes against the current match or carry the admission-time start offset and adjust the slice before scheduling the prefetch.
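The second option in this comment, carrying the admission-time start offset and adjusting the slice, can be sketched in Python as follows. `hashes_for_prefetch`, `admission_start`, and `current_start` are hypothetical names; the sketch assumes `rolling_hashes[i]` is the hash of page `admission_start + i`.

```python
def hashes_for_prefetch(rolling_hashes, admission_start, current_start, num_pages):
    """Slice the stored hashes so they line up with the current host match.

    Returns the hashes for pages [current_start, current_start + num_pages),
    or None when the stored hashes cannot cover that range and must be
    recomputed against the current match.
    """
    skip = current_start - admission_start
    if skip < 0:
        # The host prefix shrank after admission: pages before
        # admission_start have no stored hash.
        return None
    # If the host prefix grew, drop the hashes of pages that host now holds.
    return rolling_hashes[skip:skip + num_pages]
```

When the host match is unchanged, `skip` is zero and the slice matches the current code. The two other cases are the ones the comment flags.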
Signed-off-by: He Zhou <zhouhe2025@gmail.com> Signed-off-by: zhouhe2025 <670085873@qq.com>
…ransition Signed-off-by: He Zhou <zhouhe2025@gmail.com> Signed-off-by: zhouhe2025 <670085873@qq.com>
Signed-off-by: He Zhou <zhouhe2025@gmail.com> Signed-off-by: zhouhe2025 <670085873@qq.com>
Signed-off-by: zhouhe2025 <670085873@qq.com>
This PR has been inactive for 14 days and is marked as stale. It will be closed in 3 days if there is no further activity.
Summary
Prefetch (L3 → Host)
On request submission, query Mooncake for existing KV pages. If hits exceed prefetch_threshold, take an async prefetch path (Submitted → Prefetching → PrefetchDone → Prefilling) instead of direct prefill. Completed pages are inserted into the radix tree's host layer with proper OwnedPages ownership transfer.
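The admission decision described here can be sketched in Python. The sketch assumes the L3 query yields one boolean per queried page and that only the leading run of hits is usable, since a KV page depends on the pages before it. That contiguity rule, and the names `leading_hits` and `next_state`, are assumptions for illustration; `prefetch_threshold` is the setting named above.

```python
def leading_hits(exists):
    """Count consecutive hits from the start of the query result."""
    count = 0
    for hit in exists:
        if not hit:
            break
        count += 1
    return count


def next_state(exists, prefetch_threshold):
    """Pick the path for a Submitted request.

    Returns ("Prefetching", pages_to_fetch) when the usable hits exceed
    the threshold, otherwise ("Prefilling", 0) for the direct path.
    """
    hits = leading_hits(exists)
    if hits > prefetch_threshold:
        return "Prefetching", hits
    return "Prefilling", 0
```

The comparison is strict, matching "exceed" in the description, so a request with exactly `prefetch_threshold` hits goes straight to prefill.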
Backup (Host → L3)
On WriteBackDone, emit a fire-and-forget BackUpOperation to persist host pages to Mooncake. Backup metadata is captured at WriteBackOperation creation time while the Draining state's host node-ref is still alive.
Key changes
Test Plan
batch_get_into logs (439-token prompt; 439/64 rounds down to 6 full pages; 6 × TP4 = 24):