[NIXL] use Host buffer to support TP_ratio > 1 for XPU #27140

xuechendi · 2025-10-18T04:37:31Z

Purpose

Support the XPU Decode/Prefill TP_ratio > 1 scenario by using host_buffer.

Solution:

KV_CONFIG='{"kv_connector":"NixlConnector","kv_role":"kv_both","kv_buffer_device":"cpu","enable_permute_local_kv":"True"}'
If enable_permute_local_kv is enabled, host_buffer is initialized with the HND kv_layout, and a permuting copy is always used for both device-to-host and host-to-device data transfers.
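Hand-quoting this JSON inside a shell string is easy to get wrong. A minimal sketch that builds the same KV_CONFIG string with Python's json module; the keys and values are copied from the config above. Passing the result to `vllm serve --kv-transfer-config` is the usual route, but that launch step is an assumption and is not part of this PR.

```python
import json

# Same settings as the KV_CONFIG shown above; values kept as in the PR
# (enable_permute_local_kv is passed as the string "True").
kv_config = {
    "kv_connector": "NixlConnector",
    "kv_role": "kv_both",
    "kv_buffer_device": "cpu",
    "enable_permute_local_kv": "True",
}

# Compact separators reproduce the original one-line form without spaces.
KV_CONFIG = json.dumps(kv_config, separators=(",", ":"))

if __name__ == "__main__":
    print(f"KV_CONFIG='{KV_CONFIG}'")
```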

Since a device/host memory copy is needed anyway, folding the permutation into that copy is not expected to add noticeable performance overhead.
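To illustrate why the permute can ride along with the copy, here is a pure-Python sketch (no torch) of one KV block stored flat in NHD order `(block_size, num_heads, head_dim)` being copied into an HND host buffer `(num_heads, block_size, head_dim)` and back. Each element is still read and written exactly once; only the destination index changes. The function names are hypothetical and are not vLLM APIs.

```python
# Hypothetical sketch: NHD <-> HND permutation fused into a block copy.


def _nhd_index(n: int, h: int, d: int, num_heads: int, head_dim: int) -> int:
    # Flat offset of (token n, head h, dim d) in NHD order.
    return (n * num_heads + h) * head_dim + d


def _hnd_index(n: int, h: int, d: int, block_size: int, head_dim: int) -> int:
    # Flat offset of (token n, head h, dim d) in HND order.
    return (h * block_size + n) * head_dim + d


def device_to_host_permute(device_block, block_size, num_heads, head_dim):
    """Copy an NHD device block into a new HND host buffer."""
    host_block = [0] * len(device_block)
    for n in range(block_size):
        for h in range(num_heads):
            for d in range(head_dim):
                host_block[_hnd_index(n, h, d, block_size, head_dim)] = (
                    device_block[_nhd_index(n, h, d, num_heads, head_dim)]
                )
    return host_block


def host_to_device_permute(host_block, block_size, num_heads, head_dim):
    """Copy an HND host block back into a new NHD device buffer."""
    device_block = [0] * len(host_block)
    for n in range(block_size):
        for h in range(num_heads):
            for d in range(head_dim):
                device_block[_nhd_index(n, h, d, num_heads, head_dim)] = (
                    host_block[_hnd_index(n, h, d, block_size, head_dim)]
                )
    return device_block


if __name__ == "__main__":
    block = list(range(6))  # block_size=2, num_heads=3, head_dim=1
    host = device_to_host_permute(block, 2, 3, 1)
    print(host)  # [0, 3, 1, 4, 2, 5]
    assert host_to_device_permute(host, 2, 3, 1) == block
```

On real tensors the same layout change would typically be `t.permute(1, 0, 2)` on a `(block_size, num_heads, head_dim)` tensor, copied into the host buffer with `copy_`.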

General code changes:

Introduce self.host_buffer_kv_cache_layout
Remove the XPU tp_ratio assert
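A hedged illustration of why an HND host layout makes TP_ratio > 1 workable: when decode TP is tp_ratio times prefill TP, each decode rank reads only a slice of the heads held by a prefill rank. In HND the heads are the outer dimension, so that slice is one contiguous range per block; in NHD it splits into one strided chunk per token. The helper names and the `decode_rank % tp_ratio` head mapping are assumptions for illustration, not the connector's actual code.

```python
from typing import List, Tuple


def head_slice(decode_rank: int, num_heads: int, tp_ratio: int) -> Tuple[int, int]:
    # Assumed mapping: each decode rank takes a contiguous group of
    # num_heads // tp_ratio heads from its prefill rank.
    heads_per_rank = num_heads // tp_ratio
    start = (decode_rank % tp_ratio) * heads_per_rank
    return start, start + heads_per_rank


def read_ranges(layout: str, decode_rank: int, block_size: int,
                num_heads: int, head_dim: int,
                tp_ratio: int) -> List[Tuple[int, int]]:
    """Return (offset, length) element ranges to read from one block."""
    h0, h1 = head_slice(decode_rank, num_heads, tp_ratio)
    if layout == "HND":
        # Heads outermost: the head slice is a single contiguous range.
        return [(h0 * block_size * head_dim, (h1 - h0) * block_size * head_dim)]
    if layout == "NHD":
        # Tokens outermost: one strided chunk per token in the block.
        return [((n * num_heads + h0) * head_dim, (h1 - h0) * head_dim)
                for n in range(block_size)]
    raise ValueError(f"unknown layout: {layout}")
```

With block_size=4, num_heads=8, head_dim=2 and tp_ratio=2, decode rank 1 needs one transfer from an HND block but four separate transfers from an NHD block.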

Test Plan

Test Result

Essential Elements of an Effective PR Description Checklist

The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
The test plan, such as providing test command.
The test results, such as pasting the results comparison before and after, or e2e results
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
(Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: Chendi Xue <[email protected]>

xuechendi · 2025-10-18T04:37:52Z

@jikunshang @zhenwei-intel, could you take a look at this PR?

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.


chatgpt-codex-connector · 2025-10-18T04:41:19Z

vllm/platforms/xpu.py

        # check and update parallel config
        parallel_config = vllm_config.parallel_config
        parallel_config.worker_cls = "vllm.v1.worker.xpu_worker.XPUWorker"
+        vllm_config.kv_transfer_config.enable_permute_local_kv = True



Guard kv_transfer_config before forcing permute flag

The new line unconditionally sets vllm_config.kv_transfer_config.enable_permute_local_kv = True. On XPU, check_and_update_config runs for every job, including those that are not using KV transfer at all. Because VllmConfig.kv_transfer_config defaults to None, this call will raise AttributeError: 'NoneType' object has no attribute 'enable_permute_local_kv' at startup, preventing any XPU deployment without explicit KV transfer configuration from launching. A simple null check or default KVTransferConfig initialization is required before accessing this attribute.
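A minimal sketch of the null check suggested above. `KVTransferConfig` and `VllmConfig` here are simplified stand-ins written for this example, not the real vLLM classes; only the guard pattern is the point.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KVTransferConfig:
    # Simplified stand-in with only the fields this sketch needs.
    kv_connector: Optional[str] = None
    enable_permute_local_kv: bool = False


@dataclass
class VllmConfig:
    # Mirrors the default described in the review: None unless KV transfer is configured.
    kv_transfer_config: Optional[KVTransferConfig] = None


def check_and_update_config(vllm_config: VllmConfig) -> None:
    # Only set the permute flag when KV transfer is actually configured,
    # so ordinary XPU jobs (kv_transfer_config is None) start normally.
    kv_cfg = vllm_config.kv_transfer_config
    if kv_cfg is not None:
        kv_cfg.enable_permute_local_kv = True
```

A stricter variant could also check `kv_cfg.kv_connector == "NixlConnector"` before setting the flag, which would address the scoping concern in the Gemini review as well.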


gemini-code-assist

Code Review

This pull request introduces support for tensor parallelism ratios greater than 1 on XPU devices by leveraging a host buffer for KV cache transfers. The key changes involve adding logic to handle different KV cache layouts (NHD on device, HND on host) and performing permutations during data copies. While the overall approach is sound, there's a critical issue in vllm/platforms/xpu.py where enable_permute_local_kv is unconditionally set to True, which could lead to incorrect behavior for non-XPU platforms or scenarios where this feature is not desired. This should be made conditional to the XPU platform.

gemini-code-assist · 2025-10-18T04:45:44Z

vllm/platforms/xpu.py

        # check and update parallel config
        parallel_config = vllm_config.parallel_config
        parallel_config.worker_cls = "vllm.v1.worker.xpu_worker.XPUWorker"
+        vllm_config.kv_transfer_config.enable_permute_local_kv = True


Unconditionally setting enable_permute_local_kv = True in check_and_update_config could have unintended side effects for platforms other than XPU or when this specific permutation logic is not required. This setting should be applied only when the XPU platform is active and the conditions for this permutation are met. It's better to handle this within the XPU-specific logic rather than applying it globally to kv_transfer_config.

zhenwei-intel · 2025-10-18T10:36:41Z

vllm/distributed/kv_transfer/kv_connector/v1/nixl_connector.py

+                    )
+                    # Since NHD will not support Decode/Prefill TP_ratio > 1,
+                    # we can leverage host_buffer for permute
+                    self.host_buffer_kv_cache_layout = "HND"


Should we set it to NHD for homogeneous TP (tp_ratio = 1)?
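A hypothetical sketch of the suggestion: keep the host buffer in the device layout when TP is homogeneous, and switch to HND only when tp_ratio > 1. It assumes tp_ratio is already known when the host buffer is allocated, which may not hold in the real connector if the ratio is only learned during the handshake with the remote engine. The function name is illustrative.

```python
def select_host_buffer_layout(device_layout: str, tp_ratio: int,
                              enable_permute_local_kv: bool) -> str:
    """Pick the host-buffer KV layout (hypothetical helper)."""
    if tp_ratio < 1:
        raise ValueError("tp_ratio must be >= 1")
    if enable_permute_local_kv and tp_ratio > 1:
        # Heterogeneous TP: HND keeps each rank's head slice contiguous.
        return "HND"
    # Homogeneous TP (or permute disabled): no permutation needed.
    return device_layout
```

The trade-off is that skipping the permute for tp_ratio = 1 saves a strided copy, while both prefill and decode sides must still agree on the host layout they exchange.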

zhenwei-intel · 2025-10-18T10:39:53Z

Nice work.
LGTM. Just one comment:)

use Host buffer to support TP_ratio > 1 for XPU

9d521d3

Signed-off-by: Chendi Xue <[email protected]>

xuechendi requested review from ApostaC, NickLucche and jikunshang as code owners October 18, 2025 04:37

mergify bot added the kv-connector label Oct 18, 2025

chatgpt-codex-connector bot reviewed Oct 18, 2025


gemini-code-assist bot reviewed Oct 18, 2025


zhenwei-intel reviewed Oct 18, 2025
