Fix EP8 DP/TP RSAG init and empty LM head #416

Open
yubofredwang wants to merge 1 commit into main from ywang/fix-ep-dp-tp

@yubofredwang yubofredwang commented Jun 11, 2026

Contributor

Summary

This PR narrows the EP8/DP2/TP4 fix to two concrete runtime issues:

  • Precreate the dense TP TritonRSAG state during distributed initialization for attention-DP + dense-supergroup topologies. This avoids divergent first-use ordering of RSAG state creation across ranks before the hidden/logits boundary is reached.
  • Treat an empty local hidden batch as valid at the LM-head boundary. When DP attention leaves a rank with zero local tokens, the logits processor now returns an empty logits tensor for local-logits/single-TP paths instead of launching the Kimi fused LM-head kernel.
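
The two fixes above can be illustrated with a small pure-Python toy model. This is a sketch under stated assumptions, not the code in this PR: `Rank`, `init_rank`, `forward`, `creation_orders`, `fake_lm_head_kernel`, and `compute_logits` are hypothetical names, the state labels are placeholders, and the way the ranks diverge (a busy rank touching RSAG state before an idle rank does) is an assumed scenario chosen only to show why create-on-first-use can produce different creation orders. Real code would operate on tensors and return an empty logits tensor rather than an empty list.

```python
from dataclasses import dataclass, field


@dataclass
class Rank:
    """Toy stand-in for one process; logs the order in which comm state is built."""
    rank_id: int
    creation_order: list = field(default_factory=list)

    def ensure_state(self, name: str) -> None:
        # Create-on-first-use: only the first request for a name is recorded.
        if name not in self.creation_order:
            self.creation_order.append(name)


def init_rank(rank: Rank, precreate_rsag: bool) -> None:
    if precreate_rsag:
        # Eager path: build the dense-TP RSAG state during init on every rank.
        rank.ensure_state("dense_tp_rsag")


def forward(rank: Rank, local_tokens: int) -> None:
    # ASSUMPTION (illustrative only): a busy rank touches the RSAG state first,
    # while a rank with zero local tokens reaches the boundary state first.
    if local_tokens > 0:
        rank.ensure_state("dense_tp_rsag")
        rank.ensure_state("logits_boundary")
    else:
        rank.ensure_state("logits_boundary")
        rank.ensure_state("dense_tp_rsag")


def creation_orders(precreate_rsag: bool) -> list:
    """Return the state-creation order observed on two toy ranks."""
    ranks = [Rank(0), Rank(1)]
    for r in ranks:
        init_rank(r, precreate_rsag)
    forward(ranks[0], local_tokens=4)
    forward(ranks[1], local_tokens=0)
    return [r.creation_order for r in ranks]


def fake_lm_head_kernel(hidden, weight):
    # Stand-in for a fused kernel that cannot be launched on zero rows.
    if not hidden:
        raise ValueError("kernel launched with an empty batch")
    # logits[i][v] = dot(hidden[i], weight[v]); weight has one row per vocab entry.
    return [[sum(h * w for h, w in zip(row, vocab_row)) for vocab_row in weight]
            for row in hidden]


def compute_logits(hidden, weight):
    # Guard: an empty local batch is valid and yields empty logits,
    # so the kernel is never launched for it.
    if len(hidden) == 0:
        return []
    return fake_lm_head_kernel(hidden, weight)
```

In this toy model, `creation_orders(False)` gives the two ranks different creation orders, while `creation_orders(True)` gives them the same order because the RSAG state already exists before either rank runs `forward`. Likewise, `compute_logits([], weight)` returns an empty result without reaching the kernel stand-in.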

Follow-up work is intentionally left out of this PR: global-to-local gather_ids layout handling, sampler output merge semantics, output logprobs with local logits, and EAGLE3/NextN hidden-capture merging.

Test Plan

  • .venv/bin/python -m pytest test/runtime/distributed/test_comm_ops.py::TestAutoBackendRsagPrecreate test/runtime/test_logits_processor.py -q
  • .venv/bin/pre-commit run --all-files

Additional manual validation from the investigation:

  • Direct EP8/DP2/TP4 engine smoke passed with the RSAG precreate fix.
  • Kimi-K2.5 NVFP4 AIME benchmark passed with score above threshold.

@yubofredwang yubofredwang requested a review from a team as a code owner June 11, 2026 00:07
@yubofredwang yubofredwang marked this pull request as draft June 11, 2026 00:07

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 16db5f6e86

Comment thread python/tokenspeed/runtime/execution/drafter/eagle.py Outdated
@yubofredwang yubofredwang changed the title from "[WIP] fix DP attention hidden/logits boundary layout correctness during sampling" to "Fix EP8 DP/TP RSAG init and empty LM head" Jun 22, 2026
Signed-off-by: Yubo Wang <yubowang2019@gmail.com>
@yubofredwang yubofredwang marked this pull request as ready for review June 22, 2026 04:24