[Core] CuteDSL MoE with Nvfp4 DeepEP dispatch #27141

wenscarl · 2025-10-18T04:37:34Z

Purpose

Adds NVFP4 dispatch through DeepEP low-latency mode for the CuteDSL MoE path.
Depends on deepseek-ai/DeepEP#341 and #25990.

Test Plan

VLLM_USE_FLASHINFER_MOE_FP4=1
VLLM_USE_STANDALONE_COMPILE=0
VLLM_FLASHINFER_MOE_BACKEND="cutedsl"
VLLM_WORKER_MULTIPROC_METHOD=spawn
VLLM_ALL2ALL_BACKEND="deepep_low_latency"
lm_eval --model vllm --model_args pretrained=nvidia/DeepSeek-R1-FP4,data_parallel_size=4,enable_expert_parallel=True,tensor_parallel_size=1,enforce_eager=True,max_model_len=2048 --trust_remote_code --tasks gsm8k --num_fewshot 5 --batch_size auto
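For anyone who wants to script the run, here is a minimal sketch that assembles the same environment and command line in Python. The variable names and values are copied from the test plan above; the helper names (`TEST_ENV`, `MODEL_ARGS`, `build_env`, `build_command`) are hypothetical and not part of vLLM or lm_eval. The sketch only builds and prints the command; it does not launch it.

```python
import os
import shlex

# Environment variables copied verbatim from the test plan.
TEST_ENV = {
    "VLLM_USE_FLASHINFER_MOE_FP4": "1",
    "VLLM_USE_STANDALONE_COMPILE": "0",
    "VLLM_FLASHINFER_MOE_BACKEND": "cutedsl",
    "VLLM_WORKER_MULTIPROC_METHOD": "spawn",
    "VLLM_ALL2ALL_BACKEND": "deepep_low_latency",
}

# Key/value pairs passed to lm_eval via --model_args, in the original order.
MODEL_ARGS = {
    "pretrained": "nvidia/DeepSeek-R1-FP4",
    "data_parallel_size": "4",
    "enable_expert_parallel": "True",
    "tensor_parallel_size": "1",
    "enforce_eager": "True",
    "max_model_len": "2048",
}


def build_env(base=None):
    """Return a copy of the base environment with the test-plan variables set."""
    env = dict(os.environ if base is None else base)
    env.update(TEST_ENV)
    return env


def build_command():
    """Return the lm_eval invocation from the test plan as an argv list."""
    model_args = ",".join(f"{key}={value}" for key, value in MODEL_ARGS.items())
    return [
        "lm_eval",
        "--model", "vllm",
        "--model_args", model_args,
        "--trust_remote_code",
        "--tasks", "gsm8k",
        "--num_fewshot", "5",
        "--batch_size", "auto",
    ]


if __name__ == "__main__":
    # Print the command for inspection instead of running it.
    print(shlex.join(build_command()))
```

To actually launch the evaluation, the two helpers can be passed to `subprocess.run(build_command(), env=build_env(), check=True)` on a machine that has the dependencies listed under Purpose installed.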

Test Result

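|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.9462|±  |0.0062|
|     |       |strict-match    |     5|exact_match|↑  |0.9462|±  |0.0062|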
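As a rough sanity check on the reported ± 0.0062, the sketch below computes the binomial standard error of the accuracy. It assumes the run covered the standard GSM8K test split of 1319 questions; the PR does not state the sample count, so `N_SAMPLES` is an assumption, and `binomial_stderr` is a hypothetical helper, not the lm_eval implementation.

```python
import math


def binomial_stderr(accuracy, n_samples):
    """Standard error of a proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_samples)


ACCURACY = 0.9462   # exact_match value from the table above
N_SAMPLES = 1319    # assumption: size of the GSM8K test split

stderr = binomial_stderr(ACCURACY, N_SAMPLES)
print(f"{stderr:.4f}")
```

Under that assumption the result agrees with the Stderr column to four decimal places, which suggests the evaluation ran on the full test split rather than a subset.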


Signed-off-by: Shu Wang <[email protected]>

wenscarl and others added 6 commits October 14, 2025 03:28

Add flashinfer_cutedsl grouped gemm

c063911

Signed-off-by: Shu Wang <[email protected]>

Make fused version work with cuda graph

8a224da

Signed-off-by: Shu Wang <[email protected]>

fix pre-commit

ec6acfd

Signed-off-by: Shu Wang <[email protected]>

wip

eb4fa88

able to run

6a7f95e

ok

2bd88bf

mergify bot added ci/build v1 labels Oct 18, 2025
