wenscarl
Contributor

@wenscarl wenscarl commented Oct 18, 2025

Purpose

Add NVFP4 dispatch for the DeepEP low-latency mode.
Depends on deepseek-ai/DeepEP#341 and #25990.

Test Plan

VLLM_USE_FLASHINFER_MOE_FP4=1 \
VLLM_USE_STANDALONE_COMPILE=0 \
VLLM_FLASHINFER_MOE_BACKEND="cutedsl" \
VLLM_WORKER_MULTIPROC_METHOD=spawn \
VLLM_ALL2ALL_BACKEND="deepep_low_latency" \
lm_eval --model vllm \
  --model_args pretrained=nvidia/DeepSeek-R1-FP4,data_parallel_size=4,enable_expert_parallel=True,tensor_parallel_size=1,enforce_eager=True,max_model_len=2048 \
  --trust_remote_code --tasks gsm8k --num_fewshot 5 --batch_size auto
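
For scripted runs, the same test plan can be driven from Python. This is only a sketch: the helper names (`build_command`, `build_env`, `run_eval`) are hypothetical and not part of this PR, and it assumes `lm_eval` is on `PATH`. The environment variables and model arguments are copied from the command above.

```python
import os
import subprocess

# Environment variables copied verbatim from the test plan.
TEST_ENV = {
    "VLLM_USE_FLASHINFER_MOE_FP4": "1",
    "VLLM_USE_STANDALONE_COMPILE": "0",
    "VLLM_FLASHINFER_MOE_BACKEND": "cutedsl",
    "VLLM_WORKER_MULTIPROC_METHOD": "spawn",
    "VLLM_ALL2ALL_BACKEND": "deepep_low_latency",
}

# Model arguments copied verbatim from the test plan.
MODEL_ARGS = {
    "pretrained": "nvidia/DeepSeek-R1-FP4",
    "data_parallel_size": 4,
    "enable_expert_parallel": True,
    "tensor_parallel_size": 1,
    "enforce_eager": True,
    "max_model_len": 2048,
}


def build_command():
    """Return the lm_eval argv list used in the test plan."""
    model_args = ",".join(f"{key}={value}" for key, value in MODEL_ARGS.items())
    return [
        "lm_eval", "--model", "vllm",
        "--model_args", model_args,
        "--trust_remote_code",
        "--tasks", "gsm8k",
        "--num_fewshot", "5",
        "--batch_size", "auto",
    ]


def build_env(base=None):
    """Overlay the test-plan variables on a base environment (default: os.environ)."""
    env = dict(os.environ if base is None else base)
    env.update(TEST_ENV)
    return env


def run_eval():
    """Hypothetical helper: launch the evaluation and raise on a non-zero exit."""
    subprocess.run(build_command(), env=build_env(), check=True)
```

Calling `run_eval()` launches the evaluation; `build_command()` and `build_env()` can be inspected on their own without starting a run.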

Test Result

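| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|---------|--------|--------|--------|-------|--------|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.9462 | ± 0.0062 |
| | | strict-match | 5 | exact_match | 0.9462 | ± 0.0062 |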
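
As a quick sanity check on the reported Stderr, the sketch below recomputes a binomial standard error from the accuracy. It assumes the full GSM8K test split of 1319 questions was evaluated (no `--limit`) and that the reported value is approximately a binomial standard error. Neither assumption is stated in this PR, and the function name is illustrative.

```python
import math


def binomial_stderr(accuracy, n_samples):
    """Standard error of a proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_samples)


N_SAMPLES = 1319   # assumption: size of the GSM8K test split
ACCURACY = 0.9462  # exact_match value reported above

stderr = binomial_stderr(ACCURACY, N_SAMPLES)
print(f"estimated stderr: {stderr:.4f}")
```

Under these assumptions the estimate rounds to 0.0062, which agrees with the table.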

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting a before/after results comparison, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.
