[NV] perf: update MiniMax-M3 FP4 B300 vLLM#1990
Conversation
|
Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes is similar to the official vLLM recipes and/or the SGLang cookbook If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers. If additional help is needed, PR authors can reach out to core maintainers over Slack. 感谢你的贡献!对于 vLLM 与 SGLang,请确保你的 recipe 与官方 vLLM recipes 和/或 SGLang cookbook 保持一致 如果不一致,请先创建一个 PR,之后我们才能将你的单节点 PR 合并到 master 分支。让我们确保文档保持一流水准,使整个 ML 社区都能从你的辛勤工作中受益!谢谢 PR 作者有责任确保合并后所有 GitHub Action 任务完全通过。 很多时候失败只是偶发抖动(flake),重新运行失败的任务即可解决。如果选择重新运行失败的任务,PR 作者有责任确保其最终通过。参见 GitHub 关于重新运行失败任务的文档:https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow 一般而言,PR 作者应先向相应公司的 CODEOWNERS 请求审阅并获得 PR 批准,然后再请求核心维护者审阅。 如需更多帮助,PR 作者可通过 Slack 联系核心维护者。 |
| - config-keys: | ||
| - minimaxm3-fp4-b300-vllm | ||
| description: | ||
| - "Update Minimax M3 b300 vllm image tag" | ||
| - "Update search space to cover more configs" | ||
| pr-link: https://github.com/SemiAnalysisAI/InferenceX/tree/codex/minimax-m3-b300-fp4-vllm-update |
There was a problem hiding this comment.
🟡 The new perf-changelog.yaml entry has two issues: (1) pr-link uses a branch URL (tree/codex/minimax-m3-b300-fp4-vllm-update) instead of the canonical pull/1990 form used by every other entry in the file — this will 404 once the branch is deleted after merge; (2) the description says "Update search space to cover more configs", but the diff actually narrows the sweep (isl 1024: 7→4 entries, isl 8192: 6→4 entries, all TP8/EP8 and TP4/EP4 lanes dropped). Consider phrasing similar to the neighboring dsr1-fp4-b200-sglang entry (e.g. "Refocus/narrow the search space and drop TP8/EP8 and TP4/EP4 sweeps").
Extended reasoning...
Issue 1 — branch URL in pr-link:\n\nThe new entry at perf-changelog.yaml line 4435 sets:\n\nyaml\npr-link: https://github.com/SemiAnalysisAI/InferenceX/tree/codex/minimax-m3-b300-fp4-vllm-update\n\n\nOf the ~554 pr-link values in this file, this is the only one that points to a branch tree/... URL — all others use the canonical https://github.com/SemiAnalysisAI/InferenceX/pull/<num> form (see the four most recent entries at lines 4402–4428 for examples). Since branches are typically deleted after merge, this link will 404 shortly after this PR lands. The correct value is https://github.com/SemiAnalysisAI/InferenceX/pull/1990.\n\nIssue 2 — description contradicts the diff:\n\nThe entry claims:\n\nyaml\n- "Update search space to cover more configs"\n\n\nBut the diff clearly narrows the sweep. Step-by-step count from the diff on .github/configs/nvidia-master.yaml:\n\nisl 1024, osl 1024:\n- Before (7 entries): {tp:8, c1-64}, {tp:8/ep:8, c1-512}, {tp:4, c1-64}, {tp:4/ep:4, c64-512}, {tp:4/ep:4/dp-attn, c128-512}, {tp:2/ep:2, c16-128}, {tp:8/ep:8/dp-attn, c256-1024}\n- After (4 entries): {tp:8, c1-2}, {tp:4, c2-2}, {tp:2, c4-256}, {tp:2/ep:2/dp-attn, c512-1024}\n\nisl 8192, osl 1024:\n- Before (6 entries): {tp:8, c1-64}, {tp:8/ep:8, c1-512}, {tp:4, c1-128}, {tp:4/ep:4, c64-256}, {tp:4/ep:4/dp-attn, c64-128}, {tp:8/ep:8/dp-attn, c128-256}\n- After (4 entries): {tp:8, c1-2}, {tp:4, c2-2}, {tp:2, c4-256}, {tp:2/ep:2/dp-attn, c512-1024}\n\nEvery TP8/EP8 and TP4/EP4 lane is dropped, and the TP8/TP4 lanes are drastically clipped (tp:8 from c1-64 down to c1-2; tp:4 from c1-64 down to a single c2 point). While tp:2 and tp:2/ep:2/dp-attn are genuinely new, the aggregate coverage is significantly smaller (roughly 38 concurrency points → ~12 at isl 1024). Calling this "cover more configs" is the opposite of what the change actually does.\n\nImpact:\n\nBoth issues are purely documentation quality — no runtime effect, and no CI schema check enforces the URL form. But the perf-changelog is a user-facing history document that people read to understand why a config changed; a stale link plus a misleading description defeats that purpose.\n\nFix:\n\nyaml\n- config-keys:\n - minimaxm3-fp4-b300-vllm\n description:\n - "Update Minimax M3 b300 vllm image tag"\n - "Refocus/narrow the search space and drop TP8/EP8 and TP4/EP4 sweeps."\n pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1990\n\n\nThe neighboring dsr1-fp4-b200-sglang entry at lines 4422–4428 uses similar honest phrasing when reducing coverage ("Drop the TP8/EP8 single-concurrency points.").
| @@ -12809,7 +12809,7 @@ minimaxm3-fp8-b300-vllm: | |||
| # weights are pre-staged read-only at /scratch/models/MiniMax-M3-NVFP4 (added to | |||
| # the STAGED_MODELS allow-list in launch_b300-nv.sh). | |||
| minimaxm3-fp4-b300-vllm: | |||
| image: vllm/vllm-openai:vllm-minimax-m3-perf-x86_64-13.0.1-8b00f41 | |||
| image: vllm/vllm-openai:nightly-93d8f834dd8acf33eb0e2a75b2711b628cb6e226 | |||
There was a problem hiding this comment.
🟡 The block comment just above this line still says the NVFP4 support 'is baked into the perf container image', but the PR swaps the image from the bespoke vllm-minimax-m3-perf-* tag to a mainline nightly-* tag. Update the comment to say the support has landed in vLLM main and is picked up from nightly so future readers don't chase a nonexistent perf image. Note: the same wording appears at lines 12839-12840 for the EAGLE3 variant (whose image is NOT changed in this PR), so that comment remains accurate and should be left as-is.
Extended reasoning...
What the bug is
At .github/configs/nvidia-master.yaml lines 12805-12810, immediately above the minimaxm3-fp4-b300-vllm entry, there is a block comment that states:
MiniMax-M3 modelopt NVFP4 support (vllm-project/vllm PR #46380) is baked into the perf container image, so no runtime patch is needed.
That wording was true when the config used the bespoke perf-tagged image vllm/vllm-openai:vllm-minimax-m3-perf-x86_64-13.0.1-8b00f41. This PR replaces that tag with vllm/vllm-openai:nightly-93d8f834dd8acf33eb0e2a75b2711b628cb6e226 — a mainline nightly image, not a perf-variant image. After the swap, the phrase "perf container image" no longer describes what is actually being pulled.
Why the invariant still holds, but the comment is misleading
The underlying claim — "no runtime patch is needed" — is still true, because vllm-project/vllm PR #46380 has presumably landed in vLLM main and the nightly image picks it up. So there is no runtime effect and nothing breaks at benchmark time. The issue is purely that the comment cites the wrong reason: a future reader inspecting this entry will look for a "perf container image" that no longer exists in the config and be confused.
Step-by-step proof
- Before this PR, line 12812 reads
image: vllm/vllm-openai:vllm-minimax-m3-perf-x86_64-13.0.1-8b00f41. The tag literally containsminimax-m3-perf, matching "perf container image". - This PR changes line 12812 to
image: vllm/vllm-openai:nightly-93d8f834dd8acf33eb0e2a75b2711b628cb6e226. The tag isnightly-<sha>; there is no "perf" in the tag name. - Lines 12805-12810 (comment above the entry) still literally say
baked into the perf container image, unchanged by this PR. - Therefore the comment and the code disagree about which image class is in use.
Fix
Update the comment to reflect that MiniMax-M3 NVFP4 support has landed upstream and is now picked up from the mainline nightly image, e.g.:
MiniMax-M3 modelopt NVFP4 support (vllm-project/vllm PR #46380) has landed in vLLM main and is picked up from the nightly image below, so no runtime patch is needed.
Scope note
The same wording appears at lines 12839-12840 above the EAGLE3 variant (minimaxm3-fp4-eagle3-b300-vllm), which does not change its image in this PR. That comment remains accurate and should be left as-is; only the copy above minimaxm3-fp4-b300-vllm (lines 12805-12810) needs updating.
Severity
This is a documentation-only inconsistency with no runtime effect on the benchmark. Marking as nit.
|
see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=28612763227 |
|
/reuse-sweep-run |
|
As a PR reviewer and CODEOWNER, I have reviewed this and have:
Additional detail section:
Signed: |
|
Verdict: PASS — all sign-off checks independently verified at head
|
No description provided.