Skip to content

Add MiniMax-M3 MXFP4 (AMD) variant#579

Open
andyluo7 wants to merge 3 commits into
vllm-project:mainfrom
andyluo7:minimax-m3-mxfp4-amd
Open

Add MiniMax-M3 MXFP4 (AMD) variant#579
andyluo7 wants to merge 3 commits into
vllm-project:mainfrom
andyluo7:minimax-m3-mxfp4-amd

Conversation

@andyluo7

@andyluo7 andyluo7 commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Summary

Adds an mxfp4 variant to the MiniMax-M3 recipe for amd/MiniMax-M3-MXFP4, targeting AMD CDNA4 (MI350X/MI355X, gfx950) served through the AITER MoE backend. At ~0.5 bytes/param it is roughly half the VRAM of the existing MXFP8 variant and fits a single 8×MI355X node from TP=4.

The variant reuses the recipe's existing AMD path (block-size 128, TRITON_ATTN MSA attention, minimax_m3 parsers, CUDA-graph env) and adds only --moe-backend aiter + the AITER MoE env vars. No NVIDIA changes; the variant is AMD-only (mxfp4 is ungated, so it does not force Blackwell and does not disable the AMD hardware pill).

Validation

Validated single-node on 8×MI355X (gfx950), TP=4, vLLM 0.23.1 (rocm/vllm-dev ROCm image):

  • vllm serve amd/MiniMax-M3-MXFP4 --tensor-parallel-size 4 --block-size 128 --moe-backend aiter --attention-backend TRITON_ATTN --language-model-only --no-enable-prefix-caching --tool-call-parser minimax_m3 --reasoning-parser minimax_m3 --enable-auto-tool-choice reaches Application startup complete.
  • Engine reports quantization=quark, moe_backend='aiter', kv_cache_dtype=auto.
  • Chat completions return coherent output and the minimax_m3 reasoning parser splits reasoning from content correctly.

KV cache note (corrected after testing)

amd/MiniMax-M3-MXFP4 ships no calibrated KV scales. I verified that --kv-cache-dtype fp8 does still start and serve on vLLM — it falls back to an uncalibrated KV scale of 1.0 and logs Using uncalibrated q_scale 1.0 ... This may cause accuracy issues (it does not hard-fail). The variant therefore keeps the KV cache at its default dtype, and the guide documents the fp8 behavior so users can opt in only after validating accuracy.

The MXFP4 sharding/KV constraints were cross-referenced against the ROCm/ATOM MiniMax-M3 recipe.

Test plan

  • node scripts/build-recipes-api.mjs passes (✓ JSON API: 142 models, 8 strategies)
  • Single-node TP=4 serve + chat completion on MI355X (gfx950)
  • Confirmed fp8-KV fallback behavior (uncalibrated scale, not a crash)

View with Codesmith Autofix with Codesmith
Need help on this PR? Tag /codesmith with what you need. Autofix is disabled.

Add an `mxfp4` variant for `amd/MiniMax-M3-MXFP4` targeting AMD CDNA4
(MI350X/MI355X, gfx950), served through the AITER MoE backend. At ~0.5
bytes/param it is roughly half the VRAM of MXFP8 and fits a single 8x
MI355X node from TP=4.

Validated single-node on 8x MI355X (gfx950), TP=4, vLLM 0.23.1
(rocm/vllm-dev ROCm image): the model serves and the minimax_m3
reasoning/tool parsers split reasoning from content correctly. Flags
mirror the existing AMD MXFP8 path (block-size 128, TRITON_ATTN MSA) plus
the AITER MoE backend; the ATOM MiniMax-M3 recipe was used as a cross
reference for the MXFP4 sharding/KV constraints.

The checkpoint ships no calibrated KV scales: `--kv-cache-dtype fp8`
still serves but falls back to an uncalibrated scale of 1.0 (accuracy
risk), so the variant keeps the KV cache at its default dtype. Documented
in the guide.

Signed-off-by: andyluo7 <andy.luo@amd.com>
@vercel

vercel Bot commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
vllm-recipes Ready Ready Preview, Comment Jun 25, 2026 8:43pm

Request Review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request adds a new AMD-quantized MXFP4 variant (amd/MiniMax-M3-MXFP4) for the MiniMax-M3 model, including its configuration, environment variables, and a detailed usage guide. Feedback is provided regarding a version mismatch in the documentation between the minimum required vLLM version and the validated version.

Comment thread models/MiniMaxAI/MiniMax-M3.yaml Outdated
Comment on lines +478 to +479
Validated on 8x MI355X (gfx950), TP=4, with the `rocm/vllm-dev` ROCm image
(vLLM 0.23.1): the model serves and the `minimax_m3` reasoning/tool parsers

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There is a mismatch between the minimum required vLLM version (0.24.0 specified on line 20) and the validation version (0.23.1) mentioned here. To prevent user confusion, please update the validation reference to align with the minimum required version or clarify the version requirements.

  Validated on 8x MI355X (gfx950), TP=4, with the `rocm/vllm-dev` ROCm image
  (vLLM 0.24.0): the model serves and the `minimax_m3` reasoning/tool parsers

Comment thread models/MiniMaxAI/MiniMax-M3.yaml Outdated
This variant is AMD-only; it is not applicable to NVIDIA hardware (use the
**mxfp8** variant on Blackwell for native MX matrix cores).

Validated on 8x MI355X (gfx950), TP=4, with the `rocm/vllm-dev` ROCm image

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we put which image?

@hongxiayang hongxiayang left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks. some nit comments

Comment thread models/MiniMaxAI/MiniMax-M3.yaml Outdated
- "aiter"
extra_env:
VLLM_ROCM_USE_AITER: "1"
VLLM_ROCM_USE_AITER_MOE: "1"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is default to True, maybe not needed?

@functionstackx functionstackx left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this PR may need some changes, firstly the PR should be specific'ed to MI355X only but from verification it seems to be claiming MI300/MI325 [Image 1] supports MXFP4 and claims to say that NVIDIA supports amd MXFP4 checkpoint too [Image 2]

secondly since this is an upstream vllm recipe, from testing following this AMD recipe following the instructions on this recipe branch [Image 3], it does not work and results in an crash

Image 1: screenshot from this dev branch showing that it accientally claims that it works on MI300/Mi325
Image 2: Screenshot from this dev branch showing that it accientally claims that it works on H100/H200/B200 too
Image 3: screenshot from this dev branch showing the recipe & image i am following in this recipe PR that shows it crashing likely due to AITER not enabled on nightly upstream image vllm-project/vllm#46419
Image

Image Image

…, accuracy

The AITER MoE path for amd/MiniMax-M3-MXFP4 needs aiter >=0.1.16.post2
(vllm#46692) and the MoE enablement vllm#46419; until #46419 ships in a
published vllm/vllm-openai-rocm image, a plain nightly will not bring up
MXFP4 on --moe-backend aiter. Add the emulation backend command (TP=8,
runs on current images) that AMD uses for accuracy measurement, and cite
the model card's gsm8k recovery (94.19 vs 95.30 bf16, 98.84%).

Signed-off-by: andyluo7 <andy.luo@amd.com>
…dant env

Address PR vllm-project#579 review:

- functionstackx: MXFP4 is a CDNA4-only checkpoint. Add a variant-level
  `requires_arch: gfx950` gate (new `arch` field on AMD taxonomy GPUs) so the
  hardware pills no longer claim MI300X/MI325X (gfx942) or NVIDIA support. The
  gate is variant-scoped, so gpt-oss MXFP4 (which runs on NVIDIA) is unaffected.
- functionstackx: a plain upstream nightly crashes (AITER MXFP4 MoE not yet
  published, vllm#46419). Pin the working ROCm dev image on the variant and call
  out the crash in the guide.
- hongxiayang: drop the redundant VLLM_ROCM_USE_AITER_MOE=1 (defaults on under
  the VLLM_ROCM_USE_AITER umbrella) and name the validated image.
- gemini: reconcile the 0.23.1 validation image with the 0.24.0 min_vllm_version.

Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>
Signed-off-by: andyluo7 <andy.luo@amd.com>
@andyluo7

Copy link
Copy Markdown
Contributor Author

Thanks all — pushed 92fe4c6 addressing the review.

@functionstackx

  1. MXFP4 claiming MI300/MI325 + NVIDIA — fixed. amd/MiniMax-M3-MXFP4 is a CDNA4-only checkpoint (AITER MXFP4 MoE kernels are gfx950). I added a variant-level requires_arch: gfx950 gate (new arch field on the AMD taxonomy GPUs) so the hardware pills now disable MI300X/MI325X (gfx942) and all NVIDIA for this variant. The gate is variant-scoped, so the gpt-oss MXFP4 checkpoints (which do run on NVIDIA) are unaffected — verified both: MXFP4 MiniMax → only MI355X enabled, default hw = MI355X; gpt-oss MXFP4 → still H200/MI300X/MI355X.
  2. Crash on the upstream nightly — correct, that's vllm#46419 (AITER MXFP4 MoE) not being in a published nightly yet. I pinned the working ROCm dev image (rocm/vllm-dev:vllm-0.23.1-rocm723-mi35x-mori-0625) on the variant so the command builder emits it, and the guide now explicitly warns that a plain nightly crashes and points at that image.

@hongxiayang

  • Dropped the redundant VLLM_ROCM_USE_AITER_MOE=1 — it defaults on under the VLLM_ROCM_USE_AITER umbrella. Thanks.
  • Named the validated image in the guide + on the variant (the ROCm dev image above).

@gemini-code-assist

  • Reconciled the version mismatch: the validation note now explains the dev image ships vLLM 0.23.1 + the pre-release AITER MXFP4 MoE path (vllm#46419), while min_vllm_version: 0.24.0 is the first upstream release expected to carry it natively.

node scripts/build-recipes-api.mjs passes (142 models).

@gemini-code-assist

Copy link
Copy Markdown
Contributor

Thanks for the detailed update, @andyluo7. The addition of the requires_arch gate in src/lib/command-synthesis.js and the corresponding updates to taxonomy.yaml are the correct approach to ensure hardware compatibility for the amd/MiniMax-M3-MXFP4 variant. It's also great to see the ROCm dev image pinned to avoid the known upstream nightly crash. The documentation in the guide regarding the AITER backend requirements and the KV cache behavior is clear and helpful for users.

@functionstackx

functionstackx commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

Crash on the upstream nightly — correct, that's vllm#46419 (AITER MXFP4 MoE) not being in a published nightly yet. I pinned the working ROCm dev image (rocm/vllm-dev:vllm-0.23.1-rocm723-mi35x-mori-0625) on the variant so the command builder emits it, and the guide now explicitly warns that a plain nightly crashes and points at that image.

considering that this is upstream recipes repo, i am not sure if vLLM upstream maintainers would accept an non-upstream rocm docker image, but of course i can't speak on behalf of them. Will let them jump in

Would the prefer path to be to wait till there is an accessible upstream https://hub.docker.com/r/vllm/ docker image such that this upstream recipes repo can accuracy track the upstream images

@functionstackx

Copy link
Copy Markdown
Contributor

MXFP4 claiming MI300/MI325 + NVIDIA — fixed. amd/MiniMax-M3-MXFP4 is a CDNA4-only checkpoint (AITER MXFP4 MoE kernels are gfx950). I added a variant-level requires_arch: gfx950 gate (new arch field on the AMD taxonomy GPUs) so the hardware pills now disable MI300X/MI325X (gfx942) and all NVIDIA for this variant. The gate is variant-scoped, so the gpt-oss MXFP4 checkpoints (which do run on NVIDIA) are unaffected — verified both: MXFP4 MiniMax → only MI355X enabled, default hw = MI355X; gpt-oss MXFP4 → still H200/MI300X/MI355X.

Thanks for fixing this!

@esmeetu

esmeetu commented Jun 27, 2026

Copy link
Copy Markdown
Member

@andyluo7 Thanks! Can you resolve the conflict? then can merge.

@hongxiayang

Copy link
Copy Markdown
Contributor

since #580 has been merged, we can wait to use upstream docker images when some critical mxfp4 PRs are merged and nightly image is available after that.

mxfp4:
model_id: "amd/MiniMax-M3-MXFP4"
precision: mxfp4
# AMD MXFP4 is a CDNA4-only checkpoint: the AITER MXFP4 MoE kernels are

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove all of the comments. They are all not needed. The yaml attributes already reflect them. And aiter version upgrade already happened. Don't need to state them. Moreover the paged are rendered into html. all these comments won't be seen by any users.

> which has **not** landed in a published `vllm/vllm-openai-rocm` nightly. On a
> plain nightly, `--moe-backend aiter` fails to bring up MXFP4 (AITER MXFP4 MoE
> kernel missing). Use the ROCm dev image that carries the path —
> `rocm/vllm-dev:vllm-0.23.1-rocm723-mi35x-mori-0625` — or build from source.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have landed all of the optimization. Please validate with the team. After validation, we can point the docker image to the vllm/vllm-openai-rocm:nightly now.

@tjtanaa

tjtanaa commented Jun 27, 2026

Copy link
Copy Markdown
Member

considering that this is upstream recipes repo, i am not sure if vLLM upstream maintainers would accept an non-upstream rocm docker image, but of course i can't speak on behalf of them. Will let them jump in

Would the prefer path to be to wait till there is an accessible upstream https://hub.docker.com/r/vllm/ docker image such that this upstream recipes repo can accuracy track the upstream images

Thanks @functionstackx , yes. The commands in the yaml file must work with upstream docker image.

@tjtanaa

tjtanaa commented Jun 27, 2026

Copy link
Copy Markdown
Member

@hongxiayang @andyluo7 please consolidate the content in the guide file between your recipe PRs. If I put them together, they are fragmented. Please make the content for unified, comprehensive and coherent. Thanks.

@andyluo7

Copy link
Copy Markdown
Contributor Author

@tjtanaa @hongxiayang , will waiting for the upstream docker images to update and ensure the content consistent with #580

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants