Add MiniMax-M3 MXFP4 (AMD) variant by andyluo7 · Pull Request #579 · vllm-project/recipes

andyluo7 · 2026-06-25T19:01:27Z

Summary

Adds an mxfp4 variant to the MiniMax-M3 recipe for amd/MiniMax-M3-MXFP4, targeting AMD CDNA4 (MI350X/MI355X, gfx950) and served through the AITER MoE backend. At ~0.5 bytes/param it needs roughly half the VRAM of the existing MXFP8 variant and fits on a single 8×MI355X node starting at TP=4.

The variant reuses the recipe's existing AMD path (block-size 128, TRITON_ATTN MSA attention, minimax_m3 parsers, CUDA-graph env) and adds only --moe-backend aiter + the AITER MoE env vars. No NVIDIA changes; the variant is AMD-only (mxfp4 is ungated, so it does not force Blackwell and does not disable the AMD hardware pill).
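The "~0.5 bytes/param" and "roughly half the VRAM" figures can be sanity-checked with a little arithmetic. The sketch below assumes the standard Microscaling (MX) layout of one 8-bit shared scale per block of 32 elements, and it assumes every weight is stored in the MX format, which a real checkpoint may not do (some layers can stay in higher precision). The function name is illustrative only.

```python
# Assumption: MX formats share one 8-bit scale across a block of 32 elements.
MX_BLOCK_SIZE = 32
MX_SCALE_BITS = 8


def mx_bytes_per_param(element_bits: int) -> float:
    """Average storage per parameter, including the amortized block scale."""
    bits = element_bits + MX_SCALE_BITS / MX_BLOCK_SIZE
    return bits / 8


mxfp4 = mx_bytes_per_param(4)
mxfp8 = mx_bytes_per_param(8)

print(f"MXFP4: {mxfp4:.5f} bytes/param")
print(f"MXFP8: {mxfp8:.5f} bytes/param")
print(f"MXFP4 / MXFP8 weight footprint: {mxfp4 / mxfp8:.3f}")
```

Under these assumptions the ratio comes out slightly above one half, which is consistent with the "roughly half" wording. KV cache and activation memory are not included.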

Validation

Validated single-node on 8×MI355X (gfx950), TP=4, vLLM 0.23.1 (rocm/vllm-dev ROCm image):

- `vllm serve amd/MiniMax-M3-MXFP4 --tensor-parallel-size 4 --block-size 128 --moe-backend aiter --attention-backend TRITON_ATTN --language-model-only --no-enable-prefix-caching --tool-call-parser minimax_m3 --reasoning-parser minimax_m3 --enable-auto-tool-choice` reaches `Application startup complete`.
- Engine reports `quantization=quark`, `moe_backend='aiter'`, `kv_cache_dtype=auto`.
- Chat completions return coherent output and the `minimax_m3` reasoning parser splits reasoning from content correctly.
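For scripted validation runs, the serve command above can be assembled as an argument list rather than a hand-typed string. This is a minimal sketch: the flags are copied from the validated command, while `build_serve_command` is a hypothetical helper and not part of the recipes repo or of vLLM.

```python
import shlex

MODEL_ID = "amd/MiniMax-M3-MXFP4"


def build_serve_command(tensor_parallel_size: int = 4) -> list[str]:
    """Return the validated `vllm serve` invocation as an argv list."""
    return [
        "vllm", "serve", MODEL_ID,
        "--tensor-parallel-size", str(tensor_parallel_size),
        "--block-size", "128",
        "--moe-backend", "aiter",
        "--attention-backend", "TRITON_ATTN",
        "--language-model-only",
        "--no-enable-prefix-caching",
        "--tool-call-parser", "minimax_m3",
        "--reasoning-parser", "minimax_m3",
        "--enable-auto-tool-choice",
    ]


# shlex.join quotes each argument so the result can be pasted into a shell.
print(shlex.join(build_serve_command()))
```

The list form can be passed directly to `subprocess.run`. Note that `--kv-cache-dtype` is deliberately absent, matching the variant's choice to leave the KV cache at its default dtype.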

KV cache note (corrected after testing)

amd/MiniMax-M3-MXFP4 ships no calibrated KV scales. I verified that `--kv-cache-dtype fp8` still starts and serves on vLLM: it falls back to an uncalibrated KV scale of 1.0 and logs `Using uncalibrated q_scale 1.0 ... This may cause accuracy issues` rather than hard-failing. The variant therefore keeps the KV cache at its default dtype, and the guide documents the fp8 behavior so users can opt in only after validating accuracy.
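Because the fp8 fallback does not crash, it is easy to miss. One way to catch it is to scan the server log for the warning fragment quoted above. The sketch below is an assumption-laden illustration: only the marker string comes from this PR, while the helper name and the surrounding sample log lines are made up for the example.

```python
# Fragment of the warning quoted in this PR; the full log line is not reproduced here.
UNCALIBRATED_MARKER = "Using uncalibrated q_scale 1.0"


def find_uncalibrated_kv_warnings(log_lines):
    """Return the log lines that report an uncalibrated KV scale fallback."""
    return [line.rstrip("\n") for line in log_lines if UNCALIBRATED_MARKER in line]


# Hypothetical log excerpt: prefixes and the first line are illustrative only.
sample_log = [
    "INFO example startup line\n",
    "WARNING Using uncalibrated q_scale 1.0\n",
]

hits = find_uncalibrated_kv_warnings(sample_log)
print(f"{len(hits)} uncalibrated KV scale warning(s) found")
```

A non-empty result suggests the run should be treated as unvalidated for accuracy until the fp8 KV cache has been evaluated separately.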

The MXFP4 sharding/KV constraints were cross-referenced against the ROCm/ATOM MiniMax-M3 recipe.

Test plan

- `node scripts/build-recipes-api.mjs` passes (✓ JSON API: 142 models, 8 strategies)
- Single-node TP=4 serve + chat completion on MI355X (gfx950)
- Confirmed fp8-KV fallback behavior (uncalibrated scale, not a crash)

Add an `mxfp4` variant for `amd/MiniMax-M3-MXFP4` targeting AMD CDNA4 (MI350X/MI355X, gfx950), served through the AITER MoE backend. At ~0.5 bytes/param it is roughly half the VRAM of MXFP8 and fits a single 8x MI355X node from TP=4.

Validated single-node on 8x MI355X (gfx950), TP=4, vLLM 0.23.1 (rocm/vllm-dev ROCm image): the model serves and the minimax_m3 reasoning/tool parsers split reasoning from content correctly.

Flags mirror the existing AMD MXFP8 path (block-size 128, TRITON_ATTN MSA) plus the AITER MoE backend; the ATOM MiniMax-M3 recipe was used as a cross reference for the MXFP4 sharding/KV constraints.

The checkpoint ships no calibrated KV scales: `--kv-cache-dtype fp8` still serves but falls back to an uncalibrated scale of 1.0 (accuracy risk), so the variant keeps the KV cache at its default dtype. Documented in the guide.

Signed-off-by: andyluo7 <andy.luo@amd.com>

vercel · 2026-06-25T19:01:34Z

Project	Deployment	Actions	Updated (UTC)
vllm-recipes	Ready	Preview, Comment	Jun 25, 2026 8:43pm

gemini-code-assist

Code Review

This pull request adds a new AMD-quantized MXFP4 variant (amd/MiniMax-M3-MXFP4) for the MiniMax-M3 model, including its configuration, environment variables, and a detailed usage guide. Feedback is provided regarding a version mismatch in the documentation between the minimum required vLLM version and the validated version.

gemini-code-assist · 2026-06-25T19:02:37Z

+  Validated on 8x MI355X (gfx950), TP=4, with the `rocm/vllm-dev` ROCm image
+  (vLLM 0.23.1): the model serves and the `minimax_m3` reasoning/tool parsers


There is a mismatch between the minimum required vLLM version (0.24.0 specified on line 20) and the validation version (0.23.1) mentioned here. To prevent user confusion, please update the validation reference to align with the minimum required version or clarify the version requirements.

Validated on 8x MI355X (gfx950), TP=4, with the `rocm/vllm-dev` ROCm image (vLLM 0.24.0): the model serves and the `minimax_m3` reasoning/tool parsers

hongxiayang · 2026-06-25T20:20:49Z

+  This variant is AMD-only; it is not applicable to NVIDIA hardware (use the
+  **mxfp8** variant on Blackwell for native MX matrix cores).
+
+  Validated on 8x MI355X (gfx950), TP=4, with the `rocm/vllm-dev` ROCm image


Should we state which image?

hongxiayang

Thanks. Some nit comments.

hongxiayang · 2026-06-25T20:21:37Z

+      - "aiter"
+    extra_env:
+      VLLM_ROCM_USE_AITER: "1"
+      VLLM_ROCM_USE_AITER_MOE: "1"


This defaults to True, so maybe it is not needed?

functionstackx

This PR may need some changes. First, the PR should be specific to MI355X only, but from verification it seems to claim that MI300/MI325 support MXFP4 [Image 1] and that NVIDIA supports the AMD MXFP4 checkpoint too [Image 2].

Second, since this is an upstream vLLM recipe: I tested this AMD recipe by following the instructions on this recipe branch [Image 3], and it does not work and results in a crash.

Image 1: screenshot from this dev branch showing that it accidentally claims that it works on MI300/MI325
Image 2: screenshot from this dev branch showing that it accidentally claims that it works on H100/H200/B200 too
Image 3: screenshot from this dev branch showing the recipe and image I am following in this recipe PR; it crashes, likely because AITER is not enabled on the nightly upstream image (vllm-project/vllm#46419)

…, accuracy

The AITER MoE path for amd/MiniMax-M3-MXFP4 needs aiter >=0.1.16.post2 (vllm#46692) and the MoE enablement vllm#46419; until #46419 ships in a published vllm/vllm-openai-rocm image, a plain nightly will not bring up MXFP4 on --moe-backend aiter. Add the emulation backend command (TP=8, runs on current images) that AMD uses for accuracy measurement, and cite the model card's gsm8k recovery (94.19 vs 95.30 bf16, 98.84%).

Signed-off-by: andyluo7 <andy.luo@amd.com>
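The recovery percentage cited in this commit message follows directly from the two gsm8k scores. As a quick check, using only the numbers quoted above (variable names are illustrative):

```python
# gsm8k scores as quoted in the commit message.
mxfp4_gsm8k = 94.19
bf16_gsm8k = 95.30

# Recovery is the quantized score as a percentage of the bf16 baseline.
recovery_pct = mxfp4_gsm8k / bf16_gsm8k * 100
print(f"gsm8k recovery: {recovery_pct:.2f}%")
```

This reproduces the 98.84% figure to two decimal places.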

…dant env

Address PR vllm-project#579 review:

- functionstackx: MXFP4 is a CDNA4-only checkpoint. Add a variant-level `requires_arch: gfx950` gate (new `arch` field on AMD taxonomy GPUs) so the hardware pills no longer claim MI300X/MI325X (gfx942) or NVIDIA support. The gate is variant-scoped, so gpt-oss MXFP4 (which runs on NVIDIA) is unaffected.
- functionstackx: a plain upstream nightly crashes (AITER MXFP4 MoE not yet published, vllm#46419). Pin the working ROCm dev image on the variant and call out the crash in the guide.
- hongxiayang: drop the redundant VLLM_ROCM_USE_AITER_MOE=1 (defaults on under the VLLM_ROCM_USE_AITER umbrella) and name the validated image.
- gemini: reconcile the 0.23.1 validation image with the 0.24.0 min_vllm_version.

Co-Authored-By: Claude Opus 4 (1M context) <noreply@anthropic.com>
Signed-off-by: andyluo7 <andy.luo@amd.com>

andyluo7 · 2026-06-25T20:41:38Z

Thanks all — pushed 92fe4c6 addressing the review.

@functionstackx

MXFP4 claiming MI300/MI325 + NVIDIA — fixed. amd/MiniMax-M3-MXFP4 is a CDNA4-only checkpoint (AITER MXFP4 MoE kernels are gfx950). I added a variant-level requires_arch: gfx950 gate (new arch field on the AMD taxonomy GPUs) so the hardware pills now disable MI300X/MI325X (gfx942) and all NVIDIA for this variant. The gate is variant-scoped, so the gpt-oss MXFP4 checkpoints (which do run on NVIDIA) are unaffected — verified both: MXFP4 MiniMax → only MI355X enabled, default hw = MI355X; gpt-oss MXFP4 → still H200/MI300X/MI355X.
Crash on the upstream nightly — correct, that's vllm#46419 (AITER MXFP4 MoE) not being in a published nightly yet. I pinned the working ROCm dev image (rocm/vllm-dev:vllm-0.23.1-rocm723-mi35x-mori-0625) on the variant so the command builder emits it, and the guide now explicitly warns that a plain nightly crashes and points at that image.

@hongxiayang

Dropped the redundant VLLM_ROCM_USE_AITER_MOE=1 — it defaults on under the VLLM_ROCM_USE_AITER umbrella. Thanks.
Named the validated image in the guide + on the variant (the ROCm dev image above).

@gemini-code-assist

Reconciled the version mismatch: the validation note now explains the dev image ships vLLM 0.23.1 + the pre-release AITER MXFP4 MoE path (vllm#46419), while min_vllm_version: 0.24.0 is the first upstream release expected to carry it natively.

node scripts/build-recipes-api.mjs passes (142 models).
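To illustrate the idea behind the variant-scoped `requires_arch` gate described above, here is a minimal Python sketch. It is not the repo's implementation (that lives in JavaScript and YAML), and the taxonomy entries, field names other than `arch`, and the `enabled_gpus` helper are assumptions made for the example.

```python
from typing import Optional

# Hypothetical, simplified GPU taxonomy. Only AMD entries carry an `arch` value.
GPUS = [
    {"name": "H200", "vendor": "nvidia", "arch": None},
    {"name": "MI300X", "vendor": "amd", "arch": "gfx942"},
    {"name": "MI325X", "vendor": "amd", "arch": "gfx942"},
    {"name": "MI355X", "vendor": "amd", "arch": "gfx950"},
]


def enabled_gpus(requires_arch: Optional[str]) -> list[str]:
    """Return the GPUs a variant may run on under an optional arch gate."""
    if requires_arch is None:
        # Ungated variant: this sketch applies no restriction at all.
        return [gpu["name"] for gpu in GPUS]
    # Gated variant: only GPUs whose arch matches exactly stay enabled.
    return [gpu["name"] for gpu in GPUS if gpu["arch"] == requires_arch]


print(enabled_gpus("gfx950"))  # gated, like the MiniMax-M3 mxfp4 variant
print(enabled_gpus(None))      # ungated variant
```

Because NVIDIA entries have no `arch`, an exact-match gate disables them along with gfx942 parts. The real command builder presumably combines this with its other hardware rules, so the ungated branch here is broader than what the recipe site would show.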

gemini-code-assist · 2026-06-25T20:41:53Z

Thanks for the detailed update, @andyluo7. The addition of the requires_arch gate in src/lib/command-synthesis.js and the corresponding updates to taxonomy.yaml are the correct approach to ensure hardware compatibility for the amd/MiniMax-M3-MXFP4 variant. It's also great to see the ROCm dev image pinned to avoid the known upstream nightly crash. The documentation in the guide regarding the AITER backend requirements and the KV cache behavior is clear and helpful for users.

functionstackx · 2026-06-25T20:44:24Z

> Crash on the upstream nightly - correct, that's vllm#46419 (AITER MXFP4 MoE) not being in a published nightly yet. I pinned the working ROCm dev image (rocm/vllm-dev:vllm-0.23.1-rocm723-mi35x-mori-0625) on the variant so the command builder emits it, and the guide now explicitly warns that a plain nightly crashes and points at that image.

Considering that this is the upstream recipes repo, I am not sure whether the vLLM upstream maintainers would accept a non-upstream ROCm Docker image, but of course I can't speak on their behalf. I will let them jump in.

Would the preferred path be to wait until there is an accessible upstream https://hub.docker.com/r/vllm/ Docker image, so that this upstream recipes repo can accurately track the upstream images?

functionstackx · 2026-06-25T20:48:09Z

> MXFP4 claiming MI300/MI325 + NVIDIA - fixed. amd/MiniMax-M3-MXFP4 is a CDNA4-only checkpoint (AITER MXFP4 MoE kernels are gfx950). I added a variant-level requires_arch: gfx950 gate (new arch field on the AMD taxonomy GPUs) so the hardware pills now disable MI300X/MI325X (gfx942) and all NVIDIA for this variant. The gate is variant-scoped, so the gpt-oss MXFP4 checkpoints (which do run on NVIDIA) are unaffected - verified both: MXFP4 MiniMax → only MI355X enabled, default hw = MI355X; gpt-oss MXFP4 → still H200/MI300X/MI355X.

Thanks for fixing this!

esmeetu · 2026-06-27T01:05:09Z

@andyluo7 Thanks! Can you resolve the conflict? Then we can merge.

hongxiayang · 2026-06-27T01:25:17Z

Since #580 has been merged, we can wait and use the upstream Docker images once the critical mxfp4 PRs are merged and a nightly image is available.

tjtanaa · 2026-06-27T15:21:52Z

+  mxfp4:
+    model_id: "amd/MiniMax-M3-MXFP4"
+    precision: mxfp4
+    # AMD MXFP4 is a CDNA4-only checkpoint: the AITER MXFP4 MoE kernels are


Remove all of the comments. None of them are needed: the YAML attributes already reflect them, and the aiter version upgrade has already happened, so there is no need to state it. Moreover, the pages are rendered into HTML, so these comments won't be seen by any users.

tjtanaa · 2026-06-27T15:23:48Z

+  > which has **not** landed in a published `vllm/vllm-openai-rocm` nightly. On a
+  > plain nightly, `--moe-backend aiter` fails to bring up MXFP4 (AITER MXFP4 MoE
+  > kernel missing). Use the ROCm dev image that carries the path —
+  > `rocm/vllm-dev:vllm-0.23.1-rocm723-mi35x-mori-0625` — or build from source.


I think we have landed all of the optimizations. Please validate with the team. After validation, we can point the Docker image to vllm/vllm-openai-rocm:nightly.

tjtanaa · 2026-06-27T15:26:12Z

> Considering that this is the upstream recipes repo, I am not sure whether the vLLM upstream maintainers would accept a non-upstream ROCm Docker image, but of course I can't speak on their behalf. I will let them jump in.
>
> Would the preferred path be to wait until there is an accessible upstream https://hub.docker.com/r/vllm/ Docker image, so that this upstream recipes repo can accurately track the upstream images?

Thanks @functionstackx, yes. The commands in the YAML file must work with the upstream Docker image.

tjtanaa · 2026-06-27T15:28:33Z

@hongxiayang @andyluo7 Please consolidate the content in the guide file between your recipe PRs. If I put them together as they are, they are fragmented. Please make the content unified, comprehensive, and coherent. Thanks.

andyluo7 · 2026-06-27T16:22:59Z

@tjtanaa @hongxiayang, I will wait for the upstream Docker images to update and ensure the content is consistent with #580.

gemini-code-assist Bot reviewed Jun 25, 2026

vercel Bot deployed to Preview June 25, 2026 19:02 View deployment

This was referenced Jun 25, 2026

[AMD] Add MiniMax-M3-MXFP4 MI355X single-node vLLM recipe SemiAnalysisAI/InferenceX#1936

Closed

[codex] add MiniMax M3 FP4 MI355X vLLM benchmark SemiAnalysisAI/InferenceX#1935

Merged

hongxiayang reviewed Jun 25, 2026

hongxiayang approved these changes Jun 25, 2026

functionstackx suggested changes Jun 25, 2026

vercel Bot deployed to Preview June 25, 2026 20:30 View deployment

vercel Bot deployed to Preview June 25, 2026 20:43 View deployment

tjtanaa reviewed Jun 27, 2026

5 participants
