[AMD] MiniMax-M3: enable AITER + AMD runtime knobs in the ROCm hardware override by JohnQinAMD · Pull Request #556 · vllm-project/recipes

JohnQinAMD · 2026-06-16T16:14:47Z

MiniMax-M3 is the only MiniMax recipe that doesn't enable AITER on AMD; its siblings (M2/M2.1/M2.5/M2.7) all set VLLM_ROCM_USE_AITER=1. Enable it here so the hot decode GEMMs and fused MoE run on AITER. VLLM_ROCM_USE_AITER is the master toggle; the per-component flags default to True behind it. Keep MHA off AITER (VLLM_ROCM_USE_AITER_MHA=0) so MSA sparse attention stays on TRITON_ATTN: the MXFP8 checkpoint lacks calibrated q/prob scales for ROCm FP8 attention.

Also add AMD-recommended, numerically-inert runtime knobs to the AMD override: TORCH_BLAS_PREFER_HIPBLASLT=1 and GPU_MAX_HW_QUEUES=2.
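A minimal sketch of how the four settings described above could be applied from a launch script. The helper name `apply_m3_rocm_env` is hypothetical and not part of the recipe; the sketch assumes that a value already exported by the caller should win over the override's default.

```shell
# Hypothetical helper, not part of the recipe: apply the AMD override values
# from this PR, keeping any non-empty value the caller has already exported.
apply_m3_rocm_env() {
  export VLLM_ROCM_USE_AITER="${VLLM_ROCM_USE_AITER:-1}"                  # AITER master toggle
  export VLLM_ROCM_USE_AITER_MHA="${VLLM_ROCM_USE_AITER_MHA:-0}"          # keep attention off AITER MHA
  export TORCH_BLAS_PREFER_HIPBLASLT="${TORCH_BLAS_PREFER_HIPBLASLT:-1}"  # runtime knob from this PR
  export GPU_MAX_HW_QUEUES="${GPU_MAX_HW_QUEUES:-2}"                      # runtime knob from this PR
}

apply_m3_rocm_env
```

Using `${VAR:-default}` rather than a plain `export VAR=value` is a design choice of this sketch only; the recipe itself sets the values unconditionally.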

NCCL_MIN_NCHANNELS=112 is not set for all AMD hardware; instead, the guide documents it as a gfx942-only RCCL tuning (the gfx942 default is ~32-64). gfx950 already defaults to 112 channels for an 8-GPU node, and setting the variable explicitly bypasses RCCL's adaptive channel-tuning model (RCCL 2.26.6 / ROCm 7.0).
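The gfx942-only rule could be sketched as a small guard in a launch script. The function name `set_rccl_channels` is hypothetical, and the detection step assumes `rocminfo` is on PATH and prints the gfx target name; neither is part of the recipe.

```shell
# Hypothetical helper: apply the RCCL channel floor only on gfx942.
# The arch name is passed in explicitly so the logic can be exercised without a GPU.
set_rccl_channels() {
  case "$1" in
    gfx942*)
      export NCCL_MIN_NCHANNELS=112
      ;;
    *)
      # gfx950 and anything else: leave RCCL's adaptive channel tuning untouched.
      :
      ;;
  esac
}

# Assumption: rocminfo is installed and its output contains the gfx target name.
if command -v rocminfo >/dev/null 2>&1; then
  detected_arch="$(rocminfo | grep -o 'gfx[0-9a-f]*' | head -n1)"
  set_rccl_channels "$detected_arch"
fi
```

On a host without `rocminfo` the guard does nothing, which matches the PR's choice of leaving the variable unset by default.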

Measured +5.6% to +10.8% total tok/s/gpu on 8xMI300X (MXFP8, 1k1k random, concurrency 4 to 256); GSM8K exact-match holds at ~0.95.

Co-authored-by: Gong Zheng <zgong@amd.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: JohnQinAMD <yanyuan.qin@amd.com>

vercel · 2026-06-16T16:14:53Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
vllm-recipes	Ready	Preview, Comment	Jun 16, 2026 4:16pm

gemini-code-assist

Code Review

This pull request updates the hardware overrides and guide documentation for MiniMax-M3 on AMD ROCm, enabling AITER kernels and adding recommended runtime knobs such as TORCH_BLAS_PREFER_HIPBLASLT and GPU_MAX_HW_QUEUES. Feedback suggests correcting the comment for GPU_MAX_HW_QUEUES to refer to hardware queues rather than HIP streams to avoid technical inaccuracy.

gemini-code-assist · 2026-06-16T16:16:57Z

+  export VLLM_ROCM_USE_AITER=1            # AITER kernels: hot decode GEMMs + fused MoE
+  export VLLM_ROCM_USE_AITER_MHA=0        # keep MSA attention on TRITON_ATTN (MXFP8 lacks calibrated ROCm FP8 attn scales)
+  export TORCH_BLAS_PREFER_HIPBLASLT=1
+  export GPU_MAX_HW_QUEUES=2              # cap HIP streams below the default of 4


The environment variable GPU_MAX_HW_QUEUES limits the maximum number of hardware queues (HSA/AQL queues) allocated per process on the GPU, rather than capping HIP streams (which are software-level constructs multiplexed onto these hardware queues). Updating the comment to refer to hardware queues avoids technical confusion.

export GPU_MAX_HW_QUEUES=2 # cap hardware queues below the default of 4

esmeetu · 2026-06-24T03:13:25Z

@hongxiayang Can you help review this?


JohnQinAMD mentioned this pull request Jun 16, 2026

[AMD] minimaxm3-fp8-mi300x-vllm: enable AITER kernels + safe ROCm knobs SemiAnalysisAI/InferenceX#1804

Closed

JohnQinAMD wants to merge 1 commit into vllm-project:main from JohnQinAMD:minimaxm3-amd-aiter-env
