[AMD] Update MiniMax-M3 FP8 MI355X vLLM image and enable INT6 quick-reduce #1946

Merged

cquil11 merged 7 commits into main from chun_hongxia/minimaxm3_fp8

Jun 30, 2026

chunfangamd commented Jun 26, 2026

Collaborator

Update MiniMax-M3 FP8 MI355X vLLM image and enable INT6 quick-reduce

Pin the minimaxm3-fp8-mi355x-vllm config to nightly
3f5a1e1733200760169ff31ebe60a271072b199e, which includes the gfx950
mxfp8 moe/linear tuning for MiniMax-M3 (vllm-project/vllm#45725).

Export VLLM_ROCM_QUICK_REDUCE_QUANTIZATION=INT6 in the standard and MTP
bench scripts to use INT6 quick all-reduce on CDNA4/gfx950, improving
TP communication throughput for the mxfp8 workload.

Co-authored with @hongxiayang

chunfangamd requested a review from a team

June 26, 2026 20:15

github-project-automation Bot added this to InferenceMAX Board

chunfangamd requested review from 1am9trash, billishyahao, seungrokj and yctseng0211 as code owners

June 26, 2026 20:15

github-actions Bot commented Jun 26, 2026

Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipes are similar to the official vLLM recipes and/or the SGLang cookbook.

If they are not, please create a PR there first, before we merge your single-node PR into the master branch. Let's ensure that the documentation is first class so that the entire ML community can benefit from your hard work! Thank you.

PR authors are responsible for ensuring that all GitHub Action jobs fully pass after merging. Failures are often just flakes, and re-running the failed jobs will fix them. PR authors who re-run failed jobs are responsible for ensuring that they pass. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, PR authors should request a review and get a PR approval from their respective company's CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.

chunfangamd added the full-sweep-enabled label

claude Bot reviewed

.github/configs/amd-master.yaml Outdated

    
          # MXFP8 runs from TP=4 on gfx950; block size 128 is mandatory for MSA.
          minimaxm3-fp8-mi355x-vllm:
    -       image: vllm/vllm-openai-rocm:minimax-m3
    +       image: vllm/vllm-openai-rocm:nightly-3f5a1e1733200760169ff31ebe60a271072b199e

claude Bot Jun 26, 2026

Contributor

🔴 This PR bumps the minimaxm3-fp8-mi355x-vllm image and adds VLLM_ROCM_QUICK_REDUCE_QUANTIZATION=INT6 to both the non-MTP and MTP bench scripts, but does not append a perf-changelog.yaml entry — AGENTS.md (§Updating Docker images, lines 124-135) requires one for both kinds of change, and changelog entries are what trigger the benchmark sweep. Without an entry the new image+INT6 combination will land unbenchmarked, so the PR description's throughput claim cannot be validated. Append an entry under config-keys minimaxm3-fp8-mi355x-vllm (image pin + INT6) and minimaxm3-fp8-mi355x-vllm-mtp (the MTP script also gets the INT6 env var) — see #1941 (the directly analogous MTP image bump to the same nightly) for the precedent.

Extended reasoning...

What the bug is

AGENTS.md lines 124-135 (§Updating Docker images) state explicitly: "Update the image tag in the relevant .github/configs/*-master.yaml and/or benchmarks/*.sh, update any related env vars / config params, and append a perf-changelog.yaml entry (required - triggers benchmarks)". Line 58 of the same doc reiterates: "Changes to perf-changelog.yaml trigger benchmark runs".

This PR does both of the change classes the policy enumerates:

- Image bump in .github/configs/amd-master.yaml line 2528: vllm/vllm-openai-rocm:minimax-m3 → vllm/vllm-openai-rocm:nightly-3f5a1e1733200760169ff31ebe60a271072b199e.
- New env var VLLM_ROCM_QUICK_REDUCE_QUANTIZATION=INT6 exported in both benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_mi355x.sh (line 34) and benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_mi355x_mtp.sh (line 64).

The PR diff modifies exactly three files (amd-master.yaml + the two .sh scripts); no perf-changelog.yaml entry is added.

Why this matters / impact

perf-changelog.yaml is the trigger for the sweep generator. Without an entry, this PR will not produce a benchmark run for the new image+INT6 combination, so the PR description's claim — "improving TP communication throughput for the mxfp8 workload" — lands unvalidated. That is precisely the failure mode the policy is designed to prevent.

Sibling-PR precedent

The tail of perf-changelog.yaml shows every recent sibling MiniMax-M3 PR followed this convention:

- [codex] update MiniMax M3 FP8 MI355X vLLM MTP image #1941 (the directly analogous MTP image bump to the same nightly 3f5a1e1733...): appended an entry under minimaxm3-fp8-mi355x-vllm-mtp.
- [codex] add MiniMax M3 FP4 MI355X vLLM benchmark #1935 (FP4 MI355X vLLM add): appended an entry.
- [codex] add MiniMax M3 FP4 MI355X vLLM MTP benchmark #1939 (FP4 MI355X vLLM MTP add): appended an entry.

This PR is the missing twin to #1941 (it pins -vllm to the same nightly that #1941 pinned -vllm-mtp to), and additionally exports INT6 quick-reduce in both scripts — yet no changelog entry exists.

Step-by-step proof

1. git diff for this PR returns three files: amd-master.yaml, minimaxm3_fp8_mi355x.sh, minimaxm3_fp8_mi355x_mtp.sh. There is no perf-changelog.yaml.
2. Inspecting amd-master.yaml line 2528 confirms the image string change for the minimaxm3-fp8-mi355x-vllm config-key.
3. grep -n VLLM_ROCM_QUICK_REDUCE_QUANTIZATION benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_mi355x*.sh shows the env var exported at line 34 of the non-MTP script and line 64 of the MTP script.
4. AGENTS.md lines 124-126 say a perf-changelog.yaml entry is required and triggers benchmarks; line 58 confirms the trigger mechanism.
5. The last entry in perf-changelog.yaml is PR [codex] update MiniMax M3 FP8 MI355X vLLM MTP image #1941, the analogous image bump to the same nightly hash on the sibling MTP config. It is on the list of sibling MiniMax-M3 PRs that all appended entries.
6. Therefore the new image+INT6 combination will not be swept on merge, and the PR-description throughput claim cannot be validated before landing.
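
The reasoning in this proof can be approximated mechanically. The following is a minimal Python sketch, not the repository's actual tooling: the function name missing_changelog_entry is hypothetical, the two path patterns are copied from the AGENTS.md text quoted earlier in this comment, and it assumes perf-changelog.yaml sits at the repository root.

```python
from fnmatch import fnmatchcase

# Path patterns quoted from the AGENTS.md policy text cited in this review.
TRIGGER_PATTERNS = (".github/configs/*-master.yaml", "benchmarks/*.sh")
# Assumption: the changelog lives at the repository root under this name.
CHANGELOG = "perf-changelog.yaml"


def missing_changelog_entry(changed_files):
    """Return True if the diff touches a config or bench script but not the changelog."""
    touches_trigger = any(
        fnmatchcase(path, pattern)  # '*' also matches '/' in fnmatch patterns
        for path in changed_files
        for pattern in TRIGGER_PATTERNS
    )
    return touches_trigger and CHANGELOG not in changed_files


# The three files this review says the PR diff contains.
pr_files = [
    ".github/configs/amd-master.yaml",
    "benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_mi355x.sh",
    "benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_mi355x_mtp.sh",
]
print(missing_changelog_entry(pr_files))
```

For the three files listed above the function reports a missing entry; adding perf-changelog.yaml to the list clears it. The sketch only checks that the file was touched, not that the new entry names the right config-keys.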

Fix

Append an entry like the following (note the MTP script also picks up INT6, so the entry should cover both config-keys, or use a minimaxm3-fp8-mi355x-vllm* wildcard):

- config-keys:
    - minimaxm3-fp8-mi355x-vllm
    - minimaxm3-fp8-mi355x-vllm-mtp
  description:
    - "Pin minimaxm3-fp8-mi355x-vllm image to nightly-3f5a1e1733200760169ff31ebe60a271072b199e (includes gfx950 mxfp8 moe/linear tuning from vllm-project/vllm#45725)."
    - "Export VLLM_ROCM_QUICK_REDUCE_QUANTIZATION=INT6 in the standard and MTP bench scripts to use INT6 quick all-reduce on CDNA4/gfx950."
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1946

github-actions Bot commented Jun 26, 2026

Contributor

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=28263143889
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=28263143889

github-actions Bot commented Jun 29, 2026

Contributor

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=28359274917
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=28359274917

github-actions Bot commented Jun 29, 2026

Contributor

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=28372433980
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=28372433980

github-actions Bot commented Jun 29, 2026

Contributor

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=28372574236
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=28372574236

github-actions Bot commented Jun 30, 2026

Contributor

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=28411738716
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=28411738716

seungrokj added the AMD label

functionstackx added full-sweep-enabled and removed full-sweep-enabled labels

github-actions Bot commented Jun 30, 2026

Contributor

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=28417724900
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=28417724900

functionstackx commented Jun 30, 2026

Collaborator

@chunfangamd @hongxiayang conc 512 for 1k1k seems to be failing. I'm going to re-run it and see if it is a flake; if not, I will just remove it. https://github.com/SemiAnalysisAI/InferenceX/actions/runs/28417724900?pr=1946

chunfangamd and others added 7 commits

June 30, 2026 03:29


          [AMD] Update MiniMax-M3 FP8 MI355X vLLM image and enable INT6 quick-reduce

fc9924d


          [AMD] Update changelog

97fc8d6


          [AMD] Retune MiniMax-M3 FP8 MI355X vLLM search space

5c4ab68


          [AMD] Bump MiniMax-M3 FP8 MI355X image and enable AITER fused experts

87d4da1

Pin minimaxm3-fp8-mi355x-vllm{,-mtp} to nightly-4559c43a, which bakes in
fused shared-experts MoE (vllm-project/vllm#46545) and the AITER flydsl
MoE backend (#46184).
Align both bench scripts with vllm-project/recipes#581 by exporting
VLLM_ROCM_USE_AITER=1 and VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS=1
alongside the existing INT6 quick-reduce; no --moe-backend override, so
AITER is auto-selected.


          aiter master flag and ep

70521c1

Signed-off-by: Hongxia Yang <hongxia.yang@amd.com>


          [AMD] Gate AITER master switch on EP for MiniMax-M3 MXFP8 recipes

5c8fa81

Set VLLM_ROCM_USE_AITER on only for expert-parallel (EP/DP-attention)
runs, where the AITER fused MoE is the auto-selected backend. TP-only
runs leave it off and use the native MXFP8 path (the master switch
otherwise produces degenerate MiniMax-M3 output).

Keep VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS=1 unconditional: the
router-append shared-experts fusion checks the env directly (independent
of the master switch) and self-disables under EP inside the model.

Co-authored-by: Claude <noreply@anthropic.com>
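
The gating rule in this commit message can be summarized as a small function. This is a minimal Python sketch for illustration only: the bench scripts themselves are shell, the function name bench_env and its expert_parallel parameter are hypothetical, and leaving VLLM_ROCM_USE_AITER unset for TP-only runs (rather than setting it to 0) is an assumption about what "leave it off" means.

```python
def bench_env(expert_parallel):
    """Build the ROCm env vars for a MiniMax-M3 MXFP8 run (illustrative only)."""
    env = {
        # Set for every run, per the earlier commits in this PR.
        "VLLM_ROCM_QUICK_REDUCE_QUANTIZATION": "INT6",
        # Unconditional: the commit message says this fusion checks the env
        # directly and self-disables under EP inside the model.
        "VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS": "1",
    }
    if expert_parallel:
        # Master switch only for EP/DP-attention runs.
        env["VLLM_ROCM_USE_AITER"] = "1"
    # Assumption: TP-only runs simply do not set VLLM_ROCM_USE_AITER.
    return env


print(bench_env(expert_parallel=True))
print(bench_env(expert_parallel=False))
```

Keeping the two AITER variables separate mirrors the commit's point that the shared-experts fusion is independent of the master switch.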


          [AMD] Drop conc=512 from FP8 MI355X vLLM MTP tp4/ep4 1k1k sweep

a06a499

The minimaxm3-fp8-mi355x-vllm-mtp tp=4 ep=4 (dp-attn=false) 1k1k point was
failing at concurrency 512; lower conc-end 512 -> 256 so it sweeps 128/256.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
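
To see why lowering conc-end from 512 to 256 leaves the 128/256 points, here is a minimal Python sketch. It assumes the sweep starts at 128 and doubles the concurrency each step, which is consistent with the "128/256" in the commit message but is not confirmed by it; conc_points is a hypothetical helper, not the repository's sweep generator.

```python
def conc_points(conc_start, conc_end):
    """List concurrency values from conc_start to conc_end, doubling each step."""
    points = []
    conc = conc_start
    while conc <= conc_end:
        points.append(conc)
        conc *= 2  # assumption: the sweep steps by powers of two
    return points


print(conc_points(128, 512))  # before the change: [128, 256, 512]
print(conc_points(128, 256))  # after the change:  [128, 256]
```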

functionstackx force-pushed the chun_hongxia/minimaxm3_fp8 branch from 90decb3 to a06a499 Compare

June 30, 2026 07:31

github-actions Bot commented Jun 30, 2026

Contributor

see unofficial run visualizer at https://inferencex.semianalysis.com/inference?unofficialRun=28427946510
see unofficial run visualizer at https://inferencex.semianalysis.com/evaluation?unofficialRun=28427946510

billishyahao approved these changes

billishyahao left a comment

Collaborator

As a PR reviewer and CODEOWNER, I have reviewed this and have:

Verified that this is the latest version of PR_REVIEW_CHECKLIST.md
Verified that the general code quality meets the InferenceX standard and does not make it worse.
Verified that this PR has passed PR validation: https://inferencex.semianalysis.com/inference?unofficialRun=28427946510
Verified that this PR passes evals: https://inferencex.semianalysis.com/evaluation?unofficialRun=28427946510
vLLM/SGLang submitted before ATOM: This is a vLLM update
Single-node recipes similar to official vLLM/SGLang recipes:
If any criteria cannot be satisfied, reasoning provided below.

Signed: billishyahao

Klaud-Cold commented Jun 30, 2026

Collaborator

@billishyahao Blocked: the sign-off carries no recipe link (Check 3a). Checks 0-2 pass.

Check 0 — PASS: signer is a CODEOWNER for .github/configs/amd-master.yaml; the bench scripts + perf-changelog.yaml are catch-all (* @InferenceX/core) and covered.
Check 1 — PASS: green, non-skipped single-node 1k1k/8k1k and per-config eval / check-runs on in-PR head commit a06a499 — https://github.com/SemiAnalysisAI/InferenceX/actions/runs/28427946510
Check 2 — PASS: 4 gsm8k evals, em_strict 0.957-0.965 (above bar), image nightly-4559c43a9526597c00cbcc4f59979496500268d1 matching the PR config.
Check 3 — FAIL: no link to a vllm-project/recipes PR/commit or published recipe page in the sign-off's detail section; the "Single-node recipes similar to official vLLM/SGLang recipes:" line is unchecked and empty. The PR diff/changelog reference vllm-project/recipes#581, but per the merge standard the link must be in the sign-off itself. Add the recipe link and re-sign.

chunfangamd commented Jun 30, 2026

Collaborator Author

@cquil11 The corresponding vllm-project/recipes PR: vllm-project/recipes#581

cquil11 commented Jun 30, 2026

Collaborator

@claude open a PR as me that adds an explicit check to the PR verifier workflow that checks to make sure the /swee with reuse command is explicitly commented before PASSing

Klaud-Cold commented Jun 30, 2026 (edited)

Collaborator

Claude finished @cquil11's task in 2m 19s

Done — opened PR #1959

Added an explicit Check 4 to the CODEOWNER sign-off verifier (.github/workflows/codeowner-signoff-verify.yml) that requires a /reuse-sweep-run command to be on record before the gate can PASS.

Located the PR verifier workflow (codeowner-signoff-verify.yml) and its PASS logic
Confirmed the reuse-command convention (/reuse-sweep-run, per .github/workflows/README.md → Reusing an Approved PR Full Sweep; it's the input utils/merge_with_reuse.sh consumes)
Implemented the check
Opened PR #1959 as you (co-authored)

What the check does

The step titled "## Check 4 — Reuse-sweep command explicitly posted" lists the PR's conversation comments via gh api repos/<repo>/issues/<pr>/comments. It PASSes only if a comment beginning with /reuse-sweep-run (bare, or pinning a <run_id>) exists and its author's author_association is OWNER, MEMBER, or COLLABORATOR, the same authorization the reuse path itself enforces. Otherwise it FAILs with the root issue: "No authorized /reuse-sweep-run command has been posted on this PR."
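
The pass condition can be sketched as follows. This is a hedged Python illustration, not the workflow's actual code: reuse_command_posted is a hypothetical name, the input is assumed to be the already-parsed JSON list returned by the comments endpoint (each item carrying body and author_association), and matching on the first whitespace-separated token is one possible reading of "beginning with".

```python
AUTHORIZED = {"OWNER", "MEMBER", "COLLABORATOR"}
COMMAND = "/reuse-sweep-run"


def reuse_command_posted(comments):
    """Return True if an authorized author posted the reuse command."""
    for comment in comments:
        tokens = (comment.get("body") or "").split()
        # Accept the bare command or the command followed by a run id.
        is_command = bool(tokens) and tokens[0] == COMMAND
        if is_command and comment.get("author_association") in AUTHORIZED:
            return True
    return False


# Hypothetical sample data shaped like GitHub issue-comment objects.
sample = [
    {"body": "LGTM", "author_association": "MEMBER"},
    {"body": "/reuse-sweep-run", "author_association": "NONE"},
    {"body": "/reuse-sweep-run 28427946510", "author_association": "COLLABORATOR"},
]
print(reuse_command_posted(sample))
```

Comparing the first token exactly avoids accepting a longer command that merely shares the prefix.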

The verdict now requires Checks 0, 1, 2, 3, and 4 to all pass (five PASS rows instead of four), and the header doc block lists the new check. YAML validated locally.

Note on interpretation

I read "the /swee with reuse command" as /reuse-sweep-run — the slash command an authorized maintainer comments to mark a green sweep for reuse-merge (vs. /sweep, which kicks off a fresh sweep). If you actually meant something else (e.g. a literal /sweep ... --reuse flag), let me know and I'll adjust.
· branch cquil11/signoff-verify-require-reuse-command

cquil11 commented Jun 30, 2026

Collaborator

/reuse-sweep-run

cquil11 approved these changes

cquil11 left a comment

Collaborator

LGTM. Merging.

cquil11 merged commit c512990 into main

83 checks passed

cquil11 deleted the chun_hongxia/minimaxm3_fp8 branch

June 30, 2026 14:42

github-project-automation Bot moved this to Done in InferenceMAX Board
