[AMD] Update MiniMax-M3 FP8 MI355X vLLM image and enable INT6 quick-reduce #1946

Merged
cquil11 merged 7 commits into main from chun_hongxia/minimaxm3_fp8
Jun 30, 2026

@chunfangamd

Collaborator

Update MiniMax-M3 FP8 MI355X vLLM image and enable INT6 quick-reduce

Pin the minimaxm3-fp8-mi355x-vllm config to nightly
3f5a1e1733200760169ff31ebe60a271072b199e, which includes the gfx950
mxfp8 moe/linear tuning for MiniMax-M3 (vllm-project/vllm#45725).

Export VLLM_ROCM_QUICK_REDUCE_QUANTIZATION=INT6 in the standard and MTP
bench scripts to use INT6 quick all-reduce on CDNA4/gfx950, improving
TP communication throughput for the mxfp8 workload.
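
As a rough illustration of the second change, the export in a bench script could look like the sketch below. Only the variable name and its value come from the PR description; the echo line and the placement within the script are assumptions made for illustration.

```shell
# Sketch only: the surrounding script layout is assumed, not copied from the repo.
# Select INT6 quantization for the ROCm quick all-reduce path, as the PR describes.
export VLLM_ROCM_QUICK_REDUCE_QUANTIZATION=INT6

# Log the setting so it is visible in the benchmark output (illustrative addition).
echo "quick-reduce quantization: ${VLLM_ROCM_QUICK_REDUCE_QUANTIZATION}"
```

Exporting the variable before the server is launched means the child process inherits it.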

Co-authored with @hongxiayang

@github-actions

Contributor

Thanks for the contribution! For vLLM & SGLang, please ensure that your recipe is similar to the official vLLM recipes and/or the SGLang cookbook.

If it is not, please create a PR first before we can merge your single node PR into the master branch. Let's ensure that the documentation is first class such that the entire ML community can benefit from your hard work! Thank you

PR authors are responsible for ensuring that after merging, all GitHub Action jobs fully pass. A lot of the time, failures are just flakes and simply re-running the failed jobs will fix it. If re-running failed jobs is attempted, PR authors are responsible for ensuring it passes. See GitHub's docs on re-running failed jobs: https://docs.github.com/en/actions/how-tos/manage-workflow-runs/re-run-workflows-and-jobs#re-running-failed-jobs-in-a-workflow

As a rule of thumb, generally, PR authors should request a review & get a PR approval from the respective companies' CODEOWNERS before requesting a review from core maintainers.

If additional help is needed, PR authors can reach out to core maintainers over Slack.


Comment thread .github/configs/amd-master.yaml Outdated
# MXFP8 runs from TP=4 on gfx950; block size 128 is mandatory for MSA.
minimaxm3-fp8-mi355x-vllm:
-  image: vllm/vllm-openai-rocm:minimax-m3
+  image: vllm/vllm-openai-rocm:nightly-3f5a1e1733200760169ff31ebe60a271072b199e

Contributor

🔴 This PR bumps the minimaxm3-fp8-mi355x-vllm image and adds VLLM_ROCM_QUICK_REDUCE_QUANTIZATION=INT6 to both the non-MTP and MTP bench scripts, but does not append a perf-changelog.yaml entry — AGENTS.md (§Updating Docker images, lines 124-135) requires one for both kinds of change, and changelog entries are what trigger the benchmark sweep. Without an entry the new image+INT6 combination will land unbenchmarked, so the PR description's throughput claim cannot be validated. Append an entry under config-keys minimaxm3-fp8-mi355x-vllm (image pin + INT6) and minimaxm3-fp8-mi355x-vllm-mtp (the MTP script also gets the INT6 env var) — see #1941 (the directly analogous MTP image bump to the same nightly) for the precedent.

Extended reasoning...

What the bug is

AGENTS.md lines 124-135 (§Updating Docker images) state explicitly: "Update the image tag in the relevant .github/configs/*-master.yaml and/or benchmarks/*.sh, update any related env vars / config params, and append a perf-changelog.yaml entry (required - triggers benchmarks)". Line 58 of the same doc reiterates: "Changes to perf-changelog.yaml trigger benchmark runs".

This PR does both of the change classes the policy enumerates:

  1. Image bump in .github/configs/amd-master.yaml line 2528: vllm/vllm-openai-rocm:minimax-m3 -> vllm/vllm-openai-rocm:nightly-3f5a1e1733200760169ff31ebe60a271072b199e.
  2. New env var VLLM_ROCM_QUICK_REDUCE_QUANTIZATION=INT6 exported in both benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_mi355x.sh (line 34) and benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_mi355x_mtp.sh (line 64).

The PR diff modifies exactly three files (amd-master.yaml + the two .sh scripts); no perf-changelog.yaml entry is added.

Why this matters / impact

perf-changelog.yaml is the trigger for the sweep generator. Without an entry, this PR will not produce a benchmark run for the new image+INT6 combination, so the PR description's claim — "improving TP communication throughput for the mxfp8 workload" — lands unvalidated. That is precisely the failure mode the policy is designed to prevent.

Sibling-PR precedent

The tail of perf-changelog.yaml shows every recent sibling MiniMax-M3 PR followed this convention:

This PR is the missing twin to #1941 (it pins -vllm to the same nightly that #1941 pinned -vllm-mtp to), and additionally exports INT6 quick-reduce in both scripts — yet no changelog entry exists.

Step-by-step proof

  1. git diff for this PR returns three files: amd-master.yaml, minimaxm3_fp8_mi355x.sh, minimaxm3_fp8_mi355x_mtp.sh — no perf-changelog.yaml.
  2. Inspecting amd-master.yaml line 2528 confirms the image string change for the minimaxm3-fp8-mi355x-vllm config-key.
  3. grep -n VLLM_ROCM_QUICK_REDUCE_QUANTIZATION benchmarks/single_node/fixed_seq_len/minimaxm3_fp8_mi355x*.sh shows the env var exported at line 34 of the non-MTP script and line 64 of the MTP script.
  4. AGENTS.md lines 124-126 say a perf-changelog.yaml entry is required and triggers benchmarks; line 58 confirms the trigger mechanism.
  5. The last entry in perf-changelog.yaml is PR #1941 ("[codex] update MiniMax M3 FP8 MI355X vLLM MTP image"), the analogous image bump to the same nightly hash on the sibling MTP config. It is on the list of sibling MiniMax-M3 PRs that all appended entries.
  6. Therefore the new image+INT6 combination will not be swept on merge, and the PR-description throughput claim cannot be validated before landing.
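
The first step of the proof can be approximated with a small shell check. This is a sketch under assumptions, not the repository's actual tooling: the function name `needs_changelog` is hypothetical, perf-changelog.yaml is assumed to be matched by file name, and the path patterns are taken from the AGENTS.md wording quoted above.

```shell
# needs_changelog (hypothetical helper): reads changed file paths on stdin, one per line.
# Prints "missing" when a master config or bench script changed but no
# perf-changelog.yaml change is present; prints "ok" otherwise.
needs_changelog() {
    touched=0
    has_entry=0
    while IFS= read -r path; do
        case "$path" in
            perf-changelog.yaml|*/perf-changelog.yaml) has_entry=1 ;;
            # In a case pattern, * also matches slashes, so nested scripts match.
            .github/configs/*-master.yaml|benchmarks/*.sh) touched=1 ;;
        esac
    done
    if [ "$touched" -eq 1 ] && [ "$has_entry" -eq 0 ]; then
        echo "missing"
    else
        echo "ok"
    fi
}
```

One way to feed it is `git diff --name-only origin/main...HEAD | needs_changelog`, where `origin/main` as the base ref is also an assumption.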

Fix

Append an entry like the following (note the MTP script also picks up INT6, so the entry should cover both config-keys, or use a minimaxm3-fp8-mi355x-vllm* wildcard):

- config-keys:
    - minimaxm3-fp8-mi355x-vllm
    - minimaxm3-fp8-mi355x-vllm-mtp
  description:
    - "Pin minimaxm3-fp8-mi355x-vllm image to nightly-3f5a1e1733200760169ff31ebe60a271072b199e (includes gfx950 mxfp8 moe/linear tuning from vllm-project/vllm#45725)."
    - "Export VLLM_ROCM_QUICK_REDUCE_QUANTIZATION=INT6 in the standard and MTP bench scripts to use INT6 quick all-reduce on CDNA4/gfx950."
  pr-link: https://github.com/SemiAnalysisAI/InferenceX/pull/1946

@functionstackx

Collaborator

@chunfangamd @hongxiayang conc 512 for 1k1k seems to be failing. I am going to re-run it and see if it is a flake; if not, I will just remove it. https://github.com/SemiAnalysisAI/InferenceX/actions/runs/28417724900?pr=1946

chunfangamd and others added 7 commits June 30, 2026 03:29
Pin minimaxm3-fp8-mi355x-vllm{,-mtp} to nightly-4559c43a, which bakes in
fused shared-experts MoE (vllm-project/vllm#46545) and the AITER flydsl
MoE backend (#46184).
Align both bench scripts with vllm-project/recipes#581 by exporting
VLLM_ROCM_USE_AITER=1 and VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS=1
alongside the existing INT6 quick-reduce; no --moe-backend override, so
AITER is auto-selected.
Signed-off-by: Hongxia Yang <hongxia.yang@amd.com>

Set VLLM_ROCM_USE_AITER on only for expert-parallel (EP/DP-attention)
runs, where the AITER fused MoE is the auto-selected backend. TP-only
runs leave it off and use the native MXFP8 path (the master switch
otherwise produces degenerate MiniMax-M3 output).

Keep VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS=1 unconditional: the
router-append shared-experts fusion checks the env directly (independent
of the master switch) and self-disables under EP inside the model.

Co-authored-by: Claude <noreply@anthropic.com>
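
The gating described in this commit message could look roughly like the sketch below. The function name `configure_aiter` and its two arguments (expert-parallel size and a DP-attention flag) are hypothetical and not taken from the real bench scripts; only the two environment variable names and the on/off policy come from the commit message.

```shell
# configure_aiter EP_SIZE DP_ATTENTION (hypothetical helper and argument names).
# Turns the AITER master switch on only for expert-parallel runs, and always
# enables the shared-experts fusion flag, as the commit message describes.
configure_aiter() {
    ep_size="$1"
    dp_attention="$2"
    if [ "$ep_size" -gt 1 ] || [ "$dp_attention" = "true" ]; then
        # EP / DP-attention run: AITER fused MoE is the intended backend.
        export VLLM_ROCM_USE_AITER=1
    fi
    # TP-only runs fall through without touching the master switch.
    export VLLM_ROCM_USE_AITER_FUSION_SHARED_EXPERTS=1
}

# Example call with assumed defaults for a TP-only run.
configure_aiter "${EP_SIZE:-1}" "${DP_ATTENTION:-false}"
```

The sketch never unsets the master switch, so a value already present in the caller's environment is left as it is.
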
The minimaxm3-fp8-mi355x-vllm-mtp tp=4 ep=4 (dp-attn=false) 1k1k point was
failing at concurrency 512; lower conc-end 512 -> 256 so it sweeps 128/256.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
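
To make the effect of lowering conc-end concrete, here is a minimal sketch of a doubling concurrency sweep. The doubling step and the start value of 128 are assumptions inferred from the "sweeps 128/256" wording; the real sweep generator may enumerate points differently, and `sweep_concurrency` is a hypothetical name.

```shell
# sweep_concurrency START END (hypothetical helper): print START, 2*START, 4*START, ...
# for every value that does not exceed END.
sweep_concurrency() {
    conc="$1"
    end="$2"
    while [ "$conc" -le "$end" ]; do
        echo "$conc"
        conc=$((conc * 2))
    done
}

# With the old bound the failing 512 point is included; with the new bound it is not.
sweep_concurrency 128 512
sweep_concurrency 128 256
```
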
functionstackx force-pushed the chun_hongxia/minimaxm3_fp8 branch from 90decb3 to a06a499 on June 30, 2026 07:31
@billishyahao billishyahao left a comment

Collaborator

As a PR reviewer and CODEOWNER, I have reviewed this and have:

Signed: billishyahao

@Klaud-Cold

Collaborator

@billishyahao Blocked: the sign-off carries no recipe link (Check 3a). Checks 0-2 pass.

  • Check 0 — PASS: signer is a CODEOWNER for .github/configs/amd-master.yaml; the bench scripts + perf-changelog.yaml are catch-all (* @InferenceX/core) and covered.
  • Check 1 — PASS: green, non-skipped single-node 1k1k/8k1k and per-config eval / check-runs on in-PR head commit a06a499: https://github.com/SemiAnalysisAI/InferenceX/actions/runs/28427946510
  • Check 2 — PASS: 4 gsm8k evals, em_strict 0.957-0.965 (above bar), image nightly-4559c43a9526597c00cbcc4f59979496500268d1 matching the PR config.
  • Check 3 — FAIL: no link to a vllm-project/recipes PR/commit or published recipe page in the sign-off's detail section; the "Single-node recipes similar to official vLLM/SGLang recipes:" line is unchecked and empty. The PR diff/changelog reference vllm-project/recipes#581, but per the merge standard the link must be in the sign-off itself. Add the recipe link and re-sign.

@chunfangamd

Collaborator Author

@cquil11 The corresponding vllm-project/recipes PR: vllm-project/recipes#581

@cquil11

cquil11 commented Jun 30, 2026

Collaborator

@claude open a PR as me that adds an explicit check to the PR verifier workflow that checks to make sure the /swee with reuse command is explicitly commented before PASSing

@Klaud-Cold

Klaud-Cold commented Jun 30, 2026

Collaborator

Claude finished @cquil11's task in 2m 19s


Done — opened PR #1959

Added an explicit Check 4 to the CODEOWNER sign-off verifier (.github/workflows/codeowner-signoff-verify.yml) that requires a /reuse-sweep-run command to be on record before the gate can PASS.

  • Located the PR verifier workflow (codeowner-signoff-verify.yml) and its PASS logic
  • Confirmed the reuse-command convention (/reuse-sweep-run, per .github/workflows/README.md, section "Reusing an Approved PR Full Sweep"; it's the input utils/merge_with_reuse.sh consumes)
  • Implemented the check
  • Opened PR #1959 as you (co-authored)

What the check does

"## Check 4 — Reuse-sweep command explicitly posted" lists the PR's conversation comments via gh api repos/<repo>/issues/<pr>/comments and PASSes only if a comment beginning with /reuse-sweep-run (bare, or pinning a <run_id>) exists and its author's author_association is OWNER, MEMBER, or COLLABORATOR, the same authorization the reuse path itself enforces. Otherwise it FAILs with the root issue: "No authorized /reuse-sweep-run command has been posted on this PR."

The verdict now requires Checks 0, 1, 2, 3, and 4 to all pass (five PASS rows instead of four), and the header doc block lists the new check. YAML validated locally.
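
The decision logic of such a check can be sketched in plain shell. This is an illustration under assumptions and not the code in codeowner-signoff-verify.yml: the function name `has_reuse_command` and the tab-separated input format are hypothetical, and the sketch is stricter than "beginning with" in that it requires the command to stand alone or be followed by a space or an escaped newline.

```shell
# has_reuse_command (hypothetical helper): reads lines of the form
# "<author_association><TAB><comment body>" on stdin and prints PASS if an
# authorized author posted /reuse-sweep-run (bare or with arguments), else FAIL.
has_reuse_command() {
    tab=$(printf '\t')
    verdict="FAIL"
    while IFS="$tab" read -r assoc body; do
        # Only these associations are authorized, per the description above.
        case "$assoc" in
            OWNER|MEMBER|COLLABORATOR) ;;
            *) continue ;;
        esac
        # Match the bare command, or the command followed by a space or by an
        # escaped newline (as produced by jq's @tsv for multi-line bodies).
        case "$body" in
            /reuse-sweep-run|'/reuse-sweep-run '*|'/reuse-sweep-run\n'*) verdict="PASS" ;;
        esac
    done
    echo "$verdict"
}
```

Assuming the GitHub CLI is installed and authenticated, the input could be produced with `gh api "repos/<repo>/issues/<pr>/comments" --paginate --jq '.[] | [.author_association, .body] | @tsv' | has_reuse_command`.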

Note on interpretation

I read "the /swee with reuse command" as /reuse-sweep-run — the slash command an authorized maintainer comments to mark a green sweep for reuse-merge (vs. /sweep, which kicks off a fresh sweep). If you actually meant something else (e.g. a literal /sweep ... --reuse flag), let me know and I'll adjust.
· branch cquil11/signoff-verify-require-reuse-command

@cquil11

cquil11 commented Jun 30, 2026

Collaborator

/reuse-sweep-run

@cquil11 cquil11 left a comment

Collaborator

LGTM. Merging.

@cquil11 cquil11 merged commit c512990 into main Jun 30, 2026
83 checks passed
@cquil11 cquil11 deleted the chun_hongxia/minimaxm3_fp8 branch June 30, 2026 14:42
7 participants