[ROCm] update MiniMax-M3 bf16 recipe on docker image and fp8_per_chanel quantization notes by hongxiayang · Pull Request #598 · vllm-project/recipes

hongxiayang · 2026-06-29T17:10:07Z

For the default MiniMax-M3 BF16 model:

(1) Use the nightly Docker image.
(2) Add a section on fp8_per_channel quantization support and the extra environment variables needed for performance.

…nel quantization notes Signed-off-by: Hongxia Yang <hongxia.yang@amd.com>

vercel · 2026-06-29T17:10:13Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
vllm-recipes	Ready	Preview, Comment	Jun 29, 2026 5:11pm

gemini-code-assist

Code Review

This pull request updates the MiniMax-M3 model configuration to use the nightly AMD ROCm Docker image and adds a new documentation section for running the model with TP8 and per-channel FP8 quantization. The review feedback suggests improving the markdown formatting by splitting a multi-line inline code block into separate blocks and refining the sentence structure, capitalization, and trailing spaces in the quantization description.

gemini-code-assist · 2026-06-29T17:11:04Z

+  Add the vision-encoder flags (`--mm-encoder-tp-mode data
+  --mm-encoder-attn-backend ROCM_AITER_FA`) for multimodal serving.


The inline code block (using backticks) is split across two lines. This can cause markdown parsers to render the newline and leading spaces literally inside the code block, making it difficult to read and copy. It is better to wrap each flag in its own inline code block.

Add the vision-encoder flags (`--mm-encoder-tp-mode data` and `--mm-encoder-attn-backend ROCM_AITER_FA`) for multimodal serving.
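
A check like the one below could catch this pattern before review. It is a minimal sketch, not part of the PR: the function name `find_multiline_code_spans` is hypothetical, and the regex deliberately ignores fenced code blocks and multi-backtick spans, so it can misfire on more complex markdown.

```python
import re

# Single-backtick inline code span whose content contains a newline.
# Simplification (assumption): fenced blocks and multi-backtick spans
# are not handled.
MULTILINE_SPAN = re.compile(r"(?<!`)`(?!`)([^`]*\n[^`]*)`(?!`)")


def find_multiline_code_spans(markdown: str) -> list[str]:
    """Return the contents of inline code spans that cross a line break."""
    return [m.group(1) for m in MULTILINE_SPAN.finditer(markdown)]


# The text as submitted in the diff: one span split over two lines.
before = (
    "Add the vision-encoder flags (`--mm-encoder-tp-mode data\n"
    "  --mm-encoder-attn-backend ROCM_AITER_FA`) for multimodal serving."
)

# The reviewer's suggestion: one span per flag, no line break inside.
after = (
    "Add the vision-encoder flags (`--mm-encoder-tp-mode data` and "
    "`--mm-encoder-attn-backend ROCM_AITER_FA`) for multimodal serving."
)

for label, text in (("before", before), ("after", after)):
    print(label, len(find_multiline_code_spans(text)))
```

The submitted text yields one flagged span and the suggested rewrite yields none.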

gemini-code-assist · 2026-06-29T17:11:04Z

+  Online **per-channel FP8 (PTPC)** quantization of the BF16 checkpoint halves the
+  weight footprint (≈100 → **≈50 GiB/GPU**, ~1.75× more KV cache) and lifts
+  batched throughput with **gsm8k unchanged from BF16** (lossless); 
+  These env vars and the flag are for the **BF16 checkpoint only**.


The sentence structure here is slightly awkward due to the semicolon at the end of line 284 followed by a capitalized sentence on line 285. Additionally, there is a trailing space on line 284, and "gsm8k" should be capitalized as "GSM8K" to match the style used elsewhere in the repository.

Online **per-channel FP8 (PTPC)** quantization of the BF16 checkpoint halves the weight footprint (≈100 → **≈50 GiB/GPU**, ~1.75× more KV cache) and lifts batched throughput with **GSM8K unchanged from BF16** (lossless). These env vars and the flag are for the **BF16 checkpoint only**.
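
As a rough sanity check on the quoted figures, the sketch below asks what per-GPU memory budget would make "≈100 → ≈50 GiB of weights" consistent with "~1.75× more KV cache". It assumes, purely for illustration, that the KV cache receives everything left over after the weights (activations, graphs, and other overheads are ignored); the helper name `implied_budget_gib` is hypothetical and the result is not a measured value from the PR.

```python
def implied_budget_gib(w_before: float, w_after: float, kv_ratio: float) -> float:
    """Solve (B - w_after) / (B - w_before) = kv_ratio for the budget B.

    B is the per-GPU memory shared by weights and KV cache under the
    simplifying assumption that nothing else competes for it.
    """
    return (kv_ratio * w_before - w_after) / (kv_ratio - 1.0)


# Figures quoted in the recipe text (approximate, per GPU).
weights_bf16_gib = 100.0
weights_fp8_gib = 50.0
kv_ratio = 1.75

budget = implied_budget_gib(weights_bf16_gib, weights_fp8_gib, kv_ratio)
kv_bf16 = budget - weights_bf16_gib  # KV cache space with BF16 weights
kv_fp8 = budget - weights_fp8_gib    # KV cache space with FP8 weights

print(f"implied budget: {budget:.1f} GiB")
print(f"KV cache: {kv_bf16:.1f} GiB -> {kv_fp8:.1f} GiB")
```

Under that assumption the implied budget is about 167 GiB per GPU, so the two figures in the sentence are at least mutually consistent for a GPU with that much usable memory.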

hongxiayang · 2026-06-29T21:04:58Z

cc @functionstackx

[ROCm] update MiniMax-M3 bf16 recipe on docker image and fp8_per_chan…

15eee50

…nel quantization notes Signed-off-by: Hongxia Yang <hongxia.yang@amd.com>

gemini-code-assist Bot reviewed Jun 29, 2026

vercel Bot deployed to Preview June 29, 2026 17:11 View deployment

hongxiayang wants to merge 1 commit into vllm-project:main from hongxiayang:amd-bf16-minimax-m3-override
