[ROCm] update MiniMax-M3 bf16 recipe on docker image and fp8_per_channel quantization notes #598

Open
hongxiayang wants to merge 1 commit into vllm-project:main from hongxiayang:amd-bf16-minimax-m3-override

@hongxiayang

For the default MiniMax-M3 BF16 model, this PR:

(1) switches the recipe to the nightly Docker image;
(2) adds a section on fp8_per_channel quantization support and the extra environment variables for performance.

…nel quantization notes

Signed-off-by: Hongxia Yang <hongxia.yang@amd.com>
@vercel

vercel Bot commented Jun 29, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| vllm-recipes | Ready | Preview, Comment | Jun 29, 2026 5:11pm |

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request updates the MiniMax-M3 model configuration to use the nightly AMD ROCm Docker image and adds a new documentation section for running the model with TP8 and per-channel FP8 quantization. The review feedback suggests improving the markdown formatting by splitting a multi-line inline code block into separate blocks and refining the sentence structure, capitalization, and trailing spaces in the quantization description.

Comment on lines +277 to +278
Add the vision-encoder flags (`--mm-encoder-tp-mode data
--mm-encoder-attn-backend ROCM_AITER_FA`) for multimodal serving.

medium

The inline code block (using backticks) is split across two lines. This can cause markdown parsers to render the newline and leading spaces literally inside the code block, making it difficult to read and copy. It is better to wrap each flag in its own inline code block.

  Add the vision-encoder flags (`--mm-encoder-tp-mode data` and
  `--mm-encoder-attn-backend ROCM_AITER_FA`) for multimodal serving.
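
The issue flagged above can be caught mechanically. The following is a minimal sketch, not part of the PR: the helper name `split_code_spans` is hypothetical, and the pattern only handles single-backtick spans (fenced blocks and multi-backtick spans are ignored for simplicity).

```python
import re

# Simplified pattern: text between a pair of single backticks.
# [^`] also matches newlines, so a span broken across lines is still captured.
INLINE_CODE = re.compile(r"`([^`]*)`")


def split_code_spans(markdown: str) -> list[str]:
    """Return the inline code spans whose content contains a line break."""
    return [
        match.group(1)
        for match in INLINE_CODE.finditer(markdown)
        if "\n" in match.group(1)
    ]


# The text as originally submitted: one span wrapped across two lines.
before = (
    "Add the vision-encoder flags (`--mm-encoder-tp-mode data\n"
    "--mm-encoder-attn-backend ROCM_AITER_FA`) for multimodal serving."
)

# The suggested fix: each flag in its own span.
after = (
    "Add the vision-encoder flags (`--mm-encoder-tp-mode data` and\n"
    "`--mm-encoder-attn-backend ROCM_AITER_FA`) for multimodal serving."
)

flagged_before = split_code_spans(before)
flagged_after = split_code_spans(after)
```

Run against the two versions, the check flags the single wrapped span in the original text and nothing in the suggested replacement.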

Comment on lines +282 to +285
Online **per-channel FP8 (PTPC)** quantization of the BF16 checkpoint halves the
weight footprint (≈100 → **≈50 GiB/GPU**, ~1.75× more KV cache) and lifts
batched throughput with **gsm8k unchanged from BF16** (lossless);
These env vars and the flag are for the **BF16 checkpoint only**.

medium

The sentence structure here is slightly awkward due to the semicolon at the end of line 284 followed by a capitalized sentence on line 285. Additionally, there is a trailing space on line 284, and "gsm8k" should be capitalized as "GSM8K" to match the style used elsewhere in the repository.

  Online **per-channel FP8 (PTPC)** quantization of the BF16 checkpoint halves the
  weight footprint (≈100 → **≈50 GiB/GPU**, ~1.75× more KV cache) and lifts
  batched throughput with **GSM8K unchanged from BF16** (lossless).
  These env vars and the flag are for the **BF16 checkpoint only**.
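
The relationship between the halved weight footprint and the KV-cache gain in the quoted text can be checked with simple arithmetic. This is a rough sketch under stated assumptions: it reads "~1.75× more KV cache" as "1.75 times as much", treats the KV cache as whatever remains of a fixed per-GPU memory budget after the weights, and ignores activations and other overheads. The function names and the budget itself are illustrative and do not come from the PR.

```python
def kv_cache_gain(
    budget_gib: float,
    bf16_weights_gib: float = 100.0,  # approximate per-GPU figure quoted above
    fp8_weights_gib: float = 50.0,    # approximate per-GPU figure quoted above
) -> float:
    """Ratio of KV-cache space after vs. before quantization, per GPU."""
    kv_before = budget_gib - bf16_weights_gib
    kv_after = budget_gib - fp8_weights_gib
    if kv_before <= 0:
        raise ValueError("budget must exceed the BF16 weight footprint")
    return kv_after / kv_before


def budget_for_gain(
    gain: float,
    bf16_weights_gib: float = 100.0,
    fp8_weights_gib: float = 50.0,
) -> float:
    """Solve (B - fp8) / (B - bf16) = gain for the per-GPU budget B."""
    if gain <= 1:
        raise ValueError("gain must be greater than 1")
    return (gain * bf16_weights_gib - fp8_weights_gib) / (gain - 1)


# Budget implied by the quoted ~1.75x figure, under the assumptions above.
implied_budget = budget_for_gain(1.75)
print(f"implied usable budget: {implied_budget:.1f} GiB/GPU")
print(f"gain at that budget:   {kv_cache_gain(implied_budget):.2f}x")
```

Under these assumptions the quoted numbers are mutually consistent when roughly 167 GiB per GPU is available to weights plus KV cache; a different budget would give a different ratio.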

@hongxiayang

cc @functionstackx
