Add GLM-5.2 MXFP4 recipe support by xiao-llm · Pull Request #583 · vllm-project/recipes

xiao-llm · 2026-06-26T15:37:36Z

Summary

Add GLM-5.2 MXFP4 as a recipe variant alongside the existing FP8 and BF16 options.
Configure the MXFP4 variant for the AMD Quark checkpoint with --quantization quark.
Restrict the MXFP4 variant to supported MI355X hardware, following the MiniMax-M3 MXFP4 recipe pattern.
Set the MXFP4 generated command to the validated TP8 path and include the MI355X-specific --trust-remote-code override.
Add ROCm/AITER serving guidance for GLM-5.2 MXFP4, including recommended MI355X launch flags and KV-cache settings.
Document MTP usage caveats for MXFP4, including the need for a vLLM build that honors Quark exclude entries for the unquantized MTP layer.

Motivation

GLM-5.2 has an AMD Quark MXFP4 checkpoint that significantly reduces HBM footprint compared with the FP8 and BF16 variants. This recipe update makes that checkpoint selectable in the recipe UI while preventing unsupported hardware combinations from being generated.

The MTP note is tied to the corresponding vLLM Quark loading fix: vLLM PR #46757.

Test Plan

Verified the GLM-5.2 recipe YAML parses successfully.
Compared the MXFP4 hardware gating pattern with models/MiniMaxAI/MiniMax-M3.yaml.
Confirmed the generated diff updates the GLM-5.2 recipe metadata, variant config, and guide text only.
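
The hardware gating described in the summary can be illustrated with a small check. This is a minimal sketch, not the recipe repository's actual schema: the `supported_hardware` field, the `VARIANTS` table, and both helper functions are hypothetical names chosen for illustration, and treating FP8 and BF16 as unrestricted is an assumption.

```python
# Hypothetical, simplified view of per-variant hardware gating.
# The real recipe YAML may use different field names and values.
VARIANTS = {
    "bf16": {"supported_hardware": None},  # None means "no restriction" in this sketch
    "fp8": {"supported_hardware": None},
    "mxfp4": {"supported_hardware": ["mi355x"]},  # MXFP4 is limited to MI355X per the PR
}


def is_selectable(variant_name, hardware):
    """Return True if the variant may be generated for the given hardware."""
    allowed = VARIANTS[variant_name]["supported_hardware"]
    return allowed is None or hardware.lower() in allowed


def selectable_variants(hardware):
    """List the variants a UI could offer for the given hardware."""
    return [name for name in VARIANTS if is_selectable(name, hardware)]


print(selectable_variants("MI355X"))
print(selectable_variants("H100"))
```

Under these assumptions, the MXFP4 variant is offered only when the selected hardware is MI355X, which is the behaviour the summary describes as preventing unsupported combinations.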

Co-authored-by: Cursor <cursoragent@cursor.com>

gemini-code-assist

Code Review

This pull request introduces support for the AMD Quark MXFP4 quantized variant of the GLM-5.2 model, optimized for AMD MI350 and MI355X GPUs. It adds the mxfp4 variant configuration to the model YAML and updates the documentation with prerequisites, environment variables, and run commands for utilizing ROCm AITER kernels. The feedback suggests updating the mxfp4 variant metadata to include --trust-remote-code in extra_args and the required environment variables in extra_env to ensure consistency with the guide and support automated deployment tools.

gemini-code-assist · 2026-06-26T15:38:29Z

+  mxfp4:
+    model_id: "amd/GLM-5.2-MXFP4"
+    precision: mxfp4
+    vram_minimum_gb: 446
+    description: "AMD Quark MXFP4 checkpoint — MoE weights quantized for MI350 / MI355X"
+    extra_args:
+      - "--quantization"
+      - "quark"


To ensure consistency with the launch instructions in the guide and to support automated deployment tools that parse this recipe, the mxfp4 variant metadata should include --trust-remote-code in extra_args and the required environment variables (VLLM_ROCM_USE_AITER_FP8BMM: "0" and VLLM_ROCM_USE_AITER_FP4BMM: "0") in extra_env.

  mxfp4:
    model_id: "amd/GLM-5.2-MXFP4"
    precision: mxfp4
    vram_minimum_gb: 446
    description: "AMD Quark MXFP4 checkpoint — MoE weights quantized for MI350 / MI355X"
    extra_args:
      - "--quantization"
      - "quark"
      - "--trust-remote-code"
    extra_env:
      VLLM_ROCM_USE_AITER_FP8BMM: "0"
      VLLM_ROCM_USE_AITER_FP4BMM: "0"

Setting them to 0 keeps serving on the known-good kernels and avoids accidental dispatch into experimental/unsupported FP8/FP4 BMM implementations.
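
To make the point about automated deployment tools concrete, here is a minimal sketch of how such a tool might turn the suggested variant metadata into a launch command. The `variant` dict and the `build_command` helper are hypothetical and are not part of the recipes repository; `--tensor-parallel-size 8` is assumed from the PR's reference to the validated TP8 path.

```python
import shlex

# Hypothetical in-memory copy of the mxfp4 variant metadata suggested in the review.
variant = {
    "model_id": "amd/GLM-5.2-MXFP4",
    "extra_args": ["--quantization", "quark", "--trust-remote-code"],
    "extra_env": {
        "VLLM_ROCM_USE_AITER_FP8BMM": "0",
        "VLLM_ROCM_USE_AITER_FP4BMM": "0",
    },
}


def build_command(variant, tensor_parallel_size=8):
    """Assemble a shell command line from the variant's env vars and args."""
    # Environment variables become KEY=value prefixes on the command line.
    env = [f"{key}={shlex.quote(value)}"
           for key, value in variant.get("extra_env", {}).items()]
    args = [
        "vllm", "serve", variant["model_id"],
        "--tensor-parallel-size", str(tensor_parallel_size),
        *variant.get("extra_args", []),
    ]
    return " ".join(env + [shlex.join(args)])


command = build_command(variant)
print(command)
```

If `extra_args` or `extra_env` were missing from the metadata, a tool like this would silently emit a command without `--trust-remote-code` or the AITER overrides, which is the inconsistency with the guide that the review comment warns about.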

BowenBao · 2026-06-26T17:38:30Z

+  export VLLM_ROCM_USE_AITER_FP8BMM=0
+  export VLLM_ROCM_USE_AITER_FP4BMM=0


Related to Gemini's comment: are these needed?

Will remove them once verified; it's an extra condition to keep things safe.

Add GLM-5.2 MXFP4 recipe support

7dd47fa

Co-authored-by: Cursor <cursoragent@cursor.com>

gemini-code-assist Bot reviewed Jun 26, 2026

BowenBao reviewed Jun 26, 2026

Comment thread models/zai-org/GLM-5.2.yaml Outdated

BowenBao reviewed Jun 26, 2026

Restrict GLM-5.2 MXFP4 hardware support

8f0ba82

Follow the MiniMax-M3 MXFP4 recipe pattern by allowing the GLM-5.2 MXFP4 variant only on MI355X hardware and keeping the generated command on the validated TP8 Quark path.

Reference GLM-5.2 MXFP4 MTP fix

9ae84eb

Link the related vLLM Quark MTP loading fix from the GLM-5.2 recipe guide so users know which vLLM change is required before enabling MXFP4 speculative decoding.

xiao-llm wants to merge 3 commits into vllm-project:main from xiao-llm:main
