Skip to content

[WIP] Initial glm 5.2 support on amd#528

Draft
borontion wants to merge 12 commits into
mainfrom
borontion/amd-glm-support
Draft

[WIP] Initial glm 5.2 support on amd#528
borontion wants to merge 12 commits into
mainfrom
borontion/amd-glm-support

Conversation

@borontion

@borontion borontion commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Summary

Early draft for GLM 5.2 support on AMD MI350 with Triton kernels.

Test Plan

tokenspeed serve zai-org/GLM-5.2-FP8 \
  --served-model-name glm-5.2 \
  --trust-remote-code \
  --tensor-parallel-size 8 \
  --enable-expert-parallel \
  --moe-backend triton \
  --kv-cache-dtype fp8 \
  --max-model-len 262144 \
  --chunked-prefill-size 8192 \
  --max-num-seqs 128 \
  --host 0.0.0.0 \
  --port 8000

Signed-off-by: Pengzhan Zhao <borontion@gmail.com>
@borontion borontion changed the title [WIP] Initial glm support on amd [WIP] Initial glm 5.2 support on amd Jun 26, 2026
borontion added 11 commits June 26, 2026 10:15
Signed-off-by: Pengzhan Zhao <borontion@gmail.com>
Signed-off-by: Pengzhan Zhao <borontion@gmail.com>
Signed-off-by: Pengzhan Zhao <borontion@gmail.com>
Signed-off-by: Pengzhan Zhao <borontion@gmail.com>
Signed-off-by: Pengzhan Zhao <borontion@gmail.com>
Signed-off-by: Pengzhan Zhao <borontion@gmail.com>
Signed-off-by: Pengzhan Zhao <borontion@gmail.com>
Signed-off-by: Pengzhan Zhao <borontion@gmail.com>
Signed-off-by: Pengzhan Zhao <borontion@gmail.com>
Signed-off-by: Pengzhan Zhao <borontion@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant