Fix gathered MXFP4 activation scales in Gluon MoE by qedawkins · Pull Request #534 · lightseekorg/tokenspeed

qedawkins · 2026-06-26T15:38:33Z

When MXFP4 activations are gathered through routed MoE metadata, X block-scale rows must be gathered using the scale tile's own row layout instead of reusing the already-gathered activation data rows.

Keep W scales on the swizzled LDS path, but route gathered X scales through the direct load path so the scale indices match the gathered activation rows. This is scoped to MXFP4-X with gather and does not affect the existing FP8 activation path.
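The invariant behind this fix can be checked outside the kernel. The following is a minimal NumPy sketch of the reference semantics only, not the Gluon implementation: the helper `dequant`, the tensor shapes, and the use of plain exponents in place of packed e8m0 bytes are assumptions made for illustration. It shows that gathered activation rows only dequantize correctly when the scale rows are gathered with the same row indices.

```python
import numpy as np

BLOCK = 32  # MXFP4 shares one scale per 32 elements along K

def dequant(values, scale_exp):
    """Apply one power-of-two scale per 32-element block of each row."""
    scale = np.exp2(scale_exp.astype(np.float32))      # (rows, K // BLOCK)
    return values * np.repeat(scale, BLOCK, axis=1)    # (rows, K)

tokens, k = 8, 64
# Element values drawn from the e2m1 magnitudes; every row gets a distinct
# scale exponent so a row mismatch is always visible.
magnitudes = np.array([0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0], dtype=np.float32)
values = magnitudes[np.arange(tokens * k) % 7].reshape(tokens, k)
scale_exp = np.repeat((np.arange(tokens) - 4)[:, None], k // BLOCK, axis=1)
gather_indx = np.array([5, 2, 2, 7, 0, 3])  # hypothetical routing order

# Reference: dequantize everything, then gather rows.
expected = dequant(values, scale_exp)[gather_indx]
# Correct: gather data rows and scale rows with the same indices.
gathered_both = dequant(values[gather_indx], scale_exp[gather_indx])
# Incorrect: gathered data rows paired with scale rows that were not gathered.
stale_scales = dequant(values[gather_indx], scale_exp[: len(gather_indx)])

print(np.array_equal(expected, gathered_both))
print(np.array_equal(expected, stale_scales))
```

In the kernel the same requirement applies per tile: the row indices used for the scale load have to be derived for the scale tile itself rather than borrowed from the activation data tile.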

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ce977a1095

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you open a pull request for review, mark a draft as ready, or comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

When MXFP4 activations are gathered through routed MoE metadata, X block-scale rows must be gathered using the scale tile's own row layout instead of reusing the already-gathered activation data rows. Keep W scales on the swizzled LDS path, but route gathered X scales through the direct load path so the scale indices match the gathered activation rows. This is scoped to MXFP4-X with gather and does not affect the existing FP8 activation path. Signed-off-by: Quinn Dawkins <quinn.dawkins@gmail.com>

Split X and W scale LDS handling fully by removing the aggregate SCALE_VIA_LDS config field, and make SliceMN direct scale loads read the requested subtile from the descriptor instead of assuming an LDS scale buffer exists. Also delay the vendor-neutral quantize_mxfp4 test import until after gfx950 collection gating and run the new gathered-scale test over the existing key token counts. Signed-off-by: Quinn Dawkins <quinn.dawkins@gmail.com>

Replace the gathered activation test's tokenspeed_kernel quantization dependency with a small test-local MXFP4 quantizer. This keeps tokenspeed-kernel-amd tests free of vendor-neutral package imports while preserving the packed e2m1/e8m0 inputs needed by the Gluon MoE path. Signed-off-by: Quinn Dawkins <quinn.dawkins@gmail.com>
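For readers unfamiliar with the packed inputs mentioned above, here is a minimal sketch of what a small MXFP4 quantizer could look like. It is not the test-local quantizer from this PR: the function names, the low-nibble-first packing order, and the nearest-value rounding via `argmin` (ties go to the smaller magnitude rather than round-to-nearest-even) are assumptions. It produces packed e2m1 codes and biased e8m0 scale bytes, one scale per 32 elements.

```python
import numpy as np

BLOCK = 32  # elements per shared scale
E2M1_VALUES = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0], dtype=np.float32)

def quantize_mxfp4(x):
    """x: (rows, K) float32 with K % 32 == 0.
    Returns packed e2m1 codes (rows, K // 2) and e8m0 scales (rows, K // 32)."""
    rows, k = x.shape
    assert k % BLOCK == 0
    blocks = x.astype(np.float32).reshape(rows, k // BLOCK, BLOCK)
    amax = np.abs(blocks).max(axis=-1)
    safe = np.where(amax > 0, amax, 1.0)  # avoid log2(0) for all-zero blocks
    # Shared exponent: largest power of two in the block minus e2m1's emax (2).
    exp = np.clip(np.floor(np.log2(safe)).astype(np.int32) - 2, -127, 127)
    scales = (exp + 127).astype(np.uint8)  # e8m0 stores a biased exponent
    scaled = blocks / np.exp2(exp.astype(np.float32))[..., None]
    mag = np.minimum(np.abs(scaled), 6.0)  # saturate to the largest e2m1 value
    codes = np.abs(mag[..., None] - E2M1_VALUES).argmin(axis=-1).astype(np.uint8)
    codes |= np.signbit(scaled).astype(np.uint8) << 3  # sign in bit 3
    codes = codes.reshape(rows, k)
    packed = codes[:, 0::2] | (codes[:, 1::2] << 4)  # assumed: low nibble first
    return packed, scales

def dequantize_mxfp4(packed, scales):
    """Inverse of quantize_mxfp4 for checking results in float32."""
    rows = packed.shape[0]
    codes = np.empty((rows, packed.shape[1] * 2), dtype=np.uint8)
    codes[:, 0::2] = packed & 0xF
    codes[:, 1::2] = packed >> 4
    sign = np.where(codes & 0x8, -1.0, 1.0).astype(np.float32)
    vals = (sign * E2M1_VALUES[codes & 0x7]).reshape(rows, -1, BLOCK)
    scale = np.exp2(scales.astype(np.float32) - 127.0)[..., None]
    return (vals * scale).reshape(rows, -1)

# Inputs that are exactly representable should round-trip without error.
pattern = np.array([0, 0.5, 1, 1.5, 2, 3, 4, 6, -0.5, -1, -1.5, -2, -3, -4, -6, 0],
                   dtype=np.float32)
x = np.tile(pattern, (2, 4))  # shape (2, 64)
x[1] *= 4.0                   # second row needs a larger shared scale
packed, scales = quantize_mxfp4(x)
roundtrip = dequantize_mxfp4(packed, scales)
```

A quantizer of this size is enough to feed a kernel test with consistent data and scale tensors without importing a separate quantization package.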

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 17a1c146d5


chatgpt-codex-connector · 2026-06-26T20:08:41Z

+            weights.w13_bias,
+            a_ragged_metadata=ragged_metadata,
+            gather_indx=gather_indx,
+            precision_config=precision_config,


Pass scales through the AMD test API

On gfx950 this new test never exercises the gathered-scale path: gluon_moe is fused_mxfp_gfx950, whose gluon_mxfp_ragged_matmul requires w_mx_scale and uses x_format="e2m1"/x_mx_scale for MXFP4 activations, while this precision_config keyword is only captured in **extra_kwargs (and PrecisionConfig is not exported by that module). Pass the W and X scale keywords directly here; otherwise the test raises before validating the fix.
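The failure mode described here is a general Python one: a function that accepts `**extra_kwargs` silently swallows keywords it does not recognize. The sketch below uses a hypothetical stub, `ragged_matmul_stub`, that only mimics the keyword names quoted in the comment (`w_mx_scale`, `x_format`, `x_mx_scale`); it is not the signature of `gluon_mxfp_ragged_matmul`.

```python
def ragged_matmul_stub(x, w, *, w_mx_scale=None, x_format=None,
                       x_mx_scale=None, **extra_kwargs):
    """Hypothetical stand-in: unknown keywords land in extra_kwargs unnoticed."""
    if w_mx_scale is None:
        raise ValueError("w_mx_scale is required")
    return {
        "x_format": x_format,
        "has_x_scale": x_mx_scale is not None,
        "ignored": sorted(extra_kwargs),
    }

# Passing an aggregate config object: accepted syntactically, but the scales
# never reach the named parameters, so the call fails before any check runs.
try:
    ragged_matmul_stub("x", "w", precision_config=object())
    raised = False
except ValueError:
    raised = True

# Passing the scales by their own keywords reaches the intended path.
result = ragged_matmul_stub("x", "w", w_mx_scale="w_scales",
                            x_format="e2m1", x_mx_scale="x_scales")
```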

qedawkins requested review from Max191 and panditsa June 26, 2026 15:38

chatgpt-codex-connector Bot reviewed Jun 26, 2026

Comment thread tokenspeed-kernel-amd/python/tokenspeed_kernel_amd/ops/moe/fused_mxfp_gfx950.py Outdated

Comment thread tokenspeed-kernel-amd/test/ops/test_gluon_moe_gemm_gfx950.py Outdated

qedawkins force-pushed the qedawkins/gluon-mxfp4-gather-scale-fix branch from ce977a1 to b2918fe Compare June 26, 2026 15:50

Max191 reviewed Jun 26, 2026

Comment thread tokenspeed-kernel-amd/python/tokenspeed_kernel_amd/ops/moe/fused_mxfp_gfx950.py Outdated

Comment thread tokenspeed-kernel-amd/test/ops/test_gluon_moe_gemm_gfx950.py Outdated

qedawkins added 3 commits June 26, 2026 15:02

qedawkins force-pushed the qedawkins/gluon-mxfp4-gather-scale-fix branch from bdeb4b5 to 17a1c14 Compare June 26, 2026 20:03

chatgpt-codex-connector Bot reviewed Jun 26, 2026

Fix gathered MXFP4 activation scales in Gluon MoE #534
qedawkins wants to merge 3 commits into lightseekorg:main from qedawkins:qedawkins/gluon-mxfp4-gather-scale-fix

2 participants
