
Fix gathered MXFP4 activation scales in Gluon MoE #534

Open
qedawkins wants to merge 3 commits into lightseekorg:main from qedawkins:qedawkins/gluon-mxfp4-gather-scale-fix

Conversation

@qedawkins

Contributor

When MXFP4 activations are gathered through routed MoE metadata, X block-scale rows must be gathered using the scale tile's own row layout instead of reusing the already-gathered activation data rows.

Keep W scales on the swizzled LDS path, but route gathered X scales through the direct load path so the scale indices match the gathered activation rows. This is scoped to MXFP4-X with gather and does not affect the existing FP8 activation path.
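A minimal pure-Python sketch of the indexing issue, assuming row-major MX-style inputs in which each activation row is split into fixed-size blocks that share one power-of-two scale. The helper names (`BLOCK`, `dequant_row`, `gather_rows`, `gather_rows_buggy`) are hypothetical and are not part of tokenspeed_kernel_amd; the "codes" stand in for already-decoded e2m1 values, and the block size is shrunk from 32 to 4 to keep the example small.

```python
# Hypothetical illustration only; not the tokenspeed_kernel_amd API.
BLOCK = 4  # real MXFP4 uses 32-element blocks; 4 keeps the example small


def dequant_row(codes, scale_exps):
    """Dequantize one row: element j uses the scale of block j // BLOCK."""
    return [c * 2.0 ** scale_exps[j // BLOCK] for j, c in enumerate(codes)]


def gather_rows(x_codes, x_scale_exps, gather_indx):
    """Correct: data rows and scale rows are both looked up by source row."""
    return [dequant_row(x_codes[src], x_scale_exps[src]) for src in gather_indx]


def gather_rows_buggy(x_codes, x_scale_exps, gather_indx):
    """Buggy: the scale row is taken from the gathered (output) position i."""
    return [
        dequant_row(x_codes[src], x_scale_exps[i])
        for i, src in enumerate(gather_indx)
    ]


# Three source rows of 8 elements (2 blocks each) with per-block exponents.
x_codes = [[1.0] * 8, [2.0] * 8, [3.0] * 8]
x_scale_exps = [[0, 1], [1, 2], [-1, 0]]
gather_indx = [2, 0, 1]  # routing order chosen by the MoE metadata

correct = gather_rows(x_codes, x_scale_exps, gather_indx)
buggy = gather_rows_buggy(x_codes, x_scale_exps, gather_indx)
# correct[0] applies scale row 2 to data row 2; buggy[0] wrongly applies scale row 0.
```

The point of the sketch is that the gather index has to be applied to the scale tensor in its own row layout; reusing positions from the already-gathered data only happens to work when `gather_indx` is the identity.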

@qedawkins qedawkins requested review from Max191 and panditsa June 26, 2026 15:38

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ce977a1095

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread tokenspeed-kernel-amd/python/tokenspeed_kernel_amd/ops/moe/fused_mxfp_gfx950.py Outdated
Comment thread tokenspeed-kernel-amd/test/ops/test_gluon_moe_gemm_gfx950.py Outdated
@qedawkins qedawkins force-pushed the qedawkins/gluon-mxfp4-gather-scale-fix branch from ce977a1 to b2918fe June 26, 2026 15:50
When MXFP4 activations are gathered through routed MoE metadata, X block-scale rows must be gathered using the scale tile's own row layout instead of reusing the already-gathered activation data rows.

Keep W scales on the swizzled LDS path, but route gathered X scales through the direct load path so the scale indices match the gathered activation rows. This is scoped to MXFP4-X with gather and does not affect the existing FP8 activation path.

Signed-off-by: Quinn Dawkins <quinn.dawkins@gmail.com>
Split X and W scale LDS handling fully by removing the aggregate SCALE_VIA_LDS config field, and make SliceMN direct scale loads read the requested subtile from the descriptor instead of assuming an LDS scale buffer exists.

Also delay the vendor-neutral quantize_mxfp4 test import until after gfx950 collection gating and run the new gathered-scale test over the existing key token counts.

Signed-off-by: Quinn Dawkins <quinn.dawkins@gmail.com>
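One way to see why the aggregate flag had to go: a single boolean cannot express "W scales through LDS, X scales direct". The sketch below models that per-operand decision, assuming the only inputs are the activation format (`"e2m1"` for MXFP4, as named in the review thread) and whether a gather is applied. `ScaleLoadPlan`, `plan_scale_loads`, and the path strings are hypothetical names, not the kernel's real config fields.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScaleLoadPlan:
    """Hypothetical per-operand scale-load plan (names are illustrative)."""

    w_scale_path: str  # W scales stay on the swizzled LDS path
    x_scale_path: str  # "direct" for gathered MXFP4 X, otherwise "existing"


def plan_scale_loads(x_format: str, has_gather: bool) -> ScaleLoadPlan:
    # Only MXFP4 activations (x_format == "e2m1") that are gathered through
    # routed MoE metadata switch to direct loads; every other case keeps
    # whatever X-scale path it used before (e.g. the FP8 activation path).
    gathered_mxfp4_x = x_format == "e2m1" and has_gather
    return ScaleLoadPlan(
        w_scale_path="lds_swizzled",
        x_scale_path="direct" if gathered_mxfp4_x else "existing",
    )
```

Keeping the two decisions in separate fields means a later change to one operand's path cannot silently change the other's.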
Replace the gathered activation test's tokenspeed_kernel quantization dependency with a small test-local MXFP4 quantizer. This keeps tokenspeed-kernel-amd tests free of vendor-neutral package imports while preserving the packed e2m1/e8m0 inputs needed by the Gluon MoE path.

Signed-off-by: Quinn Dawkins <quinn.dawkins@gmail.com>
@qedawkins qedawkins force-pushed the qedawkins/gluon-mxfp4-gather-scale-fix branch from bdeb4b5 to 17a1c14 June 26, 2026 20:03

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 17a1c146d5


weights.w13_bias,
a_ragged_metadata=ragged_metadata,
gather_indx=gather_indx,
precision_config=precision_config,


P2: Pass scales through the AMD test API

On gfx950 this new test never exercises the gathered-scale path. Here gluon_moe is fused_mxfp_gfx950, whose gluon_mxfp_ragged_matmul requires w_mx_scale and takes x_format="e2m1" plus x_mx_scale for MXFP4 activations. The precision_config keyword passed here is only captured in **extra_kwargs (and PrecisionConfig is not exported by that module). Pass the W and X scale keywords directly; otherwise the test raises before it can validate the fix.
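The failure mode described above can be reproduced with a stand-in function: a required keyword (`w_mx_scale`) raises `TypeError` when omitted, while an unrecognised keyword such as `precision_config` is silently swallowed by `**extra_kwargs`. `ragged_matmul_stub` is hypothetical and mirrors only the keyword names quoted in this comment; it is not the real signature of `gluon_mxfp_ragged_matmul`, and the `None` defaults are placeholders.

```python
def ragged_matmul_stub(x, w, *, w_mx_scale, x_format=None, x_mx_scale=None, **extra_kwargs):
    """Hypothetical stand-in mirroring only the keywords named in the review.

    Defaults here are placeholders, not the real kernel's defaults.
    """
    # Unrecognised keywords (e.g. precision_config) end up here and are ignored.
    return {
        "x_format": x_format,
        "x_mx_scale": x_mx_scale,
        "w_mx_scale": w_mx_scale,
        "ignored_kwargs": sorted(extra_kwargs),
    }


# Passing only precision_config fails before any kernel work happens,
# because the required keyword-only argument w_mx_scale is missing.
try:
    ragged_matmul_stub("x", "w", precision_config=object())
except TypeError as exc:
    print("raises:", exc)

# Passing the scale keywords directly reaches the MXFP4-X path.
result = ragged_matmul_stub(
    "x", "w", w_mx_scale="w_scales", x_format="e2m1", x_mx_scale="x_scales"
)
```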
