Fix gathered MXFP4 activation scales in Gluon MoE #534
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ce977a1095
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Force-pushed from ce977a1 to b2918fe.
When MXFP4 activations are gathered through routed MoE metadata, X block-scale rows must be gathered using the scale tile's own row layout instead of reusing the already-gathered activation data rows. Keep W scales on the swizzled LDS path, but route gathered X scales through the direct load path so the scale indices match the gathered activation rows. This is scoped to MXFP4-X with gather and does not affect the existing FP8 activation path.
Signed-off-by: Quinn Dawkins <quinn.dawkins@gmail.com>
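A minimal NumPy sketch of the invariant this commit restores: gathered row i of the packed activations and gathered row i of its E8M0 block scales must both come from source row `gather_indx[i]`, and each tensor has to be addressed through its own row layout (K/2 bytes per packed data row, K/32 bytes per scale row). Flat buffers with explicit offsets stand in for the kernel's pointer arithmetic; `gather_rows_flat`, the shapes, and the random data are hypothetical, and the real kernel tiles and swizzles these tensors differently.

```python
import numpy as np

BLOCK = 32  # MXFP4: one e8m0 scale per 32 activation elements along K


def gather_rows_flat(buf, row_stride, gather_indx, row_len):
    """Gather rows from a flat 1-D buffer, addressing it with its own row stride."""
    offs = gather_indx[:, None] * row_stride + np.arange(row_len)[None, :]
    return buf[offs]


M, K = 6, 64
rng = np.random.default_rng(0)
x_packed = rng.integers(0, 256, size=(M, K // 2), dtype=np.uint8)       # 2 e2m1 codes per byte
x_scale = rng.integers(120, 135, size=(M, K // BLOCK), dtype=np.uint8)  # e8m0 biased exponents
gather_indx = np.array([4, 1, 1, 5, 0])  # routed source rows; a row may repeat (top-k > 1)

# The row layouts differ: K/2 bytes per data row, K/32 bytes per scale row.
data_stride, scale_stride = K // 2, K // BLOCK
x_g = gather_rows_flat(x_packed.ravel(), data_stride, gather_indx, data_stride)
s_g = gather_rows_flat(x_scale.ravel(), scale_stride, gather_indx, scale_stride)

# Gathered row i of X and of its scales both come from source row gather_indx[i].
assert np.array_equal(x_g, x_packed[gather_indx])
assert np.array_equal(s_g, x_scale[gather_indx])
```

Deriving the scale offsets from the data layout instead (for example `gather_indx * data_stride`) would read the wrong scale entries or, as with these shapes, run past the end of the 12-entry scale buffer.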
Split X and W scale LDS handling fully by removing the aggregate SCALE_VIA_LDS config field, and make SliceMN direct scale loads read the requested subtile from the descriptor instead of assuming an LDS scale buffer exists. Also delay the vendor-neutral quantize_mxfp4 test import until after gfx950 collection gating and run the new gathered-scale test over the existing key token counts.
Signed-off-by: Quinn Dawkins <quinn.dawkins@gmail.com>
Replace the gathered activation test's tokenspeed_kernel quantization dependency with a small test-local MXFP4 quantizer. This keeps tokenspeed-kernel-amd tests free of vendor-neutral package imports while preserving the packed e2m1/e8m0 inputs needed by the Gluon MoE path.
Signed-off-by: Quinn Dawkins <quinn.dawkins@gmail.com>
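For reference, a minimal NumPy sketch of what a test-local MXFP4 quantizer can look like, following the OCP MX convention of 32-element blocks with an E8M0 shared exponent and E2M1 elements. The function names, the nibble order (even element in the low nibble), the handling of all-zero blocks, and the tie-rounding rule are assumptions for illustration and are not taken from the PR's test; the layout the Gluon kernel actually expects (for example any scale swizzling) may differ.

```python
import numpy as np

# Magnitudes of the 8 non-negative FP4 E2M1 codes; bit 3 of a code is the sign.
E2M1_VALUES = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
BLOCK = 32      # MX block size along K
E2M1_EMAX = 2   # largest unbiased E2M1 exponent (6.0 = 1.5 * 2**2)


def quantize_mxfp4_ref(x):
    """Quantize float [M, K] to (packed e2m1 uint8 [M, K//2], e8m0 uint8 [M, K//32])."""
    x = np.asarray(x, dtype=np.float64)
    m, k = x.shape
    assert k % BLOCK == 0, "K must be a multiple of the MX block size"
    blocks = x.reshape(m, k // BLOCK, BLOCK)

    # Shared exponent per block: floor(log2(amax)) - emax_elem; all-zero blocks use 2**0.
    amax = np.abs(blocks).max(axis=-1)
    safe = np.where(amax > 0, amax, 1.0)
    exp = np.where(amax > 0, np.floor(np.log2(safe)) - E2M1_EMAX, 0.0)
    exp = np.clip(exp, -127, 127)
    scales = (exp + 127).astype(np.uint8)  # e8m0 stores the biased exponent

    # Round |x / 2**exp| to the nearest E2M1 magnitude (ties go to the smaller
    # magnitude, a simplification); magnitudes above 6.0 saturate to 6.0.
    scaled = blocks / np.exp2(exp)[..., None]
    idx = np.abs(np.abs(scaled)[..., None] - E2M1_VALUES).argmin(axis=-1)
    codes = idx.astype(np.uint8) | ((scaled < 0).astype(np.uint8) << 3)

    # Pack two 4-bit codes per byte: even element in the low nibble (assumed order).
    codes = codes.reshape(m, k)
    packed = codes[:, 0::2] | (codes[:, 1::2] << 4)
    return packed.astype(np.uint8), scales


def dequantize_mxfp4_ref(packed, scales):
    """Inverse of quantize_mxfp4_ref; returns float64 [M, K]."""
    codes = np.stack([packed & 0x0F, packed >> 4], axis=-1).reshape(packed.shape[0], -1)
    mag = E2M1_VALUES[codes & 0x7]
    vals = np.where((codes & 0x8) != 0, -mag, mag)
    return vals * np.repeat(np.exp2(scales.astype(np.float64) - 127), BLOCK, axis=1)


# Usage: values already on the E2M1 grid (times a power of two) round-trip exactly.
row = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0] * 4)
row[1::2] *= -1                  # mix signs
x = np.stack([row, 2.0 * row])   # [2, 32]: one MX block per row
packed, scales = quantize_mxfp4_ref(x)
```

Keeping the quantizer this small avoids importing a vendor-neutral package, at the cost of not matching every rounding detail of a production implementation.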
Force-pushed from bdeb4b5 to 17a1c14.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 17a1c146d5
```python
# Excerpt from the new test (the surrounding call is not shown in the review)
weights.w13_bias,
a_ragged_metadata=ragged_metadata,
gather_indx=gather_indx,
precision_config=precision_config,
```
Pass scales through the AMD test API
On gfx950 this new test never exercises the gathered-scale path: gluon_moe is fused_mxfp_gfx950, whose gluon_mxfp_ragged_matmul requires w_mx_scale and uses x_format="e2m1"/x_mx_scale for MXFP4 activations, while this precision_config keyword is only captured in **extra_kwargs (and PrecisionConfig is not exported by that module). Pass the W and X scale keywords directly here; otherwise the test raises before validating the fix.