Summary
Add support for MXFP4-quantized Linear layers in phyai.
Motivation
MXFP4 is the OCP Microscaling (MX) FP4 format: it stores 4-bit floating-point (E2M1) values in blocks of 32 elements, each block sharing one 8-bit power-of-two (E8M0) scale. It can significantly reduce the memory footprint of large-model inference while preserving better numerical behavior than simple INT4-style quantization.
Supporting MXFP4 linear layers would let phyai run models or checkpoints that are distributed with MXFP4 weights, and provide a hardware-agnostic FP4 path where runtime support is available.
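To make the block-wise shared scaling concrete, here is a minimal pure-Python reference sketch of MXFP4 dequantization and a naive linear forward pass. The E2M1 value table, the E8M0 scale decoding, and the block size of 32 follow the OCP MX specification. Everything else is an assumption made for illustration: the function names (`decode_e2m1`, `decode_e8m0`, `dequantize_block`, `mxfp4_linear`) are hypothetical and are not part of phyai, and the packing order (low nibble holds the earlier element) and the row-wise weight layout may differ from what a given checkpoint uses.

```python
import math

# E2M1 magnitudes indexed by the low 3 bits of a 4-bit code (OCP MX spec).
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK_SIZE = 32  # elements that share one E8M0 scale in MXFP4


def decode_e2m1(code):
    """Decode one 4-bit E2M1 code (sign in bit 3) to a Python float."""
    magnitude = E2M1_MAGNITUDES[code & 0x7]
    return -magnitude if code & 0x8 else magnitude


def decode_e8m0(scale_byte):
    """Decode an E8M0 shared scale: 2**(byte - 127); 0xFF encodes NaN."""
    if scale_byte == 0xFF:
        return math.nan
    return 2.0 ** (scale_byte - 127)


def dequantize_block(packed, scale_byte):
    """Dequantize one 32-element block from 16 packed bytes.

    Assumption: the low nibble of each byte holds the earlier element.
    """
    if len(packed) != BLOCK_SIZE // 2:
        raise ValueError("expected 16 packed bytes per block")
    scale = decode_e8m0(scale_byte)
    values = []
    for byte in packed:
        values.append(decode_e2m1(byte & 0x0F) * scale)
        values.append(decode_e2m1(byte >> 4) * scale)
    return values


def mxfp4_linear(x, packed_rows, scale_rows):
    """Naive reference y = W @ x, with each row of W stored as MXFP4 blocks.

    Hypothetical layout: packed_rows[i] is the packed bytes of row i and
    scale_rows[i] is the list of E8M0 scale bytes for that row's blocks.
    """
    out = []
    for packed, scales in zip(packed_rows, scale_rows):
        row = []
        for b, scale_byte in enumerate(scales):
            row.extend(dequantize_block(packed[16 * b:16 * (b + 1)], scale_byte))
        out.append(sum(w * xi for w, xi in zip(row, x)))
    return out
```

With this layout each weight costs 4 bits plus 8/32 bits of shared scale, or 4.25 bits in total. A real implementation would likely keep weights packed and fuse dequantization into the matmul kernel rather than materializing full-precision rows as this sketch does.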
References