[phyai] nvfp4 linear #17

@chenghuaWang

Description

Summary

Add support for NVFP4-quantized Linear layers in phyai.

Motivation

NVFP4 is NVIDIA's 4-bit floating-point format for Blackwell Tensor Cores. It targets efficient low-precision inference by storing 4-bit values in 16-element blocks, with one FP8 scale factor per block. Because LLM inference is dominated by Linear / GEMM workloads, supporting NVFP4 linear layers can reduce memory bandwidth requirements and improve throughput on compatible NVIDIA Blackwell hardware.
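To make the block-scaling idea concrete, here is a minimal reference sketch in plain Python of what an NVFP4-style weight might look like numerically. It is not phyai's API: every name in it (`quantize_block`, `quantize`, `dequantize`, `linear_row`) is hypothetical. It also simplifies the format: the per-block scale is kept as an ordinary Python float rather than being rounded to FP8, any additional tensor-level scaling is omitted, and values are stored as floats on the 4-bit grid rather than packed into bits.

```python
from typing import List, Tuple

# Magnitudes representable by a 4-bit E2M1 float (1 sign, 2 exponent, 1 mantissa bit).
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
E2M1_MAX = 6.0
BLOCK_SIZE = 16  # number of elements that share one scale factor


def quantize_block(block: List[float]) -> Tuple[List[float], float]:
    """Snap one block onto the E2M1 grid and return (grid values, scale)."""
    amax = max(abs(x) for x in block)
    # Map the largest magnitude in the block onto the largest FP4 value.
    # Assumption: the scale stays a Python float here; real NVFP4 stores it in FP8.
    scale = amax / E2M1_MAX if amax > 0 else 1.0
    codes = []
    for x in block:
        # Nearest grid magnitude; ties resolve toward the smaller magnitude here,
        # which may differ from the rounding mode used by real hardware.
        mag = min(E2M1_GRID, key=lambda g: abs(abs(x) / scale - g))
        codes.append(-mag if x < 0 else mag)
    return codes, scale


def quantize(values: List[float]) -> Tuple[List[float], List[float]]:
    """Quantize a flat weight row block by block."""
    if len(values) % BLOCK_SIZE != 0:
        raise ValueError("length must be a multiple of BLOCK_SIZE")
    codes: List[float] = []
    scales: List[float] = []
    for start in range(0, len(values), BLOCK_SIZE):
        block_codes, scale = quantize_block(values[start:start + BLOCK_SIZE])
        codes.extend(block_codes)
        scales.append(scale)
    return codes, scales


def dequantize(codes: List[float], scales: List[float]) -> List[float]:
    """Multiply each grid value by the scale of the block it belongs to."""
    return [c * scales[i // BLOCK_SIZE] for i, c in enumerate(codes)]


def linear_row(x: List[float], codes: List[float], scales: List[float]) -> float:
    """One output feature of y = x @ W^T, using a dequantized weight row."""
    weights = dequantize(codes, scales)
    return sum(a * b for a, b in zip(x, weights))


if __name__ == "__main__":
    row = [float(i - 8) for i in range(BLOCK_SIZE)]
    codes, scales = quantize(row)
    print(codes)
    print(scales)
    print(dequantize(codes, scales))
```

A real kernel would keep the weights in packed 4-bit form and apply the scales inside the GEMM rather than dequantizing first. The sketch is only useful as a numerical reference to compare such a kernel against.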

Labels

help wanted

Status

Todo
