[phyai] nvfp4 linear #17

## Summary

Add support for NVFP4-quantized `Linear` layers in `phyai`.

## Motivation

NVFP4 is NVIDIA’s 4-bit floating-point (FP4) format for Blackwell Tensor Cores. It targets efficient low-precision inference by storing 4-bit values together with one FP8 scale factor per 16-element block. Because LLM inference is dominated by `Linear` / GEMM workloads, supporting NVFP4 linear layers can reduce memory bandwidth requirements and improve throughput on compatible NVIDIA Blackwell hardware.
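
To make the block-scaling idea concrete, here is a minimal pure-Python sketch that simulates it in floating point ("fake quantization"). It is not the `phyai` implementation and does not use any real kernel or library API. The names `quantize_block`, `fake_quantize`, and `linear_reference` are hypothetical. Two simplifications are assumed: the per-block scale is kept as an ordinary Python float rather than rounded to FP8, and the additional per-tensor scale that NVFP4 uses is omitted. Only the weights are quantized here; activations stay in full precision.

```python
# E2M1 magnitudes representable in 4 bits (1 sign, 2 exponent, 1 mantissa bit).
FP4_E2M1_VALUES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
FP4_MAX = 6.0
BLOCK_SIZE = 16  # block size stated in this issue


def quantize_block(block):
    """Return (scale, codes) for one block; codes are signed E2M1 values."""
    amax = max(abs(v) for v in block)
    # Map the block's largest magnitude onto the largest FP4 magnitude.
    # Simplification (assumption): the scale stays a Python float, not FP8.
    scale = amax / FP4_MAX if amax > 0 else 1.0
    codes = []
    for v in block:
        target = abs(v) / scale
        # Nearest representable magnitude; ties go to the smaller value here,
        # which is simpler than hardware rounding modes.
        mag = min(FP4_E2M1_VALUES, key=lambda q: abs(target - q))
        codes.append(mag if v >= 0 else -mag)
    return scale, codes


def fake_quantize(values, block_size=BLOCK_SIZE):
    """Quantize and immediately dequantize a flat list, one scale per block."""
    if len(values) % block_size != 0:
        raise ValueError("length must be a multiple of the block size")
    out = []
    for start in range(0, len(values), block_size):
        scale, codes = quantize_block(values[start:start + block_size])
        out.extend(code * scale for code in codes)
    return out


def linear_reference(x, weight):
    """Compute y = W_dq @ x, fake-quantizing each weight row along in_features."""
    out = []
    for row in weight:
        row_dq = fake_quantize(row)
        out.append(sum(w * xi for w, xi in zip(row_dq, x)))
    return out
```

A row whose values already sit on the scaled E2M1 grid, such as `[6.0, 3.0, -1.5, 0.5] + [0.0] * 12`, passes through `fake_quantize` unchanged. Other inputs are rounded to the nearest grid point within their block, so the error in each block is bounded by that block's scale. A reference like this could serve as a numerical baseline when testing a real NVFP4 kernel path.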

## References

- https://developer.nvidia.com/blog/introducing-nvfp4-for-efficient-and-accurate-low-precision-inference/
- https://docs.pytorch.org/TensorRT/user_guide/shapes_precision/quantization.html
- https://docs.pytorch.org/ao/stable/api_reference/generated/torchao.prototype.mx_formats.NVFP4DynamicActivationNVFP4WeightConfig.html

