Summary
Add support for Marlin INT4 kernels and AWQ/GPTQ quantized linear layers in phyai.
Motivation
Marlin provides efficient FP16xINT4 matrix multiplication kernels for LLM inference. Supporting Marlin with AWQ/GPTQ checkpoints would allow phyai to run common 4-bit quantized models with better memory efficiency and throughput on supported NVIDIA GPUs.
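A reference (unfused) computation of a group-quantized INT4 linear layer could serve as a correctness baseline for a Marlin-backed path. This is a minimal sketch, not phyai's API and not the exact GPTQ, AWQ, or Marlin storage format. The function names `pack_int4`, `unpack_int4`, and `int4_linear` are hypothetical. The sketch assumes:

- eight unsigned 4-bit values are packed per 32-bit word, lowest nibble first, along the input dimension;
- weights are dequantized per group as `w = (q - zero) * scale`.

Real AWQ checkpoints use an interleaved nibble order, and Marlin expects weights repacked into its own layout. A loader would therefore need to convert between these formats.

```python
from typing import List


def pack_int4(values: List[int]) -> List[int]:
    # Hypothetical helper: pack 8 unsigned 4-bit values per 32-bit word,
    # lowest nibble first (assumed layout; AWQ/Marlin use other orders).
    words = []
    for base in range(0, len(values), 8):
        word = 0
        for i, v in enumerate(values[base:base + 8]):
            if not 0 <= v <= 15:
                raise ValueError("INT4 values must be in [0, 15]")
            word |= v << (4 * i)
        words.append(word)
    return words


def unpack_int4(packed: List[int], n: int) -> List[int]:
    # Inverse of pack_int4: extract nibbles in the same assumed order.
    out = []
    for word in packed:
        for i in range(8):
            out.append((word >> (4 * i)) & 0xF)
    return out[:n]


def int4_linear(x: List[float], packed_rows: List[List[int]],
                scales: List[List[float]], zeros: List[List[int]],
                in_features: int, group_size: int) -> List[float]:
    # One packed row per output feature; scales/zeros hold one entry per
    # group of `group_size` input features for that row.
    y = []
    for row, s, z in zip(packed_rows, scales, zeros):
        q = unpack_int4(row, in_features)
        # Group-wise dequantization: w = (q - zero) * scale.
        w = [(q[k] - z[k // group_size]) * s[k // group_size]
             for k in range(in_features)]
        y.append(sum(xi * wi for xi, wi in zip(x, w)))
    return y
```

For example, with `in_features=8` and `group_size=4`, take a row quantized as `[9, 9, 9, 9, 7, 7, 7, 7]` with zeros `[8, 8]` and scales `[0.5, 2.0]`. The row dequantizes to four values of `0.5` followed by four values of `-2.0`. An all-ones input therefore produces `4 * 0.5 + 4 * (-2.0) = -6.0`. A fused kernel would compute the same result without materializing `w` in memory.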
References