Add NPU (Ascend) backend support for INT4 weight-only quantization workflow #3172
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3172
Note: Links to docs will display an error until the docs builds have been completed. This comment was automatically generated by Dr. CI and updates every 15 minutes.
```python
try:
    import torch_npu
except ImportError:
    torch_npu = None
```
PyTorch provides an autoload mechanism, so we do not need to import it explicitly.
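As a rough illustration of what that can look like, here is a minimal sketch that avoids an explicit `import torch_npu`, assuming PyTorch 2.6+ with the `torch.accelerator` API and a `torch_npu` build that registers the autoload entry point; the helper name `_npu_available` is made up for this sketch:

```python
import torch  # torch_npu is auto-imported via the torch.backends autoload entry point

def _npu_available() -> bool:
    # Hypothetical helper: queries the generic accelerator API instead of torch_npu directly.
    acc = torch.accelerator.current_accelerator(check_available=True)
    return acc is not None and acc.type == "npu"
```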
```python
@unittest.skipIf(torch_npu is None, "torch_npu is not available")
@unittest.skipIf(not torch_npu.npu.is_available(), "NPU not available")
```
Suggested change:
```diff
- @unittest.skipIf(torch_npu is None, "torch_npu is not available")
- @unittest.skipIf(not torch_npu.npu.is_available(), "NPU not available")
+ @unittest.skipIf(torch.accelerator.current_accelerator(True).type == "npu" and torch.accelerator.is_available(), "NPU not available")
```
```python
@unittest.skipIf(
    version.parse(torch_npu.__version__) < version.parse("2.7.1rc1"),
    "Need torch_npu 2.7.1rc1+",
)
```
We can remove it because there is a strict version mapping between PyTorch and torch_npu.
```python
assert int_data.dtype == torch.int32, (
    f"torch_npu.npu_convert_weight_to_int4pack expects `int32` dtype"
)
```
| f"torch_npu.npu_convert_weight_to_int4pack expects `int32` dtype" | |
| f"torch.ops.npu.npu_convert_weight_to_int4pack expects `int32` dtype" |
```python
assert int_data.shape[-1] % 8 == 0, (
    f"torch_npu.npu_convert_weight_to_int4pack expects last dim must be aligned to 8, but got {int_data.shape[-1]}"
)
```
| f"torch_npu.npu_convert_weight_to_int4pack expects last dim must be aligned to 8,but got {int_data.shape[-1]}" | |
| f"torch.ops.npu.npu_convert_weight_to_int4pack expects last dim must be aligned to 8,but got {int_data.shape[-1]}" |
|
Hi @jcaip @jerryzh168, please help to review it, thanks!
```python
    and torch.accelerator.is_available(),
    "NPU not available",
)
class Int4PlainInt32TensorNPU(TestCase):
```
Just curious, do we need NPUs to test this? I don't think we have any in CI.
Thanks for the PR @orangeH25 @fffrog!
The code looks good to me, but I'm curious how best to test this. It looks like we skip tests in CI because we don't have NPU devices. I believe that NPU support was added to TorchTune as well; do you know how they test device-specific functionality there?
Also, just a heads up, most of the team is at PTC / Open source AI week in SF this week, so we might be a little slow in responding :)
please don't include device
```
int4 weight-only quantization on Ascend NPU backend (groupwise quantization only)

Tensor Attributes:
    qdata: (N, K/8), packed int4 weight, the data type is int32 here with 8*int4, the original dtype can be float16 or bfloat16
```
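To make the layout concrete, here is a hedged sketch of the shape bookkeeping the docstring describes: 8 int4 values packed into each int32 along the last dimension. This is illustrative only, not the actual `torch_npu.npu_convert_weight_to_int4pack` kernel:

```python
import torch

# Illustrative (N, K) -> (N, K/8) packing: 8 nibbles per 32-bit word.
N, K = 128, 256
assert K % 8 == 0, "last dim must be aligned to 8"

int4_vals = torch.randint(-8, 8, (N, K), dtype=torch.int32)  # int4 value range, stored as int32
packed = torch.zeros(N, K // 8, dtype=torch.int64)           # 8 nibbles = 32 bits per element
for i in range(8):
    nibble = int4_vals[:, i::8].to(torch.int64) & 0xF        # keep the low 4 bits (two's complement)
    packed |= nibble << (4 * i)                               # place each int4 into its nibble slot
print(packed.shape)  # torch.Size([128, 32]) == (N, K/8); the real qdata stores these 32 bits as int32
```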
does this exactly align with Int4PlainInt32Tensor? if so, please merge with that tensor subclass
Hi @jcaip @jerryzh168, thanks for the review!
Yes, this case is actually pretty common in open-source projects. A typical approach is to set up a
You mean that we should keep the entry logic in `quant_api.py`:

```python
elif int4_packing_format == Int4PackingFormat.PLAIN_INT32:
    new_weight = Int4PlainInt32Tensor.from_hp(
        weight,
        block_size,
    )
    return new_weight
```

and then handle the different backend implementations inside the tensor subclass:

```python
class Int4PlainInt32Tensor(TorchAOBaseTensor):
    ...

    @classmethod
    def from_hp(
        cls,
        w: torch.Tensor,
        block_size: List[int],
    ):
        if w.device.type == "xpu":
            return from_hp_xpu(cls, w, block_size)
        elif w.device.type == "npu":
            return from_hp_npu(cls, w, block_size)


implements = Int4PlainInt32Tensor.implements
implements_torch_function = Int4PlainInt32Tensor.implements_torch_function


@implements(aten.linear.default)
@implements_torch_function(torch.nn.functional.linear)
def _(func, types, args, kwargs):
    input_tensor, weight_tensor, bias = (
        args[0],
        args[1],
        args[2] if len(args) > 2 else None,
    )
    if input_tensor.device.type == "xpu":
        return linear_xpu(input_tensor, weight_tensor, bias)
    elif input_tensor.device.type == "npu":
        return linear_npu(input_tensor, weight_tensor, bias)
```

Did I get that right? Happy to hear any thoughts or suggestions you might have!
Yes that's correct

Got it, I will follow this approach, thanks!
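For anyone following along, here is a hedged end-to-end usage sketch of the merged path. It assumes `Int4WeightOnlyConfig` exposes an `int4_packing_format` argument that maps to `Int4PackingFormat.PLAIN_INT32`, as the dispatch snippet above implies; the device, dtype, and shapes are illustrative, not taken from the PR:

```python
import torch
from torchao.quantization import quantize_, Int4WeightOnlyConfig

# Illustrative: a single linear layer quantized to int4 weight-only with the
# plain_int32 packing format; the same call would route to XPU or NPU kernels
# based on the weight's device, per the dispatch sketch above.
model = torch.nn.Sequential(torch.nn.Linear(256, 512)).to("npu", torch.bfloat16)
quantize_(model, Int4WeightOnlyConfig(group_size=64, int4_packing_format="plain_int32"))

x = torch.randn(8, 256, dtype=torch.bfloat16, device="npu")
y = model(x)  # runs the int4 weight-only linear on the NPU
```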
Force-pushed from 7808297 to ea2aa7a
Hi @jerryzh168 @jcaip, I’ve made those changes, please take a look, really appreciate it!
Related to #3044
Summary
This PR adds NPU (Ascend) backend support for the INT4 weight-only quantization workflow.
It introduces a new tensor subclass, `Int4PlainInt32TensorNPU`, aligned with the existing `Int4PlainInt32Tensor` for the `plain_int32` packing format.

Environment
Files changed
Modified
- torchao/quantization/__init__.py
- torchao/quantization/quant_api.py
- torchao/quantization/quantize_/workflows/__init__.py

Added
- torchao/quantization/quantize_/workflows/int4/int4_plain_int32_tensor_npu.py
- test/quantization/quantize_/workflows/int4/test_int4_plain_int32_tensor_npu.py

Implementation Overview
- Added a new tensor subclass `Int4PlainInt32TensorNPU` to enable NPU backend support for INT4 weight-only quantization.
- Updated `quant_api.py` for dispatch.
- Updated the `__init__.py` files to ensure proper import and exposure.

Test Case
test/quantization/quantize_/workflows/int4/test_int4_plain_int32_tensor_npu.py
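As a hedged note on running it locally: on a machine with an Ascend NPU and torch_npu installed, the test file can be invoked with a standard pytest call (nothing project-specific is assumed here), for example:

```python
# Runs the NPU-specific test file; the skip decorators apply automatically when no NPU is present.
import pytest

pytest.main(["-v", "test/quantization/quantize_/workflows/int4/test_int4_plain_int32_tensor_npu.py"])
```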