Describe the bug
Olive GptqQuantizer pass fails with RuntimeError: The size of tensor a (32) must match the size of tensor b (96) at non-singleton dimension 3
To Reproduce
Follow Phi 3.5 mini example: https://github.com/microsoft/Olive/tree/main/examples/phi3_5
Expected behavior
olive run --config qnn_config.json
should complete successfully.
Olive config
Phi 3.5 mini example config: https://github.com/microsoft/Olive/blob/main/examples/phi3_5/qnn_config.json
Olive logs
...
[2025-08-04 19:30:21,431] [INFO] [engine.py:686:_run_pass] Running pass g:gptqquantizer
WARNING - AutoGPTQ has stopped development. Please transition to GPTQModel: https://github.com/ModelCloud/GPTQModel
GPTQModel has been merged into Transformers/Optimum and full deprecation of AutoGPTQ within HF frameworks is planned in the near-future.
/venv-quant/lib/python3.12/site-packages/auto_gptq/nn_modules/triton_utils/kernels.py:410: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
@custom_fwd
/venv-quant/lib/python3.12/site-packages/auto_gptq/nn_modules/triton_utils/kernels.py:418: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
@custom_bwd
/venv-quant/lib/python3.12/site-packages/auto_gptq/nn_modules/triton_utils/kernels.py:461: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
@custom_fwd(cast_inputs=torch.float16)
CUDA extension not installed.
CUDA extension not installed.
Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00, 160.16it/s]
INFO - Start quantizing layer 1/32
INFO - Quantizing self_attn.qkv_proj in layer 1/32...
INFO - Quantizing self_attn.o_proj in layer 1/32...
INFO - Quantizing mlp.gate_up_proj in layer 1/32...
INFO - Quantizing mlp.down_proj in layer 1/32...
INFO - Start quantizing layer 2/32
[2025-08-04 19:30:30,591] [ERROR] [engine.py:755:_run_pass] Pass run failed.
Traceback (most recent call last):
File "/venv-quant/lib/python3.12/site-packages/olive/engine/engine.py", line 743, in _run_pass
output_model_config = host.run_pass(p, input_model_config, output_model_path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/venv-quant/lib/python3.12/site-packages/olive/systems/local.py", line 29, in run_pass
output_model = the_pass.run(model, output_model_path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/venv-quant/lib/python3.12/site-packages/olive/passes/olive_pass.py", line 242, in run
output_model = self._run_for_config(model, self.config, output_model_path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/venv-quant/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/venv-quant/lib/python3.12/site-packages/olive/passes/pytorch/autogptq.py", line 175, in _run_for_config
quantized_model.quantize(dataset)
File "/venv-quant/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/venv-quant/lib/python3.12/site-packages/auto_gptq/modeling/_base.py", line 334, in quantize
layer(*layer_input, **additional_layer_inputs)
File "/venv-quant/lib/python3.12/site-packages/transformers/modeling_layers.py", line 94, in __call__
return super().__call__(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/venv-quant/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/venv-quant/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/venv-quant/lib/python3.12/site-packages/transformers/models/phi3/modeling_phi3.py", line 260, in forward
hidden_states, self_attn_weights = self.self_attn(
^^^^^^^^^^^^^^^
File "/venv-quant/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/venv-quant/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/venv-quant/lib/python3.12/site-packages/transformers/models/phi3/modeling_phi3.py", line 185, in forward
query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/venv-quant/lib/python3.12/site-packages/transformers/models/phi3/modeling_phi3.py", line 139, in apply_rotary_pos_emb
q_embed = torch.cat([(q_rot * cos) + (rotate_half(q_rot) * sin), q_pass], dim=-1)
~~~~~~^~~~~
RuntimeError: The size of tensor a (32) must match the size of tensor b (96) at non-singleton dimension 3
[2025-08-04 19:30:30,593] [WARNING] [engine.py:318:run_accelerator] Failed to run Olive on npu-qnn.
Traceback (most recent call last):
File "/venv-quant/lib/python3.12/site-packages/olive/engine/engine.py", line 314, in run_accelerator
output_footprint = self._run_no_search(input_model_config, input_model_id, accelerator_spec, output_dir)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/venv-quant/lib/python3.12/site-packages/olive/engine/engine.py", line 358, in _run_no_search
should_prune, signal, model_ids = self._run_passes(input_model_config, input_model_id, accelerator_spec)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/venv-quant/lib/python3.12/site-packages/olive/engine/engine.py", line 642, in _run_passes
model_config, model_id = self._run_pass(
^^^^^^^^^^^^^^^
File "/venv-quant/lib/python3.12/site-packages/olive/engine/engine.py", line 743, in _run_pass
output_model_config = host.run_pass(p, input_model_config, output_model_path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/venv-quant/lib/python3.12/site-packages/olive/systems/local.py", line 29, in run_pass
output_model = the_pass.run(model, output_model_path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/venv-quant/lib/python3.12/site-packages/olive/passes/olive_pass.py", line 242, in run
output_model = self._run_for_config(model, self.config, output_model_path)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/venv-quant/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/venv-quant/lib/python3.12/site-packages/olive/passes/pytorch/autogptq.py", line 175, in _run_for_config
quantized_model.quantize(dataset)
File "/venv-quant/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/venv-quant/lib/python3.12/site-packages/auto_gptq/modeling/_base.py", line 334, in quantize
layer(*layer_input, **additional_layer_inputs)
File "/venv-quant/lib/python3.12/site-packages/transformers/modeling_layers.py", line 94, in __call__
return super().__call__(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/venv-quant/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/venv-quant/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/venv-quant/lib/python3.12/site-packages/transformers/models/phi3/modeling_phi3.py", line 260, in forward
hidden_states, self_attn_weights = self.self_attn(
^^^^^^^^^^^^^^^
File "/venv-quant/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/venv-quant/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/venv-quant/lib/python3.12/site-packages/transformers/models/phi3/modeling_phi3.py", line 185, in forward
query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/venv-quant/lib/python3.12/site-packages/transformers/models/phi3/modeling_phi3.py", line 139, in apply_rotary_pos_emb
q_embed = torch.cat([(q_rot * cos) + (rotate_half(q_rot) * sin), q_pass], dim=-1)
~~~~~~^~~~~
RuntimeError: The size of tensor a (32) must match the size of tensor b (96) at non-singleton dimension 3
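The traceback shows the failure happening inside transformers' apply_rotary_pos_emb for Phi3: the rotated query slice and the cos tensor disagree on the last dimension (32 vs 96), so the elementwise multiply cannot broadcast. A minimal NumPy sketch of the same class of shape conflict (the shapes below are illustrative only, not read from the model):

```python
import numpy as np

# Illustrative shapes: q_rot's last dim (32) does not match cos's last
# dim (96), mirroring the "size of tensor a (32) must match the size of
# tensor b (96)" error in the log above.
q_rot = np.ones((1, 8, 4, 32))  # (batch, heads, seq, rotary_dim)
cos = np.ones((1, 1, 4, 96))    # (batch, 1, seq, head_dim)

try:
    q_embed = q_rot * cos       # broadcasting fails on the final axis
except ValueError as exc:
    print("broadcast error:", exc)
```

In the real stack NumPy's ValueError corresponds to PyTorch's RuntimeError, but the broadcasting rule being violated is the same: trailing dimensions must be equal or 1.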
Other information
- OS: Ubuntu 22.04.4 LTS
- Olive version: main - (70b5beb)
- Transformers package version: transformers==4.54.1
Additional context
Downgrading to transformers==4.53.* avoids the error.
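Since the quantization run is long, it can be worth checking the installed transformers version up front rather than failing at layer 2/32. A minimal sketch of such a guard; the (4, 54) cutoff is taken from this report and is an assumption that may change once Olive supports the newer Phi3 code path:

```python
def transformers_ok(version: str) -> bool:
    """Return True when the transformers version predates the 4.54
    release that this report identifies as breaking the GPTQ pass."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) < (4, 54)

# Example usage before launching `olive run --config qnn_config.json`:
#   import transformers
#   assert transformers_ok(transformers.__version__), "pin transformers==4.53.*"
print(transformers_ok("4.53.2"))  # known-good pin from this report
print(transformers_ok("4.54.1"))  # version that reproduces the failure
```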