Phi-3.5-mini QNN example broken by latest transformers #2051

@jake-leland-dell

Description

Describe the bug
The Olive GptqQuantizer pass fails with RuntimeError: The size of tensor a (32) must match the size of tensor b (96) at non-singleton dimension 3
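The failure occurs during AutoGPTQ's layer-by-layer quantization: layer 1/32 quantizes cleanly, and the error is raised on the forward pass of layer 2/32, inside transformers' apply_rotary_pos_emb for Phi-3 (see logs below).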

To Reproduce
Follow the Phi-3.5-mini example: https://github.com/microsoft/Olive/tree/main/examples/phi3_5
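In a fresh environment this amounts to installing the example's dependencies per its README and then running olive run --config qnn_config.json with transformers 4.54.1 installed.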

Expected behavior
olive run --config qnn_config.json should complete successfully.

Olive config
Phi-3.5-mini example config: https://github.com/microsoft/Olive/blob/main/examples/phi3_5/qnn_config.json

Olive logs

...
[2025-08-04 19:30:21,431] [INFO] [engine.py:686:_run_pass] Running pass g:gptqquantizer
WARNING - AutoGPTQ has stopped development. Please transition to GPTQModel: https://github.com/ModelCloud/GPTQModel
GPTQModel has been merged into Transformers/Optimum and full deprecation of AutoGPTQ within HF frameworks is planned in the near-future.
/venv-quant/lib/python3.12/site-packages/auto_gptq/nn_modules/triton_utils/kernels.py:410: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  @custom_fwd
/venv-quant/lib/python3.12/site-packages/auto_gptq/nn_modules/triton_utils/kernels.py:418: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  @custom_bwd
/venv-quant/lib/python3.12/site-packages/auto_gptq/nn_modules/triton_utils/kernels.py:461: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  @custom_fwd(cast_inputs=torch.float16)
CUDA extension not installed.
CUDA extension not installed.
Loading checkpoint shards: 100%|██████████| 2/2 [00:00<00:00, 160.16it/s]
INFO - Start quantizing layer 1/32
INFO - Quantizing self_attn.qkv_proj in layer 1/32...
INFO - Quantizing self_attn.o_proj in layer 1/32...
INFO - Quantizing mlp.gate_up_proj in layer 1/32...
INFO - Quantizing mlp.down_proj in layer 1/32...
INFO - Start quantizing layer 2/32
[2025-08-04 19:30:30,591] [ERROR] [engine.py:755:_run_pass] Pass run failed.
Traceback (most recent call last):
  File "/venv-quant/lib/python3.12/site-packages/olive/engine/engine.py", line 743, in _run_pass
    output_model_config = host.run_pass(p, input_model_config, output_model_path)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv-quant/lib/python3.12/site-packages/olive/systems/local.py", line 29, in run_pass
    output_model = the_pass.run(model, output_model_path)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv-quant/lib/python3.12/site-packages/olive/passes/olive_pass.py", line 242, in run
    output_model = self._run_for_config(model, self.config, output_model_path)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv-quant/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/venv-quant/lib/python3.12/site-packages/olive/passes/pytorch/autogptq.py", line 175, in _run_for_config
    quantized_model.quantize(dataset)
  File "/venv-quant/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/venv-quant/lib/python3.12/site-packages/auto_gptq/modeling/_base.py", line 334, in quantize
    layer(*layer_input, **additional_layer_inputs)
  File "/venv-quant/lib/python3.12/site-packages/transformers/modeling_layers.py", line 94, in __call__
    return super().__call__(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv-quant/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv-quant/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv-quant/lib/python3.12/site-packages/transformers/models/phi3/modeling_phi3.py", line 260, in forward
    hidden_states, self_attn_weights = self.self_attn(
                                       ^^^^^^^^^^^^^^^
  File "/venv-quant/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv-quant/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv-quant/lib/python3.12/site-packages/transformers/models/phi3/modeling_phi3.py", line 185, in forward
    query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin)
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv-quant/lib/python3.12/site-packages/transformers/models/phi3/modeling_phi3.py", line 139, in apply_rotary_pos_emb
    q_embed = torch.cat([(q_rot * cos) + (rotate_half(q_rot) * sin), q_pass], dim=-1)
                          ~~~~~~^~~~~
RuntimeError: The size of tensor a (32) must match the size of tensor b (96) at non-singleton dimension 3
[2025-08-04 19:30:30,593] [WARNING] [engine.py:318:run_accelerator] Failed to run Olive on npu-qnn.
Traceback (most recent call last):
  File "/venv-quant/lib/python3.12/site-packages/olive/engine/engine.py", line 314, in run_accelerator
    output_footprint = self._run_no_search(input_model_config, input_model_id, accelerator_spec, output_dir)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv-quant/lib/python3.12/site-packages/olive/engine/engine.py", line 358, in _run_no_search
    should_prune, signal, model_ids = self._run_passes(input_model_config, input_model_id, accelerator_spec)
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv-quant/lib/python3.12/site-packages/olive/engine/engine.py", line 642, in _run_passes
    model_config, model_id = self._run_pass(
                             ^^^^^^^^^^^^^^^
  File "/venv-quant/lib/python3.12/site-packages/olive/engine/engine.py", line 743, in _run_pass
    output_model_config = host.run_pass(p, input_model_config, output_model_path)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv-quant/lib/python3.12/site-packages/olive/systems/local.py", line 29, in run_pass
    output_model = the_pass.run(model, output_model_path)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv-quant/lib/python3.12/site-packages/olive/passes/olive_pass.py", line 242, in run
    output_model = self._run_for_config(model, self.config, output_model_path)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv-quant/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/venv-quant/lib/python3.12/site-packages/olive/passes/pytorch/autogptq.py", line 175, in _run_for_config
    quantized_model.quantize(dataset)
  File "/venv-quant/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/venv-quant/lib/python3.12/site-packages/auto_gptq/modeling/_base.py", line 334, in quantize
    layer(*layer_input, **additional_layer_inputs)
  File "/venv-quant/lib/python3.12/site-packages/transformers/modeling_layers.py", line 94, in __call__
    return super().__call__(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv-quant/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv-quant/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv-quant/lib/python3.12/site-packages/transformers/models/phi3/modeling_phi3.py", line 260, in forward
    hidden_states, self_attn_weights = self.self_attn(
                                       ^^^^^^^^^^^^^^^
  File "/venv-quant/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv-quant/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv-quant/lib/python3.12/site-packages/transformers/models/phi3/modeling_phi3.py", line 185, in forward
    query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin)
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/venv-quant/lib/python3.12/site-packages/transformers/models/phi3/modeling_phi3.py", line 139, in apply_rotary_pos_emb
    q_embed = torch.cat([(q_rot * cos) + (rotate_half(q_rot) * sin), q_pass], dim=-1)
                          ~~~~~~^~~~~
RuntimeError: The size of tensor a (32) must match the size of tensor b (96) at non-singleton dimension 3
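
For illustration, the failing operation is q_rot * cos inside transformers' apply_rotary_pos_emb: the rotary slice of the query states and the cos table disagree in their last dimension (32 vs. 96). A minimal sketch that reproduces the same broadcast error with hypothetical shapes:

import torch

# Shapes are hypothetical, chosen only to match the error message: the rotary
# slice of the query states has last dimension 32, while the cos table covers
# the full head dimension of 96, so elementwise multiplication cannot broadcast.
q_rot = torch.randn(1, 32, 8, 32)  # (batch, num_heads, seq_len, rotary_dim)
cos = torch.randn(1, 1, 8, 96)     # (batch, 1, seq_len, head_dim)
q_rot * cos
# RuntimeError: The size of tensor a (32) must match the size of tensor b (96)
# at non-singleton dimension 3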

Other information

  • OS: Ubuntu 22.04.4 LTS
  • Olive version: main (70b5beb)
  • Transformers package version: transformers==4.54.1

Additional context
Downgrading to transformers==4.53.* avoids the error.
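A temporary workaround is to pin the dependency before running the pass, for example pip install "transformers==4.53.*". The break is likely related to changes in the Phi-3 rotary-embedding code path in transformers 4.54 (note the q_rot/q_pass split in the traceback above), so the example or the GptqQuantizer pass presumably needs an update for the newer release.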
