Phi-MoE attention fails to compile with QNN LPBQ/W4A16 Conv2D while the same path works for Qwen3; are there op/shape restrictions? #678

@Lucyliu1234

Description

Hi, I ran into a problem while trying to compile the attention projections of the Phi-mini-MoE model into a QNN AOT graph.

In the Qwen3 AOT example, the attention/MLP linear layers compile successfully through the Conv2D + LPBQ W4A16 path, for example:

  • Conv2D weight layout: [1, 1, In, Out]
  • quant recipe: LPBQ / w4a16
  • in Qwen3 SHA mode, q/k/v also compile after being split per head
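For reference, the Conv2D weight layout listed above can be illustrated with a minimal sketch. It assumes the source linear weight is stored as [Out, In] (the usual PyTorch `nn.Linear` convention, which is my assumption here, not something taken from the exporter). The helper names `linear_to_conv2d_weight` and `shape` are hypothetical and only for illustration.

```python
from typing import List

Matrix = List[List[float]]


def linear_to_conv2d_weight(w: Matrix) -> List[List[Matrix]]:
    """Rearrange a linear weight stored as [Out, In] into [1, 1, In, Out].

    Assumption: the input follows the [Out, In] convention.
    """
    out_features = len(w)
    in_features = len(w[0])
    # Transpose [Out, In] -> [In, Out].
    transposed = [
        [w[o][i] for o in range(out_features)] for i in range(in_features)
    ]
    # Add the two leading singleton dimensions (kernel height and width of 1).
    return [[transposed]]


def shape(x) -> tuple:
    """Return the shape of a regular nested list."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0]
    return tuple(dims)


# Toy weight with Out=2, In=3.
w = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
conv_w = linear_to_conv2d_weight(w)
print(shape(conv_w))
```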

On the Phi-MoE model, however, QNN prepare fails when I try to route the attention q_proj/k_proj/o_proj through a similar path.
The attention weights currently exported for Phi look like this:

  • q_proj: has weight + scale1 + scale2, similar to LPBQ/int4
  • k_proj: has weight + scale1 + scale2, similar to LPBQ/int4
  • o_proj: has weight + scale1 + scale2, similar to LPBQ/int4
  • v_proj: W8A16 style, weight + scale + zero_point

I tried two approaches:

  1. Whole-block Conv2D LPBQ
    For example:
  • q_proj.weight shape: [1, 1, 4096, 4096]
  • k_proj.weight shape: [1, 1, 4096, 1024]
  • o_proj.weight shape: [1, 1, 4096, 4096]

QNN lowering does generate Conv2d_w_blk_exp_scale, but graph prepare fails, and the log contains:

no properties registered for q::GroupedConv2d_w_scale
Selecting disabled op ... q::pack_4bit_lpbq_weights_2x
Selecting disabled op ... q::pack_4bit_lpbq_scales
Graph prepare failed with err:-1

  2. Following Qwen2/Qwen3 SHA, split q/k per head into small Conv2D ops
    For example:
  • q_proj_sha.0 ... q_proj_sha.31
  • k_proj_sha.0 ... k_proj_sha.7
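The per-head split can be sketched roughly as below. This is only an illustration under assumptions: heads are taken to be contiguous along the Out axis, and each per-head Conv2D keeps the [1, 1, In, Out] layout with Out reduced to the head dimension. The helpers `sha_slices` and `per_head_weight_shape` are hypothetical names, not part of QNN or the Qwen3 example.

```python
from typing import List, Tuple


def sha_slices(
    prefix: str, out_features: int, num_heads: int
) -> List[Tuple[str, int, int]]:
    """Return (op_name, start, stop) ranges along the Out axis, one per head.

    Assumption: heads occupy contiguous, equally sized ranges of Out.
    """
    if out_features % num_heads != 0:
        raise ValueError("out_features must be divisible by num_heads")
    head_dim = out_features // num_heads
    return [
        (f"{prefix}.{h}", h * head_dim, (h + 1) * head_dim)
        for h in range(num_heads)
    ]


def per_head_weight_shape(
    in_features: int, out_features: int, num_heads: int
) -> Tuple[int, int, int, int]:
    """Shape of one per-head Conv2D weight in [1, 1, In, Out_per_head] layout."""
    return (1, 1, in_features, out_features // num_heads)


# Shapes taken from the whole-block case: q_proj is [1, 1, 4096, 4096] with
# 32 heads, k_proj is [1, 1, 4096, 1024] with 8 heads.
q = sha_slices("q_proj_sha", 4096, 32)
k = sha_slices("k_proj_sha", 1024, 8)
print(q[0], q[-1])
print(k[0], k[-1])
print(per_head_weight_shape(4096, 4096, 32))
print(per_head_weight_shape(4096, 1024, 8))
```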

This approach fails even earlier, and the log shows:

"model.layers.3.self_attn.q_proj_sha.0" generated: could not create op
"model.layers.3.self_attn.q_proj_sha.1" generated: could not create op
...
"model.layers.3.self_attn.k_proj_sha.0" generated: could not create op
...
Received signal11 - SIGSEGV

Thanks!
