fix: apply QK norm in MultiheadAttention hook methods by orrzohar · Pull Request #85 · Cerebras/modelzoo

orrzohar · 2026-04-15T01:24:29Z

Bug

Setting attention_qk_norm_layer in the config instantiates self.q_norm and self.k_norm in MultiheadAttention.__init__, but neither module is ever called in the forward pass. As a result, QK normalization is silently a no-op.
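One way to surface this kind of silent no-op is to count which submodules actually run during a forward pass. The sketch below is only an illustration: `ToyAttention` and `uncalled_submodules` are hypothetical names, not part of modelzoo, and the toy class merely imitates the bug described above (norm layers built in `__init__` but skipped in `forward`).

```python
import torch
import torch.nn as nn


class ToyAttention(nn.Module):
    """Hypothetical stand-in that reproduces the bug: norms are built but never called."""

    def __init__(self, head_dim: int):
        super().__init__()
        self.q_norm = nn.LayerNorm(head_dim)
        self.k_norm = nn.LayerNorm(head_dim)

    def forward(self, q, k):
        # Bug being imitated: self.q_norm / self.k_norm are skipped here.
        return torch.matmul(q, k.transpose(-1, -2))


def uncalled_submodules(model: nn.Module, *inputs) -> list[str]:
    """Return names of child modules that never ran during one forward pass."""
    called = set()
    handles = [
        m.register_forward_hook(lambda mod, args, out, name=name: called.add(name))
        for name, m in model.named_modules()
        if name  # skip the root module itself
    ]
    try:
        with torch.no_grad():
            model(*inputs)
    finally:
        for h in handles:
            h.remove()
    return sorted(
        name for name, _ in model.named_modules() if name and name not in called
    )


q = torch.randn(2, 4, 8, 16)  # (batch, heads, seq, head_dim)
k = torch.randn(2, 4, 8, 16)
missing = uncalled_submodules(ToyAttention(16), q, k)
print(missing)
```

A check like this could sit in a unit test so that any configured-but-unused layer fails loudly instead of quietly changing nothing.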

Fix

Apply self.q_norm / self.k_norm inside process_q_before_logits_calc and process_k_before_logits_calc, so Q and K are normalized before the logits matmul when configured.
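The shape of the fix can be sketched with a toy module. Only the two hook names come from this PR; `ToyMultiheadAttention`, the `qk_norm` flag (standing in for the attention_qk_norm_layer config), the use of `nn.LayerNorm`, and the hook signatures are assumptions for illustration and may differ from the real modelzoo class.

```python
import torch
import torch.nn as nn


class ToyMultiheadAttention(nn.Module):
    """Toy stand-in, not the modelzoo class; only the hook names come from the PR."""

    def __init__(self, head_dim: int, qk_norm: bool = False):
        super().__init__()
        # Hypothetical flag standing in for the attention_qk_norm_layer config.
        self.q_norm = nn.LayerNorm(head_dim) if qk_norm else None
        self.k_norm = nn.LayerNorm(head_dim) if qk_norm else None
        self.scale = head_dim ** -0.5

    def process_q_before_logits_calc(self, q):
        # Normalize Q only when a norm layer was configured.
        return self.q_norm(q) if self.q_norm is not None else q

    def process_k_before_logits_calc(self, k):
        # Normalize K only when a norm layer was configured.
        return self.k_norm(k) if self.k_norm is not None else k

    def forward(self, q, k, v):
        q = self.process_q_before_logits_calc(q)
        k = self.process_k_before_logits_calc(k)
        logits = torch.matmul(q, k.transpose(-1, -2)) * self.scale
        return torch.matmul(torch.softmax(logits, dim=-1), v)


torch.manual_seed(0)
# Inputs shaped (batch, heads, seq, head_dim), scaled so normalization matters.
q, k, v = (torch.randn(2, 4, 8, 16) * 5 for _ in range(3))
plain = ToyMultiheadAttention(16, qk_norm=False)(q, k, v)
normed = ToyMultiheadAttention(16, qk_norm=True)(q, k, v)
changed = not torch.allclose(plain, normed)
print(changed)
```

Comparing outputs with and without the norm configured is a simple regression check that the setting now affects the attention logits.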

q_norm and k_norm modules were instantiated from attention_qk_norm_layer config but never invoked in the forward pass. Apply them in process_q_before_logits_calc and process_k_before_logits_calc so that QK normalization actually affects attention logits when configured.

Made-with: Cursor

orrzohar · 2026-04-15T01:26:54Z

@bhargav-cerebras

orrzohar wants to merge 1 commit into Cerebras:main from orrzohar:fix/apply-qk-norm-in-attention-hooks
