[CPU] Improve INT8 SDPA template #3230

Xia-Weiwen · 2025-10-23T02:52:27Z

This change brings about a 1% end-to-end (E2E) improvement when running INT8 ViT on 4 cores.

pytorch-bot · 2025-10-23T02:52:30Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/3230

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 64eae18 with merge base f3fc5e7:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Valentine233 · 2025-10-23T05:07:55Z

torchao/prototype/inductor/codegen/cpp_int8_sdpa_template.py

-        auto tmp2 = tmp1.round();
-        auto tmp3 = tmp2 + vec_beta1;
+        auto tmp1 = at::vec::fmadd(tmp0, vec_sum_scale, vec_beta1);
+        auto tmp3 = tmp1.round();


Could we also apply this optimization to the masked vectorization part below?

Updated. Thanks.
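For readers following the diff, the change can be sketched in scalar form. This is a minimal illustration, not the template code: the function names `requant_unfused` and `requant_fused` are hypothetical, `std::fma` stands in for `at::vec::fmadd`, and `std::nearbyint` (round-half-to-even in the default rounding mode) is assumed to behave like the vectorized `round()`.

```cpp
#include <cmath>

// Old pattern: multiply, round to the nearest integer, then add the offset.
inline float requant_unfused(float x, float scale, float beta) {
    float scaled = x * scale;
    return std::nearbyint(scaled) + beta;
}

// New pattern: fused multiply-add first, then a single round.
// This saves one separate add per element.
inline float requant_fused(float x, float scale, float beta) {
    return std::nearbyint(std::fma(x, scale, beta));
}
```

When `beta` is integer-valued, as a zero point would be, the two forms agree for typical inputs. Under the round-half-to-even assumption they can differ by one at exact `.5` ties when `beta` is odd: with `x = 1`, `scale = 0.5`, `beta = 1`, the unfused form gives 1 and the fused form gives 2. Whether such ties matter in practice for this kernel is not stated in the PR.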

Valentine233 · 2025-10-23T05:08:10Z

torchao/prototype/inductor/codegen/cpp_int8_sdpa_template.py

-      auto tmp6 = tmp5.round();
-      auto tmp7 = tmp6 + vec_beta2;
+      auto tmp5 = at::vec::fmadd(tmp4, vec_alpha, vec_beta2);
+      auto tmp7 = tmp5.round();


Updated. Thanks.

Valentine233 · 2025-10-23T05:08:24Z

torchao/prototype/inductor/codegen/cpp_int8_sdpa_template.py

-      auto tmp6 = tmp5.round();
-      auto tmp7 = tmp6 + vec_beta2;
+      auto tmp5 = at::vec::fmadd(tmp4, vec_alpha, vec_beta2);
+      auto tmp7 = tmp5.round();


Updated. Thanks.

[CPU] Improve INT8 SDPA template

6c7a03a

Xia-Weiwen requested a review from Valentine233 October 23, 2025 02:52

meta-cla bot added the CLA Signed label Oct 23, 2025

Xia-Weiwen added the topic: not user facing label Oct 23, 2025

Valentine233 reviewed Oct 23, 2025

Update tail

64eae18
