[WIP] Fused RMSNorm implementation #2205

guangyey · 2025-10-22T09:46:19Z

Motivation

Fix #1905.
Following pytorch/pytorch#153666, add fused RMSNorm support on XPU.

Copilot

Pull Request Overview

This PR adds fused RMSNorm (Root Mean Square Normalization) support for XPU devices to match PyTorch's recent implementation. RMSNorm is a simpler alternative to LayerNorm: it skips mean centering and has no bias term, scaling the input only by the reciprocal of its root mean square and the learned weight.
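For reference, the standard definitions over the N normalized elements x_1..x_N of one row, with weight γ (and bias β for LayerNorm), are written out below. The symbols are chosen here for illustration and are not taken from the PR; in both cases the kernel's rstd corresponds to the reciprocal square root in the denominator.

```latex
\begin{aligned}
\mu &= \frac{1}{N}\sum_{j=1}^{N} x_j,
\qquad \sigma^2 = \frac{1}{N}\sum_{j=1}^{N} (x_j - \mu)^2 \\
y_i^{\mathrm{LN}} &= \frac{x_i - \mu}{\sqrt{\sigma^2 + \epsilon}}\,\gamma_i + \beta_i \\
y_i^{\mathrm{RMS}} &= \frac{x_i}{\sqrt{\frac{1}{N}\sum_{j=1}^{N} x_j^2 + \epsilon}}\,\gamma_i
\end{aligned}
```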

Key Changes:

Adds forward and backward RMSNorm kernel registrations in the native functions YAML
Refactors existing LayerNorm kernels to support both LayerNorm and RMSNorm via a template parameter
Implements RMSNorm-specific computation paths that skip mean centering
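The template-parameter approach described above can be sketched as a host-side reference for a single row. This is not the PR's SYCL code: norm_row_forward and rms_norm_backward_dx are hypothetical names, and the backward function covers only the RMSNorm input gradient, using dx_i = rstd * (h_i - x_i * rstd^2 * mean(h * x)) with h = dy * gamma, which follows from differentiating y_i = x_i * rstd * gamma_i.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical host-side reference (not the PR's SYCL kernel). One template
// parameter selects LayerNorm (rms_norm == false) or RMSNorm (rms_norm == true).
template <bool rms_norm>
void norm_row_forward(const std::vector<float>& x,
                      const std::vector<float>& gamma,
                      const std::vector<float>& beta,  // unused for RMSNorm
                      float eps, std::vector<float>& y, float& mean,
                      float& rstd) {
  const std::size_t n = x.size();
  y.resize(n);
  if constexpr (!rms_norm) {
    // LayerNorm: center by the mean, scale by 1/sqrt(var + eps), add bias.
    double sum = 0.0;
    for (float v : x) sum += v;
    const double mu = sum / n;
    double sq_dev = 0.0;
    for (float v : x) sq_dev += (v - mu) * (v - mu);
    mean = static_cast<float>(mu);
    rstd = static_cast<float>(1.0 / std::sqrt(sq_dev / n + eps));
    for (std::size_t i = 0; i < n; ++i)
      y[i] = (x[i] - mean) * rstd * gamma[i] + beta[i];
  } else {
    // RMSNorm: no centering and no bias; scale by 1/sqrt(mean(x^2) + eps).
    double sum_sq = 0.0;
    for (float v : x) sum_sq += static_cast<double>(v) * v;
    mean = 0.f;
    rstd = static_cast<float>(1.0 / std::sqrt(sum_sq / n + eps));
    for (std::size_t i = 0; i < n; ++i) y[i] = x[i] * rstd * gamma[i];
  }
}

// Input gradient of RMSNorm for one row. With h = dy * gamma:
//   dx_i = rstd * (h_i - x_i * rstd^2 * mean(h * x))
std::vector<float> rms_norm_backward_dx(const std::vector<float>& x,
                                        const std::vector<float>& gamma,
                                        const std::vector<float>& dy,
                                        float rstd) {
  const std::size_t n = x.size();
  double dot = 0.0;  // sum_j h_j * x_j
  for (std::size_t j = 0; j < n; ++j)
    dot += static_cast<double>(dy[j]) * gamma[j] * x[j];
  const double c = static_cast<double>(rstd) * rstd * (dot / n);
  std::vector<float> dx(n);
  for (std::size_t i = 0; i < n; ++i)
    dx[i] = static_cast<float>(rstd * (dy[i] * gamma[i] - x[i] * c));
  return dx;
}
```

A reference like this may be useful as ground truth when comparing kernel outputs on small inputs; the double-precision accumulation is a choice made here for accuracy, not a claim about the kernel.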

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
yaml/native/native_functions.yaml	Registers `_fused_rms_norm` and `_fused_rms_norm_backward` functions with XPU dispatch
src/ATen/native/xpu/sycl/LayerNormKernels.cpp	Adds `rms_norm` template parameter to kernel functors and implements RMSNorm computation logic

_{Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.}

Copilot · 2025-10-22T09:47:06Z

src/ATen/native/xpu/sycl/LayerNormKernels.cpp

+        mean_[i] = m1;
+        rstd_[i] = c10::xpu::compat::rsqrt(m2 + eps_);
+      } else {
+        rstd_[i] = c10::xpu::compat::rsqrt(m2 + m1 * m1 + eps_);


The RMSNorm formula appears incorrect. For RMSNorm, m1 (mean) should be zero since we skip mean computation. The formula should be rsqrt(m2 + eps_) where m2 represents the mean of squares. The term m1 * m1 should not be added.

Suggested change

-rstd_[i] = c10::xpu::compat::rsqrt(m2 + m1 * m1 + eps_);
+rstd_[i] = c10::xpu::compat::rsqrt(m2 + eps_);
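Whether the m1 * m1 term is wrong depends on what m1 and m2 hold at this point, which the excerpt does not show. If they are the mean and population variance produced by a Welford reduction (an assumption), then m2 + m1 * m1 equals the mean of squares, so the original expression would already give the RMSNorm rstd; if the RMSNorm path keeps m1 at zero, the two forms agree. The host-side sketch below checks the identity numerically; Moments, welford_row, rstd_from_moments, and rstd_direct are hypothetical names, not code from the PR.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Moments {
  double mean;        // running mean (m1)
  double m2sum;       // sum of squared deviations from the mean
  std::size_t count;  // number of elements seen
};

// Hypothetical host-side Welford reduction over one row.
Moments welford_row(const std::vector<float>& x) {
  Moments m{0.0, 0.0, 0};
  for (float v : x) {
    m.count += 1;
    const double delta = v - m.mean;
    m.mean += delta / m.count;
    m.m2sum += delta * (v - m.mean);
  }
  return m;
}

// RMSNorm rstd from Welford moments, using variance + mean^2 == mean(x^2).
double rstd_from_moments(const Moments& m, double eps) {
  const double var = m.m2sum / m.count;  // population variance (m2)
  return 1.0 / std::sqrt(var + m.mean * m.mean + eps);
}

// RMSNorm rstd computed directly from the mean of squares.
double rstd_direct(const std::vector<float>& x, double eps) {
  double sum_sq = 0.0;
  for (float v : x) sum_sq += static_cast<double>(v) * v;
  return 1.0 / std::sqrt(sum_sq / x.size() + eps);
}
```

Under this assumption, dropping m1 * m1 would compute 1/sqrt(var + eps), which differs from the RMSNorm rstd whenever the row mean is nonzero.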

Copilot · 2025-10-22T09:47:07Z

src/ATen/native/xpu/sycl/LayerNormKernels.cpp

+                static_cast<T_ACC>(rstd_[i]) * gamma_v +
+            beta_v;
+      } else {
+        Y_[index] = (static_cast<T_ACC>(X_[index])) *


[nitpick] Remove the unnecessary inner parentheses around static_cast<T_ACC>(X_[index]). The expression can be simplified to Y_[index] = static_cast<T_ACC>(X_[index]) * static_cast<T_ACC>(rstd_[i]) * gamma_v;

Suggested change

-Y_[index] = (static_cast<T_ACC>(X_[index])) *
+Y_[index] = static_cast<T_ACC>(X_[index]) *

Copilot · 2025-10-22T09:47:07Z

src/ATen/native/xpu/sycl/LayerNormKernels.cpp

+    U new_mean = curr_sum.mean + delta * (1.f / new_count);
+    return {new_mean, curr_sum.sigma2 + delta * (val - new_mean), new_count};
+  } else {
+    return {0.f, curr_sum.sigma2 + val * val, 0};


The return statement uses the integer literal 0 for the count field, while the struct fields are of type float. For consistency and clarity, use 0.f for all three fields: return {0.f, curr_sum.sigma2 + val * val, 0.f};

Suggested change

-return {0.f, curr_sum.sigma2 + val * val, 0};
+return {0.f, curr_sum.sigma2 + val * val, 0.f};
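A host-side sketch of the two branches in the diffed update, with 0.f for every field as suggested. WelfordData, online_sum, and accumulate are hypothetical names, and the sketch fixes the accumulator type to float, whereas the excerpt is templated on U. The LayerNorm branch tracks a running mean and the sum of squared deviations (variance = sigma2 / count); the RMSNorm branch only sums squares and leaves mean and count at zero, so in this sketch the mean of squares is sigma2 divided by the row length, not by count.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical mirror of the accumulator; all three fields are float.
struct WelfordData {
  float mean;
  float sigma2;  // LayerNorm: sum of squared deviations; RMSNorm: sum of squares
  float count;
};

template <bool rms_norm>
WelfordData online_sum(float val, const WelfordData& curr) {
  if constexpr (!rms_norm) {
    // Welford update of the running mean and squared-deviation sum.
    float delta = val - curr.mean;
    float new_count = curr.count + 1.f;
    float new_mean = curr.mean + delta * (1.f / new_count);
    return {new_mean, curr.sigma2 + delta * (val - new_mean), new_count};
  } else {
    // RMSNorm needs only the sum of squares; mean and count stay zero.
    return {0.f, curr.sigma2 + val * val, 0.f};
  }
}

// Folds one row through online_sum.
template <bool rms_norm>
WelfordData accumulate(const std::vector<float>& row) {
  WelfordData acc{0.f, 0.f, 0.f};
  for (float v : row) acc = online_sum<rms_norm>(v, acc);
  return acc;
}
```

The original 0 still compiles: a constant 0 converts exactly to float, so brace initialization does not treat it as a narrowing conversion. The change is about making the float type explicit.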

Copilot AI review requested due to automatic review settings October 22, 2025 09:46

guangyey marked this pull request as draft October 22, 2025 09:46

guangyey changed the title ~~Fused RMSNorm implementation~~ [WIP] Fused RMSNorm implementation Oct 22, 2025

Copilot AI reviewed Oct 22, 2025

View reviewed changes

Fused RMSNorm implementation

27cf9d1

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[WIP] Fused RMSNorm implementation #2205

[WIP] Fused RMSNorm implementation #2205

Uh oh!

guangyey commented Oct 22, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Oct 22, 2025

Uh oh!

Copilot AI Oct 22, 2025

Uh oh!

Copilot AI Oct 22, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	rstd_[i] = c10::xpu::compat::rsqrt(m2 + m1 * m1 + eps_);
	rstd_[i] = c10::xpu::compat::rsqrt(m2 + eps_);

	Y_[index] = (static_cast<T_ACC>(X_[index])) *
	Y_[index] = static_cast<T_ACC>(X_[index]) *

	return {0.f, curr_sum.sigma2 + val * val, 0};
	return {0.f, curr_sum.sigma2 + val * val, 0.f};

[WIP] Fused RMSNorm implementation #2205

Are you sure you want to change the base?

[WIP] Fused RMSNorm implementation #2205

Uh oh!

Conversation

guangyey commented Oct 22, 2025

Motivation

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Copilot AI Oct 22, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Oct 22, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Oct 22, 2025

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants