[MoE][compile][full ac] weave torch.compile around the FSDP(GroupedExperts) graph break #1895
Stacked PRs:
This PR changes how we compile MoE layers to work around Compile + AC limitations. When you `AC(Compile(block))` or `Compile(AC(block))` and there is a graph break in `block`, the entire block falls back to eager. For llama3, we worked around this problem by eliminating all graph breaks. With MoE models, particularly dp2ep, we need to wrap `FSDP(block.moe.experts)`, which means we hit graph breaks when tracing `block.moe.experts.__call__`, and therefore whenever AC was enabled the entire MoE block fell back to eager: https://gist.github.com/xmfan/50f4de1e89d789cd63a21aca9e600132 (note in the tlparse that graph 0/1 is empty; it corresponds to the block containing the MoE).

The workaround in this PR is to avoid tracing `block.moe.experts.__call__`. This is done by individually wrapping torch.compile around the submodules of TransformerBlock. Note that we are leaving some perf on the table, as this may exclude some ops in TransformerBlock.forward and MoE.forward. This is an API limitation: we have no way to capture those ops while keeping the wrapper decoupled from the model code. This workaround will no longer be necessary when either:

This change introduces a small regression in the non-AC configuration. You can see a small perf dip between before this PR and after this PR. Given that AC is a necessity for running non-toy configurations of these models, I chose to stick with this implementation to make comparisons easier.
Validated on DSv3 debug model: