

xmfan commented Oct 16, 2025

Stacked PRs:


This PR changes how we compile MoE layers to work around a Compile + AC limitation. When a block is wrapped as `AC(Compile(block))` or `Compile(AC(block))` and there is a graph break inside `block`, the entire block falls back to eager. For llama3, we worked around this by eliminating all graph breaks. MoE models, particularly with dp2ep, need FSDP applied separately to `block.moe.experts`, which introduces a graph break when tracing `block.moe.experts.__call__`. As a result, whenever AC was enabled, every block containing an MoE fell back to eager: https://gist.github.com/xmfan/50f4de1e89d789cd63a21aca9e600132 (note in the tlparse that graph 0/1 is empty; it corresponds to the block containing the MoE).
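To make the fallback rule concrete, here is a toy, pure-Python model of it. It is not how Dynamo is implemented: `Region`, `GRAPH_BREAK`, and `run_modes` are hypothetical names, and the op lists are illustrative. It only encodes the behavior described above: without AC, a graph break splits the region and the surrounding ops are still compiled, while with AC a single graph break sends the whole region to eager.

```python
from dataclasses import dataclass

# Hypothetical marker for a call Dynamo cannot trace, standing in for the
# FSDP-wrapped block.moe.experts.__call__.
GRAPH_BREAK = "<graph break>"


@dataclass
class Region:
    """One torch.compile-wrapped region; `ops` lists its forward in program order."""
    name: str
    ops: list[str]


def run_modes(region: Region, ac_enabled: bool) -> dict[str, list[str]]:
    """Toy model (an assumption, not Dynamo's implementation) of the fallback rule.

    Without AC, a graph break splits the region and the ops on either side are
    still compiled. With AC composed with compile, any graph break sends the
    whole region back to eager.
    """
    ops = [op for op in region.ops if op != GRAPH_BREAK]
    if ac_enabled and GRAPH_BREAK in region.ops:
        return {"compiled": [], "eager": ops}
    return {"compiled": ops, "eager": []}


moe_block = Region("moe_block", ["attention", "router", GRAPH_BREAK, "combine"])
print(run_modes(moe_block, ac_enabled=False))
# {'compiled': ['attention', 'router', 'combine'], 'eager': []}
print(run_modes(moe_block, ac_enabled=True))
# {'compiled': [], 'eager': ['attention', 'router', 'combine']}
```

A dense (llama3-style) block with no graph break stays fully compiled in both modes, which is why removing every graph break was enough there.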

The workaround in this PR is to avoid tracing `block.moe.experts.__call__` by wrapping `torch.compile` individually around submodules of `TransformerBlock` instead of around the whole block. This leaves some performance on the table: ops that live directly in `TransformerBlock.forward` and `MoE.forward`, rather than inside a compiled submodule, are no longer captured. This is an API limitation, as we have no way to capture those ops while keeping the compile wrapper decoupled from the model code. This workaround will no longer be necessary once either of the following holds:

  • We can do Compile + AC with graph breaks
  • We remove the FSDP graph break
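The workaround can be sketched as choosing which submodules to compile. The toy below walks a hypothetical module tree for one MoE `TransformerBlock` and returns the submodules to wrap individually so that `moe.experts` never ends up inside a traced region. The submodule names (`attention_norm`, `router`, `shared_experts`, etc.) and the helper `compile_targets` are illustrative assumptions, not the PR's actual code.

```python
# Hypothetical submodule tree of one MoE TransformerBlock; names are illustrative.
BLOCK_TREE = {
    "attention_norm": {},
    "attention": {},
    "ffn_norm": {},
    "moe": {"router": {}, "shared_experts": {}, "experts": {}},
}


def compile_targets(tree: dict, skip: set[str], prefix: str = "") -> list[str]:
    """Return dotted paths of submodules to wrap individually with torch.compile.

    A child is wrapped whole unless a skipped path lives inside it; in that case
    we recurse and wrap its children instead. Skipped paths themselves are never
    wrapped, so their __call__ is never part of a traced region.
    """
    targets = []
    for name, children in tree.items():
        path = f"{prefix}{name}"
        if path in skip:
            continue  # e.g. the FSDP-wrapped experts: its __call__ is not traced
        if any(s.startswith(path + ".") for s in skip):
            # A skipped module is nested below this one: descend a level.
            targets.extend(compile_targets(children, skip, prefix=path + "."))
        else:
            targets.append(path)
    return targets


print(compile_targets(BLOCK_TREE, {"moe.experts"}))
# ['attention_norm', 'attention', 'ffn_norm', 'moe.router', 'moe.shared_experts']
```

Anything executed directly in `TransformerBlock.forward` or `MoE.forward` (for example, residual adds, or routing glue that is not its own submodule) falls outside every returned path, which is the perf left on the table mentioned above. In PyTorch, each returned path could then be compiled in place, e.g. with `block.get_submodule(path).compile()`, though the PR may apply compile differently.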

This change introduces a small regression in the non-AC configuration: throughput dips slightly after this PR compared with before it. Since AC is required to run non-toy configurations of these models, I chose to stick with this implementation to make comparisons easier.

Validated on DSv3 debug model:

xmfan added a commit that referenced this pull request Oct 16, 2025
…perts) graph break

stack-info: PR: #1895, branch: xmfan/stack/2
meta-cla bot added the CLA Signed label Oct 16, 2025
xmfan marked this pull request as ready for review October 20, 2025 21:58