[torch.compile] Enable silu_mul_fp8_quant fusion without custom ops enabled #27146
Conversation
Signed-off-by: zjy0516 <[email protected]>
Code Review
This pull request enables the silu_mul_fp8_quant fusion pass to work even when the silu_and_mul custom operator is not enabled, by matching against the native PyTorch implementation. This is achieved by introducing a MatcherSiluAndMul utility that can trace either the custom op or the native implementation. The changes are well-structured, and the tests have been updated to cover both scenarios. My review found a minor issue in the test suite where TestSiluMulNvfp4QuantModel is not correctly handled by the new test parameterization, which would cause test failures. I've provided a suggestion to fix this by adding appropriate skip conditions.
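For context, here is a minimal sketch of the idea behind such a matcher utility. The class name and structure are illustrative, not the actual vLLM code; the custom-op branch assumes vLLM's out-variant `torch.ops._C.silu_and_mul`. The point is that the same module can be traced into either graph shape, so fusion patterns can be registered for whichever variant the compiled graph will actually contain:

```python
import torch
import torch.nn.functional as F


class MatcherSiluAndMulSketch(torch.nn.Module):
    """Illustrative stand-in for a matcher utility: traces either the
    custom CUDA op or the native PyTorch SiLU-and-mul decomposition."""

    def __init__(self, use_custom_op: bool):
        super().__init__()
        self.use_custom_op = use_custom_op

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        d = x.shape[-1] // 2
        if self.use_custom_op:
            # Graph shape produced when the custom op is enabled
            # (assumes vLLM's torch.ops._C.silu_and_mul out-variant).
            out = torch.empty(x.shape[:-1] + (d,), dtype=x.dtype, device=x.device)
            torch.ops._C.silu_and_mul(out, x)
            return out
        # Native decomposition: SiLU on the gate half, times the up half.
        return F.silu(x[..., :d]) * x[..., d:]
```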
Signed-off-by: zjy0516 <[email protected]>
Looks great! For tests, could you generate only the relevant tests and then skip based on support? (Right now it's a little bit mixed up.)
Signed-off-by: zjy0516 <[email protected]>
Signed-off-by: zjy0516 <[email protected]>
Great work! Could you post some E2E perf and accuracy numbers? And would you be interested in adding dynamic quant support as a follow-up?
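For reference on the suggested follow-up: dynamic quantization derives the scale from the activation at runtime rather than using a precomputed (static) one. A hedged per-tensor sketch, not code from this PR:

```python
import torch

FP8_MAX = torch.finfo(torch.float8_e4m3fn).max


def dynamic_fp8_quant(y: torch.Tensor):
    """Per-tensor dynamic quant sketch: the scale comes from the
    runtime amax instead of a precomputed constant."""
    scale = y.abs().amax().to(torch.float32) / FP8_MAX
    q = (y / scale).clamp(-FP8_MAX, FP8_MAX).to(torch.float8_e4m3fn)
    return q, scale
```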
Signed-off-by: zjy0516 <[email protected]>
Do you know which models use this?
Sure.
silu_mul is used by basically all models. fp8 quant is used by FP8-quantized models (those with an -FP8 suffix). For example you can use …
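As a hedged illustration of such a run (the model name is just an example of an -FP8 checkpoint, and the config field names assume vLLM's pass-config options; neither is taken from this PR):

```python
from vllm import LLM

llm = LLM(
    model="neuralmagic/Meta-Llama-3-8B-Instruct-FP8",  # example -FP8 model
    # With this PR, the fusion pass can match the native silu_mul even
    # when the custom op is not enabled via custom_ops=["+silu_and_mul"].
    compilation_config={"pass_config": {"enable_fusion": True}},
)
out = llm.generate("Hello, my name is")
print(out[0].outputs[0].text)
```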
Signed-off-by: zjy0516 <[email protected]>
Signed-off-by: zjy0516 <[email protected]>
Purpose
Based on #24604, this PR modifies the activation fusion pass to do op matching without needing to enable the custom op.
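Conceptually, the pass looks for the unfused sequence below and replaces it with a single fused kernel. A minimal sketch of the unfused pattern, with static fp8 quant shown; the exact ops matched by the pass may differ:

```python
import torch
import torch.nn.functional as F

FP8_MAX = torch.finfo(torch.float8_e4m3fn).max


def silu_mul_fp8_unfused(x: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Unfused reference: native SiLU-and-mul followed by static fp8
    quantization. The fusion pass rewrites this sequence into one fused
    kernel; with this PR the match works even when the silu_and_mul
    custom op is not enabled."""
    d = x.shape[-1] // 2
    y = F.silu(x[..., :d]) * x[..., d:]       # native silu_mul
    q = (y / scale).clamp(-FP8_MAX, FP8_MAX)  # quantize with static scale
    return q.to(torch.float8_e4m3fn)
```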
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
Update supported_models.md and examples for a new model.