
Conversation

@ZJY0516 (Contributor) commented Oct 18, 2025

Purpose

Based on #24604, this modifies the activation fusion pass to perform op matching without needing to enable the custom op.
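
In rough terms, the pass now has to recognize both of these equivalent graph forms (a minimal sketch; the custom-op call requires vLLM's compiled `_C` extension and a CUDA device):

```python
import torch
import torch.nn.functional as F

def silu_and_mul_native(x: torch.Tensor) -> torch.Tensor:
    # Native PyTorch form: the subgraph torch.compile sees when the
    # custom op is not enabled.
    gate, up = x.chunk(2, dim=-1)
    return F.silu(gate) * up

def silu_and_mul_custom(x: torch.Tensor) -> torch.Tensor:
    # Custom-op form: the single node the pass matched previously.
    # Requires vLLM's compiled _C extension.
    d = x.shape[-1] // 2
    out = torch.empty(x.shape[:-1] + (d,), dtype=x.dtype, device=x.device)
    torch.ops._C.silu_and_mul(out, x)
    return out
```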

Test Plan

pytest -s tests/compile/test_silu_mul_quant_fusion.py

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@gemini-code-assist (bot) left a comment


Code Review

This pull request enables the silu_mul_fp8_quant fusion pass to work even when the silu_and_mul custom operator is not enabled, by matching against the native PyTorch implementation. This is achieved by introducing a MatcherSiluAndMul utility that can trace either the custom op or the native implementation. The changes are well-structured and the tests have been updated to cover both scenarios. My review found a minor issue in the test suite where TestSiluMulNvfp4QuantModel is not correctly handled by the new test parameterization, which would cause test failures. I've provided a suggestion to fix this by adding appropriate skip conditions.
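
The suggested guard might look something like this (a hypothetical sketch; `nvfp4_supported` stands in for whatever capability check the test suite actually uses):

```python
import pytest

def nvfp4_supported() -> bool:
    # Placeholder: the real check would query the current platform,
    # e.g. CUDA compute capability, for FP4 support.
    return False

nvfp4_case = pytest.param(
    "TestSiluMulNvfp4QuantModel",
    marks=pytest.mark.skipif(
        not nvfp4_supported(),
        reason="NVFP4 quantization is not supported on this platform",
    ),
)
```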

@ProExpertProg (Collaborator) left a comment


Looks great! For tests, could you only generate relevant tests, and then skip based on support? (Right now it's a little bit mixed up.)
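
One way to read this suggestion (a hedged sketch; the model/flag combinations below are illustrative, not the actual supported matrix):

```python
import pytest

# Build the parameter list from only the combinations each model is
# meant to cover; platform support is handled separately via skipif.
SUPPORTED_CASES = {
    "TestSiluMulFp8QuantModel": [True, False],  # custom op enabled / native
    "TestSiluMulNvfp4QuantModel": [False],      # illustrative only
}
CASES = [
    pytest.param(model, enabled, id=f"{model}-custom_op={enabled}")
    for model, flags in SUPPORTED_CASES.items()
    for enabled in flags
]

@pytest.mark.parametrize("model, enable_custom_op", CASES)
def test_silu_mul_quant_fusion(model, enable_custom_op):
    ...  # body as in tests/compile/test_silu_mul_quant_fusion.py
```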

@ZJY0516 requested a review from ProExpertProg on October 19, 2025
@ProExpertProg (Collaborator) left a comment


Great work! Could you post some E2E perf and accuracy numbers? And would you be interested in adding dynamic quant support as a follow-up?

@ProExpertProg added the `ready` label (ONLY add when PR is ready to merge/full CI is needed) on Oct 20, 2025
@ZJY0516 (Contributor, Author) commented Oct 20, 2025

> Great work! Could you post some E2E perf and accuracy numbers?

Do you know which models use silu_mul and fp8 quant?

> And would you be interested in adding dynamic quant support as a follow-up?

Sure.

@ProExpertProg (Collaborator) commented

> Do you know which models use silu_mul and fp8 quant?

silu_mul is used by basically all models, and fp8 quant is used by the FP8-quantized models. For example, you can use redhatai/meta-llama3.1-8b-instruct-fp8 or redhatai/meta-llama3.1-70B-instruct-fp8.
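
For reference, a minimal end-to-end run with the fusion pass enabled might look like this (a hedged sketch; the config field names follow vLLM's `CompilationConfig`/`PassConfig` at the time of writing and may differ across versions):

```python
from vllm import LLM
from vllm.config import CompilationConfig, PassConfig

# Load an FP8 checkpoint with the fusion pass turned on so the
# SiluAndMul + FP8-quant pattern can fuse during compilation.
llm = LLM(
    model="redhatai/meta-llama3.1-8b-instruct-fp8",
    compilation_config=CompilationConfig(
        pass_config=PassConfig(enable_fusion=True),
    ),
)
out = llm.generate("Hello, my name is")
print(out[0].outputs[0].text)
```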
