Early exit for MoE LoRA kernels #27131
Conversation
This pull request has merge conflicts that must be resolved before it can be merged.
Documentation preview: https://vllm--27131.org.readthedocs.build/en/27131/
Signed-off-by: gnovack <[email protected]>
Thank you! Can we add related tests to verify that the LoRA and non-LoRA outputs conform to expectations? We can place these tests in https://github.com/vllm-project/vllm/blob/main/tests/lora/test_olmoe_tp.py
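A test along those lines might look like the sketch below. The model name, adapter path, and prompt are illustrative assumptions, not the final implementation; the adapter is assumed to be attention-only so the MoE early-exit path is exercised.

```python
# Hypothetical sketch of the requested test; model/adapter names are placeholders.
import vllm
from vllm.lora.request import LoRARequest

MODEL_PATH = "allenai/OLMoE-1B-7B-0924-Instruct"  # assumed MoE base model
LORA_PATH = "/path/to/attention-only-adapter"     # hypothetical adapter

PROMPTS = ["What is the capital of France?"]


def test_attention_only_lora_matches_expectations():
    """With an attention-only adapter, the MoE layers should behave as if
    no LoRA were applied, so the LoRA and non-LoRA runs should both
    produce sane outputs through the early-exit path."""
    llm = vllm.LLM(MODEL_PATH, enable_lora=True, max_loras=2)
    params = vllm.SamplingParams(temperature=0, max_tokens=32)

    base_out = llm.generate(PROMPTS, params)
    lora_out = llm.generate(
        PROMPTS,
        params,
        lora_request=LoRARequest("attn-only", 1, LORA_PATH),
    )
    # Exact expectations depend on the adapter; at minimum both paths
    # should produce non-empty text without crashing in the MoE kernels.
    assert base_out[0].outputs[0].text
    assert lora_out[0].outputs[0].text
```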
Purpose
This PR adds early-exit logic to the `moe_lora_align_sum_kernel` and `_fused_moe_lora_kernel` kernels. This handles the case where LoRA adapters are active but do not include weights for the MoE layers (e.g., attention-only adapters); a sketch of the idea follows.
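The gist can be sketched at the Python wrapper level as below. The function and argument names here are illustrative assumptions; in the actual PR the exit lives inside `moe_lora_align_sum_kernel` and `_fused_moe_lora_kernel` themselves.

```python
# Minimal sketch of the early-exit idea; names are hypothetical.
import torch


def apply_moe_lora(hidden_states: torch.Tensor,
                   moe_lora_a: torch.Tensor | None,
                   moe_lora_b: torch.Tensor | None,
                   output: torch.Tensor) -> torch.Tensor:
    # Early exit: LoRA is enabled for this batch, but the active adapters
    # carry no MoE weights (e.g. attention-only adapters), so the MoE
    # LoRA kernels have nothing to add and launching them is pure overhead.
    if moe_lora_a is None or moe_lora_b is None:
        return output

    # ... otherwise run the usual path (stand-ins for the shrink/expand
    # work done by moe_lora_align_sum_kernel and _fused_moe_lora_kernel).
    shrunk = hidden_states @ moe_lora_a   # stand-in for the shrink step
    output += shrunk @ moe_lora_b         # stand-in for the expand step
    return output
```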
Test Plan
Serve Command
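The original command is not reproduced here; a representative invocation for serving an MoE model with a LoRA adapter, using placeholder model and adapter paths, would look like:

```bash
# Placeholder model/adapter names; flags are standard vLLM LoRA options.
vllm serve allenai/OLMoE-1B-7B-0924-Instruct \
  --enable-lora \
  --lora-modules attn-only-adapter=/path/to/attention-only-adapter \
  --max-loras 2
```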
Benchmark Command
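Likewise, a representative serving benchmark against the server above (values are placeholders, not the ones used for the results) might be:

```bash
# Placeholder request counts and lengths; targets the default localhost:8000.
vllm bench serve \
  --model allenai/OLMoE-1B-7B-0924-Instruct \
  --dataset-name random \
  --random-input-len 1024 \
  --random-output-len 128 \
  --num-prompts 200
```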
Test Result
Benchmark Results (before)
Benchmark Results (after)
Essential Elements of an Effective PR Description Checklist
(Optional) Documentation update, including supported_models.md and examples for a new model.