[Bugfix] fixes the decoding metadata of dense mla's fp8 kvcache. #27144
Code Review
This pull request fixes an issue with decoding metadata for dense MLA's FP8 KV cache by introducing a specialized operator. The changes in the Python code correctly route execution to this new operator when appropriate. However, there is a critical issue in the CMake configuration: the flashmla dependency points to a personal fork. This introduces significant risk and should be rectified by merging the required changes into the official upstream repository and updating the commit hash accordingly.
GIT_REPOSITORY https://github.com/sighingnow/FlashMLA
GIT_TAG 7af725e6c2a3f0262e5b8573c715411a6d895cae
Pointing the GIT_REPOSITORY to a personal fork (sighingnow/FlashMLA) introduces a significant dependency risk. For project stability, security, and long-term maintainability, dependencies should point to official repositories. The required changes should be merged into the official vllm-project/FlashMLA repository first. Afterward, this pull request can be updated to use the new commit hash from the official repository.
GIT_REPOSITORY https://github.com/vllm-project/FlashMLA
GIT_TAG <new_commit_hash_from_official_repo>
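A check like the one requested above could also be automated. The following is only a minimal sketch: the function name, the allowlist, and the regular expression are all hypothetical and are not part of this PR or of vLLM's build tooling. It scans CMake text for GIT_REPOSITORY entries on GitHub and reports any whose owner is not on the allowlist.

```python
import re

# Hypothetical allowlist of trusted GitHub owners (an assumption for this sketch).
ALLOWED_OWNERS = {"vllm-project"}

# Matches lines such as: GIT_REPOSITORY https://github.com/<owner>/<name>
_REPO_RE = re.compile(
    r"GIT_REPOSITORY\s+https://github\.com/([^/\s]+)/([^/\s]+)"
)


def untrusted_repos(cmake_text: str) -> list[str]:
    """Return 'owner/name' for every GIT_REPOSITORY whose owner is not allowed."""
    return [
        f"{owner}/{name}"
        for owner, name in _REPO_RE.findall(cmake_text)
        if owner not in ALLOWED_OWNERS
    ]


if __name__ == "__main__":
    snippet = (
        "GIT_REPOSITORY https://github.com/sighingnow/FlashMLA\n"
        "GIT_TAG 7af725e6c2a3f0262e5b8573c715411a6d895cae\n"
    )
    # Prints the repositories that would need to move to an official remote.
    print(untrusted_repos(snippet))
```

Run against the fork URL quoted in this review, the sketch flags sighingnow/FlashMLA; the suggested vllm-project/FlashMLA URL passes.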
Signed-off-by: Tao He <[email protected]>
30a5293 to aa99183
@LucasWilkinson could you please take a look? Thanks!
Signed-off-by: Lucas Wilkinson <[email protected]>
LGTM; will run evals and post here
edit: gsm8k looks good: DeepSeek-V2-Lite-Chat with fp8 kv-cache, 64.6%
…m-project#27144) Signed-off-by: Tao He <[email protected]> Signed-off-by: Lucas Wilkinson <[email protected]> Co-authored-by: Lucas Wilkinson <[email protected]>
Requires the flashmla patch vllm-project/FlashMLA#7 to land first.