Skip to content

Conversation

sighingnow
Copy link
Collaborator

Require the flashmla patch vllm-project/FlashMLA#7 to be landed first.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request fixes an issue with decoding metadata for dense MLA's FP8 K/V cache by introducing a specialized operator. The changes in the Python code correctly route the execution to this new operator when appropriate. However, there is a critical issue in the CMake configuration where the flashmla dependency is pointed to a personal fork. This practice introduces significant risks and should be rectified by merging the required changes into the official upstream repository and updating the commit hash accordingly.

Comment on lines 21 to 22
GIT_REPOSITORY https://github.com/sighingnow/FlashMLA
GIT_TAG 7af725e6c2a3f0262e5b8573c715411a6d895cae
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

Pointing the GIT_REPOSITORY to a personal fork (sighingnow/FlashMLA) introduces a significant dependency risk. For project stability, security, and long-term maintainability, dependencies should point to official repositories. The required changes should be merged into the official vllm-project/FlashMLA repository first. Afterward, this pull request can be updated to use the new commit hash from the official repository.

        GIT_REPOSITORY https://github.com/vllm-project/FlashMLA
        GIT_TAG <new_commit_hash_from_official_repo>

@sighingnow sighingnow force-pushed the fixes-dense-mla-fp8kv branch from 30a5293 to aa99183 Compare October 19, 2025 14:37
@sighingnow
Copy link
Collaborator Author

@LucasWilkinson could you please take a look? Thanks!

Signed-off-by: Lucas Wilkinson <[email protected]>
Copy link
Collaborator

@LucasWilkinson LucasWilkinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM; will run evals and post here

edit: gsm8k looks good: DeepSeek-V2-Lite-Chat with fp8 kv-cache, 64.6%

@LucasWilkinson LucasWilkinson enabled auto-merge (squash) October 21, 2025 16:17
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 21, 2025
@LucasWilkinson LucasWilkinson merged commit 250fb1b into vllm-project:main Oct 21, 2025
88 checks passed
baonudesifeizhai pushed a commit to baonudesifeizhai/vllm that referenced this pull request Oct 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci/build ready ONLY add when PR is ready to merge/full CI is needed v1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants