[PD Disaggregation] support different tp_size for prefill and decode #5296
Thanks for your contribution!
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
Pull request overview
Copilot reviewed 12 out of 13 changed files in this pull request and generated 13 comments.
fastdeploy/cache_manager/transfer_factory/kvcache_transfer/src/pybind.cpp
fastdeploy/cache_manager/transfer_factory/kvcache_transfer/src/kvcache_rdma.cpp
ddchenhao66
left a comment
LGTM
Codecov Report
❌ Patch coverage is [value missing].
Additional details and impacted files
@@ Coverage Diff @@
## develop #5296 +/- ##
==========================================
Coverage ? 59.63%
==========================================
Files ? 324
Lines ? 39629
Branches ? 5961
==========================================
Hits ? 23631
Misses ? 14122
Partials ? 1876
        )

-    def connect(self, ip, port):
+    def connect(self, ip, port, tp_size):
Should we add a default value (=0) here?
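The reviewer's suggestion could look roughly like the sketch below. It is illustrative only, not the FastDeploy implementation. Only the `connect(self, ip, port, tp_size)` signature comes from the diff. The class name `TransferClient`, the `peers` dictionary, and the reading of `0` as "not specified, fall back to the local tp_size" are assumptions made for the example.

```python
class TransferClient:
    """Hypothetical stand-in for the class that owns connect()."""

    def __init__(self, local_tp_size):
        self.local_tp_size = local_tp_size
        self.peers = {}  # (ip, port) -> resolved peer tp_size

    def connect(self, ip, port, tp_size=0):
        # Assumption: 0 means "not specified", so fall back to the local tp_size.
        if tp_size < 0:
            raise ValueError("tp_size must be non-negative")
        peer_tp_size = tp_size or self.local_tp_size
        self.peers[(ip, port)] = peer_tp_size
        return peer_tp_size


client = TransferClient(local_tp_size=4)
# An existing two-argument call keeps working because of the default.
old_style = client.connect("10.0.0.1", 8000)
# A peer with a different tp_size passes it explicitly.
new_style = client.connect("10.0.0.2", 8000, 2)
```

With a default value, callers written against the old two-argument signature do not need to change, which appears to be the point of the review comment.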
Motivation
Under PD disaggregation, allow prefill and decode to use different tp_size values.
Modifications
Usage or Command
Unchanged.
Accuracy Tests
See the unit tests.
Checklist
Tag list: [FDConfig], [APIServer], [Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]
Run pre-commit before commit.
For a PR targeting the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.