Remove policy offload for async grpo dtensor #1608
base: main
Signed-off-by: Sadegh Mahdavi <[email protected]>
📝 Walkthrough: Removed a call to …
@smahdavi4 just curious: can you point to the exceptions you saw? I don't see them in any of my runs.
Here is the exception; it only happens after a checkpoint is saved:
terrykong left a comment:
@parthchadha to review
What does this PR do?
Async GRPO does not need to offload the policy after checkpointing, because training and inference are not colocated. Currently, this offload leads to an exception after checkpointing, since some tensors are not moved back to CUDA.
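To illustrate why skipping the offload avoids the failure, here is a minimal Python sketch. Everything in it is hypothetical: `FakeTensor`, `PolicyWorker`, `offload`, `train_step`, `checkpoint`, and the `colocated` flag are illustrative stand-ins, not NeMo RL's actual classes or API. Tensors are modeled only by a device string so the example runs without PyTorch, and the real bug (where only some tensors failed to return to CUDA) is simplified to a case where offloaded tensors are never reloaded.

```python
from dataclasses import dataclass, field


@dataclass
class FakeTensor:
    """Hypothetical stand-in for a tensor; only tracks its device."""
    name: str
    device: str = "cuda"


@dataclass
class PolicyWorker:
    """Hypothetical policy worker holding params and optimizer state."""
    params: list = field(default_factory=lambda: [FakeTensor("weight")])
    optimizer_state: list = field(default_factory=lambda: [FakeTensor("exp_avg")])

    def offload(self) -> None:
        # Move everything to CPU to free GPU memory for a colocated
        # inference engine.
        for t in self.params + self.optimizer_state:
            t.device = "cpu"

    def save_checkpoint(self) -> None:
        pass  # in this sketch, saving does not change tensor placement

    def train_step(self) -> None:
        # Training assumes every tensor is resident on the GPU.
        stray = [t.name for t in self.params + self.optimizer_state
                 if t.device != "cuda"]
        if stray:
            raise RuntimeError(f"tensors not on cuda: {stray}")


def checkpoint(policy: PolicyWorker, colocated: bool) -> None:
    policy.save_checkpoint()
    # Offload only when training and inference share the same GPUs.
    # In async GRPO they are not colocated, so the policy stays on
    # the GPU and the next training step can proceed directly.
    if colocated:
        policy.offload()
```

With `colocated=False`, the policy stays on the GPU and `train_step()` succeeds. With `colocated=True` and no reload step (which a colocated setup is assumed to perform elsewhere), `train_step()` raises, mirroring the post-checkpoint exception.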