Hi @insujang,
Thanks for open-sourcing Oobleck, great work!
From the paper, it seems that Oobleck supports both adding and removing nodes during training.
I was able to run Oobleck with node failures (removing nodes), but I couldn't find a way to add nodes dynamically during training. Could you let me know how to do this?
Thank you!
Lam