sweeps and train on hip/rocm/amdgpu, for 5.0 #573
|
Willing to merge if greatly cleaned up. There's a lot of random sweep-related stuff here that isn't related to AMD support, and env changes as well. It would also need at least decent perf on AMD to be worth supporting. |
|
I apparently messed up a rebase when moving this from the 4.0 to the 5.0 branch, so some wrong changes slipped in [they did not affect my test sweep/train on breakout]. Sorry for that. About the sweep-related stuff in this PR: there was a problem with sweep_obj [used for early stopping] being passed between processes when the protein sweep runs on an amd gpu [torch has special handling for this on cuda, but not rocm], so this PR changes how the info related to early stopping is passed around. It has decent performance on a consumer amd gpu [RX 6850M XT], but I have not tested it on anything else. |
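To illustrate the general idea behind the early-stopping change described above, here is a minimal sketch, not the code from this PR. It assumes the fix amounts to sending only plain Python values between processes instead of the sweep object itself. All names here (`should_stop_early`, `early_stop_messages`, `worker`, `run_trial`) and the threshold rule are hypothetical and not part of pufferlib.

```python
import multiprocessing as mp


def should_stop_early(score, threshold):
    """Hypothetical early-stopping rule: stop once the score drops below a threshold."""
    return score < threshold


def early_stop_messages(threshold, scores):
    """Build the plain-dict messages a worker would send, stopping at the first trigger."""
    messages = []
    for step, score in enumerate(scores):
        stop = should_stop_early(score, threshold)
        # Only ints, floats and bools: nothing device-specific needs to be pickled.
        messages.append({"step": step, "score": float(score), "stop": stop})
        if stop:
            break
    return messages


def worker(threshold, scores, queue):
    """Runs in a child process and reports progress as plain Python data."""
    for msg in early_stop_messages(threshold, scores):
        queue.put(msg)
    queue.put(None)  # sentinel: no more messages


def run_trial(threshold, scores):
    """Start one worker process and collect its messages in the parent."""
    queue = mp.Queue()
    proc = mp.Process(target=worker, args=(threshold, scores, queue))
    proc.start()
    received = []
    while True:
        msg = queue.get()
        if msg is None:
            break
        received.append(msg)
    proc.join()
    return received
```

Calling `run_trial(0.5, [0.9, 0.7, 0.4, 0.8])` under an `if __name__ == "__main__":` guard would start one worker and collect its messages in the parent. The point of the design is that the early-stopping decision travels as scalars, so it does not depend on any backend-specific tensor sharing.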
Force-pushed from 6b59ae4 to d6272f2
|
I've cleaned it up as much as I've been able to. Tested with a long sweep on an amd gpu. |
|
Tested with an nvidia gpu too [thanks to ViolentGlizzy]. Performance on my amd gpu is 2-3 times worse than on a supposedly equivalent nvidia gpu, but not worse than pufferlib 3.0 performance on an amd gpu. |
|
rocblas tensile tuning of gemm operations for my gpu seems to have improved performance 2x, but it is an unreasonably messy process. |
Alternative to #562 for the 5.0 branch.
Tested with only a short breakout sweep and eval on an amd gpu. Made with a lot of help from codex gpt-5.5.