Conversation
Force-pushed eb0af94 to 54ce736 (Compare)
evgenyrp left a comment
Great improvement!
Let's clarify the CUDA installation. Can we continue using the toolchain?
fetch:
- cudnn
toolchain:
- cuda-toolkit
A different version of cuda/cudnn can be added in fetch and toolchain
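A rough sketch of what that could look like in the task config, mirroring the snippet above (the `-new` entry names are hypothetical; the real fetch/toolchain aliases would come from the repo's task definitions):

```yaml
fetch:
- cudnn
- cudnn-new          # hypothetical second entry pinned to the newer version
toolchain:
- cuda-toolkit
- cuda-toolkit-new   # hypothetical alias for the newer CUDA toolchain
```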
I was not able to make the new Tensorflow version work with the toolchain that Taskcluster fetches. Tensorflow now uses the CUDA pip packages, as torch has been doing for some time. I'm not sure whether it failed because Tensorflow was incompatible with our toolchain version or simply because it was not looking at LD_LIBRARY_PATH. The instructions do not mention anything about an optional CUDA installation, so `pip install tensorflow` (without the CUDA extra) may just install the CPU version, which cannot be linked to an existing CUDA installation because it does not have the symlinks.
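One way to see whether CUDA is coming from pip wheels rather than a system/toolchain install is to look for shared libraries under the `nvidia` pip namespace package, which is where the `nvidia-*` wheels (pulled in by recent torch and Tensorflow) place their `.so` files. A small sketch (the helper name is mine, not from any library):

```python
import importlib.util
import os

def find_pip_cuda_libs(package="nvidia"):
    """Return directories under the given pip namespace package that
    contain shared libraries (e.g. libcudart, libcudnn from nvidia-* wheels)."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return []  # no nvidia-* wheels installed in this environment
    dirs = set()
    for root in spec.submodule_search_locations:
        for dirpath, _dirnames, filenames in os.walk(root):
            if any(f.endswith(".so") or ".so." in f for f in filenames):
                dirs.add(dirpath)
    return sorted(dirs)

# Empty list means CUDA is not provided via pip wheels here,
# so the framework would have to find it some other way (e.g. LD_LIBRARY_PATH).
print(find_pip_cuda_libs())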
If it does turn out that this causes a CUDA installation, and it uses a different CUDA than we currently have in toolchains, @evgenyrp is correct that we can add another fetch+toolchain task to grab that and use it here.
@ZJaume AI tells me it might be possible, but I know nothing about this specific version of Tensorflow. I guess let's double-check that there's no way to install it the old way by adding another version to the toolchain. If not, let's just do the pip install. It will impact CI costs (not sure how much). In prod Bicleaner takes days anyway, so it doesn't make a difference.
Benchmarks with one GPU on an interactive task (en-bs model and en-xx-large model)
We can squeeze a bit more speed out of bigger batches, but I guess we prefer to be on the safer side regarding possible OOM issues. I tested batch sizes of 1024 and 2048 and they do not crash, so 512 should be pretty safe.
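If we ever did want to push batch sizes harder without risking a hard failure, a common pattern is to halve the batch size on OOM until a run succeeds. A sketch (not the actual Bicleaner code; `fake_run` stands in for the real scoring call, and real code would catch the framework's OOM exception rather than `MemoryError`):

```python
def run_with_backoff(run_batch, batch_size=2048, min_size=64):
    """Try decreasing batch sizes until run_batch succeeds; halve on OOM."""
    while batch_size >= min_size:
        try:
            return batch_size, run_batch(batch_size)
        except MemoryError:  # stand-in for the framework's OOM error type
            batch_size //= 2
    raise MemoryError("even the minimum batch size ran out of memory")

# Toy stand-in that only "fits" at <= 512 items, mimicking a GPU memory cap.
def fake_run(bs):
    if bs > 512:
        raise MemoryError
    return f"scored {bs} items"

print(run_with_backoff(fake_run))  # (512, 'scored 512 items')
```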