
Faster Bicleaner AI #1387

Open

ZJaume wants to merge 3 commits into main from dev_bcai_fp16

Conversation

@ZJaume (Collaborator) commented Mar 2, 2026

Benchmarks with one GPU on an interactive task (en-bs model and en-xx-large model):

| configuration | speed (rows/s) |
| --- | --- |
| base fp32 batch32 block10k (previous config) | 273 |
| base fp32 batch128 block10k | 320 |
| base fp16 batch128 block10k | 729 |
| base fp16 batch512 block10k | 746 |
| base fp16 batch512 block50k | 768 |
| large fp32 batch32 block10k (previous config) | 126 |
| large fp16 batch512 block50k | 276 |

We can squeeze out a bit more speed with bigger batches, but I guess we prefer to be on the safer side regarding possible OOM issues. I tested batch sizes of 1024 and 2048 and they do not crash, so 512 should be pretty safe.
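For intuition, here is a hedged sketch (not Bicleaner AI's actual code) of how the block and batch sizes in the table above might interact: input is read in blocks of `block_size` rows, and each block is then scored in batches of `batch_size` sentence pairs.

```python
from itertools import islice

def blocks(rows, block_size):
    """Yield successive blocks of up to block_size rows from an iterable."""
    it = iter(rows)
    while True:
        block = list(islice(it, block_size))
        if not block:
            return
        yield block

def batches(block, batch_size):
    """Split one block into inference-sized batches."""
    for i in range(0, len(block), batch_size):
        yield block[i:i + batch_size]

# Illustrative numbers: 25,000 rows with block 10k / batch 512 gives three
# blocks (10000, 10000, 5000 rows) and 20 + 20 + 10 = 50 batches.
rows = range(25_000)
n_batches = sum(1 for b in blocks(rows, 10_000) for _ in batches(b, 512))
```

Larger blocks mean fewer per-block overheads (which matches the 10k vs 50k rows in the table), while the batch size bounds peak GPU memory.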

@ZJaume ZJaume force-pushed the dev_bcai_fp16 branch 2 times, most recently from eb0af94 to 54ce736 Compare March 4, 2026 11:55
@ZJaume ZJaume marked this pull request as ready for review March 4, 2026 15:00
@ZJaume ZJaume requested review from a team as code owners March 4, 2026 15:00
@ZJaume ZJaume requested review from bhearsum and evgenyrp and removed request for bhearsum March 4, 2026 15:00
@evgenyrp (Collaborator) left a comment


Great improvement!

Let's clarify the CUDA installation. Can we continue using the toolchain?

```yaml
fetch:
  - cudnn
toolchain:
  - cuda-toolkit
```
Collaborator

I think it is a common optimization to prevent a lengthy installation of CUDA at runtime, right? @bhearsum Does it not work with the new version of Bicleaner? @ZJaume

Collaborator

A different version of cuda/cudnn can be added in fetch and toolchain.

@ZJaume (Collaborator, Author) commented Mar 4, 2026

I was not able to make the new TensorFlow version work with the toolchain that Taskcluster fetches. TensorFlow now uses the CUDA pip packages, as torch has been doing for some time. I'm not sure whether I couldn't make it work because TensorFlow was not compatible with our toolchain version, or simply because it was not looking at the LD_LIBRARY_PATH. The instructions do not mention anything about an optional CUDA installation, so maybe `pip install tensorflow` (without the CUDA extras) just installs the CPU version and cannot be linked to an existing CUDA installation because it does not have the symlinks.
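For reference, recent TensorFlow releases document the pip-packaged CUDA route mentioned above; the two install paths look roughly like this (the library path below is illustrative, not our actual toolchain layout):

```shell
# TensorFlow >= 2.14 can pull CUDA/cuDNN in as pip wheels (nvidia-* packages):
pip install "tensorflow[and-cuda]"

# A plain install brings no CUDA wheels; any system CUDA then has to be
# discoverable at load time, e.g. via LD_LIBRARY_PATH (illustrative path):
pip install tensorflow
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:${LD_LIBRARY_PATH}
```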

Collaborator

If it does turn out that this causes a CUDA installation, and it uses a different CUDA than we currently have in toolchains, @evgenyrp is correct that we can add another fetch+toolchain task to grab that and use it here.
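Following the fetch/toolchain pattern quoted earlier in the thread, that could look roughly like this (the entry names and versions below are hypothetical, not the repo's actual task labels):

```yaml
fetch:
  - cudnn             # existing entry
  - cudnn-new         # hypothetical: the cuDNN version the new TensorFlow needs
toolchain:
  - cuda-toolkit      # existing entry
  - cuda-toolkit-new  # hypothetical second CUDA toolkit task
```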

Collaborator

@ZJaume AI tells me it might be possible, but I know nothing about this specific version of TensorFlow. I guess let's double-check that there's no way to install it the old way by adding another version to the toolchain. If not, let's just do the pip install. It will impact CI costs (not sure by how much). In prod, Bicleaner takes days anyway, so it doesn't make a difference.
