Open
Description
System Info
Starting the container with Docker as shown below fails with an error:
docker run --gpus all -p 8912:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.2 --model-id $model
The CPU-only image runs without problems. The GPU image fails with:
Error: Could not create backend
Caused by:
    Could not start backend: Runtime compute cap 70 is not compatible with compile time compute cap 80
The nvidia-smi output is:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Tesla V100-PCIE-16GB           Off |   00000000:04:01.0 Off |                    0 |
| N/A   30C    P0             38W /  250W |    7695MiB /  16384MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A     63756      C   python                                       2564MiB |
|    0   N/A  N/A     63785      C   python                                       2564MiB |
|    0   N/A  N/A     63813      C   python                                       2564MiB |
+-----------------------------------------------------------------------------------------+
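The V100 listed above has compute capability 7.0, which corresponds to the "70" in the error message. As a minimal sketch, the dotted form can be converted to the integer form used by the CUDA_COMPUTE_CAP build argument. The function name `cap_to_build_arg` is hypothetical, and the nvidia-smi query in the comment assumes a driver recent enough to expose the `compute_cap` field.

```shell
#!/bin/sh
# Hypothetical helper: turn a dotted compute capability ("7.0")
# into the integer form used as a build argument ("70").
cap_to_build_arg() {
    printf '%s\n' "$1" | tr -d '.'
}

# On a machine with a GPU, the dotted value could be obtained with:
#   nvidia-smi --query-gpu=compute_cap --format=csv,noheader
# (assumption: the installed driver supports the compute_cap query field)
cap_to_build_arg "7.0"   # prints 70
```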
OS
Distributor ID:	Ubuntu
Description:	Ubuntu 22.04.4 LTS
Release:	22.04
Codename:	jammy
Arch
x86_64
I also tried building the image myself with CUDA_COMPUTE_CAP set to 70:
runtime_compute_cap=70
docker build . -f Dockerfile-cuda --build-arg CUDA_COMPUTE_CAP=$runtime_compute_cap
The build then fails because CUDA compute cap 70 is not supported:
------
 > [builder 2/9] RUN if [ 70 -ge 75 -a 70 -lt 80 ];     then          nvprune --generate-code code=sm_70 /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a;     elif [ 70 -ge 80 -a 70 -lt 90 ];     then          nvprune --generate-code code=sm_80 --generate-code code=sm_70 /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a;     elif [ 70 -eq 90 ];     then          nvprune --generate-code code=sm_90 /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a;     else          echo "cuda compute cap 70 is not supported"; exit 1;     fi;:
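The failing build step only accepts compute caps from 75 through 90, so 70 falls through to the final `else` branch. The sketch below mirrors the conditions from that step to show which branch each value reaches. The function name `nvprune_branch` is hypothetical, and it only reports the selected targets without running nvprune.

```shell
#!/bin/sh
# Hypothetical helper mirroring the branch conditions of the build step.
# It prints the sm_* targets that would be kept, or "unsupported".
nvprune_branch() {
    cap="$1"
    if [ "$cap" -ge 75 ] && [ "$cap" -lt 80 ]; then
        echo "sm_${cap}"
    elif [ "$cap" -ge 80 ] && [ "$cap" -lt 90 ]; then
        echo "sm_80 sm_${cap}"
    elif [ "$cap" -eq 90 ]; then
        echo "sm_90"
    else
        # Anything below 75 or above 90 ends up here, including 70 (V100).
        echo "unsupported"
    fi
}

for cap in 70 75 80 86 90; do
    echo "$cap -> $(nvprune_branch "$cap")"
done
```

With these conditions, a value of 70 is rejected before any compilation starts, which matches the build error above.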
Information
- Docker
- The CLI directly
Tasks
- An officially supported command
- My own modifications
Reproduction
- Use a machine with the configuration above
- Run the command below with any supported model
docker run --gpus all -p 8912:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.2 --model-id $model
Expected behavior
The image should run on a V100 GPU.