@@ -97,7 +97,7 @@ model=BAAI/bge-large-en-v1.5
 revision=refs/pr/5
 volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

-docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.0 --model-id $model --revision $revision
+docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.1 --model-id $model --revision $revision
 ```

 And then you can make requests like
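For reference, a request against the deployed server could look like the following, assuming the container from the command above is running and listening on port 8080 (the input text is illustrative):

```shell
# POST a JSON payload to the /embed route; the server returns a JSON
# array containing one embedding vector per input
curl 127.0.0.1:8080/embed \
    -X POST \
    -d '{"inputs":"What is Deep Learning?"}' \
    -H 'Content-Type: application/json'
```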
@@ -245,13 +245,13 @@ Text Embeddings Inference ships with multiple Docker images that you can use to

 | Architecture                        | Image                                                                     |
 |-------------------------------------|---------------------------------------------------------------------------|
-| CPU                                 | ghcr.io/huggingface/text-embeddings-inference:cpu-1.0                     |
+| CPU                                 | ghcr.io/huggingface/text-embeddings-inference:cpu-1.1                     |
 | Volta                               | NOT SUPPORTED                                                             |
-| Turing (T4, RTX 2000 series, ...)   | ghcr.io/huggingface/text-embeddings-inference:turing-1.0 (experimental)   |
-| Ampere 80 (A100, A30)               | ghcr.io/huggingface/text-embeddings-inference:1.0                         |
-| Ampere 86 (A10, A40, ...)           | ghcr.io/huggingface/text-embeddings-inference:86-1.0                      |
-| Ada Lovelace (RTX 4000 series, ...) | ghcr.io/huggingface/text-embeddings-inference:89-1.0                      |
-| Hopper (H100)                       | ghcr.io/huggingface/text-embeddings-inference:hopper-1.0 (experimental)   |
+| Turing (T4, RTX 2000 series, ...)   | ghcr.io/huggingface/text-embeddings-inference:turing-1.1 (experimental)   |
+| Ampere 80 (A100, A30)               | ghcr.io/huggingface/text-embeddings-inference:1.1                         |
+| Ampere 86 (A10, A40, ...)           | ghcr.io/huggingface/text-embeddings-inference:86-1.1                      |
+| Ada Lovelace (RTX 4000 series, ...) | ghcr.io/huggingface/text-embeddings-inference:89-1.1                      |
+| Hopper (H100)                       | ghcr.io/huggingface/text-embeddings-inference:hopper-1.1 (experimental)   |

 **Warning**: Flash Attention is turned off by default for the Turing image as it suffers from precision issues.
 You can turn Flash Attention v1 ON by using the `USE_FLASH_ATTENTION=True` environment variable.
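As a sketch (not part of the diff), opting back into Flash Attention v1 on the Turing image amounts to passing the environment variable through `docker run -e`; the model and volume values below are just the ones reused from the earlier examples:

```shell
model=BAAI/bge-large-en-v1.5
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

# USE_FLASH_ATTENTION=True re-enables Flash Attention v1, which is off
# by default on the Turing image due to precision issues
docker run --gpus all -e USE_FLASH_ATTENTION=True -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:turing-1.1 --model-id $model
```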
@@ -280,7 +280,7 @@ model=<your private model>
 volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
 token=<your cli READ token>

-docker run --gpus all -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.0 --model-id $model
+docker run --gpus all -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.1 --model-id $model
 ```

 ### Using Re-rankers models
@@ -298,7 +298,7 @@ model=BAAI/bge-reranker-large
 revision=refs/pr/4
 volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

-docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.0 --model-id $model --revision $revision
+docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.1 --model-id $model --revision $revision
 ```

 And then you can rank the similarity between a query and a list of texts with:
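For reference, a reranking request against the server deployed above could look like this (the query and texts are illustrative; the server is assumed to be running on port 8080):

```shell
# POST a query plus candidate texts to the /rerank route; the server
# returns the candidates with their similarity scores
curl 127.0.0.1:8080/rerank \
    -X POST \
    -d '{"query":"What is Deep Learning?", "texts": ["Deep Learning is not...", "Deep learning is..."]}' \
    -H 'Content-Type: application/json'
```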
@@ -318,7 +318,7 @@ You can also use classic Sequence Classification models like `SamLowe/roberta-ba
 model=SamLowe/roberta-base-go_emotions
 volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

-docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.0 --model-id $model
+docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.1 --model-id $model
 ```

 Once you have deployed the model you can use the `predict` endpoint to get the emotions most associated with an input:
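A classification request against the deployed server could then look like the following (the input sentence is illustrative; the server is assumed to be running on port 8080):

```shell
# POST an input to the /predict route; the server returns the
# predicted labels (here, emotions) with their scores
curl 127.0.0.1:8080/predict \
    -X POST \
    -d '{"inputs":"I like you."}' \
    -H 'Content-Type: application/json'
```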
@@ -347,7 +347,7 @@ model=BAAI/bge-large-en-v1.5
 revision=refs/pr/5
 volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

-docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.0-grpc --model-id $model --revision $revision
+docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.1-grpc --model-id $model --revision $revision
 ```

 ```shell