Native compilation of Text Embeddings Inference (TEI) on Windows for running EmbeddingGemma 300m.
Download and run the installer from https://rustup.rs/ or use winget:
```
winget install Rustlang.Rustup
```

Restart your terminal after installation.
TEI requires the MSVC compiler. Install Visual Studio Build Tools with the C++ workload:
```
winget install Microsoft.VisualStudio.2022.BuildTools
```

Then open Visual Studio Installer and add "Desktop development with C++".
```
git clone https://github.com/huggingface/text-embeddings-inference.git
cd text-embeddings-inference
cargo install --path router -F ort
```

GPU builds require CUDA 12.2+ and cuDNN installed. Ensure nvcc is in your PATH.
```
# Ampere/Ada (RTX 3000 series, RTX 4000 series)
cargo install --path router -F candle-cuda

# Turing (RTX 2000 series)
cargo install --path router -F candle-cuda-turing
```

Start the server:

```
text-embeddings-router --model-id unsloth/embeddinggemma-300m --port 8080
```

Test it with curl:

```
curl http://localhost:8080/embed -X POST -H "Content-Type: application/json" -d "{\"inputs\": \"Hello world\"}"
```

Or with PowerShell:
```
Invoke-RestMethod -Uri "http://localhost:8080/embed" -Method Post -ContentType "application/json" -Body '{"inputs": "Hello world"}'
```
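Beyond one-off curl calls, the endpoint can be hit from any HTTP client. A minimal Python sketch, assuming the `/embed` route and `{"inputs": ...}` payload shown above (the `cosine` helper is illustrative and not part of TEI — the server just returns raw vectors):

```python
import json
import urllib.request

def build_payload(texts):
    # JSON body in the shape the curl example above sends.
    return json.dumps({"inputs": texts}).encode("utf-8")

def embed(texts, url="http://localhost:8080/embed"):
    # POST a list of strings to a running TEI server; returns a list
    # of embedding vectors (one list of floats per input text).
    req = urllib.request.Request(
        url,
        data=build_payload(texts),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b)

# Usage (requires the server started above to be running):
# vecs = embed(["Hello world", "Goodbye world"])
# print(len(vecs[0]))              # embedding dimension
# print(cosine(vecs[0], vecs[1]))  # similarity of the two texts
```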