Quick setup for Text Embeddings Inference on a t3.micro instance.
- Name: tei-server
- AMI: Amazon Linux 2023 (default)
- Instance type: t3.micro
- Key pair: Create new key pair (save the .pem file)
- Storage: 8 GB gp3 (default)
- Network settings → Edit → Add security group rule:
  - Type: Custom TCP
  - Port range: 8080
  - Source type: Anywhere
- Advanced details → User data:

```bash
#!/bin/bash
set -euxo pipefail
dnf install -y ca-certificates procps docker
systemctl enable --now docker
mkdir -p /opt/tei/lib
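# Pull the prebuilt router binary and its OpenMP runtime out of the official CPU image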
TEI_IMAGE="ghcr.io/huggingface/text-embeddings-inference:cpu-1.8.2"
CID=$(docker create ${TEI_IMAGE})
docker cp ${CID}:/usr/local/bin/text-embeddings-router /opt/tei/
docker cp ${CID}:/usr/lib/llvm-14/lib/libomp.so.5 /opt/tei/lib/libomp.so.5
docker rm ${CID}
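# The router looks for libiomp5.so at runtime; the bundled libomp build stands in for it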
ln -s libomp.so.5 /opt/tei/lib/libiomp5.so
chmod +x /opt/tei/text-embeddings-router
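# 2 GB of swap gives the 1 GB t3.micro enough headroom to load the model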
fallocate -l 2G /swapfile || dd if=/dev/zero of=/swapfile bs=1M count=2048
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
echo '/swapfile swap swap defaults 0 0' >> /etc/fstab
cat >/etc/systemd/system/tei.service <<'EOF'
[Unit]
Description=Text Embeddings Inference
After=network.target
[Service]
Type=simple
Environment=LD_LIBRARY_PATH=/opt/tei/lib
Environment=OMP_NUM_THREADS=1
Environment=TOKENIZERS_PARALLELISM=false
ExecStart=/opt/tei/text-embeddings-router --model-id nomic-ai/nomic-embed-text-v1.5 --port 8080 --max-batch-tokens 1024
Restart=always
RestartSec=3
[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable --now tei
```

- Click Launch instance
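If you'd rather script the launch than click through the console, a rough AWS CLI equivalent is sketched below. The AMI, key pair, security group, and subnet values are placeholders for your own account and region, and the user-data script above is assumed to be saved locally as user-data.sh.

```bash
# Hedged CLI equivalent of the console launch (substitute real IDs before running)
aws ec2 run-instances \
  --image-id <al2023-ami-id> \
  --instance-type t3.micro \
  --key-name <your-key-pair> \
  --security-group-ids <sg-allowing-8080> \
  --subnet-id <subnet-id> \
  --user-data file://user-data.sh \
  --tag-specifications 'ResourceType=instance,Tags=[{Key=Name,Value=tei-server}]' \
  --count 1
```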
The instance needs ~3-5 minutes to:
- Install dependencies
- Download the model
- Start the service
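You can tell when it is ready by polling the health endpoint from your machine; a small wait loop like this (with the instance's public IP substituted) does the job:

```bash
# Poll until TEI reports healthy (the model download can take a few minutes)
until curl -sf http://<public-ip>:8080/health > /dev/null; do
  echo "TEI not ready yet, retrying in 15s..."
  sleep 15
done
echo "TEI is up"
```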
Test the embeddings endpoint from your machine:

```bash
curl http://<public-ip>:8080/embed \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "Hello world"}'
```

To debug on the instance itself, SSH in and check the service:

```bash
chmod 400 your-key.pem
ssh -i your-key.pem ec2-user@<public-ip>

# Check service status
sudo systemctl status tei

# View logs
sudo journalctl -u tei -f
```

To make this setup horizontally scalable and highly available, you can place the TEI instances behind an Application Load Balancer (ALB) and run them in an Auto Scaling Group (ASG).
This turns your single free-tier instance into a distributed embeddings service that can scale out automatically under load.
```
Clients
   │
   ▼
Application Load Balancer (HTTP :80)
   │
   ▼
Target Group (port 8080)
   │
   ├── t3.micro (TEI)
   ├── t3.micro (TEI)
   └── t3.micro (TEI)
       (Auto Scaling Group)
```
Each instance runs the same systemd-based TEI service you already set up.
- Go to EC2 → Target Groups → Create target group
- Target type: Instance
- Protocol: HTTP
- Port: 8080
- VPC: same VPC as your instances
- Health check:
  - Protocol: HTTP
  - Path: /health
  - Healthy threshold: 2
  - Unhealthy threshold: 2

TEI exposes /health automatically, so no extra configuration is needed.

Create the target group but do not register instances yet.
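If you prefer to script this step, roughly the same settings via the AWS CLI look like the sketch below; the tei-targets name is arbitrary and the VPC ID is a placeholder. The Auto Scaling Group will register instances automatically later, which is why none are registered here.

```bash
# Target group mirroring the console settings above (VPC ID is a placeholder)
aws elbv2 create-target-group \
  --name tei-targets \
  --target-type instance \
  --protocol HTTP \
  --port 8080 \
  --vpc-id <vpc-id> \
  --health-check-protocol HTTP \
  --health-check-path /health \
  --healthy-threshold-count 2 \
  --unhealthy-threshold-count 2
```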
- Go to EC2 → Load Balancers → Create load balancer
- Choose Application Load Balancer
- Name: tei-alb
- Scheme: Internet-facing
- IP address type: IPv4
- Listeners:
  - HTTP :80
- Availability Zones:
  - Select at least 2 AZs
- Security group:
  - Allow inbound HTTP (port 80) from your desired sources

Attach the target group you created earlier.
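A hedged CLI sketch of the same ALB, plus the HTTP :80 listener that actually wires it to the target group (subnet, security group, and ARN values are placeholders):

```bash
# Internet-facing ALB spanning two AZs
aws elbv2 create-load-balancer \
  --name tei-alb \
  --type application \
  --scheme internet-facing \
  --subnets <subnet-az1> <subnet-az2> \
  --security-groups <alb-sg-id>

# Listener on :80 forwarding every request to the TEI target group
aws elbv2 create-listener \
  --load-balancer-arn <alb-arn> \
  --protocol HTTP \
  --port 80 \
  --default-actions Type=forward,TargetGroupArn=<target-group-arn>
```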
- Go to EC2 → Launch Templates → Create launch template
- Base it on your working instance:
  - AMI: Amazon Linux 2023
  - Instance type: t3.micro
  - Key pair: optional (for debugging)
  - Security group:
    - Allow inbound 8080 from the ALB security group
- Advanced details → User data:
  - Paste the exact same user-data script from your single-instance setup

This ensures every new instance automatically installs and starts TEI.
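Scripted, the launch template looks roughly like this. Note that user data passed inside launch-template-data must be base64-encoded, unlike with run-instances; the AMI and security group IDs are placeholders and user-data.sh is the same script from the single-instance setup.

```bash
# Launch template carrying the same user-data script (base64-encoded here)
aws ec2 create-launch-template \
  --launch-template-name tei-template \
  --launch-template-data "{
    \"ImageId\": \"<al2023-ami-id>\",
    \"InstanceType\": \"t3.micro\",
    \"SecurityGroupIds\": [\"<tei-instance-sg-id>\"],
    \"UserData\": \"$(base64 -w0 user-data.sh)\"
  }"
```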
- Go to EC2 → Auto Scaling Groups → Create
- Use the launch template you just created
- VPC: same VPC
- Subnets: at least 2 (different AZs)
- Attach to existing load balancer:
  - Choose your ALB target group
- Scaling configuration:
  - Min: 1
  - Desired: 1
  - Max: N (e.g. 5 or 10)
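A CLI sketch of the same ASG, assuming the tei-template launch template and the target group from the earlier sketches (subnet IDs and the ARN are placeholders). The health-check grace period gives new instances time to download the model before ELB health checks count against them.

```bash
# ASG built from the launch template, spread over two subnets and
# registered with the ALB target group
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name tei-asg \
  --launch-template 'LaunchTemplateName=tei-template,Version=$Latest' \
  --min-size 1 \
  --desired-capacity 1 \
  --max-size 5 \
  --vpc-zone-identifier "<subnet-az1>,<subnet-az2>" \
  --target-group-arns <target-group-arn> \
  --health-check-type ELB \
  --health-check-grace-period 300
```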
A simple and effective policy: target tracking scaling

- Metric: Average CPU Utilization
- Target value: 60%

Optional alternatives:

- ALB RequestCountPerTarget
- ALB TargetResponseTime

CPU works well because TEI is CPU-bound.
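A sketch of that policy via the CLI, assuming the tei-asg name from the earlier sketch:

```bash
# Target-tracking policy that keeps average CPU across the group near 60%
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name tei-asg \
  --policy-name tei-cpu-target-60 \
  --policy-type TargetTrackingScaling \
  --target-tracking-configuration '{
    "PredefinedMetricSpecification": { "PredefinedMetricType": "ASGAverageCPUUtilization" },
    "TargetValue": 60.0
  }'
```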
Instead of calling the instance IP:
```bash
curl http://<alb-dns-name>/embed \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs": "Hello world"}'
```

The ALB will automatically distribute requests across instances.
Embedding inference suits this pattern well:

- Stateless requests → even load balancing
- CPU-bound inference → clean horizontal scaling
- Latency-tolerant use cases → cold starts are acceptable
- Cheap nodes → individual failures barely matter
You get:
- High availability
- Automatic recovery
- Near-linear throughput scaling
- Predictable cost
All without GPUs.
- Warm-up time: new instances may take a few minutes to download the model; ALB health checks handle this safely.
- Security:
  - Restrict instance port 8080 to the ALB security group only (see the sketch after this list)
  - Optionally add auth or IP allowlists at the ALB level
- HTTPS:
  - Add an ACM certificate to the ALB listener for TLS
- Spot instances:
  - Great for batch embedding jobs
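Locking the instance port down to the ALB might look like the sketch below; both security group IDs are placeholders, and the revoke call removes the open 0.0.0.0/0 rule from the single-instance setup if you kept it.

```bash
# Allow 8080 only from the ALB's security group
aws ec2 authorize-security-group-ingress \
  --group-id <tei-instance-sg-id> \
  --protocol tcp \
  --port 8080 \
  --source-group <alb-sg-id>

# Drop the public rule left over from the single-instance setup
aws ec2 revoke-security-group-ingress \
  --group-id <tei-instance-sg-id> \
  --protocol tcp \
  --port 8080 \
  --cidr 0.0.0.0/0
```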
Adding an ALB and Auto Scaling turns this from a clever free-tier hack into a production-grade, horizontally scalable embeddings service built entirely on CPUs.