API Embeddings

Embeddings API

The Embeddings API generates vector representations of text for semantic search, clustering, and similarity tasks.

Endpoint

POST /v1/embeddings

Authentication

Header	Description	Required
`X-API-Key`	TensorWall application API key	Yes
`Authorization`	LLM provider API key	Depends on provider

Request

Headers

POST /v1/embeddings HTTP/1.1
Host: localhost:8000
X-API-Key: gw_xxxx
Authorization: Bearer sk-your-key
Content-Type: application/json

Body Parameters

Parameter	Type	Required	Description
`model`	string	Yes	Embedding model identifier
`input`	string/array	Yes	Text(s) to embed
`encoding_format`	string	No	`float` or `base64`
`dimensions`	integer	No	Output dimensions (if supported)

Single Input

{
  "model": "text-embedding-3-small",
  "input": "Hello, world!"
}

Multiple Inputs

{
  "model": "text-embedding-3-small",
  "input": [
    "First document",
    "Second document",
    "Third document"
  ]
}

Response

Success Response

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0023, -0.0094, 0.0156, ...]
    }
  ],
  "model": "text-embedding-3-small",
  "usage": {
    "prompt_tokens": 5,
    "total_tokens": 5
  }
}

Multiple Embeddings

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [0.0023, -0.0094, ...]
    },
    {
      "object": "embedding",
      "index": 1,
      "embedding": [0.0045, -0.0012, ...]
    },
    {
      "object": "embedding",
      "index": 2,
      "embedding": [0.0089, -0.0034, ...]
    }
  ],
  "model": "text-embedding-3-small",
  "usage": {
    "prompt_tokens": 15,
    "total_tokens": 15
  }
}

Supported Models

OpenAI

Model	Dimensions	Description
`text-embedding-3-small`	1536	Fast, cost-effective
`text-embedding-3-large`	3072	Higher quality
`text-embedding-ada-002`	1536	Legacy model

Mistral

Model	Dimensions	Description
`mistral-embed`	1024	Mistral embeddings

Ollama

Model	Dimensions	Description
`nomic-embed-text`	768	Nomic embeddings
`mxbai-embed-large`	1024	MixedBread embeddings
`all-minilm`	384	Fast local embeddings

Dimension Reduction

Some models support reducing output dimensions:

{
  "model": "text-embedding-3-large",
  "input": "Hello, world!",
  "dimensions": 1024
}

Cost Tracking

TensorWall tracks embedding costs:

Model	Cost per 1M tokens
text-embedding-3-small	$0.02
text-embedding-3-large	$0.13
mistral-embed	$0.10

Examples

Python

import requests

response = requests.post(
    "http://localhost:8000/v1/embeddings",
    headers={
        "X-API-Key": "gw_xxx",
        "Authorization": "Bearer sk-xxx",
    },
    json={
        "model": "text-embedding-3-small",
        "input": "Hello, world!",
    },
)

data = response.json()
embedding = data["data"][0]["embedding"]
print(f"Embedding dimensions: {len(embedding)}")

Batch Embeddings

import requests

documents = [
    "TensorWall provides LLM governance.",
    "Security features protect against prompt injection.",
    "Cost tracking monitors API spending.",
]

response = requests.post(
    "http://localhost:8000/v1/embeddings",
    headers={
        "X-API-Key": "gw_xxx",
        "Authorization": "Bearer sk-xxx",
    },
    json={
        "model": "text-embedding-3-small",
        "input": documents,
    },
)

data = response.json()
embeddings = [item["embedding"] for item in data["data"]]

Semantic Search

import numpy as np
from numpy.linalg import norm

def cosine_similarity(a, b):
    return np.dot(a, b) / (norm(a) * norm(b))

# Get query embedding
query_response = requests.post(
    "http://localhost:8000/v1/embeddings",
    headers={"X-API-Key": "gw_xxx", "Authorization": "Bearer sk-xxx"},
    json={
        "model": "text-embedding-3-small",
        "input": "How does security work?",
    },
)
query_embedding = query_response.json()["data"][0]["embedding"]

# Compare with document embeddings
for i, doc_embedding in enumerate(embeddings):
    similarity = cosine_similarity(query_embedding, doc_embedding)
    print(f"Document {i}: {similarity:.4f}")

OpenAI SDK

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="sk-xxx",
    default_headers={"X-API-Key": "gw_xxx"},
)

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Hello, world!",
)

embedding = response.data[0].embedding

Error Responses

Invalid Model

{
  "error": {
    "code": "INVALID_MODEL",
    "message": "Model 'unknown-embed' is not supported for embeddings"
  }
}

Input Too Long

{
  "error": {
    "code": "CONTEXT_LENGTH_EXCEEDED",
    "message": "Input exceeds maximum token limit",
    "details": {
      "max_tokens": 8191,
      "input_tokens": 10000
    }
  }
}

Best Practices

Batch inputs - Send multiple texts in one request for efficiency
Cache embeddings - Store embeddings to avoid recomputing
Choose appropriate model - Use smaller models for speed, larger for quality
Normalize vectors - Some use cases benefit from L2 normalization

Security Considerations

TensorWall applies security checks to embedding inputs:

PII detection still applies
Secrets detection still applies
Cost limits are enforced

However, prompt injection detection is less relevant for embeddings since the text is not used as instructions.

API Embeddings

Embeddings API

Endpoint

Authentication

Request

Headers

Body Parameters

Single Input

Multiple Inputs

Response

Success Response

Multiple Embeddings

Supported Models

OpenAI

Mistral

Ollama

Dimension Reduction

Cost Tracking

Examples

Python

Batch Embeddings

Semantic Search

OpenAI SDK

Error Responses

Invalid Model

Input Too Long

Best Practices

Security Considerations

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Documentation

Getting Started

Features

Providers

API Reference

Advanced

Community

Clone this wiki locally