Complete API documentation for Jan Server services.
OpenAI-compatible API for chat completions, conversations, and models.
What it does:
- Generate AI responses to user messages
- Manage conversations and chat history
- Organize conversations in projects
- List available AI models
- Handle user authentication
- Support images via jan_* IDs
- Generate images from text prompts
Documentation:
- Complete Documentation - Full API reference, endpoints, examples
- Authentication - Auth methods, API keys, and token management
- Chat Completions - Main completion endpoint
- Image Generation - Generate images from text prompts
- Conversations - Conversation CRUD operations
- Projects - Project management for organizing conversations
- Admin Endpoints - Provider and model catalog management
- With Media - Media references using `jan_*` IDs
- Examples - cURL, Python, and JavaScript snippets
Executes tools and generates AI responses for complex tasks.
What it does:
- Run multiple tools in sequence (up to 8 steps)
- Chain tool outputs together
- Generate final answers using LLM
- Track execution time and status
Documentation:
- Complete Documentation - Full API reference, configuration, examples
- Create Response - Main orchestration endpoint
- Tool Execution Flow - How tools are executed
- Configuration - Depth and timeout settings
Handles image uploads and storage.
What it does:
- Upload images from URLs or base64 data
- Store images in S3 cloud storage
- Generate jan_* IDs for images
- Create temporary download links
- Prevent duplicate uploads
Documentation:
- Complete Documentation - Full API reference, storage flow, examples
- Upload Media - Upload from remote URL or data URL
- Presigned URL - Client-side S3 upload
- Jan ID System - Understanding `jan_*` identifiers
- Resolution - Convert IDs to presigned URLs
Provides Model Context Protocol tools for search, scraping, lightweight vector search, and sandboxed execution.
Available Tools:
- google_search - Serper/SearXNG-backed web search with filters and location hints
- scrape - Fetch and parse a web page (optional Markdown output)
- file_search_index / file_search_query - Index custom text into the bundled vector store and run similarity queries
- python_exec - Run trusted code via SandboxFusion, returning stdout/stderr/artifacts
Documentation:
- Complete Documentation - Full API reference, tool descriptions, examples
- JSON-RPC Protocol - Standard protocol format
- Call Tool - Execute any tool
- List Tools - Discover available tools
- Tool Details - Specific tool parameters
- Providers - MCP provider configuration
- Integration - Integration guide
- Decision Guides - When to use which API, choosing upload methods, memory configuration
- Endpoint Matrix - Full endpoint inventory.
- Error Codes - HTTP status codes and handling patterns.
- Rate Limiting - Token buckets, quotas, examples.
- Performance - SLAs, latency, scaling, cost levers.
- API Versioning - Policy and compatibility.
- Patterns - Streaming, pagination, batching, uploads.
- Examples Index - cURL/SDK samples across services.
| Environment | LLM API | Response API | Media API | MCP Tools | Gateway |
|---|---|---|---|---|---|
| Local | http://localhost:8080 | http://localhost:8082 | http://localhost:8285 | http://localhost:8091 | http://localhost:8000 |
| Docker | http://llm-api:8080 | http://response-api:8082 | http://media-api:8285 | http://mcp-tools:8091 | http://kong:8000 |
Recommended: Point all public clients at the Kong gateway (port 8000) so authentication, rate limiting, and routing stay consistent. Direct service ports remain available for internal tests but still require JWT/API key headers.
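The table above can be folded into a small client-side config; a minimal Python sketch (the `endpoint` helper is illustrative, not part of any Jan Server SDK):

```python
# Base URLs from the environment table above.
# Kong (port 8000) is the recommended public entry point.
BASE_URLS = {
    "local": "http://localhost:8000",
    "docker": "http://kong:8000",
}

def endpoint(env: str, path: str) -> str:
    """Build a full URL routed through the Kong gateway."""
    return BASE_URLS[env].rstrip("/") + path
```

For example, `endpoint("local", "/v1/models")` yields `http://localhost:8000/v1/models`.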
All API endpoints require authentication. The Kong gateway (port 8000) validates your credentials and forwards requests to backend services.
1. Bearer Token (Recommended for Development)
Get a guest token from Keycloak and use it in the Authorization header:
```bash
# Request a guest token
curl -X POST http://localhost:8000/llm/auth/guest-login
```

Response:

```json
{
  "access_token": "eyJhbGci...",
  "refresh_token": "eyJhbGci...",
  "expires_in": 300,
  "token_type": "Bearer"
}
```

Use the token in requests:

```bash
curl -H "Authorization: Bearer eyJhbGci..." \
  http://localhost:8000/v1/chat/completions
```

2. API Key (For Production Clients)
Use the X-API-Key header with your API key:
```bash
curl -H "X-API-Key: sk_your_api_key_here" \
  http://localhost:8000/v1/chat/completions
```

Refresh Tokens:

```bash
curl -X POST http://localhost:8000/llm/auth/refresh \
  -H "Content-Type: application/json" \
  -d '{"refresh_token": "eyJhbGci..."}'
```

Revoke Tokens:

```bash
curl -X POST http://localhost:8000/llm/auth/revoke \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{"token": "eyJhbGci..."}'
```

When calling services directly (ports 8080/8082/8285/8091) instead of through Kong:
- You still need a valid Keycloak JWT
- Use the same `Authorization: Bearer <token>` header
- API key authentication is NOT available (Kong-only feature)
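The two schemes differ only in which header carries the credential. A minimal Python sketch of a header builder (the `auth_headers` name is illustrative, not a Jan Server API):

```python
def auth_headers(token=None, api_key=None) -> dict:
    """Build auth headers for a Jan Server request.

    Bearer JWTs work both through Kong and against direct service
    ports; X-API-Key is validated by Kong only.
    """
    if token is not None:
        return {"Authorization": f"Bearer {token}"}
    if api_key is not None:
        return {"X-API-Key": api_key}
    raise ValueError("a JWT or an API key is required")
```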
Example direct call:

```bash
# Still requires JWT token from Keycloak
curl -H "Authorization: Bearer <token>" \
  http://localhost:8080/v1/chat/completions
```

Authentication flow:
- Client requests guest login or uses an API key
- Kong validates credentials (JWT signature + expiry, or API key lookup)
- Kong forwards the request to the backend service with the JWT in the header
- Backend service validates the JWT signature and claims
- Request is processed and the response returned
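Guest tokens expire quickly (`expires_in: 300` in the guest-login response above), so clients should refresh proactively rather than waiting for a 401. A Python sketch of client-side expiry tracking (the `GuestToken` class is illustrative, not part of any SDK):

```python
import time

class GuestToken:
    """Track a guest token's lifetime from the guest-login response."""

    def __init__(self, payload: dict, now=None):
        self.access_token = payload["access_token"]
        self.refresh_token = payload["refresh_token"]
        issued = time.time() if now is None else now
        self.expires_at = issued + payload["expires_in"]

    def needs_refresh(self, now, leeway=30.0) -> bool:
        # Refresh slightly before the 300-second expiry.
        return now >= self.expires_at - leeway
```

When `needs_refresh` returns true, POST the stored `refresh_token` to `/llm/auth/refresh` as shown above.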
Best Practice: Always use the Kong gateway (port 8000) for client applications. Direct service ports are for internal communication and debugging only.
```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "jan-v1-4b",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```

```bash
curl -X POST http://localhost:8000/v1/mcp \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "jsonrpc": "2.0",
    "method": "tools/call",
    "params": {
      "name": "google_search",
      "arguments": {"q": "AI news"}
    }
  }'
```

Calling MCP Tools directly (e.g., http://localhost:8091/v1/mcp) is supported for internal testing, but a valid Keycloak JWT is still required; API keys work only when Kong proxies the request.
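The JSON-RPC body above can be built programmatically; a minimal Python sketch that mirrors the request shape shown (serialization only, no server call):

```python
import json

def mcp_call(name: str, arguments: dict) -> str:
    """Serialize a JSON-RPC 2.0 tools/call body for POST /v1/mcp."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })
```

For example, `mcp_call("google_search", {"q": "AI news"})` reproduces the cURL payload above.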
```bash
curl -H "Authorization: Bearer <token>" \
  http://localhost:8000/v1/models
```

All successful responses return JSON:

```json
{
  "data": {...},
  "meta": {...}
}
```

All errors follow this structure:
```json
{
  "error": {
    "type": "invalid_request_error",
    "code": "invalid_parameter",
    "message": "Parameter 'model' is required",
    "param": "model",
    "request_id": "req_123xyz"
  }
}
```

| Type | Description | HTTP Status |
|---|---|---|
| `invalid_request_error` | Invalid request parameters | 400 |
| `auth_error` | Authentication failed | 401 |
| `permission_error` | Insufficient permissions | 403 |
| `not_found_error` | Resource not found | 404 |
| `rate_limit_error` | Too many requests | 429 |
| `internal_error` | Server error | 500 |
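Client-side handling can key off the table above: retry the transient statuses (429 rate limits and 500 server errors) and surface the error envelope for the rest. A small Python sketch (helper names are illustrative):

```python
def should_retry(status: int) -> bool:
    """rate_limit_error (429) and internal_error (500) are transient."""
    return status in (429, 500)

def describe_error(body: dict) -> str:
    """Flatten the error envelope into a log-friendly line."""
    err = body["error"]
    return (f"{err['type']}/{err['code']}: {err['message']} "
            f"(request_id={err['request_id']})")
```

Logging the `request_id` is worth the extra characters: it is the same identifier returned in the `X-Request-Id` response header, so support can correlate the failure server-side.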
Request Headers:
- `Authorization: Bearer <token>` - Required for authenticated endpoints
- `Content-Type: application/json` - For POST/PUT requests
- `Idempotency-Key: <uuid>` - Optional, for idempotent POST requests
- `X-Request-Id: <uuid>` - Optional, for request tracing

Response Headers:
- `X-Request-Id` - Request identifier for tracing
- `X-Auth-Method` - Authentication method used (jwt or api_key)
- `Content-Type: application/json` - JSON response
- `Content-Type: text/event-stream` - SSE streaming response
List endpoints support pagination:

```bash
curl "http://localhost:8000/v1/conversations?limit=10&after=conv_123"
```

Response:

```json
{
  "data": [...],
  "next_after": "conv_456"
}
```

Chat completions support Server-Sent Events (SSE) streaming:

```bash
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Authorization: Bearer <token>" \
  -d '{"model":"jan-v1-4b","messages":[...],"stream":true}'
```

Response:

```
data: {"id":"chat-123","choices":[{"delta":{"content":"Hello"}}]}
data: {"id":"chat-123","choices":[{"delta":{"content":"!"}}]}
data: [DONE]
```
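A client consumes this stream by reading `data:` lines until the `[DONE]` sentinel; a minimal Python sketch of the parsing step (transport omitted, works on any iterable of lines):

```python
import json

def sse_deltas(lines):
    """Yield content fragments from an SSE chat-completions stream."""
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank lines and comments
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        delta = json.loads(payload)["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]
```

Joining the fragments from the sample stream above reconstructs `Hello!`.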
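Similarly, the cursor pagination shown earlier (`limit`/`after` with `next_after`) is typically driven in a loop; a Python sketch assuming a caller-supplied `fetch` function in place of the HTTP GET:

```python
def iter_pages(fetch, limit: int = 10):
    """Walk a cursor-paginated list endpoint until next_after runs out.

    `fetch` stands in for an HTTP GET of e.g. /v1/conversations and
    must return the {"data": [...], "next_after": ...} shape shown above.
    """
    after = None
    while True:
        page = fetch(limit=limit, after=after)
        yield from page["data"]
        after = page.get("next_after")
        if not after:
            break
```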
Access the interactive Swagger UI:
Local: http://localhost:8000/api/swagger/index.html
Try API calls directly from your browser with built-in authentication.
Official SDKs are coming soon. In the meantime, use OpenAI-compatible clients with the Jan Server base URL.
Contributions welcome! Jan Server is OpenAI-compatible, so most OpenAI client libraries work with minor configuration changes.
```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:8000/v1",
  apiKey: "your_guest_token_here",
});

const response = await client.chat.completions.create({
  model: "jan-v1-4b",
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(response.choices[0].message.content);
```

Currently, Jan Server does not enforce rate limits in development mode.
Production deployments should configure rate limiting via Kong Gateway.
All APIs are versioned using URL path versioning:
- Current version: `/v1/`
- Future versions: `/v2/`, `/v3/`, etc.
Breaking changes will only occur in new major versions.
- Docs: Full Documentation
- Bugs: Report API Issues
- Discussions: API Discussions
Explore APIs: LLM API | MCP Tools | Interactive Docs: Swagger UI