Scalable API gateway that aggregates calls to multiple LLMs (OpenAI, Hugging Face, Groq, Anthropic, Gemini, etc.), includes caching, rate limiting, logging, monitoring and production-ready deployment.

🚀 Unified LLM API Gateway

Gateway Banner


✨ Overview

Unified LLM API Gateway is a scalable, extensible platform that aggregates and normalises calls to multiple LLM backends (OpenAI, Hugging Face, Groq, Anthropic, Gemini, and more).
It provides a unified API with built-in caching, rate limiting, authentication, logging, metrics, and production-ready deployment manifests for Docker and Kubernetes.


Built with Go and the Gin framework · integrates OpenRouter AI · License: MIT


🏗️ Architecture

  • API Gateway (Go):
    • Accepts client requests
    • Handles authentication, routing and request transformation
    • Aggregates/fans-out to LLM backends
  • LLM Adapters (microservices):
    • Wrap each provider’s API (OpenAI, Hugging Face, etc.) with a unified internal interface
  • Cache Layer:
    • Redis for result caching (prompt+params as cache key)
  • Rate Limiter:
    • Redis-based leaky-bucket or token-bucket (shared across instances)
  • Auth & Quotas:
    • API keys / JWT, per-key quotas (Redis or DB)
  • Observability:
    • Structured logs (JSON), Prometheus metrics, traces
  • Deployment:
    • Docker images, Helm charts, Kubernetes manifests, CI builds

📁 Monorepo Layout

```
llm-api-gateway/
├── README.md
├── LICENSE
├── .github/           # CI/CD workflows
├── infra/             # Docker Compose & Kubernetes manifests
│   ├── k8s/
│   └── docker-compose.yml
├── gateway/           # Go API gateway
│   ├── cmd/server/
│   ├── internal/
│   │   ├── handlers/
│   │   ├── adapters/
│   │   ├── cache/
│   │   ├── ratelimit/
│   │   └── metrics/
│   ├── go.mod
│   └── Dockerfile
├── adapters/          # Per-provider adapters (microservices)
│   ├── openai-adapter/
│   └── hf-adapter/
├── admin/             # NestJS admin dashboard (API keys, usage, logs)
│   ├── src/
│   ├── package.json
│   └── Dockerfile
└── tooling/
    └── tests/         # e2e test helpers
```

⚡ Quickstart

  1. Start all services:
     `docker-compose up --build`
  2. Query the gateway API at:
     `http://localhost:3020/gateway/query`
  3. Open the admin dashboard at:
     `http://localhost:3040`

🔌 Supported Providers

  • OpenAI (GPT-3.5, GPT-4, GPT-4o, etc.)
  • Hugging Face Inference API
  • Groq
  • OpenRouter
  • Anthropic (Claude)
  • Gemini (Google)
  • More coming soon!

🛡️ Features

  • Unified API: One endpoint for all LLMs
  • Authentication: API key/JWT middleware
  • Caching: Redis-based, prompt+params as key
  • Rate Limiting: Per-key, Redis-backed
  • Logging: Structured, JSON logs
  • Monitoring: Prometheus metrics endpoint
  • Adapters: Microservices for each provider
  • Kubernetes & Docker: Production-ready manifests

🧑‍💻 API Usage

Request

```
POST /gateway/query
Authorization: <your-gateway-api-key>
Content-Type: application/json
```

```json
{
  "provider": "openai",
  "prompt": "Your prompt here"
}
```

`provider` must be one of `openai`, `hf`, `groq`, `openrouter`, `anthropic`, or `gemini`.

Response

```json
{
  "cached": false,
  "response": "LLM output"
}
```

🚦 Development Phases

Phase 1: Core Gateway

  • Unified /query endpoint
  • OpenAI, Hugging Face, Groq, OpenRouter, Anthropic, Gemini support
  • Redis caching
  • API key authentication
  • Rate limiting
  • Logging

Phase 2: Adapters & Extensibility

  • Per-provider adapters as microservices
  • Unified internal API for adapters
  • Docker Compose & K8s manifests

Phase 3: Observability & Admin

  • Prometheus metrics
  • Admin dashboard (NestJS)
  • Usage quotas & billing
  • Tracing (OpenTelemetry)

Phase 4: Advanced Features (Planned)

  • Multi-provider aggregation/fan-out
  • Request/response transforms
  • Fine-grained quotas & billing
  • User/project management
  • Webhooks & streaming
  • Model selection & fallback
  • More adapters (Cohere, Mistral, etc.)

📈 Roadmap

  • Add more LLM providers & adapters
  • Streaming & webhooks support
  • Advanced admin features (usage, billing, analytics)
  • Helm charts for K8s
  • OpenAPI/Swagger docs

🤝 Contributing

Contributions are welcome! Please open issues or PRs for bugs, features, or improvements.
