Skip to content

RouteScope/awesome-ai-gateway

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

151 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Awesome AI Gateway Awesome

GitHub stars Evaluation set Data updated daily

Pick the right AI gateway for your need in ~10 seconds — then trust the answer. A decision tree, a reproducible cost benchmark, and independent evidence for what we exclude. Organized by what you actually need, not by vendor.

Built the hard way: I burned $788 on AI coding in a single day — one flagship model ate 78% of it, just because I'd defaulted everything to the priciest option. So I mapped the whole gateway landscape. → the story

Languages: English · 简体中文

  🧭 Pick a gateway       🚀 Live interactive site       📊 Cost & scorecard  

📑 Full contents — pick fast · browse by need · reference

PRs Welcome License: CC0 Last commit

Pick fast · Which gateway should I use? · 📊 Latest evaluations · Quick comparison

Browse by need · 💰 Cost-first · 🔓 Self-hosted · 🏢 Enterprise & compliance · ☁️ First-party clouds · 🇨🇳 China ecosystem · 🤖 MCP & agent gateways

Reference · 📊 Evaluation set · How to choose safely · FAQ · 📚 Essential reading · 📰 What's new · Glossary · Why this exists · Contributing

📊 Latest evaluations

A running digest of fresh model, pricing and gateway evals — newest first, every entry dated and sourced. This is the fast-moving signal layer; for our own reproducible cost tables and model scorecard, see the full evaluation set. Spotted a new eval worth tracking? Add it.

Date Category Finding Source
2026-06-21 💰 Pricing The API pricing market now spans 123 models across 12 providers, with a >400× price spread over the full input/output range — cheapest flagship DeepSeek V4 Flash ($0.14/M input) vs priciest GPT-5.5 Pro ($30.00/M input) is already ~214× on input alone. Tiering has hardened: top reasoning (o3) runs ~20× a nano-tier model on input, wider on output. aipricing.guru
2026-06 📈 Adoption ChatGPT hit ~900M weekly active users and >2.5B queries/day — demand scaling about as fast as the price spread. DemandSage
💸 Same ¥100 (≈ $14.66) — how much can each model read? The 400× spread, made concrete.

How many input+output tokens ¥100 buys, by model (blended estimate · snapshot 2026-06-21 · aipricing.guru):

Tier Model Tokens / ¥100 ≈ Chinese chars
🥇 Rock-bottom DeepSeek V4 Flash 35.2M ~26.4M
🥇 Rock-bottom GPT-4.1 nano 29.6M ~22.2M
🥇 Rock-bottom GPT-5.4 nano 10.2M ~7.7M
💚 Value GPT-5.4 mini 2.83M ~2.1M
💚 Value DeepSeek V4 Pro 2.83M ~2.1M
🧠 Reasoning o3 1.48M ~1.1M
🏁 Flagship Gemini 2.5 Pro 1.31M ~0.98M
🏁 Flagship GPT-5.5 0.42M ~0.32M
🏁 Flagship GPT-5.5 Pro 0.07M ~0.05M

One line: ¥100 reads ~26M Chinese characters on DeepSeek V4 Flash — roughly 52× the Three-Body trilogy — but only ~50K on GPT-5.5 Pro, about one short story. Choosing a model is choosing the scale factor on your money; the Cost-first gateways exist to exploit exactly this spread.

Which gateway should I use?

Decision tree: which AI gateway should you use? Hosted (OpenRouter, Vercel, Cloudflare, Bedrock, Azure, Vertex, Portkey) vs self-hosted open source (LiteLLM, Bifrost, new-api, one-api, GPT-Load, Kong, Higress, APISIX, Envoy AI Gateway, agentgateway), chosen by what you need.

⚡ Fast answer — one sane default per need (alternatives in each linked section):

I need… Start with Drill into
Cheapest access to many models, zero ops OpenRouter Cost-first
Zero markup on my own keys Vercel / Cloudflare Cost-first
Self-host, broadest features LiteLLM Self-hosted
Self-host, lowest overhead Bifrost (Go) Self-hosted
China models + team key billing new-api China ecosystem
Enterprise K8s + audit Kong / Higress Enterprise
Strongest compliance (HIPAA/FedRAMP) Azure / Bedrock First-party
Govern agents / MCP traffic agentgateway MCP & agents
📋 The full decision tree — every branch, copy-pasteable
Do you want to self-host?
│
├─ NO — hosted, minimal ops
│   ├─ Cheapest access to many models ──────────▶ OpenRouter · Vercel AI Gateway (0% markup)
│   ├─ Free control plane over your own keys ───▶ Cloudflare AI Gateway
│   ├─ EU data residency matters ───────────────▶ Requesty · Eden AI · nexos.ai
│   └─ Already on one cloud ────────────────────▶ AWS Bedrock · Azure APIM · Vertex AI
│
└─ YES — self-hosted / open source
    ├─ Python stack, broadest features ─────────▶ LiteLLM
    ├─ Raw performance (Go/Rust/TS) ────────────▶ Bifrost · Portkey Gateway
    ├─ Built-in evals + observability ──────────▶ Helicone · Portkey Gateway
    ├─ Key distribution / billing / CN models ──▶ new-api · one-api · GPT-Load
    ├─ Enterprise K8s, audit, guardrails ───────▶ Kong · Higress · APISIX · Envoy AI Gateway
    └─ Governing AI agents & MCP traffic ───────▶ agentgateway · Lunar.dev

✅ Why trust this list

  • Independent — no vendor money, no affiliate links, CC0. Unlike affiliate-driven relay "rankings," nobody pays to appear here.
  • Reproducible, not asserted. Every cost cell is computed from open pricing data by a unit-tested script; stars refresh daily via CI.
  • Honest about risk. We disclose CVEs, label archived/stale projects, and exclude gray-market relays — with the research to back it.

Why this matters: the same task can cost 100× more depending on the model behind your gateway. An AI gateway sits between your code and LLM providers — one endpoint, one key, many models — handling routing, failover, caching, rate limits, cost tracking and guardrails, so you change a base_url instead of rewriting your app. Pick the gateway here, then the evaluation set shows which model to route to.

Cost to write one 100K-token report: $0.03 on DeepSeek vs $3.01 on GPT-5.5 — a 106x spread, computed by a unit-tested script

Found this useful? Star it — that's how the next engineer choosing a gateway finds it. CC0, no signup, no tracking, no vendor money.

Quick comparison

Stars auto-refresh daily. ✅ built-in · ➕ via plugin/paid tier · ❌ not available.

Project Type Stars License Multi-provider Fallback / LB Caching Guardrails Cost tracking
LiteLLM OSS proxy + SDK ⭐ 51.6k MIT¹ ✅ 100+
new-api OSS relay/billing ⭐ 40.1k AGPL-3.0
one-api OSS relay/billing ⭐ 35.3k MIT
Kong AI Gateway OSS API gateway ⭐ 43.7k Apache-2.0 ✅ semantic
Apache APISIX OSS API gateway ⭐ 16.8k Apache-2.0
Portkey Gateway OSS gateway + SaaS ⭐ 12.2k MIT ✅ 1600+ ✅ 50+ ➕ SaaS
TensorZero OSS LLMOps · ⚠️ archived '26 ⭐ 11.7k Apache-2.0
Higress OSS AI-native gateway ⭐ 8.7k Apache-2.0
GPT-Load OSS key-pool proxy ⭐ 6.2k MIT ✅ key rotation
Bifrost OSS gateway (Go) ⭐ 6k Apache-2.0 ✅ adaptive
Helicone OSS observability + gateway ⭐ 5.9k Apache-2.0
Envoy AI Gateway OSS K8s gateway ⭐ 1.8k Apache-2.0
OpenRouter SaaS marketplace Commercial ✅ 400+
Vercel AI Gateway SaaS (0% markup) Commercial ✅ 100s
Cloudflare AI Gateway SaaS control plane Commercial (free tier) ✅ dynamic ✅ budgets

¹ LiteLLM core is MIT; the repo contains a separately licensed enterprise directory.

📂 Browse the raw data (machine-readable, CC0): models & pricing JSON · cost table CSV · gateway scorecard CSV. Every cost cell is regenerated from this data by a unit-tested script.

The AI Gateway Landscape: 100+ gateways across 9 categories — hosted aggregators (OpenRouter, Vercel, Cloudflare, AIMLAPI, Novita), self-hosted OSS (LiteLLM, Portkey, Bifrost, Plano), enterprise & API gateways (Kong, APISIX, Envoy, Tyk, Gravitee, KrakenD), first-party clouds (Bedrock, Azure, Vertex, Databricks), China ecosystem (new-api, one-api, Higress, GPT-Load, VoAPI), smart routing (Not Diamond, Martian, RouteLLM, Claude Code Router, NVIDIA LLM Router), observability (Helicone, MLflow, Respan), MCP & agent (agentgateway, Lunar.dev, IBM ContextForge, MetaMCP, Pomerium), and K8s & inference (KServe, GPUStack, llm-d, AIBrix).

The full directory at a glance — browse the sections below by your need.

💰 Cost-first: cheapest multi-model access

Pain point: "I want many models for the least money and zero ops."

  • OpenRouter — The dominant model marketplace: 400+ models behind one OpenAI-compatible API, pay-as-you-go with automatic failover; ~5.5% fee when buying credits. $113M Series B (May 2026), ~8M users.
  • Vercel AI Gateway — Hundreds of models at provider list price (0% markup), $5/month free credits, zero-data-retention option; pairs naturally with the AI SDK.
  • RouteScope — Unified AI model aggregation & distribution gateway. Cross-format conversion of 100+ LLMs into OpenAI/Claude/Gemini-compatible interfaces. Single API key, centralized dashboard, prepaid credits from $5, no monthly commitment.
  • Cloudflare AI Gateway — Free control plane in front of your own provider keys: caching, dynamic routing, unified billing, and dollar-denominated spend limits (2026 beta).
  • Requesty — EU-friendly OpenRouter alternative: 400+ models, sub-20ms failover, ~5% markup.
  • Eden AI — Unified API for 500+ models plus vision/OCR/speech; EU-based, ~5.5% platform fee.
  • Helicone AI Gateway (cloud) — Passthrough billing at 0% markup with observability bundled.
  • GPT-Load ⭐ 6.2k — High-performance Go proxy that rotates pools of API keys across channels to maximize quota usage.
  • Loop Gateway — OpenAI-compatible proxy that meters every request in Bitcoin sats instead of dollars. 311 models via OpenRouter at a 15% markup. No accounts, no email, no card; top up over Lightning, get a bearer token. Three auth rails (prepaid bearer, L402, Cashu). Self-hostable in Go via docker-compose, live at api.loopxxi.com. New & unverified (anonymous 1★ repo) — it resells frontier models through the operator's own OpenRouter account at a 15% markup, and account-less + crypto-prepaid means no recourse if it swaps models or vanishes; confirm fidelity with canary_check.py and only top up what you can afford to lose.
  • nullsink (repo) — Account-less, metered proxy for frontier-model APIs, paid in Monero or Bitcoin. No accounts, no email, no card; mint a bearer token, prepay on-chain, and point the official SDKs at one base URL. ~10% markup taken once at top-up; no IP logging, no request logs; payment and token kept unlinkable. Self-hostable single binary (TypeScript/Bun, AGPL-3.0), live at nullsink.is. New & unverified (repo created 2026-06, 3★) — account-less + crypto-prepaid + no logs means no recourse if it swaps models or vanishes; confirm fidelity with canary_check.py and only top up what you can afford to lose.
  • AIMLAPI — One OpenAI/Anthropic-compatible endpoint fronting 400+ models (chat, image, video, audio, embeddings); prepaid, OpenRouter-style aggregator.
  • Novita AI — Unified API to 200+ open-source models (DeepSeek/Qwen/Llama…) with load balancing, autoscaling and failover; also a GPU cloud.
  • FlintAPI (repo) — Hosted OpenAI-compatible gateway aggregating 25+ Chinese LLMs (DeepSeek, Qwen, Kimi, GLM, MiniMax) with $2 free credits. New and unverified — confirm model fidelity (e.g. with canary_check.py) before relying on it in production.
  • FlowBar — Hosted OpenAI-compatible relay reselling 50+ models (GPT, Claude, Gemini, DeepSeek, Qwen, GLM, Kimi) below OpenRouter, with USD/CNY/crypto payment. New and unverified — confirm model fidelity (e.g. with canary_check.py) before relying on it in production.
  • lxg2it ModelRouter (repo) — Solo-built, OpenAI-compatible router over 7+ providers (Anthropic, OpenAI, Google, Cerebras, Groq, Grok, GLM) with tiered automatic fallback that selects the cheapest available model. Free tier plus a paid tier advertised at 0% markup on Anthropic (a deposit fee may apply — verify current pricing). New and unverified — the public repo is now a deprecated reference stub (routing moved to the closed hosted service; still no license file) — confirm model fidelity (e.g. with canary_check.py) before relying on it in production.
  • OpenPaths — Hosted OpenAI-compatible router across 15+ providers spanning chat, image, video, music, speech, embeddings, transcription and search; source/dev on Codex Infinity. Newer SaaS.
  • Glama Gateway — OpenAI-compatible gateway to 100+ models with consolidated billing, caching and logging (OSS core glama-ai/lightport).

💡 Squeeze more from any gateway: enable semantic caching (Kong, Bifrost, Zuplo), set spend limits (Cloudflare, Zuplo, Pydantic/Logfire), and route easy prompts to cheap models (see Smart routing).

🔓 Self-hosted open source

Pain point: "My keys, my infra, no per-token middleman fee."

  • LiteLLM ⭐ 51.6k — The default choice: Python SDK + proxy server speaking OpenAI format to 100+ providers, with virtual keys, budgets, load balancing and guardrails.
  • Portkey Gateway ⭐ 12.2k — Fast TypeScript gateway (1,600+ models, 50+ guardrails) that also powers Portkey's commercial LLMOps platform.
  • CLIProxyAPI ⭐ 38.4k — Go gateway that wraps coding-agent CLI subscriptions (Claude Code, Codex, Gemini, Grok, Antigravity) into OpenAI/Gemini/Claude/Codex-compatible APIs with multi-account pools, round-robin load balancing and a management API; one of the highest-starred OSS gateways in the space. BYO accounts — but routing OAuth coding-tier subscriptions through an API can violate provider ToS, so weigh account-ban risk.
  • 9router ⭐ 18.5k — MIT self-hosted BYOK local proxy that auto-routes across 40+ providers with subscription→cheap→free fallback, multi-account load balancing and token compression; cost-first and very popular, but its free/OAuth coding-tier routing (Claude Code, Codex, Kiro) carries provider-ToS/account-ban risk.
  • TensorZero ⭐ 11.7k — ⚠️ Archived June 2026 (company wound down; repo read-only, Apache-2.0 code + community forks remain). Rust gateway unified with observability, evals, experimentation and optimization.
  • Bifrost ⭐ 6k — Go gateway from Maxim AI claiming ~50x LiteLLM throughput; adaptive load balancing, cluster mode, MCP support.
  • Helicone ⭐ 5.9k — Observability-first platform (YC W23) with a Rust ai-gateway ⭐ 605.
  • Plano ⭐ 6.6k — AI-native proxy and data plane for agents (formerly Arch Gateway / archgw).
  • LLM Gateway ⭐ 1.3k — Open-source OpenRouter alternative: route, manage and analyze requests across providers.
  • APIPark ⭐ 1.8k — Cloud-native LLM API management and distribution platform.
  • Pydantic AI Gateway ⭐ 190 — BYOK gateway with cost caps and OTel; ⚠️ repo archived, now folded into Pydantic Logfire.
  • OptiLLM ⭐ 4.2k — Optimizing inference proxy that boosts accuracy via test-time compute techniques.
  • aisuite ⭐ 14.8k — Andrew Ng's unified multi-provider client. A library rather than a deployable proxy — fits when you don't want network hops.
  • Shepherd Model Gateway (SMG) ⭐ 357 — Engine-agnostic gateway in Rust: one OpenAI/Anthropic-compatible endpoint over vLLM/SGLang/TRT-LLM + cloud providers, with KV-cache-aware routing and WASM plugins.
  • RelayPlane ⭐ 184 — MIT, local-first proxy (npm): 11 providers behind one endpoint with per-request cost attribution and hard daily/hourly budget caps.
  • SentryNode Gateway ⭐ 0 — Open-core (Apache-2.0) AI proxy for cost governance / FinOps routing: adaptive model routing, budget caps and audit logging. Early-stage; the public repo currently ships a demo scaffold.
  • GoModel ⭐ 969 — Lightweight single-binary Go gateway (open-source LiteLLM alternative) exposing one OpenAI/Anthropic-compatible API across 18+ providers with caching, guardrails and usage/cost tracking; fast-growing, though its throughput-vs-LiteLLM figures are vendor-run.
  • OpenGateLLM ⭐ 166 — Production-grade open-source GenAI gateway from France's Etalab (powers the government's "Albert" assistant): one OpenAI-compatible API over self-hosted + provider models, with auth, rate limits and usage tracking. Distinct public-sector / EU-sovereignty angle.
  • ⚠️ Stale but historically notable: BricksLLM ⭐ 1.2k (PII masking, per-key limits; inactive since early 2025), Glide ⭐ 161 (inactive since 2024).

🏢 Enterprise & compliance

Pain point: "Audit logs, PII redaction, RBAC, on-prem, and the EU AI Act (enforceable Aug 2026)."

  • Kong AI Gateway ⭐ 43.7k — Mature API gateway with AI plugins: semantic caching/routing, prompt guard, token rate-limiting; Konnect for managed control plane.
  • Apache APISIX ⭐ 16.8k — Cloud-native API + AI gateway with ai-proxy / ai-proxy-multi plugins.
  • Envoy AI Gateway ⭐ 1.8k — CNCF-aligned GenAI access on Envoy Gateway, backed by Tetrate and Bloomberg.
  • kgateway ⭐ 5.6k — CNCF API/AI gateway, the base of Solo.io's commercial Gloo AI Gateway.
  • TrueFoundry AI Gateway — Enterprise gateway with routing, guardrails and RBAC, deployable into your K8s/VPC.
  • nexos.ai — Enterprise AI gateway/orchestration from the Nord Security founders (€30M Series A, Oct 2025).
  • Tyk AI Studio — AI governance suite: budgets, model catalogs, guardrails on Tyk's gateway.
  • Gravitee Agent Mesh — LLM Proxy, MCP Proxy and A2A support inside Gravitee APIM.
  • WSO2 AI Gateway — Egress management for LLM traffic: model routing, semantic caching, guardrails.
  • F5 AI Gateway — Containerized AI traffic gateway; data-leakage detection via the LeakSignal acquisition (announced Jul 2025).
  • IBM API Connect AI Gateway — Policy enforcement, masking and audit for LLM traffic.
  • MuleSoft AI / Omni Gateway — Governs LLM, MCP and agent traffic alongside classic APIs.
  • Lunar.dev ⭐ 462 — Egress consumption gateway repositioned around MCP/agent governance.
  • KrakenD AI Gateway — High-performance, stateless Go API gateway (krakend/krakend-ce ⭐ 2.6k) with an AI proxy + prompt-security layer.
  • Broadcom Layer7 AI Gateway — LLM traffic governance, threat protection and quotas on the mature Layer7 API platform.
  • Cequence AI Gateway — API-security-first AI gateway: discovery, guardrails and threat protection for LLM/agent traffic.
  • Axway Amplify AI Gateway — Centralized control plane on Axway's Amplify platform governing LLM/MCP/agent traffic with business-logic model routing, RBAC, spend caps, prompt-injection controls and RAG integration, from a 10× Gartner MQ API-management Leader.
  • Red Hat Connectivity Link — Kubernetes-native gateway (built on the Kuadrant project, successor to 3scale) unifying AI gateway, API management and multicluster connectivity; powers OpenShift AI Models-as-a-Service as the front door governing external and self-hosted LLM endpoints.
  • Sensedia AI Gateway — Gartner-recognized APIM vendor's agnostic AI gateway governing LLMs, MCP servers and AI agents with multi-model routing, guardrails, cost controls and observability across a multi-cloud control plane.
  • Ambassador Edge Stack — Envoy-based, Kubernetes-native API gateway (OSS core emissary-ingress ⭐ 4.5k) whose AI Gateway layer adds LLM-provider routing, token rate-limiting and fallback — a peer to Kong/Tyk/APISIX in the API-vendor cohort.

☁️ First-party gateways (cloud & model vendors)

Pain point: "We're already committed to one cloud — give us the native path."

🇨🇳 China ecosystem

Pain point: "Domestic models (Qwen/DeepSeek/GLM/Kimi), CNY payment, key distribution & billing for teams."

  • new-api ⭐ 40.1k — The most active one-api fork, now a "unified AI model hub": protocol conversion, billing, Rerank/Realtime endpoints. AGPL-3.0.
  • one-api ⭐ 35.3k — The original LLM API 管理&分发系统 (OpenAI/Azure/Claude/Gemini/DeepSeek/豆包…); development has slowed.
  • Higress ⭐ 8.7k — Alibaba's AI-native gateway on Envoy/Istio, first-class 通义/DeepSeek support; hosted version at higress.ai.
  • GPT-Load ⭐ 6.2k — 智能密钥轮询 multi-channel proxy in Go.
  • one-hub ⭐ 2.8k — one-api fork with better non-OpenAI function calling and stats.
  • simple-one-api ⭐ 2.3k — Single binary adapting 千帆/星火/混元/MiniMax/DeepSeek to the OpenAI interface.
  • Octopus ⭐ 2.3k — Personal LLM API aggregation gateway unifying multiple providers behind one endpoint, with load balancing and OpenAI/Anthropic protocol conversion (Go + Next.js).
  • Veloera ⭐ 1.6k — Newer relay platform in the one-api/new-api lineage.
  • uni-api ⭐ 1.2k — Lightweight single-config unified API manager, no frontend.
  • APIPark ⭐ 1.8k — China-origin, cloud-native AI & API gateway with an open developer portal.
  • VoAPI ⭐ 1.1k — Polished new-api-lineage relay/billing panel (Go), focused on UI and operations.
  • done-hub ⭐ 778 — one-api/new-api fork with richer billing and channel management.
  • sub2api ⭐ 29.1k — Go relay platform that pools Claude/OpenAI/Gemini/Antigravity subscription accounts (OAuth, session keys, API keys) behind one OpenAI/Anthropic-compatible endpoint, adding cost-sharing "carpool" billing (Stripe/Alipay/WeChat), key distribution and per-token rate limits. One of 2026's fastest-rising China-ecosystem relays — but account-pooling sits adjacent to the resold-relay category this list excludes; BYO accounts and vet before use.
  • AI Proxy ⭐ 489 — Self-hosted Go gateway from the Sealos team that accepts OpenAI/Claude/Gemini protocols, converts between them, and adds multi-channel routing, load balancing, rate limiting, multi-tenant isolation, and a caching/web-search/reasoning plugin layer.
  • metapi ⭐ 3k — Self-hosted "router of routers": aggregates your accounts across new-api/one-api/OneHub/DoneHub/Veloera/AnyRouter/sub2api into one key, with cost/balance/utilization-weighted smart routing, channel cool-down/retry, model auto-discovery and OpenAI⇄Claude conversion (TypeScript, MIT). Routing software only — vet the upstream relays it points at.
  • Volcengine AI Gateway — ByteDance's cloud AI gateway: unified access, routing and governance for Doubao + third-party models.

⚠️ This list deliberately excludes reverse-engineered / resold "free-api" relays — and not on principle alone. Two 2026 measurement studies found systematic fraud across the relay population: Real Money, Fake Models measured model-identity failures in 45.8% of fingerprint tests and output divergence up to 47%; Your Agent Is Mine caught routers injecting malicious code and exfiltrating planted API keys. If you're forced to vet one anyway, use the canary-diff test in How to choose safely.

🤖 MCP & agent gateways

Pain point: "Agents call tools now — govern MCP traffic like you govern APIs." The newest category (2025–2026).

  • agentgateway ⭐ 3.5k — CNCF proxy for agentic traffic: MCP governance and agent-to-agent (A2A) communication.
  • Lunar.dev MCPX ⭐ 462 — Gateway for managing MCP server consumption.
  • Tetrate Agent Router Service — Managed Envoy AI Gateway fleet: LLM + MCP gateway with guardrails (~5% fee).
  • Zuplo AI Gateway — Programmable policies: USD spend limits, prompt-injection detection, secret masking, MCP support.
  • NetFoundry MCP/LLM Gateways — Zero-trust gateways for AI deployments (launched June 2026).
  • AWS AgentCore Gateway — Tool/MCP gateway inside Bedrock AgentCore.
  • IBM ContextForge ⭐ 4k — MCP gateway/registry federating many MCP servers behind one endpoint with auth, rate limits and observability.
  • Docker MCP Gateway ⭐ 1.5k — Docker-maintained docker mcp CLI plugin that runs and federates MCP servers as containers behind one endpoint, with secret management, call interception and per-tool access control.
  • MetaMCP ⭐ 2.5k — Aggregates MCP servers into one endpoint with middleware (auth, filtering) and a management UI.
  • ToolHive ⭐ 1.9k — Go platform that runs MCP servers in isolated containers and fronts them with a unified, secured gateway (access policies, "virtual MCP" aggregation).
  • Microsoft MCP Gateway ⭐ 711 — Microsoft-maintained reverse proxy + management layer for MCP servers: session-aware stateful routing and lifecycle management on Kubernetes.
  • 1MCP ⭐ 459 — Unified MCP server (TypeScript) aggregating many MCP servers behind one endpoint, with HTTP access and CLI-based discovery for agents.
  • mcpproxy-go ⭐ 269 — Local Go MCP proxy that federates multiple MCP servers behind one endpoint, with BM25 tool-search filtering, token reduction, and auto-quarantine/security scanning of new servers.
  • MCPJungle ⭐ 1.1k — Self-hosted MCP registry + gateway for central tool governance in enterprises.
  • Obot ⭐ 857 — Open-source agent platform with an MCP gateway for governing tool access.
  • Director ⭐ 479 — Middleware to run, secure and observe MCP servers behind one connection.
  • Lasso MCP Gateway ⭐ 377 — Security-first MCP gateway: plugin guardrails, secret masking, threat detection.
  • Armorer Guard ⭐ 40 — Local Rust MCP proxy that wraps stdio servers and inspects tool-call arguments for prompt injection, credential leakage, exfiltration, and risky actions.
  • fak ⭐ 4 — Security-first agent/MCP firewall: a single dependency-free Go binary (Apache-2.0) fronting any OpenAI/Anthropic/MCP backend, where a default-deny capability allow-list adjudicates every tool call and suspicious tool results are quarantined out of the model's context, plus bearer/x-api-key auth, an X-Trace-Id audit trail and Prometheus /metrics. New and early-stage.
  • Archestra ⭐ 3.9k — Kubernetes-native MCP gateway with OAuth On-Behalf-Of user-delegated tool access, an A2A agent-to-agent gateway, and deterministic dual-LLM / "lethal trifecta" guardrails plus per-environment egress and cost limits, built for enterprise agent deployments ($13.5M funding).
  • Unla ⭐ 2.2k — Lightweight Go MCP gateway that turns existing REST/gRPC APIs and MCP servers into standardized MCP endpoints with zero code changes, behind one gateway with multi-tenant sessions, OAuth, hot-reload config and a management UI.
  • Jarvis Registry ⭐ 1.6k — Enterprise MCP/agent gateway fronting internal tools behind one authenticated MCP-over-SSE/HTTP endpoint with OAuth2/OIDC identity (Keycloak/Cognito/Entra), tool-level RBAC/ACL, agent orchestration, and OpenTelemetry/Prometheus observability.
  • MCP Gateway & Registry ⭐ 744 — Enterprise MCP gateway + registry centralizing access to many MCP servers behind one OAuth-protected endpoint, with virtual MCP servers, semantic tool discovery, A2A agent discovery and fine-grained governance/audit; AWS-aligned.
  • Nexus (Grafbase) ⭐ 431 — Rust AI router from Grafbase that aggregates MCP servers (STDIO/SSE/HTTP) and LLM providers behind one endpoint with context-aware fuzzy tool search, OAuth2/TLS security, rate limiting and OpenTelemetry.
  • Pomerium ⭐ 4.9k — Identity-aware access proxy with MCP support: policy-based auth in front of MCP servers.

🔧 More by capability (cross-cutting)

These cut across the need-based sections above — routing intelligence, observability, and Kubernetes infra that complement whichever gateway you picked.

🧠 Smart routing & model selection

Pain point: "Send each prompt to the cheapest model that can handle it."

  • Not Diamond — SOTA model-routing intelligence; powers OpenRouter's Auto router.
  • Martian — Pioneer commercial model router; Accenture partnership.
  • Inworld Router — One API for 200+ models with real-time complexity-based routing and 0% markup (pass-through pricing); adds first-party realtime inference for open models. Research preview.
  • RouteLLM ⭐ 5.1k — LMSYS's open router framework (research-grade; inactive since 2024 but still the canonical paper/code).
  • OpenRouter Auto — One model id (openrouter/auto) that routes per-prompt.
  • Unify — Early neural LLM router (company since pivoted to agents).
  • Bifrost adaptive load balancing / Cloudflare dynamic routing — routing built into gateways themselves.
  • Claude Code Router ⭐ 35.3k — Route Claude Code (and other agent CLIs) to any model/provider — DeepSeek, Qwen, local — by request type.
  • ClawRouter ⭐ 6.6k — Agent-native LLM router (TypeScript) with local sub-ms routing across 41+ models, built so autonomous agents can pay per call via x402/USDC with no signup or API key. The routing client is open-source — but its account-less hosted access (8 free models + crypto pay-per-use) is resold access: verify model fidelity with canary_check.py and prefer your own keys in production.
  • RouterArena ⭐ 97 — Open evaluation framework + live leaderboard for LLM routers (standardized datasets, cost/quality metrics) — pick a router on data, in the spirit of this list's benchmarks.
  • vLLM Semantic Router ⭐ 4.6k — Mixture-of-models router that picks a model per prompt by intent/complexity; a vLLM project.
  • NVIDIA LLM Router ⭐ 307 — NIM-based blueprint routing each prompt to the best model by task and complexity.
  • LLMRouter ⭐ 2k — Research framework for graph/learned cost–quality model routing.
  • Orq.ai — Hosted routing control plane: 500+ models across 30+ providers with retries, fallbacks, caching and governance (BYOK).
  • NadirClaw ⭐ 547 — Self-hosted, OpenAI-compatible router (Python) that sends simple prompts to cheap/local models and hard ones to premium, with a trained cascade verifier to cut API cost 40–70%.
  • ngrok AI Gateway — Managed proxy routing to OpenAI/Anthropic/Google + local Ollama/vLLM/LM Studio, with automatic failover, key rotation, and CEL traffic-policy controls (PII redaction).

📊 Observability & cost tracking

Pain point: "Who spent what, on which model, and why did quality drop?"

🔎 How to evaluate a gateway's observability (table-stakes vs differentiating vs advanced, grounded in the OpenTelemetry GenAI conventions): see BENCHMARKS → Part 6. For the research landscape — theory, seminal papers, company writing, standards & open problems: see the observability survey.

  • Helicone ⭐ 5.9k — Logs, costs, sessions, prompt experiments; one-line proxy integration.
  • TensorZero ⭐ 11.7k — ⚠️ Archived June 2026 (repo read-only; Apache-2.0 code + community forks remain). Gateway + observability + evals in one Rust binary, data stays in your ClickHouse.
  • Portkey — Full LLMOps suite over its OSS gateway: traces, budgets, prompt management.
  • vLLora (ex-LangDB) ⭐ 806 — Agent debugging and observability from the LangDB team.
  • Braintrust Proxy ⭐ 400 — Caching proxy wired into Braintrust evals.
  • MLflow AI Gateway ⭐ 26.7k — Unified endpoints + governance inside the MLflow platform.
  • Respan (ex–Keywords AI) — One endpoint to 250+ models with routing/fallback/caching, plus built-in observability and evals.

☸️ Kubernetes-native & inference infra

Pain point: "Routing to self-hosted models (vLLM/Ollama) inside the cluster, GPU-aware."

  • Gateway API Inference Extension ⭐ 698 — The Kubernetes standard for inference-aware routing.
  • AIBrix ⭐ 4.9k — Cost-efficient control plane for vLLM on K8s (ByteDance-origin).
  • llm-d ⭐ 3.5k — K8s-native distributed inference serving (Red Hat/Google/IBM-backed).
  • Higress ⭐ 8.7k / Kong ⭐ 43.7k / Envoy AI Gateway ⭐ 1.8k — all implement inference-extension-style routing.
  • Traefik Hub AI Gateway — LLM routing/security in Traefik's commercial runtime.
  • Inference Gateway ⭐ 127 — Small cloud-native gateway unifying cloud + local (Ollama) providers.
  • Olla ⭐ 249 — Lightweight Go proxy + load balancer for LLM infra: intelligent routing and automatic failover across inference backends (Ollama, vLLM, LM Studio, OpenAI-compatible).
  • KServe ⭐ 5.6k — The standard model-inference platform on K8s; LLM serving with an inference-gateway / OpenAI-compatible runtime.
  • GPUStack ⭐ 5.2k — Manage GPU clusters and serve LLMs behind one OpenAI-compatible endpoint.
  • vLLM Production Stack ⭐ 2.4k — Reference K8s stack to serve vLLM at scale with a KV-cache-aware routing layer.
  • NVIDIA Dynamo ⭐ 7.3k — NVIDIA's datacenter-scale distributed inference framework whose Endpoint Picker (EPP) plugin for the Gateway API Inference Extension does KV-cache-aware, LLM-aware request routing at the gateway layer over vLLM/SGLang/TensorRT-LLM backends.
  • llmaz ⭐ 306 — K8s-native inference platform fronting heterogeneous backends (vLLM, SGLang, TGI, llama.cpp, TensorRT-LLM) with Envoy AI Gateway-based model routing and token rate-limiting, Gateway-API inference-pool routing, and LLM-metric HPA plus Karpenter autoscaling. Maintained but slower cadence (still v0.1.x).

📰 What's new

Curated monthly. Last review: 2026-06-15.

  • 2026-06 · TensorZero shut down — the VC-backed open-source LLMOps gateway ($7.3M seed) archived its repo on June 12, as first-party clouds ship native gateway/observability features and squeeze independents. (byteiota)
  • 2026-03 · Helicone acquired by Mintlify (now maintenance mode); the same month LiteLLM hit a PyPI supply-chain attack — v1.82.7/1.82.8 were backdoored via a CI-token compromise and quarantined in ~3h, a sharp reminder to pin gateway versions. (Mintlify, Trend Micro)
  • 2026-05 · Palo Alto Networks completed its acquisition of Portkey (announced Apr 30, closed May 29), making the AI gateway the control plane for its Prisma AIRS security platform — a sign gateways are becoming core security infrastructure. (Palo Alto Networks)
  • 2026-05 · OpenRouter raised a $113M Series B led by CapitalG at a $1.3B valuation — ~8M users, ~100T tokens/month. (TechCrunch)
  • 2026-06 · NetFoundry launched zero-trust MCP and LLM gateways; Cisco Investments joined its Series A. (PR Newswire)
  • 2026 · Cloudflare AI Gateway shipped dollar-denominated spend limits (public beta) on top of dynamic routing and unified billing. (Cloudflare blog)
  • 2025-11 · Pydantic AI Gateway went open beta and has since merged into Logfire. (Pydantic Logfire)
  • Trend · MCP gateways emerged as a distinct category; spend-limit enforcement became table stakes; the EU AI Act (enforceable Aug 2026) is driving the compliance bucket; new-api overtook one-api as the most active China-ecosystem relay; and an independent-gateway shakeout is underway — Portkey (→Palo Alto) and Helicone (→Mintlify) acquired, TensorZero shut down.

🚀 Recent releases (auto-updated)

Glossary

Key terms used in the tables above (click to expand)
  • AI gateway / LLM gateway — a proxy between your app and LLM providers; one endpoint and key for many models.
  • LLM router — the part that decides which model serves each request (cheap vs flagship, by cost or quality).
  • Fallback — automatically retry on another model/provider when the first fails or times out.
  • Load balancing (LB) — spread traffic across keys/providers to dodge rate limits and outages.
  • Semantic caching — return a cached answer when a new prompt is semantically similar to a past one (not just identical).
  • Prompt / cached input — providers bill reused prompt prefixes at a steep discount (≈0.1×); the gateway must not mangle the prefix or the cache misses.
  • Guardrails — input/output checks: prompt-injection detection, PII redaction, content filtering, schema enforcement.
  • Virtual keys — per-user/team keys the gateway issues in front of your real provider keys, with their own budgets and limits.
  • ZDR (zero data retention) — provider/gateway contractually does not store your prompts or completions.
  • BYOK — bring your own key: the gateway uses your provider accounts rather than reselling tokens.
  • Markup — the gateway's fee on top of provider token cost (0% to ~6%).
  • MCP gateway — governs agent ↔ tool traffic (Model Context Protocol), the agentic counterpart to an LLM gateway.

How to choose safely

Start by matching the gateway's trust level to your data's sensitivity — this one call decides most of the rest:

Your data Route it to Don't
🔴 Secrets / regulated (PII, PHI, financial, source code, keys) First-party direct + ZDR (Azure / Bedrock / Vertex) or a gateway self-hosted in your VPC …send it through any third-party relay — full stop
🟡 Internal / business Compliant hosted (Cloudflare, Vercel, Portkey) or self-hosted (LiteLLM, Bifrost) …use an unvetted relay; get ZDR in writing
🟢 Low-stakes / public / throwaway (demos, scraped public text) Cheapest wins — a gray relay can even be economically rational here …skip the canary test: assume model-swap + data-harvest until you've proven otherwise

The mistake is using one trust tier for all your traffic. Sensitive prompts through a $0.50/M relay is how keys leak; throwaway prompts through a FedRAMP endpoint is how you overpay 100×. Match the tier to the data.

Then, whatever tier you're in:

  1. Check the markup. Marketplaces charge 0–6% — for high volume, self-hosting or 0%-markup gateways (Vercel, Helicone cloud) pay for themselves fast.
  2. Verify model fidelity (canary-diff test). Some relays silently downgrade or quantize models. Send fixed "canary" prompts — a known-hard reasoning question plus a tokenizer/fingerprint probe — through the gateway and direct to the provider, then diff the outputsscripts/canary_check.py automates exactly this (relay vs. official → a verdict you can attach to a watch-list report). 2026 research found model-identity failures in ~46% of audited relays (arXiv:2603.01919). Community monitors apiranking.com and rate.linux.do (browser-only) track relay authenticity/stability — usable as signal if you must vet one, but listing there is not endorsement, and this list includes none of them.
  3. Mind data flow. Every gateway sees your prompts. For sensitive data: self-host, or require ZDR (zero data retention) in writing.
  4. License check before embedding. new-api is AGPL-3.0; LiteLLM has an enterprise-licensed directory; "open core" ≠ everything free.
  5. Project health. Star count ≠ maintenance. Check last release date — several once-popular gateways (BricksLLM, Glide, RouteLLM) are effectively unmaintained; this list labels them.
  6. Avoid gray-market relays reselling reverse-engineered or stolen-quota access. Beyond account-ban risk, 2026 research caught relays serving poisoned models and exfiltrating planted secrets (Your Agent Is Mine) — and the most-visible relay "rankings" are often paid press releases or carry affiliate links. Account bans and data leaks are your risk, not theirs. Caught one swapping models, harvesting data, or vanishing with your balance? Report it — with evidence — and we'll build the community watch list together.

🧰 Companion tools — verify what you picked

This list tells you which gateway to start with; these two open-source tools — from this list's maintainer (disclosed) — help you prove it behaves before trusting it in production:

  • llm-gateway-bench (live dashboard) — black-box benchmark for any OpenAI-compatible gateway/relay: TTFT & throughput, success rate, price multiple, plus fidelity probes (model-echo, fake-streaming, usage inflation, context truncation). Test your own gateway with your own key and compare it to the best.
  • modelprobe — a tiny, dependency-free Go availability prober: point it at a base URL + key and it reports, per model, is it up and how fast. One static binary — drop it in CI or a cron on a $5 VM.

Community relay watch-list

Built on evidence, not hearsay. Newer or unusually cheap relays we've listed but not yet independently fidelity-checked sit here as "vet before use." Run the canary-diff test and report your verdict to move an entry to ✅ verified or ⛔ confirmed-problematic. The script diffs across one or more models in a single pass (--model a,b) and adds a tokenizer/fingerprint probe — system_fingerprint mismatch and prompt_tokens divergence on identical prompts — an independent tell beyond text similarity. A passing canary from a project's own team is logged as self-reported — reaching ✅ verified takes an independent reproduction by someone unaffiliated.

Relay Listed in Status Why it's here
FlintAPI (repo) Cost-first ⚠️ Unverified — vet before use Aggregates 25+ Chinese LLMs (DeepSeek/Qwen/Kimi/GLM/MiniMax) with $2 free credits; model fidelity unconfirmed.
FlowBar Cost-first ⚠️ Unverified — vet before use Resells frontier models (GPT/Claude/Gemini) below OpenRouter with crypto/CNY payment; model fidelity unconfirmed.
lxg2it ModelRouter (repo) Cost-first ⚠️ Unverified — self-reported canary OK (2026-06-22); needs independent repro Solo-built router reselling Anthropic/OpenAI/Google frontier models at an advertised 0% markup (deposit fee may apply). A canary-diff posted by the project's own side passed (mean sim 1.0 on Opus 4.8); not yet independently reproduced. Public repo is now a deprecated stub — routing is closed/hosted.
Loop Gateway Cost-first ⚠️ Unverified — vet before use Anonymous 1★ repo reselling 311 frontier models through its own OpenRouter account at a 15% markup, account-less + crypto-only; model fidelity unconfirmed.
nullsink (repo) Cost-first ⚠️ Unverified — vet before use Account-less, no-logs, Monero/Bitcoin-only relay proxying OpenAI/Anthropic through the operator's own account at ~10% markup; repo 3★, model fidelity unconfirmed.

Nothing is ⛔ confirmed-problematic yet — that status needs a reproducible canary verdict or a documented incident, never hearsay.

FAQ

What is an AI gateway (LLM gateway)? A proxy between your code and LLM providers: one OpenAI-compatible endpoint and key for many models, adding routing, failover, caching, rate limits, cost tracking and guardrails. See the intro.

AI gateway vs LLM router — what's the difference? A router decides which model gets each request (e.g. cheap vs flagship); a gateway is the full proxy layer (auth, caching, observability, guardrails) that usually includes routing. See smart routing.

What's the best open-source AI gateway? LiteLLM is the default for breadth (Python, 100+ providers). For raw performance pick Bifrost (Go); for enterprise K8s pick Kong or Higress. Full list under self-hosted.

LiteLLM vs OpenRouter — which should I use? OpenRouter is hosted (zero ops, ~5.5% fee, 400+ models); LiteLLM is self-hosted (your keys, your infra, $0 markup). Hosted to start, self-host when volume justifies it. Cost math in the evaluation set.

What's the cheapest way to call many LLMs? For zero ops: Vercel AI Gateway or Cloudflare AI Gateway (0% markup). For lowest token cost, route bulk work to cheap models — a 100K-token report runs $0.03 on DeepSeek vs $3.01 on GPT-5.5. See cost-first.

Are AI gateways safe? Who sees my prompts? Every gateway sees your prompts. For sensitive data self-host or require zero-data-retention in writing; check the gateway scorecard for compliance/security ratings and known CVEs.

📚 Essential reading

A short, vetted shelf — every link below was HTTP-checked live (2026-06-15). These are the concepts the comparison tables assume; read them before you commit to a gateway.

What an AI gateway actually is

Routing & fallback

Semantic caching

Prompt caching (it's a prefix match)

  • Prompt caching — Anthropic — the authoritative spec: cache key from exact bytes up to a breakpoint, write/read pricing, and TTLs.
  • Prompt caching — OpenAI — cache hits require an exact prefix; put static instructions first and variable content last to maximize reuse.

Reasoning-token cost

  • Building with extended thinking — Anthropic — reasoning/thinking tokens are billed and consume the output budget — the economics to grasp before enabling reasoning models behind a gateway.

Security & guardrails

MCP & agent gateways

  • Model Context Protocol — specification — the open standard any MCP gateway must speak and govern.
  • Building effective agents — Anthropic, 2024 — when to use workflows vs. agents and the composable patterns (routing, orchestrator-workers) the traffic flowing through an agent gateway is made of.
  • LLM Powered Autonomous Agents — Lilian Weng, 2023 — the canonical map of agent architecture (planning, memory, tool use) — what an MCP/agent gateway sits in front of and governs.

Observability

  • AI Gateway observability — Cloudflare — per-request logs, token usage, cost estimation and OpenTelemetry export across all providers.
  • How to monitor your LLM API costs — Helicone — practical cost-per-query tracking and spotting caching / model-downgrade opportunities.
  • Your AI Product Needs Evals — Hamel Husain, 2024 — why systematic evals (not vibes) are how you actually catch quality regressions in the request/response data your gateway logs.

Self-hosting economics

  • Automatic prefix caching — vLLM — KV-block prefix caching (and per-request cache isolation), the mechanism behind the savings when you self-host behind your own gateway.

Guides & comparisons

In-depth, data-backed comparisons for the questions people actually search:

More comparisons coming. Suggest one via an issue.

Why this exists

On June 10 I ran Claude Code hard for ~13 hours, and the bill came to ≈ $788. One look at the per-model breakdown told the whole story: the flagship (Fable 5) alone was $617 — 78% of the bill — while the cheap model (Haiku) did 242 real tasks for $1.70. I hadn't done anything clever to rack that up; I'd done the opposite — defaulted every request to the most capable (and most expensive) model because I couldn't be bothered to set up routing.

Claude Code usage for one day: 11 sessions, 3,572 API calls across 4 models, ≈ $788 — Fable 5 alone $617 (78% of the bill), while Haiku did 242 tasks for $1.70.

The fix wasn't "stop using good models." It was route by task — default to a cheap model, escalate to a flagship only when the work is genuinely hard. That's exactly what an AI gateway is for. While I was at it, I couldn't find a single gateway list organized by what you actually need, that scored the options honestly (CVEs and all), and shipped reproducible cost numbers instead of vibes. So I built one — that's this repo.

No vendor money, no affiliate links, CC0. If it saves you one surprise bill, it did its job. ⭐ Star it so the next person mid-$788-day finds it.

Contributing

Contributions welcome! Please read CONTRIBUTING.md first. Inclusion criteria, in short: the project must be an actual gateway/proxy/router for LLM or agent traffic (not an SDK wrapper or chat UI), publicly available, and active within the last 12 months — or clearly labeled as stale.

🔗 Related lists

This list lives in the awesome-list ecosystem. If it doesn't have what you need, these well-maintained neighbors might — and the gateways here sit between their tools and the models:

Maintain a related list and think this belongs in yours? Open an issue — cross-linking helps every list's readers.

Star history

Star History Chart

License

CC0

To the extent possible under law, the contributors have waived all copyright and related rights to this work.

About

⚡ Awesome AI Gateway — curated comparison of 100+ AI gateways & LLM proxies (LiteLLM, OpenRouter, Portkey, Kong, Higress, new-api, Bifrost) by cost, security, compliance & self-hosting. Decision tree + reproducible benchmarks. Open source, bilingual, updated daily.

Resources

License

Contributing

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages

  • HTML 56.2%
  • Python 43.8%