Fireworks Rate-Limiting Proxy

A local HTTP proxy that sits between Kimi CLI and the Fireworks AI API. It queues requests from concurrent Kimi terminals behind requests-per-second and tokens-per-minute budgets, so they are less likely to hit Fireworks' adaptive "429 rate limit exceeded" errors.

Problem

Fireworks AI applies adaptive serverless rate limits. For Kimi K2.6 turbo, live response headers showed limits for total prompt tokens, uncached prompt tokens, and generated tokens per minute. When 4-8 Kimi terminals all send large project context at once, token bursts can hit those adaptive limits even when raw request count looks fine.

Solution

This proxy turns many hard rejections into queued waits. Requests are held locally until the request bucket and rolling token budgets have room, then forwarded to Fireworks.

Quick Start

# 1. Start the proxy (leave this terminal open)
cd "C:\Users\phillip\Coding projects\fireworks-proxy"
python fireworks_proxy.py

# 2. Point Kimi CLI at the proxy
# Edit: C:\Users\phillip\.kimi\config.toml
[providers.fireworks]
type = "anthropic"
base_url = "http://localhost:8787"
api_key = "your_key"

Then open 4–8 Kimi terminals normally. They'll route through the proxy automatically.

Configuration

Proxy settings (env vars)

Variable	Default	Description
`FIREWORKS_PROXY_RPS`	`20`	Requests per second (20 RPS = 1,200 RPM)
`FIREWORKS_PROXY_PORT`	`8787`	Local listen port
`FIREWORKS_PROXY_MAX_WAIT`	`120`	Max seconds to queue a request
`FIREWORKS_PROXY_PROMPT_TPM_LIMIT`	`4500000`	Total prompt tokens per minute before safety margin
`FIREWORKS_PROXY_UNCACHED_PROMPT_TPM_LIMIT`	`900000`	Uncached prompt tokens per minute before safety margin
`FIREWORKS_PROXY_GENERATED_TPM_LIMIT`	`36000`	Generated/output tokens per minute before safety margin
`FIREWORKS_PROXY_TPM_SAFETY_RATIO`	`0.80`	Fraction of Fireworks TPM limits used locally
`FIREWORKS_PROXY_OUTPUT_TOKEN_RESERVE`	`2048`	Output-token reserve when a request omits `max_tokens`
`FIREWORKS_PROXY_CHARS_PER_TOKEN`	`3.5`	Prompt-token estimate used before forwarding
`FIREWORKS_PROXY_DAILY_TOKEN_LIMIT`	`200000000`	Daily token cap before the proxy returns 429
`FIREWORKS_PROXY_DAILY_TOKEN_WARN_RATIO`	`0.80`	Log a warning after this share of the daily cap
`FIREWORKS_PROXY_TIMEZONE`	`Australia/Melbourne`	Calendar day used for daily usage rollover
`FIREWORKS_PROXY_USAGE_STATE`	`daily_usage.json`	Local file used to persist today's counted tokens

# Default rate, set explicitly: slower but safer for 8+ terminals
$env:FIREWORKS_PROXY_RPS = 20
python fireworks_proxy.py

# Faster, less headroom (4 terminals)
$env:FIREWORKS_PROXY_RPS = 35
python fireworks_proxy.py
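
The TPM limits, the safety ratio, and the token-estimate settings in the table combine in a simple way. The sketch below is not taken from fireworks_proxy.py. It assumes that each local budget is the configured limit multiplied by the safety ratio, and that prompt tokens are estimated as character count divided by FIREWORKS_PROXY_CHARS_PER_TOKEN, rounded up. The helper names (setting, effective_budgets, estimate_tokens) are hypothetical.

```python
import math
import os

# Defaults copied from the settings table above.
DEFAULTS = {
    "FIREWORKS_PROXY_PROMPT_TPM_LIMIT": 4_500_000,
    "FIREWORKS_PROXY_UNCACHED_PROMPT_TPM_LIMIT": 900_000,
    "FIREWORKS_PROXY_GENERATED_TPM_LIMIT": 36_000,
    "FIREWORKS_PROXY_TPM_SAFETY_RATIO": 0.80,
    "FIREWORKS_PROXY_OUTPUT_TOKEN_RESERVE": 2048,
    "FIREWORKS_PROXY_CHARS_PER_TOKEN": 3.5,
}


def setting(name, env=None):
    """Read one setting, falling back to the table default (hypothetical helper)."""
    env = os.environ if env is None else env
    default = DEFAULTS[name]
    # Convert the environment string to the same type as the default.
    return type(default)(env.get(name, default))


def effective_budgets(env=None):
    """Assumed rule: local budget = Fireworks limit x safety ratio."""
    ratio = setting("FIREWORKS_PROXY_TPM_SAFETY_RATIO", env)
    return {
        "prompt": round(setting("FIREWORKS_PROXY_PROMPT_TPM_LIMIT", env) * ratio),
        "uncached_prompt": round(
            setting("FIREWORKS_PROXY_UNCACHED_PROMPT_TPM_LIMIT", env) * ratio
        ),
        "generated": round(setting("FIREWORKS_PROXY_GENERATED_TPM_LIMIT", env) * ratio),
    }


def estimate_tokens(prompt_text, max_tokens=None, env=None):
    """Estimate (prompt_tokens, output_tokens) before a request is forwarded."""
    chars_per_token = setting("FIREWORKS_PROXY_CHARS_PER_TOKEN", env)
    # Rounding up is an assumption; it keeps the estimate on the cautious side.
    prompt_tokens = math.ceil(len(prompt_text) / chars_per_token)
    if max_tokens is None:
        # The request omitted max_tokens, so fall back to the output reserve.
        max_tokens = setting("FIREWORKS_PROXY_OUTPUT_TOKEN_RESERVE", env)
    return prompt_tokens, max_tokens
```

Under these assumptions, the defaults give local budgets of 3,600,000 prompt tokens, 720,000 uncached prompt tokens, and 28,800 generated tokens per minute.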

Kimi CLI settings

Also cap max_steps_per_turn in ~/.kimi/config.toml so no single terminal monopolises the bucket:

[loop_control]
max_steps_per_turn = 100

How It Works

Request bucket: 20 RPS sustained, with a burst capacity of 20 requests (1x the per-second rate)
Token budgets: rolling 60-second prompt, uncached prompt, and generated-token budgets
Adaptive limit learning: Fireworks X-Ratelimit-Limit-Tokens-* headers update live local budgets
Queueing: When request or token budgets are low, requests wait instead of being sent immediately
Streaming: SSE responses from Fireworks are forwarded chunk-by-chunk
Connection safety: A fresh TCP connection is opened per request, which avoids corruption from reused connections
Metrics: Queue depth, wait times, token budgets, and daily usage are logged every 10 seconds
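
The request bucket and the rolling token budgets listed above can be sketched as two small classes. This is a minimal illustration of the general techniques (a token bucket and a sliding-window counter), not the code in fireworks_proxy.py. The class and method names are hypothetical, and the clock is injectable only to make the behaviour easy to check.

```python
import time
from collections import deque


class TokenBucket:
    """Request bucket: refills at `rate` slots per second, up to `capacity`."""

    def __init__(self, rate=20.0, capacity=20.0, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity  # start full, so a burst of `capacity` is allowed
        self.clock = clock
        self.updated = clock()

    def _refill(self):
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now

    def try_acquire(self):
        """Take one slot and return 0.0, or return the seconds to wait."""
        self._refill()
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return 0.0
        return (1.0 - self.tokens) / self.rate


class RollingBudget:
    """Token budget over a sliding window (60 seconds by default)."""

    def __init__(self, limit, window=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.events = deque()  # (timestamp, tokens) pairs, oldest first
        self.used = 0

    def _expire(self):
        # Drop spends that have aged out of the window.
        cutoff = self.clock() - self.window
        while self.events and self.events[0][0] <= cutoff:
            _, tokens = self.events.popleft()
            self.used -= tokens

    def try_spend(self, tokens):
        """Reserve `tokens` if the window has room; otherwise return False."""
        self._expire()
        if self.used + tokens > self.limit:
            return False
        self.events.append((self.clock(), tokens))
        self.used += tokens
        return True
```

A queueing loop built on these would sleep for the wait that try_acquire returns, retry try_spend until the window has room, and give up once FIREWORKS_PROXY_MAX_WAIT seconds have passed.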

Why Burst Matters

Your 8 terminals may only sustain a low request rate over time, but they can all send large prompts at once. Fireworks' adaptive limits can drop after spikes, so the proxy leaves headroom instead of trying to ride the dashboard's dotted rate-limit line.
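
One way to picture the headroom: whenever Fireworks advertises a token limit in a response header, the local budget follows it at the safety ratio, so a drop after a spike tightens the local budget on the next response. The sketch below assumes only the X-Ratelimit-Limit-Tokens- prefix named in this README. It does not assume any particular suffix, the "Example" suffix in the usage lines is a placeholder, and learn_limits is a hypothetical helper name.

```python
PREFIX = "x-ratelimit-limit-tokens-"


def learn_limits(headers, budgets, safety_ratio=0.80):
    """Update local budgets from advertised token-limit headers (sketch).

    `headers` is any mapping of header name to string value. The part of the
    header name after the prefix is used as the budget key, lowercased.
    """
    for name, value in headers.items():
        key = name.lower()
        if not key.startswith(PREFIX):
            continue
        try:
            advertised = int(value)
        except ValueError:
            continue  # ignore values that are not plain integers
        budgets[key[len(PREFIX):]] = round(advertised * safety_ratio)
    return budgets


# Placeholder header suffix: the advertised limit drops after a spike.
budgets = {}
learn_limits({"X-Ratelimit-Limit-Tokens-Example": "1000000"}, budgets)
print(budgets)
learn_limits({"X-Ratelimit-Limit-Tokens-Example": "500000"}, budgets)
print(budgets)
```

With the default ratio of 0.80, the local budget in this example moves from 800,000 to 400,000 as the advertised limit halves, staying 20% below whatever Fireworks last reported.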

Example Output

2026-05-21 00:31:59,656 INFO Fireworks proxy starting on http://127.0.0.1:8787 -> https://api.fireworks.ai/inference (RPS limit=20.0, prompt TPM=4500000, uncached prompt TPM=900000, generated TPM=36000, TPM safety=0.80, daily limit=200000000)
2026-05-21 00:32:09,658 INFO stats | bucket={'rate': 20.0, 'capacity': 20.0, 'tokens': 20.0} | last_60s=0 req | token_budgets={...} | daily_tokens=722859/200000000 remaining=199277141

Files

File	Purpose
`fireworks_proxy.py`	The proxy server (aiohttp)
`kill_proxy.ps1`	PowerShell helper to kill a stuck proxy process

Troubleshooting

Port already in use

# Run the kill script
.\kill_proxy.ps1

# Or manually
Get-NetTCPConnection -LocalPort 8787 | Select-Object OwningProcess
Stop-Process -Id <PID> -Force

Proxy not working

Check health endpoint:

curl http://localhost:8787/health

Check daily token usage:

curl http://localhost:8787/usage

Daily usage limit

The proxy defaults to a cap of 200M tokens per day. This comes from the recent analytics total: 1.19B tokens over 3 days = 396.7M/day, reduced by 50% to 198.3M/day, then rounded up to 200M.

When the proxy sees Fireworks token usage in a response, it records it in daily_usage.json. At 80% of the cap it logs a warning. Once the cap is hit, new requests receive a clean 429 daily_token_limit response until the next Melbourne calendar day.
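
The warn, block, and rollover behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the proxy's actual code: the JSON layout ({"day": ..., "tokens": ...}), the DailyUsage class, and the status strings are all hypothetical. The timezone is passed in as a tzinfo object, for example zoneinfo.ZoneInfo("Australia/Melbourne").

```python
import json
from datetime import datetime, timedelta, timezone
from pathlib import Path


class DailyUsage:
    """Persist today's token count and enforce a daily cap (sketch)."""

    def __init__(self, path, tz, limit=200_000_000, warn_ratio=0.80):
        self.path = Path(path)
        self.tz = tz  # calendar day is evaluated in this timezone
        self.limit = limit
        self.warn_ratio = warn_ratio

    def _load(self, now):
        """Return (today, tokens); a missing or stale file counts as zero."""
        today = now.astimezone(self.tz).date().isoformat()
        try:
            state = json.loads(self.path.read_text())
        except (FileNotFoundError, json.JSONDecodeError):
            state = {}
        # A file written on an earlier calendar day means the counter rolled over.
        tokens = state.get("tokens", 0) if state.get("day") == today else 0
        return today, tokens

    def _status(self, used):
        if used >= self.limit:
            return "blocked"  # the proxy would answer 429 daily_token_limit
        if used >= self.limit * self.warn_ratio:
            return "warn"  # the proxy would log a warning
        return "ok"

    def record(self, tokens, now):
        """Add usage reported in a response and return the resulting status."""
        today, used = self._load(now)
        used += tokens
        self.path.write_text(json.dumps({"day": today, "tokens": used}))
        return self._status(used)

    def status(self, now):
        """Status for a new request arriving at `now` (a timezone-aware datetime)."""
        return self._status(self._load(now)[1])
```

Passing `now` explicitly keeps the rollover logic easy to exercise; a real caller would pass datetime.now(timezone.utc) and let astimezone convert it to the configured calendar day.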

Still getting 429s

First, lower the token safety ratio, because Fireworks' adaptive limit (the dotted line on the dashboard) may have moved down:

$env:FIREWORKS_PROXY_TPM_SAFETY_RATIO = 0.60
python fireworks_proxy.py

If the log shows request bursts rather than token throttling, lower the RPS:

$env:FIREWORKS_PROXY_RPS = 15
python fireworks_proxy.py

Requirements

Python 3.10+
aiohttp and PyJWT (pip install aiohttp PyJWT)
Fireworks API key (set in Kimi config)
