A local HTTP proxy that sits between Kimi CLI and the Fireworks AI API. It queues requests from concurrent Kimi terminals behind request-per-second and token-per-minute budgets so they are less likely to hit Fireworks' adaptive rate limits and receive 429 "rate limit exceeded" errors.
Fireworks AI applies adaptive serverless rate limits. For Kimi K2.6 turbo, live response headers showed limits for total prompt tokens, uncached prompt tokens, and generated tokens per minute. When 4-8 Kimi terminals all send large project context at once, token bursts can hit those adaptive limits even when raw request count looks fine.
This proxy turns many hard rejections into queued waits. Requests are held locally until the request bucket and rolling token budgets have room, then forwarded to Fireworks.
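The request-bucket half of that behaviour can be illustrated with a token bucket. This is a minimal sketch, not the code in fireworks_proxy.py: the class name `TokenBucket`, its methods, and the injectable `clock` parameter are all hypothetical, chosen only to show how "hold until there is room" can be computed.

```python
import time


class TokenBucket:
    """Hypothetical request bucket: refills at `rate` tokens/second up to `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity          # start full, so an initial burst is allowed
        self._clock = clock
        self._last = clock()

    def _refill(self) -> None:
        # Credit tokens for the time elapsed since the last check, capped at capacity.
        now = self._clock()
        self.tokens = min(self.capacity, self.tokens + (now - self._last) * self.rate)
        self._last = now

    def wait_time(self, cost: float = 1.0) -> float:
        """Seconds until `cost` tokens are available (0.0 if available now)."""
        self._refill()
        if self.tokens >= cost:
            return 0.0
        return (cost - self.tokens) / self.rate

    def try_acquire(self, cost: float = 1.0) -> bool:
        """Take `cost` tokens if available; otherwise leave the bucket untouched."""
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A queueing loop built on this would sleep for `wait_time()` and retry `try_acquire()` until it succeeds or the maximum wait is exceeded, and only then forward the request.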
# 1. Start the proxy (leave this terminal open)
cd "C:\Users\phillip\Coding projects\fireworks-proxy"
python fireworks_proxy.py
# 2. Point Kimi CLI at the proxy
# Edit: C:\Users\phillip\.kimi\config.toml
[providers.fireworks]
type = "anthropic"
base_url = "http://localhost:8787"
api_key = "your_key"
Then open 4–8 Kimi terminals normally. They'll route through the proxy automatically.
| Variable | Default | Description |
|---|---|---|
| FIREWORKS_PROXY_RPS | 20 | Requests per second (20 RPS = 1,200 RPM) |
| FIREWORKS_PROXY_PORT | 8787 | Local listen port |
| FIREWORKS_PROXY_MAX_WAIT | 120 | Max seconds to queue a request |
| FIREWORKS_PROXY_PROMPT_TPM_LIMIT | 4500000 | Total prompt tokens per minute before safety margin |
| FIREWORKS_PROXY_UNCACHED_PROMPT_TPM_LIMIT | 900000 | Uncached prompt tokens per minute before safety margin |
| FIREWORKS_PROXY_GENERATED_TPM_LIMIT | 36000 | Generated/output tokens per minute before safety margin |
| FIREWORKS_PROXY_TPM_SAFETY_RATIO | 0.80 | Fraction of Fireworks TPM limits used locally |
| FIREWORKS_PROXY_OUTPUT_TOKEN_RESERVE | 2048 | Output-token reserve when a request omits max_tokens |
| FIREWORKS_PROXY_CHARS_PER_TOKEN | 3.5 | Prompt-token estimate used before forwarding |
| FIREWORKS_PROXY_DAILY_TOKEN_LIMIT | 200000000 | Daily token cap before the proxy returns 429 |
| FIREWORKS_PROXY_DAILY_TOKEN_WARN_RATIO | 0.80 | Log a warning after this share of the daily cap |
| FIREWORKS_PROXY_TIMEZONE | Australia/Melbourne | Calendar day used for daily usage rollover |
| FIREWORKS_PROXY_USAGE_STATE | daily_usage.json | Local file used to persist today's counted tokens |
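To make the interaction between the TPM limits, the safety ratio, and the characters-per-token estimate concrete, here is a small sketch of the arithmetic. It assumes the proxy simply multiplies each limit by the ratio and divides prompt length by the characters-per-token value; the function names, the rounding, and the use of `math.ceil` are illustrative assumptions rather than the proxy's actual code.

```python
import math
import os

# Defaults copied from the table above.
TPM_DEFAULTS = {
    "prompt": ("FIREWORKS_PROXY_PROMPT_TPM_LIMIT", 4_500_000),
    "uncached_prompt": ("FIREWORKS_PROXY_UNCACHED_PROMPT_TPM_LIMIT", 900_000),
    "generated": ("FIREWORKS_PROXY_GENERATED_TPM_LIMIT", 36_000),
}


def effective_budgets(env=os.environ) -> dict:
    """Local per-minute budgets: each configured limit scaled by the safety ratio."""
    ratio = float(env.get("FIREWORKS_PROXY_TPM_SAFETY_RATIO", "0.80"))
    return {
        name: round(float(env.get(var, default)) * ratio)
        for name, (var, default) in TPM_DEFAULTS.items()
    }


def estimate_prompt_tokens(text: str, chars_per_token: float = 3.5) -> int:
    """Rough pre-forwarding estimate: characters divided by chars-per-token, rounded up."""
    return math.ceil(len(text) / chars_per_token)
```

With the defaults, this gives local budgets of 3,600,000 prompt, 720,000 uncached prompt, and 28,800 generated tokens per minute, and a 7,000-character prompt is estimated at 2,000 tokens.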
# Slower but safer for 8+ terminals
$env:FIREWORKS_PROXY_RPS = 20
python fireworks_proxy.py
# Faster, less headroom (4 terminals)
$env:FIREWORKS_PROXY_RPS = 35
python fireworks_proxy.py
Also cap max_steps_per_turn in ~/.kimi/config.toml so no single terminal monopolises the bucket:
[loop_control]
max_steps_per_turn = 100
- Request bucket: 20 RPS sustained, 20 burst capacity (1x)
- Token budgets: rolling 60-second prompt, uncached prompt, and generated-token budgets
- Adaptive limit learning: Fireworks X-Ratelimit-Limit-Tokens-* headers update live local budgets
- Queueing: When request or token budgets are low, requests wait instead of being sent immediately
- Streaming: SSE responses from Fireworks are forwarded chunk-by-chunk
- Connection safety: Fresh TCP connection per request (no reuse corruption)
- Metrics: Queue depth, wait times, token budgets, and daily usage are logged every 10 seconds
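The rolling 60-second token budgets listed above could be tracked with a sliding window of timestamped reservations. The sketch below is an assumption about one reasonable design, not the proxy's implementation: `RollingBudget`, `try_reserve`, and `set_limit` are hypothetical names, and how a learned limit is extracted from the Fireworks response headers is deliberately left out.

```python
import time
from collections import deque


class RollingBudget:
    """Hypothetical sliding-window budget: at most `limit` tokens per `window` seconds."""

    def __init__(self, limit: int, window: float = 60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self._clock = clock
        self._events = deque()   # (timestamp, tokens), oldest first
        self._used = 0

    def _expire(self) -> None:
        # Drop reservations that have aged out of the window.
        cutoff = self._clock() - self.window
        while self._events and self._events[0][0] <= cutoff:
            _, tokens = self._events.popleft()
            self._used -= tokens

    def remaining(self) -> int:
        self._expire()
        return max(0, self.limit - self._used)

    def try_reserve(self, tokens: int) -> bool:
        """Reserve `tokens` if they fit in the current window; otherwise do nothing."""
        self._expire()
        if self._used + tokens > self.limit:
            return False
        self._events.append((self._clock(), tokens))
        self._used += tokens
        return True

    def set_limit(self, new_limit: int) -> None:
        """Adopt a limit learned at runtime, for example from a response header."""
        self.limit = new_limit
```

One instance per budget (prompt, uncached prompt, generated) keeps the three limits independent, so a request is forwarded only when all three reservations succeed.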
Your 8 terminals may only sustain a low request rate over time, but they can all send large prompts at once. Fireworks' adaptive limits can drop after spikes, so the proxy leaves headroom instead of trying to ride the dashboard's dotted rate-limit line.
2026-05-21 00:31:59,656 INFO Fireworks proxy starting on http://127.0.0.1:8787 -> https://api.fireworks.ai/inference (RPS limit=20.0, prompt TPM=4500000, uncached prompt TPM=900000, generated TPM=36000, TPM safety=0.80, daily limit=200000000)
2026-05-21 00:32:09,658 INFO stats | bucket={'rate': 20.0, 'capacity': 20.0, 'tokens': 20.0} | last_60s=0 req | token_budgets={...} | daily_tokens=722859/200000000 remaining=199277141
| File | Purpose |
|---|---|
| fireworks_proxy.py | The proxy server (aiohttp) |
| kill_proxy.ps1 | PowerShell helper to kill a stuck proxy process |
# Run the kill script
.\kill_proxy.ps1
# Or manually
Get-NetTCPConnection -LocalPort 8787 | Select-Object OwningProcess
Stop-Process -Id <PID> -Force
Check health endpoint:
curl http://localhost:8787/health
Check daily token usage:
curl http://localhost:8787/usage
The proxy defaults to a 200M tokens/day cap. This comes from the recent analytics total: 1.19B tokens over 3 days = 396.7M/day, then reduced by 50% to 198.3M/day, rounded to 200M.
When the proxy sees Fireworks token usage in a response, it records it in
daily_usage.json. At 80% of the cap it logs a warning. Once the cap is hit,
new requests receive a clean 429 daily_token_limit response until the next
Melbourne calendar day.
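A daily counter with calendar-day rollover along these lines might look like the sketch below. The JSON layout (`day` and `tokens` keys), the `DailyUsage` class, and the `ok`/`warn`/`blocked` status strings are assumptions made for illustration; the real daily_usage.json format is not documented here. Note that on Windows the standard-library `zoneinfo` module usually needs the `tzdata` package installed to resolve `Australia/Melbourne`.

```python
import json
from datetime import datetime
from pathlib import Path
from zoneinfo import ZoneInfo


class DailyUsage:
    """Hypothetical daily token counter persisted to a small JSON file."""

    def __init__(self, path, limit=200_000_000, warn_ratio=0.80,
                 tz="Australia/Melbourne"):
        self.path = Path(path)
        self.limit = limit
        self.warn_ratio = warn_ratio
        self.tz = ZoneInfo(tz)
        self.day, self.tokens = self._load()

    def _today(self, now=None) -> str:
        # Calendar date in the configured timezone, e.g. "2026-05-21".
        now = now or datetime.now(self.tz)
        return now.astimezone(self.tz).date().isoformat()

    def _load(self):
        try:
            data = json.loads(self.path.read_text())
            return data["day"], int(data["tokens"])
        except (OSError, ValueError, KeyError, TypeError):
            return self._today(), 0   # missing or unreadable state: start fresh

    def record(self, tokens: int, now=None) -> str:
        """Add tokens to today's total, resetting first if the local day changed."""
        today = self._today(now)
        if today != self.day:
            self.day, self.tokens = today, 0
        self.tokens += tokens
        self.path.write_text(json.dumps({"day": self.day, "tokens": self.tokens}))
        return self.status()

    def status(self) -> str:
        if self.tokens >= self.limit:
            return "blocked"   # the proxy would answer new requests with 429
        if self.tokens >= self.limit * self.warn_ratio:
            return "warn"      # the proxy would log a warning
        return "ok"
```

Keying the reset on the local calendar date rather than on a 24-hour timer means the count rolls over at Melbourne midnight, including across daylight-saving changes.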
If 429s still get through, first lower the token safety ratio, because Fireworks' adaptive limit (the dotted line on the dashboard) may have moved down:
$env:FIREWORKS_PROXY_TPM_SAFETY_RATIO = 0.60
python fireworks_proxy.py
If the log shows request bursts rather than token throttling, lower the RPS:
$env:FIREWORKS_PROXY_RPS = 15
python fireworks_proxy.py
- Python 3.10+
- aiohttp and PyJWT (pip install aiohttp PyJWT)
- Fireworks API key (set in Kimi config)