fix(prod): bump gunicorn timeout to 120s; drop dead Bing API #20
Merged
Two prod issues from Render logs at 2026-05-09T17:00:
1. Gunicorn killed /answer mid-generation:
[CRITICAL] WORKER TIMEOUT (pid:42)
Worker (pid:42) was sent SIGKILL! Perhaps out of memory?
The default 30s worker timeout is too short for the new chat path:
parallel search providers (~3s) → LLM rerank (~1–2s) → main answer
on a reasoning-tuned model (gpt-oss-120b 5–15s; qwq-32b 30–60s for
chain-of-thought). Bumped to 120s — long-tail still gets a clean 504
instead of a SIGKILL.
2. Bing Search API returns 410 Gone on every call:
Bing search error: 410 Client Error: Gone for url:
https://api.bing.microsoft.com/v7.0/search?...
Microsoft permanently retired the Bing v7 Search API on 2025-08-11.
Removed from perform_search's executor pool. Until now every /search
still waited ~3s on the failing call; with it gone, /search latency
drops noticeably.
`search_bing` is left in the file as dead code — same shape if we
want to slot a replacement (Tavily, Brave) in later. See PR #1.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
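A minimal sketch of the two-provider fan-out described above, assuming `perform_search` submits one future per provider to a `ThreadPoolExecutor`. The provider bodies, the return shape, and the names `search_google` and `search_youtube` are hypothetical stand-ins, not the project's actual code.

```python
from concurrent.futures import ThreadPoolExecutor


def search_google(query):
    # Hypothetical stand-in for the real Google provider call.
    return [f"google:{query}"]


def search_youtube(query):
    # Hypothetical stand-in for the real YouTube provider call.
    return [f"youtube:{query}"]


def perform_search(query):
    # Bing is no longer submitted, so the pool only needs two workers.
    providers = {"google": search_google, "youtube": search_youtube}
    results = {}
    with ThreadPoolExecutor(max_workers=len(providers)) as pool:
        # Submit every provider first so the calls run in parallel.
        futures = {name: pool.submit(fn, query) for name, fn in providers.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception:
                # A failing provider yields an empty list instead of
                # failing the whole search.
                results[name] = []
    return results
```

Keeping the providers in a dict means a replacement (Tavily, Brave) could be added back as one more entry, with the pool size following automatically.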
paritoshtripathi935 added a commit that referenced this pull request on May 10, 2026:
…#24) Render's start command does not pass -c, so gunicorn auto-discovery is the only way the config file is loaded. Auto-discovery only matches gunicorn.conf.py (with a dot), not gunicorn_config.py — so the timeout=120 setting added in #20 was a no-op and /answer continued to SIGKILL at the default 30s. Rename the file to make discovery actually find it.
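For reference, the settings involved could look like the sketch below. It assumes the file sits in the directory gunicorn is started from and is named `gunicorn.conf.py`, which is the name gunicorn looks for when `-c` is not passed. The two values come from #20; the comments are illustrative.

```python
# gunicorn.conf.py
# Picked up by gunicorn's default config lookup only under this exact
# file name, and only from the working directory.

# Seconds a worker may go without reporting in before it is killed and
# restarted (gunicorn's default is 30).
timeout = 120

# Seconds a worker gets to finish in-flight requests after receiving a
# restart or shutdown signal.
graceful_timeout = 30
```

Passing `-c backend/gunicorn_config.py` in the start command would be the other way to load the original file, at the cost of touching the Render start command.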
Two prod-breaking findings from Render logs

1. `/answer` worker timeouts (chats failing) 🔴

The default gunicorn worker timeout is 30s. The new chat path:
- parallel search providers (~3s)
- LLM rerank (~1–2s)
- `/answer` on a reasoning model: `gpt-oss-120b` 5–15s, `qwq-32b` 30–60s for chain-of-thought

Total can comfortably exceed 30s. The user's "QwQ 32B" attempt confirmed this: QwQ's reasoning chain alone breaks the budget.

Fix: `timeout = 120` + `graceful_timeout = 30` in `backend/gunicorn_config.py`. Long-tail requests still get a clean 504 rather than a 30s SIGKILL.

2. Bing Search API permanently dead (HTTP 410) 🟡

Microsoft retired the Bing v7 Search API on 2025-08-11. Every call has been failing for months. This is not a key issue; the endpoint is gone. Beyond polluting logs, the future was burning ~3s of executor time before failing, contributing to the timeout above.

Fix: drop `bing_future` from `perform_search`'s `ThreadPoolExecutor`. Pool max-workers reduced from 3 → 2 (Google + YouTube only). The `search_bing` function is left in place as dead code, with the same shape if we slot a replacement in later (Tavily, see PR #1, or Brave Search).

Test plan
- `gpt-oss-120b` → returns within ~30s
- `qwq-32b` → returns within 60–90s (no SIGKILL)
- No `Bing search error: 410` line in the logs
- `/search` end-to-end is noticeably faster (3–5s shaved)

Not in this PR
- `/api/v1/models` getting hammered ~8x per page load (cosmetic; ModelSelector mounts in two spots and doesn't share the catalog fetch). Will fix tomorrow.

🤖 Generated with Claude Code