
fix(prod): bump gunicorn timeout to 120s; drop dead Bing API #20

Merged
paritoshtripathi935 merged 1 commit into main from fix/answer-timeout-bing-410 on May 9, 2026


@paritoshtripathi935
Owner

Two prod-breaking findings from Render logs

1. /answer worker timeouts (chats failing) 🔴

2026-05-09T17:00:11 [CRITICAL] WORKER TIMEOUT (pid:42)
2026-05-09T17:00:12 Worker (pid:42) was sent SIGKILL! Perhaps out of memory?

The default gunicorn worker timeout is 30s. The new chat path:

  • parallel search providers (~3s, made worse by Bing's slow 410)
  • LLM rerank call (~1–2s)
  • /answer on a reasoning model — gpt-oss-120b 5–15s, qwq-32b 30–60s for chain-of-thought

The total can easily exceed 30s. The user's "QwQ 32B" attempt confirmed this: QwQ's reasoning chain alone can use up the whole budget.
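A rough worst-case budget, using the upper end of each estimate above, shows why 30s was too tight and 120s leaves headroom. The numbers are the figures quoted in this PR, not new measurements; the `STAGES_S`, `ANSWER_S` and `worst_case_latency` names are illustrative only.

```python
# Upper-end latency estimates (seconds) quoted in this PR; not measured values.
STAGES_S = {
    "parallel_search": 3,  # ~3s, made worse by Bing's slow 410
    "llm_rerank": 2,       # ~1-2s
}
ANSWER_S = {
    "gpt-oss-120b": 15,    # 5-15s
    "qwq-32b": 60,         # 30-60s of chain-of-thought
}


def worst_case_latency(model: str) -> int:
    """Sum the sequential stages: search -> rerank -> answer."""
    return sum(STAGES_S.values()) + ANSWER_S[model]


for timeout in (30, 120):
    for model in ANSWER_S:
        total = worst_case_latency(model)
        verdict = "fits" if total < timeout else "SIGKILL risk"
        print(f"timeout={timeout}s {model}: ~{total}s -> {verdict}")
```

On these estimates gpt-oss-120b (~20s) squeaks under 30s while qwq-32b (~65s) does not, which is consistent with the failed "QwQ 32B" attempt; both fit comfortably under 120s.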

Fix: set timeout = 120 and graceful_timeout = 30 in backend/gunicorn_config.py. Long-tail requests still get a clean 504 rather than a SIGKILL at 30s.
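Gunicorn config files are plain Python, with module-level names read as settings, so the change amounts to two assignments. This sketch shows only those two values; the rest of the real backend/gunicorn_config.py is not included in the PR. Note that the follow-up commit below found that gunicorn only auto-loads a file named gunicorn.conf.py unless `-c` is passed.

```python
# Gunicorn config file: module-level names are read as settings.
# Only the two values changed in this PR are shown.

# Workers silent for longer than this many seconds are killed and
# restarted by the arbiter (gunicorn's default is 30).
timeout = 120

# After a restart/shutdown signal, workers get this many seconds to
# finish in-flight requests before being force-killed.
graceful_timeout = 30
```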

2. Bing Search API permanently dead (HTTP 410) 🟡

Bing search error: 410 Client Error: Gone for url:
https://api.bing.microsoft.com/v7.0/search?q=...

Microsoft retired the Bing v7 Search API on 2025-08-11, so every call has been failing for months. This is not a key issue; the endpoint is gone. Beyond polluting the logs, bing_future took ~3s to fail on every search, which fed directly into the timeout above.

Fix: drop bing_future from perform_search's ThreadPoolExecutor and reduce max_workers from 3 to 2 (Google + YouTube only). The search_bing function is left in place as dead code with the same shape, so a replacement can slot in later (Tavily, see PR #1, or Brave Search).
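A minimal sketch of the pool after this change, assuming perform_search submits one future per provider to a ThreadPoolExecutor and collects the results. search_google and search_youtube are hypothetical stubs; the PR does not show the real provider code or its error handling.

```python
import logging
from concurrent.futures import ThreadPoolExecutor
from typing import List

log = logging.getLogger(__name__)


# Hypothetical stand-ins for the real provider calls (not shown in the PR).
def search_google(query: str) -> List[str]:
    return [f"google:{query}"]


def search_youtube(query: str) -> List[str]:
    return [f"youtube:{query}"]


def search_bing(query: str) -> List[str]:
    # Kept as dead code with the same shape; the v7 endpoint now returns 410 Gone.
    raise RuntimeError("Bing v7 Search API retired")


def perform_search(query: str) -> List[str]:
    # One worker per remaining provider (was 3 while Bing was in the pool).
    with ThreadPoolExecutor(max_workers=2) as executor:
        google_future = executor.submit(search_google, query)
        youtube_future = executor.submit(search_youtube, query)
        # bing_future = executor.submit(search_bing, query)  # dropped: 410 Gone

        results: List[str] = []
        # result() blocks, so the total wait is roughly the slowest provider.
        for future in (google_future, youtube_future):
            try:
                results.extend(future.result())
            except Exception as exc:
                log.warning("search provider failed: %s", exc)
        return results
```

Because the request waits on every future, search latency tracks the slowest provider; a Bing call that took ~3s to come back with a 410 would have set that floor for every /search.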

Test plan

  • Render redeploys; new worker boots without errors
  • Send a chat with gpt-oss-120b → returns within ~30s
  • Send a chat with qwq-32b → returns within 60–90s (no SIGKILL)
  • Render logs no longer contain the Bing search error: 410 line
  • /search end-to-end is noticeably faster (3–5s shaved)

Not in this PR

  • /api/v1/models getting hammered ~8x per page load (cosmetic; ModelSelector mounts in two spots and doesn't share the catalog fetch). Will fix tomorrow.
  • Replacement search provider for the lost Bing volume — Tavily is sitting in PR #1; worth reviewing as a follow-up.

🤖 Generated with Claude Code

Two prod issues from Render logs at 2026-05-09T17:00:

1. Gunicorn killed /answer mid-generation:
     [CRITICAL] WORKER TIMEOUT (pid:42)
     Worker (pid:42) was sent SIGKILL! Perhaps out of memory?
   The default 30s worker timeout is too short for the new chat path:
   parallel search providers (~3s) → LLM rerank (~1–2s) → main answer
   on a reasoning-tuned model (gpt-oss-120b 5–15s; qwq-32b 30–60s for
   chain-of-thought). Bumped to 120s — long-tail still gets a clean 504
   instead of a SIGKILL.

2. Bing Search API returns 410 Gone on every call:
     Bing search error: 410 Client Error: Gone for url:
     https://api.bing.microsoft.com/v7.0/search?...
   Microsoft permanently retired the Bing v7 Search API on 2025-08-11.
   Removed from perform_search's executor pool. Before this it added
   ~3s of pointless wait per /search; with it gone, /search latency
   drops noticeably.
   `search_bing` is left in the file as dead code — same shape if we
   want to slot a replacement (Tavily, Brave) in later. See PR #1.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

netlify Bot commented May 9, 2026

Deploy Preview for mini-perplexity canceled.

🔨 Latest commit: bd492e5
🔍 Latest deploy log: https://app.netlify.com/projects/mini-perplexity/deploys/69ff69f16dac870008e4d848

paritoshtripathi935 merged commit 5153a76 into main on May 9, 2026
4 checks passed
paritoshtripathi935 deleted the fix/answer-timeout-bing-410 branch May 9, 2026 17:08
paritoshtripathi935 added a commit that referenced this pull request May 10, 2026
…#24)

Render's start command does not pass -c, so gunicorn auto-discovery is the only way the config file is loaded. Auto-discovery only matches gunicorn.conf.py (with a dot), not gunicorn_config.py — so the timeout=120 setting added in #20 was a no-op and /answer continued to SIGKILL at the default 30s. Rename the file to make discovery actually find it.
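A small pre-deploy check could have caught this. The helper below is hypothetical (not part of the repo or of gunicorn's API) and simplified: it mirrors gunicorn's documented default that, without `-c`/`--config`, it reads ./gunicorn.conf.py from the directory it is started in, and it handles only plain file paths.

```python
import os
from typing import Optional, Sequence


def effective_gunicorn_config(argv: Sequence[str], cwd: str) -> Optional[str]:
    """Return the config path gunicorn would load, or None if none applies.

    Hypothetical, simplified helper: an explicit -c/--config wins;
    otherwise only ./gunicorn.conf.py in the working directory is used.
    """
    args = list(argv)
    for i, arg in enumerate(args):
        if arg in ("-c", "--config") and i + 1 < len(args):
            return os.path.join(cwd, args[i + 1])
        if arg.startswith("--config="):
            return os.path.join(cwd, arg.split("=", 1)[1])
    # No explicit flag: fall back to auto-discovery of the dotted name only.
    default = os.path.join(cwd, "gunicorn.conf.py")
    return default if os.path.isfile(default) else None
```

With only gunicorn_config.py on disk and a start command without `-c`, the helper returns None, which is exactly the silent no-op described above.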