fix: AI SDK compatibility, rootless Docker, preflight checks by aWN4Y25pa2EK · Pull Request #5 · infernet-org/foundry

aWN4Y25pa2EK · 2026-03-02T18:25:27Z

Summary

Add --reasoning-format none to Qwen3.5-9B profiles to prevent AI SDK crash on empty <think> blocks
Fix docker-compose.yml for rootless Docker (remove failing sysctls/ulimits, fix prometheus duplicate volumes)
Add startup preflight check that warns about suboptimal host kernel params (BBR, NUMA, swappiness, GPU persistence)
Update AGENTS.md OpenCode config to working @ai-sdk/openai-compatible setup
Add comprehensive TROUBLESHOOTING.md

Problem

The Unsloth Qwen3.5-9B GGUF always emits <think> tokens even with thinking disabled. When llama-server's default --reasoning-format auto detects these, it produces a reasoning_content field in the API response. The Vercel AI SDK's extractReasoningMiddleware crashes on empty reasoning blocks (vercel/ai #12054), causing:

@ai-sdk/openai: "text part msg_... not found" crash
@ai-sdk/openai-compatible: Raw <tool_call> XML in content instead of executing tools

Additionally, docker-compose.yml had container-level sysctls (tcp_congestion_control=bbr, busy_read, busy_poll) and ulimits.memlock that fail on rootless Docker. These settings are already applied at the host level by scripts/host-setup.sh, but there was no warning when they weren't applied.

Solution

Server-side: --reasoning-format none in PROFILE_EXTRA_ARGS tells llama-server to leave <think> tags as plain text in content and never produce a reasoning_content field.

Client-side: Use @ai-sdk/openai-compatible (not @ai-sdk/openai) which doesn't have the extractReasoningMiddleware.

Docker: Remove rootless-incompatible settings from compose, add comment pointing to host-setup.sh.

Preflight check: New preflight_check() in the entrypoint reads /proc/sys/ on startup and warns about missing host tuning:

[foundry] Checking host kernel parameters...
[foundry]   tcp_congestion_control = cubic (expected: bbr) -- BBR reduces token streaming latency
[foundry]
[foundry]   1 performance issue(s) detected. Run on the host:
[foundry]     sudo ./scripts/host-setup.sh

Files Changed

File	Change
`models/qwen3.5-9b/profiles/rtx5090.sh`	Add `--reasoning-format none` with comment
`models/qwen3.5-9b/profiles/default.sh`	Add `--reasoning-format none`
`scripts/entrypoint.sh`	Add `preflight_check()` for host kernel param warnings
`models/qwen3.5-9b/entrypoint.sh`	Sync with scripts/entrypoint.sh
`docker-compose.yml`	Remove rootless-incompatible sysctls/ulimits, fix prometheus, add host-setup.sh comment
`AGENTS.md`	Update OpenCode config, add troubleshooting cross-refs
`TROUBLESHOOTING.md`	New comprehensive troubleshooting guide
`.gitignore`	Add `/opencode.json`, `remote-build.log`

Testing

Verified on eae@192.168.0.219 (RTX 5090, rootless Docker):

Preflight check correctly detects missing BBR
--reasoning-format none confirmed: no reasoning_content in API responses
Streaming tool calls work with @ai-sdk/openai-compatible
4 parallel slots, 262K context/slot, all 33/33 layers on GPU

…shooting docs - Add --reasoning-format none to Qwen3.5-9B profiles (rtx5090, default) to prevent AI SDK extractReasoningMiddleware crash on empty <think> blocks emitted by the Unsloth GGUF chat template (vercel/ai #12054) - Fix docker-compose.yml for rootless Docker compatibility: - Remove container-level sysctls (bbr, busy_read, busy_poll) that fail on rootless Docker. These are host-level settings already applied by scripts/host-setup.sh - Remove ulimits.memlock (fails on rootless Docker, --mlock in profile handles memory locking at the application level) - Fix duplicate volumes key in prometheus service (YAML bug) - Add preflight_check() to entrypoint that warns on startup about suboptimal host kernel params (BBR, NUMA balancing, swappiness, GPU persistence mode) with actionable fix: sudo ./scripts/host-setup.sh - Update AGENTS.md OpenCode config to working @ai-sdk/openai-compatible setup with correct model ID (Qwen3.5-9B-UD-Q4_K_XL.gguf) and note about @ai-sdk/openai crash - Add comprehensive TROUBLESHOOTING.md with symptom-first headings, exact error messages, diagnostic commands, and verification steps Tested on eae@192.168.0.219 (RTX 5090, rootless Docker): - Preflight check detects missing BBR correctly - --reasoning-format none confirmed: no reasoning_content in responses - Tool calling works with @ai-sdk/openai-compatible - 4 slots, 262K context/slot, all 33 layers on GPU

aWN4Y25pa2EK merged commit ca38cb4 into main Mar 2, 2026
6 checks passed

aWN4Y25pa2EK deleted the fix/ai-sdk-compat-rootless-docker branch March 2, 2026 18:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: AI SDK compatibility, rootless Docker, preflight checks#5

fix: AI SDK compatibility, rootless Docker, preflight checks#5
aWN4Y25pa2EK merged 1 commit into
mainfrom
fix/ai-sdk-compat-rootless-docker

aWN4Y25pa2EK commented Mar 2, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

aWN4Y25pa2EK commented Mar 2, 2026

Summary

Problem

Solution

Files Changed

Testing

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant