Skip to content

fix: AI SDK compatibility, rootless Docker, preflight checks#5

Merged
aWN4Y25pa2EK merged 1 commit into
mainfrom
fix/ai-sdk-compat-rootless-docker
Mar 2, 2026
Merged

fix: AI SDK compatibility, rootless Docker, preflight checks#5
aWN4Y25pa2EK merged 1 commit into
mainfrom
fix/ai-sdk-compat-rootless-docker

Conversation

@aWN4Y25pa2EK
Copy link
Copy Markdown
Member

Summary

  • Add --reasoning-format none to Qwen3.5-9B profiles to prevent AI SDK crash on empty <think> blocks
  • Fix docker-compose.yml for rootless Docker (remove failing sysctls/ulimits, fix prometheus duplicate volumes)
  • Add startup preflight check that warns about suboptimal host kernel params (BBR, NUMA, swappiness, GPU persistence)
  • Update AGENTS.md OpenCode config to working @ai-sdk/openai-compatible setup
  • Add comprehensive TROUBLESHOOTING.md

Problem

The Unsloth Qwen3.5-9B GGUF always emits <think> tokens even with thinking disabled. When llama-server's default --reasoning-format auto detects these, it produces a reasoning_content field in the API response. The Vercel AI SDK's extractReasoningMiddleware crashes on empty reasoning blocks (vercel/ai #12054), causing:

  • @ai-sdk/openai: "text part msg_... not found" crash
  • @ai-sdk/openai-compatible: Raw <tool_call> XML in content instead of executing tools

Additionally, docker-compose.yml had container-level sysctls (tcp_congestion_control=bbr, busy_read, busy_poll) and ulimits.memlock that fail on rootless Docker. These settings are already applied at the host level by scripts/host-setup.sh, but there was no warning when they weren't applied.

Solution

Server-side: --reasoning-format none in PROFILE_EXTRA_ARGS tells llama-server to leave <think> tags as plain text in content and never produce a reasoning_content field.

Client-side: Use @ai-sdk/openai-compatible (not @ai-sdk/openai) which doesn't have the extractReasoningMiddleware.

Docker: Remove rootless-incompatible settings from compose, add comment pointing to host-setup.sh.

Preflight check: New preflight_check() in the entrypoint reads /proc/sys/ on startup and warns about missing host tuning:

[foundry] Checking host kernel parameters...
[foundry]   tcp_congestion_control = cubic (expected: bbr) -- BBR reduces token streaming latency
[foundry]
[foundry]   1 performance issue(s) detected. Run on the host:
[foundry]     sudo ./scripts/host-setup.sh

Files Changed

File Change
models/qwen3.5-9b/profiles/rtx5090.sh Add --reasoning-format none with comment
models/qwen3.5-9b/profiles/default.sh Add --reasoning-format none
scripts/entrypoint.sh Add preflight_check() for host kernel param warnings
models/qwen3.5-9b/entrypoint.sh Sync with scripts/entrypoint.sh
docker-compose.yml Remove rootless-incompatible sysctls/ulimits, fix prometheus, add host-setup.sh comment
AGENTS.md Update OpenCode config, add troubleshooting cross-refs
TROUBLESHOOTING.md New comprehensive troubleshooting guide
.gitignore Add /opencode.json, remote-build.log

Testing

Verified on eae@192.168.0.219 (RTX 5090, rootless Docker):

  • Preflight check correctly detects missing BBR
  • --reasoning-format none confirmed: no reasoning_content in API responses
  • Streaming tool calls work with @ai-sdk/openai-compatible
  • 4 parallel slots, 262K context/slot, all 33/33 layers on GPU

…shooting docs

- Add --reasoning-format none to Qwen3.5-9B profiles (rtx5090, default)
  to prevent AI SDK extractReasoningMiddleware crash on empty <think>
  blocks emitted by the Unsloth GGUF chat template (vercel/ai #12054)

- Fix docker-compose.yml for rootless Docker compatibility:
  - Remove container-level sysctls (bbr, busy_read, busy_poll) that fail
    on rootless Docker. These are host-level settings already applied by
    scripts/host-setup.sh
  - Remove ulimits.memlock (fails on rootless Docker, --mlock in profile
    handles memory locking at the application level)
  - Fix duplicate volumes key in prometheus service (YAML bug)

- Add preflight_check() to entrypoint that warns on startup about
  suboptimal host kernel params (BBR, NUMA balancing, swappiness, GPU
  persistence mode) with actionable fix: sudo ./scripts/host-setup.sh

- Update AGENTS.md OpenCode config to working @ai-sdk/openai-compatible
  setup with correct model ID (Qwen3.5-9B-UD-Q4_K_XL.gguf) and note
  about @ai-sdk/openai crash

- Add comprehensive TROUBLESHOOTING.md with symptom-first headings,
  exact error messages, diagnostic commands, and verification steps

Tested on eae@192.168.0.219 (RTX 5090, rootless Docker):
- Preflight check detects missing BBR correctly
- --reasoning-format none confirmed: no reasoning_content in responses
- Tool calling works with @ai-sdk/openai-compatible
- 4 slots, 262K context/slot, all 33 layers on GPU
@aWN4Y25pa2EK aWN4Y25pa2EK merged commit ca38cb4 into main Mar 2, 2026
6 checks passed
@aWN4Y25pa2EK aWN4Y25pa2EK deleted the fix/ai-sdk-compat-rootless-docker branch March 2, 2026 18:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant