fix: AI SDK compatibility, rootless Docker, preflight checks #5
Merged
…shooting docs
- Add --reasoning-format none to Qwen3.5-9B profiles (rtx5090, default)
to prevent AI SDK extractReasoningMiddleware crash on empty <think>
blocks emitted by the Unsloth GGUF chat template (vercel/ai #12054)
- Fix docker-compose.yml for rootless Docker compatibility:
  - Remove container-level sysctls (bbr, busy_read, busy_poll) that fail
    on rootless Docker. These are host-level settings already applied by
    scripts/host-setup.sh
  - Remove ulimits.memlock (fails on rootless Docker; --mlock in the profile
    handles memory locking at the application level)
  - Fix duplicate volumes key in the prometheus service (YAML bug)
- Add preflight_check() to entrypoint that warns on startup about
suboptimal host kernel params (BBR, NUMA balancing, swappiness, GPU
persistence mode) with actionable fix: sudo ./scripts/host-setup.sh
- Update the OpenCode config in AGENTS.md to a working @ai-sdk/openai-compatible
setup with the correct model ID (Qwen3.5-9B-UD-Q4_K_XL.gguf), and add a note
about the @ai-sdk/openai crash
- Add comprehensive TROUBLESHOOTING.md with symptom-first headings,
exact error messages, diagnostic commands, and verification steps
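The profile change in the first item can be sketched as a sourced shell fragment. This assumes the profile sets `PROFILE_EXTRA_ARGS` and that the entrypoint word-splits it onto the llama-server command line; the `build_server_args` helper, the model path, and keeping `--mlock` in the same variable are hypothetical. Only `--reasoning-format none`, `--mlock` and `PROFILE_EXTRA_ARGS` come from this PR.

```shell
# Hypothetical profile fragment (e.g. models/qwen3.5-9b/profiles/rtx5090.sh).
# --reasoning-format none: keep <think>...</think> as plain text in
#   `content`, so llama-server never emits a `reasoning_content` field.
# --mlock: lock model memory in the application (assumed to live in this
#   same variable), replacing the container-level ulimits.memlock.
PROFILE_EXTRA_ARGS="--reasoning-format none --mlock"

# Hypothetical helper: print llama-server arguments one per line,
# starting with the model path and followed by the profile's extra args.
build_server_args() {
  printf '%s\n' --model "$1"
  # Intentional word splitting: each flag and value becomes its own argument.
  for arg in $PROFILE_EXTRA_ARGS; do
    printf '%s\n' "$arg"
  done
}
```

For example, `build_server_args /models/Qwen3.5-9B-UD-Q4_K_XL.gguf` prints `--model`, the path, `--reasoning-format`, `none` and `--mlock`, one per line.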
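Rootless Docker cannot apply the removed sysctls per container, so they only take effect when set on the host. Below is a minimal sketch of a sysctl.d drop-in that a script like scripts/host-setup.sh could write; the `render_sysctl_conf` helper and the busy_read/busy_poll value of 50 are hypothetical, since the PR names the keys but not their values.

```shell
# Hypothetical value: the PR names busy_read/busy_poll but not their settings.
BUSY_USEC="${BUSY_USEC:-50}"

# Print a sysctl.d drop-in with host-level equivalents of the container
# sysctls that were removed from docker-compose.yml.
render_sysctl_conf() {
  cat <<EOF
# Host-level tuning; rootless Docker cannot set these per container.
net.ipv4.tcp_congestion_control = bbr
net.core.busy_read = ${BUSY_USEC}
net.core.busy_poll = ${BUSY_USEC}
EOF
}
```

Piping the output to `sudo tee /etc/sysctl.d/99-llama-tuning.conf` (hypothetical file name) and then running `sudo sysctl --system` would load it; on some kernels BBR also requires the `tcp_bbr` module to be loaded first.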
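The client-side part of the AGENTS.md change can be sketched as a generator for the git-ignored opencode.json. This assumes OpenCode's custom-provider schema (`npm`, `options.baseURL`, `models`) and a llama-server reachable at http://localhost:8080/v1; the provider key `llama-server`, the display names and `render_opencode_config` are hypothetical, and the config in AGENTS.md may differ. Only the package name and model ID come from this PR.

```shell
# Assumption: llama-server's OpenAI-compatible API at its default port.
OPENCODE_BASE_URL="${OPENCODE_BASE_URL:-http://localhost:8080/v1}"
MODEL_ID="Qwen3.5-9B-UD-Q4_K_XL.gguf"

# Print an opencode.json that routes requests through
# @ai-sdk/openai-compatible rather than @ai-sdk/openai.
render_opencode_config() {
  cat <<EOF
{
  "\$schema": "https://opencode.ai/config.json",
  "provider": {
    "llama-server": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "llama-server (local)",
      "options": {
        "baseURL": "${OPENCODE_BASE_URL}"
      },
      "models": {
        "${MODEL_ID}": {
          "name": "Qwen3.5-9B (local)"
        }
      }
    }
  }
}
EOF
}
```

Running `render_opencode_config > opencode.json` keeps the generated file out of version control, since this PR adds `/opencode.json` to .gitignore.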
Tested on eae@192.168.0.219 (RTX 5090, rootless Docker):
- Preflight check detects missing BBR correctly
- --reasoning-format none confirmed: no reasoning_content in responses
- Tool calling works with @ai-sdk/openai-compatible
- 4 slots, 262K context/slot, all 33 layers on GPU
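The preflight check that caught the missing BBR setting can be approximated as below. This is a sketch, not the PR's scripts/entrypoint.sh: the `PROC_ROOT` and `NVIDIA_SMI` overrides, the `PREFLIGHT_ISSUES` counter and the swappiness threshold of 10 are hypothetical, added so the logic can be exercised against fake /proc files. Like the described check, it only warns and never blocks startup.

```shell
# Hypothetical overrides (not from the PR) so the check can run against
# fake files and on machines without a GPU.
PROC_ROOT="${PROC_ROOT:-/proc}"
NVIDIA_SMI="${NVIDIA_SMI:-nvidia-smi}"
# Hypothetical threshold: the PR does not state the expected swappiness.
MAX_SWAPPINESS="${MAX_SWAPPINESS:-10}"

warn() { printf 'WARNING: %s\n' "$*" >&2; }

# Print a /proc/sys value, or nothing if the file is unreadable.
read_sysctl() { cat "$PROC_ROOT/sys/$1" 2>/dev/null; }

preflight_check() {
  PREFLIGHT_ISSUES=0

  cc=$(read_sysctl net/ipv4/tcp_congestion_control)
  if [ "$cc" != "bbr" ]; then
    warn "TCP congestion control is '${cc:-unknown}', expected bbr"
    PREFLIGHT_ISSUES=$((PREFLIGHT_ISSUES + 1))
  fi

  numa=$(read_sysctl kernel/numa_balancing)
  if [ -n "$numa" ] && [ "$numa" != "0" ]; then
    warn "automatic NUMA balancing is enabled (kernel.numa_balancing=$numa)"
    PREFLIGHT_ISSUES=$((PREFLIGHT_ISSUES + 1))
  fi

  swp=$(read_sysctl vm/swappiness)
  if [ -n "$swp" ] && [ "$swp" -gt "$MAX_SWAPPINESS" ]; then
    warn "vm.swappiness=$swp is above $MAX_SWAPPINESS"
    PREFLIGHT_ISSUES=$((PREFLIGHT_ISSUES + 1))
  fi

  # GPU persistence mode; skipped when nvidia-smi is not installed.
  if command -v "$NVIDIA_SMI" >/dev/null 2>&1; then
    pm=$("$NVIDIA_SMI" --query-gpu=persistence_mode --format=csv,noheader 2>/dev/null)
    # Warn if any GPU line is something other than exactly "Enabled".
    if printf '%s\n' "$pm" | grep -qvx 'Enabled'; then
      warn "GPU persistence mode is not enabled on every GPU"
      PREFLIGHT_ISSUES=$((PREFLIGHT_ISSUES + 1))
    fi
  fi

  if [ "$PREFLIGHT_ISSUES" -gt 0 ]; then
    warn "$PREFLIGHT_ISSUES host setting(s) not tuned. Fix: sudo ./scripts/host-setup.sh"
  fi
  return 0  # warn only; never block startup
}
```

Reading /proc/sys directly keeps the check free of extra dependencies inside the container, and returning 0 unconditionally means an untuned host still serves requests.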
Summary
- Add `--reasoning-format none` to Qwen3.5-9B profiles to prevent AI SDK crash on empty `<think>` blocks
- `@ai-sdk/openai-compatible` setup

Problem
The Unsloth Qwen3.5-9B GGUF always emits `<think>` tokens, even with thinking disabled. When llama-server's default `--reasoning-format auto` detects these, it produces a `reasoning_content` field in the API response. The Vercel AI SDK's `extractReasoningMiddleware` crashes on empty reasoning blocks (vercel/ai #12054), causing:
- `@ai-sdk/openai`: `"text part msg_... not found"` crash
- `@ai-sdk/openai-compatible`: raw `<tool_call>` XML in `content` instead of executing tools

Additionally, `docker-compose.yml` had container-level sysctls (`tcp_congestion_control=bbr`, `busy_read`, `busy_poll`) and `ulimits.memlock` that fail on rootless Docker. These settings are already applied at the host level by `scripts/host-setup.sh`, but there was no warning when they weren't applied.

Solution
Server-side: `--reasoning-format none` in `PROFILE_EXTRA_ARGS` tells llama-server to leave `<think>` tags as plain text in `content` and never produce a `reasoning_content` field.

Client-side: Use `@ai-sdk/openai-compatible` (not `@ai-sdk/openai`), which doesn't have the `extractReasoningMiddleware`.

Docker: Remove rootless-incompatible settings from compose and add a comment pointing to `host-setup.sh`.

Preflight check: New `preflight_check()` in the entrypoint reads `/proc/sys/` on startup and warns about missing host tuning:

Files Changed
- `models/qwen3.5-9b/profiles/rtx5090.sh`: `--reasoning-format none` with comment
- `models/qwen3.5-9b/profiles/default.sh`: `--reasoning-format none`
- `scripts/entrypoint.sh`: `preflight_check()` for host kernel param warnings
- `models/qwen3.5-9b/entrypoint.sh`
- `docker-compose.yml`
- `AGENTS.md`
- `TROUBLESHOOTING.md`
- `.gitignore`: `/opencode.json`, `remote-build.log`

Testing
Verified on `eae@192.168.0.219` (RTX 5090, rootless Docker):
- `--reasoning-format none` confirmed: no `reasoning_content` in API responses
- `@ai-sdk/openai-compatible`
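The `reasoning_content` verification can be scripted roughly as follows, a minimal sketch assuming llama-server listens on its default port 8080 and serves the OpenAI-compatible `/v1/chat/completions` endpoint. `assert_no_reasoning_content` and `check_server` are hypothetical helpers, and a plain `grep` stands in for a JSON parser. With `--reasoning-format none`, a `<think>` block may still appear inside `content`; only the separate field should be absent.

```shell
# Hypothetical helper: read a chat-completion JSON body on stdin and fail
# if it contains a "reasoning_content" key. grep stands in for a JSON parser.
assert_no_reasoning_content() {
  if grep -q '"reasoning_content"'; then
    printf 'FAIL: response contains reasoning_content\n' >&2
    return 1
  fi
  printf 'OK: no reasoning_content in response\n'
}

# Assumption: llama-server on its default port 8080; adjust to the compose mapping.
BASE_URL="${BASE_URL:-http://localhost:8080}"
MODEL_ID="Qwen3.5-9B-UD-Q4_K_XL.gguf"

# Hypothetical end-to-end check against a running server.
check_server() {
  curl -s "$BASE_URL/v1/chat/completions" \
    -H 'Content-Type: application/json' \
    -d "{\"model\":\"$MODEL_ID\",\"messages\":[{\"role\":\"user\",\"content\":\"Say hi\"}]}" \
    | assert_no_reasoning_content
}
```

`check_server` exits non-zero if the field reappears, for instance after a profile loses the flag.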