feat: add QMD_GPU env var to override GPU backend detection #272
Open
huzaifahkhojani-cyber wants to merge 1 commit into tobi:main
Conversation
On AMD ROCm systems, CUDA is reported as available by node-llama-cpp's detection even when no CUDA Toolkit is installed. This causes a failed build attempt followed by a CPU fallback, which segfaults in Bun when loading large models.

This adds a `QMD_GPU` environment variable that allows users to force a specific GPU backend (`cuda`, `metal`, `vulkan`) or disable GPU entirely (`false`), bypassing auto-detection when it gets it wrong.

Example: `QMD_GPU=vulkan qmd query 'my search'`

Tested on AMD Ryzen 7 255 with Radeon 780M (gfx1103) + ROCm + Vulkan.
Problem
On AMD systems with ROCm installed, node-llama-cpp's `getLlamaGpuTypes()` reports CUDA as available (via ROCm's CUDA compatibility layer). QMD then tries to build with CUDA, fails because there is no actual CUDA Toolkit, and falls back to CPU mode, which causes a Bun segfault when loading embedding/reranking models into RAM.

This affects AMD GPU systems (tested: Radeon 780M / gfx1103) that have ROCm + Vulkan but no NVIDIA hardware.
Fix
Add a `QMD_GPU` environment variable that overrides the auto-detection. When unset, behaviour is unchanged (auto-detect: CUDA > Metal > Vulkan > CPU).
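The override logic can be sketched roughly as follows. This is an illustrative sketch, not the actual code in `src/llm.ts`; the function and type names here are assumptions, and only `QMD_GPU` and the documented values (`cuda`, `metal`, `vulkan`, `false`) come from the PR itself:

```typescript
// Hypothetical sketch: check the QMD_GPU override before auto-detecting.
type GpuBackend = "cuda" | "metal" | "vulkan" | false;

function resolveGpuBackend(autoDetect: () => GpuBackend): GpuBackend {
  const override = process.env.QMD_GPU?.toLowerCase();
  if (override === "false") {
    return false; // disable GPU entirely
  }
  if (override === "cuda" || override === "metal" || override === "vulkan") {
    return override; // force the named backend, skipping detection
  }
  return autoDetect(); // unset or unrecognised value: behaviour unchanged
}
```

The key property is that the env var is consulted first, so a ROCm system that misdetects CUDA can be pinned to Vulkan without touching the detection code.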
Changes

- `src/llm.ts`: Read `QMD_GPU` env var before falling through to auto-detection
- `README.md`: Document the new env var under Model Configuration

Testing
Tested on:

- `QMD_GPU=vulkan qmd query "test"` → works, GPU-accelerated, no crash