Update DeepSeek-V3.2-Exp AMD recipe YAML format #546
Signed-off-by: haic0 <haic0@users.noreply.github.com>
Code Review
This pull request adds support and documentation for running DeepSeek-V3.2-Exp on AMD ROCm GPUs (MI300X, MI325X, and MI355X), including installation steps and serving commands. Feedback on these changes highlights that the --no-enable-prefix-caching flag is invalid in vLLM and should be removed, that DeepGEMM is CUDA-only and should be explicitly skipped in the AMD installation instructions, and that the overview section needs an update to reflect the newly added AMD ROCm support.
vllm serve deepseek-ai/DeepSeek-V3.2-Exp \
--tensor-parallel-size 8 \
--max-num-batched-tokens 32768 \
--trust-remote-code \
--no-enable-prefix-caching \
--kv-cache-dtype bfloat16 \
--block-size 1
The --no-enable-prefix-caching flag is not a valid vLLM CLI argument. Since vLLM uses standard argparse for boolean flags like --enable-prefix-caching, passing --no-enable-prefix-caching will result in an unrecognized arguments error and prevent the server from starting.
Since prefix caching is disabled by default, you can simply omit this flag.
Note: Please also remove "--no-enable-prefix-caching" from the hardware_overrides.amd.extra_args list on line 76 of this file to prevent similar runtime errors when the recipe is parsed.
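Whether a negated `--no-...` form parses depends on how the flag is declared. The following is a minimal sketch using only Python's standard `argparse`; it does not reproduce vLLM's own parser, and the parser builders and the `accepts` helper are hypothetical names introduced for illustration. It contrasts a flag declared with `action="store_true"`, which has no negated form, with one declared with `argparse.BooleanOptionalAction` (Python 3.9+), which does.

```python
import argparse


def build_store_true_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: only the positive form of the flag exists.
    parser = argparse.ArgumentParser(prog="demo-serve")
    parser.add_argument("--enable-prefix-caching", action="store_true")
    return parser


def build_optional_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: BooleanOptionalAction also registers
    # the negated form --no-enable-prefix-caching.
    parser = argparse.ArgumentParser(prog="demo-serve")
    parser.add_argument(
        "--enable-prefix-caching",
        action=argparse.BooleanOptionalAction,
        default=None,
    )
    return parser


def accepts(parser: argparse.ArgumentParser, flag: str) -> bool:
    # parse_known_args returns unrecognized arguments instead of exiting.
    _, unknown = parser.parse_known_args([flag])
    return flag not in unknown


if __name__ == "__main__":
    flag = "--no-enable-prefix-caching"
    print("store_true parser accepts it:", accepts(build_store_true_parser(), flag))
    print("optional parser accepts it:", accepts(build_optional_parser(), flag))
```

Because the outcome depends on the declaration, the dependable check for a given vLLM installation is the output of `vllm serve --help` for that version.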
vllm serve deepseek-ai/DeepSeek-V3.2-Exp \
--tensor-parallel-size 8 \
--max-num-batched-tokens 32768 \
--trust-remote-code \
--kv-cache-dtype bfloat16 \
--block-size 1
AMD ROCm wheel:
```bash
uv venv --python 3.12
source .venv/bin/activate
uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/
```
DeepGEMM is a CUDA-only library optimized specifically for NVIDIA Hopper and Blackwell architectures. It does not support AMD ROCm and will fail to compile/install on ROCm systems.
To prevent AMD users from running the NVIDIA-specific installation steps (which include installing DeepGEMM), we should explicitly separate the AMD ROCm installation instructions and advise them to skip the DeepGEMM step.
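One way to keep the two installation paths apart in setup tooling is to gate the DeepGEMM step on the accelerator backend. The sketch below is illustrative only: `needs_deepgemm` is a hypothetical helper, not part of vLLM or the recipe tooling, and it assumes the backend is identified by the values of PyTorch's `torch.version.hip` and `torch.version.cuda`, passed in as plain arguments so the logic runs without a GPU.

```python
from typing import Optional


def needs_deepgemm(hip_version: Optional[str], cuda_version: Optional[str]) -> bool:
    """Return True only for CUDA builds; ROCm builds skip the DeepGEMM step."""
    if hip_version is not None:
        # ROCm build of PyTorch: DeepGEMM is CUDA-only, so skip it.
        return False
    # Install DeepGEMM only when a CUDA build is actually present.
    return cuda_version is not None


if __name__ == "__main__":
    # Illustrative version strings, not taken from the recipe.
    print("ROCm build:", needs_deepgemm(hip_version="6.3", cuda_version=None))
    print("CUDA build:", needs_deepgemm(hip_version=None, cuda_version="12.8"))
```

In a real environment the two arguments would come from `torch.version.hip` and `torch.version.cuda`, each of which is `None` on builds for the other backend.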
Consider updating this section to:
### AMD ROCm Installation
For AMD ROCm, do not install DeepGEMM (which is CUDA-only). Instead, install the ROCm-compatible vLLM wheel directly:
```bash
uv venv --python 3.12
source .venv/bin/activate
uv pip install vllm --extra-index-url https://wheels.vllm.ai/rocm/
```
## Prerequisites
- **Hardware**: 8x H200 (or H20, or 8xB200) GPUs
- **AMD ROCm**: 8x MI300X / MI325X / MI355X GPUs
With the addition of AMD ROCm support for MI300X, MI325X, and MI355X, the statement in the Overview section (lines 91-92) which says:
Only Hopper and Blackwell data center GPUs are supported for now.
is now outdated and contradictory.
Please update the Overview section to reflect that AMD ROCm GPUs are also supported.
Summary
models/...yaml.
Test plan
node scripts/build-recipes-api.mjs
.claude/skills/add-recipe/SKILL.md.
haic0.
Replaces #279.