Update Ling 2.6 Flash ROCm command #594
Adjust Ling 2.6 Flash to the provided TP=2 ROCm launch command.

Signed-off-by: haic0 <haichzha@amd.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Code Review
This pull request updates the configuration for inclusionAI/Ling-2.6-flash.yaml to reduce the default tensor parallel size from 4 to 2, updates the strategy overrides and description accordingly, and adds a ROCm vLLM serve command to the guide. The reviewer pointed out an issue in the Docker run command where prepending vllm serve to the arguments will break container startup due to the image's existing entrypoint.
      -e VLLM_ROCM_USE_AITER=1 \
      vllm/vllm-openai-rocm:v0.20.2 \
    - inclusionAI/Ling-2.6-flash \
    + vllm serve inclusionAI/Ling-2.6-flash \
Prepending vllm serve to the Docker command arguments will cause the container to fail at startup. The official vllm/vllm-openai-rocm image already defines an entrypoint (such as python3 -m vllm.entrypoints.openai.api_server or vllm serve). Passing vllm serve as arguments appends them to the entrypoint, resulting in an invalid command execution (e.g., vllm serve vllm serve ... or api_server vllm serve ...).
To fix this, we should revert to passing only the model ID and its arguments, as the entrypoint already handles the execution of the server.
inclusionAI/Ling-2.6-flash \
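The failure mode described above can be illustrated without Docker. The sketch below is only a model of how an exec-form entrypoint is combined with the arguments that follow the image name. `effective_command` is a hypothetical helper, and the entrypoint value `vllm serve` is an assumption taken from the possibilities the review mentions, not a confirmed property of the image.

```shell
# Hypothetical helper (not part of Docker or vLLM): print the command line
# that results when the arguments given after the image name are appended
# to the image's entrypoint.
effective_command() {
  entrypoint="$1"   # assumed entrypoint of the image
  shift             # remaining parameters are the `docker run` arguments
  printf '%s %s\n' "$entrypoint" "$*"
}

# Arguments as written in this PR: `vllm serve` ends up duplicated.
effective_command "vllm serve" vllm serve inclusionAI/Ling-2.6-flash --tensor-parallel-size 2

# Arguments as suggested in the review: only the model ID and its flags.
effective_command "vllm serve" inclusionAI/Ling-2.6-flash --tensor-parallel-size 2
```

The actual entrypoint is worth confirming before merging, for example with `docker inspect --format '{{json .Config.Entrypoint}}' vllm/vllm-openai-rocm:v0.20.2`. If it turns out to be empty, the `vllm serve` prefix would be required rather than harmful.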
Summary
Test plan
node scripts/build-recipes-api.mjs on the complete validated recipe update set.

Made with Cursor