
Update Ling 2.6 Flash ROCm command #594

Open
haic0 wants to merge 1 commit into vllm-project:main from haic0:haic0/update-ling26-flash-rocm-command

haic0 (Contributor) commented Jun 29, 2026

Summary

  • Update the Ling 2.6 Flash recipe to the provided ROCm launch command, which uses a tensor parallel size of 2 (TP=2).
  • Align the recipe's launch guidance with the provided vLLM serve command.

Test plan

  • Ran node scripts/build-recipes-api.mjs on the complete validated recipe update set.

Made with Cursor

Adjust Ling 2.6 Flash to the provided TP=2 ROCm launch command.

Signed-off-by: haic0 <haichzha@amd.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
vercel (bot) commented Jun 29, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
|---|---|---|---|
| vllm-recipes | Ready | Preview, Comment | Jun 29, 2026 1:50pm |

gemini-code-assist (bot) left a comment


Code Review

This pull request updates the configuration in inclusionAI/Ling-2.6-flash.yaml to reduce the default tensor parallel size from 4 to 2, updates the strategy overrides and description to match, and adds a ROCm vLLM serve command to the guide. The reviewer pointed out an issue in the Docker run command: prepending vllm serve to the arguments will break container startup because the image already defines an entrypoint.

 -e VLLM_ROCM_USE_AITER=1 \
 vllm/vllm-openai-rocm:v0.20.2 \
-inclusionAI/Ling-2.6-flash \
+vllm serve inclusionAI/Ling-2.6-flash \


Severity: high

Prepending vllm serve to the Docker command arguments will cause the container to fail at startup. The official vllm/vllm-openai-rocm image already defines an entrypoint (such as python3 -m vllm.entrypoints.openai.api_server or vllm serve), and every argument after the image name is appended to that entrypoint. Passing vllm serve as an argument therefore produces an invalid command (e.g., vllm serve vllm serve ... or api_server vllm serve ...).
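To make the concatenation concrete, here is a small bash sketch that mimics how Docker builds the container's command line from an exec-form ENTRYPOINT plus the arguments given after the image name. It assumes the image's entrypoint is ["vllm", "serve"] (one of the two possibilities mentioned above); the helper name effective_cmd and the --tensor-parallel-size 2 argument are illustrative, and the script never calls Docker.

```shell
#!/usr/bin/env bash
# Sketch only: mimics how Docker joins an exec-form ENTRYPOINT with the
# arguments passed after the image name in `docker run`. Nothing is executed.

# Assumption: the image's ENTRYPOINT is ["vllm", "serve"].
ENTRYPOINT=(vllm serve)

# Hypothetical helper: prints the command line the container would run.
effective_cmd() {
  local -a argv=("${ENTRYPOINT[@]}" "$@")
  echo "${argv[*]}"
}

# Correct: pass only the model ID and server flags after the image name.
# prints: vllm serve inclusionAI/Ling-2.6-flash --tensor-parallel-size 2
effective_cmd inclusionAI/Ling-2.6-flash --tensor-parallel-size 2

# Broken: prepending `vllm serve` duplicates the entrypoint.
# prints: vllm serve vllm serve inclusionAI/Ling-2.6-flash --tensor-parallel-size 2
effective_cmd vllm serve inclusionAI/Ling-2.6-flash --tensor-parallel-size 2
```

The duplicated vllm serve in the second line is exactly the failure mode the review describes.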

To fix this, we should revert to passing only the model ID and its arguments, as the entrypoint already handles the execution of the server.

Suggested change:
      inclusionAI/Ling-2.6-flash \
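With that suggestion applied, a TP=2 launch that relies on the image's own entrypoint might look like the bash sketch below. Only VLLM_ROCM_USE_AITER=1, the vllm/vllm-openai-rocm:v0.20.2 image, the model ID, and TP=2 come from this PR; passing TP via --tensor-parallel-size, the GPU device flags, --ipc=host, the port mapping, and the hypothetical DRY_RUN switch are assumptions, not the PR's exact command. By default the script only prints the command.

```shell
#!/usr/bin/env bash
# Sketch of a TP=2 ROCm launch; not the PR's exact command.
# Assumptions: device/IPC flags, port mapping, and the DRY_RUN switch.
set -euo pipefail

IMAGE="vllm/vllm-openai-rocm:v0.20.2"
MODEL="inclusionAI/Ling-2.6-flash"

# The model ID follows the image name directly, with no `vllm serve`,
# because the image's entrypoint already supplies the server command.
# /dev/kfd and /dev/dri expose AMD GPUs to the container.
cmd=(
  docker run --rm
  --device=/dev/kfd --device=/dev/dri
  --group-add video --ipc=host
  -p 8000:8000
  -e VLLM_ROCM_USE_AITER=1
  "$IMAGE"
  "$MODEL"
  --tensor-parallel-size 2
)

if [[ "${DRY_RUN:-1}" == 1 ]]; then
  # Print the command instead of running it.
  printf '%s\n' "${cmd[*]}"
else
  exec "${cmd[@]}"
fi
```

Running it with DRY_RUN=0 on a ROCm host would start the container; keeping the model ID immediately after the image name is what lets the entrypoint receive it as its first argument.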

