Update Mellum2 Instruct serve command #590

Open
haic0 wants to merge 1 commit into vllm-project:main from haic0:haic0/update-mellum2-instruct-command

haic0 commented Jun 29, 2026

Contributor

Summary

  • Add --trust-remote-code to the Mellum2 Instruct recipe and show the TP=1 launch command.
  • Align the recipe launch guidance with the provided vLLM serve command.

Test plan

  • Ran node scripts/build-recipes-api.mjs on the complete validated recipe update set.

Made with Cursor

Add trust-remote-code to the Mellum2 Instruct recipe and show the TP=1 launch command.

Signed-off-by: haic0 <haichzha@amd.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

vercel Bot commented Jun 29, 2026

Contributor

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| vllm-recipes | Ready | Preview, Comment | Jun 29, 2026 1:44pm |

gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request updates the configuration for the Mellum2-12B-A2.5B-Instruct model, adding hardware overrides and replacing the --max-model-len argument with --trust-remote-code in both the base arguments and the guide commands. The reviewer feedback correctly points out that --max-model-len should be retained alongside --trust-remote-code to prevent potential out-of-memory errors during KV cache allocation due to the model's large context length.

base_args:
- "--max-model-len"
- "131072"
- "--trust-remote-code"

medium

It looks like --trust-remote-code replaced --max-model-len instead of being appended after it. For models with large context lengths like Mellum2 (131,072 tokens), explicitly setting --max-model-len is important to prevent out-of-memory errors during KV cache allocation in vLLM. We should keep both arguments.

    - "--max-model-len"
    - "131072"
    - "--trust-remote-code"
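
As a rough illustration of why context length matters for KV cache sizing, the sketch below estimates the cache needed for a single sequence at 131,072 tokens. The layer count, KV head count, head dimension, and dtype width are hypothetical placeholders, not Mellum2's actual configuration, and the formula ignores vLLM's paged block allocation and any per-GPU sharding.

```shell
# Estimate KV-cache bytes for one sequence.
# Arguments: layers kv_heads head_dim dtype_bytes tokens
kv_cache_bytes() {
  # The factor of 2 covers the separate key and value tensors in each layer.
  echo $(( 2 * $1 * $2 * $3 * $4 * $5 ))
}

# HYPOTHETICAL architecture values (32 layers, 8 KV heads, head_dim 128, 2-byte dtype).
bytes=$(kv_cache_bytes 32 8 128 2 131072)
echo "KV cache for one 131072-token sequence: $(( bytes / 1024 / 1024 / 1024 )) GiB"
# With these placeholder values this prints 16 GiB.
```

Substituting the values from the model's real config would give a more meaningful number; the point is only that the per-sequence cost grows linearly with the context length.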

Comment on lines +88 to +89
--trust-remote-code \
--tensor-parallel-size 1

medium

To align with the base_args and ensure the launch command is robust, we should retain the --max-model-len 131072 parameter in the guide's serving commands.

    --max-model-len 131072 \
    --trust-remote-code \
    --tensor-parallel-size 1
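
Put together, a launch command that keeps both flags might look like the sketch below. The model identifier is a placeholder (this review names only Mellum2-12B-A2.5B-Instruct, not the full repository path), and the script only prints the command unless RUN=1 is set.

```shell
# Placeholder: "your-org" is hypothetical; substitute the real repository path.
MODEL="${MODEL:-your-org/Mellum2-12B-A2.5B-Instruct}"

# Keep --max-model-len alongside --trust-remote-code, with TP=1.
VLLM_CMD="vllm serve ${MODEL} --max-model-len 131072 --trust-remote-code --tensor-parallel-size 1"

# Print the command for review; launch only when explicitly requested.
echo "${VLLM_CMD}"
if [ "${RUN:-0}" = "1" ]; then
  ${VLLM_CMD}
fi
```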

Comment on lines +93 to +94
--trust-remote-code \
--tensor-parallel-size 1 \

medium

Similarly, we should retain --max-model-len 131072 in the tool-calling launch command example.

    --max-model-len 131072 \
    --trust-remote-code \
    --tensor-parallel-size 1 \
