Update Mellum2 Instruct serve command #590
Add trust-remote-code to the Mellum2 Instruct recipe and show the TP=1 launch command.

Signed-off-by: haic0 <haichzha@amd.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Code Review
This pull request updates the configuration for the Mellum2-12B-A2.5B-Instruct model, adding hardware overrides and replacing the --max-model-len argument with --trust-remote-code in both the base arguments and the guide commands. The reviewer feedback correctly points out that --max-model-len should be retained alongside --trust-remote-code to prevent potential out-of-memory errors during KV cache allocation due to the model's large context length.
 base_args:
-  - "--max-model-len"
-  - "131072"
+  - "--trust-remote-code"
It looks like --max-model-len was accidentally replaced by --trust-remote-code, rather than kept with --trust-remote-code appended after it. For models with large context lengths like Mellum2 (131,072 tokens), explicitly setting --max-model-len is important to prevent out-of-memory errors during KV cache allocation in vLLM. We should keep both arguments.
- "--max-model-len"
- "131072"
- "--trust-remote-code"

--trust-remote-code \
--tensor-parallel-size 1

--trust-remote-code \
--tensor-parallel-size 1 \
Summary
Test plan
node scripts/build-recipes-api.mjs on the complete validated recipe update set.
Made with Cursor
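The regression flagged in the review, where one flag was dropped while another was added, is the kind of thing a small check on the recipe's argument list could catch. The following is only a sketch: the `missing_flags` helper and the idea of a required-flag list are hypothetical and are not part of the repository's build script.

```python
REQUIRED_FLAGS = ("--max-model-len", "--trust-remote-code")


def missing_flags(base_args, required=REQUIRED_FLAGS):
    """Return the required flags that do not appear in a recipe's base_args list."""
    return [flag for flag in required if flag not in base_args]


if __name__ == "__main__":
    # base_args as changed in this PR: --max-model-len was replaced.
    as_submitted = ["--trust-remote-code"]
    # base_args as suggested in the review: both arguments kept.
    as_suggested = ["--max-model-len", "131072", "--trust-remote-code"]

    print("submitted, missing:", missing_flags(as_submitted))
    print("suggested, missing:", missing_flags(as_suggested))
```

A check like this only verifies that the flags are present. It does not validate the value that follows --max-model-len.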