fix(google/gemma-4-26b-a4b-it): add --max-num-batched-tokens to single-GPU command by mahadrehmann · Pull Request #572 · vllm-project/recipes

mahadrehmann · 2026-06-22T11:30:00Z

Summary

Fixes #441

The single-GPU BF16 command for gemma-4-26B-A4B-it was missing
--max-num-batched-tokens, causing this error on 1× A100/H100:

ValueError: Chunked MM input disabled but max_tokens_per_mm_item (2496)
is larger than max_num_batched_tokens (2048).
Please increase max_num_batched_tokens.

This happens because the model's multimodal token budget (2496) exceeds
vLLM's default max_num_batched_tokens of 2048. Added
--max-num-batched-tokens 4096 to all single-GPU command blocks to clear
this threshold.
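The constraint behind this error can be sketched in a few lines. This is a minimal illustration only, not vLLM's actual implementation: the names `check_mm_budget`, `fits`, and the `chunked_mm_input` parameter are hypothetical, and only the numbers 2496, 2048, and 4096 and the error text come from the report above.

```python
def check_mm_budget(max_tokens_per_mm_item: int,
                    max_num_batched_tokens: int,
                    chunked_mm_input: bool = False) -> None:
    """Hypothetical stand-in for the validation that produced the error above."""
    # With chunking disabled, one multimodal item must fit in a single batch.
    if not chunked_mm_input and max_tokens_per_mm_item > max_num_batched_tokens:
        raise ValueError(
            "Chunked MM input disabled but max_tokens_per_mm_item "
            f"({max_tokens_per_mm_item}) is larger than "
            f"max_num_batched_tokens ({max_num_batched_tokens}). "
            "Please increase max_num_batched_tokens."
        )


def fits(max_tokens_per_mm_item: int, max_num_batched_tokens: int) -> bool:
    """Return True when the (hypothetical) check passes, False when it raises."""
    try:
        check_mm_budget(max_tokens_per_mm_item, max_num_batched_tokens)
    except ValueError:
        return False
    return True


print(fits(2496, 2048))  # default limit reported in this PR
print(fits(2496, 4096))  # limit set by this PR
```

Under these assumptions, any value of at least 2496 would pass the check; 4096 simply leaves headroom above it.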

Changes

models/Google/gemma-4-26B-A4B-it.yaml: added --max-num-batched-tokens 4096 to:
- ### 26B MoE on 1x A100/H100 (BF16) command block
- ### Full-Featured Server Launch command block
- ### Docker (NVIDIA) command block
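
One way to confirm that each command block carries the flag is a small scan over the command strings. This is a sketch under stated assumptions: the helper `batched_tokens`, the sample commands, and the model ID `google/gemma-4-26B-A4B-it` are illustrative, and the real layout of the recipe YAML is not shown in this PR.

```python
import shlex
from typing import Optional

FLAG = "--max-num-batched-tokens"
MM_BUDGET = 2496  # max_tokens_per_mm_item from the error message in this PR


def batched_tokens(command: str) -> Optional[int]:
    """Return the value passed to --max-num-batched-tokens, or None if absent."""
    tokens = shlex.split(command)
    for i, tok in enumerate(tokens):
        # Form 1: "--max-num-batched-tokens 4096"
        if tok == FLAG and i + 1 < len(tokens):
            return int(tokens[i + 1])
        # Form 2: "--max-num-batched-tokens=4096"
        if tok.startswith(FLAG + "="):
            return int(tok.split("=", 1)[1])
    return None


# Hypothetical commands; the real ones live in the recipe file.
commands = {
    "with flag": "vllm serve google/gemma-4-26B-A4B-it --max-num-batched-tokens 4096",
    "without flag": "vllm serve google/gemma-4-26B-A4B-it",
}

for name, cmd in commands.items():
    value = batched_tokens(cmd)
    ok = value is not None and value >= MM_BUDGET
    print(f"{name}: {'ok' if ok else 'flag missing or below ' + str(MM_BUDGET)}")
```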

Co-authored-by: @muhammadfawaz1

vercel · 2026-06-22T11:30:09Z

Project	Deployment	Actions	Updated (UTC)
vllm-recipes	Ready	Preview, Comment	Jun 22, 2026 11:38am

gemini-code-assist

Code Review

This pull request updates the deployment guide for the Google Gemma-4-26B-A4B-it model by adding the --max-num-batched-tokens 4096 flag to the basic vllm serve command. The reviewer points out that other deployment commands in the same file, as well as in the corresponding markdown documentation, should also be updated with this flag to prevent users from encountering the same error across different setup configurations.

Copilot

Pull request overview

This PR updates the Gemma 4 26B MoE single-GPU (A100/H100, BF16) launch command to include an explicit --max-num-batched-tokens value, preventing a vLLM runtime error when the multimodal encoder token budget exceeds the default batch token limit.

Changes:

Added --max-num-batched-tokens 4096 to the “26B MoE on 1x A100/H100 (BF16)” vllm serve command in the Gemma 4 26B A4B IT recipe.

Copilot

Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.

mahadrehmann · 2026-06-29T04:41:56Z

Friendly ping: all checks are passing and there are no conflicts. I would appreciate a review when you get a chance. @Isotr0py @ywang96 @jeejeelee

fix(google/gemma-4-26b-a4b-it): add --max-num-batched-tokens to singl…

c32df30

…e-GPU command Co-authored-by: muhammadfawaz1 <135441198+muhammadfawaz1@users.noreply.github.com> Signed-off-by: mahadrehmann <mahadrehman04@gmail.com>

Copilot AI review requested due to automatic review settings June 22, 2026 11:30

Copilot started reviewing on behalf of mahadrehmann June 22, 2026 11:30

gemini-code-assist Bot reviewed Jun 22, 2026

Comment thread models/Google/gemma-4-26B-A4B-it.yaml

Copilot AI reviewed Jun 22, 2026

vercel Bot deployed to Preview June 22, 2026 11:31

fix(google/gemma-4-26b-a4b-it): add --max-num-batched-tokens to all s…

195ed25

…ingle-GPU commands Co-authored-by: muhammadfawaz1 <135441198+muhammadfawaz1@users.noreply.github.com> Signed-off-by: mahadrehmann <mahadrehman04@gmail.com>

mahadrehmann requested a review from Copilot June 22, 2026 11:37

Copilot started reviewing on behalf of mahadrehmann June 22, 2026 11:38

vercel Bot deployed to Preview June 22, 2026 11:38

Copilot AI reviewed Jun 22, 2026

Comment thread models/Google/gemma-4-26B-A4B-it.yaml

Comment thread models/Google/gemma-4-26B-A4B-it.yaml

fix(google/gemma-4-26b-a4b-it): add --max-num-batched-tokens to single-GPU command #572
mahadrehmann wants to merge 2 commits into vllm-project:main from mahadrehmann:fix/gemma4-26b-max-batched-tokens
