
Conversation

@tzhouam (Collaborator) commented Dec 1, 2025


Purpose

Adjust the Qwen3 stage GPU-utilization settings to enable running on 2 H200 GPUs.
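A minimal sketch of the idea behind the change (not the actual diff from this PR): when several pipeline stages are co-located on the same GPU, each stage's memory-utilization fraction has to be lowered so the stages sharing a device do not oversubscribe its memory. The `stage_gpu_ids` placement and the `0.9` headroom factor below are illustrative assumptions, not values taken from the PR.

```python
# Hypothetical helper: split a GPU's memory budget evenly among the
# pipeline stages placed on it. With 3 stages on 2 GPUs, the two stages
# sharing a device each get half of the usable headroom.
from collections import Counter


def per_stage_gpu_utilization(stage_gpu_ids, headroom=0.9):
    """Return a gpu_memory_utilization fraction for each stage.

    stage_gpu_ids[i] is the GPU index hosting stage i; `headroom` is the
    total fraction of a device's memory we allow the stages to claim.
    """
    stages_per_gpu = Counter(stage_gpu_ids)
    return [round(headroom / stages_per_gpu[gpu], 3) for gpu in stage_gpu_ids]


# Example placement: Stage-0 and Stage-2 share GPU 0, Stage-1 gets GPU 1.
print(per_stage_gpu_utilization([0, 1, 0]))  # → [0.45, 0.9, 0.45]
```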

Test Plan

Ran the Qwen3 multi-stage pipeline end-to-end on 2 H200 GPUs; per-stage output and the timing summary are pasted below.

Test Result

[Stage-0] Max batch size: 1
--------------------------------
[Stage-0] Received batch size=1, request_ids=[0]
--------------------------------
(Worker_TP1 pid=1341257) INFO:vllm_omni.model_executor.layers.mrope:Multimodal token idx changed!
(Worker_TP0 pid=1340738) INFO:vllm_omni.model_executor.layers.mrope:Multimodal token idx changed!
[Stage-0] Generate done: batch=1, req_ids=[0], gen_ms=15413.5
[Stage-1] Max batch size: 1
--------------------------------
[Stage-1] Received batch size=1, request_ids=[0]
--------------------------------
(Worker pid=1341218) /mnt/ztc_vllm/vllm-omni/vllm_omni/worker/gpu_model_runner.py:207: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:203.)
(Worker pid=1341218)   info_dict[k] = torch.from_numpy(arr)
(Worker pid=1341218) INFO:vllm_omni.model_executor.layers.mrope:Multimodal token idx changed!
[Stage-1] Generate done: batch=1, req_ids=[0], gen_ms=33673.4
[Stage-2] Max batch size: 1
--------------------------------
[Stage-2] Received batch size=1, request_ids=[0]
--------------------------------
(Worker pid=1341463) INFO:vllm_omni.model_executor.layers.mrope:Multimodal token idx changed!
[Stage-2] Generate done: batch=1, req_ids=[0], gen_ms=179.9
INFO:vllm_omni.entrypoints.omni_llm:[Summary] {'e2e_requests': 1, 'e2e_total_time_ms': 50625.213623046875, 'e2e_sum_time_ms': 50624.441146850586, 'e2e_total_tokens': 0, 'e2e_avg_time_per_request_ms': 50624.441146850586, 'e2e_avg_tokens_per_s': 0.0, 'wall_time_ms': 50625.213623046875, 'final_stage_id': 2, 'stages': [{'stage_id': 0, 'requests': 1, 'tokens': 68, 'total_time_ms': 15835.153102874756, 'avg_time_per_request_ms': 15835.153102874756, 'avg_tokens_per_s': 4.294243292643322}, {'stage_id': 1, 'requests': 1, 'tokens': 278, 'total_time_ms': 33915.329456329346, 'avg_time_per_request_ms': 33915.329456329346, 'avg_tokens_per_s': 8.196883369744743}, {'stage_id': 2, 'requests': 1, 'tokens': 0, 'total_time_ms': 190.22488594055176, 'avg_time_per_request_ms': 190.22488594055176, 'avg_tokens_per_s': 0.0}], 'transfers': [{'from_stage': 0, 'to_stage': 1, 'samples': 1, 'total_bytes': 100728630, 'total_time_ms': 208.88733863830566, 'tx_mbps': 3857.7208425031245, 'rx_samples': 1, 'rx_total_bytes': 100728630, 'rx_total_time_ms': 195.5254077911377, 'rx_mbps': 4121.352048838558, 'total_samples': 1, 'total_transfer_time_ms': 414.17789459228516, 'total_mbps': 1945.61093317946}, {'from_stage': 1, 'to_stage': 2, 'samples': 1, 'total_bytes': 12751, 'total_time_ms': 0.4913806915283203, 'tx_mbps': 207.59464455701115, 'rx_samples': 1, 'rx_total_bytes': 12751, 'rx_total_time_ms': 0.1385211944580078, 'rx_mbps': 736.4071642547332, 'total_samples': 1, 'total_transfer_time_ms': 1.96075439453125, 'total_mbps': 52.02487383657588}]}
Request ID: 0, Text saved to output_audio/00000.txt
Request ID: 0, Saved audio to output_audio/output_0.wav
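Aside on the `UserWarning` in the Stage-1 log: `torch.from_numpy()` was handed a read-only NumPy array (for example, one built with `np.frombuffer`). A common fix is to copy the array first so the resulting tensor owns writable memory. A minimal reproduction and fix, not the project's code:

```python
import numpy as np
import torch

# np.frombuffer returns a read-only view over the bytes, which is what
# triggers the "non-writable tensors" warning in torch.from_numpy.
buf = b"\x00\x00\x80\x3f" * 4                # four float32 values of 1.0
arr = np.frombuffer(buf, dtype=np.float32)
assert not arr.flags.writeable

# Copying first gives torch a writable array, so no warning and writes
# to the tensor are well-defined.
t = torch.from_numpy(arr.copy())
t[0] = 2.0
```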


@Gaohan123 (Collaborator) left a comment


@congw729 please help to test on H800. Thanks!

@hsliuustc0106 (Collaborator)

LGTM, approved.

@Gaohan123 (Collaborator)

It has some problems. Please don't merge now.
