
Conversation

@tzhouam (Collaborator) commented Dec 1, 2025


Purpose

Adjust the Qwen3 stage GPU-utilization settings to enable running on 2 H200 GPUs.
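A minimal sketch of the idea behind the change (not the actual diff from this PR): when several pipeline stages are co-located on the same GPU, each stage's memory-utilization fraction has to be lowered so the stages sharing a device do not oversubscribe its memory. The `stage_gpu_ids` placement and the `0.9` headroom factor below are illustrative assumptions, not values taken from the PR.

```python
# Hypothetical helper: split a GPU's memory budget evenly among the
# pipeline stages placed on it. With 3 stages on 2 GPUs, the two stages
# sharing a device each get half of the usable headroom.
from collections import Counter


def per_stage_gpu_utilization(stage_gpu_ids, headroom=0.9):
    """Return a gpu_memory_utilization fraction for each stage.

    stage_gpu_ids[i] is the GPU index hosting stage i; `headroom` is the
    total fraction of a device's memory we allow the stages to claim.
    """
    stages_per_gpu = Counter(stage_gpu_ids)
    return [round(headroom / stages_per_gpu[gpu], 3) for gpu in stage_gpu_ids]


# Example placement: Stage-0 and Stage-2 share GPU 0, Stage-1 gets GPU 1.
print(per_stage_gpu_utilization([0, 1, 0]))  # → [0.45, 0.9, 0.45]
```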

Test Plan

Ran the Qwen3 multi-stage pipeline end-to-end on 2 H200 GPUs; per-stage output and the timing summary are pasted below.

Test Result

[Stage-0] Max batch size: 1
--------------------------------
[Stage-0] Received batch size=1, request_ids=[0]
--------------------------------
(Worker_TP1 pid=1341257) INFO:vllm_omni.model_executor.layers.mrope:Multimodal token idx changed!
(Worker_TP0 pid=1340738) INFO:vllm_omni.model_executor.layers.mrope:Multimodal token idx changed!
[Stage-0] Generate done: batch=1, req_ids=[0], gen_ms=15413.5
[Stage-1] Max batch size: 1
--------------------------------
[Stage-1] Received batch size=1, request_ids=[0]
--------------------------------
(Worker pid=1341218) /mnt/ztc_vllm/vllm-omni/vllm_omni/worker/gpu_model_runner.py:207: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:203.)
(Worker pid=1341218)   info_dict[k] = torch.from_numpy(arr)
(Worker pid=1341218) INFO:vllm_omni.model_executor.layers.mrope:Multimodal token idx changed!
[Stage-1] Generate done: batch=1, req_ids=[0], gen_ms=33673.4
[Stage-2] Max batch size: 1
--------------------------------
[Stage-2] Received batch size=1, request_ids=[0]
--------------------------------
(Worker pid=1341463) INFO:vllm_omni.model_executor.layers.mrope:Multimodal token idx changed!
[Stage-2] Generate done: batch=1, req_ids=[0], gen_ms=179.9
INFO:vllm_omni.entrypoints.omni_llm:[Summary] {'e2e_requests': 1, 'e2e_total_time_ms': 50625.213623046875, 'e2e_sum_time_ms': 50624.441146850586, 'e2e_total_tokens': 0, 'e2e_avg_time_per_request_ms': 50624.441146850586, 'e2e_avg_tokens_per_s': 0.0, 'wall_time_ms': 50625.213623046875, 'final_stage_id': 2, 'stages': [{'stage_id': 0, 'requests': 1, 'tokens': 68, 'total_time_ms': 15835.153102874756, 'avg_time_per_request_ms': 15835.153102874756, 'avg_tokens_per_s': 4.294243292643322}, {'stage_id': 1, 'requests': 1, 'tokens': 278, 'total_time_ms': 33915.329456329346, 'avg_time_per_request_ms': 33915.329456329346, 'avg_tokens_per_s': 8.196883369744743}, {'stage_id': 2, 'requests': 1, 'tokens': 0, 'total_time_ms': 190.22488594055176, 'avg_time_per_request_ms': 190.22488594055176, 'avg_tokens_per_s': 0.0}], 'transfers': [{'from_stage': 0, 'to_stage': 1, 'samples': 1, 'total_bytes': 100728630, 'total_time_ms': 208.88733863830566, 'tx_mbps': 3857.7208425031245, 'rx_samples': 1, 'rx_total_bytes': 100728630, 'rx_total_time_ms': 195.5254077911377, 'rx_mbps': 4121.352048838558, 'total_samples': 1, 'total_transfer_time_ms': 414.17789459228516, 'total_mbps': 1945.61093317946}, {'from_stage': 1, 'to_stage': 2, 'samples': 1, 'total_bytes': 12751, 'total_time_ms': 0.4913806915283203, 'tx_mbps': 207.59464455701115, 'rx_samples': 1, 'rx_total_bytes': 12751, 'rx_total_time_ms': 0.1385211944580078, 'rx_mbps': 736.4071642547332, 'total_samples': 1, 'total_transfer_time_ms': 1.96075439453125, 'total_mbps': 52.02487383657588}]}
Request ID: 0, Text saved to output_audio/00000.txt
Request ID: 0, Saved audio to output_audio/output_0.wav
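Aside on the `UserWarning` in the Stage-1 log: `torch.from_numpy()` was handed a read-only NumPy array (for example, one built with `np.frombuffer`). A common fix is to copy the array first so the resulting tensor owns writable memory. A minimal reproduction and fix, not the project's code:

```python
import numpy as np
import torch

# np.frombuffer returns a read-only view over the bytes, which is what
# triggers the "non-writable tensors" warning in torch.from_numpy.
buf = b"\x00\x00\x80\x3f" * 4                # four float32 values of 1.0
arr = np.frombuffer(buf, dtype=np.float32)
assert not arr.flags.writeable

# Copying first gives torch a writable array, so no warning and writes
# to the tensor are well-defined.
t = torch.from_numpy(arr.copy())
t[0] = 2.0
```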


@Gaohan123 (Collaborator) left a comment


@congw729 please help to test on H800. Thanks!

@hsliuustc0106 (Collaborator)

LGTM, approved.

@Gaohan123 (Collaborator)

It has some problems. Please don't merge now.
