[E2E] Add Qwen2.5-Omni model test with OmniRunner #168
base: main
Conversation
Gaohan123 left a comment
Please discuss with PR #174 to unify the env setup.
| "--strict-markers", | ||
| "--strict-config", | ||
| "--cov=vllm_omni", | ||
| "--cov-report=term-missing", |
What is the removal for?
Sorry, it was removed by mistake. Recovered now.
tests/omni/test_qwen_omni.py
Outdated
from vllm.assets.video import VideoAsset
from vllm.multimodal.image import convert_image_mode


models = ["Qwen/Qwen2.5-Omni-7B"]
Please change it to the 3B model to make the test lighter.
Yeah. It will be done later. Thanks. I'm still trying to fix the OOM bug on NPU when running this test.
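For reference, the eventual switch amounts to a one-line change to the parametrized model list (a sketch; the 3B repository id is assumed to follow the same Hugging Face naming pattern as the 7B checkpoint):

```python
# Lighter checkpoint for CI; "Qwen/Qwen2.5-Omni-3B" is the assumed 3B repo id
models = ["Qwen/Qwen2.5-Omni-3B"]
```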
It's ready on GPU. Let me fix the OOM error on NPU.
NPU works now:
BTW, I think we need a different stage YAML config for the GPU CI's e2e test, because the default CI machine is an L4 24GB GPU. Here is the YAML I used to run it, Qwen2_5_omni_3b_RTX3090.yaml:

# stage config for running qwen2.5-omni with architecture of OmniLLM.
# The following config has been verified on 1x 24GB RTX3090 GPU.
stage_args:
  - stage_id: 0
    runtime:
      process: true # Run this stage in a separate process
      devices: "0" # Visible devices for this stage (CUDA_VISIBLE_DEVICES/torch.cuda.set_device)
      max_batch_size: 1
    engine_args:
      model_stage: thinker
      model_arch: Qwen2_5OmniForConditionalGeneration
      worker_cls: vllm_omni.worker.gpu_ar_worker.GPUARWorker
      scheduler_cls: vllm_omni.core.sched.scheduler.OmniScheduler
      max_model_len: 8192
      max_num_seqs: 2
      gpu_memory_utilization: 0.55
      enforce_eager: true # Now we only support eager mode
      trust_remote_code: true
      engine_output_type: latent
      enable_prefix_caching: false
      is_comprehension: true
      final_output: true
      final_output_type: text
    default_sampling_params:
      temperature: 0.0
      top_p: 1.0
      top_k: -1
      max_tokens: 2048
      seed: 42
      detokenize: True
      repetition_penalty: 1.1
  - stage_id: 1
    runtime:
      process: true
      devices: "0"
      max_batch_size: 1
    engine_args:
      model_stage: talker
      model_arch: Qwen2_5OmniForConditionalGeneration
      worker_cls: vllm_omni.worker.gpu_ar_worker.GPUARWorker
      scheduler_cls: vllm_omni.core.sched.scheduler.OmniScheduler
      max_model_len: 8192
      max_num_seqs: 2
      gpu_memory_utilization: 0.32
      enforce_eager: true
      trust_remote_code: true
      enable_prefix_caching: false
      engine_output_type: latent
      engine_input_source: [0]
      custom_process_input_func: vllm_omni.model_executor.stage_input_processors.qwen2_5_omni.thinker2talker
    default_sampling_params:
      temperature: 0.9
      top_p: 0.8
      top_k: 40
      max_tokens: 2048
      seed: 42
      detokenize: True
      repetition_penalty: 1.05
      stop_token_ids: [8294]
  - stage_id: 2
    runtime:
      process: true
      devices: "0" # Example: use a different GPU than the previous stage; use "0" if single GPU
      max_batch_size: 1
    engine_args:
      model_stage: code2wav
      model_arch: Qwen2_5OmniForConditionalGeneration
      worker_cls: vllm_omni.worker.gpu_diffusion_worker.GPUDiffusionWorker
      scheduler_cls: vllm_omni.core.sched.diffusion_scheduler.DiffusionScheduler
      gpu_memory_utilization: 0.125
      enforce_eager: true
      trust_remote_code: true
      enable_prefix_caching: false
      engine_output_type: audio
      engine_input_source: [1]
      final_output: true
      final_output_type: audio
    default_sampling_params:
      temperature: 0.0
      top_p: 1.0
      top_k: -1
      max_tokens: 2048
      seed: 42
      detokenize: True
      repetition_penalty: 1.1

# Top-level runtime config (concise): default windows and stage edges
runtime:
  enabled: true
  defaults:
    window_size: -1 # Simplified: trigger downstream only after full upstream completion
    max_inflight: 1 # Simplified: process serially within each stage
  edges:
    - from: 0 # thinker → talker: trigger only after receiving full input (-1)
      to: 1
      window_size: -1
    - from: 1 # talker → code2wav: trigger only after receiving full input (-1)
      to: 2
      window_size: -1
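As a quick sanity check on the single-GPU budget implied by the config above (illustrative arithmetic only, not project tooling), the three stages nearly saturate the 24 GB card:

```python
# gpu_memory_utilization fractions taken from the stage config above
stage_fractions = {"thinker": 0.55, "talker": 0.32, "code2wav": 0.125}
total = sum(stage_fractions.values())
print(f"total budget: {total:.3f}")  # 0.995 of the visible device
assert total < 1.0, "stages would oversubscribe the single GPU"
```

This leaves only a thin margin for activations and the profile run, which is consistent with the OOM discussion below.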
Thanks. Added the CI stage config now.
tests/omni/test_qwen_omni.py
Outdated
@pytest.mark.core_model
@pytest.mark.parametrize("model", models)
@pytest.mark.parametrize("max_tokens", [2048])
def test_mixed_modalities_to_audio(omni_runner, model: str, max_tokens: int) -> None:
Suggested change:
- def test_mixed_modalities_to_audio(omni_runner, model: str, max_tokens: int) -> None:
+ def test_mixed_modalities_to_audio(omni_runner: type[OmniRunner], model: str, max_tokens: int) -> None:
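For context, annotating the fixture as `type[OmniRunner]` matches the pattern where the fixture yields the runner class and each test instantiates it itself. A minimal sketch of that usage, with the constructor arguments and context-manager behaviour treated as assumptions rather than the project's actual API:

```python
# Sketch only: OmniRunner's real constructor and methods may differ from this.
def test_mixed_modalities_to_audio(omni_runner: "type[OmniRunner]", model: str, max_tokens: int) -> None:
    with omni_runner(model) as runner:  # fixture yields the class; the test builds its own instance
        ...  # feed mixed image/video/audio prompts and assert on the generated audio/text
```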
Isotr0py left a comment
Confirmed the test can pass on RTX 3090 as well:
(EngineCore_DP0 pid=1413214) INFO 12-05 00:08:10 [__init__.py:381] Cudagraph is disabled under eager mode
INFO:vllm_omni.entrypoints.omni_llm:[Orchestrator] Stage-2 reported ready
INFO:vllm_omni.entrypoints.omni_llm:[Orchestrator] All stages initialized successfully
[Stage-0] Max batch size: 1
--------------------------------
[Stage-0] Received batch size=1, request_ids=[0]
--------------------------------
[Stage-0] Generate done: batch=1, req_ids=[0], gen_ms=6246.4
[Stage-1] Max batch size: 1
--------------------------------
[Stage-1] Received batch size=1, request_ids=[0]
--------------------------------
(EngineCore_DP0 pid=1412210) /home/mozf/develop-projects/vllm-omni/vllm_omni/worker/gpu_model_runner.py:207: UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors. This means writing to this tensor will result in undefined behavior. You may want to copy the array to protect its data or make it writable before converting it to a tensor. This type of warning will be suppressed for the rest of this program. (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:203.)
(EngineCore_DP0 pid=1412210) info_dict[k] = torch.from_numpy(arr)
[Stage-1] Generate done: batch=1, req_ids=[0], gen_ms=17460.2
[Stage-2] Max batch size: 1
--------------------------------
[Stage-2] Received batch size=1, request_ids=[0]
--------------------------------
(EngineCore_DP0 pid=1413214) INFO:vllm_omni.model_executor.models.qwen2_5_omni.qwen2_5_omni:Currently, we do not use the chunked process, we only use the token2wav.process_chunk for the whole sequence. The stream mode will be implemented in the future.
[Stage-2] Generate done: batch=1, req_ids=[0], gen_ms=15025.2
INFO:vllm_omni.entrypoints.omni_llm:[Summary] {'e2e_requests': 1, 'e2e_total_time_ms': 39295.241832733154, 'e2e_sum_time_ms': 39294.82388496399, 'e2e_total_tokens': 0, 'e2e_avg_time_per_request_ms': 39294.82388496399, 'e2e_avg_tokens_per_s': 0.0, 'wall_time_ms': 39295.241832733154, 'final_stage_id': 2, 'stages': [{'stage_id': 0, 'requests': 1, 'tokens': 72, 'total_time_ms': 6459.305763244629, 'avg_time_per_request_ms': 6459.305763244629, 'avg_tokens_per_s': 11.146708739149865}, {'stage_id': 1, 'requests': 1, 'tokens': 1154, 'total_time_ms': 17631.80184364319, 'avg_time_per_request_ms': 17631.80184364319, 'avg_tokens_per_s': 65.4499188587497}, {'stage_id': 2, 'requests': 1, 'tokens': 0, 'total_time_ms': 15033.496141433716, 'avg_time_per_request_ms': 15033.496141433716, 'avg_tokens_per_s': 0.0}], 'transfers': [{'from_stage': 0, 'to_stage': 1, 'samples': 1, 'total_bytes': 45764674, 'total_time_ms': 97.99075126647949, 'tx_mbps': 3736.2443625354754, 'rx_samples': 1, 'rx_total_bytes': 45764674, 'rx_total_time_ms': 124.15766716003418, 'rx_mbps': 2948.810173181569, 'total_samples': 1, 'total_transfer_time_ms': 223.24252128601074, 'total_mbps': 1639.9984639617237}, {'from_stage': 1, 'to_stage': 2, 'samples': 1, 'total_bytes': 3486, 'total_time_ms': 0.5724430084228516, 'tx_mbps': 48.717513516034984, 'rx_samples': 1, 'rx_total_bytes': 3486, 'rx_total_time_ms': 0.07987022399902344, 'rx_mbps': 349.1664177671642, 'total_samples': 1, 'total_transfer_time_ms': 1.8804073333740234, 'total_mbps': 14.830829206542411}]}
[rank0]:[W1205 00:08:51.904617626 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank0]:[W1205 00:08:51.907913413 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
[rank0]:[W1205 00:08:51.911772229 ProcessGroupNCCL.cpp:1538] Warning: WARNING: destroy_process_group() was not called before program exit, which can leak resources. For more info, please see https://pytorch.org/docs/stable/distributed.html#shutdown (function operator())
PASSED
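Reading the per-stage numbers out of the [Summary] line above, the three stage times account for almost all of the end-to-end wall time; the small remainder is inter-stage transfer and orchestration overhead:

```python
# Per-stage generation times from the summary log above (milliseconds)
stage_ms = {"thinker": 6459.3, "talker": 17631.8, "code2wav": 15033.5}
print(sum(stage_ms.values()))  # ~39124.6 ms vs. ~39295.2 ms e2e wall time
```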
OOM happens. Should we continue to reduce gpu_memory_utilization?
The OOM is happening at mm profiling during the profile run; let's skip mm profiling for the single-GPU test for now.
The omni e2e test finally passes now. 😭
Gaohan123 left a comment
LGTM. It's great that it can run with such limited resources!
Updated the class names. Hope this will be the final commit. 🥹
Purpose
Related to #165. Add Qwen2.5-Omni model test with OmniRunner.
Test Plan
pytest -sv tests/omni/test_qwen_omni.py
Test Result
Pass.
Essential Elements of an Effective PR Description Checklist
(Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
BEFORE SUBMITTING, PLEASE READ https://github.com/vllm-project/vllm-omni/blob/main/CONTRIBUTING.md (anything written below this line will be removed by GitHub Actions)