Conversation

@linyueqian (Contributor) commented Dec 3, 2025

Purpose

Add a single-stage configuration example for Qwen3-Omni-MoE-Thinking models (e.g., Qwen3-Omni-30B-A3B-Thinking) that only
have the thinker component and produce text-only output (no audio synthesis).

Test Plan

N/A (config file only)

Test Result

Verified on 2x H200 GPUs with tensor_parallel_size=2.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines 21 to 24:

```yaml
engine_output_type: text
distributed_executor_backend: "mp"
enable_prefix_caching: false
hf_config_name: thinker_config
```


**P1:** Avoid nested `thinker_config` for Thinking checkpoints

This YAML sets `hf_config_name: thinker_config`, which makes `OmniModelConfig.draw_hf_text_config` (`vllm_omni/config/model.py:79-85`) dereference `hf_config.thinker_config` before building the model. The Qwen3-Omni-*Thinking checkpoints you are targeting only ship the thinker config itself (`Qwen3OmniMoeThinkerConfig`) and do not wrap it in a `thinker_config` attribute, so loading this stage file against those models will raise `AttributeError` and the config cannot be used. Drop the `hf_config_name` indirection (and use the thinker architecture) so thinker-only checkpoints load successfully.

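Following the review suggestion, a corrected single-stage fragment might drop the indirection entirely. This is a sketch under the review's assumptions, not the merged file:

```yaml
# Hypothetical corrected fragment: a Thinking checkpoint's config *is* the
# thinker config, so no hf_config_name indirection is needed.
engine_output_type: text
distributed_executor_backend: "mp"
enable_prefix_caching: false
# hf_config_name: thinker_config  # removed: no nested thinker_config on Thinking checkpoints
```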

@ywang96 (Member) left a comment


Actually I do have a question - looks like right now we're using model_type for huggingface to identify the stage config yaml.

```python
# Fall back to default config
stage_config_file = f"vllm_omni/model_executor/stage_configs/{model_type}.yaml"
stage_config_path = PROJECT_ROOT / stage_config_file
if not os.path.exists(stage_config_path):
    raise FileNotFoundError(f"Stage config file {stage_config_path} not found")
stage_configs = load_stage_configs_from_yaml(config_path=str(stage_config_path))
return stage_configs
```

How does this work for this model? `qwen3_omni_moe_thinking` isn't a valid `model_type`, right? https://huggingface.co/Qwen/Qwen3-Omni-30B-A3B-Thinking/blob/main/config.json#L10
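The concern can be made concrete with a tiny sketch (illustrative only; the path layout follows the snippet above): because the Thinking checkpoint's config.json reports the base `model_type`, the naive fallback resolves to the base YAML, never a thinking-specific one.

```python
# Illustrative sketch of the fallback lookup quoted above.
# The Thinking checkpoint's config.json reports the base model_type,
# so the derived stage-config path is the base YAML, not a thinking variant.
model_type = "qwen3_omni_moe"
stage_config_file = f"vllm_omni/model_executor/stage_configs/{model_type}.yaml"
print(stage_config_file)
# -> vllm_omni/model_executor/stage_configs/qwen3_omni_moe.yaml
```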

@linyueqian (Contributor, Author) replied:
> Actually I do have a question - looks like right now we're using `model_type` for huggingface to identify the stage config yaml. [...] How does this work for this model? `qwen3_omni_moe_thinking` isn't a valid `model_type`, right?

I added a small check in utils.py. Would that work?
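A minimal sketch of what such a check might look like (the hint names, helper function, and architecture string are assumptions for illustration, not the actual PR code):

```python
# Hypothetical sketch: route thinker-only checkpoints to a dedicated stage
# config even though they share the base model_type. The specific hints
# checked here are illustrative assumptions.
def resolve_stage_config_name(
    model_type: str, model_path: str, architectures: list[str]
) -> str:
    """Pick a stage-config name, special-casing thinker-only checkpoints."""
    is_thinking_only = model_type == "qwen3_omni_moe" and (
        "thinking" in model_path.lower()  # e.g. Qwen3-Omni-30B-A3B-Thinking
        or any("Thinker" in arch for arch in architectures)
    )
    return "qwen3_omni_moe_thinking" if is_thinking_only else model_type

print(resolve_stage_config_name(
    "qwen3_omni_moe",
    "Qwen/Qwen3-Omni-30B-A3B-Thinking",
    ["Qwen3OmniMoeThinkerForConditionalGeneration"],
))
# -> qwen3_omni_moe_thinking
```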

@ywang96 (Member) left a comment


I think this is a reasonable change for now! Please fix the pre-commit, though.

@Gaohan123 (Collaborator) left a comment


Is it possible to use the Thinking mode for end-to-end audio generation?

```python
# (no talker/code2wav configs) but reuse the base qwen3_omni_moe model_type.
# Detect this using multiple hints so users don't need to manually rewrite
# the stage config path.
is_qwen3_omni_moe_thinking = (
```
@Gaohan123 (Collaborator) commented on this hunk:

Is it possible to set this up just in the stage config? Here it is a little bit model-specific inside the general utils.

@linyueqian (Contributor, Author) replied:

If we only add the YAML without this routing logic, vLLM will automatically pick `qwen3_omni_moe.yaml` due to the shared `model_type`. The user would then be forced to explicitly pass `--stage-config vllm_omni/.../qwen3_omni_moe_thinking.yaml` every time.

I understand your concern about polluting utils.py with model-specific code. Could you point me to a better place to insert this auto-detection?

@Gaohan123 (Collaborator) replied:

I think it is totally OK to add a custom config file in examples. After all, the stage_configs folder is just for default settings.

@linyueqian (Contributor, Author) replied:

I have moved it to the examples folder.

@Gaohan123 (Collaborator) left a comment

I think it is good. Please use `git commit -s` to pass the DCO check. Then I will help to merge. Thanks!

@linyueqian (Contributor, Author) replied:

@Gaohan123 I have added DCO sign-offs. Thanks!

Commit message:

> Add a single-stage configuration example for Qwen3-Omni-MoE-Thinking models
> that only have the thinker component (text-only output, no audio synthesis).
>
> Signed-off-by: linyueqian <[email protected]>
@linyueqian linyueqian force-pushed the add-qwen3-omni-moe-thinking-config branch from 6543b74 to 0f87094 Compare December 4, 2025 16:34
@ywang96 ywang96 enabled auto-merge (squash) December 4, 2025 16:41
@ywang96 ywang96 disabled auto-merge December 4, 2025 16:41
@ywang96 ywang96 merged commit 1406c6e into vllm-project:main Dec 4, 2025
2 of 4 checks passed

3 participants