[Misc] Add stage config for Qwen3-Omni-30B-A3B-Thinking #172
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
```yaml
engine_output_type: text
distributed_executor_backend: "mp"
enable_prefix_caching: false
hf_config_name: thinker_config
```
Avoid nested thinker_config for Thinking checkpoints
This YAML sets hf_config_name: thinker_config, which makes OmniModelConfig.draw_hf_text_config (vllm_omni/config/model.py:79-85) dereference hf_config.thinker_config before building the model. The Qwen3-Omni-*Thinking checkpoints you are targeting only ship the thinker config itself (Qwen3OmniMoeThinkerConfig) and do not wrap it in a thinker_config attribute, so loading this stage file against those models will raise AttributeError and the config cannot be used. Drop the hf_config_name indirection (and use the thinker architecture) so thinker-only checkpoints load successfully.
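For illustration, here is a minimal repro of the failure mode described above. This is not the vllm_omni code; it assumes the transformers release in use ships the Qwen3-Omni config classes:

```python
from transformers import AutoConfig

# Thinker-only checkpoint: the top-level config *is* the thinker config
# (Qwen3OmniMoeThinkerConfig), with no nested thinker_config attribute.
cfg = AutoConfig.from_pretrained("Qwen/Qwen3-Omni-30B-A3B-Thinking")

# With hf_config_name: thinker_config, the loader effectively performs this
# lookup, which raises AttributeError on thinker-only checkpoints:
text_cfg = getattr(cfg, "thinker_config")
```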
Actually, I do have a question: it looks like we're currently using the Hugging Face model_type to identify the stage config YAML.
vllm-omni/vllm_omni/entrypoints/utils.py, lines 41 to 47 at 574e1fb:

```python
# Fall back to default config
stage_config_file = f"vllm_omni/model_executor/stage_configs/{model_type}.yaml"
stage_config_path = PROJECT_ROOT / stage_config_file
if not os.path.exists(stage_config_path):
    raise FileNotFoundError(f"Stage config file {stage_config_path} not found")
stage_configs = load_stage_configs_from_yaml(config_path=str(stage_config_path))
return stage_configs
```
How does this work for this model? qwen3_omni_moe_thinking isn't a valid model_type, right? https://huggingface.co/Qwen/Qwen3-Omni-30B-A3B-Thinking/blob/main/config.json#L10
I added a small check in utils.py. Would that work?
I think this is a reasonable change for now! Please fix the pre-commit, though.
Gaohan123
left a comment
Is it possible to use the Thinking model for end-to-end audio generation?
vllm_omni/entrypoints/utils.py (outdated):

```python
# (no talker/code2wav configs) but reuse the base qwen3_omni_moe model_type.
# Detect this using multiple hints so users don't need to manually rewrite
# the stage config path.
is_qwen3_omni_moe_thinking = (
```
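The excerpt above is truncated. Purely as a sketch of its likely shape, every hint and attribute name below is an assumption, not the real diff:

```python
# Hypothetical reconstruction of the detection; the real implementation may
# use different hints. Thinking checkpoints reuse model_type "qwen3_omni_moe"
# but ship no talker/code2wav configs, so their absence can serve as a hint.
is_qwen3_omni_moe_thinking = (
    model_type == "qwen3_omni_moe"
    and not hasattr(hf_config, "talker_config")  # assumed hint
)
```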
Is it possible to set this up in the stage config alone? Putting it here makes the general utils a little model-specific.
If we only add the YAML without this routing logic, vLLM will automatically pick qwen3_omni_moe.yaml because of the shared model_type, and the user would then be forced to pass --stage-config vllm_omni/.../qwen3_omni_moe_thinking.yaml explicitly every time.
I understand your concern about polluting utils.py with model-specific code. Could you point me to a better place to put this auto-detection?
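For concreteness, the manual fallback amounts to loading the thinker-only YAML directly with the helper shown earlier. The import path below is an assumption; only the function name and config_path keyword come from the utils.py snippet above:

```python
# Hypothetical import location; the function name and config_path keyword
# are confirmed by the utils.py excerpt, the module path is assumed.
from vllm_omni.entrypoints.utils import load_stage_configs_from_yaml

stage_configs = load_stage_configs_from_yaml(
    config_path="vllm_omni/model_executor/stage_configs/qwen3_omni_moe_thinking.yaml"
)
```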
I think it is totally fine to add a custom config file under examples. After all, the stage_configs folder is only for default settings.
I have moved it to the examples folder.
Gaohan123
left a comment
I think it is good. Please use git commit -s to pass the DCO check, and then I will help merge. Thanks!
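For reference, signing off an existing branch retroactively is standard git; the upstream branch name below is an assumption:

```bash
# Add a Signed-off-by trailer to every commit on the branch, then update the PR.
git rebase --signoff origin/main
git push --force-with-lease
```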
@Gaohan123 I have added DCO sign-offs. Thanks!
Add a single-stage configuration example for Qwen3-Omni-MoE-Thinking models that only have the thinker component (text-only output, no audio synthesis). Signed-off-by: linyueqian <[email protected]>
Force-pushed from 6543b74 to 0f87094.
Purpose
Add a single-stage configuration example for Qwen3-Omni-MoE-Thinking models (e.g., Qwen3-Omni-30B-A3B-Thinking) that only
have the thinker component and produce text-only output (no audio synthesis).
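For concreteness, a minimal sketch of what the single-stage entry could contain, using only the fields visible in the review diff; the list structure and anything beyond those fields are assumptions:

```yaml
# Hypothetical thinker-only stage; field names come from the review diff,
# the surrounding structure is an assumption.
- engine_output_type: text
  distributed_executor_backend: "mp"
  enable_prefix_caching: false
  # No hf_config_name indirection and no talker/code2wav stages:
  # Thinking checkpoints ship only the thinker and emit text.
```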
Test Plan
N/A (config file only)
Test Result
Verified on 2x H200 GPUs with tensor_parallel_size=2.