
Add missing model configurations for Megatron and TorchTitan backends#611

Merged
Xiaoming-AMD merged 3 commits into main from dev/add-missing-models on Mar 27, 2026

Conversation

@WangLingxun (Collaborator)

Summary

Add model definitions and pretrain example configs for multiple model scales that were previously missing from the Megatron and TorchTitan backends.

Megatron

  • Model configs: Llama2 13B, Qwen2.5 3B/14B/32B, Qwen3 4B/14B/32B (see the config sketch after this list)
  • Pretrain example configs (MI300X & MI355X, BF16 + FP8):
    • Llama2 13B
    • Qwen2.5 3B, 14B, 32B
    • Qwen3 4B, 8B, 14B, 32B
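
To give a sense of what the new model configs pin down, here is a minimal sketch of what a file like qwen2.5_3B.yaml might contain. The key names are assumptions modeled on Megatron-LM's argument names, not necessarily this repo's actual schema; the values, however, match the published Qwen2.5-3B architecture.

```yaml
# Hypothetical sketch of qwen2.5_3B.yaml -- key names are assumptions
# based on Megatron-LM argument names, not this repo's actual schema.
# Values follow the published Qwen2.5-3B architecture.
num_layers: 36
hidden_size: 2048
ffn_hidden_size: 11008
num_attention_heads: 16
group_query_attention: true
num_query_groups: 2             # Qwen2.5-3B uses 2 KV heads (GQA)
max_position_embeddings: 32768
vocab_size: 151936
normalization: RMSNorm
position_embedding_type: rope
swiglu: true
```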

TorchTitan

  • Model configs: Llama4 Scout 17Bx16E, Llama4 Maverick 17Bx128E, DeepSeek V3 236B, Qwen3 4B/8B/14B (BF16 & FP8 variants)
  • Pretrain example configs (MI300X & MI355X, BF16 + FP8; see the sketch after this list):
    • Llama4 Scout 17Bx16E
    • Llama4 Maverick 17Bx128E
    • DeepSeek V3 236B
    • Qwen3 4B, 8B, 14B
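
A pretrain example config for one of these entries would plausibly pair a model reference with batch, parallelism, and precision settings. Everything below is an illustrative assumption (key names, values, and the precision section are not taken from this repo or from TorchTitan's actual config schema), written in YAML for consistency with the Megatron sketch above.

```yaml
# Hypothetical sketch of a pretrain example config (e.g. Qwen3 14B on
# MI300X, FP8 variant). All keys and values are illustrative
# assumptions, not the actual schema of this repo or of TorchTitan.
model_config: qwen3_14B.yaml
seq_length: 4096
micro_batch_size: 2
global_batch_size: 256
train_iters: 1000
tensor_parallel_size: 1
precision:
  dtype: bf16         # BF16 variants would stop here
  fp8: true           # FP8 variants additionally enable FP8 GEMMs
  fp8_recipe: hybrid  # e.g. E4M3 forward / E5M2 backward
```

Under this assumed layout, the BF16 and FP8 variants of each example would differ only in the precision section.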

Adds model and training configurations for:

- Llama4 Scout 17Bx16E (BF16 and FP8 precision)
- Llama4 Maverick 17Bx128E (BF16 and FP8 precision)
- DeepSeek V3 236B (BF16 and FP8 precision)
- Qwen3 4B, 8B, 14B (pretrain configs)
…en3)

- Add model configs:
  - llama2_13B.yaml
  - qwen2.5_3B.yaml, qwen2.5_14B.yaml, qwen2.5_32B.yaml
  - qwen3_4B.yaml, qwen3_14B.yaml, qwen3_32B.yaml

- Add BF16 and FP8 pretrain example configs:
  - Llama2: 13B (BF16, FP8)
  - Qwen2.5: 3B, 14B, 32B (BF16, FP8)
  - Qwen3: 8B (MI300X), 4B, 14B, 32B (BF16, FP8)
@Xiaoming-AMD merged commit 8ade235 into main on Mar 27, 2026
5 checks passed