
Add missing model configurations for Megatron and TorchTitan backends#611

Merged
Xiaoming-AMD merged 3 commits into main from dev/add-missing-models on Mar 27, 2026

Conversation

@WangLingxun (Collaborator)

Summary

Add model definitions and pretrain example configs for multiple model scales that were previously missing from the Megatron and TorchTitan backends.

Megatron

  • Model configs: Llama2 13B, Qwen2.5 3B/14B/32B, Qwen3 4B/14B/32B (see the config sketch after this list)
  • Pretrain example configs (MI300X & MI355X, BF16 + FP8):
    • Llama2 13B
    • Qwen2.5 3B, 14B, 32B
    • Qwen3 4B, 8B, 14B, 32B
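
To give a sense of what the new model configs pin down, here is a minimal sketch of what a file like qwen2.5_3B.yaml might contain. The key names are assumptions modeled on Megatron-LM's argument names, not necessarily this repo's actual schema; the values, however, match the published Qwen2.5-3B architecture.

```yaml
# Hypothetical sketch of qwen2.5_3B.yaml -- key names are assumptions
# based on Megatron-LM argument names, not this repo's actual schema.
# Values follow the published Qwen2.5-3B architecture.
num_layers: 36
hidden_size: 2048
ffn_hidden_size: 11008
num_attention_heads: 16
group_query_attention: true
num_query_groups: 2             # Qwen2.5-3B uses 2 KV heads (GQA)
max_position_embeddings: 32768
vocab_size: 151936
normalization: RMSNorm
position_embedding_type: rope
swiglu: true
```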

TorchTitan

  • Model configs: Llama4 Scout 17Bx16E, Llama4 Maverick 17Bx128E, DeepSeek V3 236B, Qwen3 4B/8B/14B (BF16 & FP8 variants)
  • Pretrain example configs (MI300X & MI355X, BF16 + FP8; see the sketch after this list):
    • Llama4 Scout 17Bx16E
    • Llama4 Maverick 17Bx128E
    • DeepSeek V3 236B
    • Qwen3 4B, 8B, 14B
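
A pretrain example config for one of these entries would plausibly pair a model reference with batch, parallelism, and precision settings. Everything below is an illustrative assumption (key names, values, and the precision section are not taken from this repo or from TorchTitan's actual config schema), written in YAML for consistency with the Megatron sketch above.

```yaml
# Hypothetical sketch of a pretrain example config (e.g. Qwen3 14B on
# MI300X, FP8 variant). All keys and values are illustrative
# assumptions, not the actual schema of this repo or of TorchTitan.
model_config: qwen3_14B.yaml
seq_length: 4096
micro_batch_size: 2
global_batch_size: 256
train_iters: 1000
tensor_parallel_size: 1
precision:
  dtype: bf16         # BF16 variants would stop here
  fp8: true           # FP8 variants additionally enable FP8 GEMMs
  fp8_recipe: hybrid  # e.g. E4M3 forward / E5M2 backward
```

Under this assumed layout, the BF16 and FP8 variants of each example would differ only in the precision section.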

Adds model and training configurations for:

- Llama4 Scout 17Bx16E (BF16 and FP8 precision)
- Llama4 Maverick 17Bx128E (BF16 and FP8 precision)
- DeepSeek V3 236B (BF16 and FP8 precision)
- Qwen3 4B, 8B, 14B (pretrain configs)
…en3)

- Add model configs:
  - llama2_13B.yaml
  - qwen2.5_3B.yaml, qwen2.5_14B.yaml, qwen2.5_32B.yaml
  - qwen3_4B.yaml, qwen3_14B.yaml, qwen3_32B.yaml

- Add BF16 and FP8 pretrain example configs:
  - Llama2: 13B (BF16, FP8)
  - Qwen2.5: 3B, 14B, 32B (BF16, FP8)
  - Qwen3: 8B (MI300X), 4B, 14B, 32B (BF16, FP8)
@Xiaoming-AMD merged commit 8ade235 into main on Mar 27, 2026
5 checks passed