
[Megatron-LM] Add unit tests to test Mamba and Zebra-Llama pretraining #595

Draft
clairesonglee wants to merge 12 commits into main from
dev/clairlee/add-hybrid-model-unit-tests

Conversation

@clairesonglee
Contributor

No description provided.

vidushi8 and others added 12 commits February 12, 2026 19:13
…32B Configs for MI300X & MI355X (#556)

YF: Only SFT-related config and doc changes; bypassing unit CI tests

## Summary

This PR introduces post-training documentation and updates Qwen3 32B
model configuration files to support AMD MI300X and MI355X accelerators.

---

## Changes

### 📘 Documentation

- **Added `posttraining.md`**
  - New comprehensive guide for post-training workflows
  - Covers setup instructions, configuration details, and usage examples

- **Updated `docs/README.md`**
  - Added a new section referencing post-training documentation
  - Improved documentation organization and navigation

---

### ⚙️ Configuration Updates

- **Updated Qwen3_32B model YAML configs**
  - Added/modified configurations optimized for:
    - MI300X
    - MI355X
  - Adjusted parameters for compatibility and stable execution
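As an illustration only (the field names below are hypothetical and not taken from the actual PR diff), a per-accelerator override in such a YAML config might look like:

```yaml
# Hypothetical sketch of per-accelerator overrides; the real keys in the
# Qwen3_32B configs may differ.
model: qwen3_32b
overrides:
  mi300x:
    micro_batch_size: 1
    tensor_model_parallel_size: 8
  mi355x:
    micro_batch_size: 2
    tensor_model_parallel_size: 8
```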

---

## Validation

- Verified updated configs load and execute successfully on MI300X and
MI355X environments
- Confirmed documentation links and structure render correctly

---

## Checklist

- [x] Added `posttraining.md`
- [x] Updated `docs/README.md`
- [x] Modified Qwen3_32B YAML configs
- [x] Verified changes locally
Co-authored-by: Mingyu Yang <Mingyu.Yang@amd.com>
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Co-authored-by: Kailash Gogineni <gkailashnath1998@gmail.com>
Co-authored-by: HuangWei-95 <Wei.Huang4@amd.com>
Co-authored-by: HuangWei-95 <weihuan@amd.com>
Co-authored-by: Xiaoming-AMD <Xiaoming.Peng@amd.com>
Co-authored-by: WangLingxun <linxwang@amd.com>
…578)

Expand projection.md with memory projection and performance details.
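A memory projection of the kind such a document covers can be sketched as a back-of-envelope estimate. The per-parameter byte counts below assume mixed-precision training with Adam (bf16 weights, fp32 gradients, fp32 master weights, two fp32 optimizer moments); the function names are illustrative, not from projection.md itself:

```python
# Hypothetical per-parameter training memory estimate, before any
# sharding (ZeRO/FSDP) or activation memory is accounted for.
def projected_train_bytes_per_param(bytes_weight=2,   # bf16 weights
                                    bytes_grad=4,     # fp32 gradients
                                    bytes_master=4,   # fp32 master weights
                                    bytes_adam=8):    # two fp32 Adam moments
    return bytes_weight + bytes_grad + bytes_master + bytes_adam

def projected_gib(num_params):
    """Projected training-state memory in GiB for num_params parameters."""
    return num_params * projected_train_bytes_per_param() / 2**30

# e.g. a 32B-parameter model:
print(round(projected_gib(32e9)))  # prints 536
```

Under these assumptions a 32B-parameter model needs roughly 536 GiB of training state, which is why it must be sharded across multiple accelerators.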
…581)

Hook Megatron's validate_args alongside parse_args so that Primus-injected
arguments are validated consistently, and run additional ROCm-specific
argument checks during initialization.
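The pattern can be sketched as follows. This is a minimal, self-contained illustration with stand-in names (the argument names and the divisibility check mimic typical Megatron-style validation but are not copied from the PR):

```python
# Sketch: run validation immediately after parsing so injected arguments
# go through the same checks as user-supplied ones.
import argparse

def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--micro-batch-size", type=int, default=1)
    parser.add_argument("--global-batch-size", type=int, default=8)
    return parser.parse_args(argv)

def validate_args(args):
    # Mirrors a typical Megatron-style consistency check: the global
    # batch size must be divisible by the micro batch size.
    assert args.global_batch_size % args.micro_batch_size == 0, (
        "global-batch-size must be divisible by micro-batch-size"
    )
    return args

# Hooking validation directly onto parsing keeps the two in lockstep.
args = validate_args(parse_args([]))
print(args.global_batch_size)  # prints 8
```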
Comment on lines +31 to +34:

```python
from megatron.core.tensor_parallel import (
    InferenceLayerNormColumnParallelLinear,
    InferenceRowParallelLinear,
)
```


6 participants