feat(runner): align pretrain env bootstrap with legacy behavior and framework-local hooks #598

Merged
Xiaoming-AMD merged 6 commits into main from feat/runner-pretrain-env-sync on Apr 7, 2026

@WangLingxun
Collaborator

Summary

  • Align pretrain runtime environment bootstrap with legacy behavior, including deterministic setup and HipBLASLt tuning path integration in runner entry flow.
  • Refactor pretrain framework dispatch to support explicit --backend override (with conflict validation against config) and run ordered framework-local shell hooks before prepare.py.
  • Localize framework dependency bootstrap by adding per-framework install hooks and requirement files for MaxText, Megatron, and TorchTitan; move MaxText JAX pip installation out of prepare.py into hook scripts.
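
The `--backend` override with conflict validation described above could look roughly like the following. This is a minimal sketch, not the runner's actual code: the function name `resolve_backend`, the exception `BackendConflictError`, and the case-insensitive comparison are all assumptions made for illustration.

```python
from typing import Optional


class BackendConflictError(ValueError):
    """Hypothetical error: the CLI backend disagrees with the config backend."""


def resolve_backend(cli_backend: Optional[str], config_backend: Optional[str]) -> str:
    """Pick the backend, preferring an explicit CLI override.

    Assumption: names are compared case-insensitively, and a mismatch
    between the override and the config is an error, not a silent win.
    """
    cli = cli_backend.strip().lower() if cli_backend else None
    cfg = config_backend.strip().lower() if config_backend else None

    # Both sources name a backend but disagree: refuse to guess.
    if cli and cfg and cli != cfg:
        raise BackendConflictError(
            f"--backend={cli!r} conflicts with config backend {cfg!r}"
        )

    backend = cli or cfg
    if backend is None:
        raise ValueError("no backend given on the command line or in the config")
    return backend
```

Failing on a mismatch, instead of letting the flag silently win, keeps a stale config from launching a different framework than the one the user asked for.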

… HipBLASLt tuning

Align runner-based pretrain flow with legacy examples by syncing deterministic and HipBLASLt tuning env semantics. Add deterministic env exports in base_env, propagate HipBLASLt stage logic in pretrain hook, support stage-2 tune-only skip in direct launcher, and extend default container env passthrough list for related tuning variables.
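
As an illustration of the stage-2 tune-only skip mentioned above, here is a minimal sketch. The environment variable name `HIPBLASLT_TUNING_STAGE`, the helper name, and the convention that an unset value means stage 0 (tuning disabled) are assumptions for this example, not names confirmed by the PR.

```python
from typing import Mapping

# Per the commit message, stage 2 is the tune-only stage.
TUNE_ONLY_STAGE = 2


def should_skip_training(
    env: Mapping[str, str],
    var: str = "HIPBLASLT_TUNING_STAGE",  # hypothetical variable name
) -> bool:
    """Return True when the launcher should skip the training launch.

    Assumption: an unset variable means stage 0, i.e. tuning is disabled.
    """
    raw = env.get(var, "0")
    try:
        stage = int(raw)
    except ValueError as exc:
        raise ValueError(f"{var} must be an integer, got {raw!r}") from exc
    return stage == TUNE_ONLY_STAGE
```

A launcher would call this with `os.environ` and return early when it yields `True`, so that only the tuning step runs.
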
… and localize framework dependencies

Update pretrain dispatch to support --backend override with conflict checks against config, execute framework-local shell hooks before prepare.py, and localize per-framework dependency bootstrap scripts.
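
Running framework-local shell hooks in order before `prepare.py` might be sketched as follows. The directory layout, the function names, and the rule that hooks are ordered by file name (so a numeric prefix such as `00_` controls the sequence) are assumptions for illustration; the PR does not spell out the discovery rule.

```python
import subprocess
from pathlib import Path
from typing import List


def discover_hooks(hook_dir: Path) -> List[Path]:
    """Return the *.sh files in hook_dir, sorted by file name.

    Assumption: ordering comes from the name alone, e.g.
    00_install_requirements.sh runs before 10_setup_env.sh.
    """
    if not hook_dir.is_dir():
        return []
    scripts = [p for p in hook_dir.glob("*.sh") if p.is_file()]
    return sorted(scripts, key=lambda p: p.name)


def run_hooks(hook_dir: Path) -> List[str]:
    """Run each hook with bash, stopping at the first failure."""
    ran = []
    for script in discover_hooks(hook_dir):
        # check=True raises CalledProcessError on a non-zero exit code,
        # so a failed install hook aborts before prepare.py would run.
        subprocess.run(["bash", str(script)], check=True)
        ran.append(script.name)
    return ran
```

Under this scheme a dispatcher would call `run_hooks` on the selected framework's hook directory and only then invoke that framework's `prepare.py`.
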
WangLingxun added a commit that referenced this pull request Mar 26, 2026
- Add Megatron-Bridge pretrain support by implementing MegatronBridgePretrainTrainer and wiring it to the Bridge pretrain entrypoint.
- Introduce Megatron-Bridge pretrain module config and example YAMLs (qwen3_32b_pretrain, qwen3_8b_pretrain) with recipe-compatible overrides.
- Add framework-local pretrain hooks for Megatron-Bridge (prepare.py, 00_install_requirements.sh, requirements file) to align with runner hook conventions.
- Depend on runner pretrain dispatch/hook refactor from PR #598 commit c9dd3c7 (backend override + framework-local hooks execution).
@Xiaoming-AMD Xiaoming-AMD merged commit b2a561b into main Apr 7, 2026
5 checks passed
@WangLingxun WangLingxun deleted the feat/runner-pretrain-env-sync branch April 7, 2026 08:29