feat(pipeline): add zero-bubble-heuristic scheduling algorithm#618

Open
ChengYao-amd wants to merge 2 commits into `main` from `dev/yc/add-zero-bubble-heuristic`

Conversation


@ChengYao-amd commented Mar 19, 2026

Summary

  • Adds a new zero-bubble-heuristic pipeline parallelism scheduling algorithm that uses a graph-based heuristic to explore 8 candidate schedules (combinations of allow_bubble_before_first_b, prioritize_b, no_bubble_greedy) and selects the one with the lowest bubble time.
  • Exposes configurable parameters (pp_max_mem, pp_cost_f, pp_cost_b, pp_cost_w) to control the memory budget and F/B/W cost model, enabling the scheduler to produce memory-aware schedules with realistic cost ratios.
  • Enhances the PP visualization tool (vis.py) with per-rank F/B/W time breakdown, correct cross-rank iteration time calculation, and detailed console output for easier performance analysis.
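The best-of-8 search described above can be sketched as follows. This is an illustrative stand-in, not the actual `_initial_solution` code: `build_schedule` and `evaluate_bubble` are hypothetical callables representing schedule construction and bubble-time evaluation, while the three flag names come from the PR summary.

```python
from itertools import product

def pick_best_schedule(build_schedule, evaluate_bubble):
    """Try every combination of the three heuristic flags (2^3 = 8
    candidates) and keep the schedule with the lowest bubble time."""
    flags = ("allow_bubble_before_first_b", "prioritize_b", "no_bubble_greedy")
    best, best_bubble = None, float("inf")
    for combo in product([False, True], repeat=len(flags)):
        kwargs = dict(zip(flags, combo))
        schedule = build_schedule(**kwargs)       # hypothetical builder
        bubble = evaluate_bubble(schedule)        # hypothetical cost model
        if bubble < best_bubble:
            best, best_bubble = schedule, bubble
    return best, best_bubble
```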

Changes

Core Algorithm

  • zerobubble_heuristic.py (new): Self-contained implementation of the zero-bubble-heuristic scheduler, ported from the internal Megatron ZB module into the Primus scheduler framework. Implements _Graph (DAG-based scheduling), _initial_solution (best-of-8 heuristic search), and ScheduleZeroBubbleHeuristic (the PipelineScheduleAlgo subclass that generates the schedule table with proper send/recv communication pairs).

Integration

  • pipeline_launcher.py: Registers zero-bubble-heuristic as a valid algorithm, passes max_mem/cost_f/cost_b/cost_w kwargs to the schedule factory, and adds dump_pp_data support via schedule_wrapper.
  • primus_turbo.py: Enables split W-grad operations for the new algorithm.
  • schedule_table_factory.py: Registers ScheduleZeroBubbleHeuristic in the algorithm map; replaces @lru_cache with a manual dict cache to support unhashable kwargs (lists).
  • primus_pipeline.yaml: Adds config entries for pp_max_mem, pp_cost_f, pp_cost_b, pp_cost_w.
  • megatron_pretrain_trainer.py: Adds post-training PP data dump for visualization/analysis.
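The `@lru_cache` replacement mentioned for `schedule_table_factory.py` can be sketched like this. The factory name and parameters below are illustrative, not the actual Primus API; the point is only the caching pattern: `functools.lru_cache` raises `TypeError` on unhashable arguments such as cost lists, so the cache is keyed on a deterministic string built from the kwargs instead.

```python
_schedule_cache = {}

def get_schedule(algo, **kwargs):
    """Manual dict cache: works even when kwargs contain unhashable
    values (e.g. cost lists like [1.0, 1.0, 1.0])."""
    # repr of sorted kwargs gives a stable, hashable key
    key = (algo, repr(sorted(kwargs.items())))
    if key not in _schedule_cache:
        _schedule_cache[key] = _build_schedule(algo, **kwargs)
    return _schedule_cache[key]

def _build_schedule(algo, **kwargs):
    # stand-in for the real schedule construction
    return {"algo": algo, **kwargs}
```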

Visualization & Analysis

  • vis.py: Extracts get_fbw_times() helper; fixes iter_time to use max across all ranks (not just rank-0); adds per-rank F/B/W time and percentage breakdown in console output.
  • pp_simulation.yaml: Adds two example simulation configs (zb-heuristic-mem8, zb-heuristic-mem10).
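A minimal sketch of the vis.py fixes, assuming a simplified `(op_type, start, end)` record layout rather than the tool's real data structures:

```python
def get_fbw_times(ops):
    """Hypothetical analogue of the extracted get_fbw_times() helper:
    sum the duration of F, B and W ops for one rank."""
    totals = {"F": 0.0, "B": 0.0, "W": 0.0}
    for op_type, start, end in ops:
        totals[op_type] += end - start
    return totals

def iter_time(ops_by_rank):
    """The iteration ends only when the slowest rank finishes its last
    op, so take the max end time across ALL ranks, not just rank 0."""
    return max(end for ops in ops_by_rank for (_, _, end) in ops)
```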

Algorithm Visualization

[schedule visualization image]

@ChengYao-amd force-pushed the dev/yc/add-zero-bubble-heuristic branch from 139ce7a to d430dbb on March 19, 2026 at 11:07
@araina-amd araina-amd self-requested a review March 19, 2026 21:56
@araina-amd (Contributor) commented:

I compared the schedulers on Qwen3.5-235B (64 GPUs, MI355X, PP=4, VPP=1, EP=8, SeqLen=4096). Megatron ILP by Sea AI Lab still performs better than the primus pipeline, though the difference is much smaller now.
For zero bubble, I will keep pointing the projection model at Megatron ILP by Sea AI Lab for now and fall back to the primus pipeline for all other cases.


| # | Config | zerobubble (ms) | zerobubble Tok/s/GPU | zerobubble Bubble% | zb-heuristic (ms) | zb-heuristic Tok/s/GPU | zb-heuristic Bubble% | megatron-ilp (ms) | megatron-ilp Tok/s/GPU | megatron-ilp Bubble% | Best | Final Tok/s/GPU |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | BF16, GBS=1024, MBS=2, RC=5 | 12,385.12 | 5,292 | 7.22% | 12,296.10 | 5,330 | 6.55% | 11,996.55 | 5,463 | 1.18% | megatron-ilp | 5,093 |
| 2 | FP8, GBS=1024, MBS=2, RC=5 | 11,788.54 | 5,559 | 7.18% | 11,743.55 | 5,581 | 6.83% | 11,425.97 | 5,736 | 1.15% | megatron-ilp | 5,329 |
| 3 | BF16, GBS=2048, MBS=2, RC=5 | 24,323.48 | 5,389 | 5.77% | 24,200.74 | 5,416 | 5.29% | 23,930.79 | 5,477 | 1.18% | megatron-ilp | 5,285 |
| 4 | FP8, GBS=2048, MBS=2, RC=5 | 23,584.90 | 5,557 | 5.73% | 23,519.74 | 5,573 | 5.47% | 23,212.43 | 5,647 | 1.15% | megatron-ilp | 5,442 |
| 5 | BF16, GBS=2048, MBS=4, RC=10 | 24,483.61 | 5,353 | 6.95% | 24,346.47 | 5,384 | 6.42% | 23,743.03 | 5,520 | 1.15% | megatron-ilp | 5,325 |
| 6 | FP8, GBS=2048, MBS=4, RC=10 | 22,176.38 | 5,910 | 6.95% | 22,176.38 | 5,910 | 6.95% | 21,598.20 | 6,069 | 1.48% | megatron-ilp | 5,833 |

