danielvegamyhre (Contributor) commented Oct 14, 2025

Compile the dense layers and run the MoE layers in eager mode.

Commands

mxfp8 grouped mm:

LOG_RANK=4 NGPU=8 CONFIG_FILE="/home/${USER}/torchtitan/torchtitan/models/llama4/train_configs/llama4_17bx16e.toml" ./run_train.sh \
--metrics.log_freq=10 --training.steps=1000 \
--parallelism.data_parallel_shard_degree=4 \
--parallelism.expert_parallel_degree=4 \
--parallelism.tensor_parallel_degree=1 \
--parallelism.expert_tensor_parallel_degree=1 \
--training.seq_len=8192 \
--training.local_batch_size=12 \
--model.print_after_conversion \
--activation_checkpoint.mode="full" \
--parallelism.pipeline_parallel_degree 2 \
--parallelism.pipeline_parallel_schedule "Interleaved1F1B" \
--parallelism.pipeline_parallel_layers_per_stage 1 \
--model.converters="quantize.grouped_mm.mx,quantize.linear.mx" \
--quantize.grouped_mm.mx.fqns="experts" \
--quantize.linear.mx.filter_fqns="output,moe,wk,wv" \
--compile.enable

bf16 baseline:

LOG_RANK=4 NGPU=8 CONFIG_FILE="/home/${USER}/torchtitan/torchtitan/models/llama4/train_configs/llama4_17bx16e.toml" ./run_train.sh \
--metrics.log_freq=10 --training.steps=1000 \
--parallelism.data_parallel_shard_degree=4 \
--parallelism.expert_parallel_degree=4 \
--parallelism.tensor_parallel_degree=1 \
--parallelism.expert_tensor_parallel_degree=1 \
--training.seq_len=8192 \
--training.local_batch_size=12 \
--model.print_after_conversion \
--activation_checkpoint.mode="full" \
--parallelism.pipeline_parallel_degree 2 \
--parallelism.pipeline_parallel_schedule "Interleaved1F1B" \
--parallelism.pipeline_parallel_layers_per_stage 1 \
--compile.enable
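
The two commands above differ only in the three quantization flags (`--model.converters`, `--quantize.grouped_mm.mx.fqns`, `--quantize.linear.mx.filter_fqns`). As a rough sketch for keeping the comparison honest, the shared flags could be defined once and reused for both runs. The names `COMMON_ARGS`, `MXFP8_ARGS`, and `run_variant` are hypothetical and not part of torchtitan; every flag value is copied from the commands above. This assumes bash (for arrays) and assumes flag order does not matter to `run_train.sh`, since the mxfp8 flags end up after `--compile.enable` here.

```shell
#!/usr/bin/env bash
# Sketch only: COMMON_ARGS, MXFP8_ARGS and run_variant are hypothetical names.
# All flag values are copied verbatim from the two commands in this PR.

CONFIG_FILE="/home/${USER}/torchtitan/torchtitan/models/llama4/train_configs/llama4_17bx16e.toml"

# Flags shared by the mxfp8 grouped mm run and the bf16 baseline.
COMMON_ARGS=(
  --metrics.log_freq=10 --training.steps=1000
  --parallelism.data_parallel_shard_degree=4
  --parallelism.expert_parallel_degree=4
  --parallelism.tensor_parallel_degree=1
  --parallelism.expert_tensor_parallel_degree=1
  --training.seq_len=8192
  --training.local_batch_size=12
  --model.print_after_conversion
  --activation_checkpoint.mode="full"
  --parallelism.pipeline_parallel_degree 2
  --parallelism.pipeline_parallel_schedule "Interleaved1F1B"
  --parallelism.pipeline_parallel_layers_per_stage 1
  --compile.enable
)

# Flags that only the mxfp8 grouped mm run adds.
MXFP8_ARGS=(
  --model.converters="quantize.grouped_mm.mx,quantize.linear.mx"
  --quantize.grouped_mm.mx.fqns="experts"
  --quantize.linear.mx.filter_fqns="output,moe,wk,wv"
)

# Launch one variant; any extra arguments are appended to the shared flags.
run_variant() {
  LOG_RANK=4 NGPU=8 CONFIG_FILE="$CONFIG_FILE" ./run_train.sh "${COMMON_ARGS[@]}" "$@"
}
```

With this in place, `run_variant "${MXFP8_ARGS[@]}"` would launch the mxfp8 run and a bare `run_variant` would launch the bf16 baseline, so any change to the shared settings applies to both.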

meta-cla bot added the CLA Signed label Oct 14, 2025