XilunWu
Contributor

@XilunWu XilunWu commented Oct 15, 2025

Stack from ghstack (oldest at bottom):

freqs_cis is sensitive to the sequence order. CP load balancing shuffles the samples, so each batch ends up with a different order. As a result, we have to lift these order-sensitive buffers to the inputs and broadcast them along the batch dimension so that PP shards freqs_cis correctly.
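
A minimal, framework-free sketch of the idea, not the code in this PR: the helper names (`precompute_freqs_cis`, `lift_to_inputs`, `split_microbatches`), the example `order`, and the plain-list layout are all hypothetical stand-ins for the real tensors. It assumes the rotary table has shape [seq, dim / 2], reorders it to match the shuffled token order, repeats it once per sample, and then splits along the batch dimension the way a pipeline schedule would split any other input.

```python
import cmath


def precompute_freqs_cis(dim, seq_len, theta=10000.0):
    """Return a [seq_len][dim // 2] table of unit complex rotations."""
    inv_freq = [theta ** (-2.0 * i / dim) for i in range(dim // 2)]
    return [[cmath.rect(1.0, pos * f) for f in inv_freq] for pos in range(seq_len)]


def lift_to_inputs(freqs_cis, order, batch_size):
    """Reorder rows to match the shuffled token order, then repeat per sample."""
    reordered = [freqs_cis[pos] for pos in order]
    return [reordered for _ in range(batch_size)]


def split_microbatches(batched, n_microbatches):
    """Split along the batch dimension, as is done for any other model input."""
    size = len(batched) // n_microbatches
    return [batched[i * size:(i + 1) * size] for i in range(n_microbatches)]


freqs_cis = precompute_freqs_cis(dim=8, seq_len=4)
order = [0, 3, 1, 2]  # hypothetical token order after load balancing
batched = lift_to_inputs(freqs_cis, order, batch_size=4)
microbatches = split_microbatches(batched, n_microbatches=2)
```

Because the table now travels with the inputs and carries a batch dimension, every microbatch receives rows that already match its token order, instead of each stage reading a fixed-order buffer.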

Pull-Request-resolved: #1797

ghstack-source-id: 0612109
@meta-cla meta-cla bot added the CLA Signed label Oct 15, 2025
@XilunWu XilunWu marked this pull request as draft October 15, 2025 18:02