In a MoE (Mixture of Experts) model, each expert inside an MoE block is typically an MLP. During the forward pass, the hidden_states tensor is usually flattened with hidden_states.view(-1, hidden_states.shape[-1]) before being dispatched to the experts. As a result, when tiled_mlp_forward_common tries to unpack the shape with bs, seqlen, hidden = x.shape, it receives a 2D tensor and raises: ValueError: not enough values to unpack (expected 3, got 2).
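A minimal sketch of the mismatch (the toy shapes and the 2D-input guard below are illustrative, not the actual implementation of tiled_mlp_forward_common):

```python
import torch

def tiled_mlp_forward_common(mlp, x):
    # The tiled forward assumes a 3D input of shape (batch, seq_len, hidden);
    # a flattened 2D tensor triggers the unpack error described above.
    bs, seqlen, hidden = x.shape
    return x  # tiling logic omitted

hidden_states = torch.randn(2, 16, 64)                   # (batch, seq_len, hidden)
flat = hidden_states.view(-1, hidden_states.shape[-1])   # (32, 64): the MoE flattening

try:
    tiled_mlp_forward_common(None, flat)
except ValueError as e:
    print(e)  # not enough values to unpack (expected 3, got 2)

# One possible guard: treat a 2D input as a single "sequence" of tokens.
x = flat.unsqueeze(0) if flat.dim() == 2 else flat        # (1, 32, 64)
bs, seqlen, hidden = x.shape                              # unpacks cleanly
```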