In distributed execution, the ShufflingSampler potentially samples duplicate data points to ensure synchronized batch processing on each worker. The duplicate data points contribute twice to E- and M-step.
In its current version, the Trainer includes terms associated to duplicate data points when evaluating free energies (e.g., here, here, here, here). This can lead to significantly different results compared to a sequential execution without the ShufflingSampler and hence without duplicate datapoints (e.g., for a SSSC-House benchmark (\sigma=50, D=144, H=512, |K|=30), I observed a free energy difference on the order of 7).
Furthermore, the additional data points lead to additional terms for the Theta updates (in update_param_batch methods), s..t. different M-step results compared to the sequential execution setting are obtained.
In distributed execution, the ShufflingSampler potentially samples duplicate data points to ensure synchronized batch processing on each worker. The duplicate data points contribute twice to E- and M-step.
In its current version, the
Trainerincludes terms associated to duplicate data points when evaluating free energies (e.g., here, here, here, here). This can lead to significantly different results compared to a sequential execution without the ShufflingSampler and hence without duplicate datapoints (e.g., for a SSSC-House benchmark (\sigma=50, D=144, H=512, |K|=30), I observed a free energy difference on the order of 7).Furthermore, the additional data points lead to additional terms for the Theta updates (in
update_param_batchmethods), s..t. different M-step results compared to the sequential execution setting are obtained.