I compared the forward-pass speed of the larger ImageNet model with DenseNet-121, and DenseNet-121 is actually faster. After benchmarking, my guess is that the CondenseConv layer causes the slowdown, due to memory transfers in ShuffleLayer and torch.index_select.
@ShichenLiu can you comment on this? Did you get better performance than DenseNet-121 in your experiments?
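
For anyone who wants to reproduce a forward-pass comparison like this, here is a minimal timing sketch. It is not the script used for the numbers above. The helper name `time_forward` and its parameters (`make_input`, `warmup`, `iters`, `sync`) are hypothetical and chosen only for illustration. It uses only the standard library, so any callable can stand in for the model.

```python
import statistics
import time


def time_forward(forward, make_input, warmup=5, iters=20, sync=None):
    """Return the median wall-clock time (seconds) of one forward(x) call.

    forward    -- any callable taking one input (hypothetical stand-in for a model)
    make_input -- zero-argument callable that builds the input once
    sync       -- optional zero-argument callable invoked before reading the
                  clock, so asynchronous device work is included in the timing
    """
    x = make_input()

    # Warm-up runs are not timed: they absorb one-off costs such as lazy init.
    for _ in range(warmup):
        forward(x)
    if sync is not None:
        sync()

    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        forward(x)
        if sync is not None:
            sync()
        samples.append(time.perf_counter() - start)

    # The median is less sensitive to occasional slow iterations than the mean.
    return statistics.median(samples)
```

With PyTorch, one would presumably put both models in `eval()` mode, call this inside `torch.no_grad()`, and pass `sync=torch.cuda.synchronize` when timing on a GPU, since CUDA kernels run asynchronously. To check whether ShuffleLayer or torch.index_select dominates, the same helper could be applied to those layers in isolation.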