First of all, this may just be my misunderstanding of the paper, so I hope somebody can explain it to me. Thanks!
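In the paper, the loss is defined as

$$L = \log p(x \mid z_q(x)) + \lVert \operatorname{sg}[z_e(x)] - e \rVert_2^2 + \beta \lVert z_e(x) - \operatorname{sg}[e] \rVert_2^2$$

where e is the codebook defined at the beginning of the section:

$$e \in \mathbb{R}^{K \times D}$$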
So, in the paper, the codebook loss and the commitment loss are both MSEs between z_e(x) and e.
However, in the implementation, they are computed as the MSE between z_e(x) (`inputs`) and z_q(x) (`quantized`), where the variable `quantized` is the quantized encoding of the image, namely z_q(x):
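
To make the comparison concrete, here is a minimal pure-Python sketch of the forward computation as I understand it. It assumes `quantized` is built by replacing each encoder vector with its nearest codebook row. The helper names (`sq_dist`, `quantize`, `mse`) and the toy values are made up for illustration and are not taken from the paper or the implementation. Spatial positions are flattened to N = H' * W' rows, and stop-gradient is left out because it only affects gradients, not the loss value.

```python
# Toy sizes: N = H' * W' flattened positions, K codebook entries, D dimensions.
# All names and values here are hypothetical, for illustration only.

def sq_dist(a, b):
    """Squared Euclidean distance between two D-dimensional vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(z_e, codebook):
    """For each encoder vector, pick the index of the nearest codebook row
    (the argmin step) and look that row up to form z_q."""
    indices = [
        min(range(len(codebook)), key=lambda k: sq_dist(v, codebook[k]))
        for v in z_e
    ]
    z_q = [codebook[k] for k in indices]
    return indices, z_q

def mse(a, b):
    """Mean squared error over all N * D elements."""
    n_elements = sum(len(row) for row in a)
    return sum(sq_dist(r, s) for r, s in zip(a, b)) / n_elements

codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]            # shape (K=3, D=2)
z_e = [[0.1, -0.1], [0.9, 1.2], [3.5, 0.5], [1.1, 0.8]]    # shape (N=4, D=2)

indices, z_q = quantize(z_e, codebook)

# z_q has the same shape as z_e, (N, D), not the codebook's (K, D).
loss_value = mse(z_e, z_q)
print(indices, loss_value)
```

Under this assumption, z_q(x) has one selected codebook row per position, so it has the same shape as z_e(x) and the MSE is well defined. Whether the e in the paper's loss is meant in this per-position sense is exactly what I am asking.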

Are they actually the same thing? If so, why?
- If the paper's formulation is right, how do the dimensions match between z_e(x) (H' * W' * D) and e (K * D)?
- If the implementation is right, how does z_q(x) (`quantized`) backpropagate, given that its computation contains an argmin?