(Maybe) inconsistency between the VQ-VAE paper and its implementation #252

@Apollo1840

First of all, I may simply be misunderstanding the paper, so I hope somebody can explain this to me. Thanks!


In the paper, the loss is defined as:
[Screenshot from 2022-08-30 11-52-26]

where e is the codebook, defined at the beginning of the section:
[Screenshot from 2022-08-30 11-57-36]

So, in the paper, the codebook loss and commitment loss are MSE between z_e(x) and e.
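
For readers who cannot see the screenshots: to the best of my knowledge, the published VQ-VAE paper (van den Oord et al., 2017) states the codebook, the nearest-neighbour lookup, and the loss as below. Here sg denotes the stop-gradient operator and β the commitment weight. Reading the bare e in the loss as the selected code e_k at each spatial position is an interpretation, not something this thread confirms.

$$
e \in \mathbb{R}^{K \times D}, \qquad
z_q(x) = e_k, \quad k = \arg\min_j \lVert z_e(x) - e_j \rVert_2
$$

$$
L = \log p\bigl(x \mid z_q(x)\bigr)
  + \lVert \mathrm{sg}[z_e(x)] - e \rVert_2^2
  + \beta \, \lVert z_e(x) - \mathrm{sg}[e] \rVert_2^2
$$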

However, in the implementation, both terms are computed as the MSE between z_e(x) (the variable `inputs`) and z_q(x) (the variable `quantized`), where `quantized` is the quantized encoding of the image:
Screenshot from 2022-08-30 11-58-19

Are they actually the same thing? If so, why?

  • If the paper's formulation is right, how do the dimensions match between z_e(x) (H' * W' * D) and e (K * D)?
  • If the implementation is right, how does z_q(x) (`quantized`) backpropagate, given that its computation contains an argmin?
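
Both questions can be explored with a small NumPy sketch. It is not the repository's code: the function name `quantize`, the shapes, and the random data are made up for illustration. It assumes the loss is read per spatial position, with e meaning the nearest code chosen for that position. Under that assumption, gathering the selected codes gives a z_q with the same shape as z_e, and the argmin indices act as constants, so the codebook gradient reaches only the rows that were selected.

```python
import numpy as np

def quantize(z_e, codebook):
    """Nearest-neighbour lookup (hypothetical helper).

    z_e: encoder output, shape (H, W, D); codebook: shape (K, D).
    Returns z_q with shape (H, W, D) and the chosen indices, shape (H, W).
    """
    flat = z_e.reshape(-1, z_e.shape[-1])                      # (H*W, D)
    # Squared distance from every encoder vector to every code: (H*W, K)
    d2 = ((flat[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    idx = d2.argmin(axis=1)                                    # (H*W,)
    z_q = codebook[idx].reshape(z_e.shape)                     # gather rows
    return z_q, idx.reshape(z_e.shape[:-1])

rng = np.random.default_rng(0)
H, W, D, K = 4, 4, 8, 16                                       # illustrative sizes
z_e = rng.normal(size=(H, W, D))
codebook = rng.normal(size=(K, D))
z_q, idx = quantize(z_e, codebook)

# "Paper-style" term: per position, compare z_e with its selected code e_k.
per_position = np.mean([
    np.mean((z_e[i, j] - codebook[idx[i, j]]) ** 2)
    for i in range(H) for j in range(W)
])
# "Implementation-style" term: elementwise MSE between z_e and z_q.
elementwise = np.mean((z_e - z_q) ** 2)

# Gradient of the elementwise term w.r.t. the codebook, with idx held fixed.
# Only rows that were selected receive a contribution; argmin itself is
# never differentiated.
grad_codebook = np.zeros_like(codebook)
np.add.at(grad_codebook, idx.ravel(),
          2.0 * (z_q - z_e).reshape(-1, D) / z_e.size)

# Straight-through trick: the forward value equals z_q, while an autodiff
# framework that stops gradients through (z_q - z_e) would pass the decoder
# gradient to z_e unchanged.
straight_through = z_e + (z_q - z_e)

print(z_q.shape == z_e.shape, np.isclose(per_position, elementwise))
```

In a framework with automatic differentiation, the parenthesised difference in the last step would be wrapped in a stop-gradient call, which is how the reconstruction loss can train the encoder even though the lookup itself has no useful gradient.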
