Hi,
First, congrats on your excellent work!
I'm a little confused about the results. I downloaded the [Scannet200_OpenVocab_ISBNet-GSAM] results, and each scene is a 600xN matrix. Does that mean you predict 600 mask proposals for each scene?
Thanks.