These are checkpoints for object recognition models powered by RandSF.Q-tsim and SlotContrast.
The encoder is DINO2 S/14.
The models are trained on the YTVIS dataset (high quality) with random seeds 42, 43, and 44.
Input resolution is 256x256 (224x224).
The slot matching threshold is 1e-1@IoU, i.e. an IoU of 0.1.
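
To make the threshold concrete, here is a minimal sketch of how an IoU threshold of 0.1 could gate the matching between predicted slot masks and ground-truth object masks. It is an illustration only, not the evaluation code used for these checkpoints: the function names `mask_iou` and `match_slots`, the greedy one-to-one strategy, and the "at least the threshold" comparison are all assumptions.

```python
from typing import List, Optional

Mask = List[List[int]]  # binary mask: 1 = pixel belongs to the object


def mask_iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two equally sized binary masks."""
    inter = 0
    union = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            inter += 1 if (pa and pb) else 0
            union += 1 if (pa or pb) else 0
    return inter / union if union else 0.0


def match_slots(
    slot_masks: List[Mask],
    gt_masks: List[Mask],
    iou_threshold: float = 0.1,  # assumed reading of "1e-1@IoU"
) -> List[Optional[int]]:
    """Greedy one-to-one matching (hypothetical helper).

    Returns, for each ground-truth mask, the index of the matched slot,
    or None when no unused slot reaches the IoU threshold.
    """
    # Score every (ground truth, slot) pair, best IoU first.
    candidates = sorted(
        (
            (mask_iou(slot, gt), gi, si)
            for gi, gt in enumerate(gt_masks)
            for si, slot in enumerate(slot_masks)
        ),
        reverse=True,
    )
    assignment: List[Optional[int]] = [None] * len(gt_masks)
    used_slots = set()
    for iou, gi, si in candidates:
        if iou < iou_threshold:
            break  # all remaining pairs are below the threshold
        if assignment[gi] is None and si not in used_slots:
            assignment[gi] = si
            used_slots.add(si)
    return assignment


if __name__ == "__main__":
    slots = [[[1, 1], [0, 0]], [[0, 0], [1, 1]]]  # top row, bottom row
    gts = [[[0, 0], [1, 0]], [[1, 0], [0, 0]]]    # bottom-left, top-left pixel
    print(match_slots(slots, gts))
```

With a threshold this low, a slot counts as a match even when it overlaps the object only loosely. Raising `iou_threshold` in the sketch makes the matching stricter and leaves more ground-truth objects unmatched.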