Releases: Genera1Z/RandSF.Q

dataset-movi_c

21 Oct 10:58
6a1db27

This is the MOVi-C dataset in LMDB format, ready to use with this repo as is.

archive-videosaur

21 Oct 10:53
6a1db27

Model checkpoints for the VideoSAUR baseline.
The encoder is DINOv2 S/14.
Models are trained on the MOVi-C/D and YTVIS (high quality) datasets, with random seeds 42, 43 and 44.
Input resolution is 256x256 (224x224).

archive-slotcontrast

21 Oct 10:52
6a1db27

Model checkpoints for the SlotContrast baseline.
The encoder is DINOv2 S/14.
Models are trained on the MOVi-C/D and YTVIS (high quality) datasets, with random seeds 42, 43 and 44.
Input resolution is 256x256 (224x224).

archive-recogn

21 Oct 15:22
2f4fc93

Model checkpoints for object recognition models powered by RandSF.Q-tsim and SlotContrast.
The encoder is DINOv2 S/14.
Models are trained on the YTVIS (high quality) dataset, with random seeds 42, 43 and 44.
Input resolution is 256x256 (224x224).
The slot matching threshold is an IoU of 1e-1.
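
To illustrate what an IoU threshold of 1e-1 means in practice, here is a minimal sketch in plain Python. It is not this repo's matching code: the function names `mask_iou` and `is_matched` are hypothetical, masks are assumed to be flattened binary sequences, and whether the comparison is inclusive (`>=`) is an assumption.

```python
from typing import Sequence

IOU_THRESHOLD = 1e-1  # value quoted in the release note above


def mask_iou(pred: Sequence[bool], target: Sequence[bool]) -> float:
    """Intersection over union of two flattened binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    # Two empty masks have no overlap to measure; treat that case as IoU 0.
    return intersection / union if union else 0.0


def is_matched(
    pred: Sequence[bool],
    target: Sequence[bool],
    threshold: float = IOU_THRESHOLD,
) -> bool:
    """Hypothetical rule: a slot mask counts as matched to an object mask
    when their IoU reaches the threshold (inclusive comparison assumed)."""
    return mask_iou(pred, target) >= threshold
```

With a threshold this low, a slot mask that covers half of the union with an object mask is matched, while one that shares a single pixel with a 20-pixel object (IoU 0.05) is not.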

archive-randsfq-tsim

21 Oct 10:54
6a1db27

Model checkpoints for our RandSF.Q, built upon SlotContrast but using a time similarity loss.
The encoder is DINOv2 S/14.
Models are trained on the MOVi-C/D and YTVIS (high quality) datasets, with random seeds 42, 43 and 44.
Input resolution is 256x256 (224x224).

archive-randsfq

21 Oct 10:45
6a1db27

Model checkpoints for our RandSF.Q, built upon SlotContrast.
The encoder is DINOv2 S/14.
Models are trained on the MOVi-C/D and YTVIS (high quality) datasets, with random seeds 42, 43 and 44.
Input resolution is 256x256 (224x224).

dataset-ytvis

21 Oct 10:38
869b592

This is the YTVIS dataset in LMDB format, ready to use with this repo as is.