Follow the setup instructions in the README.

$ conda activate crossover

Adjust the path parameters in `configs/train/train_instance_baseline.yaml` and run the following:

$ bash scripts/train/train_instance_baseline.sh

Adjust the path parameters in `configs/train/train_instance_crossover.yaml` and run the following:

$ bash scripts/train/train_instance_crossover.sh

Adjust the path/configuration parameters in `configs/train/train_scene_crossover.yaml`. You can also add your own customised dataset or train on any combination of ScanNet, 3RScan, ARKitScenes & MultiScan. Run the following:

$ bash scripts/train/train_scene_crossover.sh

The scene retrieval pipeline uses the trained weights from the instance retrieval pipeline (for object feature calculation), so make sure to update `task:UnifiedTrain:object_enc_ckpt` in the config file when training.
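
The colon-separated key above presumably denotes nesting in the YAML config (`task`, then `UnifiedTrain`, then `object_enc_ckpt`); that reading is an assumption, not something the config file is shown to confirm here. A minimal Python sketch of setting such a key on an already-loaded config dictionary follows. The helper `set_by_path`, the stand-in dictionary, and the checkpoint path are all hypothetical.

```python
def set_by_path(cfg: dict, path: str, value, sep: str = ":") -> dict:
    """Set a nested key given a separator-delimited path, creating levels as needed."""
    keys = path.split(sep)
    node = cfg
    for key in keys[:-1]:
        # Descend one level, creating an empty mapping if the level is missing.
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return cfg


# Hypothetical stand-in for the parsed contents of train_scene_crossover.yaml.
config = {"task": {"UnifiedTrain": {"object_enc_ckpt": None}}}

# Placeholder path: point this at your own trained instance checkpoint.
set_by_path(config, "task:UnifiedTrain:object_enc_ckpt", "/path/to/instance_checkpoint.pth")
print(config["task"]["UnifiedTrain"]["object_enc_ckpt"])
```

In practice the same change is usually made by editing the YAML file directly; the sketch only makes the assumed nesting explicit.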
We provide all available checkpoints on Hugging Face 👉 here. Detailed descriptions are in the tables below:

| Description | Checkpoint Link |
|---|---|
| Instance Baseline trained on 3RScan | 3RScan |
| Instance Baseline trained on ScanNet | ScanNet |
| Instance Baseline trained on ScanNet + 3RScan | ScanNet+3RScan |

| Description | Checkpoint Link |
|---|---|
| Instance CrossOver trained on 3RScan | 3RScan |
| Instance CrossOver trained on ScanNet | ScanNet |
| Instance CrossOver trained on ScanNet + 3RScan | ScanNet+3RScan |
| Instance CrossOver trained on ScanNet + 3RScan + ARKitScenes + MultiScan | ScanNet+3RScan+ARKitScenes+MultiScan |

| Description | Checkpoint Link |
|---|---|
| Unified CrossOver trained on ScanNet + 3RScan | ScanNet+3RScan |
| Unified CrossOver trained on ScanNet + 3RScan + ARKitScenes + MultiScan | ScanNet+3RScan+ARKitScenes+MultiScan |

We provide a script to perform instance-level cross-modal retrieval inference on a single scan. It reports retrieval metrics and the matched objects within the scene across all available modality pairs. Detailed usage is documented in the file. Quick instructions below:

$ python single_inference/instance_inference.py

Various configurable parameters:

- `--dataset`: Dataset name. Options: `scannet`, `scan3r`, `arkitscenes`, `multiscan`
- `--process_dir`: Path to the processed features directory containing preprocessed object data
- `--ckpt`: Path to the pre-trained instance crossover model checkpoint (details here), example_path: `./checkpoints/instance_crossover_scannet+scan3r+multiscan+arkitscenes.pth`
- `--scan_id`: Scan ID to run inference on (e.g., `scene_00004_00`)
- `--modalities`: List of modalities to use (default: `['rgb', 'point', 'cad', 'referral']`)
- `--input_dim_3d`: Input dimension for 3D features (default: 384)
- `--input_dim_2d`: Input dimension for 2D features (default: 1536)
- `--input_dim_1d`: Input dimension for 1D features (default: 768)
- `--out_dim`: Output embedding dimension (default: 768)

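
As an illustration of how the parameters above fit together, the following Python sketch assembles a full command line without running it. It assumes the flags are passed as conventional `--flag value` pairs; the script's actual argument parser is not shown here. The helper name `build_instance_inference_cmd` and the `process_dir` value are hypothetical placeholders, and `--modalities` is left at its default because the syntax for passing a list is not specified above.

```python
import shlex


def build_instance_inference_cmd(dataset, process_dir, ckpt, scan_id):
    """Assemble the argument list for single_inference/instance_inference.py."""
    return [
        "python", "single_inference/instance_inference.py",
        "--dataset", dataset,
        "--process_dir", process_dir,
        "--ckpt", ckpt,
        "--scan_id", scan_id,
    ]


cmd = build_instance_inference_cmd(
    dataset="multiscan",
    process_dir="/path/to/processed",  # placeholder, replace with your directory
    ckpt="./checkpoints/instance_crossover_scannet+scan3r+multiscan+arkitscenes.pth",
    scan_id="scene_00004_00",
)

# Print a shell-ready command; pass `cmd` to subprocess.run to execute it instead.
print(shlex.join(cmd))
```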
Note: This script requires preprocessed object data for the target scene, namely the `objectsDataMultimodal.npz` files generated during data preprocessing as described in DATA.md. The scan must have valid object instances across the specified modalities.
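
Before running inference, it can help to confirm that the required file is actually present. The sketch below searches a processed directory recursively for `objectsDataMultimodal.npz` under a folder named after the scan ID. That folder layout is an assumption (DATA.md is the authority on the real structure), and `find_object_data` is a hypothetical helper; the demo runs against a temporary directory rather than real data.

```python
import tempfile
from pathlib import Path


def find_object_data(process_dir, scan_id, filename="objectsDataMultimodal.npz"):
    """Return files named `filename` that sit below a directory called `scan_id`."""
    root = Path(process_dir)
    # Assumed layout: some directory component of the path equals the scan ID.
    return sorted(p for p in root.rglob(filename) if scan_id in p.parts)


# Demo on a throwaway directory tree that mimics the assumed layout.
with tempfile.TemporaryDirectory() as tmp:
    scan_dir = Path(tmp) / "scans" / "scene_00004_00"
    scan_dir.mkdir(parents=True)
    (scan_dir / "objectsDataMultimodal.npz").touch()

    matches = find_object_data(tmp, "scene_00004_00")
    missing = find_object_data(tmp, "scene_99999_99")

print(len(matches), len(missing))
```

An empty result for your scan ID suggests that preprocessing has not been run for that scan, or that the directory layout differs from the one assumed here.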
We release a script to perform inference (generate scene-level embeddings) on a single scan from any of the supported datasets. Detailed usage is documented in the file. Quick instructions below:

$ python single_inference/scene_inference.py

Various configurable parameters:

- `--dataset`: dataset name, Scannet/Scan3R
- `--data_dir`: data directory (e.g. `./datasets/Scannet`; assumes a similar structure as in `preprocess.md`)
- `--process_dir`: preprocessed data directory (this can point to the downloaded preprocessed directory)
- `--ckpt`: path to the pre-trained scene crossover model checkpoint (details here), example_path: `./checkpoints/scene_crossover_scannet+scan3r.pth/`
- `--scan_id`: the scan ID from the dataset you'd like to calculate embeddings for (if not provided, embeddings for all scans are calculated)

The script will output embeddings in the same format as provided here.
Run the following script for object instance and scene retrieval results using the instance-based methods (refer to the script for how to run instance baseline or instance crossover). Detailed usage is documented inside the script.

$ bash scripts/evaluation/eval_object_retrieval.sh

Running this script on the 3RScan dataset will also show point-to-point temporal instance matching results on the RIO category subset.

Run the following script (for scene crossover). Detailed usage is documented inside the script.

$ bash scripts/evaluation/eval_scene_retrieval.sh
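
Retrieval results of this kind are commonly summarised with recall@k: a query embedding from one modality counts as a hit if its ground-truth match ranks among the k most similar embeddings from the other modality. The sketch below illustrates that metric with cosine similarity on made-up 2D vectors. It is not the repository's evaluation code; the function names, the data, and the convention that query `i` matches gallery item `i` are all assumptions for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recall_at_k(queries, gallery, k):
    """Fraction of queries whose same-index gallery item ranks in the top k."""
    hits = 0
    for i, query in enumerate(queries):
        # Rank gallery indices from most to least similar to this query.
        ranked = sorted(
            range(len(gallery)),
            key=lambda j: cosine(query, gallery[j]),
            reverse=True,
        )
        if i in ranked[:k]:
            hits += 1
    return hits / len(queries)


# Made-up embeddings: queries from one modality, gallery from another.
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
gallery = [[0.9, 0.1], [0.7, 0.7], [0.2, 0.9]]

print(recall_at_k(queries, gallery, k=1))
print(recall_at_k(queries, gallery, k=2))
```

With this toy data only the first query retrieves its match at rank 1, while all three matches appear within the top 2, so recall rises from one third to 1.0 as k grows.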