🏋️ Training and Inference

Environment Setup

Follow setup instructions from README.

$ conda activate crossover

Train Instance Baseline

Adjust path parameters in configs/train/train_instance_baseline.yaml and run the following:

$ bash scripts/train/train_instance_baseline.sh

Train Instance Retrieval Pipeline

Adjust path parameters in configs/train/train_instance_crossover.yaml and run the following:

$ bash scripts/train/train_instance_crossover.sh

Train Scene Retrieval Pipeline

Adjust path/configuration parameters in configs/train/train_scene_crossover.yaml. You can also add your customised dataset or choose to train on any combination of Scannet, 3RScan, ARKitScenes & MultiScan. Run the following:

$ bash scripts/train/train_scene_crossover.sh

The scene retrieval pipeline uses the trained weights from instance retrieval pipeline (for object feature calculation), please ensure to update task:UnifiedTrain:object_enc_ckpt in the config file when training.

Checkpoint Inventory

We provide all available checkpoints on huggingface 👉 here. Detailed descriptions in the table below:

`instance_baseline`

Description	Checkpoint Link
Instance Baseline trained on 3RScan	3RScan
Instance Baseline trained on ScanNet	ScanNet
Instance Baseline trained on ScanNet + 3RScan	ScanNet+3RScan

`instance_crossover`

Description	Checkpoint Link
Instance CrossOver trained on 3RScan	3RScan
Instance CrossOver trained on ScanNet	ScanNet
Instance CrossOver trained on ScanNet + 3RScan	ScanNet+3RScan
Instance CrossOver trained on ScanNet + 3RScan + ARKitScenes + MultiScan	ScanNet+3RScan+ARKitScenes+MultiScan

`scene_crossover`

Description	Checkpoint Link
Unified CrossOver trained on ScanNet + 3RScan	ScanNet+3RScan
Unified CrossOver trained on ScanNet + 3RScan + ARKitScenes + MultiScan	ScanNet+3RScan+ARKitScenes+MultiScan

🛡️ Single Inference

Instance Inference

We provide script to perform instance-level cross-modal retrieval inference on a single scan, and report retrieval metrics and matched objects within the scene, across all available modality pairs. Detailed usage in the file. Quick instructions below:

$ python single_inference/instance_inference.py

Various configurable parameters:

--dataset: Dataset name - Options: scannet, scan3r, arkitscenes, multiscan
--process_dir: Path to processed features directory containing preprocessed object data
--ckpt: Path to the pre-trained instance crossover model checkpoint (details here), example_path: ./checkpoints/instance_crossover_scannet+scan3r+multiscan+arkitscenes.pth
--scan_id: Scan ID to run inference on (e.g., scene_00004_00)
--modalities: List of modalities to use (default: ['rgb', 'point', 'cad', 'referral'])
--input_dim_3d: Input dimension for 3D features (default: 384)
--input_dim_2d: Input dimension for 2D features (default: 1536)
--input_dim_1d: Input dimension for 1D features (default: 768)
--out_dim: Output embedding dimension (default: 768)

Note: This script requires preprocessed object data for the target scene, namely objectsDataMultimodal.npz files generated during data preprocessing as described in DATA.md. The scan must have valid object instances across the specified modalities.

Scene Inference

We release a script to perform inference (generate scene-level embeddings) on a single scan of all supported datasets. Detailed usage in the file. Quick instructions below:

$ python single_inference/scene_inference.py

Various configurable parameters:

--dataset: dataset name, Scannet/Scan3R
--data_dir: data directory (eg: ./datasets/Scannet, assumes similar structure as in preprocess.md).
--process_dir: preprocessed data directory (this can point to the downloaded preprocessed directory)
--ckpt: Path to the pre-trained scene crossover model checkpoint (details here), example_path: (./checkpoints/scene_crossover_scannet+scan3r.pth/).
--scan_id: the scan id from the dataset you'd like to calculate embeddings for (if not provided, embeddings for all scans are calculated).

The script will output embeddings in the same format as provided here.

📊 Evaluation

Cross-Modal Object Retrieval

Run the following script (refer to the script to run instance baseline/instance crossover) for object instance + scene retrieval results using the instance-based methods. Detailed usage inside the script.

$ bash scripts/evaluation/eval_object_retrieval.sh

Running this script for 3RScan dataset will also show point-to-point temporal instance matching results on the RIO category subset.

Cross-Modal Scene Retrieval

Run the following script (for scene crossover). Detailed usage inside the script.

$ bash scripts/evaluation/eval_scene_retrieval.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

🏋️ Training and Inference

Environment Setup

Train Instance Baseline

Train Instance Retrieval Pipeline

Train Scene Retrieval Pipeline

Checkpoint Inventory

`instance_baseline`

`instance_crossover`

`scene_crossover`

🛡️ Single Inference

Instance Inference

Scene Inference

📊 Evaluation

Cross-Modal Object Retrieval

Cross-Modal Scene Retrieval

FilesExpand file tree

TRAIN.md

Latest commit

History

TRAIN.md

File metadata and controls

🏋️ Training and Inference

Environment Setup

Train Instance Baseline

Train Instance Retrieval Pipeline

Train Scene Retrieval Pipeline

Checkpoint Inventory

instance_baseline

instance_crossover

scene_crossover

🛡️ Single Inference

Instance Inference

Scene Inference

📊 Evaluation

Cross-Modal Object Retrieval

Cross-Modal Scene Retrieval

`instance_baseline`

`instance_crossover`

`scene_crossover`