
🏋️ Training and Inference

Environment Setup

Follow the setup instructions in the README, then activate the environment:

$ conda activate crossover

Train Instance Baseline

Adjust path parameters in configs/train/train_instance_baseline.yaml and run the following:

$ bash scripts/train/train_instance_baseline.sh

Train Instance Retrieval Pipeline

Adjust path parameters in configs/train/train_instance_crossover.yaml and run the following:

$ bash scripts/train/train_instance_crossover.sh

Train Scene Retrieval Pipeline

Adjust path/configuration parameters in configs/train/train_scene_crossover.yaml. You can also add your own customised dataset or train on any combination of ScanNet, 3RScan, ARKitScenes, and MultiScan. Run the following:

$ bash scripts/train/train_scene_crossover.sh

The scene retrieval pipeline uses the trained weights from the instance retrieval pipeline (for object feature calculation). Make sure to update task:UnifiedTrain:object_enc_ckpt in the config file before training.
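The key task:UnifiedTrain:object_enc_ckpt reads as a colon-separated path into a nested config. The sketch below shows one way to set such a key, assuming the YAML file loads into nested dictionaries. The `set_by_path` helper and the stand-in `config` dictionary are hypothetical and are not taken from the repository; the checkpoint path is only an example.

```python
def set_by_path(config, key_path, value, sep=":"):
    """Set a value in a nested dict using a separator-delimited key path."""
    keys = key_path.split(sep)
    node = config
    for key in keys[:-1]:
        # Create intermediate dicts as needed so partial configs also work.
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return config


# Hypothetical stand-in for the loaded scene training config (layout assumed).
config = {"task": {"UnifiedTrain": {"object_enc_ckpt": ""}}}

# Example path only: point this at your own trained instance crossover checkpoint.
set_by_path(
    config,
    "task:UnifiedTrain:object_enc_ckpt",
    "./checkpoints/instance_crossover_scannet+scan3r+multiscan+arkitscenes.pth",
)
print(config["task"]["UnifiedTrain"]["object_enc_ckpt"])
```

In practice it is usually simpler to edit the YAML file by hand; the sketch is mainly meant to show which nested entry the colon-separated key most likely refers to.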

Checkpoint Inventory

We provide all available checkpoints on Hugging Face 👉 here. The tables below describe each checkpoint:

instance_baseline

| Description | Checkpoint Link |
| --- | --- |
| Instance Baseline trained on 3RScan | 3RScan |
| Instance Baseline trained on ScanNet | ScanNet |
| Instance Baseline trained on ScanNet + 3RScan | ScanNet+3RScan |

instance_crossover

| Description | Checkpoint Link |
| --- | --- |
| Instance CrossOver trained on 3RScan | 3RScan |
| Instance CrossOver trained on ScanNet | ScanNet |
| Instance CrossOver trained on ScanNet + 3RScan | ScanNet+3RScan |
| Instance CrossOver trained on ScanNet + 3RScan + ARKitScenes + MultiScan | ScanNet+3RScan+ARKitScenes+MultiScan |

scene_crossover

| Description | Checkpoint Link |
| --- | --- |
| Unified CrossOver trained on ScanNet + 3RScan | ScanNet+3RScan |
| Unified CrossOver trained on ScanNet + 3RScan + ARKitScenes + MultiScan | ScanNet+3RScan+ARKitScenes+MultiScan |

🛡️ Single Inference

Instance Inference

We provide a script that performs instance-level cross-modal retrieval inference on a single scan and reports retrieval metrics and matched objects within the scene across all available modality pairs. See the file for detailed usage. Quick instructions below:

$ python single_inference/instance_inference.py

Various configurable parameters:

  • --dataset: Dataset name - Options: scannet, scan3r, arkitscenes, multiscan
  • --process_dir: Path to processed features directory containing preprocessed object data
  • --ckpt: Path to the pre-trained instance crossover model checkpoint (details here), example_path: ./checkpoints/instance_crossover_scannet+scan3r+multiscan+arkitscenes.pth
  • --scan_id: Scan ID to run inference on (e.g., scene_00004_00)
  • --modalities: List of modalities to use (default: ['rgb', 'point', 'cad', 'referral'])
  • --input_dim_3d: Input dimension for 3D features (default: 384)
  • --input_dim_2d: Input dimension for 2D features (default: 1536)
  • --input_dim_1d: Input dimension for 1D features (default: 768)
  • --out_dim: Output embedding dimension (default: 768)

Note: This script requires preprocessed object data for the target scene, namely objectsDataMultimodal.npz files generated during data preprocessing as described in DATA.md. The scan must have valid object instances across the specified modalities.
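To make "matched objects" and "retrieval metrics" concrete, here is a minimal sketch of cross-modal matching by cosine similarity. It is not the repository's implementation: the function names are hypothetical, the 3-dimensional embeddings are toy data, and it assumes that object i in one modality corresponds to object i in the other.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Treat zero vectors as matching nothing.
    return dot / (norm_a * norm_b)


def match_objects(query_embs, target_embs):
    """For each query embedding, return the index of the most similar target."""
    matches = []
    for q in query_embs:
        scores = [cosine_similarity(q, t) for t in target_embs]
        matches.append(max(range(len(scores)), key=scores.__getitem__))
    return matches


# Toy embeddings for three objects in two modalities (illustrative values only).
point_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
rgb_embs = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.2, 0.1, 0.9]]

matches = match_objects(point_embs, rgb_embs)
# Top-1 accuracy under the assumption that index i is the true match for query i.
top1 = sum(m == i for i, m in enumerate(matches)) / len(matches)
print(matches, top1)
```

The real script works on the learned embeddings (768-dimensional by default, per --out_dim) and may compute its metrics differently.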

Scene Inference

We release a script that runs inference (generates scene-level embeddings) on a single scan from any supported dataset. See the file for detailed usage. Quick instructions below:

$ python single_inference/scene_inference.py

Various configurable parameters:

  • --dataset: Dataset name (Scannet or Scan3R)
  • --data_dir: Data directory (e.g., ./datasets/Scannet; assumes a structure similar to the one in preprocess.md)
  • --process_dir: Preprocessed data directory (this can point to the downloaded preprocessed directory)
  • --ckpt: Path to the pre-trained scene crossover model checkpoint (details here), example_path: ./checkpoints/scene_crossover_scannet+scan3r.pth
  • --scan_id: Scan ID from the dataset to calculate embeddings for (if not provided, embeddings are calculated for all scans)

The script will output embeddings in the same format as provided here.
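As an illustration of how the flags listed above fit together, the sketch below assembles a full command line. The `build_scene_inference_cmd` helper is hypothetical, and the `./processed` directory and the scan ID are placeholder values; only the flag names and the example dataset and checkpoint paths come from this document.

```python
import shlex


def build_scene_inference_cmd(dataset, data_dir, process_dir, ckpt, scan_id=None):
    """Assemble the argument list for single_inference/scene_inference.py."""
    cmd = [
        "python", "single_inference/scene_inference.py",
        "--dataset", dataset,
        "--data_dir", data_dir,
        "--process_dir", process_dir,
        "--ckpt", ckpt,
    ]
    # Omitting --scan_id computes embeddings for all scans, as described above.
    if scan_id is not None:
        cmd += ["--scan_id", scan_id]
    return cmd


cmd = build_scene_inference_cmd(
    dataset="Scannet",
    data_dir="./datasets/Scannet",
    process_dir="./processed",  # placeholder: your preprocessed data directory
    ckpt="./checkpoints/scene_crossover_scannet+scan3r.pth",
    scan_id="scene_00004_00",  # placeholder scan ID
)
# Print a shell-ready command line for copy-pasting.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` from the repository root would launch the script directly.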

📊 Evaluation

Cross-Modal Object Retrieval

Run the following script to get object instance and scene retrieval results using the instance-based methods. Refer to the script itself for how to select instance baseline or instance crossover and for detailed usage.

$ bash scripts/evaluation/eval_object_retrieval.sh

Running this script on the 3RScan dataset also reports point-to-point temporal instance matching results on the RIO category subset.

Cross-Modal Scene Retrieval

Run the following script to evaluate scene crossover. See the script for detailed usage.

$ bash scripts/evaluation/eval_scene_retrieval.sh
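For readers unfamiliar with retrieval metrics, recall@k is a common choice for this kind of evaluation. The sketch below computes it from a precomputed query-by-database similarity matrix, assuming the correct match for query i is database entry i. Whether the evaluation script reports exactly this metric is an assumption; the function name and toy matrix are illustrative only.

```python
def recall_at_k(similarity, k):
    """Fraction of queries whose true match (same index) ranks in the top k."""
    hits = 0
    for i, row in enumerate(similarity):
        # Rank database indices by descending similarity to query i.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(similarity)


# Toy similarity matrix: rows are query scenes, columns are database scenes.
similarity = [
    [0.9, 0.3, 0.1],
    [0.6, 0.5, 0.2],
    [0.1, 0.2, 0.8],
]
print(recall_at_k(similarity, 1))
print(recall_at_k(similarity, 2))
```

In this toy example the second query ranks a wrong scene first, so it only counts as a hit once k is at least 2.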