
[ICML 2026] TapSampling: Inference-Time Sampling with a Task-Progress-Understanding Verifier for Robotic Manipulation

Sizhe Zhao1, Shengping Zhang1,2✉️, Shuo Yang1, Weiyu Zhao1, Shuigen Wang3, Xiangyang Ji4

1 Harbin Institute of Technology, 2 Harbin Institute of Technology (Weihai) Qingdao Research Institute,
3 Iray Technology Co., Ltd., 4 Tsinghua University

Project | Paper | Hugging Face Collection

🛠️ Installation

git clone https://github.com/aipixel/TapSampling.git
cd TapSampling
export TAPS_PATH="$(pwd)"

conda create -n taps python==3.10
conda activate taps
cd calvin
bash install.sh
pip install -r requirements.txt
pip install "dlimp @ git+https://github.com/kvablack/dlimp.git"
pip install "flash-attn==2.5.5" --no-build-isolation
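After installation, a quick sanity check of the environment can catch the most common setup slips. The following is a minimal sketch, not part of the repository: the helper name `check_environment` is hypothetical, and it only verifies the two things stated above (Python 3.10 and an exported `TAPS_PATH`).

```python
import os
import sys
from pathlib import Path


def check_environment(env=None, version=None):
    """Return a list of human-readable problems; an empty list means OK."""
    env = os.environ if env is None else env
    version = tuple(sys.version_info[:2]) if version is None else tuple(version)
    problems = []

    # The conda environment above is created with Python 3.10.
    if version != (3, 10):
        problems.append(f"expected Python 3.10, found {version[0]}.{version[1]}")

    # TAPS_PATH is exported right after cloning and reused by later commands.
    taps_path = env.get("TAPS_PATH")
    if not taps_path:
        problems.append("TAPS_PATH is not set")
    elif not Path(taps_path).is_dir():
        problems.append(f"TAPS_PATH does not point to a directory: {taps_path}")

    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Run it inside the activated `taps` environment, in the same shell where `TAPS_PATH` was exported; no output means both checks passed.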

📦 Dataset/Checkpoints Download

Download CALVIN ABC->D dataset

Training requires the full CALVIN ABC->D dataset (~517 GB). For testing with the released checkpoint, only a subset of the dataset is needed.
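Given the size of the full dataset, it may be worth checking free disk space before starting the download. The sketch below is an illustration only: `has_room_for` is a hypothetical helper, and the 517 figure is simply the approximate size quoted above. Treat the result as a rough lower bound, since unpacking may temporarily need more space.

```python
import shutil

FULL_DATASET_GB = 517  # approximate size quoted above


def has_room_for(path, required_gb, free_bytes=None):
    """Return True if the filesystem holding `path` has `required_gb` GiB free.

    `free_bytes` can be passed explicitly (e.g. for testing); otherwise it is
    read from the filesystem. Counting in GiB is slightly conservative.
    """
    if free_bytes is None:
        free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024**3


if __name__ == "__main__":
    target = "."  # assumption: run from the directory that will hold the data
    if has_room_for(target, FULL_DATASET_GB):
        print("Enough free space for the full dataset.")
    else:
        print("Not enough free space; consider the inference-only subset.")
```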

# Option 1: Download the full dataset
cd calvin/dataset
bash download_data.sh ABC

# Option 2: Download only the subset required for inference
cd calvin/dataset
bash download_part_data.sh

After the download is complete, the dataset directory structure should be:

calvin/dataset/task_ABC_D
├── training
└── validation

Download Policy (VPP) Checkpoints

TapSampling is a policy-agnostic inference-time sampling framework. We take the VPP policy as an example.
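To illustrate what "policy-agnostic inference-time sampling" with a verifier can look like in general, here is a generic best-of-N sketch. It is not the TapSampling implementation: `best_of_n`, `sample_action`, and `score_action` are hypothetical names, and the toy policy and verifier are stand-ins. The only point is that the policy is treated as a black-box sampler, so any policy (VPP or otherwise) could be plugged in.

```python
import random
from typing import Callable, List, Sequence, Tuple

Action = Sequence[float]


def best_of_n(
    sample_action: Callable[[], Action],
    score_action: Callable[[Action], float],
    num_samples: int,
) -> Tuple[Action, float]:
    """Draw `num_samples` candidates and return the highest-scoring one."""
    if num_samples < 1:
        raise ValueError("num_samples must be at least 1")
    candidates: List[Action] = [sample_action() for _ in range(num_samples)]
    scores = [score_action(action) for action in candidates]
    # On ties, the earliest candidate wins.
    best_index = max(range(num_samples), key=scores.__getitem__)
    return candidates[best_index], scores[best_index]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy stand-ins (hypothetical): a "policy" emitting noisy 1-D actions and
    # a "verifier" that prefers actions close to 0.5.
    toy_policy = lambda: [rng.uniform(0.0, 1.0)]
    toy_verifier = lambda action: -abs(action[0] - 0.5)
    action, score = best_of_n(toy_policy, toy_verifier, num_samples=8)
    print(action, score)
```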

# Download the VPP policy checkpoints
python video-prediction-policy/download_vpp_checkpoints.py

After the download is complete, the VPP checkpoints directory structure should be:

video-prediction-policy/official_checkpoints
├── clip-vit-base-patch32/
├── dp-calvin/
└── svd-robot-calvin-ft/

Download TapSampling Checkpoints

# Download the TapSampling checkpoints
python tapsampling/download_base_model_checkpoints.py
python tapsampling/download_tapsampling_checkpoints.py

After the download is complete, the checkpoints directory structure should be:

tapsampling/pretrained_models
├── configs/
├── prism-qwen25-extra-dinosiglip-224px-0_5b/
│   └── checkpoints/step-020792-epoch-01-loss=0.5268.pt
├── Qwen2.5-0.5B/
├── vit_large_patch14_reg4_dinov2.lvd142m/
└── ViT-SO400M-14-SigLip/
action_vae/mvae/mvae_24_split
├── checkpoint_50000.pt
└── config.yaml
tapsampling/official_checkpoint/last
├── lora_adapter/
├── action_head--65000_checkpoint.pt
└── (other files)
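The directory listings in this section can be checked mechanically once all downloads finish. The sketch below is a hedged example, not a script shipped with the repository: `missing_paths` is a hypothetical helper, and it assumes every listed path is relative to the repository root (`$TAPS_PATH`).

```python
import os
from pathlib import Path

# Paths copied from the directory listings above.
# Assumption: all of them are relative to the repository root.
EXPECTED_PATHS = (
    "calvin/dataset/task_ABC_D/training",
    "calvin/dataset/task_ABC_D/validation",
    "video-prediction-policy/official_checkpoints/clip-vit-base-patch32",
    "video-prediction-policy/official_checkpoints/dp-calvin",
    "video-prediction-policy/official_checkpoints/svd-robot-calvin-ft",
    "tapsampling/pretrained_models/configs",
    "tapsampling/pretrained_models/prism-qwen25-extra-dinosiglip-224px-0_5b/"
    "checkpoints/step-020792-epoch-01-loss=0.5268.pt",
    "tapsampling/pretrained_models/Qwen2.5-0.5B",
    "tapsampling/pretrained_models/vit_large_patch14_reg4_dinov2.lvd142m",
    "tapsampling/pretrained_models/ViT-SO400M-14-SigLip",
    "action_vae/mvae/mvae_24_split/checkpoint_50000.pt",
    "action_vae/mvae/mvae_24_split/config.yaml",
    "tapsampling/official_checkpoint/last/lora_adapter",
    "tapsampling/official_checkpoint/last/action_head--65000_checkpoint.pt",
)


def missing_paths(root, expected=EXPECTED_PATHS):
    """Return the entries of `expected` that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    repo_root = os.environ.get("TAPS_PATH", ".")
    for rel in missing_paths(repo_root):
        print("missing:", rel)
```

If nothing is printed, every listed file and directory was found under the repository root.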

🚀 Training

Training the Action-VAE

cd "$TAPS_PATH/action_vae"

# Run once to create the action file: $TAPS_PATH/action_vae/actions.h5
python prepare_actions.py --dataset_dir ../calvin/dataset/task_ABC_D/training

# Train. Check the script for detailed configuration (e.g. checkpoint path and output path).
bash train_vae.sh

Training the TapSampling Verifier

cd "$TAPS_PATH/tapsampling"

# Run once to create the annotation file: $TAPS_PATH/tapsampling/complete_percentage.pkl
python tapsampling/annotate_complete_percentage.py --dataset_root ../calvin/dataset/task_ABC_D

# Train. Check the script for detailed configuration (e.g. checkpoint path and output path).
bash train_calvin_sp.sh

🔍 Inference with Released Checkpoint

Deploy the Action-VAE

cd "$TAPS_PATH/action_vae"

# Deploy.
bash deploy_action_vae.sh

Evaluate on CALVIN with VPP + TapSampling Verifier

If you changed the Action-VAE port, update the server configuration in video-prediction-policy/policy_evaluation/sampling_wrapper.py to match.

Before running, check the model path in video-prediction-policy/eval.sh.

cd "$TAPS_PATH/video-prediction-policy"
bash eval.sh
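Because the evaluation talks to the deployed Action-VAE server, a quick reachability check before `bash eval.sh` can save a failed run. This is a minimal sketch under stated assumptions: it assumes the server listens on a plain TCP port, `port_is_open` is a hypothetical helper, and the host and port below are placeholders to be replaced with the values actually used in deploy_action_vae.sh and sampling_wrapper.py.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Placeholder values (assumptions): replace with your deployment settings.
    HOST, PORT = "127.0.0.1", 8000
    print("Action-VAE server reachable:", port_is_open(HOST, PORT))
```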

🙏 Acknowledgements

We thank DART, VPP, CALVIN, and VLA-Adapter for their excellent open-source work.

📖 Citation

@inproceedings{zhao2026tapsampling,
  title={{T}ap{S}ampling: Inference-Time Sampling with a Task-Progress-Understanding Verifier for Robotic Manipulation},
  author={Sizhe Zhao and Shengping Zhang and Shuo Yang and Weiyu Zhao and Shuigen Wang and Xiangyang Ji},
  booktitle={Forty-third International Conference on Machine Learning},
  year={2026}
}
