[ICML 2026] TapSampling: Inference-Time Sampling with a Task-Progress-Understanding Verifier for Robotic Manipulation
Sizhe Zhao1, Shengping Zhang1,2✉️, Shuo Yang1, Weiyu Zhao1, Shuigen Wang3, Xiangyang Ji4
1 Harbin Institute of Technology, 2 Harbin Institute of Technology (Weihai) Qingdao Research Institute,
3 Iray Technology Co., Ltd., 4 Tsinghua University
git clone https://github.com/aipixel/TapSampling.git
cd TapSampling
export TAPS_PATH="$(pwd)"
conda create -n taps python==3.10
conda activate taps
cd calvin
bash install.sh
pip install -r requirements.txt
pip install "dlimp @ git+https://github.com/kvablack/dlimp.git"
pip install "flash-attn==2.5.5" --no-build-isolation
To train the model, download the full CALVIN ABC->D dataset (~517 GB). To test with the released checkpoint, only a subset of the dataset is needed.
# Option 1: Download the full dataset
cd calvin/dataset
bash download_data.sh ABC
# Option 2: Download only the subset required for inference
cd calvin/dataset
bash download_part_data.sh
After the download is complete, the dataset directory structure should be:
calvin/dataset/task_ABC_D
├── training
└── validation
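A quick way to confirm the layout above before training or evaluation is a small check script. This is a minimal sketch: the helper name `missing_splits` is hypothetical and not part of the repository, and the relative path assumes the script is run from the repository root.

```python
from pathlib import Path

# Sub-directories listed above for the CALVIN ABC->D dataset.
EXPECTED_SPLITS = ("training", "validation")


def missing_splits(dataset_root, expected=EXPECTED_SPLITS):
    """Return the expected sub-directory names that are absent under dataset_root."""
    root = Path(dataset_root)
    return [name for name in expected if not (root / name).is_dir()]


if __name__ == "__main__":
    # Assumption: the current working directory is the repository root ($TAPS_PATH).
    missing = missing_splits("calvin/dataset/task_ABC_D")
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("Dataset layout looks complete.")
```

The check only confirms that the directories exist; it does not validate their contents.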
TapSampling is a policy-agnostic inference-time sampling framework. We take the VPP policy as an example.
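To illustrate what "policy-agnostic inference-time sampling" can look like, the sketch below shows generic best-of-N selection: draw several candidate actions from any policy, score each with a verifier, and keep the highest-scoring one. It is an assumption-laden illustration, not necessarily the procedure TapSampling implements; `select_best_action`, `sample_action`, and `score_action` are hypothetical names, and the toy policy and verifier are stand-ins. See the repository code for the actual method.

```python
import random
from typing import Callable, List, Sequence, Tuple

Action = Sequence[float]


def select_best_action(
    sample_action: Callable[[], Action],
    score_action: Callable[[Action], float],
    num_samples: int = 8,
) -> Tuple[Action, float]:
    """Draw num_samples candidates and return the one with the highest score."""
    if num_samples < 1:
        raise ValueError("num_samples must be at least 1")
    # The policy is treated as a black box that proposes candidate actions.
    candidates: List[Action] = [sample_action() for _ in range(num_samples)]
    # The verifier assigns a scalar score to each candidate.
    scores = [score_action(action) for action in candidates]
    best = max(range(num_samples), key=scores.__getitem__)
    return candidates[best], scores[best]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical stand-ins: a "policy" proposing 1-D actions and a
    # "verifier" that prefers actions close to 0.5.
    toy_policy = lambda: [rng.uniform(0.0, 1.0)]
    toy_verifier = lambda action: -abs(action[0] - 0.5)
    action, score = select_best_action(toy_policy, toy_verifier, num_samples=16)
    print("selected action:", action, "score:", score)
```

Because the policy is only accessed through a sampling callable, any policy that can produce multiple candidates can be plugged in without retraining.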
# Download the VPP policy checkpoints
python video-prediction-policy/download_vpp_checkpoints.py
After the download is complete, the VPP checkpoints directory structure should be:
video-prediction-policy/official_checkpoints
├── clip-vit-base-patch32/
├── dp-calvin/
└── svd-robot-calvin-ft/
# Download the TapSampling checkpoints
python tapsampling/download_base_model_checkpoints.py
python tapsampling/download_tapsampling_checkpoints.py
After the download is complete, the checkpoints directory structure should be:
tapsampling/pretrained_models
├── configs/
├── prism-qwen25-extra-dinosiglip-224px-0_5b/
│   └── checkpoints/step-020792-epoch-01-loss=0.5268.pt
├── Qwen2.5-0.5B/
├── vit_large_patch14_reg4_dinov2.lvd142m/
└── ViT-SO400M-14-SigLip/
action_vae/mvae/mvae_24_split
├── checkpoint_50000.pt
└── config.yaml
tapsampling/official_checkpoint/last
├── lora_adapter/
├── action_head--65000_checkpoint.pt
└── (other files)
cd "$TAPS_PATH/action_vae"
# Run once to create the action file: $TAPS_PATH/action_vae/actions.h5
python prepare_actions.py --dataset_dir ../calvin/dataset/task_ABC_D/training
# Train. See the script for detailed configuration (e.g., checkpoint path and output path).
bash train_vae.sh
cd "$TAPS_PATH/tapsampling"
# Run once to create the annotation file: $TAPS_PATH/tapsampling/complete_percentage.pkl
python tapsampling/annotate_complete_percentage.py --dataset_root ../calvin/dataset/task_ABC_D
# Train. See the script for detailed configuration (e.g., checkpoint path and output path).
bash train_calvin_sp.sh
cd "$TAPS_PATH/action_vae"
# Deploy.
bash deploy_action_vae.sh
If the Action-VAE port has changed, update the server configuration in video-prediction-policy/policy_evaluation/sampling_wrapper.py.
Before running, check the model path in video-prediction-policy/eval.sh.
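Before launching the evaluation, it can help to confirm that the deployed Action-VAE server is reachable. The sketch below assumes the server listens on a TCP port; the helper `port_is_open` is hypothetical, and the host and port values are placeholders that should be replaced with the ones used in deploy_action_vae.sh and sampling_wrapper.py.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Assumption: placeholder address; substitute the configured host and port.
    host, port = "127.0.0.1", 8000
    if port_is_open(host, port):
        print(f"Server reachable at {host}:{port}")
    else:
        print(f"No server reachable at {host}:{port}; check the deployment and port settings.")
```

A successful connection only shows that something is listening on the port, not that it is the Action-VAE service.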
cd "$TAPS_PATH/video-prediction-policy"
bash eval.sh
We thank DART, VPP, CALVIN, and VLA-Adapter for their excellent open-source work.
@inproceedings{zhao2026tapsampling,
title={{T}ap{S}ampling: Inference-Time Sampling with a Task-Progress-Understanding Verifier for Robotic Manipulation},
author={Sizhe Zhao and Shengping Zhang and Shuo Yang and Weiyu Zhao and Shuigen Wang and Xiangyang Ji},
booktitle={Forty-third International Conference on Machine Learning},
year={2026}
}