GitHub - aipixel/TapSampling: [ICML 2026] TapSampling: Inference-Time Sampling with a Task-Progress-Understanding Verifier for Robotic Manipulation

[ICML 2026] TapSampling: Inference-Time Sampling with a Task-Progress-Understanding Verifier for Robotic Manipulation

Sizhe Zhao¹, Shengping Zhang¹,²✉️, Shuo Yang¹, Weiyu Zhao¹, Shuigen Wang³, Xiangyang Ji⁴

¹ Harbin Institute of Technology, ² Harbin Institute of Technology (Weihai) Qingdao Research Institute,
³ Iray Technology Co., Ltd., ⁴ Tsinghua University

🛠️ Installation

git clone https://github.com/aipixel/TapSampling.git
cd TapSampling
export TAPS_PATH="$(pwd)"

conda create -n taps python==3.10
conda activate taps
cd calvin
bash install.sh
pip install -r requirements.txt
pip install "dlimp @ git+https://github.com/kvablack/dlimp.git"
pip install "flash-attn==2.5.5" --no-build-isolation

📦 Dataset/Checkpoints Download

Download CALVIN ABC->D dataset

Training requires the full CALVIN ABC->D dataset (~517 GB). To test with the released checkpoint, you only need to download a subset of the dataset.
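Before starting the full download, it can help to confirm that the target filesystem has enough free space. The following is a minimal sketch and not part of the repository: `free_gb` and `REQUIRED_GB` are illustrative names, and the threshold is simply the ~517 GB estimate stated above.

```shell
# Sketch: compare free disk space with the approximate dataset size.
# REQUIRED_GB and free_gb are hypothetical names used only in this example.
REQUIRED_GB=517

free_gb() {
    # df -Pk prints POSIX-format output in 1024-byte blocks;
    # column 4 of the second line is the available space.
    df -Pk "$1" | awk 'NR==2 { printf "%d\n", $4 / 1024 / 1024 }'
}

target="${TAPS_PATH:-.}"
available="$(free_gb "$target")"
if [ "$available" -lt "$REQUIRED_GB" ]; then
    echo "Only ${available} GB free under ${target}; the full dataset needs about ${REQUIRED_GB} GB." >&2
else
    echo "${available} GB free under ${target}; enough for the full dataset."
fi
```

If the check reports too little space, the subset download may be the more practical option for inference-only use.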

# Option 1: Download the full dataset
cd "$TAPS_PATH/calvin/dataset"
bash download_data.sh ABC

# Option 2: Download only the subset required for inference
cd "$TAPS_PATH/calvin/dataset"
bash download_part_data.sh

After the download is complete, the dataset directory structure should be:

calvin/dataset/task_ABC_D
├── training
└── validation
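The layout above can be checked with a few lines of shell. This is a sketch under the assumption that `TAPS_PATH` points at the repository root, as set during installation; `check_layout` is a hypothetical helper, not a script shipped with the repository.

```shell
# check_layout (hypothetical helper): report which expected
# subdirectories are missing under a base directory.
check_layout() {
    local base="$1"
    shift
    local missing=0 sub
    for sub in "$@"; do
        if [ ! -d "$base/$sub" ]; then
            echo "missing: $base/$sub" >&2
            missing=1
        fi
    done
    return "$missing"
}

if check_layout "${TAPS_PATH:-.}/calvin/dataset/task_ABC_D" training validation; then
    echo "CALVIN dataset layout looks complete."
else
    echo "CALVIN dataset layout is incomplete; consider re-running the download script." >&2
fi
```

The helper only tests that the directories exist; it does not verify their contents.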

Download Policy (VPP) Checkpoints

TapSampling is a policy-agnostic inference-time sampling framework. We take the VPP policy as an example.

# Download the VPP policy checkpoints (run from the repository root)
cd "$TAPS_PATH"
python video-prediction-policy/download_vpp_checkpoints.py

After the download is complete, the VPP checkpoints directory structure should be:

video-prediction-policy/official_checkpoints
├── clip-vit-base-patch32/
├── dp-calvin/
└── svd-robot-calvin-ft/

Download TapSampling Checkpoints

# Download the TapSampling checkpoints (run from the repository root)
cd "$TAPS_PATH"
python tapsampling/download_base_model_checkpoints.py
python tapsampling/download_tapsampling_checkpoints.py

After the download is complete, the checkpoints directory structure should be:

tapsampling/pretrained_models
├── configs/
├── prism-qwen25-extra-dinosiglip-224px-0_5b/
│   └── checkpoints/step-020792-epoch-01-loss=0.5268.pt
├── Qwen2.5-0.5B/
├── vit_large_patch14_reg4_dinov2.lvd142m/
└── ViT-SO400M-14-SigLip/

action_vae/mvae/mvae_24_split
├── checkpoint_50000.pt
└── config.yaml

tapsampling/official_checkpoint/last
├── lora_adapter/
├── action_head--65000_checkpoint.pt
└── (other files)
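Because several separate downloads contribute to these trees, a quick listing of what is still missing can save a failed run later. The sketch below assumes the three trees above are relative to the repository root (`TAPS_PATH`); `report_missing` is a hypothetical helper written for this example.

```shell
# report_missing (hypothetical helper): read relative paths on stdin and
# print those that do not exist under the given root directory.
report_missing() {
    local root="$1" path
    while IFS= read -r path; do
        [ -z "$path" ] && continue
        [ -e "$root/$path" ] || echo "$path"
    done
}

# Paths copied from the directory trees above, assumed relative to TAPS_PATH.
missing="$(report_missing "${TAPS_PATH:-.}" <<'EOF'
tapsampling/pretrained_models/configs
tapsampling/pretrained_models/prism-qwen25-extra-dinosiglip-224px-0_5b/checkpoints/step-020792-epoch-01-loss=0.5268.pt
tapsampling/pretrained_models/Qwen2.5-0.5B
tapsampling/pretrained_models/vit_large_patch14_reg4_dinov2.lvd142m
tapsampling/pretrained_models/ViT-SO400M-14-SigLip
action_vae/mvae/mvae_24_split/checkpoint_50000.pt
action_vae/mvae/mvae_24_split/config.yaml
tapsampling/official_checkpoint/last/lora_adapter
tapsampling/official_checkpoint/last/action_head--65000_checkpoint.pt
EOF
)"

if [ -n "$missing" ]; then
    echo "Missing checkpoint files or directories:" >&2
    echo "$missing" >&2
else
    echo "All listed checkpoints are present."
fi
```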

🚀 Training

Training the Action-VAE

cd "$TAPS_PATH/action_vae"

# Run this only once; it creates the action file $TAPS_PATH/action_vae/actions.h5
python prepare_actions.py --dataset_dir ../calvin/dataset/task_ABC_D/training

# Train. Check the script for the detailed configuration (e.g., checkpoint path and output path).
bash train_vae.sh

Training the TapSampling Verifier

cd "$TAPS_PATH/tapsampling"

# Run this only once; it creates the annotation file $TAPS_PATH/tapsampling/complete_percentage.pkl
python tapsampling/annotate_complete_percentage.py --dataset_root ../calvin/dataset/task_ABC_D

# Train. Check the script for the detailed configuration (e.g., checkpoint path and output path).
bash train_calvin_sp.sh

🔍 Inference with Released Checkpoint

Deploy the Action-VAE

cd "$TAPS_PATH/action_vae"

# Deploy.
bash deploy_action_vae.sh

Evaluate on CALVIN with VPP + TapSampling Verifier

If you changed the Action-VAE port, update the server configuration in video-prediction-policy/policy_evaluation/sampling_wrapper.py to match.
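To confirm that the deployed Action-VAE server is reachable before starting the evaluation, a small connectivity check can be used. This is a sketch that assumes the server listens on a TCP port and that the shell is bash (it relies on bash's `/dev/tcp` redirection). `ACTION_VAE_HOST`, `ACTION_VAE_PORT`, and `port_open` are hypothetical names; the actual host and port are whatever deploy_action_vae.sh and sampling_wrapper.py are configured with.

```shell
# ACTION_VAE_HOST and ACTION_VAE_PORT are hypothetical placeholders; set them
# to the values used by the deployed server. No default port is assumed here.
ACTION_VAE_HOST="${ACTION_VAE_HOST:-127.0.0.1}"
ACTION_VAE_PORT="${ACTION_VAE_PORT:-}"

port_open() {
    # In bash, redirecting to /dev/tcp/HOST/PORT attempts a TCP connection.
    # The subshell closes the descriptor again as soon as it exits.
    (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

if [ -z "$ACTION_VAE_PORT" ]; then
    echo "Set ACTION_VAE_PORT to the port the Action-VAE server listens on." >&2
elif port_open "$ACTION_VAE_HOST" "$ACTION_VAE_PORT"; then
    echo "Action-VAE server reachable at ${ACTION_VAE_HOST}:${ACTION_VAE_PORT}."
else
    echo "Nothing is listening on ${ACTION_VAE_HOST}:${ACTION_VAE_PORT}." >&2
fi
```

A successful connection only shows that something accepts connections on that port; it does not prove the process is the Action-VAE server.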

Before running, check the model path in video-prediction-policy/eval.sh.

cd "$TAPS_PATH/video-prediction-policy"
bash eval.sh

🙏 Acknowledgements

We thank DART, VPP, CALVIN, and VLA-Adapter for their excellent open-source work.

📖 Citation

@inproceedings{zhao2026tapsampling,
  title={{T}ap{S}ampling: Inference-Time Sampling with a Task-Progress-Understanding Verifier for Robotic Manipulation},
  author={Sizhe Zhao and Shengping Zhang and Shuo Yang and Weiyu Zhao and Shuigen Wang and Xiangyang Ji},
  booktitle={Forty-third International Conference on Machine Learning},
  year={2026}
}
