Edit2Perceive: Image Editing Diffusion Models Are Strong Dense Perceivers

Yiqing Shi, Yiren Song, Mike Zheng Shou



Abstract: We present Edit2Perceive, a unified framework for diverse dense prediction tasks. We demonstrate that image editing diffusion models (specifically FLUX.1 Kontext), rather than text-to-image generators, provide a better inductive bias for deterministic dense perception. Our model achieves state-of-the-art performance across Zero-shot Monocular Depth Estimation, Surface Normal Estimation, and Interactive Matting, supporting efficient single-step deterministic inference.


📰 News

Dec 19, 2025: Inference code released, together with model weights

Dec 23, 2025: Training code released

🛠️ Installation

1. Environment Setup

git clone https://github.com/showlab/Edit2Perceive.git
cd Edit2Perceive

# Create environment (Python 3.12 recommended)
conda create -n e2p python=3.12 
conda activate e2p

# Install dependencies
pip install -r requirements.txt

2. Download Models

Step 1: Download Base Model (FLUX.1-Kontext)

# If huggingface is not available, use mirror
export HF_ENDPOINT=https://hf-mirror.com

hf download black-forest-labs/FLUX.1-Kontext-dev --exclude "transformer/" --local-dir ./FLUX.1-Kontext-dev

Step 2: Download Edit2Perceive Weights

Place the models in the ckpts/ directory.

  • Option A: LoRA Weights (Small size, fast validation)
hf download Seq2Tri/Edit2Perceive --local-dir ckpts/ --include "*lora.safetensors"
  • Option B: Full Model Weights (Best quality)
hf download Seq2Tri/Edit2Perceive --local-dir ckpts/ --exclude "*lora.safetensors"

Required Directory Structure:

Edit2Perceive/
├── ckpts/
│   ├── depth.safetensors
│   ├── depth_lora.safetensors
│   ├── normal.safetensors
│   ├── normal_lora.safetensors
│   ├── matting.safetensors
│   └── matting_lora.safetensors
├── FLUX.1-Kontext-dev/
└── ...
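
To confirm the download, a quick sanity check like the one below (a hypothetical helper, not part of the repo) verifies that the expected checkpoints are in place. If you only downloaded Option A, only the *_lora.safetensors files will be present.

from pathlib import Path

# Expected Edit2Perceive checkpoints: full and LoRA variants for each task.
expected = [
    "depth.safetensors", "depth_lora.safetensors",
    "normal.safetensors", "normal_lora.safetensors",
    "matting.safetensors", "matting_lora.safetensors",
]
missing = [name for name in expected if not (Path("ckpts") / name).is_file()]
print("All checkpoints found." if not missing else f"Missing from ckpts/: {missing}")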

🚀 Inference

Web UI (Gradio)

python app.py # Visit http://localhost:7860

Command Line

Run inference on images without the UI:

python inference.py
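
For illustration only, the sketch below shows roughly what single-step inference can look like when built directly on the standard diffusers FluxKontextPipeline API. It is not the repo's actual inference.py: the input filename, the empty prompt, and the LoRA wiring are assumptions, and it pulls the complete base model from the Hub because the local download above excludes the transformer/ folder.

import torch
from diffusers import FluxKontextPipeline
from diffusers.utils import load_image

# Pull the complete base model from the Hub (the local FLUX.1-Kontext-dev copy
# downloaded above omits the transformer weights).
pipe = FluxKontextPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-Kontext-dev", torch_dtype=torch.bfloat16
).to("cuda")

# Attach a task-specific LoRA, e.g. depth (Option A weights).
pipe.load_lora_weights("ckpts", weight_name="depth_lora.safetensors")

image = load_image("example.jpg")  # hypothetical input image
pred = pipe(
    image=image,
    prompt="",               # the task prompt used by Edit2Perceive is not reproduced here
    num_inference_steps=1,   # single-step deterministic inference
).images[0]
pred.save("depth_pred.png")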

📊 Evaluation

1. Prepare Datasets

Please download the evaluation datasets from the links below:

2. Run Evaluation

Before running evaluation, change the gt_path in utils/eval_multiple_datasets.py to your dataset path. Then run:

# Depth
python utils/eval_multiple_datasets.py --task depth --state_dict ckpts/depth.safetensors

# Normal
python utils/eval_multiple_datasets.py --task normal --state_dict ckpts/normal.safetensors

# Matting
python utils/eval_multiple_datasets.py --task matting --state_dict ckpts/matting.safetensors
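
The gt_path edit above simply points the evaluation script at your local copy of the benchmarks. As a purely hypothetical illustration (the actual variable layout inside utils/eval_multiple_datasets.py may differ), it amounts to something like:

# Hypothetical illustration only; match whatever structure the script actually expects.
gt_path = "/data/eval_benchmarks"  # root directory holding the downloaded evaluation datasets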

🏋️ Training

1. Dataset Preparation

Set up the datasets for your target task.

Datasets for Depth Estimation (Hypersim & Virtual KITTI 2)
  1. Hypersim Dataset
  • Download:
python preprocess/depth/download_hypersim.py --contains color.hdf5 depth_meters.hdf5 position.hdf5 render_entity_id.hdf5 normal_cam.hdf5 normal_world.hdf5 --silent
  • Preprocess:
python preprocess/depth/preprocess_hypersim.py --dataset_dir /path/to/hypersim --output_dir /path/to/output
  2. Virtual KITTI 2 Dataset
  • Download from here.
Datasets for Surface Normal (Hypersim, InteriorVerse & Sintel)
  1. Hypersim Dataset
  • Download:
python preprocess/depth/download_hypersim.py --contains color.hdf5 position.hdf5 normal_cam.hdf5 normal_world.hdf5 render_entity_id.hdf5 --silent
  • Preprocess:
python preprocess/normal/preprocess_hypersim_normals.py --dataset_dir /path/to/hypersim --output_dir /path/to/output
  2. InteriorVerse Dataset
python preprocess/normal/preprocess_interiorverse_normals.py --dataset_dir /path/to/interiorverse --output_dir /path/to/output
  3. Sintel Dataset
Datasets for Interactive Matting (Comp-1k, Distinctions, AM-2k, COCO)

2. Configure Paths

Update the --dataset_base_path in the scripts located in scripts/*.sh. Note: The dataset order is strict and must not be changed.

# Example for Depth (Hypersim + VKITTI2)
--dataset_base_path "/path/to/Hypersim/processed_depth,/path/to/vkitti2"

# Example for Normal (Hypersim + InteriorVerse + Sintel)
--dataset_base_path "/path/to/Hypersim/processed_normal,/path/to/InteriorVerse/processed_normal,/path/to/sintel"

# Example for Matting
--dataset_base_path "/path/to/composition-1k,/path/to/Distinctions-646,/path/to/AM-2k,/path/to/COCO-Matte"

3. Run Training

Execute the corresponding script for LoRA or full fine-tuning. For more details on the training arguments, refer to training_args_instructions.md.

# Depth Estimation
bash scripts/Kontext_depth_lora.sh
bash scripts/Kontext_depth.sh

# Surface Normal Estimation
bash scripts/Kontext_normal_lora.sh
bash scripts/Kontext_normal.sh

# Interactive Matting
bash scripts/Kontext_matting_lora.sh
bash scripts/Kontext_matting.sh

📝 Cite

If you find our work useful in your research, please consider citing our paper:

@misc{shi2025edit2perceive,
      title={Edit2Perceive: Image Editing Diffusion Models Are Strong Dense Perceivers}, 
      author={Yiqing Shi and Yiren Song and Mike Zheng Shou},
      year={2025},
      eprint={2511.18673},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2511.18673}, 
}

📧 Contact

If you have any questions, please feel free to contact Yiqing Shi at [email protected].
