Yiqing Shi, Yiren Song, Mike Zheng Shou
Abstract: We present Edit2Perceive, a unified framework for diverse dense prediction tasks. We demonstrate that image editing diffusion models (specifically FLUX.1 Kontext), rather than text-to-image generators, provide a better inductive bias for deterministic dense perception. Our model achieves state-of-the-art performance across Zero-shot Monocular Depth Estimation, Surface Normal Estimation, and Interactive Matting, supporting efficient single-step deterministic inference.
- Dec 19, 2025: Inference code release, with model weights
- Dec 23, 2025: Training code release
```bash
git clone https://github.com/showlab/Edit2Perceive.git
cd Edit2Perceive

# Create environment (Python 3.12 recommended)
conda create -n e2p python=3.12
conda activate e2p

# Install dependencies
pip install -r requirements.txt
```

Step 1: Download Base Model (FLUX.1-Kontext)

```bash
# If huggingface is not available, use mirror
export HF_ENDPOINT=https://hf-mirror.com

hf download black-forest-labs/FLUX.1-Kontext-dev --exclude "transformer/" --local-dir ./FLUX.1-Kontext-dev
```
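If you prefer to script the download, `huggingface_hub` offers the same filtering. A minimal sketch (the `ignore_patterns` value mirrors the `--exclude "transformer/"` flag above):

```python
# Sketch: fetch FLUX.1-Kontext-dev while skipping the original transformer
# weights, since Edit2Perceive ships its own transformer checkpoints.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="black-forest-labs/FLUX.1-Kontext-dev",
    local_dir="./FLUX.1-Kontext-dev",
    ignore_patterns=["transformer/*"],
)
```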
Step 2: Download Edit2Perceive Weights

Place the models in the `ckpts/` directory.

- Option A: LoRA weights (small size, fast validation)

  ```bash
  hf download Seq2Tri/Edit2Perceive --local-dir ckpts/ --include "*lora.safetensors"
  ```

- Option B: Full model weights (best quality)

  ```bash
  hf download Seq2Tri/Edit2Perceive --local-dir ckpts/ --exclude "*lora.safetensors"
  ```

Required directory structure:
```
Edit2Perceive/
├── ckpts/
│   ├── depth.safetensors
│   ├── depth_lora.safetensors
│   ├── normal.safetensors
│   ├── normal_lora.safetensors
│   ├── matting.safetensors
│   └── matting_lora.safetensors
├── FLUX.1-Kontext-dev/
└── ...
```
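With the weights in place, you can sanity-check that a checkpoint loads before launching anything heavier (a minimal sketch using the `safetensors` API; substitute any file from the tree above):

```python
# Sketch: verify a downloaded Edit2Perceive checkpoint loads cleanly.
from safetensors.torch import load_file

state_dict = load_file("ckpts/depth.safetensors")
n_params = sum(t.numel() for t in state_dict.values())
print(f"{len(state_dict)} tensors, {n_params / 1e6:.1f}M parameters")
```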
Launch the web demo:

```bash
python app.py  # Visit http://localhost:7860
```

Run inference on images without the UI:

```bash
python inference.py
```
Please download the evaluation datasets from the links below:

- Depth: Evaluation Dataset
- Normal: Evaluation Dataset
- Matting: P3M-10k, AM-2k, AIM-500
Before running evaluation, change the `gt_path` in `utils/eval_multiple_datasets.py` to your dataset path, then run:
```bash
# Depth
python utils/eval_multiple_datasets.py --task depth --state_dict ckpts/depth.safetensors

# Normal
python utils/eval_multiple_datasets.py --task normal --state_dict ckpts/normal.safetensors

# Matting
python utils/eval_multiple_datasets.py --task matting --state_dict ckpts/matting.safetensors
```
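For context, zero-shot depth benchmarks are usually scored after a per-image least-squares scale-and-shift alignment between prediction and ground truth (the standard affine-invariant protocol). A minimal sketch of AbsRel and δ1 under that protocol; this is illustrative, not necessarily the exact code in `utils/eval_multiple_datasets.py`:

```python
import numpy as np

def absrel_delta1(pred: np.ndarray, gt: np.ndarray, mask: np.ndarray):
    """Affine-invariant depth metrics over valid (mask) pixels with gt > 0."""
    p, g = pred[mask], gt[mask]
    # Solve min_{s,t} ||s*p + t - g||^2 for scale s and shift t.
    A = np.stack([p, np.ones_like(p)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, g, rcond=None)
    p = s * p + t
    absrel = float(np.mean(np.abs(p - g) / g))
    delta1 = float(np.mean(np.maximum(p / g, g / p) < 1.25))
    return absrel, delta1
```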
Set up the datasets for your target task.

Datasets for Depth Estimation (Hypersim & Virtual KITTI 2)

- Hypersim Dataset
  - Download:

    ```bash
    python preprocess/depth/download_hypersim.py --contains color.hdf5 depth_meters.hdf5 position.hdf5 render_entity_id.hdf5 normal_cam.hdf5 normal_world.hdf5 --silent
    ```

  - Preprocess (a ray-to-planar depth conversion sketch follows this list):

    ```bash
    python preprocess/depth/preprocess_hypersim.py --dataset_dir /path/to/hypersim --output_dir /path/to/output
    ```

- Virtual KITTI 2 Dataset
  - Download from here.
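Note that Hypersim's `depth_meters.hdf5` stores distance along the camera ray, not planar depth, so preprocessing normally converts it. A minimal sketch of that conversion, assuming Hypersim's usual 1024×768 renders and the commonly cited 886.81-pixel focal length (the repo's `preprocess_hypersim.py` may handle this differently):

```python
import numpy as np

def ray_distance_to_planar_depth(dist: np.ndarray, focal: float = 886.81) -> np.ndarray:
    """Convert per-pixel distance-to-camera (H, W) into planar depth Z."""
    h, w = dist.shape
    u = np.linspace(-0.5 * (w - 1), 0.5 * (w - 1), w)
    v = np.linspace(-0.5 * (h - 1), 0.5 * (h - 1), h)
    uu, vv = np.meshgrid(u, v)
    # Rescale each ray so that its Z component becomes the planar depth.
    return dist * focal / np.sqrt(uu**2 + vv**2 + focal**2)
```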
Datasets for Surface Normal (Hypersim, InteriorVerse & Sintel)

- Hypersim Dataset
  - Download:

    ```bash
    python preprocess/depth/download_hypersim.py --contains color.hdf5 position.hdf5 normal_cam.hdf5 normal_world.hdf5 render_entity_id.hdf5 --silent
    ```

  - Preprocess (see the normal-encoding sketch after this list):

    ```bash
    python preprocess/normal/preprocess_hypersim_normals.py --dataset_dir /path/to/hypersim --output_dir /path/to/output
    ```

- InteriorVerse Dataset
  - Refer to download instructions.
  - Preprocess:

    ```bash
    python preprocess/normal/preprocess_interiorverse_normals.py --dataset_dir /path/to/interiorverse --output_dir /path/to/output
    ```

- Sintel Dataset
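Preprocessed normal maps are commonly stored as RGB images by mapping each unit-vector component from [−1, 1] to [0, 255]. A minimal sketch of that encoding (the repo's scripts may use a different sign or axis convention):

```python
import numpy as np

def encode_normals_as_rgb(normals: np.ndarray) -> np.ndarray:
    """Map unit normals (H, W, 3) in [-1, 1] to uint8 RGB in [0, 255]."""
    n = normals / (np.linalg.norm(normals, axis=-1, keepdims=True) + 1e-8)
    return ((n + 1.0) * 0.5 * 255.0).round().astype(np.uint8)
```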
Datasets for Interactive Matting (Comp-1k, Distinctions, AM-2k, COCO)

- Sources: Composition-1k, Distinctions-646, AM-2k, COCO-Matting.
- Special Instructions:
  - Distinctions-646: You must generate the merged dataset yourself, using backgrounds sampled from VOC2012. Refer to `preprocess/matting/preprocess_distinctions_646.py` (a compositing sketch follows this list).
  - COCO-Matting: Download `trimap_alpha` and `train2017.zip`. Manually split the `trimap_alpha` images (concatenated along the width) into single alpha channels (see the splitting sketch below).
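Both special cases come down to a few lines of array manipulation. For Distinctions-646, the merged images follow the standard compositing equation I = αF + (1 − α)B; a minimal sketch (illustrative, not the exact logic of `preprocess_distinctions_646.py`):

```python
import numpy as np

def composite(fg: np.ndarray, bg: np.ndarray, alpha: np.ndarray) -> np.ndarray:
    """Alpha-composite a foreground over a background: I = a*F + (1-a)*B."""
    a = alpha.astype(np.float32)[..., None] / 255.0  # (H, W, 1) in [0, 1]
    return (a * fg.astype(np.float32) + (1.0 - a) * bg.astype(np.float32)).astype(np.uint8)
```

For COCO-Matting, splitting a width-concatenated image is one slice per part (the number of parts per file is an assumption here; check the downloaded `trimap_alpha` layout):

```python
def split_width_concat(img: np.ndarray, parts: int) -> list:
    """Split an image concatenated along the width into `parts` equal slices."""
    w = img.shape[1] // parts
    return [img[:, i * w:(i + 1) * w] for i in range(parts)]
```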
Update the `--dataset_base_path` in the scripts located in `scripts/*.sh`. Note: the dataset order is strict and must not be changed.

```bash
# Example for Depth (Hypersim + VKITTI2)
--dataset_base_path "/path/to/Hypersim/processed_depth,/path/to/vkitti2"

# Example for Normal (Hypersim + InteriorVerse + Sintel)
--dataset_base_path "/path/to/Hypersim/processed_normal,/path/to/InteriorVerse/processed_normal,/path/to/sintel"

# Example for Matting
--dataset_base_path "/path/to/composition-1k,/path/to/Distinctions-646,/path/to/AM-2k,/path/to/COCO-Matte"
```

Execute the corresponding script for LoRA or full fine-tuning; for more details on training, refer to training_args_instructions.md.
```bash
# Depth Estimation
bash scripts/Kontext_depth_lora.sh
bash scripts/Kontext_depth.sh

# Surface Normal Estimation
bash scripts/Kontext_normal_lora.sh
bash scripts/Kontext_normal.sh

# Interactive Matting
bash scripts/Kontext_matting_lora.sh
bash scripts/Kontext_matting.sh
```

If you find our work useful in your research, please consider citing our paper:
```bibtex
@misc{shi2025edit2perceive,
      title={Edit2Perceive: Image Editing Diffusion Models Are Strong Dense Perceivers},
      author={Yiqing Shi and Yiren Song and Mike Zheng Shou},
      year={2025},
      eprint={2511.18673},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2511.18673},
}
```

If you have any questions, please feel free to contact Yiqing Shi at [email protected].
