This is the official PyTorch implementation of the ACM MM 2024 paper "LoMOE: Localized Multi-Object Editing via Multi-Diffusion". All the published data is available on our project page.
This code was tested with python=3.9, pytorch=2.0.1, and torchvision=0.15.2. Please follow the official PyTorch instructions to install the PyTorch and TorchVision dependencies. Installing both PyTorch and TorchVision with CUDA support is strongly recommended.
Create a conda environment with the following dependencies:
conda create -n lomoe python=3.9
conda activate lomoe
conda install pytorch==2.0.1 torchvision==0.15.2 pytorch-cuda=11.7 -c pytorch -c nvidia
pip install accelerate==0.20.3 diffusers==0.12.1 einops==0.7.0 ipython transformers==4.26.1 salesforce-lavis==1.0.2
Start by downloading the SOE and MOE datasets from our project page to ./benchmark/data.
To generate the prompt, the inverted latent, and the intermediate latents for an image, first run the inversion script located at ./lomoe/invert/inversion.py. Then, to apply edits, use ./lomoe/edit/main.py. A sample image and corresponding masks for single- and multi-object edit operations are provided in ./lomoe/sample/.
The invert/inversion.py script takes the following arguments:
--input_image: Path to the input image.
--results_folder: Path to store the prompt, the inverted latent, and the intermediate latents.
CUDA_VISIBLE_DEVICES=0 python invert/inversion.py \
--input_image "sample/single/init_image.jpg" \
--results_folder "invert/output/single"
CUDA_VISIBLE_DEVICES=0 python invert/inversion.py \
--input_image "sample/multi/init_image.png" \
--results_folder "invert/output/multi"
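The edit commands below consume the files that inversion.py writes under --results_folder. As a reference, here is a small sketch of that output layout, with the path names taken from the commands in this README (the helper name and return shape are our own, not part of the codebase):

```python
from pathlib import Path

def inversion_outputs(results_folder: str, image_stem: str) -> dict:
    """Files inversion.py is expected to produce, based on the paths
    consumed by edit/main.py in this README. Illustrative helper only."""
    root = Path(results_folder)
    return {
        "prompt": root / "prompt" / f"{image_stem}.txt",         # used for --bg_prompt / --bg_negative
        "latent": root / "inversion" / f"{image_stem}.pt",       # used for --latent
        "latent_list": root / "latentlist" / f"{image_stem}.pt", # used for --latent_list
    }

outs = inversion_outputs("invert/output/single", "init_image")
print(outs["prompt"])  # invert/output/single/prompt/init_image.txt
```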
The edit/main.py script takes the following arguments:
--mask_paths: Paths to the object masks.
--num_fgmasks: Number of foreground masks (defaults to 1).
--bg_prompt: Path to the background prompt (we use the prompt generated by inversion.py).
--bg_negative: Path to the background negative prompt (we use the prompt generated by inversion.py).
--fg_prompts: Edit prompts corresponding to the masks.
--fg_negative: The foreground negative prompt (we use "artifacts, blurry, smooth texture, bad quality, distortions, unrealistic, distorted image").
--W: Output image width.
--H: Output image height.
--seed: The seed to initialize the random number generators (defaults to 0).
--sd_version: The Stable Diffusion version to be used (use the same as in inversion.py).
--steps: The number of diffusion timesteps (use the same as in inversion.py).
--ca_coef: Cross-attention preservation loss coefficient (defaults to 1.0).
--seg_coef: Background loss coefficient (defaults to 1.75).
--bootstrapping: Value of the bootstrapping parameter (defaults to 20).
--latent: Path to the inverted latent produced by inversion.py.
--latent_list: Path to the latent list produced by inversion.py.
--rec_path: Path to save the reconstructed input image.
--edit_path: Path to save the edited image.
--save_path: Path to save the merged reconstructed and edited image.
CUDA_VISIBLE_DEVICES=0 python edit/main.py \
--mask_paths "sample/single/mask_1.jpg" \
--bg_prompt "invert/output/single/prompt/init_image.txt" \
--bg_negative "invert/output/single/prompt/init_image.txt" \
--fg_negative "artifacts, blurry, smooth texture, bad quality, distortions, unrealistic, distorted image" \
--H 512 \
--W 512 \
--bootstrapping 20 \
--latent 'invert/output/single/inversion/init_image.pt' \
--latent_list 'invert/output/single/latentlist/init_image.pt' \
--rec_path 'results/single/1_reconstruction.png' \
--edit_path 'results/single/2_edit.png' \
--fg_prompts "a red dog collar" \
--seed 1234 \
--save_path 'results/single/3_merged.png'
CUDA_VISIBLE_DEVICES=0 python edit/main.py \
--mask_paths "sample/multi/mask_1.png" "sample/multi/mask_2.png" \
--bg_prompt "invert/output/multi/prompt/init_image.txt" \
--bg_negative "invert/output/multi/prompt/init_image.txt" \
--fg_negative "artifacts, blurry, smooth texture, bad quality, distortions, unrealistic, distorted image" "artifacts, blurry, smooth texture, bad quality, distortions, unrealistic, distorted image" \
--H 512 \
--W 512 \
--bootstrapping 20 \
--latent 'invert/output/multi/inversion/init_image.pt' \
--latent_list 'invert/output/multi/latentlist/init_image.pt' \
--rec_path 'results/multi/1_reconstruction.png' \
--edit_path 'results/multi/2_edit.png' \
--fg_prompts "a crochet bird" "an origami bird" \
--num_fgmasks 2 \
--seed 1234 \
--save_path 'results/multi/3_merged.png'
To compute the classical and neural metrics, use compute_metrics.py in ./benchmark/metrics/{SOE/MOE}. This covers the SRC and TGT CLIP scores, BG LPIPS, BG PSNR, BG MSE, BG SSIM, and the Structural Distance. The compute_aesthetic.py script in ./benchmark/metrics/{SOE/MOE} computes the aesthetic metrics, including HPS, IR, and the Aesthetic Score. This script also requires additional dependencies, namely HPSv2 and ImageReward.
NOTE: The compute_metrics.py and compute_aesthetic.py scripts expect a folder containing edits for all images in the dataset. Please modify the code to run them on a smaller subset or single images.
CUDA_VISIBLE_DEVICES=0 python compute_metrics.py --folder_name PATH_TO_SAVED_EDITS
CUDA_VISIBLE_DEVICES=0 python compute_aesthetic.py --folder_name PATH_TO_SAVED_EDITS
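The BG metrics compare the edited image to the source image over background pixels only, i.e. outside the edited (masked) region. As a minimal pure-Python sketch of that idea for BG MSE and BG PSNR (a simplified stand-in written for illustration, not the actual code in compute_metrics.py):

```python
import math

def bg_mse_psnr(src, edit, mask, max_val=255.0):
    """Masked MSE and PSNR over background pixels only.
    mask == 1 marks the edited foreground, which is excluded.
    Simplified stand-in for the BG metrics in compute_metrics.py."""
    se, n = 0.0, 0
    for s_row, e_row, m_row in zip(src, edit, mask):
        for s, e, m in zip(s_row, e_row, m_row):
            if m == 0:  # background pixel: include in the error sum
                se += (s - e) ** 2
                n += 1
    mse = se / n
    psnr = float("inf") if mse == 0 else 10.0 * math.log10(max_val ** 2 / mse)
    return mse, psnr

# toy 2x2 grayscale example: the edited pixel is masked out of the metric
src  = [[10, 10], [10, 10]]
edit = [[10, 26], [10, 200]]  # pixel (1,1) was edited
mask = [[0, 0], [0, 1]]       # exclude the edited pixel
mse, psnr = bg_mse_psnr(src, edit, mask)
```

A perfectly preserved background gives MSE 0 (infinite PSNR); residual background drift from the diffusion process shows up as a finite PSNR.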
If you use LoMOE or find this work useful for your research, please cite it using the following BibTeX entry.
@InProceedings{Chakrabarty_2024_ACMMM,
author = {Chakrabarty$^*$, Goirik and Chandrasekar$^*$, Aditya and Hebbalaguppe, Ramya and Prathosh, AP},
title = {LoMOE: Localized Multi-Object Editing via Multi-Diffusion},
booktitle = {ACM Multimedia 2024},
month = {October},
year = {2024}
}
