A PyTorch library for multi-modal image translation with diffusion bridges, GANs, and transformer backbones.
Install from PyPI:

```bash
pip install pytorch-image-translation-models
```

Or install from source:

```bash
pip install -e .
```

With optional dependencies:

```bash
# With training extras (accelerate, peft, datasets, tensorboard)
pip install -e ".[training]"

# With metrics extras (torchmetrics, lpips, torch-fidelity, scipy)
pip install -e ".[metrics]"

# Everything
pip install -e ".[all]"
```

Note: PyTorch is listed as a dependency, but you may want to install a specific CUDA build first. See PyTorch — Get Started for details.
Examples default to device="cuda". If your environment is CPU-only, replace "cuda" with "cpu".
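A portable way to follow that advice is to pick the device at runtime; a minimal sketch (the `try`/`except` also covers environments where PyTorch itself is not installed):

```python
try:
    import torch
    # Prefer CUDA when a GPU is visible, otherwise fall back to CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
except ImportError:  # PyTorch missing entirely
    device = "cpu"

print(device)  # pass this string to pipe.to(device)
```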
```python
from PIL import Image

# Load the image to translate (the path is illustrative).
source = Image.open("path/to/source.png").convert("RGB")

# Baseline method (UNSB)
from src.pipelines.unsb import UNSBPipeline

unsb = UNSBPipeline.from_pretrained(
    "path/to/UNSB-ckpt/horse2zebra",  # https://huggingface.co/BiliSakura/UNSB-ckpt
    subfolder="generator",
    scheduler_num_timesteps=5,
    scheduler_tau=0.01,
)
unsb.to("cuda")

unsb_out = unsb(source_image=source, output_type="pil")
unsb_out.images[0].save("unsb_output.png")
```
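Checkpoints are typically trained at a fixed resolution, so the source image may need cropping and resizing first. A minimal sketch, assuming a square RGB input; 256×256 is an assumption here, so check each method's checkpoint docs for the resolution it actually expects:

```python
from PIL import Image

def prepare_source(img: Image.Image, size: int = 256) -> Image.Image:
    """Center-crop to a square, then resize, returning an RGB image.

    The 256x256 default is an assumption, not a library requirement;
    consult the per-method docs for the training resolution.
    """
    img = img.convert("RGB")
    side = min(img.size)
    left = (img.width - side) // 2
    top = (img.height - side) // 2
    img = img.crop((left, top, left + side, top + side))
    return img.resize((size, size), Image.BICUBIC)
```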
```python
# Community method (DiffuseIT) - text/image-guided diffusion translation
from examples.community.diffuseit import load_diffuseit_community_pipeline

pipe = load_diffuseit_community_pipeline(
    "/path/to/BiliSakura/DiffuseIT-ckpt/imagenet256-uncond",
)
pipe.to("cuda")

out = pipe(
    source_image=source,
    prompt="Black Leopard",
    source="Lion",
    use_range_restart=True,
    use_noise_aug_all=True,
    output_type="pil",
)
out.images[0].save("diffuseit_output.png")
```
```python
# Community method (E3Diff)
from examples.community.e3diff import E3DiffPipeline

e3diff = E3DiffPipeline.from_pretrained("path/to/E3Diff-ckpt/SEN12")
e3diff.to("cuda")

community_out = e3diff(source_image=source, num_inference_steps=50, output_type="pil")
community_out.images[0].save("e3diff_output.png")
```

The per-method checkpoint folder conventions expected by `from_pretrained(...)`, along with the rest of the package documentation, are covered in the docs below.
| Doc | Description |
|---|---|
| Checkpoint layouts | Provides detailed checkpoint folder structures, naming conventions, and requirements for each pipeline and the from_pretrained(...) API. |
| Features | Documents supported models, schedulers, pipelines, data types, training methods, and evaluation metrics. |
| Metrics README | One-stop usage for paired/unpaired metrics and custom HuggingFace/local checkpoints. |
| Datasets | Common image-to-image translation datasets (pix2pix, CycleGAN) with paper and download links. |
| Examples | Extended usage patterns and code snippets for pipelines such as I2SB, DDBM, UNSB, and Local Diffusion. |
| Storage Buckets | Sync training checkpoints and TensorBoard logs to Hugging Face Storage Buckets (CUT, pix2pix tutorials). |
| Package structure | Overview of the codebase organization, modules, and directories. |
| Credits | Citations for reference papers and third-party contributions. |
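Checkpoint layouts differ per method (see the Checkpoint layouts doc), so it can help to sanity-check a folder before calling `from_pretrained(...)`. A minimal sketch, assuming only that the expected subfolder name is known; `"generator"` mirrors the UNSB quickstart above and is just an example:

```python
from pathlib import Path

def has_subfolder(checkpoint_dir: str, subfolder: str) -> bool:
    """Return True if the checkpoint directory contains the given subfolder.

    Which subfolders a method requires is documented in "Checkpoint layouts";
    this check only confirms the directory exists, not its contents.
    """
    return (Path(checkpoint_dir) / subfolder).is_dir()
```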
This repository/package is primarily built upon 4th-MAVIC-T by the EarthBridge Team:
- Zheyuan Chen — bilisakura@zju.edu.cn
- Yuanshen Guan — guanys@mail.ustc.edu.cn
Licensed under the MIT License.