
Unity in Diversity: Video Editing via Gradient-Latent Purification

CVPR 2025 | MIT License

🚀 Introduction

This repository contains the official PyTorch implementation of our paper "Unity in Diversity: Video Editing via Gradient-Latent Purification". Our method enables precise video editing through gradient-based latent purification, producing temporally consistent, high-quality results across diverse video content.

📋 Table of Contents

  • Features
  • Installation
  • Quick Start
  • Usage
  • Examples
  • Project Structure
  • Configuration
  • Troubleshooting
  • Citation
  • License
  • Contributing
  • Contact

✨ Features

  • Gradient-Latent Purification: Novel approach for consistent video editing
  • High-Quality Results: Maintains temporal coherence across frames
  • Flexible Configuration: Easy-to-use YAML configuration system
  • Multiple Input Formats: Support for various video and image formats
  • GPU Accelerated: Optimized for NVIDIA GPUs with CUDA support

πŸ› οΈ Installation

Prerequisites

  • Python 3.9.19
  • CUDA-compatible NVIDIA GPU
  • CUDA 12.1 or compatible version

Environment Setup

  1. Create and activate a conda environment:

    conda create -n ulg python=3.9.19
    conda activate ulg
  2. Install PyTorch with CUDA support:

    pip install torch==2.2.2+cu121 torchvision --extra-index-url https://download.pytorch.org/whl/cu121
  3. Install the other dependencies (then verify the install with the sanity check below):

    pip install -r requirements.txt
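
To verify the install, the following check (generic PyTorch, not specific to this repo) confirms that the CUDA build is active:

import torch

print(torch.__version__)          # expect 2.2.2+cu121
print(torch.cuda.is_available())  # True if the CUDA 12.1 build sees your GPU
print(torch.cuda.get_device_name(0) if torch.cuda.is_available() else "no CUDA device")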

For Chinese Users

If you're in mainland China, use the Tsinghua PyPI mirror:

pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple/

🚀 Quick Start

Basic Usage

Run video editing with default settings:

python inference.py

For users in China (using HF mirror):

HF_ENDPOINT=https://hf-mirror.com python inference.py

Custom Configuration

Use a specific configuration file:

python inference.py --config configs/your_config.yaml

📖 Usage

Input Preparation

  1. Video Input: Place your input video in the data/ directory (see the frame-extraction sketch after this list)
  2. Configuration: Modify the configuration file in configs/ directory
  3. DDIM Inversion: Run preprocessing if needed:
    bash inversion.sh
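
If the inversion step expects per-frame PNGs rather than a video file (the name ddim_inversion_png.yaml hints at this, but the exact input layout is defined by this repo), a minimal OpenCV sketch for splitting a video into frames is shown below; the data/frames/ path and the opencv-python dependency are assumptions, not part of this repo:

import os
import cv2  # pip install opencv-python (assumed dependency)

# Hypothetical preprocessing: split the input video into numbered PNG frames.
os.makedirs("data/frames", exist_ok=True)
cap = cv2.VideoCapture("data/your_video.mp4")
idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imwrite(f"data/frames/{idx:05d}.png", frame)
    idx += 1
cap.release()
print(f"wrote {idx} frames to data/frames/")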

Configuration Files

  • configs/ddim_inversion_png.yaml: Configuration for DDIM inversion
  • configs/dog_robotic.yaml: Example configuration for dog-to-robot transformation

Key Parameters

  • n_frames: Number of frames to process
  • image_size: Target image resolution [height, width]
  • n_steps: Number of diffusion steps
  • cfg_txt: Text guidance scale
  • cfg_img: Image guidance scale
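
As a sketch of how these keys could be adjusted programmatically (assuming the configs are plain YAML as shown in the Configuration section; configs/my_config.yaml is a made-up output path):

import yaml  # pip install pyyaml if it is not already installed

with open("configs/dog_robotic.yaml") as f:
    cfg = yaml.safe_load(f)

# Trade quality for memory: lower the resolution and frame count.
cfg["image_size"] = [384, 384]
cfg["n_frames"] = 4

with open("configs/my_config.yaml", "w") as f:
    yaml.safe_dump(cfg, f)

Then run python inference.py --config configs/my_config.yaml to use the modified settings.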

🎯 Examples

Example 1: Style Transfer

python inference.py --config configs/dog_robotic.yaml

Example 2: Custom Editing

python inference.py \
  --video_path "data/your_video.mp4" \
  --prompt "your editing prompt" \
  --output_dir "results/your_result"

πŸ“ Project Structure

├── configs/                   # Configuration files
│   ├── ddim_inversion_png.yaml
│   └── dog_robotic.yaml
├── data/                      # Input data directory
├── ddim_inversion/            # DDIM inversion results
├── dds/                       # DDS-related modules
├── results/                   # Output results
├── unity_pipeline/            # Core pipeline implementation
│   ├── pipelines/
│   └── utils/
├── inference.py               # Main inference script
├── preprocess.py              # Data preprocessing
├── utils.py                   # Utility functions
└── requirements.txt           # Dependencies

βš™οΈ Configuration

The configuration system uses YAML files. Key settings include:

# General settings
seed: 8888
device: "cuda:0"

# Data settings
image_size: [512, 512]
n_frames: 8

# DDIM inversion settings
inverse_config:
  n_steps: 100
  cfg_txt: 1.0
  cfg_img: 1.0

🔧 Troubleshooting

Common Issues

  1. CUDA out of memory: Reduce image_size or n_frames (a quick memory check follows this list)
  2. Missing dependencies: Ensure all packages in requirements.txt are installed
  3. Slow download: Use mirror sources if you're in China
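
Before lowering image_size or n_frames, it can help to see how much GPU memory is actually free; this is generic PyTorch, not repo-specific:

import torch

free, total = torch.cuda.mem_get_info()  # bytes on the current CUDA device
print(f"{free / 1e9:.1f} GB free of {total / 1e9:.1f} GB")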

Performance Tips

  • Use smaller image sizes for faster processing
  • Adjust n_steps based on quality requirements
  • Enable mixed precision inference if your GPU supports it (sketched below)
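
A minimal autocast sketch for the mixed-precision tip, using a stand-in linear layer since this repo's pipeline objects are not shown here:

import torch

model = torch.nn.Linear(512, 512).cuda()  # stand-in for the real pipeline
x = torch.randn(8, 512, device="cuda")

# Run matmul-heavy ops in fp16; autocast keeps numerically sensitive ops in fp32.
with torch.autocast(device_type="cuda", dtype=torch.float16):
    y = model(x)
print(y.dtype)  # torch.float16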

📚 Citation

If you find this work useful in your research, please consider citing:

@inproceedings{gao2025unity,
  title={Unity in Diversity: Video Editing via Gradient-Latent Purification},
  author={Gao, Junyu and Yang, Kunlin and Yao, Xuan and Hu, Yufan},
  booktitle={Proceedings of the Computer Vision and Pattern Recognition Conference},
  pages={23401--23411},
  year={2025}
}

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🤝 Contributing

We welcome contributions! Please feel free to submit a Pull Request.

📧 Contact

For questions or issues, please open an issue on this repository.

Note: This code is released for research purposes. Please ensure proper attribution when using this work.
