Welcome to the Vision Transformer Training and Inference repository! This project provides training scripts for pretrained vision transformers such as Mask2Former and SegFormer, along with inference pipelines for these models.

Vision transformers have transformed computer vision by applying the transformer architecture to image-processing tasks. This repository contains scripts to train, and to run inference with, state-of-the-art vision transformers.
- Training Scripts: Easily train vision transformers on your custom datasets.
- Inference Pipelines: Perform inference using trained models.
- Customizable: Modify training parameters and augmentations to suit your needs.
- Preprocessing: Includes image preprocessing and augmentation techniques.
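To illustrate the kind of preprocessing and augmentation step the training pipeline applies, here is a minimal NumPy sketch (a hypothetical stand-in, not the repository's actual implementation, which is configurable):

```python
import numpy as np

def augment(image: np.ndarray, mask: np.ndarray, rng: np.random.Generator):
    """Apply a random horizontal flip and simple normalization.

    `image` is an HxWx3 uint8 array, `mask` an HxW label map; the same
    spatial transform is applied to both so labels stay aligned.
    """
    if rng.random() < 0.5:
        image = image[:, ::-1]  # flip along the width axis
        mask = mask[:, ::-1]
    # Scale pixel values to [0, 1] for the model input.
    image = image.astype(np.float32) / 255.0
    return image, mask
```

The key design point for segmentation tasks is that every geometric augmentation must be applied identically to the image and its label mask, otherwise pixel labels drift out of alignment.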
To get started, clone the repository and install the required dependencies:
```bash
git clone https://github.com/jonleinena/mask2former.git
cd mask2former
pip install -r requirements.txt
```

To train a model, use the `train.py` script. You can specify various parameters such as the dataset path, image size, model name, and more.

```bash
python3 train.py --dataset_path /path/to/dataset --img_size 1024 1024 --model_name_or_path facebook/mask2former-swin-small-ade-semantic --output_path weights --learning_rate 0.0001 --epochs 10
```

Inference scripts will be added soon. Stay tuned!
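The flags above suggest an argparse-style command-line interface. A minimal sketch of how such an interface could be defined (hypothetical; this is not the repository's actual `train.py`):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical CLI mirroring the flags shown in the training command above.
    parser = argparse.ArgumentParser(description="Train a vision transformer")
    parser.add_argument("--dataset_path", required=True,
                        help="Root folder of the training dataset")
    parser.add_argument("--img_size", type=int, nargs=2, default=[512, 512],
                        help="Target image size as HEIGHT WIDTH")
    parser.add_argument("--model_name_or_path",
                        default="facebook/mask2former-swin-small-ade-semantic",
                        help="Hugging Face model id or local checkpoint path")
    parser.add_argument("--output_path", default="weights",
                        help="Directory where trained weights are saved")
    parser.add_argument("--learning_rate", type=float, default=1e-4)
    parser.add_argument("--epochs", type=int, default=10)
    return parser

# Example: parse the flags from the command shown above.
args = build_parser().parse_args(
    ["--dataset_path", "/path/to/dataset", "--img_size", "1024", "1024",
     "--epochs", "10"]
)
```

Multi-value flags such as `--img_size 1024 1024` map naturally to `nargs=2`, which yields a two-element list of integers.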
- Implement training script for Mask2Former
- Implement training script for SegFormer
- Add inference pipeline for Mask2Former
- Add inference pipeline for SegFormer
- Add support for more vision transformers
- Improve documentation and add examples
The project is still at an early stage, so contributions are not being accepted yet. We may open it up to contributions in the future.
This project is licensed under the MIT License. See the LICENSE file for details.