A blazing-fast, production-ready Rust utility to convert LabelMe JSON annotations into a strictly formatted YOLO dataset.
This tool is built for machine learning pipelines that require high performance and reproducibility. It seamlessly handles both bounding boxes and circular annotations, automatically splits your data into training and validation sets, and generates the required data.yaml file for training with frameworks like Ultralytics YOLOv8/v11.
- Blazing Fast: Utilizes `rayon` for multi-threaded, parallel processing of JSON files and I/O operations.
- Multi-Shape Support: Converts standard `rectangle` polygons and automatically calculates bounding boxes for `circle` annotations (center + point on radius).
- Reproducible ML Splits: Supports fixed RNG seeding for deterministic train/val splits. If you rerun the pipeline, your datasets remain exactly the same.
- Background Image Handling: Automatically includes unannotated images as background images (generating empty `.txt` files) to help reduce false positives during YOLO training.
- Safe File I/O: Uses atomic writes to prevent corrupted label files if the process is interrupted.
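The circle handling described above can be illustrated with a small sketch. It assumes a LabelMe circle is stored as two points (the center and one point on the circumference) and that `--max-size` caps the side length of the resulting box. Whether the tool clamps oversized boxes or rejects them is not stated here, so the clamping below is an assumption, and the names `YoloBox`, `circle_to_yolo`, and `to_label_line` are hypothetical.

```rust
/// One YOLO label row: a class id plus a box normalized to the range [0, 1].
#[derive(Debug, Clone)]
struct YoloBox {
    class_id: usize,
    x_center: f64,
    y_center: f64,
    width: f64,
    height: f64,
}

/// Converts a circle (center + one point on the circumference) into a
/// normalized YOLO box. All names here are illustrative, not the tool's API.
fn circle_to_yolo(
    class_id: usize,
    center: (f64, f64),
    edge: (f64, f64),
    img_w: f64,
    img_h: f64,
    max_size: f64,
) -> YoloBox {
    // The radius is the distance between the two annotated points.
    let radius = (edge.0 - center.0).hypot(edge.1 - center.1);
    // Assumption: the box side (2 * radius) is capped at `max_size`.
    let half = radius.min(max_size / 2.0);

    // Clamp the corners so the box never leaves the image.
    let x_min = (center.0 - half).max(0.0);
    let y_min = (center.1 - half).max(0.0);
    let x_max = (center.0 + half).min(img_w);
    let y_max = (center.1 + half).min(img_h);

    YoloBox {
        class_id,
        x_center: (x_min + x_max) / 2.0 / img_w,
        y_center: (y_min + y_max) / 2.0 / img_h,
        width: (x_max - x_min) / img_w,
        height: (y_max - y_min) / img_h,
    }
}

/// Formats a box as one line of a YOLO label file:
/// `class x_center y_center width height`.
fn to_label_line(b: &YoloBox) -> String {
    format!(
        "{} {:.6} {:.6} {:.6} {:.6}",
        b.class_id, b.x_center, b.y_center, b.width, b.height
    )
}
```

For example, a circle centered at (100, 100) with an edge point at (130, 140) has radius 50. In a 400x200 image, `circle_to_yolo(0, (100.0, 100.0), (130.0, 140.0), 400.0, 200.0, 540.0)` gives a box centered at (0.25, 0.5) with normalized width 0.25 and height 0.5.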
Ensure you have Rust and Cargo installed.
Clone the repository and build the release version:
git clone https://github.com/yourusername/labelme-to-yolo-rust.git
cd labelme-to-yolo-rust
cargo build --release

The compiled binary will be located at target/release/labelme_to_yolo (or .exe on Windows).
You can run the tool directly via Cargo or use the compiled binary.
# Using Cargo
cargo run --release -- --input-dir ./raw_data --output-dir ./yolo_dataset --seed 42
# Using the compiled binary directly
./target/release/labelme_to_yolo -i ./raw_data -o ./yolo_dataset -t 0.85

| Argument | Short | Default | Description |
|---|---|---|---|
| `--input-dir` | `-i` | Required | Directory containing your raw images (.jpg, .png) and LabelMe .json files. |
| `--output-dir` | `-o` | Required | Directory where the formatted YOLO dataset will be saved. |
| `--max-size` | `-m` | `540.0` | Maximum allowed side length for bounding boxes generated from circle shapes. |
| `--train-ratio` | `-t` | `0.8` | Ratio of data to use for the training set (e.g., 0.8 = 80% train, 20% val). |
| `--seed` | | None | Optional fixed seed for the Random Number Generator to ensure reproducible dataset splits. |
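To show why a fixed `--seed` makes the split reproducible, here is a minimal, dependency-free sketch of a seeded shuffle followed by a ratio cut. It is not the tool's actual implementation: the small SplitMix64 generator stands in for whichever RNG the tool uses, and the function name `split_train_val`, the pre-sort, and the rounding rule are all assumptions made for illustration.

```rust
/// Tiny deterministic PRNG (SplitMix64). A stand-in so the sketch needs no
/// external crates; the real tool may use a different generator.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Shuffles `items` with a seeded RNG and cuts them into (train, val).
fn split_train_val(
    mut items: Vec<String>,
    train_ratio: f64,
    seed: u64,
) -> (Vec<String>, Vec<String>) {
    // Sort first so the result does not depend on directory listing order.
    items.sort();

    // Fisher-Yates shuffle driven by the seeded generator.
    // (The modulo introduces a negligible bias, acceptable for a sketch.)
    let mut rng = SplitMix64(seed);
    for i in (1..items.len()).rev() {
        let j = (rng.next_u64() % (i as u64 + 1)) as usize;
        items.swap(i, j);
    }

    // Assumption: the train count is rounded to the nearest integer.
    let n_train = ((items.len() as f64) * train_ratio).round() as usize;
    let n_train = n_train.min(items.len());
    let val = items.split_off(n_train);
    (items, val)
}
```

Because the input is sorted and the generator state depends only on the seed, calling `split_train_val` twice with the same file names and the same seed returns identical train and validation lists.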
Input Directory Expectation:
raw_data/
├── image1.jpg
├── image1.json
├── image2.png
├── image2.json
└── bg_image.jpg <-- Unannotated images are fine! They become background data.
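As a rough illustration of how the layout above could be interpreted, the sketch below separates annotated images from background images by checking whether a `.json` file with the same stem exists. Matching by file stem is an assumption based on the `image1.jpg` / `image1.json` naming shown here, and `has_ext` and `partition_images` are hypothetical helper names.

```rust
use std::collections::HashSet;
use std::path::Path;

/// Returns true if `name` ends in one of `exts` (case-insensitive).
fn has_ext(name: &str, exts: &[&str]) -> bool {
    Path::new(name)
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| exts.iter().any(|x| e.eq_ignore_ascii_case(*x)))
        .unwrap_or(false)
}

/// Splits image file names into (annotated, background).
/// Assumption: an image is annotated when a `.json` with the same stem exists.
fn partition_images(file_names: &[&str]) -> (Vec<String>, Vec<String>) {
    // Collect the stems of all LabelMe JSON files, e.g. "image1".
    let json_stems: HashSet<&str> = file_names
        .iter()
        .copied()
        .filter(|n| has_ext(n, &["json"]))
        .filter_map(|n| Path::new(n).file_stem()?.to_str())
        .collect();

    let mut annotated = Vec::new();
    let mut background = Vec::new();
    for name in file_names
        .iter()
        .copied()
        .filter(|n| has_ext(n, &["jpg", "png"]))
    {
        let stem = Path::new(name)
            .file_stem()
            .and_then(|s| s.to_str())
            .unwrap_or("");
        if json_stems.contains(stem) {
            annotated.push(name.to_string());
        } else {
            // No matching JSON: treat as a background image (empty label file).
            background.push(name.to_string());
        }
    }
    (annotated, background)
}
```

With the five files listed above, this returns `image1.jpg` and `image2.png` as annotated and `bg_image.jpg` as background.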
Output Directory Generated:
yolo_dataset/
├── data.yaml <-- Automatically generated with class mappings
├── images/
│   ├── train/ <-- 80% of images
│   └── val/ <-- 20% of images
└── labels/
    ├── train/ <-- YOLO format .txt files
    └── val/ <-- YOLO format .txt files
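The label files under `labels/` are described as being written atomically. A common way to achieve this, sketched below with only the standard library, is to write to a temporary file in the same directory and then rename it over the destination. This shows the general technique under that assumption and is not necessarily the exact mechanism the tool uses; `write_label_atomic` and the `.tmp` suffix are hypothetical.

```rust
use std::fs::{self, File};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Writes `lines` to `dest` via a temporary file plus rename, so a reader
/// never sees a half-written label file. An empty `lines` slice produces an
/// empty file, which is how a background image would be represented.
fn write_label_atomic(dest: &Path, lines: &[String]) -> io::Result<()> {
    // Keep the temp file next to the destination so the rename stays on
    // one filesystem (hypothetical naming: "<dest>.tmp").
    let mut tmp_name = dest.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp = PathBuf::from(tmp_name);

    {
        let mut file = File::create(&tmp)?;
        for line in lines {
            writeln!(file, "{}", line)?;
        }
        // Flush data to disk before the rename makes it visible.
        file.sync_all()?;
    }

    // Replace the destination in a single step.
    fs::rename(&tmp, dest)
}
```

If the process is interrupted before the rename, only the `.tmp` file is left incomplete and any existing label file at `dest` is untouched.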
Distributed under the MIT License. See LICENSE for more information.