The all-in-one, modular image dataset utility for ML, with a focus on HQ/LQ image pairs for single-image super-resolution (SISR) and general computer vision. CLI-first, highly extensible, and packed with advanced tools for dataset curation, analysis, transformation, and validation.
Dataset Forge is a Python CLI tool for managing, analyzing, and transforming image datasets, especially high/low quality pairs for super-resolution and machine learning.
It streamlines dataset curation, analysis, transformation, and validation with an intuitive, extensible interface.
- Clean and organize image datasets (HQ/LQ pairs for super-resolution)
- Analyze dataset quality and generate reports
- Process, augment, and transform images
- Modular, CLI-first, and highly extensible
- Robust parallel and GPU-accelerated processing
- Global Command System: Context-aware help and instant quit from any menu
- Comprehensive Help: Menu-specific documentation and navigation assistance
- See all features
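As an illustration of what cleaning an HQ/LQ paired dataset involves, the sketch below reports images that exist in only one of the two folders. It is not Dataset Forge's own code: the function names `image_stems` and `find_unpaired`, the extension list, and the rule that pairs are matched by filename stem are all assumptions made for this example.

```python
from pathlib import Path

# Hypothetical extension list; adjust to whatever formats your dataset uses.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".webp", ".bmp", ".tif", ".tiff"}


def image_stems(folder: Path) -> set[str]:
    """Return the filename stems of image files directly inside `folder`."""
    return {
        p.stem
        for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTS
    }


def find_unpaired(hq_dir: Path, lq_dir: Path) -> tuple[list[str], list[str]]:
    """Return (stems found only in HQ, stems found only in LQ), each sorted.

    Assumes an HQ image and its LQ counterpart share the same filename stem.
    """
    hq = image_stems(hq_dir)
    lq = image_stems(lq_dir)
    return sorted(hq - lq), sorted(lq - hq)
```

Calling `find_unpaired(Path("hq"), Path("lq"))` returns two lists; if both are empty, every image has a counterpart under the stem-matching assumption.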
git clone https://github.com/Courage-1984/Dataset-Forge.git
cd Dataset-Forge
- See Getting Started for full instructions, then see Special Installation Instructions for additional setup steps.
- Getting Started
- Features
- Usage Guide
- Troubleshooting
- Contributing
- Development Standards - Menu system patterns and coding standards
- MCP Integration Guide - Enhanced development with AI assistance
- Full Documentation Index
- Python: 3.12+ (see requirements.txt)
- OS: Windows (primary)
- CUDA/cuDNN: For GPU acceleration (see Special Installation)
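The requirements above can be checked before installation with a short script. This is a minimal sketch, not part of Dataset Forge: `check_environment` is a hypothetical helper, and it assumes GPU acceleration goes through PyTorch, which this README does not state explicitly. If PyTorch is not installed, CUDA availability is reported as unknown (`None`).

```python
import platform
import sys

MIN_PYTHON = (3, 12)  # minimum version taken from the requirements list


def check_environment() -> dict[str, object]:
    """Collect basic facts about the interpreter, OS, and CUDA availability."""
    report: dict[str, object] = {
        "python_ok": sys.version_info >= MIN_PYTHON,
        "python_version": platform.python_version(),
        "os": platform.system(),
        "cuda_available": None,  # None means "unknown" (PyTorch not importable)
    }
    try:
        import torch  # assumption: PyTorch is the GPU backend in use

        report["cuda_available"] = torch.cuda.is_available()
    except (ImportError, OSError):
        # PyTorch is missing or its native libraries failed to load.
        pass
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

A `python_ok` value of `False` means the interpreter is older than 3.12 and should be upgraded before installing the dependencies in requirements.txt.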
- Thanks Kim2091 ❤️ for helpful-scripts
- Thanks umzi2 ❤️ for WTP Dataset Destroyer & PepeDP
- Thanks the-database ❤️ for traiNNer-redux
- Thanks Phhofm ❤️ for sisr
- PepeDP
- WTP Dataset Destroyer
- traiNNer-redux
- Getnative
- resdet
- ExifTool
- Oxipng
- Steghide
- zsteg
- IQA-PyTorch / py-iqa
- imagededup
- ffmpeg | ffmpeg builds
- GetFnative
- getfscaler
This project is licensed under the Creative Commons Attribution-ShareAlike 4.0 International license (CC-BY-SA-4.0). See LICENSE for details.
A simplified overview of Dataset Forge's modular architecture:
flowchart TD
A["CLI Entrypoint"] --> B["Main Menu"]
B --> C["Menu System"]
C --> D["Actions Layer"]
D --> E["Core Utilities"]
D --> F["DPID Implementations"]
subgraph "Core Components"
G["Dataset Management"]
H["Analysis & Validation"]
I["Image Processing"]
J["Utilities & Tools"]
end
subgraph "Supporting Systems"
K["Memory Management"]
L["Parallel Processing"]
M["UI/CLI System"]
N["External Libraries"]
end
C --> G
C --> H
C --> I
C --> J
D --> K
D --> L
D --> M
D --> N
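The layering in the flowchart above, where a menu routes a choice to a function in the actions layer, can be sketched as a small dispatch table. This only illustrates the pattern and is not the project's actual menu code: `MAIN_MENU`, `dispatch`, and the two placeholder actions are hypothetical names invented for this example.

```python
from typing import Callable, Optional

# An action takes no arguments and returns a short status message.
Action = Callable[[], str]


def analyze_dataset() -> str:
    """Placeholder for an 'Analysis & Validation' action."""
    return "analysis complete"


def process_images() -> str:
    """Placeholder for an 'Image Processing' action."""
    return "processing complete"


# Menu layer: maps a user's choice to a label and an action callable.
MAIN_MENU: dict[str, tuple[str, Action]] = {
    "1": ("Analysis & Validation", analyze_dataset),
    "2": ("Image Processing", process_images),
}


def dispatch(menu: dict[str, tuple[str, Action]], choice: str) -> Optional[str]:
    """Run the action bound to `choice`, or return None for an unknown choice."""
    entry = menu.get(choice)
    if entry is None:
        return None
    _label, action = entry
    return action()
```

Keeping the menu as plain data and the actions as ordinary functions means a new feature can be added by registering one more entry, without touching the dispatch logic.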
For the full roadmap and advanced usage, see the Documentation Home.
