-
Notifications
You must be signed in to change notification settings - Fork 0
30 create a yolo training script #31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
(cherry picked from commit 56a894e)
…es to requirements (cherry picked from commit 712692d)
(cherry picked from commit 0575b16)
(cherry picked from commit 8ad50bc)
(cherry picked from commit 74fb5bf)
(cherry picked from commit 3319e8c)
(cherry picked from commit 21e82d5)
(cherry picked from commit 13ac688)
…with different names and ids (cherry picked from commit 13e5bd5)
…t downloads. This allows for simple gitignore for all datasets (cherry picked from commit c49673c)
(cherry picked from commit 14a3951)
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces a YOLO training script with Roboflow integration to enable automated model training on custom datasets. The implementation includes configuration management, GPU support, and SLURM cluster compatibility for high-performance computing environments.
- Adds a complete YOLO training pipeline with Roboflow dataset integration
- Provides flexible configuration through YAML files
- Includes SLURM job script for cluster-based training
Reviewed Changes
Copilot reviewed 4 out of 6 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| yolo_roboflow_training/train.py | Main training script implementing YOLO model training with Roboflow dataset downloads, device selection, and model export functionality |
| yolo_roboflow_training/requirements.txt | Python dependencies for the training environment, though contains numerous nonexistent package versions |
| yolo_roboflow_training/config.yaml | Training configuration file specifying model type, hyperparameters, and export formats |
| yolo_roboflow_training/Job.slurm | SLURM batch script for running training on GPU cluster with environment setup |
| requirements.txt | Root-level requirements file with critical encoding corruption making it unparsable |
| .gitignore | Updates to exclude training outputs, datasets, and Pipenv files |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| RESULTS_DIR = config.get("results_dir") | ||
| EXPORT_FORMATS = config.get("export_formats") | ||
| DATASET_FORMAT = config.get("dataset_format") | ||
|
|
Copilot
AI
Nov 14, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing input validation for required configuration parameters. If model_type, epochs, patience, imgsz, batch, results_dir, export_formats, or dataset_format are not provided in the config file, they will be None, causing the training to fail with unclear errors. Add validation to ensure these required parameters are present.
| # Validate required config parameters | |
| required_params = { | |
| "model_type": MODEL_TYPE, | |
| "epochs": EPOCHS, | |
| "patience": PATIENCE, | |
| "imgsz": IMGSZ, | |
| "batch": BATCH, | |
| "results_dir": RESULTS_DIR, | |
| "export_formats": EXPORT_FORMATS, | |
| "dataset_format": DATASET_FORMAT, | |
| } | |
| missing_params = [k for k, v in required_params.items() if v is None] | |
| if missing_params: | |
| raise ValueError(f"Missing required config parameters: {', '.join(missing_params)}") |
| if torch.cuda.is_available(): | ||
| device = 0 | ||
| print(f"Using CUDA GPU: {torch.cuda.get_device_name(0)}") | ||
| elif torch.backends.mps.is_built(): |
Copilot
AI
Nov 14, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The MPS device availability check is incorrect. torch.backends.mps.is_built() only checks if MPS support is compiled into PyTorch, not if it's actually available. Use torch.backends.mps.is_available() instead to properly detect Apple Silicon GPU availability.
| elif torch.backends.mps.is_built(): | |
| elif torch.backends.mps.is_available(): |
| roboflow_data_dir.mkdir(exist_ok=True) | ||
| rf = Roboflow(api_key=ROBOFLOW_API_KEY) | ||
| project = rf.workspace().project(PROJECT_ID) | ||
| versions = project.versions() |
Copilot
AI
Nov 14, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable versions is assigned but never used. It should either be removed or utilized in the code if dataset version validation is intended.
| versions = project.versions() |
| #pip install pyyaml # used to read the configuration file | ||
| #pip install blobfile # install blobfile to download the dataset | ||
| #pip install kagglehub # install kagglehub to download the dataset | ||
| pip install --force-reinstall torch -U |
Copilot
AI
Nov 14, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inefficient package installation order. Installing packages from requirements.txt (line 34) followed by force-reinstalling torch (line 38) will reinstall torch and its dependencies, potentially causing version conflicts and wasting time. The requirements.txt already specifies torch versions, so the force-reinstall is likely unnecessary. If a specific torch build is needed, consider removing torch from requirements.txt or using --no-deps with the force-reinstall.
| pip install --force-reinstall torch -U | |
| pip install --force-reinstall torch -U --no-deps |
| print(f"Validation metrics: {metrics}") | ||
|
|
||
|
|
||
| # Export the trained model |
Copilot
AI
Nov 14, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing error handling for the model export step. If EXPORT_FORMATS is None or contains invalid formats, the export loop will fail. Add validation to ensure EXPORT_FORMATS is a non-empty list before attempting export.
| # Export the trained model | |
| # Export the trained model | |
| if not isinstance(EXPORT_FORMATS, list) or not EXPORT_FORMATS or not all(isinstance(fmt, str) for fmt in EXPORT_FORMATS): | |
| raise ValueError("EXPORT_FORMATS must be a non-empty list of strings in config.yaml.") |
| pip cache purge | ||
|
|
||
| # makes sure that the pip is up to date | ||
| python3 -m pip install --upgrade pip |
Copilot
AI
Nov 14, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Redundant pip upgrade. The pip upgrade is performed twice: once at line 24 and again at line 31 after activating the virtual environment. Remove the first upgrade at line 24 as the one inside the virtual environment (line 31) is sufficient.
| python3 -m pip install --upgrade pip |
Issue
Proposed changes
Testing