Skip to content

Conversation

@MadsJJ
Copy link
Member

@MadsJJ MadsJJ commented Nov 14, 2025

Issue

Proposed changes

  • Create yolo training script
  • Create slurm job?

Testing

  • Trains model locally with the dataset in config file, but not tested slurm script

…with different names and ids

(cherry picked from commit 13e5bd5)
…t downloads. This allows for simple gitignore for all datasets

(cherry picked from commit c49673c)
(cherry picked from commit be47cc6)
(cherry picked from commit 14a3951)
@MadsJJ MadsJJ requested review from Copilot and kluge7 November 14, 2025 19:39
@MadsJJ MadsJJ self-assigned this Nov 14, 2025
@MadsJJ MadsJJ added the enhancement New feature or request label Nov 14, 2025
@MadsJJ MadsJJ linked an issue Nov 14, 2025 that may be closed by this pull request
Copilot finished reviewing on behalf of MadsJJ November 14, 2025 19:44
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR introduces a YOLO training script with Roboflow integration to enable automated model training on custom datasets. The implementation includes configuration management, GPU support, and SLURM cluster compatibility for high-performance computing environments.

  • Adds a complete YOLO training pipeline with Roboflow dataset integration
  • Provides flexible configuration through YAML files
  • Includes SLURM job script for cluster-based training

Reviewed Changes

Copilot reviewed 4 out of 6 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
yolo_roboflow_training/train.py Main training script implementing YOLO model training with Roboflow dataset downloads, device selection, and model export functionality
yolo_roboflow_training/requirements.txt Python dependencies for the training environment, though contains numerous nonexistent package versions
yolo_roboflow_training/config.yaml Training configuration file specifying model type, hyperparameters, and export formats
yolo_roboflow_training/Job.slurm SLURM batch script for running training on GPU cluster with environment setup
requirements.txt Root-level requirements file with critical encoding corruption making it unparsable
.gitignore Updates to exclude training outputs, datasets, and Pipenv files

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

RESULTS_DIR = config.get("results_dir")
EXPORT_FORMATS = config.get("export_formats")
DATASET_FORMAT = config.get("dataset_format")

Copy link

Copilot AI Nov 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing input validation for required configuration parameters. If model_type, epochs, patience, imgsz, batch, results_dir, export_formats, or dataset_format are not provided in the config file, they will be None, causing the training to fail with unclear errors. Add validation to ensure these required parameters are present.

Suggested change
# Validate required config parameters
required_params = {
"model_type": MODEL_TYPE,
"epochs": EPOCHS,
"patience": PATIENCE,
"imgsz": IMGSZ,
"batch": BATCH,
"results_dir": RESULTS_DIR,
"export_formats": EXPORT_FORMATS,
"dataset_format": DATASET_FORMAT,
}
missing_params = [k for k, v in required_params.items() if v is None]
if missing_params:
raise ValueError(f"Missing required config parameters: {', '.join(missing_params)}")

Copilot uses AI. Check for mistakes.
if torch.cuda.is_available():
device = 0
print(f"Using CUDA GPU: {torch.cuda.get_device_name(0)}")
elif torch.backends.mps.is_built():
Copy link

Copilot AI Nov 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The MPS device availability check is incorrect. torch.backends.mps.is_built() only checks if MPS support is compiled into PyTorch, not if it's actually available. Use torch.backends.mps.is_available() instead to properly detect Apple Silicon GPU availability.

Suggested change
elif torch.backends.mps.is_built():
elif torch.backends.mps.is_available():

Copilot uses AI. Check for mistakes.
roboflow_data_dir.mkdir(exist_ok=True)
rf = Roboflow(api_key=ROBOFLOW_API_KEY)
project = rf.workspace().project(PROJECT_ID)
versions = project.versions()
Copy link

Copilot AI Nov 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable versions is assigned but never used. It should either be removed or utilized in the code if dataset version validation is intended.

Suggested change
versions = project.versions()

Copilot uses AI. Check for mistakes.
#pip install pyyaml # used to read the configuration file
#pip install blobfile # install blobfile to download the dataset
#pip install kagglehub # install kagglehub to download the dataset
pip install --force-reinstall torch -U
Copy link

Copilot AI Nov 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Inefficient package installation order. Installing packages from requirements.txt (line 34) followed by force-reinstalling torch (line 38) will reinstall torch and its dependencies, potentially causing version conflicts and wasting time. The requirements.txt already specifies torch versions, so the force-reinstall is likely unnecessary. If a specific torch build is needed, consider removing torch from requirements.txt or using --no-deps with the force-reinstall.

Suggested change
pip install --force-reinstall torch -U
pip install --force-reinstall torch -U --no-deps

Copilot uses AI. Check for mistakes.
print(f"Validation metrics: {metrics}")


# Export the trained model
Copy link

Copilot AI Nov 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing error handling for the model export step. If EXPORT_FORMATS is None or contains invalid formats, the export loop will fail. Add validation to ensure EXPORT_FORMATS is a non-empty list before attempting export.

Suggested change
# Export the trained model
# Export the trained model
if not isinstance(EXPORT_FORMATS, list) or not EXPORT_FORMATS or not all(isinstance(fmt, str) for fmt in EXPORT_FORMATS):
raise ValueError("EXPORT_FORMATS must be a non-empty list of strings in config.yaml.")

Copilot uses AI. Check for mistakes.
pip cache purge

# makes sure that the pip is up to date
python3 -m pip install --upgrade pip
Copy link

Copilot AI Nov 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redundant pip upgrade. The pip upgrade is performed twice: once at line 24 and again at line 31 after activating the virtual environment. Remove the first upgrade at line 24 as the one inside the virtual environment (line 31) is sufficient.

Suggested change
python3 -m pip install --upgrade pip

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Create a YOLO training script

2 participants