In this project, a GAN for synthesizing electricity load profiles was developed.
The recommended Python version for running this code is 3.11.
- Clone the repository to your local machine:
  `git clone https://github.com/MODERATE-Project/Synthetic-Load-Profiles.git`
- Navigate to the project directory:
  `cd path/to/repository/GAN`
- Create an environment:
  Conda command for creating a suitable environment (replace `myenv` with the desired environment name):
  `conda create --name myenv python=3.11`
- Activate the environment:
  Conda command for activating the created environment (replace `myenv` with the selected name):
  `conda activate myenv`
- Install required Python packages:
  `conda install pip`
  `pip install -r requirements.txt`

The input data needs to be provided in the form of a CSV file.
The data should roughly cover one year (min 365, max 368 days) of hourly electricity consumption values.
Each column of the CSV file should correspond to a single profile/household.
The first column of the CSV file needs to be an index column (ideally containing timestamps).
Example:
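A minimal sketch of how an input file in this format could be produced with pandas; the household column names and values below are purely illustrative placeholders and are not part of the repository:

```python
import numpy as np
import pandas as pd

# Hourly timestamps covering one full year (365 days * 24 h = 8760 rows).
index = pd.date_range("2023-01-01", periods=8760, freq="h")

# One column per profile/household; the values here are random placeholders
# standing in for measured hourly electricity consumption (e.g. in kWh).
rng = np.random.default_rng(0)
data = pd.DataFrame(
    {
        "household_1": rng.uniform(0.1, 2.5, size=len(index)),
        "household_2": rng.uniform(0.1, 2.5, size=len(index)),
    },
    index=index,
)

# The first CSV column is the (timestamp) index, as required above.
data.to_csv("profiles.csv")
```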
There are two ways to run the code:
A marimo notebook is provided for easily uploading files, creating projects and training models.
The notebook can be accessed by running the following command in the project directory:
`marimo run marimo.py`
After uploading the required input file(s) and adjusting the settings, the program can be started by pressing the "Start" button below the options menu.
⚠ marimo notebooks only allow file sizes up to 100 MB; for larger input files, the Python script has to be used ⚠
As an alternative to the marimo notebook, a Python script ("run.py") can be used to create projects and train models.
Settings have to be adjusted directly in the script, and file paths have to be provided for the input files. Advanced options are not provided here; however, a multitude of underlying parameters can be adjusted in "model" → "GAN_params.py" and "WGAN_params.py".
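As a purely hypothetical illustration (the actual setting names and layout of "run.py" may differ), adjusting the script typically amounts to editing a few assignments like the ones below and then launching it, e.g. with `python run.py`:

```python
# Hypothetical example only -- check run.py for the actual setting names and structure.
inputFilePath = "data/load_profiles.csv"  # path to the input CSV described above
outputFormat = ".csv"                     # one of ".npy", ".csv", ".xlsx"
modelType = "WGAN"                        # "GAN" or "WGAN"
epochCount = 500                          # number of training epochs
```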
The following (hyper)parameters can be adjusted (a short illustrative sketch follows the list):
- model type: Lets you choose between an ordinary GAN and a WGAN model. The WGAN is usually more stable in training, but for some use cases the GAN might be more suitable.
- output format: Lets you choose between three possible file formats for the synthetic data: ".npy", ".csv" and ".xlsx".
- log metric: Whether or not to log a composite metric that checks the quality of the results in every epoch. If the log metric is enabled, plots, models and samples are saved for the best-performing epoch of the training.
- use Wandb: Whether or not to track certain parameters online with Weights & Biases (wandb).
- epochCount/Number of epochs: Number of epochs used for training.
- save frequency: Defines how often (in epochs) results are saved. If the save frequency is higher than the epochCount, plots, models and synthetic data samples are saved only for the best-performing epoch, and only if the log metric is enabled.
- save models: Whether or not to save models at the specified frequency, or for the best-performing epoch if the log metric is set to true.
- save plots: Whether or not to save plots at the specified frequency, or for the best-performing epoch if the log metric is set to true.
- save samples: Whether or not to save samples at the specified frequency, or for the best-performing epoch if the log metric is set to true.
- check for min stats: Epoch after which the model starts checking whether the composite metric improves before deciding whether to plot and save the model state and results. If set to 0, every epoch is checked for better results, which might slow down training during the first few epochs, since the model improves in almost every epoch at the beginning.
- batch size: The batch size (number of training examples processed together in one forward and backward pass) used for training.
- device: Lets you choose between CPU and GPUs for creating and training a model. Leave the default value to enable automatic GPU detection.
- loss function: Loss function used for the generator and the discriminator in the ordinary GAN. By default, the binary cross-entropy loss (BCELoss) is used. If another loss function is chosen, additional adaptations to the code might be needed.
- lrGen/lrDis: Define the learning rates of the generator and the discriminator.
- betas: By default, the Adam optimizer is used for both the generator and the discriminator. The beta values define the exponential decay rates of the optimizer's moving averages (of the gradient and its square).
- genLoopCount: Early in training, the discriminator might outperform the generator, leading to no training progress for the generator. This variable defines how many times the generator is trained per iteration.
- dropoutOffEpoch: Defines the epoch after which all dropout layers in the generator are deactivated (this might improve the results). This is only valid for the GAN.
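For orientation, here is a minimal sketch of how the GAN-related parameters above (device, loss function, lrGen/lrDis, betas and genLoopCount) typically map onto PyTorch training objects. The tiny generator and discriminator networks and all concrete values are placeholders chosen for illustration; they do not reproduce the repository's actual architecture or parameter files:

```python
import torch
from torch import nn

# Placeholder networks -- the real architectures live in the repository's model code.
generator = nn.Sequential(nn.Linear(100, 128), nn.ReLU(), nn.Linear(128, 24), nn.Tanh())
discriminator = nn.Sequential(nn.Linear(24, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1), nn.Sigmoid())

# "device": automatic GPU detection as described above.
device = "cuda" if torch.cuda.is_available() else "cpu"
generator, discriminator = generator.to(device), discriminator.to(device)

# "loss function": binary cross-entropy for the ordinary GAN.
criterion = nn.BCELoss()

# "lrGen"/"lrDis" and "betas": separate Adam optimizers for generator and discriminator.
lrGen, lrDis, betas = 2e-4, 2e-4, (0.5, 0.999)
optGen = torch.optim.Adam(generator.parameters(), lr=lrGen, betas=betas)
optDis = torch.optim.Adam(discriminator.parameters(), lr=lrDis, betas=betas)

# "genLoopCount": train the generator several times per iteration so that an
# initially strong discriminator does not stall the generator's learning.
genLoopCount = 2
for _ in range(genLoopCount):
    noise = torch.randn(16, 100, device=device)
    fake = generator(noise)
    # The generator wants the discriminator to label its samples as real (1).
    lossGen = criterion(discriminator(fake), torch.ones(16, 1, device=device))
    optGen.zero_grad()
    lossGen.backward()
    optGen.step()
```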
