This repository contains a simplified implementation of the algorithm proposed in our paper:
"Hybrid Learning for Cold-Start-Aware Microservice Scheduling in Dynamic Edge Environments"
The method is based on GRU-enhanced Soft Actor-Critic (SAC) and features a two-stage training pipeline:
- Offline Imitation Learning using expert trajectory data.
- Online Reinforcement Learning using the SAC algorithm.
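For readers new to SAC, the quantity the critic regresses onto during the online stage is the entropy-regularized (soft) Bellman target. The sketch below computes it for a single transition in plain Python, using the standard clipped double-Q form. The function name `soft_q_target` and the defaults for the discount `gamma` and temperature `alpha` are illustrative assumptions, not names or values taken from this repository. The `alpha1` tag in the example log file name further down may indicate a temperature of 1, but that is a guess.

```python
def soft_q_target(reward: float, done: bool, next_q1: float, next_q2: float,
                  next_log_prob: float, gamma: float = 0.99, alpha: float = 1.0) -> float:
    """Soft Bellman target used to train SAC critics (sketch).

    y = r + gamma * (1 - done) * (min(Q1(s', a'), Q2(s', a')) - alpha * log pi(a' | s'))
    where a' is an action sampled from the current policy at the next state s'.
    """
    # Clipped double-Q: take the smaller of the two target-critic estimates.
    min_next_q = min(next_q1, next_q2)
    # Entropy term: subtracting alpha * log pi favours stochastic (high-entropy) policies.
    soft_next_value = min_next_q - alpha * next_log_prob
    # Terminal transitions do not bootstrap from the next state.
    not_done = 0.0 if done else 1.0
    return reward + gamma * not_done * soft_next_value


if __name__ == "__main__":
    y = soft_q_target(reward=-2.0, done=False, next_q1=-10.0, next_q2=-12.0,
                      next_log_prob=-1.5, gamma=0.9, alpha=1.0)
    print(round(y, 4))  # -11.45
```

A larger `alpha` puts more weight on exploration. With `done=True` the target reduces to the immediate reward.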
Install required packages via pip or conda:
```bash
pip install torch numpy
```

Python standard-library modules used (no installation needed): `random`, `csv`, `pickle`, `argparse`, `datetime`.
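To confirm the environment is ready before running the scripts, a small check like the following can help. It is a hypothetical helper (`check_dependencies` is not part of the repository) that only uses `importlib.util.find_spec` from the standard library to test whether each module listed above can be found.

```python
import importlib.util

REQUIRED_THIRD_PARTY = ("torch", "numpy")
STDLIB_MODULES = ("random", "csv", "pickle", "argparse", "datetime")


def check_dependencies(modules=REQUIRED_THIRD_PARTY + STDLIB_MODULES):
    """Map each module name to True if it can be imported in this environment."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    status = check_dependencies()
    missing = [name for name, ok in status.items() if not ok]
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All dependencies found.")
```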
You can set the environment parameters in `config.py`. Key configurable items:
- `EDGE_NODE_NUM`: number of edge nodes
- `max_tasks`, `min_tasks`: maximum and minimum number of microservice tasks per time slot
- `node_cpu_freq_max`: maximum CPU frequency of the nodes
- `epoch_imitation`: number of imitation-learning epochs
- `epoch`: number of reinforcement-learning episodes
- `e1`, `e2`: weight coefficients for energy and delay in the expert policy
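The sketch below shows what these settings might look like, assuming they are plain module-level variables. The numeric values mirror the tags in the example file names used later in this README (`n15`, `cpu650`, `task20-5`, `ep10`), and that tag-to-field mapping is itself an inference. `e1`/`e2` are placeholders, and the weighted-sum `expert_cost` and the `parse_offline_data_name` helper are hypothetical illustrations, not code from the repository.

```python
import re

# Hypothetical config.py contents; values are illustrative, not the repository defaults.
EDGE_NODE_NUM = 15        # number of edge nodes ("n15")
max_tasks = 20            # maximum microservice tasks per time slot ("task20-5")
min_tasks = 5             # minimum microservice tasks per time slot
node_cpu_freq_max = 650   # maximum node CPU frequency ("cpu650"; unit not stated)
epoch_imitation = 10      # imitation-learning epochs (assumed to match "ep10")
epoch = 10                # reinforcement-learning episodes (assumed)
e1 = 0.5                  # energy weight in the expert policy (placeholder)
e2 = 0.5                  # delay weight in the expert policy (placeholder)


def expert_cost(energy: float, delay: float, w_energy: float = e1, w_delay: float = e2) -> float:
    """Weighted sum of energy and delay; the linear form is an assumption about e1/e2."""
    return w_energy * energy + w_delay * delay


# Assumed mapping between file-name tags and config fields.
_OFFLINE_NAME = re.compile(r"offline_data_seq_n(\d+)_cpu(\d+)_task(\d+)-(\d+)_ep(\d+)\.pkl$")


def parse_offline_data_name(name: str) -> dict:
    """Recover the settings encoded in an offline data file name (hypothetical helper)."""
    match = _OFFLINE_NAME.search(name)
    if match is None:
        raise ValueError(f"unrecognised file name: {name}")
    n, cpu, t_max, t_min, ep = map(int, match.groups())
    return {"EDGE_NODE_NUM": n, "node_cpu_freq_max": cpu,
            "max_tasks": t_max, "min_tasks": t_min, "ep": ep}
```

Checking a file name against the current settings this way is a cheap guard against training on trajectories collected under a different configuration.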
If you haven’t generated expert trajectories yet, run:

```bash
python offline_data_collection.py
```

This will create an offline data file such as `offline_data_seq_n15_cpu650_task20-5_ep10.pkl`.

Run the full training pipeline (both imitation learning and SAC):
```bash
python gru_sac_behavior_clone.py
```

The training consists of two phases:
- Phase 1, offline imitation learning:
  - Trains the GRU-based actor to mimic the expert policy
  - Uses the offline trajectory data saved in `.pkl` files
- Phase 2, online SAC reinforcement learning:
  - Trains the actor and critic through online interaction with the environment
  - Learns to balance scheduling delay and energy consumption
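Phase 1 is behavior cloning. The sketch below shows one common form of that objective under stated assumptions: each scheduling action is treated as a discrete choice of edge node, the GRU actor is assumed to emit one score per node at every time step, and the loss is the cross-entropy against the expert's choice. All names (`load_offline_data`, `log_softmax`, `behavior_cloning_loss`) are hypothetical, the layout of the pickled trajectories is not documented here, and the repository presumably computes this in PyTorch rather than plain Python.

```python
import math
import pickle
from typing import List, Sequence


def load_offline_data(path: str):
    """Unpickle expert trajectories; their internal layout is not documented in this README."""
    with open(path, "rb") as f:
        return pickle.load(f)


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax (subtract the max before exponentiating)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def behavior_cloning_loss(step_logits: Sequence[Sequence[float]],
                          expert_actions: Sequence[int]) -> float:
    """Mean negative log-likelihood of the expert's node choices under the actor.

    step_logits[t] holds one score per edge node at time step t (for example, an
    actor head applied to the GRU hidden state); expert_actions[t] is the index of
    the node the expert picked at that step.
    """
    if not expert_actions or len(step_logits) != len(expert_actions):
        raise ValueError("need one expert action per time step (and at least one step)")
    nll = [-log_softmax(logits)[a] for logits, a in zip(step_logits, expert_actions)]
    return sum(nll) / len(nll)
```

In PyTorch, `torch.nn.functional.cross_entropy` computes the same mean loss directly from raw logits and integer class targets.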
Output files:
- The actor model trained by imitation learning is saved as `imitation_gru_sac_ep{epoch_imitation}_{timestamp}.pth`.
- Training logs (CSV format) are saved under `log/`, e.g. `log/gruBC_SAC_ep10_alpha1_n15_cpu650_task20-5_20250610.csv`. Each line records: Episode, Reward, Total Time, Total Energy, Completion Ratio, Download Time, Actor Loss, Critic Loss, Fail Times.
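To inspect a finished run, the log can be read with the standard `csv` module. The sketch below assumes the columns appear in the order listed above. Whether the file starts with a header row is not stated, so it skips one if present. `read_training_log` and `summarize` are hypothetical helpers, not part of the repository.

```python
import csv
from statistics import mean
from typing import Dict, Iterable, List

# Column order as documented in this README.
LOG_FIELDS = ["Episode", "Reward", "Total Time", "Total Energy", "Completion Ratio",
              "Download Time", "Actor Loss", "Critic Loss", "Fail Times"]


def read_training_log(lines: Iterable[str]) -> List[Dict[str, float]]:
    """Parse log rows into dicts of floats, skipping blank lines and a header row if present."""
    rows = []
    for record in csv.reader(lines):
        if not record or record[0].strip() == "Episode":
            continue  # blank line or header row
        rows.append({name: float(value) for name, value in zip(LOG_FIELDS, record)})
    return rows


def summarize(rows: List[Dict[str, float]], last_n: int = 10) -> Dict[str, float]:
    """Average reward and completion ratio over the last `last_n` episodes."""
    tail = rows[-last_n:]
    if not tail:
        raise ValueError("log contains no episode rows")
    return {
        "episodes": len(rows),
        "mean_reward": mean(r["Reward"] for r in tail),
        "mean_completion_ratio": mean(r["Completion Ratio"] for r in tail),
    }
```

Usage: open the log with `newline=""` (as the `csv` documentation recommends) and pass the file object to `read_training_log`, then call `summarize` on the result. Averaging over the last few episodes smooths out the episode-to-episode noise typical of SAC training curves.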