Allow suspend/resume of simulations on Batch runs #1668
base: master
… to work with AZ_* env vars.
- takes the supplied job id and makes it the full path to the saved job - not perfect
- Okay after simulation is setup
- useful when all draws can be resumed from a single specified draw
- we may be restoring simulation from a baseline without any parameters - the draw itself may override parameters
Pull Request Overview
This PR implements suspend/resume functionality for simulations running on Azure Batch, allowing users to pause and later continue long-running simulations. The implementation adds support for specifying suspend/resume parameters through command-line arguments and handles the path resolution for pickled simulation files in the Azure Batch environment.
- Adds command-line argument parsing to `batch-submit` to handle suspend/resume parameters
- Modifies simulation loading logic to support resuming from pickled simulations with flexible path handling
- Includes a test scenario file to validate suspend/resume functionality
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/tlo/scenario.py | Enhanced simulation loading logic to support resuming from pickled files with environment variable expansion and flexible path handling |
| src/tlo/cli.py | Added argument parsing to batch-submit command to handle suspend/resume parameters and construct Azure Batch file paths |
| src/scripts/dev/scenarios/suspend-resume-test.py | Added test scenario for validating suspend/resume functionality |
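
The environment variable expansion described for `src/tlo/scenario.py` could look roughly like the sketch below. It is not the code from this PR: the helper name `expand_pickle_path` and the variable `AZ_EXAMPLE_MOUNT_DIR` are hypothetical placeholders, and only the standard-library call `os.path.expandvars` is used.

```python
import os


def expand_pickle_path(raw_path: str) -> str:
    """Expand $VAR / ${VAR} references in a path to a pickled simulation.

    Hypothetical helper for illustration; not the implementation in scenario.py.
    """
    expanded = os.path.expandvars(raw_path)
    # os.path.expandvars leaves unknown variables untouched, so a leftover "$"
    # means some variable was not set in this environment.
    if "$" in expanded:
        raise ValueError(f"Unresolved environment variable in path: {expanded}")
    return expanded
```

Failing early on an unresolved variable is a design choice in this sketch: it reports a missing variable before the simulation tries to open a non-existent file.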
Comments suppressed due to low confidence (1)
src/tlo/cli.py:1
- This code assumes that '--resume-simulation' is always followed by exactly one argument, but doesn't validate that `i+1` is within bounds. If '--resume-simulation' is the last argument, accessing `scenario_args[i+1]` will raise an IndexError.
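
One way to address the bounds concern above is sketched here. The helper `get_option_value` is a hypothetical name, not a function in `src/tlo/cli.py`; the sketch only assumes the arguments arrive as a list of strings, as `scenario_args` appears to.

```python
from typing import List, Optional


def get_option_value(args: List[str], flag: str) -> Optional[str]:
    """Return the value following `flag`, or None if the flag is absent.

    Hypothetical helper: raises ValueError instead of IndexError when the
    flag is the last argument and therefore has no value.
    """
    if flag not in args:
        return None
    i = args.index(flag)
    # Check that i + 1 is a valid index before reading the value.
    if i + 1 >= len(args):
        raise ValueError(f"{flag} requires a value")
    return args[i + 1]
```

This sketch does not check whether the following token is itself another option; that would need an extra check if option-like values must be rejected.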
"""The TLOmodel command-line interface"""
Towards implementing #1665, for suspend/resume when using `tlo batch-submit`, with the following caveats:
- Currently only suspends/resumes exactly the draw+run that was suspended (i.e. cannot use another draw's runs to resume different draws, which is what we want) (done)
- Have to know some internal Azure Batch environment variables to get the path to the pickled simulation; want to get rid of that, somehow. (done)