@tamuri tamuri commented Aug 15, 2025

Towards implementing #1665: suspend/resume support when using tlo batch-submit, with the following caveats:

  1. Currently only suspends/resumes exactly the draw+run that was suspended (i.e. one draw's runs cannot be used to resume a different draw, which is what we want). (Done.)
  2. Have to know some internal Azure Batch environment variables to get the path to the pickled simulation; want to get rid of that, somehow. (Done.)
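
To make the first caveat concrete, here is a minimal sketch of a suspend/resume round trip keyed by draw and run. It is not the code in this PR: the directory layout, the file name `suspended_simulation.pkl`, and the helpers `pickle_path`, `suspend` and `resume` are assumptions made for illustration, and a plain dict stands in for the simulation object.

```python
import pickle
from pathlib import Path


def pickle_path(root, draw, run):
    # Hypothetical layout: one pickle file per (draw, run) pair.
    return Path(root) / str(draw) / str(run) / "suspended_simulation.pkl"


def suspend(state, root, draw, run):
    """Write the (toy) simulation state for this draw/run to disk."""
    path = pickle_path(root, draw, run)
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(state, f)
    return path


def resume(root, draw, run):
    """Load the state saved for exactly this draw/run.

    Raises FileNotFoundError if that draw/run was never suspended, which
    mirrors the caveat that a run can only be resumed from its own pickle.
    """
    with open(pickle_path(root, draw, run), "rb") as f:
        return pickle.load(f)
```

With this layout, resuming draw 1 from a pickle written by draw 0 fails unless the caller passes draw 0's path explicitly, which is the restriction the first caveat describes.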

tamuri added 9 commits August 15, 2025 10:58
- takes the supplied job id and makes it the full path to the saved job
- not perfect
- useful when all draws can be resumed from a single specified draw
- we may be restoring simulation from a baseline without any parameters
- the draw itself may override parameters
@tamuri tamuri requested a review from Copilot August 26, 2025 10:39
Copilot AI left a comment

Pull Request Overview

This PR implements suspend/resume functionality for simulations running on Azure Batch, allowing users to pause and later continue long-running simulations. The implementation adds support for specifying suspend/resume parameters through command-line arguments and handles the path resolution for pickled simulation files in the Azure Batch environment.

  • Adds command-line argument parsing to batch-submit to handle suspend/resume parameters
  • Modifies simulation loading logic to support resuming from pickled simulations with flexible path handling
  • Includes a test scenario file to validate suspend/resume functionality

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/tlo/scenario.py | Enhanced simulation loading logic to support resuming from pickled files with environment variable expansion and flexible path handling |
| src/tlo/cli.py | Added argument parsing to batch-submit command to handle suspend/resume parameters and construct Azure Batch file paths |
| src/scripts/dev/scenarios/suspend-resume-test.py | Added test scenario for validating suspend/resume functionality |
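
The split described in the table (the CLI constructs a path, the scenario code expands environment variables in it) could look roughly like the sketch below. The review does not show the actual implementation, so everything here is an assumption: the variable name `SUSPENDED_JOBS_ROOT`, the path layout, and the functions `build_resume_path` and `resolve_resume_path` are hypothetical. Only `os.path.expandvars` is a real standard-library call.

```python
import os
from pathlib import PurePosixPath

# Hypothetical variable name, for illustration only; the PR does not state
# which Azure Batch environment variables it relies on.
PLACEHOLDER = "${SUSPENDED_JOBS_ROOT}"


def build_resume_path(job_id, draw, run):
    """Submission side: build the path with the placeholder left unexpanded."""
    path = PurePosixPath(PLACEHOLDER) / job_id / str(draw) / str(run)
    return str(path / "suspended_simulation.pkl")


def resolve_resume_path(raw_path):
    """Compute-node side: expand environment variables in the path.

    os.path.expandvars leaves unknown variables untouched, so a leftover
    "$" is treated as an error instead of being passed on to open().
    """
    expanded = os.path.expandvars(raw_path)
    if "$" in expanded:
        raise ValueError(f"unresolved environment variable in {raw_path!r}")
    return expanded
```

Deferring expansion to the compute node means the submitting machine does not need to know the node's directory layout, which may be one way to address the second caveat in the PR description.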
Comments suppressed due to low confidence (1)

src/tlo/cli.py:1

  • This code assumes that '--resume-simulation' is always followed by exactly one argument, but doesn't validate that i+1 is within bounds. If '--resume-simulation' is the last argument, accessing scenario_args[i+1] will raise an IndexError.
"""The TLOmodel command-line interface"""
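
One possible way to address the bounds issue raised in the suppressed comment is sketched below. The helper `pop_option` is hypothetical and is not taken from src/tlo/cli.py; it only assumes, as the comment does, that the arguments arrive as a flat list such as `scenario_args` and that `--resume-simulation` takes exactly one value.

```python
def pop_option(scenario_args, flag):
    """Return (value, remaining_args) for "flag value" in scenario_args.

    Returns (None, copy_of_args) if the flag is absent, and raises
    ValueError instead of IndexError if the flag is the last argument.
    """
    args = list(scenario_args)
    if flag not in args:
        return None, args
    i = args.index(flag)
    if i + 1 >= len(args):
        raise ValueError(f"{flag} requires a value")
    # Drop the flag and its value; keep everything else in order.
    return args[i + 1], args[:i] + args[i + 2:]
```

For example, `pop_option(["--x", "1", "--resume-simulation", "job-1"], "--resume-simulation")` yields the value `"job-1"` and the remaining arguments `["--x", "1"]`, while a trailing `--resume-simulation` produces a clear error message.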
