Create a PR on population automation runs#340
Conversation
There was a problem hiding this comment.
Pull request overview
This PR updates the scheduled population-data automation so that, when data changes are generated, the workflow pushes them to a branch and opens a pull request rather than pushing directly to the default branch.
Changes:
- Update the scheduled workflow to create a branch, push changes, and open a PR via
gh pr create. - Add an “aggressive cleanup” step and adjust the MHCT DB Docker build/run logic and readiness checks.
- Remove a redundant/duplicative Balack’s Cove “High Tide / Riptide” series entry from the population JS config.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
data/pop-js/balacks-cove.js |
Removes a series entry (likely duplicative with existing postProcess behavior). |
.github/workflows/population-scheduled.yml |
Switches scheduled automation from direct push to branch+PR creation; adds cleanup and changes DB container handling. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| git checkout -b scheduled/population-$(date +%Y%m%d) | ||
| git commit -m "Automated population data update" || exit 0 |
There was a problem hiding this comment.
The branch name only includes the date, so reruns on the same day (or a manually re-triggered run after a partial failure) can fail at git checkout -b / git push because the branch already exists. Consider including a unique suffix (e.g., run id / timestamp) or using a branch-reset approach (-B) with an appropriate push strategy.
| git push | ||
| git push origin HEAD | ||
|
|
||
| gh pr create --fill || echo "PR already exists" |
There was a problem hiding this comment.
gh pr create --fill || echo "PR already exists" will also hide other failures (auth problems, validation errors, rate limits), potentially making the workflow look successful when it didn’t open a PR. Consider explicitly checking whether a PR already exists for the head branch and only suppressing that specific case; otherwise fail the step so problems are visible.
| uses: actions/checkout@v5 | ||
|
|
||
| - name: Set up Node.js | ||
| uses: actions/setup-node@v4 | ||
| uses: actions/setup-node@v5 |
There was a problem hiding this comment.
This workflow uses actions/checkout@v5 / actions/setup-node@v5 while the only other workflow in this repo pins major v4 for both. If v5 is intentional, consider aligning versions repo-wide; otherwise, sticking to the same major version helps avoid unexpected behavioral differences.
| sudo rm -rf /usr/local/share/powershell | ||
|
|
||
| # Remove CodeQL and other toolcaches | ||
| sudo rm -rf /opt/hostedtoolcache |
There was a problem hiding this comment.
The cleanup step deletes /opt/hostedtoolcache, which is where GitHub-hosted runners typically keep the Node runtime used to execute JavaScript-based actions (e.g., actions/checkout, actions/setup-node). Removing it at the start of the job can cause subsequent uses: steps to fail. Consider avoiding deletion of /opt/hostedtoolcache (or at least preserving the Node toolcache), or move disk-reclamation to after all uses: actions have run.
| sudo rm -rf /opt/hostedtoolcache | |
| # Note: /opt/hostedtoolcache is preserved to keep Node runtimes for GitHub Actions. |
| - name: Run MHCT db docker container | ||
| run: | | ||
| docker run -d --name mhct-db -p 3306:3306 mhct-db-docker | ||
| docker run -d --name mhct-db -v mhct-db-data:/var/lib/mysql -p 3306:3306 mhct-db |
There was a problem hiding this comment.
docker run now uses a named volume mhct-db-data, but the workflow cleanup only removes the container. This can leave the volume consuming disk for the remainder of the job (and can contribute to disk pressure, which this workflow is trying to mitigate). Either remove the named volume in the cleanup step or avoid using a named volume here if persistence isn’t required.
| @@ -72,9 +130,13 @@ jobs: | |||
|
|
|||
| git add data/pop-csv/* | |||
There was a problem hiding this comment.
git add data/pop-csv/* won’t stage deletions (and can miss files if the directory structure changes). To ensure population updates correctly capture removals/renames as well as additions/modifications, prefer staging the directory with an option that includes deletions (e.g., adding the path with -A).
| git add data/pop-csv/* | |
| git add -A data/pop-csv |
No description provided.