fix(ci): repair GCP resource cleanup so it deletes old disks #10767
gustavovalverde wants to merge 3 commits into
The daily cleanup deleted nothing: it passed each disk to gcloud as a single quoted "<name> --zone=<zone>" argument, which gcloud rejected as underspecified. The swallowed failure let orphaned per-commit test disks pile up until the regional SSD_TOTAL_GB quota filled and integration tests failed at instance creation. Pass the name and --zone/--region as separate arguments, add --quiet, and delete only unattached disks. Scope the cleanup to the dev project.
dd936aa to 5c156ae
Pull request overview
This PR fixes the daily "Delete GCP resources" workflow, which was silently deleting nothing. The previous scripts used sed to fold a resource's name and its --zone/--region flag into a single string, then passed that string as one shell‑quoted argument (e.g. "<name> --zone=<zone>"); gcloud rejected it as underspecified. As a result, orphaned per‑commit test disks accumulated until the dev project's regional SSD quota filled and integration tests failed with Quota 'SSD_TOTAL_GB' exceeded. The fix passes the name and location flag as separate arguments, adds --quiet so non‑interactive runs don't abort at the confirmation prompt, restricts disk deletion to unattached disks (-users:*), and scopes the job to the dev environment.
Changes:
- Rewrote disk/instance deletion loops to parse tab‑separated `value(...)` output with `while IFS=$'\t' read` and pass `--zone`/`--region` as separate args.
- Added `--quiet` to disk, instance, template, and cache‑image deletes; limited disk cleanup to unattached disks via the `-users:*` filter.
- Replaced the `[dev, prod]` cleanup matrix with a single `environment: dev`, narrowing cleanup to the dev project.
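The difference between the broken and the repaired call shape can be sketched without touching GCP. In the sketch below, the `gcloud` function is a local stand-in that only prints the arguments it receives (it is not the real CLI), and the disk names, zones, `DISKS` variable, and the `broken_call`/`fixed_calls` helpers are hypothetical, not taken from the PR's scripts.

```shell
#!/usr/bin/env bash
# Stand-in for the real gcloud CLI (assumption: used only so this sketch runs
# anywhere). It prints how many arguments it received and what they were.
gcloud() {
  printf '%d args:' "$#"
  printf ' [%s]' "$@"
  printf '\n'
}

# Hypothetical listing output: one "<name><TAB><zone>" record per line.
DISKS=$'disk-a\tus-east1-b\ndisk-b\tus-east1-c'

# Broken shape: name and location flag folded into one quoted word, so the
# CLI sees a single positional argument and no --zone flag at all.
broken_call() {
  gcloud compute disks delete "disk-a --zone=us-east1-b"
}

# Fixed shape: split each record on the tab, then pass the name and the
# --zone flag as separate words, plus --quiet for non-interactive runs.
fixed_calls() {
  while IFS=$'\t' read -r NAME ZONE; do
    [[ -z "${NAME}" ]] && continue
    gcloud compute disks delete "${NAME}" --zone="${ZONE}" --quiet
  done <<< "${DISKS}"
}

broken_call
fixed_calls
```

With the stand-in, the broken call reports four arguments, the last of which contains the embedded space and flag; each fixed call reports six.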
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| `.github/workflows/scripts/gcp-delete-old-disks.sh` | Refactors to a `delete_disks` helper; separates name/location args, adds `--quiet`, and filters to unattached disks. |
| `.github/workflows/scripts/gcp-delete-old-instances.sh` | Replaces IFS-juggling loop with `while read`; passes `--zone` separately and adds `--quiet`. |
| `.github/workflows/scripts/gcp-delete-old-templates.sh` | Adds `--quiet` to the template delete. |
| `.github/workflows/scripts/gcp-delete-old-cache-images.sh` | Adds `--quiet` to the image delete. |
| `.github/workflows/zfnd-delete-gcp-resources.yml` | Removes the `[dev, prod]` matrix in favor of `environment: dev` and drops now‑obsolete comments. |
The shell refactors are correct: the here-string loops run in the current shell (so exit and error handling work), IFS=$'\t' is scoped to the read builtin, empty input is guarded with the [[ -z "${NAME}" ]] check, and the -users:* filter is valid gcloud syntax for unattached disks. The one item worth a maintainer's attention is that this also drops prod cleanup, which is a behavioral change bundled with the bugfix.
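The loop properties named in the review can be checked in isolation. This is a minimal sketch, assuming default bash settings (no `shopt -s lastpipe`); the function names and sample records are hypothetical and are not part of the PR's scripts.

```shell
#!/usr/bin/env bash
# Piped loop: the loop body runs in a subshell, so the counter update
# (and any exit or error handling inside the loop) does not reach the caller.
count_with_pipe() {
  local count=0
  printf 'a\tz1\nb\tz2\n' | while IFS=$'\t' read -r NAME ZONE; do
    count=$((count + 1))
  done
  echo "${count}"
}

# Here-string loop: the body runs in the current shell, so state persists.
count_with_here_string() {
  local input=$1 count=0
  # IFS is set only for the read builtin; the shell's own IFS is untouched.
  while IFS=$'\t' read -r NAME ZONE; do
    # A here-string of an empty variable still yields one empty line,
    # so skip records with no name.
    [[ -z "${NAME}" ]] && continue
    count=$((count + 1))
  done <<< "${input}"
  echo "${count}"
}

count_with_pipe
count_with_here_string $'a\tz1\nb\tz2'
count_with_here_string ""
```

The piped variant reports zero processed records even though it read two, which is why a failure inside such a loop is easy to miss.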
The "zebrad-" disk filter matched zebrad-cache-* chain-state disks, which the -users:* guard only protects while attached. Exclude them so an unattached cached-state disk is never deleted.
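The intent of this exclusion can be illustrated with a name check in plain bash. This conversation does not show how the PR implements it (it may well live in the gcloud list filter instead), so the `should_delete_disk` helper and the example disk names below are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical predicate: returns 0 when a disk name is a cleanup candidate.
should_delete_disk() {
  local name=$1
  case "${name}" in
    zebrad-cache-*) return 1 ;;  # cached chain-state disk: always keep
    zebrad-*)       return 0 ;;  # per-commit test disk: candidate
    *)              return 1 ;;  # anything else is out of scope
  esac
}

for disk in zebrad-abc1234 zebrad-cache-example other-disk; do
  if should_delete_disk "${disk}"; then
    echo "delete ${disk}"
  else
    echo "keep ${disk}"
  fi
done
```

The `zebrad-cache-*` pattern is listed first because `case` takes the first matching branch, and a bare `zebrad-*` pattern would otherwise match the cache disks too.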
@oxarbitrage ready
Queued: the merge queue status continues in this comment ↓.
Merge Queue Status
This pull request spent 15 minutes 40 seconds in the queue, including 5 minutes 16 seconds running CI. Waiting for
All conditions
Reason: The merge conditions cannot be satisfied due to failing checks
Hint: You may have to fix your CI before adding the pull request to the queue again. Tick the box to put this pull request back in the merge queue (same as
Motivation
The daily "Delete GCP resources" job deleted nothing: it passed each disk to gcloud as a single quoted `"<name> --zone=<zone>"` argument, which gcloud rejected as underspecified. Orphaned per-commit test disks then piled up until the dev project's regional SSD quota filled and integration tests failed at instance creation with `Quota 'SSD_TOTAL_GB' exceeded`.
Solution
Pass the disk name and its `--zone`/`--region` flag as separate arguments, add `--quiet`, and delete only unattached disks. The instance, template, and cache-image cleanups get the same `--quiet` fix. Scope the cleanup to the dev project.
AI Disclosure
PR Checklist