Description
Is there an existing issue for this?
- I have searched the existing issues
Problem Statement
Using KTF outside CI can leak GKE clusters when people forget to delete them after use. To avoid unnecessary cloud spend, we'd like clusters to be ephemeral by default unless explicitly marked otherwise.
Proposed Solution
- Add a GKE cluster label indicating a deletion date.
- Default this label to three days after creation.
- Provide a flag allowing users to override the default.
- Add a scheduled task to delete clusters past their expiration.
Additional information
Elsewhere we run cleanup jobs as part of CI in projects that use KTF. There isn't really a great CI location for this scheduled deletion, since this is intended for clusters created outside CI. I don't think it makes sense to run this in the KTF repo itself.
GCP does provide Cloud Scheduler (https://cloud.google.com/scheduler), which includes 3 free jobs a month ($0.10/job/month after that). We should use it as a project-agnostic deletion method.
We may want a "never" option, but it's simpler to just use dates as label values, and we should probably discourage indefinite-lifetime clusters anyway. Setting the duration to 9999 days (or something similarly ridiculous) or manually removing the label should be sufficient if you really, really want to avoid the cleanup job.
Acceptance Criteria
- KTF's GKE cluster creation utility provides a CLI flag that takes a number of days as input and sets an expiration date label.
- The CLI flag defaults to 3 days.
- We provide an example Cloud Scheduler configuration that deletes GKE clusters past expiration.
- We use the scheduler job on our GKE project.
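For the first two criteria, the flag handling could look roughly like the sketch below. It uses only the Go standard library `flag` package; the flag name `lifetime-days`, the flag set name `gke-create`, and the function `parseLifetimeDays` are hypothetical and do not reflect KTF's actual CLI.

```go
package main

import (
	"flag"
	"fmt"
	"io"
)

// Default lifetime from the acceptance criteria.
const defaultLifetimeDays = 3

// parseLifetimeDays reads the hypothetical --lifetime-days flag from args and
// falls back to the 3-day default when the flag is absent.
func parseLifetimeDays(args []string) (int, error) {
	fs := flag.NewFlagSet("gke-create", flag.ContinueOnError)
	fs.SetOutput(io.Discard) // keep usage text out of the way in this sketch
	days := fs.Int("lifetime-days", defaultLifetimeDays,
		"number of days before the cluster becomes eligible for cleanup")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	// Assumption: zero or negative lifetimes are rejected rather than
	// producing an already-expired cluster.
	if *days < 1 {
		return 0, fmt.Errorf("lifetime-days must be at least 1, got %d", *days)
	}
	return *days, nil
}

func main() {
	days, err := parseLifetimeDays([]string{"--lifetime-days", "7"})
	fmt.Println(days, err)
}
```

With this shape, passing a very large value such as `--lifetime-days 9999` gives the effectively indefinite lifetime described above without needing a separate "never" option.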