Make GKE clusters ephemeral by default #730

@rainest

Is there an existing issue for this?

  • I have searched the existing issues

Problem Statement

Using KTF outside CI can leak GKE clusters when people forget to delete them after use. To avoid unnecessary cloud spend, we'd like clusters to be ephemeral by default unless marked otherwise.

Proposed Solution

  • Add a GKE cluster label indicating a deletion date.
  • Default this label to three days after creation.
  • Provide a flag allowing users to override the default.
  • Add a scheduled task to delete clusters past their expiration.

Additional information

Elsewhere we run cleanup jobs as part of CI in projects that use KTF. There isn't really a great CI location for this scheduled deletion, since this is intended for clusters created outside CI. I don't think it makes sense to run this in the KTF repo itself.

GCP does provide Cloud Scheduler (https://cloud.google.com/scheduler) with 3 free jobs a month ($0.10/job/month after). We should use it as a project-agnostic deletion method.

We may want a "never" option, but it's simpler to just use dates as label values, and we should probably discourage indefinite-lifetime clusters anyway. Setting the duration to 9999 days (or something similarly ridiculous) or manually removing the label should be sufficient if you really, really want to avoid the cleanup job.

Acceptance Criteria

  • KTF's GKE cluster creation utility provides a CLI flag that takes a number of days as input and sets an expiration date label.
  • The CLI flag defaults to 3 days.
  • We provide an example GKE Cloud Scheduler configuration that deletes clusters past expiration.
  • We use the scheduler job on our GKE project.
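The flag-related criteria could look roughly like the sketch below. It uses only Go's standard `flag` package; the flag name `--lifetime-days`, the flag set name, and the `parseLifetimeDays` helper are hypothetical, and KTF's actual CLI wiring may differ.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"io"
	"os"
)

// defaultLifetimeDays mirrors the 3-day default from the acceptance criteria.
const defaultLifetimeDays = 3

// parseLifetimeDays reads the hypothetical --lifetime-days flag from args
// and returns the number of days a new cluster should live.
func parseLifetimeDays(args []string) (int, error) {
	fs := flag.NewFlagSet("create-gke-cluster", flag.ContinueOnError)
	// Suppress the flag package's own usage output; the caller reports errors.
	fs.SetOutput(io.Discard)
	days := fs.Int("lifetime-days", defaultLifetimeDays,
		"days until the cluster becomes eligible for deletion")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	if *days < 1 {
		return 0, errors.New("--lifetime-days must be at least 1")
	}
	return *days, nil
}

func main() {
	days, err := parseLifetimeDays(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("cluster lifetime: %d days\n", days)
}
```

Rejecting values below 1 keeps a typo from creating a cluster that is already expired, while a very large value such as 9999 still works for anyone who wants to opt out of cleanup in practice.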
