
[feat] Support shutdown-on-sigterm failover config #248

Description

@jdheyburn

Problem

The operator's proactive failover path covers planned pod rolls initiated by the operator. It does not cover unplanned termination (node drains, evictions, spot preemption, or kubectl delete pod), where the operator never gets a chance to run the failover before the pod disappears.

Valkey 9.0 introduced the shutdown-on-sigterm failover config directive (valkey/valkey#1091). When set on a primary node, Valkey issues a graceful CLUSTER FAILOVER to a replica on SIGTERM before the process exits. This is the node-local safety net for unplanned termination.

Since the operator targets Valkey 9.0+, it should inject this directive by default and enforce the terminationGracePeriodSeconds constraint needed for it to work reliably (see companion issue).

Proposed design

  • Inject shutdown-on-sigterm failover into the managed Valkey config by default for all ValkeyCluster resources
  • No double-failover risk: the directive only fires if the node is still primary at SIGTERM; it is a no-op on replicas
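As a rough illustration of the default injection, the sketch below appends the directive to an already rendered config string. The helper name withSigtermFailover is hypothetical, and the rule that an explicit user setting is left untouched is an assumption about what "by default" should mean here; the operator's actual config rendering may look quite different.

```go
package main

import (
	"fmt"
	"strings"
)

// sigtermDirective is the config key named in this issue.
const sigtermDirective = "shutdown-on-sigterm"

// withSigtermFailover returns conf with "shutdown-on-sigterm failover"
// appended, unless conf already sets the directive explicitly.
// Hypothetical helper: not the operator's real rendering code.
func withSigtermFailover(conf string) string {
	for _, line := range strings.Split(conf, "\n") {
		fields := strings.Fields(line)
		if len(fields) > 0 && strings.EqualFold(fields[0], sigtermDirective) {
			return conf // assumption: an explicit setting wins
		}
	}
	// Make sure the injected directive starts on its own line.
	if conf != "" && !strings.HasSuffix(conf, "\n") {
		conf += "\n"
	}
	return conf + sigtermDirective + " failover\n"
}

func main() {
	fmt.Print(withSigtermFailover("cluster-enabled yes\nappendonly yes"))
}
```

Because the function returns the input unchanged once the directive is present, calling it on every reconcile would not keep growing the config.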

Acceptance criteria

  • shutdown-on-sigterm failover directive injected into valkey.conf by default
  • Works alongside the operator's proactive failover without double-firing
  • E2E test: drain the node running the primary; verify a replica is promoted before the pod exits
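One way the E2E assertion could be phrased is as a check on CLUSTER NODES output captured after the drain. The sketch below assumes the test has already collected that output as a string and knows the node IDs of the old primary and the expected replica; the function names are hypothetical, and the sample output in main is made up for illustration only.

```go
package main

import (
	"fmt"
	"strings"
)

// primaries returns the IDs of nodes flagged "master" (and not failed)
// in CLUSTER NODES output. Line layout: <id> <addr> <flags> ...
func primaries(clusterNodes string) map[string]bool {
	out := map[string]bool{}
	for _, line := range strings.Split(clusterNodes, "\n") {
		f := strings.Fields(line)
		if len(f) < 3 {
			continue // blank or malformed line
		}
		isPrimary, failed := false, false
		for _, flag := range strings.Split(f[2], ",") {
			switch flag {
			case "master":
				isPrimary = true
			case "fail", "fail?":
				failed = true
			}
		}
		if isPrimary && !failed {
			out[f[0]] = true
		}
	}
	return out
}

// promoted reports whether replicaID is now a primary and
// oldPrimaryID no longer is.
func promoted(clusterNodes, oldPrimaryID, replicaID string) bool {
	p := primaries(clusterNodes)
	return p[replicaID] && !p[oldPrimaryID]
}

func main() {
	// Illustrative output only; IDs and addresses are made up.
	after := "aaa 10.0.0.1:6379@16379 slave bbb 0 0 2 connected\n" +
		"bbb 10.0.0.2:6379@16379 myself,master - 0 0 2 connected 0-16383\n"
	fmt.Println(promoted(after, "aaa", "bbb"))
}
```

A real test would still need to establish the ordering in the criterion, for example by polling this check while the draining pod is still in Terminating state.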
