Problem
The operator's proactive failover path covers planned pod rolls initiated by the operator. It does not cover unplanned termination — node drains, evictions, spot preemption, or kubectl delete pod — where the operator never has a chance to run the failover before the pod disappears.
Valkey 9.0 introduced the shutdown-on-sigterm failover config directive (valkey/valkey#1091). When set on a primary node, Valkey issues a graceful CLUSTER FAILOVER to a replica on SIGTERM before the process exits. This is the node-local safety net for unplanned termination.
Since the operator targets Valkey 9.0+, it should inject this directive by default and enforce the terminationGracePeriodSeconds constraint needed for the failover to complete reliably (see companion issue).
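A minimal sketch of enforcing the grace-period constraint, assuming the operator raises short or unset values to a floor and leaves longer user-supplied values alone. The actual bound belongs to the companion issue. The value 60, the constant `minGracePeriodSeconds`, the trimmed `podSpec` type, and `enforceGracePeriod` are all hypothetical placeholders, not existing operator code.

```go
package main

import "fmt"

// minGracePeriodSeconds is a HYPOTHETICAL placeholder; the real lower bound
// is defined in the companion issue, not here.
const minGracePeriodSeconds int64 = 60

// podSpec is a hypothetical, trimmed stand-in for the Kubernetes PodSpec
// field the operator would set; only the grace period is modelled.
type podSpec struct {
	TerminationGracePeriodSeconds *int64
}

// enforceGracePeriod raises the grace period to the minimum when it is unset
// (Kubernetes would otherwise default it to 30s) or too short, and leaves
// longer values untouched.
func enforceGracePeriod(spec *podSpec) {
	if spec.TerminationGracePeriodSeconds == nil || *spec.TerminationGracePeriodSeconds < minGracePeriodSeconds {
		v := minGracePeriodSeconds
		spec.TerminationGracePeriodSeconds = &v
	}
}

func main() {
	short := int64(30)
	spec := podSpec{TerminationGracePeriodSeconds: &short}
	enforceGracePeriod(&spec)
	fmt.Println(*spec.TerminationGracePeriodSeconds) // prints 60
}
```

Clamping upward rather than overwriting keeps a user's deliberately longer grace period intact.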
Proposed design
- Inject shutdown-on-sigterm failover into the managed Valkey config by default for all ValkeyCluster resources
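- No double-failover risk: the directive only fires if the node is still primary at SIGTERM (for example, it does nothing after the operator's proactive failover has already demoted the node); it is a no-op on replicas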
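The injection step can be sketched as a pure function over the rendered valkey.conf text. This assumes "by default" means an explicit shutdown-on-sigterm setting supplied by the user wins. That is a design assumption, not something this issue settles. The function name `injectShutdownFailover` is hypothetical. The function is idempotent, so re-rendering on every reconcile never produces a second directive.

```go
package main

import (
	"fmt"
	"strings"
)

// failoverDirective is the directive line named in this issue.
const failoverDirective = "shutdown-on-sigterm failover"

// injectShutdownFailover (hypothetical name) appends the directive to a
// rendered valkey.conf unless the config already sets shutdown-on-sigterm.
// Letting an explicit user value win is an ASSUMPTION about "by default".
func injectShutdownFailover(conf string) string {
	for _, line := range strings.Split(conf, "\n") {
		fields := strings.Fields(line)
		if len(fields) > 0 && strings.EqualFold(fields[0], "shutdown-on-sigterm") {
			return conf // already set (by the user or a previous pass); leave unchanged
		}
	}
	// Make sure the appended directive starts on its own line.
	if conf != "" && !strings.HasSuffix(conf, "\n") {
		conf += "\n"
	}
	return conf + failoverDirective + "\n"
}

func main() {
	base := "port 6379\ncluster-enabled yes\n"
	fmt.Print(injectShutdownFailover(base))
}
```

Because the check matches on the directive keyword rather than the full line, a commented-out `# shutdown-on-sigterm ...` line does not suppress injection, while any live setting does.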
Acceptance criteria
- shutdown-on-sigterm failover directive injected into valkey.conf by default
References