feat: inject shutdown-on-sigterm failover by default #268
Render `shutdown-on-sigterm failover` in the base cluster config so a primary hands its slots off to a replica during graceful shutdown (node drain, eviction, preemption). This covers out-of-band descheduling that the operator's own rolling failover never observes. Requires Valkey 9.0+.
Signed-off-by: melancholictheory <selimvhorst@gmail.com>
| Filename | Overview |
|---|---|
| internal/controller/config.go | Adds "shutdown-on-sigterm failover" to getBaseConfig; correctly scoped to ValkeyCluster only (standalone ValkeyNode uses buildManagedConfig which is unaffected), and last-value-wins placement makes it non-overridable by user config. |
| internal/controller/config_test.go | Adds a ContainSubstring assertion for "shutdown-on-sigterm failover" in the rendered config; matches the exact "key value\n" format produced by writeConfigLine. |
| docs/valkeycluster.md | Adds a Graceful shutdown section; default values cited (30s terminationGracePeriodSeconds, 5s cluster-manual-failover-timeout) are accurate per Valkey documentation and give correct margin guidance. |
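To make the last-value-wins point concrete, here is a minimal sketch of how an enforced directive can shadow a user-supplied one. It assumes the config is rendered as `key value` lines, with user settings written first and operator-managed settings written last. `writeConfigLine` borrows the helper name from the table above, but its signature here, along with `configLine`, `renderConfig`, and `effectiveValue`, is hypothetical and not taken from the operator's source.

```go
package main

import (
	"fmt"
	"strings"
)

// configLine is a hypothetical key/value pair; the operator's real types may differ.
type configLine struct {
	Key, Value string
}

// writeConfigLine appends one directive in "key value\n" form.
// The signature is an assumption made for this sketch.
func writeConfigLine(b *strings.Builder, key, value string) {
	b.WriteString(key)
	b.WriteByte(' ')
	b.WriteString(value)
	b.WriteByte('\n')
}

// renderConfig writes user lines first and enforced base lines last, so a
// parser that keeps the last value seen for a key ends up with the base value.
func renderConfig(user, base []configLine) string {
	var b strings.Builder
	for _, l := range user {
		writeConfigLine(&b, l.Key, l.Value)
	}
	for _, l := range base {
		writeConfigLine(&b, l.Key, l.Value)
	}
	return b.String()
}

// effectiveValue emulates last-value-wins parsing for a single key.
func effectiveValue(rendered, key string) string {
	value := ""
	for _, line := range strings.Split(rendered, "\n") {
		k, v, ok := strings.Cut(line, " ")
		if ok && k == key {
			value = v
		}
	}
	return value
}

func main() {
	user := []configLine{{"shutdown-on-sigterm", "nosave"}}
	base := []configLine{{"shutdown-on-sigterm", "failover"}}
	// The enforced base value is written last, so it is the one that survives.
	fmt.Println(effectiveValue(renderConfig(user, base), "shutdown-on-sigterm"))
}
```

Under these assumptions, a user who sets `shutdown-on-sigterm nosave` still ends up with `failover`, which is what "non-overridable" means in the overview.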
Sequence Diagram
%%{init: {'theme': 'neutral'}}%%
sequenceDiagram
participant K8s as Kubernetes
participant Primary as Valkey Primary Pod
participant Replica as Valkey Replica Pod
K8s->>Primary: SIGTERM (node drain / eviction / preemption)
Note over Primary: shutdown-on-sigterm failover triggers
Primary->>Replica: CLUSTER FAILOVER (manual)
Note over Replica: Pauses replication stream,<br/>catches up to primary offset
Replica-->>Primary: Failover votes secured
Note over Replica: Promoted to Primary<br/>(within cluster-manual-failover-timeout, default 5s)
Primary->>Primary: Demoted to Replica
Note over Primary: Normal shutdown sequence<br/>(within terminationGracePeriodSeconds, default 30s)
Primary->>K8s: Pod exits cleanly
Note over K8s: SIGKILL never sent<br/>(~25s margin with defaults)
Reviews (2): Last reviewed commit: "docs: clarify shutdown-on-sigterm grace-..."
Spell out the actual defaults: the Kubernetes terminationGracePeriodSeconds of 30s comfortably exceeds the Valkey cluster-manual-failover-timeout default of 5s, so the failover completes out of the box. A grace-period bump is only needed when the failover timeout is raised.
Signed-off-by: melancholictheory <selimvhorst@gmail.com>
on the grace-period note: the numbers are the other way round. the default cluster-manual-failover-timeout is 5s, not 30s (config.c registers it with a 5000ms default), so with the Kubernetes default terminationGracePeriodSeconds of 30s there is roughly 25s of margin and the failover completes out of the box. updated the docs to spell out both defaults and to say you only need to bump the grace period if you raise the timeout.
on whether shutdown-on-sigterm should be overridable: fair distinction, it is more of an operational preference than the correctness settings it sits next to in getBaseConfig. the operator doesn't have an "overridable defaults" layer today, it is either enforced base config or user config. #248 framed it as injected by default, so i kept it enforced, but happy to move it to an overridable layer if you'd rather users could set nosave / default. @jdheyburn that one is a design call for you.
Let's keep it enforced for now; if there is a need to override the defaults, we'll consider that when we get there.
jdheyburn left a comment
LGTM, thank you for updating the docs too!
For the e2e test, I think a follow up issue is fine for now.
Closes #248
Summary
Inject `shutdown-on-sigterm failover` into the managed Valkey config by default for every ValkeyCluster. On SIGTERM (node drain, eviction, preemption, or `kubectl delete pod`), a primary fails its slots over to a replica during graceful shutdown, which covers the unplanned termination that the operator's own proactive failover never sees.
Implementation
Added to `getBaseConfig`, alongside the other operator-managed defaults.
Acceptance criteria
`shutdown-on-sigterm failover` injected into valkey.conf by default
I left the E2E test out for now since it needs node-drain orchestration in the e2e harness. Happy to add it here or as a follow-up, whichever you prefer. The unit test asserts the directive is rendered.
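For readers who want to see what "asserts the directive is rendered" amounts to, here is a hedged plain-Go sketch of such a check. The PR's test uses a `ContainSubstring` assertion; the `hasDirective` helper below is a hypothetical stand-in that is slightly stricter, because it matches the whole `key value\n` line. The sample config strings are made up for illustration and are not the operator's real output.

```go
package main

import (
	"fmt"
	"strings"
)

// hasDirective reports whether rendered contains the exact line "key value\n".
// Anchoring on line boundaries avoids matching a longer key or value that
// merely contains the expected text.
func hasDirective(rendered, key, value string) bool {
	line := key + " " + value + "\n"
	return strings.HasPrefix(rendered, line) || strings.Contains(rendered, "\n"+line)
}

func main() {
	// Illustrative rendered configs; the real file has more directives.
	withFailover := "cluster-enabled yes\nshutdown-on-sigterm failover\n"
	withoutFailover := "cluster-enabled yes\nshutdown-on-sigterm nosave\n"

	fmt.Println(hasDirective(withFailover, "shutdown-on-sigterm", "failover"))
	fmt.Println(hasDirective(withoutFailover, "shutdown-on-sigterm", "failover"))
}
```

The first call reports true and the second false. A check like this only proves the line is rendered; it says nothing about failover behaviour at runtime, which is what the deferred E2E test would cover.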
The `terminationGracePeriodSeconds` constraint is tracked separately in #260; the docs note the relationship.
Testing
`shutdown-on-sigterm failover` is in the rendered config.
`make test` and `make lint` pass locally.
References
Checklist
`make test` and `make lint` instead)