fix(valkey): switchover scripts iterate stale POD_FQDN_LIST after scale-out #2608

@weicao

Description

Problem

When a Valkey cluster is scaled out (e.g. 3 -> 4 replicas) and a targeted switchover is then issued to the freshly added replica, the OpsRequest fails with:

WARNING: could not confirm new primary within 300s

even though Sentinel has already promoted the fresh candidate, the post-switchover topology has settled correctly, and replica-priority has been restored.

Root cause

addons/valkey/scripts/switchover.sh iterates over a member list taken from the container env variable VALKEY_POD_FQDN_LIST. That variable is rendered into the pod environment at pod creation time via componentVarRef.podFQDNs, and KubeBlocks does not refresh the container env of an existing pod after scale-out.
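To make the "frozen at creation time" point concrete, here is a minimal sketch of how a script can consume such a list. It assumes VALKEY_POD_FQDN_LIST is comma-separated; the helper name `split_fqdn_list` and the pod FQDNs are hypothetical and not taken from switchover.sh. Whatever loop sits on top of this can only ever see the string that was in the environment when the container started.

```sh
# Hypothetical helper (not from switchover.sh): print one FQDN per line.
# Assumption: VALKEY_POD_FQDN_LIST is a comma-separated list of pod FQDNs.
split_fqdn_list() {
  printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r fqdn; do
    # Skip empty entries, e.g. from a trailing or doubled comma.
    if [ -n "$fqdn" ]; then
      printf '%s\n' "$fqdn"
    fi
  done
}

# Value rendered when the pod was created with 3 replicas (hypothetical names).
# After scale-out to 4, this running container still holds the same string.
VALKEY_POD_FQDN_LIST="valkey-0.valkey-headless.demo.svc,valkey-1.valkey-headless.demo.svc,valkey-2.valkey-headless.demo.svc"
split_fqdn_list "$VALKEY_POD_FQDN_LIST"
```

This prints the three original FQDNs; nothing inside the running container will ever add valkey-3 to that output.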

So when scale-out grows replicas from N to N+1, the old primary's action container still sees the old N-entry list. All iteration points in switchover.sh then miss the freshly added candidate:

  • set_priorities_with_candidate_bias() — does not set replica-priority=1 on the fresh candidate
  • restore_priorities() — does not restore on the fresh candidate
  • wait_for_new_master() — never probes the fresh candidate, so it cannot observe role:master even after Sentinel promotion
  • check_* helpers using the same list
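The wait_for_new_master() failure can be reproduced in miniature. The sketch below is hypothetical: `probe_role` is a stub standing in for a real probe such as `valkey-cli -h "$fqdn" info replication`, and `wait_for_master_in_list` is an illustrative name, not the function in switchover.sh. The stub pretends Sentinel has already promoted the fresh pod, yet a loop over the stale list never reaches it.

```sh
# Hypothetical stub for a real role probe such as
#   valkey-cli -h "$fqdn" info replication | grep '^role:'
# so the sketch runs without a cluster. It pretends Sentinel has already
# promoted the freshly added pod.
PROMOTED_FQDN="valkey-3.valkey-headless.demo.svc"
probe_role() {
  if [ "$1" = "$PROMOTED_FQDN" ]; then echo master; else echo slave; fi
}

# Probe every member of a comma-separated list for up to max_rounds rounds.
# Returns 0 on the first member that reports master, 1 otherwise.
wait_for_master_in_list() {
  local list="$1" max_rounds="$2" round=1 fqdn
  while [ "$round" -le "$max_rounds" ]; do
    for fqdn in $(printf '%s' "$list" | tr ',' ' '); do
      if [ "$(probe_role "$fqdn")" = "master" ]; then
        printf 'new master: %s\n' "$fqdn"
        return 0
      fi
    done
    round=$((round + 1))   # a real script would sleep between rounds
  done
  echo "could not confirm new primary" >&2
  return 1
}

# Stale 3-entry list from pod creation: the promoted valkey-3 is never probed.
STALE_LIST="valkey-0.valkey-headless.demo.svc,valkey-1.valkey-headless.demo.svc,valkey-2.valkey-headless.demo.svc"
wait_for_master_in_list "$STALE_LIST" 3 || echo "valkey-3 was never probed"
```

Adding the promoted FQDN to the list is enough for the same loop to succeed on its first round, which is the behavior the fix relies on.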

Fix

Introduce pod_fqdns_with_candidate(), which unions KB_SWITCHOVER_CANDIDATE_FQDN (passed at action time as expected_fqdn / candidate_fqdn) into the env list. All iteration points now consume this union list.
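A minimal sketch of what such a union helper could look like, assuming both the env list and the output are comma-separated. Only the function and variable names come from the description above; the body is an illustration, and the actual implementation in switchover.sh may differ.

```sh
# Hypothetical sketch of pod_fqdns_with_candidate(); the real body may differ.
# Prints VALKEY_POD_FQDN_LIST with KB_SWITCHOVER_CANDIDATE_FQDN appended,
# comma-separated, unless the candidate is empty or already present.
pod_fqdns_with_candidate() {
  local list="${VALKEY_POD_FQDN_LIST:-}"
  local candidate="${KB_SWITCHOVER_CANDIDATE_FQDN:-}"

  if [ -z "$candidate" ]; then
    printf '%s\n' "$list"
    return 0
  fi

  # Wrap in commas so only whole entries match (",a," does not match ",ab,").
  case ",$list," in
    *",$candidate,"*)
      printf '%s\n' "$list" ;;
    *)
      if [ -z "$list" ]; then
        printf '%s\n' "$candidate"
      else
        printf '%s,%s\n' "$list" "$candidate"
      fi ;;
  esac
}

# Demo with hypothetical values: stale 3-entry list plus the fresh candidate.
VALKEY_POD_FQDN_LIST="valkey-0.demo,valkey-1.demo,valkey-2.demo"
KB_SWITCHOVER_CANDIDATE_FQDN="valkey-3.demo"
pod_fqdns_with_candidate   # prints valkey-0.demo,valkey-1.demo,valkey-2.demo,valkey-3.demo
```

Appending the candidate instead of rebuilding the list keeps every pre-existing member in the loops, and the membership check keeps the result unchanged when the env list is already current.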

Validation

  • ShellSpec: 55 examples, 0 failures (scripts-ut-spec/valkey_switchover_spec.sh), with new cases covering stale-list scenarios.
  • Broader live smoke test: 143 PASS / 4 FAIL / 2 SKIP; the 4 failures are environment/capability gaps, not product defects. T09 (targeted switchover to a freshly scaled-out replica) passes on the first attempt, T14 (targeted switchover) OpsRequest reaches Succeed with the candidate becoming primary, and T15 (Sentinel failover) behaves normally.
  • Live chaos suite: 143 PASS / 0 FAIL / 0 SKIP, covering master kill, killing all Sentinels, killing all 6 pods, rapid master kills, restart, scale-out/in during writes, and vertical scaling (vscale) during writes. The fix holds under concurrent writes and chaos.

Same-pattern risk in other addons

Redis (addons/redis/scripts/redis-switchover.sh) follows the identical pattern, with REDIS_POD_FQDN_LIST and SENTINEL_POD_FQDN_LIST injected via componentVarRef.podFQDNs. The same iteration points (set_redis_priorities, recover_redis_priorities, check_redis_kernel_status, check_switchover_result) carry the same architectural risk. This PR does not modify Redis; that is left for a follow-up evaluation.
