[epic] ValkeyCluster scheduling & placement (shard spread, primary balance, zones, failover) #265

Description

@jdheyburn

Purpose

Umbrella for where ValkeyCluster pods run and how that ties to HA and failover: shard-level placement, even distribution of primaries across workers, zone-aware behaviour, defaults, surfacing scheduling failures, and integration with proactive failover replica selection.

Use this issue as the parent / tracking hub; link implementation issues here as they are opened.

Existing GitHub issues (anchor this epic to these)

| Track | Issue | Role |
| --- | --- | --- |
| Shard-aware placement design | #146 | Open design: API (e.g. topologySpreadConstraints vs operator policy), defaults, failure surfacing, interaction with user affinity / future zone work. |
| Primaries balanced across workers | #247 | Cluster-level: avoid too many primaries on one worker; rebalance after node events; complementary to #146 per that issue. |
| Proactive failover replica selection | #249 | Replica choice for rolls / write-pause; natural tie-in for zone-aware primary preference later. |
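
To make the #247 goal concrete, here is a minimal Go sketch of a primary-balance check. It is not taken from the operator: the `Pod` type, its fields, and the "ceiling of an even split" threshold are all assumptions chosen for illustration, and the real policy is still to be decided in the linked issues.

```go
package main

import (
	"fmt"
	"sort"
)

// Pod is a hypothetical, simplified view of a ValkeyCluster pod.
// The operator's real types will differ.
type Pod struct {
	Name    string
	Node    string // worker the pod is scheduled on
	Shard   int
	Primary bool
}

// PrimariesPerNode counts primaries on every worker that hosts at least one pod.
// Workers that only host replicas are included with a count of zero.
func PrimariesPerNode(pods []Pod) map[string]int {
	counts := make(map[string]int)
	for _, p := range pods {
		if _, seen := counts[p.Node]; !seen {
			counts[p.Node] = 0
		}
		if p.Primary {
			counts[p.Node]++
		}
	}
	return counts
}

// OverloadedNodes returns, in sorted order, the workers whose primary count
// exceeds the ceiling of an even split across all workers in use.
// The threshold is an assumption, not a decided policy.
func OverloadedNodes(pods []Pod) []string {
	counts := PrimariesPerNode(pods)
	if len(counts) == 0 {
		return nil
	}
	total := 0
	for _, c := range counts {
		total += c
	}
	limit := (total + len(counts) - 1) / len(counts) // ceil(total / workers)
	var out []string
	for node, c := range counts {
		if c > limit {
			out = append(out, node)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	pods := []Pod{
		{Name: "vk-0-0", Node: "worker-a", Shard: 0, Primary: true},
		{Name: "vk-1-0", Node: "worker-a", Shard: 1, Primary: true},
		{Name: "vk-2-0", Node: "worker-a", Shard: 2, Primary: true},
		{Name: "vk-0-1", Node: "worker-b", Shard: 0, Primary: false},
		{Name: "vk-1-1", Node: "worker-b", Shard: 1, Primary: false},
		{Name: "vk-2-1", Node: "worker-c", Shard: 2, Primary: false},
	}
	fmt.Println(OverloadedNodes(pods)) // prints [worker-a]
}
```

A check like this only detects imbalance; whether to fix it through scheduling or through controlled failovers is the open question tracked in #247.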

Workstreams (file or link sub-issues when ready)

  1. Close / summarize [Design] Shard-aware placement policy for primaries and replicas #146: record a written decision (Option + defaults + how strict failures surface) before treating any implementation spec as final.
  2. Shard-aware placement implementation: labels, optional default soft spread, spec.placement surface, conditions, docs, E2E. Blocked on the outcome of #146.
  3. Primary balance across workers (#247): scheduling and/or controlled failovers to rebalance; coordinate with (2) so the two policies do not fight.
  4. Zone-affinity for primaries (future issue): preferred AZ, integration with feat: fail over to the highest-offset replica #249 for CLUSTER FAILOVER target selection, optional fail-back. Depends on #146 not painting the API into a corner; likely to ship after (2).
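
As a rough illustration of how workstream (4) could layer on top of #249, the Go sketch below picks the highest-offset replica by default and prefers a replica in a given zone only when it is close enough to the best offset. The `Replica` type, the `preferredZone` argument, and the `maxLag` tolerance are hypothetical and do not reflect a decided API.

```go
package main

import "fmt"

// Replica is a hypothetical view of a failover candidate.
type Replica struct {
	Name   string
	Zone   string
	Offset int64 // replication offset reported by the replica
}

// PickFailoverTarget returns the replica to promote.
// Default behaviour: the replica with the highest offset.
// If preferredZone is set, the highest-offset replica in that zone wins,
// provided it trails the overall best by at most maxLag (an assumed knob).
func PickFailoverTarget(replicas []Replica, preferredZone string, maxLag int64) (Replica, bool) {
	if len(replicas) == 0 {
		return Replica{}, false
	}
	best := replicas[0]
	for _, r := range replicas[1:] {
		if r.Offset > best.Offset {
			best = r
		}
	}
	if preferredZone == "" {
		return best, true
	}
	var zoned *Replica
	for i := range replicas {
		r := &replicas[i]
		if r.Zone != preferredZone || best.Offset-r.Offset > maxLag {
			continue // wrong zone, or too far behind to be a safe target
		}
		if zoned == nil || r.Offset > zoned.Offset {
			zoned = r
		}
	}
	if zoned != nil {
		return *zoned, true
	}
	return best, true
}

func main() {
	replicas := []Replica{
		{Name: "vk-0-1", Zone: "zone-a", Offset: 1000},
		{Name: "vk-0-2", Zone: "zone-b", Offset: 990},
	}
	target, _ := PickFailoverTarget(replicas, "zone-b", 50)
	fmt.Println(target.Name) // prints vk-0-2: in the preferred zone and within tolerance
	target, _ = PickFailoverTarget(replicas, "zone-b", 5)
	fmt.Println(target.Name) // prints vk-0-1: zone-b replica lags too far, fall back to highest offset
}
```

Keeping the zone preference as a soft filter over the offset-based choice means the #249 behaviour stays the fallback, which is one way to avoid the two policies conflicting.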

Cross-cutting

  • PDBs, single-shard / tiny-cluster behaviour, CI-friendly defaults.
  • Discussion #228 for zone / failback context where relevant.

    Labels

    cluster: Relates to ValkeyCluster only
    enhancement: New feature or request
    scheduling: Determines where the pods get placed onto infrastructure topologies
