[epic] ValkeyCluster scheduling & placement (shard spread, primary balance, zones, failover) #265

### Purpose

Umbrella for **where `ValkeyCluster` pods run** and how that ties to **HA and failover**: shard-level placement, **even distribution of primaries across workers**, zone-aware behaviour, defaults, surfacing scheduling failures, and integration with **proactive failover** replica selection.

Use this issue as the **parent / tracking hub**; link implementation issues here as they are opened.

### Existing GitHub issues (anchor this epic to these)

| Track | Issue | Role |
|-------|-------|------|
| Shard-aware placement **design** | [#146](https://github.com/valkey-io/valkey-operator/issues/146) | Open design: API (e.g. `topologySpreadConstraints` vs operator policy), defaults, failure surfacing, interaction with user `affinity` / future zone work. |
| Primaries balanced across workers | [#247](https://github.com/valkey-io/valkey-operator/issues/247) | Cluster-level: avoid too many primaries on one worker; rebalance after node events; **complementary to #146** per that issue. |
| Proactive failover replica selection | [#249](https://github.com/valkey-io/valkey-operator/issues/249) | Replica choice for rolls / write-pause; natural tie-in for **zone-aware** primary preference later. |
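
As a rough illustration of what "too many primaries on one worker" (#247) could mean in practice, the sketch below counts primaries per worker and reports the skew. It is not the operator's implementation: the `Pod` struct, the function names, and the `maxPerNode` threshold are all hypothetical, and a real controller would read this information from the Kubernetes API and from Valkey cluster state.

```go
package main

import (
	"fmt"
	"sort"
)

// Pod is a hypothetical, simplified view of one ValkeyCluster pod.
// The real operator types are not defined in this issue.
type Pod struct {
	Name    string
	Shard   int
	Node    string // worker node the pod is scheduled on
	Primary bool   // true if this pod currently serves as the shard primary
}

// PrimariesPerNode counts primaries on every worker that hosts at least one
// pod. Workers that host only replicas appear with a count of 0; workers that
// host no pods at all are not visible to this function.
func PrimariesPerNode(pods []Pod) map[string]int {
	counts := make(map[string]int)
	for _, p := range pods {
		if _, seen := counts[p.Node]; !seen {
			counts[p.Node] = 0
		}
		if p.Primary {
			counts[p.Node]++
		}
	}
	return counts
}

// Skew returns the difference between the highest and lowest primary count.
func Skew(counts map[string]int) int {
	first := true
	lo, hi := 0, 0
	for _, c := range counts {
		if first {
			lo, hi = c, c
			first = false
			continue
		}
		if c < lo {
			lo = c
		}
		if c > hi {
			hi = c
		}
	}
	return hi - lo
}

// OverloadedNodes lists, in sorted order, the workers whose primary count
// exceeds maxPerNode (an assumed, illustrative policy knob).
func OverloadedNodes(counts map[string]int, maxPerNode int) []string {
	var out []string
	for node, c := range counts {
		if c > maxPerNode {
			out = append(out, node)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	pods := []Pod{
		{Name: "s0-0", Shard: 0, Node: "worker-a", Primary: true},
		{Name: "s0-1", Shard: 0, Node: "worker-b"},
		{Name: "s1-0", Shard: 1, Node: "worker-a", Primary: true},
		{Name: "s1-1", Shard: 1, Node: "worker-c"},
		{Name: "s2-0", Shard: 2, Node: "worker-b", Primary: true},
		{Name: "s2-1", Shard: 2, Node: "worker-c"},
	}
	counts := PrimariesPerNode(pods)
	fmt.Println("primaries per worker:", counts)
	fmt.Println("skew:", Skew(counts))
	fmt.Println("overloaded:", OverloadedNodes(counts, 1))
}
```

Whether imbalance is corrected by scheduling, by controlled failovers, or both is exactly what #247 leaves open; the sketch only covers detection.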

### Workstreams (file or link sub-issues when ready)

1. **Close / summarize #146** — record a written decision (Option + defaults + how strict failures surface) before treating any implementation spec as final.
2. **Shard-aware placement implementation** — labels, optional default soft spread, `spec.placement` surface, conditions, docs, E2E — *blocked on #146 outcome*.
3. **Primary balance across workers (#247)** — scheduling and/or controlled failovers to rebalance; coordinate with (2) so policies do not fight.
4. **Zone-affinity for primaries** (future issue) — preferred AZ, integration with #249 for `CLUSTER FAILOVER` target selection, optional fail-back — *depends on #146 not painting the API into a corner; often ships after (2)*.
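
The tie-in between workstream 4 and #249 can be sketched as a replica-ranking function: prefer a ready replica in the preferred zone, and otherwise fall back to the most caught-up ready replica. This is a minimal sketch under stated assumptions, not a proposed design. The `Replica` type, `PickFailoverTarget`, and the ranking order (zone first, then replication offset, then name) are hypothetical; #249 does not yet fix the selection criteria.

```go
package main

import "fmt"

// Replica is a hypothetical, simplified view of a failover candidate.
type Replica struct {
	Name       string
	Zone       string
	ReplOffset int64 // replication offset; higher means more caught up
	Ready      bool  // pod is ready and the replica link is healthy
}

// better reports whether a should be preferred over b for the given zone.
// Assumed ranking: preferred zone first, then higher replication offset,
// then lexicographically smaller name so the result is deterministic.
func better(a, b Replica, preferredZone string) bool {
	aZone := a.Zone == preferredZone
	bZone := b.Zone == preferredZone
	if aZone != bZone {
		return aZone
	}
	if a.ReplOffset != b.ReplOffset {
		return a.ReplOffset > b.ReplOffset
	}
	return a.Name < b.Name
}

// PickFailoverTarget returns the best ready replica, or false if none is
// ready. The zone preference is soft: a replica outside the preferred zone
// is still chosen when no ready replica exists inside it.
func PickFailoverTarget(replicas []Replica, preferredZone string) (Replica, bool) {
	var best Replica
	found := false
	for _, r := range replicas {
		if !r.Ready {
			continue
		}
		if !found || better(r, best, preferredZone) {
			best = r
			found = true
		}
	}
	return best, found
}

func main() {
	replicas := []Replica{
		{Name: "s0-1", Zone: "zone-a", ReplOffset: 100, Ready: true},
		{Name: "s0-2", Zone: "zone-b", ReplOffset: 200, Ready: true},
		{Name: "s0-3", Zone: "zone-a", ReplOffset: 150, Ready: false},
	}
	if target, ok := PickFailoverTarget(replicas, "zone-a"); ok {
		fmt.Println("failover target:", target.Name)
	}
}
```

Keeping the zone preference soft means a zone outage cannot block a failover, which is one reason the API decided in #146 should leave room for both preferred and strict modes.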

### Cross-cutting

- PodDisruptionBudgets (PDBs), single-shard / tiny-cluster behaviour, CI-friendly defaults.
- Discussion [#228](https://github.com/valkey-io/valkey-operator/discussions/228) for zone / failback context where relevant.


