Tighten on when THROTTLE decision can be returned #136794

ywangd · 2025-10-20T04:32:20Z

For allocation simulation, allocation decisions should not be THROTTLE for moveShards and balance. Allocate unassigned shards should not see THROTTLE unless there is prior shard assignment. This PR makes this statement explicit by fixing test allocation decider and HasFrozenAllocationDecider as well as adding relevant assertions.

Relates: ES-12955

Other than HasFrozenCacheAllocationDecider, allocation decision should not be THROTTLE unless there is prior shard assignment/movement. This PR makes this statement explicit by adding relevant assertions and fixing test allocation decider. Relates: ES-12955

elasticsearchmachine · 2025-10-20T04:32:44Z

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

ywangd · 2025-10-20T04:36:22Z

Initially, I wanted to ensure that no THROTTLE can be returned at all unless there is prior shard movement. But HasFrozenCacheAllocationDecider is not compatible with the idea. We also have existing code explicitly checking for it. So it seems that we have to live with this exception. As a result, I added assertions and fixed RandomAllocationDecider to make this exception more explicit. I am open to suggestions.

ywangd · 2025-10-20T04:38:10Z

...ain/java/org/elasticsearch/cluster/routing/allocation/allocator/BalancedShardsAllocator.java

+                                // 2. Or, a partial searchable snapshot index due to HasFrozenCacheAllocationDecider
+                                assert allocation.isSimulating() == false || indexMetadata.isPartialSearchableSnapshot()


It would be great if we could check that the THROTTLE decision is actually from the HasFrozenCacheAllocationDecider. But it's not really feasible. Instead, we can check it's a frozen index. Not really ideal, but seems to be the best we can do.

I'm not a big fan of putting knowledge of specific deciders in here, as discussed, I wonder if HasFrozenAllocationDecider is abusing THROTTLE anyway. My reasoning was:

In the canAllocateUnassigned logic we'll wait for the outcome of a lower-weight THROTTLE when we have higher-weight YES so that would mean this decider can defer the allocation of a partial searchable snapshot while a non-frozen node was initialising doesn't it?
I would think we'd return NO until the setting was loaded?

It seems like returning THROTTLED while we wait to find out if the node can host frozen shards could mean we defer allocating the shard to another node that we KNOW can host frozen shards, while waiting to find out if an initialising node is a frozen node.

we defer allocating the shard to another node that we KNOW can host frozen shards, while waiting to find out if an initialising node is a frozen node.

That is true. I also wonder whether it is intentional since otherwise the shards will be assigned to an existing node only to relocate later momentarily. That said, after reading through code around HasFrozenAllocationDecider, I believe it is OK to return NO since it should not change the beahviour. That is because we always call allocateExistingUnassignedShards before calling into the allocator. This method uses SearchableSnapshotAllocator to allocate unassigne shards first. It also checks cache status and reject allocation when cache state is still being fetched. It in turns places the shard in the ignored list and BalancedShardAllocator ignores them as well. In short, the allocation decision for paritially mounted shards are made earlier in a different place so that it should not matter for the decider to return NO.

ywangd · 2025-10-20T04:39:08Z

...g/elasticsearch/cluster/routing/allocation/decider/ConcurrentRebalanceAllocationDecider.java

        if (allocation.isSimulating() && relocatingShards >= 2) {
+            // This branch should no longer run after https://github.com/elastic/elasticsearch/pull/134786
+            assert false : "allocation simulation should have returned earlier and not hit throttling";
            // BalancedShardAllocator is prone to perform unnecessary moves when cluster_concurrent_rebalance is set to high values (>2).
            // (See https://github.com/elastic/elasticsearch/issues/87279)
            // Above allocator is used in DesiredBalanceComputer. Since we do not move actual shard data during calculation


We can potentially remove this branch in future. For now, adding the assertion seems to be a good first step.

I think if we can't confidently get rid of this code, then we aren't really sure we're not throttling with the changes. This should at least have a planned follow up action that is more certain.

I believe it is safe to remove. My usage of "potential" may be confusing. It does not imply less confidence, was just bad wording.

Adding assert false ensures it does not happen in tests and is equivalent to deprecation. I'd prefer to have it first to run through CIs for sometime before finally removing it. It is also because the existing comment has an issue linked to it #87279 and it'd be great to spend some time separately to see whether it can be closed while removing the branch. I raised ES-13282 specifically for deleting this branch.

nicktindall

All this change LGTM pending the question of whether we can make HasFrozenCacheAllocationDecider return NO while we're initialising the node so that we can remove the special knowledge of it from here?

…before-movement

DiannaHohensee

The purpose of this task is to gain confidence that the THROTTLE decision is not encountered, but right now this patch doesn't provide that confidence. This isn't in the hot-path, so I don't think we need to rush the change. I'd rather sort out the Decider before proceeding.

...sticsearch/xpack/searchablesnapshots/allocation/decider/HasFrozenCacheAllocationDecider.java

DiannaHohensee · 2025-10-20T22:55:59Z

...ain/java/org/elasticsearch/cluster/routing/allocation/allocator/BalancedShardsAllocator.java

-                                // happen in tests because we have decider like RandomAllocationDecider that can randomly return THROTTLE
-                                // when there is no existing relocation.
+                                // Check routingNodes.getRelocatingShardCount() > 0 in case the first relocation is a THROTTLE.
                                shardBalanced = true;


Can we simplify the code by making this

assert .... if (completeEarlyOnShardAssignmentChange && routingNodes.getRelocatingShardCount() > 0) { return true; } continue;

Not really. We still want to track shardBalanced when it is not simulation.

DiannaHohensee · 2025-10-20T22:59:28Z

...g/elasticsearch/cluster/routing/allocation/decider/ConcurrentRebalanceAllocationDecider.java

        if (allocation.isSimulating() && relocatingShards >= 2) {
+            // This branch should no longer run after https://github.com/elastic/elasticsearch/pull/134786
+            assert false : "allocation simulation should have returned earlier and not hit throttling";
            // BalancedShardAllocator is prone to perform unnecessary moves when cluster_concurrent_rebalance is set to high values (>2).
            // (See https://github.com/elastic/elasticsearch/issues/87279)
            // Above allocator is used in DesiredBalanceComputer. Since we do not move actual shard data during calculation


I think if we can't confidently get rid of this code, then we aren't really sure we're not throttling with the changes. This should at least have a planned follow up action that is more certain.

…before-movement

ywangd · 2025-10-21T07:58:15Z

@nicktindall @DiannaHohensee The PR is ready for another look. In summary, I have taken Nick's suggestion for returning NO from HasFrozenAllocationDecider.

I spent quite some time reading through relevant code for searchable snapshot indices allocation. I believe it is safe to do so.

Unassigned searchable snapshot shards are allocated elsewhere, see also this comment.
For relocations, throttle is effectively a NO and unlike unassigned shards, there is no urgency for them.

ywangd · 2025-10-21T08:01:51Z

...sticsearch/xpack/searchablesnapshots/allocation/decider/HasFrozenCacheAllocationDecider.java

-            default -> STILL_FETCHING;
+            case FETCHING -> allocation.isSimulating() ? NO_STILL_FETCHING : STILL_FETCHING;
+            case UNKNOWN -> UNKNOWN_NODE;


HasFrozenAllocationDecider does not differentiate between fetching node state (FETCHING) and no-such-node (UNKNOWN). The later can only happen when a node leaves the cluster which seems more justified to say NO. In any case, they do not really impact SearchableSnapshotAllocator which is what matters for allocating unassigned searchable snapshot shards.
Btw, I can also undo the split if it is considered unrelated.

ywangd · 2025-10-21T08:04:47Z

...main/java/org/elasticsearch/cluster/routing/allocation/allocator/DesiredBalanceComputer.java

            if (hasChanges == false && info.lastAllocationStatus() == UnassignedInfo.AllocationStatus.DECIDERS_THROTTLED) {
+                // Unassigned ignored shards must be based on the provided set of ignoredShards
+                assert ignoredShards.contains(discardAllocationStatus(shard))
+                    || ignoredShards.stream().filter(ShardRouting::primary).anyMatch(primary -> primary.shardId().equals(shard.shardId()))
+                    : "ignored shard "
+                        + shard
+                        + " unexpectedly has THROTTLE status and no counterpart in the provided ignoredShards set "
+                        + ignoredShards;
                // Simulation could not progress due to missing information in any of the deciders.
                // Currently, this could happen if `HasFrozenCacheAllocationDecider` is still fetching the data.
                // Progress would be made after the followup reroute call.


The comment about HasFrozenCacheAllocationDecider had me concerned for some time. Adding this assertion to make it clear that throttled status can only come from DesiredBalanceInput#ignoredShards, i.e. they are not produced by BalancedShardAllocator.

ywangd requested review from DiannaHohensee and nicktindall October 20, 2025 04:32

ywangd added >non-issue :Distributed Coordination/Allocation All issues relating to the decision making around placing a shard (both master logic & on the nodes) v9.3.0 labels Oct 20, 2025

elasticsearchmachine added the Team:Distributed Coordination Meta label for Distributed Coordination team label Oct 20, 2025

ywangd commented Oct 20, 2025

View reviewed changes

nicktindall approved these changes Oct 20, 2025

View reviewed changes

ywangd added 6 commits October 20, 2025 16:07

Remove assertions from ThrottlingAllocationDecider

e039dc1

asserton

4024c40

HasFrozenCacheAllocationDecider returns NO for assigned shards

3a8664d

expand changes

4af235f

tweak assertions

7d19eac

Merge remote-tracking branch 'origin/main' into ES-12955-no-throttle-…

548ea9d

…before-movement

DiannaHohensee reviewed Oct 20, 2025

View reviewed changes

ywangd marked this pull request as draft October 20, 2025 23:12

ywangd added 6 commits October 21, 2025 14:03

Merge remote-tracking branch 'origin/main' into ES-12955-no-throttle-…

c7f50d8

…before-movement

revert unnecessary changes

1f04641

improve HasFrozenCacheAllocationDecider

4025f16

tweak

d36a3d9

fix test

69227ae

fix another test

f7eba77

ywangd marked this pull request as ready for review October 21, 2025 07:17

ywangd changed the title ~~Tighten when THROTTLE decision can be returned~~ Tighten on when THROTTLE decision can be returned Oct 21, 2025

ywangd commented Oct 21, 2025

View reviewed changes

ywangd requested review from DiannaHohensee and nicktindall October 21, 2025 08:05

		// 2. Or, a partial searchable snapshot index due to HasFrozenCacheAllocationDecider
		assert allocation.isSimulating() == false \|\| indexMetadata.isPartialSearchableSnapshot()

Tighten on when THROTTLE decision can be returned #136794

Are you sure you want to change the base?

Tighten on when THROTTLE decision can be returned #136794

Conversation

ywangd commented Oct 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

elasticsearchmachine commented Oct 20, 2025

Uh oh!

ywangd commented Oct 20, 2025

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nicktindall Oct 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

nicktindall left a comment

Choose a reason for hiding this comment

Uh oh!

DiannaHohensee left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ywangd commented Oct 21, 2025

Uh oh!

ywangd Oct 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

ywangd commented Oct 20, 2025 •

edited

Loading

nicktindall Oct 20, 2025 •

edited

Loading

ywangd Oct 21, 2025 •

edited

Loading