-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Tighten on when THROTTLE decision can be returned #136794
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Other than HasFrozenCacheAllocationDecider, allocation decision should not be THROTTLE unless there is prior shard assignment/movement. This PR makes this statement explicit by adding relevant assertions and fixing test allocation decider. Relates: ES-12955
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination) |
Initially, I wanted to ensure that no THROTTLE can be returned at all unless there is prior shard movement. But HasFrozenCacheAllocationDecider is not compatible with the idea. We also have existing code explicitly checking for it. So it seems that we have to live with this exception. As a result, I added assertions and fixed |
// 2. Or, a partial searchable snapshot index due to HasFrozenCacheAllocationDecider | ||
assert allocation.isSimulating() == false || indexMetadata.isPartialSearchableSnapshot() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great if we could check that the THROTTLE decision is actually from the HasFrozenCacheAllocationDecider
. But it's not really feasible. Instead, we can check it's a frozen index. Not really ideal, but seems to be the best we can do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a big fan of putting knowledge of specific deciders in here, as discussed, I wonder if HasFrozenAllocationDecider
is abusing THROTTLE
anyway. My reasoning was:
In the
canAllocateUnassigned
logic we'll wait for the outcome of a lower-weight THROTTLE when we have higher-weight YES so that would mean this decider can defer the allocation of a partial searchable snapshot while a non-frozen node was initialising doesn't it?
I would think we'd return NO until the setting was loaded?
It seems like returning THROTTLED
while we wait to find out if the node can host frozen shards could mean we defer allocating the shard to another node that we KNOW can host frozen shards, while waiting to find out if an initialising node is a frozen node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we defer allocating the shard to another node that we KNOW can host frozen shards, while waiting to find out if an initialising node is a frozen node.
That is true. I also wonder whether it is intentional since otherwise the shards will be assigned to an existing node only to relocate later momentarily. That said, after reading through code around HasFrozenAllocationDecider
, I believe it is OK to return NO
since it should not change the beahviour. That is because we always call allocateExistingUnassignedShards before calling into the allocator. This method uses SearchableSnapshotAllocator to allocate unassigne shards first. It also checks cache status and reject allocation when cache state is still being fetched. It in turns places the shard in the ignored list and BalancedShardAllocator
ignores them as well. In short, the allocation decision for paritially mounted shards are made earlier in a different place so that it should not matter for the decider to return NO
.
if (allocation.isSimulating() && relocatingShards >= 2) { | ||
// This branch should no longer run after https://github.com/elastic/elasticsearch/pull/134786 | ||
assert false : "allocation simulation should have returned earlier and not hit throttling"; | ||
// BalancedShardAllocator is prone to perform unnecessary moves when cluster_concurrent_rebalance is set to high values (>2). | ||
// (See https://github.com/elastic/elasticsearch/issues/87279) | ||
// Above allocator is used in DesiredBalanceComputer. Since we do not move actual shard data during calculation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can potentially remove this branch in future. For now, adding the assertion seems to be a good first step.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if we can't confidently get rid of this code, then we aren't really sure we're not throttling with the changes. This should at least have a planned follow up action that is more certain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it is safe to remove. My usage of "potential" may be confusing. It does not imply less confidence, was just bad wording.
Adding assert false
ensures it does not happen in tests and is equivalent to deprecation. I'd prefer to have it first to run through CIs for sometime before finally removing it. It is also because the existing comment has an issue linked to it #87279 and it'd be great to spend some time separately to see whether it can be closed while removing the branch. I raised ES-13282 specifically for deleting this branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All this change LGTM pending the question of whether we can make HasFrozenCacheAllocationDecider
return NO
while we're initialising the node so that we can remove the special knowledge of it from here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The purpose of this task is to gain confidence that the THROTTLE decision is not encountered, but right now this patch doesn't provide that confidence. This isn't in the hot-path, so I don't think we need to rush the change. I'd rather sort out the Decider before proceeding.
...sticsearch/xpack/searchablesnapshots/allocation/decider/HasFrozenCacheAllocationDecider.java
Outdated
Show resolved
Hide resolved
...sticsearch/xpack/searchablesnapshots/allocation/decider/HasFrozenCacheAllocationDecider.java
Outdated
Show resolved
Hide resolved
// happen in tests because we have decider like RandomAllocationDecider that can randomly return THROTTLE | ||
// when there is no existing relocation. | ||
// Check routingNodes.getRelocatingShardCount() > 0 in case the first relocation is a THROTTLE. | ||
shardBalanced = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we simplify the code by making this
assert ....
if (completeEarlyOnShardAssignmentChange && routingNodes.getRelocatingShardCount() > 0) {
return true;
}
continue;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really. We still want to track shardBalanced
when it is not simulation.
if (allocation.isSimulating() && relocatingShards >= 2) { | ||
// This branch should no longer run after https://github.com/elastic/elasticsearch/pull/134786 | ||
assert false : "allocation simulation should have returned earlier and not hit throttling"; | ||
// BalancedShardAllocator is prone to perform unnecessary moves when cluster_concurrent_rebalance is set to high values (>2). | ||
// (See https://github.com/elastic/elasticsearch/issues/87279) | ||
// Above allocator is used in DesiredBalanceComputer. Since we do not move actual shard data during calculation |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think if we can't confidently get rid of this code, then we aren't really sure we're not throttling with the changes. This should at least have a planned follow up action that is more certain.
@nicktindall @DiannaHohensee The PR is ready for another look. In summary, I have taken Nick's suggestion for returning I spent quite some time reading through relevant code for searchable snapshot indices allocation. I believe it is safe to do so.
|
default -> STILL_FETCHING; | ||
case FETCHING -> allocation.isSimulating() ? NO_STILL_FETCHING : STILL_FETCHING; | ||
case UNKNOWN -> UNKNOWN_NODE; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
HasFrozenAllocationDecider
does not differentiate between fetching node state (FETCHING
) and no-such-node (UNKNOWN
). The later can only happen when a node leaves the cluster which seems more justified to say NO
. In any case, they do not really impact SearchableSnapshotAllocator which is what matters for allocating unassigned searchable snapshot shards.
Btw, I can also undo the split if it is considered unrelated.
if (hasChanges == false && info.lastAllocationStatus() == UnassignedInfo.AllocationStatus.DECIDERS_THROTTLED) { | ||
// Unassigned ignored shards must be based on the provided set of ignoredShards | ||
assert ignoredShards.contains(discardAllocationStatus(shard)) | ||
|| ignoredShards.stream().filter(ShardRouting::primary).anyMatch(primary -> primary.shardId().equals(shard.shardId())) | ||
: "ignored shard " | ||
+ shard | ||
+ " unexpectedly has THROTTLE status and no counterpart in the provided ignoredShards set " | ||
+ ignoredShards; | ||
// Simulation could not progress due to missing information in any of the deciders. | ||
// Currently, this could happen if `HasFrozenCacheAllocationDecider` is still fetching the data. | ||
// Progress would be made after the followup reroute call. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment about HasFrozenCacheAllocationDecider
had me concerned for some time. Adding this assertion to make it clear that throttled status can only come from DesiredBalanceInput#ignoredShards
, i.e. they are not produced by BalancedShardAllocator
.
For allocation simulation, allocation decisions should not be THROTTLE for moveShards and balance. Allocate unassigned shards should not see THROTTLE unless there is prior shard assignment. This PR makes this statement explicit by fixing test allocation decider and HasFrozenAllocationDecider as well as adding relevant assertions.
Relates: ES-12955