-
Notifications
You must be signed in to change notification settings - Fork 14.8k
KAFKA-19928: Added retry and backoff mechanism in NetworkPartitionMetadataClient #21001
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
apoorvmittal10
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes, some doubts.
| /** | ||
| * Manages retry attempts and exponential backoff for requests. | ||
| */ | ||
| public class ExponentialBackoffManager { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really require a new class and can't use some existing mechanism?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PersisterStateManager already had a subclass called BackOffManager, and I requred a similar thing. Instead of having 2 subclasses at different places, I think having a utility class is better. It could be used in future elsewhere as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There already exists https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/common/utils/ExponentialBackoff.java, can't that be used?
|
@chirag-wadhwa5 Checkstyle failures to fix please. |
| /** | ||
| * Manages retry attempts and exponential backoff for requests. | ||
| */ | ||
| public class ExponentialBackoffManager { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is a public class now, please add a few unit tests.
| shouldRetry = true; | ||
| } else if (clientResponse.wasTimedOut()) { | ||
| log.error("Response for ListOffsets for TopicPartitions: {} timed out - {}.", partitionFutures.keySet(), clientResponse); | ||
| log.warn("Response for ListOffsets for TopicPartitions: {} timed out - {}.", partitionFutures.keySet(), clientResponse); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe remove logging from retriable requests - log if attempts exhausted only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review. Maybe I'll change it to debug.
|
@chirag-wadhwa5 There are thread leaks from the new reaper |
|
@chirag-wadhwa5 Please can you merge latest changes from trunk. |
apoorvmittal10
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving given we will improve it in near future.
…adataClient (#21001) Currently, if a ListOffsets request fails in NetworkPartitionMetadataClient for any reason, the corresponding future is completed then and there, without any retries. But the NetworkClient and InterbrokerSendThread are loaded lazily in the NetworkPartitionMetadataClient on the arrival of the first request. But when the first request comes, it is immediately enqueued in the NetworkClient, before the connection could be established, thereby always failing the first request. As a solution to that, this PR introduces a retry mechanism with an upper limit on the retry attempts, as well as exponential backoff between succesive retries. Reviewers: Apoorv Mittal <[email protected]>, Andrew Schofield <[email protected]>, Sushant Mahajan <[email protected]>
|
Cherry-picked to 4.2. |
…adataClient (apache#21001) Currently, if a ListOffsets request fails in NetworkPartitionMetadataClient for any reason, the corresponding future is completed then and there, without any retries. But the NetworkClient and InterbrokerSendThread are loaded lazily in the NetworkPartitionMetadataClient on the arrival of the first request. But when the first request comes, it is immediately enqueued in the NetworkClient, before the connection could be established, thereby always failing the first request. As a solution to that, this PR introduces a retry mechanism with an upper limit on the retry attempts, as well as exponential backoff between succesive retries. Reviewers: Apoorv Mittal <[email protected]>, Andrew Schofield <[email protected]>, Sushant Mahajan <[email protected]>
Currently, if a ListOffsets request fails in
NetworkPartitionMetadataClient for any reason, the corresponding future
is completed then and there, without any retries. But the NetworkClient
and InterbrokerSendThread are loaded lazily in the
NetworkPartitionMetadataClient on the arrival of the first request. But
when the first request comes, it is immediately enqueued in the
NetworkClient, before the connection could be established, thereby
always failing the first request. As a solution to that, this PR
introduces a retry mechanism with an upper limit on the retry attempts,
as well as exponential backoff between succesive retries.
Reviewers: Apoorv Mittal [email protected], Andrew Schofield
[email protected], Sushant Mahajan [email protected]