
Collaborator

@glbrntt commented Nov 27, 2025

Motivation:

Previously, when a name resolver threw an error or returned nil, the
channel would immediately call beginGracefulShutdown() and become
permanently unavailable. This made channels fragile in the face of
transient failures such as DNS timeouts and network interruptions. It also
broke an API contract: the client shouldn't shut down unless the user
explicitly asks it to.

Modifications:

  • Documented the expected behaviour of name resolvers
  • Added a ResolverWithBackoff wrapper, which acts as a state machine for
    resolving with retries and backoff
  • Modified push-based resolution to retry indefinitely when the resolver
    throws or completes with nil, creating a new iterator for each attempt,
    with exponential backoff between attempts
  • Modified pull-based resolution to retry with exponential backoff when
    the resolver throws or returns nil
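The retry-with-backoff behaviour described above can be sketched as a small state machine that only returns actions for the caller to perform. This is a minimal sketch under stated assumptions: the type `ResolveWithBackoffStateMachine`, its methods, and its default parameters (1 s initial backoff, multiplier 1.6, 120 s cap, borrowed from gRPC's connection-backoff guidance) are hypothetical and are not the actual `ResolverWithBackoff` API. Jitter is omitted for clarity.

```swift
/// Hypothetical sketch of a "resolve, back off and retry on failure" state
/// machine. Names, methods and defaults are illustrative assumptions, not
/// grpc-swift's actual ResolverWithBackoff API. Jitter is omitted.
struct ResolveWithBackoffStateMachine {
    enum State: Equatable {
        case idle        // resolution not started yet
        case resolving   // an attempt is in flight or has produced results
        case backingOff  // waiting before the next attempt
    }

    enum Action: Equatable {
        /// Start a new attempt (push-based: create a fresh iterator;
        /// pull-based: call the resolver again).
        case resolve
        /// Sleep for this many seconds, then call `backoffElapsed()`.
        case sleep(seconds: Double)
        /// Hand the resolution result to the channel.
        case deliverResult
    }

    private(set) var state: State = .idle
    private let initialBackoff: Double
    private let maxBackoff: Double
    private let multiplier: Double
    private var currentBackoff: Double

    /// Assumed defaults (1 s initial, multiplier 1.6, 120 s cap), borrowed from
    /// gRPC's connection-backoff guidance rather than from this PR.
    init(initialBackoff: Double = 1, maxBackoff: Double = 120, multiplier: Double = 1.6) {
        self.initialBackoff = initialBackoff
        self.maxBackoff = maxBackoff
        self.multiplier = multiplier
        self.currentBackoff = initialBackoff
    }

    /// Called once when the channel starts resolving.
    mutating func start() -> Action {
        precondition(self.state == .idle, "start() must only be called once")
        self.state = .resolving
        return .resolve
    }

    /// An attempt produced a result: reset the backoff and deliver it.
    mutating func resolutionSucceeded() -> Action {
        precondition(self.state == .resolving)
        self.currentBackoff = self.initialBackoff
        return .deliverResult
    }

    /// An attempt threw or completed with nil. Rather than shutting the channel
    /// down, schedule a retry after a delay that grows geometrically up to the cap.
    mutating func resolutionFailed() -> Action {
        precondition(self.state == .resolving)
        self.state = .backingOff
        let delay = self.currentBackoff
        self.currentBackoff = min(self.currentBackoff * self.multiplier, self.maxBackoff)
        return .sleep(seconds: delay)
    }

    /// The backoff delay has elapsed: try again.
    mutating func backoffElapsed() -> Action {
        precondition(self.state == .backingOff)
        self.state = .resolving
        return .resolve
    }
}
```

Because the machine only returns actions, the caller owns the sleeping and the iterator or resolver calls, which keeps the retry policy easy to test in isolation. For example, with `initialBackoff: 1, maxBackoff: 5, multiplier: 2`, four consecutive failures yield delays of 1, 2, 4 and 5 seconds, and a success resets the next delay to 1 second.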

Result:

- Channels are more resilient to transient name resolution failures.
- Resolves grpc/grpc-swift-2#25
@glbrntt added the 🆕 semver/minor label (Adds new public API.) Nov 27, 2025

try? await Task.sleep(for: duration)
Collaborator

Shouldn't we propagate CancellationErrors here?

Collaborator Author

No, this runs in a non-throwing task group, so just returning is fine.
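The reasoning in this reply can be illustrated with a small sketch. It assumes a hypothetical helper, `retryUntilSuccessOrCancelled(delay:operation:)`, which is not grpc-swift code. Because the helper is non-throwing and runs as a child of a non-throwing `withTaskGroup`, there is nowhere to propagate a `CancellationError`: `try?` discards the error raised by `Task.sleep(for:)`, and the loop exits once `Task.isCancelled` is true.

```swift
/// Hypothetical helper (not grpc-swift API): runs `operation` until it
/// reports success or the current task is cancelled, sleeping `delay`
/// between failed attempts. Returns the number of attempts made.
func retryUntilSuccessOrCancelled(
    delay: Duration,
    operation: () async -> Bool
) async -> Int {
    var attempts = 0
    while !Task.isCancelled {
        attempts += 1
        if await operation() {
            return attempts
        }
        // If the task is cancelled while sleeping, `Task.sleep` throws
        // CancellationError; `try?` discards it and the loop condition
        // then observes the cancellation and exits normally.
        try? await Task.sleep(for: delay)
    }
    return attempts
}

// Inside a non-throwing task group, cancelling the group (for example, on
// shutdown) makes the child return a value instead of throwing.
let attemptsBeforeCancellation = await withTaskGroup(of: Int.self, returning: Int.self) { group in
    group.addTask {
        // An operation that never succeeds, with a long delay between attempts.
        await retryUntilSuccessOrCancelled(delay: .seconds(60)) { false }
    }
    group.cancelAll()
    return await group.next() ?? -1
}
```

With a throwing task group, propagating the `CancellationError` instead could also be reasonable; the choice mainly depends on whether callers need to distinguish cancellation from normal completion.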

@glbrntt requested a review from gjcairo December 4, 2025 13:03
@glbrntt merged commit dcfa8dc into grpc:main Dec 8, 2025
39 checks passed
@glbrntt deleted the throwing-resolver branch December 8, 2025 10:44

Development

Successfully merging this pull request may close these issues.

Cause and mitigation for "Client has been stopped. Can't make any more RPCs." error
