
Conversation

Member

@aepfli aepfli commented Sep 7, 2025

This is a follow-up to #743.

Highlights

  • Eventing and Grace Period Implementation: Implemented eventing for the in-process flagd resolver, introducing ProviderStale and ProviderError states with a configurable grace period to manage transient connection issues during flag synchronization.

  • New gRPC-based Synchronization Mechanism: Introduced a new gRPC-based synchronization mechanism (grpc_sync.go) for the in-process resolver, enabling streaming flag updates and robust connection state monitoring.

  • Configurable Grace Period: Added a GracePeriod configuration option, defaulting to 5 seconds and configurable via the FLAGD_GRACE_PERIOD environment variable.

  • Updated End-to-End Tests: Updated end-to-end tests to validate the new grace period and eventing functionalities for the in-process provider.

But also some history behind all of this.

  • New gRPC Connection: Originally, a custom backoff mechanism with loops etc. was implemented, which is overkill; we do not think we can write a better reconnection algorithm than gRPC's built-in one. See the provider reconnection topic in flagd#1472.
  • Events: The flagd providers support events and event transitions; this was missing in the Go flagd provider. Now at least the in-process resolver has these events.
  • GracePeriod: Per the provider specification, a provider should have a grace period during which it reports a Stale state rather than an Error state, to allow recovery from transient issues without triggering errors. This is also added in this implementation.
  • E2E: Everything is covered by our extensive flagd testbed e2e coverage, so reconnects etc. are verified.

Important to note: I focused my efforts on the in-process resolver and might improve the RPC resolver later. This is already a big change that also refactors the code (I wanted to increase readability), so there is no room for more. But in general it is getting closer and closer to the Java implementation.
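
For context on how this looks from the application side, here is a minimal sketch of wiring the in-process provider and reacting to the new state events through the OpenFeature Go SDK. Only the FLAGD_GRACE_PERIOD environment variable and the ProviderStale/ProviderError events come from this change; the constructor and option names below are assumptions based on the existing provider API and may differ slightly between versions.

```go
package main

import (
	"log"
	"os"

	flagd "github.com/open-feature/go-sdk-contrib/providers/flagd/pkg"
	"github.com/open-feature/go-sdk/openfeature"
)

func main() {
	// Assumption: the grace period value is interpreted as seconds,
	// matching the 5-second default described above.
	_ = os.Setenv("FLAGD_GRACE_PERIOD", "10")

	// WithInProcessResolver/WithHost/WithPort are the usual flagd provider
	// options; exact names and signatures may differ between versions.
	provider := flagd.NewProvider(
		flagd.WithInProcessResolver(),
		flagd.WithHost("localhost"),
		flagd.WithPort(8015),
	)
	if err := openfeature.SetProviderAndWait(provider); err != nil {
		log.Fatalf("provider init failed: %v", err)
	}

	// React to the new state transitions: STALE during the grace period,
	// ERROR once it elapses without a successful resync.
	onStale := func(details openfeature.EventDetails) {
		log.Printf("flagd sync stale: %s", details.Message)
	}
	onError := func(details openfeature.EventDetails) {
		log.Printf("flagd sync error: %s", details.Message)
	}
	openfeature.AddHandler(openfeature.ProviderStale, &onStale)
	openfeature.AddHandler(openfeature.ProviderError, &onError)
}
```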

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @aepfli, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the in-process flagd resolver by introducing robust eventing capabilities and a configurable grace period. These changes allow the provider to accurately communicate its operational state (ready, stale, error) and gracefully handle transient network or synchronization issues, leading to improved reliability and responsiveness in flag management.

Highlights

  • Eventing and Grace Period Implementation: Implemented eventing for the in-process flagd resolver, introducing ProviderStale and ProviderError states with a configurable grace period to manage transient connection issues during flag synchronization.
  • New gRPC-based Synchronization Mechanism: Introduced a new gRPC-based synchronization mechanism (grpc_sync.go) for the in-process resolver, enabling streaming flag updates and robust connection state monitoring.
  • Configurable Grace Period: Added a GracePeriod configuration option, defaulting to 5 seconds and configurable via the FLAGD_GRACE_PERIOD environment variable.
  • Updated End-to-End Tests: Updated end-to-end tests to validate the new grace period and eventing functionalities for the in-process provider.

gemini-code-assist[bot]

This comment was marked as outdated.

@aepfli aepfli force-pushed the feat/flagd-inprocess-eventing-and-grace-period branch from 26b2c9e to 54e5e2d Compare September 7, 2025 18:11
@aepfli aepfli force-pushed the feat/improve_inprocess_tests branch from a8cb140 to 4f4f163 Compare September 7, 2025 18:41
@aepfli aepfli force-pushed the feat/flagd-inprocess-eventing-and-grace-period branch from 54e5e2d to f259c3f Compare September 7, 2025 18:42
@aepfli aepfli force-pushed the feat/flagd-inprocess-eventing-and-grace-period branch from f259c3f to b6cd812 Compare September 7, 2025 18:44
@toddbaert toddbaert changed the title feat(flagd): add eventing with graceperiod fir inprocess resolver feat(flagd): add eventing with graceperiod for inprocess resolver Sep 8, 2025
@aepfli aepfli changed the base branch from feat/improve_inprocess_tests to main September 8, 2025 13:31
@aepfli aepfli force-pushed the feat/flagd-inprocess-eventing-and-grace-period branch from 9d2e8e4 to 56aa6f4 Compare September 8, 2025 19:22
Member Author

aepfli commented Sep 9, 2025

@open-feature/flagd-approvers please take a look at this. The internals have changed a little, but from the outside this should not be breaking. We are now conformant with the provider spec for in-process when it comes to events and error handling, which imho is a big improvement, and we now rely on gRPC for the retry handling. Please let me know what you think about this.
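
To illustrate what relying on gRPC for the retry handling means in practice, here is a sketch of the general pattern, not the actual grpc_sync.go from this PR: grpc-go already exposes connectivity-state transitions and reconnects with built-in backoff, so the resolver mainly has to translate those states into Stale/Error around a grace-period window. The emitStale/emitError/emitReady callbacks are placeholders.

```go
// Sketch: translate gRPC connectivity states into provider events with a
// grace period. This is illustrative, not the code added in this PR.
package grpcsync

import (
	"context"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/connectivity"
)

func watchConnection(ctx context.Context, conn *grpc.ClientConn, grace time.Duration,
	emitStale, emitError, emitReady func()) {
	for {
		if conn.GetState() == connectivity.TransientFailure {
			// Connection dropped: report STALE and give gRPC's built-in
			// exponential backoff the grace period to reconnect.
			emitStale()
			graceCtx, cancel := context.WithTimeout(ctx, grace)
			for conn.GetState() != connectivity.Ready &&
				conn.WaitForStateChange(graceCtx, conn.GetState()) {
				// keep waiting while states change within the grace window
			}
			cancel()
			if conn.GetState() == connectivity.Ready {
				emitReady()
			} else {
				// Grace period elapsed without recovery: report ERROR.
				emitError()
			}
		}
		// Block until the next state change; returns false when ctx is cancelled.
		if !conn.WaitForStateChange(ctx, conn.GetState()) {
			return
		}
	}
}
```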

Member Author

aepfli commented Sep 9, 2025

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces eventing with a grace period for the in-process resolver, which is a significant enhancement. The implementation involves a new custom gRPC sync provider and a substantial refactoring of the in-process service logic for better lifecycle management and robustness. The changes are well-structured. I've identified a critical issue related to state management that could affect applications using multiple provider instances, and a medium-severity issue concerning duplicated constants which impacts maintainability. Addressing these points will make the implementation more robust and easier to maintain.

Contributor

tangenti commented Sep 9, 2025

This is a large change. Could you add more details to the PR description to help people understand the problem it is trying to solve?

Member Author

aepfli commented Sep 9, 2025

@tangenti, you are right, I am sorry. I stopped writing PR descriptions because Gemini is just so powerful and in the end writes a much better description:

Highlights

  • Eventing and Grace Period Implementation: Implemented eventing for the in-process flagd resolver, introducing ProviderStale and ProviderError states with a configurable grace period to manage transient connection issues during flag synchronization.

  • New gRPC-based Synchronization Mechanism: Introduced a new gRPC-based synchronization mechanism (grpc_sync.go) for the in-process resolver, enabling streaming flag updates and robust connection state monitoring.

  • Configurable Grace Period: Added a GracePeriod configuration option, defaulting to 5 seconds and configurable via the FLAGD_GRACE_PERIOD environment variable.

  • Updated End-to-End Tests: Updated end-to-end tests to validate the new grace period and eventing functionalities for the in-process provider.

But also some history behind all of this.

  • New gRPC Connection: Originally, a custom backoff mechanism with loops etc. was implemented, which is overkill; we do not think we can write a better reconnection algorithm than gRPC's built-in one. See the provider reconnection topic in flagd#1472.
  • Events: The flagd providers support events and event transitions; this was missing in the Go flagd provider. Now at least the in-process resolver has these events.
  • GracePeriod: Per the provider specification, a provider should have a grace period during which it reports a Stale state rather than an Error state, to allow recovery from transient issues without triggering errors. This is also added in this implementation.
  • E2E: Everything is covered by our extensive flagd testbed e2e coverage, so reconnects etc. are verified.

Important to note: I focused my efforts on the in-process resolver and might improve the RPC resolver later. This is already a big change that also refactors the code (I wanted to increase readability), so there is no room for more.

@tangenti
Contributor

  • Events: The flagd providers support events and event transitions; this was missing in the Go flagd provider. Now at least the in-process resolver has these events.
  • GracePeriod: Per the provider specification, a provider should have a grace period during which it reports a Stale state rather than an Error state, to allow recovery from transient issues without triggering errors. This is also added in this implementation.

Thanks Simon. For events and grace periods, are those documented somewhere? Are they already implemented in the flagd providers of other languages?

CC @andreyturkov

Member Author

aepfli commented Sep 10, 2025

Thanks Simon. For events and grace periods, are those documented somewhere? Are they already implemented in the flagd providers of other languages?

CC @andreyturkov

It is documented here and here. We have Gherkin tests in our e2e suite, and they are executed as long as the @grace filter is not disabled. We have these for Java and Python for sure; I would need to check the others, but it is defined behaviour.
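
For reference, tag filters like @grace are typically applied through godog's Tags option when the testbed's Gherkin features are run; a sketch under that assumption (the harness in this repo may wire it differently, and the feature path below is illustrative):

```go
// Sketch: run the testbed Gherkin features, excluding the grace-period
// scenarios by negating the @grace tag. Paths are illustrative.
package e2e_test

import (
	"testing"

	"github.com/cucumber/godog"
)

func TestFlagdInProcess(t *testing.T) {
	suite := godog.TestSuite{
		ScenarioInitializer: func(sc *godog.ScenarioContext) {
			// step definitions would be registered here
		},
		Options: &godog.Options{
			Format:   "pretty",
			Paths:    []string{"flagd-testbed/gherkin"},
			Tags:     "~@grace", // drop the "~" to run the grace-period scenarios
			TestingT: t,
		},
	}
	if suite.Run() != 0 {
		t.Fatal("non-zero status returned, failed to run feature tests")
	}
}
```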

@tangenti
Contributor

I have attempted this a few times, and it is still quite hard for me to do a thorough review.

The PR seems to mix several different changes together:

  1. gRPC sync for the in-process provider
  2. grace period for sync connection
  3. provider states based on the connection status
  4. a lot of comment updates and small refactors that are unrelated to 1, 2, or 3

Does it make sense to split this into smaller PRs? In general, it is good practice to keep PRs focused on a single concern: easier to review, easier to iterate on comments and feedback, and better granularity for rollback.

@toddbaert What do you think?

Member Author

aepfli commented Sep 12, 2025

I agree about the size, but sadly these changes are inherently correlated.

To get eventing, I needed to either change the gRPC ISync behavior in flagd or create my own gRPC sync, which is what I did.

Instead of relying on the old behavior, I added a new and more robust gRPC sync based on the native implementation.

I agree that the last step, in which I told claude.ai to make the resolver code easier to read and debug, may not have been the best idea, but that is something I can revert and extract. On the other hand, besides a little property renaming, I did not touch any of the existing tests, and they are still passing.

We do have the flagd testbed, which covers a lot of the reconnection and eventing behavior. It is the basis for the Java, Python, etc. tests too and ensures consistent behaviour against a running flagd. I can also demo our tooling and what we did there.

This is also a first step in my testing improvements and a basis for my more sophisticated e2e test scenario for flagd, which uses this implementation for testing provider compliance. see

I will try to revert the restructuring of the code, but eventing and gRPC will most likely be hard to strip out. See https://github.com/open-feature/go-sdk-contrib/pull/744/files#diff-3c5e0c3b1eadd540e2508ddbf5a2459c13865d5d0c5012d72911c814c6677a6dR29

// Edit:
A lot of the additional behaviour in the service (especially around shutdown) is related to the fact that during tests the provider didn't fully shut down, which caused issues.

if s.Provider != nil {
	// Try to cast to common provider interfaces that might have shutdown methods
	// This is defensive - not all providers will have explicit shutdown
	go func() {
Member


I wonder why this was in a func before?

Member Author

@aepfli aepfli Oct 9, 2025


Most likely to bypass the problem I solved here with the improper shutdown of the provider.

Comment on lines +334 to +337
// Handle initialization completion (only happens once ever)
i.initOnce.Do(func() {
close(i.shutdownChannels.initSuccess)
})
Member

@toddbaert toddbaert Oct 9, 2025


I'm having a hard time seeing where this might be re-initialized.

If I'm not missing anything, this means that the provider can never be used again properly after shutdown.

Member Author


Not sure what the takeaway or the actionable item for me is.

Now it cannot be re-initialized anymore after shutdown, because we only initialize it once. Should we be more careful? Should we allow re-initialization? Should we return an error when it has already been initialized?

Member


I know we support re-initialization in other flagd providers, though it's not something we explicitly specify.

My only concern with this PR as it stands is whether it will break existing usage if anyone re-initializes (I'm not sure what the behavior was before, but it seems like it might have changed).

Maybe let's merge this PR, and then decide what to do about this separately.
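
For readers following along, the behaviour under discussion comes down to sync.Once: once Do has fired, it never fires again for that instance, so an Init gated on a struct-level Once becomes a no-op after Shutdown unless the instance (or at least the Once and its channels) is recreated. A minimal sketch of the pattern, not the provider's actual code:

```go
package lifecycle

import "sync"

type service struct {
	initOnce    sync.Once
	initSuccess chan struct{}
}

func newService() *service {
	return &service{initSuccess: make(chan struct{})}
}

// Init signals readiness exactly once for this instance. Calling it again
// after Shutdown is silently a no-op, because the Once has already fired
// and the closed channel is never recreated.
func (s *service) Init() {
	s.initOnce.Do(func() {
		close(s.initSuccess)
	})
}
```

Re-initialization would therefore require recreating the instance, resetting the Once and channel during Shutdown, or returning an explicit error from a repeated Init.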

}

// Sync implements gRPC-based flag synchronization with improved context cancellation and error handling
type Sync struct {
Member

@toddbaert toddbaert Oct 9, 2025


Previously we used the grpc_sync implementation from flagd/core instead of a custom one here.

Was it not possible to enhance that one or add options or features to it to support the usage we need here? Or was that too hard or impractical? This isn't a crazy amount of duplication, but there's still some.

Member Author


Imho, I would prefer to go forward with this one here, even if we might backport it to flagd later. It is hard to get things done in both repos while staying consistent. Now I at least have a version here that is highly tested and works based on those tests. Doing this in synced pull requests while still ensuring both implementations work fine is a lot of effort. We can remove this in the future, when the flagd one supports the same functionality and paradigms, but the effort to do it all at once is massive.
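
As background to the rely-on-gRPC argument from the description: the reconnection algorithm being reused is grpc-go's built-in exponential backoff, which is tunable at dial time. A sketch of that configuration; the values are illustrative, not the provider's defaults:

```go
// Sketch: lean on grpc-go's built-in exponential backoff instead of a
// hand-rolled reconnect loop. Values are illustrative only.
package grpcsync

import (
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/backoff"
	"google.golang.org/grpc/credentials/insecure"
)

func dial(target string) (*grpc.ClientConn, error) {
	return grpc.NewClient(target,
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithConnectParams(grpc.ConnectParams{
			Backoff: backoff.Config{
				BaseDelay:  1 * time.Second,
				Multiplier: 1.6,
				Jitter:     0.2,
				MaxDelay:   2 * time.Minute,
			},
			MinConnectTimeout: 10 * time.Second,
		}),
	)
}
```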

Member

toddbaert commented Oct 9, 2025

@aepfli This looks pretty close to me, but I'd like your thoughts on these two comments before I approve:

