feat(flagd): add eventing with graceperiod for inprocess resolver #744
base: main
Conversation
Summary of Changes
Hello @aepfli, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly enhances the in-process flagd resolver by introducing robust eventing capabilities and a configurable grace period. These changes allow the provider to accurately communicate its operational state (ready, stale, error) and gracefully handle transient network or synchronization issues, leading to improved reliability and responsiveness in flag management.
Highlights
- Eventing and Grace Period Implementation: Implemented eventing for the in-process flagd resolver, introducing ProviderStale and ProviderError states with a configurable grace period to manage transient connection issues during flag synchronization.
- New gRPC-based Synchronization Mechanism: Introduced a new gRPC-based synchronization mechanism (grpc_sync.go) for the in-process resolver, enabling streaming flag updates and robust connection state monitoring.
- Configurable Grace Period: Added a GracePeriod configuration option, defaulting to 5 seconds and configurable via the FLAGD_GRACE_PERIOD environment variable (see the sketch after this list).
- Updated End-to-End Tests: Updated end-to-end tests to validate the new grace period and eventing functionalities for the in-process provider.
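
A minimal sketch of how such a grace period can behave (the type and function names below are illustrative only, not the provider's actual internals): the resolver reports stale as soon as the sync stream drops, arms a timer for the FLAGD_GRACE_PERIOD window (5 seconds by default, per the highlight above), and only escalates to an error event if the stream has not recovered by then.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"sync"
	"time"
)

// Hypothetical sketch only: these names do not come from the provider's code.
type eventedResolver struct {
	mu         sync.Mutex
	graceTimer *time.Timer
	grace      time.Duration
	emit       func(state string) // stand-in for the provider's event emission
}

// gracePeriod reads FLAGD_GRACE_PERIOD (seconds), falling back to the 5s default.
func gracePeriod() time.Duration {
	if v, err := strconv.Atoi(os.Getenv("FLAGD_GRACE_PERIOD")); err == nil && v > 0 {
		return time.Duration(v) * time.Second
	}
	return 5 * time.Second
}

// onStreamLost marks the provider STALE and arms the grace timer.
func (r *eventedResolver) onStreamLost() {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.emit("PROVIDER_STALE")
	r.graceTimer = time.AfterFunc(r.grace, func() { r.emit("PROVIDER_ERROR") })
}

// onStreamRecovered cancels the pending ERROR and reports READY again.
func (r *eventedResolver) onStreamRecovered() {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.graceTimer != nil {
		r.graceTimer.Stop()
	}
	r.emit("PROVIDER_READY")
}

func main() {
	r := &eventedResolver{grace: gracePeriod(), emit: func(s string) { fmt.Println(s) }}
	r.onStreamLost()
	time.Sleep(6 * time.Second) // without recovery, PROVIDER_ERROR fires after the grace period
}
```

If the stream recovers within the window, onStreamRecovered stops the timer and the provider goes straight back to ready without ever surfacing an error to the application.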
@open-feature/flagd-approvers please take a look at this. The internals have changed a little, but from the outside this should not be breaking. We were not conformant with the provider spec for the in-process resolver when it comes to events and error handling, so imho this is a big improvement; we also now rely on gRPC for the retry handling. Please let me know what you think about this.
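
For readers wondering what "rely on gRPC for the retry handling" can look like in practice, here is a rough sketch of my own (assuming a recent grpc-go that provides grpc.NewClient; 8015 is flagd's usual sync port, and the emitted strings are placeholders): the client channel reconnects with gRPC's built-in backoff, and the provider only observes connectivity transitions to decide which events to emit.

```go
package main

import (
	"context"
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/connectivity"
	"google.golang.org/grpc/credentials/insecure"
)

// watchConnectivity translates gRPC channel state changes into provider events.
// gRPC itself performs the reconnect/backoff; we only observe it.
func watchConnectivity(ctx context.Context, conn *grpc.ClientConn, emit func(string)) {
	for {
		state := conn.GetState()
		switch state {
		case connectivity.Ready:
			emit("PROVIDER_READY")
		case connectivity.TransientFailure:
			emit("PROVIDER_STALE") // a grace-period timer would start here
		}
		// Block until the channel leaves the current state (or ctx is done).
		if !conn.WaitForStateChange(ctx, state) {
			return
		}
	}
}

func main() {
	conn, err := grpc.NewClient("localhost:8015", grpc.WithTransportCredentials(insecure.NewCredentials()))
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
	conn.Connect() // leave idle state so the watcher sees transitions
	watchConnectivity(context.Background(), conn, func(s string) { log.Println(s) })
}
```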
/gemini review
Code Review
This pull request introduces eventing with a grace period for the in-process resolver, which is a significant enhancement. The implementation involves a new custom gRPC sync provider and a substantial refactoring of the in-process service logic for better lifecycle management and robustness. The changes are well-structured. I've identified a critical issue related to state management that could affect applications using multiple provider instances, and a medium-severity issue concerning duplicated constants which impacts maintainability. Addressing these points will make the implementation more robust and easier to maintain.
This is a large change. Could you add more details to the PR description so people can understand the problem it's trying to solve?
@tangenti, you are right, I am sorry. I stopped writing PR descriptions because Gemini is just so powerful and, in the end, writes a far better description:
But also some history to all of this.
Important to note: I focused my efforts on the in-process provider, and I might improve it later on for RPC. But this is already a big change, and also a refactoring of the code (I wanted to increase readability), so there is no space for more.
Thanks Simon. For events and grace periods, are those documented somewhere? Are they already implemented in the flagd providers of other languages?
It is documented here and here. We do have gherkin tests for our e2e tests, and they are executed as long as the
I attempted a few times and it's still quite hard for me to do a thorough review. The PR seems to mix several different changes together:
Does it make sense to split it into smaller PRs? In general, it's good practice to keep PRs small and focused on a single concern: easier to review, easier to iterate on comments and feedback, and better granularity for rollback. @toddbaert What do you think?
I agree about the size, but sadly those changes are somewhat correlated in nature. To get eventing, I needed to either change the gRPC ISync behavior in flagd or create my own gRPC sync, which is what I did. Instead of relying on the old behavior, I added a new and more robust gRPC sync based on the native implementation. I agree that the last step, in which I told claude.ai to make the resolver code easier to read and debug, may not have been the best idea - but that is something I can revert and extract. On the other side, besides a little property renaming, I did not touch any of the existing tests, and they are still passing. We do have the flagd testbed, which covers a lot of the reconnection and eventing behavior; it is also the basis for the tests for Java, Python, etc., and ensures consistent behaviour with a running flagd. I can also demo our tooling and what we did there. This is also a first step in my testing improvements, and a basis for my more sophisticated e2e test scenario for flagd, which utilizes this implementation for testing compliance with providers. I will try to revert the restructuring of the code - but eventing and gRPC will most likely be hard to strip. See https://github.com/open-feature/go-sdk-contrib/pull/744/files#diff-3c5e0c3b1eadd540e2508ddbf5a2459c13865d5d0c5012d72911c814c6677a6dR29 // Edit:
```go
if s.Provider != nil {
	// Try to cast to common provider interfaces that might have shutdown methods
	// This is defensive - not all providers will have explicit shutdown
	go func() {
```
I wonder why this was in a func before?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most likely to bypass the problem I solved in here with a proper shutdown of the provider.
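
For context, the "defensive" cast the quoted snippet hints at can be done synchronously with a narrow interface assertion. A hypothetical sketch (the names are mine, not the PR's code):

```go
// Hypothetical sketch (not the PR's code) of a defensive, synchronous shutdown.
package shutdown

// shutdowner is a narrow interface: we only assume a provider *might* expose an
// explicit Shutdown method, hence the type assertion below.
type shutdowner interface {
	Shutdown()
}

// stopProvider shuts the provider down synchronously when it supports it.
// Previously this ran inside `go func() { ... }()`, which made shutdown
// ordering non-deterministic.
func stopProvider(p any) {
	if p == nil {
		return
	}
	if sd, ok := p.(shutdowner); ok {
		sd.Shutdown()
	}
}
```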
```go
// Handle initialization completion (only happens once ever)
i.initOnce.Do(func() {
	close(i.shutdownChannels.initSuccess)
})
```
I'm having a hard time seeing where this might be re-initialized.
If I'm not missing anything, this means that the provider can never be used again properly after shutdown.
I'm not sure what the takeaway is, or what the actionable item is for me.
Right now it cannot be reinitialized after shutdown, because we only initialize it once. Should we be more careful and allow reinitialization? Should we return an error when it has already been initialized?
I know we support re-initialization in other flagd providers, though it's not something we explicitly specify.
My only concern with this PR as it stands is whether it will break existing usage if anyone re-initializes (I'm not sure what the behavior was before, but it seems like it might have changed).
Maybe let's merge this PR, and then decide what to do about this separately.
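
To make the re-initialization concern concrete, here is a minimal illustration (not the provider's actual code, though it borrows the field names from the quoted snippet) of why a sync.Once guard turns every Init call after the first into a silent no-op:

```go
package main

import (
	"fmt"
	"sync"
)

// inProcess mirrors the shape of the quoted code: a one-shot init guard plus a
// channel that is closed exactly once when initialization succeeds.
type inProcess struct {
	initOnce    sync.Once
	initSuccess chan struct{}
}

func (i *inProcess) Init() {
	i.initOnce.Do(func() {
		close(i.initSuccess) // runs exactly once for the lifetime of this value
		fmt.Println("initialized")
	})
}

func main() {
	p := &inProcess{initSuccess: make(chan struct{})}
	p.Init() // prints "initialized"
	p.Init() // no-op: Do never runs the function again, so a shut-down
	//          provider cannot be brought back without a fresh instance
}
```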
```go
}

// Sync implements gRPC-based flag synchronization with improved context cancellation and error handling
type Sync struct {
```
Previously we used the grpc_sync implementation from flagd/core instead of a custom one here.
Was it not possible to enhance that one, or add options or features to it, to support the usage we need here? Or was it too hard or impractical? This isn't a crazy amount of duplication, but there's still some.
Imho, I would prefer to go forward with the one in here, even if we might backport it to flagd later. It is hard to get things done in both repos while staying consistent. Now I at least have a version in here that is highly tested and works based on the tests. Doing this in synced pull requests while still ensuring both implementations work fine is a lot of effort. We can remove this in the future, once the flagd one supports the same functionality and paradigms, but the effort to do it all at once is massive.
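
To make the duplication trade-off concrete, the surface a custom sync has to cover is fairly small. The interface below is my paraphrase of that shape, not flagd/core's exact ISync definition, and the names are illustrative only:

```go
// Package grpcsync: hypothetical paraphrase of a flag-sync contract.
package grpcsync

import "context"

// DataSync carries a flag-configuration payload pushed by the sync stream.
type DataSync struct {
	FlagData string // raw flag configuration JSON
	Source   string // identifier of the sync source
}

// Syncer is roughly the contract a sync implementation has to fulfil for the
// in-process resolver: connect, stream updates into the channel until the
// context is cancelled, and surface errors so the resolver can emit
// STALE/ERROR events while gRPC handles reconnection.
type Syncer interface {
	Init(ctx context.Context) error
	Sync(ctx context.Context, updates chan<- DataSync) error
	IsReady() bool
}
```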
@aepfli This looks pretty close to me, but I'd like your thoughts on these two comments before I approve:
This is a follow-up to #743.
But also some history to all of this.
Important to note: I focused my efforts on the in-process provider, and I might improve it later on for RPC. But this is already a big change, and also a refactoring of the code (I wanted to increase readability), so there is no space for more. But in general it is getting closer and closer to the Java implementation.