
Conversation

Member

@aepfli aepfli commented Sep 7, 2025

This is a follow-up to #743.

Highlights

  • Eventing and Grace Period Implementation: Implemented eventing for the in-process flagd resolver, introducing ProviderStale and ProviderError states with a configurable grace period to manage transient connection issues during flag synchronization.

  • New gRPC-based Synchronization Mechanism: Introduced a new gRPC-based synchronization mechanism (grpc_sync.go) for the in-process resolver, enabling streaming flag updates and robust connection state monitoring.

  • Configurable Grace Period: Added a GracePeriod configuration option, defaulting to 5 seconds and configurable via the FLAGD_GRACE_PERIOD environment variable.

  • Updated End-to-End Tests: Updated end-to-end tests to validate the new grace period and eventing functionalities for the in-process provider.

But also some history behind all of this.

  • New gRPC Connection: Originally, a custom backoff mechanism with loops etc. was implemented, which is overkill; we do not think we can write a better reconnection algorithm than gRPC's built-in one. See the provider reconnection topic in flagd#1472.
  • Events: The flagd providers support events and event transitions; this was missing in the Go flagd provider. Now at least the in-process resolver has these events.
  • GracePeriod: Per the provider specification, a provider should have a grace period during which it reports a Stale state rather than an Error state, to allow recovery from transient issues without triggering errors. This is also added in this implementation.
  • E2E: Everything is covered by our extensive flagd testbed e2e coverage, so reconnects etc. are verified.

Important to note: I focused my efforts on the in-process resolver and might improve the RPC resolver later. This is already a big change that also refactors the code (I wanted to increase readability), so there is no room for more. But in general it is getting closer and closer to the Java implementation.
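
For context on how this looks from the application side, here is a minimal sketch of wiring the in-process provider and reacting to the new state events through the OpenFeature Go SDK. Only the FLAGD_GRACE_PERIOD environment variable and the ProviderStale/ProviderError events come from this change; the constructor and option names below are assumptions based on the existing provider API and may differ slightly between versions.

```go
package main

import (
	"log"
	"os"

	flagd "github.com/open-feature/go-sdk-contrib/providers/flagd/pkg"
	"github.com/open-feature/go-sdk/openfeature"
)

func main() {
	// Assumption: the grace period value is interpreted as seconds,
	// matching the 5-second default described above.
	_ = os.Setenv("FLAGD_GRACE_PERIOD", "10")

	// WithInProcessResolver/WithHost/WithPort are the usual flagd provider
	// options; exact names and signatures may differ between versions.
	provider := flagd.NewProvider(
		flagd.WithInProcessResolver(),
		flagd.WithHost("localhost"),
		flagd.WithPort(8015),
	)
	if err := openfeature.SetProviderAndWait(provider); err != nil {
		log.Fatalf("provider init failed: %v", err)
	}

	// React to the new state transitions: STALE during the grace period,
	// ERROR once it elapses without a successful resync.
	onStale := func(details openfeature.EventDetails) {
		log.Printf("flagd sync stale: %s", details.Message)
	}
	onError := func(details openfeature.EventDetails) {
		log.Printf("flagd sync error: %s", details.Message)
	}
	openfeature.AddHandler(openfeature.ProviderStale, &onStale)
	openfeature.AddHandler(openfeature.ProviderError, &onError)
}
```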

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @aepfli, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the in-process flagd resolver by introducing robust eventing capabilities and a configurable grace period. These changes allow the provider to accurately communicate its operational state (ready, stale, error) and gracefully handle transient network or synchronization issues, leading to improved reliability and responsiveness in flag management.

Highlights

  • Eventing and Grace Period Implementation: Implemented eventing for the in-process flagd resolver, introducing ProviderStale and ProviderError states with a configurable grace period to manage transient connection issues during flag synchronization.
  • New gRPC-based Synchronization Mechanism: Introduced a new gRPC-based synchronization mechanism (grpc_sync.go) for the in-process resolver, enabling streaming flag updates and robust connection state monitoring.
  • Configurable Grace Period: Added a GracePeriod configuration option, defaulting to 5 seconds and configurable via the FLAGD_GRACE_PERIOD environment variable.
  • Updated End-to-End Tests: Updated end-to-end tests to validate the new grace period and eventing functionalities for the in-process provider.

gemini-code-assist[bot]

This comment was marked as outdated.

@aepfli aepfli force-pushed the feat/flagd-inprocess-eventing-and-grace-period branch from 26b2c9e to 54e5e2d Compare September 7, 2025 18:11
@aepfli aepfli force-pushed the feat/improve_inprocess_tests branch from a8cb140 to 4f4f163 Compare September 7, 2025 18:41
@aepfli aepfli force-pushed the feat/flagd-inprocess-eventing-and-grace-period branch from 54e5e2d to f259c3f Compare September 7, 2025 18:42
@aepfli aepfli force-pushed the feat/flagd-inprocess-eventing-and-grace-period branch from f259c3f to b6cd812 Compare September 7, 2025 18:44
@toddbaert toddbaert changed the title feat(flagd): add eventing with graceperiod fir inprocess resolver feat(flagd): add eventing with graceperiod for inprocess resolver Sep 8, 2025
@aepfli aepfli changed the base branch from feat/improve_inprocess_tests to main September 8, 2025 13:31
@aepfli aepfli force-pushed the feat/flagd-inprocess-eventing-and-grace-period branch from 9d2e8e4 to 56aa6f4 Compare September 8, 2025 19:22
Member Author

aepfli commented Sep 9, 2025

@open-feature/flagd-approvers please take a look at this. The internals have changed a little, but from the outside this should not be breaking. We are now conformant with the provider spec for in-process when it comes to events and error handling, which imho is a big improvement, and we now rely on gRPC for the retry handling. Please let me know what you think about this.
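
To illustrate what relying on gRPC for the retry handling means in practice, here is a sketch of the general pattern, not the actual grpc_sync.go from this PR: grpc-go already exposes connectivity-state transitions and reconnects with built-in backoff, so the resolver mainly has to translate those states into Stale/Error around a grace-period window. The emitStale/emitError/emitReady callbacks are placeholders.

```go
// Sketch: translate gRPC connectivity states into provider events with a
// grace period. This is illustrative, not the code added in this PR.
package grpcsync

import (
	"context"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/connectivity"
)

func watchConnection(ctx context.Context, conn *grpc.ClientConn, grace time.Duration,
	emitStale, emitError, emitReady func()) {
	for {
		if conn.GetState() == connectivity.TransientFailure {
			// Connection dropped: report STALE and give gRPC's built-in
			// exponential backoff the grace period to reconnect.
			emitStale()
			graceCtx, cancel := context.WithTimeout(ctx, grace)
			for conn.GetState() != connectivity.Ready &&
				conn.WaitForStateChange(graceCtx, conn.GetState()) {
				// keep waiting while states change within the grace window
			}
			cancel()
			if conn.GetState() == connectivity.Ready {
				emitReady()
			} else {
				// Grace period elapsed without recovery: report ERROR.
				emitError()
			}
		}
		// Block until the next state change; returns false when ctx is cancelled.
		if !conn.WaitForStateChange(ctx, conn.GetState()) {
			return
		}
	}
}
```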

Member Author

aepfli commented Sep 9, 2025

/gemini review

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces eventing with a grace period for the in-process resolver, which is a significant enhancement. The implementation involves a new custom gRPC sync provider and a substantial refactoring of the in-process service logic for better lifecycle management and robustness. The changes are well-structured. I've identified a critical issue related to state management that could affect applications using multiple provider instances, and a medium-severity issue concerning duplicated constants which impacts maintainability. Addressing these points will make the implementation more robust and easier to maintain.

Contributor

tangenti commented Sep 9, 2025

This is a large change. Could you add more details to the PR description to help people understand the problem it is trying to solve?

Member Author

aepfli commented Sep 9, 2025

@tangenti, you are right, I am sorry. I stopped writing PR descriptions because Gemini is just so powerful and in the end writes a much better description:

Highlights

  • Eventing and Grace Period Implementation: Implemented eventing for the in-process flagd resolver, introducing ProviderStale and ProviderError states with a configurable grace period to manage transient connection issues during flag synchronization.

  • New gRPC-based Synchronization Mechanism: Introduced a new gRPC-based synchronization mechanism (grpc_sync.go) for the in-process resolver, enabling streaming flag updates and robust connection state monitoring.

  • Configurable Grace Period: Added a GracePeriod configuration option, defaulting to 5 seconds and configurable via the FLAGD_GRACE_PERIOD environment variable.

  • Updated End-to-End Tests: Updated end-to-end tests to validate the new grace period and eventing functionalities for the in-process provider.

But also some history behind all of this.

  • New gRPC Connection: Originally, a custom backoff mechanism with loops etc. was implemented, which is overkill; we do not think we can write a better reconnection algorithm than gRPC's built-in one. See the provider reconnection topic in flagd#1472.
  • Events: The flagd providers support events and event transitions; this was missing in the Go flagd provider. Now at least the in-process resolver has these events.
  • GracePeriod: Per the provider specification, a provider should have a grace period during which it reports a Stale state rather than an Error state, to allow recovery from transient issues without triggering errors. This is also added in this implementation.
  • E2E: Everything is covered by our extensive flagd testbed e2e coverage, so reconnects etc. are verified.

Important to note: I focused my efforts on the in-process resolver and might improve the RPC resolver later. This is already a big change that also refactors the code (I wanted to increase readability), so there is no room for more.

@tangenti
Contributor

  • Events: The flagd providers support events and event transitions; this was missing in the Go flagd provider. Now at least the in-process resolver has these events.
  • GracePeriod: Per the provider specification, a provider should have a grace period during which it reports a Stale state rather than an Error state, to allow recovery from transient issues without triggering errors. This is also added in this implementation.

Thanks Simon. For events and grace periods, are those documented somewhere? Are they already implemented in the flagd providers of other languages?

CC @andreyturkov

Member Author

aepfli commented Sep 10, 2025

Thanks Simon. For events and grace periods, are those documented somewhere? Are they already implemented in the flagd providers of other languages?

CC @andreyturkov

It is documented here and here. We have Gherkin tests in our e2e suite, and they are executed as long as the @grace filter is not disabled. We have these for Java and Python for sure; I would need to check the others, but it is defined behaviour.
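
For reference, tag filters like @grace are typically applied through godog's Tags option when the testbed's Gherkin features are run; a sketch under that assumption (the harness in this repo may wire it differently, and the feature path below is illustrative):

```go
// Sketch: run the testbed Gherkin features, excluding the grace-period
// scenarios by negating the @grace tag. Paths are illustrative.
package e2e_test

import (
	"testing"

	"github.com/cucumber/godog"
)

func TestFlagdInProcess(t *testing.T) {
	suite := godog.TestSuite{
		ScenarioInitializer: func(sc *godog.ScenarioContext) {
			// step definitions would be registered here
		},
		Options: &godog.Options{
			Format:   "pretty",
			Paths:    []string{"flagd-testbed/gherkin"},
			Tags:     "~@grace", // drop the "~" to run the grace-period scenarios
			TestingT: t,
		},
	}
	if suite.Run() != 0 {
		t.Fatal("non-zero status returned, failed to run feature tests")
	}
}
```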

@tangenti
Contributor

I have attempted this a few times, and it is still quite hard for me to do a thorough review.

The PR seems to mix several different changes together:

  1. gRPC sync for the in-process provider
  2. grace period for sync connection
  3. provider states based on the connection status
  4. a lot of comment updates and small refactors that are unrelated to 1, 2, or 3

Does it make sense to split this into smaller PRs? In general, it is good practice to keep PRs focused on a single concern: easier to review, easier to iterate on comments and feedback, and better granularity for rollback.

@toddbaert What do you think?

Member Author

aepfli commented Sep 12, 2025

I agree about the size, but sadly these changes are inherently correlated.

To get eventing, I needed to either change the gRPC ISync behavior in flagd or create my own gRPC sync, which is what I did.

Instead of relying on the old behavior, I added a new and more robust gRPC sync based on the native implementation.

I agree that the last step, in which I told claude.ai to make the resolver code easier to read and debug, may not have been the best idea, but that is something I can revert and extract. On the other hand, besides a little property renaming, I did not touch any of the existing tests, and they are still passing.

We do have the flagd testbed, which covers a lot of the reconnection and eventing behavior. It is the basis for the Java, Python, etc. tests too and ensures consistent behaviour against a running flagd. I can also demo our tooling and what we did there.

This is also a first step in my testing improvements and a basis for my more sophisticated e2e test scenario for flagd, which uses this implementation for testing provider compliance. see

I will try to revert the restructuring of the code, but eventing and gRPC will most likely be hard to strip out. See https://github.com/open-feature/go-sdk-contrib/pull/744/files#diff-3c5e0c3b1eadd540e2508ddbf5a2459c13865d5d0c5012d72911c814c6677a6dR29

// Edit:
A lot of the additional behaviour in the service (especially around shutdown) is related to the fact that during tests the provider didn't fully shut down, which caused issues.

if s.Provider != nil {
	// Try to cast to common provider interfaces that might have shutdown methods
	// This is defensive - not all providers will have explicit shutdown
	go func() {
Member


I wonder why this was in a func before?

Member Author

@aepfli aepfli Oct 9, 2025


Most likely to bypass the problem I solved here with the improper shutdown of the provider.

Comment on lines +334 to +337
// Handle initialization completion (only happens once ever)
i.initOnce.Do(func() {
close(i.shutdownChannels.initSuccess)
})
Member

@toddbaert toddbaert Oct 9, 2025


I'm having a hard time seeing where this might be re-initialized.

If I'm not missing anything, this means that the provider can never be used again properly after shutdown.

Member Author


Not sure what the takeaway or the actionable item for me is.

Now it cannot be re-initialized anymore after shutdown, because we only initialize it once. Should we be more careful? Should we allow re-initialization? Should we return an error when it has already been initialized?

Member


I know we support re-initialization in other flagd providers, though it's not something we explicitly specify.

My only concern with this PR as it stands is whether it will break existing usage if anyone re-initializes (I'm not sure what the behavior was before, but it seems like it might have changed).

Maybe let's merge this PR, and then decide what to do about this separately.
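
For readers following along, the behaviour under discussion comes down to sync.Once: once Do has fired, it never fires again for that instance, so an Init gated on a struct-level Once becomes a no-op after Shutdown unless the instance (or at least the Once and its channels) is recreated. A minimal sketch of the pattern, not the provider's actual code:

```go
package lifecycle

import "sync"

type service struct {
	initOnce    sync.Once
	initSuccess chan struct{}
}

func newService() *service {
	return &service{initSuccess: make(chan struct{})}
}

// Init signals readiness exactly once for this instance. Calling it again
// after Shutdown is silently a no-op, because the Once has already fired
// and the closed channel is never recreated.
func (s *service) Init() {
	s.initOnce.Do(func() {
		close(s.initSuccess)
	})
}
```

Re-initialization would therefore require recreating the instance, resetting the Once and channel during Shutdown, or returning an explicit error from a repeated Init.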

}

// Sync implements gRPC-based flag synchronization with improved context cancellation and error handling
type Sync struct {
Member

@toddbaert toddbaert Oct 9, 2025


Previously we used the grpc_sync implementation from flagd/core instead of a custom one here.

Was it not possible to enhance that one or add options or features to it to support the usage we need here? Or was that too hard or impractical? This isn't a crazy amount of duplication, but there's still some.

Member Author


Imho, I would prefer to go forward with this one here, even if we might backport it to flagd later. It is hard to get things done in both repos while staying consistent. Now I at least have a version here that is highly tested and works based on those tests. Doing this in synced pull requests while still ensuring both implementations work fine is a lot of effort. We can remove this in the future, when the flagd one supports the same functionality and paradigms, but the effort to do it all at once is massive.
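
As background to the rely-on-gRPC argument from the description: the reconnection algorithm being reused is grpc-go's built-in exponential backoff, which is tunable at dial time. A sketch of that configuration; the values are illustrative, not the provider's defaults:

```go
// Sketch: lean on grpc-go's built-in exponential backoff instead of a
// hand-rolled reconnect loop. Values are illustrative only.
package grpcsync

import (
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/backoff"
	"google.golang.org/grpc/credentials/insecure"
)

func dial(target string) (*grpc.ClientConn, error) {
	return grpc.NewClient(target,
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithConnectParams(grpc.ConnectParams{
			Backoff: backoff.Config{
				BaseDelay:  1 * time.Second,
				Multiplier: 1.6,
				Jitter:     0.2,
				MaxDelay:   2 * time.Minute,
			},
			MinConnectTimeout: 10 * time.Second,
		}),
	)
}
```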

Member

toddbaert commented Oct 9, 2025

@aepfli This looks pretty close to me, but I'd like your thoughts on these two comments before I approve:

