feat(metrics): Add fork-safety to SynchronousMeasurementConsumer #4767
+31
−0
Description
Implement post-fork reinitialization of threading locks in the metrics measurement consumer to prevent deadlocks and data duplication in forked child processes.
This change adds fork-safety mechanisms to SynchronousMeasurementConsumer by using os.register_at_fork() to detect process forks.

This addresses the deadlock issue reported in Flask/Gunicorn applications with gevent workers, where threads get stuck trying to acquire locks that were held during fork, causing request timeouts and memory leaks.
Closes #4345
Type of change
How Has This Been Tested?
The fork-safety implementation has been tested with:
Does This PR Require a Contrib Repo Change?
Checklist:
Technical Implementation Details
Root Cause Analysis:
The deadlock occurred because forked child processes inherited the parent's thread state, including locks that may have been held at fork time. In gevent environments, this caused threads to wait indefinitely for locks that would never be released, as described in the stack trace from issue #4345.
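The inherited-lock behavior described above can be reproduced in isolation. The following is a minimal sketch, assuming a POSIX system where os.fork is available and using plain threading rather than gevent; the function name lock_held_in_child is hypothetical and is not part of this PR or the SDK.

```python
import os
import threading


def lock_held_in_child():
    """Return True if a lock acquired before fork() is still locked in the child."""
    lock = threading.Lock()
    lock.acquire()  # simulate a lock that happens to be held at fork time
    pid = os.fork()
    if pid == 0:
        # Child: the lock's memory was copied as-is, so report its state
        # through the exit code and leave without running parent cleanup.
        os._exit(1 if lock.locked() else 0)
    # Parent: release its own copy; this has no effect on the child's copy.
    lock.release()
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) == 1
```

On CPython the child should report the lock as still held, which is why any child thread that later tries to acquire it blocks: the thread that owned the lock exists only in the parent.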
Solution Approach:
- os.register_at_fork(after_in_child=...) to register cleanup callbacks
- _at_fork_reinit() on threading.Lock instances to reset their state
- _needs_storage_reinit flag to defer expensive operations until first use
- _instrument_view_instrument_matches cache to prevent duplicate metrics

Performance Considerations:
- Hooks are registered only where os.register_at_fork exists (Python 3.7+)

This fix ensures that OpenTelemetry metrics work reliably in production environments using pre-fork server models, resolving the critical deadlock issue that was causing request timeouts and memory leaks in Flask/Gunicorn deployments.
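The solution approach above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the class ForkSafeConsumer, its consume method, and the _match_cache attribute are hypothetical stand-ins for SynchronousMeasurementConsumer and its _instrument_view_instrument_matches cache. Only os.register_at_fork, threading.Lock, and the private CPython method Lock._at_fork_reinit (present from Python 3.9) are real APIs.

```python
import os
import threading


class ForkSafeConsumer:
    """Hypothetical stand-in for a measurement consumer; not the SDK class."""

    def __init__(self):
        self._lock = threading.Lock()
        self._needs_storage_reinit = False
        self._match_cache = {}  # stands in for the instrument match cache
        # os.register_at_fork is POSIX-only and was added in Python 3.7.
        if hasattr(os, "register_at_fork"):
            os.register_at_fork(after_in_child=self._after_fork_in_child)

    def _after_fork_in_child(self):
        # Runs in the child right after fork(); keep the work here cheap.
        reinit = getattr(self._lock, "_at_fork_reinit", None)
        if reinit is not None:
            reinit()  # private CPython API: resets the lock to unlocked
        else:
            self._lock = threading.Lock()  # fallback for older interpreters
        # Defer the more expensive cleanup until the first measurement.
        self._needs_storage_reinit = True

    def consume(self, name, value):
        """Record a value and return how many values are stored for name."""
        with self._lock:
            if self._needs_storage_reinit:
                # Drop state copied from the parent so the child does not
                # report the parent's data a second time.
                self._match_cache.clear()
                self._needs_storage_reinit = False
            self._match_cache.setdefault(name, []).append(value)
            return len(self._match_cache[name])
```

One design point worth noting: callbacks passed to os.register_at_fork cannot be unregistered, so registering a bound method keeps the instance alive for the life of the process. Holding only a weak reference to the instance inside the callback is one way to avoid that.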