WIP!!! - Fix app start trace outliers from network delays (#10733) #15409
base: main
@visumickey @eBlender Draft pull request while I test.
Code Review
This pull request aims to fix outliers in app start traces caused by network delays or other interruptions by introducing a 'reasonable' timeout (30 seconds) after which the app start trace is cancelled. The changes include replacing a static flag with an instance property for better state management and modifying the trace completion logic.
My review focuses on improving code clarity and simplifying the asynchronous execution. I've suggested removing an uncertain-looking comment marker and simplifying a nested dispatch_async call that adds unnecessary complexity. The core logic for cancelling long-running app start traces appears sound and should address the reported issue.
// Use dispatch_async with a higher priority queue to reduce interference from network
// operations. This ensures trace completion isn't delayed by main queue congestion from network
// calls.
__weak typeof(self) weakSelf = self;
dispatch_async(dispatch_get_global_queue(QOS_CLASS_USER_INITIATED, 0), ^{
  dispatch_async(dispatch_get_main_queue(), ^{
    __strong typeof(weakSelf) strongSelf = weakSelf;
    if (!strongSelf || !strongSelf.appStartTrace) {
      return;
    }

    NSTimeInterval startTimeSinceEpoch = [strongSelf.appStartTrace startTimeSinceEpoch];
    NSTimeInterval currentTimeSinceEpoch = [[NSDate date] timeIntervalSince1970];
    NSTimeInterval elapsed = currentTimeSinceEpoch - startTimeSinceEpoch;

    // The below check accounts for multiple scenarios:
    // 1. App started in background and comes to foreground later
    // 2. App launched but immediately backgrounded
    // 3. Network delays during startup inflating metrics
    BOOL shouldCompleteTrace = (elapsed < gAppStartMaxValidDuration) &&
                               [strongSelf isAppStartEnabled] &&
                               ![strongSelf isApplicationPreWarmed];

    // Additional safety: cancel if elapsed time is unreasonably long for app start
    if (shouldCompleteTrace && elapsed < gAppStartReasonableValidDuration) {
      [strongSelf.appStartTrace stop];
    } else {
      [strongSelf.appStartTrace cancel];
      if (elapsed >= gAppStartReasonableValidDuration) {
        // Log for debugging network related delays
        NSLog(
            @"Firebase Performance: App start trace cancelled due to excessive duration: %.2fs",
            elapsed);
      }
    }
  });
});
The nested dispatch_async calls seem overly complex for deferring execution. The outer dispatch to a global user-initiated queue, only to immediately dispatch back to the main queue, doesn't prevent delays if the main queue is congested. The block will still have to wait for its turn on the main thread.
A single dispatch_async to the main queue is sufficient to defer the execution to the next run loop cycle. This simplifies the code and removes the misleading comment about avoiding main queue congestion.
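A toy model makes the queueing argument concrete. The sketch below is plain C (it also compiles as Objective-C) and treats the main queue as a simple serial FIFO. It is not libdispatch, and it ignores run loop sources and QoS. It assumes some network callbacks are already queued on main and that more may arrive while the trace block is in flight. All names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Toy serial FIFO standing in for the main queue (hypothetical model, not libdispatch). */
#define MAX_BLOCKS 16

typedef struct {
    const char *blocks[MAX_BLOCKS];
    int count;
} SerialQueue;

static void queue_append(SerialQueue *q, const char *name) {
    assert(q->count < MAX_BLOCKS);
    q->blocks[q->count++] = name;
}

/* 0-based execution order of `name`, or -1 if it was never enqueued. */
static int execution_index(const SerialQueue *q, const char *name) {
    for (int i = 0; i < q->count; i++) {
        if (strcmp(q->blocks[i], name) == 0) {
            return i;
        }
    }
    return -1;
}

/*
 * `pending` network callbacks are already on the main queue, and
 * `arrivals_meanwhile` more arrive shortly after the dispatch call.
 * Direct dispatch: the trace block is appended right away, so later arrivals
 * queue up behind it. Global-queue hop: the block reaches the main queue only
 * after the hop, so those arrivals end up ahead of it.
 */
static int trace_block_index(int pending, int arrivals_meanwhile, bool use_hop) {
    SerialQueue main_queue = {0};
    for (int i = 0; i < pending; i++) {
        queue_append(&main_queue, "network_callback");
    }
    if (!use_hop) {
        queue_append(&main_queue, "complete_app_start_trace");
    }
    for (int i = 0; i < arrivals_meanwhile; i++) {
        queue_append(&main_queue, "network_callback");
    }
    if (use_hop) {
        queue_append(&main_queue, "complete_app_start_trace");
    }
    return execution_index(&main_queue, "complete_app_start_trace");
}
```

With two callbacks already queued and one arriving meanwhile, trace_block_index(2, 1, false) is 2 and trace_block_index(2, 1, true) is 3. In this model the hop can only add work ahead of the block and never moves it forward.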
The use of weakSelf/strongSelf is a good improvement for safety over the original implementation.
// Defer stopping the trace to the next run loop cycle. This is to ensure that the app is
// fully interactive.
__weak typeof(self) weakSelf = self;
dispatch_async(dispatch_get_main_queue(), ^{
  __strong typeof(weakSelf) strongSelf = weakSelf;
  if (!strongSelf || !strongSelf.appStartTrace) {
    return;
  }
  NSTimeInterval startTimeSinceEpoch = [strongSelf.appStartTrace startTimeSinceEpoch];
  NSTimeInterval currentTimeSinceEpoch = [[NSDate date] timeIntervalSince1970];
  NSTimeInterval elapsed = currentTimeSinceEpoch - startTimeSinceEpoch;
  // The below check accounts for multiple scenarios:
  // 1. App started in background and comes to foreground later
  // 2. App launched but immediately backgrounded
  // 3. Network delays during startup inflating metrics
  BOOL shouldCompleteTrace = (elapsed < gAppStartMaxValidDuration) &&
                             [strongSelf isAppStartEnabled] &&
                             ![strongSelf isApplicationPreWarmed];
  // Additional safety: cancel if elapsed time is unreasonably long for app start
  if (shouldCompleteTrace && elapsed < gAppStartReasonableValidDuration) {
    [strongSelf.appStartTrace stop];
  } else {
    [strongSelf.appStartTrace cancel];
    if (elapsed >= gAppStartReasonableValidDuration) {
      // Log for debugging network related delays
      NSLog(
          @"Firebase Performance: App start trace cancelled due to excessive duration: %.2fs",
          elapsed);
    }
  }
});
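The stop/cancel decision inside the block can be pulled out as a pure function. That makes the thresholds easy to unit test without GCD or NSDate. This is a sketch in plain C, which also compiles as Objective-C. The enum and function names are hypothetical. The two constants only mirror gAppStartMaxValidDuration (60 minutes) and gAppStartReasonableValidDuration (30 seconds) from the diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the constants in the diff, in seconds: 60 minutes and 30 seconds. */
static const double kAppStartMaxValidDuration = 60.0 * 60.0;
static const double kAppStartReasonableValidDuration = 30.0;

/* Hypothetical outcome type; the real code calls -stop or -cancel directly. */
typedef enum {
    AppStartTraceStop,
    AppStartTraceCancel,
    AppStartTraceCancelExcessiveDuration /* cancelled and logged via NSLog */
} AppStartTraceAction;

static AppStartTraceAction app_start_trace_action(double elapsed_seconds,
                                                  bool app_start_enabled,
                                                  bool prewarmed) {
    /* The 30-second cutoff only makes sense if it is below the 60-minute cap. */
    assert(kAppStartReasonableValidDuration <= kAppStartMaxValidDuration);

    bool should_complete = elapsed_seconds < kAppStartMaxValidDuration &&
                           app_start_enabled && !prewarmed;
    if (should_complete && elapsed_seconds < kAppStartReasonableValidDuration) {
        return AppStartTraceStop;
    }
    /* Every other path cancels; durations at or past 30 s are also logged. */
    if (elapsed_seconds >= kAppStartReasonableValidDuration) {
        return AppStartTraceCancelExcessiveDuration;
    }
    return AppStartTraceCancel;
}
```

For example, a launch interrupted by a device lock and resumed about 1000 s later takes the cancel-and-log path and is not recorded. Because 30 s is below the 60-minute cap, the gAppStartMaxValidDuration comparison can no longer change the outcome once the 30-second check is in place. Whether to keep both checks is a design choice for this PR.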
static NSDate *doubleDispatchTime = nil;
static NSDate *applicationDidFinishLaunchTime = nil;
static NSTimeInterval gAppStartMaxValidDuration = 60 * 60; // 60 minutes.
static NSTimeInterval gAppStartReasonableValidDuration = 30.0; // 30 seconds, reasonable app start time???
The ??? in the comment suggests uncertainty. Please remove it for a more polished and professional code appearance.
static NSTimeInterval gAppStartReasonableValidDuration = 30.0; // 30 seconds, reasonable app start time???
static NSTimeInterval gAppStartReasonableValidDuration = 30.0; // 30 seconds, reasonable app start time.
Will fix. I may need input from @visumickey on whether this is an ideal or acceptable time (hence the ???).
Discussion
Fixes the _app_start outliers mentioned in #10733 (still a draft and a work in progress; testing is ongoing and looks good so far).
User Statements:
Very long durations are reported for the _app_start metric in the 90th to 95th percentiles of the data in the Firebase console (up to 1000+ seconds).
The issue seems to come from background tasks that start the activity, which then does not end until the first run loop completes successfully.
Some reports mention long _app_start values when the app launch is interrupted (by locking the device or receiving a phone call).
What this fixes (my possible reproduction ideas for why this is happening):
Case 1 - Spotty network right at cold launch
Case 2 - Targeted failures for early endpoints (e.g. the app depends on many endpoints to launch and one of them is down)
Case 3 - Background launch before foreground
Case 4 - Saturate GCD workers to limit the available thread pool
Testing
API Changes