fix(profiling): correctly detect on-cpu tasks #15452
base: main
Branch updated from 74a9e70 to cb01154.
Good find. This makes me think we should treat task information as metadata (labels) on frames rather than trying to "materialise" it as frames. Materialising was more of a UI trick to visualise the task link, but I don't think it is ultimately the right data model.
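A minimal sketch of the two data models discussed in this comment. The types are hypothetical (`Frame` with a `task` label, and a `materialise` helper) and are not taken from dd-trace-py or Echion. The sketch assumes frames are listed outermost caller first. In the label model the Task name travels as metadata on each real frame; "materialising" it instead injects a synthetic `<task NAME>` frame into the stack.

```cpp
#include <string>
#include <vector>

// Hypothetical frame type, for illustration only.
// `task` is empty when the frame carries no task metadata.
struct Frame {
    std::string function;
    std::string task;  // label: name of the Task this frame runs in, if any
};

// "Materialised" model: each time the task label changes to a non-empty
// value, insert a synthetic frame naming the Task, then emit the real frame
// with its label dropped. Input is assumed to be outermost caller first.
std::vector<Frame> materialise(const std::vector<Frame>& labelled) {
    std::vector<Frame> out;
    std::string current_task;
    for (const Frame& f : labelled) {
        if (!f.task.empty() && f.task != current_task) {
            out.push_back(Frame{"<task " + f.task + ">", ""});  // synthetic frame
        }
        current_task = f.task;
        out.push_back(Frame{f.function, ""});  // real frame, no label
    }
    return out;
}
```

In this sketch the conversion only goes one way: the labelled form keeps the real stack intact and can still be rendered as the materialised view when a UI wants it.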
Absolutely agree. I have started thinking about this: @r1viollet and I have come up with a medium/long-term vision for visualising … But I think that's a problem for later. If we have correct stacks for Tasks and Coroutines, that will already be a good first step, and then we can make whatever changes we need to get what we want. Making Echion and Stack V2 effectively the same thing will also help if we need to change interfaces in a way that is incompatible with the …
Bootstrap import analysis
Comparison of import times between this PR and base.
Summary
The average import time from this PR is: 251 ± 4 ms.
The average import time from base is: 255 ± 4 ms.
The import time difference between this PR and base is: -3.4 ± 0.2 ms.
Import time breakdown
The following import paths have shrunk:
Performance SLOs
Comparing candidate kowalski/fix-profiling-correctly-detect-on-cpu-tasks (988f625) with baseline main (61061a2).

❌ Test Failures (1 suite)

❌ telemetryaddmetric - 29/30

| Benchmark | Time | Time SLO | Time vs SLO | Time vs baseline | Memory | Memory SLO | Memory vs SLO | Memory vs baseline |
|---|---|---|---|---|---|---|---|---|
| 1-count-metric-1-times | 3.400µs | <20.000µs | 📉 -83.0% | 📈 +15.8% | 34.760MB | <35.500MB | -2.1% | +3.5% |
| 1-count-metrics-100-times | 202.626µs | <220.000µs | -7.9% | +0.8% | 34.839MB | <35.500MB | 🟡 -1.9% | +4.0% |
| 1-distribution-metric-1-times | 3.301µs | <20.000µs | 📉 -83.5% | +0.8% | 34.878MB | <35.500MB | 🟡 -1.8% | +4.0% |
| 1-distribution-metrics-100-times | 219.334µs | <230.000µs | -4.6% | +2.1% | 34.859MB | <35.500MB | 🟡 -1.8% | +3.7% |
| 1-gauge-metric-1-times | 2.183µs | <20.000µs | 📉 -89.1% | -1.4% | 34.741MB | <35.500MB | -2.1% | +3.6% |
| 1-gauge-metrics-100-times | 138.134µs | <150.000µs | -7.9% | -0.1% | 34.898MB | <35.500MB | 🟡 -1.7% | +3.9% |
| 1-rate-metric-1-times | 3.100µs | <20.000µs | 📉 -84.5% | +0.4% | 34.819MB | <35.500MB | 🟡 -1.9% | +3.7% |
| 1-rate-metrics-100-times | 217.573µs | <250.000µs | 📉 -13.0% | +1.0% | 34.780MB | <35.500MB | -2.0% | +3.5% |
| 100-count-metrics-100-times | 20.497ms | <22.000ms | -6.8% | -0.4% | 35.036MB | <35.500MB | 🟡 -1.3% | +4.3% |
| ❌ 100-distribution-metrics-100-times | ❌ 2.309ms | <2.300ms | +0.4% | +1.7% | 35.016MB | <35.500MB | 🟡 -1.4% | +4.3% |
| 100-gauge-metrics-100-times | 1.432ms | <1.550ms | -7.6% | +0.5% | 34.918MB | <35.500MB | 🟡 -1.6% | +4.0% |
| 100-rate-metrics-100-times | 2.237ms | <2.550ms | 📉 -12.3% | +0.2% | 35.036MB | <35.500MB | 🟡 -1.3% | +4.3% |
| flush-1-metric | 4.590µs | <20.000µs | 📉 -77.1% | +3.8% | 35.154MB | <35.500MB | 🟡 -1.0% | +4.7% |
| flush-100-metrics | 174.229µs | <250.000µs | 📉 -30.3% | +0.7% | 35.095MB | <35.500MB | 🟡 -1.1% | +4.3% |
| flush-1000-metrics | 2.188ms | <2.500ms | 📉 -12.5% | -0.7% | 35.940MB | <36.500MB | 🟡 -1.5% | +4.4% |

📈 Performance Regressions (2 suites)

📈 iastaspects - 118/118

| Benchmark | Time | Time SLO | Time vs SLO | Time vs baseline | Memory | Memory SLO | Memory vs SLO | Memory vs baseline |
|---|---|---|---|---|---|---|---|---|
| add_aspect | 0.403µs | <10.000µs | 📉 -96.0% | ~same | 40.033MB | <41.500MB | -3.5% | +4.1% |
| add_inplace_aspect | 0.404µs | <10.000µs | 📉 -96.0% | -1.2% | 40.180MB | <41.500MB | -3.2% | +4.6% |
| add_inplace_noaspect | 0.317µs | <10.000µs | 📉 -96.8% | +1.1% | 40.105MB | <41.500MB | -3.4% | +4.1% |
| add_noaspect | 0.284µs | <10.000µs | 📉 -97.2% | +2.4% | 40.144MB | <41.500MB | -3.3% | +4.5% |
| bytearray_aspect | 1.367µs | <10.000µs | 📉 -86.3% | +3.2% | 40.228MB | <41.500MB | -3.1% | +4.8% |
| bytearray_extend_aspect | 1.510µs | <10.000µs | 📉 -84.9% | +1.4% | 40.113MB | <41.500MB | -3.3% | +4.0% |
| bytearray_extend_noaspect | 0.610µs | <10.000µs | 📉 -93.9% | -0.6% | 40.228MB | <41.500MB | -3.1% | +4.6% |
| bytearray_noaspect | 0.485µs | <10.000µs | 📉 -95.2% | ~same | 40.385MB | <41.500MB | -2.7% | +4.6% |
| bytes_aspect | 1.297µs | <10.000µs | 📉 -87.0% | +1.5% | 40.311MB | <41.500MB | -2.9% | +4.7% |
| bytes_noaspect | 0.496µs | <10.000µs | 📉 -95.0% | +0.8% | 40.049MB | <41.500MB | -3.5% | +4.1% |
| bytesio_aspect | 1.333µs | <10.000µs | 📉 -86.7% | +2.1% | 40.287MB | <41.500MB | -2.9% | +5.1% |
| bytesio_noaspect | 0.503µs | <10.000µs | 📉 -95.0% | +0.8% | 40.092MB | <41.500MB | -3.4% | +4.6% |
| capitalize_aspect | 0.736µs | <10.000µs | 📉 -92.6% | +0.6% | 40.252MB | <41.500MB | -3.0% | +4.5% |
| capitalize_noaspect | 0.435µs | <10.000µs | 📉 -95.6% | +0.3% | 40.205MB | <41.500MB | -3.1% | +4.2% |
| casefold_aspect | 0.734µs | <10.000µs | 📉 -92.7% | -1.0% | 40.146MB | <41.500MB | -3.3% | +4.6% |
| casefold_noaspect | 0.366µs | <10.000µs | 📉 -96.3% | -1.1% | 40.119MB | <41.500MB | -3.3% | +4.4% |
| decode_aspect | 0.724µs | <10.000µs | 📉 -92.8% | -0.8% | 40.206MB | <41.500MB | -3.1% | +4.6% |
| decode_noaspect | 0.416µs | <10.000µs | 📉 -95.8% | -0.6% | 40.147MB | <41.500MB | -3.3% | +4.4% |
| encode_aspect | 0.706µs | <10.000µs | 📉 -92.9% | -0.7% | 40.167MB | <41.500MB | -3.2% | +4.6% |
| encode_noaspect | 0.399µs | <10.000µs | 📉 -96.0% | -0.6% | 40.349MB | <41.500MB | -2.8% | +4.8% |
| format_aspect | 3.396µs | <10.000µs | 📉 -66.0% | -3.2% | 40.167MB | <41.500MB | -3.2% | +4.6% |
| format_map_aspect | 3.527µs | <10.000µs | 📉 -64.7% | -2.9% | 40.167MB | <41.500MB | -3.2% | +4.0% |
| format_map_noaspect | 0.773µs | <10.000µs | 📉 -92.3% | -0.3% | 40.167MB | <41.500MB | -3.2% | +4.1% |
| format_noaspect | 0.596µs | <10.000µs | 📉 -94.0% | +0.3% | 40.129MB | <41.500MB | -3.3% | +4.7% |
| index_aspect | 0.359µs | <10.000µs | 📉 -96.4% | -1.2% | 40.309MB | <41.500MB | -2.9% | +4.7% |
| index_noaspect | 0.278µs | <10.000µs | 📉 -97.2% | +0.1% | 40.149MB | <41.500MB | -3.3% | +4.5% |
| join_aspect | 1.343µs | <10.000µs | 📉 -86.6% | +2.4% | 40.251MB | <41.500MB | -3.0% | +4.6% |
| join_noaspect | 0.487µs | <10.000µs | 📉 -95.1% | -0.5% | 40.202MB | <41.500MB | -3.1% | +4.6% |
| ljust_aspect | 2.575µs | <20.000µs | 📉 -87.1% | +2.6% | 40.169MB | <41.500MB | -3.2% | +4.4% |
| ljust_noaspect | 0.405µs | <10.000µs | 📉 -95.9% | +0.3% | 40.251MB | <41.500MB | -3.0% | +4.7% |
| lower_aspect | 2.307µs | <10.000µs | 📉 -76.9% | +3.0% | 40.129MB | <41.500MB | -3.3% | +4.3% |
| lower_noaspect | 0.368µs | <10.000µs | 📉 -96.3% | +0.2% | 40.247MB | <41.500MB | -3.0% | +4.9% |
| lstrip_aspect | 2.255µs | <20.000µs | 📉 -88.7% | +2.1% | 40.344MB | <41.500MB | -2.8% | +5.3% |
| lstrip_noaspect | 0.382µs | <10.000µs | 📉 -96.2% | +0.7% | 40.168MB | <41.500MB | -3.2% | +4.1% |
| modulo_aspect | 1.048µs | <10.000µs | 📉 -89.5% | +0.9% | 40.147MB | <41.500MB | -3.3% | +4.5% |
| modulo_aspect_for_bytearray_bytearray | 1.551µs | <10.000µs | 📉 -84.5% | +0.6% | 40.158MB | <41.500MB | -3.2% | +4.4% |
| modulo_aspect_for_bytes | 0.982µs | <10.000µs | 📉 -90.2% | +0.6% | 40.173MB | <41.500MB | -3.2% | +4.2% |
| modulo_aspect_for_bytes_bytearray | 1.248µs | <10.000µs | 📉 -87.5% | -0.6% | 40.344MB | <41.500MB | -2.8% | +5.0% |
| modulo_noaspect | 0.625µs | <10.000µs | 📉 -93.7% | -0.2% | 40.112MB | <41.500MB | -3.3% | +3.9% |
| replace_aspect | 4.855µs | <10.000µs | 📉 -51.4% | -0.6% | 40.067MB | <41.500MB | -3.5% | +3.9% |
| replace_noaspect | 0.462µs | <10.000µs | 📉 -95.4% | -0.2% | 40.267MB | <41.500MB | -3.0% | +4.8% |
| repr_aspect | 0.905µs | <10.000µs | 📉 -91.0% | -0.8% | 40.049MB | <41.500MB | -3.5% | +4.2% |
| repr_noaspect | 0.425µs | <10.000µs | 📉 -95.8% | +2.2% | 40.096MB | <41.500MB | -3.4% | +4.2% |
| rstrip_aspect | 1.948µs | <20.000µs | 📉 -90.3% | +2.5% | 40.246MB | <41.500MB | -3.0% | +4.4% |
| rstrip_noaspect | 0.375µs | <10.000µs | 📉 -96.2% | -1.7% | 40.045MB | <41.500MB | -3.5% | +4.4% |
| slice_aspect | 0.496µs | <10.000µs | 📉 -95.0% | ~same | 40.271MB | <41.500MB | -3.0% | +4.6% |
| slice_noaspect | 0.451µs | <10.000µs | 📉 -95.5% | +1.5% | 40.230MB | <41.500MB | -3.1% | +4.7% |
| stringio_aspect | 1.552µs | <10.000µs | 📉 -84.5% | +1.3% | 40.306MB | <41.500MB | -2.9% | +4.8% |
| stringio_noaspect | 0.718µs | <10.000µs | 📉 -92.8% | +0.5% | 40.229MB | <41.500MB | -3.1% | +4.8% |
| strip_aspect | 2.240µs | <20.000µs | 📉 -88.8% | +1.7% | 40.128MB | <41.500MB | -3.3% | +4.6% |
| strip_noaspect | 0.387µs | <10.000µs | 📉 -96.1% | +0.5% | 40.107MB | <41.500MB | -3.4% | +4.2% |
| swapcase_aspect | 2.793µs | <10.000µs | 📉 -72.1% | 📈 +13.8% | 40.147MB | <41.500MB | -3.3% | +3.7% |
| swapcase_noaspect | 0.538µs | <10.000µs | 📉 -94.6% | +0.6% | 40.010MB | <41.500MB | -3.6% | +4.2% |
| title_aspect | 2.441µs | <10.000µs | 📉 -75.6% | +2.7% | 40.207MB | <41.500MB | -3.1% | +4.9% |
| title_noaspect | 0.502µs | <10.000µs | 📉 -95.0% | +0.9% | 40.168MB | <41.500MB | -3.2% | +4.3% |
| translate_aspect | 3.309µs | <10.000µs | 📉 -66.9% | ~same | 40.225MB | <41.500MB | -3.1% | +3.7% |
| translate_noaspect | 1.048µs | <10.000µs | 📉 -89.5% | +0.9% | 40.182MB | <41.500MB | -3.2% | +4.2% |
| upper_aspect | 2.299µs | <10.000µs | 📉 -77.0% | +1.0% | 40.289MB | <41.500MB | -2.9% | +5.0% |
| upper_noaspect | 0.371µs | <10.000µs | 📉 -96.3% | +0.8% | 40.068MB | <41.500MB | -3.5% | +4.0% |

📈 iastaspectsospath - 24/24

| Benchmark | Time | Time SLO | Time vs SLO | Time vs baseline | Memory | Memory SLO | Memory vs SLO | Memory vs baseline |
|---|---|---|---|---|---|---|---|---|
| ospathbasename_aspect | 5.207µs | <10.000µs | 📉 -47.9% | 📈 +26.2% | 40.364MB | <41.000MB | 🟡 -1.6% | +4.9% |
| ospathbasename_noaspect | 1.078µs | <10.000µs | 📉 -89.2% | ~same | 40.226MB | <41.000MB | 🟡 -1.9% | +4.4% |
| ospathjoin_aspect | 6.156µs | <10.000µs | 📉 -38.4% | +0.1% | 40.206MB | <41.000MB | 🟡 -1.9% | +4.4% |
| ospathjoin_noaspect | 2.281µs | <10.000µs | 📉 -77.2% | +0.2% | 40.206MB | <41.000MB | 🟡 -1.9% | +4.3% |
| ospathnormcase_aspect | 3.425µs | <10.000µs | 📉 -65.8% | +1.4% | 40.167MB | <41.000MB | -2.0% | +4.3% |
| ospathnormcase_noaspect | 0.574µs | <10.000µs | 📉 -94.3% | +0.8% | 40.167MB | <41.000MB | -2.0% | +4.6% |
| ospathsplit_aspect | 4.704µs | <10.000µs | 📉 -53.0% | -1.2% | 40.285MB | <41.000MB | 🟡 -1.7% | +5.1% |
| ospathsplit_noaspect | 1.587µs | <10.000µs | 📉 -84.1% | ~same | 40.324MB | <41.000MB | 🟡 -1.6% | +4.6% |
| ospathsplitdrive_aspect | 3.622µs | <10.000µs | 📉 -63.8% | +0.7% | 40.403MB | <41.000MB | 🟡 -1.5% | +5.0% |
| ospathsplitdrive_noaspect | 0.701µs | <10.000µs | 📉 -93.0% | +0.4% | 40.226MB | <41.000MB | 🟡 -1.9% | +4.7% |
| ospathsplitext_aspect | 4.503µs | <10.000µs | 📉 -55.0% | +1.8% | 40.324MB | <41.000MB | 🟡 -1.6% | +4.7% |
| ospathsplitext_noaspect | 1.374µs | <10.000µs | 📉 -86.3% | -0.5% | 40.226MB | <41.000MB | 🟡 -1.9% | +4.4% |

🟡 Near SLO Breach (16 suites)

🟡 coreapiscenario - 10/10 (1 unstable)
Branch updated from cb01154 to 988f625.
Description
https://datadoghq.atlassian.net/browse/PROF-13137
This PR fixes how we check whether a Task is currently on CPU.
Previously, we would look at `task->coro->is_running`. Unfortunately, this is only true when the Task's own coroutine is executing. If the Task's coroutine is awaiting another coroutine that is executing, `task->coro->is_running` is false even though the Task is executing, so we were unable to detect that the Task was on CPU.

To fix this, the PR introduces a new `is_on_cpu` method on `TaskInfo`, which recursively checks whether the Task's coroutine (or anything it awaits) is executing. The result of this method is cached, as the recursive check is somewhat costly and involves pointer chasing.

Additionally, the calling code only invokes this method if we haven't already seen an on-CPU Task. The reason is that only one Task can be executing at any given time: once we have found an on-CPU Task, we can assume all the others are not.
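The check described above can be sketched as follows. This is a simplified model under stated assumptions, not the actual Echion/Stack V2 code: `CoroInfo`, its `awaited` pointer, `coro_is_running()` (standing in for the old `task->coro->is_running` check) and `find_on_cpu_task()` are hypothetical. `TaskInfo` is assumed to be a per-sample snapshot, so caching the result for its lifetime is safe. Only the names `TaskInfo` and `is_on_cpu` come from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical, simplified coroutine record: whether its frame is currently
// executing, and the coroutine it is awaiting (nullptr if none).
struct CoroInfo {
    bool is_running = false;
    const CoroInfo* awaited = nullptr;
};

// Hypothetical, simplified TaskInfo, assumed to be a per-sample snapshot.
class TaskInfo {
public:
    explicit TaskInfo(const CoroInfo* coro) : coro_(coro) {}

    // Old-style check: only the Task's own coroutine. Returns false when that
    // coroutine is suspended awaiting another coroutine that is running.
    bool coro_is_running() const { return coro_ != nullptr && coro_->is_running; }

    // New-style check: true if the Task's coroutine, or anything it
    // (transitively) awaits, is executing. Computed once, then cached.
    bool is_on_cpu() const {
        if (!on_cpu_cache_.has_value()) {
            on_cpu_cache_ = any_running(coro_);
        }
        return *on_cpu_cache_;
    }

private:
    // Recursively follow the await chain until a running coroutine is found.
    static bool any_running(const CoroInfo* coro) {
        if (coro == nullptr) return false;
        return coro->is_running || any_running(coro->awaited);
    }

    const CoroInfo* coro_;
    mutable std::optional<bool> on_cpu_cache_;
};

// Hypothetical caller: returns the index of the on-CPU Task, or -1 if none.
// Once an on-CPU Task has been found, is_on_cpu() is not called for the
// remaining Tasks, since at most one Task is assumed to be executing.
int find_on_cpu_task(const std::vector<TaskInfo>& tasks) {
    int on_cpu = -1;
    for (std::size_t i = 0; i < tasks.size(); ++i) {
        if (on_cpu == -1 && tasks[i].is_on_cpu()) {
            on_cpu = static_cast<int>(i);
        }
        // Other per-Task work would go here; every Task except
        // tasks[on_cpu] is treated as off-CPU.
    }
    return on_cpu;
}
```

For a Task whose coroutine is suspended in an `await` on an inner coroutine that is executing, `coro_is_running()` returns `false` while `is_on_cpu()` returns `true`, which is the case the old check missed.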
Note that this PR is the first in a series aiming to fix the way we sample asyncio stacks. More will come; this is just a stepping stone.
P403n1x87/echion#196