Conversation

@dt dt commented Oct 7, 2025

This adds a new goroutine profile debug mode, debug=3.

This mode emits in the same binary proto output format as debug=0, with the only difference being that it does not aggregate matching stack/label combinations into a single count, and instead emits a sample per goroutine with additional synthesized labels to communicate some of the details of each goroutine, specifically:

  • go::goroutine: the goroutine's ID
  • go::goroutine_created_by: ID of the goroutine's creator (if any)
  • go::goroutine_state: current state of the goroutine (e.g. runnable)
  • go::goroutine_wait_minutes: approximate minutes goroutine has spent waiting (if applicable)

These are synthesized as labels, side-by-side with any user-set labels, hence the 'go::' prefix to namespace the synthetic labels. We aim to minimize collisions with user-set labels: while the format does allow multi-valued labels, meaning a collision would not immediately overwrite a user-set label, the upstream proto warns that not all tools handle multi-valued labels well, so the namespace prefix aims to avoid collisions where possible. The form 'go::' is chosen because the upstream proto reserves 'pprof::', which seems to establish this format as a convention.
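
As a purely illustrative sketch (not part of this change), a client could fetch the new profile from a process serving net/http/pprof and inspect the synthesized labels with the github.com/google/pprof/profile package. The address and port are placeholders, and since it is not assumed here whether a given go:: key is emitted as a string label or a numeric label, both maps are printed:

```go
// Hypothetical consumer of a debug=3 goroutine profile. Assumes the target
// process imports net/http/pprof and runs on a runtime that supports the new
// mode; the localhost:6060 address is a placeholder.
package main

import (
	"fmt"
	"log"
	"net/http"

	"github.com/google/pprof/profile"
)

func main() {
	resp, err := http.Get("http://localhost:6060/debug/pprof/goroutine?debug=3")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	p, err := profile.Parse(resp.Body)
	if err != nil {
		log.Fatal(err)
	}

	// With debug=3 each sample corresponds to a single goroutine rather than
	// an aggregated stack/label combination, so the go:: labels describe that
	// specific goroutine and sit alongside any user-set labels.
	for i, s := range p.Sample {
		fmt.Printf("goroutine sample %d: string labels=%v numeric labels=%v\n",
			i, s.Label, s.NumLabel)
	}
}
```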

Previously the debug=2 mode was the only way to get this kind of per-goroutine information that is sometimes vital to understanding the state of a process. However, debug=2 has two major drawbacks:

  1. its collection incurs a potentially lengthy and disruptive
    stop-the-world pause because it does not (and perhaps cannot) use
    the concurrent collection mechanism used by debug=0/1, and
  2. it does not (currently) include user-set labels in the same profile.

This new mode uses the same concurrent collection mechanism as 0/1, with its minimal STW penalty, but also includes the per-goroutine details that make debug=2 so useful. Changing debug=2, which already includes this per-goroutine detail, in place to use the concurrent collection mechanism could avoid introducing a new mode, but it is not clear that this is possible without breaking changes to its output: debug=2 includes argument values in its output, which would require collecting them in a manner that is both concurrent and consistent. Thus the use of a new mode instead.

The difference in service latency observed by running goroutines during profile collection is demonstrated by BenchmarkGoroutineProfileLatencyImpact:

                       │       debug=2  │                   debug=3             │
                       │ max_latency_ns │ max_latency_ns  vs base               │
goroutines=100x3-14        508.4k ± 17%    166.7k ±  45%  -67.21% (p=0.002 n=6)
goroutines=100x10-14       727.4k ±  9%    176.3k ±  34%  -75.76% (p=0.002 n=6)
goroutines=100x50-14      1979.1k ±  6%    201.7k ±  18%  -89.81% (p=0.002 n=6)
goroutines=1000x3-14      6454.9k ±  3%    235.6k ±  36%  -96.35% (p=0.002 n=6)
goroutines=1000x10-14    10620.8k ± 47%    259.8k ± 165%  -97.55% (p=0.002 n=6)
goroutines=1000x50-14    35170.8k ±  1%    228.7k ± 108%  -99.35% (p=0.002 n=6)
goroutines=10000x3-14     63.836M ± 49%    1.961M ± 470%  -96.93% (p=0.002 n=6)
goroutines=10000x10-14    58.962M ± 82%    4.073M ± 134%  -93.09% (p=0.002 n=6)
goroutines=10000x50-14   259.723M ± 33%    4.730M ±  91%  -98.18% (p=0.002 n=6)
geomean                    10.62M           527.0k        -95.04%
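
For context, a minimal sketch of how such a latency-impact measurement can be structured (this is not the actual BenchmarkGoroutineProfileLatencyImpact; the worker count, workload, and the use of pprof.Lookup("goroutine").WriteTo with debug=3 on a patched runtime are assumptions): busy goroutines time each small unit of work while a profile is collected, and the worst observed iteration latency is reported.

```go
// Illustrative sketch only: measure the worst per-iteration latency that busy
// goroutines observe while a goroutine profile is collected. Assumes a
// runtime with this change applied so that WriteTo accepts debug=3.
package main

import (
	"bytes"
	"fmt"
	"log"
	"runtime/pprof"
	"sync"
	"sync/atomic"
	"time"
)

func main() {
	const workers = 1000
	var maxLatency atomic.Int64
	stop := make(chan struct{})
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for {
				select {
				case <-stop:
					return
				default:
				}
				start := time.Now()
				x := 0
				for j := 0; j < 1000; j++ { // simulate a small unit of work
					x += j
				}
				if x < 0 {
					panic("unreachable") // keep the loop from being optimized away
				}
				d := time.Since(start).Nanoseconds()
				// Record the maximum latency seen by any worker.
				for {
					cur := maxLatency.Load()
					if d <= cur || maxLatency.CompareAndSwap(cur, d) {
						break
					}
				}
			}
		}()
	}

	// Collect the goroutine profile while the workers are running; 3 selects
	// the new non-aggregated proto mode described above.
	var buf bytes.Buffer
	if err := pprof.Lookup("goroutine").WriteTo(&buf, 3); err != nil {
		log.Fatal(err)
	}

	close(stop)
	wg.Wait()
	fmt.Printf("profile bytes: %d, max worker latency: %dns\n", buf.Len(), maxLatency.Load())
}
```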

@dt dt commented Oct 7, 2025

This PR is against master, rather than the CRDB branch, so I mostly just want review here on content and approach, to arrive at a patch that, whether we submit it or not, we think could be eligible for submission as a CL upstream.

If/when we are happy that this is good for current/future Go versions, I'll go back and rebase it against our current Go fork branches. There is some churn in the area to handle (a recent change added a new 'leak' profile upstream) and I'd rather only do that once, after we arrive at what we think is the correct patch going forward/for upstream.

@Copilot Copilot AI left a comment

Pull Request Overview

This PR introduces a new debug=3 mode for goroutine profiles that provides detailed per-goroutine information (ID, creator, state, wait time) in binary proto format while using the efficient concurrent collection mechanism. The key advantage is dramatically reduced latency impact on running goroutines compared to the existing debug=2 mode.

Key changes:

  • Adds debug=3 goroutine profile mode with per-goroutine metadata labels
  • Refactors goroutine profiling to support both aggregated and non-aggregated collection
  • Improves waitsince timing precision by delaying reset until after profile collection

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 3 comments.

Summary per file:

  • src/runtime/traceback.go: Extracts goroutine status/wait time logic into reusable functions for profile consumption
  • src/runtime/proc.go: Adjusts waitsince field reset timing to preserve accurate wait duration during profiling
  • src/runtime/pprof/proto.go: Adds pbLabelNum helper and updates parameter naming for label encoding
  • src/runtime/pprof/pprof_test.go: Adds comprehensive tests and benchmarks for the new debug=3 mode
  • src/runtime/pprof/pprof.go: Implements debug=3 mode logic and refactors profile interfaces to support goroutine metadata
  • src/runtime/mprof.go: Updates goroutine profiling to collect and store per-goroutine metadata
  • src/internal/profilerecord/profilerecord.go: Adds GoroutineRecord type and interface methods for unified profile handling

This adds a new goroutine profile debug mode, debug=3.

This mode emits in the same binary proto output format as debug=0, with
the only difference being that it does not aggregate matching
stack/label combinations into a single count, and instead emits a sample
per goroutine with additional synthesized labels to communicate some of
the details of each goroutine, specifically:
  - go::goroutine: the goroutine's ID
  - go::goroutine_created_by: ID of the goroutine's creator (if any)
  - go::goroutine_state: current state of the goroutine (e.g. runnable)
  - go::goroutine_wait_minutes: approximate minutes goroutine has spent
    waiting (if applicable)

These are synthesized as labels, side-by-side with any user-set labels,
hence the 'go::' prefix to namespace the synthetic labels. We aim to
minimize collisions with user-set labels: while the format does allow
multi-valued labels, meaning a collision would not immediately
overwrite a user-set label, the upstream proto warns that not all
tools handle multi-valued labels well, so the namespace prefix aims to
avoid collisions where possible. The form 'go::' is chosen because the
upstream proto reserves 'pprof::', which seems to establish this
format as a convention.

Previously the debug=2 mode was the only way to get this kind of
per-goroutine information that is sometimes vital to understanding the
state of a process. However, debug=2 has two major drawbacks:
 1) its collection incurs a potentially lengthy and disruptive
 stop-the-world pause because it does not (and perhaps cannot) use the
 concurrent collection mechanism used by debug=0/1, and
 2) it does not (currently) include user-set labels in the same profile.

This new mode uses the same concurrent collection mechanism as 0/1,
with its minimal STW penalty, but also includes the per-goroutine
details that make debug=2 so useful. Changing debug=2, which already
includes this per-goroutine detail, in place to use the concurrent
collection mechanism could avoid introducing a new mode, but it is not
clear that this is possible without breaking changes to its output:
debug=2 includes argument _values_ in its output, which would require
collecting them in a manner that is both concurrent and consistent.
Thus the use of a new mode instead.

The difference in service latency observed by running goroutines during
profile collection is demonstrated by BenchmarkGoroutineProfileLatencyImpact:

                       │        debug=2  │                   debug=3             │
                       │ max_latency_ns  │ max_latency_ns  vs base               │
goroutines=100x3-14         422.2k ± 13%   190.3k ±   38%  -54.93% (p=0.002 n=6)
goroutines=100x10-14        619.7k ± 10%   171.1k ±   43%  -72.38% (p=0.002 n=6)
goroutines=100x50-14       1423.6k ±  7%   174.3k ±   44%  -87.76% (p=0.002 n=6)
goroutines=1000x3-14       2424.8k ±  8%   298.6k ±  106%  -87.68% (p=0.002 n=6)
goroutines=1000x10-14      7378.4k ±  2%   268.2k ±  146%  -96.36% (p=0.002 n=6)
goroutines=1000x50-14     23372.5k ± 10%   330.1k ±  173%  -98.59% (p=0.002 n=6)
goroutines=10000x3-14      42.802M ± 47%   1.991M ±  105%  -95.35% (p=0.002 n=6)
goroutines=10000x10-14    36668.2k ± 95%   743.1k ±   72%  -97.97% (p=0.002 n=6)
goroutines=10000x50-14   120639.1k ±  2%   188.2k ± 2582%  -99.84% (p=0.002 n=6)
geomean                     6.760M         326.2k          -95.18%
@stevendanna stevendanna left a comment

Overall I really like this idea. I gave the code an initial read through, but I'm not familiar enough with the go pprof internals to offer much more than superficial comments yet.

Is there anything you think deserves special attention and/or a second look?

@dt dt commented Oct 7, 2025

Is there anything you think deserves special attention and/or a second look?

If I had to pick two things:
a) I moved waitsince around a little in exitsyscall. A second pair of eyes on the function -- in its entirety -- would be nice, to double-check that my change is sound and sufficient.
b) Am I outputting enough/the right stuff in printCountProfile? I'm not actually emitting creation location, just creator goid, since I don't know what I'd need creation location for, but it is in debug=2. Similarly, I think there are more fields in the goroutine header for debug=2 than just status/waitreason these days (scan, leaked, bubble, etc.), but I don't know if those count as "status" and whether we need them here. I think they're mostly there for traceback to show in a crash and only make it into "goroutine profiles" since debug=2 shares code with traceback.
