Skip to content

fix: prevent double-close inflating zstd pool release counters#330

Merged
EdSchouten merged 3 commits intobuildbarn:mainfrom
aron-muon:aron/pool-closure-fix
Mar 12, 2026
Merged

fix: prevent double-close inflating zstd pool release counters#330
EdSchouten merged 3 commits intobuildbarn:mainfrom
aron-muon:aron/pool-closure-fix

Conversation

@aron-muon
Copy link
Contributor

@aron-muon aron-muon commented Mar 12, 2026

Summary

Hey @EdSchouten, firstly I want to say that overall in our testing, the changes from #327 have been working very well! A tremendous thank you for taking your time to work with me and refine this approach into a very nice technical solution 🥳

image

After integrating with the new metrics available, I've noticed that the observed releases did not match the observed acquisitions. This PR introduces a patch fix for this issue.

  • Remove redundant defer zstdReader.Close() in writeZstd(), the buffer takes ownership of the reader via NewCASBufferFromReader and closes it, so the defer was a second close
  • Add double-close guards to metricsEncoder.Close() and metricsDecoder.Close() as a safety net, matching the existing pattern in pooledEncoder/pooledDecoder

The writeZstd() path double-closes the decoder: once when the buffer consumes and closes the reader, and again via defer. Since metricsDecoder.Close() lacks double-close protection (unlike the underlying pooledDecoder), each compressed ByteStream write increments releases_total an extra time. Over time this causes releases_total to permanently exceed acquisitions_total (observed: 336,865 releases vs 333,001 acquisitions = 3,864 excess, corresponding to the number of compressed ByteStream writes).

Test plan

  • bazel test //pkg/zstd/... //pkg/blobstore/grpcservers/... — all pass
  • bazel test //... — all 30 tests pass
  • Verify releases_total no longer exceeds acquisitions_total after deployment

In writeZstd(), the zstdReader was closed twice: once by the buffer
(which takes ownership via NewCASBufferFromReader) and again by
defer zstdReader.Close(). Since metricsDecoder.Close() had no
double-close guard — unlike the underlying pooledDecoder — each
duplicate close incremented releases_total without a corresponding
acquisition, causing releases to permanently exceed acquisitions.

Remove the redundant defer and add double-close protection to both
metricsEncoder and metricsDecoder as a safety net.
@aron-muon aron-muon marked this pull request as draft March 12, 2026 11:49
@aspect-workflows
Copy link

aspect-workflows bot commented Mar 12, 2026

Test

2 test targets passed

Targets
//pkg/blobstore/grpcservers:grpcservers_test [k8-fastbuild]                          172ms
//pkg/blobstore/sharding/integration:integration_test [k8-fastbuild]                 43ms

Total test execution time was 215ms. 28 tests (93.3%) were fully cached saving 6s.

@aron-muon
Copy link
Contributor Author

Visualizing the issue further:
image

Per review feedback: set Encoder/Decoder to nil on close instead of
using a separate bool field. This prevents calling any methods on the
object after close and matches the pattern in pooledEncoder/pooledDecoder.
@aron-muon aron-muon requested a review from EdSchouten March 12, 2026 13:07
Per review feedback: preserve the gRPC status code from
Pool.NewDecoder() (e.g. codes.Canceled) instead of hardcoding
codes.ResourceExhausted via status.Errorf.
@aron-muon aron-muon requested a review from EdSchouten March 12, 2026 13:48
@EdSchouten EdSchouten merged commit d172b5f into buildbarn:main Mar 12, 2026
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants