8371260: Improve scaling of downcalls using MemorySegments allocated with shared arenas. #28575

stooart-mon · 2025-12-01T11:59:38Z

MemorySegments allocated from shared Arena from
java.lang.foreign.Arena.ofShared() have their lifecycle controlled by jdk.internal.foreign.SharedSession. This class ensures that the MemorySegments can't be freed until after a thread has called Arena.close(). This is implemented using a counter that is atomically incremented when used, and decremented when not used, on every invocation of a downcall. While shared Arenas allow any thread to use it and to close it, this tracking has a cost when multiple threads are contended on it. This patch changes the implementation to use multiple counters to reduce contention. sun.nio.ch.IOUtil, java.nio.Buffer and sun.nio.ch.SimpleAsynchronousFileChannelImpl are modified as they have threads releasing the scope different from the ones that allocated them, so a ticket that tracks the counter has to be passed over.

The microbenchmark org.openjdk.bench.java.lang.foreign. CallOverheadConstant.panama_identity_memory_address_shared_3 was used to generate the following results. The scalability was checked on a number of platforms with the JMH parameter "-t" specifying the number of threads. Measurements are in ns/op .

The hardware are the Neoverse-N1, N2, V1 and V2, Intel Xeon 8375c and the AMD Epyc 9654.

Threads	N1	N2	V1	V2	Xeon	Epyc
1	30.88	32.15	33.54	32.82	27.46	8.45
2	142.56	134.48	132.01	131.50	116.68	46.53
4	310.18	282.75	287.59	271.82	251.88	86.11
8	702.02	710.29	736.72	670.63	533.46	194.60
16	1,436.17	1,684.80	1,833.69	1,782.78	1,100.15	827.28
24	2,185.55	2,508.86	2,732.22	2,815.26	1,646.09	1,530.28
32	2,942.48	3,432.84	3,643.64	3,782.23	2,236.81	2,278.52
48	4,466.56	5,174.72	5,401.95	5,621.41	4,926.30	3,026.58

After:

Threads	N1	N2	V1	V2	Xeon	Epyc
1	32.41	32.11	34.43	31.32	27.94	9.82
2	32.64	33.72	35.11	31.30	28.02	9.81
4	32.71	36.84	34.67	31.35	28.12	10.49
8	58.22	31.60	36.87	31.72	47.09	16.52
16	70.15	47.76	52.37	47.26	70.91	14.53
24	77.38	78.14	81.67	71.98	87.20	21.70
32	87.54	98.01	84.73	86.79	109.25	22.65
48	121.54	128.14	120.51	104.35	175.08	26.85

Progress

Change must be properly reviewed (1 review required, with at least 1 Reviewer)
Change must not contain extraneous whitespace
Commit message must refer to an issue

Warning

⚠️ Found trailing period in issue title for 8371260: Improve scaling of downcalls using MemorySegments allocated with shared arenas.

Issue

JDK-8371260: Improve scaling of downcalls using MemorySegments allocated with shared arenas. (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/28575/head:pull/28575
$ git checkout pull/28575

Update a local copy of the PR:
$ git checkout pull/28575
$ git pull https://git.openjdk.org/jdk.git pull/28575/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 28575

View PR using the GUI difftool:
$ git pr show -t 28575

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/28575.diff

Using Webrev

Link to Webrev Comment

…with shared arenas MemorySegments allocated from shared Arena from java.lang.foreign.Arena.ofShared() have their lifecycle controlled by jdk.internal.foreign.SharedSession. This class ensures that the MemorySegments can't be freed until after a thread has called Arena.close(). This is implemented using a counter that is atomically incremented when used, and decremented when not used, on every invocation of a downcall. While shared Arenas allow any thread to use it and to close it, this tracking has a cost when multiple threads are contended on it. This patch changes the implementation to use multiple counters to reduce contention. sun.nio.ch.IOUtil, java.nio.Buffer and sun.nio.ch.SimpleAsynchronousFileChannelImpl are modified as they have threads releasing the scope different from the ones that allocated them, so a ticket that tracks the counter has to be passed over. The microbenchmark org.openjdk.bench.java.lang.foreign. CallOverheadConstant.panama_identity_memory_address_shared_3 was used to generate the following results. The scalability was checked on a number of platforms with the JMH parameter "-t" specifying the number of threads. Measurements are in ns/op . The hardware are the Neoverse-N1, N2, V1 and V2, Intel Xeon 8375c and the AMD Epyc 9654. Threads N1 N2 V1 V2 Xeon Epyc 1 30.88 32.15 33.54 32.82 27.46 8.45 2 142.56 134.48 132.01 131.50 116.68 46.53 4 310.18 282.75 287.59 271.82 251.88 86.11 8 702.02 710.29 736.72 670.63 533.46 194.60 16 1,436.17 1,684.80 1,833.69 1,782.78 1,100.15 827.28 24 2,185.55 2,508.86 2,732.22 2,815.26 1,646.09 1,530.28 32 2,942.48 3,432.84 3,643.64 3,782.23 2,236.81 2,278.52 48 4,466.56 5,174.72 5,401.95 5,621.41 4,926.30 3,026.58 After Threads N1 N2 V1 V2 Xeon Epyc 1 32.41 32.11 34.43 31.32 27.94 9.82 2 32.64 33.72 35.11 31.30 28.02 9.81 4 32.71 36.84 34.67 31.35 28.12 10.49 8 58.22 31.60 36.87 31.72 47.09 16.52 16 70.15 47.76 52.37 47.26 70.91 14.53 24 77.38 78.14 81.67 71.98 87.20 21.70 32 87.54 98.01 84.73 86.79 109.25 22.65 48 121.54 128.14 120.51 104.35 175.08 26.85

bridgekeeper · 2025-12-01T12:01:13Z

👋 Welcome back smonteith! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2025-12-01T12:02:00Z

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

openjdk · 2025-12-01T12:02:53Z

@stooart-mon The following labels will be automatically applied to this pull request:

core-libs
net
nio
security

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command.

mlbridge · 2025-12-01T12:07:01Z

Webrevs

00: Full (3f26a5b6)

AlanBateman · 2025-12-01T13:13:13Z

Can you confirm that you aren't planning to try to try to integrate this before the fork for JDK 26 this week? There are several discussion points, and some of the changes are in risky areas, will likely need back time in main line.

AlanBateman · 2025-12-01T13:11:43Z

src/java.base/windows/classes/sun/nio/ch/WindowsAsynchronousFileChannelImpl.java

        private final long position;    // file position
        private final PendingFuture<Integer,A> result;
        private volatile boolean released;
+        private int ticket;             // to release buffer scope


This is for substitution, so needs to be declared with buf (further down), and needs to be volatile (can't use plain access as it may be released on different thread).

stooart-mon · 2025-12-01T13:46:17Z

Can you confirm that you aren't planning to try to try to integrate this before the fork for JDK 26 this week? There are several discussion points, and some of the changes are in risky areas, will likely need back time in main line.

I am not planning, and wouldn't want to integrate this just before a fork. While I've checked it the best I can, I'd appreciate some scrutiny.

mcimadamore · 2025-12-01T18:24:07Z

Thanks for working on this problem.

At the high level I understand your goal here. It is a bit unfortunate that we have to change the API for acquire/release which cascades in many smaller (but simple) changes throughout the JDK. But let's set that aside.

The main (and more serious) issue with this PR is the decoupling of the liveness bit from the ref counting part. This is something that will require some time to work through to ensure correctness is preserved in all cases.

More generally, it is generally advisable for such deep changes to perhaps reach out to panama-dev mailing list first, and maybe have a discussion/reach some consensus there, before moving ahead with a PR.

stooart-mon · 2025-12-01T19:03:50Z

Thanks for working on this problem.

At the high level I understand your goal here. It is a bit unfortunate that we have to change the API for acquire/release which cascades in many smaller (but simple) changes throughout the JDK. But let's set that aside.
Yes, that was regrettable. Originally I retained the "void release0()" that recalculated the "ticket", which resulted in fewer changes. However, it was felt the explicit ticket passing was less likely to introduce errors.

The main (and more serious) issue with this PR is the decoupling of the liveness bit from the ref counting part. This is something that will require some time to work through to ensure correctness is preserved in all cases.

Yes, I had been grappling with the atomicity. The "acquireCount" and "state" are decoupled in the most recent revision of the, so this closing should either fail and not update state, or pass and complete the shutdown. However, the "state" and ScopedAccesses are complex, so I appreciate the scrutiny in that area.

More generally, it is generally advisable for such deep changes to perhaps reach out to panama-dev mailing list first, and maybe have a discussion/reach some consensus there, before moving ahead with a PR.

That was my mistake - I expected that with the foreign ABI being merged, discussions on internals would have reverted to here.

mcimadamore · 2025-12-02T17:51:21Z

That was my mistake - I expected that with the foreign ABI being merged, discussions on internals would have reverted to here.

In general they are -- but I find that one issue with PRs is that they tend to be very code-centric, whereas in some cases it might be useful to discuss design options more abstractly and then converge towards an implementation. And doing design discussions via PR is suboptimal. But, you did right in bringing this up!

openjdk bot changed the title ~~8371260: Improve scaling of downcalls using MemorySegments allocated with shared arenas~~ 8371260: Improve scaling of downcalls using MemorySegments allocated with shared arenas. Dec 1, 2025

openjdk bot added security [email protected] nio [email protected] core-libs [email protected] net [email protected] labels Dec 1, 2025

openjdk bot added the rfr Pull request is ready for review label Dec 1, 2025

AlanBateman reviewed Dec 1, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

8371260: Improve scaling of downcalls using MemorySegments allocated with shared arenas. #28575

8371260: Improve scaling of downcalls using MemorySegments allocated with shared arenas. #28575

Uh oh!

stooart-mon commented Dec 1, 2025 •

edited by openjdk bot

Loading

Uh oh!

bridgekeeper bot commented Dec 1, 2025

Uh oh!

openjdk bot commented Dec 1, 2025

Uh oh!

openjdk bot commented Dec 1, 2025

Uh oh!

mlbridge bot commented Dec 1, 2025

Uh oh!

AlanBateman commented Dec 1, 2025

Uh oh!

AlanBateman Dec 1, 2025

Uh oh!

stooart-mon commented Dec 1, 2025

Uh oh!

mcimadamore commented Dec 1, 2025

Uh oh!

stooart-mon commented Dec 1, 2025

Uh oh!

mcimadamore commented Dec 2, 2025

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

3 participants

8371260: Improve scaling of downcalls using MemorySegments allocated with shared arenas. #28575

Are you sure you want to change the base?

8371260: Improve scaling of downcalls using MemorySegments allocated with shared arenas. #28575

Uh oh!

Conversation

stooart-mon commented Dec 1, 2025 • edited by openjdk bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Progress

Warning

Issue

Reviewing

Uh oh!

bridgekeeper bot commented Dec 1, 2025

Uh oh!

openjdk bot commented Dec 1, 2025

Uh oh!

openjdk bot commented Dec 1, 2025

Uh oh!

mlbridge bot commented Dec 1, 2025

Webrevs

Uh oh!

AlanBateman commented Dec 1, 2025

Uh oh!

AlanBateman Dec 1, 2025

Choose a reason for hiding this comment

Uh oh!

stooart-mon commented Dec 1, 2025

Uh oh!

mcimadamore commented Dec 1, 2025

Uh oh!

stooart-mon commented Dec 1, 2025

Uh oh!

mcimadamore commented Dec 2, 2025

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

3 participants

stooart-mon commented Dec 1, 2025 •

edited by openjdk bot

Loading