-
Notifications
You must be signed in to change notification settings - Fork 6.2k
8371260: Improve scaling of downcalls using MemorySegments allocated with shared arenas. #28575
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…with shared arenas
MemorySegments allocated from shared Arena from
java.lang.foreign.Arena.ofShared() have their lifecycle controlled by
jdk.internal.foreign.SharedSession. This class ensures that the
MemorySegments can't be freed until after a thread has called
Arena.close(). This is implemented using a counter that is atomically
incremented when used, and decremented when not used, on every invocation
of a downcall. While shared Arenas allow any thread to use it and to
close it, this tracking has a cost when multiple threads are contended
on it. This patch changes the implementation to use multiple counters to
reduce contention. sun.nio.ch.IOUtil, java.nio.Buffer and
sun.nio.ch.SimpleAsynchronousFileChannelImpl are modified as they have
threads releasing the scope different from the ones that allocated them,
so a ticket that tracks the counter has to be passed over.
The microbenchmark org.openjdk.bench.java.lang.foreign.
CallOverheadConstant.panama_identity_memory_address_shared_3
was used to generate the following results. The scalability was checked
on a number of platforms with the JMH parameter "-t" specifying the
number of threads. Measurements are in ns/op .
The hardware are the Neoverse-N1, N2, V1 and V2, Intel Xeon 8375c and the
AMD Epyc 9654.
Threads N1 N2 V1 V2 Xeon Epyc
1 30.88 32.15 33.54 32.82 27.46 8.45
2 142.56 134.48 132.01 131.50 116.68 46.53
4 310.18 282.75 287.59 271.82 251.88 86.11
8 702.02 710.29 736.72 670.63 533.46 194.60
16 1,436.17 1,684.80 1,833.69 1,782.78 1,100.15 827.28
24 2,185.55 2,508.86 2,732.22 2,815.26 1,646.09 1,530.28
32 2,942.48 3,432.84 3,643.64 3,782.23 2,236.81 2,278.52
48 4,466.56 5,174.72 5,401.95 5,621.41 4,926.30 3,026.58
After
Threads N1 N2 V1 V2 Xeon Epyc
1 32.41 32.11 34.43 31.32 27.94 9.82
2 32.64 33.72 35.11 31.30 28.02 9.81
4 32.71 36.84 34.67 31.35 28.12 10.49
8 58.22 31.60 36.87 31.72 47.09 16.52
16 70.15 47.76 52.37 47.26 70.91 14.53
24 77.38 78.14 81.67 71.98 87.20 21.70
32 87.54 98.01 84.73 86.79 109.25 22.65
48 121.54 128.14 120.51 104.35 175.08 26.85
|
👋 Welcome back smonteith! A progress list of the required criteria for merging this PR into |
|
❗ This change is not yet ready to be integrated. |
|
@stooart-mon The following labels will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command. |
|
Can you confirm that you aren't planning to try to try to integrate this before the fork for JDK 26 this week? There are several discussion points, and some of the changes are in risky areas, will likely need back time in main line. |
| private final long position; // file position | ||
| private final PendingFuture<Integer,A> result; | ||
| private volatile boolean released; | ||
| private int ticket; // to release buffer scope |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is for substitution, so needs to be declared with buf (further down), and needs to be volatile (can't use plain access as it may be released on different thread).
I am not planning, and wouldn't want to integrate this just before a fork. While I've checked it the best I can, I'd appreciate some scrutiny. |
|
Thanks for working on this problem. At the high level I understand your goal here. It is a bit unfortunate that we have to change the API for acquire/release which cascades in many smaller (but simple) changes throughout the JDK. But let's set that aside. The main (and more serious) issue with this PR is the decoupling of the liveness bit from the ref counting part. This is something that will require some time to work through to ensure correctness is preserved in all cases. More generally, it is generally advisable for such deep changes to perhaps reach out to panama-dev mailing list first, and maybe have a discussion/reach some consensus there, before moving ahead with a PR. |
Yes, I had been grappling with the atomicity. The "acquireCount" and "state" are decoupled in the most recent revision of the, so this closing should either fail and not update state, or pass and complete the shutdown. However, the "state" and ScopedAccesses are complex, so I appreciate the scrutiny in that area.
That was my mistake - I expected that with the foreign ABI being merged, discussions on internals would have reverted to here. |
In general they are -- but I find that one issue with PRs is that they tend to be very code-centric, whereas in some cases it might be useful to discuss design options more abstractly and then converge towards an implementation. And doing design discussions via PR is suboptimal. But, you did right in bringing this up! |
MemorySegments allocated from shared Arena from
java.lang.foreign.Arena.ofShared() have their lifecycle controlled by jdk.internal.foreign.SharedSession. This class ensures that the MemorySegments can't be freed until after a thread has called Arena.close(). This is implemented using a counter that is atomically incremented when used, and decremented when not used, on every invocation of a downcall. While shared Arenas allow any thread to use it and to close it, this tracking has a cost when multiple threads are contended on it. This patch changes the implementation to use multiple counters to reduce contention. sun.nio.ch.IOUtil, java.nio.Buffer and sun.nio.ch.SimpleAsynchronousFileChannelImpl are modified as they have threads releasing the scope different from the ones that allocated them, so a ticket that tracks the counter has to be passed over.
The microbenchmark org.openjdk.bench.java.lang.foreign. CallOverheadConstant.panama_identity_memory_address_shared_3 was used to generate the following results. The scalability was checked on a number of platforms with the JMH parameter "-t" specifying the number of threads. Measurements are in ns/op .
The hardware are the Neoverse-N1, N2, V1 and V2, Intel Xeon 8375c and the AMD Epyc 9654.
After:
Progress
Warning
8371260: Improve scaling of downcalls using MemorySegments allocated with shared arenas.Issue
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/28575/head:pull/28575$ git checkout pull/28575Update a local copy of the PR:
$ git checkout pull/28575$ git pull https://git.openjdk.org/jdk.git pull/28575/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 28575View PR using the GUI difftool:
$ git pr show -t 28575Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/28575.diff
Using Webrev
Link to Webrev Comment