@benoitmaillard
Contributor

@benoitmaillard benoitmaillard commented Oct 9, 2025

This PR prevents the C2 compiler from hitting memory limits during compilation when using -XX:+StressLoopPeeling and -XX:+VerifyLoopOptimizations in certain edge cases. The fix addresses an issue where the ciEnv arena grows uncontrollably due to the high number of verification passes, a complex IR graph, and repeated field accesses leading to unnecessary memory allocations.

Analysis

This issue was initially detected with the fuzzer. The original test from the fuzzer was reduced
and added to this PR as a regression test.

The test contains a switch inside a loop, and stressing the loop peeling results in
a fairly complex graph. The split-if optimization is applied aggressively, and we
run a verification pass each time progress is made.

We end up with a relatively high number of verification passes, with each pass being
fairly expensive because of the size of the graph.
Each verification pass requires building a new IdealLoopTree. This is quite slow
(which is unfortunately hard to mitigate), and also causes inefficient memory usage
on the ciEnv arena.

The inefficient memory usage is caused by the ciInstanceKlass::get_field_by_offset method.
Every call involves:

  • One allocation on the ciEnv arena to store the returned ciField
  • The constructor of ciField results in a call to ciObjectFactory::get_symbol, which:
    • Allocates a new ciSymbol on the ciEnv arena at every call (when not found in vmSymbols)
    • Pushes the new symbol to the _symbols array

The ciField objects returned by ciInstanceKlass::get_field_by_offset are only used once, to
check if the BasicType of a static field is a reference type.

In ciObjectFactory, the _symbols array ends up containing a large number of duplicates for certain symbols
(up to several million), which suggests that ciObjectFactory::get_symbol should not be called
repeatedly as it is done here.
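
The growth pattern described above can be illustrated with a small standalone model. This is only a sketch, not HotSpot code: ToySymbol, ToyField, ToyEnv and simulate_verification are hypothetical names, and the model assumes (as described above) that no lookup for an existing symbol happens and that nothing is released until the environment goes away.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for ciSymbol and ciField; not HotSpot code.
struct ToySymbol { std::string name; };
struct ToyField  { const ToySymbol* signature; int offset; };

// Models an arena that is only released when the whole compilation ends:
// everything created here stays alive until the ToyEnv is destroyed.
class ToyEnv {
 public:
  // Mirrors the pattern described above: no lookup for an existing symbol,
  // so every call allocates a new one and records it in _symbols.
  const ToySymbol* get_symbol(const std::string& name) {
    _symbols.push_back(std::make_unique<ToySymbol>(ToySymbol{name}));
    return _symbols.back().get();
  }

  // Each query allocates a fresh field object plus a fresh symbol.
  const ToyField* get_field_by_offset(int offset) {
    const ToySymbol* sig = get_symbol("Ljava/lang/Object;");
    _fields.push_back(std::make_unique<ToyField>(ToyField{sig, offset}));
    return _fields.back().get();
  }

  std::size_t symbol_count() const { return _symbols.size(); }
  std::size_t field_count() const { return _fields.size(); }

 private:
  std::vector<std::unique_ptr<ToySymbol>> _symbols;
  std::vector<std::unique_ptr<ToyField>> _fields;
};

// Runs `passes` verification passes that each query the same static field
// `queries_per_pass` times, and returns how many symbols were accumulated.
std::size_t simulate_verification(ToyEnv& env, int passes, int queries_per_pass) {
  assert(passes >= 0 && queries_per_pass >= 0);
  for (int p = 0; p < passes; p++) {
    for (int q = 0; q < queries_per_pass; q++) {
      env.get_field_by_offset(16);  // result is used once, then dropped
    }
  }
  return env.symbol_count();
}
```

In this model the number of stored symbols equals the total number of queries, even though every query asks for the same field, which is the kind of duplication observed in the _symbols array.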

The stack trace of how we get to the ciInstanceKlass::get_field_by_offset is shown below:

ciInstanceKlass::get_field_by_offset ciInstanceKlass.cpp:412
TypeOopPtr::TypeOopPtr type.cpp:3484
TypeInstPtr::TypeInstPtr type.cpp:3953
TypeInstPtr::make type.cpp:3990
TypeInstPtr::add_offset type.cpp:4509
AddPNode::bottom_type addnode.cpp:696
MemNode::adr_type memnode.cpp:73
PhaseIdealLoop::get_late_ctrl_with_anti_dep loopnode.cpp:6477
PhaseIdealLoop::get_late_ctrl loopnode.cpp:6439
PhaseIdealLoop::build_loop_late_post_work loopnode.cpp:6827
PhaseIdealLoop::build_loop_late_post loopnode.cpp:6715
PhaseIdealLoop::build_loop_late loopnode.cpp:6660
PhaseIdealLoop::build_and_optimize loopnode.cpp:5093
PhaseIdealLoop::PhaseIdealLoop loopnode.hpp:1209
PhaseIdealLoop::verify loopnode.cpp:5336
...

Because the ciEnv arena is not freed between verification passes, it quickly fills up and hits
the memory limit after about 30s of execution in this case.

Proposed fix

As explained in the previous section, the only point of the ciInstanceKlass::get_field_by_offset
call is to obtain the BasicType of the field. By inspecting carefully what this method does,
we notice that the field descriptor fd already contains the type information we need.
We do not actually need all the information embedded in the ciField object.

ciField* ciInstanceKlass::get_field_by_offset(int field_offset, bool is_static) {
  if (!is_static) {
    for (int i = 0, len = nof_nonstatic_fields(); i < len; i++) {
      ciField* field = _nonstatic_fields->at(i);
      int  field_off = field->offset_in_bytes();
      if (field_off == field_offset)
        return field;
    }
    return nullptr;
  }
  VM_ENTRY_MARK;
  InstanceKlass* k = get_instanceKlass();
  fieldDescriptor fd;
  if (!k->find_field_from_offset(field_offset, is_static, &fd)) {
    return nullptr;
  }
  ciField* field = new (CURRENT_THREAD_ENV->arena()) ciField(&fd);
  return field;
}

Hence we can simply add a more specialized method, ciInstanceKlass::get_field_type_by_offset,
that directly returns the BasicType without creating the ciField. This
avoids the three memory allocations mentioned before.

After this change, the memory usage of the ciEnv arena stays constant across verification
passes.
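
The idea behind the fix can be sketched in a standalone form. This is not the actual patch: ToyKlass, ToyFieldDescriptor, ToyField and ToyBasicType are hypothetical stand-ins, and returning an Illegal value for an unknown offset is an assumption made only for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <utility>
#include <vector>

// Hypothetical miniature of the relevant types; not HotSpot code.
enum class ToyBasicType { Int, Object, Illegal };
struct ToyFieldDescriptor { int offset; ToyBasicType type; };
struct ToyField { int offset; ToyBasicType type; };

class ToyKlass {
 public:
  explicit ToyKlass(std::vector<ToyFieldDescriptor> static_fields)
      : _static_fields(std::move(static_fields)) {}

  // "Before": materializes a long-lived field object for every query.
  const ToyField* get_field_by_offset(int offset) {
    const ToyFieldDescriptor* fd = find(offset);
    if (fd == nullptr) return nullptr;
    _arena.push_back(ToyField{fd->offset, fd->type});
    return &_arena.back();  // deque keeps references stable on push_back
  }

  // "After": reads the type straight from the descriptor, allocating nothing.
  ToyBasicType get_field_type_by_offset(int offset) const {
    const ToyFieldDescriptor* fd = find(offset);
    return fd != nullptr ? fd->type : ToyBasicType::Illegal;
  }

  std::size_t arena_objects() const { return _arena.size(); }

 private:
  // Linear search over the static field descriptors.
  const ToyFieldDescriptor* find(int offset) const {
    assert(offset >= 0);
    for (const ToyFieldDescriptor& fd : _static_fields) {
      if (fd.offset == offset) return &fd;
    }
    return nullptr;
  }

  std::vector<ToyFieldDescriptor> _static_fields;
  std::deque<ToyField> _arena;  // stands in for the ciEnv arena
};
```

Calling get_field_type_by_offset any number of times leaves arena_objects() unchanged in this model, whereas each successful get_field_by_offset call adds one object.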

Testing

  • Added test obtained from the fuzzer (and reduced with c-reduce)
  • GitHub Actions
  • tier1-3, plus some internal testing

Thank you for reviewing!


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8366990: C2: Compilation hits the memory limit when verifying loop opts in Split-If code (Bug - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/27731/head:pull/27731
$ git checkout pull/27731

Update a local copy of the PR:
$ git checkout pull/27731
$ git pull https://git.openjdk.org/jdk.git pull/27731/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 27731

View PR using the GUI difftool:
$ git pr show -t 27731

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/27731.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper

bridgekeeper bot commented Oct 9, 2025

👋 Welcome back bmaillard! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk bot commented Oct 9, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk openjdk bot changed the title from "8366990" to "8366990: C2: Compilation stuck when verifying loop opts in Split-If code" Oct 9, 2025
@openjdk

openjdk bot commented Oct 9, 2025

@benoitmaillard The following label will be automatically applied to this pull request:

  • hotspot-compiler

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@benoitmaillard benoitmaillard changed the title from "8366990: C2: Compilation stuck when verifying loop opts in Split-If code" to "8366990: C2: Compilation hits the memory limit when verifying loop opts in Split-If code" Oct 10, 2025
@benoitmaillard benoitmaillard marked this pull request as ready for review October 10, 2025 12:38
@openjdk openjdk bot added the rfr label (Pull request is ready for review) Oct 10, 2025
Member

@chhagedorn chhagedorn left a comment

Nice summary and solution! I have a few comments but otherwise, the fix looks good to me.

I guess it's a discussion for another time if we also want to improve the verification time somehow. But that should not block this PR.

* -XX:CompileCommand=compileonly,compiler.loopopts.TestVerifyLoopOptimizationsHitsMemLimit::test
* -XX:-TieredCompilation -Xcomp -XX:CompileCommand=dontinline,*::*
* -XX:+StressLoopPeeling -XX:PerMethodTrapLimit=0 -XX:+VerifyLoopOptimizations
* -XX:StressSeed=1870557292
Member

@chhagedorn chhagedorn Oct 10, 2025

I suggest removing the stress seed since the issue might no longer trigger with it in later builds. Usually, we add one run with a fixed stress seed and one without, but since this test only needs to do some verification work, I would suggest not adding two runs but only one without a fixed seed.

Another question: How close are we to hitting the default memory limit with this test? With your fix, it probably no longer consumes much memory. I therefore suggest adding MemLimit as an additional flag with a much smaller value to be sure that your fix works as expected (you might need to check how low we can set the limit without running into problems in higher tiers).

Contributor Author

I was able to reduce the test further with creduce, using a memory limit of 100M (approximately 10 times lower than the default) and a shorter timeout. Compilation of the new test method with a fast debug build now takes an average of 1.22 s over 100 runs according to -XX:+CITime.
With the decreased compilation time, I think it is now reasonable to have two runs (one with the stress seed, one without). Let me know if you think otherwise!

@benoitmaillard
Contributor Author

benoitmaillard commented Oct 15, 2025

I have made the following (significant) changes that are ready for review:

  • Replaced the test method with a further reduced version that now takes a little more than one second compared to ~40s previously
  • Added a second run without a fixed stress seed (as the compilation is now fast enough)
  • Added a memory limit of 100M

Labels

  • hotspot-compiler
  • rfr (Pull request is ready for review)
