@rwestrel rwestrel commented Dec 1, 2025

For this failure memory stats are:

Total Usage: 1095525816 
    --- Arena Usage by Arena Type and compilation phase, at arena usage peak of 1095525816 ---
        Phase                         Total        ra      node      comp      type    states   reglive  regsplit   regmask superword     cienv        ha     other
        none                        5976032    331560   5402064    197512     33712     10200         0         0       984         0         0         0         0
        parse                       2716464     65456   1145480    196408   1112752         0         0         0         0         0    196368         0         0
        optimizer                     98184         0     32728         0     65456         0         0         0         0         0         0         0         0
        connectionGraph               32728         0         0     32728         0         0         0         0         0         0         0         0         0
        iterGVN                       32728         0     32728         0         0         0         0         0         0         0         0         0         0
        idealLoop                 918189632         0  38687056 872824784    392776         0         0         0         0         0   6285016         0         0
        idealLoopVerify             2228144         0         0   2228144         0         0         0         0         0         0         0         0         0
        macroExpand                   32728         0     32728         0         0         0         0         0         0         0         0         0         0
        graphReshape                  32728         0     32728         0         0         0         0         0         0         0         0         0         0
        matcher                    20135944   3369848   9033208   7536400     65456    131032         0         0         0         0         0         0         0
        postselect_cleanup           294872    294872         0         0         0         0         0         0         0         0         0         0         0
        scheduler                    752944    196488    556456         0         0         0         0         0         0         0         0         0         0
        regalloc                     388736    388736         0         0         0         0         0         0         0         0         0         0         0
        ctorChaitin                  160032    160032         0         0         0         0         0         0         0         0         0         0         0
        regAllocSplit               4189544     32728   4156816         0         0         0         0         0         0         0         0         0         0
        postAllocCopyRemoval          65456         0     65456         0         0         0         0         0         0         0         0         0         0
        fixupSpills                   32728         0     32728         0         0         0         0         0         0         0         0         0         0
        chaitinCoalesce1            1505808    262144   1243664         0         0         0         0         0         0         0         0         0         0
        output                    138300376 138300376         0         0         0         0         0         0         0         0         0         0         0
        shorten branches             360008    196368    163640         0         0         0         0         0         0         0         0         0         0

The line that stands out is:

        idealLoop                 918189632         0  38687056 872824784    392776         0         0         0         0         0   6285016         0         0

A lot of memory (almost 1 GB) gets allocated in the comp arena
during idealLoop. So even though the compilation goes over the limit
in Compile::Code_Gen(), the root cause is what happens earlier,
during idealLoop.

_loop_or_ctrl and _body are both allocated in the comp
arena. Accumulated over several loop opts passes, they should not use
that much memory, but the test is run with +VerifyLoopOptimizations:
calls to PhaseIdealLoop::verify() cause new PhaseIdealLoop objects
to be allocated and more memory to be used in the comp arena. The
fix I propose is to allocate _loop_or_ctrl and _body in a
dedicated ResourceArea so memory can be reclaimed when a pass of
loop opts is over.
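
To illustrate the idea (this is not the patch itself), here is a minimal sketch assuming a toy arena that only counts bytes. `ToyArena`, `run_pass` and the two `peak_with_*` helpers are hypothetical names made up for this example, not HotSpot APIs. It compares the peak footprint when every pass allocates its side tables in a compilation-lifetime arena against a dedicated arena that is emptied after each pass.

```cpp
#include <cassert>
#include <cstddef>

// Toy stand-in for an arena: it only tracks how many bytes were handed out.
// (Hypothetical type; the real Arena/ResourceArea classes are much richer.)
class ToyArena {
 public:
  void allocate(std::size_t bytes) {
    _used += bytes;
    if (_used > _peak) _peak = _used;
  }
  // Give everything back at once, which is what makes arenas cheap to reclaim.
  void reset() { _used = 0; }
  std::size_t used() const { return _used; }
  std::size_t peak() const { return _peak; }

 private:
  std::size_t _used = 0;
  std::size_t _peak = 0;
};

// One simulated loop opts (or verification) pass. It needs two per-node side
// tables, standing in for _loop_or_ctrl and _body.
void run_pass(ToyArena& arena, std::size_t node_count) {
  arena.allocate(node_count * sizeof(void*));  // stands in for _loop_or_ctrl
  arena.allocate(node_count * sizeof(void*));  // stands in for _body
}

// Before: every pass allocates in the arena that lives for the whole
// compilation, so the tables of finished passes are never reclaimed.
std::size_t peak_with_shared_arena(int passes, std::size_t node_count) {
  ToyArena comp_arena;
  for (int i = 0; i < passes; i++) {
    run_pass(comp_arena, node_count);
  }
  return comp_arena.peak();
}

// After: passes allocate in a dedicated arena that is emptied once the pass
// is over, so the peak is bounded by the needs of a single pass.
std::size_t peak_with_dedicated_arena(int passes, std::size_t node_count) {
  ToyArena loop_arena;
  for (int i = 0; i < passes; i++) {
    run_pass(loop_arena, node_count);
    loop_arena.reset();
    assert(loop_arena.used() == 0);
  }
  return loop_arena.peak();
}
```

In this model the shared arena's peak grows linearly with the number of passes, which is the pattern that verification amplifies, while the dedicated arena's peak stays at the cost of one pass.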

With that change:

Total Usage: 227682272 
    --- Arena Usage by Arena Type and compilation phase, at arena usage peak of 227682272 ---
        idealLoop                  52278416         0  38687056   6913568         0    392776         0         0         0         0         0   6285016         0         0

That is ~50 MB total for idealLoop instead of almost 1 GB. Total usage
peaks at about 228 MB, down from about 1.1 GB.
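
As a quick sanity check on these figures, here is a small sketch that uses only byte counts copied from the two reports above. The constant and function names are made up for this example. It checks at compile time that the non-zero columns of the original idealLoop row add up to its total, then prints the reduction factors.

```cpp
#include <cstdint>
#include <cstdio>

// Byte counts copied from the two reports above.
constexpr std::uint64_t kPeakBefore      = 1095525816;
constexpr std::uint64_t kPeakAfter       = 227682272;
constexpr std::uint64_t kIdealLoopBefore = 918189632;
constexpr std::uint64_t kIdealLoopAfter  = 52278416;

// Non-zero columns of the idealLoop row before the change.
constexpr std::uint64_t kNodeBefore  = 38687056;
constexpr std::uint64_t kCompBefore  = 872824784;
constexpr std::uint64_t kTypeBefore  = 392776;
constexpr std::uint64_t kCienvBefore = 6285016;

// The per-arena columns should add up to the row total.
static_assert(kNodeBefore + kCompBefore + kTypeBefore + kCienvBefore ==
                  kIdealLoopBefore,
              "idealLoop columns do not add up to the row total");

constexpr double ratio(std::uint64_t a, std::uint64_t b) {
  return static_cast<double>(a) / static_cast<double>(b);
}

void print_summary() {
  std::printf("comp share of idealLoop before: %.1f%%\n",
              100.0 * ratio(kCompBefore, kIdealLoopBefore));
  std::printf("idealLoop reduction factor:     %.1fx\n",
              ratio(kIdealLoopBefore, kIdealLoopAfter));
  std::printf("peak usage reduction factor:    %.1fx\n",
              ratio(kPeakBefore, kPeakAfter));
}
```

The comp arena accounts for roughly 95% of the original idealLoop total, which is why moving those allocations out of it has such a large effect.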

/cc hotspot-compiler


Progress

  • Change must be properly reviewed (1 review required, with at least 1 Reviewer)
  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue

Issue

  • JDK-8370519: C2: Hit MemLimit when running with +VerifyLoopOptimizations (Bug - P4) (⚠️ The fixVersion in this issue is [27] but the fixVersion in .jcheck/conf is 26, a new backport will be created when this pr is integrated.)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/28581/head:pull/28581
$ git checkout pull/28581

Update a local copy of the PR:
$ git checkout pull/28581
$ git pull https://git.openjdk.org/jdk.git pull/28581/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 28581

View PR using the GUI difftool:
$ git pr show -t 28581

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/28581.diff

Using Webrev

Link to Webrev Comment

bridgekeeper bot commented Dec 1, 2025

👋 Welcome back roland! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk bot commented Dec 1, 2025

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

openjdk bot commented Dec 1, 2025

@rwestrel
The hotspot-compiler label was successfully added.

openjdk bot commented Dec 1, 2025

@rwestrel The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

openjdk bot added the rfr (Pull request is ready for review) label Dec 1, 2025

mlbridge bot commented Dec 1, 2025

Webrevs

@mhaessig mhaessig left a comment

Thank you for fixing this, @rwestrel. Your fix looks good to me. I merely have two nitpicky suggestions.
I will kick off a run of testing and report back with the results.

Comment on lines 25 to 27
* @test
* @bug 8370519
* @summary C2: Hit MemLimit when running with +VerifyLoopOptimizations
Contributor

Unsure, but would this test qualify for @key stress?

Contributor Author

I'm not sure either what qualifies.

Contributor

It is a marker to filter resource intensive tests.

# The list of keywords supported in this test suite
# stress: stress/slow test

Contributor Author

I added it in the new commit.


// Compilation environment.
Arena* comp_arena() { return &_comp_arena; }
ResourceArea* idealloop_arena() { return &_idealloop_arena; }
Member

Should we make it more idiomatic C++ by having the ResourceArea allocated and deallocated together with the PhaseIdealLoop instead of attaching it to the Compile object?

Contributor Author

Right, that makes sense. Done in new commits.
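
For readers following along, the ownership change discussed here can be sketched as follows. This is only an illustration under stated assumptions: `ScopedArena`, `LoopPhase`, `g_live_bytes` and the two helper functions are hypothetical names, and the toy arena merely counts bytes. The point is that when the arena is a member of the phase object, its memory is released automatically when the phase object goes out of scope, with no explicit reset on the enclosing compilation object.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical global counter so the sketch can observe live arena memory.
static std::size_t g_live_bytes = 0;

// Toy arena that releases everything it handed out in its destructor.
class ScopedArena {
 public:
  ScopedArena() = default;
  ScopedArena(const ScopedArena&) = delete;
  ScopedArena& operator=(const ScopedArena&) = delete;
  ~ScopedArena() { g_live_bytes -= _used; }

  void allocate(std::size_t bytes) {
    _used += bytes;
    g_live_bytes += bytes;
  }

 private:
  std::size_t _used = 0;
};

// Stand-in for a loop opts phase object. The arena is a data member, so it is
// created and destroyed together with the phase.
class LoopPhase {
 public:
  explicit LoopPhase(std::size_t node_count) {
    _arena.allocate(node_count * sizeof(void*));  // stands in for _loop_or_ctrl
    _arena.allocate(node_count * sizeof(void*));  // stands in for _body
  }

 private:
  ScopedArena _arena;
};

// Bytes that are live while a single phase object exists.
std::size_t live_bytes_during_pass(std::size_t node_count) {
  LoopPhase phase(node_count);
  return g_live_bytes;  // evaluated before 'phase' is destroyed
}

// Bytes still live after running several passes, each in its own scope.
std::size_t live_bytes_after_passes(int passes, std::size_t node_count) {
  for (int i = 0; i < passes; i++) {
    LoopPhase phase(node_count);
    assert(g_live_bytes == 2 * node_count * sizeof(void*));
  }  // each phase's arena is released here
  return g_live_bytes;
}
```

Compared with hanging the arena off a long-lived object, this ties the reclamation point to scope exit, so a verification pass that constructs a temporary phase object cannot leave its tables behind.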

@benoitmaillard benoitmaillard left a comment

Thanks for fixing this @rwestrel, I agree with the solution. I noticed that this could be a problem while working on JDK-8366990, but there was no reproducer at the time.

_clinit_barrier_on_entry(false),
_stress_seed(0),
_comp_arena(mtCompiler, Arena::Tag::tag_comp),
_idealloop_arena(mtCompiler, Arena::Tag::tag_idealloop),
Contributor

To keep the naming consistent with other mentions of IdealLoop in variable/field names (such as _phase_verify_ideal_loop), I would name this _ideal_loop_arena. This will make it easier to find in a code editor. Feel free to ignore if you disagree.

* @test
* @bug 8370519
* @summary C2: Hit MemLimit when running with +VerifyLoopOptimizations
* @run main/othervm -XX:CompileCommand=compileonly,*TestVerifyLoopOptimizationsHighMemUsage*::* -XX:-TieredCompilation -Xbatch
Contributor

Out of curiosity, have you tried reducing the test with creduce? I fixed a similar issue in JDK-8366990, and initially reviewers were concerned about the long compilation time. I was able to get decent results with creduce by using -XX:CompileCommand=memlimit. Not sure if it's worth doing here though.

Contributor Author

I don't have creduce set up. I tried minimizing the test case by hand, but it was fairly time-consuming. It currently runs in 30s on a fairly fast machine.

mhaessig commented Dec 2, 2025

Fwiw, testing passed up to tier3 on linux-x64, linux-aarch64, macosx-aarch64, macosx-x64, windows-x64.
