Member

@sxa sxa commented Sep 4, 2025

Initial implementation of #5684 based on last year's work in #5683

This PR is designed to allow dynamic containers, provisioned with Azure, to be used for AQA jobs. It spins up a system with docker installed (defined by the Azure VM agents plugin config) and then pulls down a docker image (currently UBI10) and runs a job on it. This behaviour is triggered by setting the parameter CLOUD_PROVIDER=azure in the Test jobs.

This also makes a small modification to the weston support in EL10 distributions to allow a dynamic XDG_RUNTIME_DIR (within the Jenkins workspace), so that it does not have to be in a fixed, reusable location on the machine. FYI @llxia @AdamBrousseau, as it would be good to have your confirmation that this does not break anything on your side for EL10.
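
As a rough illustration of the XDG_RUNTIME_DIR change described above, the sketch below creates a per-job runtime directory inside the workspace instead of relying on a fixed path. It is written in Python for easy experimentation and is not the PR's actual implementation: the helper name `make_runtime_dir` and the `.xdg-runtime` directory name are hypothetical. The 0700 mode reflects the XDG Base Directory specification's requirement that the directory be accessible only by its owner.

```python
import os
import stat
import tempfile


def make_runtime_dir(workspace: str) -> str:
    """Create a private XDG_RUNTIME_DIR under the given workspace.

    Hypothetical helper: the ".xdg-runtime" name is an illustrative choice.
    """
    runtime_dir = os.path.join(workspace, ".xdg-runtime")
    os.makedirs(runtime_dir, exist_ok=True)
    # Restrict access to the owner only (mode 0700).
    os.chmod(runtime_dir, stat.S_IRWXU)
    return runtime_dir


if __name__ == "__main__":
    # A temporary directory stands in for the Jenkins workspace here.
    with tempfile.TemporaryDirectory() as workspace:
        runtime_dir = make_runtime_dir(workspace)
        # Build the environment a compositor process would be started with.
        env = dict(os.environ, XDG_RUNTIME_DIR=runtime_dir)
        print(env["XDG_RUNTIME_DIR"])
```

Because the directory lives under the workspace, it disappears with the workspace and is never shared between jobs on the same machine.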

In the future it is envisioned that this will allow support for different distributions (at the moment we have ubi10, ubuntu2204 and ubuntu2404 on the ghcr.io registry). It is also likely that we will change the address from ghcr.io/adoptium/test-containers to something else in the future to avoid confusion with other products of a similar name :-)
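
To make the distribution choice concrete, here is a minimal sketch of how an image reference could be built from a distribution name. It is an assumption-laden illustration in Python, not code from this PR: the function `image_reference` is hypothetical, and only the registry path and the three tags come from the description above.

```python
# Registry path and tags as named in the PR description.
REGISTRY = "ghcr.io/adoptium/test-containers"
AVAILABLE_TAGS = ("ubi10", "ubuntu2204", "ubuntu2404")


def image_reference(distro: str = "ubi10", registry: str = REGISTRY) -> str:
    """Return "<registry>:<tag>" for a known distribution (hypothetical helper)."""
    if distro not in AVAILABLE_TAGS:
        raise ValueError(
            f"unknown distribution {distro!r}; expected one of {AVAILABLE_TAGS}"
        )
    return f"{registry}:{distro}"


if __name__ == "__main__":
    for tag in AVAILABLE_TAGS:
        print(image_reference(tag))
```

Keeping the registry as a parameter means a later move away from the current address would only need a change in one place.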

@sxa sxa marked this pull request as ready for review November 13, 2025 18:55
Member Author

sxa commented Nov 13, 2025

Conflicts resolved. The pre-conflict-resolution branch has been tested again with ci.agent.dynamic as the label: https://ci.adoptium.net/job/Grinder/15435/ (That's a bad idea: it should be ADDITIONAL_LABEL, otherwise it ignores the platform. Testing that, with conflicts resolved, at https://ci.adoptium.net/job/Grinder/15486/)

Marking ready for review as an initial pass that can be tested.

@sxa sxa requested review from smlambert and steelhead31 November 13, 2025 18:55
Member Author

sxa commented Nov 13, 2025

Hmm, in fact using ci.agent.dynamic as an ADDITIONAL_LABEL doesn't work, as the ci.agent.dynamic machines get spun up with the labels: ubuntu x86-64 x64 ci.agent.dynamic sw.os.linux hw.arch.x86 ubuntu2204, and not ci.role.test. Tested with LABEL=ci.role.dynamic again at https://ci.adoptium.net/job/Grinder/15487/ (the previous Grinder from the last comment got stuck).

Member Author

sxa commented Nov 13, 2025

Made a temporary change to https://ci.adoptium.net/manage/cloud/Azure/template/test-linux-x64/ to add ci.role.test to the list of labels on the dynamic machines and it worked properly, although the "stuck" Grinder 15486 didn't progress and was still stuck.
I have now reverted that change just in case it causes any side effects that I'm not aware of.
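
The label behaviour in the last two comments can be modelled as a subset check. The sketch below is in Python and rests on two assumptions: that the job's label expression is a conjunction which includes ci.role.test alongside the ADDITIONAL_LABEL value, and that the dynamic agents carry the seven labels listed earlier as separate entries. The function name `node_matches` is hypothetical and this is not how Jenkins implements label matching internally.

```python
def node_matches(node_labels, required_labels) -> bool:
    """True when the node carries every label the job asks for."""
    return set(required_labels) <= set(node_labels)


# Labels reported on the dynamic agents (assumed to be separate labels).
DYNAMIC_NODE = {
    "ubuntu", "x86-64", "x64", "ci.agent.dynamic",
    "sw.os.linux", "hw.arch.x86", "ubuntu2204",
}

# Assumed requirement when ci.agent.dynamic is passed as ADDITIONAL_LABEL.
REQUIRED = {"ci.role.test", "ci.agent.dynamic"}

if __name__ == "__main__":
    # Without ci.role.test on the node, nothing matches and the job waits.
    print(node_matches(DYNAMIC_NODE, REQUIRED))
    # With the temporary template change, the same request can be scheduled.
    print(node_matches(DYNAMIC_NODE | {"ci.role.test"}, REQUIRED))
```

Under these assumptions the first check fails and the second succeeds, which is consistent with the job only running once ci.role.test was added to the template.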

@smlambert smlambert requested a review from sophia-guo November 13, 2025 19:31
Contributor

smlambert commented Nov 13, 2025

Try different test targets and jdk versions

14:45:20  [ JUnit Containers: found 4, started 4, succeeded 4, failed 0, aborted 0, skipped 0]
14:45:20  [ JUnit Tests: found 5, started 5, succeeded 4, failed 1, aborted 0, skipped 0]
14:45:20  
14:45:20  java.lang.Exception: JUnit test failure
14:45:20  	at com.sun.javatest.regtest.agent.JUnitRunner.runWithJUnitPlatform(JUnitRunner.java:149)
14:45:20  	at com.sun.javatest.regtest.agent.JUnitRunner.main(JUnitRunner.java:95)
14:45:20  	at com.sun.javatest.regtest.agent.JUnitRunner.main(JUnitRunner.java:61)
14:45:20  	at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
14:45:20  	at java.base/java.lang.reflect.Method.invoke(Method.java:580)
14:45:20  	at com.sun.javatest.regtest.agent.MainWrapper$MainTask.run(MainWrapper.java:138)
14:45:20  	at java.base/java.lang.Thread.run(Thread.java:1583)
  • https://ci.adoptium.net/job/Grinder/15491/ - run sanity.openjdk in parallel across 6 nodes (to check it sends to all dynamic nodes) - looks good, 1 testcase failure out of 1657, java/lang/ProcessHandle/OnExitTest.java

for (int i = 1; i <= ITERATIONS; i++) {
echo "ITERATION: ${i}/${ITERATIONS}"
if (env.SPEC.contains('linux') && !(LABEL.contains('ci.agent.dynamic') && CLOUD_PROVIDER == 'azure') && (BUILD_LIST != "external")) {
if (env.SPEC.contains('linux') && (BUILD_LIST != "external")) {
Contributor

Will this change work for CLOUD_PROVIDER=fyre, EBC?

Member Author

I'll have to defer to someone who knows about how we integrate with those systems more than I do.

Member Author

Having said that, I doubt they'd be using CLOUD_PROVIDER=azure so it shouldn't affect those scenarios.
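
To check that reasoning, the longer of the two conditions quoted in the diff above (the one that excludes dynamic Azure agents) can be restated as a small predicate. This is a Python restatement for experimentation only; the pipeline itself is Groovy, the function name `enters_linux_branch` is hypothetical, and the sample SPEC and LABEL values are illustrative.

```python
def enters_linux_branch(spec: str, label: str, cloud_provider: str,
                        build_list: str) -> bool:
    """Python mirror of the quoted condition that excludes dynamic Azure agents."""
    is_dynamic_azure = "ci.agent.dynamic" in label and cloud_provider == "azure"
    return "linux" in spec and not is_dynamic_azure and build_list != "external"


if __name__ == "__main__":
    # Illustrative inputs: (SPEC, LABEL, CLOUD_PROVIDER, BUILD_LIST).
    cases = [
        ("linux_x86-64", "ci.agent.dynamic", "azure", "openjdk"),
        ("linux_x86-64", "ci.role.test", "fyre", "openjdk"),
        ("linux_x86-64", "ci.role.test", "", "external"),
    ]
    for case in cases:
        print(case, "->", enters_linux_branch(*case))
```

Only the combination of a ci.agent.dynamic label and CLOUD_PROVIDER=azure flips the result, so under this reading a job with CLOUD_PROVIDER=fyre evaluates exactly as it would with the shorter condition.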

docker.image('adoptopenjdk/centos7_build_image').pull()
docker.image('adoptopenjdk/centos7_build_image').inside {
// Set dockerimage for azure agent. Fyre has stencil to setup the right environment
docker.image('ghcr.io/adoptium/test-containers:ubi10').pull()
Contributor

Could we start from ubuntu2404, which is more popular and open?

Contributor

Probably also point at the other named dir, adoptium_test_image, as discussed to avoid confusion around the test-containers product.

Member Author

> Could we start from ubuntu2404, which is more popular and open?

Why do you consider UBI10 to not be open? We have both images so either can work here, although for this initial prototype I chose the potentially more awkward one (UBI10 uses weston instead of Xvfb) to ensure we spot any errors.

Member Author

> Probably also point at the other named dir, adoptium_test_image, as discussed to avoid confusion around the test-containers product.

Yeah, as discussed that's a good idea for the future, but this should be considered an initial pass to see how it works, so I wouldn't wish to block merging of this PR and its testing on going back and republishing images at this stage.

Contributor

Agree, both this and container selection can be future revisions.

Member Author

sxa commented Nov 14, 2025

* https://ci.adoptium.net/job/Grinder/15489/ - sanity.functional

PASS!

* https://ci.adoptium.net/job/Grinder/15490/ - jdk_net (as those tests would be most likely to fail if config of agent differs from static agents)

Hmmm 4 test case failures there, 2 of which are in multicast tests.

* https://ci.adoptium.net/job/Grinder/15491/ - run sanity.openjdk in parallel across 6 nodes (to check it sends to all dynamic nodes)

Timeout in OnExitTest in testList_4 - regrind at https://ci.adoptium.net/job/Grinder/15520

Contributor

smlambert commented Nov 14, 2025

Yes, multicast, and I believe some testcases may be trying to launch containers, given the output of the failed testcases:

14:45:20  [ JUnit Containers: found 4, started 4, succeeded 4, failed 0, aborted 0, skipped 0]
14:45:20  [ JUnit Tests: found 5, started 5, succeeded 4, failed 1, aborted 0, skipped 0]
14:45:20  
14:45:20  java.lang.Exception: JUnit test failure
14:45:20  	at com.sun.javatest.regtest.agent.JUnitRunner.runWithJUnitPlatform(JUnitRunner.java:149)

https://github.com/openjdk/jtreg/blob/master/src/share/classes/com/sun/javatest/regtest/agent/JUnitRunner.java#L149

was going to check if those run 'fine' on a static docker container, if we have one for x64 Linux.
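
When comparing runs across dynamic and static agents, it can help to pull the counts out of summary lines like the ones quoted above. The following is a small Python sketch under the assumption that every summary line has the exact shape shown in that log excerpt; the names `SUMMARY_RE` and `parse_summary` are hypothetical and not part of jtreg or the AQA tooling.

```python
import re

# Matches e.g. "[ JUnit Tests: found 5, started 5, ... skipped 0]".
SUMMARY_RE = re.compile(r"\[\s*JUnit (Containers|Tests): (.*?)\]")


def parse_summary(line: str):
    """Return (kind, counts) for a JUnit summary line, or None if absent."""
    match = SUMMARY_RE.search(line)
    if match is None:
        return None
    counts = {}
    for item in match.group(2).split(","):
        name, value = item.split()
        counts[name] = int(value)
    return match.group(1), counts


if __name__ == "__main__":
    line = ("14:45:20  [ JUnit Tests: found 5, started 5, succeeded 4, "
            "failed 1, aborted 0, skipped 0]")
    print(parse_summary(line))
```

Lines that do not contain a summary, such as the stack trace frames, simply yield `None`.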

Member Author

sxa commented Nov 14, 2025

> was going to check if those run 'fine' on a static docker container, if we have one for x64 Linux.

100 iteration grinders 15515, 15516 and 15517 on x64 static docker systems all seem to fail.

@smlambert
Contributor

> 100 iteration grinders 15515, 15516 and 15517 on x64 static docker systems all seem to fail.

I suspected as much. We may want to figure out how to exclude or skip those testcases in containerized environments.
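
One possible shape for such a skip is sketched below in Python. It is only an illustration: the marker-file check is a common heuristic (Docker creates `/.dockerenv`, Podman creates `/run/.containerenv`) rather than a guaranteed detection method, the helper names are hypothetical, and the excluded test path is a placeholder, not a testcase identified in this thread.

```python
import os

# Placeholder entry; the real list of container-sensitive tests is not known here.
CONTAINER_SENSITIVE = frozenset({"example/net/MulticastExampleTest.java"})


def running_in_container(root: str = "/") -> bool:
    """Heuristic: look for marker files that Docker or Podman usually create."""
    markers = (".dockerenv", os.path.join("run", ".containerenv"))
    return any(os.path.exists(os.path.join(root, m)) for m in markers)


def select_tests(tests, in_container: bool, excluded=CONTAINER_SENSITIVE):
    """Drop excluded tests when running in a container, otherwise keep all."""
    if not in_container:
        return list(tests)
    return [t for t in tests if t not in excluded]


if __name__ == "__main__":
    tests = ["example/net/MulticastExampleTest.java", "example/lang/OtherTest.java"]
    print(select_tests(tests, running_in_container()))
```

Taking `in_container` as an argument keeps the selection logic testable on any machine, independent of where the check itself runs.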
