docker: switch to a bespoke test container #5683

Open
wants to merge 3 commits into
base: master

gdams
Member

@gdams gdams commented Oct 10, 2024

Also moved LIB_DIR and SYSTEM_LIB_DIR inside the current workspace for docker builds, because of permission errors.

This currently uses ghcr.io/adoptium/test-containers:ubuntu2204, a lightweight image loosely based on the dockerStatic base images. A PR to the infra repo is incoming to build this image regularly, and we can add more base images where appropriate.

Grinder to show this working: https://ci.adoptium.net/view/Test_grinder/job/Grinder/11160/
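
A minimal sketch of the LIB_DIR / SYSTEM_LIB_DIR move, for illustration only. The `testDependency/lib` and `testDependency/system_lib` suffixes match the printenv output later in this thread; the function name `workspace_lib_dirs` and the fallback to `$PWD` are assumptions, not code from this PR.

```shell
# Sketch only: keep dependency directories inside the job workspace so a
# container user without write access to parent directories can create them.
# workspace_lib_dirs is a hypothetical helper; WORKSPACE is set by Jenkins,
# and falling back to $PWD is an assumption for local runs.
workspace_lib_dirs() {
    ws="${1:-${WORKSPACE:-$PWD}}"
    LIB_DIR="$ws/testDependency/lib"
    SYSTEM_LIB_DIR="$ws/testDependency/system_lib"
    export LIB_DIR SYSTEM_LIB_DIR
    # Create both directories up front so a permission problem shows early.
    mkdir -p "$LIB_DIR" "$SYSTEM_LIB_DIR"
}
```

Called with a workspace of /home/adoptopenjdk/workspace/Grinder, this would produce the LIB_DIR and SYSTEM_LIB_DIR values seen in the environment dump.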

@smlambert
Contributor

smlambert commented Oct 10, 2024

Removing the ADDITIONAL_LABEL values from the Grinder run and setting CLOUD_PROVIDER=azure, since that is how the general pipeline code is expected to launch the tests:

Running some additional Grinders, varying test group and JDK version:

sanity.system, JDK21:

special.functional, JDK11:

17:05:11  Running on test-linux-x64-b97844 in /home/adoptopenjdk/workspace/Grinder_testList_3
[Pipeline] {
...
17:06:14  Running tests...
[Pipeline] echo
17:06:14  ITERATION: 1/1
[Pipeline] wrap
17:06:14  $ Xvfb -displayfd 2 -screen 0 1024x768x24 -fbdir /home/adoptopenjdk/workspace/Grinder_testList_3/.xvfb-10-..fbdir4319951883012337092
...
17:06:18  Exception: java.io.IOException: Cannot run program "Xvfb": error=2, No such file or directory
...
12:42:52       [exec] The test in the build_image() function is jacoco
12:42:52       [exec] #####################################################
12:42:52       [exec] INFO:  docker build  --no-cache -t adoptopenjdk-jacoco-test:17-jdk-ubuntu-hotspot-full -f /home/adoptopenjdk/workspace/Grinder/jvmtest/external/jacoco/dockerfile/17/jdk/ubuntu/Dockerfile.hotspot.full /home/adoptopenjdk/workspace/Grinder/jvmtest/external/
12:42:52       [exec] #####################################################
12:42:52       [exec] /home/adoptopenjdk/workspace/Grinder/aqa-tests/TKG/../../jvmtest/external/build_image.sh: line 75: docker: command not found
12:42:52  
12:42:52  BUILD FAILED
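
Both failures above come from a command that is missing on the machine (`Xvfb` in the first log, `docker` in the second). A preflight check along these lines could surface that before the tests start. This is a sketch, not part of the PR: `missing_commands` is a hypothetical helper, and the list of commands to check is whatever the caller passes in.

```shell
# Sketch only: print the names of required commands that are not on PATH.
# missing_commands is a hypothetical helper, not part of aqa-tests or TKG.
missing_commands() {
    missing=""
    for cmd in "$@"; do
        # command -v is the POSIX way to test whether a command resolves.
        if ! command -v "$cmd" >/dev/null 2>&1; then
            missing="${missing:+$missing }$cmd"
        fi
    done
    printf '%s\n' "$missing"
}
```

For example, `missing_commands Xvfb docker` prints an empty line when both are available and the missing names otherwise.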

@gdams
Member Author

gdams commented Oct 11, 2024

> Exception: java.io.IOException: Cannot run program "Xvfb": error=2, No such file or directory

@ShelleyLambert looking at that job it didn't run in a docker container? I think it's because we need to add the ci.agent.dynamic label to that job?

@gdams
Member Author

gdams commented Oct 11, 2024

Rebuilding https://ci.adoptium.net/view/Test_grinder/job/Grinder/11171/ (PARALLEL=Dynamic, NUM_MACHINES=5)

@smlambert
Contributor

smlambert commented Oct 11, 2024

> @ShelleyLambert looking at that job it didn't run in a docker container? I think it's because we need to add the ci.agent.dynamic label to that job?

Yes, I am well aware, and we should take that machine offline and raise an infra issue.

Looking more closely, it found an idle static machine but still ran on a dynamic machine (test-linux-x64-b97844). Console output from https://ci.adoptium.net/view/Test_grinder/job/Grinder_testList_3/10/console:

17:05:10  Found a total of 30 nodes with the 'ci.role.test&&hw.arch.x86&&sw.os.linux' label
[Pipeline] echo
17:05:10  Found an idle node: test-docker-debian12-x64-4. The program will not start dynamic vm.
[Pipeline] }
[Pipeline] // node
[Pipeline] node
17:05:11  Running on [test-linux-x64-b97844](https://ci.adoptium.net/computer/test%2Dlinux%2Dx64%2Db97844/) in /home/adoptopenjdk/workspace/Grinder_testList_3
[Pipeline] {
[Pipeline] retry
[Pipeline] {
[Pipeline] timeout
17:05:11  Timeout set to expire in 1 hr 0 min

In 'real' pipeline runs, the test pipeline code will first try to send to idle static machines and spin up dynamic ones as needed after that.

As per the current design, we will not be passing in ci.agent.dynamic explicitly when we trigger test pipelines (only updating to set CLOUD_PROVIDER=azure), so I wanted to check that it works as designed.
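
The intended ordering described above (idle static machines first, dynamic agents only when none is idle) can be illustrated with a small sketch. The real logic lives in the test pipeline code and is not reproduced here; `choose_agent` and its output format are hypothetical.

```shell
# Sketch only: pick the first idle static node if any were found,
# otherwise fall back to requesting a dynamic agent.
# choose_agent is a hypothetical helper; arguments are idle static node names.
choose_agent() {
    if [ "$#" -gt 0 ]; then
        printf 'static:%s\n' "$1"
    else
        printf 'dynamic\n'
    fi
}
```

With the idle node from the log, `choose_agent test-docker-debian12-x64-4` selects the static machine; with no arguments it falls back to a dynamic agent.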

@gdams
Member Author

gdams commented Oct 11, 2024

Right, okay. If you don't expect the dynamic label to be used, I'll have to tweak the existing code slightly. I'll have a play.

@gdams
Member Author

gdams commented Oct 11, 2024

@smlambert I've updated the code so it no longer requires anyone to pass the dynamic label to the job explicitly. As long as the cloud is set to Azure, it will default to using a container image.

@smlambert
Contributor

@gdams - you do not need to make the change on L461. I was explaining how the current logic already works, not asking you to change your PR, which looks fine as it is. I am just running many additional tests to verify that each test group works on these agents (please see what we do on L351).

@gdams
Member Author

gdams commented Oct 12, 2024

> @gdams - you do not need to make the change on L461. I was explaining how the current logic already works, not asking you to change your PR, which looks fine as it is. I am just running many additional tests to verify that each test group works on these agents (please see what we do on L351).

Reverted, PTAL.

@smlambert
Contributor

A couple of extra Grinder runs to verify:
without CLOUD_PROVIDER: https://ci.adoptium.net/view/Test_grinder/job/Grinder/11320/
with CLOUD_PROVIDER=azure: https://ci.adoptium.net/view/Test_grinder/job/Grinder/11321/

@smlambert
Contributor

smlambert commented Nov 12, 2024

https://ci.adoptium.net/view/Test_grinder/job/Grinder/11321/ fails with:

13:03:02  + ./get.sh -s /home/jenkins/workspace/Grinder/aqa-tests/.. -p x86-64_linux -r nightly -j 21 -i hotspot --clone_openj9 false --tkg_repo https://github.com/AdoptOpenJDK/TKG.git --tkg_branch master
13:03:02  TESTDIR: /home/jenkins/workspace/Grinder/aqa-tests
13:03:02  get jdk binary...
13:03:02  _ENCODE_FILE_NEW=UNTAGGED curl -OLJSks  https://api.adoptium.net/v3/binary/latest/21/ea/linux/x64/jdk/hotspot/normal/adoptium?project=jdk
13:03:05  _ENCODE_FILE_NEW=UNTAGGED curl -OLJSks  https://api.adoptium.net/v3/binary/latest/21/ea/linux/x64/sbom/hotspot/normal/adoptium?project=jdk
13:03:06  _ENCODE_FILE_NEW=UNTAGGED curl -OLJSks  https://api.adoptium.net/v3/binary/latest/21/ea/linux/x64/testimage/hotspot/normal/adoptium?project=jdk
13:03:09  Uncompressing file: OpenJDK21U-jdk_x64_linux_hotspot_21.0.6_2-ea.tar.gz ...
13:03:13  Uncompressing file: OpenJDK21U-testimage_x64_linux_hotspot_21.0.6_2-ea.tar.gz ...
13:03:16  Run /home/jenkins/workspace/Grinder/jdkbinary/j2sdk-image/bin/java -version
13:03:16  =JAVA VERSION OUTPUT BEGIN=
13:03:16  /home/jenkins/workspace/Grinder/jdkbinary/j2sdk-image/bin/java: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by /home/jenkins/workspace/Grinder/jdkbinary/j2sdk-image/bin/../lib/libjli.so)
[Pipeline] }

This failure is unrelated to this PR (related to: #5754)
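
The loader error above means the JDK's libjli.so needs the GLIBC_2.14 symbol version, which the libc on that machine does not provide. A quick way to check an image before running tests is to compare its glibc version against the required minimum. The sketch below is illustrative only: `version_at_least` is a hypothetical helper that compares major.minor versions, and the commented usage assumes a glibc-based system where `getconf GNU_LIBC_VERSION` is available.

```shell
# Sketch only: succeed if dotted version HAVE >= NEED, comparing the
# major and minor components (both arguments must look like MAJOR.MINOR).
# version_at_least is a hypothetical helper, not part of aqa-tests.
version_at_least() {
    have_major=${1%%.*}; have_minor=${1#*.}; have_minor=${have_minor%%.*}
    need_major=${2%%.*}; need_minor=${2#*.}; need_minor=${need_minor%%.*}
    [ "$have_major" -gt "$need_major" ] ||
        { [ "$have_major" -eq "$need_major" ] && [ "$have_minor" -ge "$need_minor" ]; }
}

# Possible usage on a glibc system (assumption: getconf prints "glibc X.Y"):
#   have=$(getconf GNU_LIBC_VERSION | awk '{print $2}')
#   version_at_least "$have" 2.14 || echo "glibc $have is older than 2.14"
```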

@smlambert
Contributor

smlambert commented Nov 13, 2024

I will launch a couple of Grinders to 'exhaust' our list of static nodes, and see the use of dynamic agents shortly.

Most failures relate to not finding files that are generated and expected to be found within the workspace. I wonder whether that relates to the comment in the PR for some other changes: "Needs to be inside the workspace as the docker container won't have permissions to write to higher level directories". Do we need to redefine some other env var for where TKG generates parallel make files?

16:52:44  Generated /home/adoptopenjdk/workspace/Grinder_testList_4/aqa-tests/TKG/../TKG/buildInfo.mk
16:52:44  
16:52:44  Error: cannot find the following tests: jdk_build_2, jdk_security3_2, jdk_jmx_2, jdk_tools_2, jdk_nio_1, jdk_nio_2, jdk_net_2, jdk_management_2, jdk_2d_2, jdk_2d_1, jdk_jdi_2, jdk_jfr_2, jdk_sound_2, jdk_sound_1, jdk_instrument_2, jdk_jfc_demo_1, jdk_text_2, jdk_jfc_demo_2, jdk_imageio_2, jdk_io_2, jdk_native_sanity_1, jdk_native_sanity_2, jdk_other_2, jdk_time_2, jdk_awt_2, jdk_awt_1, jdk_client_sanity_2, jdk_client_sanity_1, jdk_swing_1, jdk_vector_2, jdk_swing_2, jdk_security_infra_zos_2, jdk_security_infra_zos_1, jdk_rmi_2, jdk_security_infra_1, jdk_security_infra_2, jdk_beans_2 (note: group target such as sanity is not accepted inside testList)
16:52:44  
16:52:44  make[2]: *** [makeGen.mk:45: autogen] Error 1
16:52:44  make[2]: Leaving directory '/home/adoptopenjdk/workspace/Grinder_testList_4/aqa-tests/TKG'
16:52:44  make[1]: *** [makefile:100: buildListGen] Error 2
16:52:44  make[1]: Leaving directory '/home/adoptopenjdk/workspace/Grinder_testList_4/aqa-tests/TKG'
16:52:44  make: *** [parallelList.mk:20: testList_4] Error 2

@smlambert
Contributor

@gdams - can you let me know how you pushed the image this PR is using to ghcr.io? I think we want to create a workflow that will allow us to update it as new test dependencies are discovered (fakeroot, etc).

@sophia-guo
Contributor

sophia-guo commented Nov 21, 2024

Most failures mentioned in #5683 (comment) are due to DYNAMIC_COMPILE=true, which only works for functional and external right now. With DYNAMIC_COMPILE=true, the compile step is skipped: https://github.com/adoptium/aqa-tests/blob/master/buildenv/jenkins/JenkinsfileBase#L710-L719. The two child builds without that issue, https://ci.adoptium.net/job/Grinder_testList_0/358/ and https://ci.adoptium.net/job/Grinder_testList_2/54/, only run targets ending with _0 (the failed ones include targets ending with _1 or _2; I'm not sure why it works).

I reran the build https://ci.adoptium.net/view/Test_grinder/job/Grinder/11554 with DYNAMIC_COMPILE=false twice. DYNAMIC_COMPILE=true https://ci.adoptium.net/job/Grinder/11686/ https://ci.adoptium.net/job/Grinder/11685/, both have child jobs running on dynamic agents. All builds are running and haven't hit any special issues with dynamic agents. The thing I don't understand, though, is why only jdk_***_0 targets are generated.
parallelList.mk.txt

I see, there is a message: Warning: JVM_OPTIONS specified, ignoring variations. But how and where is JVM_OPTIONS set? The job parameter JVM_OPTIONS is empty.

@smlambert
Contributor

> I see, there is a message: Warning: JVM_OPTIONS specified, ignoring variations. But how and where is JVM_OPTIONS set? The job parameter JVM_OPTIONS is empty.

adoptium/TKG#612

@sophia-guo
Contributor

It seems that with a Docker image as the agent, all build parameters, including those with empty-string values, are treated as environment variables. So even with JVM_OPTIONS='', the variations are still ignored. Issue opened: adoptium/TKG#640

This may not only affect JVM_OPTIONS. Any condition based on whether a build parameter is set as an environment variable may be affected. We need to double-check whether there are similar cases.

16:48:28  + printenv
16:48:28  JENKINS_HOME=/home/jenkins/.jenkins
16:48:28  OPENJ9_REPO=https://github.com/eclipse/openj9.git
16:48:28  DOCKER_REGISTRY_DIR=
16:48:28  USE_TESTENV_PROPERTIES=false
16:48:28  BUILD_LIST=openjdk
16:48:28  VENDOR_TEST_DIRS=
16:48:28  CI=true
16:48:28  OPENJ9_BRANCH=master
16:48:28  SDK_RESOURCE=nightly
16:48:28  HOSTNAME=027c99e76dd4
16:48:28  RUN_CHANGES_DISPLAY_URL=https://ci.adoptium.net/job/Grinder/11686/display/redirect?page=changes
16:48:28  ARTIFACTORY_ROOT_DIR=
16:48:28  ADOPTOPENJDK_REPO=https://github.com/gdams/openjdk-tests.git
16:48:28  UPSTREAM_JOB_NUMBER=
16:48:28  JVM_OPTIONS=
16:48:28  NODE_LABELS=ubuntu x86-64 x64 ci.agent.dynamic sw.os.linux test-linux-x64-8d33b3 hw.arch.x86 ubuntu2204
16:48:28  VENDOR_TEST_BRANCHES=
16:48:28  HUDSON_URL=https://ci.adoptium.net/
16:48:28  TARGET=extended.openjdk
16:48:28  EXTRA_DOCKER_ARGS=-v ${TEST_JDK_HOME}:/opt/java/openjdk
16:48:28  EXIT_FAILURE=false
16:48:28  STF_OWNER_BRANCH=adoptium:master
16:48:28  USE_JRE=false
16:48:28  OPENLIBERTY_SHA=
16:48:28  HOME=/home/jenkins
16:48:28  ADOPTOPENJDK_BRANCH=docker
16:48:28  SSH_AGENT_CREDENTIAL=
16:48:28  BUILD_URL=https://ci.adoptium.net/job/Grinder/11686/
16:48:28  TAP_NAME=Grinder_20241121214828.tap
16:48:28  LABEL=
16:48:28  JDK_IMPL=hotspot
16:48:28  DOCKER_REQUIRED=false
16:48:28  JENKINS_SERVER_COOKIE=durable-480433e4f3c3ba1235c6b55c5b757ce5d2325e6f031a827336b8be429dce8533
16:48:28  CUSTOMIZED_SDK_URL=
16:48:28  FAILED_TEST_TARGET=
16:48:28  VENDOR_TEST_REPOS=
16:48:28  FAILED_TESTS=
16:48:28  PERF_ROOT=/home/adoptopenjdk/workspace/Grinder/../../benchmarks
16:48:28  JDK_REPO=
16:48:28  WORKSPACE=/home/adoptopenjdk/workspace/Grinder
16:48:28  KEEP_WORKSPACE=false
16:48:28  FQDN=test-linux-x64-8d33b3.uksouth.cloudapp.azure.com
16:48:28  ARCHIVE_TEST_RESULTS=false
16:48:28  TEST_FLAG=
16:48:28  GENERATE_JOBS=true
16:48:28  DOCKERIMAGE_TAG=
16:48:28  TKG_OWNER_BRANCH=AdoptOpenJDK:master
16:48:28  APPLICATION_OPTIONS=
16:48:28  NODE_NAME=test-linux-x64-8d33b3
16:48:28  TEST_IMAGES_REQUIRED=true
16:48:28  PERSONAL_BUILD=true
16:48:28  JDK_BRANCH=
16:48:28  CUSTOM_TARGET_KEY_VALUE=
16:48:28  USER_CREDENTIALS_ID=
16:48:28  RUN_ARTIFACTS_DISPLAY_URL=https://ci.adoptium.net/job/Grinder/11686/display/redirect?page=artifacts
16:48:28  CLOUD_PROVIDER=azure
16:48:28  EXECUTOR_NUMBER=0
16:48:28  UPSTREAM_TEST_JOB_NAME=
16:48:28  STAGE_NAME=Setup
16:48:28  AUTO_DETECT=true
16:48:28  JCK_VERSION=
16:48:28  JDK_VERSION=17
16:48:28  CUSTOMIZED_SDK_URL_CREDENTIAL_ID=
16:48:28  BUILD_DISPLAY_NAME=#11686 - [email protected]
16:48:28  RUN_TESTS_DISPLAY_URL=https://ci.adoptium.net/job/Grinder/11686/display/redirect?page=tests
16:48:28  TIME_LIMIT=10
16:48:28  EXTERNAL_TEST_CMD=mvn clean install
16:48:28  HUDSON_HOME=/home/jenkins/.jenkins
16:48:28  JOB_BASE_NAME=Grinder
16:48:28  DOCKER_REGISTRY_URL=
16:48:28  OCP_TOKEN=
16:48:28  PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
16:48:28  PLATFORM=x86-64_linux
16:48:28  TKG_ITERATIONS=1
16:48:28  NUM_MACHINES=
16:48:28  JCK_GIT_REPO=
16:48:28  RERUN_FAILURE=false
16:48:28  BUILD_ID=11686
16:48:28  EXIT_SUCCESS=false
16:48:28  ARTIFACTORY_REPO=
16:48:28  ACTIVE_NODE_TIMEOUT=
16:48:28  BUILD_TAG=jenkins-Grinder-11686
16:48:28  SYSTEM_LIB_DIR=/home/adoptopenjdk/workspace/Grinder/testDependency/system_lib
16:48:28  JENKINS_URL=https://ci.adoptium.net/
16:48:28  OPENJCEPLUS_GIT_REPO=https://github.com/ibmruntimes/OpenJCEPlus.git
16:48:28  KEEP_REPORTDIR=true
16:48:28  JOB_URL=https://ci.adoptium.net/job/Grinder/
16:48:28  JCK_ROOT=
16:48:28  ORIGIN_JDK_VERSION=17
16:48:28  BUILD_NUMBER=11686
16:48:28  DEBIAN_FRONTEND=noninteractive
16:48:28  ITERATIONS=1
16:48:28  OPENJ9_SHA=
16:48:28  JENKINS_NODE_COOKIE=4d05ef36-e070-46c4-a0e8-f1aaf13e7066
16:48:28  OPENJCEPLUS_GIT_BRANCH=semeru-java17
16:48:28  LIGHT_WEIGHT_CHECKOUT=false
16:48:28  LABEL_ADDITION=
16:48:28  ARTIFACTORY_SERVER=
16:48:28  RUN_DISPLAY_URL=https://ci.adoptium.net/job/Grinder/11686/display/redirect
16:48:28  OPENJ9_SYSTEMTEST_OWNER_BRANCH=eclipse-openj9:master
16:48:28  OCP_SERVER=
16:48:28  SPEC=linux_x86-64
16:48:28  TEST_JDK_HOME=/home/adoptopenjdk/workspace/Grinder/jdkbinary/j2sdk-image
16:48:28  HUDSON_SERVER_COOKIE=2d832652af5afba8
16:48:28  UPSTREAM_TEST_JOB_NUMBER=
16:48:28  DOCKER_REGISTRY_URL_CREDENTIAL_ID=
16:48:28  JOB_DISPLAY_URL=https://ci.adoptium.net/job/Grinder/display/redirect
16:48:28  CUSTOM_TARGET=
16:48:28  CLASSPATH=
16:48:28  UPSTREAM_JOB_NAME=
16:48:28  JOB_NAME=Grinder
16:48:28  ADOPTOPENJDK_SYSTEMTEST_OWNER_BRANCH=adoptium:master
16:48:28  EXTERNAL_REPO_BRANCH=master
16:48:28  LIB_DIR=/home/adoptopenjdk/workspace/Grinder/testDependency/lib
16:48:28  PWD=/home/adoptopenjdk/workspace/Grinder
16:48:28  TEST_TIME=1
16:48:28  BUILD_IDENTIFIER=
16:48:28  OPENJDK_SHA=
16:48:28  PARALLEL=Dynamic
16:48:28  WORKSPACE_TMP=/home/adoptopenjdk/workspace/Grinder@tmp
16:48:28  DYNAMIC_COMPILE=false
16:48:28  EXTRA_OPTIONS=
16:48:28  BASE_DOCKER_REGISTRY_CREDENTIAL_ID=
16:48:28  JOBSTARTTIME=Thu, 21 Nov 2024 21:48:25 +0000
16:48:28  RERUN_ITERATIONS=0
16:48:28  VENDOR_TEST_SHAS=
16:48:28  EXTERNAL_CUSTOM_REPO=
16:48:28  JVM_VERSION=
16:48:28  RERUN_LINK=
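
In the printenv output above, JVM_OPTIONS is present but empty. A check that only asks "is the variable set?" treats that the same as a real value, while a check for a non-empty value does not. The sketch below shows the three cases in plain shell; it is not TKG's actual check, and `describe_var` is a hypothetical helper.

```shell
# Sketch only: classify an environment variable as unset, empty, or non-empty.
# describe_var is a hypothetical helper; it uses eval on the variable NAME,
# so only pass trusted, literal names.
describe_var() {
    # ${NAME+x} expands to "x" when NAME is set (even if empty), else to nothing.
    eval "is_set=\${$1+x}; value=\${$1-}"
    if [ -z "$is_set" ]; then
        echo "unset"
    elif [ -z "$value" ]; then
        echo "empty"
    else
        echo "non-empty"
    fi
}
```

Under this sketch, a guard written as "non-empty" would leave variations enabled for `JVM_OPTIONS=`, whereas a guard written as "set" would ignore them.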

4 participants