Test Result Summary Service (TRSS)

As we support more and more tests, projects and Jenkins servers, monitoring builds health and triaging tests daily is quickly becoming an overwhelming task. We are currently maintaining 6+ Jenkins servers both internally and externally.

Some external Jenkins server examples that we monitor are:


We encounter 4 main challenges that have motivated us to develop TRSS:

Multiple Jenkins servers to monitor

Even though we use Jenkins plugins at each particular Jenkins server for sending build status messages, without a build status overview it is hard to triage failure.

Need for longer-term storage of test results

Secondly, we may want to keep results for a longer period of time for some tests, so we can compare the results with history runs. For example, we may want to keep performance test results for months or even years. Jenkins server often has limited storage and we can only store limited number of builds.

Need for specialized views like side-by-side comparison

The third problem is that we do not have a tool to view/compare test results. Some type of tests is best to compare result with previous releases, previous builds or different platforms within the same build. Additionally, some type of tests are best displayed in graph to see the trends.

Desire for customized views (tailored by each user)

Last but not the least, different users may be interested in different builds. For example, developers may want to only monitor their own personal builds. The FV team maybe only interested in functional test builds. The SV team maybe only interested in system test builds. A project manager may want to know overall test builds status.

We wanted a tool to monitor multiple Jenkins servers and display different type of test build results and history (test log files, compare test result across builds/platforms, display trends, etc.) And it needs to be highly customizable per user.


To solve the issues listed above, we have started creating a thin-layer service called the Test Result Summary Service (TRSS).  You can find the source in https://github.com/AdoptOpenJDK/openjdk-test-tools.

Key Features:
1) Personalized Dashboard:

TRSS can monitor multiple Jenkins servers in real-time. User can add/remove builds, add/remove/rearrange the panels/widgets. User’s modification is stored in each user’s browser local storage, so everyone can have their own customizable view and store their configuration without interfering with others.

Personalized Dashboard
Personalized Dashboard – add/remove builds
Personalized Dashboard – test trend
2) Test Result View

Other than monitor multiple Jenkins servers in real-time. TRSS also stores test history data.

Test Build View

Downstream builds that are launched by above parent build pipelines:

Downstream Build View
Downstream Build View

List of all tests within the build. In this view, TRSS displays test name, test result, test duration and test result history. Columns can be sorted or filtered (to only show FAILED tests or to sort them to the top of the list).

Test Result Summary

From above view, we easily tell cmdLineTester_gcsuballoctests_0 failed. All Platforms shows cmdLineTester_gcsuballoctests_0 test result for all platforms and JDK version in the build.

Test in All Platforms

Deep history shows cmdLineTester_gcsuballoctests_0 test execution history on Linux s390.

Test History

TRSS also displays the test output (as one would see in an individual Jenkins server console view of the test build). Below is cmdLineTester_gcsuballoctests_0 test output.

Test Output
3) Test Compare

TRSS can compare any test output (regardless of test type, build, platforms, etc).  With information for Jenkins server, build name, build name and test name, TRSS can search database and compares test output side by side.  This is an extremely simple but effective way to speed triage by comparing a passed build to a failed one and quickly identifying differences.

Test Output Compare

Implementation Detail:

TRSS uses node.js as server and React as client. It actively monitors multiple Jenkins servers and their jobs. TRSS parses the jobs outputs and store parsed data into MongoDB. If needed, it also stores links to Artifactory for extra data (i.e., logs, core files, erc).

TRSS Overview


If special data is needed for display (i.e., new measurement), user can easily add parser code to TRSS server and client so that different type of tests can be parsed and displayed.

In the multi-server, multi-project scenario, TRSS is a lightweight and customizable open-source solution to monitor, display, compare and triage test results and store history test data. The tool itself is project-agnostic and can be generally applied to any Jenkins-based builds or projects.

We are still in the early development stage.  TRSS is the stepping stone for us to create/integrate with other microservices. For future enhancement, we may tie into Watson Analytics and try out cognitive triage experiments. If you are interested in helping build and improve this project, please engage us in the AdoptOpenJDK #testing Slack channel.   🙂

Jumpstarting Open Performance Testing

Before I dabble in the juicy world of computer architectures and measuring and understanding performance implications, let me premise this entire post with a quick introduction to myself.

I am not a performance analyst, nor am I a low-level software developer trying to optimize algorithms to squeeze out the last ounce of performance on particular hardware.

While I am fortunate to work with people who have those skills, I myself am a ‘happy-go-lucky’ / high-level software developer.  My focus in the last few years has been developing my skills as a verification expert.  I have a particular interest in finding solutions that help testing software easier and more effective.  One flavour of software verification is performance testing.

While I am fairly new to performance benchmarking,  I am experienced in streamlining processes and tools to reduce the friction around common tasks.  If we want to empower developers to be able to benchmark and test the performance impact that their changes, we need to create tools and workflows that are dead easy.  I personally need it to be dead easy!  “Not only am I the hair club president, I am also a client“.   I want to be able to easily run performance benchmarks and at some level understand the results of those benchmarks.  (This seems like a good time to segue to the recent open-sourcing of tools that help in that effort, PerfNext/TRSS and Bumblebench… more to come on that later in preparation for “Performance Testing for Everyone” at EclipseCon Europe).

But back to the current story, a wonderful opportunity presented itself.   We have the great fortune at the AdoptOpenJDK project to work with many different teams, groups and sponsors.  Packet, a cloud provider of bare-metal servers, is one of our sponsors who donates machine time to the project allowing us to provide pre-built and tested OpenJDK binaries from build scripts and infrastructure.  They are very supportive of open-source projects, and recently offered us some time on one of their new Intel® Optane™ SSD servers (with their Skylake microarchitecture).

Packet and AdoptOpenJDK share the mutual goal of understanding how these machines affect Java™ Virtual Machine (JVM) performance.  Admittedly, I attempted to parse all of the information found in the Intel® 64 and IA-32 Architectures Optimization Manual, but needed some help. Skylake improves on the Haswell and Broadwell predecessors.  Luckily, Vijay Sundaresan, WAS and Runtimes Performance Architect, took the time to summarize some features of the Skylake architecture.  He outlined those features having the greatest impact on JVM performance and therefore are of great interest to JVM developers.  Among the improvements he listed :

  • Skylake’s 1.5X memory bandwidth, higher memory capacity at a lower cost per GB than DRAM and better memory resiliency
  • Skylake cache memory hierarchy is quite different to Broadwell, with one of the bigger changes being that it stopped being inclusive
  • Skylake also added AVX-512 (512 bytes vector operations) which is a 2X improvement over AVX-256 (256 bytes vector operations)

Knowing of those particular improvements and how a JVM implementation leverages them, we hoped to see a 10-20% improvement in per-core performance.  This would be in keeping with the Intel® published SPECjbb®2015 benchmark** (the de facto standard Java™ Server Benchmark) scores showing improvements in that range.

We were not disappointed.  We decided to run variants of the ODM benchmark.  This benchmark runs a Rules engine typically used for automating complex business decision automation, think analytics (compliance auditing for Banking or Insurance industries as a use case example).  Ultimately, the benchmark processes input files.  In one variant, a small set of 5 rules, in the other a much larger set of 300 rules was used.  The measurement tracks how many times a rule can be processed per second, in other words, it measures the throughput of the Rules engine with different kinds of rules as inputs.  This benchmark does a lot of String/Date/Integer heavy processing and comparison as those are common datatypes in the input files.  Based on an average of the benchmark runs that were run on the Packet machine, we saw a healthy improvement of 13% and 20% in the 2 scenarios used.

ODM results summary
Summary of ODM results
PerfNext/TRSS graph view of ODM results
ODM results from PerfNext/TRSS graph view

We additionally ran some of our other tests used to verify AdoptOpenJDK builds on this machine to compare the execution times… We selected a variety of OpenJDK implementations (hotspot and openj9), and versions (openjdk8, openjdk9, and openjdk10), and are presenting a cross-section of them in the table below.  While some of the functional and regression tests were flat or saw modest gains, we saw impressive improvements in our load/system tests.  For background, some of these system tests create hundreds or thousands of threads, and loop through the particular tests thousands of times.  In the case of the sanity group of system tests, we went from a typical 1 hr execution time to 20 minutes, while the extended set of system tests saw an average 2.25 hr execution time drop to 34 minutes.

To put the system test example in perspective, and looking at our daily builds at AdoptOpenJDK, on the x86-64_linux platform, we have typically 3 OpenJDK versions x 2 OpenJDK implementations, plus a couple of other special builds under test, so 8 test runs x 3.25 hrs = 26 daily execution hours on our current machines.  If we switched over to the Intel® Optane™ machine on Packet, would drop to 7.2 daily execution hours.  A tremendous savings, allowing us to free up machine time for other types of testing, or increase the amount of system and load testing we do per build.

The implication?  For applications that behave like those system tests, (those that create lots of threads and iterate many times across sets of methods, including many GUI-based applications or servers that maintain a 1:1 thread to client ratio), there may be a compelling story to shift.

System & functional test execution times
System and functional test results and average execution times

Having this opportunity from Packet, has provided us a great impetus to forge into “open performance testing” story for OpenJDK implementations and some of our next steps at AdoptOpenJDK.  We have started to develop tools to improve our ability to run and analyze results.  We have begun to streamline and automate performance benchmarks into our CI pipelines.  We have options for bare-metal machines, which gives us isolation and therefore confidence that results are not contaminated by other services sharing machine resources.  Thanks to Beverly, Piyush, Lan and Awsaf for getting some of this initial testing going at AdoptOpenJDK.  While there is a lot more to do, I look forward to seeing how it will evolve and grow into a compelling story for the OpenJDK community.

Special thanks to Vijay, for taking the time to share with me some of his thoughtful insights and great knowledge!  He mentioned with respect to Intel Skylake, there are MANY other opportunities to explore and leverage including some of its memory technologies for Java™ heap object optimization, and some of the newer instructions for improved GC pause times.  We encourage more opportunities to experiment and investigate, and invite any and all collaborators to join us.  It is an exciting time for OpenJDK implementations, innovation happens in the open, with the help of great collaborators, wonderful partners and sponsors!

** SPECjbb®2015 is a registered trademark of the Standard Performance Evaluation Corporation (SPEC).

JCK Certification and An Anniversary of Sorts

Exactly a year ago today, Tim Ellison sent me a note.  He had just watched a presentation I had recorded, talking about the work my team had started to vastly ‘simplify Java testing’.

He mentioned that there was this project he was involved with, “AdoptOpenJDK”, where they were talking about some of the same concepts that we were implementing.  He wondered if what we had started implementing could be used at this project.  I replied, “sure, by when”.  His answer, “last week”.

Here we are, 1 year later, diligently improving the way we test Java.  I am witnessing the vision that we had laid out over a year ago of “make test… better” become reality.  It is a collaborative and fun effort!  Running all kinds of testing, and very notably this week, AdoptOpenJDK project is able to claim its first JCK certified builds, starting with openjdk8-openj9 builds on 3 Linux platforms (x64/ppc64le/s390x).  See the check marks in the openjdk8-openj9 build archive  (also available in Docker images).

I really do feel lucky to be part of this project, and to work with the small but dedicated team of folks who make it fly.  A big thank you and congratulations to the team on this anniversary of sorts, and oh how you capped it off with the JCK compliant icing!  I can only imagine what it will look like a year from now, as we continue to innovate, refine and deliver on our goals.

Adding third party application tests at AdoptOpenJDK

Adding third party application tests into the automated testing at AdoptOpenJDK is easier now than it has ever been!

There exists a template project in the openjdk-test repository that can be customized to fit any third party application and run as part of the External Tests builds at Adopt.

The thirdparty_containers README file outlines the steps involved in this process.

For a practical example, you may also want to go over the following demo. It shows how we have added the Lucene-Solr nightly smoke test to the list of third party application tests at Adopt.

Please feel free to grow the list of third party application tests we run in AdoptOpenJDK.

For any questions or queries, reach out to us on our testing slack channel.

Testing Java: Help Me Count the Ways

Wow! Hard to believe how much progress has been made since I first posted a mission statement of sorts… (in Part 1: Testing Java: Let Me Count the Ways).  As I look back over 2017, and assess where we are at with testing the OpenJDK binaries being produced at adoptopenjdk.net, I am prompted to write this “Part 2” blog post.  The intent is to share the status and some of the accomplishments of the talented and dedicated group of individuals that are contributing their time, skills and effort towards the goal of fully open-source, tested and compliant binaries.

We have added 4 test categories to date (which constitute tens of thousands of tests, running on several platforms, with more to come as machines become available):

  • OpenJDK regression tests (“make openjdk“) – from OpenJDK
  • system (and stress) tests (“make system“) – from AdoptOpenJDK/openjdk-systemtest
  • 3rd party application tests (“make external“) – the unit tests from each application’s github repo, such as Scala, Derby, Cassandra, etc.
  • compliance (JCK) tests (“make jck“) – under OCTLA License (OpenJDK Community TCK License Agreement)

With 2 more test categories on the way:

  • functional tests (“make functional“) – from Eclipse/openj9
  • performance benchmarks (“make perf“) – from various open-source github repos

To make it easy to run these tests, we’ve added an intentionally thin wrapper that allows us to call some logical make targets to execute these tests.  So the ways that we can tag and categorize the test material include by:

  • Test group (as listed above, openjdk, system, external, jck, functional, perf)
  • Java version (we currently test Java8, Java9 and Java10 builds)
  • VM implementation (we currently test OpenJDK with Hotspot, and OpenJDK with OpenJ9)
  • Test level (for example, a quick check for pull request builds, we can tag a subset of tests from any group with “sanity”, to run the entire sanity set, “make sanity“, to run just the subset of openjdk or system tests that have been tagged then “make sanity.openjdk” and “make sanity.system” respectively)

For more details on this, please check out the presentation Test Categorization and Test Pipelines at AdoptOpenJDK.

There is also an open discussion on setting the test criteria for marking a build “good”.  With the most stringent bar applied to the release builds, nightly builds will have less testing (with constraints such as machine resources and time as factors), and PR builds as mentioned being a targeted sanity check.

We are still getting some of these test builds up and running.  And this is where the call for assistance comes in…  We would love to have extra hands and eyes on the tests at AdoptOpenJDK.  While there are too many tasks to list in this post, here are some meaty work items to pique your interest:

  • Triaging any of the open test issues, we know some are likely due to test or machine configuration irregularities, while others are known OpenJDK issues.  Properly annotating the current set of failures and rerunning/re-including fixed issues are top of the TODO list.
  • Enabling more 3rd party application tests (currently Scala, and shortly Derby and Solr/Lucene tests are running, with the opportunity to include many more).
  • A large percentage of the JCKs are automated.  There is however a set of these compliance tests that are manual/interactive.  We are looking for some dedicated volunteers to help step through the interactive tests on different platforms.
  • We have automated these tests in Jenkins Pipeline builds, and want to continue adding builds for the various platforms that the binaries are built on, extra hands here would also be very helpful

Seeing all this come together has been very rewarding.  It has been wonderful to work with the capable and dedicated folks working on the AdoptOpenJDK project.  There is still a long way to go, but we have a great base to start from, and it seems we can make a big difference by offering a truly open and flexible approach to testing Java.  If you really want to learn more about Java, join us in testing it!

Playing with Test Microservices

Recently, I had the good fortune to speak (fondly!) about some excellent open-source projects that I participate in…  AdoptOpenJDK, Eclipse OpenJ9 and Eclipse OMR.  Across those three projects, we are trying to simplify the activity of testing.  One of the great side-effects of simpler, easier testing is that it frees up time to do some fun work around building microservices for test.  We hope that we can eventually bring the best of these microservices into the open also, to help all open-source endeavours be even better at the activities of test.

If you are interested in hearing more about microservices for test, please give a listen to “Cloud-based Test Microservices“, a presentation which I gave at the Eurostar conference in Copenhagen this year.


New contribution of test cases to AdoptOpenJDK

We have contributed additional tests and a test framework the tests depend on, to the AdoptOpenJDK project in these repositories: openjdk-systemtest and stf.

The tests are longer running tests which cannot be automated using standard unit test frameworks (such as JUnit or TestNG). They fall under the broad category of ‘system tests’ – tests which attempt to simulate running production workloads and specific user scenarios.

The tests are mostly load tests which run a collection of java methods (typically discrete test case methods) for either a set number of invocations, or a period of time, using one or more threads. Although the tests can (and do) find functional issues executing the individual methods in the load, that is not the main goal of the testing. Running a workload for a period of time is often the most effective way to identify defects in the java virtual machine such as the garbage collector and the dynamic java compiler.  See the load test tool documentation for more details of these tests.

Other tests are more multi-process, multi-step in nature. For example, tests which use the Java Debugging Interface (JDI) to examine a test program running in a second JVM, and tests which perform remote operations on a second JVM running a workload, such as attaching a Java transformer agent.  See here for more details of these tests.

You can download and execute the tests locally – follow the instructions at openjdk-systemtest. You will also find additional documentation about the tests and the test tooling in the repositories.


Testing Java: Let Me Count the Ways

For years now, I have been testing Java and if there is a single statement to make about that activity, it is that there are many, many, many ways to test a Java Virtual Machine (JVM).

From code reviews and static analysis, to unit and functional tests, through 3rd party application tests and various other large system tests, to stress/load/endurance tests and performance benchmarks, we have a giant set of tests, tools and test frameworks at our disposal.  Even the opcode testing in the Eclipse OMR project helps to test a JVM.  From those low-level tests, all the way up to running some Derby or solr/Lucene community tests, or Acme Air benchmark there are many ways to reveal defects.  If we can find them, we can fix them… (almost always a true statement).

One common request from developers is “make it easier for me to test”.  Over the last year, we have been working on that very request.  Recently, I’ve had the good fortune to become involved in the AdoptOpenJDK project.  Through that project, we have delivered a lightweight wrapper to loosely tie together the various tests and test frameworks that we use to test Java.

We currently run the set of regression tests from OpenJDK (nearly 6000 test cases).  Very soon, we will be enabling more functional and system-level tests at that project.

My goal with that project is to make it super easy to run, add, edit, exclude, triage, rerun and report on tests.  To achieve that goal, we should:

  • create common ways of working with tests (even if they use different frameworks, or focus on different layers in the software stack)
  • limit test code/infrastructure bloat
  • choose open-source tools and frameworks
  • keep technical ego in check to restrict needless complexity for little or no gain in functionality

There is a lot of work ahead, but so far, its been fun and challenging.  If you are interested in helping out on this grand adventure, please check out the project to see how to get involved at AdoptOpenJDK.