Test Result Summary Service (TRSS)

As we support more and more tests, projects, and Jenkins servers, monitoring build health and triaging test failures daily is quickly becoming an overwhelming task. We currently maintain 6+ Jenkins servers, both internally and externally.

Some examples of the external Jenkins servers we monitor:
https://ci.adoptopenjdk.net/
https://ci.eclipse.org/openj9/
https://ci.eclipse.org/omr/

Motivation/Challenge:

Four main challenges motivated us to develop TRSS:

Multiple Jenkins servers to monitor

Even though each Jenkins server uses plugins to send build status messages, without an overview of build status across servers it is hard to triage failures.

Need for longer-term storage of test results

Secondly, we may want to keep results for a longer period of time for some tests, so that we can compare them with historical runs. For example, we may want to keep performance test results for months or even years. Jenkins servers often have limited storage, so we can only retain a limited number of builds.

Need for specialized views like side-by-side comparison

The third problem is that we lack a tool to view and compare test results. Some types of tests are best compared against previous releases, previous builds, or other platforms within the same build. Additionally, some types of tests are best displayed as graphs so that trends are visible.

Desire for customized views (tailored by each user)

Last but not least, different users are interested in different builds. For example, developers may want to monitor only their own personal builds. The FV team may only be interested in functional test builds, and the SV team only in system test builds. A project manager may want to know the overall status of test builds.

We wanted a tool that monitors multiple Jenkins servers and displays different types of test build results and their history (test log files, test result comparisons across builds/platforms, trend graphs, etc.). It also needs to be highly customizable per user.

Solution:

To solve the issues listed above, we have started building a thin-layer service called the Test Result Summary Service (TRSS). You can find the source at https://github.com/AdoptOpenJDK/openjdk-test-tools.

Key Features:
1) Personalized Dashboard

TRSS can monitor multiple Jenkins servers in real time. Users can add/remove builds and add/remove/rearrange panels and widgets. Each user's modifications are stored in their browser's local storage, so everyone can have their own customized view and keep their configuration without interfering with others (see the sketch below).

[Figure: Personalized Dashboard]
[Figure: Personalized Dashboard – add/remove builds]
[Figure: Personalized Dashboard – test trend]
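
For the curious, the local-storage mechanics are straightforward. Below is a minimal sketch of the idea (the storage key, configuration shape, and job name are illustrative, not TRSS's actual schema):

```javascript
// Sketch of persisting a per-user dashboard layout in browser local storage.
// The storage key and config shape are illustrative, not TRSS's actual schema.
const STORAGE_KEY = 'dashboardConfig';

function saveDashboardConfig(config) {
  // localStorage only stores strings, so serialize the layout to JSON.
  localStorage.setItem(STORAGE_KEY, JSON.stringify(config));
}

function loadDashboardConfig() {
  const saved = localStorage.getItem(STORAGE_KEY);
  // Fall back to an empty layout for first-time visitors.
  return saved ? JSON.parse(saved) : { panels: [] };
}

// Example: add a build-monitor widget and persist the change for this user only.
const config = loadDashboardConfig();
config.panels.push({
  type: 'buildMonitor',
  server: 'https://ci.adoptopenjdk.net/',
  job: 'openjdk8-pipeline' // hypothetical job name
});
saveDashboardConfig(config);
```

Because nothing is written server-side, one user's layout never interferes with another's; clearing browser storage simply resets the view to the default.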
2) Test Result View

In addition to monitoring multiple Jenkins servers in real time, TRSS also stores test history data.

[Figure: Test Build View]

Downstream builds launched by the parent build pipelines above:

[Figure: Downstream Build View]

A list of all tests within the build. In this view, TRSS displays test name, test result, test duration, and test result history. Columns can be sorted or filtered (for example, to show only FAILED tests or to sort them to the top of the list).

[Figure: Test Result Summary]

From the view above, we can easily tell that cmdLineTester_gcsuballoctests_0 failed. All Platforms shows the cmdLineTester_gcsuballoctests_0 test results for all platforms and JDK versions in the build.

[Figure: Test in All Platforms]

Deep History shows the cmdLineTester_gcsuballoctests_0 execution history on Linux s390.

[Figure: Test History]

TRSS also displays the test output (as one would see in an individual Jenkins server's console view of the test build). Below is the cmdLineTester_gcsuballoctests_0 test output.

[Figure: Test Output]
3) Test Compare

TRSS can compare any test output (regardless of test type, build, platform, etc.). Given the Jenkins server, build name, build number, and test name, TRSS searches the database and compares test output side by side. This is an extremely simple but effective way to speed up triage: compare a passed build to a failed one and quickly identify the differences.

[Figure: Test Output Compare]
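
Conceptually, the comparison boils down to aligning two stored console outputs line by line. Here is a rough sketch of that idea (a naive index-based pairing, not TRSS's actual diff logic):

```javascript
// Naive sketch of a side-by-side comparison of two stored test outputs.
// Real diff tools align lines more intelligently; this pairs them by index.
function compareOutputs(passedOutput, failedOutput) {
  const left = passedOutput.split('\n');
  const right = failedOutput.split('\n');
  const rows = [];
  const len = Math.max(left.length, right.length);
  for (let i = 0; i < len; i++) {
    const l = left[i] !== undefined ? left[i] : '';
    const r = right[i] !== undefined ? right[i] : '';
    // The client can render rows and highlight the ones where same is false.
    rows.push({ left: l, right: r, same: l === r });
  }
  return rows;
}
```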

Implementation Detail:

TRSS uses Node.js for the server and React for the client. It actively monitors multiple Jenkins servers and their jobs, parses the job output, and stores the parsed data in MongoDB. If needed, it also stores links to Artifactory for extra data (e.g., logs, core files, etc.).

[Figure: TRSS Overview]

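To make the flow concrete, here is a minimal sketch of the monitor-parse-store loop (a simplification for illustration, not TRSS's actual code; the job name, database name, and field choices are placeholders). It polls the Jenkins JSON API and upserts parsed build data into MongoDB:

```javascript
// Minimal sketch of the monitor-parse-store loop (not TRSS's actual code).
// Assumes Node 18+ (global fetch) and the official mongodb driver.
const { MongoClient } = require('mongodb');

async function monitorJob(jenkinsUrl, jobName, builds) {
  // Jenkins exposes build metadata as JSON at .../api/json.
  const res = await fetch(`${jenkinsUrl}/job/${jobName}/lastBuild/api/json`);
  const build = await res.json();

  // Keep the fields we care about, and upsert so re-polling the
  // same build does not create duplicate documents.
  await builds.updateOne(
    { url: build.url },
    {
      $set: {
        jobName,
        number: build.number,
        result: build.result,       // e.g. SUCCESS, UNSTABLE, FAILURE
        duration: build.duration,   // milliseconds
        timestamp: build.timestamp
      }
    },
    { upsert: true }
  );
}

async function main() {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const builds = client.db('trss').collection('builds');
  // Poll once a minute; the real service tracks many servers and jobs.
  setInterval(() => {
    monitorJob('https://ci.adoptopenjdk.net', 'openjdk8-pipeline', builds)
      .catch(console.error);
  }, 60 * 1000);
}

main().catch(console.error);
```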

If special data is needed for display (e.g., a new measurement), users can easily add parser code to the TRSS server and client so that different types of tests can be parsed and displayed.
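
For example, a parser for a hypothetical throughput measurement might scan the console output with a regular expression (the function shape and printed format below are invented for illustration, not TRSS's actual plugin interface):

```javascript
// Illustrative sketch of a custom output parser for a new measurement.
// The function shape is hypothetical, not TRSS's actual plugin interface.
function parseThroughput(consoleOutput) {
  // Suppose the benchmark prints lines such as: "Throughput: 1234.5 ops/sec"
  const pattern = /Throughput:\s*([\d.]+)\s*ops\/sec/g;
  const values = [];
  let match;
  while ((match = pattern.exec(consoleOutput)) !== null) {
    values.push(parseFloat(match[1]));
  }
  // Stored alongside the build record so the client can graph trends over time.
  return { metric: 'throughput', values };
}
```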

In a multi-server, multi-project scenario, TRSS is a lightweight, customizable, open-source solution for monitoring, displaying, comparing, and triaging test results, and for storing historical test data. The tool itself is project-agnostic and can be applied to any Jenkins-based builds or projects.

We are still in the early development stage. TRSS is a stepping stone for us to create and integrate with other microservices. As a future enhancement, we may tie into Watson Analytics and try out cognitive triage experiments. If you are interested in helping build and improve this project, please engage with us in the AdoptOpenJDK #testing Slack channel. 🙂

Jumpstarting Open Performance Testing

Before I dabble in the juicy world of computer architectures and measuring and understanding performance implications, let me preface this entire post with a quick introduction to myself.

I am not a performance analyst, nor am I a low-level software developer trying to optimize algorithms to squeeze out the last ounce of performance on particular hardware.

While I am fortunate to work with people who have those skills, I myself am a ‘happy-go-lucky’, high-level software developer. My focus in the last few years has been developing my skills as a verification expert. I have a particular interest in finding solutions that make testing software easier and more effective. One flavour of software verification is performance testing.

While I am fairly new to performance benchmarking, I am experienced in streamlining processes and tools to reduce the friction around common tasks. If we want to empower developers to benchmark and test the performance impact of their changes, we need to create tools and workflows that are dead easy. I personally need it to be dead easy! “Not only am I the hair club president, I am also a client.” I want to be able to easily run performance benchmarks and, at some level, understand the results of those benchmarks. (This seems like a good time to segue to the recent open-sourcing of tools that help in that effort, PerfNext/TRSS and Bumblebench… more to come on that later in preparation for “Performance Testing for Everyone” at EclipseCon Europe.)

But back to the story at hand: a wonderful opportunity presented itself. We have the great fortune at the AdoptOpenJDK project to work with many different teams, groups, and sponsors. Packet, a cloud provider of bare-metal servers, is one of our sponsors; they donate machine time to the project, allowing us to provide pre-built and tested OpenJDK binaries from build scripts and infrastructure. They are very supportive of open-source projects, and recently offered us some time on one of their new Intel® Optane™ SSD servers (with Intel's Skylake microarchitecture).

Packet and AdoptOpenJDK share the mutual goal of understanding how these machines affect Java™ Virtual Machine (JVM) performance. Admittedly, I attempted to parse all of the information found in the Intel® 64 and IA-32 Architectures Optimization Manual, but needed some help. Skylake improves on its Haswell and Broadwell predecessors. Luckily, Vijay Sundaresan, WAS and Runtimes Performance Architect, took the time to summarize some features of the Skylake architecture. He outlined the features that have the greatest impact on JVM performance and are therefore of great interest to JVM developers. Among the improvements he listed:

  • Skylake's 1.5X memory bandwidth, higher memory capacity at a lower cost per GB than DRAM, and better memory resiliency
  • Skylake's cache memory hierarchy is quite different from Broadwell's, with one of the bigger changes being that the last-level cache stopped being inclusive
  • Skylake also added AVX-512 (512-bit vector operations), a 2X improvement over AVX-256 (256-bit vector operations)

Knowing those particular improvements and how a JVM implementation leverages them, we hoped to see a 10-20% improvement in per-core performance. This would be in keeping with the Intel®-published SPECjbb®2015 benchmark** scores (the de facto standard Java™ server benchmark), which show improvements in that range.

We were not disappointed. We decided to run variants of the ODM benchmark. This benchmark exercises a rules engine typically used to automate complex business decisions; think analytics (compliance auditing for the banking or insurance industries, as an example use case). Ultimately, the benchmark processes input files: in one variant, a small set of 5 rules; in the other, a much larger set of 300 rules. The measurement tracks how many times a rule can be processed per second; in other words, it measures the throughput of the rules engine with different kinds of rules as input. This benchmark does a lot of String/Date/Integer-heavy processing and comparison, as those are common datatypes in the input files. Based on an average of the benchmark runs on the Packet machine, we saw healthy improvements of 13% and 20% in the two scenarios used.

[Figure: Summary of ODM results]
[Figure: ODM results from PerfNext/TRSS graph view]

We additionally ran some of the other tests we use to verify AdoptOpenJDK builds on this machine, to compare execution times. We selected a variety of OpenJDK implementations (hotspot and openj9) and versions (openjdk8, openjdk9, and openjdk10), and present a cross-section of them in the table below. While some of the functional and regression tests were flat or saw modest gains, we saw impressive improvements in our load/system tests. For background, some of these system tests create hundreds or thousands of threads and loop through the particular tests thousands of times. In the case of the sanity group of system tests, we went from a typical 1 hr execution time to 20 minutes, while the extended set of system tests saw an average 2.25 hr execution time drop to 34 minutes.

To put the system test example in perspective, consider our daily builds at AdoptOpenJDK on the x86-64_linux platform. We typically have 3 OpenJDK versions x 2 OpenJDK implementations, plus a couple of other special builds under test, so 8 test runs x 3.25 hrs = 26 daily execution hours on our current machines. If we switched over to the Intel® Optane™ machine on Packet, that total would drop to 8 runs x 0.9 hrs (20 minutes + 34 minutes per run) = 7.2 daily execution hours. A tremendous savings, allowing us to free up machine time for other types of testing, or to increase the amount of system and load testing we do per build.

The implication? For applications that behave like those system tests (those that create lots of threads and iterate many times across sets of methods, including many GUI-based applications or servers that maintain a 1:1 thread-to-client ratio), there may be a compelling story to shift.

System & functional test execution times
System and functional test results and average execution times

This opportunity from Packet has provided a great impetus to forge an “open performance testing” story for OpenJDK implementations, and it has shaped some of our next steps at AdoptOpenJDK. We have started to develop tools that improve our ability to run benchmarks and analyze their results. We have begun to streamline and automate performance benchmarks in our CI pipelines. We have options for bare-metal machines, which give us isolation and therefore confidence that results are not contaminated by other services sharing machine resources. Thanks to Beverly, Piyush, Lan, and Awsaf for getting some of this initial testing going at AdoptOpenJDK. While there is a lot more to do, I look forward to seeing how it will evolve and grow into a compelling story for the OpenJDK community.

Special thanks to Vijay for taking the time to share some of his thoughtful insights and great knowledge! He mentioned that, with respect to Intel Skylake, there are MANY other opportunities to explore and leverage, including some of its memory technologies for Java™ heap object optimization and some of the newer instructions for improved GC pause times. We encourage more opportunities to experiment and investigate, and invite any and all collaborators to join us. It is an exciting time for OpenJDK implementations; innovation happens in the open, with the help of great collaborators, wonderful partners, and sponsors!

** SPECjbb®2015 is a registered trademark of the Standard Performance Evaluation Corporation (SPEC).