Jumpstarting Open Performance Testing

Before I dabble in the juicy world of computer architectures and of measuring and understanding their performance implications, let me preface this entire post with a quick introduction to myself.

I am not a performance analyst, nor am I a low-level software developer trying to optimize algorithms to squeeze out the last ounce of performance on particular hardware.

While I am fortunate to work with people who have those skills, I myself am a ‘happy-go-lucky’ / high-level software developer.  My focus in the last few years has been developing my skills as a verification expert.  I have a particular interest in finding solutions that make testing software easier and more effective.  One flavour of software verification is performance testing.

While I am fairly new to performance benchmarking, I am experienced in streamlining processes and tools to reduce the friction around common tasks.  If we want to empower developers to benchmark and test the performance impact of their changes, we need to create tools and workflows that are dead easy.  I personally need it to be dead easy!  “Not only am I the hair club president, I am also a client“.   I want to be able to easily run performance benchmarks and, at some level, understand the results of those benchmarks.  (This seems like a good time to segue to the recent open-sourcing of tools that help in that effort, PerfNext/TRSS and Bumblebench… more to come on that later in preparation for “Performance Testing for Everyone” at EclipseCon Europe).
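To give a flavour of what “dead easy” could look like at the lowest level, here is a minimal, self-timed throughput microbenchmark sketch in plain Java.  It is purely illustrative: the class and method names are my own, it is not the BumbleBench API, and a real harness would add more warmup passes and statistics.

```java
// A minimal throughput microbenchmark sketch (illustrative only).
public class SimpleMicrobench {

    // The operation under test; swap in whatever you want to measure.
    static long workload(long iterations) {
        long acc = 0;
        for (long i = 0; i < iterations; i++) {
            acc += Long.numberOfTrailingZeros(i | 1L);
        }
        return acc;
    }

    public static void main(String[] args) {
        final long iterations = 50_000_000L;

        // Warmup pass so the JIT has a chance to compile the hot path.
        long sink = workload(iterations);

        // Timed pass.
        long start = System.nanoTime();
        sink += workload(iterations);
        long elapsedNs = System.nanoTime() - start;

        double opsPerSec = iterations / (elapsedNs / 1_000_000_000.0);
        System.out.printf("%,.0f ops/sec (sink=%d)%n", opsPerSec, sink);
    }
}
```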

But back to the current story: a wonderful opportunity presented itself.   We have the great fortune at the AdoptOpenJDK project to work with many different teams, groups and sponsors.  Packet, a cloud provider of bare-metal servers, is one of our sponsors, donating machine time that allows the project to provide pre-built and tested OpenJDK binaries from build scripts and infrastructure.  They are very supportive of open-source projects, and recently offered us some time on one of their new Intel® Optane™ SSD servers (built on Intel's Skylake microarchitecture).

Packet and AdoptOpenJDK share the mutual goal of understanding how these machines affect Java™ Virtual Machine (JVM) performance.  Admittedly, I attempted to parse all of the information found in the Intel® 64 and IA-32 Architectures Optimization Manual, but needed some help.  Skylake improves on its Haswell and Broadwell predecessors.  Luckily, Vijay Sundaresan, WAS and Runtimes Performance Architect, took the time to summarize some features of the Skylake architecture.  He outlined the features that have the greatest impact on JVM performance, and that are therefore of great interest to JVM developers.  Among the improvements he listed:

  • Skylake’s 1.5X memory bandwidth, higher memory capacity at a lower cost per GB than DRAM, and better memory resiliency
  • Skylake’s cache memory hierarchy is quite different from Broadwell’s, with one of the bigger changes being that the last-level cache stopped being inclusive
  • Skylake also added AVX-512 (512-bit vector operations), doubling the vector width of AVX2 (256-bit vector operations); see the sketch after this list for the kind of loop this speeds up
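
To make the AVX-512 item concrete, the loop below is the kind of simple, data-parallel Java code a JIT compiler can auto-vectorize.  With AVX-512, a single vector instruction can process sixteen 32-bit lanes (512/32), versus eight with AVX2 (256/32), which is where the potential 2X comes from.  This is a hand-rolled illustration under the assumption that the JIT chooses to vectorize it; whether it does depends on the JVM implementation, version and options.

```java
// Element-wise multiply-add: the shape of loop a JIT may auto-vectorize.
public class VectorizableLoop {

    // out[i] = a[i] * b[i] + c[i] for every element.
    static void fma(float[] a, float[] b, float[] c, float[] out) {
        for (int i = 0; i < out.length; i++) {
            out[i] = a[i] * b[i] + c[i];
        }
    }

    public static void main(String[] args) {
        int n = 1 << 20;
        float[] a = new float[n], b = new float[n], c = new float[n], out = new float[n];
        java.util.Arrays.fill(a, 1.5f);
        java.util.Arrays.fill(b, 2.0f);
        java.util.Arrays.fill(c, 0.5f);
        fma(a, b, c, out);
        System.out.println(out[0]); // prints 3.5
    }
}
```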

Knowing of those particular improvements, and how a JVM implementation can leverage them, we hoped to see a 10-20% improvement in per-core performance.  This would be in keeping with Intel®’s published SPECjbb®2015 benchmark** (the de facto standard Java™ Server Benchmark) scores, which show improvements in that range.

We were not disappointed.  We decided to run variants of the ODM benchmark.  This benchmark exercises a Rules engine of the kind typically used to automate complex business decisions, think analytics (compliance auditing for the Banking or Insurance industries, as a use-case example).  Ultimately, the benchmark processes input files: in one variant a small set of 5 rules was used, in the other a much larger set of 300 rules.  The measurement tracks how many times a rule can be processed per second; in other words, it measures the throughput of the Rules engine with different kinds of rules as inputs.  This benchmark does a lot of String/Date/Integer-heavy processing and comparison, as those are common datatypes in the input files.  Based on an average of the benchmark runs on the Packet machine, we saw healthy improvements of 13% and 20% in the 2 scenarios used.

Figure: Summary of ODM results
Figure: ODM results from PerfNext/TRSS graph view
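
For readers unfamiliar with this style of measurement, the sketch below shows the general shape of a rules-throughput harness: repeatedly evaluate a set of rules against input records for a fixed window, then report evaluations per second.  The Predicate-based “rules” and record strings are hypothetical placeholders for illustration; this is not the ODM engine API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// General shape of a rules-throughput measurement (illustrative only).
public class RulesThroughput {

    static double evaluationsPerSecond(List<Predicate<String>> rules,
                                       List<String> records, long windowMs) {
        long deadline = System.currentTimeMillis() + windowMs;
        long evaluations = 0;
        boolean sink = false; // keep results live so work isn't optimized away
        while (System.currentTimeMillis() < deadline) {
            for (String record : records) {
                for (Predicate<String> rule : rules) {
                    sink ^= rule.test(record);
                    evaluations++;
                }
            }
        }
        if (sink) System.out.print(""); // consume the sink
        return evaluations / (windowMs / 1000.0);
    }

    public static void main(String[] args) {
        // String/Date/Integer-flavoured checks, echoing the datatypes above.
        List<Predicate<String>> rules = Arrays.asList(
                s -> s.contains("2018"),
                s -> s.length() > 10,
                s -> s.toUpperCase().contains("AUDIT"));
        List<String> records = Arrays.asList(
                "AUDIT-2018-000123", "claim-77", "policy-2018-99");
        System.out.printf("%,.0f rule evaluations/sec%n",
                evaluationsPerSecond(rules, records, 2_000));
    }
}
```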

We additionally ran some of our other tests used to verify AdoptOpenJDK builds on this machine to compare the execution times… We selected a variety of OpenJDK implementations (hotspot and openj9), and versions (openjdk8, openjdk9, and openjdk10), and are presenting a cross-section of them in the table below.  While some of the functional and regression tests were flat or saw modest gains, we saw impressive improvements in our load/system tests.  For background, some of these system tests create hundreds or thousands of threads, and loop through the particular tests thousands of times.  In the case of the sanity group of system tests, we went from a typical 1 hr execution time to 20 minutes, while the extended set of system tests saw an average 2.25 hr execution time drop to 34 minutes.

To put the system test example in perspective, looking at our daily builds at AdoptOpenJDK on the x86-64_linux platform, we typically have 3 OpenJDK versions x 2 OpenJDK implementations, plus a couple of other special builds under test, so 8 test runs x 3.25 hrs = 26 daily execution hours on our current machines.  If we switched over to the Intel® Optane™ machine on Packet, that would drop to 7.2 daily execution hours (the combined 54-minute run time works out to 0.9 hrs x 8 runs).  A tremendous savings, allowing us to free up machine time for other types of testing, or increase the amount of system and load testing we do per build.

The implication?  For applications that behave like those system tests (those that create lots of threads and iterate many times across sets of methods, including many GUI-based applications or servers that maintain a 1:1 thread-to-client ratio), there may be a compelling case for making the switch.
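
As a rough illustration of the shape of such a test (a sketch for this post, not the actual AdoptOpenJDK system test code), a thread-heavy load test boils down to something like this:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Rough shape of a system/load test: many threads, each looping a unit of
// work thousands of times (illustrative only).
public class MiniLoadTest {

    // Stand-in for a real test case body.
    static void unitOfWork() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100; i++) sb.append(i);
        if (sb.length() == 0) throw new AssertionError("unexpected");
    }

    public static void main(String[] args) throws InterruptedException {
        final int threads = 500;   // "hundreds or thousands of threads"
        final int loops = 10_000;  // "loop ... thousands of times"
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < loops; i++) unitOfWork();
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        System.out.printf("Completed in %.1f s%n", (System.nanoTime() - start) / 1e9);
    }
}
```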

Figure: System and functional test results and average execution times

This opportunity from Packet has provided a great impetus to forge an “open performance testing” story for OpenJDK implementations, and has shaped some of our next steps at AdoptOpenJDK.  We have started to develop tools to improve our ability to run benchmarks and analyze results.  We have begun to streamline and automate performance benchmarks into our CI pipelines.  We have options for bare-metal machines, which gives us isolation and therefore confidence that results are not contaminated by other services sharing machine resources.  Thanks to Beverly, Piyush, Lan and Awsaf for getting some of this initial testing going at AdoptOpenJDK.  While there is a lot more to do, I look forward to seeing how it will evolve and grow into a compelling story for the OpenJDK community.

Special thanks to Vijay, for taking the time to share with me some of his thoughtful insights and great knowledge!  He mentioned that, with respect to Intel Skylake, there are MANY other opportunities to explore and leverage, including some of its memory technologies for Java™ heap object optimization and some of the newer instructions for improved GC pause times.  We encourage more opportunities to experiment and investigate, and invite any and all collaborators to join us.  It is an exciting time for OpenJDK implementations; innovation happens in the open, with the help of great collaborators, wonderful partners and sponsors!

** SPECjbb®2015 is a registered trademark of the Standard Performance Evaluation Corporation (SPEC).

JCK Certification and An Anniversary of Sorts

Exactly a year ago today, Tim Ellison sent me a note.  He had just watched a presentation I had recorded, in which I talked about the work my team had started to vastly ‘simplify Java testing’.

He mentioned that there was this project he was involved with, “AdoptOpenJDK”, where they were talking about some of the same concepts that we were implementing.  He wondered if what we had started implementing could be used at this project.  I replied, “sure, by when”.  His answer, “last week”.

Here we are, 1 year later, diligently improving the way we test Java.  I am witnessing the vision we laid out over a year ago of “make test… better” become reality.  It is a collaborative and fun effort!  We are running all kinds of testing, and very notably this week, the AdoptOpenJDK project is able to claim its first JCK-certified builds, starting with openjdk8-openj9 builds on 3 Linux platforms (x64/ppc64le/s390x).  See the check marks in the openjdk8-openj9 build archive (also available in Docker images).

I really do feel lucky to be part of this project, and to work with the small but dedicated team of folks who make it fly.  A big thank you and congratulations to the team on this anniversary of sorts, and oh how you capped it off with the JCK compliant icing!  I can only imagine what it will look like a year from now, as we continue to innovate, refine and deliver on our goals.

Testing Java: Help Me Count the Ways

Wow! Hard to believe how much progress has been made since I first posted a mission statement of sorts… (in Part 1: Testing Java: Let Me Count the Ways).  As I look back over 2017, and assess where we are at with testing the OpenJDK binaries being produced at adoptopenjdk.net, I am prompted to write this “Part 2” blog post.  The intent is to share the status and some of the accomplishments of the talented and dedicated group of individuals that are contributing their time, skills and effort towards the goal of fully open-source, tested and compliant binaries.

We have added 4 test categories to date (which constitute tens of thousands of tests, running on several platforms, with more to come as machines become available):

  • OpenJDK regression tests (“make openjdk“) – from OpenJDK
  • system (and stress) tests (“make system“) – from AdoptOpenJDK/openjdk-systemtest
  • 3rd party application tests (“make external“) – the unit tests from each application’s github repo, such as Scala, Derby, Cassandra, etc.
  • compliance (JCK) tests (“make jck“) – under OCTLA License (OpenJDK Community TCK License Agreement)

With 2 more test categories on the way:

  • functional tests (“make functional“) – from Eclipse/openj9
  • performance benchmarks (“make perf“) – from various open-source github repos

To make it easy to run these tests, we’ve added an intentionally thin wrapper that allows us to call some logical make targets to execute them.  The ways that we can tag and categorize the test material include by:

  • Test group (as listed above, openjdk, system, external, jck, functional, perf)
  • Java version (we currently test Java8, Java9 and Java10 builds)
  • VM implementation (we currently test OpenJDK with Hotspot, and OpenJDK with OpenJ9)
  • Test level (for example, as a quick check for pull request builds, we can tag a subset of tests from any group with “sanity”; “make sanity“ runs the entire sanity set, while “make sanity.openjdk“ and “make sanity.system“ run just the tagged subsets of the openjdk and system tests respectively)

For more details on this, please check out the presentation Test Categorization and Test Pipelines at AdoptOpenJDK.

There is also an open discussion on setting the test criteria for marking a build “good”.  The most stringent bar is applied to release builds; nightly builds will have less testing (with constraints such as machine resources and time as factors), and PR builds, as mentioned, are a targeted sanity check.

We are still getting some of these test builds up and running.  And this is where the call for assistance comes in…  We would love to have extra hands and eyes on the tests at AdoptOpenJDK.  While there are too many tasks to list in this post, here are some meaty work items to pique your interest:

  • Triaging any of the open test issues: we know some are likely due to test or machine configuration irregularities, while others are known OpenJDK issues.  Properly annotating the current set of failures and rerunning/re-including fixed issues are at the top of the TODO list.
  • Enabling more 3rd party application tests (currently Scala, and shortly Derby and Solr/Lucene tests are running, with the opportunity to include many more).
  • A large percentage of the JCKs are automated.  There is however a set of these compliance tests that are manual/interactive.  We are looking for some dedicated volunteers to help step through the interactive tests on different platforms.
  • We have automated these tests in Jenkins Pipeline builds, and want to continue adding builds for the various platforms that the binaries are built on; extra hands here would also be very helpful

Seeing all this come together has been very rewarding.  It has been wonderful to work with the capable and dedicated folks working on the AdoptOpenJDK project.  There is still a long way to go, but we have a great base to start from, and it seems we can make a big difference by offering a truly open and flexible approach to testing Java.  If you really want to learn more about Java, join us in testing it!

Playing with Test Microservices

Recently, I had the good fortune to speak (fondly!) about some excellent open-source projects that I participate in…  AdoptOpenJDK, Eclipse OpenJ9 and Eclipse OMR.  Across those three projects, we are trying to simplify the activity of testing.  One of the great side-effects of simpler, easier testing is that it frees up time to do some fun work around building microservices for test.  We hope that we can eventually bring the best of these microservices into the open also, to help all open-source endeavours be even better at the activities of test.

If you are interested in hearing more about microservices for test, please give a listen to “Cloud-based Test Microservices“, a presentation which I gave at the Eurostar conference in Copenhagen this year.


Testing Java: Let Me Count the Ways

For years now, I have been testing Java and if there is a single statement to make about that activity, it is that there are many, many, many ways to test a Java Virtual Machine (JVM).

From code reviews and static analysis, to unit and functional tests, through 3rd party application tests and various other large system tests, to stress/load/endurance tests and performance benchmarks, we have a giant set of tests, tools and test frameworks at our disposal.  Even the opcode testing in the Eclipse OMR project helps to test a JVM.  From those low-level tests, all the way up to running some Derby or Solr/Lucene community tests, or the Acme Air benchmark, there are many ways to reveal defects.  If we can find them, we can fix them… (almost always a true statement).

One common request from developers is “make it easier for me to test”.  Over the last year, we have been working on that very request.  Recently, I’ve had the good fortune to become involved in the AdoptOpenJDK project.  Through that project, we have delivered a lightweight wrapper to loosely tie together the various tests and test frameworks that we use to test Java.

We currently run the set of regression tests from OpenJDK (nearly 6000 test cases).  Very soon, we will be enabling more functional and system-level tests at that project.

My goal with that project is to make it super easy to run, add, edit, exclude, triage, rerun and report on tests.  To achieve that goal, we should:

  • create common ways of working with tests (even if they use different frameworks, or focus on different layers in the software stack)
  • limit test code/infrastructure bloat
  • choose open-source tools and frameworks
  • keep technical ego in check to restrict needless complexity for little or no gain in functionality

There is a lot of work ahead, but so far, it’s been fun and challenging.  If you are interested in helping out on this grand adventure, please check out the project to see how to get involved at AdoptOpenJDK.