magaudet and I are from IBM and are happy to answer any questions.
tl;dr: IBM is breaking apart its JVM runtime technology to allow easy reuse and integration of its GC, JIT and Tooling into other languages. This has been proven with a JIT + GC + Diagnostics enabled version of Ruby (MRI) and Python (CPython).
This looks like great work! Will there be a paper?
Why did you decide to show results from the synthetic benchmarks in bench9000, when there are 43 kernels from real compute-intensive libraries that are in use in production (the PSD.rb and Chunky PNG benchmarks)?
Working on JRuby+Truffle I found that I could actually better optimise those benchmarks compared to the synthetic ones. The more complex code provides more opportunities to make big gains in performance compared to the simple code in the synthetic benchmarks.
We presented early perf numbers from the classic subset, which we felt had well-known benchmarks. We wanted to present a good overall view of what we can do right now, as opposed to biasing towards only the best-running benchmarks. There was also the matter of presentation: it's a bit easier to present a chart of 9 versus 43.
The team is interested in working on a paper, once the hard work is closer to being done.
(BTW: Count me as a very happy user of Bench9k. It's been a very appreciated tool here. I'm hoping to contribute some of our harness customizations back when I have the time to clean up the commits a little.)
Our approach allows us to integrate with the existing runtimes, enhancing them, rather than having a new implementation. The PyPy approach is capable of much bigger gains, but we can integrate directly into the existing language community, along with all extensions.