
New JIT optimizer in the Zing JVM - dmit
https://stuff-gil-says.blogspot.com/2017/05/zing-hits-trifecta.html
======
mike_hearn
I am wary of picking an argument with Gil Tene, who certainly knows his stuff,
but I wonder about the AVX2 example.

The post makes it sound like Intel only contributes patches for new
instruction sets to LLVM. But they are also active HotSpot contributors.

For instance, here's a post from an Intel employee contributing optimisations
to use AVX512:

[http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/...](http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2016-May/022790.html)

I have no idea if HotSpot optimises the specific loop Gil posted, but I've
seen Intel contribute a lot of upgrades to use AVX2 and AVX512, like optimised
compression intrinsics. And as noted, LLVM is not really designed for Java and
they had to spend three years putting in optimisations that Java-specific
JITCs have had for a long time. How many contributions to LLVM will be
directly applicable to Java, given that all the other contributors are focused
on C++-style languages? Use of new instructions, sure, in some cases, but they
aren't going to contribute intrinsics for the Java APIs.

ReadyNow and C4 are very cool. Java has needed these for a long time, and Zing
users have been benefiting from them for years. However, Java 9 gets AOT
compilation for some cases, which gives at least some of the benefit of
ReadyNow (there is still warmup time, as HotSpot AOT does not cache profiles
and re-profiles each time, but you get a lot closer to peak performance a lot
faster). And Shenandoah is driving down pause times for huge heaps, which is
where C4 shines.
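For reference, Java 9's AOT support (JEP 295) is driven by the new jaotc tool. A minimal sketch of the workflow, using a placeholder HelloWorld class, looks like this:

```shell
# Compile the class ahead of time into a shared library (Java 9, Linux x64 only).
jaotc --output libHelloWorld.so HelloWorld.class

# Run with the AOT-compiled code; HotSpot falls back to the JIT for anything
# not covered by the library.
java -XX:AOTLibrary=./libHelloWorld.so HelloWorld
```

As noted above, the AOT-compiled code is still profiled at run time, so there is warmup, just less of it.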

That said, Java 9 has not yet shipped, and Zing has. So congrats to Gil Tene
and his team. The competition between Zing and HotSpot only helps.

~~~
gopalv
> I have no idea if HotSpot optimises the specific loop Gil posted,

I'm actually curious to see what LLVM's output assembly is for that loop,
because JDK 8 does not vectorize a loop with a branch inside it.

But the JVM does a pretty decent job of using ymm<n> registers if you write
the loops without branches.

You end up with really obscure Java code, though, like

"(((a - b) ^ (b - a)) >>> 63) ^ 1"

to convert a branch into an integer.
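To make the trick concrete, here is a minimal sketch (class and method names are mine) of counting matching elements both ways: the branchy loop that JDK 8's auto-vectorizer gives up on, and a branchless loop using the bit trick above, which the superword optimizer can turn into ymm-register code:

```java
public class BranchlessCount {

    // Branchy version: the conditional in the loop body is what
    // JDK 8's auto-vectorizer does not handle.
    static long countEqualBranchy(long[] a, long[] b) {
        long count = 0;
        for (int i = 0; i < a.length; i++) {
            if (a[i] == b[i]) count++;
        }
        return count;
    }

    // Branchless version using the bit trick from the comment:
    // (x - y) and (y - x) have different sign bits whenever x != y,
    // so ((x - y) ^ (y - x)) >>> 63 is 1 for unequal values and 0 for
    // equal ones; the final ^ 1 flips it into an equality indicator.
    // (Caveat: the trick misfires if x - y overflows to Long.MIN_VALUE.)
    static long countEqualBranchless(long[] a, long[] b) {
        long count = 0;
        for (int i = 0; i < a.length; i++) {
            long x = a[i], y = b[i];
            count += (((x - y) ^ (y - x)) >>> 63) ^ 1;
        }
        return count;
    }

    public static void main(String[] args) {
        long[] a = {1, 2, 3, 4, 5};
        long[] b = {1, 0, 3, 9, 5};
        System.out.println(countEqualBranchy(a, b));    // prints 3
        System.out.println(countEqualBranchless(a, b)); // prints 3
    }
}
```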

~~~
mike_hearn
If you're testing with JDK8 you're missing the last two years of work on the
optimiser. I'd be interested to see what Java 9 does, as it's (supposedly)
coming out in a few months.

~~~
opportun1st
JDK 9 can vectorize a loop with a branch inside. I tried a JDK 9 early-access
build with a simple test case myself.

------
MrBuddyCasino
> some day someone will eventually find (and ship?) some other holy graal

Was that the dis I think it was?

~~~
_pmf_
I think it is, sir.

------
stephen
I'd love the instant start-up time for Java-based tooling like
IDEs/builds/etc. (assuming it's as slick as they make it sound).

It would be nice if there was a developer version, e.g. not for commercial
server usage, but as your local/desktop JVM installation.

~~~
geodel
For that purpose I think Java 9 AOT might be sufficient. Let's see if and when
IntelliJ/Eclipse move to that.

~~~
needusername
I think in its current form that's unlikely. It's Linux-only, and the AOT code
needs to be built on the exact same CPU. This essentially means you have to
compile on the same system you execute on.

------
Thaxll
How much does it cost to run the Zing JVM per server?

~~~
leastangle
To quote the FAQ:

>The single license annual subscription price for Zing is $3500 USD per
server, with significantly lower prices for higher volumes and longer-term
subscriptions.

~~~
ex3ndr
Couldn't we just spin up one more VM for each instance instead of paying $3.5k?

~~~
StreamBright
Not only that, but you could also use the money to buy a faster CPU.

~~~
hhandoko
A faster CPU does not eliminate GC pauses if your workload consumes a large
amount of RAM. Eliminating those is one of the other benefits Zing provides.

~~~
Chyzwar
For $3.5k you can load-balance your app onto an additional server. By reducing
load, the time spent in GC will be less. Additionally, by having more hardware
you can use a more aggressive GC strategy [0].

[0] [http://www.cakesolutions.net/teamblogs/low-pause-gc-on-the-j...](http://www.cakesolutions.net/teamblogs/low-pause-gc-on-the-jvm)

~~~
jonaf
I have taken Zing for a spin on 24 c3.8xl ec2 instances running Elasticsearch
with 32G heaps and just under 2 billion indexed documents with millions of
updates/day. The results are considerably better. If I just went up to 30 or
even 50 ec2 instances on HotSpot/OpenJDK, I would see no performance benefit
with regard to GC, and I have performed this test. I'm not saying don't use
Java because of GC, but I am saying don't underestimate the value of more
efficient garbage collection.

Shoutout to the Azul guys, Simon, Elaine and Paul, for coming on-site to watch
me compile all their extensions on Amazon Linux and helping me perform all the
testing.

