
A JVM Does That? (2011) [pdf] - bshanks
http://www.azulsystems.com/blog/wp-content/uploads/2011/03/2011_WhatDoesJVMDo.pdf
======
chollida1
If you are part of the HFT crowd who uses the JVM then you know who the
author, Cliff Click, is.

His Blogs (former and current) are goldmines for high performance JVM
information.

[http://www.azulsystems.com/blog/author/cliff](http://www.azulsystems.com/blog/author/cliff)

[http://www.cliffc.org/blog/](http://www.cliffc.org/blog/)

I had the CTO of a public HFT firm tell me that, short of FPGAs, Azul's
pauseless GC
([https://www.azul.com/products/zing/pgc/](https://www.azul.com/products/zing/pgc/))
was the biggest performance win they'd had in the past few years. Note: this
was a few years ago.

EDIT: someone emailed asking for how HFT firms write their java code. I
haven't written java in 3 years so I'm probably not the best person to author
a list but this is what I'd include:

In order of importance:

\- Measure, measure, measure. Every HFT firm can tell you, to the nanosecond
(as much as that's possible), their time from receiving a packet to replying
to it. Focus as much on the 95th and 99th percentile times as on the average,
if not more.

An increase in this time is considered a bug in the same way that an app
crashing due to user input is considered a bug, which is to say that you just
don't ship with this kind of bug.

\- no GCs; everything else starts to matter less if the GC runs every 30
seconds. Some shops have only one GC pause a day, which means a really large
Eden space

\- short call stacks, and no recursion

\- one physical core per thread

\- non locking data structures and use the one writer principle where possible

\- don't use the kernel if possible, e.g. Solarflare Ethernet cards that have
userland drivers

\- cache friendly data structures
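The "one writer principle" above can be sketched as a single-producer/single-consumer ring buffer: because exactly one thread writes and exactly one thread reads, the hot path needs no locks or CAS, only volatile index publication. This is an illustrative sketch (class and method names are mine, not from any firm's codebase):

```java
// Illustrative single-producer/single-consumer ring buffer (hypothetical
// names, not from any firm's code). One thread calls offer(), one calls
// poll(); the volatile index stores are the only synchronization needed.
final class SpscRing {
    private final long[] buf;
    private final int mask;
    private volatile long head = 0; // next slot to read; written only by reader
    private volatile long tail = 0; // next slot to write; written only by writer

    SpscRing(int capacityPow2) {
        buf = new long[capacityPow2];
        mask = capacityPow2 - 1;
    }

    /** Called only by the single writer thread. */
    boolean offer(long v) {
        long t = tail;
        if (t - head == buf.length) return false; // full
        buf[(int) (t & mask)] = v;
        tail = t + 1; // volatile store publishes the element to the reader
        return true;
    }

    /** Called only by the single reader thread; null means empty. */
    Long poll() {
        long h = head;
        if (h == tail) return null; // empty
        long v = buf[(int) (h & mask)];
        head = h + 1; // frees the slot for the writer
        return v;
    }
}
```

Production variants (the LMAX Disruptor, for example) additionally pad the indices onto separate cache lines to avoid false sharing, which this sketch omits for brevity.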

Watch this video:
[https://www.youtube.com/watch?v=iINk7x44MmM](https://www.youtube.com/watch?v=iINk7x44MmM)

~~~
azeirah
If performance is THAT critical in HFT, why do they pick java as the starting
language?

I understand that java is very fast, faster than most people believe, but does
it beat hand-optimized assembly? Fortran? C?

Why Java?

~~~
PaulHoule
It's close to impossible to get certain kinds of code correct in C, for
instance string parsing. Also, until pretty recently, C and C++ did not have a
sane memory model, so what happened when threads were involved depended on
what CPU you were on, the phase of the moon, etc.

Java could very well be the first programming language designed by adults and
it shows. (That is, they solve the hard problems, not just pretend they don't
matter)

~~~
vvanders
The amount of contortions you need to do with your "adult" language arguably
negates a lot of advantages of using it. Just because you don't have pointers
doesn't mean that all bugs are now eliminated.

Take Android for instance, getting solid performance out of the JVM there is a
huge bear because the language is just not suited to low latency operations. I
won't claim that C is a perfect language but I hate it when I see people
throwing around that you're committing some sin for each line of C code you
write.

If you want cache friendly operations and predictable performance you're not
going to find it in any JVM or language that has a GC.
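To be fair to the JVM side, one common workaround for cache-unfriendly object graphs is to flatten "structs" into parallel primitive arrays, so iteration walks contiguous memory instead of chasing references. A minimal sketch (names invented for illustration):

```java
// Sketch of the usual JVM workaround: flatten objects into parallel primitive
// arrays ("structure of arrays") so a loop reads contiguous memory instead of
// chasing object references. Names are invented for illustration.
final class Points {
    final double[] xs;
    final double[] ys;

    Points(int n) {
        xs = new double[n];
        ys = new double[n];
    }

    // Iterating two flat double[] arrays is sequential, prefetch-friendly
    // access; an array of Point objects would scatter reads across the heap.
    double sumDistancesFromOrigin() {
        double sum = 0.0;
        for (int i = 0; i < xs.length; i++) {
            sum += Math.sqrt(xs[i] * xs[i] + ys[i] * ys[i]);
        }
        return sum;
    }
}
```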

~~~
PaulHoule
I just think the people who make Android are high on drugs or something.

For instance, the subjective experience of using Android is that you can never
close any app that you've opened other than by uninstalling it. Even after you
turn your machine off and turn it back on you still see windows for every
f--king app.

Then they run a bunch of articles about what an idiot you are if you try to
close these because it won't save your phone's battery.

Well I admit I do have some cognitive limitations and it is --hard-- to scroll
through 30 apps just to switch from (say) the web browser to the PDF viewer,
but I guess Google thinks it is great this way because you always have a
Google Plus window open.

Then there are all the articles about the fancy power management they are
going to have someday that doesn't face up to the fact that an android device
may or may not charge if you plug it into a charger, might turn itself off
when it is running, and that the most reliable way to turn it on is to do a
hard reset... And this isn't any old piece of junk, this is a nexus device.

~~~
vvanders
Those are just screenshots captured from the app you're seeing. If you switch
to any OpenGL based app you'll see it restart.

FWIW Force Stop from the settings->app section will stop an app unless it's
forced to be sticky because another service depends on it.

~~~
BinaryIdiot
> Those are just screenshots captured from the app you're seeing.

That's not entirely accurate. Many of the apps can and will stay in memory (it
doesn't suspend all of them). Not that that's necessarily a bad thing.

~~~
vvanders
If you want to get _really_ pedantic those are screenshots in the app drawer
that are swapped with an application's surface in SurfaceFlinger just as it
comes fullscreen.

But yeah, Android doesn't kill apps outright when you switch away, that would
be a pretty poor experience to restart each time you opened a link in Twitter
and came back from it for instance.

~~~
BinaryIdiot
> if you want to get really pedantic those are screenshots in the app drawer
> that are swapped with an application's surface in SurfaceFlinger just as it
> comes fullscreen.

Oh I know, I was just being specific since you said they were "just"
screenshots. Wanted to make sure it was clear some of those apps may in fact
still be in memory.

------
lordnacho
Sitting at the offices of an HFT firm here.

I think there's more than one kind of fast. There's development speed, and
there's execution speed. Both are important.

I don't know what qualifies as HFT these days, but a loose definition is
people who trade more often than once a day. That's still a whole lot of
different people.

At one end, there's textbook, pure footrace arbitrage. You see something
offered at 100 on one exchange and bid at 101 on another. You rush to do both.
This is possibly the most obvious strategy ever, and the only thing that
matters is how fast you can get those messages to the relevant exchanges.

On the other end, there's more intricate stuff like statistical arbitrage (a
wide category), where there's more than one way to skin a cat. Some principles
are known from finance 101, but your implementation will be slightly different
from other people's. You still want to execute fast, because the opportunity
may not be there forever, but it's not like every time you see an opportunity
you know it's first come, first served.

My sense is that those closer to the former will tend to use C++ over the JVM.
Most people are not going to have the time to carefully test what's faster,
and C++ has the reputation of being the thing that will be faster.

Similarly, JVM languages have a reputation for being faster to code in. As
strategies get more complex, you need code agility. There's a lot of changing
things up when you're writing strategy. Of course, results vary, but if you're
building a strategy platform, you probably go for JVM.

But that's just what my gut feeling tells me. I've only met the people I've
met, and it's not that easy to find public info on just what people are
getting up to. I'd love to hear what kinds of strats are running on Azul.

~~~
nvarsj
It's been a while since I worked in HFT, but the HFT firms making the most
money were all using Java predominantly. Yeah, a lot of firms use C++, but I
think that's more because of myth than actual benefit.

Pure simple arbitrage is a race to the bottom very quickly, and it's very
difficult to remain profitable with those sort of simple strategies regardless
of what language you use. Maybe FPGAs and radio networks could let you make
money off of simple arbitrage (someone must be making money from it). C++
isn't going to give much advantage over Java with GC disabled.

~~~
HockeyPlayer
I'm at a Denver based HFT firm, we are all Java. We do options market making,
triangles on currency futures, calendar spreads, etc.

------
geodel
This is interesting

    
    
      – (Azul GPGC: 100's of Gig's w/10msec)
      – (Stock full GC pause: 10's of Gig's w/10sec)
      – (IBM Metronome: 100's Megs w/10microsec)
    

Somehow these numbers are hard to come by in Java land, where any request for
hard numbers gets responses like: Java GCs are 'generational', 'state of the
art', 'best in industry', etc. The technical arguments, while true of course,
do not give numbers.

~~~
nulltype
I think when comparing numbers like that you might need throughput figures as
well. Supposedly there is a tradeoff between GC pauselessness and throughput.

~~~
cliffc
Azul's GC required a read barrier (a few instructions executed on every
pointer read), which cost something like 5% performance on an X86. In exchange
_max_ GC pause time is something in the low _microsecond_ range. (I helped
implement the Azul JVM and portions of the GC relating to starting and
stopping threads.)
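For readers unfamiliar with the idea, here is a toy model of what a read barrier does conceptually: every reference load runs a cheap check, and a stale reference (one the GC has relocated) is fixed up and written back ("self-healed") so later loads take the fast path again. This is an invented illustration in plain Java, not Azul's actual mechanism, which operates on pointer metadata bits at the instruction level:

```java
import java.util.concurrent.atomic.AtomicReference;

// Toy model of a read barrier (invented names; Azul's real barrier works on
// pointer metadata bits, not a wrapper object). Every load checks a flag; a
// stale reference is repaired and written back ("self-healed") so subsequent
// loads take the fast path again.
final class BarrieredRef<T> {
    private static final class Cell<U> {
        final U value;
        final boolean current; // false = GC moved the object, slot is stale
        Cell(U value, boolean current) { this.value = value; this.current = current; }
    }

    private final AtomicReference<Cell<T>> slot;

    BarrieredRef(T v) { slot = new AtomicReference<>(new Cell<>(v, true)); }

    /** Simulates the GC relocating the object and marking the slot stale. */
    void gcRelocate(T movedValue) { slot.set(new Cell<>(movedValue, false)); }

    /** The "read barrier": a load plus a test on the fast path. */
    T read() {
        Cell<T> c = slot.get();
        if (!c.current) { // rare slow path
            Cell<T> healed = new Cell<>(c.value, true);
            slot.compareAndSet(c, healed); // self-heal the slot
            c = healed;
        }
        return c.value;
    }
}
```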

~~~
Gibbon1
A friend of mine who worked for Azul said that older x86 processors didn't
support the instructions they needed, which is why they originally built their
own hardware. I'm uncertain what those instructions are, but my impression is
you can't do what they needed with a single atomic word access.

~~~
sievebrain
There's a talk somewhere (by Cliff again) on how the Azul chips differed from
x86. Intel seems really slow to add features useful for high-level GC'd
languages, but apparently x86 has caught up nowadays, which is why they don't
bother making their own chips anymore.

~~~
cliffc
No change to the X86; instead a user-mode TLB handler from Red Hat allows
pointer swizzling. That, plus some careful planning, and the read barrier fell
to 2 X86 ops - with acceptable runtime costs. Cliff

------
dmytroi
Might be an unpopular opinion, but I kind of like to watch how people
introduce will-be-a-problem-in-the-future things (let's say GC) and then
heroically solve the ensuing problems (pauseless GC). It's like Don Quixote in
the modern age, except you build the windmills yourself.

~~~
usrusr
Priorities. Getting rid of manual memory management can be a worthwhile goal
even with slow gc, just like not having expensive runtime memory management
automatisms can be a worthwhile goal. Making gc faster (or just more
predictable) on top of that is not heroically solving a follow-up problem,
it's an optional bonus achievement.

~~~
dmytroi
I might be too old fashioned, but the only case where some sort of automatic
memory management is pleasant and comfortable is constructing/modifying
strings, though even here many languages separate mutable and immutable
strings into different beasts.

In all other cases, just allocate a virtual page from the OS (way too old
school, I understand; people usually don't use this one and believe that the C
runtime library with malloc is the only "approved" way to allocate memory).
You get from 4 KB to many megabytes of contiguous memory; do whatever you want
there, and deallocate the whole virtual page later on. No leaks, no problems.
Sure, I do understand that manually managing millions of objects on the heap
is hard, but managing 1 to 10 virtual pages? It's simple and easy; just forget
about generic allocators and use specialized ones (like a block allocator, for
example).

Also, it's good to know whether you even need to deallocate at all. For
example, the ninja build system
([https://ninja-build.org](https://ninja-build.org)) simply doesn't deallocate
memory after the build is done, because it's too slow to do it (though they do
use heap allocations); just kill the app and the OS will clean up after you.
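The "allocate one big block, do whatever you want inside it, free it all at once" style translates even to Java, where a direct ByteBuffer can stand in for the virtual page and a bump pointer for the specialized allocator. A hypothetical sketch (all names mine):

```java
import java.nio.ByteBuffer;

// Hypothetical arena/bump allocator in Java: one big block up front, trivial
// allocation inside it, and "deallocation" is dropping the whole block at
// once. All names are invented for illustration.
final class Arena {
    private final ByteBuffer block;
    private int top = 0; // bump pointer: next free byte

    Arena(int bytes) { block = ByteBuffer.allocateDirect(bytes); }

    /** Reserve n bytes; returns the offset of the reservation. */
    int alloc(int n) {
        if (top + n > block.capacity()) throw new OutOfMemoryError("arena full");
        int at = top;
        top += n;
        return at;
    }

    void putLong(int offset, long v) { block.putLong(offset, v); }

    long getLong(int offset) { return block.getLong(offset); }

    /** Free everything at once, like unmapping the page. */
    void reset() { top = 0; }
}
```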

PS. On the other hand, what I'm saying is just a personal rant. The reality is
that the industry is mostly moving to tools that don't allow writing unsafe
code (for example, languages without pointer arithmetic), and this can be very
limiting in language expressiveness. There is another approach: statically
proving that unsafe code is safe. This is what "Checked C" is all about, and
it is how Windows drivers work without crashing the kernel (at least BSODs are
pretty rare on Windows nowadays). But I do agree that tools for statically
proving unsafe code safe are so unapproachable that only greybeards are
interested in them nowadays :(

------
frankpf
I'm probably missing something obvious here, but on page 25 it says:

    
    
      Azul Systems has been busy rewriting lots of it
      - Many major subsystems are simpler, faster, lighter
      - >100k diffs from OpenJDK
    

If Azul Systems' JVM is based on OpenJDK, shouldn't it be open source? OpenJDK
is licensed under GPLv2.

~~~
desdiv
OpenJDK is licensed under GPLv2 but it used to be available under commercial
licenses as well. Kinda like how MySQL is GPL/commercial dual licensed.

------
derefr
If you were incentivized enough (by being an HFT firm, say), is there anything
stopping you from re-implementing the Erlang memory/GC-model on the JVM? I
know the JVM has green-threads (somewhere amongst a million other concurrency
primitives); but could it be modified to give each green-thread its own heap,
and then do background-GC passes of each green-thread heap when that green-
thread isn't currently scheduled?

You'd certainly have to write your Java somewhat differently—though you
wouldn't necessarily have to move to a full-fledged no-shared-memory message-
passing model to see a benefit. (For example, every JVM object could be
treated similarly to Erlang's large binaries, where they exist as refcounted
objects in a shared heap, and then the ref-handle within the green-thread heap
is _itself_ refcounted by the green-thread, so the shared heap only needs to
be synchronously updated when a green-thread heap is discovered, on GC, to
have released its last ref-ref.)
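The refcounting scheme sketched in the last paragraph could look roughly like this: the shared count is only touched when a green thread first acquires a handle or drops its last local reference; all intermediate retains and releases stay thread-local. Everything here is invented for illustration:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Rough sketch of the scheme described above (all names invented): a shared,
// refcounted blob where each green thread holds a Handle. Only creating a
// Handle or dropping its last local ref touches the shared atomic count;
// retain/release within a thread are plain field updates.
final class SharedBlob {
    private final AtomicInteger sharedRefs = new AtomicInteger(0);
    final byte[] payload;

    SharedBlob(byte[] payload) { this.payload = payload; }

    /** Per-thread handle; localRefs is touched only by the owning thread. */
    final class Handle {
        private int localRefs = 1;

        Handle() { sharedRefs.incrementAndGet(); } // one shared update per thread

        void retain() { localRefs++; } // thread-local, no synchronization

        /** Returns true when the blob's last reference anywhere is gone. */
        boolean release() {
            if (--localRefs == 0) {
                return sharedRefs.decrementAndGet() == 0;
            }
            return false;
        }
    }

    Handle acquire() { return new Handle(); }

    int sharedCount() { return sharedRefs.get(); }
}
```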

~~~
newobj
Relevant: [http://mechanical-sympathy.blogspot.com/](http://mechanical-sympathy.blogspot.com/)

~~~
mamcx
Does a resource like this exist for .NET?

------
bogomipz
I am curious if someone could shed some light on page 15, "Illusion:
Consistent Memory Models", specifically:

    
    
      X86 is very conservative, so is Sparc
      Power, MIPS less so
      IA64 & Azul very aggressive
    

I don't think I've heard the term "conservative memory model" before. In what
sense are x86 and SPARC conservative?

Also, why is Azul mentioned as an ISA?

~~~
PeCaN
Azul has their own hardware for facilitating pauseless garbage collection with
their JVM.

I recall they have a pure software pauseless JVM now as well, but it's not
nearly as high performing.

~~~
cliffc
Azul hardware had a very large count of low-performance cores. If you had
enough parallelism then it was hard to beat - but most applications didn't
have enough parallel work, so the market wasn't big enough.

~~~
PeCaN
I… just got corrected by Cliff Click~! (You're like my language implementation
hero (along with Mike Pall).)

Vega sounds like it would have a nice niche in network processing. Shame it
didn't find a big enough market. It was pretty interesting from what I read.

------
markokrajnc
These slides are great! "Standing on the shoulders of giants..." If time-to-
market is important, JVM is the way to go... Manually managed memory is for
low-level/OS software, GC is for the rest (apps)... with some exceptions...

------
cmrx64
(2011)

Good slides anyway. Anyone know of a recording of the corresponding talk?

~~~
dmit
[https://www.youtube.com/watch?v=uL2D3qzHtqY](https://www.youtube.com/watch?v=uL2D3qzHtqY)

~~~
cmrx64
Thanks! Guess I should have searched on youtube for the title - I searched on
google for the conference mentioned in the slides and didn't find it.

------
Thaxll
I'm curious about going above 32GB of heap size, heard that compressed
pointers kill performance?

~~~
sluukkonen
Going above 32G doesn't kill performance per se; it just means that there is
no point in using heap sizes between 32G and roughly 48G, due to the larger
pointers. If you go above that, it's all good again.
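The arithmetic behind that threshold: a compressed oop is a 32-bit offset scaled by the object alignment, which defaults to 8 bytes in HotSpot, giving 2^32 * 8 = 32 GB of addressable heap. (Raising `-XX:ObjectAlignmentInBytes` raises the limit, at the cost of padding every object.) A quick sketch of the arithmetic:

```java
// Back-of-the-envelope arithmetic for the compressed-oops limit: a 32-bit
// reference addresses 2^32 slots, each scaled by the object alignment
// (8 bytes by default in HotSpot), so the compressed heap tops out at 32 GiB.
final class CompressedOops {
    static long maxCompressedHeapBytes(long alignmentBytes) {
        return (1L << 32) * alignmentBytes; // 2^32 addressable slots * alignment
    }

    static long toGiB(long bytes) {
        return bytes / (1024L * 1024L * 1024L);
    }
}
```

With 16-byte alignment the reach doubles to 64 GiB, which is why unusually large heaps sometimes bump the alignment flag instead of giving up compressed references.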

------
hyperpape
There's another talk that you might want to watch if you liked these slides:

[https://www.youtube.com/watch?v=vzzABBxo44g](https://www.youtube.com/watch?v=vzzABBxo44g)
Bits of advice for VM writers.

------
ktRolster
Is this assertion really true?

 _Class files are a lousy way to describe programs_

Class files seem alright to me

------
ww520
The JVM's memory model was something amazing when I first encountered it.

------
bshanks
Thank you pas for this link (via
[https://news.ycombinator.com/item?id=12025929](https://news.ycombinator.com/item?id=12025929)
)

------
AlexCoventry
What does this mean?

    
    
      Large chunks of code are fragile (or very 'fluffy' per line 
      of code)

~~~
cliffc
It means what it says it means? "Fragile" code is code that can't be touched,
because touching it causes it to "break" - exhibit bugs that can't easily be
fixed. "Fluffy" code is bulky code: code that says in 1000 lines what could
better be said in 10 lines. Cliff

~~~
sievebrain
What's your view on Graal? Is that making headway on de-fluffing and
robustifying the code or does this not really tackle the issues you had in
mind?

------
vonnik
Relevant:

JavaCPP, the glue code that acts as a bridge to C++/C

[https://github.com/bytedeco/javacpp](https://github.com/bytedeco/javacpp)

JavaCPP's creator works with Skymind, and we use it for our open-source deep
learning framework, Deeplearning4j

[https://github.com/deeplearning4j/deeplearning4j/](https://github.com/deeplearning4j/deeplearning4j/)

We're pushing the limits of the JVM (and we love Cliff's work!).

~~~
badlogic
JavaCPP does not compile Java bytecode to C++. It is a Java to C/C++ bridge
ala JNA, just better. Major difference.

