
G1 Garbage Collection JVM7 - Big Performance Problems Shown - NerdsCentral
http://nerds-central.blogspot.com/2011/11/comparing-java-7-garbage-collectors.html
======
fleitz
"a realtime system must have a deterministic time to complete a task".

Incorrect, a realtime system must only have execution times below a certain
threshold. malloc/free is just as non-deterministic as GC allocated memory (in
practical systems). If you put effort into it you can make malloc/free do
really stupid things too. If I wrote a program that allocated and freed the
right size strings the program would eventually crash.

With any sort of modern CPU it would be almost impossible to produce non-
trivial programs with deterministic execution times. I'd hazard a guess that
there are very few 'realtime' programs that couldn't be written using the JVM.
If you can do HFT on the JVM it's realtime enough.

To make systems with deterministic execution times you'd have to start looking
at extremely limited processors that lack RAM (or have really exotic ram that
doesn't refresh), as well as removing other resources that are essential to
building software the modern way.

All this proves is that for this particular problem the JVM is probably
unsuitable for a realtime system. The fortunate thing is that I don't think
any real time program actually needs to allocate strings in this manner.

In all seriousness most of the reason that the JVM is not used on realtime
systems is because there aren't very many cheap CPUs capable of executing java
byte code that are certified for operation in the harsh environments that a
lot of realtime systems operate in. My friend builds realtime systems and he
codes in C because that's the only compiler available for the CPU.

He'd start using Netduino in a heartbeat if it could survive being next to an
electric generator inside the Hoover dam.

~~~
wglb
Slightly off topic, but do you have particular information about JVM being
used in HFT environment?

~~~
fleitz
<http://martinfowler.com/articles/lmax.html?t=1319912579>

This is actually a trading platform but if you can run the exchange on the JVM
you could certainly write a client for it on the JVM.

Also: <http://www.quantfinancejobs.com/jobs/java.asp>

~~~
wglb
I had seen that, but Martin Fowler is not whom I would have thought of as
deeply familiar with low latency and JVM performance under stress.

I will be looking into that platform, but I see one caution flag: _LMAX is a
new retail financial trading platform. As a result it has to process many
trades with low latency._ the key word being "retail". I am wondering if
retail's version of low latency is in seconds or tens of seconds.

~~~
chollida1
> I am wondering if retail's version of low latency is in seconds or tens of
> seconds.

I think you're off by an order of magnitude or two.

Our fund doesn't really worry too much about execution speed and we'd only
tolerate low 100 millisecond latency. 10's of milliseconds is what we'd
prefer.

Many firms want to be lower still.

~~~
wglb
Thanks for the note.

Couple of questions.

1) Do you consider yourself retail? I would have not considered a hedge fund
retail.

2) Can you say if you are colocating your servers?

3) Can you say if you are using JVM in a latency-critical application?

I am working on a benchmark, building a fake exchange framework as a platform
to test relative performance of various languages, where various includes
Lisp, C, Java and possibly others.

~~~
chollida1
> 1) Do you consider yourself retail? I would have not considered a hedge fund
> retail.

Yes fair point. That terminology isn't really well defined. You are correct
that typical retail means individuals and institutional means large funds
(read professionals).

Banks will often count hedgies as retail clients which is where the
terminology gets blurry.

But if you mean non-professional individuals then your point about latency is
probably correct.

> 2) Can you say if you are colocating your servers?

We don't typically, but we don't live far from the exchanges. We also aren't
super low latency. I honestly think that super low latency is a losing
proposition as there is usually only one person who can jump on an arbitrage
opportunity.

Having said that, we do have one server executing algos that is colocated.

> 3) Can you say if you are using JVM in a latency-critical application?

I can, and we aren't, though it's not because of the JVM itself.

------
wglb
Having done some hard-core real-time programming in the past, and currently
thinking about a low-latency benchmark between languages, I am not sure that
this sort of test is useful in a real-time environment. Also, using a garbage-
collected language in a real-time environment would require some extra care.

One of the things to consider in engineering a real-time environment is what
is the central data-flow load of the system. For example, in a medical data-
acquisition environment, allocating an object for each tick would not likely
be wise. In fact, use of malloc might not be called for, statically allocating
buffers being a better avenue.

In the case of a low-latency financial trading system written in a garbage-
collected language, it might be prudent to allocate all the objects before
0830 (if that is when trading starts) and free them after the market close.
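
A minimal sketch of that pre-allocation idea, assuming a single-threaded hot path. The `Order` type, its fields, and the `OrderPool` name are hypothetical and only illustrate the pattern: every instance is created before the session starts, and the hot path only recycles them.

```java
import java.util.ArrayDeque;

// Hypothetical order type; the fields are illustrative only.
final class Order {
    long id;
    long priceTicks;
    int quantity;

    // Wipes the fields so a recycled instance carries no stale data.
    void clear() {
        id = 0;
        priceTicks = 0;
        quantity = 0;
    }
}

// Fixed-size pool: all Order objects are created up front, so acquire()
// and release() recycle existing instances instead of allocating new ones.
// Not thread-safe; assumes one thread owns the pool.
final class OrderPool {
    private final ArrayDeque<Order> free;

    OrderPool(int capacity) {
        free = new ArrayDeque<>(capacity);
        for (int i = 0; i < capacity; i++) {
            free.push(new Order());
        }
    }

    // Returns a pooled instance, or null when the pool is exhausted
    // (the caller decides whether to reject, queue, or fall back).
    Order acquire() {
        return free.poll();
    }

    // Resets the instance and makes it available again.
    void release(Order order) {
        order.clear();
        free.push(order);
    }

    int available() {
        return free.size();
    }
}
```

Sizing the pool for the worst-case number of in-flight objects is the hard part; returning null on exhaustion keeps that failure visible rather than silently allocating.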

There is a wide range of real-time response requirements (for financial
exchanges, response times of less than a hundred milliseconds (e.g., 395 from
BATS) for full turnaround are expected). Electrocardiogram data needs to be
sampled once each millisecond, but the jitter in sampling times needs to be
low.

Looks like some good tools there, but without the engineering context it is
hard to draw a useful conclusion.

~~~
NerdsCentral
Totally agree. My original aim was to use an idea I had about the way the
garbage collector synchronises in the Oracle JVM to give 'windows' of
determinism. So I set about making a program to push the gc really hard so I
could see if my idea worked. But I have not gotten as far as the original aim
yet because I found this interesting stuff about the G1 collector.

As for your points about realtime - I am completely with you. I had in my mind
that one might have a realtime system on the rtjvm or in c++ which needed to
periodically communicate with a standard JVM. There are three approaches I
could see for this. One, would be to send messages and decouple that way.
Another is to make the communication abort if it looked like it might cause a
deadline miss. The third was to briefly (milliseconds) disable the gc in the
standard JVM so the communication could occur and then turn it back on again
immediately afterwards.

Now - that sounds like a really bad idea and it probably is - but it is an
interesting idea to play with.
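
A rough sketch of the first two approaches combined (decouple through messages, and give up rather than miss a deadline), assuming a bounded in-memory queue stands in for the link between the two sides. The `DeadlineHandoff` class and its method names are hypothetical. `ArrayBlockingQueue` takes a lock internally, so this shows the shape of the idea and is not a hard-realtime implementation.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Bounded mailbox between a deadline-sensitive producer and a consumer
// that may stall (for example, during a GC pause).
final class DeadlineHandoff {
    private final BlockingQueue<String> outbox;

    DeadlineHandoff(int capacity) {
        outbox = new ArrayBlockingQueue<>(capacity);
    }

    // Tries to hand a message over, waiting at most budgetNanos.
    // Returns false if the message was dropped to protect the deadline.
    boolean trySend(String message, long budgetNanos) {
        try {
            return outbox.offer(message, budgetNanos, TimeUnit.NANOSECONDS);
        } catch (InterruptedException e) {
            // Preserve the interrupt for the caller and treat it as a drop.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Called by the consumer side; returns null when nothing is waiting.
    String poll() {
        return outbox.poll();
    }
}
```

The producer never waits longer than its budget: if the consumer has stalled and the queue is full, `trySend` returns false and the caller can log or count the drop.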

So - please don't think I am proposing realtime programming on the standard
JVM, just some ideas for larger systems integration.

~~~
wglb
So my prime hobby is ham radio, and my prime mode is morse code. These days,
we mostly connect computers to the radio to do logging and morse code keying
(done through the LPT port). One popular program runs under windows. The big
boys use another program that runs under DOS, and lately FreeDOS as a
replacement. Running under windows, there is an annoying hesitation once every
word or so that is due to whatever is going on in windows.

The solution, provided by the maker of the above-mentioned windows software is
an external keying module. With this arrangement, windows sends characters to
the hardware brick, which, latency-free, sends out the characters quite
nicely. (Amazing how the trained morse-code ear can distinguish ms delays in
character starts.)

Personally, I now think of systems that way: a component that is latency-rich
connected to a component that is latency-free. This of course has
implications for the overall solution.

------
sehugg
A microbenchmark like this is not very conclusive. The G1 collector is
designed to handle the heap fragmentation problem and thus reduce the maximum
pause time due to stop-the-world GC. It's not designed for maximum throughput.

Each GC algorithm has its worst-case behavior; saying "avoid at all costs"
because of a single scenario is not very helpful.

Also, what were the JVM flags for each test? All I can see in these graphs is
the bump at the beginning before the heap has sized to a stable level.

~~~
NerdsCentral
I hope that the article does not imply that it is conclusive but rather that
it is cause for concern. The flags were only those required to set the garbage
collector - everything else is default. If you want to try other settings, the
code is there to use :)
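
For anyone re-running the test with other settings, a small sketch like the following could record the configuration next to each result. It assumes the collector is chosen with a standard HotSpot flag such as `-XX:+UseG1GC`; the `GcReport` class name is hypothetical, and the `java.lang.management` calls are the standard JMX API.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;

final class GcReport {
    // Builds a plain-text summary of the JVM arguments and the collectors
    // this JVM is actually running with.
    static String describe() {
        StringBuilder sb = new StringBuilder();
        List<String> args = ManagementFactory.getRuntimeMXBean().getInputArguments();
        sb.append("JVM arguments: ").append(args).append('\n');
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName())
              .append(": collections=").append(gc.getCollectionCount())
              .append(", timeMs=").append(gc.getCollectionTime())
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] unused) {
        System.out.print(describe());
    }
}
```

Printing this at the end of a run gives the flags, the collector names, and the cumulative collection counts and times, which makes runs with different settings easier to compare.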

~~~
puredanger
I don't know why I would ever judge the performance of an untuned GC for any
collector, especially when the app is one designed to make life hard for the
collector. There is no useful conclusion you can draw from this test.

------
johanbev
Your graphs make it very hard to compare the performance of the different
collectors. What about putting all of them in the same plot, making that
logarithmic, and including cumulative time (on a separate linear axis)?

~~~
NerdsCentral
OK - I'll see if I can get time.

------
CountHackulus
The G1 Garbage collector isn't certified for realtime operation. If you want a
Java VM that's certified for HARD realtime, you can check out IBM's latest
real time GC policy that I believe came out with IBM's Java 7.

~~~
NerdsCentral
I know - I think the post makes that clear. I explained the purpose of the
whole thing in an earlier reply. But Oracle and IBM have realtime implementations
of the JVM I believe, though I am more familiar with the Oracle one.

------
aragozin
G1 is still under active development, but even with an ideal implementation it
wouldn't deliver the best performance.

Reason 1. G1 uses an SATB (snapshot-at-the-beginning) write barrier, which is
more expensive than the card marking barrier that HotSpot's other collectors use.

Reason 2. G1 has to use an STW pause to move objects around. E.g. if your heap is
half filled with live objects, you have to physically move 1MiB of live data in
memory to reclaim 1MiB of free space. Cleaning sparse regions first will
effectively reduce this proportion, but G1 still has to work very hard to
reclaim each meg of free space.
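
The arithmetic behind the half-full example can be sketched as follows, under the simplifying assumption that a region's live data is copied out in full and the whole region is then reclaimed. The `EvacuationCost` name is hypothetical and this is not G1's actual cost model.

```java
final class EvacuationCost {
    // Bytes of live data copied per byte of free space reclaimed when a
    // region with the given live fraction is evacuated.
    // With live fraction p: copied = p * R, reclaimed = (1 - p) * R,
    // so the ratio is p / (1 - p), independent of the region size R.
    static double copiedPerReclaimed(double liveFraction) {
        if (liveFraction < 0.0 || liveFraction >= 1.0) {
            throw new IllegalArgumentException("liveFraction must be in [0, 1)");
        }
        return liveFraction / (1.0 - liveFraction);
    }
}
```

At a live fraction of 0.5 the ratio is 1.0, matching the 1MiB moved per 1MiB reclaimed above. At 0.25 it falls to about 0.33, which is why sparse regions are worth collecting first, and at 0.9 it rises to about 9.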

If you need low-pause GC in the JVM today, use concurrent mark sweep:
<http://java.dzone.com/articles/how-tame-java-gc-pauses>
<http://aragozin.blogspot.com/2011/07/gc-check-list-for-data-grid-nodes.html>

------
dgreensp
Your test is only one very specific scenario. I'm not a GC expert, but on
EtherPad, which was a very complex monolithic JVM app, we had to turn to
concurrent GC as the default GC just choked with any settings. If you haven't
seen much difference between GC settings in the past, have you actually worked
on realistic systems that strain the JVM GC, versus your contrived one?

I don't know the semantics of "real-time" (argued in other comments), but if
you are coming from that angle, maybe you are looking for guarantees -- it
seems like there ought to be a way to write GC so that it just works no matter
what you throw at it, and it seems like G1 is not that.

