JVM JIT optimization techniques (advancedweb.hu)
112 points by sashee on May 28, 2016 | 61 comments

It's interesting how most of the optimizations are only really available if you essentially write your programs in a functional style. That is, they all depend on the compiler being able to analyse the local scope to decide whether the state changes are confined enough that it can perform an optimization without changing behavior. If a reference escapes, even into a private instance variable, the compiler will (admittedly, based only on the shallow analysis in this article) more or less give up, because it can no longer predict how that variable might be mutated from another scope.

Escape analysis is pretty damn capable if you can give it enough time to run.

The analysis is fine, but "escape analysis" is often conflated with stack allocation, and at that HotSpot is actually really terrible. The constraints on allocating to the stack mean it happens very infrequently, even where the literal escape analysis would otherwise permit it.

Calling it 'stack allocation' is confusing: object fields are replaced with scalars in the IR. The whole object is never allocated at all, neither on the heap nor on the stack in the alloca sense.
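A minimal sketch of what that means (the class and method names here are illustrative, not from the article): after inlining, escape analysis can prove that `p` never leaves `distance()`, so the JIT can replace its fields with scalars in the IR and no allocation happens at all.

```java
// Illustrative sketch of scalar replacement. If escape analysis proves `p`
// never escapes distance(), the JIT turns p.x and p.y into plain locals in
// the IR -- no Point object is ever allocated, on heap or stack.
final class Point {
    final double x;
    final double y;
    Point(double x, double y) { this.x = x; this.y = y; }
}

final class Distances {
    static double distance(double x, double y) {
        Point p = new Point(x, y);                 // candidate for scalar replacement
        return Math.sqrt(p.x * p.x + p.y * p.y);   // field reads become local reads
    }
}
```

Storing `p` into a field, or passing it to a method that isn't inlined, is exactly what makes the reference "escape" and defeats the optimisation.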

But I don't think the constraints are that bad, are they? Can you give examples? Modern compilers like Graal can even scalar-replace objects that will escape in the future.

Given that the HotSpot compilers are not Graal and HotSpot's scalar replacement optimisation is notoriously fragile, I think the original statement was fair. Once Java 9 rolls around and Graal is just a plugin instead of a whole separate VM build, it'll be less fair, and if Graal ever becomes the default compiler that replaces C2 then it'll be even more fair, but I guess that is years away at best.

The biggest constraint by far is the reliance on inlining. Inter-procedural escape analysis could potentially remove the need for value types in many places, but it doesn't seem like the JVM engineers believe that's feasible.

It is not fair, because many of us deploy to other JVMs, but people keep waving the flag of Java performance == HotSpot, as if C++ performance == gcc.

HotSpot is based on a relatively old paper at this point, from what I remember, and there are alternatives which purportedly perform better.

Can you name one?

Ah, I meant "alternatives to HotSpot" which is what the parent was implying.

I've read the Graal papers and part of the source code. So I am familiar with PEA. It's too bad that optimisation didn't make it into C2 yet. I wonder if it ever will.

Someone told me that Zing also does PEA, but I don't know that to be a fact.

Oh and I think JRockit might have done it as well.

That's not a functional style, but an observation of how most (non-pathological) code behaves. If you're talking about escape analysis and stack allocation, the functional style doesn't help, because references escape all the time. That they're pointing to immutable objects doesn't matter.

In any case, whether functional or not, the observation is that many objects have limited scopes, that correspond to structural scopes in the code.

Actually, in all the analyses I've seen so far, the Java EE style (bimorphic, megamorphic call sites) is mostly way slower; using normal classes and static class functions instead of polymorphism is faster. So a flat design is way faster than the "sane" structure that dominates a lot of enterprise software.
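To illustrate what mono-/bi-/megamorphic means here (an illustrative sketch, not a benchmark; the class names are mine): the morphism of a call site is determined by how many receiver types the JIT actually observes at it.

```java
// A call site that only ever sees one receiver type is monomorphic and
// easily inlined; two observed types (bimorphic) can still be inlined behind
// a type check; three or more makes it megamorphic, and HotSpot falls back
// to a real virtual dispatch.
interface Shape { double area(); }

final class Square implements Shape {
    final double s;
    Square(double s) { this.s = s; }
    public double area() { return s * s; }
}

final class Circle implements Shape {
    final double r;
    Circle(double r) { this.r = r; }
    public double area() { return Math.PI * r * r; }
}

final class Morphism {
    // If callers only ever pass Squares here, the s.area() call site stays
    // monomorphic; mixing in Circles and a third type makes it megamorphic.
    static double totalArea(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) sum += s.area();
        return sum;
    }
}
```

This is why a "flat" design with fewer implementations per interface tends to JIT better than deep enterprise-style hierarchies.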

There is something about this site that absolutely destroys Firefox when scrolling. Expensive work in a scroll handler isn't good guys :(

I would not be too quick to blame the authors of the website. It seems likely to be the same Firefox bug as all the other cases of HN comments complaining about bad scrolling performance in Firefox: https://bugzilla.mozilla.org/show_bug.cgi?id=1250947

Bad link, I'm afraid... It's about a box-shadow glitch.

It's the correct link (if you read the last 15 or so comments, it'll be more obvious that there is a performance component to the issue).

It seems you're correct. Specifically, disabling this rule from the inspector instantly fixes the performance issue.

  .widewrapper.main {
    box-shadow: inset 1px 3px 1px -2px #ababab;
  }
I loaded this in Firefox on a Raspberry Pi 3 (don't ask), and never got to see below the fold--burned CPU for a minute or two before I gave up and closed it. I had pressed page-down once--don't know if it ever got around to processing the scroll event.

Don't know the root cause, but the worst most sites get on here is annoyingly sluggish. Between the site and Firefox, something's definitely triggering pathological behavior.

Wow, it is indeed slow on Firefox! Thanks for letting us know, we both test on Chrome so we didn't notice the problem. We'll look into it.

As a workaround you can go to "View > Page Style > No style". It honestly reads just fine without CSS.

Or use the reader view.

Same for me; Firefox 46.0.1, Windows 7 (if the authors care to reproduce).

Page pinwheels on load for me (FF 46.0.1), OSX

> For most of the time, many performance considerations are invisible from the higher abstraction levels, so you can concentrate on writing simple, elegant and maintainable applications in Java, Scala, Kotlin or anything that runs on the JVM.

Funny statement, at least for Java. I don't know anyone who would say the majority of Java applications aren't huge, slow, memory-hogging beasts compared to equivalent native ones, and whose codebases are just as unoptimised from the perspective of the humans who have to work with them. That's mainly perception from when I briefly worked with Enterprise Java many years ago --- no one wrote "simple" or "elegant" Java applications, and stuffing in as many design patterns and abstractions as you could was considered "best practice".

There are certainly examples of simple and small Java applications (see https://en.wikipedia.org/wiki/Java_4K_Game_Programming_Conte... ), and I've written a few, but the overwhelming culture seems to be that of anti-optimisation and bureaucratic excess. In some sense, it's almost like working against or "challenging" the JVM's optimiser is the norm.

In my experience, a "simple, elegant" design --- which does not necessarily mean "highly abstract" --- tends to be very close to optimal anyway, with the compiler's optimiser doing the remaining work. That makes me wonder whether claiming a language/runtime/compiler has powerful optimisation abilities is actually a reflection of how much the typical source code in the language has to be "cleaned up" by the optimiser in order to be decently efficient.

Also, does anyone else notice the "Machine code" given for the JIT example is completely unrelated to either the Java or bytecode? It's 16-bit realmode --- I recognise the access to the 40h segment, it seems to be timing-related, and a bit of Googling finds that it's actually part of an old TSR clock utility (with unknown source):


> I don't know anyone who would say the majority of Java applications aren't huge, slow, memory-hogging beasts compared to equivalent native ones, and whose codebases are just as unoptimised for the perspective of the humans who have to work with them.

I would, and I've been developing in Java for over ten years now, after ten years of C/C++.

I think that your complaints have nothing to do with Java, and much to do with a coding style that became popular in the '90s. You could see the very same in C++ applications back then, but then Java took over. The big enterprise applications of the '90s and '00s reflect the prevailing mindset of the time. Java applications written today don't look like that.

> In my experience, a "simple, elegant" design --- which does not necessarily mean "highly abstract" --- tends to be very close to optimal anyway, with the compiler's optimiser doing the remaining work. That makes me wonder whether claiming a language/runtime/compiler has powerful optimisation abilities is actually a reflection of how much the typical source code in the language has to be "cleaned up" by the optimiser in order to be decently efficient.

In that case I think your experience may be limited, or that you're unaware how big "the remaining work" may actually be, especially on modern hardware. HotSpot has proven extremely adept at running many different code styles very efficiently. HotSpot's next-gen compiler, Graal[1], is IMO the biggest breakthrough in compilation technology of the past decade. It runs Java faster than C2 (HotSpot's current compiler), Python faster than PyPy, and JS as fast as any other VM out there. It even has a C frontend, which, while not yet matching gcc, performs surprisingly well considering how little effort has been put into it.

This kind of work can be very important, namely, the ability to optimize code across different languages. A recent paper[2] found that re-writing parts of SQLite in Python (!), instead of the original C, can result in a 3x (!) performance boost when running in an application, as application and DBMS code can be optimized as one unit.

[1]: https://wiki.openjdk.java.net/display/Graal/Publications+and...

[2]: http://arxiv.org/abs/1512.03207

That paper says the actual benchmark was only 2.2% faster. What does the 3x speedup actually mean then?

Is Graal not only faster than the Client JVM, but slower than the server VM?

Well, if Graal is faster, why is it not made part of the JDK distribution?

As of JDK 9, Graal will be a pluggable JIT that you can opt to use (it's just a Java library). It is not the default JIT because it is still experimental and not productized yet.

> I don't know anyone who would say the majority of Java applications aren't huge, slow, memory-hogging beasts

This is really a matter of perspective. A typical enterprise application is running on a beefy multicore processor and spending most of its threads' cycles waiting for database IO. Also, it's not really a concern if your jar file is 100 MB when your machine has 256 GB of RAM. From this perspective, the fact that an application can be quickly slapped together from the multitude of Java libraries and frameworks by a Stack Overflow-savvy junior developer far, far outweighs the marginal performance benefits that could be gained from using a language that compiles to native code.

The fact is that Hotspot has made the jvm perform "good enough" for this kind of domain. And keep in mind that many enterprise developers (at least in my experience) have never used a language besides Java, so this level of performance is normal to them.

Ah, the "Java is slow" argument. When were you doing Enterprise Java? It must have been around the pre-HotSpot years.

As for applications that are huge, memory hogs, complex and non elegant, well, congratulations you just described the majority of enterprise applications.

Not pre-HotSpot; I saw the release of HotSpot, and while it did improve performance considerably compared to before, it was still pretty bad compared to native.

I see the proliferation of "Java is not slow" articles as a sign that it is --- if it wasn't, why would there be a need for such strong propaganda and carefully constructed microbenchmarks? Sun/Oracle have a deep interest in hiding their "naked emperor". Where are all the "C is not slow" articles, for example? I'm sure anyone who has used a non-trivial Java application, not even an "Enterprise" one, is immediately aware that it seems slower, even those who might not know at all what Java is.

I guess you see a lot of such articles because of comments like yours - lots of people still think Java is generically slow or Java code is always bloated, despite not having used it for 10 years or more. Whilst such views persist, there will be articles arguing they're false based on newer data.

Nobody writes "C is not slow" articles any more, but I remember a time when there were people who believed C++ was inherently slower than C. Maybe some such people still exist! I don't see much talk like that anymore though.

With respect to bloated code, the 1990's and early 2000's also saw things like DCOM and other C++ bloat-zones. It's not really to do with the language. The presence of bloat in enterprise apps is more to do with the lack of competition (bespoke apps are by definition monopolies), the frequent lack of deadline pressure (deadlines are invariably artificial) and the inflexible team sizes (over-engineering is incentivised by the need to retain a nice safe corporate job). The fact that enterprise software and Java go hand in hand speaks more to the lack of competition in the space Java plays in, than anything about the language itself.

> despite not having used it for 10 years or more

I just don't write code in Java anymore --- I still need to use some Java apps, and the difference remains obvious.

The other thing I've noticed is that a lot of those "Java is actually fast" articles are [1] entirely words about all the fancy optimisations JVMs do with no real numbers, [2] comparing Java with previous versions of itself, [3] comparing Java with highly dynamic languages like JavaScript, Python, or Ruby, or [4] microbenchmarks designed to specifically highlight certain JVM optimisations.

> but I remember a time when there were people who believed C++ was inherently slower than C.

The C++ vs C case is slightly different, because C++ is essentially an extension of C. Some features like iostreams vs stdio are objectively slower on the C++ side, and efficient C++ code usually tends to look a lot like C. Those who say C++ is slower are referring to the "overuse" of C++ features that add overhead. As the benchmarks posted below show, even carefully written and optimised Java that looks like C is slower than C.

Well, here is a paper you may find useful:


They write the same program in C++, Java, Scala and Go, then compare performance both before and after optimisation.

The key part that shows Java can be as fast as C++ is V.E "Java tunings":

> Jeremy Manson brought the performance of Java on par with the original C++ version. This version is kept in the java_pro directory. Note that Jeremy deliberately refused to optimize the code further, many of the C++ optimizations would apply to the Java version as well.

The tunings he applied are not exotic, but would not be used in most code unless it was actually performance sensitive. For instance he chose better starting sizes in ArrayLists and replaced boxing collections with more optimal ones (presumably from GNU Trove).
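A minimal sketch of those two kinds of tunings (the names and code here are hypothetical illustrations; the paper's actual changes may differ): presize collections so the hot path never triggers a resize, and keep counters in primitive arrays rather than boxed collections.

```java
import java.util.ArrayList;
import java.util.List;

final class Tunings {
    // Primitive int[] instead of a List<Integer>: no Integer boxing, no
    // per-element heap allocation in the hot loop.
    static int[] countLengths(List<String> words, int maxLen) {
        int[] counts = new int[maxLen + 1];
        for (String w : words) counts[Math.min(w.length(), maxLen)]++;
        return counts;
    }

    // Presizing the ArrayList avoids repeated internal array growth/copies
    // when the final size is known up front.
    static List<String> copyPresized(List<String> src) {
        List<String> out = new ArrayList<>(src.size());
        out.addAll(src);
        return out;
    }
}
```

Libraries like GNU Trove (mentioned above) take the boxing-avoidance idea further with full primitive-specialised collection types.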

Of course, that is a comparison against the original C++. Later in the paper it is shown that Google were able to optimize the C++ version heavily, and it then beat the Java version again. However, the optimised version apparently relied on data structures so exotic that they're Google proprietary, and they chose not to open source them, so I'm not sure that'd be reflective of the experience of the average C++ developer.

So many memories...

Yeah, I remember the days when junior Assembly coders could easily write better code than any C compiler for home computers.

Or the days when the arguments that are now used against Java, Go, Swift and many others were used against Turbo Pascal, C, C++, Modula-2 and Basic compilers.

The more things change, the more they stay the same, I guess.


Consistently twice as slow, with memory usage between 2 and 400(!) times larger. Java is slow and bloated compared to native.

Your next argument will be something about how micro-benchmarks don't reflect reality for larger programs...

> Java is slow and bloated compared to native.

What do you mean by "native"? C/C++? Yes, given enough effort you can write C code that outperforms Java code, sometimes handily, especially in single threaded or low-contention cases. Yet we did write applications in C/C++ before we switched to Java, we had excellent reasons to make the switch, and there have been precious few companies making the switch back. That shows you that the microbenchmarks don't tell the whole story.

Right now, the JVM's biggest performance issue is the lack of value types. Once they arrive -- and work is well underway -- beating Java would be harder and harder, especially given new compilers like Graal. But in any event, I just don't see large developers switch back to C++ (or Rust) en masse. Memory usage is hardly an issue, as RAM is very cheap and underutilized in server environments anyway. Spending effort to conserve RAM just doesn't make any sense, unless you're targeting RAM constrained environments.

I feel that with companies moving to cloud environments, defending Java's memory bloat will be harder and harder. Java's limited value types may be out by 2020-21 or so, and it will be many more years before the library ecosystem starts utilizing them. So we are still at least 10 years away from some form of Java value types being available to general users.

1. Memory "bloat" is always a better use of resources than increased development time, regardless of how you have to pay for it.

2. How are Java's value types limited?

3. I find your calculation extremely pessimistic. Lambda expressions became widespread within a couple of years of their release, and I see no reason why value types will be different. I think that 5 years is a better estimate for wide use of value types.

Here is what I read from JEP 169

> Except for pointer equality checks, forbidden operations will throw some sort of exception. How to control pointer equality checks is an open question with several possible answers.

And more:

  Rules for permanently locked objects:
  - restrictions on classes of locked objects
      - all non-static fields must be final
      - there must be no finalizer method (no override to `Object.finalize`)
      - these restrictions apply to any superclasses as well
      - an array can be marked locked, but then (of course) its elements cannot be stored to
      - if not an array, the object's class must implement the marker type `PermanentlyLockable` (is this a good idea?)
  - restricted operations on locked objects (could be enforced, or else documented as producing undefined results)
      - do not use any astore or putfield instructions, nor their reflective equivalents, to change any field
      - do not lock (you may get a hang or a LockedObjectException)
      - do not test for pointer equality; use Object.equals instead (there may be a test for this)
      - do not ask for an identity hash code; use Object.hashCode instead (there may be a test for this)
      - do not call wait, notify, or notifyAll methods in Object
      - at the time it is marked locked, an object's monitor must not be locked (in fact, should never have been?)
  - side effects
      - elements of locked arrays are stably available to readers just like final object fields (i.e., there is a memory fence)
      - a locked object can be locked again, with no additional effect
      - any attempt to mutate a permanently locked object raises java.lang.LockedObjectException
      - any attempt to synchronize on a permanently locked object raises java.lang.LockedObjectException
  - object lifecycle
      - all objects are initially created in a normal (unlocked) state
      - an object marked locked cannot be "unlocked" (reverted to a normal state)
      - an object marked locked must be unreferenced by any other thread (can we enforce this?)
      - the reference returned from the (unsafe) marking primitive must be used for all future accesses
      - any previous references (including the one passed to the marking primitive) must be unused
      - in practice, this means you must mark an object locked immediately after constructing it
  - API
      - the method `lockPermanently` is used to lock an object permanently
      - there is a predicate `isLockedPermanently` which can test whether an object is locked or not
      - for initial experiments, these methods are in `sun.misc.Unsafe`; perhaps they belong on `Object` (cf. `clone`)
With all of the above, I feel it is not exactly the same as value types as understood in languages which natively support them.

Java will natively support value types (and already does, just not user-defined value types). What you've quoted is the spec for locked arrays; those arrays are not value types, but reference types. I'm not sure about the relationship between the text in that JEP and the current work on value types: http://openjdk.java.net/projects/valhalla/

It's interesting that the "gz" column, which reflects the amount of source code required, shows a very slight trend towards C being smaller, although there are big exceptions like regex-dna and reverse-complement.

I'm not seeing the 400x more memory - there is a 40x though. But if you add up the columns for the programs that have both C and Java versions, you get this rough summary...

           secs    KB       gz
    Java   76.33   2004048  12775
    C gcc  37.32   793652   11651
...that Java is on average half as fast as C and uses 2.5x more memory, while requiring slightly more source code.

"half as fast as C and 2.5x the memory on average" is a stupendously good result for any managed runtime. Most don't get anywhere near it while providing more than acceptable performance in practice.

Yes, and 2.5x is something that can be dealt with by deploying beefier hardware. The productivity gain is what makes that tradeoff a no-brainer.

I mean, micro-benchmarks are just generally not something to put a huge amount of stock into, regardless of language. Software that scales out to hundreds of thousands if not millions of users is generally written in "slow", "bloated" languages, and seems to be doing ok. It really all depends on the use case, and on who is creating the software. Most enterprise software is slow and bloated, but a lot of other stuff exists that is quite the contrary. Two easy examples:

- Cyberduck is written in Java, yet nobody seems to be complaining about how slow and bloated it is.

- There exists a version of Quake 2 written in Java (http://bytonic.de/html/benchmarks.html). It seems to be able to push out more frames than native, actually.

Both of those are real software that actually run well on Java, regardless of penalty paid for running in a VM, JIT, and GC.

Also when looking at the amount of memory and time taken when comparing C to Java in the microbenchmarks, always be sure to mentally factor in the base overhead of starting up the JVM both in time and memory. It's pretty easy to see that when a C version takes up something like 100KB of memory and the Java one takes up 30MB, Java consistently takes up at least ~30MB regardless of how much memory is required for that microbenchmark. When the C version takes up 300MB and the Java one takes up 700MB, that's a bit better of a comparison. (though still not perfect, because the Java GC will reserve a lot of memory for itself, even if it isn't using the full 700MB, if it feels it needs that much, etc.)
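A quick way to see that baseline footprint yourself (a rough sketch, not a precise measurement; `Runtime` reports only heap figures, not the process's total RSS):

```java
// Even before a program allocates anything of its own, the JVM has already
// reserved a heap and loaded the core classes. totalMemory() is what the
// JVM has claimed from the OS for the heap; subtracting freeMemory() gives
// a rough lower bound on heap already in use at startup.
public class Baseline {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long usedMb = (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024);
        long totalMb = rt.totalMemory() / (1024 * 1024);
        System.out.println("heap in use at startup: ~" + usedMb
                + " MB of " + totalMb + " MB reserved");
    }
}
```

That reserved-but-unused heap is exactly the distortion described above: it counts against Java in tiny microbenchmarks but mostly washes out once the workload genuinely needs hundreds of megabytes.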

> Cyberduck is written in Java, yet nobody seems to be complaining about how slow and bloated it is.

Lol, I just double-checked that after you said it; it never felt "Java-ish", and even the interface is really sane.

Actually they are using JNA for a lot of stuff, and they wrote Foundation bindings... (https://g.iterate.ch/projects/ITERATE/repos/cyberduck/browse...) Cool stuff, I'll keep that for reference. However, he should put the bindings under something like the LGPL, since most of it would fall under fair use anyway (simple class names which you would use anyway, even without looking at the source, when making a JNA binding to Cocoa).

>> overhead of starting up the JVM both in time

Not much for those programs:


>> It's pretty easy to see … memory

Yes: http://programmers.stackexchange.com/a/189552/4334

> It seems to be able to push out more frames than native, actually.

The table in your link shows it being 6% faster in one (literally) corner case, while all the other entries show it being slower by varying amounts. Keep in mind that in this benchmark a lot of the "heavy lifting" is being done by the GPU via OpenGL, so it isn't great for benchmarking languages that run on the CPU. It also doesn't mention what the "Original C Code" was compiled with.

Does there exist a JVM with an OS-level shared heap, to reduce GC overhead? Obviously each heap can be collected independently of the others, so pausing shouldn't be a significant issue, and IPC becomes a normal heap reference, which may aid performance.

Ah, synthetic benchmarks. Are you really sure they're consistently twice as slow? Your link shows otherwise.

  mandelbrot 17% slower
  fannkuch-redux 34% slower
  fasta 38% slower
  fasta-redux 39% slower
  pidigits 44% slower
So much for predicting my next argument.

How come you left out these ones?

    spectral-norm 115% slower
    reverse-complement 121% slower
    n-body 136% slower
    regex-dna 235% slower
    k-nucleotide 253% slower
Cherry picking your data shows you're either dishonest or really lost in your denial.

Perhaps you should read his reply more carefully next time. He said the benchmarks were "consistently 2x" as slow. He was wrong and the rest of the synthetic benchmarks you posted just corroborated my point.

As for being dishonest, well, you're commenting from a newly minted temp account so it would seem you didn't have the courage/backbone to use your real one.

Maybe a noob observation, but can anyone explain why the Lock Coarsening example is valid? Seems to me that locks on A with a lock on B in between is not equivalent to 2 locks on A and then one on B... Unless the programmer made a cognitive error the cases seem incomparable, unlike the other examples.

"Seems to me that locks on A with a lock on B in between is not equivalent to 2 locks on A and then one on B"

You're looking at:

    public void canNotBeMerged()
The locks can't be merged.

Yeah, but don't the semantics there refer to being merged by an optimization? What's the point of having incomparable examples?

Just to demonstrate an example of invalid code?
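To make the distinction concrete, here is a hedged sketch (the class, field and method names are illustrative, not taken from the article) of why an intervening lock on a different monitor blocks coarsening:

```java
final class Coarsening {
    private final Object a = new Object();
    private final Object b = new Object();
    int x, y;

    // Two adjacent blocks on the same monitor: the JIT may coarsen them into
    // a single acquire/release of `a` without changing observable behaviour.
    void canBeMerged() {
        synchronized (a) { x++; }
        synchronized (a) { x++; }
    }

    // Here a lock on `b` sits between the two locks on `a`. Coarsening the
    // two `a` blocks would move the `b` acquisition inside the merged `a`
    // region, changing lock ordering and what other threads can observe
    // between the releases of `a` -- so the JIT must leave it alone.
    void canNotBeMerged() {
        synchronized (a) { x++; }
        synchronized (b) { y++; }
        synchronized (a) { x++; }
    }
}
```

So the second example isn't there as a comparable transformation; it demonstrates the boundary condition under which the optimisation must not fire.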

