
Ahead-of-Time Compilation - hittaruki
https://bugs.openjdk.java.net/browse/JDK-8166089
======
KMag
This is great news, though it initially only supports Linux x86-64 and is
decades late for Java desktop apps (and not having non-blocking I/O until Java
1.4 was shameful for a language explicitly targeted at a pervasively
networked ecosystem).

In their "tiered mode", they put sampling instrumentation into the native
code, and if they detect a hotspot, regenerate fully instrumented native code
from bytecode using the C1 (fast) JIT, which then allows the C2 JIT to do its
full optimizations on the code as if AoT were not involved.

Since the invention of tracing JITs, I've often wondered why languages don't
package a compact serialized SSA form such as LLVM bitcode or SafeTSA
alongside functions stored as lists of pointers to space-optimized
compilations of extended basic blocks (straight-line code), similar to how some
Forth compilers generate threaded code. A threaded-code dispatcher over these
straight-line segments of native code would have minimal overhead, and when a
simple SIGPROF lightweight sampler detected a hotspot, a tracing version of
the dispatcher could collect a trace and then generate native code from the
visited traces using the stored SSA for the basic blocks.

In this way, they'd have a lightweight tracing JIT for re-optimizing native
code.
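A minimal sketch of the dispatcher idea above, with lambdas standing in for
pointers to native straight-line segments and a plain counter standing in for
a SIGPROF sampler (all names here are hypothetical, not from any real runtime):

```java
import java.util.function.IntUnaryOperator;

// Sketch: each "compiled basic block" is a handler that mutates program state
// and returns the index of the next block to run (threaded-code dispatch).
// A real implementation would thread through native code fragments.
public class ThreadedDispatch {
    static final int HALT = -1;

    public static void main(String[] args) {
        int[] state = {0, 10};   // state[0]: accumulator, state[1]: counter

        // Block table: straight-line segments, each ending in a "goto".
        IntUnaryOperator[] blocks = {
            /* 0: loop body */ pc -> { state[0] += state[1]; state[1]--; return 1; },
            /* 1: loop test */ pc -> state[1] > 0 ? 0 : HALT,
        };

        // Dispatcher with a cheap per-block sample counter; in the scheme
        // described above, crossing a threshold would start trace recording.
        long[] samples = new long[blocks.length];
        int pc = 0;
        while (pc != HALT) {
            samples[pc]++;                    // lightweight profiling
            pc = blocks[pc].applyAsInt(pc);   // threaded dispatch
        }

        System.out.println("sum = " + state[0]);   // 10+9+...+1 = 55
        System.out.println("samples[0] = " + samples[0]);
    }
}
```

The dispatch loop is the only interpreter-like overhead; everything inside a
block would already be native code.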

~~~
pjmlp
> decades late for Java desktop apps

Commercial JDKs always offered AOT compilation, the problem is that people
nowadays apparently don't buy compilers anymore unless forced to do so (e.g.
embedded, consoles...).

~~~
Asooka
Those are priced for people who already made a big investment in writing their
application in Java and now realise they need features not present in javac.
If you're just starting out, it can very well make more sense to use Microsoft
Visual C++, which costs less than a commercial Java compiler and comes with an
IDE that's light years ahead of anything available to Java developers.

Desktop Java also had many other problems, which can be summarised as "the JVM
is its own OS". You can't write an application in Java that has a native look
and feel. Or at least you couldn't for the first several significant years of
its life and even now I don't think there's a good story for writing a simple
native application. Meanwhile you could grab wxWidgets or Qt (and there goes
your budget for a java compiler) and have a native-looking cross-platform
application. Which very few did, because back then Mac OSX didn't exist, Apple
were on their death bed and "Linux Desktop Environment" was even more of a
joke than it is today.

So yeah, it didn't make any sense to develop Java desktop apps given
that you already had a large pool of proficient C++ developers, the only
platform you cared about was Windows and Java GUI libraries insisted on
reinventing their own look and feel. Oh and you could always just buy Delphi
if you didn't want to suffer C++ (again, for a fraction of the price of a
commercial Java compiler).

Nowadays people wrap a bunch of javascript in an electron instance, but this
only happened after the web took off and nobody really looks at native desktop
apps much. If this AOT work can give us fully contained native executables
that we can distribute without having the user install Java and with
significantly better performance than nodejs, maybe Java on the desktop can
still happen.

~~~
mike_hearn
That's not correct. The first UI toolkit Java had was AWT and it mapped
through to native widgets. AWT was not very successful because it tried to be
cross platform rather than a direct mapping of the Windows UI toolkit, which
was significantly more advanced in that era than its competitors MacOS Classic
and - most problematically - UNIX workstations, which had truly miserable UI
toolkits. So AWT was limited to the lowest common denominator, and trying to
abstract UI libraries didn't work very well; the abstraction was leaky.

So for the first few years of Java's existence developers were given native
UI, and said no, actually, we don't care if we have a native look and feel or
not - for the kinds of line-of-business apps they were writing a powerful and
consistent toolkit was more important than one that looked the right shade of
grey. Hence, Swing.

Nowadays if you want to write a small, pure native Java app with native
widgets you can do it with SWT and Avian. There's an example here:

    https://readytalk.github.io/avian/

It demos all the features available in SWT with a 1MB download that's fully
self-contained. You still have the problem of leaky abstractions, and SWT apps
don't look entirely normal, as some more complex widgets still need to be
custom, but it's another attempt at what AWT did that works significantly
better now that MacOS and Linux have closed the gap with what Windows could
do, so you can have a richer abstraction.

------
taspeotis
I can't see anywhere in the linked issue that indicates AOT compilation is
coming to Java 9, or even coming at all. The issue demonstrates nothing more
than an intent to bring it to OpenJDK, and the issue seems to be very nascent?
It was only created a fortnight ago.

Lest the title be changed:

    AOT compilation is coming to Java 9 (java.net)
    18 points by hittaruki 37 minutes ago

~~~
chc
The person who created this ticket is an Oracle employee, not some random Joe,
so it seems like a reasonable guess that it's something Oracle is planning.

~~~
pjmlp
They are planning it, and there were already several talks, but the roadmap
was Java 10 or later.

------
alblue
Slightly off topic but if you are interested in how HotSpot compiles to native
code I gave a presentation at JavaOne:

[http://alblue.bandlem.com/2016/09/javaone-hotspot.html](http://alblue.bandlem.com/2016/09/javaone-hotspot.html)

The presentation wasn't recorded but there is a video recorded from a
DocklandsLJC event which is on InfoQ:

[https://www.infoq.com/presentations/hotspot-memory-data-structures](https://www.infoq.com/presentations/hotspot-memory-data-structures)

------
avbor
I'm not familiar enough with compilers, but why would an ahead-of-time
compiler perform worse than a just-in-time compiler in a static language? I
think I'd understand if it was a dynamic language, because you can't know the
types for sure until you start running the program, but are similar issues
present for Java?

~~~
h4nkoslo
Most of the JIT optimizations amount to "it's been called like this in the
past; assume it'll always be & deoptimize (at a penalty) if it isn't."

That potentially includes the fully resolved types of objects (ie
devirtualization), branch prediction (stronger than the CPU can do; for
instance, if a value is only used inside a branch that's never taken, don't
bother mutating it), data sizes (this "array" is only ever size 2, store it in
registers), dead code elimination (keeps the compiled code small), and a whole
bunch more fun stuff.
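A minimal Java sketch of the devirtualization case (all class names here are
hypothetical): the interface call below could dispatch to several types, but if
profiling only ever sees one receiver, the JIT can inline that implementation
behind a cheap type guard and deoptimize if the guard ever fails.

```java
// Sketch of speculative devirtualization: s.area() is polymorphic in
// principle, but if the runtime profile only ever observes Circle, the JIT
// can treat the call site as monomorphic and inline Circle.area().
interface Shape { double area(); }

final class Circle implements Shape {
    final double r;
    Circle(double r) { this.r = r; }
    public double area() { return Math.PI * r * r; }
}

final class Square implements Shape {
    final double s;
    Square(double s) { this.s = s; }
    public double area() { return s * s; }
}

public class Devirt {
    // Statically, s.area() could target Circle or Square; at runtime the
    // profile may show only Circle, enabling speculative inlining.
    static double total(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) sum += s.area();   // candidate for speculation
        return sum;
    }

    public static void main(String[] args) {
        Shape[] onlyCircles = { new Circle(1), new Circle(2) };
        System.out.println(total(onlyCircles));   // Math.PI * (1 + 4)
    }
}
```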

~~~
catnaroek
> assume it'll always be & deoptimize (at a penalty) if it isn't

Stuff like this makes me nervous. Performance is already a complex topic, and
stuff like this makes it even more complex. Unnecessarily so. If we were
talking about a very high-level programming language (say, Prolog), you could
argue that the expressiveness benefits outweigh the cost of the runtime
system's complexity. But Java isn't even as expressive as C++, let alone
Prolog.

> fully resolved types of objects (ie devirtualization)

C++ (and similar languages: D, Rust, etc.) and MLton (a Standard ML
implementation) have been using monomorphization for ages, which is a compile-
time analogue of devirtualization. Moreover, monomorphization has important
advantages over devirtualization:

(0) It's completely predictable. You don't need to guess when it will happen.
It happens iff the concrete type (and its relevant vtables, if necessary) can
be determined at compile-time:
[https://blog.rust-lang.org/2015/05/11/traits.html](https://blog.rust-lang.org/2015/05/11/traits.html)

(1) It's always a sound optimization, so it doesn't have to be undone at
runtime under any circumstances.

(2) It's relatively simple to implement. In fact, a compiler front-end can
completely monomorphize a program before handing it over to the back-end for
target code generation.
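The nearest Java analogue of points (0) and (1) (a sketch with hypothetical
names, since Java itself doesn't monomorphize generics): when the receiver's
concrete type is statically known, e.g. a `final` class, the call has exactly
one possible target, so binding it at compile time is always sound and never
needs to be undone.

```java
// Sketch: a call on a final class is the sound, compile-time-predictable
// cousin of devirtualization. No other implementation of toFeet() can exist,
// so no runtime guard or deoptimization is ever needed.
final class Meters {
    final double value;
    Meters(double value) { this.value = value; }
    double toFeet() { return value * 3.28084; }
}

public class SoundBinding {
    public static void main(String[] args) {
        Meters m = new Meters(2.0);
        // The static type is the final class Meters: exactly one target.
        System.out.println(m.toFeet());   // 6.56168
    }
}
```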

> if a value is only used inside a branch that's never taken, don't bother
> mutating it)

The best way to handle unreachable branches is to avoid creating them in the
first place. With proper use of algebraic data types and pattern matching,
unreachable branches can be kept to a minimum, or even outright eliminated in
many cases.
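Modern Java (21+) actually has a rough analogue of this via sealed types and
pattern-matching switches, sketched below with hypothetical names: the compiler
checks the switch covers every case, so no unreachable default branch is left
for a JIT to profile away.

```java
// Sketch (requires Java 21): a sealed hierarchy plays the role of an
// algebraic data type, and an exhaustive switch has no unreachable branches.
sealed interface Result permits Ok, Err {}
record Ok(int value) implements Result {}
record Err(String message) implements Result {}

public class Exhaustive {
    static String describe(Result r) {
        // Exhaustive over the sealed hierarchy: no default branch needed,
        // and adding a new case is a compile error until handled here.
        return switch (r) {
            case Ok ok   -> "ok: " + ok.value();
            case Err err -> "err: " + err.message();
        };
    }

    public static void main(String[] args) {
        System.out.println(describe(new Ok(42)));
        System.out.println(describe(new Err("boom")));
    }
}
```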

> data sizes (this "array" is only ever size 2, store it in registers)

C and similar languages natively handle statically sized arrays, so there's no
need for runtime profiling and analysis just to determine that an array will
always have size 2.

ML does something even better: you just use tuples (in this case, pairs),
which reflect your intent much better than using arrays whose size has to be
tested or guessed.
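A Java analogue of the tuple point (hypothetical names): a record with two
fields states the size-2 intent directly in the type, so no profiler has to
discover the size at runtime.

```java
// Sketch: instead of a size-2 array whose length must be profiled or guessed,
// a record states the fixed shape directly; its fields can live in registers.
record Point(double x, double y) {
    double dot(Point other) { return x * other.x + y * other.y; }
}

public class PairDemo {
    public static void main(String[] args) {
        Point a = new Point(1, 2);
        Point b = new Point(3, 4);
        System.out.println(a.dot(b));   // 1*3 + 2*4 = 11.0
    }
}
```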

---

What I take away from this is that the JVM's supposedly “fancy” optimizations
exist primarily to work around the Java language's lack of amenability to
static analysis.

~~~
chrisseaton
I think you're ignoring the fact that there is lots of extra information
available at runtime that isn't available from static analysis, even for
languages that are very amenable to it, and that static analysis can actually
give you worse information.

For example, a call site could be statically analysed to be bimorphic, but
then sometimes when you run it the second type is never actually used and the
call site can be made monomorphic.

The same thing applies to branches - they're both possible to use, but often
when you run it with real data you only actually use one of them.

So I don't think these optimisations are primarily to work around amenability
to static analysis - they achieve something different and actually more
powerful.
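A minimal sketch of that bimorphic-vs-monomorphic point (hypothetical names):
static analysis must keep both targets of the call alive, but with this run's
data only one implementation is ever invoked, so a runtime profile can
legitimately treat the site as monomorphic.

```java
// Sketch: encode() is statically bimorphic (Utf8 or Latin1 could flow in),
// but the real data in this run only ever uses Utf8, which only a runtime
// profile can discover.
interface Encoder { String encode(String s); }

class Utf8 implements Encoder {
    public String encode(String s) { return "utf8:" + s; }
}

class Latin1 implements Encoder {
    public String encode(String s) { return "latin1:" + s; }
}

public class ProfileWins {
    static String run(Encoder e, String s) {
        return e.encode(s);   // statically bimorphic call site
    }

    public static void main(String[] args) {
        Encoder e = new Utf8();            // this run never picks Latin1
        System.out.println(run(e, "hi"));
    }
}
```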

I can even give you a real-world example where static analysis is actually
what causes unpredictable performance. There is an implementation of the Ruby
language called Rubinius that statically looks at the instance variables in a
class that are visible in the source code, and optimises the objects for that
many instance variables. If you start to set extra variables dynamically, and
so upset this static analysis, performance drops by half. In the
implementation of Ruby that I work on, we can see the static references to
instance variables in the source code but don't try to do anything based on
this - we purely use a hidden class system and let the true number of instance
variables emerge dynamically at runtime, and we don't see the same
performance drop when you start to set extra variables dynamically
([http://chrisseaton.com/rubytruffle/pppj14-om/pppj14-om.pdf](http://chrisseaton.com/rubytruffle/pppj14-om/pppj14-om.pdf)).

I think it's quite a neat philosophy to apply - don't assume anything
statically and let the real characteristics of the program's data and control
flow emerge at runtime.

~~~
catnaroek
> There is an implementation of the Ruby language called Rubinius that
> statically looks at the instance variables in a class that are visible in
> the source code, and optimises the objects for that many instance variables.
> If you start to set extra variables dynamically, and so upset this static
> analysis, performance drops by a half.

You can implement an unsound static analysis for any language, and this is in
fact what Rubinius is doing: it's making potentially wrong conclusions about
what instance variables will exist in objects. However, unsound analyses are
outright harmful:

(0) Undoing unsound “optimizations” costs even more performance than was
supposed to be gained by optimizing your program. (As you found out the hard
way yourself.)

(1) They lie to you about what your code means! If this isn't bad enough, I
don't know what else could be.

Unfortunately, a _sound_ static analysis of Ruby code wouldn't be able to tell
you much, precisely because Ruby allows you to subvert everything at runtime.

~~~
chrisseaton
> As you found out the hard way yourself.

That makes it sound like I implemented it - I didn't - I implemented the
alternative mechanism which doesn't have the same problem.

------
BuckRogers
Why was Java ever JIT'd rather than natively compiled anyway? I hate to stick
my neck out and even ask this but I never understood why you'd want to JIT or
interpret when you can just natively compile to a binary. It seems like Go has
gone "back" to the future on this one and in general their toolchain approach
to me looked like the way.

I always got the sense the world is waiting for a statically typed Python that
compiles to native code with Go's CPU performance. I suppose Nim might fit
that bill but a shame it doesn't have compatibility with Python's or even the
extent of a language like Go's libraries. And if possible, an imperative
language that interfaces with OTP.

And that said, I can see why Erlang/Elixir wouldn't make as much sense or even
work with native code AOT compilation due to its feature set (thinking stuff
like hot code reloading). But I've never grasped why Java or Python were
better off with JIT or interpreters than AOT compilation. Seems like a type
system such as Go's is simple enough and allows for good gains in both CPU
performance and memory usage. Add in the fact that you don't need to install
anything, and that there's less to think about in deploying, and it seems to
be a no-brainer. Please feel free to fill me in on this or where I went wrong.

~~~
j-g-faustus
Because Java was originally designed for set-top TV boxes and appliances,
where it's kind of a big deal that you don't need to know or care what OS or
processor each appliance is using internally.

[https://web.archive.org/web/20050420081440/http://java.sun.com/features/1998/05/birthday.html](https://web.archive.org/web/20050420081440/http://java.sun.com/features/1998/05/birthday.html)

When the appliance market didn't pan out, they went for web browsers and Java
applets. Bytecodes were a feature because browsers didn't execute native
code, and because they allowed for sandboxing to limit the attack surface.

Even when Java became more popular on the server than in the browser, the
"write once, run everywhere" was considered a major feature: The same bytecode
could be distributed everywhere; no need to maintain a heap of different build
environments for different CPU architecture and OS combinations.

~~~
pjmlp
Plenty of other platforms do support bytecodes, JIT and AOT on the same
toolchain.

So they could keep the WORA story and still offer AOT as an option, which
actually most commercial JDKs do.

Just Sun was against providing it at all on Java SE, but they actually
supported it on Java Embedded.

~~~
j-g-faustus
I was commenting on "why did they design Java that way in the first place", as
opposed to (say) Go.

I agree that once the primary use of Java moved outside the browser, there was
no particular reason to not give the option of AOT too. I'm not sure why Sun
was so adamantly opposed to the idea.

If I recall correctly, Sun really wanted to stick with JIT on Java Embedded
too, they just couldn't get it to run fast enough on embedded hardware. For
desktop and servers, they considered bytecode interpretation and JIT "fast
enough".

~~~
pjmlp
Sure, and actually that is where mobile OSes are moving.

We now have bitcode on iDevices, DEX on Android and MSIL/MDIL on WinRT.

Still, both iDevices and the Windows Store take what I consider the best
approach: doing AOT in the store for each supported target.

As Google found out, using AOT on the device doesn't scale. I just don't get
why they went back to an overly complicated architecture of
Interpreter/JIT/PGO → AOT, instead of following the same path as the
competition and serve freshly baked AOT binaries.

------
Cyph0n
Assuming this comes in Java 9, and compilation of code other than `java.base`
is possible, will this make Java a more solid competitor to Go? I guess it partly
depends on how much they optimize the compiled binary size. Go does a really
good job at static compilation, so it will be tough to compete.

~~~
paukiatwee
Java AOT compiles JVM bytecode to native code during startup of the JVM,
which is different from Go's model of compiling source to native and
distributing platform-specific binaries. So in this case, the Java artifact
you distribute remains the same as before, i.e. .jar or .war binaries.

For Go, .go -> native

For Java, .java -> .class -> package .jar -> AOT native

For the Go part I might be wrong; I'm not working on Go professionally.

~~~
mike_hearn
No, not during startup, AOT can happen at any time chosen by the developer or
user: there's a command line tool that triggers it and I believe the plan is
to integrate it with the "jlink" tool that produces standalone, app specific
JRE images. So you can produce a native installer for each platform.

------
jgalt212
> Infrequently-used Java methods might never be compiled at all, potentially
> incurring a performance penalty due to repeated interpreted invocations.

That sort of makes no sense. How can you incur a real performance hit if the
uncompiled method is rarely called?

~~~
smitherfield
I would assume if the method is rarely called, but very complex or time-
consuming when it is called.

Of course, I'm not an expert on JVMs, so I wouldn't know whether their
analysis is synchronous or asynchronous or a mix of both.

------
johnydepp
That's great! Won't it make the compiled executable platform specific?

~~~
mike_hearn
You're intended to use it like this: either you distribute JARs and the
recipient triggers the AOT compilation if they want it, or you distribute a
"jlinked" JRE image that's inherently OS specific because it includes a
bundled JVM. It's also possible that a future Java module format will allow
the AOT compiled code images to come along for the ride next to the
classfiles.

------
my123
IKVM, the Java VM for .NET, has been converting Java bytecode to .NET for a
while now, and crossgen/ngen can be used for .NET AOT (Mono also has AOT).

------
haberman
How does this interact with classloading?

My general impression is that the design of classloaders is pretty actively
hostile to making JVM startup fast.

~~~
the8472
AOT only applies to classes that have been AOT-compiled and are not transformed
at runtime. Everything else will either still need JITing or could
potentially throw errors if pure AOT is desired.

AOT and JIT are not mutually exclusive. From the proposal itself:

> AOT libraries can be compiled in two modes:

> Non-tiered AOT compiled code behaves similarly to statically compiled C++
> code in that no profiling information is collected and no JIT recompilations
> will happen.

> Tiered AOT compiled code does collect profiling information. The profiling
> done is the same as the simple profiling done by C1 methods compiled at Tier
> 2. If AOT methods hit the AOT invocation thresholds these methods are being
> recompiled by C1 at Tier 3 first in order to gather full profiling
> information. This is required for C2 JIT recompilations to be able to
> produce optimal code and reach peak application performance.

------
premium-concern
Is there anything this adds over Scala-native, which seems to be much further
ahead already?

~~~
edko
I think this could be a great complement to Scala-native. Right now, the
project contributors have to spend effort translating the essential Java
libraries that would allow Scala-native to be successful. This could really
ease that job for them. It could potentially make all the Java code ever
written available to Scala-native.

The other thing it adds is the backing of a giant, like Oracle, which can
bring stability and peace of mind to some people, when deciding whether to
adopt the technology or not.

~~~
premium-concern
I think this assumes that Oracle

- can ship something in time

- and that it will be generally available for developers (looking at how hard
Oracle pushes their Java department to invent commercial features they can
sell, I'm not sure about that)

Looking at it, I assume that this will go the way of GWT ... not starting from
"how can we make Java a good citizen in this new ecosystem?", but "here we
have 100% of Java, the JDK and the JVM ... how can we compile this with full
fidelity into X?".

------
saynsedit
not sure why this isn't a transparent feature implemented via caching.

~~~
jcdavis
Hotspot's JIT-compiled code tends to be pretty specialized based on runtime
profiling information, which may not necessarily be similar between different
runs even if the class itself hasn't changed, or (in an extreme case) even if
none of the code has.

Some other JVMs (at least Azul's Zing) try to solve this by caching profiling
information to speed up code generation.

~~~
alblue
I believe the ReadyNow technology used by Zing records which methods are
compiled at which level, then triggers compilation of those methods at
startup. So you effectively use profiling information from the previous run to
inform the next run of what the final target state is, allowing warm-up
times to be dramatically reduced.

------
_ZeD_
it's a gcj comeback?

------
singularity2001
resurrect me when it's there
[https://github.com/search?p=3&q=jaotc&type=Code](https://github.com/search?p=3&q=jaotc&type=Code)

