
Kotlin doesn’t need an LLVM backend (2016) - bmaupin
https://blog.plan99.net/kotlin-native-310ffac94af2
======
mike_hearn
Since I wrote this article Kotlin Native has been released, and you can read
about it here:

[https://blog.jetbrains.com/kotlin/2017/04/kotlinnative-tech-preview-kotlin-without-a-vm/](https://blog.jetbrains.com/kotlin/2017/04/kotlinnative-tech-preview-kotlin-without-a-vm/)

It features some interesting language support for C interop, an LLVM backend,
and a reference counting+cycle detecting garbage collector. I wish the team
the best of luck with it!

Meanwhile, I've been working on the tool I suggested in the blog, that uses
Avian to create small self-contained binaries easily. For people who like
using Go to make command line utilities and the like, Avian+Kotlin could be a
very nice alternative. For high throughput servers you are still best sticking
with the JVM, in my view.

~~~
michel-slm
Is there a repo / blog post detailing your work on Avian+Kotlin? I'm sure I'm
not the only one keenly interested in this.

~~~
mike_hearn
There is, but I don't want to post it here as it's unfinished (for one, Mac
only). I'll write a blog post about it when it's further along. If you're
really interested you'll be able to find it.

------
vvanders
> Unreal Engine is written in C++ and has used a core garbage collected game
> heap since version 3. It powers many of the world's AAA titles. They can hit
> 60 frames per second and they are using a very basic GC.

This is actually very false. Yes, they use GC, but they also drop 2-3 frames
every 30 seconds, because that's when the GC runs.

Go boot up Gears of War and let a character sit idle with no action; you can
set your watch by the obvious frame hiccup every 30 seconds. UnrealScript was
great for prototyping and empowering designers, but I wouldn't hold it up as
an example of a high performance runtime.

~~~
penpapersw
Speaking of GC, why aren't more newer languages designed using the ARC system
that Objective-C uses? It seems to work amazingly well, it's very performant
from what I hear, and although it has a few small caveats, they seem worth the
trade-off.

~~~
pcwalton
Because atomic reference counting is slower than tracing GC.

(Yes, really. Adding atomic operations to every load and store of a pointer is
a tremendous throughput loss. Latency is not everything!)
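
The cost being described here can be glimpsed from the JVM itself: a plain
counter increment versus the atomic increment that an atomically
reference-counted pointer pays on every retain and release. This is only an
illustrative microbenchmark sketch, not a rigorous measurement; the JIT can
optimize the plain loop far more aggressively, which is itself part of the
point.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch: a plain counter bump vs. the atomic increment an
// atomically reference-counted pointer performs on every retain/release.
public class RefCountCost {
    public static void main(String[] args) {
        final int n = 10_000_000;
        long plain = 0;
        AtomicLong atomic = new AtomicLong();

        long t0 = System.nanoTime();
        for (int i = 0; i < n; i++) plain++;                  // freely optimizable
        long t1 = System.nanoTime();
        for (int i = 0; i < n; i++) atomic.incrementAndGet(); // atomic op each time
        long t2 = System.nanoTime();

        System.out.println("plain loop:  " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("atomic loop: " + (t2 - t1) / 1_000_000 + " ms");
        System.out.println("counts equal: " + (plain == atomic.get()));
    }
}
```

On typical hardware the atomic loop is many times slower, and unlike a tracing
GC this cost is paid inline on every pointer store, not amortized in the
background.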

~~~
DougBTX
ARC stands for automatic reference counting, not atomic reference counting, so
this criticism doesn't really apply. The LLVM docs say:

> ARC makes no guarantees in the presence of race conditions.

So, it seems they agree with you, to the extent that they agree that full
atomic reference counting would be a bad idea for performance optimisation,
which is why they don't do it.

[https://clang.llvm.org/docs/AutomaticReferenceCounting.html](https://clang.llvm.org/docs/AutomaticReferenceCounting.html)

~~~
pcwalton
It's atomic. The Core Foundation semantics require it. Disassemble Objective-C
code if you don't believe me.

That quote is referring to other kinds of races, not races on the reference
count itself.

------
DannyBee
Graal is pretty brand new in the relative scheme of things, it's not going to
be amazing out of the box. (I'll ignore the "LLVM has a JIT" part)

Really. There is no magic here. The way you get compilers with 100% or greater
perf of gcc/llvm is not through silver bullets. It's through a large amount of
hard work and tuning, i.e. hundreds of person-years.

It's usually easy to get the first 60-70%, and then people say "see, we didn't
need <heavyweight thing>, we'll beat them with <lightweight thing>". But if
you have customers for whom no performance is ever "good enough", getting that
last 30%, and eventually beating other compilers, is the thing that takes 10+
years. You'll never do it with <lightweight thing>. That's why all these folks
who try eventually end up with 3 layers of compilers, etc.

If it could have been done with <lightweight thing>, it would have been done
that way.

Also, you can just about always transfer the metadata necessary to do
high-level optimization X at a lower level; it just may not be efficient to do
so.

Conversely, compilers that focus on fairly high level optimizations often do a
bad job at lower level ones :)

Again, tradeoffs, tradeoffs. (This is why, for example, swift/rust/etc have
some higher level IR, do what they can, and hand the rest off to llvm).

~~~
mike_hearn
Fully agree that it's a ton of work to build a production quality compiler.

Graal is "new" only in the sense that it only recently got good enough for
people to care about it. It's actually been in development for years ...
arguably close to a decade now given that it traces its heritage back to the
Maxine VM.

Graal's generated code quality is similar to HotSpot C2, i.e. the fastest
available Java JIT, at least for most ordinary benchmarks (I have a feeling it
may be weaker at auto-vectorisation than C2, but I'm not sure). Where Graal
really spanks the competition, though, is with more dynamic languages like
Scala or Ruby. There it sees huge speedups compared to other state of the art
compilers.

Graal can also JIT/AOT compile LLVM bitcode. I don't know what performance is
like for that, but earlier prototypes that JITd C ASTs directly could compile
C with performance about 7% worse than GCC/clang (taking the best of the two
for each benchmark). The last 7% will take some work I guess, but it's not
infeasible.

There's a long term plan to make Graal the default JIT compiler in HotSpot.
The problem at the moment is that - no surprise - it uses more memory than C2,
and shares the same Java heap as the application.

------
4bpp
I don't understand why these "people who want to do things without the JVM are
wrong and deluded" articles never seem to even acknowledge that not everyone
might want to go through the trouble of installing and configuring a JVM to
begin with (and so the comparison they make is between "compile and run native
code" and "compile and run java -jar soandso"). Being on an apparently
second-class platform (Linux), every forced encounter with GUI-enabled Java
applications in recent memory (Mathematica, MATLAB...) has meant days spent
picking apart esoteric JVM issues, everything from hardware acceleration to
font rendering to native library interfaces, which apparently keep popping up
every year in such a way that the previous year's Stack Overflow fixes no
longer do anything.

Adding to that the hairiness of having to deal with concurrent versions (often
the system's versus what was included with the software package) and how even
distro-official JRE packages usually follow the "postinst script executes
installer, installer downloads a blob and covers the whole system in untracked
goo" format, I'm usually very relieved to scrub my system of all traces of
Java after I am done with whatever forced me to install it.

~~~
jackmott
There just seem to be some people who don't think that having to install large
runtimes, potentially specific versions of them, to make your app run is a
problem.

I don't understand them either. Perhaps they are focused on server software
and so don't think about client/end-user scenarios much.

------
weberc2
I don't think the author did a particularly good job of addressing the points
about startup time or memory size.

In particular, I think the argument about memory consumption is that it's
independent of compilation strategy, but rather it depends on the quality and
tuning of the garbage collector. An AOT compiled program that uses the default
JVM GC with default tunings will not see much better memory performance
(unless the memory consumption is dominated by the JIT).

On the other hand, I think the "AOT improves startup times" argument is a good
one, and the author's explanation ("AOT Java programs don't start any faster
than JVM programs") was unsatisfactory. What is the methodology for testing?
Is he using the later-mentioned Avian? Because that tool deliberately trades
startup time for small binaries (via compression). Even if he isn't, it's
likely a matter of tool maturity--there's no fundamental reason an AOT Java
hello world should be slower than a Go hello world (which clocks in at 50ms on
my machine, compared to 130ms for the JVM). Also, the author implies that
startup times only matter for tiny unix tools--I'd like to counter with a
practical example: Gradle is so slow to start up that it's been daemonized. I
don't know if this satisfies the author's characterization of "tiny unix
tools" or not, but clearly startup times matter.

~~~
mike_hearn
I don't think (?) I said AOT programs don't start any faster than JIT compiled
programs. I said Hello World doesn't start any faster when AOTd, because being
able to avoid JITing a few methods (on a spare unused core) is not worth the
increase in page faults needed to load the larger code from disk. But that's
probably rather specific to Hello World. I'd hope real programs see faster
startup!

 _Gradle is so slow to start up that it's been daemonized._

Gradle is usually run from the command line, so surely that supports my point?
(perhaps emphasising the "tiny" was a mistake).

At any rate, Maven doesn't have the same problem, despite doing the same task.
Gradle is slow to start for a lot of reasons.

~~~
weberc2
> I don't think (?) I said AOT programs don't start any faster than JIT
> compiled programs.

You're right; I paraphrased poorly. I think my point stands--there's no reason
an AOT compiled Java Hello World should be slower than a Go Hello World, which
is nearly 3 times faster than the JVM version on my machine.

> Gradle is usually run from the command line, so surely that supports my
> point?

I assumed that Gradle was slow to start for JIT warmup and the like. It's hard
to believe that daemonizing is the performance solution if the bottleneck is
in the application (for example, if file parsing is the bottleneck, the
natural solution would be to cache the parse result). Maybe my assumption is
bad?

~~~
mike_hearn
The daemon is mostly about caching the configuration, yes. It caches JIT
compiled code too, but Gradle configuration is actually a big pile of Groovy
scripts that create objects, or something like that, and scripting languages
are slower to start up even than interpreted Java.

~~~
vorg
> Gradle configuration is actually a big pile of Groovy scripts that create
> objects

You can now configure Gradle (since v3.0) using Kotlin instead of Apache
Groovy. That might speed up Gradle's start-up times.

------
phreack
It's such an utter shame that Microsoft got to buy out and then abandon RoboVM
(presumably because they felt it was a threat to their recently acquired
Xamarin). The one issue the author considers the most valid was pretty
decently solved almost a year ago--I made a couple of little games that worked
incredibly well out of the box on both iOS and Android from the same Kotlin
codebase; it was a pleasure.

Luckily, RoboVM was forked after a while and it is being actively maintained
by MobiDevelop [1], so give it a try if you're into Kotlin, that scene
deserves a lot more love.

[1] [http://robovm.mobidevelop.com/](http://robovm.mobidevelop.com/)

~~~
sitkack
I have never used it, but the Intel Multi-OS-Engine is something like RoboVM
and they updated the license from proprietary to Apache about 9 months ago.

[https://github.com/multi-os-engine/multi-os-engine](https://github.com/multi-os-engine/multi-os-engine)

------
eikenberry
Straw-man article. The main problems with the JVM in my experiences are the
slow startup, large memory use and added cognitive load. The first one he says
doesn't matter, the second he never addresses directly and the third he
completely misses.

~~~
discreteevent
From the article: slow startup -> 80ms. What are you using it for that 80ms is
a problem? Memory usage: he explains why an AOT solution would use more
memory.

What is the cognitive load with the JVM? You install it and then type
'java -jar my.jar'.

BTW the JVM can reduce cognitive load hugely when it comes to profiling or
debugging because of the tools.

~~~
eikenberry
I've never seen 80ms in practice, and he glosses over how (even if true) it
would only matter for command line programs, as if those are rare, when in
fact they are quite common.

Only knowing "java -jar my.jar" means you have zero knowledge about the nature
of your language's run-time, and having a good understanding of your
language's run-time is necessary to write good code. The more complex the
run-time of your language, the higher the cognitive load of that language.

His discussion of memory use doesn't mention the pre-allocation of the heap
for the JVM. The memory used by the application itself is irrelevant when the
JVM is what is actually using the memory.

~~~
floatboth
You can start up the VM in 80ms for sure; you'll just need much more time
after that to load all the damn classes from the bloated libraries you're
gonna use. ( _Especially_ terrible with functional runtimes like Clojure's.)

> pre-allocation of the heap for the JVM

Pre-allocation doesn't matter; it's not allocating physical memory. The GHC
Haskell runtime literally mallocs one terabyte when the program starts :D
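
The reserved-versus-committed distinction is visible from inside the JVM
itself: maxMemory() reports the -Xmx cap (reserved address space), while
totalMemory() is what has actually been committed so far, and only part of
that is in use. A small sketch:

```java
// Sketch: the -Xmx cap is a reservation; only the committed portion of the
// heap is backed by real pages, and only part of that is actually used.
public class HeapInfo {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();         // -Xmx limit (reservation)
        long committed = rt.totalMemory(); // heap committed so far
        long used = committed - rt.freeMemory();
        System.out.println("max MB:       " + (max >> 20));
        System.out.println("committed MB: " + (committed >> 20));
        System.out.println("used MB:      " + (used >> 20));
        System.out.println("invariant: " + (used <= committed && committed <= max));
    }
}
```

Run it with different -Xmx values and "max MB" changes while "used MB" stays
roughly constant, which is the point being made above.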

However OpenJDK/HotSpot's baseline memory usage is indeed higher than many
other VMs'. Even if you compare with alternative JVMs: I tested a basic Jetty
app — it used ~60 MB on JamVM, ~70 MB on OpenJDK.

------
outworlder
>This impression is heightened by the fact that some well known garbage
collected apps are written by people who just don’t seem to care about
performance at all, like Minecraft or Eclipse, leading people to blame GC for
what is in reality just badly written code.

This assertion is too harsh. Eclipse does seem to be over-engineered (speaking
from the point of view of someone who had to write a proprietary plugin for
it), and Minecraft does seem to run slowly even on the latest machines
(compared to AAA games).

But Minecraft is not a trivial program to write, despite fooling people with
low-resolution textures. It is, "algorithmically speaking", out of reach of
many people, including those with a reasonable computer science background. I
dare someone to do better (on the JVM!)

On a personal note, I cringe every time I have to go to a kid's computer and
modify "-Xmx" settings. I'm running a game, I want it to have all available
resources not otherwise allocated to the OS, just like almost every other game
there is.

~~~
DonbunEf7
Hi, several years ago I produced an alternative to the "vanilla" Minecraft
server. It easily spanked the vanilla server in terms of memory usage,
latency, bandwidth usage, and overall stability. Oh, and I was writing Python,
the "slow" language, using PyPy for speed.

Minecraft's code is rotten throughout and nearly nobody in the community cares
about code quality. Remember, this is a group that thought that TLS was easy
to reinvent:
[http://wiki.vg/Protocol_Encryption](http://wiki.vg/Protocol_Encryption)

~~~
wyldfire
Is your source available?

~~~
DonbunEf7
Source:
[https://github.com/bravoserver/bravo](https://github.com/bravoserver/bravo)

Docs:
[http://bravo.readthedocs.io/en/latest/](http://bravo.readthedocs.io/en/latest/)

------
dom0
> Worse, native CPU code is a lot larger than JVM bytecode.

>

> [Goes on to show how Hello World is _just_ 1 MB, which is smaller than Go's
> Hello World]

~~~
weberc2
To make the comparison a little more valid, we can strip symbols with a
compiler flag and pass the result through `upx` to build a stripped,
compressed binary at only 300Kb:

`go build -o /tmp/hello -ldflags='-s -w' /tmp/hello.go && upx --brute /tmp/hello`

Also worth noting that the author is incorrect about the Avian example being
self-contained--it dynamically links against 4 libraries (on OS X, anyway)
while the Go version links against zero.

Further, the Avian example is broken on OS X (the GUI loads, but immediately
freezes).

Of course, the applications aren't doing the same work, so comparing the
binary sizes is silly. I just wanted to point out that avian's tricks are
readily accessible for Go programs as well.

~~~
dom0
> it dynamically links against 4 libraries (on OS X, anyway) while the Go
> version links against zero.

And how exactly does that work out on platforms that don't have stable
syscalls?

~~~
pcwalton
On Mac, Go breaks the rules and hardcodes syscall numbers anyway.

On Windows, Go dynamically links to kernel32.dll and friends.

~~~
penpapersw
So in theory Go programs can't be considered reliable across multiple versions
of macOS without recompilation then?

~~~
coldtea
It only matters if it also happens in practice.

~~~
pcwalton
All forwards-compatibility issues were only "theoretical" issues that didn't
"matter in practice" until a new release of the platform came out and suddenly
they did.

------
kristianp
Off topic comment here about Medium's "open in app" banner on mobile, which
can't be dismissed [1]. It's so infuriating that it sticks out over the text
and has no way of being closed.

No I'm not going to install your app!

[1] [https://imgur.com/a/81hky](https://imgur.com/a/81hky)

------
djsumdog
I remember the old gcj (the GCC Java native backend) being terrible years ago,
as far as performance goes. I haven't looked at it since.

I agree with a lot of points in this article. The JVM does a lot of nice stuff
for you. One of the big, rarely discussed advantages of compiling to native
would be removing the JVM dependency (and hence the dependency on OpenJDK or
the terrible Oracle JVM license).

As far as LLVM goes, I'll agree it's technologically impressive. Still, I feel
a clean tech implementation was only partially the motivation; the other part
was getting away from GPLv3/FSF licensing. That bothers me a bit. I like the
GPLv3, but I feel much of the industry wants to distance itself from the
licensing that essentially helped pull Linux into what it is today.

~~~
AlphaSite
Linux doesn't use GPLv3?

~~~
antnisp
Linux is famously GPLv2.

~~~
georgemcbay
I suspect the parent post is aware of that but worded the post in a confusing
way for a medium in which you can't use tone to convey added meaning.

By "Linux doesn't use GPLv3?", I think the poster meant essentially "But...?
Linux doesn't use GPLv3".

In any case, the idea is a valid point. The GPLv2 and the GPLv3 are
substantially different and it isn't that surprising that some people who were
okay with GPLv2 terms are not okay with GPLv3 terms.

------
bsaul
Isn't one reason for supporting an llvm backend to be less dependant on Oracle
?

~~~
pjmlp
Oracle isn't the only one doing JDKs.

~~~
marktangotango
This is a vacuous comment. Besides OpenJDK and IBM, can you name one that
passes, or even has access to, the TCK?

~~~
btbytes
Excelsior JET --
[https://www.excelsiorjet.com/](https://www.excelsiorjet.com/) AOT compiler
producing native binaries.

------
mhh__
Questions: "Many optimisations that can really help high level managed
languages like Kotlin simply don’t apply to C++": which optimisations?

"Larger code bloats downloads and uses more RAM, lowering cache utilisation".
This depends on your definition of cache _utilisation_: the code gets
compiled regardless of whether it's AOT or JITed, so the cache gets used
anyway?

~~~
jorgemf
> Questions: "Many optimisations that can really help high level managed
> languages like Kotlin simply don’t apply to C++": Which optimisations?

I understood that the JVM does dynamic analysis of the code while it runs, so
it can compile it for better performance. For example, it knows which branch
of an if statement is more likely to be executed, so it can lay out the native
code so that the CPU's branch prediction favours the branch that will probably
be taken. (It is hard to explain without more background, but CPUs
speculatively execute one branch before the result of the condition is even
known, which can save cycles; with dynamic analysis you know which branch is
executed more often.) I guess the author is talking about this type of thing.
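
The kind of branch profile described above can be sketched with a toy counter.
The names and structure here are purely illustrative, not HotSpot internals:
the runtime counts how often each side of a branch is taken and uses that to
lay out the hot side as the straight-line path.

```java
import java.util.Random;

// Toy sketch of the per-branch taken/not-taken counters a JIT gathers;
// illustrative only, not how HotSpot actually stores its profiles.
public class BranchProfile {
    static long taken = 0, notTaken = 0;

    static int magnitude(int x) {
        if (x >= 0) { taken++; return x; }  // hot side in this workload
        notTaken++;
        return -x;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);        // deterministic workload
        for (int i = 0; i < 1_000_000; i++) {
            magnitude(rnd.nextInt(100) - 5); // ~95% of inputs are non-negative
        }
        System.out.println("taken=" + taken + " notTaken=" + notTaken);
        // A JIT would compile the x >= 0 side as the fall-through path and
        // recompile (deoptimize) if the distribution later changed.
        System.out.println("hot branch is non-negative: " + (taken > notTaken));
    }
}
```

An AOT compiler with PGO bakes in one such profile at build time; a JIT keeps
collecting it while the program runs, which is the trade-off discussed below.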

~~~
mhh__
That's called profile-guided optimisation, and most good optimising compilers
do it. Not a particularly good example; however, the JVM can do it in a
slightly different way, because doing PGO on every build/test or in the wild
is prohibitively annoying (the JVM can do it each time it runs, rather than
only when the compiler is told to).

~~~
zamalek
AOT PGO is fantastic tech! However, it can only optimize for the profile that
you set up for it. Deploy it on a million machines with a million use cases
and some of them aren't going to match up with your profile. In these cases it
may very well degrade performance. You might instead try to profile scenarios
that hit all code paths equally, but by doing so you would defeat the purpose
of PGO (and likely nullify it).

JIT PGO is adaptive to the workload that the process is experiencing. It is
likely impossible to achieve this with AOT compilation.

~~~
DannyBee
"JIT PGO is adaptive to the workload that the process is experiencing. It is
likely impossible to achieve this with AOT compilation. "

Of course it's not :) You just generate different binaries for each use case.

Truthfully, the "million machines/million use cases" story is pretty
unrealistic these days (except in the above case!)

Most people run mostly-specialized services for mostly-specialized use-cases.
This is because it's necessary for performance in practice anyway.

------
tmzt
Would something derived from Rust's HIR/MIR make a good target for other
compiled languages, maybe with loosened memory safety for some basic blocks?

~~~
mhh__
Not Rust's (specifically). LLVM IR is actually quite high level as it is, so
you can just use it directly.

Compiler IRs are normally not _that_ difficult to implement (designing them is
a whole different thing), so it wouldn't be that difficult (if a group of devs
decided to do it) to define IRs with lowering between each of them. After
transforming into an (approximately C level) IR (like Cmm in Haskell), you can
emit LLVM IR to handle code generation, register allocation, LTO, etc.

------
relics443
The first thing I thought of when I saw the article title was Android's
runtime switch in N. Glad to see the author include it.

------
hexmiles
I used Avian in the past for some small projects; it was a bit complicated at
first, but I liked it in the end. I would love some IDE integration (IntelliJ,
for example) and build system support.

------
tbezman
This article pretty much summed up everything I thought I knew. LLVM is still
pretty fucking sick though.

