
Jitterdämmerung - mpweiher
http://blog.metaobject.com/2015/10/jitterdammerung.html
======
mafribe
There is another use case for JIT compilers that has not been considered:
speeding up programming language development.

PyPy's meta-tracing JIT-compiler framework [1] allows you to generate
reasonably fast tracing JIT compilers for languages essentially by writing an
interpreter. Laurie Tratt has written a nice description of this, and the
advantages this delivers [2]. This makes the development of new programming
languages much simpler, because you don't have to invest a lot of time and
effort into producing a reasonably fast compiler (whether JIT or AOT) at the
beginning of language development. Yes, in many cases you can build AOT
compilers that beat JITs produced by meta-tracing an interpreter, but not
easily so.

Tratt's team has used this approach for building impressive multi-language
tools [3] that allow you to write programs in heterogeneous languages like
Python, Prolog and PHP at the same time.

[1] [http://stups.hhu.de/mediawiki/images/f/f5/Tracing_JITs11_tracing_the_meta_level.pdf](http://stups.hhu.de/mediawiki/images/f/f5/Tracing_JITs11_tracing_the_meta_level.pdf)

[2] [http://tratt.net/laurie/blog/entries/fast_enough_vms_in_fast_enough_time](http://tratt.net/laurie/blog/entries/fast_enough_vms_in_fast_enough_time)

[3] [http://soft-dev.org/](http://soft-dev.org/)

~~~
fijal
"Yes, you can build AOT compilers that beat JITs produced by meta-tracing an
interpreter" - not for all languages. As mentioned below, there is a certain
tradeoff associated with JITs - warmup time, memory consumption, etc. But for
a certain class of problems (say, compiling Python) and a certain class of use
cases (running at top speed), I dare you to compete with PyPy. The biggest
contender so far comes from the ZipPy project, which is indeed built on
Truffle, which is a meta-JIT, albeit a method-based one.

~~~
mafribe
An interesting question is: can one build a meta-AOT compiler framework that
converts an interpreter into a reasonably good AOT compiler?

If not, why not?

~~~
fijal
Not sure, don't have much experience with that. This is a "classic" Futamura
projection - you write an interpreter and the "magic" turns it into a
compiler. I'm not aware of any consumer-grade compiler like that, but there is
a huge swath of research on it.

You can very easily create a dumb one - you essentially just copy-paste the
interpreter loop (which is what e.g. Cython does if not presented with
annotations) - however the results just aren't very good.
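A toy illustration of that dumb-compiler idea (the stack machine and its opcodes are invented for this sketch): the "residual program" is just the interpreter's dispatch cases pasted out in program order. The dispatch loop disappears, but the interpreter's data structures remain, which is why the results aren't very good.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class Futamura {
    // Toy stack-machine ISA: PUSH n, ADD, MUL
    static final int PUSH = 0, ADD = 1, MUL = 2;

    // The general interpreter: one dispatch per instruction executed.
    static long interpret(int[] code) {
        Deque<Long> stack = new ArrayDeque<>();
        int pc = 0;
        while (pc < code.length) {
            switch (code[pc]) {
                case PUSH: stack.push((long) code[pc + 1]); pc += 2; break;
                case ADD:  stack.push(stack.pop() + stack.pop()); pc += 1; break;
                case MUL:  stack.push(stack.pop() * stack.pop()); pc += 1; break;
                default:   throw new IllegalStateException("bad opcode");
            }
        }
        return stack.pop();
    }

    // The residual program a naive specializer would emit for the fixed
    // program [PUSH 2, PUSH 3, ADD, PUSH 4, MUL]: the interpreter's cases
    // copy-pasted in order. No dispatch loop, but still a heap-allocated
    // operand stack instead of registers.
    static long residual() {
        Deque<Long> stack = new ArrayDeque<>();
        stack.push(2L);                              // PUSH 2
        stack.push(3L);                              // PUSH 3
        stack.push(stack.pop() + stack.pop());       // ADD
        stack.push(4L);                              // PUSH 4
        stack.push(stack.pop() * stack.pop());       // MUL
        return stack.pop();
    }

    public static void main(String[] args) {
        int[] prog = {PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL}; // (2 + 3) * 4
        System.out.println(interpret(prog)); // prints 20
        System.out.println(residual());      // prints 20
    }
}
```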

~~~
mafribe
Research on partial evaluation (PE) was fashionable in the 1990s, but largely
fizzled out. I was told that was because they could never really get the
results to run fast. I'm trying to understand why. Clearly meta-tracing and PE
have a lot of overlap. Truffle is based on some variant of _dynamic_ PE if I
understand what they do correctly. Most of the 1990s work in PE was more about
static PE I think. The paper [1] touches on some of these issues, but I have
not studied it closely yet.

[1] [http://stefan-marr.de/downloads/oopsla15-marr-ducasse-meta-tracing-vs-partial-evaluation.pdf](http://stefan-marr.de/downloads/oopsla15-marr-ducasse-meta-tracing-vs-partial-evaluation.pdf)

------
pete23
Static typing in Java does not give enough information to optimise because of
polymorphism. The JIT can observe runtime behaviour and inline method calls at
monomorphic sites.

[http://insightfullogic.com/2014/May/12/fast-and-megamorphic-what-influences-method-invoca/](http://insightfullogic.com/2014/May/12/fast-and-megamorphic-what-influences-method-invoca/)
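A minimal sketch of such a call site (Shape, Square, and Circle are invented names): statically the area() call below is polymorphic, and only the runtime profile reveals whether it is monomorphic in a given run. On a real JVM the inlining decision can be observed with -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining.

```java
interface Shape { double area(); }

final class Square implements Shape {
    final double s;
    Square(double s) { this.s = s; }
    public double area() { return s * s; }
}

final class Circle implements Shape {
    final double r;
    Circle(double r) { this.r = r; }
    public double area() { return Math.PI * r * r; }
}

public class Dispatch {
    // Statically, s.area() could dispatch to any Shape implementation.
    // If the runtime profile shows only Square here (monomorphic), the
    // JIT can devirtualise and inline; with 3+ receiver classes
    // (megamorphic) it falls back to a true virtual dispatch.
    static double total(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) sum += s.area(); // the interesting call site
        return sum;
    }

    public static void main(String[] args) {
        Shape[] monomorphic = { new Square(2), new Square(3) };
        System.out.println(total(monomorphic)); // prints 13.0
    }
}
```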

Worth noting also that Azul's Zing ReadyNow technology for AOT (and indeed
their no-pause C4 collector) address some of the specific gripes raised here
around the JVM...

~~~
fmstephe
Zing's ReadyNow is useful, but it really highlights some of the problems with
JITs in certain domains: its very existence points to some interesting
consequences of JITs in performance-sensitive domains.

The driving force behind ReadyNow (as I understand it) was that many
performance-sensitive systems needed to be fast out of the gate on startup.
This means that an interpreted->compiled->optimised-compiled transition was
not acceptable.

Developers would try to solve this by running dummy data through the system
before it was open to real traffic. But this had the unfortunate consequence
that the JIT would optimise for the dummy data only, including clever inlining
at apparently monomorphic call sites, etc. When real traffic flowed into the
system, the JVM would see that its optimisations were no longer
effective/valid. This would trigger de-compilation/re-compilation of many code
sites, causing a very noticeable stutter.

Now we have ReadyNow. If you are really committed to Java or the JVM and you
don't like these JIT stutters this is your solution. But this is an extra
layer of complexity and another thing to be managed and another thing to fail.
This is on top of the JVM and jar-file soup you may already be struggling
with.

I would prefer a good AOT compiler to remove this concern and give me quite
good predictable performance. YMMV of course.

Sources

[http://www.azul.com/products/zing/readynow-technology-for-zing/](http://www.azul.com/products/zing/readynow-technology-for-zing/)

[https://groups.google.com/forum/#!searchin/mechanical-sympathy/ReadyNow/mechanical-sympathy/QT_P5dGXjXw/W9uiT-dSqjMJ](https://groups.google.com/forum/#!searchin/mechanical-sympathy/ReadyNow/mechanical-sympathy/QT_P5dGXjXw/W9uiT-dSqjMJ)

------
zamalek
The problem with native code is platform disparity. It's absurd that so many
binary variants are required when shipping code to e.g. Android or iPhone.

This _can_ apply to x86, too. Not sure if all your users have TSX-capable
CPUs? You're now shipping two binaries. Not sure if your users have AVX? Now
you have at least three binaries.

This is the biggest advantage of intermediate languages - your code can at the
very least execute everywhere with a single image, and in some cases
automatically take advantage of platform features such as AVX.

I think ART takes the correct approach: ship IL and AOT it on the device.
Hopefully some day we can get the same type of system for LLVM IR.

~~~
pjmlp
The idea of shipping a portable bytecode and AOT-compiling it at installation
time goes back to the mainframes.

OS/400, now IBM i, is probably the last living mainframe doing it.

~~~
k__
Also, we ship JIT compilers with the software right now, so why not just ship
AOT compilers instead?

~~~
ajuc
The Gentoo Linux distribution works that way. It has an automatic package
management system (Portage) similar to apt-get or rpm (and even more similar
to ports in *BSD). The source meta-packages (specifying the compiler flags,
patches, etc.) are maintained by the community.

You configure system-wide flags specifying your CPU, optimization level, and
the features you want to enable or disable (many libraries and apps have
flags to compile in support for some features or not). There were also
app-specific flags IIRC.
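An illustrative make.conf fragment of the sort described (the values are examples, not recommendations):

```shell
# /etc/portage/make.conf (illustrative): Portage passes these flags
# to every package it builds from source.
CFLAGS="-O2 -march=native -pipe"   # tune codegen for the local CPU
CXXFLAGS="${CFLAGS}"
USE="X alsa -gnome -kde"           # per-feature compile-time switches
MAKEOPTS="-j4"                     # parallel build jobs
```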

When you install a package, Portage downloads the sources and the
meta-package, then compiles them on your machine with your flags, for your
specific processor, with only the features you enabled. It was supposed to be
much faster than "bloated ubuntu", but I suspect now it was mostly placebo :)

On the other hand installing Open Office on my Celeron 2000 laptop with 256 MB
RAM took 10 hours, and if I wanted to enable some feature that turned out to
be useful after all - I would need to wait another 10 hours.

I used it for 2 years when I was at university, but eventually it was too much
to wait for hours whenever I needed something.

~~~
ycombobreaker
> It was supposed to be much faster than "bloated ubuntu", but I suspect now
> it was mostly placebo

Gentoo predates Ubuntu by several years. I remember reading some "gentoo
ricer" threads bike-shedding over ultimately inconsequential optimization
flags. But don't underestimate the simple benefit of `-march` and similar. At
a time when binary distributions were pushing x86 binaries which ran on i386,
compiling for a newer architecture could give a substantial increase in
registers available, instructions available, and instruction scheduling
quality. In aggregate, this definitely can improve performance.

Since then, I believe Debian moved to an i686 base which narrows the gap.

------
Kristine1975
One aspect the author doesn't talk about is emulation: The Wii/GameCube
emulator Dolphin uses JIT to translate the console's PPC machine code to x86
machine code.

I'm also not sure if "scripting languages" truly don't need a JIT. The
development of LuaJIT for example has been sponsored by various companies, so
there seems to be a need for the fast execution of Lua code:
[http://luajit.org/sponsors.html](http://luajit.org/sponsors.html)

~~~
w0utert
Yes, but to be fair, Lua is one of those few examples where a JIT makes a lot
of sense because of the extreme simplicity of the language, which greatly
expands the opportunities for a JIT to generate efficient code with relatively
low overhead and compiler complexity.

This approach is great for some things, but it does sacrifice the flexibility
and expressiveness of the language. Anything not part of the core language
(which is a lot compared to most other programming languages, e.g. anything
related to OOP, or more advanced data types than strings, floats and tables)
has to be re-invented and/or bolted-on, which IMO makes it unsuitable for most
kinds of applications.

This should not be interpreted as criticism of Lua the language, by the way;
I'm a big fan of Lua for embedded scripting and I generally love tools with a
narrow focus (as opposed to kitchen-sink technology). I would not choose Lua
for anything besides embedded scripting, though.

~~~
outworlder
What would you choose instead?

~~~
w0utert
That all depends on the application, I guess ;-)

Usually I end up using Python for smaller things that aren't mission-critical
or don't need maximum performance, C++ for almost everything else, and
Objective-C for OS X/iOS stuff. Maybe Java for things where performance,
safety and ease of development/maintenance all matter (the latter mostly for
other people who would have to work on the same project and who are less
skilled in C++; I'm not at all a fan of the language myself, but I recognize
it has some properties that make it a suitable choice for many kinds of
applications ;-). I don't have enough experience with any other programming
languages that can be deployed easily across Linux and OS X, so I can't
comment on those. Rust seems to have some good ideas, so I may want to learn
more about it in the future.

If I wanted to write something for Windows platforms I'd most likely gravitate
towards C#, from what I know about the language it appears to have all the
good things of Java without its downsides.

------
pjmlp
Fully agree. JIT makes sense for dynamic languages or real-time conversion of
op-codes; for everything else AOT is a much better solution.

Having already quite a good experience with memory safe languages when Java
came into the world (mostly Wirth and ML derived languages), I never
understood why Sun went with the JIT approach. Other than trying to sell Java
to run in the browser.

There were already quite a few solutions using bytecodes as portable
executable format that were AOT compiled at installation time, e.g. mainframes
and research OSes like Native Oberon variants.

The best is to have both in the toolchain: an interpreter/JIT for development
and REPL workflows, leaving AOT for deployment/release code.

~~~
pron
> for everything else AOT is much better solution

Not if everything else includes long-running servers that may require various
kinds of fiddling at runtime, like injection or turning off and on of
monitoring/troubleshooting code, nor if you want to have zero-cost
abstractions that are actually high-level abstractions (as opposed to the
other kind of zero-cost abstractions, that are basically just weak
abstractions that can be offered for free).

~~~
pjmlp
Once upon a time I drank too much of the JIT Kool-Aid, and for a moment I
thought it was the future, kind of.

However, bearing the scars of what "kinds of fiddling at runtime, like
injection or turning off and on of monitoring/troubleshooting code" actually
means in practice changed my mind.

It is so damn hard to tune them that there are consulting companies that
specialize in selling JIT-tuning services.

And in the end they offer little performance advantage over using PGO or
writing code that makes better use of value types and cache friendlier
algorithms.

This was one reason why Microsoft introduced PGO for .NET in .NET 4.5.

So yes, I really mean everything else.

~~~
pron
I've gone the opposite way, and I really like my long-running server code to
be JITted when possible. I only wish my old C++ servers could be as tweakable
at runtime as my Java servers are[1], and I really like being able to use
languages (even for DSLs or rule engines) that have a few powerful
abstractions, and let the JIT make them run nearly as fast as the
infrastructure code.

If you really need careful tuning of optimizations (which is never a walk in
the park), it will be made much nicer with this:
[http://openjdk.java.net/jeps/165](http://openjdk.java.net/jeps/165)

[1]: Just like Erlang, only fast! ;)

~~~
pjmlp
It is possible to have powerful abstractions, AOT and performance.

One just has to look at Haskell, OCaml, Common Lisp, Ada, Eiffel, SPARK,
Swift, .NET Native,....

The fact that for the last decade mainstream AOT was reduced to C and C++,
kind of created the wrong picture that one cannot have performance in any
other way.

I always dislike how they present some of the language features (stack
allocation, structs and arrays) as if there wasn't any other language with
them.

For monitoring, although one doesn't get something as nice as VisualVM or
JITWatch, it is possible to bundle a mini-monitoring library if really needed.

Something akin to performance counters in Windows.

~~~
pron
Ada (/SPARK) is a complex language that makes the developer think of every
optimization-related implementation detail as they code (not unlike C++ or
Rust), and Haskell/OCaml would benefit if functions that do pattern-matching
could be inlined into their call site and all of their branches removed but
one or two (the same goes for the other languages on the list).

~~~
pjmlp
Just to provide some feedback that I should have written on the sibling
comment.

I use both Java and .NET since they got out, there is the occasional C++
library, but they are the core of my work. So I know quite well their eco-
systems.

Some of the mentioned scars were related with replacing "legacy" C++ systems,
while keeping a comparable performance.

------
yummyfajitas
The article seems to ignore a variety of very real benefits that you get from
the JIT. In many real world cases, there are optimizations that the JIT can
perform. For example:

    final boolean runF = ...; // In *this* program run, it works out to be false.
    ...
    if (runF) {
      f(x);
    }

The JIT can just eliminate this branch entirely. More interestingly:

    if (cond) {
      myVar = f(x); // f(x) has provably no side effects
    }

Supposing cond is nearly always true, the JIT can optimistically run f and in
parallel check cond. In the unlikely event that cond is false, the JIT will
still compute the return value of f(x), but just not do the assignment.
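Conceptually, the transformation described has this shape when written by hand (a sketch with invented names; the real JIT does it in machine code, and only after proving f has no side effects):

```java
public class Speculate {
    static double f(double x) { return x * x + 1; } // provably side-effect-free

    // The source-level code: f(x) runs only when cond is true.
    static double original(boolean cond, double x, double myVar) {
        if (cond) { myVar = f(x); }
        return myVar;
    }

    // The speculative version: f(x) is hoisted out of the branch so it
    // can overlap with the evaluation of cond, and only the cheap
    // assignment stays conditional (a candidate for a conditional move
    // rather than a jump). Legal only because f has no side effects.
    static double transformed(boolean cond, double x, double myVar) {
        double t = f(x);          // computed unconditionally
        if (cond) { myVar = t; }  // branch only guards the assignment
        return myVar;
    }

    public static void main(String[] args) {
        System.out.println(original(true, 2, 0) == transformed(true, 2, 0));
        System.out.println(original(false, 2, 7) == transformed(false, 2, 7));
    }
}
```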

It surprised me when I saw all this happen, but the JVM's JIT is actually
pretty smart and works very well for long running server processes.

~~~
xyzzyz
In the first case, either the "if" is run in a tight loop, in which case the
branch predictor will likely predict it correctly, or it isn't, in which case
branch elimination brings only a trivial improvement.

In the second, what do you mean by "run f and in parallel check cond"? How
would that translate to machine code? Do you want to run and synchronize cond
and f(x) between two separate CPU cores, or what?

~~~
yummyfajitas
And if it's not run in a tight loop, the JIT can often still detect it.

I don't know why you believe such things are trivial. On one occasion I
altered some code which broke this, and it added about 10-20 seconds to a 15
minute HFT simulation. If that code made it to prod, those 10-20 seconds would
be distributed around the hot parts of an HFT system. That's not remotely
trivial, at least for performance critical applications (HFT, RTB, etc).

In the second example, basically what happens (if you get lucky - I haven't
found the JVM JIT to be super consistent) is that computation of cond and f(x)
will be put into the pipeline together, and only the assignment to myVar will
be conditional. This is strictly NOT something the CPU can do by itself, since
there are side effects at the machine level (allocating objects, and then
GCing them).

------
Animats
Why did Java stay with a JIT for so long, anyway? Java is hard-compilable, and
GCC can compile Java. The original reason for a JIT was to speed up applets
running in the browser. (Anyone remember those?) Server side, there was never
much point.

~~~
Kristine1975
Cynical answer: It wouldn't fit the "write once, run anywhere" slogan used to
market Java.

~~~
jerven
JIT gives you more information about how the program is actually run. This
allows a large number of optimistic optimizations that GCC etc. cannot make.

This in theory allows faster code to be generated than ahead-of-time
compilers can manage. In practice this is destroyed by the pointer-chasing
default data structures in Java.

If you are executing "C" using JVM techniques then a research JVM JITing C
code gets within 15% of GCC on a reasonable set of benchmarks.
([http://chrisseaton.com/plas15/safec.pdf](http://chrisseaton.com/plas15/safec.pdf)).

These benchmarks are favorable for the AOT GCC as they have few function
pointers, startup configuration switches and little dataflow variety.

I suspect that specific compiler optimizations on the C code for the dense
matrix transform in GCC are a bigger part of why GCC is faster than the fact
that it is AOT instead of JIT.

There are also a number of AOT compilers for Java, e.g. Excelsior. But I have
not seen superior performance from them over an equivalent HotSpot release.

~~~
masklinn
> JIT gives you more information about how the program is actually run. This
> allows a large number of optimistic optimizations that GCC etc... can not
> make.

They'd be surprised to learn that considering GCC supports profile-guided
optimisation.

~~~
pron
Modern JITs basically _are_ very sophisticated profile-guided optimizing
compilers. Unlike AOT PGOs, though, JITs can adapt to changing program phases,
that are common in server applications (e.g. when certain
monitoring/troubleshooting/graceful-degradation features are turned on). On
the whole, JITs can produce better code.

But this, too, is a tradeoff. Some JITs (like JS's in the browser) require
fast compilation, so some optimizations are ruled out. Very sophisticated
optimizations require time, and therefore are often useful when you have
tiered compilation (i.e. a fast JIT followed by an optimizing JIT), and tiered
compilation is yet another complication that makes JITs hard to implement.

~~~
jules
An AOT compiler can decide to compile a JIT code generator into the binary, so
even in theory AOT beats JIT. In practice it almost never makes sense to do
this for a language that isn't full of gratuitous dynamism.

~~~
pron
> An AOT compiler can decide to compile a JIT code generator into the binary,
> so even in theory AOT beats JIT

That is not AOT. That is compiling some code AOT and some JIT[1]. It's not a
contest. Those are two very common compilation strategies, each with its own
pros and cons.

> In practice it almost never makes sense to do this for a language that isn't
> full of gratuitous dynamism.

What you call "gratuitous dynamism" others call simple, general abstractions.
BTW, even Haskell/ML-style pattern matching qualifies as "dynamism" in this
case, as this is something a JIT optimizes just as it does virtual dispatch
(the two are duals).

[1]: Also, a _compiler_ doesn't generally insert huge chunks of static code
into its output. That's the job of a linker. What you've described is a system
comprised of an AOT compiler, a JIT compiler, and a linker that links the JIT
compiler and the AOT-compiled code into the binary. Yeah, such a system is at
least as powerful as just the JIT component alone, but such a system is not
called an AOT compiler.

~~~
jules
JITs could only optimize pattern matching if one particular branch is picked
all the time, but not statically knowable. That is a very niche case.

Really, in almost all cases where JITs are effective it's because of a
language where you have to use a dynamic abstraction when it should have been
a static one. The most common example being virtual dispatch that should have
been compile time dispatch. Or even worse, string based dispatch in languages
where values are semantically string dictionaries (JS, Python, Ruby). A JIT is
unnecessary in languages with a proper distinction between compile time
abstractions and run time constructs. Names should be a purely compile time
affair. Instantiating an abstraction should happen at compile time in most
cases. There goes 99% of the damage that a JIT tries to undo, along with a lot
of the damage that a JIT _doesn't_ undo. Sadly most languages lack these
abilities. Rust is one of the few languages that has a sensible story for
this, at least for the special case of type abstraction, traits, and lambdas.
I'm still hoping for a mainstream language with full blown staged programming
support, but I guess it will take a while.

~~~
pron
> JITs could only optimize pattern matching if one particular branch is picked
> all the time, but not statically knowable.

They would optimize it if one particular branch is picked all (or even most
of) the time but not statically knowable _at any given (extended) call site_,
and by "extended" I mean up to any reasonable point on the stack. That is not
a niche case at all. Think of a square-root function that returns a Maybe
Double. At most call sites you can optimize away all matchings, but you can't
do it statically. Statically-knowable is _always_ the exception (see the
second-to-last paragraph).
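A hedged Java rendering of that example (the names are mine, with Optional standing in for Maybe): statically, every caller must handle the empty case, but a JIT that inlines safeSqrt into a site where the argument is visibly non-negative can prove the empty branch dead and elide the wrapper.

```java
import java.util.Optional;

public class SafeSqrt {
    // The static return type forces every caller to branch on emptiness.
    static Optional<Double> safeSqrt(double x) {
        return x < 0 ? Optional.<Double>empty() : Optional.of(Math.sqrt(x));
    }

    // At this call site a*a + b*b is provably >= 0, so after inlining
    // safeSqrt the JIT sees the empty branch is dead and the Optional
    // box can be scalar-replaced. No per-call-site AOT proof is
    // possible without whole-program analysis.
    static double hypot(double a, double b) {
        return safeSqrt(a * a + b * b).get();
    }

    public static void main(String[] args) {
        System.out.println(hypot(3, 4)); // prints 5.0
    }
}
```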

> Names should be a purely compile time affair.

That doesn't work too well with dynamic code loading/swapping... But in any
case, you're now arguing how languages should be designed. That's a whole
other discussion, and I think many would disagree with you.

> There goes 99% of the damage that a JIT tries to undo, along with a lot of
> the damage that a JIT doesn't undo.

At least theoretically that's impossible. The problem of program optimization
is closely related to the problem of program verification (to optimize
something you need proof of its runtime behavior), and both face undecidable
problems that simply cannot be handled statically. A JIT could eliminate any
kind of branching at any particular site -- be it function dispatch, pattern
matching or a simple if statement -- and eliminating branches opens the door
to a whole slew of optimizations. Doing branch elimination statically is
simply _undecidable_ in many, many cases, regardless of the language you're
using. If you could deduce the pertinent information for most forms of (at
least theoretical) optimization, you've just solved the halting problem.

Whether or not what you say is true in practice depends on lots of factors
(like how far languages can take this and still be cheaply-usable, how
sophisticated JITs can be), and my bet is that it would end up simply being a
bunch of tradeoffs, which is exactly where we started.

------
pvdebbe
I'm surprised to read about this direction, however little practiced it may
be. JIT was never for the fastest of applications, but just-in-time
optimization should (and on some occasions does) reach and even exceed the
speed of precompiled ('native') code, simply because the JIT optimizer has
access to the actual runtime data being fed to the Tight Loop of Performance
Bottleneck, and can do local and specific optimization for that run alone. An
AOT-optimized binary can, in theory, only do generic optimizations.

~~~
joosters
Several years ago, HP (I think) published a research paper investigating JIT
on precompiled binaries, exactly for this purpose. They claimed around 5-10%
speedup on x86 code. Unfortunately the project never got released.

------
lmm
JIT may be a bad fit for real-time phone apps. But at the same time we're
seeing the return of batch processes thanks to "big data".

At my last job I worked on Spark processes that took several hours to run. In
research, performance is important but you do get to average. So I don't think
JIT will go away for that case; it wasn't worth hand-optimising (most jobs
were only run a couple of times), but at the same time I was very glad it
wasn't Python.

~~~
JohnDoe365
With Big Data we also see stream processing, which is much more "Big Data"
than batch processing.

------
mwcampbell
I'm ambivalent about this subject.

On the one hand, it intuitively makes sense that JIT compilation is inferior
to AOT compilation for an application that will be packaged once on a
developer's machine and then run on thousands or millions of devices. Doing
the JIT compilation on all of those devices is a waste of energy; it's better
to do the compilation just once on the developer's machine.

Also, JIT compilation requires warm-up to get to good performance. The kind of
consumer who won't pay more than a few dollars for an app will also be turned
off by an app that isn't snappy right away. So the first impression is
everything, and a warm-up period before good performance isn't as acceptable
here as it is in server applications.

On the other hand, on the major mobile platforms, native apps which can be
AOT-compiled have to be distributed by gatekeepers, i.e. the app stores. On
these platforms, the only alternative to the gatekeepers is the web. So I have
a non-technical reason to want the last major JIT-compiled language on the
client side, JavaScript, to be good enough for a variety of applications.

I was going to comment on the advantages of JIT compilation with regard to
polymorphism, but then I found that the OP addressed that with his argument
about predictability being more important than best-case or even average-case
performance.

Finally, I'm curious if the author of the OP would use the same argument about
predictability over best or average case performance to argue for reference
counting over tracing GC. Maybe Android's ART and .NET Native would benefit
from using reference counting in combination with a backup tracing GC to
handle cycles, like CPython.

~~~
pjmlp
.NET GC is quite good and they introduced features in version 4.6 that allow
even for fine-grained control over when collections happen and how.

Reference counting is only helpful if directly supported by the compiler, to
remove inc/dec pairs. Also it is worse than GC for multi-threaded
applications.

Currently I think the only alternative to GC are Substructural type systems.
Even C++ guys are now looking into this. The problem is making it palatable to
the average mainstream programmer.

------
corysama
This widespread move towards AOT is interesting in comparison to the
prevailing attitude 5 years ago.

"Have tracing jit compilers won?" [http://lambda-the-ultimate.org/node/3851](http://lambda-the-ultimate.org/node/3851)

------
vvanders
The "hybrid" model they refer to is very common in Game-Dev. C/C++ down at the
engine level tied together with Lua/UnrealScript/Lisp/etc.

------
wiz21c
In my company, we develop on windows, build on linux and run on solaris
(JavaEE). Moreover (forget devops!) the deployment team are not the
development team.

Having an environment (JVM + web server + our code) that is the same across
the various hardware and company's teams is really something that helps.

So besides the AOT/JIT discussion, the value of the JVM in itself is really
good for us.

~~~
pjmlp
Yes, but that has nothing to do with AOT/JIT; rather, it's about a portable
binary format for Java executables.

A few commercial JVM vendors do offer AOT compilation on their JDKs.

------
noblethrasher
AOTs are best for when you want to use universal computers to run computations
(e.g. apps).

JITs are best for when you want to use universal computers to run better
universal computers.

------
skybrian
In the case of Android, the big downside is very slow system updates because
all apps on the device have to be recompiled. But it probably can be fixed.

------
ilaksh
This is one reason I am pretty excited about the potential for web assembly in
terms of a cross-platform and cross-language performant target.

------
tempodox
The Julia language seems to be designed to be nothing more than optimised glue
between two Python scripts. As a dynamic language, it's certainly a good match
for a JIT (run at once after a quick edit). And that also explains why there
can never be a standalone executable produced from Julia, since that would
require AOT. My guess is that Julia will dodge the Jitterdämmerung for a long
time to come, if not forever.

~~~
Fede_V
Julia does very little tracing, except to infer types, which it then passes on
to LLVM. All the optimizations in Julia are because Julia stresses type
stability, and once types are known, LLVM can generate very good code specific
for those types.

You can easily annotate functions with types in Julia, and pre-compile
specialized functions. Julia does very little runtime magic to speed up code,
especially compared to PyPy. The speed just comes from a very clever type
system + LLVM.

------
pron
There is no "shift away from JIT", just a simple observation that JITs are a
tradeoff that you sometimes don't want to make. They have three drawbacks and
two advantages. The three drawbacks are 1/ slow warmup, 2/ somewhat increased
RAM and CPU usage and 3/ complex implementation. The two advantages are 1/
(often _much_ ) better runtime performance (i.e. more optimized code) and 2/
much better support for runtime manipulation of code (for profiling, debugging
"at-full-speed", hot-patching, monitoring etc.).

The slow warmup makes JITs a bad choice for quick command-line tools, and the
increased RAM/CPU makes them a bad choice for battery-powered devices and
those with very limited RAM. The better performance and optimization
opportunities makes them the _only_ performant choice for some languages that
are hard to optimize AOT (those that rely a lot on dynamic dispatch and/or
have dynamic data-structures, i.e. maps instead of class instances). The
better instrumentation support makes them a terrific choice for long-running
server-side application, where the drawbacks don't matter. On the client-side,
unless the language requires a JIT for decent performance (like JS), there is
no compelling reason to use one, and that's why Microsoft's decision makes
perfect sense, as they've decided to focus on .NET on the client. This has
nothing to do with JITs' great utility in general.

One of the biggest breakthroughs in compiler technology in the last decade is
Oracle Lab's Graal[1], which can also be used as an AOT, but with less-
powerful optimizations. E.g. Graal does this:
[https://twitter.com/ChrisGSeaton/status/619885182104043520](https://twitter.com/ChrisGSeaton/status/619885182104043520)

Graal (alongside its language-construction DSL, Truffle) has yielded
implementations of Ruby, Python and JS that easily rival the state-of-the-art
with far, far less effort, and also a very decent implementation of C (also
with orders-of-magnitude less effort than the competition).

The third drawback makes developing JITs from scratch a bad choice for anyone
but the most well-resourced teams, or those that are in no hurry, or those
that target very simple languages only.

JITs have another drawback -- less predictable performance -- that matters
mostly for hard-realtime (or nearly hard-realtime) code, as deopts can
momentarily slow down execution while a better optimization strategy is sought.
This is why hard-realtime JVMs offer a mixed JIT/AOT mode: the hard-realtime
kernel (which values predictability over performance) is AOT-compiled, while
the soft-realtime or non-realtime support code (which you want to run as fast
as possible, but where a rare hiccup is tolerable) is JITted.

[1]:
[https://wiki.openjdk.java.net/display/Graal/Publications+and...](https://wiki.openjdk.java.net/display/Graal/Publications+and+Presentations)

~~~
bjourne
> 1/ (often much) better runtime performance (i.e. more optimized code) and

Do you have a cite for that? Except for method inlining, which only requires a
trivial JIT compiler, I haven't seen them beating normal compilers. The
argument that you should be able to use the program's runtime data to perform
specific optimizations seems to me to be oversold. Since the JIT compiler must
be reasonably fast it eschews many optimization techniques that AOT compilers
can afford to use.

~~~
pron
Look at the tweet I linked to. And bear in mind that languages that are
normally AOT-compiled (like C and C++) are often designed in such a way that
information that is pertinent to most optimization is available at compile
time, plus programmers in those languages don't make use of more general
abstractions unless they have to (which makes, say, C++ more complicated as it
has two dispatch mechanisms that the user needs to be aware of). Such languages
obviously won't be accelerated much by a JIT, but they have a big complexity
cost, as various low-level optimization considerations have to be exposed by
the language. JITs make simpler, higher-level languages as performant, and
there's plenty of data to support that.

~~~
bjourne
But your claim was "(often much) better runtime performance". What you link to
doesn't support that. I can concede that a JIT can attain _equivalent_
performance to AOT-compiled code. But I haven't seen any evidence that JITting
in practice increases performance. Note that there are many modern high-level
performant languages that do not use a JIT, like Nim, Haskell, Julia and Rust.

For example, the V8 engine has both a JIT and an AOT compiler. The engine compiles
all code with the fast AOT compiler and then recompiles frequently used
functions with the optimizing JIT compiler. If it instead compiled _all_ code
with the optimizing compiler AOT, the JIT part wouldn't be needed and you
would get just as fast code.

JIT can only beat AOT if you can exploit patterns in the dataflow of the
program. Doing that profitably (i.e. the optimization must save more time than
it costs to perform) is incredibly hard.

~~~
hyperpape
Haskell and Rust both require monomorphization, right? That's one thing the
JVM doesn't require. You do pay a performance penalty for megamorphic code
([http://insightfullogic.com/2014/May/12/fast-and-megamorphic-what-influences-method-invoca/](http://insightfullogic.com/2014/May/12/fast-and-megamorphic-what-influences-method-invoca/))
but it's still a difference in what's allowed.

I also don't understand what you're saying with regard to V8. Can the
optimizing AOT compiler actually do enough optimizations on a language as
dynamic as JS?

~~~
steveklabnik
Rust lets you choose between monomorphization (trait bounds) and vtables
(Trait objects).
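
As a sketch (not from the thread), the two choices look like this in Rust: a
trait bound is monomorphized into a specialized copy per concrete type, while a
trait object dispatches through a vtable at run time. The `Shape`/`Square`
names are made up for illustration.

```rust
// A trait consumed two ways: static (monomorphized) and dynamic (vtable).
trait Shape {
    fn area(&self) -> f64;
}

struct Square(f64);

impl Shape for Square {
    fn area(&self) -> f64 { self.0 * self.0 }
}

// Trait bound: the compiler emits one specialized copy per concrete T.
fn area_static<T: Shape>(s: &T) -> f64 {
    s.area()
}

// Trait object: a single compiled function; the call goes through a vtable.
fn area_dynamic(s: &dyn Shape) -> f64 {
    s.area()
}

fn main() {
    let sq = Square(3.0);
    assert_eq!(area_static(&sq), 9.0);
    assert_eq!(area_dynamic(&sq), 9.0);
}
```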

~~~
jules
The semantics of those are different. With trait objects the vtable is
attached to the value. You could instead imagine a language with two types of
trait bounds: one specialized statically (like Rust trait bounds), and the
other handled dynamically (like type classes in Haskell). The semantics of
these would be identical (vtable travels independently of values), the only
difference is performance. I'm not sure if you'd ever want Haskell's
implementation strategy in practice though, unless you want to support
features that can't be supported by static specialization like polymorphic
recursion or higher rank polymorphism (not to be confused with higher kinds).
Interestingly, C# does support those features _and_ specialization because it
specializes at run time. It's a lesser-known but very powerful feature, at
least in theory :) You can abuse it to make the CLR generate arbitrary code at
run time.

~~~
Jweb_Guru
In what way does the vtable being attached to the value cause different
semantics (what you describe sounds like an implementation detail)? In
particular, you can have Box<TraitObject> which has effectively identical
semantics to TraitObject; yes, it's a fat pointer, but from the perspective of
the trait itself there's no way to tell that this is the case. Anyway, the
only ways I can think of to usefully differentiate the fat from a thin pointer
in a parametric function are those in which Rust already fails to have proper
parametricity for any type (including being able to access its type_id).

~~~
jules
Steve Klabnik was not talking about Box<TraitObject> vs TraitObject, but about
f(x:TraitObject) vs f<T:TraitObject>(x:T). In the former the vtable is
attached to the value, in the latter the vtable travels separately. These do
have different semantics, compare f<T:TraitObject>(x:Vec<T>) with
f(x:Vec<TraitObject>). In the x:Vec<T> case there is a single vtable that gets
passed to the function, and all elements of the vector share that same vtable.
With x:Vec<TraitObject> each element of the vector has its own vtable.

In terms of pseudo Haskell types these two types would be:

    TraitObject t => List t -> Result
    List (exists t. TraitObject t) -> Result

Rust cleverly sweeps the existential under the rug.
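
The one-vtable-per-vector vs one-vtable-per-element distinction can be made
concrete with a small Rust sketch (the `Speak`/`Dog`/`Cat` names are invented
for illustration):

```rust
trait Speak {
    fn speak(&self) -> &'static str;
}

struct Dog;
struct Cat;

impl Speak for Dog { fn speak(&self) -> &'static str { "woof" } }
impl Speak for Cat { fn speak(&self) -> &'static str { "meow" } }

// One vtable for the whole slice: every element must be the same concrete T.
fn all_speak_generic<T: Speak>(xs: &[T]) -> Vec<&'static str> {
    xs.iter().map(|x| x.speak()).collect()
}

// One vtable per element: the slice may mix different implementations.
fn all_speak_dyn(xs: &[Box<dyn Speak>]) -> Vec<&'static str> {
    xs.iter().map(|x| x.speak()).collect()
}

fn main() {
    let dogs = vec![Dog, Dog];
    assert_eq!(all_speak_generic(&dogs), vec!["woof", "woof"]);

    // A heterogeneous vector only typechecks with trait objects;
    // all_speak_generic could not accept it, since T must be one type.
    let mixed: Vec<Box<dyn Speak>> = vec![Box::new(Dog), Box::new(Cat)];
    assert_eq!(all_speak_dyn(&mixed), vec!["woof", "meow"]);
}
```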

~~~
Jweb_Guru
Again, in what way is the use of fat pointers a _semantic_ difference, outside
of parametricity-breaking functions like size_of_ty and type_id? You are
describing an implementation detail. I can think of times that the compiler
should be able to desugar the fat trait objects into thin ones in the absence
of a Reflect bound.

~~~
jules
As I explained, the difference is that in one case you have one vtable per
object in the vector, in the other case you have one vtable for the whole
vector. With one vtable per object you can have one vector containing objects
of two different types that implement the same trait with a different vtable.
With one vtable for the whole vector you cannot. The same difference exists in
many languages, e.g. C# List<IFoo> vs List<T> where T:IFoo.

~~~
Jweb_Guru
Oh, duh, yes. I missed the obvious thing :P Indeed, that is the point of
existentials, sorry for being obtuse.

------
smcl
For anyone confused at the title - it's a reference to a part of the epic
Wagner opera "Der ring des nibelungen" called "Götterdämmerung" or "Twilight
of the Gods"

~~~
Intermernet
I personally rate it as one of the best titles for a blog post I've come
across in a while :-)

If someone eventually advances the state of distributed revision control
systems then they'll get to write an article titled "Gitterdämmerung".

~~~
gjm11
I think "Jitterdämmerung" is strictly better than "Gitterdämmerung" would be,
because there really are multiple JITs but there's only one Git. (On the other
hand, the Hamming distance to "Götterdämmerung" would be less.)

~~~
Intermernet
If we all took Dijkstra's advice we could claim that the current state of
programming is "Gotodämmerung".

Not so good on the Hamming distance though...

