
Interpreter, Compiler, JIT - hnuser1243
https://nickdesaulniers.github.io/blog/2015/05/25/interpreter-compiler-jit/
======
ndesaulniers
Hey all, happy to take questions/feedback/criticism.

Funny anecdote: while developing this post, once I got the JIT working I was
very excited. I showed a few people in the office. Our CTO walked by and came
to take a look. He's worked on numerous VMs in the past; SpiderMonkey's first
JIT, TraceMonkey, was his PhD thesis. He took one look and asked, "Is it self-
hosted?" I replied, "Well... not yet." To which his response was "pffft!", and
he walked off. I found that pretty funny. Maybe in the next blog post!

~~~
wvenable
You could try using C2BF[1] in an attempt to make it self hosting. :)

[1] [http://esolangs.org/wiki/C2BF](http://esolangs.org/wiki/C2BF)

~~~
ndesaulniers
Hey! I like the way you think, my friend!

------
vardump
Maybe one day someone will write a native JIT for x86[-64]. Native x86 code
in, optimized native x86 code out.

It should be possible to JIT native code and run it faster than running the
native code directly!

It could do:

1) Peephole optimizations (and known sequence/function replacement). Utilize
target instruction set extensions. (A tiny sketch of this follows below.)

2) "Constant" folding. Replace code that processes values that can be proven
to be immutable with a constant. Also removes unnecessary branches. Guards at
loads, indirect stores, by marking page read only, etc.

3) Vectorization. This includes widening existing vectorization if target
instruction set supports it.

4) Loadable library call inlining. This could even mean inlining system calls,
but of course in that case the JIT would need to be running in the kernel...

5) Profile guided optimization.

Of course there are some _very hard_ problems. For example, there'd need to be
guards for the case where the assumed constant value does change, or where an
unexpected call target occurs. Etc.

Maybe this system could learn possible call targets and unexpectedly changing
"constants".

~~~
ndesaulniers
Doesn't LTO do some of these things?

~~~
vardump
I was talking about runtime executable optimization. In general, I don't think
this is possible at link time. If the binary is linked in 2015, it's not
possible to know about CPUs available in 2035 or to utilize the new features,
even if they're binary compatible.

Compiling (incl. linking) bakes in assumptions about the target CPU. If these
assumptions are incorrect, performance will be sub-optimal. Even pathological
cases are possible, like an order of magnitude lower performance for AVX
stores on AMD Jaguar.

CPUs have wildly different combinations of instruction set extensions.
Instruction throughput and latency varies significantly. Cache subsystems are
different.

So code that runs well on one CPU can run sub-optimally on another binary-
compatible CPU.

------
pron
JITs don't have to repeat all their work every time they run. They can cache
their output (this feature is planned for Java 9, I think). And while, as the
article says, JITs are pretty much a necessity for languages with dynamic
dispatch, which are nearly impossible to optimize ahead-of-time, they can be
great for statically-typed languages, too:

1. Their ability to speculatively optimize (and then de-optimize and recompile
when the assumption proves false) makes it possible for them to implement
zero-cost abstractions, such as inlining polymorphic virtual calls (see the
sketch after this list).

2. They make it possible to optimize across shared libraries, even those that
are loaded dynamically.
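
To make point 1 concrete: below is roughly the guard pattern, written as C
source rather than the machine code a JIT would emit, with invented types and
names. The compiler speculates that the receiver has the class observed during
profiling, inlines that method's body behind a cheap check, and falls back to
full dynamic dispatch if the check fails (a real JIT would then deoptimize and
recompile with the new type information):

    typedef struct Class Class;
    typedef struct Object { Class *klass; int side; } Object;

    struct Class {
        int (*area)(Object *self);    /* one virtual method */
    };

    static int square_area(Object *o) { return o->side * o->side; }
    static Class SQUARE_CLASS = { square_area };  /* class seen while profiling */

    int call_area(Object *o) {
        if (o->klass == &SQUARE_CLASS)
            return o->side * o->side;  /* guard held: the inlined body, no call */
        return o->klass->area(o);      /* guard failed: generic virtual dispatch */
    }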

To those interested in the future of JITs, I very much recommend watching one
of the talks about Graal[1], HotSpot's (the OpenJDK JVM) next-gen JIT. Like
HotSpot's current optimizing JIT compiler, it does speculative, profile-guided
optimization, but exposes an API that lets the language designer (or even the
programmer) control optimization and code generation. It is also self-
hosted (i.e. written in Java).

It's still under heavy development but early results are promising. Even
though it supports multithreading (which complicates things), it performs
better (often much better) than PyPy when running Python[2] and on par with V8
when running JavaScript[3].

[1]:
[https://wiki.openjdk.java.net/display/Graal/Publications+and...](https://wiki.openjdk.java.net/display/Graal/Publications+and+Presentations)

[2]:
[https://docs.google.com/spreadsheets/d/1fFMWcRIuPKt7wSAM5Ox9...](https://docs.google.com/spreadsheets/d/1fFMWcRIuPKt7wSAM5Ox9Rho4BBRhA5xgX6oemGhVxAA/edit#gid=1)

[3]:
[http://www.slideshare.net/ThomasWuerthinger/jazoon2014-slide...](http://www.slideshare.net/ThomasWuerthinger/jazoon2014-slides)

~~~
vidarh
> JITs are pretty much a necessity for languages with dynamic dispatch, which
> are nearly impossible to optimize ahead-of-time,

Depends on what you consider "nearly impossible". A lot of compilers for
dynamic languages are just awful for no particularly good reason, so it's
often hard to assess what is slow because it is hard, and what is slow because
the implementation disregards the last 30 years of experience with compiling
dynamic languages.

E.g. I'm working on an ahead-of-time Ruby compiler. While it will need a JIT
component for cases where people call eval, and while Ruby is particularly
nasty to compile for a variety of reasons, the method dispatch is easy to
reduce to one indirection via a vtable (you just need to propagate method
overrides down the class hierarchy and update the vtables, but updates to
methods are much rarer than calls, so it's OK for them to be more expensive),
equivalent to C++ virtual member functions (though C++ compilers have a better
hope of being able to optimize away the virtual call).
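
To make that concrete, a minimal sketch in C of what the one-indirection
dispatch looks like (the layout and names here are invented for illustration;
the real compiler differs in detail). Every method name in the program gets a
fixed slot index at compile time, and each class carries a vtable with one
entry per slot, filled in by walking the class hierarchy:

    typedef struct Object Object;
    typedef Object *(*Method)(Object *self);

    typedef struct {
        Method *vtable;   /* slot -> function, propagated down to subclasses */
    } Class;

    struct Object {
        Class *klass;
    };

    #define SLOT_FOO 3    /* hypothetical slot assigned to :foo at compile time */

    /* obj.foo becomes one load of the vtable, one indexed load and one
     * indirect call, just like a C++ virtual member function call. */
    Object *call_foo(Object *obj) {
        return obj->klass->vtable[SLOT_FOO](obj);
    }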

Despite the pathological cases possible because of the singly rooted object
hierarchy, for most typical applications the wasted space in vtables (for
method slots for methods that are unimplemented in a specific branch of the
class hierarchy) is easily compensated for by e.g. needing less bookkeeping for
methods that are known statically at compile time (which is the vast majority
for most applications).

If you're willing to compile a fallback path and suitable guards, you can
sometimes optimize away many indirections entirely, and even inline code ahead
of time, even for languages like Ruby (incidentally, for an example of that,
look at the work Chris Seaton has done on a JRuby Truffle/Graal backend -
while that does these optimizations at runtime, many of them are applicable
ahead of time too, though the JIT gets the advantage of not having to generate
code for fallback cases unless they're actually needed at runtime).
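
Continuing the vtable sketch above, with hypothetical names again: an AOT
compiler has to emit both the guard and the fallback path up front, where a
JIT can wait and see whether the fallback is ever reached:

    #define SLOT_LENGTH 7                        /* hypothetical slot for :length */

    extern Object *String_length(Object *self);  /* the statically likely target */

    Object *call_length(Object *obj) {
        if (obj->klass->vtable[SLOT_LENGTH] == String_length) {
            /* Guard held: the inlined body of String#length would go here,
             * avoiding the indirect call entirely. */
            return String_length(obj);
        }
        /* Fallback path: the method was redefined, or obj isn't a String;
         * do the ordinary one-indirection vtable dispatch. */
        return obj->klass->vtable[SLOT_LENGTH](obj);
    }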

Note that I agree with you that JITs have many advantages. At the same time,
while I love the flexibility of dynamic languages, I prefer the smallest
number of moving parts possible in a production environment. Spent too long
doing devops... It makes me lament the lack of attention to AOT/"as static as
possible" compilers for dynamic languages.

~~~
jerf
"E.g. I'm working on an ahead-of-time Ruby compiler.... easy..."

If you haven't already... it's 2015. You might want to go look at the sun-
bleached bones of all your predecessors who confidently proclaimed that all
the dynamic scripting languages were slow for no good reason and a JIT could
totally just JIT all the slowness away.

If it's a journey you wish to take, by all means, be my guest, but, please, be
sure you know what you're getting into.

(The aforementioned sun-bleached bones are the bulk of the reason why I no
longer believe in the old adage that "languages don't have performance
characteristics, only implementations do", at least in the full sense
intended. Languages may not have performance characteristics but the evidence
rather strongly suggests they _can_ put upper bounds on performance.)

~~~
WalterGR

> You might want to go look at the sun-bleached bones of all your predecessors
> who confidently proclaimed that all the dynamic scripting languages were slow
> for no good reason and a JIT could totally just JIT all the slowness away.

I ask this with the best of intentions: where can I learn more about this?

(FWIW, I'm curious specifically about the ways in which Common Lisp (or Lisps
in general) does or does not avoid the performance pitfalls of - say - Ruby or
Python. But I'm definitely interested in the general topic as well...)

~~~
chrisseaton
If you are coming from the perspective of a Ruby programmer, there are lots of
papers on optimising Ruby in the Bibliography
[http://rubybib.org](http://rubybib.org).

But what you want to do is take a look at a few of the VM papers there, and
then look at their references and see what people have tried previously. The
related work sections often talk about previous limitations.

Lisps in general are much more static than a language like Ruby, so it
wouldn't be very instructive to compare against them. The trickiest Ruby
features are things like Proc#binding and Kernel#set_trace_func, as these
require you to modify running code and data structures which may have already
been very aggressively optimised.

I don't think (but am not an expert) that most Lisps have features like those.
Lisps have code as data of course, which sounds like a nightmare to compile,
but it's really not - the code trees are immutable (I believe) once generated,
so just compile them whenever you see a new tree root and cache the result.
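
To make that concrete, a sketch of the compile-once-per-root idea in C, with a
toy open-addressing map from tree root to compiled code (Node and compile_tree
are placeholders here; no eviction or overflow handling):

    #include <stddef.h>
    #include <stdint.h>

    typedef struct Node Node;                    /* an immutable code tree */
    typedef void (*CompiledFn)(void);

    extern CompiledFn compile_tree(Node *root);  /* the compiler proper */

    #define CACHE_SIZE 1024                      /* must be a power of two */

    static struct { Node *root; CompiledFn fn; } cache[CACHE_SIZE];

    CompiledFn get_compiled(Node *root) {
        size_t i = ((uintptr_t)root >> 4) & (CACHE_SIZE - 1);
        while (cache[i].root != NULL) {
            if (cache[i].root == root)
                return cache[i].fn;              /* already compiled this root */
            i = (i + 1) & (CACHE_SIZE - 1);      /* linear probing */
        }
        cache[i].root = root;                    /* new root: compile and cache */
        cache[i].fn = compile_tree(root);
        return cache[i].fn;
    }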

~~~
lispm
> Lisps in general are much more static than a language like Ruby

What?

You mean running code directly off of data structures like in Lisp
interpreters, using late binding, dynamic CLOS dispatch with method
combinations, a full blown meta-object protocol, etc. is more static than
Ruby?

Okay...

~~~
chrisseaton
> running code directly off of data structures

I specifically addressed this. You may need to re-compile if you see a new
code data structure instance, but once you've compiled it, the fact that it
came from a data structure isn't interesting. All JIT compilers have code as
data - they have an IR.

> late binding

This is no more troublesome than it is in Ruby.

> dynamic CLOS dispatch with method combination

But these are optional features that you sometimes use. In Ruby all
operations, method calls and most control structures use the strongest form of
dynamic dispatch available in the language.

> full blown MOP

So does Ruby.

So I can't say Lisp is any _more_ dynamic for the compiler than Ruby. A
sibling of yours mentioned a trace extension but said people don't expect it
to be fast so nobody is bothering. In the implementation of Ruby I work on,
we've even made set_trace_func as fast as we can.

~~~
lispm
> All JIT compilers have code as data - they have an IR.

Lisp interpreters run mutable source code, not IR.

> > dynamic CLOS dispatch with method combination

> But these are optional features that you sometimes use. In Ruby all
> operations, method calls and most control structures

If you look at actual Lisp systems it's not optional: IO, error handling,
tools, ... all use CLOS.

> use the strongest form of dynamic dispatch available in the language.

CLOS uses a more dynamic form of dispatch.

> So I can't say Lisp is any more dynamic for the compiler than Ruby.

Using a compiler or interpreter makes no difference for CLOS.

~~~
chrisseaton
Well I'm not an expert in compiling Lisp and I guess you're not an expert in
compiling Ruby, so we're probably not going to come to an agreement on this.

------
ndesaulniers
Over in the comments on proggit, the inventor of brainfuck, Urban Müller,
showed up and gave the Chuck Norris thumbs-up:
[https://www.reddit.com/r/programming/comments/377ov9/interpr...](https://www.reddit.com/r/programming/comments/377ov9/interpreter_compiler_jit/crkkrz4)

