
What I Learned Making My Own JIT Language - developer-mike
http://www.mikedrivendevelopment.com/2018/06/what-i-learned-making-my-own-jit.html
======
filereaper
PSA: For folks writing their own JIT languages (and you're highly encouraged
to do so), please also check out the Eclipse OMR project. OMR is a bunch of
pluggable runtime components (GC, JIT, etc.) intended to ease building
runtimes from scratch.

[https://github.com/eclipse/omr](https://github.com/eclipse/omr)

~~~
PeCaN
OMR is amazing. It's essentially IBM's J9 virtual machine, with all the man-
decades of effort that went into that.

However, if you're writing your own JIT language and you've never written one
before, you probably want to learn something and write your own GC, JIT, etc.
It's really not that hard. Then once you get an idea of how everything works
you can use OMR and make it fast.
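
To the "it's really not that hard" point, the mark-sweep core of a toy GC fits in a handful of lines. A Python sketch with a made-up object model (nothing to do with OMR's actual collectors):

```python
class Obj:
    """Toy heap object: just holds references to other objects."""
    def __init__(self, *refs):
        self.refs = list(refs)
        self.marked = False

def collect(roots, heap):
    """Mark everything reachable from the roots, then sweep the rest."""
    stack = list(roots)
    while stack:                          # mark phase
        obj = stack.pop()
        if not obj.marked:
            obj.marked = True
            stack.extend(obj.refs)
    live = [o for o in heap if o.marked]  # sweep phase
    for o in live:
        o.marked = False                  # reset marks for the next cycle
    return live
```

The JIT is more work, but the same principle applies: a naive version teaches you the shape of the problem before you reach for OMR.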

~~~
qznc
How does it compare to PyPy?

~~~
ufo
Different technology. PyPy/RPython is a more "high level" approach built on
top of metatracing, while OMR is more low level and geared towards
method-at-a-time JIT. (I might be wrong; I worked more with PyPy but only
know OMR superficially.)

------
corysama
I wish [http://terralang.org](http://terralang.org) got more attention. It’s a
great way to write a jitting library. Add in LuaPEG and it’s a great way to
write a jitting language.

~~~
iTokio
Doesn’t it rely on LLVM internally? That means that the JIT path is slow to
start and that you have a huge dependency.

~~~
corysama
terra.dll is 53 megs. Terra is really oriented around high performance
computing, so if you want a quick little script, it's a bad fit. For small
apps, it can still be useful as an ahead-of-time compiler to either a stand-
alone executable or a C-linkable library.

------
stcredzero
_Sure, there are things that can be "poison pills" for performance, even in
JITs. For example, another reason why my fib benchmark beats V8 is that V8 has
to continually check if the fib function was redefined as a deoptimization
check. I don't allow that in Vaiven, so I can produce faster code. Dart was
designed to have fewer of these poison pills, for instance._
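
The redefinition check being quoted can be sketched in Python. All the names here are hypothetical scaffolding (not how V8 or Vaiven actually emit guards), but it shows the per-call tax a JIT pays when any function can be redefined:

```python
class Env:
    """Holds function definitions plus a version counter per name."""
    def __init__(self):
        self.functions = {}
        self.versions = {}

    def define(self, name, fn):
        self.functions[name] = fn
        self.versions[name] = self.versions.get(name, 0) + 1

class CallSite:
    """A 'compiled' call site that guards on the callee's version."""
    def __init__(self, env, name):
        self.env = env
        self.name = name
        self.compiled_version = env.versions[name]
        self.fast_path = env.functions[name]  # stand-in for machine code

    def __call__(self, *args):
        # The guard: has the callee been redefined since compilation?
        if self.env.versions[self.name] != self.compiled_version:
            # Deoptimize: fall back to a dynamic lookup.
            return self.env.functions[self.name](*args)
        return self.fast_path(*args)
```

Bumping the version on `define` is what keeps the guard down to a single integer compare; the poison pill is that it still runs on every call.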

Note to language designers: Reduce your language's poison pills. For those
which can't be avoided, make them easy to profile!

~~~
jerf
I would be intrigued to see a modern attempt at a dynamic language like Python
or Ruby, but with attention paid from day one to ensure that only very JIT-
able constructs were used, and with an eye towards the language helping the
user stick to those and be aware of when they deviate, rather than designing a
language, setting "runs fast" somewhere around the third or fourth priority,
then trying to JIT it after 10-15 years of non-JIT development.

Even LuaJIT, which to my understanding is the closest anyone has come to
this, was still retrofitting a JIT onto an existing language.

I'm not convinced we're going to see much more performance out of JS, for
instance, which is about as fast as a dynamic language can go nowadays. But I
wonder what the real limits of a "dynamic scripting" language would be from
this perspective.

Edit: Per my other comment about indirection being a performance poison pill,
here's an example of an idea where a new scripting language might be able to
get a lot of performance. Suppose you keep the ability to dynamically create
classes and load code and so forth, so that (just as an example, not
necessarily a good idea) you can write code that dynamically connects to a
database and loads in tables as classes with automatically defined properties,
etc. But instead of working the way the languages do now, instead of
constantly walking through all the layers of indirection that can be used to
implement all this at every call site and for every call, what if you could do
something like the "pledge()" call that says "OK, this is it, I'm all set up,
the dynamism is all done, you may now assume that all the type analysis you've
done is now complete". Now the JIT can drop all of its paranoia code. It isn't
completely obvious to me how to do this correctly (by implication, if you can
"connect to a database" before this pledge()-like call is done, you have to
have a pretty complete runtime available just for that), nor is it obvious to
me how this would affect the type system, etc. Maybe it's not possible. But
it's the type of thing I mean that would be interesting to have examined by
someone, who might be able to concretely show why it's not possible, or,
who knows, make it work. And that's just one idea of how to design a dynamic
language from the top for performance, basically off the cuff; who knows what
a smart person who sat down and thought about this for a couple of weeks
before even beginning to code could come up with.
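
As a rough sketch of what a pledge()-style API could look like (entirely hypothetical; every name here is made up for illustration):

```python
class Runtime:
    """Hypothetical runtime: dynamic class creation allowed until sealed."""
    def __init__(self):
        self.classes = {}
        self.sealed = False

    def define_class(self, name, fields):
        if self.sealed:
            raise RuntimeError("runtime is sealed; no more dynamism")
        # Build the class dynamically, e.g. from database table metadata.
        self.classes[name] = type(name, (), dict.fromkeys(fields))

    def pledge(self):
        # After this point a JIT could drop its redefinition guards:
        # the set of classes (and their shapes) is final.
        self.sealed = True
```

The interesting open question is exactly the one above: everything that runs before `pledge()` still needs the fully dynamic runtime.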

Another one is that it isn't obvious to me that scripting languages _must_
all be based on hash tables that the JIT laboriously casts back to structs;
it seems to me that it would also be feasible to go the other way, and allow
users to define things as structs. If you want to offer hash-like access, it
would be easy to lay hash-like access over the struct members (iteration,
etc.), and either implicitly or explicitly also offer an "overflow" hash
table if desired. This would give at least a bit of locality control.
Something similar for arrays, and suddenly you're starting to cook with gas.
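
The struct-with-overflow idea can be mimicked in today's Python with `__slots__` (a sketch of the semantics only, not of how a JIT would lay it out in memory):

```python
class Point:
    # Fixed, struct-like layout; no per-instance __dict__.
    __slots__ = ('x', 'y', '_overflow')

    def __init__(self, x, y):
        self.x = x
        self.y = y
        self._overflow = None  # created lazily for "extra" keys

    def __getitem__(self, key):
        # Hash-like access laid over the struct members...
        if key in type(self).__slots__[:-1]:
            return getattr(self, key)
        # ...falling back to the overflow table.
        if self._overflow is not None and key in self._overflow:
            return self._overflow[key]
        raise KeyError(key)

    def __setitem__(self, key, value):
        if key in type(self).__slots__[:-1]:
            setattr(self, key, value)
        else:
            if self._overflow is None:
                self._overflow = {}
            self._overflow[key] = value
```

The declared fields get fixed offsets (locality), and only the rare dynamic keys pay the hash-table price.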

~~~
goatlover
Nim and Crystal exist, although they're compiled. For a dynamic language made
for JIT performance, there is Julia.

~~~
chriselrod
I think Julia is a great example. It also uses a "world age" for function
redefinitions, as another commenter suggested, putting the burden on
redefinitions rather than on runtime deoptimizations. Before implementing
world ages, Julia still didn't deoptimize. This meant that if "bar" called
"foo", you called "bar" (causing it to compile), redefined "foo", and then
called "bar" again, it would still use the old "foo" method. The workaround
(redefining "bar") was inconvenient, but unlike with deoptimizing, at least
there was one.

One comment on Julia though. It isn't like other JIT languages. It's a lazy
statically compiled language with a REPL. It uses LLVM (like Clang) to compile
functions the first time they're called for a given set of input types (every
function is like a C++ template by default). This makes it easy to get C-level
speed, because all it is is a different front end (with extensive type
inference infrastructure/default "auto" for all types) to the same back end.

This is in contrast to an interpreted language, where code only gets compiled
when hot (but then with the potential of profile-guided compilation, which
could sometimes let it run even faster).

------
mncharity
A while back, it looked like V8's deopt type guards, at least for monomorphic
call sites, were consistently ending up off the processor's
speculative-execution mainline. They were "free"-ish. Then speculative
execution attacks and mitigations hit.

Does anyone know the current state of play?

I've wanted fast multiple dispatch in js for many years. V8's handling of
inlining budgets improved, and guards seemed off mainline, so I'd looked
forward to trying again to get dispatch trees inlined away. Then Spectre hit.
:/

~~~
jcdavis
Not really familiar with V8 internals, but aren't there ways to do deopts
without relying on guards? The JVM deopts methods by hotpatching the start of
them to be a jump to runtime handlers.

~~~
mncharity
> aren't there ways to do deopts without relying on guards

Yes, in general. But I was basically hoping to craft a javascript multiple
dispatch implementation that mostly landed on V8's existing dispatch fast
path.

A polymorphic inline cache for a call site starts with a check (or checks)
that the object is an expected type. And a bit of inlined code may start with
checks that the inlining assumptions (types of arguments, etc.) are still
valid. In the JITed assembly, it looks like a bunch of test-and-jump
instructions that come before the real work. But the processor's branch
prediction ideally recognizes that the common case is a still-valid
cache/inlining, so the processor's speculative execution ideally proceeds
immediately with the real work. The checks happen off to the side - they
don't delay the real work.

One way to do dynamic multiple dispatch is a dispatch tree of similar checks.
As with regular dispatch, most multiple dispatch call sites are monomorphic.
So if the dispatch checks get inlined, you would have an extended version of
the usual checks. And V8 changed how it allocates its inlining budget, in a
way that made arranging for inlining of dispatch seem more plausible than it
once was. My hope was the dispatch checks would also happen off the mainline,
so multiple dispatch might be as fast as single dispatch.
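
In Python rather than JS for brevity, the call site I mean looks roughly like a monomorphic cache in front of a two-argument dispatch table (all names illustrative):

```python
def make_call_site(dispatch_table):
    """A two-argument multiple-dispatch call site with an inline cache."""
    cached_key = None
    cached_fn = None

    def call(a, b):
        nonlocal cached_key, cached_fn
        key = (type(a), type(b))
        if key == cached_key:              # fast path: monomorphic hit
            return cached_fn(a, b)
        cached_fn = dispatch_table[key]    # slow path: full table lookup
        cached_key = key
        return cached_fn(a, b)

    return call
```

The hope was that V8 would inline the equivalent of the `key == cached_key` test and the branch predictor would hide it, just like its own type guards.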

But then speculative execution implementation flaws were recognized as a
security threat, and jits began intentionally disabling or constraining
speculative execution. Which might be sufficient to invalidate the whole idea.
Since those mitigations seem an ongoing work in progress, and in the past I've
found it expensive to tool up for v8 jit code analysis, I thought I'd see if
anyone already had a feel for where we're at/headed.

------
timClicks
Wren is another language that fits lots of extra data into the "spare" bits
within NaNs. [http://wren.io/](http://wren.io/)
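
NaN boxing in a nutshell: an IEEE 754 double has 2^52 bit patterns that all read as NaN, so a VM can smuggle a tag plus a 48-bit payload (enough for a pointer or a small integer) inside one. A Python sketch of the bit twiddling (the layout constants are illustrative, not Wren's exact scheme):

```python
import math
import struct

QNAN = 0x7FFC_0000_0000_0000          # quiet-NaN pattern, empty payload
PAYLOAD_MASK = 0x0000_FFFF_FFFF_FFFF  # low 48 bits

def box(payload):
    """Hide a 48-bit payload inside a NaN-valued double."""
    bits = QNAN | (payload & PAYLOAD_MASK)
    return struct.unpack('<d', struct.pack('<Q', bits))[0]

def unbox(value):
    """Recover the payload from a boxed double."""
    bits = struct.unpack('<Q', struct.pack('<d', value))[0]
    return bits & PAYLOAD_MASK
```

Every boxed value still satisfies `math.isnan`, so ordinary doubles and boxed values can share one 64-bit slot.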

~~~
52-6F-62
That seems like a great project

------
mingodad
I'm surprised that people didn't mention
[https://www.gnu.org/software/libjit/](https://www.gnu.org/software/libjit/).
It's a small, well-designed library with a very good interpreter. I wish more
people put some effort into improving it, so all of us could benefit from it
without reinventing the wheel again and again.

~~~
b2gills
I'm very confident it would not be very good at optimizing Perl 6 code.

On the first page of the documentation for libjit is this:

    “Only a very small proportion of the work is concerned with language specifics.”

The thing is that Perl 6 tends to break more of the established rules than it
follows.

For example, most normal operators in Perl 6 are just subroutines with a
special name:

    a + b     ==     &infix:«+»(a,b)

    put &infix:«+».candidates.elems; # 26

There are currently 26 multi subs with the same name for handling the numeric
infix addition operator (27 if you count the proto sub).

That is among the easiest of things to optimize in Perl 6.

I would also like to know if it is a JIT that profiles and compiles the
optimized code on another thread like MoarVM does. Is there a plugin system
that allows high level code to influence what code gets JIT compiled like
MoarVM recently received?

I'm sure that for just about every other dynamic language libjit would be very
suitable. It's just that I imagine it would quickly strain under the weight
that is Perl 6.

(Perl 6 brings in many varied features from many languages, and combines them
in a way that feels like they have always belonged together.)

------
sigjuice
Title should say 'JIT compiler'.

~~~
jamestimmins
Is it not fair to say that this is the same thing? My impression was that in
designing a compiler you're explicitly determining the rules for the language,
which is the same as designing the language itself? Or do you mean that
technically the compiler, not the language, is JIT?

This is a new area for me so apologies if I'm just misunderstanding the
mechanics here!

~~~
ufo
In theory the programming language is independent from the implementation. You
could write an alternate implementation if you wanted.

JIT-ness is a concept related to the implementation, not to the language.

~~~
jamestimmins
Gotcha, that makes sense. So I'm guessing that Java-based Python vs C-based
Python is an example of two implementations of one language? Thanks for
explaining!

~~~
ufo
Yup

------
barrkel
It is so pleasant to read an article with Intel asm syntax rather than AT&T;
my eyes thank you.

------
s2g
This is cool, I used to really like this sort of thing.

At some point I transformed into a bitter sad man who can't stand reading
articles like this because I feel like such a worthless piece of crap.

~~~
sitkack
> bitter sad

This can be your muse, channel it into something!

