

Hello, JIT World: The Joy of Simple JITs - justincormack
http://blog.reverberate.org/2012/12/hello-jit-world-joy-of-simple-jits.html

======
tptacek
This is a great post and I probably wouldn't have read up on DynASM without
it. But the comparison between DynASM syntax and the BPF JIT is a bit unfair;
the Linux kernel is not pervasively JIT'd, and the BPF language hasn't changed
in decades, so there was little incentive to create a flexible dynamic code
generation system for it.

You could probably very easily clean up the syntax for a C->x64 JIT without
requiring (yech) a preprocessor.

~~~
haberman
I'm not faulting the BPF authors for not inventing DynASM, I'm just
illustrating what DynASM buys you over the more traditional approach.

Without a preprocessor you'll always be dealing in encoded instructions
instead of symbolic ones. It's manual work that is surely nicer to avoid if
you can. I'm surprised that you're put off by the preprocessor aspect of it,
given the benefits of this approach.

~~~
tptacek
There's something to be said for instructions encoded into structs instead of
raw assembly; those structures are parameterized, so it's easy to interrogate
them or bind new values to them; without structures, you end up having to
invent new assembly language features for dynamic labels and things like that.

~~~
haberman
> There's something to be said for instructions encoded into structs instead
> of raw assembly

I'm confused, what is this in response to? Are we still talking about BPF's
JIT vs DynASM?

By "instructions encoded into structs" are you referring to BPF's byte-code?
I'm certainly not arguing against a byte-code approach at all -- byte-code
obviously has numerous advantages, including (as you mention) ease of
inspection. Even LuaJIT (the project that DynASM was built for) uses byte-code
pervasively.

The question is: at the point that you decide you want to generate machine
code from your byte-code, what's the easiest way to do it?

The BPF JIT's code is functionally identical to a DynASM-based code generator;
the _only_ difference is that the BPF JIT requires a human to put the machine
encoding of every instruction directly in the source file. DynASM saves the
human from performing this step by using a preprocessor.

~~~
tptacek
No, I'm saying, rather than have the representation of assembly code in your C
program be textual --- raw assembly code interpolated into C code and expanded
into structs by a preprocessor --- there is something to be said for having
the representation of your instructions be "native", expressed without a
preprocessor. In particular, it makes it easier to modify the assembly code in
C code later on in the runtime of the program; and, if you look closely at
things like DynASM, it's not "really" assembly _qua_ assembly, because they've
added features to the language to handle the dynamic things anyways.

I'm definitely not saying bytecode is better than assembly code! I'm talking
strictly about the mechanism by which you generate opcodes from "assembly
language".

A library that can generate opcodes without exposing all the fiddly mod/rm
stuff is probably just as good as the preprocessor tool.

I am, like everyone else, appropriately reverential of LuaJIT. :)

~~~
haberman
> A library that can generate opcodes without exposing all the fiddly mod/rm
> stuff is probably just as good as the preprocessor tool.

I'm skeptical that such a library could have a very simple or clean interface
-- the "mov" instruction alone has almost 40 variants, all encoded in
different ways -- but if you ever create such a thing I promise to check it
out and give it a fair opinion. :)

~~~
tptacek
Most of the x64 instruction set shares a common set of addressing mode
variants all encoded the same way.

I've done this from C (messily) and from Ruby (very cleanly) and while the x64
ISA is definitely a pain to work with directly, it's not so painful that I
think it defies the abstractions C provides natively. :)

------
gruseom
This is a beautiful introduction to DynASM, the code generation library from
the amazing LuaJIT. Josh, I hope you keep writing more pieces like it.

~~~
haberman
Thanks! It took a while to write, but I was motivated by the positive feedback
I'd gotten on HN previously about writing an introduction to DynASM.

Some feedback I got from Mike suggested very simple optimizations for the BF
JIT that will let me catch up to (and exceed) the performance of bf2c. That
might be the next article in the series.

------
MaysonL
For a nice little JIT, see also maru –

<http://www.piumarta.com/software/maru/>

------
pacala
TL;DR DynASM is a toned down JIT compiler which directly generates target
assembly instead of a more abstract IL. Retargeting and optimizations will be
hard.

I'm looking forward to the next installment, where (hopefully!) one generates
target-independent code via LLVM, then uses one of the LLVM backends to
generate the final assembly code.

~~~
haberman
> Retargeting and optimizations will be hard.

What evidence do you have for this? I have evidence to the contrary: LuaJIT is
a one-man effort and yet is one of the fastest and most portable JITs around
(x86, x86-64, ARM, PowerPC, MIPS).

> I'm looking forward the next installment, where (hopefully!) one generates
> target independent code via LLVM, then uses one of the LLVM backends to
> generate the final assembly code.

LLVM is very cool, but it is an absolute mistake to think of it as obsoleting
all other JITs. LLVM uses an IR that is well-suited to some things but not
others. If LLVM fits your problem, great. But many problems it does not fit as
well -- just look at Unladen Swallow, and notice that none of the mainstream
JavaScript JITs use LLVM (not V8, not IonMonkey, not Nitro).

LLVM's design tightly couples an IR with a machine code generator. If you use
DynASM, you can write your own machine code generator that accepts whatever IR
is best suited to your problem.

~~~
pacala
I'm assuming that emitting "movzx edi, byte [PTR]" is using x86 as the target,
so retargeting for ARM will likely require a complete rewrite of the
brainf#ck jit. In that sense retargeting is hard. But I may be wrong! I am
looking for a further article that shows how the brainf#ck jit can be
retargeted to ARM without a full rewrite of the jit code.

From the jit code that generates assembly tied to a specific architecture and
register allocation, plus the code generation process being encoded as a
preprocessor step instead of a library, I can only deduce that optimizations
aren't the focus of this work. But I may be wrong! Perhaps the preprocessor is
syntactic sugar over a library that builds the code representation as a data
structure, and there are ways to programmatically manipulate this data
structure to implement optimizations. Looking forward to a further article
with more details!

I'm not suggesting you necessarily use LLVM, but LLVM is the closest thing to
an assembly generator library I am aware of. To the best of my knowledge, you'd
have a harder time extracting the code generator of, for example, v8 as a
standalone library.

~~~
haberman
It is true that this approach requires a separate code generator for every
architecture. That is not the same as saying that "retargeting will be hard"
(which makes it sound like DynASM somehow gets in your way).

Yes, as I said before, if you have a problem that maps cleanly onto LLVM and
you don't mind the weight that LLVM brings along, by all means use it! But you
shouldn't think of LLVM as an "assembly generator library." That implies that
it is far more general-purpose than it actually is. DynASM is _actually_ an
assembly generator library. LLVM is an IR, a set of optimization passes for
that IR, and a set of target-specific code generators for that IR. The key
point is "for that IR."

DynASM is a tool that you can count on when no existing IRs like LLVM, .NET,
etc. fit your needs. It's a lower-level tool -- LLVM could conceivably use
DynASM to perform its own target-specific instruction encoding. DynASM is a
small, focused tool that does one thing and does it well. LLVM is more of a
toolbox that tries to get the 99% case right for its target audience. As a
result, it represents a lot more compromises and changes in more fundamental
ways over time (for example, it recently completely rewrote its register
allocator).

> Perhaps the preprocessor is syntactic sugar over a library that builds the
> code representation as a data structure and there are ways to
> programmatically manipulate this data structure to implement optimizations.

No, definitely not. The idea is that you perform optimizations _before_ the
code generation step. I didn't do this in the article because these were just
simple "Hello, World" examples, but maybe I should write a follow-up article
that illustrates how optimization fits into this framework.

