
I wrote an LLVM-powered trace-based JIT for Brainfuck - Halienja
http://github.com/resistor/BrainFTracing
======
resistor
Hey folks, I'm the actual author of this.

I actually work on LLVM-proper during my day job. This was just a fun exercise
to demonstrate that it was possible. I also have plans to write a tutorial
based on it.

~~~
resistor
Also an example of how to implement a direct-threaded interpreter.

Some performance data from a Brainfuck Mandelbrot benchmark:

Interpreter: 37.787s

Tracing JIT: 11.716s

Static Compiler: 2.402s

The tracing JIT loses out to the static compiler largely because there's no
dynamic dispatch in Brainfuck for the tracer to optimize away. There's probably
some performance to be recovered by tuning the tracer thresholds and by minor
optimizations, but I would be shocked if it ever beat the static compiler, at
least for Brainfuck.

~~~
samps
Thanks for writing this -- it's awesome to see JIT principles boiled down to
the point where you can easily understand the whole system. Please let us know
if you publish the tutorial; I'd love to see more detail on the JIT. In
particular, it would be the perfect template to demonstrate feedback-directed
optimization opportunities and to measure the overhead of tracing; it would be
incredibly interesting to see what has to be done to make the JIT outperform
the AOT compiler.

~~~
resistor
The tracing overheads are pretty huge. Running with tracing but without
compilation takes 107s.

~~~
mikemike
The performance problems originate in the design of your trace compiler, not
in static vs. dynamic dispatch. Some suggestions:

* The interpreter should have a fast profiling mode (hashed counting of loop backedges) and a slower recording mode (for every instruction call the recorder first, then execute the instruction). Either implement it twice (it's small enough), use a modifiable dispatch table and intercept the dispatch in recording mode (indirect threading), or compute branch offsets relative to a base (change the base to switch modes).

* Don't record traces for a long time and then compile everything together. Do it incrementally:

\- Detect a hot loop, switch into recording mode, record a trace, compile it,
attach it to the bytecode, and switch back to profiling mode (which may call
your compiled trace right away).

\- Make the side exits branch to external stubs which do more profiling (one
counter per exit). Start recording hot side traces; continue until a trace
hits an existing trace, or abort if it hits an uncompiled loop.

\- If you completely control the machine code generation (i.e. not with LLVM),
you can attach the side traces to their branch points by patching the machine
code. Otherwise you may need to use indirections or recompile clusters of the
graph after a certain threshold is reached.

\- Region selection has a major impact on performance, so be prepared to
carefully tune the heuristics.

* Sink all stores, especially updates of the virtual PC, data pointers etc. Don't count on the optimizer to do this for you.

* Due to the nature of the source language, you may need to preprocess the IR or teach the optimizer some additional tricks.

\- E.g. the loop [-] should really be turned into 'data[0] = 0'.

\- Or the loop [->>>+<<<] should be turned into 'data[3] += data[0];
data[0] = 0'.

\- It's unlikely any optimizer handles all of these cases, since no sane
programmer would write such code ... oh, wait. :-)

~~~
resistor
> * The interpreter should have a fast profiling mode (hashed counting of loop
> backedges) and a slower recording mode (for every instruction call the
> recorder first, then execute the instruction).

It already does this. The recording method is specialized for '[' (since loop
headers can only be '['). All other opcodes go through a fast path that simply
checks if we're in recording mode and stores to the trace buffer.

> * Don't record traces for a long time and then compile everything together.

The tricky part with this is knowing how to start up the profiler when we hit
a side-exit. PC 123 may occur at multiple places in the trace tree. If we want
to extend the tree on side-exit, we need to be able to recreate the path
through the trace tree that led to that point. In essence, we need the
compiled trace to continue updating the trace buffer. Certainly possible, but
doesn't seem like a great idea offhand.

> * Sink all stores, especially updates of the virtual PC, data pointers etc.
> Don't count on the optimizer to do this for you.

Because I'm using tail-call-based direct threading, there are no stores to the
virtual PC or the data pointer; they're passed in registers to the
tail-callee.

> * Due to the nature of the source language you may need to preprocess the IR
> or you need to teach the optimizer some additional tricks.

Yes, there's a whole range of pre-processing tricks that could be used to
accelerate both the interpreter and the traces. I haven't even scratched the
surface of that.

------
danieldk
Nice work!

Let me make a tiny plug for a short Sunday project as well... Brainf*ck in
Prolog:

<http://github.com/danieldk/brainfuck-pl/>

One nice thing is that unit testing is really simple:

<http://github.com/danieldk/brainfuck-pl/blob/master/unittests.pl>

And for some very trivial outputs, it can generate the program to create that
output.

    
    
      ?- brainfuck:interpret([A,B],[],[0],[0],[1,0]).
      A = <,
      B = + ?
    

PS: Yes, it's easy to improve generation...

------
mathgladiator
Is anyone else oddly inspired to make an OCaml-to-Brainf__k translator just
to build a staggeringly awesome Rube Goldberg machine?

~~~
koenigdavidmj
I cannot find it now, but I have seen a C-to-Brainfuck compiler. Don't ask.

~~~
RodgerTheGreat
Here's the best reference page for the project:

<http://esolangs.org/wiki/C2BF>

------
VMG
Nice work - can you give us some data on how it is?

~~~
VMG
(how _fast_ it is of course)

------
udzinari
I wish I had free time too! Brainfuck is boring, though... why not some
stack-based language with Lisp-like syntax, or something like that?

------
davidw
I don't know... "neat hack", but it seems there is so much out there that
could actually have some kind of practical application that it's a bit of a
waste to work on "silly" projects. I love to hack on things that don't have
any immediately evident business model or real world application, but I think
purposefully working on something that never will is perhaps a bit
unfortunate. Yeah, he learned something for sure, but that's pretty much all
it can be.

To expand on that: if he'd written his own toy language, say, odds are it
would never go anywhere, but, who knows... maybe it will find a niche. Using
"brainfuck" pretty much guarantees that the code will never find a practical
use.

~~~
vox
I'm assuming you've been downvoted because slightly more than 50% of HNers
think of this project as an artistic/fun project.

But the fact is, even a purely artistic/fun project will have some creativity
or originality in it. I would consider a toy language or Brainf__k written for
the first time as artistic.

But this project is just a JIT for Brainf__k; there's no creativity in it, and
all it did was give the author some experience writing JITs. In that sense
it's an exercise project, and IMO exercise projects don't belong on HN.

~~~
StavrosK
Does HN censor the "fuck" in "Brainfuck", or was it just you?

EDIT: Ah, it doesn't.

~~~
steveklabnik
Generally, as long as it's part of something constructive and adds rather than
detracts from the message, the community won't downvote profanity.
<http://news.ycombinator.com/item?id=1636262>

