
NativeJIT – JIT compilation of expressions involving C data structures - AnbeSivam
https://github.com/bitfunnel/nativejit/
======
stormbrew
Title seems incorrect, or at least misleading. It is not a JIT for C++ code
(i.e. not "C++ to..."), it is a JIT library for C++ that seems focused on
compiling arithmetic expressions as used in Bing.

Interestingly, it requires the functions called by the JIT'd code to be side-
effect free: because it evaluates both sides of any branch that isn't at the
top level, any given function invocation is guaranteed to run _at least once_,
whether or not its branch is taken.
See "Design Notes and Warnings" in
[https://github.com/BitFunnel/NativeJIT/blob/master/Documentation/README.md](https://github.com/BitFunnel/NativeJIT/blob/master/Documentation/README.md)
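
That evaluation strategy is easy to picture in plain C++. The sketch below
(hypothetical helper names, not NativeJIT's actual API) shows why a side
effect in either arm would be observable even when that arm's condition is
false:

```cpp
#include <cassert>

// Hypothetical user callbacks standing in for functions called from JIT'd
// code. The counter makes the "at least once" behavior observable.
static int g_calls = 0;
int expensiveA(int x) { ++g_calls; return x * 2; }
int expensiveB(int x) { ++g_calls; return x + 100; }

// Eager-select evaluation of a non-top-level branch: both arms execute,
// then one result is chosen. A side effect in expensiveB would therefore
// happen even when cond is true.
int eagerSelect(bool cond, int x) {
    int a = expensiveA(x);   // always evaluated
    int b = expensiveB(x);   // always evaluated, even if cond is true
    return cond ? a : b;
}
```

With side-effect-free functions this is just a performance tradeoff; with
side effects, it would change program behavior.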

~~~
AstralStorm
The real question is why you would use it instead of mature LLVM.

~~~
stormbrew
LLVM is a fairly heavyweight tool for simple or very frequent JIT use cases.
There is definitely room for smaller and simpler tools for that purpose, imo.
Like, I'm pretty sure LuaJIT runs rings around LLVM in terms of how quickly it
generates code, but at the cost of doing considerably less optimization. This
is sometimes a useful tradeoff.

~~~
RossBencina
"but at the cost of doing considerably less optimization"

That's arguable. Ravi[1] is a Lua implementation with an LLVM-based JIT. The
benchmarks here[2] compare runtime performance of Ravi vs. LuaJIT. The Ravi
timings don't include compilation time; the LuaJIT timings do. Sometimes
LuaJIT significantly outperforms Ravi, sometimes it's on par, sometimes worse.
But the results don't suggest that there is less (effective) optimisation
going on.

[1]
[https://github.com/dibyendumajumdar/ravi](https://github.com/dibyendumajumdar/ravi)

[2] [http://the-ravi-programming-language.readthedocs.io/en/latest/ravi-benchmarks.html](http://the-ravi-programming-language.readthedocs.io/en/latest/ravi-benchmarks.html)

~~~
pcwalton
The parent post is correct. LLVM's optimizations are low-level optimizations
tuned for C and C++. By contrast, the optimizations you want to do for dynamic
languages (and high-level languages in general) are high-level optimizations
that are better done on language-specific IRs. Moreover, dynamic languages
need dynamic optimizations such as polymorphic inline caching, speculation,
and bailouts with on-stack replacement, etc.
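
To illustrate one of those, here is a rough, self-contained C++ sketch of an
inline cache (hypothetical object model and handlers, not tied to any real
VM): a dispatch site remembers the last seen type and its handler, so repeated
calls on the same type skip the generic lookup entirely.

```cpp
#include <cassert>

// Toy dynamic object model (illustrative only).
struct Obj { int typeId; double value; };

// Type-specific handlers a VM might dispatch to.
double handlerType0(const Obj& o) { return o.value; }
double handlerType1(const Obj& o) { return o.value * 2; }

// A one-entry inline cache: remember the last seen type and its handler;
// fall back to a slow lookup (and re-seed the cache) on a miss.
struct InlineCache {
    int cachedType = -1;
    double (*cachedHandler)(const Obj&) = nullptr;
    int misses = 0;

    double call(const Obj& o) {
        if (o.typeId == cachedType) return cachedHandler(o);  // fast path
        ++misses;                                             // slow path
        cachedType = o.typeId;
        cachedHandler = (o.typeId == 0) ? handlerType0 : handlerType1;
        return cachedHandler(o);
    }
};
```

In a real JIT the fast path is emitted inline at the call site and the cache
is patched in the generated machine code, but the shape of the optimization
is the same.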

Optimization pipelines are invariably language-specific. If you were to
somehow make LuaJIT into a C/C++ compiler without doing any work on its
optimizer, you would get similarly poor results.

~~~
ihnorton
> If you were to somehow make LuaJIT into a C/C++ compiler

Clang -> Sulong -> Java bytecode -> Luje [1]? May need more turtles :)

Technically, one reason (of several) this probably wouldn't work without
significant effort is that Sulong relies on Graal's FFI intrinsics. But those
should be convertible to LuaJIT FFI calls (in theory).

[1] [https://cowlark.com/luje/](https://cowlark.com/luje/)

------
paulasmuth
What I find amazing about NativeJIT is how small the codebase seems to be --
under 10k SLOC. Impressive!

Will seriously consider using this to speed up expression execution in EventQL
[0] (we shied away from llvm so far because it's such a massive dependency).

[0] [https://eventql.io/](https://eventql.io/)

------
bedatadriven
This is similar to a technique used in Renjin to compile specific vector
computation expressions down to straight-line JVM bytecode, which is then
JITed down to machine code. It's a very powerful technique because it removes
all the indirection involved in evaluating dynamic expressions and lets the
processor just do its job.

For example, if you evaluate sum(sqrt(x^2 + y^2) * 3) in Renjin, and x or y
happen to be very long vectors, then we'll jit out a JVM class for this
specific expression that would look something like this in Java:

    
    
      class JittedComputation1E5374A3 {
        SEXP compute(SEXP[] args) {
          double[] x = args[0].toDoubleArrayUnsafe();
          double[] y = args[1].toDoubleArrayUnsafe();
          double sum = 0;
          for (int i = 0; i < x.length; ++i) {
            double xi = x[i];
            double yi = y[i];
            sum += Math.sqrt(xi * xi + yi * yi) * 3;
          }
          return DoubleVector.valueOf(sum);
        }
      }
    

The computation is specialized to the types of x and y, so if for example x is
the sequence 1:1000000, then a new class gets written that doesn't even use an
array for x.
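
In C++ terms (hypothetical function names; Renjin itself emits JVM bytecode),
that sequence specialization amounts to recovering x[i] from the loop index
instead of loading it from memory:

```cpp
#include <cassert>
#include <cmath>

// Generic path: x is a materialized array of doubles.
double sumHypot(const double* x, const double* y, int n) {
    double sum = 0;
    for (int i = 0; i < n; ++i)
        sum += std::sqrt(x[i] * x[i] + y[i] * y[i]) * 3;
    return sum;
}

// Specialized path for x = 1:n (an arithmetic sequence): the generated
// code never touches an x array at all -- the loop index *is* x[i].
double sumHypotSeq(const double* y, int n) {
    double sum = 0;
    for (int i = 0; i < n; ++i) {
        double xi = i + 1;  // x[i] recovered from the index
        sum += std::sqrt(xi * xi + y[i] * y[i]) * 3;
    }
    return sum;
}
```

Beyond saving the allocation, the specialized loop has one fewer memory
stream, which helps vectorization and cache behavior on long vectors.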

The speedup is so impressive that even if you don't cache the compiled
expression you see dramatic improvements:
[http://www.renjin.org/blog/2015-06-28-renjin-at-rsummit-2015.html](http://www.renjin.org/blog/2015-06-28-renjin-at-rsummit-2015.html)

------
AnbeSivam
Related -

[http://bitfunnel.org/debugging-nativejit/](http://bitfunnel.org/debugging-nativejit/)

[https://twitter.com/danluu/status/771622870132809729](https://twitter.com/danluu/status/771622870132809729)

------
deno
Better link: [http://bitfunnel.org/getting-started-with-nativejit/](http://bitfunnel.org/getting-started-with-nativejit/)

------
willvarfar
I know it can be sandboxed, but it makes the hair on the back of my neck stand
up to think that something people type into a search box on a web page gets
turned into machine code and executed... :)

~~~
mkup
It's no big deal as long as the JIT-compiled code is restricted to a safe
subset of instructions. For example, in FreeBSD, BPF (Berkeley Packet Filter)
programs are JIT-compiled and executed directly in kernel mode.

------
eDameXxX
More information: [http://bitfunnel.org](http://bitfunnel.org)

------
samfisher83
Can't you use function overloading or function pointers? Why do you need to
recompile the code? Can someone give an example of where this should be used?

~~~
bedatadriven
This is precisely a strategy for optimizing away the cost of function calls
and other indirection.

You really want this in any case where you're applying a dynamic expression
to a large dataset.

We use the technique in Renjin, an R interpreter built on the JVM, for
vectorized expressions such as sqrt(x^2+y^2) where x and y are long arrays.

You can implement this with function pointers, or interfaces in Java, but if
you're evaluating the expression millions of times, then the cost of the
indirection is huge, and worse yet, it gets in the way of the processor's
branch prediction and pipelining.
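
A self-contained C++ sketch of the difference (hypothetical names): the same
reduction written with per-element function-pointer calls versus as
straight-line code the optimizer can inline, vectorize, and pipeline freely.

```cpp
#include <cassert>
#include <cmath>

// Primitive operations reached through function pointers, as a generic
// expression evaluator would do. Every element pays call overhead, and
// the calls block inlining.
double opSq(double a, double b) { (void)b; return a * a; }
double opAdd(double a, double b) { return a + b; }

double evalIndirect(const double* x, const double* y, int n,
                    double (*sq)(double, double),
                    double (*add)(double, double)) {
    double sum = 0;
    for (int i = 0; i < n; ++i)
        sum += std::sqrt(add(sq(x[i], 0), sq(y[i], 0)));
    return sum;
}

// The compiled equivalent: the whole expression as straight-line code.
double evalStraight(const double* x, const double* y, int n) {
    double sum = 0;
    for (int i = 0; i < n; ++i)
        sum += std::sqrt(x[i] * x[i] + y[i] * y[i]);
    return sum;
}
```

Both compute the same result; the point of JIT-specializing the expression is
that the second form is what actually runs, with no per-element dispatch left.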

If you can compile the same dynamic expression down to straightline machine
code, the impact is pretty fantastic:
[http://www.renjin.org/blog/2015-06-28-renjin-at-rsummit-2015.html](http://www.renjin.org/blog/2015-06-28-renjin-at-rsummit-2015.html)

------
SCHiM
Looks very similar to
[https://github.com/asmjit/asmjit](https://github.com/asmjit/asmjit)

------
zokier
So would it make sense to use this as the engine for something akin to jq or
xsv?

------
EgoIncarnate
The title is wrong. This is not a C++-to-x86 JIT (which would be really cool);
this is a JIT library for C++ (of which there are a number). It has its own
domain-specific language for expressions. It doesn't take arbitrary C++: no
standard C++ syntax, no classes, etc.

Ex:

    
    
        // NativeJIT DSL
        auto & area = expression.Mul(rsquared, expression.Immediate(PI));
        auto function = expression.Compile(area);
    
        // Plain C++
        auto area = rsquared * PI;

~~~
RossBencina
"this is a JIT library for C++ (of which there are a number)"

Out of interest, can you recommend any others that are actively maintained?

