
JIT native code generation for TensorFlow graphs using Python and LLVM - perone
http://blog.christianperone.com/2016/08/jit-native-code-generation-for-tensorflow-computation-graphs-using-python-and-llvm/
======
jklontz
Going to take this opportunity to plug my related project
[Likely](http://www.liblikely.org), a DSL for lowering machine learning inference
algorithms.

One of the projects we've built on top of it is a static compiler for Caffe
model files. This allows you to execute Caffe models _without_ a runtime
dependency on Caffe libraries. Thus you can target OSes and CPUs not supported
by mainline Caffe. If you have commercial interest in this capability please
reach out to me.

~~~
kersny
Very interesting, I recently put together a simple Caffe static compiler of my
own on the path to FPGA deployment of CNN inference pipelines. I looked at
Halide [0] as a possible intermediate representation for CPU, GPU, and FPGA
(as a step above LLVM IR), but Likely seems like an interesting option.

[0]: [http://halide-lang.org/](http://halide-lang.org/)

~~~
jklontz
Targeting FPGAs is something we've been keeping our eye on as well. Most of
the pieces are in place. I think the last non-trivial part is removing any
malloc()/free() calls. This should be possible, as the size and lifetime of
memory allocations are known at compile time, so each allocation can be
moved to the stack or made global as desired.
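
For illustration, here's a minimal llvmlite sketch of that transformation.
The function and buffer size are hypothetical; the point is that a
compile-time-known allocation can become an alloca instead of a malloc() call:

```python
import llvmlite.ir as ir

mod = ir.Module(name="no_heap")
f32 = ir.FloatType()
i32 = ir.IntType(32)
fn = ir.Function(mod, ir.FunctionType(f32, []), name="scratch_demo")
builder = ir.IRBuilder(fn.append_basic_block("entry"))

# The scratch buffer's size is known when compiling the model, so it
# can live on the stack (alloca) rather than going through malloc().
buf = builder.alloca(f32, size=ir.Constant(i32, 256), name="scratch")

slot0 = builder.gep(buf, [ir.Constant(i32, 0)])
builder.store(ir.Constant(f32, 0.0), slot0)
builder.ret(builder.load(slot0))
print(mod)  # textual LLVM IR containing no heap allocation
```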

One of the reasons I opted against Halide myself was the feeling that for this
domain there ought to be enough information available to the compiler to
intelligently pick tiling and vectorization parameters, for example using
Polly [0]. In practice, however, manual tiling and vectorization are hard to
beat.

[0] [http://polly.llvm.org/](http://polly.llvm.org/)

------
jo_
Forgive my ignorance, but it seems like this is just attempting to take
advantage of the optimizations done by LLVM, yes?

What I would love is a simple way of writing standalone functions that compile
into a cross-platform LLVM file that I can call from a variety of other
languages on a variety of other systems. In particular, if I train a recurrent
network on text data for a chat bot, I want to be able to use that LLVM file +
model in a game I release for the PC and for Android without worrying about
the NDK/gcc/clang/Windows/OSX build nightmare. The ability to easily and
quickly define a model in TensorFlow, then write a Python function that takes
an array of data and spits out an array of data, would be incredible, and
would mean that all the work I'm doing for a native Rust library is unneeded.
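
(For what it's worth, the ahead-of-time half of this already seems doable with
llvmlite alone. A rough sketch; the IR snippet and the Android target triple
below are just placeholders:)

```python
import llvmlite.binding as llvm

llvm.initialize()
llvm.initialize_all_targets()
llvm.initialize_all_asmprinters()

# Pretend this IR came from a compiled TensorFlow model.
ir_text = r"""
define float @scale(float %x) {
entry:
  %y = fmul float %x, 2.0
  ret float %y
}
"""

mod = llvm.parse_assembly(ir_text)
mod.verify()

# Pick the triple of the machine you ship to (placeholder below) and
# emit a plain object file: no TensorFlow runtime to link against.
target = llvm.Target.from_triple("aarch64-unknown-linux-android")
machine = target.create_target_machine()
with open("model_kernel.o", "wb") as f:
    f.write(machine.emit_object(mod))
```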

Admittedly, with Bazel I could create a C++ wrapper for the function that
loads the library. It's just... that produces a 150 MB shared library with all
the dependencies, and it's also a pain in the ass.

~~~
perone
This is actually easy to do: you just need to generate the IR and then merge
it all into a single Module; after that you apply passes over the entire
module to optimize, do function inlining, etc.
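
A rough sketch of that flow with llvmlite's binding layer (the two IR snippets
are stand-ins for separately generated per-function IR):

```python
import llvmlite.binding as llvm

llvm.initialize()
llvm.initialize_native_target()
llvm.initialize_native_asmprinter()

ir_op_a = r"""
define float @mul2(float %x) {
entry:
  %y = fmul float %x, 2.0
  ret float %y
}
"""

ir_op_b = r"""
declare float @mul2(float)

define float @mul2_add1(float %x) {
entry:
  %y = call float @mul2(float %x)
  %z = fadd float %y, 1.0
  ret float %z
}
"""

# Merge the separately generated IR into a single Module.
mod = llvm.parse_assembly(ir_op_a)
mod.link_in(llvm.parse_assembly(ir_op_b))

# Run module-level passes: a non-zero inlining threshold turns on the
# inliner, and opt_level=3 populates the usual -O3 pipeline.
pmb = llvm.create_pass_manager_builder()
pmb.opt_level = 3
pmb.inlining_threshold = 225
pm = llvm.create_module_pass_manager()
pmb.populate(pm)
pm.run(mod)
print(mod)  # @mul2 is now inlined into @mul2_add1
```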

------
zump
Cool idea, but is this of any benefit?

Isn't this essentially what TensorFlow does internally, except it inserts CUDA
primitives at the right positions...

~~~
vrv
TensorFlow doesn't yet do loop fusion (though I believe the specific example
shown in that article may already be handled via constant folding). But if you
have a bunch of elementwise operations, JIT techniques can reduce the number
of memory passes over a buffer. If your model is already very computationally
dense (time dominated by matmul or convolution), then this won't help as much,
but otherwise JIT techniques can help.
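
A toy illustration of the memory-pass argument (the fused loop is what a JIT
would compile to native code; in plain Python it's only meant to show the
access pattern):

```python
import math
import numpy as np

x = np.random.rand(10**6).astype(np.float32)

# Unfused elementwise ops: each step reads and writes a whole buffer,
# so memory traffic (three passes, two temporaries) dominates.
t1 = x * 2.0
t2 = t1 + 1.0
y = np.tanh(t2)

# Fused form: a single pass over x, no intermediate buffers.
out = np.empty_like(x)
for i in range(x.shape[0]):
    out[i] = math.tanh(x[i] * 2.0 + 1.0)
```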

------
jakozaur
Maybe P2P TensorFlow as a service would be a neat idea?

E.g. I have data and a TensorFlow model, and anybody can bid on who can
provide the cheapest compute power quickly: EC2 spot instances, Google Cloud
preemptible VMs, or some spare NVIDIA CUDA machine.

~~~
asimuvPR
I decided to build an obscenely overpowered desktop machine because I can't
get myself to trust third parties to run all the TensorFlow code I have. For
some reason I can't trust this code in the same way I trust my server-side
code running on multiple servers across the world.

~~~
woah
Are you worried about them stealing your code or something?

~~~
asimuvPR
Not the code, but the data being leaked, stolen by third parties (I assume all
servers are pwned), or damaged due to a botched OS install. I prefer the
peace of mind.

------
eva1984
Cool project. But since TensorFlow works best with a built-in proprietary
backend like cuDNN, what role will LLVM play here?

------
denfromufa
Can't we use numba directly, not llvmlite?

~~~
Fede_V
Numba works on Python code, not on DAGs generated by TensorFlow.

In practice, what numba does is turn the Python code into LLVM IR and then
compile it with LLVM. What the OP is doing is turning TensorFlow DAGs into
LLVM IR and then compiling that with LLVM. You can look at numba and the
OP's project both as front ends to LLVM.
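
For example, a canonical numba usage; note the input is ordinary Python/NumPy
code, not a TensorFlow graph:

```python
import math
import numpy as np
from numba import njit

@njit  # numba: Python bytecode -> LLVM IR -> native code
def fused_op(x):
    out = np.empty_like(x)
    for i in range(x.shape[0]):
        out[i] = math.tanh(x[i] * 2.0 + 1.0)
    return out

x = np.random.rand(1024).astype(np.float32)
y = fused_op(x)  # first call triggers JIT compilation
```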

------
bbsome
I'm not sure that LLVM is the correct way to go about this. Don't get me
wrong, it can be used, but most of the work in these frameworks is on very
large tensors/multi-dimensional arrays. As such, optimizing the computation
graph for such arrays, although very similar to standard compiler
optimization, also has some significant differences. I do believe, however,
that all frameworks should start using the same graph IR and optimization
procedure, potentially with different back ends for different hardware and
different front ends for different languages. I in fact tried to achieve this
some time ago, and it is still in progress, but lately I have had no time to
work on it. Still, the post is really great.

