One of the projects we've built on top of it is a static compiler for Caffe model files. This lets you execute Caffe models _without_ a runtime dependency on Caffe libraries, so you can target OSes and CPUs that mainline Caffe doesn't support. If you have commercial interest in this capability, please reach out to me.
One of the reasons I opted against Halide myself was the feeling that, for this domain, there ought to be enough information available to the compiler to pick tiling and vectorization parameters intelligently, for example via Polly. In practice, however, manual tiling and vectorization are hard to beat.
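To make "manual tiling" concrete, here's a small sketch (plain NumPy, names and the tile size of 64 are my own; you'd tune the tile per CPU) of blocking a matrix multiply for cache locality, which is exactly the loop structure a polyhedral tool like Polly tries to derive automatically:

    import numpy as np

    def tiled_matmul(A, B, tile=64):
        # Blocked matrix multiply: each (tile x tile) sub-problem stays
        # resident in cache instead of streaming the whole matrices.
        n, k = A.shape
        k2, m = B.shape
        assert k == k2
        C = np.zeros((n, m), dtype=A.dtype)
        for i in range(0, n, tile):
            for j in range(0, m, tile):
                for kk in range(0, k, tile):
                    C[i:i+tile, j:j+tile] += np.dot(
                        A[i:i+tile, kk:kk+tile], B[kk:kk+tile, j:j+tile])
        return C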
What I would love is a simple way of writing standalone functions that compile into a cross-platform LLVM file I can call from a variety of other languages on a variety of systems. In particular, if I train a recurrent network on text data for a chatbot, I want to ship that LLVM file plus the model in a game I release for PC and Android without worrying about the NDK/gcc/clang/Windows/OS X build nightmare. The ability to easily and quickly define a model in TensorFlow and write a Python function that takes an array of data and spits out an array of data would be incredible, and would mean that all the work I'm doing on a native Rust library is unneeded.
Admittedly, with Bazel I could create a C++ wrapper for the function that loads the library. It's just... that produces a 150 MB shared library with all the dependencies, and it's also a pain in the ass.
Isn't this essentially what TensorFlow does internally, except it inserts CUDA primitives at the right positions...
"We also have a number of concrete directions to improve
the performance of TensorFlow. One such direction
is our initial work on a just-in-time compiler that
can take a subgraph of a TensorFlow execution, perhaps
with some runtime profiling information about the typical
sizes and shapes of tensors, and can generate an optimized
routine for this subgraph. This compiler will understand
the semantics of perform a number of optimizations
such as loop fusion, blocking and tiling for locality,
specialization for particular shapes and sizes, etc."
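To make the "loop fusion" part concrete, here's an illustrative sketch (plain NumPy, not anything TensorFlow does today): the unfused version makes two passes over the data and materializes a temporary array, while the fused version does the same work in one pass; a JIT would emit the fused version as a single tight machine loop.

    import numpy as np

    def unfused(x):
        t = np.exp(x)    # pass 1 over the data, materializes a temporary
        return t + 1.0   # pass 2 reads the temporary back

    def fused(x):
        out = np.empty_like(x)
        for i in range(x.size):          # one pass, no temporary array
            out[i] = np.exp(x[i]) + 1.0
        return out

    x = np.random.rand(1000000)
    assert np.allclose(unfused(x), fused(x))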
For example: I have data and a TensorFlow model, and anybody can bid on who can run the computation fastest and cheapest, e.g. EC2 Spot Instances, Google Cloud preemptible VMs, or some spare NVIDIA CUDA machine.
In practice, what Numba does is turn Python code into LLVM IR and then compile it with LLVM. What the OP is doing is turning TensorFlow DAGs into LLVM IR and then compiling those with LLVM. You can look at Numba and the OP's project both as front ends to LLVM.
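You can see the front-end role directly: Numba keeps the LLVM IR it generates for each compiled specialization and hands it back on request (minimal sketch; the function body is arbitrary):

    import numpy as np
    from numba import njit

    @njit
    def scale(x, a):
        out = np.empty_like(x)
        for i in range(x.shape[0]):
            out[i] = a * x[i]
        return out

    scale(np.arange(8, dtype=np.float64), 2.0)  # first call compiles

    # inspect_llvm() maps each compiled signature to its LLVM IR text.
    for sig, llvm_ir in scale.inspect_llvm().items():
        print(sig)
        print(llvm_ir[:300])  # the IR is long; show the beginning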
llvmlite is a Python interface to LLVM.
So JIT compilation of TensorFlow graphs would go through llvmlite directly rather than Numba, since TensorFlow graphs aren't Python code.
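A minimal sketch of that path, following the standard MCJIT recipe from llvmlite's docs (module and function names are my own): build LLVM IR for a function by hand, with no Python source anywhere, then JIT it and call it through ctypes.

    from ctypes import CFUNCTYPE, c_double

    import llvmlite.binding as llvm
    from llvmlite import ir

    # Build IR for add(a, b) directly -- no Python function is compiled.
    module = ir.Module(name="subgraph")
    dbl = ir.DoubleType()
    fn = ir.Function(module, ir.FunctionType(dbl, (dbl, dbl)), name="add")
    builder = ir.IRBuilder(fn.append_basic_block(name="entry"))
    a, b = fn.args
    builder.ret(builder.fadd(a, b, name="sum"))

    # JIT-compile the module and call the result.
    llvm.initialize()
    llvm.initialize_native_target()
    llvm.initialize_native_asmprinter()
    target_machine = llvm.Target.from_default_triple().create_target_machine()
    engine = llvm.create_mcjit_compiler(llvm.parse_assembly(str(module)),
                                        target_machine)
    engine.finalize_object()

    add = CFUNCTYPE(c_double, c_double, c_double)(
        engine.get_function_address("add"))
    print(add(1.5, 2.5))  # 4.0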