* MemSQL (http://highscalability.com/blog/2016/9/7/code-generation-the...)
* Tableau/TUM HyPer (https://blog.acolyer.org/2016/05/23/efficiently-compiling-ef...)
* CMU Peloton (http://db.cs.cmu.edu/papers/2017/p1-menon.pdf)
* Vitesse (https://www.youtube.com/watch?v=PEmVuYjhQFo)
I think Greenplum was talking about doing this too, but that was about a year ago (http://engineering.pivotal.io/post/orca-profiling/)
Most systems just compile the predicates (Impala, SparkSQL).
A lot of companies are talking about adding this now. The performance gains are significant.
Compiling everything is questionable. There's not much point re-compiling runtime code or code outside of the hot path; it's expensive and doesn't bring any benefit.
E.g. things that aren't beneficial to compile per query include (see the sketch after this list):
* Loops over a column of the same datatype, with no query specific branches (e.g. decoding a column of integers)
* Other static code, e.g. some hash table operations
* Outer loops that don't execute frequently
* Rarely executed code, e.g. error handling.
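To make the first item concrete, here's a minimal sketch (a hypothetical function, not from any of the systems above) of a decode loop that's identical for every query; with no query-specific branches to specialize away, compiling it once ahead of time is just as good as JITing it per query:

    #include <cstddef>
    #include <cstdint>

    // Decoding a column of integers: same datatype every time, no
    // query-specific branches, so there is nothing for a per-query
    // compiler to specialize. Compile it once, ahead of time.
    void decode_int32_column(const uint8_t *in, int32_t *out, size_t n) {
        for (size_t i = 0; i < n; ++i)
            out[i] = static_cast<int32_t>(in[i]); // e.g. widen byte-packed values
    }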
There are two general designs that let you compile only the necessary things: 1) the runtime drives the query and calls into compiled code for hot loops, vs 2) the compiled code drives the query and calls into the runtime. A lot of the systems you mentioned do the second; Impala does the first, which seems to be the source of some misunderstanding. Also, there were some cases where hot loops in earlier versions of Impala weren't compiled, but that's changed: generally we try to ensure that all hot loops are compiled.
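A rough sketch of the two control-flow shapes, with all names hypothetical (this is not actual Impala or HyPer code):

    struct Batch { /* a batch of tuples */ };

    // Design 1: the precompiled runtime drives execution and calls into
    // JIT-compiled code only for the hot inner loops.
    using HotLoopFn = void (*)(Batch *);
    void runtime_driver(Batch *batch, HotLoopFn jitted_eval) {
        // scan, batch fetching, bookkeeping: static, precompiled code
        jitted_eval(batch); // per-query generated hot loop
    }

    // Design 2: the generated code drives the whole query and calls back
    // into precompiled runtime helpers for the static parts.
    extern "C" void runtime_hash_insert(void *table, const void *row);
    // The JIT emits a per-query driver shaped roughly like:
    //   for each tuple in input:
    //       if (generated_predicate(tuple))        // specialized, generated
    //           runtime_hash_insert(table, tuple); // static, precompiled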
I think generally the optimal design wouldn't be complete query compilation, but rather something more like a traditional JIT that selectively compiles parts of the query.
Source: work on Impala's query compilation
Edit: down votes... hah
* HyperDex (http://hyperdex.org/)
* HyperGraphDB (http://www.hypergraphdb.org/)
* HyperSQL (http://hsqldb.org/)
* Hypertable (http://hypertable.org/)
"The commercial license for HyPer, a spin-off of TUM, has been acquired by Tableau Software"
Edit: and it's probably not only about SSDs, but also about cheap DRAM; in most deployments I know of, most queries only touch indexes and tuples that are already in RAM (and in fact the whole pg_base is often smaller on disk than the server's RAM).
The pg community often said so as well, particularly because it was not commonly used in a major analytical capacity...
I think it's good enough to do so on a per query basis in the first version, but after that it will probably need to become more sophisticated.
* You really need to add a C++ compiler to your configure.ac. I see some tricks coming up with clang++ compiling the LLVM glue work to bitcode and shipping this, but I fear it will be too big. And then you need clang.
* ORC (lazy compilation is back) can finally do again what the legacy JIT could do: native JITing on demand via a stub, and adding functions to modules on the fly.
But the new LLVM stuff is exciting: saving the old headers as bitcode headers, merging them, and doing expensive inline optimizations with these in the background. Remember, the legacy JIT went away when it couldn't support cross-JITing to the foreign architectures lldb needed. ORC JIT can now do what MCJIT did, cross-arch JITing, and it got lazy compilation back. The module abstraction with its resolver quirks is still there, but it's still just a simple interface to the compiler and linker libs. It's still extremely awkward to use, only via the C++ interface, as you have to mix in all the compiler, linker, and resolver classes with lambdas for your wanted behavior. But at least it's functional again.
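For reference, the moving parts are easiest to see through the newer LLJIT convenience wrapper, which bundles the compiler/linker/resolver mixins for you (a sketch targeting roughly LLVM 15; the ORC API shifts between releases, and the file and symbol names here are placeholders):

    #include "llvm/ExecutionEngine/Orc/LLJIT.h"
    #include "llvm/IRReader/IRReader.h"
    #include "llvm/Support/Error.h"
    #include "llvm/Support/SourceMgr.h"
    #include "llvm/Support/TargetSelect.h"

    using namespace llvm;
    using namespace llvm::orc;

    int main() {
        InitializeNativeTarget();
        InitializeNativeTargetAsmPrinter();
        ExitOnError ExitOnErr;

        // Load a module to JIT, e.g. bitcode emitted earlier by clang.
        auto Ctx = std::make_unique<LLVMContext>();
        SMDiagnostic Err;
        auto M = parseIRFile("query.bc", Err, *Ctx); // placeholder file name
        if (!M)
            return 1;

        // LLJIT bundles the compile layer, object linking layer and
        // symbol resolution that previously had to be mixed in by hand.
        auto J = ExitOnErr(LLJITBuilder().create());
        ExitOnErr(J->addIRModule(ThreadSafeModule(std::move(M), std::move(Ctx))));

        // Look up and call a JITed function; "eval_expr" is hypothetical.
        auto Addr = ExitOnErr(J->lookup("eval_expr"));
        auto *Fn = Addr.toPtr<long (*)(long)>();
        return static_cast<int>(Fn(42));
    }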
Indeed, I initially tried very hard to avoid doing so, but it turned out to be infeasible.
> I see some tricks coming up with clang++ compiling the LLVM glue work to bitcode and shipping this, but I fear it will be too big. And then you need clang.
Not sure I understand what you're proposing here? Do you mean avoiding the need for a C++ compiler? That wouldn't work, as the generated bitcode is not architecture independent.
If you instead mean that the installed version will contain bitcode of its own source, yes, that's the plan to facilitate inlining (& specialization). It's not that big, and can be located in a separate package.
> * ORC (lazy compilation is back) can finally do again what the legacy JIT could do: native JITing on demand via a stub, and adding functions to modules on the fly.
I use ORC JIT, but its lazy stuff isn't particularly interesting for my use case. Lazy JITing is done a layer above LLVM. There's already one pointer indirection anyway; adding another indirection via a stub isn't useful...
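Roughly like this (hypothetical names, a sketch of the idea rather than actual postgres code): the evaluation function pointer starts out pointing at the interpreter and is simply repointed once the JITed code is ready, so the existing indirection doubles as the stub:

    #include <atomic>

    struct ExprState; // hypothetical per-expression runtime state
    using ExprEvalFn = long (*)(ExprState *);

    long interpret_expr(ExprState *state); // always-available interpreter

    struct ExprState {
        std::atomic<ExprEvalFn> evalfunc{&interpret_expr};
    };

    // Every caller already goes through this one indirection.
    long eval(ExprState *state) {
        return state->evalfunc.load(std::memory_order_relaxed)(state);
    }

    // Once compilation finishes, swap in the JITed function.
    void jit_ready(ExprState *state, ExprEvalFn fn) {
        state->evalfunc.store(fn, std::memory_order_relaxed);
    }

    long interpret_expr(ExprState *) { return 0; /* interpreter body elided */ }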
> It's still extremely awkward to use, only via the C++ interface, as you have to mix in all the compiler, linker, and resolver classes with lambdas for your wanted behavior. But at least it's functional again.
Yeah, I'm not a big fan of the ORC APIs. Additionally, there natively still isn't any debugger/profiler integration; I don't quite know how people are using it without those...
Yes, that was my idea.
> Additionally, there natively still isn't any debugger/profiler integration; I don't quite know how people are using it without those...
That's the best thing about this PostgreSQL llvmjit project. He added nice patches for debugger and profiler integration. This is only needed locally for the devs, so it's not a blocker.
He is me ;)
It's interesting to see LLVM taking over the world not only in ahead-of-time compilation, but also in JIT.
There are other JIT compiler frameworks out there, like GNU Lightning, but only LLVM seems to have any traction.
I looked at a few libraries before deciding on LLVM. Lightning isn't that interesting for postgres' use case, for a few reasons. The biggest issue is that it can't really be used to implement inlining of operators defined in C: there's no equivalent of LLVM bitcode generated from C that can then be inlined. Secondarily, it doesn't include, afaict, much of an optimizer. That's not great for postgres' use case.
The inlining issue imo makes it really hard to compete with LLVM, even though there's quite a bit of room for a code generator and optimizer tuned much more towards emission speed.
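To illustrate the bitcode-based inlining flow (a hedged sketch; the file name is a placeholder, and the operator bitcode would have been emitted ahead of time with clang -emit-llvm): link the shipped bitcode into the per-query module, so the ordinary LLVM inliner can see across the C function boundary.

    #include "llvm/IR/LLVMContext.h"
    #include "llvm/IR/Module.h"
    #include "llvm/IRReader/IRReader.h"
    #include "llvm/Linker/Linker.h"
    #include "llvm/Support/SourceMgr.h"

    // Pull operator definitions, shipped as bitcode, into the generated
    // query module; a later optimization pipeline can then inline them.
    bool link_operator_bitcode(llvm::Module &queryMod, llvm::LLVMContext &ctx) {
        llvm::SMDiagnostic err;
        std::unique_ptr<llvm::Module> ops =
            llvm::parseIRFile("int8ops.bc", err, ctx); // placeholder path
        if (!ops)
            return false;
        // linkModules returns true on error.
        return !llvm::Linker::linkModules(queryMod, std::move(ops));
    }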
Really excited to see it make its way into Postgres.