
JITing PostgreSQL using LLVM [pdf] - grzm
http://anarazel.de/talks/fosdem-2018-02-03/jit.pdf
======
assface
AFAIK, there are only a handful of DBMSs that do complete query compilation
with LLVM:

* MemSQL ([http://highscalability.com/blog/2016/9/7/code-generation-the...](http://highscalability.com/blog/2016/9/7/code-generation-the-inner-sanctum-of-database-performance.html))

* Tableau/TUM HyPer ([https://blog.acolyer.org/2016/05/23/efficiently-compiling-ef...](https://blog.acolyer.org/2016/05/23/efficiently-compiling-efficient-query-plans-for-modern-hardware/))

* CMU Peloton ([http://db.cs.cmu.edu/papers/2017/p1-menon.pdf](http://db.cs.cmu.edu/papers/2017/p1-menon.pdf))

* Vitesse ([https://www.youtube.com/watch?v=PEmVuYjhQFo](https://www.youtube.com/watch?v=PEmVuYjhQFo))

I think Greenplum was talking about doing this too, but that was about a year
ago ([http://engineering.pivotal.io/post/orca-profiling/](http://engineering.pivotal.io/post/orca-profiling/))

Most systems just compile the predicates (Impala, SparkSQL).

A lot of companies are talking about adding this now. The performance gains
are significant.
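The gap between "just compile the predicates" and interpretation can be sketched in C: an interpreter dispatches through an expression tree for every row, while generated code evaluates the same predicate as straight-line machine code. All names below are illustrative, not taken from any of the systems above.

```c
#include <assert.h>
#include <stddef.h>

/* A toy expression tree for the predicate: col0 > 10 AND col1 < 5 */
typedef enum { OP_GT, OP_LT, OP_AND, OP_COL, OP_CONST } OpKind;

typedef struct Expr {
    OpKind kind;
    int value;              /* column index for OP_COL, literal for OP_CONST */
    struct Expr *left, *right;
} Expr;

/* Interpreted evaluation: one recursive dispatch per tree node, per row. */
static int eval(const Expr *e, const int *row)
{
    switch (e->kind) {
    case OP_COL:   return row[e->value];
    case OP_CONST: return e->value;
    case OP_GT:    return eval(e->left, row) > eval(e->right, row);
    case OP_LT:    return eval(e->left, row) < eval(e->right, row);
    case OP_AND:   return eval(e->left, row) && eval(e->right, row);
    }
    return 0;
}

/* What a JIT would emit for this particular predicate: no dispatch at all. */
static int compiled_predicate(const int *row)
{
    return row[0] > 10 && row[1] < 5;
}
```

Over millions of rows, eliminating the per-node dispatch and branch mispredictions is where the reported gains come from.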

~~~
nimish
Sql server does in memory compilation as well.

~~~
assface
AFAIK, SQL Server's Hekaton engine only compiles the queries inside of stored
procedures and UDFs. They do not compile every query that comes in from the
client:

[https://docs.microsoft.com/en-us/sql/relational-databases/in...](https://docs.microsoft.com/en-us/sql/relational-databases/in-memory-oltp/survey-of-initial-areas-in-in-memory-oltp)

------
dfox
For me this is a great demonstration of how SSDs have changed the performance
landscape. Exactly this idea was discussed at P2D2 10 or so years ago and was
generally dismissed as too much work for a negligible performance gain, because
the disk IO would still dominate. Today I believe that this makes sense.

Edit: and it is probably not only about SSDs, but also about cheap DRAM, as
today in most deployments I know about, most queries only touch indices and
tuples that are in RAM (and in fact the whole pg_base is often smaller on disk
than the server's RAM).

~~~
anarazel
I think even 10 years ago that argument was less often true than people might
have thought. A RAID of a few rotational disks, with multiple controllers,
could easily deliver a few hundred MB/s back then too. And thus it was easy to
get CPU-bottlenecked back then too.

The pg community often said so as well, particularly because it was not
commonly used in a major analytical capacity...

~~~
fulafel
Also, DRAM hasn't gotten that much cheaper in 10 years. In 2008 you could
already get reasonably priced commodity x86 servers with 128 GB RAM, so for
most DB applications you could keep all your working set cached and only worry
about writes on your storage layer.

[https://blogs-images.forbes.com/jimhandy/files/2011/12/DRAM-...](https://blogs-images.forbes.com/jimhandy/files/2011/12/DRAM-GB-Price.jpg)

[http://thememoryguy.com/wp-content/uploads/2016/05/2016-05-0...](http://thememoryguy.com/wp-content/uploads/2016/05/2016-05-06-Blog-Picture.jpg)

[https://en.wikipedia.org/wiki/List_of_Dell_PowerEdge_Servers...](https://en.wikipedia.org/wiki/List_of_Dell_PowerEdge_Servers#Generation_10)

------
elvinyung
The most interesting part of this, to me, is that the decision to do JIT
compilation is part of the cost-based optimization process. It's a nice way to
make PG versatile enough to be performant at both OLTP and OLAP queries.

~~~
anarazel
Thanks. Honestly, I went there primarily because it is trivial to implement. I
guess it might be one of the "laziness is good" cases ;)

I think it's good enough to do so on a per query basis in the first version,
but after that it will probably need to become more sophisticated.
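A per-query cost gate could be as simple as comparing the planner's total cost estimate against a threshold, so cheap OLTP queries skip compilation entirely. This is a hypothetical sketch; the variable name and threshold are made up for illustration, not taken from the actual patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cost gate: only JIT when the estimated plan cost is high
 * enough that compilation time is likely to pay for itself. A negative
 * threshold disables JIT entirely. */
static double jit_above_cost = 100000.0;

static bool should_jit(double plan_total_cost)
{
    return jit_above_cost >= 0 && plan_total_cost > jit_above_cost;
}
```

The "more sophisticated" follow-up would presumably gate individual decisions (e.g. whether to also run expensive optimizations or inlining) on further thresholds rather than a single yes/no.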

------
rurban
What I learned from it:

* You really need to add a C++ compiler to your configure.ac. I see some tricks coming up with clang++ compiling the llvm gluework to bitcode and shipping this, but I fear it will be too big. And then you need clang.

* ORC (lazy compilation is back) can finally do again what the legacy jit could do: native jit on demand with a stub, into modules and adding functions to modules on the fly.

But the new llvm stuff is exciting: saving the old headers as bitcode headers,
merging and doing expensive inline optimizations with these in the background.
Remember, the legacy jit was gone when it couldn't support cross-jitting to
foreign architectures needed for lldb. ORC jit can now do what MCJIT did,
cross-arch jitting, and it got lazy compilation back. The module abstraction
with its resolver quirks is still there, but it's still just a simple
interface to the compiler and linker lib. It's still extremely awkward to use,
only via the C++ interface, as you have to mix in all the compiler, linker,
resolver classes with lambdas for your wanted behavior. But at least it's
functional again.

~~~
anarazel
> * You really need to add a C++ compiler to your configure.ac.

Indeed. I initially tried very hard not to, but it turned out to be
infeasible.

> I see some tricks coming up with clang++ compiling the llvm gluework to
> bitcode and shipping this, but I fear it will be too big. And then you need
> clang.

Not sure I understand what you're proposing here? You mean to avoid needing a
C++ compiler? That'd not work, as the generated bitcode is not architecture
independent.

If you instead mean that the installed version will contain bitcode of its own
source, yes, that's the plan to facilitate inlining (& specialization). It's
not _that_ big, and can be located in a separate package.

> * ORC (lazy compilation is back) can finally do again what the legacy jit
> could do: native jit on demand with a stub, into modules and adding
> functions to modules on the fly.

I use orcjit, but its lazy stuff isn't particularly interesting for my
usecase. Lazily JITing is done a layer above LLVM. There's already one pointer
indirection anyway, adding another indirection via a stub isn't useful...
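Doing the laziness a layer above LLVM can be sketched like this: callers always go through one function pointer, the first call lands on a "compile" routine that patches the pointer to the generated code, and subsequent calls are direct. Everything here is an illustrative stand-in, not PostgreSQL's actual code.

```c
#include <assert.h>

typedef int (*eval_fn)(int);

static int compile_on_first_call(int x);

/* The single pointer indirection every caller already goes through. */
static eval_fn entry = compile_on_first_call;

/* Stand-in for the machine code the JIT would emit. */
static int jitted_double(int x) { return x * 2; }

/* First call only: "compile", patch the pointer, run the compiled code.
 * In a real system this is where the LLVM module would be built and
 * emitted; adding an ORC stub on top would just be a second indirection. */
static int compile_on_first_call(int x)
{
    entry = jitted_double;
    return entry(x);
}
```

Since this indirection exists anyway, ORC's own lazy-compilation stubs would only add a second jump on top of it, which is the point being made above.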

> It's still extremely awkward to use, only via the C++ interface, as you have
> to mixin all the compiler, linker, resolver classes with lambda's for your
> wanted behavior. But at least it's functional again.

Yea, I'm not a big fan of the ORC APIs. Additionally, there still isn't any
native debugger / profiler integration - I don't quite know how people are
using it without those...

~~~
rurban
> If you instead mean that the installed version will contain bitcode of its
> own source, yes, that's the plan to facilitate inlining (& specialization).
> It's not that big, and can be located in a separate package.

Yes, that was my idea.

> Additionally natively there still isn't any debugger / profiler integration
> - I don't quite know how people are using it without those...

That's the best thing about this postgresql llvmjit project. He added nice
patches for debugger and profiler integration. This is needed only locally for
the devs, so not a blocker.

~~~
anarazel
> He added nice patches for debugger and profiler integration.

He is me ;)

------
AlphaSite
I wonder how well the SQL/query tree model fits into Graal's JITting system. It
feels like they should map nearly perfectly in theory.

~~~
ksec
Holy Moly, that is an Interesting thought!

------
Tostino
I love reading the threads on -hackers with all of this development, it's
really amazing stuff. I hope I get to see it make it into core soon!

~~~
grzm
For those curious, here's (one of?) the threads 'Tostino is referring to:

[https://www.postgresql.org/message-id/flat/CA%2BTgmoYqvQAYLt...](https://www.postgresql.org/message-id/flat/CA%2BTgmoYqvQAYLtHo96wbG7-HPjYc-yy2ws%2BjVB8SBwV1%3D%3D6L9g%40mail.gmail.com#CA+TgmoYqvQAYLtHo96wbG7-HPjYc-yy2ws+jVB8SBwV1==6L9g@mail.gmail.com)

------
MaxBarraclough
Great stuff!

It's interesting to see LLVM taking over the world not only in ahead-of-time
compilation, but also in JIT.

There are other JIT compiler frameworks out there, like GNU Lightning, but
only LLVM seems to have any traction.

~~~
anarazel
> There are other JIT compiler frameworks out there, like GNU Lightning, but
> only LLVM seems to have any traction.

I looked at a few libraries before deciding on LLVM. Lightning isn't that
interesting for postgres' use case for a few reasons. The biggest issue is
that it can't really be used to implement inlining of operators defined in C -
there's no equivalent of LLVM bitcode generated from C that can then be
inlined. Secondarily, it doesn't include, afaict, much of an optimizer. That's
not great for postgres' usecase.

The inlining issue imo makes it really hard to compete with LLVM, even though
there's quite some space for a code generator and optimizer much more tuned
towards emission speed.
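What "inlining of operators defined in C" buys can be sketched as follows: extensible operators are normally invoked through a generic boxed call convention, and only an inliner working on the operators' own bitcode can collapse that indirection. The structures below are simplified stand-ins for illustration, not PostgreSQL's real FunctionCallInfo machinery.

```c
#include <assert.h>

/* Simplified stand-in for a generic extensible-operator call convention:
 * every operator receives boxed arguments through a call-info struct. */
typedef struct { long args[2]; } CallInfo;
typedef long (*OperatorFn)(CallInfo *);

/* "+" implemented as an extensible operator in C. */
static long int4pl_like(CallInfo *ci)
{
    return ci->args[0] + ci->args[1];
}

/* Interpreted path: pack arguments and make an indirect call, per row. */
static long eval_indirect(OperatorFn op, long a, long b)
{
    CallInfo ci = { { a, b } };
    return op(&ci);
}

/* What inlining the operator's bitcode lets the JIT emit instead: the
 * boxing and the indirect call disappear entirely. */
static long eval_inlined(long a, long b)
{
    return a + b;
}
```

A code generator like Lightning can emit the indirect-call version quickly, but without bitcode for the operator bodies it can never produce the inlined version - which is the competitive gap described above.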

------
dis-sys
two great projects combining their magical power together, just amazing!

------
barrkel
Cloudera Impala also uses LLVM to JIT chunks of the query plan into machine
code.

~~~
CandidlyFake
Don't all DB servers "JIT" and then cache the query plan?

~~~
riku_iki
Executing machine code can still be more performant than interpreting a cached
execution plan.

