
PostgreSQL Begins Adding LLVM JIT Support For Faster Performance - manigandham
https://www.phoronix.com/scan.php?page=news_item&px=PostgreSQL-LLVM-JIT-Landing
======
anarazel
This actually has been fully merged since (JIT compiled expressions
(projections, aggregates, etc), tuple deforming, operator implementation
inlining). There's a lot of further work to do however (caching, non-crappy
code for expressions, better planner logic).

~~~
anarazel
Link to the (development version) docs
[https://www.postgresql.org/docs/devel/static/jit.html](https://www.postgresql.org/docs/devel/static/jit.html)

------
tejasmanohar
This is exciting. That said, it's important to note that this is unlikely to
help much with most OLTP queries just looking up information quickly via
indexed data. Where this could really shine is CPU-bound queries that are
performing complex operations, where the SQL interpreter is actually a
bottleneck.

I wonder if AWS will incorporate these improvements into Redshift, which is
Postgres forked for OLAP use cases where this is far more prevalent. In the
past, they haven't done a great job at this (ran into a float bug fixed
upstream 5+ years ago), but this may be a big enough value proposition to
justify the effort.

~~~
ankrgyl
> I wonder if AWS will incorporate these improvements into Redshift, which is
> Postgres forked for OLAP use cases where this is far more prevalent. In the
> past, they haven't done a great job at this (ran into a float bug fixed
> upstream 5+ years ago), but this may be a big enough value proposition to
> justify the effort.

Redshift is forked from ParAccel, which was originally forked from Postgres
several years ago. ParAccel has had code generation (via transpilation to C)
since long before Amazon even purchased its source code. In other words,
Redshift has had this feature since its inception, albeit via C-transpilation
(compiled with GCC) not LLVM codegen.

In addition, Redshift's primary index is a column store which is particularly
well-suited to code generation (you can inline and even vectorize expressions
across rows, without hopping pointers in between rows). Postgres does not yet
include a column store, but this piece of infrastructure is a big step in the
right direction!

~~~
tejasmanohar
Oh, neat! I hadn't heard of ParAccel. Suppose I'm missing some history here.

------
CurtHagenlocher
Anyone who's interested in this (and related) DBMS topic would be well served
to check out the videos of Andy Pavlo's "Advanced Database Systems" course at
CMU:
[https://www.youtube.com/watch?v=poEfLYH9W2M&list=PLSE8ODhjZX...](https://www.youtube.com/watch?v=poEfLYH9W2M&list=PLSE8ODhjZXjYplQRUlrgQKwIAV3es0U6t)

------
gigatexal
For what it’s worth MSSQL has been doing this for years. What you get in
return is the chance to debug parameter sniffing and stale plans.

~~~
barrkel
Databases have been generating machine code for things like sorts since the
60s. JIT code for queries is a very old approach.

Databases are a lot more interesting than they appear on the outside. They
combine the disciplines of compilers (parsing, typing, optimization, codegen),
operating systems (caching and memory management, concurrency, file systems)
and distributed computing (replication, distributed transactions) all in one
project.

------
weberc2
This is really cool. I've often wondered why queries weren't be compiled to
native code to remove interpreter overhead. I imagine that overhead must not
have been particularly low hanging fruit until recently.

~~~
jandrewrogers
Actually, the performance benefits can be huge (design dependent) and have
been for a long time. The reasons have more to do with complexity and
difficulty of implementation. The query compilation and execution engine must
preserve the operational robustness, integrity, and behavior of the database
engine, which is not a trivial design and implementation problem. The number
of things that can go horribly wrong increase significantly versus an
interpreter.

JIT query compilers were relatively common in closed source databases even a
decade ago, it was one of their technical advantages versus open source. This
is a necessary step toward closing that gap. There is a similar gap when it
comes to storage/execution schedulers that is currently lacking in open source
implementations which has a similar level of impact.

~~~
barrkel
As an example of something that can go wrong: Apache Impala uses LLVM to JIT
queries. Turns out some code optimization algorithms are O(n^3) (or worse) or
so, where n is small enough not to matter too much with human code, but falls
apart with a few thousand columns in a single table, not super unusual with
analytics in an environment with expensive joins like Hadoop.

------
nerdymanchild
Does this actually improve the performance of most queries? Most queries are
light on computation and heavy in IO. Seems kind of like a waste of effort but
maybe there are people with very complex / compute-heavy queries.

~~~
anarazel
> Does this actually improve the performance of most queries?

"most queries" \- probably not. By sheer number that's going to be OLTP
queries, and there it doesn't help. You need analytics queries that take
upwards of 0.2s or such to benefit.

> Most queries are light on computation and heavy in IO.

That's something often said. I think it's definitely wrong today and at least
has been for a while. In a lot of workload a good chunk of the hot data set is
in memory, and even a single decent SSD can often more saturate a single core.

If you look at analytics benchmarks and real world analytics usage, you'll
often see CPU being the bottleneck. Using multiple cores can alleviate that to
some degree, but that can imply a need for a bigger hardware / less
concurrency. And doesn't come for free. Efficiency is important.

------
riku_iki
Hope this will happen to plpgsql one day..

------
khhh35575544
Very odd use of the word Landing.

~~~
saagarjha
"Landing a feature" is common way of phrasing that a feature has been added.
For example, one might say that "C++17 support has landed in master",
referring to the master branch for LLVM.

