
Faster hash joiner with vectorized execution - jordanlewis
https://www.cockroachlabs.com/blog/vectorized-hash-joiner/
======
angelachang27
Hey everyone, I'm the intern who did this work, happy to answer any questions
if you have them!

~~~
fwip
Incredible work! 40x speedup is ludicrously good.

~~~
angelachang27
Thank you!

------
obl
The "core" of the trick is nice : amortizing interpreter dispatch over many
items. (ignoring the column layout/SIMD stuff which basically helps in any
case)

Essentially it's turning:

    
    
      LOAD
        DISPATCH
        OP1
        DISPATCH
        OP2
        ... (once per operation in the expression)
      STORE
      ... (once per row)
    

into

    
    
      DISPATCH
        LOAD
        OP1
        STORE
        LOAD
        OP1
        STORE
        ... (once per row)
      DISPATCH
      ... (once per operation in the expression)
    

The nice trade-off here is that you don't require code generation to do that,
but it's still not optimal.
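
To make the batched-dispatch pattern concrete, here is a minimal Go sketch (a hypothetical `Operator` interface and names, not CockroachDB's actual code): one virtual `Next` call per batch, with a tight, dispatch-free loop inside.

```go
package main

import "fmt"

// batchSize amortizes the per-batch dispatch cost over many rows.
const batchSize = 1024

// Operator is dispatched once per batch instead of once per row.
type Operator interface {
	Next(batch []int64) []int64
}

// addConst adds a constant to every value in the batch. The inner loop
// contains no interface calls, so the compiler can optimize it freely.
type addConst struct {
	input Operator
	c     int64
}

func (a *addConst) Next(batch []int64) []int64 {
	batch = a.input.Next(batch) // one DISPATCH per batch, not per row
	for i := range batch {
		batch[i] += a.c
	}
	return batch
}

// source produces a fixed batch once, then signals exhaustion with nil.
type source struct {
	data []int64
	done bool
}

func (s *source) Next(batch []int64) []int64 {
	if s.done {
		return nil
	}
	s.done = true
	return append(batch[:0], s.data...)
}

func main() {
	op := &addConst{input: &source{data: []int64{1, 2, 3}}, c: 10}
	fmt.Println(op.Next(make([]int64, 0, batchSize))) // [11 12 13]
}
```

The point is that the interface-method dispatch (the interpreter's DISPATCH step) now costs one indirect call per batch of up to 1024 rows, rather than one per row.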

If you can generate code it's even better to fuse the operations, to get
something like:

    
    
      LOAD
        OP1
        OP2
        ...
      STORE
      LOAD
      ...
    

It helps because even though you can tune your batch size to get mostly cached
loads and stores, it's still not free.

For example, on Haswell you can only issue 1 store per cycle, so if OP is a
single add, you're leaving up to 3/4 of your theoretical ALU throughput on the
table.
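
A toy Go illustration of the difference (hypothetical functions, not from the post): the fused loop does one load and one store per row no matter how many operations are chained, while the unfused version re-loads and re-stores the whole batch for each operation.

```go
package main

import "fmt"

// Unfused: two passes, each with its own load and store per row.
func addThenMul(batch []int64, a, m int64) {
	for i := range batch {
		batch[i] += a // pass 1: LOAD, OP1, STORE
	}
	for i := range batch {
		batch[i] *= m // pass 2: LOAD, OP2, STORE
	}
}

// Fused (what generated code would produce): one pass, one load and one
// store per row regardless of how many operations are chained.
func addMulFused(batch []int64, a, m int64) {
	for i := range batch {
		batch[i] = (batch[i] + a) * m // LOAD, OP1, OP2, STORE
	}
}

func main() {
	x := []int64{1, 2, 3}
	addMulFused(x, 1, 2)
	fmt.Println(x) // [4 6 8]
}
```

With the unfused version, a single-add OP spends a store-port slot per operation per row; fusing collapses that to one store per row, which is where the extra ALU throughput comes from.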

------
evrydayhustling
Pretty impactful work for an intern! One thing I would have liked to see, both
from a process and communication standpoint, is leading with some stats on how
much faster the vectorized loops are in isolation from the surrounding engine.

It's always good practice to dig into a deep project like this with some
napkin estimates of how much you stand to gain, and how much overhead you can
afford to spend setting yourself up for the faster computation. (Not to
mention how much of your own time is merited!)

~~~
asubiotto
Indeed! We spent some time running isolated benchmarks before deciding to dive
into the project:
[https://github.com/jordanlewis/exectoy](https://github.com/jordanlewis/exectoy)

------
nimish
Awesome work. Really cool!

I have a hobby project to write an analytics DB that uses ISPC for vectorized
execution. Currently it's not much (sums are real easy), but I really wonder
if it could reduce the effort to vectorize these sorts of things.

~~~
gary__
Is it open source / online?

~~~
nimish
Nope, and by toy I really mean toy. It can handle a single type of query and
that's it. Maybe one day I'll polish it up.

------
blr246
Great write-up. Is the long-term vision to go completely to the vectorised
query execution model, or are there cases where a row-oriented plan might be
better, such as cases when there are complex computations involving multiple
columns of a single row?

~~~
jordanlewis
We don't have any plans to remove the row-by-row execution engine. Likely,
we'll have some analysis during planning that can inform whether to use the
row-oriented or column-oriented engine. I think the use cases for the row-
oriented engine are exactly what you mention - things like single-row
computations or more OLTP queries like point scans, inserts, updates, etc. -
where the overhead of setting up the data structures required to use the
column-oriented engine would dominate.

------
sAbakumoff
I looked at the title and thought that the article is about rolling joints
with hash by using some advanced robotic technology and software. Damn!

~~~
ddorian43
Pretty sure they can use the same tech as cigarettes.

------
andonisus
It was a pleasant surprise to see the code base written in Go!

------
jnordwick
Column oriented dbs have been doing parallel (column per thread) joins for a
while now, no? And I know they have been leaning heavily on vectorization for
over a decade.

------
JanecekPetr
Generic specialization (List<int>) is coming. See
[https://openjdk.java.net/projects/valhalla/](https://openjdk.java.net/projects/valhalla/).
Not this year (value types will be first), but they're working on it really
hard.

