
Advanced techniques to implement fast hash tables - fanf2
https://attractivechaos.wordpress.com/2018/10/01/advanced-techniques-to-implement-fast-hash-tables/
======
gabeiscoding
The author of this post wrote klib/khash, and uses it in his very popular cpu-
intensive bioinformatics programs such as bwa (short read alignment against
the human ref genome).

I've been learning rust recently. As a learning exercise, I compared the
robin-hood hashing of std::collections::HashMap in rust to his klib/khash he
mentions in this article , and then tried various hash functions to try and
match his performance:

[https://github.com/gaberudy/hash_test](https://github.com/gaberudy/hash_test)

No dice, his hash table is smaller and faster.

My next step is to try and implement his data structure and hashing functions
directly in rust and see if I can get it to near-C performance...

~~~
sametmax
How does that compare to the dict implementation of Python 3.6 ? It's supposed
to be damned fast but I don't have the skills to make such comparisons.

~~~
rurban
Python, ruby, php hash tables are roughly 2x slower. perl5's being the worst
maybe 4x slower.

Damn fast is only relative. It's damn fast compared to the previous
implementations. Wikipedia even dares to say that the worst hash tables of
all, perl5, is one of the very best. Bias all over.

SIMD optimized hash tables, such as the Swiss tables tricks have the potential
to be much faster than khash. And khash is not cache oblivious at all, with
its double hashing scheme.

~~~
sametmax
Does it preserves insertion order ?

~~~
rurban
Of course not. hash table iteration should assume random order, otherwise it's
a security risk or performance nightmare.

------
romed
Hrmm this is the first I’ve heard that Google has opened swisstable[1].
Strangely the three submissions to HN of this attracted no comments. This blog
is interesting but a write-only benchmark of a hash<int,int> is sort of niche.

[https://abseil.io/blog/20180927-swisstables](https://abseil.io/blog/20180927-swisstables)

------
jorangreef
I recently wrote [https://github.com/ronomon/hash-
table](https://github.com/ronomon/hash-table) for Node.js, which has a few
more design tips listed in the Readme, specifically around optimizing a cuckoo
hash table by reducing cache misses.

------
notatcomputer68
Nit: You should implement quadratic probing with triangular numbers by
addition for a slight speedup. I.e. x+= i, rather than x = ... × ...

~~~
attractivechaos
Of course. Notable libraries implementing quadratic probing all use addition
instead of multiplication. The example in the blog post is just for
demonstration purpose.

~~~
abhishekjha
>addition instead of multiplication

What is the advantage of this?

~~~
amelius
Addition being cheaper than multiplication on many CPUs?

~~~
sametmax
ELI5: wouldn't it be the job of the compiler ?

~~~
amelius
Sort of, but there are no guarantees.

------
namibj
klib could do with a bit more documentation for what the abbpreviated argument
and function names do. Apart from that, I just have praise for this. It's
succinct, performance-oriented and comparatively portable. From what I
reversed for the purpose of grasping the effect of parameters, I like the
interface/API, but the scope over which I am speaking is pretty much limited
to basic kbtree, due to the lack of documentation.

------
maxxxxx
I love this stuff but how many of us really get to (have the privilege) to
doing this kind of work?

~~~
timClicks
If you're writing code in C/C++/Rust/... then your code will benefit from
understanding hash function performance because the choice of function can be
another parameter to tune.

