
A Seven Dimensional Analysis of Hashing Methods [pdf] - tjalfi
http://www.vldb.org/pvldb/vol9/p96-richter.pdf
======
aappleby
Really interesting analysis, but constraining their keys to 64-bit integers
and using 'hash functions' that only accept 64-bit integers as input skews the
results away from real-world use cases.

In practice I've found that a variant of Robin Hood hashing with cache-line-
sized bins and a hard limit on how deep to search for movable keys when
inserting new keys (max of 3 possible bin locations per key, max search depth
of 3 moves per insert) gives very good performance even with high load
factors.

-Austin, author of MurmurHash

------
tjalfi
The actual title is "A Seven-Dimensional Analysis of Hashing Methods and its
Implications on Query Processing".

------
nosefouratyou
Recently I was reading about hash tables on wikipedia and I was surprised to
learn about how much the birthday problem influenced their design.

"A real world example of a hash table that uses a self-balancing binary search
tree for buckets is the HashMap class in Java version 8."

[https://en.wikipedia.org/wiki/Hash_table#Separate_chaining_w...](https://en.wikipedia.org/wiki/Hash_table#Separate_chaining_with_other_structures)

~~~
rurban
This is only to counter worst-case attack scenarios which do not happen in a
realistic scenario. Like >100 collisions.

Robin Hood or its better cousin Hopscotch hashing, which was forgotten here,
counter that scenario much better, with only ~1-3% performance loss on average
in the best case, and much better perf. numbers in the average write-heavy
case, where their "move to front" strategy pays off. And they have the worst
case scenario also covered, unlike all others.

