
Leapfrog Probing for open addressing hash tables - alanfranzoni
http://preshing.com/20160314/leapfrog-probing/
======
cyphar
The author doesn't mention deletion, which is actually the most problematic
part of open-addressing hash maps. If you use gravestones (the solution most
people take) you end up with wasted space. You can optimise slightly by
retaining the stored key in the gravestone, so if the key is inserted again
you can reuse that slot. However, any further optimisation starts causing
orphaning problems. This is why I much prefer buckets: lots of removals don't
incur a sudden performance penalty.
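
The gravestone scheme described above can be sketched like this (a minimal
illustration with linear probing; the names and the lack of resizing are my
own simplifications, not anything from the article):

```python
# Open-addressing table with tombstone deletion. Deleting cannot simply
# empty a slot, because that would break probe chains passing through it.
EMPTY, TOMBSTONE = object(), object()

class ProbeTable:
    def __init__(self, capacity=8):
        self.slots = [EMPTY] * capacity  # each live slot holds (key, value)

    def _probe(self, key):
        """Return (slot index, key_found). Reuses the first tombstone seen."""
        i = hash(key) % len(self.slots)
        first_tombstone = None
        for _ in range(len(self.slots)):
            slot = self.slots[i]
            if slot is EMPTY:
                # Key is absent; prefer recycling an earlier tombstone.
                return (first_tombstone if first_tombstone is not None else i), False
            if slot is TOMBSTONE:
                if first_tombstone is None:
                    first_tombstone = i
            elif slot[0] == key:
                return i, True
            i = (i + 1) % len(self.slots)
        return first_tombstone, False  # table full of entries/tombstones

    def insert(self, key, value):
        i, _ = self._probe(key)
        self.slots[i] = (key, value)

    def lookup(self, key):
        i, found = self._probe(key)
        return self.slots[i][1] if found else None

    def delete(self, key):
        i, found = self._probe(key)
        if found:
            # The wasted-space problem: the slot stays occupied by a marker.
            self.slots[i] = TOMBSTONE
```

Recycling tombstones on insert recovers some space, but as the comment notes,
anything fancier risks orphaning entries further down the chain.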

Also, I can see that if your first delta lands on the beginning of another
probe chain, you'll end up following that same chain. This is partially solved
by quadratic probing, where the first and second multipliers have differing
effects on the output, so two chains that meet once aren't forever locked
together. Maybe incorporating some of the quadratic effect into Leapfrog would
alleviate this issue.
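
The divergence property can be seen in a tiny sketch (one common quadratic
scheme, triangular-number offsets with a power-of-two table; my choice of
constants, not the commenter's):

```python
def probe_sequence(key_hash, capacity, max_steps):
    """Slots visited by quadratic probing: h + i*(i+1)/2 mod capacity.

    With a power-of-two capacity this sequence visits every slot. Because
    the offset depends on the step count i, two chains that collide at
    different step counts take different next steps and separate again.
    """
    return [(key_hash + i * (i + 1) // 2) % capacity for i in range(max_steps)]
```

For example, chains starting at slots 0 and 2 in an 8-slot table both pass
through slot 3, but at different step counts, so their next probes differ
(6 versus 5) instead of staying merged as they would under linear probing.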

------
kelvin0
It would have been nice to have some context for the algorithm being explored.
Although some of the ideas seem pretty clever, I've too often seen code
implementing structures like this in a haphazard way (buggy, no comments),
leaving the next programmer scratching their head and wondering... why? Unless
you have very specific performance and memory constraints, I don't see the
need for such structures in most programmers' daily business.

~~~
mtanski
This is a very unproductive comment/viewpoint. I find the anti-learning
stance troubling.

This is a link to a very technical article about a hash-table implementation,
a pretty popular data structure. Right at the top it tells you that's what
it's about. Also, if you bother to check the side of the article where it
lists recent articles, or go look at the other blog posts, you'll see it's a
follow-up to a previous article on lock-free hash tables that use this method.
That's more than plenty of context if you just give it more than five seconds.

> Although some ideas seem pretty clever, I've too often seen code
> implementing this in a haphazard way (buggy, no comments). This leaves the
> next programmer scratching their head and wondering ... why? Unless you have
> very specific performance and memory constraints, I do not see the need for
> such structures in most every programmers daily business.

Again, right in the article: the author tells you that the common
implementation of using a linked list for buckets in a hash table is
performance-unfriendly (cache-unfriendly) on modern machines.
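
The cache argument is easy to see in a minimal linear-probing lookup (my own
illustrative sketch, not code from the article): every probe steps to the
adjacent slot of one flat array, whereas chained buckets chase pointers to
separately allocated nodes.

```python
def lp_lookup(slots, key):
    """Linear-probing lookup over a flat array of (key, value) or None.

    Successive probes touch adjacent slots, which often share a cache
    line, unlike walking a linked list of scattered bucket nodes.
    """
    n = len(slots)
    i = hash(key) % n
    for _ in range(n):
        entry = slots[i]
        if entry is None:
            return None            # an empty slot terminates the chain
        if entry[0] == key:
            return entry[1]
        i = (i + 1) % n            # next slot is adjacent in memory
    return None

def lp_insert(slots, key, value):
    n = len(slots)
    i = hash(key) % n
    for _ in range(n):
        if slots[i] is None or slots[i][0] == key:
            slots[i] = (key, value)
            return
        i = (i + 1) % n
    raise RuntimeError("table full (this sketch does not resize)")
```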

Just because you don't see a need for it in your daily business doesn't mean
others don't. There are plenty of cases where linear-probing hash tables make
sense, including any kind of GPU computing, databases, or anywhere your
bottleneck is a particular lookup data structure. And yeah, those are all
niche use cases, but they're numerous and important enough that somebody might
learn something.

~~~
kelvin0
I'm not saying this type of information shouldn't be disseminated. I wanted
more context on the uses of the structure, or on what brought the author to
demonstrate it. Lacking that context, I expressed some concerns about the
shoddy implementations I've seen in other code bases associated with
structures like this. That's not to say this isn't useful, or that all
implementations of hash tables are sloppy, of course. I clearly think it has
its uses; I just wanted more background info from the author. The article is
otherwise very well written and explains the structure and some caveats fairly
clearly.

------
herge
Another interesting variation on hash tables is cuckoo hashing
(https://en.wikipedia.org/wiki/Cuckoo_hashing). However, I don't think it
makes much effort to keep matching keys near each other, so I suspect leapfrog
probing might cause fewer cache misses.
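
For reference, the basic two-table scheme looks like this (a minimal sketch;
the `_h` hash pair is a toy stand-in of my own, and real tables rehash or grow
instead of giving up):

```python
def _h(key, which, n):
    """Toy deterministic hash pair: `which` selects the 0th or 1st function."""
    acc = 0
    for ch in str(key):
        acc = acc * 131 + ord(ch) + which * 97
    return acc % n

def cuckoo_insert(t0, t1, key, value, max_kicks=32):
    """Place the entry; on collision, evict the occupant to its other table."""
    entry, table = (key, value), 0
    tables = (t0, t1)
    for _ in range(max_kicks):
        tab = tables[table]
        h = _h(entry[0], table, len(tab))
        if tab[h] is None or tab[h][0] == entry[0]:
            tab[h] = entry
            return True
        entry, tab[h] = tab[h], entry  # evict, retry in the other table
        table ^= 1
    return False  # eviction cycle: a real table would rehash or grow

def cuckoo_lookup(t0, t1, key):
    """Each key has one candidate slot per table, so at most two probes."""
    for table, tab in enumerate((t0, t1)):
        e = tab[_h(key, table, len(tab))]
        if e is not None and e[0] == key:
            return e[1]
    return None
```

The worst-case two-probe lookup is cuckoo hashing's selling point; the
cache-behavior question above is about those two probes landing in two
unrelated arrays rather than adjacent slots.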

~~~
rurban
Cuckoo hashing needs double the space, and even if you place the two tables
one after the other, it begs for cache misses with larger arrays, besides
needing a much lower fill rate than leapfrog, which only needs 2 bytes extra
per slot. Leapfrog looks really promising.

~~~
todd8
I'm not sure how cuckoo hashing would compare with regard to cache misses, but
it doesn't take double the space. Each element can use a second hash function,
and in space-efficient implementations (with a third hash function, etc.) the
space utilization is over 90%.

See the conclusion of "Space efficient hash tables with worst case constant
access time" (http://www.itu.dk/people/pagh/papers/d-cuckoo.pdf); it starts:

> From a practical point of view, d-ary Cuckoo Hashing seems a very
> advantageous approach to space efficient hash tables with worst case
> constant access time. Both worst case access time and average insertion time
> are very good.

------
failrate
I had a moment of panic where I had interpreted this as a security
vulnerability in Leapfrog computer toys.

