
Hash tables with O(1) worst-case lookup and space efficiency [pdf] - tianyicui
http://www.ru.is/faculty/ulfar/CuckooHash.pdf
======
sigstoat
I've had some luck turning a cuckoo hash into a sort of LRU cache. Whenever
you do an insert, replace the older item. IIRC, everything else stayed the
same. Using more than 2 hashes really helped.
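
Roughly like this, from memory (a minimal sketch with two hashes; all the
names are made up):

    import time

    class CuckooCache:
        """Cuckoo-style cache: two candidate slots per key; on a full
        collision, overwrite the older of the two occupants."""

        def __init__(self, size=1024):
            self.size = size
            self.slots = [None] * size   # each slot: (key, value, stamp)

        def _buckets(self, key):
            # two hash functions; adding a third or fourth helps
            h1 = hash((0, key)) % self.size
            h2 = hash((1, key)) % self.size
            return h1, h2

        def get(self, key):
            for b in self._buckets(key):
                e = self.slots[b]
                if e is not None and e[0] == key:
                    return e[1]
            return None

        def put(self, key, value):
            b1, b2 = self._buckets(key)
            stamp = time.monotonic()
            # reuse the key's own slot or any empty candidate...
            for b in (b1, b2):
                e = self.slots[b]
                if e is None or e[0] == key:
                    self.slots[b] = (key, value, stamp)
                    return
            # ...otherwise evict the older of the two occupants
            victim = b1 if self.slots[b1][2] <= self.slots[b2][2] else b2
            self.slots[victim] = (key, value, stamp)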

~~~
pjscott
I'd been wondering how best to make an LRU cache, and now you come along and
give a really good solution. Nice!

~~~
jrockway
That is not LRU, it's "replace with some other random thing some time, based
on a hash function".

------
ComputerGuru
I'd just like to point out that the worse your hash function is, the better
cuckoo hashing compares to traditional hashing... therefore, _study your
hash functions_!

The ones that come with most languages or libraries by default quite simply
aren't optimized for anything in particular. There are hash functions that
work better on strings and hash functions that work better on numbers (length
bias). In a perfect world this wouldn't be true, but in the real world it is.

That said, may I recommend MurmurHash3:
<http://code.google.com/p/smhasher/wiki/MurmurHash3>

Switch your hash tables to this. The performance difference is incredible.
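
If you want to try it from Python first, there are third-party bindings (the
mmh3 package; assuming you have it installed):

    import mmh3  # third-party bindings to MurmurHash3 (pip install mmh3)

    # 32-bit MurmurHash3 of a key; different seeds give you
    # independent hash functions, which is handy for cuckoo hashing
    h1 = mmh3.hash("some key")        # default seed 0
    h2 = mmh3.hash("some key", 1)     # seed 1
    print(h1 % 1024, h2 % 1024)       # two bucket indices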

------
ot
Cuckoo hashing is not only optimal in theory, it is also very fast in
practice. The only downside, though, is that insertion has a linear-time
worst case (an unlucky insert can trigger a full rehash), so it may not be
the best solution if latency is an issue.

~~~
jemfinch
"Linear-time worst case" is different from "amortized constant time", which is
what cuckoo hashing provides.

Almost all the containers we use on a regular basis provide only amortized
constant time inserts.

~~~
jules
Linear worst-case time and amortized constant time are not mutually exclusive.
You are right that most containers we use have amortized constant-time
complexity, but they also have linear worst-case complexity. Usually this is
because an internal array has to be resized and the contents have to be
copied over to the new, enlarged array.
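
A toy version of the usual culprit makes it concrete (a sketch, not how any
particular library does it):

    class DynArray:
        """Growable array: appends are amortized O(1), but the
        append that triggers a resize copies everything: O(n)."""

        def __init__(self):
            self.cap, self.n = 1, 0
            self.buf = [None]

        def append(self, x):
            if self.n == self.cap:
                self.cap *= 2                 # double the capacity
                new = [None] * self.cap
                new[:self.n] = self.buf       # the linear-time copy
                self.buf = new
            self.buf[self.n] = x
            self.n += 1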

~~~
jemfinch
No, they're not mutually exclusive. But what actually _matters_ in most
software is amortized constant time: a linear worst case only matters in
hard real-time systems, which can't tolerate any deviation from the expected
runtime.

Even data structures like binary trees can end up with very bad worst cases
unless great care is taken in allocating their nodes.

------
thurn
An interesting algorithm, and an engaging presentation. If only more academic
writing were this accessible!

------
lecha
Is an open-source implementation of this data structure available somewhere?

~~~
sanxiyn
There is ckhash.

<http://canasai.tcllab.org/software/ckhash/>

------
pbewig
I recently did an exercise on cuckoo hashing at
<http://programmingpraxis.com/2011/02/01/cuckoo-hashing/>.

~~~
kunjaan
I loved that article. It was very succinct.

------
jchonphoenix
I coded a cuckoo-hashed hash table for a class once, and recently had a
lecture on cuckoo hashing given to my class by Rasmus Pagh himself (the
inventor of cuckoo hashing). It's really surprising how complex the
probability theory behind cuckoo hashing is (and how closely tied it is to
graph theory/bipartite matchings).

The amazing part is that not only is access O(1); the expected insertion time
when there is no rehashing is also O(1), and rehashing occurs infrequently.
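
The core of the algorithm is small, too. A rough sketch of the textbook
two-table version (my own naming, not the code from my class; the hash
functions here are just stand-ins):

    class CuckooTable:
        MAX_LOOP = 32  # give up and rehash after this many evictions

        def __init__(self, size=1024, seed=0):
            self.size, self.seed = size, seed
            self.t1 = [None] * size
            self.t2 = [None] * size

        def _h1(self, key):
            return hash((self.seed, key)) % self.size

        def _h2(self, key):
            return hash((self.seed, key, 0x9e3779b9)) % self.size

        def lookup(self, key):
            # worst-case O(1): exactly two probes
            for table, h in ((self.t1, self._h1), (self.t2, self._h2)):
                entry = table[h(key)]
                if entry is not None and entry[0] == key:
                    return entry[1]
            return None

        def insert(self, key, value):
            # (assumes key is not already present)
            entry = (key, value)
            for _ in range(self.MAX_LOOP):
                # place in table 1, kicking out any occupant...
                b = self._h1(entry[0])
                entry, self.t1[b] = self.t1[b], entry
                if entry is None:
                    return
                # ...then re-home the evictee in table 2, and so on
                b = self._h2(entry[0])
                entry, self.t2[b] = self.t2[b], entry
                if entry is None:
                    return
            self._rehash(entry)  # rare: the linear-time case

        def _rehash(self, pending):
            entries = [e for e in self.t1 + self.t2 if e is not None]
            entries.append(pending)
            self.size *= 2
            self.seed += 1  # effectively picks two new hash functions
            self.t1 = [None] * self.size
            self.t2 = [None] * self.size
            for k, v in entries:
                self.insert(k, v)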

------
sambeau
"WHY ELSE IS THIS COOL?" seems a very casual heading for an academic article.

~~~
_delirium
It looks like it was a workshop paper, from the 7th Workshop on Distributed
Data and Structures. I don't know much about that particular workshop, but
workshops often aim for a more informal atmosphere, since one of their main
roles is fostering discussion (including of work in progress) and bringing
communities of researchers together.

------
16s
It's been my experience that the O(1) was on average, not worst case.

~~~
ot
Cuckoo hashing has O(1) worst-case _access_ time, and O(1) "average"
(amortized, to be correct) _insert_ time.

~~~
16s
OK. That makes sense. If O(1) is worst case, what could be better :)

~~~
rudiger
Well, smaller constant factors could be better :)

------
brg
Mitzenmacher, the author of Probability and Computing, has an interesting
survey on cuckoo hashing. In it there are a number of open research problems
that, for those of you with interest, are worth looking over. The 7th is very
interesting to me, regarding optimal ways to maintain a hash table in a
parallel computing environment.

<http://www.eecs.harvard.edu/~michaelm/postscripts/esa2009.pdf>

------
ithayer
Every time I've tried comparing cuckoo hashing with traditional hash tables
in practice, the time taken to compute the additional hash functions has
outweighed any gains in performance.

Counter-intuitively, I've also noticed in many cases that using binary search
over sorted elements in contiguous memory is actually faster than using a hash
table at all.
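
The pattern I mean, sketched in Python (the real win shows up in lower-level
languages, where the sorted array is cache-friendly):

    import bisect

    # parallel arrays: keys sorted, vals[i] belongs to keys[i]
    keys = ["apple", "banana", "cherry", "damson"]
    vals = ["a", "b", "c", "d"]

    def find(key):
        # binary search: O(log n) probes over contiguous memory,
        # and no hash function to compute at all
        i = bisect.bisect_left(keys, key)
        if i < len(keys) and keys[i] == key:
            return vals[i]
        return None

    print(find("cherry"))  # -> c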

Has anyone else found this?

~~~
tripzilch
> the time taken to compute the additional hash functions outweighs any gains
> in performance.

Sounds like you're using the wrong hash functions.

Are you using secure cryptographic hash functions perchance? (such as MD5,
SHA, etc) Because they're not intended for use in data structures.

Most data structure algorithms just require a hash function with good
avalanche behaviour and a statistically even bit dispersion. The FNV hash
will do this for you with just a MUL and a XOR per byte, which is (rough
guess) at least 100 times faster than SHA. FNV hash
(<http://www.isthe.com/chongo/tech/comp/fnv/>): it's super-effective!
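
The whole thing fits in a few lines. Here's the 64-bit FNV-1a variant in
Python, with the constants from that page:

    FNV_OFFSET_64 = 0xcbf29ce484222325   # 64-bit offset basis
    FNV_PRIME_64 = 0x100000001b3         # 64-bit FNV prime

    def fnv1a_64(data: bytes) -> int:
        # FNV-1a: XOR in each byte, then multiply by the prime,
        # keeping only the low 64 bits
        h = FNV_OFFSET_64
        for byte in data:
            h ^= byte
            h = (h * FNV_PRIME_64) & 0xFFFFFFFFFFFFFFFF
        return h

    print(hex(fnv1a_64(b"hello")))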

