
Cuckoo++ – High-Performance Hash Tables for Networking Applications (2017) - jsnell
https://arxiv.org/abs/1712.09624
======
alexhutcheson
It would be interesting to see this benchmarked against absl::flat_hash_map[1]
and folly::F14FastMap[2].

[1]
[https://abseil.io/about/design/swisstables](https://abseil.io/about/design/swisstables)

[2]
[https://github.com/facebook/folly/blob/master/folly/containe...](https://github.com/facebook/folly/blob/master/folly/container/F14.md)

~~~
scaramanga
The point of cuckoo++ is to handle timers and expiry efficiently so neither of
those hash tables is comparable. It would have to be "swisstables + heap" or
"+ timer wheel" or some other timer management algorithm.

~~~
alexhutcheson
Interesting, thanks for pointing that out - it wasn't obvious to me from
skimming the paper.

In their "Conclusion" section, they write:

"We thus propose Cuckoo++ which adds a bloom filter in the primary bucket,
allowing to prune unnecessary accesses to the secondary bucket without
requiring expensive computation. Cuckoo++ hash tables have a uniformly good
performance when compared to both pessimistic and optimistic implementation,
and an improved performance over DPDK and Horton tables for all cases.

We also describe a variant of Cuckoo++ that integrate support for entry
expiration directly in the hash table, avoiding the need for external
management of timers and the associated overheads. This relies on a new memory
layout more compact than DPDK’s original one, and on the use of 16-bit
timestamp."

That seems to imply that the timer/expiration feature is an optional added
bonus, rather than a core feature of Cuckoo++.
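
To make the "16-bit timestamp" idea from the quoted conclusion concrete, here is a minimal sketch of lazy expiry: each entry carries a wrapped 16-bit expiry time, and a lookup simply treats an expired entry as absent, so no external heap or timer wheel is consulted. All names here (`is_expired`, `LazyExpiryTable`, the tick unit) are hypothetical and not taken from the paper or its repository, and a plain dict stands in for the real bucket layout.

```python
TS_BITS = 16
TS_MASK = (1 << TS_BITS) - 1   # timestamps wrap modulo 2**16
HALF_RANGE = 1 << (TS_BITS - 1)


def is_expired(expiry16: int, now16: int) -> bool:
    """True if expiry16 is at or before now16 in modulo-2**16 arithmetic.

    Assumption: timeouts are shorter than half the range (2**15 ticks),
    otherwise "past" and "future" cannot be told apart.
    """
    return ((now16 - expiry16) & TS_MASK) < HALF_RANGE


class LazyExpiryTable:
    """Toy table: expiry is checked on access instead of by a timer structure."""

    def __init__(self):
        self.entries = {}  # key -> (value, 16-bit expiry timestamp)

    def insert(self, key, value, now: int, ttl: int) -> None:
        if not 0 < ttl < HALF_RANGE:
            raise ValueError("ttl must fit in half the timestamp range")
        self.entries[key] = (value, (now + ttl) & TS_MASK)

    def lookup(self, key, now: int):
        item = self.entries.get(key)
        if item is None:
            return None
        value, expiry16 = item
        if is_expired(expiry16, now & TS_MASK):
            del self.entries[key]  # reclaim the slot lazily, on access
            return None
        return value
```

One limitation of this sketch: an entry that is never touched for more than 2**15 ticks would wrap around and look live again, so a real design needs some way to bound entry age. How the paper handles that is not reproduced here.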

~~~
scaramanga
So yeah, you're right, comparison would be possible if you subtract all the
timer stuff. But as I recall, the offered implementation didn't actually
provide an easy way to do that...

~~~
alexhutcheson
Their graphs include "Cuckoo++" and "Cuckoo++ w/ timers".

From their GitHub repo[1] it looks like the timer feature isn't built-in by
default: BLOOM is just Cuckoo++, while LAZY_BLOOM is Cuckoo++ with timers.

[1] [https://github.com/technicolor-research/cuckoopp#benckmarkin...](https://github.com/technicolor-research/cuckoopp#benckmarking)

~~~
scaramanga
You can see that when the TIMER macro is undefined in that benchmark code, no
timer management is performed at all.

What's shown in the paper is that bucketized cuckoo + bloom compares well to
the other hash tables they looked at (DPDK is probably a bit of a red herring
since it supports, and is optimized for, concurrent operations - but that's by
the by). And here, a comparison with a fast robin-hood-ish hash like swiss or
bytell would be interesting.

But also in the same benchmarks, cuckoo++ with timers is being compared
directly against benchmark runs where no timer management is performed at all.
The results show it to be comparable. So presumably, when compared
against traditional timer management like heaps/wheels it would be
significantly ahead. But the explicit comparison hasn't been done anywhere as
far as I know.

That's more what I was getting at in previous comments. Of course, you can
look at it the other way around and you can rip out the timer stuff and
compare bucketized cuckoo + bloom as just a hash table in its own right too.

The other comparison might also be interesting: can you do similar timer stuff
with a robin-hood hash? You'd probably have to give up the SIMD, or come up
with a new scheme to make SIMD usable, since that's what bucketized cuckoo
enables you to do easily and that robin hood doesn't.

Edit: thanks for the correction btw!

------
doctorsher
The submitted link is a pre-publication draft of this conference paper
published at the 2018 Symposium on Architectures for Networking and
Communication Systems (ANCS):
[https://dl.acm.org/doi/10.1145/3230718.3232629](https://dl.acm.org/doi/10.1145/3230718.3232629)

------
aeontech
A neat interactive visualization
[http://www.lkozma.net/cuckoo_hashing_visualization/](http://www.lkozma.net/cuckoo_hashing_visualization/)

Edit: as the child comment points out, this is a visualization of normal cuckoo
hashing, not the cuckoo++ that this paper describes.

~~~
abainbridge
Just to be clear, the paper is about an extension to Cuckoo hashing, while the
link to the visualization is about standard Cuckoo hashing.

From the paper: "we describe algorithmical changes to cuckoo hash tables
allowing a more efficient implementation. More precisely, our hash table adds
a bloom filter in each bucket that allows to prune unnecessary lookups to the
secondary bucket."
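
A rough sketch of that idea, as I read the quote: each bucket keeps a small bit mask recording which keys overflowed from it into their secondary bucket, and a lookup that misses in the primary bucket only touches the secondary one if the mask says the key might be there. The bucket width, filter size, hash derivation and all names below are assumptions for illustration, not the paper's layout, and eviction ("kicking") and deletion are left out.

```python
import hashlib

SLOTS_PER_BUCKET = 4  # assumed bucket width
BLOOM_BITS = 64       # assumed per-bucket filter size


def _hashes(key: bytes):
    """Derive two bucket hashes and a filter hash from one digest (illustrative)."""
    d = hashlib.blake2b(key, digest_size=12).digest()
    return tuple(int.from_bytes(d[i:i + 4], "little") for i in (0, 4, 8))


class Bucket:
    def __init__(self):
        self.slots = {}  # key -> value, at most SLOTS_PER_BUCKET entries
        self.bloom = 0   # bits set for keys that overflowed to their secondary bucket


class BloomCuckooTable:
    def __init__(self, n_buckets: int):
        self.buckets = [Bucket() for _ in range(n_buckets)]
        self.secondary_probes = 0  # how often lookup had to touch a secondary bucket

    def _locate(self, key: bytes):
        h1, h2, hb = _hashes(key)
        n = len(self.buckets)
        mask = (1 << (hb % BLOOM_BITS)) | (1 << ((hb >> 8) % BLOOM_BITS))
        return self.buckets[h1 % n], self.buckets[h2 % n], mask

    def insert(self, key: bytes, value) -> bool:
        primary, secondary, mask = self._locate(key)
        for bucket in (primary, secondary):
            if key in bucket.slots:  # existing key: update in place
                bucket.slots[key] = value
                return True
        if len(primary.slots) < SLOTS_PER_BUCKET:
            primary.slots[key] = value
            return True
        if len(secondary.slots) < SLOTS_PER_BUCKET:
            secondary.slots[key] = value
            primary.bloom |= mask  # remember the overflow in the primary bucket
            return True
        return False  # a real cuckoo table would start evicting entries here

    def lookup(self, key: bytes):
        primary, secondary, mask = self._locate(key)
        if key in primary.slots:
            return primary.slots[key]
        if (primary.bloom & mask) != mask:
            return None  # filter rules the key out: secondary bucket is never read
        self.secondary_probes += 1
        return secondary.slots.get(key)
```

The point of the filter is the early `return None`: a negative lookup usually costs one bucket access instead of two, which is presumably where the gain on miss-heavy workloads comes from.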

~~~
scaramanga
Also, when the paper says "cuckoo hashing" they mean "bucketized cuckoo
hashing" which is different enough from the academic description of cuckoo
hashing, which is what is described in GP's link, as to be a bit confusing!

------
rurban
The size of a cache line must be fixed in the preprint. He assumes a cache line
is 64 bits, whilst it is 64 bytes: 8 words, not just one. This mostly affects
the timestamp chapter.

------
skyde
So is that thread-safe?

~~~
scaramanga
It's for networking apps where scaling is typically achieved by shared-nothing
multi-threading making use of the receive-side-scaling feature of NICs.

Incoming packets are hashed by their 5-tuple and divided among CPUs; a thread
captures traffic on each CPU and works independently of the others.

Edit: so the answer is that the question of thread safety isn't addressed,
since it doesn't arise in the intended use-case. But I'm sure it could be
added with per-bucket locking since it is based on cuckoo hash.
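
To illustrate the shared-nothing layout described above, here is a toy sketch: hash the 5-tuple, pick a worker, and let that worker update its own private table with no locks. Real NICs do this in hardware (typically with a Toeplitz hash and an indirection table); `zlib.crc32` is only a stand-in here, and `FiveTuple`, `Worker` and `dispatch` are made-up names.

```python
import zlib
from collections import namedtuple

FiveTuple = namedtuple("FiveTuple", "src_ip dst_ip src_port dst_port proto")


def worker_for(flow: FiveTuple, n_workers: int) -> int:
    """Map a flow to a worker index; crc32 stands in for the NIC's RSS hash."""
    key = "|".join(map(str, flow)).encode()
    return zlib.crc32(key) % n_workers


class Worker:
    """One per CPU; its table is only ever touched by its own thread."""

    def __init__(self):
        self.flows = {}  # private flow table: FiveTuple -> byte count

    def handle(self, flow: FiveTuple, n_bytes: int) -> None:
        self.flows[flow] = self.flows.get(flow, 0) + n_bytes


def dispatch(workers, flow: FiveTuple, n_bytes: int) -> int:
    """Deliver a packet to the worker that owns its flow; returns the index."""
    idx = worker_for(flow, len(workers))
    workers[idx].handle(flow, n_bytes)
    return idx
```

Because every packet of a given 5-tuple lands on the same worker, no two threads ever see the same table entry, which is why locking never comes up.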

~~~
jsnell
Specifically, here's the system the hash table was developed for:

[https://www.usenix.org/conference/atc18/presentation/andre](https://www.usenix.org/conference/atc18/presentation/andre)

So yes, very much a system architecture with no locking or sharing.

