
The Swiss Army Knife of Hashmaps - matt_d
https://blog.waffles.space/2018/12/07/deep-dive-into-hashbrown/
======
ajtulloch
One thing worth trying out as well is the folly::F14 family of hashmaps
([https://github.com/facebook/folly/blob/master/folly/containe...](https://github.com/facebook/folly/blob/master/folly/container/F14.md)),
which have a roughly similar design. You can find some discussion of the
performance tradeoffs by the SwissTable and F14 authors at
[https://groups.google.com/forum/#!msg/hashtable-benchmarks/-...](https://groups.google.com/forum/#!msg/hashtable-benchmarks/-nT-3fESXq8/1ckpGBiAAgAJ).

------
jorangreef
This is a fantastic explanation of developments in hash tables in general.

If anyone is interested, I recently released a scalable hash table for Node.js
(@ronomon/hash-table), designed to store literally billions of elements
without stressing the GC.

At first, I tried writing a native addon module for Node.js in C (using SIMD),
but the JavaScript-to-C call cost was simply too high, and doubled latency. In
the end, I was surprised to find that writing directly in JavaScript proved
faster than the C module with SIMD, purely by avoiding the bridging cost.

Also, instead of linear or quadratic probing (or double hashing or triangular
probing), I used cuckoo hashing, but with a twist. Normally, in the case of a
naive cuckoo hash table, if an element is not in its first bucket, then its
second bucket must be fetched from memory, causing a cache miss, which for a
hash table is embarrassingly slow. This is why SwissTable and F14 tend to
avoid naive cuckoo hashing.

However, in our design, the first bucket includes an 8-byte coarse-grained
bloom filter (k=1, or a single bit per element). If the element is not in this
bloom filter, the element is guaranteed not to be in either of its first or
second buckets, and a cache miss can be avoided nearly 90% of the time. Using
cuckoo hashing in this way then starts to produce major advantages over linear
probing: no tombstones, higher load factor (which has knock-on implications
for resize latency), and constant time worst-case lookups.

More design details are here:

[https://github.com/ronomon/hash-table](https://github.com/ronomon/hash-table)
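
To illustrate the idea, here is a minimal sketch of the technique. This is not the actual @ronomon/hash-table code: the hash functions, bucket layout, and all names here are invented for the example. The key point is that every element sets one bit, derived from its hash, in its *first* bucket's 64-bit filter, whichever bucket it finally lands in, so a clear bit proves the key is in neither bucket and the fetch of the (likely uncached) second bucket is skipped:

```javascript
const NUM_BUCKETS = 64;
const BUCKET_SLOTS = 4;

// Two simple string hashes, for illustration only; a real table would
// use stronger, independent hash functions.
function hash1(key) {
  let h = 2166136261;
  for (let i = 0; i < key.length; i++) h = Math.imul(h ^ key.charCodeAt(i), 16777619) >>> 0;
  return h;
}
function hash2(key) {
  let h = 5381;
  for (let i = 0; i < key.length; i++) h = (Math.imul(h, 33) ^ key.charCodeAt(i)) >>> 0;
  return h;
}

function makeTable() {
  // Each bucket carries an 8-byte (64-bit) bloom filter with k = 1.
  return Array.from({ length: NUM_BUCKETS }, () => ({ filter: 0n, slots: [] }));
}

function filterBit(key) {
  // One bit per element (k = 1), taken from hash bits NOT used for the
  // bucket index, so elements in the same bucket set different bits.
  return 1n << BigInt((hash1(key) >>> 6) % 64);
}

function insert(table, key, value, depth = 0) {
  if (depth > 32) throw new Error('displacement chain too long; resize elided');
  const b1 = table[hash1(key) % NUM_BUCKETS];
  const b2 = table[hash2(key) % NUM_BUCKETS];
  // Always mark the element in the FIRST bucket's filter, even if it
  // ends up stored in the second bucket.
  b1.filter |= filterBit(key);
  for (const b of [b1, b2]) {
    const slot = b.slots.find((s) => s.key === key);
    if (slot) { slot.value = value; return; } // update in place
  }
  if (b1.slots.length < BUCKET_SLOTS) { b1.slots.push({ key, value }); return; }
  if (b2.slots.length < BUCKET_SLOTS) { b2.slots.push({ key, value }); return; }
  // Cuckoo displacement: evict a resident of b1 and reinsert it.
  const victim = b1.slots.shift();
  b1.slots.push({ key, value });
  insert(table, victim.key, victim.value, depth + 1);
}

function lookup(table, key) {
  const b1 = table[hash1(key) % NUM_BUCKETS];
  if ((b1.filter & filterBit(key)) === 0n) {
    return undefined; // Definitely absent: neither bucket is searched.
  }
  const b2 = table[hash2(key) % NUM_BUCKETS];
  for (const b of [b1, b2]) {
    const slot = b.slots.find((s) => s.key === key);
    if (slot) return slot.value;
  }
  return undefined; // filter false positive: both buckets searched
}
```

Note the asymmetry: the filter lives only in the first bucket, which the lookup has to touch anyway, so the common negative case costs one cache line instead of two.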

------
lucas_membrane
I read TFA's discussion of performance issues for hashmaps, and I see that
deletes are a serious hindrance. As I often use hashmaps for a single-pass
accumulation or transformation of data, I suspect that a hashmap without a
delete operation could be designed to give superior performance in many such
cases. Is this so, and are there any such implementations of hashmaps,
mutable but with no delete method?

~~~
samcheng
Bloom filters are an example of a hashing data structure that does not support
delete. So yes, such a tradeoff can be made.

Maybe some DHT implementations also make deletion hard? (In general, I'd much
rather mark something as invalid than delete it.)

~~~
kccqzy
I think GP is talking about improving performance just by omitting deletion
support in hash maps. For example the tombstone marking in the original
article won't be necessary. Experience tells me that for a host of data
structures (vectors, BSTs, hash maps and many others), omitting deletion
support can simplify the code a lot as well.
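
To make the simplification concrete, here is a sketch (hypothetical code, not from the article) of an insert-only linear-probing map. Because nothing is ever deleted, a probe can stop at the very first empty slot: there are no tombstone checks, no "keep probing past deleted entries" logic, and the table can never degrade into a tombstone-filled state.

```javascript
const CAPACITY = 16; // power of two; a real table would resize when full

// Simple string hash, for illustration only.
function hashCode(key) {
  let h = 0;
  for (let i = 0; i < key.length; i++) h = (Math.imul(h, 31) + key.charCodeAt(i)) >>> 0;
  return h;
}

function makeMap() {
  return new Array(CAPACITY).fill(null); // null means "never occupied"
}

function set(map, key, value) {
  for (let i = hashCode(key) & (CAPACITY - 1), n = 0; n < CAPACITY;
       i = (i + 1) & (CAPACITY - 1), n++) {
    if (map[i] === null || map[i].key === key) {
      map[i] = { key, value };
      return;
    }
  }
  throw new Error('table full; resize elided');
}

function get(map, key) {
  for (let i = hashCode(key) & (CAPACITY - 1), n = 0; n < CAPACITY;
       i = (i + 1) & (CAPACITY - 1), n++) {
    // An empty slot ends the probe immediately. This is only valid
    // because no entry is ever deleted, so an empty slot can never sit
    // in the middle of a probe chain.
    if (map[i] === null) return undefined;
    if (map[i].key === key) return map[i].value;
  }
  return undefined;
}
```

With deletion support, the `map[i] === null` early-exit in `get` would be wrong (a deleted entry could have broken the probe chain), which is exactly why tombstones exist in the first place.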

