I do wish it could be used header-only out of the box, but still a great library.
For my application, the problem with implementations that use more traditional tombstone algorithms is that removing items from the table doesn't decrease the load, so at some point the entire table has to be rebuilt. In my case, worst-case performance is far more important than average performance.
dense_hash_map definitely behaves as described above, and (from a quick look at the code) I believe khash does as well.
I haven't looked at ska::flat_hash_map but thanks for the tip.
What is your access pattern? Are insertions and deletions frequently interleaved, or do you insert a batch and then delete another batch? It would be interesting to have a micro-benchmark measuring how the F14 strategy works in practice.
> I believe khash does as well.
Yes, khash uses the traditional tombstone solution. It rehashes in the same space when there are too many tombstones.
See the steady_state_stats test at https://github.com/facebook/folly/blob/master/folly/containe...
It's basically random, but generally they are heavily interleaved, and most of the time the numbers of insertions and deletions are approximately equal over some period of time (~seconds). This is for keeping track of orders while processing a depth-of-book feed, in case you're familiar with that sort of thing.
F14 uses some kind of reference-counting scheme for tombstones, described in the article. I don't fully understand it, but it definitely seems different from the standard tombstone-based algorithm, which I do understand. In any case, when using something like dense_hash_map, I see huge spikes in processing time (due to rehashing) that I don't see when using F14. I also resize the hash tables up front to avoid having to do so later on, but of course for my use case that doesn't help with implementations that use a traditional tombstone algorithm unless I use ridiculously large initial sizes.
You can't win them all, I guess…
Also, TIL that invalidating iterators does not actually invalidate references to keys and values:
> The standard guarantees reference stability: References and pointers to the keys and values in the hash table must remain valid until the corresponding key is removed.
If you haven't seen it before you will appreciate https://accidentallyquadratic.tumblr.com/post/153545455987/r...
Both of them come in two variants: one guarantees that references to keys and values remain valid even when iterators are invalidated, and another, faster one in which iterator invalidation also invalidates references to keys and values.
IMHO it looks like the Folly team saw the Abseil presentation last year and said, "That's a good idea, we should do that."
Abseil presentation: https://www.youtube.com/watch?v=ncHmEUmJZf4 It's super interesting.
Edit: Or at least I think that's relevant to this announcement. I don't know enough about it to say for sure.
For a hash table implementation, it's even more relevant that out of 180 people in a room, chances are > 99.999% that two people share a birthday (meaning collisions), whereas the chance that 15 people in that group are born in the same fortnight is still very small.
It's still fairly likely, but far from the ~1 in 10^23 odds of having NO shared birthdays among 180 people.
They seem to delegate every method in the book to the table_ member. Still, it's weird that they are deriving from std::unordered_map in the first place.
the implementation class F14Table - https://github.com/facebook/folly/blob/master/folly/containe...