
How Google Sparsehash achieves two bits of overhead per entry using sparsetable - Smerity
http://smerity.com/articles/2015/google_sparsehash.html
======
eutectic
Note that the maximum fill factor of hashtables can be pushed well above 90%,
with performance comparable to or better than standard open addressing, by
using Robin Hood collision resolution with backward-shift deletion. The
hashtable in the Rust standard library works like this, and there is an
argument to be made that it should become standard.
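
For the curious, here is a minimal sketch of the insertion step, assuming a
power-of-two capacity and raw uint64_t keys (resizing, deletion, and hashing
are omitted; this is my illustration, not the Rust stdlib's code):

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

class RobinHood {
    struct Slot { uint64_t key = 0; bool used = false; };
    std::vector<Slot> slots;

    size_t home(uint64_t key) const { return key & (slots.size() - 1); }

    // How far `pos` is from the slot `key` hashes to, wrapping around.
    size_t probe_dist(uint64_t key, size_t pos) const {
        return (pos + slots.size() - home(key)) & (slots.size() - 1);
    }

public:
    RobinHood() : slots(16) {}

    void insert(uint64_t key) {
        size_t pos = home(key);
        size_t dist = 0;
        while (true) {
            if (!slots[pos].used) {
                slots[pos] = {key, true};
                return;
            }
            // "Rob from the rich": if the resident is closer to its home
            // slot than we are to ours, it can afford to move, so swap it
            // out and carry on inserting the evicted key instead.
            size_t resident = probe_dist(slots[pos].key, pos);
            if (resident < dist) {
                std::swap(key, slots[pos].key);
                dist = resident;
            }
            pos = (pos + 1) & (slots.size() - 1);
            ++dist;
        }
    }
};
```

Deletion is the backward-shift part: instead of leaving a tombstone, the
entries after the deleted slot in the probe chain each slide back one slot,
which is what keeps probe distances short over time.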

Of course, this doesn't do away with the need to resize, so it might not be
appropriate for low latency or memory constrained systems. I wonder whether a
tree-based data structure like a crit-bit tree or a HAMT might be able to do
similar things to sparsehash with better performance.

~~~
detrino
My understanding is that Robin Hood hashing has worse performance than linear
probing due to greater cache misses.

If you are looking for low memory overhead ordered sets/maps, then B-Trees can
provide that.

~~~
eutectic
Why would it give greater cache misses? My understanding is that it requires
fewer, because it reduces the average probe distance and obviates the need for
tombstones and the attendant performance degradation.

~~~
detrino
Sorry, it's early, I was thinking about cuckoo hashing.

------
im3w1l
tl;dr: Open addressing. Group adjacent slots into buckets. Every bucket
consists of an occupancy bitmap (1 bit overhead per element) and a pointer
(ptrsize / nElementsPerBucket bits overhead per element) to the elements in
the bucket.
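
To make the lookup concrete, here is a minimal sketch of one such bucket,
assuming C++20 (for std::popcount) and 32 slots per group; the struct and its
names are illustrative, not Sparsehash's actual internals:

```cpp
#include <bit>       // std::popcount (C++20)
#include <cstdint>
#include <optional>
#include <vector>

// One 32-slot group: a bitmap of which slots are occupied, plus a densely
// packed array holding only the occupied slots' values, in slot order.
struct SparseGroup {
    uint32_t bitmap = 0;     // bit i set => slot i is occupied
    std::vector<int> items;  // packed values for the occupied slots

    // Rank: how many occupied slots come strictly before `slot` (0..31).
    int rank(int slot) const {
        uint32_t below = bitmap & ((1u << slot) - 1);
        return std::popcount(below);
    }

    bool occupied(int slot) const { return (bitmap >> slot) & 1; }

    std::optional<int> get(int slot) const {
        if (!occupied(slot)) return std::nullopt;
        return items[rank(slot)];
    }

    void set(int slot, int value) {
        int r = rank(slot);
        if (occupied(slot)) {
            items[r] = value;
        } else {
            items.insert(items.begin() + r, value);  // shift packed tail
            bitmap |= 1u << slot;
        }
    }
};
```

The popcount over the bits below a slot converts the sparse slot index into a
dense index into the packed array, so an empty slot costs only its bitmap bit.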

~~~
efaref
Surely that's 1 bit of overhead per slot rather than per element (which, if
you're aiming for 50% occupancy, means at least 2 bits per element).

Or are these magic bits where zeroes are free?

~~~
im3w1l
Oops, yeah you are right. Should have said per slot, not per element.

------
dspeyer
So we want to insert a value. We find an empty slot. We find the bucket it's
in. Do we have to realloc the bucket and move all downstream data down by one?
That sounds slow.

~~~
Someone
Most of the time, 'all the data' is just references to the data, not the data
itself.

Also, I would not reallocate, but allocate at max size or, possibly, half size
and grow if needed.

With that change, once you access the array, you likely have all pointers to
shift in your level 1 cache.

Because of that, I expect it to be plenty fast enough.

(Hm, are there CPUs that have instructions for shifting parts of cache lines
around?)

------
TheLoneWolfling
Any idea how much space overhead malloc has with entries as small as this?
((1 to 4) * 8 bytes)

Seems to me it could easily be a significant amount of overhead.

~~~
kuschku
Most malloc implementations use a set of segregated explicit free lists, which
can be implemented with as little as 2 bits of overhead per chunk, although,
due to alignment, this ends up being 2 times the word size: there is a bit at
the end of the chunk, and another at the beginning, that tell whether it is
free. If the bit is set to "free", then the free space inside the chunk itself
is reused for offset pointers to the next and previous free chunks of the same
size class.
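
Roughly, the chunk layout being described looks like this (a conceptual
sketch; the field names are mine, not any particular allocator's internals):

```cpp
#include <cstddef>

// Boundary tag at the start of every chunk. Because chunks are aligned,
// the low bit of the size is always zero and can be reused as a flag.
struct ChunkHeader {
    size_t size_and_free;  // low bit: 1 = free; remaining bits: chunk size

    bool is_free() const { return size_and_free & 1; }
    size_t size() const { return size_and_free & ~static_cast<size_t>(1); }
};

// While a chunk is free, its payload bytes are dead space, so they are
// reused to link it into the doubly linked free list for its size class.
struct FreeChunk {
    ChunkHeader header;
    FreeChunk* next;  // next free chunk in the same size class
    FreeChunk* prev;  // previous free chunk in the same size class
    // A footer (a copy of the header) sits at the end of the chunk, so the
    // neighbouring chunk can find this chunk's start when coalescing.
};
```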

~~~
TheLoneWolfling
So in other words, up to twice as much overhead as the storage allocated for
the actual elements? Yow!

This makes this sort of scheme decidedly less attractive.

~~~
kuschku
Well, sometimes you can actually put those bits into the padding, but often
you cannot.

This is how many of the simpler malloc implementations (the ones that do not
mmap directly) work.

So, remember: if you want to malloc 20 ints, don't call malloc(sizeof(int)) 20
times; call malloc(20 * sizeof(int)) once and treat the result as an int[].
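
For example (a trivial sketch; error handling omitted):

```cpp
#include <cstdlib>

int main() {
    // 20 separate allocations: each 4-byte int pays the per-chunk
    // metadata and alignment costs described above.
    int* separate[20];
    for (int i = 0; i < 20; ++i)
        separate[i] = static_cast<int*>(std::malloc(sizeof(int)));

    // One allocation: the metadata cost is paid once for all 20 ints.
    int* together = static_cast<int*>(std::malloc(20 * sizeof(int)));

    for (int i = 0; i < 20; ++i) std::free(separate[i]);
    std::free(together);
}
```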

------
signa11
In case folks are interested in this topic, I would heartily recommend Mihai
Pătraşcu's blog for some theory :)

~~~
Smerity
Thank you for bringing up Mihai Pătraşcu :) I lightly nudged readers towards
succinct data structures in the article. His work was exactly the kind I meant
for people to explore - the search for extreme lower bounds in computer
science!

I am saddened to hear that he passed away in 2012 - I had no clue. He gave so
much in the time he was active. It is bittersweet that his obituary[1]
contains open problems he hoped others would solve.

[1]: https://docs.google.com/file/d/0B8ttd1KbGd3EWktsR29qNVdNVEE/edit

