

Using Uninitialized Memory for Fun and Profit (2008) - frontsideair
http://research.swtch.com/sparse

======
userbinator
Due to multilevel caches I doubt this would be so applicable to modern systems
- the index-set representation takes several times more memory and naturally
requires lots of random accesses, which are bad for caches. In some ways, it's
no longer a time-space tradeoff: smaller _is_ faster.

Also, initialising the space doesn't take so long; REP STOS on x86 is _fast_
since it can write an entire cache line at once:
[http://ptspts.blogspot.ca/2014/10/on-speed-of-memset.html](http://ptspts.blogspot.ca/2014/10/on-speed-of-memset.html)

~~~
prestonbriggs
I'll bet it's still faster in plenty of circumstances. It's hard to beat
superior asymptotic complexity.

~~~
frozenport
The 'magic' here is that you can quickly iterate over the members because they
live in 'dense'. In dense cases 'm' = 'n' and we lose. The apparently faster
clear is possible because the length ('n') is stored as an auxiliary value,
which is not unique to this scheme. As pointed out before, the extra logic
will break vectorization and result in poor performance.

~~~
prestonbriggs
You can't just assert "we win" or "we lose"; you need to measure the
different possible implementations for your particular application. In the
paper I wrote with Linda Torczon (cited by Russ Cox), we did exactly that. In
that application (a graph coloring register allocator), we were very sparse (n
was significantly less than m) and we cleared the working set quite often -
the win was pretty significant.

------
gizmo686
Python uses a similar trick in its internal allocator (pymalloc) [1]. Given a
memory address, pymalloc determines whether it controls it by computing what
would be the base of the arena containing that address, then reading the
arena's index (or trash, if the address isn't one of pymalloc's) at a
constant offset from the base. It then uses that index to check a vector of
all the arenas it controls.

[1][http://svn.python.org/projects/python/trunk/Misc/README.valg...](http://svn.python.org/projects/python/trunk/Misc/README.valgrind)

------
crispweed
Something in between traditional bit vectors and sparse sets is described
here:

[http://upcoder.com/9/fast-resettable-flag-vector/](http://upcoder.com/9/fast-resettable-flag-vector/)

It's designed for fast clearing, but the clear isn't actually constant time;
it's O(n) with a very small amortized constant. It's a bit more cache
friendly, though, I think, and gave us some nice speedups in practice.

------
mannykannot
I think I have seen this method used in a data structure called something like
a 'B.* set' or 'B.* index', but I can't remember the exact name. Does anyone
have a clue as to what I am (mis)remembering?

Update: I remembered that the algorithm had something to do with checking if
an item is not in a set, which quickly led me to Bloom Filter - not
particularly similar, as it turns out.

------
blt
Ahhhh, I love hacks like this :) But it is super branchy; I would definitely
profile before assuming it's an improvement.

------
im3w1l
Reading from uninitialized memory in C seems to be a bad idea:

[http://blog.frama-c.com/index.php?post/2013/03/13/indetermin...](http://blog.frama-c.com/index.php?post/2013/03/13/indeterminate-undefined)

~~~
crispweed
I didn't read all of that, but it seems like the issue there is that, if the
compiler sees you are reading from uninitialised memory, then it can
effectively 'choose' a value for the contents of that memory to suit its
optimisation purposes. Is that right?

If so, then this isn't actually a problem in the case of sparse sets. With
sparse sets we don't care what is in the uninitialised memory, and if the
compiler wants to get tricky and choose values for this memory (which I don't
think actually applies in this context) that doesn't change the correctness
of the data structure.

~~~
marvy
If you read it, you'll see that things are much worse than that.

~~~
crispweed
Ok, yes. Reading to the end of stuff is a good thing. :)

