

Using Uninitialized Memory for Fun and Profit - l0stman
http://research.swtch.com/2008/03/using-uninitialized-memory-for-fun-and.html

======
chmike
That data structure can actually be used for a log-structured database, where
the sparse vector is the log file and the dense vector is the record index.

The index of the dense vector is then the record identifier, and the dense
vector contains the offset of the valid record in the log database.

The record identifier is a handle to the record data. It can be reused when
the record is deleted but will remain the same for the lifetime of the record.

The dense vector is very compact and may be stored and retrieved efficiently,
even if stored in the log itself. The record key index stores the association
between the key and the record identifier.

The GC or crash recovery process can easily locate valid records by checking
if their offset matches the one found in the dense index.

So this data structure is not just of historical interest. Thanks for the
link.

------
antirez
An interesting implementation of this trick can be found in the LZF
compression lib: <http://oldhome.schmorp.de/marc/liblzf.html>

Basically, the hash table used to match the current input against
already-seen input is left uninitialized, since the two values are compared
byte by byte anyway to make sure there is a real match.

------
jrockway
malloc + memset takes about .24 seconds per gig on my machine. So be sure that
your dataset is really, really sparse before you try this.

~~~
alttab
I'm not sure if your comment suggests that .24 seconds is fast enough not to
worry about it, or not fast enough.

In my time at IBM doing very performance-oriented C development, saying that
a server operation only took .24 seconds would get you laughed out of the
development meeting, as the operations were designed to take 12
_microseconds_.

~~~
ramchip
Allocating and initializing a huge array of memory at start-up can take a
second, and subsequent server operations take 12 microseconds. Nobody's
talking about allocating a gig of memory every time a server gets a request.

~~~
alttab
One thing to consider is that we also weren't using small development boxes.
These were Power7 zSeries servers, usually with 16 cores (8 in hardware) and
up to 32GB of RAM. You would be surprised what these machines are capable of.
Of course, I'm not saying I can allocate a gig in .24 seconds and set it to
zero, but I'm not saying it's not possible either (I simply don't know, as I
no longer have access to these machines).

------
barrkel
Liveness analysis - the motivating example - typically operates on a set with
one element per local variable. Most practical routines have fewer than 32 or
64 variables, though, so in practice a single 32-bit or 64-bit word is
better. Also, you often have to compute other set operations, such as union,
intersection and difference, rather than just membership and iteration.

~~~
rsc
You still have to do something for the functions with many local variables. In
those cases you have little choice but to allow all N variables into the set,
even if comparatively few are in any particular set at one time.

Sparse sets are also a great way to implement NFA state sets, where again you
have a large number of possible set members but most sets are small, and you
don't want to pay the O(all possible states) cost over and over.

