
How Fast is Bit Packing? - ANTSANTS
http://lemire.me/blog/archives/2012/03/06/
======
barrkel
I've implemented bit packing because I wanted to store a large array of
integers (actually, indexes to another array) in memory and have efficient
random access over them.
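For the curious, random access into a bit-packed array only takes a shift, a mask, and possibly a second word read when a value straddles a word boundary. A minimal sketch (my own, not the blog author's code), assuming values are packed little-endian into 32-bit words with bit width b:

```c
#include <stdint.h>
#include <stddef.h>

/* Read the i-th b-bit value from a little-endian bit-packed array.
   A value may straddle two 32-bit words, hence the second read. */
uint32_t packed_get(const uint32_t *packed, int b, size_t i) {
    size_t bit = i * (size_t)b;
    size_t word = bit >> 5;
    int shift = (int)(bit & 31);
    uint32_t mask = (b < 32) ? (1u << b) - 1 : 0xFFFFFFFFu;
    uint32_t v = packed[word] >> shift;
    if (shift + b > 32)
        v |= packed[word + 1] << (32 - shift);
    return v & mask;
}

/* Write the i-th b-bit value (only the low b bits of value are stored). */
void packed_set(uint32_t *packed, int b, size_t i, uint32_t value) {
    size_t bit = i * (size_t)b;
    size_t word = bit >> 5;
    int shift = (int)(bit & 31);
    uint32_t mask = (b < 32) ? (1u << b) - 1 : 0xFFFFFFFFu;
    packed[word] = (packed[word] & ~(mask << shift))
                 | ((value & mask) << shift);
    if (shift + b > 32) {
        uint32_t hi_mask = (1u << (shift + b - 32)) - 1;
        packed[word + 1] = (packed[word + 1] & ~hi_mask)
                         | ((value & mask) >> (32 - shift));
    }
}
```

That's enough for O(1) random reads and writes without any chunked decompression.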

But what's implemented here is more like a compression/decompression
routine, because it's predicated on the idea that you want to read/write long
runs of numbers. If that's the case, it's probably better to use a cheap
and fast stream compression algorithm: you'll lose some speed, but likely
gain significant space (depending on the data, of course).

~~~
ANTSANTS
If you look at the code[1], there is basically a special-cased/"unrolled" pack
and unpack routine for each integer "bit size" that is not already machine
addressable. Each routine seems (disclaimer: I didn't check _all_ of them) to
operate on a chunk whose size is the lowest common multiple of the "bit size"
and the word size.

I believe he wrote the rest of the program as a benchmark for his "chunk
oriented" routines using large amounts of data. It doesn't make the pack and
unpack routines any less suited to processing small amounts of data.
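To make the chunk shape concrete: 32 values of b bits occupy exactly b 32-bit words (32*b bits), which is always a multiple of lcm(b, 32), so a pack/unpack pair per bit width can process aligned chunks with no leftover bits. A loop version as a sketch (the linked code hand-unrolls one such pair per bit width instead):

```c
#include <stdint.h>

/* Pack 32 b-bit values into exactly b 32-bit words (out must hold b words). */
void pack32(const uint32_t *in, uint32_t *out, int b) {
    uint32_t mask = (b < 32) ? (1u << b) - 1 : 0xFFFFFFFFu;
    for (int w = 0; w < b; w++) out[w] = 0;
    int bit = 0;
    for (int i = 0; i < 32; i++, bit += b) {
        int word = bit >> 5, shift = bit & 31;
        uint32_t v = in[i] & mask;
        out[word] |= v << shift;
        if (shift + b > 32)                 /* value straddles a word */
            out[word + 1] |= v >> (32 - shift);
    }
}

/* Inverse: unpack b words back into 32 values. */
void unpack32(const uint32_t *in, uint32_t *out, int b) {
    uint32_t mask = (b < 32) ? (1u << b) - 1 : 0xFFFFFFFFu;
    int bit = 0;
    for (int i = 0; i < 32; i++, bit += b) {
        int word = bit >> 5, shift = bit & 31;
        uint32_t v = in[word] >> shift;
        if (shift + b > 32)
            v |= in[word + 1] << (32 - shift);
        out[i] = v & mask;
    }
}
```

Unrolling replaces the `bit`/`word`/`shift` arithmetic with compile-time constants, which is where much of the speed comes from.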

With a little work, you could use those routines to allow random access to a
large packed array with a small buffer. In the simplest implementation, check
whether the integer you're trying to address is already in the buffer; if not,
unpack the chunk that contains it and overwrite the buffer with it.
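A sketch of that simplest implementation (all names hypothetical, and self-contained rather than calling the linked routines), with a one-chunk buffer of 32 decoded values:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical reader over packed data laid out in 32-value chunks,
   b words per chunk, with a single decoded-chunk buffer. */
typedef struct {
    const uint32_t *packed; /* chunked packed data */
    int b;                  /* bits per value */
    size_t cached;          /* chunk index in buf, SIZE_MAX = empty */
    uint32_t buf[32];       /* decoded values of that chunk */
} PackedReader;

static void decode_chunk(PackedReader *r, size_t chunk) {
    const uint32_t *in = r->packed + chunk * (size_t)r->b;
    uint32_t mask = (r->b < 32) ? (1u << r->b) - 1 : 0xFFFFFFFFu;
    for (int i = 0, bit = 0; i < 32; i++, bit += r->b) {
        uint32_t v = in[bit >> 5] >> (bit & 31);
        if ((bit & 31) + r->b > 32)
            v |= in[(bit >> 5) + 1] << (32 - (bit & 31));
        r->buf[i] = v & mask;
    }
    r->cached = chunk;
}

/* Random access: decode the containing chunk only on a buffer miss. */
uint32_t reader_get(PackedReader *r, size_t i) {
    size_t chunk = i / 32;
    if (chunk != r->cached)
        decode_chunk(r, chunk);
    return r->buf[i % 32];
}
```

Sequential scans then hit the buffer 31 times out of 32, while truly random access degrades to one chunk decode per read; a few buffers with an LRU policy would soften that.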

[1] <http://pastebin.com/ugGnk00p>

~~~
barrkel
You don't need to use his routines; they're trivial to write. I expect he
gets most of his performance from unrolling and cache effects.

My point is rather that if you are in a position to bulk read / write, there
is probably a better option.

~~~
ANTSANTS
I think the author knows there are better options for compressing large
streams, but still chose to use the situation as a simple benchmark to test
the performance of his (admittedly trivial) packing and unpacking routines.

You're right, however, that the benchmark is biased by cache effects and the
like that come from processing one large stream. A more honest (and useful)
benchmark would test the performance of packed and unpacked integers under a
random access pattern.
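Something along these lines (my own sketch, with illustrative names and sizes): sum the same values through a scrambled index order, once from a plain array and once through per-element packed reads, and compare times.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* One b-bit read from a little-endian bit-packed array. */
static uint32_t packed_get(const uint32_t *p, int b, size_t i) {
    size_t bit = i * (size_t)b;
    uint32_t mask = (b < 32) ? (1u << b) - 1 : 0xFFFFFFFFu;
    uint32_t v = p[bit >> 5] >> (bit & 31);
    if ((bit & 31) + b > 32)
        v |= p[(bit >> 5) + 1] << (32 - (bit & 31));
    return v & mask;
}

/* Sum n values through a scrambled index pattern, once from a plain
   array and once from a packed one; print the times. Returns 1 if
   both sums agree (a sanity check on the packing). */
int bench_random_access(size_t n, int b) {
    uint32_t mask = (b < 32) ? (1u << b) - 1 : 0xFFFFFFFFu;
    uint32_t *plain  = malloc(n * sizeof *plain);
    uint32_t *packed = calloc(n * (size_t)b / 32 + 2, sizeof *packed);
    size_t *idx = malloc(n * sizeof *idx);
    for (size_t i = 0; i < n; i++) {
        plain[i] = (uint32_t)(i * 2654435761u) & mask;
        size_t bit = i * (size_t)b;
        packed[bit >> 5] |= plain[i] << (bit & 31);
        if ((bit & 31) + b > 32)
            packed[(bit >> 5) + 1] |= plain[i] >> (32 - (bit & 31));
        idx[i] = (i * 48271u) % n; /* crude scramble; use a real shuffle */
    }
    uint64_t s_plain = 0, s_packed = 0;
    clock_t t0 = clock();
    for (size_t i = 0; i < n; i++) s_plain += plain[idx[i]];
    clock_t t1 = clock();
    for (size_t i = 0; i < n; i++) s_packed += packed_get(packed, b, idx[i]);
    clock_t t2 = clock();
    printf("plain: %ld ticks, packed: %ld ticks\n",
           (long)(t1 - t0), (long)(t2 - t1));
    free(plain); free(packed); free(idx);
    return s_plain == s_packed;
}
```

Once n outgrows cache, the packed array's smaller footprint should start to pay for the extra shifting, which is the trade-off the packed representation is betting on.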

