You should check out some of the other impressive compression-related stuff that Daniel has open-sourced: https://github.com/lemire
There are many flavours of bitsets. Sometimes it's not the largest element that's the limit but the cardinality. You should also take note of the cost of clearing a bitset: this cost is almost never included in performance considerations. For instance, when you're making small random subsets (of size k) of a set (of size n), you might be tempted to use a bitset and recycle it between subsets.
You typically end up with O(k) for the inserts but O(n) for the clear.
Pick your poison ;)
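One way around that trade-off, as a minimal sketch (a hypothetical helper, not from the paper): remember the k positions you set, and on reset unset only those, so clearing costs O(k) instead of O(n) at the price of a little extra memory.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hypothetical recyclable bitset: tracks which positions were set
// so that reset() costs O(k) instead of BitSet.clear()'s O(n).
class RecyclableBitSet {
    private final BitSet bits;
    private final List<Integer> touched = new ArrayList<>();

    RecyclableBitSet(int n) { bits = new BitSet(n); }

    void set(int i) {
        if (!bits.get(i)) { bits.set(i); touched.add(i); }
    }

    boolean get(int i) { return bits.get(i); }

    // Clear only the bits we actually set: O(k), not O(n).
    void reset() {
        for (int i : touched) bits.clear(i);
        touched.clear();
    }
}
```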
So this is basically just an array of booleans, and each element has two states.
But are there equivalent fancy compressed data structures for more than two states? (i.e. like a compressed array of enumerations) Or does that not make sense for some reason?
This representation, and the various others the paper compares it to, are useful when the array is fairly sparse - the introduction suggests the range 1/10000 to 1/64 of the bits being set. They're a lot more compact, but still support key operations - ORing, ANDing, testing a particular bit, enumerating all set bits - fairly efficiently.
The major motivating use of structures like this is for bitmap indexes in databases. Bitmap indexes store a single bit of data per record - it could be "is this person a current employee?", "has this user ever bought anything?", etc. If you can encode these compactly, you can use them to very quickly winnow vast amounts of data down to a small number of interesting records, which you can then retrieve one by one.
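As an illustrative sketch of that access pattern (using plain java.util.BitSet rather than a compressed representation; the predicate names are made up): each predicate gets one bit per record, and ANDing the bitmaps winnows everything down to the matching record ids.

```java
import java.util.BitSet;

public class BitmapIndexDemo {
    public static void main(String[] args) {
        int numRecords = 8;

        // One bit per record for each predicate (hypothetical predicates).
        BitSet isCurrentEmployee = new BitSet(numRecords);
        BitSet hasPurchased = new BitSet(numRecords);

        isCurrentEmployee.set(1); isCurrentEmployee.set(3); isCurrentEmployee.set(5);
        hasPurchased.set(3); hasPurchased.set(5); hasPurchased.set(6);

        // AND the bitmaps to winnow to records matching both predicates.
        BitSet both = (BitSet) isCurrentEmployee.clone();
        both.and(hasPurchased);

        // Enumerate the surviving record ids (3 and 5) for retrieval.
        for (int i = both.nextSetBit(0); i >= 0; i = both.nextSetBit(i + 1)) {
            System.out.println("record " + i);
        }
    }
}
```

A compressed structure like Roaring supports the same AND and enumeration operations, just on a far more compact representation.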
"On the Java platform we used for our experiments, we estimate that we can compute and write bitwise ORs at 700 million 64-bit words per second. If we further compute the cardinality of the result as we produce it, our estimated speed falls to about 500 million words per second. That is, we suffer a speed penalty of about 30% because we maintain the cardinality. (We find that this penalty effectively disappears when using an equivalent optimized implementation written in C.) In contrast, competing methods like WAH and Concise must spend time to decode the word type before performing a single bitwise operation. These checks may cause expensive branch mispredictions or impair superscalar execution."
They also discuss why this is so (specialized instructions in modern processors, and the fact that modern processors are superscalar, allowing execution of multiple such operations per cycle). All of this is in the section that begins "Bitmap vs. Bitmap".
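The inner loop they're describing amounts to something like this sketch (illustrative only, not the paper's actual code): OR 64-bit words and keep a running cardinality with Long.bitCount, which modern JVMs compile to the POPCNT instruction.

```java
public class OrWithCardinality {
    // OR two word arrays into out, maintaining the cardinality
    // (number of set bits) of the result as we go.
    static long orAndCount(long[] a, long[] b, long[] out) {
        long cardinality = 0;
        for (int i = 0; i < out.length; i++) {
            long w = a[i] | b[i];
            out[i] = w;
            cardinality += Long.bitCount(w); // POPCNT on modern hardware
        }
        return cardinality;
    }

    public static void main(String[] args) {
        long[] a = { 0b1010L, 0L };
        long[] b = { 0b0110L, 1L };
        long[] out = new long[2];
        // 0b1010 | 0b0110 = 0b1110 (3 bits), 0 | 1 = 1 (1 bit)
        System.out.println(orAndCount(a, b, out)); // prints 4
    }
}
```

Because every word is treated identically, there's no per-word branching on a "word type" the way WAH or Concise require, which is the branch-misprediction point the quote is making.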