We use EWAH (another compressed bitset from Daniel Lemire) heavily at work to store inverted indices in a compressed form. Apart from lowering the memory footprint (vs. storing sorted integers), AND/OR operations are much faster when you want to do things like intersecting two bitsets to find the overlapping members.
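A rough sketch of that intersection use case, assuming the JavaEWAH library (com.googlecode.javaewah.EWAHCompressedBitmap) and its bitmapOf/and/cardinality calls as I recall them; check the library's docs for the exact API:

    import com.googlecode.javaewah.EWAHCompressedBitmap;

    public class IntersectPostings {
        public static void main(String[] args) {
            // Two posting lists (document ids), one per term.
            // EWAH expects bits to be set in increasing order.
            EWAHCompressedBitmap term1 = EWAHCompressedBitmap.bitmapOf(2, 3, 1000, 1001, 500000);
            EWAHCompressedBitmap term2 = EWAHCompressedBitmap.bitmapOf(3, 1001, 250000, 500000);

            // Intersection: documents containing both terms.
            EWAHCompressedBitmap both = term1.and(term2);

            System.out.println("matches: " + both.cardinality());
            for (int docId : both) {   // iterates over the positions of the set bits
                System.out.println(docId);
            }
        }
    }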
You should checkout some of the other impressive compression related stuff that Daniel has open sourced: https://github.com/lemire
There are many flavours of bitsets. Sometimes it's not the largest element that's the limit but the cardinality. You should also take note of the cost of clearing a bitset: this cost is almost never included in performance considerations. For instance, when you're making small random subsets (of size k) of a set (of size n), you might be tempted to use a bitset and recycle it between subsets. If you do, you typically end up with O(k) work for the sets but O(n) for each clear.
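A minimal sketch of that pitfall and the usual workaround, using java.util.BitSet for illustration: remember the k positions you set and clear only those, instead of wiping the whole bitset.

    import java.util.BitSet;
    import java.util.Random;

    public class RecycledBitset {
        // Draw 'rounds' random subsets of size k from {0, ..., n-1}, reusing one bitset.
        static void sample(int n, int k, int rounds) {
            BitSet seen = new BitSet(n);
            int[] picked = new int[k];          // remember what we set so we can undo it cheaply
            Random rng = new Random(42);

            for (int r = 0; r < rounds; r++) {
                int count = 0;
                while (count < k) {
                    int candidate = rng.nextInt(n);
                    if (!seen.get(candidate)) {  // O(1) membership test
                        seen.set(candidate);
                        picked[count++] = candidate;
                    }
                }
                // ... use the subset in 'picked' ...

                // seen.clear() would cost O(n); clearing only the k bits we set costs O(k).
                for (int i = 0; i < k; i++) {
                    seen.clear(picked[i]);
                }
            }
        }

        public static void main(String[] args) {
            sample(1_000_000, 10, 1000);
        }
    }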
How do these compare, space- and performance-wise, with Judy arrays, which are 256-ary trees whose nodes also distinguish between sparse and dense subsets?
https://en.wikipedia.org/wiki/Judy_array
So this is basically just an array of booleans, and each element has two states.
But are there equivalent fancy compressed data structures for more than two states? (i.e. like a compressed array of enumerations) Or does that not make sense for some reason?
Semantically, it's a boolean array. Or, to look at it another way, a set of integers, where a[n] is true if n is in the set.
This representation, and the various others the paper compares it to, are useful when the array is fairly sparse - the introduction suggests the range 1/10000 to 1/64 of the bits being set. They're a lot more compact, but still support key operations - ORing, ANDing, testing a particular bit, enumerating all set bits - fairly efficiently.
The major motivating use of structures like this is for bitmap indexes in databases. Bitmap indexes store a single bit of data per record - it could be "is this person a current employee?", "has this user ever bought anything?", etc. If you can encode these compactly, you can use them to very quickly winnow vast amounts of data down to a small number of interesting records, which you can then retrieve one by one.
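A toy illustration of that winnowing step, using java.util.BitSet as an uncompressed stand-in for a bitmap index (the record ids and predicates here are made up; the compressed structures in the paper support the same operations):

    import java.util.BitSet;

    public class BitmapIndexWinnow {
        public static void main(String[] args) {
            int numRecords = 1_000_000;

            // One bit per record for each predicate.
            BitSet currentEmployee = new BitSet(numRecords);
            BitSet hasPurchased = new BitSet(numRecords);

            currentEmployee.set(10);
            currentEmployee.set(500_000);
            hasPurchased.set(10);
            hasPurchased.set(42);
            hasPurchased.set(500_000);

            // Winnow: keep only records satisfying both predicates.
            BitSet candidates = (BitSet) currentEmployee.clone();
            candidates.and(hasPurchased);

            // Enumerate the surviving record ids and fetch them one by one.
            for (int id = candidates.nextSetBit(0); id >= 0; id = candidates.nextSetBit(id + 1)) {
                System.out.println("fetch record " + id);
            }
        }
    }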
It's covered pretty well in the paper they link to:
"On the Java platform we used for our experiments, we estimate that we can compute and write bitwise ORs at 700 million 64-bit words per second. If we further compute the cardinality of the result as we produce it, our estimated speed falls to about 500 million words per second. That is, we suffer a speed penalty of about 30% because we maintain the cardinality. (We find that this penalty effectively disappears when using an equivalent optimized implementation written in C.) In contrast, competing methods like WAH and Concise must spend time to decode
the word type before performing a single bitwise operation. These checks may cause expensive branch mispredictions or impair superscalar execution."
They also discuss why this is so (specialized instructions in modern processors, and modern processors being superscalar, allowing execution of multiple such operations per cycle). All of this is in the section that begins "Bitmap vs. Bitmap".
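A sketch of what that kernel looks like in plain Java (not the paper's code): ORing 64-bit words while accumulating the cardinality with Long.bitCount, which the JIT compiles down to the processor's population-count instruction.

    public class OrWithCardinality {
        // OR two arrays of 64-bit words into 'out' and return the population count
        // of the result, i.e. "compute the cardinality of the result as we produce it".
        static int orAndCount(long[] a, long[] b, long[] out) {
            int cardinality = 0;
            for (int i = 0; i < out.length; i++) {
                long word = a[i] | b[i];
                out[i] = word;
                cardinality += Long.bitCount(word); // popcount on modern CPUs
            }
            return cardinality;
        }

        public static void main(String[] args) {
            long[] a = {0b1010L, -1L, 0L};
            long[] b = {0b0110L, 0L, 1L << 63};
            long[] out = new long[3];
            System.out.println(orAndCount(a, b, out)); // 3 + 64 + 1 = 68
        }
    }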
One restriction is that random insertion into the bitmap is an O(N) operation. You typically expect to construct the bitmap in a sequential fashion and, once constructed, treat it as immutable.