I like that this algorithm shows the strength of randomization. You simply cannot get below O(log n) bits with a deterministic counter.
It's also neat that you can trivially extend this to O(log log ... log n) bits by making the increment probability 2^(-2^(2^(...^X))), all the way down to O(log* n).
To those questioning the usefulness of this, note that it's essentially floating-point arithmetic, where X is the exponent (and eps dictates the mantissa size). That means you can do all sorts of unbiased operations just as you would with floating-point values. For example, you can add counters A and B, say A=10 and B=8, by taking C="A+B"=max(A,B)+{1 with prob 2^-|A-B|}=10+{1 with prob 2^-2}, which keeps E[2^C]=2^A+2^B.
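Here's a rough Python sketch of both the probabilistic increment and that addition rule, assuming the simplest variant where a counter X estimates roughly 2^X (I'm ignoring the article's eps/base parameter):

    import random

    def bump(x):
        # Probabilistic increment: with probability 2^-x, move to x+1.
        if random.random() < 2.0 ** -x:
            x += 1
        return x

    def add(a, b):
        # Combine two counters so that E[2^C] = 2^A + 2^B.
        hi, lo = max(a, b), min(a, b)
        if random.random() < 2.0 ** (lo - hi):   # prob 2^-|A-B|
            hi += 1
        return hi

    x = 0
    for _ in range(100_000):
        x = bump(x)
    print(x, 2 ** x)   # 2^x should land in the right ballpark of 100000

With A=10 and B=8 the add() rule reduces to the max(A,B) + {1 with prob 2^-2} example above.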
Honest question: what's the use of compressing the counter below log(n) bits? Even if you were counting something on the order of 10^30, log n is only around 100 bits. Wouldn't storing the algorithm take more space than you'd save on the counter?
For example, I might want an approximate counter of the number of reads of each row of a cache. Assume my cache has a lot (millions?) of rows, each of which stores a small amount of data. With the given algorithm, I could store the counter in a single byte per cache row. Without it, I'm probably looking at 4 or 8 bytes. It adds up.
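A quick Python sketch of what that might look like (row count and names are made up; same 2^-x increment rule as above), one byte per row:

    import random

    NUM_ROWS = 1_000_000
    counts = bytearray(NUM_ROWS)   # one approximate counter per row, ~1 MB total

    def record_read(row):
        x = counts[row]
        if x < 255 and random.random() < 2.0 ** -x:
            counts[row] = x + 1

    def estimate_reads(row):
        return 2 ** counts[row] - 1   # Morris-style estimate

A byte-sized exponent covers far more reads than a row could ever see, so in practice you'd trade some of that range for accuracy via the eps/base parameter.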
As an aside, the article presents an alternative in which you have multiple simultaneous counters and take the median of the results to get finer bounds. I wonder how this compares with just using a single counter containing more bits and modifying the probability that you "increment" the counter?
[Edit: I realize the answer to my question about modifying the probability of a single counter may be that it's hard to get an unbiased random source for probabilities of 1/3, 1/5, etc.]
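For what it's worth, the usual way to get finer granularity out of a single counter is to use a base b = 1 + eps instead of 2: increment with probability b^-X and estimate n as (b^X - 1)/(b - 1), which is unbiased for any b > 1. A standard PRNG gives you those probabilities directly by comparing a uniform float against a threshold; it's only with raw coin flips that non-power-of-two probabilities get awkward. A hedged sketch (not the article's exact construction) comparing the two approaches:

    import random
    import statistics

    def bump_base(x, b):
        # Increment with probability b^-x; smaller b => finer granularity, more bits.
        if random.random() < b ** -x:
            x += 1
        return x

    def estimate(x, b):
        return (b ** x - 1) / (b - 1)   # unbiased estimate for any base b > 1

    N, b, k = 100_000, 1.1, 9

    single = 0          # one fine-grained counter (base 1.1 instead of 2)
    many = [0] * k      # k coarse counters, combined via the median as in the article

    for _ in range(N):
        single = bump_base(single, b)
        many = [bump_base(x, 2.0) for x in many]

    print(estimate(single, b))
    print(statistics.median(estimate(x, 2.0) for x in many))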
The algorithm is reusable, i.e. you can count multiple events with the same state. This way you get an array of bytes, or even half-bytes, instead of an array of very long integers. That can be useful, especially on embedded systems.
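If space really matters, here's a sketch of the half-byte version (sizes are hypothetical; on a real embedded target this would be C, but the bit-twiddling is the same), packing two 4-bit exponents per byte. With base 2, a 4-bit exponent already covers counts up to around 2^15:

    import random

    packed = bytearray(512)   # 1024 nibble-sized counters in 512 bytes

    def bump(i):
        byte = packed[i // 2]
        x = byte >> 4 if i % 2 == 0 else byte & 0x0F
        if x < 15 and random.random() < 2.0 ** -x:
            x += 1
        if i % 2 == 0:
            packed[i // 2] = (x << 4) | (byte & 0x0F)
        else:
            packed[i // 2] = (byte & 0xF0) | x

    def estimate(i):
        byte = packed[i // 2]
        x = byte >> 4 if i % 2 == 0 else byte & 0x0F
        return 2 ** x - 1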