My previous job was at an advertising firm, and we used HyperLogLogs for almost all of our real-time analytics infrastructure. They are incredibly space and time efficient. Each "counter" fits into about a single page of memory, and can count into the trillions with <2% error.
We were typically hitting our HLL server with tens of thousands of requests per second across about 50K counters, although it was benchmarked at >1MM ops a second.
Similarly, we also make bloomd, the equivalent server for bloom filters, which provide a more set-like abstraction: https://github.com/armon/bloomd
HLL also has two nice real-world optimizations possible depending on use-case.
We're storing 100,000+ unique counters, but only around 1% have more than 100 unique objects counted. Some of that 1% have millions of records, so HLL is very useful. As the HLL itself is a fixed size (~10 KB for decent accuracy) regardless of the number of counted objects, in the small case you can replace the HLL with a plain set of the counted values and produce an HLL when it grows beyond a bound. Because you're storing the raw values, the transition to HLL is seamless: you just add each stored value to a fresh HLL.
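To make the set-then-HLL idea concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the names `TinyHLL` and `HybridCounter`, the threshold of 100, the precision `p=12`, and the BLAKE2b hash are not taken from any system described above, and the HLL is a bare-bones textbook version (one byte per register, small-range correction only).

```python
import hashlib
import math


class TinyHLL:
    """Bare-bones HyperLogLog for illustration only (hypothetical helper)."""

    def __init__(self, p=12):
        self.p = p                          # 2**p registers, one byte each
        self.m = 1 << p
        self.registers = bytearray(self.m)

    def add(self, value):
        # 64-bit hash: the top p bits pick a register, the rest give the rank.
        digest = hashlib.blake2b(value.encode(), digest_size=8).digest()
        h = int.from_bytes(digest, "big")
        width = 64 - self.p
        idx = h >> width
        rest = h & ((1 << width) - 1)
        rank = width - rest.bit_length() + 1   # position of the first 1 bit
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        est = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if est <= 2.5 * self.m and zeros:
            est = self.m * math.log(self.m / zeros)   # small-range correction
        return round(est)


class HybridCounter:
    """Exact set while small; switches to an HLL past `threshold` items."""

    def __init__(self, threshold=100, p=12):
        self.threshold = threshold
        self.p = p
        self.exact = set()
        self.hll = None

    def add(self, value):
        if self.hll is not None:
            self.hll.add(value)
            return
        self.exact.add(value)
        if len(self.exact) > self.threshold:
            # Replay the raw values into a fresh HLL, then drop the set.
            self.hll = TinyHLL(self.p)
            for v in self.exact:
                self.hll.add(v)
            self.exact = None

    def count(self):
        return len(self.exact) if self.hll is None else self.hll.count()
```

Small counters stay exact and cost only as much as their values; only the counters that cross the threshold pay the fixed HLL size and accept approximate answers.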
Once you've moved beyond raw storage of values, there's a harder but still space-saving technique. If you look at the raw bytes of a ~10 KB HLL structure with "only" tens of thousands of counted values, around 90% of them will be zero. Below a certain bound, it can save a lot of space to store a map of locations to non-zero byte values rather than a raw array of bytes.
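A rough Python sketch of that sparse form, assuming the HLL registers are held in a `bytearray`. The function names and the 3-bytes-per-entry cost model (a 2-byte location plus a 1-byte value) are assumptions made for this example, not details of any particular implementation.

```python
def to_sparse(registers):
    """Keep only the non-zero registers as {location: value}."""
    return {i: r for i, r in enumerate(registers) if r}


def to_dense(sparse, m):
    """Rebuild the raw byte array of m registers from the sparse map."""
    registers = bytearray(m)
    for i, r in sparse.items():
        registers[i] = r
    return registers


def sparse_is_smaller(registers, bytes_per_entry=3):
    """Assumed cost model: each stored entry takes `bytes_per_entry` bytes
    when packed, versus one byte per register in the dense array."""
    nonzero = len(registers) - registers.count(0)
    return nonzero * bytes_per_entry < len(registers)
```

Under this cost model the sparse form wins while fewer than a third of the registers are non-zero; past that point you would convert back with `to_dense` and keep the raw array.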
One thing people forget about in all the excitement over HLLs is the effectiveness of compressed bitsets, which aren't lossy and so yield precise answers. They exploit the same "90% of them will be zero" phenomenon for space and execution efficiency, but are much more flexible, in exchange for consuming more memory and being slower than HLLs.
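As a toy illustration of the idea, the Python sketch below stores only the non-empty chunks of a bitset, so long runs of zeros cost nothing. It assumes the counted objects are non-negative integer IDs; the class name `ChunkedBitset` and the 2**16-bit chunk size are made up for this example, and production compressed-bitset libraries use more elaborate encodings.

```python
class ChunkedBitset:
    """Toy compressed bitset: only non-empty 2**16-bit chunks are stored."""

    CHUNK_BITS = 16

    def __init__(self):
        self.chunks = {}   # chunk id -> Python int used as a bitmap

    def _split(self, n):
        # High bits select the chunk, low bits select the bit inside it.
        return n >> self.CHUNK_BITS, n & ((1 << self.CHUNK_BITS) - 1)

    def add(self, n):
        hi, lo = self._split(n)
        self.chunks[hi] = self.chunks.get(hi, 0) | (1 << lo)

    def __contains__(self, n):
        hi, lo = self._split(n)
        return bool((self.chunks.get(hi, 0) >> lo) & 1)

    def count(self):
        # Exact distinct count: number of set bits across all chunks.
        return sum(bin(bits).count("1") for bits in self.chunks.values())

    def intersect(self, other):
        # Exact set intersection, skipping chunks that are empty on either side.
        out = ChunkedBitset()
        for hi, bits in self.chunks.items():
            common = bits & other.chunks.get(hi, 0)
            if common:
                out.chunks[hi] = common
        return out
```

Unlike an HLL, this gives exact counts, membership tests, and intersections, but its size grows with the number of distinct IDs instead of staying fixed.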
We developed an extremely high performance server around it (hlld): https://github.com/armon/hlld.