Just to be that guy: the title of the article should include the word "approximately", since HyperLogLog and other probabilistic sketches are all approximation methods (albeit often quite accurate ones).
The article, as you pointed out, is comparing apples to oranges. To take the extreme case, I could use 0 bits of memory to count if I'm able to accept 100% error.
> Specifically, by throwing out the 30% of buckets with the largest values, and averaging only 70% of buckets with the smaller values, accuracy can be improved from 1.30/sqrt(m) to only 1.05/sqrt(m)!
How do you do this without sorting? Or do you approximate again by throwing out everything < 0.3 * max_hash? It's not obvious to me how that approach would actually give any better results: the spacing between 0.3 * max_hash and the next-lowest hash above it is just as prone to variance as the distance between 0 and the absolute lowest hash, and that distance is what feeds the estimate.
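To make sure I understand what's being proposed, here's a rough Python sketch of my reading of the quoted truncation rule: the sort would be over the m bucket registers themselves (a few thousand small integers), not over the hashed values, so it's cheap relative to processing the stream. The class name and ALPHA constant below are placeholders of mine; ALPHA is the plain LogLog bias constant used as a stand-in, and the truncated estimator (SuperLogLog, I believe) would need its own calibrated constant.

```python
import hashlib

P = 11              # 2^11 = 2048 buckets (illustrative choice)
M = 1 << P
ALPHA = 0.39701     # approximate LogLog constant for large m; stand-in only

def _hash64(item):
    # 64-bit hash from SHA-1; any decent 64-bit hash would do here
    return int.from_bytes(hashlib.sha1(str(item).encode()).digest()[:8], "big")

class TruncatedLogLog:
    def __init__(self):
        self.registers = [0] * M

    def add(self, item):
        h = _hash64(item)
        j = h >> (64 - P)                        # top P bits pick the bucket
        rest = h & ((1 << (64 - P)) - 1)         # remaining bits feed the rank
        rank = (64 - P) - rest.bit_length() + 1  # 1-indexed leftmost 1-bit
        self.registers[j] = max(self.registers[j], rank)

    def estimate(self):
        # The "sort" is only over the m small register values, never over the
        # hashed inputs, so it costs O(m log m) once per query.
        kept = sorted(self.registers)[: int(0.7 * M)]
        return ALPHA * M * 2 ** (sum(kept) / len(kept))
```

As I read it, the variance argument is that each bucket's register is an independent estimate of log2(n/m), and discarding the largest 30% trims the heavy upper tail of that distribution before averaging, rather than changing which hash feeds any individual bucket. Whether that actually buys the claimed 1.30 -> 1.05 improvement is exactly what I'm asking about.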