
Cardinality Estimation (2011) [pdf] - ingve
https://www.cs.princeton.edu/~rs/talks/AC11-Cardinality.pdf
======
antirez
Sedgewick is a very good writer IMHO. I learned algorithms with his famous
book "Algorithms in C". Later I read many different algorithms books, and
there are better books from specific point of views, however as an entry level
book I think that AiC is one of the most accessible for students.

Btw, Redis recently switched from HyperLogLog to the slightly better LogLog-
Beta algorithm which is not listed in this publication AFAIK.

~~~
pselbert
> Redis recently switched from HyperLogLog to the slightly better LogLog-Beta

Interesting, has anything been written about that? I've learned most of what I
know about cardinality estimation from Redis and your writing on the matter,
so I'd love to see more. It would be great to re-visit an article I wrote
about HLL[0] from a new perspective as well.

[0]: [https://blog.codeship.com/counting-distinct-values-with-
hype...](https://blog.codeship.com/counting-distinct-values-with-hyperloglog/)

~~~
oertl
If you are interested in new cardinality estimation algorithms for HyperLogLog
sketches you could also have a look on the paper I am currently working on:

[http://oertl.github.io/hyperloglog-sketch-estimation-
paper/](http://oertl.github.io/hyperloglog-sketch-estimation-paper/)

There, I present two different algorithms based on theoretical considerations
that are both accurate over the entire cardinality range. Unlike LogLog-Beta
or HyperLogLog++, they do not need any empirically determined coefficients or
bias correction data.

------
tmsh
Sedgewick also has a nice coursera lecture on 2-3 trees leading to LLRB trees.
Covered in these slides:

[https://www.cs.princeton.edu/~rs/talks/LLRB/RedBlack.pdf](https://www.cs.princeton.edu/~rs/talks/LLRB/RedBlack.pdf)

A simpler implementation of red-black trees. Came across it and enjoyed it.

~~~
MaxScheiber
I haven't actually found LLRBs to be simpler to implement, and a couple other
sources agree with me. They're pedagogically simpler than standard RB trees,
but if your goal is to introduce students to a balanced binary tree, I'd argue
that AVLs are easier to teach and implement than either RBs or LLRBs.

[http://t-t-travails.blogspot.com/2008/04/left-leaning-red-bl...](http://t-t-
travails.blogspot.com/2008/04/left-leaning-red-black-trees-are-hard.html)

[http://www.read.seas.harvard.edu/~kohler/notes/llrb.html](http://www.read.seas.harvard.edu/~kohler/notes/llrb.html)

------
brilee
I had an opportunity to dig into the details of hyperloglog recently, and I
wrote up my results here -
[http://moderndescartes.com/essays/hyperloglog](http://moderndescartes.com/essays/hyperloglog)

HyperLogLog is definitely one of the most elegant algorithms I'm aware of. The
only thing is that the problem it solves is very specific, and it's hard to
generalize it to do other things.

------
foliveira
In the HyperBitBit initial experiments slide, the table with the results and
estimates is throwing me off guard in regard to the ratios.

From my quick calculations, I'm getting [1.03, 0.96, 0.96, 1.05] but the table
specifically has [1.05, 1.03, 0.96, 1.03]

Am I missing something here? Specifically the 2M case has a positive ratio
when the HBB estimate is higher than the actual count.

------
carbocation
As a layperson, I found this to be a pretty accessible background that
ultimately leads up to a useful description of HyperLogLog. I had never heard
of Philippe Flajolet before, and now I'm glad I have.

------
barrkel
This presentation is from 2016, not 2011.

------
mrcactu5
Understood since Babbage: AofA is a scientific endeavor.

• Start with a working program (algorithm implementation).

• Develop mathematical model of its behavior.

• Use the model to formulate hypotheses on resource usage.

• Use the program to validate hypotheses.

• Iterate on basis of insights gained.

Knuth's methodology (slide 2) can be used to analyze programs we currently
use, which have no obvious mathematical content.

e.g. how do these fine code linters spot errors in your code?
[https://github.com/showcases/clean-code-
linters](https://github.com/showcases/clean-code-linters) but also does it
manage your keystroke events an present information to you in a handy
interface? algorithms are managing all of that

