

Probabilistic Multiplicity Counting - seiflotfy
https://github.com/seiflotfy/pmc

======
skimpycompiler
I didn't know that one of the authors was Philippe Flajolet, one of the leading
developers of analytic combinatorics, a mathematical tool that simplified
the analysis and discovery of algorithms. Of course, the HyperLogLog paper
is a pretty example of how powerful the tool is for the analysis of algorithms
with stochastic components.

It's quite silly of me that I've used Redis and never figured out that the PF
prefix of its HyperLogLog commands (PFADD, PFCOUNT) refers to his initials.

------
versteegen
Neat; the code is pretty simple actually.

I think this could do with a better example which actually demonstrates its
utility. Probably many people who will see this here on HN won't understand
why this is useful for reducing storage requirements.

~~~
seiflotfy
I am working on a benchmarking blog post in which I will compare count-min,
count-mean-min, count-min-log and pmc. It's very nice to see how much space
is saved; sadly, estimating is a bit expensive (yet doable). I also plan to
do some minor optimisation of the code before the benchmarking. I find it
very sad that no one has implemented it publicly. AFAICT this is the only
implementation around.
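
For readers who have not met these structures, here is a minimal sketch of a
plain count-min sketch, the simplest of the baselines named above. It is not
PMC and not code from the linked repository; the class name, the default
width/depth, and the use of salted BLAKE2b hashes are all assumptions made for
illustration. The point is only that memory stays fixed at width * depth
counters no matter how many distinct flows are inserted, at the cost of
estimates that may overcount.

```python
import hashlib


class CountMinSketch:
    """Illustrative fixed-size frequency sketch (hypothetical helper, not PMC)."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        # depth rows of width counters: memory is independent of the key count.
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, row, key):
        # One independent-looking hash per row, obtained by salting with the row number.
        salt = row.to_bytes(2, "little")
        digest = hashlib.blake2b(key.encode(), digest_size=8, salt=salt).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, key, count=1):
        for row in range(self.depth):
            self.rows[row][self._index(row, key)] += count

    def estimate(self, key):
        # Collisions only ever add to a counter, so the minimum over rows
        # is an upper bound on the true count and the tightest one available.
        return min(self.rows[row][self._index(row, key)] for row in range(self.depth))
```

With the defaults this holds 4096 counters whether 100 or 10 million flows are
added; an exact dictionary would grow with every new flow.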

~~~
seiflotfy
Voilà, my first benchmarks, based on my use case:
[https://news.ycombinator.com/item?id=10149522](https://news.ycombinator.com/item?id=10149522)

~~~
versteegen
Thanks, interesting. I replied in the comments there.

~~~
seiflotfy
By a Zipf distribution, did you mean that for every flow I should insert a
number of events drawn from a Zipf distribution?

~~~
versteegen
You mean whether to decide the number of events randomly? You could certainly
do that. Whether you do that or use the "ideal" distribution without sampling
noise just affects how much noise you get in the computed/actual graph.

But maybe the order in which you add the events to these sketches affects the
results? If so, it might be bad to add all X events of flow Y before moving on
to the next flow.
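
To make the two options concrete, here is a rough sketch of how such a test
workload could be generated. All function names and parameters are
hypothetical, and nothing here comes from the actual benchmark code. It
produces either "ideal" Zipf counts (no sampling noise) or randomly sampled
ones, and can shuffle the resulting events so that flows are interleaved
rather than inserted one flow at a time.

```python
import random


def zipf_weights(num_flows, exponent=1.0):
    # The flow of rank r gets a weight proportional to 1 / r**exponent.
    return [1.0 / (rank ** exponent) for rank in range(1, num_flows + 1)]


def ideal_event_counts(num_flows, total_events, exponent=1.0):
    """Noise-free counts; they sum only approximately to total_events due to rounding."""
    weights = zipf_weights(num_flows, exponent)
    norm = sum(weights)
    return [max(1, round(total_events * w / norm)) for w in weights]


def sampled_event_counts(num_flows, total_events, exponent=1.0, seed=0):
    """Each event picks its flow at random, so the counts carry sampling noise."""
    rng = random.Random(seed)
    picks = rng.choices(range(num_flows), weights=zipf_weights(num_flows, exponent),
                        k=total_events)
    counts = [0] * num_flows
    for flow in picks:
        counts[flow] += 1
    return counts


def event_stream(counts, interleave=True, seed=0):
    """Expand per-flow counts into a list of flow ids, optionally shuffled."""
    events = [flow for flow, n in enumerate(counts) for _ in range(n)]
    if interleave:
        random.Random(seed).shuffle(events)
    return events
```

Feeding the same counts to a sketch once with interleave=False and once with
interleave=True would show whether insertion order changes the estimates.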

