
HyperLogLog and MinHash - ColinWright
http://tech.adroll.com/blog/data/2013/07/10/hll-minhash.html
======
covi
For those interested, here's a nice HyperLogLog implementation in Scala:
[https://github.com/twitter/algebird/blob/develop/algebird-core/src/main/scala/com/twitter/algebird/HyperLogLog.scala](https://github.com/twitter/algebird/blob/develop/algebird-core/src/main/scala/com/twitter/algebird/HyperLogLog.scala)

------
Harimwakairi
I love that every month or two someone who missed it the first time
rediscovers this article, has the "holy crap!" moment, and posts it to HN. :)

~~~
ColinWright
Can you point me at earlier conversations? I'd love to see what the HN hive-
mind has to contribute to this. I've only found one previous submission[0],
and that has no discussion.

Thanks.

[0]
[https://news.ycombinator.com/item?id=6460048](https://news.ycombinator.com/item?id=6460048)

~~~
logicallee
Try HN search -
[https://hn.algolia.com/#!/story/forever/prefix/0/hyperloglog](https://hn.algolia.com/#!/story/forever/prefix/0/hyperloglog)

But I take exception to hive-mind; this place really isn't.

~~~
ColinWright
Thanks for the link to search, but I've already done that, and you serve to
confirm my point. Unless I'm mistaken, that search shows lots of items about
HyperLogLog, but exactly one previous submission of this specific subject
wherein the facilities of HLL are extended to include intersections. And that
item has no discussion.

This submission is not about HyperLogLog - it's about an extension that has
more capability. When Harimwakairi said:

    > I love that every month or two someone who missed it the first
    > time rediscovers this article, has the "holy crap!" moment, and
    > posts it to HN. :)

... they were, quite simply, wrong. Moreover, that comment may have led people
to say "Oh, I've seen that before" and ignore this one.

    > But I take exception to hive-mind; this place really isn't.

I guess this is a case of terms carrying different baggage in different
places. In my circles the implication is more that of "collective
intelligence" rather than "group-think". I intended the former - it's a
compliment.

------
vtuulos
We recently open-sourced the algorithm described in the blog post here:

[https://github.com/AdRoll/cantor](https://github.com/AdRoll/cantor)

Btw, we are hiring! Feel free to ping me at ville@adroll.com if you find
topics like this relevant to your interests.

------
dmit
Streaming algorithms, or sketches, are a fascinating topic. Some of the
results are virtually indistinguishable from magic.

I can recommend this blog for those interested:
[http://research.neustar.biz/tag/sketching/](http://research.neustar.biz/tag/sketching/)
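
For readers who want a feel for the intersection idea discussed in this thread, here is a minimal sketch in Python. It is not the code from the linked article or from any library mentioned above. It assumes the usual MinHash recipe: estimate the Jaccard similarity of two sets from the fraction of matching per-seed minimum hashes, then multiply by the size of the union. The helper names (`h`, `minhash_signature`, `estimate_jaccard`, `estimate_intersection`) are made up for illustration, and the union size is computed exactly here as a stand-in for a HyperLogLog estimate.

```python
import hashlib


def h(item, seed):
    """Deterministic 64-bit hash of `item` under a given seed (illustrative helper)."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(items, num_hashes=256):
    """For each seed, keep the smallest hash value over the (non-empty) set."""
    return [min(h(x, seed) for x in items) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions where both sets share the same minimum."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


def estimate_intersection(sig_a, sig_b, union_size):
    """Intersection size is approximately Jaccard similarity times union size."""
    return estimate_jaccard(sig_a, sig_b) * union_size


# Two sets that overlap in exactly 500 elements (true Jaccard = 500 / 1500).
a = set(range(0, 1000))
b = set(range(500, 1500))

sig_a = minhash_signature(a)
sig_b = minhash_signature(b)

# Assumption: exact union size used here; a real sketch would use an HLL estimate.
union_size = len(a | b)
estimate = estimate_intersection(sig_a, sig_b, union_size)
print(f"estimated intersection: {estimate:.0f} (true value: 500)")
```

With 256 hash functions the Jaccard estimate has a standard error of roughly 0.03, so the printed figure should land near 500 but will not match it exactly. Using more hash functions tightens the estimate at the cost of a larger signature.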

