
Detecting voting rings using HyperLogLog counters (2013) - bemmu
http://opensourceconnections.com/blog/2013/10/14/detecting-reddit-voting-rings-using-hyperloglog-counters/
======
nemothekid
> _Each post gets 2 HLL counters, one positive votes and one negative. When a
> user upvotes or downvotes a post, their id is submitted to the corresponding
> HLL counter._

With that design, you wouldn't be able to undo your vote.

~~~
illicium
It shouldn't be an issue given that HLL is approximate and that relatively few
votes are undone. You could also debounce the vote before recording it in the
counter, at the cost of a small delay.
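
For anyone wondering why an undo is awkward here, a minimal sketch of the "two HLL counters per post" idea from the quote may help. It is not the article's code: the class names `HyperLogLog` and `PostVotes`, the SHA-1 hash, and the `p=10` register setting are all assumptions made for illustration. Each register only keeps a running maximum, so there is nothing to subtract when a vote is withdrawn.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch (illustrative, not tuned for production)."""

    def __init__(self, p=10):
        self.p = p                      # number of index bits (assumed value)
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        # Take a 64-bit hash; the top p bits select a register.
        digest = hashlib.sha1(str(item).encode()).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)
        rest = h & ((1 << (64 - self.p)) - 1)
        # Rank = leading zeros in the remaining bits, plus one.
        rank = (64 - self.p) - rest.bit_length() + 1
        # Registers only ever grow, which is why a vote cannot be removed.
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)   # bias constant for m >= 128
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            return self.m * math.log(self.m / zeros)
        return raw


class PostVotes:
    """Hypothetical per-post pair of counters, as described in the quote."""

    def __init__(self):
        self.up = HyperLogLog()
        self.down = HyperLogLog()

    def vote(self, user_id, positive):
        (self.up if positive else self.down).add(user_id)
```

Adding the same user id twice leaves the estimate unchanged, which is the deduplication you want; the flip side is that the structure has no `remove` operation at all.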

------
placeybordeaux
Nice idea, but it doesn't matter if this doesn't work in practice.

~~~
pfd1986
I don't think this would work for small-ish subs, would it? Most upvotes one
gets _might_ actually come from the same group of people...

~~~
bluejekyll
Yeah. This seems as likely to detect cohorts of similarly interested groups of
people as it would a voting ring.

This might be useful though, in that things which trend towards _honest_ are
actually trending toward _general interest_.

------
throwaway973201
Ye gods, what a waste of time. This is what you get when you let reddit
brogrammers run away with their own pedestrian imaginations.

> _To keep a count of uniques, you have to store every IP address that you ever
> see. And upon receiving a new IP address, you have to first check that the
> new IP address has not been run across before, and only then do you increment
> the site counter. Under the best of situations, the storage and the
> computation probably scale as O(log(n))._

Okay, guy. Scamper off back to your SEO click-bait advertising gif banner day
job, and don't work on anything important.

The seas are boiling the foundations of the ecosystem in a stew of toxic waste
and plastic. Species are going extinct. Governments are waging wars with
robots, while starving countries are crushed, bought and sold wholesale, and
spy satellites are enumerating all of us, as we walk to seven eleven for
another pack of smokes, and here's an article about tracking unique users
by... checking their... IP addresses...

What is this? 1998?

~~~
doug1001
It looks like you stopped reading after the second paragraph; this caused you
to _completely_ mischaracterize the OP.

In particular, the paragraph you quoted is the second in the OP; in it, the
author is summarising the conventional technique _not_ the one they propose.

Of course, this is a common device, i.e., "here's the conventional way to do
X; here's what we propose here."

In other words, to provide the appropriate context, the paragraph you copied
above should be prefaced with "here's the conventional way to do X."

And in fact, the next paragraph begins like this: "With the HyperLogLog
counter it’s all different"
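
For reference, the conventional technique that the quoted paragraph summarises can be sketched in a few lines. This is only an illustration of exact unique counting, with a hypothetical class name `ExactUniqueCounter`; it is the baseline the article contrasts with HyperLogLog, not the proposed method.

```python
class ExactUniqueCounter:
    """Exact unique-visitor count: remembers every address it has seen."""

    def __init__(self):
        self.seen = set()   # grows with the number of distinct addresses
        self.count = 0

    def visit(self, ip):
        # Check membership first, and increment only for a new address.
        if ip not in self.seen:
            self.seen.add(ip)
            self.count += 1
        return self.count
```

The answer is exact, but the set has to hold every distinct address, which is the cost an approximate counter is meant to avoid.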

~~~
throwaway973201
This is all just an over-engineered way of saying:

"_If a person's lifetime tally of all unique fans (say: 150 different
people, cumulative) consistently matches the typical number of unique fans per
thread/article (say: ~135 on any particular post) then those fans are probably
employees with vested interests_"

(in other words, it's the same people upvoting that guy every time, and
they're all in a gang)

You don't need big O notation and hash tables to convey that idea.

See:
https://en.wikipedia.org/wiki/HyperLogLog

And:
https://en.wikipedia.org/wiki/Cardinality
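
The heuristic as paraphrased above comes down to one ratio: typical unique voters per post divided by lifetime unique voters. A rough sketch, using exact Python sets in place of HLL counters to keep it short; the function name `overlap_ratio` is made up here, and any cut-off for "suspicious" would be an assumption too.

```python
def overlap_ratio(voters_per_post):
    """Mean unique voters per post divided by lifetime unique voters.

    voters_per_post: one collection of voter ids per post by the same author.
    A value near 1.0 means the same people vote on nearly every post.
    """
    lifetime = set().union(*voters_per_post)
    if not lifetime:
        return 0.0
    mean_per_post = sum(len(set(v)) for v in voters_per_post) / len(voters_per_post)
    return mean_per_post / len(lifetime)
```

With the numbers in the comment, about 135 voters per post against 150 lifetime gives a ratio of 0.9. As others in the thread point out, a small community with a stable audience could produce the same figure.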

------
chmike
There is no info on the accuracy of the HyperLogLog algorithm relative to the
data set size. These algorithms are for big data applications.

But I would test it if I were working at Reddit or at Hacker News.

~~~
morecoffee
Play with the Javascript implementation that the site links to. Even after
trying it a couple times, the error bounds swing up to 20%.
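
On the accuracy question: the commonly cited relative standard error for HyperLogLog is roughly 1.04 / sqrt(m), where m is the number of registers. That depends on the register count, not the data set size, though it is an asymptotic figure and small cardinalities behave differently. The helper name below is made up, and I don't know which register count the linked JavaScript demo uses, so this only shows how the figure scales.

```python
import math


def hll_standard_error(m):
    """Approximate relative standard error of HyperLogLog with m registers."""
    return 1.04 / math.sqrt(m)


# Print the expected error for a few register counts.
for m in (16, 64, 1024, 16384):
    print(f"m={m:>5}  ~{100 * hll_standard_error(m):.2f}% standard error")
```

With only 16 registers the expected error is about 26%, while 1024 registers bring it down to about 3.25%, so a demo configured with few registers could plausibly show swings of the size reported above.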

