
Mining Product Hunt, Part 1: Detecting Vote-Rings - ANaimi
http://blog.algorithmia.com/post/124542129914/mining-product-hunt-detecting-vote-rings
======
minimaxir
The issue with Product Hunt is that _they don't care_ about voting rings.
Just look at Twitter for "product hunt upvote":
[https://twitter.com/search?q=product%20hunt%20upvote&src=typ...](https://twitter.com/search?q=product%20hunt%20upvote&src=typd)

I've criticized PH for allowing voting rings and for being a "black box" in
general with respect to voting (see the Recode interview, which included a
comment from me: [http://recode.net/2015/06/18/product-hunt-the-startup-kingma...](http://recode.net/2015/06/18/product-hunt-the-startup-kingmaker-faces-charges-of-elitism/)).
The response to that interview is essentially
"haters gonna hate," which means all attempts at offering suggestions to
improve Product Hunt will be futile.

Yes, it's still an echo chamber, which I believe actually _hurts_ the startup
community because it incentivizes startups to work on products that have zero
value outside the echo chamber. It also incentivizes this bad behavior, which
has been spreading outside of Product Hunt, and that's why I'm upset by it.

If Product Hunt has enough community awareness such that they can tweet a GIF
to every user who gets more than 100 points (seriously?!), then they have
enough manpower to effectively punish and discourage vote manipulators.

~~~
rrhoover
We do have voting ring detection in place and often inform makers of our
policy on Twitter when we see them asking for upvotes (oftentimes they simply
don't know it's not OK) in addition to highlighting this policy in the email
that people receive when they're tagged as a maker.

~~~
minimaxir
Hi Ryan,

While I have to take your word that you tell people not to ask for votes,
that's not sufficient for creating a fair system without manipulation. The
Product Hunt demographic is "growth hackers" and scrappy entrepreneurs who
want to do anything to get their startup to succeed. These are the same people
who want to "move fast and break things." The flip side of that is that they
know that what they do is wrong, but _they can get away with it_ without much
consequence.

Case in point, there's a _lot_ of meta discussion on "how to tell people to
vote for your PH submission without getting flagged by the voting ring
detector"
([http://www.reddit.com/r/startups/comments/36hsfc/product_hun...](http://www.reddit.com/r/startups/comments/36hsfc/product_hunts_secret_algorithm_trick/)
) and "how to get upvotes on PH because you know an influential investor."
([https://medium.com/ferris-life/zero-to-featured-how-ferris-c...](https://medium.com/ferris-life/zero-to-featured-how-ferris-cracked-the-launch-day-code-ebd48e8de4c8)).

Voting manipulation, including but not limited to voting rings, completely
compromises Product Hunt's mission to "find the best products." A one-liner in
the FAQ will not resolve it, and I don't believe PH is doing enough to raise
awareness that voting manipulation, in any form, is very bad _and to enforce_
against it.

~~~
andreasklinger
(cto here)

We work similarly to HN: we have spam ratings on votes and devalue and derank
posts based on those.

Essentially, if you ask for upvotes there is an (almost too high) chance that
our system will notice and punish your post for this. Sadly a lot of really
good products drop because of this.

Regarding those blog posts: we don't really say whether they are right or
wrong (for obvious reasons), but those kinds of articles exist for any
system/website/mechanic/process where users believe they can get a benefit if
they manipulate it.

------
giarc
Product Hunt has always felt 'off' to me. It has always had a sense of a
closed club open for public viewership. Even though they have opened it up a
bit, it still feels the same.

~~~
onedev
Yeah honestly the community doesn't come off as sincere or genuine, so I've
never felt compelled to participate or even visit the site more than a handful
of times.

~~~
timjahn
Man it feels good to hear somebody else say this. I feel like I'm shooting
myself in the foot sometimes business-wise by not participating regularly in
Product Hunt but I find that it's just not inviting. I don't feel a part of
the club. I browse, find cool products, click through, and that's about it.

But I rarely comment as it just feels forced, like I'm standing next to a
circle of people having a conversation at a networking event and I'm a few
inches outside the circle, and it's clear I'm not a part of that circle.

------
torbair
It's just another marketing platform, right? Nothing really "crowdsourced"
about it. We dealt with similar voting ring issues when I worked on another
industry forum, but people were really not subtle about it, even when it was
explicitly discouraged. Sort of mindblowing, actually.

~~~
torbair
And by unsubtle, I mean all the accounts had the same prefix and similar
profile pictures.

------
valee
This is a very clever article. I suggest further research into whether
vote-rings are actually indicative of product teams and founders with strong
networks that they are able to motivate to support them. It may be that
upvotes from voting rings are just as useful (or perhaps more so) in
determining the likelihood of success of a product.

~~~
tosseraccount
Is it still a vote ring if it's _"unconscious"_?

~~~
teh_klev
This is/was a problem on Stack Overflow (retired mod here). Work colleagues in
the same office or company upvoting each other's questions or answers. You
could often tell it was fairly innocent from the activity on their accounts
(active, positive participation, asking and answering with reasonably good
posts) and from the spread of votes (lots of upvotes given to/from unrelated
users outweighing the votes exchanged within the ring).

But sometimes the voting ring detection would ring bells, and when contacted,
these users had genuinely not considered that what they were doing was creating
a voting ring, yet they were willing to understand the problem and back off
from each other a bit.
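The "spread of votes" signal mentioned above can be sketched as a concentration check: what share of one user's upvotes went to each recipient. This is a hypothetical illustration, not Stack Overflow's actual detector; the `(voter, recipient)` event format, the function names, and the 0.5 threshold are all assumptions.

```python
from collections import Counter


def vote_shares(events, voter):
    """Fraction of `voter`'s upvotes that went to each recipient.

    `events` is an iterable of (voter, recipient) pairs (hypothetical format).
    """
    given = Counter(recipient for v, recipient in events if v == voter)
    total = sum(given.values())
    if not total:
        return {}
    return {recipient: n / total for recipient, n in given.items()}


def looks_concentrated(events, voter, threshold=0.5):
    """True if more than `threshold` of the voter's upvotes went to one person.

    A user whose votes are spread across many unrelated people stays below
    the threshold; one who mostly votes for a single colleague does not.
    """
    return any(share > threshold for share in vote_shares(events, voter).values())
```

A concentrated share on its own doesn't prove intent; as the comment notes, looking at the rest of the account's activity is what separated innocent cases from real rings.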

~~~
molotv
I'm the admin for one of the rare instances of Stack behind a corporate
firewall, which doesn't have voting ring detection, and I've noticed this
happening among a few co-located sprint teams. I created a d3.js Sankey diagram
showing the volume of votes between people, posted it on the site to let the
community discuss it, and the behavior died down.

------
willtheperson
I would enjoy seeing the same with Hacker News. It would be interesting to see
how affiliation with Y Combinator influences the vote.

~~~
minimaxir
This isn't possible with Hacker News since the voters are not public.

I did a brief analysis showing that _(YC X)_ submissions do receive more
upvotes on average, which is hard to attribute to a voting ring specifically.

~~~
mattmanser
I often feel more inclined to upvote them because, in the end, YC created this
community and I feel their incubees deserve a bit of that reflected glory.

It's my way of saying thanks, and I suspect other people's too.

------
snorkel
This could also be used as another approach to a recommendation engine. The typical
recommendation engine predicts that users who buy butter also buy eggs, but
doesn't make direct connections between the individual users. The process
described in this article instead identifies individual users who act alike,
and that can also be used to predict if user A buys an ostrich egg for no
logical reason, then their "collusion" peers are also highly likely to buy an
ostrich egg ... you can assume if they're not colluding directly then they at
least think alike if they have a very high collusion ratio.
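A rough Python sketch of this idea, which is essentially user-based collaborative filtering: find the peers whose vote sets overlap most with a given user's, then suggest what those peers voted for. Jaccard similarity of vote sets stands in for the article's collusion ratio here (an assumption), and `recommend` and `min_similarity` are hypothetical names.

```python
def jaccard(a, b):
    """|A & B| / |A | B| for two sets of post ids (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(votes, user, min_similarity=0.5):
    """Suggest posts voted on by `user`'s closest peers.

    `votes` maps user -> set of post ids. Each candidate post is scored by
    the summed similarity of the peers who voted for it; posts the user has
    already voted on are skipped. Ties are broken by post id.
    """
    mine = votes.get(user, set())
    scores = {}
    for other, theirs in votes.items():
        if other == user:
            continue
        similarity = jaccard(mine, theirs)
        if similarity < min_similarity:
            continue  # not close enough to count as a "collusion peer"
        for post in theirs - mine:
            scores[post] = scores.get(post, 0.0) + similarity
    return sorted(scores, key=lambda post: (-scores[post], post))
```

For example, with `votes = {"a": {"p1", "p2", "p3"}, "b": {"p1", "p2", "p3", "p4"}}`, user `a` and user `b` share 3 of 4 posts, so `recommend(votes, "a")` suggests `p4`.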

~~~
doppenhe
Part II is a recommendation engine we built for PH.

------
cam_pj
I think there is a lot of frustration from some people with regard to PH, and
it’s understandable. But I don’t know if PH can be blamed for this. There has
to be some kind of curation for something like this, and whether it’s a
journalist (ex: TechCrunch), a community of people (PH), or an editorial team
(App Store), as with all curation systems, there will always be people who feel
it’s “unfair” (typically when their thing does not get selected).

What I think this fails to capture is that PH is actually a great proxy for
the real world of startups. Yes, it helps to be connected to
influential people to be featured on PH. But the exact same thing is true for
your startup in general. If you don’t know anybody, you’ll have a hard time
getting noticed and finding investors. Your network and ability to connect to
influential people can make or break your venture (I used to NOT think it was
the case… I changed my mind based on my personal experience :-) ). I think
that’s one of the very reasons Silicon Valley works much, much better than
other places in the world, like London or Paris. It is a tightly connected
community of makers, investors, journalists, influencers etc. It’s the whole
echo-chamber thing and it’s absolutely fundamental.

------
AdamSC1
Calling it a "vote-ring" is a bit of a logical leap; that would imply that it's
individuals purposefully upvoting one another's products.

Yet with such a small number of posts meeting this test's standards, and given
that Product Hunt was specifically designed to focus on influencers, it doesn't
seem unlikely that a small group of users would vote in similar patterns on
either quality products or ones shared by influencers they follow.

And when you take into account that they email you posts from influencers you
follow, it just adds to this behavior. The last few times I've been on Product
Hunt were because I got an email that Hiten Shah had shared something. I follow
him because we have similar interests. I opened the PH link, agreed it was a
great find and upvoted it. That doesn't make me part of a vote-ring.

------
splike
Is it possible to run this algorithm efficiently for a large number of users?
My intuition tells me that the naive implementation is factorial time due to
the problem of selecting combinations of users and posts to test.

~~~
ANaimi
The formula/collusion ratio is exponential, but not factorial. The
implementation however is very efficient: instead of applying the algorithm on
the complete dataset of users, we apply it to each group of users within a
post. This drastically reduces the running time.
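As a rough illustration of "exponential, but not factorial", assuming the exponential term comes from considering every possible subset of a post's n voters (the comment doesn't say exactly where it comes from): subsets grow as 2^n and orderings as n!, while a single leave-one-out pass like the one ANaimi describes in this comment needs only n + 1 evaluations of the ratio (one for the full group, one per removal).

```latex
\underbrace{2^{n}}_{\text{voter subsets}} \;<\; \underbrace{n!}_{\text{voter orderings}} \quad (n \ge 4),
\qquad
\text{one leave-one-out pass} = n + 1 \text{ ratio evaluations}
```

For n = 20, that is 1,048,576 subsets versus roughly 2.43 × 10^18 orderings, against 21 ratio evaluations per post.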

The implementation goes over every post and computes the ratio for voters
within that post. It then removes one user from that group and recalculates
the ratio. If the ratio drops, it brings that user back in. If it increases,
it keeps them out.
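A minimal Python sketch of that per-post, leave-one-out pass. The exact collusion ratio used by SimpleVoteRingDetection isn't spelled out in this thread, so `group_score` here is an assumption (the average pairwise Jaccard similarity of members' vote sets), as are the names `detect_rings` and `prune_group`, the `min_size`/`min_score` thresholds, and the choice to keep a user in when removing them leaves the ratio unchanged. It is not the implementation linked in this comment.

```python
from itertools import combinations


def jaccard(a, b):
    """Share of posts two users have in common: |A & B| / |A | B|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_score(group, votes):
    """Hypothetical stand-in for the collusion ratio: average pairwise
    Jaccard similarity of the members' vote sets (0.0 for < 2 members)."""
    pairs = list(combinations(sorted(group), 2))
    if not pairs:
        return 0.0
    return sum(jaccard(votes[u], votes[v]) for u, v in pairs) / len(pairs)


def prune_group(voters, votes):
    """One leave-one-out pass over a single post's voters."""
    group = set(voters)
    score = group_score(group, votes)
    for user in sorted(voters):
        trial = group - {user}
        trial_score = group_score(trial, votes)
        if trial_score > score:
            # The ratio increased without this user: keep them out.
            group, score = trial, trial_score
        # Otherwise the ratio dropped (or tied), so the user stays in.
    return group, score


def detect_rings(votes, min_size=2, min_score=0.5):
    """`votes` maps user -> set of post ids; returns post -> suspected ring."""
    voters_by_post = {}
    for user, posts in votes.items():
        for post in posts:
            voters_by_post.setdefault(post, set()).add(user)

    rings = {}
    for post, voters in voters_by_post.items():
        if len(voters) < min_size:
            continue  # nothing to collude with on this post
        group, score = prune_group(voters, votes)
        if len(group) >= min_size and score >= min_score:
            rings[post] = group
    return rings
```

The pairwise average is chosen deliberately: a plain "posts voted by all members / posts voted by any member" ratio can never drop when a member is removed (the intersection can only grow and the union only shrink), so a leave-one-out pass over it would strip out every voter.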

You can check the implementation here (click on Edit Algorithm):
[https://algorithmia.com/algorithms/ANaimi/SimpleVoteRingDete...](https://algorithmia.com/algorithms/ANaimi/SimpleVoteRingDetection)

Running SimpleVoteRingDetection on the complete Product Hunt dataset (16k+
posts, 52k+ users) takes a few seconds. If you have a dataset for any other
website/application, you can easily feed it into the algorithm and experiment
with that.

~~~
joehilton
Cool solution. I like it.

Is it helpful to first look at the names and sign-up times of a particular set
of users, and then search for votes on common posts? This would result in a
slightly different ratio:

SUM Votes(U1, P) / Votes(Un, P)

where U1 is a particular user, P is the post voted on by that user, and Un is
the rest of the users up to n total users.

The reason this occurs to me is because you can still make this run more
efficiently by limiting the number of users you examine (as opposed to running
across only certain posts - should be the same number of queries for a
particular number of either users or posts), and it would allow you to start
the top of the detection funnel on heuristics around obviously fake IDs or
correlated sign-up times.

This might help get around vote bots that set up fake accounts and all vote
for the same posts, but also vote randomly, at a certain frequency, for other
posts (the first algorithm would not differentiate these from voters with
similar trends in taste, such as the effect observed on Pinterest).
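The ratio in this comment is terse, so this Python sketch uses one possible reading of it: start from a seed user, narrow the candidates with a sign-up-time heuristic, then score each candidate by the fraction of the seed's voted posts that they also voted on. The function names, the `window` parameter, and the use of numeric timestamps for sign-up times are all hypothetical.

```python
def signup_neighbors(signup_times, seed, window):
    """Users (other than `seed`) who signed up within `window` seconds of it.

    `signup_times` maps user -> sign-up timestamp (hypothetical format).
    """
    t0 = signup_times[seed]
    return {u for u, t in signup_times.items() if u != seed and abs(t - t0) <= window}


def seed_overlap(votes, seed, others):
    """For each candidate, the fraction of the seed's voted posts that the
    candidate also voted on. `votes` maps user -> set of post ids."""
    seed_posts = votes.get(seed, set())
    if not seed_posts:
        return {u: 0.0 for u in others}
    return {u: len(seed_posts & votes.get(u, set())) / len(seed_posts) for u in others}
```

Used as a funnel, `seed_overlap(votes, seed, signup_neighbors(signup_times, seed, 120))` only scores accounts created around the same time as the seed, which keeps the number of comparisons small.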

Anyway, just thinking out loud. Or whatever the typing equivalent of that is.

------
jtokoph
"We removed users’ names to protect the innocent. Instead, we’re showing
random fruit names prepended to the real users’ ids."

Can't someone just go look up the IDs to get real names?

~~~
ANaimi
Yeah... but the purpose of the post is to explain the methodology, not to point
fingers. We kept the IDs because they are useful to demonstrate whether users
created those accounts consecutively. Product Hunt's public API can be used to
find the real names and even expand this method further.
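If user IDs are assigned sequentially at sign-up (an assumption, though it is what makes the IDs useful here), "created consecutively" can be checked by looking at the largest gap between a group's sorted IDs. The function names and the `max_gap` default are hypothetical.

```python
def max_id_gap(user_ids):
    """Largest difference between neighbouring IDs once sorted (0 if < 2 IDs)."""
    ids = sorted(user_ids)
    if len(ids) < 2:
        return 0
    return max(b - a for a, b in zip(ids, ids[1:]))


def created_consecutively(user_ids, max_gap=5):
    """True if a group of at least two accounts has no ID gap above `max_gap`."""
    return len(user_ids) >= 2 and max_id_gap(user_ids) <= max_gap
```

For example, IDs 1001, 1002 and 1004 have a largest gap of 2, so they pass with the default `max_gap`, while 10 and 5000 do not.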

