How 4chan hacked ReCAPTCHA to win the TIME 100 Poll (musicmachinery.com)
127 points by mariorz on Apr 28, 2009 | 37 comments

While I don't doubt that TIME's poll security team (if it existed) was more than overmatched, how could a website defend the integrity of their online poll against such an attack?

Or is running an effective online poll truly hopeless?

> defend the integrity of their online poll

I'm coming up with all sorts of similes but they all sound snarky, and I don't want to be snarky, so I'll just say it straight: there is no "integrity" in an online poll.

The results are always stunningly, catastrophically, inarguably invalid for any sort of rigorous use. The only thing that makes this particular poll more obviously flawed than the Ron Paul surges, which were more obviously flawed than the garden-variety online poll, is that the latent vulnerability was exploited to an extent approaching parody.

(Note you don't have to have an adversary at all to make an online poll invalid. They're always the result of self-selection on the part of the participants anyhow.)

Actually, online voting may have some integrity if you find the voters, instead of letting the voters find you. If you had some kind of reliably random population, you can simply select X members to vote, thus ensuring that the stats are relatively bias-free (you get bias from people who abstain).
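To make the "you find the voters" idea concrete, here is a minimal sketch of drawing a simple random sample of invitees from a known member list. The names `pick_voters` and `members` are made up for illustration, and the sketch assumes you already have a trustworthy list of the whole population, which is the hard part.

```python
import random

def pick_voters(population, sample_size, seed=None):
    """Return a simple random sample of members who will be invited to vote."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible and auditable
    return rng.sample(list(population), sample_size)

# Hypothetical member list; only the invited members may cast a ballot.
members = [f"user{i}" for i in range(1000)]
invited = pick_voters(members, 50, seed=42)
```

Invitees who abstain still bias the result, as noted above; the sampling only removes the bias that comes from voters choosing themselves.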

But you're right - in any case where the voters find you, your results will be trash.

Sure, but the whole concept of "online", as people understand it, is "clients wander around doing to servers whatever they damn-well please." If you're pulling in voters, there's no difference between doing it online and doing it, say, over-the-phone or by-mail or door-to-door—so you drop the distinguishing "online" when explaining it.

That seems like a distinction without a difference. Are you saying Facebook's old poll feature wasn't "online"?

Correct; take HotOrNot voting, for example. If you enter a profile URL directly, you can vote, but the vote is not counted. If you are sent to a profile at random (by clicking the "next random" button), there is no self-selection bias and your vote is counted.
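A rough sketch of that rule, as I understand it from the description above (this is not HotOrNot's actual code, and `RandomAssignmentPoll` and its methods are invented names): the server remembers which profile it handed each visitor and silently ignores votes for anything else.

```python
import random

class RandomAssignmentPoll:
    """Count a vote only when the server itself picked the profile for the visitor."""

    def __init__(self, profile_ids, seed=None):
        self._profiles = list(profile_ids)
        self._rng = random.Random(seed)
        self._assigned = {}  # visitor_id -> profile the server last assigned
        self.counts = {p: 0 for p in self._profiles}

    def next_random(self, visitor_id):
        """Server-side random choice, as with the "next random" button."""
        profile = self._rng.choice(self._profiles)
        self._assigned[visitor_id] = profile
        return profile

    def vote(self, visitor_id, profile_id):
        """Return True if the vote was counted, False if it was silently dropped."""
        assigned = self._assigned.get(visitor_id)
        if assigned is None or assigned != profile_id:
            return False  # visitor reached this profile directly: do not count
        del self._assigned[visitor_id]  # one counted vote per assignment
        self.counts[profile_id] += 1
        return True
```

A real site would presumably return the same response either way, so a visitor who typed in the URL cannot tell that the vote was discarded.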

What I meant by integrity was to have the results be more or less representative of the actual beliefs of an average site visitor.

I'm guessing some kind of statistical method for determining which votes don't fit the profile of a site's visitors combined with actively weeding out obvious instances of mass voting could make the results at least appear more accurate.

Sure there's no actual validity or rigor to online poll results, but the point is more to have results that at least appear plausible.

But then why bother having the poll in the first place? If you're just going to prune the results so that they look like what you expect, you aren't really polling. You already know the answer, and you're going to throw out data until you get it.

Note that the chief engineer from reCAPTCHA offers a comment on the blog. He indicates that rC is intended to be "only one element" in a defense against attacks. (and seemed good-spirited about the cake-in-the-face of the whole thing).

Seems to me the issue raised is about the "integrity" of online/offline "journalism" (of Time) in not acknowledging the meaninglessness of the poll results (or even the fact they were badly hacked). [ Maybe that's for Newsweek to report?]

Why not only allow one vote per IP? It would be possible to spoof your IP, but still, could all of the manual 4chan voters spoof their IPs for every vote?

They could use a proxy to "spoof" their IP. But there is no known way to use IP spoofing to vote from an arbitrary IP address: the voting app runs over HTTP, which runs over TCP, which requires a full connection, and the known spoofing attacks on TCP are blind, i.e. you can send but not receive data. So HTTP would not work over blind TCP spoofing.

I think that if only one vote, or any small number of votes, were allowed per IP, the attack would have been much more difficult, as there simply are not tens of thousands of readily available proxies, unless these people have access to a big botnet.

A downside to one vote per IP is that AOL and some organizations place their outgoing web traffic behind one or a small pool of IP addresses. So these users wouldn't have been able to vote.
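For reference, a per-IP limit is only a few lines. The sketch below is an assumption about how one might write it, not what TIME ran; `PerIpPoll` is a made-up name and the addresses are documentation-range placeholders. It also shows the downside: a second person behind the same shared address is rejected.

```python
from collections import Counter

class PerIpPoll:
    """Accept at most max_votes_per_ip votes from any single source address."""

    def __init__(self, max_votes_per_ip=1):
        self.max_votes_per_ip = max_votes_per_ip
        self.votes_by_ip = Counter()
        self.tally = Counter()

    def vote(self, ip, candidate):
        """Return True if the vote was counted, False if the IP is over its limit."""
        if self.votes_by_ip[ip] >= self.max_votes_per_ip:
            return False
        self.votes_by_ip[ip] += 1
        self.tally[candidate] += 1
        return True

poll = PerIpPoll(max_votes_per_ip=1)
first = poll.vote("203.0.113.7", "candidate_a")   # first user behind a shared proxy
second = poll.vote("203.0.113.7", "candidate_b")  # different user, same proxy IP
```

Here `first` is counted and `second` is not, even though they may be two different people.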

> A downside to one vote per IP is that AOL and some organizations place their outgoing web traffic behind one or a small pool of IP addresses. So these users wouldn't have been able to vote.

That would not have been such a big problem. But be sure to play 'dead man' and maintain the illusion that every vote counts.

E.g., here on Hacker News, after you click on the vote arrows, JavaScript manipulates the counts accordingly, but did you ever check whether your vote has had any effect on the "true" counts on the server? (Of course at Hacker News it has, because PG is not evil.)

Even more devious would be accepting the unwelcome votes, but also reversing each one of them after a random time has passed. This way the attackers get to see the illusion that their attacks succeed, but are fought back (or drowned out by counter-votes from real people) only a few hours later.
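Something like the following, as a toy sketch. `ShadowPoll`, the `suspicious` flag, and the one-hour default are all invented for illustration; deciding which votes are unwelcome is the actual hard problem and is not shown here. Time is passed in explicitly so the behaviour is easy to follow.

```python
import heapq
import random

class ShadowPoll:
    """Show every vote immediately, then quietly undo flagged ones after a random delay."""

    def __init__(self, max_delay_seconds=3600.0, seed=None):
        self.displayed = {}   # candidate -> count shown to visitors
        self._pending = []    # min-heap of (undo_time, candidate)
        self._rng = random.Random(seed)
        self.max_delay_seconds = max_delay_seconds

    def vote(self, candidate, now, suspicious=False):
        """Record a vote at time `now`; flagged votes are scheduled for reversal."""
        self.displayed[candidate] = self.displayed.get(candidate, 0) + 1
        if suspicious:
            undo_at = now + self._rng.uniform(0, self.max_delay_seconds)
            heapq.heappush(self._pending, (undo_at, candidate))

    def expire(self, now):
        """Reverse every flagged vote whose delay has elapsed by time `now`."""
        while self._pending and self._pending[0][0] <= now:
            _, candidate = heapq.heappop(self._pending)
            self.displayed[candidate] -= 1
```

Calling `expire` periodically (say, from a cron job) makes the flagged votes drain away gradually rather than vanishing all at once.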

Sometimes your vote does not have an effect on the "true" count on the server. For example, try voting every comment on a page down, and then reload to see the real counts. This isn't "evil" per se.

This would potentially block voters on DSL and cable modem accounts who use a small pool of shared IPs via dynamic reallocation. This would also mess up office networks using NAT. Both these effects would seriously bias any poll...

But probably less than a purposeful attack.


There's no way (if there's no offline part)

At least I spent a few minutes here and there since yesterday thinking about how it could be secured. Any method I thought of was quickly demolished by a few attacks that would work.

But I am open to being corrected! If anyone thinks they could have solved this problem, please reply :)

It's not as though you were getting anything statistically valid out of a self-selected sample anyway.

Of course, if you were only polling existing users, you could limit voting to those users who were there before you started the poll.

Hmm .. something's not right.

Why didn't Time blacklist the "devoters" by their IPs (or respective small subnets)? They couldn't be that incompetent. So it's reasonable to assume that the blacklisting wasn't working, which means the hack must've been mounted in a distributed fashion, which in turn implies it was run over a botnet of some kind. Hmm ..

Web proxy farms mean you can't just say "100+ votes from a single IP address = blacklist". You'd probably need manual intervention to distinguish proxies and individual abusers. Once you're manually intervening, you may as well just wait until the poll closes and drop the results you don't want.

...which in turn might make one wonder whether that's already happening all the time, on just about any poll around. Had it not been the respected TIME, one could suspect they kept the poll as it turned out just because the Anonymous group knew the exact number of votes for every rank.

> They couldn't be that incompetent.

As the article points out, they didn't even write the poll correctly: Two pairs of candidates shared the same ID. That means that if Oprah Winfrey got the highest score, then Ratan Tata would also get the highest score. If the competition was not hacked, these pairs might have done quite well, since they'd get the combined votes of the two candidates.
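A sanity check for this kind of bug is cheap. The sketch below assumes the candidate list is a sequence of (name, ID) pairs; the ID values and the third entry are invented, and only the Winfrey/Tata collision comes from the article.

```python
from collections import defaultdict

def find_shared_ids(candidates):
    """Map each ID used by more than one candidate to the names that share it."""
    names_by_id = defaultdict(list)
    for name, candidate_id in candidates:
        names_by_id[candidate_id].append(name)
    return {cid: names for cid, names in names_by_id.items() if len(names) > 1}

# Hypothetical IDs; a vote keyed on ID 1 would be credited to both names.
candidates = [("Oprah Winfrey", 1), ("Ratan Tata", 1), ("Some Other Candidate", 2)]
collisions = find_shared_ids(candidates)
```

Run before the poll opens, a non-empty `collisions` would have flagged the shared IDs.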

If TIME couldn't be bothered to get the poll right in the first place, it's not surprising that they barely tried to fix the hacking.

> Why didn't Time blacklist the "devoters" by their IPs (or respective small subnets)?

After it was obvious they were being gamed, they did just that. For IPv4.

Then someone discovered there was no blacklisting going on for IPv6 requests and ran amok, effectively being able to throw in around 30k votes a minute without a botnet. After that the poll pretty much looked exactly like whatever whoever gamed it decided it should be.
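The gap is easy to avoid if one blacklist handles both address families. A minimal sketch using Python's standard `ipaddress` module follows; the `Blacklist` class is a made-up name, the networks are documentation ranges, and nothing here reflects how TIME's filter was actually built.

```python
import ipaddress

class Blacklist:
    """A single blocklist that is consulted for IPv4 and IPv6 clients alike."""

    def __init__(self, blocked_networks):
        # ip_network() parses both "198.51.100.0/24" and "2001:db8::/32"
        self.networks = [ipaddress.ip_network(n) for n in blocked_networks]

    def is_blocked(self, address):
        ip = ipaddress.ip_address(address)
        # An IPv4-mapped IPv6 address (::ffff:a.b.c.d) is checked as its IPv4 form,
        # so an IPv4 ban cannot be sidestepped by connecting over IPv6.
        if ip.version == 6 and ip.ipv4_mapped is not None:
            ip = ip.ipv4_mapped
        return any(ip.version == net.version and ip in net for net in self.networks)

blacklist = Blacklist(["198.51.100.0/24", "2001:db8::/32"])
```

Note that a single IPv6 user typically controls a whole prefix, so blocking individual IPv6 addresses is not enough; blocking by network, as above, is the more plausible approach.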


A simple forced login with email verification would have ended all of this nonsense. Throw in reCAPTCHA for good measure.

TIME's purpose with the poll was to drive traffic and interest - the integrity of the poll is a very distant second concern. Throw up barriers around voting and you remove the participation and thus traffic from the equation.

If their purpose(s) include only "to drive traffic and interest", then they can stop pretending that "journalism" (i.e., valuable, reliable content) is still their business.

I believe this is how MLB and the other leagues do it for All-Star voting. In MLB, you get 25 votes (one for each hypothetical home game your team has before the end of voting).

Bugmenot ftw!

Presumably the bugmenot login would only be allowed to vote once.

I feel that it is more a crack than a hack. :-)

It's a hack to me, and a pretty cool one. They wanted something done, they found a novel way of doing it. Probably taught themselves some valuable skills in the process.

Learn by doing. Get things done, one way or the other. The very essence of hacking!

I very much doubt that the gaming of the poll was merely an exercise in autodidacticism.

The moar you know.

It's certainly a hack in the traditional MIT sense: http://hacks.mit.edu/

I just think a hack is an improvement to anything while the action in the post is just ruining the online voting system.

A hack would be to let the TIME web team know the details of the crack and the solution to fix it.

Cheers. :-)

Classically, a subset of "hacks" are pranks, which are relatively-but-not-totally harmless (someone had to take down that car):


In the grand scheme of things, the ranking of TIME's list is relatively unimportant. I believe this is only the second time they've done a ranking poll, and it was effectively gamed last time as well (by a much larger group of people, though: Stephen Colbert fans and Rain fans).

Thanks for helping me understand more about the definition of "hack".

I assume that a hack is the product from a hacker and I get my definition of hacker from the article "How to Become a Hacker" by Eric Steven Raymond. http://www.catb.org/~esr/faqs/hacker-howto.html
