
On a site I administer that used to be deluged in spam, I managed to eliminate it with a three-pass filter:

1. Simple mathematical question, e.g. "What do you get if you add five and three?" Answer is processed on the server.

2. Hidden form field that is supposed to remain blank.

3. Blacklist of common spam words.
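For anyone wanting to try the same thing, the three passes above could be sketched roughly like this. This is only a minimal sketch, not the poster's actual code: the form keys ("answer", "website", "comment"), the function name and the word list are all assumptions made up for illustration.

```python
import re

# Hypothetical blacklist; a real one would be tuned to the site's own spam.
SPAM_WORDS = {"viagra", "casino", "payday"}


def passes_filter(form, expected_answer, spam_words=SPAM_WORDS):
    """Return True if a submitted comment form passes all three checks.

    `form` is a plain dict with the assumed keys "answer", "website"
    (the hidden field that should stay blank) and "comment".
    """
    # Pass 1: the arithmetic answer is compared on the server.
    if form.get("answer", "").strip() != str(expected_answer):
        return False
    # Pass 2: the hidden field must still be blank; bots tend to fill every field.
    if form.get("website", "") != "":
        return False
    # Pass 3: reject comments containing any blacklisted word.
    words = set(re.findall(r"[a-z0-9']+", form.get("comment", "").lower()))
    return not (words & spam_words)
```

The hidden field is typically hidden from humans with CSS, so only a script that fills in every input it finds will trip the second check.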

On a forum I run (phpbb3) I eliminated 99% of the spam by adding 1 field that says "enter 42 here to prove you are human". No image, no hidden field, nothing.

We still get the occasional spammer but the real problem was our phpbb3 board showing up in the automated spam programs. As soon as we were slightly different than the default install, nearly all the spam stopped.

The interesting thing was that even the built-in captcha didn't stop the spam--it was worth cracking since everyone uses it.

Yeah, even recaptcha is broken. A new board I helped set up at my company got some spam before even being publicly announced!

On my blog I generate two random sequences of characters and tell the user to join them together without a space. This seems to have worked really well. (Though in the past I've also had static strings like "join 'bow' and 'ser' together" or "join 'doc' and 'tor' together".) I used to have the addition challenge like the GP, but it was broken. My comment form was slammed with hits, so I rate-limited attempts, but a few still got through (since the set of possible answers isn't big and rate limits can be defeated). That's when I implemented my string scheme and changed the comment form submission URL (which only lives in JavaScript now). I haven't had a spammer get through since.
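A rough sketch of what such a join-the-strings challenge might look like, assuming lowercase letters and four characters per half (the function names, the alphabet and the lengths are all made up here, not taken from the blog in question):

```python
import hmac
import secrets
import string


def make_challenge(length=4):
    """Return (prompt, expected_answer) for a join-two-strings challenge."""
    alphabet = string.ascii_lowercase
    left = "".join(secrets.choice(alphabet) for _ in range(length))
    right = "".join(secrets.choice(alphabet) for _ in range(length))
    prompt = f"Join '{left}' and '{right}' together without a space"
    # The expected answer should stay on the server (e.g. in the session).
    return prompt, left + right


def check_challenge(expected, submitted):
    """Compare the submitted answer with the stored one, ignoring outer whitespace."""
    return hmac.compare_digest(expected.encode(), submitted.strip().encode())
```

With these assumed settings there are 26**8 possible answers, so blind guessing is far less practical than it is against a small addition problem. A bot that parses the prompt would still get through, of course.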

On another forum I used to moderate (I think it was an Invision Powerboards one) I fixed it with a second field asking something like "What makes things fall down? gravity or noodles?" And if they entered gravity it would let them register. It lasted a few years, then a few randomly got in but by that time the forum had died.

That works great. Though the first spam botnet to specifically target your site is really going to go to town.

> 1. Simple mathematical question

Best CAPTCHA ever: http://random.irb.hr/signup.php

omg if you refresh they just get harder and harder. calculus? trig?

What I loved was when I signed up some time ago they had given me a partial derivative with a single variable, telling me what the variable was. Meaning that the answer was 0. Some of them look REALLY complex but they're actually far simpler than they appear, except for the fact that it'd be incredibly difficult to break them in practice given the variation they produce.

The answer seems to be zero nearly all the time.

I've occasionally seen it where it's -1 or 1. I think it's all three until you measure it.

Nice Schrödinger's cat reference

When you design a solution, you have to decide whether you're protecting against targeted or non-targeted attacks. It's not all just "spam".

If your only concern is dumb, fully automated bots that aren't targeting your site specifically (which is true for the bottom 99.5% of the web), then you don't need a CAPTCHA.

2 and 3 are great against non-targeted attacks. 1 is very weak protection against a targeted attack, and it's likely overkill that unnecessarily burdens users.

Visitors have the option to register a user account, which eliminates the spam filters.

2 and 3 are decent, as long as you don't have commenters trying to discuss something spammy (depends on the site community). #1 only works because your site isn't big enough for anybody to specifically target, though. I'm not saying it's bad (so long as it works, it's by definition at least "good enough"), just don't expect it to scale.

A system to solve problems like #1 was actually one of the very first tasks tackled by early AI research. It was a PhD thesis at MIT in 1964.
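To give a feel for how little it takes, here is a toy pattern matcher that answers questions phrased like #1. It is not the 1964 system, just a hypothetical sketch that handles "add X and Y" and "multiply X by Y" for the numbers zero through ten; every name in it is made up for illustration.

```python
import re

# Number words the toy solver understands (an arbitrary, assumed range).
NUMBERS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}

OPERATIONS = {
    "add": lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
}

# Matches phrases such as "add five and three" or "multiply six by nine".
PATTERN = re.compile(r"(add|multiply)\s+(\w+)\s+(?:and|by|to)\s+(\w+)")


def solve(question):
    """Return the numeric answer, or None if the question is not understood."""
    match = PATTERN.search(question.lower())
    if match is None:
        return None
    op, left, right = match.groups()
    if left not in NUMBERS or right not in NUMBERS:
        return None
    return OPERATIONS[op](NUMBERS[left], NUMBERS[right])
```

A spammer who bothers to target one site only has to cover the handful of phrasings that site actually uses.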


I should note that registered users get to skip the captcha. Right now the site gets around 1,500 visitors and 3,500 pages a day, and growth has been steady and incremental for some years.

We wanted to do something similar on a site I was involved with.

Unfortunately it wasn't allowed because the site owner pointed out that the market the site was aimed at had a reasonable number of people with cognitive difficulties, i.e. they struggled to follow multi-step instructions.

(Yes, this does mean that computers are able to solve a problem that is supposed to identify a human much better than some humans.)

I've often thought captchas were doing it wrong.

Even my pre-school self could solve the Sesame Street "one of these things is not like the other".

There are so many sets with an odd-one-out that would only be easily determinable by a human over a computer.

I've seen similar systems, such as "Which one of these four images is a puppy?". I think the problem is that the set has to be small, so it ends up being a multiple choice quiz. With one correct answer out of four or five choices, it is very easy to brute force.

The ones I've seen don't ask you to identify one single puppy. They ask you to identify all the puppies, making it rather harder to brute force.
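A back-of-the-envelope comparison of the two designs, assuming the bot guesses uniformly at random and that any subset of the images (including none of them) could be the correct selection. Both assumptions are simplifications, and the function names are made up:

```python
def single_choice_guess_probability(n_choices):
    """Chance of a random guess being right when exactly one option is correct."""
    return 1 / n_choices


def select_all_guess_probability(n_images):
    """Chance of a random guess being right when any subset may be the answer.

    Each image is either ticked or not, giving 2**n_images possible selections.
    """
    return 1 / 2 ** n_images


print(single_choice_guess_probability(4))  # one puppy among four images
print(select_all_guess_probability(9))     # tick every puppy in a grid of nine
```

Under those assumptions a random guess succeeds 1 time in 4 on the four-image quiz, versus 1 time in 512 on a nine-image grid.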



I have a few sites only getting about 1k visitors a month and #1 does reduce the spam a bit, but I still get 2-3 submissions a day, and I would not say these are targeted at all, just mass spam bots.

I'm disappointed to find that "What do you get if you multiply six by nine?" (http://www.wolframalpha.com/input/?i=What+do+you+get+if+you+...) just returns 54. (Cf. http://answers.yahoo.com/question/index?qid=1006050815188 .)
