3-D CAPTCHA: A way to fix the broken CAPTCHAs? (spamfizzle.com)
30 points by iamelgringo on July 16, 2008 | 41 comments

> Object recognition is a completely unsolved computer vision problem.

Awesome, let's hand the problem over to the spammers and it will be solved within 1 year.

> > Object recognition is a completely unsolved computer vision problem.

It's funny you quoted that part. I can't tell you how many "next-gen" captchas I've broken trivially with the object recognition software our company has developed (http://demo.pittpatt.com). Granted, our software is useless against this type of Captcha, but many aren't.

Captchas have become a really interesting area of research for us because it's essentially the opposite of the problem we are trying to solve. What's really curious is that object recognition people seem keenly aware of the advancements in Captcha design, while Captcha designers seem blissfully ignorant of the advances in object recognition.

I've seen so many proposed "next-gen" captchas that I could break before I finished reading their powerpoint slides.

http://img.timeinc.net/time/time100/2007/images/time100landi... - misses Obama. (It's good software, was just funny to me.)

Yea, it does. I just re-ran that image on our newest models and it does find Obama in that image, but at low confidence. See above about how our website demos are out of date.

Gonna be adding newer, better, cooler ones soon.

Sweet, good work! Being able to get the top google results for "faces" might not mean much in reality, but those are some of the first things people will throw at the demo.

Just out of curiosity, do you guys have a facial recognition (not detection) demo available?

Not yet, but hopefully soon. We have a recognition SDK but no online demo. I am pushing hard to get our demos up to date. We have so much cool stuff that isn't demo'd online yet. Once other stuff cools down, we can turn more attention to the website demos.

If only there were some way to implement a captcha that could only be solved automatically by doing useful work--say, a few steps in a protein folding sequence, in a way that's easy to visualize.

"Imagine a beowulf cluster of spammers"

Either way, we win!

I have a captcha idea: let's make users watch a 90-minute movie and summarize the plot in 6 words. Oh, and serve ads over it while they are watching, just to recover the bandwidth costs.

Or maybe just four? :) http://www.fwfr.com/

rofl - best idea I've heard yet!

Holy usability nightmare Batman!

I'm with you on this one. It's way too complicated. We know that Web users have little patience for forms with too many fields, pages that take a second too long to load, text that runs more than a couple sentences, and so on. Try to make them do a complex dance just to get past the bouncer at the door and they're likely to give up and go to a different club.

Agreed... The idea isn't bad, but this particular implementation is.

If the challenge of your captcha is parsing 3D images, there's no need to make the label letters a tiny unreadable magenta. In fact, you don't need letters at all -- this is merely an artifact of the "enter text for an image" structure of current captchas. It would be far better to have clickable target squares in the image, and just use JavaScript to ask the user to click on several targets in succession -- which is maybe even easier than text-based captchas. Using JavaScript has the additional advantage that it won't fit as well within existing captcha-forwarding spammer frameworks.

I'd happily take this over mass obnoxious in-your-face spamming any day.

You know, a lot of users don't even realize it's spam. That's kind of the idea. Some of them seem to enjoy it, even.

I'm relieved that the comments on this are mostly against it. I was sort of worried. A captcha is a very delicate balance between not pissing off the good users and keeping the bad users out.

And let's not forget what seems to be the most effective method of cracking a captcha: just proxy it to an actual user who thinks they are verifying themselves for some other site (porn). This doesn't address that at all.

Today I was told, via spam email subject lines, that Elton John died in a rocket crash and James Brown died of a heart attack. Almost fell for it.

And I was told I was "caught naked in the shower". How humiliating it would be if everyone learned I take my shower without clothes!

Gmail does such a good job at filtering spam that I hardly worry about it anymore.

Email is only one of the many avenues for spam. Try running a search, using Craigslist, reading a popular blog's comments, or reading YouTube comments...

The example is awkward but could be simplified as "click on the walking man's left arm", and if there were five images 150x150 side by side and only one image had the walking figure the spambot would have to choose the correct image and the correct 15x15 pixel range in that image, so a 1/500 chance of being right.
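The 1/500 figure can be checked with a quick sketch. It assumes each 150x150 image is cut into a grid of non-overlapping 15x15 target cells and that a bot clicks uniformly at random; the function name is made up for illustration.

```python
from fractions import Fraction

def blind_click_odds(num_images: int, image_px: int, target_px: int) -> Fraction:
    """Chance that one uniformly random click lands in the single correct cell.

    Assumes square images split into non-overlapping target_px x target_px
    cells, with exactly one correct cell in exactly one image.
    """
    cells_per_side = image_px // target_px   # 150 // 15 = 10
    cells_per_image = cells_per_side ** 2    # 10 * 10 = 100
    return Fraction(1, num_images * cells_per_image)

odds = blind_click_odds(num_images=5, image_px=150, target_px=15)
print(odds)  # 1/500
```

Note this only covers blind guessing; a bot that can narrow down the image or the region does better than 1/500.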

Still fails to address the proxying of captchas off to be solved by unsuspecting people:
* set up a porn/warez site
* proxy captchas from the target site to the porn/warez site
* let real users solve the captchas for you

I harbor a strong distaste for CAPTCHAs. At best, they're annoying. At worst, they're infuriating, such as when I simply can't see the letters or when I stumble into a cross-site hosted CAPTCHA that requires JavaScript -- which I have disabled -- and then loses the post I was going to make when I press the back button to try again with JavaScript enabled.

I see CAPTCHA as a bandaid and I can understand why people turn to it: spam has reached epidemic proportions. That said, any form of CAPTCHA, either the existing form or the proposed 3-D form of this article, suffers from two fundamental flaws:
1. It annoys users and makes it harder for people to contribute
2. Spammers will get around it eventually, and once one spammer gets around it, they all will

CAPTCHA bears similarities to copy protection schemes in these two flaws: both annoy users and are merely road bumps for the undesirables (spammers and crackers).

I think the solution is threefold:
1. Make it easy to post so real humans contribute
2. Filter spam aggressively
3. Incorporate trust mechanisms

Making it easy so real humans contribute removes obstacles (such as CAPTCHAs, registration, etc.) that get in the way of people posting. Every obstacle means fewer people will post. With community contributors at about 1% of visitors, there's a lot of room to grow by making it easier to contribute.

Spam filtering works because spam is fundamentally different from a valid post, and always will be. Bayesian schemes such as pg described long ago work well. Gmail, for example, does an amazing job of filtering spam. The few posts that get through are easily dealt with.
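For anyone who hasn't seen the Bayesian idea in code, here is a toy word-frequency classifier. It is a generic naive Bayes sketch with add-one smoothing, not pg's exact algorithm and certainly not what Gmail runs; the training sentences and function names are invented for illustration.

```python
import math
from collections import Counter

def train(docs):
    """docs: list of (text, is_spam). Returns per-class word counts and doc counts."""
    words = {True: Counter(), False: Counter()}
    totals = {True: 0, False: 0}
    for text, is_spam in docs:
        words[is_spam].update(text.lower().split())
        totals[is_spam] += 1
    return words, totals

def spam_log_odds(text, words, totals):
    """Log-odds that text is spam; positive leans spam, negative leans legitimate."""
    vocab_size = len(set(words[True]) | set(words[False]))
    spam_total = sum(words[True].values())
    ham_total = sum(words[False].values())
    score = math.log(totals[True] / totals[False])  # prior from class frequencies
    for w in text.lower().split():
        # Add-one smoothing keeps unseen words from zeroing out a class.
        p_spam = (words[True][w] + 1) / (spam_total + vocab_size)
        p_ham = (words[False][w] + 1) / (ham_total + vocab_size)
        score += math.log(p_spam / p_ham)
    return score

# Made-up training set, two examples per class.
TRAINING = [
    ("cheap pills buy now", True),
    ("buy cheap watches now", True),
    ("object recognition is hard", False),
    ("captcha design is hard", False),
]
words, totals = train(TRAINING)
print(spam_log_odds("buy cheap pills", words, totals) > 0)      # True
print(spam_log_odds("recognition is hard", words, totals) < 0)  # True
```

A real filter needs far more training data and better tokenization, but the scoring step is the same shape.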

Trust mechanisms take advantage of the fundamental weakness spammers have: they aren't members of the community. A simple trust mechanism: don't auto-link links from posters with fewer than 10 comments. Since most spam contains links of some sort and most comments don't, spam will be predominantly affected by this. Once a poster gets to 10 posts (or 10 karma), their comments retroactively get auto-linked. This is simple to implement, but reduces the impact of spammers significantly since their spam isn't accessible unless someone goes to the trouble of copy-pasting it, preventing accidental clicks by users. At the same time it doesn't punish new users, since their valid links will still be accessible and will be on equal footing once they grow into the community.
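A minimal sketch of the auto-link rule, assuming comments are rendered to HTML at display time and the author's current comment count is available. The threshold name, the URL regex, and `render_comment` are all hypothetical, and the regex is deliberately crude.

```python
import html
import re

TRUST_THRESHOLD = 10  # the "10 comments" cutoff suggested above
URL_RE = re.compile(r"https?://[^\s<>\"']+")  # crude URL matcher, for illustration only

def render_comment(text: str, author_comment_count: int) -> str:
    """Escape the comment; turn URLs into links only for trusted authors."""
    escaped = html.escape(text, quote=False)
    if author_comment_count < TRUST_THRESHOLD:
        return escaped  # URL stays visible as plain text, just not clickable
    return URL_RE.sub(
        lambda m: f'<a href="{m.group(0)}" rel="nofollow">{m.group(0)}</a>',
        escaped,
    )

print(render_comment("see http://example.com/a", 3))
print(render_comment("see http://example.com/a", 10))
```

Because the decision is made at render time from the author's current count, old comments become linked as soon as the author crosses the threshold, which gives the retroactive behavior for free.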

Hysterical: "A bot attempting to brute force a solution to the above example will need to work its way through (26)(25)(24) = 15,600 possible combinations. Asking for the identification of four unique features gives 358,800 possible combinations while 5 unique features will render 7,893,600 possible combinations"

This situation reminds me of a Simpsons quote, funnily enough... let me see if I can dig it up.

Lisa: What have you done with my report?
Bart: I've hidden it. To find it you'll need to decipher a series of clues, each more fiendish than...
Lisa: Got it!
Bart: D'oh!


These numbers don't mean much, but it's "hilarious" because you could simply generate every image and compare them to the one on the site in about a second of CPU time.

This is actually not a meaningful way to attack current CAPTCHAs, so now that I think about it... this 3D CAPTCHA would probably be less secure than the current ones that rely on OCR.

The final image is rendered based on random variables /each time/ - Even just moving the light source would result in an entirely (from a bitmap point of view) different image.

So even if you could get your hands on the 3D source file used for rendering, generating all possible images is impossible.

The numbers don't refer to brute force since the answer changes on each try.

If you used the same answer 'ABC' each time you'd take (assuming perfect random distribution) 15,600 tries before getting it right.

I don't know about you, but after getting 15 thousand failed requests in a row from the same IP, I'd assume they were a bot ;)
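The article's counts and the fixed-guess estimate are easy to check. The sketch below assumes 26 labels, ordered picks of distinct labels, and an answer that is re-randomized uniformly on every attempt; the function names are made up.

```python
import math

def ordered_label_choices(k: int, alphabet: int = 26) -> int:
    """Ordered picks of k distinct labels out of `alphabet`: 26 * 25 * 24 * ..."""
    return math.perm(alphabet, k)

def chance_within(tries: int, k: int) -> float:
    """Chance that a fixed guess hits at least once in `tries` independent attempts."""
    p = 1 / ordered_label_choices(k)
    return 1 - (1 - p) ** tries

for k in (3, 4, 5):
    print(k, ordered_label_choices(k))  # 15600, 358800, 7893600

# With the answer re-drawn each time, the number of tries until the first hit
# is geometric with mean 1/p = 15,600; the chance of a hit within that many
# tries is roughly 63%.
print(round(chance_within(15600, 3), 3))
```

So 15,600 is the average number of tries for a fixed guess, not a guarantee.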

While moving the light source results in different pixels, the object silhouettes don't change, and even internal edges will remain reasonably consistent under different lighting.

Given an object under different lighting and vantage points, the captcha breaker can build a similar object and automatically generate a database of silhouettes from a sparsely sampled set of vantage points. Then, given a captcha image, he can search the database for an approximate silhouette match, then iteratively improve the vantage point by matching the silhouettes of nearby views. Since the vantage point and the labeled object entirely determine the captcha answer, this approach may be good enough to break the captcha.

A more dynamic scene would be more challenging for this approach, but it would also be more difficult for the server to come up with human-solvable scenes.
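To make the silhouette idea concrete, here is a toy sketch. Everything in it is a stand-in: the "models" are plain boxes, `render_silhouette` is a made-up orthographic projection rather than a real renderer, and silhouettes are compared with a simple overlap score. It only illustrates the coarse-lookup-then-refine loop described above, not an attack on the actual captcha.

```python
import math

# Hypothetical model library: name -> (width, depth, height) of a box, in pixels.
MODELS = {"monitor": (40, 8, 30), "crate": (30, 30, 25), "door": (20, 4, 60)}

def render_silhouette(model, angle_deg):
    """Toy renderer: outline of a box turned about its vertical axis."""
    w, d, h = MODELS[model]
    a = math.radians(angle_deg)
    proj_w = round(abs(w * math.cos(a)) + abs(d * math.sin(a)))
    return {(x, y) for x in range(proj_w) for y in range(h)}

def iou(a, b):
    """Overlap score between two pixel sets; 1.0 means identical silhouettes."""
    return len(a & b) / len(a | b)

def build_database(step=15):
    """Sparsely sample vantage points (0..90 degrees) for every model."""
    return {(m, ang): render_silhouette(m, ang)
            for m in MODELS for ang in range(0, 91, step)}

def match(target, database, refine=7):
    """Coarse database lookup, then try nearby angles of the best model."""
    model, angle = max(database, key=lambda key: iou(target, database[key]))
    nearby = range(max(0, angle - refine), min(90, angle + refine) + 1)
    best = max(nearby, key=lambda ang: iou(target, render_silhouette(model, ang)))
    return model, best

database = build_database()
target = render_silhouette("monitor", 37)  # pretend this came from the captcha
print(match(target, database))
```

In this toy setup the search should pick the monitor and land within a few degrees of the true angle. Several nearby angles produce the same silhouette, which is the same ambiguity that makes some object pairs hard to tell apart from a single view.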

You are describing object recognition, which even on just a 2D static image is insanely hard to get working correctly (I have experience as it was my university project).

However in a 3D context there is no way a computer can infer what an object would look like from a different vantage point, since not even a human can do this.

For example, looking at a CRT and an LCD head on would give you the same image, but would give you no information about the depth of the monitor. Multiple viewpoints would help the computer figure out the full three dimensional object, but then again, object recognition comes into play: which object is which?

This system works with humans because we have good 3D object recognition and a huge database of experience to compare it against, all of which is calculated in an instant.

Replicating that behaviour in a computer is still a long way away.

I realize that the general problem of object recognition can be arbitrarily difficult, but so is the general problem of text recognition: How can a computer determine if a downward stroke is a one, or a lowercase L, or an uppercase I? And yet the text-recognition captchas have been broken -- not because the problem is easy but because captcha breakers have exploited artifacts of individual captchas to get a correct answer a modest percentage of the time. The 3D captcha (as the article author described it) is highly constrained -- a small library of objects in static poses -- so it has similarly exploitable artifacts.

The captcha-breaking computer has no need to infer what an object would look like from another view if someone has already manually reproduced the library of models; in that case the problem reduces to identifying which models from the library are in the picture and what angle they are being viewed from. Although the problem is no doubt difficult, the silhouette strategy I described is similar to other published object recognition approaches known to work, e.g.:


And the approach doesn't need to work perfectly: the captcha breaker is only interested in improving his chances of guessing correctly. If an automated approach only guesses correctly even 20% of the time, the captcha is effectively broken.
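A back-of-the-envelope sketch of why a 20% solve rate is enough, assuming attempts are independent and cost the attacker next to nothing. The function names are made up.

```python
def expected_successes(attempts: int, solve_rate: float) -> float:
    """Average number of captchas solved over a batch of attempts."""
    return attempts * solve_rate

def chance_of_at_least_one(attempts: int, solve_rate: float) -> float:
    """Chance that at least one of several independent attempts succeeds."""
    return 1 - (1 - solve_rate) ** attempts

print(expected_successes(1000, 0.2))               # about 200 solved per 1000 attempts
print(round(chance_of_at_least_one(10, 0.2), 3))   # 0.893
```

At that rate a bot gets through roughly once every five tries, so the captcha only slows it down by a small constant factor.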

In the case where two images of objects are very similar -- like your CRT vs. LCD example -- even a human would have difficulty differentiating. By definition that makes these objects bad for the captcha, so the captcha author would either leave them out of the library of objects, or he would need to make the captcha more tolerant of human error, which makes things easier for the captcha-breaker.

Agreed, however the article already acknowledges that approach (read near the end about the flower).

Luckily, unlike text, which follows a very constrained set of rules (e.g. an X will always be two lines criss-crossed), the same doesn't apply to 3D objects, where you can have 2 images of the same object that look entirely different. A simple example is the chair, which comes in all varieties of shapes but is still easily identifiable to a human.

So this would automatically require human input with respect to identifying the object; you can't create a program that would 'learn' new objects, at least, not yet.

Also, the silhouette strategy can only be applied when a shape remains relatively constant; moving the camera a little to the left would render a completely new silhouette.

Add that the bot would still need to be told how to answer the arbitrary 'How many legs does the chair that the man is sitting on have?' questions.

The fact that so much human input is required just to identify /one/ object in the captcha, and the fact that once that object has been compromised it is trivial to switch in another one (which is impossible in text captchas because there are only 26+10 characters that the whole world knows), mean that this is a damn effective captcha.

It is a meaningful way to attack many of the current captchas, if the alphabet and the space of transformations applied to it is sufficiently small. It is being done right now, in fact.

Imagine what cracking this will do for the world of image processing! :)

Not sure how this is entirely different than using a library of images that are tagged with common words. Is it a cat? dog? frog? giraffe?

The 3D captcha generates new images rather than using a library, so a captcha breaker needs to use a more sophisticated approach than comparing the current captcha image to previously seen images.

The difficulty of the image library captcha depends on the size of its library, while the difficulty of the 3D captcha depends on the fact that it's much easier for a computer to go from a 3D model to a 2D image rather than the other way around.

Considering we have image recognition algorithms that can control cars driving on the road, I doubt this would be that difficult to break.

It would, however, be very difficult for anyone to correctly guess ;)

> Considering we have image recognition algorithms that can control cars driving on the road, I doubt this would be that difficult to break.

We do?

Seems like it definitely violates Krug's "Don't make me think" principle, but if it's the best we've got then I guess it's ok for now.
