
How to break a Captcha system in 15 minutes with Machine Learning - ageitgey
https://medium.com/@ageitgey/how-to-break-a-captcha-system-in-15-minutes-with-machine-learning-dbebb035a710
======
flavio81
> _Since we have the source code to the WordPress plug-in, we can modify it to
> save out 10,000 CAPTCHA images along with the expected answer for each
> image._

This is the key to doing this in the easiest way possible. The training set is
created almost automatically!
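A minimal sketch of the bookkeeping side of that trick, in Python. The patched generator picks each answer first, so the expected label can be recorded next to every image. The alphabet (A-Z minus "O" and "I"), the file names and the CSV manifest are assumptions for illustration. The article's actual plug-in changes may look different, and the rendering itself is left out.

```python
import csv
import io
import random

# Assumed character set: A-Z without "O" and "I". The real plug-in's set may differ.
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ"


def random_code(rng: random.Random, length: int = 4) -> str:
    """Pick a random CAPTCHA answer before it would be drawn."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def build_manifest(n: int, seed: int = 0) -> list[tuple[str, str]]:
    """Return (image_filename, expected_answer) pairs for n generated CAPTCHAs.

    A patched generator would draw each code and save it under the filename;
    this sketch only records the ground truth that makes the data labeled.
    """
    rng = random.Random(seed)
    return [(f"captcha_{i:05d}.png", random_code(rng)) for i in range(n)]


def manifest_to_csv(rows: list[tuple[str, str]]) -> str:
    """Serialize the manifest as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["filename", "answer"])
    writer.writerows(rows)
    return buf.getvalue()


rows = build_manifest(10_000)
```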

~~~
username223
Hah! Come to think of it, CAPTCHA is basically the definition of "security
through obscurity," except worse: it's "security through the difference
between obscurity to machines and clarity to humans," and that's a game that
will keep getting harder to win.

~~~
kernelbandwidth
This is supposedly a feature and not a bug of CAPTCHA, at least according to
the apocryphal story John Lafferty told me. IIRC, this was at CMU so he must
have been referring to the 2003 CAPTCHA claim by von Ahn et al. The idea was
primarily to stop spammers, but also secondarily to make them more useful. The
argument was that if spammers managed to "break" CAPTCHA, then whatever
technology was used to break it would necessarily be a useful (compared to
2003 knowledge) advance in AI, so it was a win-win whether it stopped spammers
or just made them do free out-of-band research.

~~~
username223
Yes, back in the day, CAPTCHA was a good way to use human time trying to
submit forms to both block spam-bots, and do useful work (e.g. label training
data). This worked well when ML was bad at image recognition, but that's no
longer the case. Unfortunately, as ML got better at the tasks, CAPTCHAs forced
humans to do more work, and we're now at a point where either the machines are
better, or it's not worth the humans' time. If your site has a CAPTCHA,
there's a good chance I'll just close the tab and move on.

~~~
fpoling
If one uses Google's CAPTCHA, then most likely it already has enough data on
you, so it doesn't ask anything.

~~~
harshreality
Greetings, comrade! I see by your comment that you don't often use VPNs, or
tor, and even limit your use of incognito/private tabs[1]. Thank you for being
a good citizen! Carry on!

/s NNAlphabetPopulationSentimentBOT 0.1alpha-3zulu-beta-5-mark20

[1] We are getting better all the time at tracking activity in
private/incognito tabs, but there are still some gaps particularly when users
manually enable extensions like uBlock in incognito mode. This hurts the
user's experience on the web, and we suggest only enabling Google extensions
in incognito. It's incognito even without privacy-enhancing extensions. We
promise. Never mind that we know enough about you to let you bypass our
captchas.

~~~
fpoling
Brave is my primary browser, running under Firejail. I do not use private
tabs. Rather, I just delete all browser files from time to time. After that,
Google's CAPTCHA does ask questions, but after a couple of times it stops.

------
bogomipz
I have a tangential question regarding this post.

I have noticed more recently that I am asked to identify objects (signs,
roads, etc.) over multiple iterations. It's not unusual to be asked to
identify these on 3 separate iterations, despite identifying all cells
correctly on both the first and second passes.

I don't believe it was always this way. What is the reason for this? Is there
some heuristic in the captcha program that decides to ask for further
identification or is this randomly generated?

~~~
ageitgey
Google is literally using you as unpaid labor to label/validate their image
datasets.

My guess is that either (1) your first attempt at validation didn't match what
their automated system identified so it didn't have high enough confidence to
proceed or (2) the first image is one you are classifying from scratch but the
second/third image is the one that you are only validating against their
guess.

In either case, it's amazing to me that Google is able to get the whole world
to do unpaid menial work for them by offering a free product to website
owners.
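The second guess resembles the scheme the original word-based reCAPTCHA used: pair a control item whose answer is known with one whose answer is not, grade only the control, and keep the other response as a vote. A hypothetical Python sketch of that bookkeeping follows. The vote thresholds, names and data structures are made up, and nothing here describes Google's actual system.

```python
from collections import Counter, defaultdict

# Hypothetical store: selections submitted for grids whose answer is unknown.
votes: dict[str, Counter] = defaultdict(Counter)


def grade(control_answer: frozenset, control_truth: frozenset,
          unknown_id: str, unknown_answer: frozenset) -> bool:
    """Pass or fail the user on the control grid only.

    The unknown grid can't be checked, so its answer is only recorded, and
    only when the user got the control right.
    """
    passed = control_answer == control_truth
    if passed:
        votes[unknown_id][unknown_answer] += 1
    return passed


def consensus(unknown_id: str, min_votes: int = 3, min_share: float = 0.7):
    """Return the agreed selection once enough users concur, else None."""
    tally = votes[unknown_id]
    total = sum(tally.values())
    if total < min_votes:
        return None
    answer, count = tally.most_common(1)[0]
    return answer if count / total >= min_share else None
```

Under this reading, a user who disagrees with the crowd on the unknown grid is not failed, but a user who misses the control is asked again.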

~~~
Nition
The "identify the grid squares which contain a sign" type CAPTCHA isn't clear
about whether you should be selecting all grid squares that contain mostly
sign, or all grid squares that contain ANY of the sign. I imagine that
particular one causes a fair discrepancy in what people select.

In fact, I want to know what other people choose to do for that one. I'd
guess that Hacker News types would disproportionately choose the latter
option.
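To make the ambiguity concrete, here is a hypothetical Python sketch. It computes how much of each cell in a 4x4 grid a sign's bounding box covers, then applies both rules. The image size, the grid, the box coordinates and the 50% cut-off for "mostly" are all assumptions, and it ignores poles and supports entirely.

```python
def cell_overlaps(box, image_size=400, grid=4):
    """Map (row, col) to the fraction of that grid cell covered by `box`.

    box is an axis-aligned (x0, y0, x1, y1) rectangle in pixels on a square
    image_size x image_size picture split into grid x grid cells.
    """
    cell = image_size / grid
    x0, y0, x1, y1 = box
    fractions = {}
    for row in range(grid):
        for col in range(grid):
            left, top = col * cell, row * cell
            overlap_w = max(0.0, min(x1, left + cell) - max(x0, left))
            overlap_h = max(0.0, min(y1, top + cell) - max(y0, top))
            fractions[(row, col)] = overlap_w * overlap_h / (cell * cell)
    return fractions


sign = (90, 40, 230, 160)  # hypothetical sign bounding box
overlaps = cell_overlaps(sign)
any_part = {k for k, f in overlaps.items() if f > 0}   # cell has ANY of the sign
mostly = {k for k, f in overlaps.items() if f >= 0.5}  # cell is mostly sign
print(len(any_part), len(mostly))  # 6 2
```

With this box, the "any part" rule selects six cells and the "mostly" rule selects two, so two careful users could submit very different answers for the same image.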

~~~
harshreality
A million times this.

\- Do the posts or supports of the signs count as part of the signs?

\- What if it's a sign but not a street sign?

\- If the camera is looking across a road, but the pavement isn't visible,
does that image contain a road? I suspect humans are answering that
differently, but it's not as bad as the sign one.

\- Storefronts. Sometimes I can't tell what I'm looking at. There's a building
with lettering on it, but I can't tell whether it's a _store_ much less a
store _front_.

Google needs to add a tutorial for humans on how to answer those captchas,
because there's not enough context for us to figure out what Google really
wants.

If they'd dogfood this and require their employees to solve a captcha to log
in to their workstations, this problem would have been solved yesterday.

------
badrabbit
This makes me think: why not use trained humans to come up with challenges
that can't be solved easily or efficiently with training data? As an example,
a team of 2,000 humans can come up with about 10,000 challenges an hour to be
solved by 1,000,000 visitors an hour. The key is that the humans are trained
to generate questions and challenges that they themselves would not have to
solve as part of any regular activity or event.
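The figures in that example work out as follows, using only the numbers given above:

```latex
\frac{10{,}000\ \text{challenges/h}}{2{,}000\ \text{people}} = 5\ \text{challenges per person per hour},
\qquad
\frac{1{,}000{,}000\ \text{visitors/h}}{10{,}000\ \text{challenges/h}} = 100\ \text{visitors per challenge}
```

So each writer would need to produce a new challenge every 12 minutes, and every challenge would be shown to about 100 visitors.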

The labor can obviously be outsourced (or not), and it makes sense to charge
sites for it as a service.

However, this makes me ask one more question: what if the popularity of ML
creates menial jobs for humans similar to my theoretical solution? Everyone
looks at the benefits of ML, but all good technologies get abused, and it's
hard to imagine even more ML countering abused ML.

~~~
firethief
Creating menial jobs isn't a problem. People being in situations where they
need to settle for menial jobs is the problem.

------
nytf3
Cool writeup! Although I agree those captchas are fairly trivial.

In college I wrote a term paper on breaking Microsoft's captcha (which is a
little harder but not by much) twice: first with a simple template-based
classification method and then a CNN approach.

[https://www.dropbox.com/s/jfp5xbv3eh589f6/6_857_CAPTCHA.pdf?...](https://www.dropbox.com/s/jfp5xbv3eh589f6/6_857_CAPTCHA.pdf?dl=0)

At the end, we go over approaches that would help captchas fight attacks. I
think the quick-flickering approach would work best (split the image into
uneven parts and flicker them quickly, so the human eye can read the aggregate
image but no single slice shows the full picture, and the superimposed image
is incorrect).
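A rough Python sketch of the "split into uneven parts" step follows, on a binary glyph stored as a set of (row, col) pixels. The random uneven cuts and the three-frame count are assumptions. The decoy that makes a naive superposition wrong is not modelled, so treat this as an illustration of the partitioning only.

```python
import random


def split_into_frames(pixels, n_frames=3, seed=0):
    """Partition a glyph's ink pixels into n_frames uneven, non-empty slices.

    Each pixel lands in exactly one frame, so no frame alone shows the whole
    glyph; shown in rapid rotation, the eye is meant to fuse them back
    together. Requires at least n_frames pixels. Decoys are not modelled.
    """
    rng = random.Random(seed)
    order = sorted(pixels)
    rng.shuffle(order)
    # Random cut points give slices of different sizes ("uneven parts").
    cuts = sorted(rng.sample(range(1, len(order)), n_frames - 1))
    bounds = [0, *cuts, len(order)]
    return [set(order[a:b]) for a, b in zip(bounds, bounds[1:])]


# Toy glyph: the letter "L" as (row, col) pixel coordinates.
glyph = {(r, 0) for r in range(5)} | {(4, c) for c in range(1, 4)}
frames = split_into_frames(glyph)
```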

~~~
ageitgey
Cool idea and thanks for sharing!

One of the challenges here (which I'm sure you are very aware of) is that
perception tricks that fool computers like flickering images also can block
out users with different types of visual impairments. Sometimes users with
even minor or infrequently-symptomatic visual impairments won't be able to
read an image[1] that uses a special "trick" like this.

For example, consider the risk of triggering an epileptic seizure with
flickering. At a certain point it becomes an accessibility/legal issue.

[1] The animated example from nytf3's paper (please note that it contains
strong flickering):
[http://people.csail.mit.edu/recasens/images/captcha.gif](http://people.csail.mit.edu/recasens/images/captcha.gif)

~~~
ChristianGeek
Would flickering even be necessary? Why not just overlay several transparent
GIFs/PNGs? It’s still hackable (so is the flickering solution), but you could
also add in a few more tricks to make it more work for the hackers. For
example, combine the layers dynamically into a single image with a separate
HTTP request to retrieve the (random) positions of each layer within that
image. (Just a thought...you could make it as simple or as complex as you
want.)
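One way to read this suggestion is as a sprite-sheet scheme. The server lays the layers side by side, in shuffled order, in one image, and a second response says where each layer sits, so the client crops them out and stacks them. A hypothetical Python sketch with pixels as coordinate sets follows. The tile size, the JSON shape and the function names are invented for illustration.

```python
import json
import random

TILE = 5  # hypothetical width of each layer's slot, in pixels


def make_sprite_sheet(layers, seed=None):
    """Server side: place each layer in its own slot, in a shuffled order.

    Each layer is a set of (x, y) ink pixels with 0 <= x < TILE. Returns the
    combined image and, separately, a JSON list of each layer's x offset.
    """
    rng = random.Random(seed)
    slots = list(range(len(layers)))
    rng.shuffle(slots)
    sheet = set()
    for layer, slot in zip(layers, slots):
        sheet |= {(x + slot * TILE, y) for x, y in layer}
    return sheet, json.dumps([slot * TILE for slot in slots])


def reassemble(sheet, offsets_json):
    """Client side: crop every layer out of the sheet and stack them."""
    out = set()
    for dx in json.loads(offsets_json):
        out |= {(x - dx, y) for x, y in sheet if dx <= x < dx + TILE}
    return out


# Toy example: an "L" split into its vertical stroke and its foot.
layers = [{(0, y) for y in range(5)}, {(x, 4) for x in range(1, 4)}]
sheet, offsets = make_sprite_sheet(layers, seed=1)
```

A bot that simply screenshots the final rendered page sidesteps all of this.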

~~~
thedirt0115
At that point, you could have your captcha-breaker wait for the page to finish
rendering, screenshot the relevant portion of the page, and solve from there.
Seems easier than trying to download and stitch together the transparent GIFs
or decode the jumble of HTTP requests.

------
lhuser123
I really liked this article. Even with a very limited basic understanding
about machine learning or image processing, I was able to understand what the
author was talking about. Well done.

------
orliesaurus
Good article! I don't know what you call those new Google captchas, where you
have to press "I am not a robot" and then you are given a bunch of images and
a question that goes something like: "Select all images that contain cars". Do
you consider that a CAPTCHA? Or is it something else? Now that's one system
that I would like to see beaten. From the article:

> Yep, it generates 4-letter CAPTCHAs using a random mix of four different fonts.
> And we can see that it never uses “O” or “I” in the codes
> to avoid user confusion.

this seems to be a very simple CAPTCHA system to beat from a ML problem
perspective, right?
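That intuition can be quantified. Assume the codes use only the 24 letters A-Z without "O" and "I" (the quote doesn't say whether digits are also allowed). Recognizing a whole code at once then means choosing among 24^4 classes, while splitting it into letters first leaves a 24-way classifier:

```python
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ"  # assumed: A-Z without "O" and "I"
CODE_LENGTH = 4

whole_string_classes = len(ALPHABET) ** CODE_LENGTH  # one class per possible code
per_char_classes = len(ALPHABET)                     # one class per letter
print(whole_string_classes, per_char_classes)  # 331776 24
```

Each saved CAPTCHA also yields four labeled letters, so 10,000 images give 40,000 training examples, roughly 1,667 per letter if letters are drawn uniformly.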

~~~
eXpl0it3r
I would like this NOT to be broken. reCAPTCHA is currently one of the few
captcha systems that still work to some extent. It locks out most spam bots
and keeps my sites clean. Of course, it's a good idea to try to break it, to
see how "secure" it is.

~~~
ibelimb
Unfortunately the latest revision of reCAPTCHA has become such a pain that
I've come very close to just giving up on signing up for whatever service it
is I'm trying to use that has it implemented.

I have to go through several attempts to verify myself because Google didn't
like that I missed one box with a car in it and have to start all over looking
for street signs.

~~~
teddyfrozevelt
I often use a VPN and browser extensions that make the new one always ask me
to verify. The new tests are just terrible (choose all photos with an
apartment building??). It will sometimes take me a few minutes to complete the
test when it asks me to check all boxes with a car and the slow fade animation
that takes 5 seconds starts. Then it decides that I didn't choose them well
enough and starts over again. The old CAPTCHA was much better, and I didn't
feel like I was just feeding Google's Street View.

~~~
ChrisSD
I've found that disabling JavaScript actually makes the test easier. I often
fail the js version but I rarely fail the no-js one.

------
adrianN
I'm pretty sure you could break these trivial captchas with nothing more than
a kNN classifier. There is no need to involve deep neural networks here.
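For reference, a k-nearest-neighbours classifier is only a few lines. Here is a minimal Python sketch on flattened binary pixel vectors. The 3x3 "glyphs" are toy data, and real letter crops would first be resized to a common size. scikit-learn's `KNeighborsClassifier` is the off-the-shelf equivalent.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training samples.

    train is a list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Toy 3x3 binary glyphs, flattened row by row.
L = (1, 0, 0, 1, 0, 0, 1, 1, 1)
T = (1, 1, 1, 0, 1, 0, 0, 1, 0)
train = [
    (L, "L"), ((1, 0, 0, 1, 0, 0, 1, 1, 0), "L"),
    (T, "T"), ((1, 1, 1, 0, 1, 0, 0, 0, 0), "T"),
]
noisy_L = (1, 0, 0, 1, 0, 0, 0, 1, 1)
print(knn_predict(train, noisy_L))  # L
```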

~~~
dangerlibrary
You might be right. But who cares if using a deep neural network takes 15
minutes? There’s also no need to use a chainsaw to cut down a Christmas tree,
but if you’re already holding a chainsaw...

~~~
Blazespinnaker
Is dnn a chainsaw? I dunno. More like a Home Depot multipurpose power tool.

------
brango
The book is $645?!?!?

~~~
zionsrogue
Hi, Adrian here, author of the book you are referencing. You are referring to
the highest tier of the book. There are other lower tiers as well that are
cheaper.

The highest tier (again, which you are referring to) includes 800+ pages,
detailed experiment journals on how to reproduce the state-of-the-art
publications (ResNet, SqueezeNet, VGG, etc.) on ImageNet (which is 1.2 million
images). I demonstrate how to implement each model from scratch and then train
them, detailing which parameters to change and when. The highest tier is for
people looking to train really large networks on massive datasets where you
could be spending thousands of dollars in the cloud for GPU costs (you can't
train these networks without a GPU, or ideally multiple GPUs). I've also
included the pre-trained models if people want to get started with
them and skip training. This tier is really for researchers/practitioners who
need to save time and finances by starting with experiment journals that
detail how to replicate the results.

The lower tiers are for people just (1) getting started with deep learning in
context of computer vision and/or (2) looking to apply best practices. Each
book also includes video tutorials/lectures once I have finished putting them
together. Realistically I should rebrand the book as a course as it's much
more in line with something you would get from Udacity (only with more theory
and more detailed code and implementations).

If anyone has any questions about the book do feel free to ask.

~~~
sytelus
No offense, but your book website looks a lot like a late-night TV ad and
frankly leaves a bad taste.

The way I like to buy a book is to go to Amazon, look at the table of
contents, read a few pages and, most importantly, read some reviews. Your book
currently doesn't even appear in Amazon search (or even Google search).
Despite being quite active in the field myself, I had never heard of your book
before (I know of at least a dozen other books on the subject). I wonder if
this is why you might have relatively low volume and need such a high price to
make your revenue target. I would think putting your book on Amazon would
increase your volume by an order of magnitude (or two) and help reduce the
price to maybe 1/7th or 1/8th without requiring tiered pricing (which again is
a huge turnoff), while actually increasing your net revenue (probably by an
order of magnitude). You might want to look into the theory and economics of
price-volume curves.

The biggest problem with your book website is that you, as an author, come
across as a hard-selling, hard-charging marketer who wants to maximize profits
and make a sale to anyone walking by, like a used-car salesman, rather than an
experienced, calm expert for whom learning, teaching, academic honesty and
integrity are more important than making money. Again, I'm not saying this is
who you are; it just feels that way from the style and content of the book
website. Hope this helps.

~~~
imgyuri
I don't know about his books (courses?), but his site PyImageSearch is one of
the more well-known sites on computer vision. I certainly got a lot of help
from the site (thanks, Adrian).

~~~
zionsrogue
Thanks, I'm glad you've found the site helpful! :-)

------
EGreg
Since Machine Learning via CNNs and MCTS became mainstream, is there any
CAPTCHA that makes any sense today?

~~~
IshKebab
There are some tasks that haven't been solved well by machine learning
algorithms yet, like visual question answering. However, I'm not sure how easy
it would be to get them into the form of a CAPTCHA without making it easy for
a bot to just guess and get it right a lot of the time. And even though they
haven't been solved well, current models are probably good enough to get past
a CAPTCHA a lot of the time.

I expect these sorts of systems will move to 'social proofs' \- do you have a
long-standing account, etc.

~~~
EGreg
Why not ask the kind of questions Cyc was designed for? Maybe it will spur
research in that area.

[https://en.m.wikipedia.org/wiki/Cyc](https://en.m.wikipedia.org/wiki/Cyc)

------
michaelmcmillan
Great post, but couldn't you have saved a lot of time by generating CAPTCHA
images with single characters, instead of separating them after the fact?

~~~
kejaed
They could have done that but then the data wouldn't have been the same as it
would be when they were trying to solve real CAPTCHAs. The OpenCV part where
they found the characters in the CAPTCHA leads to some messiness in the
training data, which will also be there in the 'real' CAPTCHA data when the
system is tested. I'd say training the model on this messy data would lead to
better results, especially for the case where the letters overlap.
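The article does this step with OpenCV contours. Here is a dependency-free Python sketch of the same idea: find connected blobs, take their bounding boxes, and split any box that is too wide to be one letter. The 1.25 width/height threshold and the tiny ASCII image are assumptions. The even split in half is a guess that can cut a letter in the wrong place, which is the kind of messiness described above.

```python
from collections import deque


def letter_boxes(grid, max_aspect=1.25):
    """Return (x, y, w, h) boxes for connected '#' blobs, left to right.

    grid is a list of equal-length strings using '#' for ink and '.' for
    background. Blobs are 4-connected. A box wider than max_aspect times its
    height is assumed to hold two touching letters and is cut in half.
    """
    height, width = len(grid), len(grid[0])
    seen = set()
    boxes = []
    for y in range(height):
        for x in range(width):
            if grid[y][x] != "#" or (x, y) in seen:
                continue
            # Breadth-first flood fill to collect one blob.
            queue = deque([(x, y)])
            seen.add((x, y))
            xs, ys = [], []
            while queue:
                cx, cy = queue.popleft()
                xs.append(cx)
                ys.append(cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and grid[ny][nx] == "#" and (nx, ny) not in seen):
                        seen.add((nx, ny))
                        queue.append((nx, ny))
            bx, by = min(xs), min(ys)
            bw, bh = max(xs) - bx + 1, max(ys) - by + 1
            if bw / bh > max_aspect:
                half = bw // 2  # naive guess at where the two letters meet
                boxes += [(bx, by, half, bh), (bx + half, by, bw - half, bh)]
            else:
                boxes.append((bx, by, bw, bh))
    return sorted(boxes)


# Toy image: a lone letter on the left and a wide blob (two touching letters).
image = [
    "#..######",
    "#..######",
    "#..######",
    "##.######",
]
print(letter_boxes(image))  # [(0, 0, 2, 4), (3, 0, 3, 4), (6, 0, 3, 4)]
```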

------
emilfihlman
One wouldn't even need to generate those 10k images.

Simply integrate the code together and generate on the fly. Much faster and
simpler!
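That idea can be sketched as a Python generator that draws fresh labeled CAPTCHAs for every batch. `render` is a hypothetical placeholder for a port of the plug-in's drawing code, and the character set is assumed. Keras's `Model.fit` can consume a generator like this once `render` returns arrays, with `steps_per_epoch` set because the generator never ends.

```python
import random

ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ"  # assumed character set


def render(code):
    """Hypothetical stand-in for the plug-in's drawing code.

    Returns a placeholder so the sketch runs; a real port would return pixels.
    """
    return f"<image of {code}>"


def captcha_batches(batch_size=32, seed=None):
    """Yield (images, answers) batches forever, generating new CAPTCHAs each time."""
    rng = random.Random(seed)
    while True:
        answers = ["".join(rng.choices(ALPHABET, k=4)) for _ in range(batch_size)]
        yield [render(a) for a in answers], answers


images, answers = next(captcha_batches(batch_size=4, seed=0))
```

The trade-off is that rendering now happens during training instead of once up front. Per-letter training data would still need the same segmentation step.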

------
bob_theslob646
The problem is that CAPTCHAs have evolved. Some you have to click objects in
an image, others you have to click something to prove you are not a robot.

Unfortunately, I think those mentioned in the post will be a thing of the
past.

~~~
ibelimb
And sadly it seems the bots have improved even more than the CAPTCHAs.
Google's latest variation of "Click on all boxes that contain X until none are
shown" takes a significant amount of time to complete. It also seems that the
faster you click the images, the longer the next image takes to load, often
leading to me hitting submit only to have to start all over again, since the
next image still contained what Google was looking for.

Sometimes I think it might be easier to just run a ML algorithm to complete
them for me...

------
sAbakumoff
To me it's very strange that anyone uses an alternative to Google's captcha,
which has 2 obvious advantages:

1) it's not hackable

2) with every click you contribute to the driverless cars vision improvement.

~~~
amorphid
> with every click you contribute to the driverless cars vision improvement

Interesting, I hadn't heard that. Supporting links:

\--neural nets captcha analysis--

[https://spectrum.ieee.org/tech-talk/robotics/artificial-inte...](https://spectrum.ieee.org/tech-talk/robotics/artificial-intelligence/artificial-intelligence-beats-captcha)

\--obligatory XKCD--

[https://xkcd.com/1897/](https://xkcd.com/1897/)

~~~
sAbakumoff
I think it's kind of obvious - most of those pictures you face in the captcha
are bridges, cars, street signs, these are free training data for google cars.

------
Dolores12
Author did not explain how he created X_train, Y_train, X_test, Y_test.

I am pretty sure he was fast enough to create it in 10 seconds or less.
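Whatever the article's exact code, a common approach is to load each segmented letter image as X and its letter as Y, hold out a fraction for testing, and one-hot encode Y. Below is a stand-in sketch in plain Python. The 25% test fraction, the file names and the function names are assumptions. scikit-learn's `train_test_split` and `LabelBinarizer` do the same jobs.

```python
import random


def one_hot(labels, classes):
    """Turn letter labels into one-hot rows, in the column order of `classes`."""
    position = {c: i for i, c in enumerate(classes)}
    return [[1 if position[label] == j else 0 for j in range(len(classes))]
            for label in labels]


def train_test_split(samples, labels, test_size=0.25, seed=0):
    """Shuffle paired samples and labels together, then hold out test_size of them.

    Plain-Python stand-in; returns (X_train, X_test, Y_train, Y_test) in the
    same order as scikit-learn's function of the same name.
    """
    order = list(range(len(samples)))
    random.Random(seed).shuffle(order)
    n_test = round(len(order) * test_size)
    test_idx, train_idx = order[:n_test], order[n_test:]
    X_train = [samples[i] for i in train_idx]
    X_test = [samples[i] for i in test_idx]
    Y_train = [labels[i] for i in train_idx]
    Y_test = [labels[i] for i in test_idx]
    return X_train, X_test, Y_train, Y_test


# Hypothetical data: segmented letter images (file names here) and their letters.
X = [f"letter_{i}.png" for i in range(8)]
Y = list("ABCABCAB")
X_train, X_test, Y_train, Y_test = train_test_split(X, Y)
Y_train_1h = one_hot(Y_train, "ABC")
```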

------
verroq
How difficult is it to include the separation step as part of the ML pipeline
to make it end to end?

~~~
GistNoesis
It can be almost as easy, but it can be made as hard as you want. These kinds
of CAPTCHA exercises are a fun way to test ideas, especially if you want to
work on "attention" models (spatial transformers, ROI, deformable
convolutions, soft attention, hard attention). You can also try to "read" one
letter at a time using an RNN. You may even test the new capsule networks for
their rotation invariance. All these different network architectures encode
various strategies one can use to decode a picture. Obviously all of them will
work at over 99% on the simple cases, but as you increase the CAPTCHA
difficulty you will see that the more modern architectures can trade
computation for increased accuracy.
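The simplest end-to-end variant assumes every code has exactly four characters. One option is then a shared image encoder with four softmax output heads, one per character position, trained on the sum of their cross-entropy losses over the whole image x:

```latex
\mathcal{L}(\theta) = -\sum_{i=1}^{4} \log p_\theta^{(i)}\left(c_i \mid x\right)
```

Here c_i is the true character at position i, and p_θ^{(i)} is head i's predicted distribution over the alphabet. For variable-length codes, connectionist temporal classification (CTC) loss on top of a sequence model is the usual replacement.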

------
WalterBright
I'll be impressed when they can decipher my grandmother's Suetterlin
handwriting.

