
Breaking the MintEye image CAPTCHA in 23 lines of Python - cidquick
http://www.jwandrews.co.uk/2013/01/breaking-the-minteye-image-captcha-in-23-lines-of-python/
======
paulgb
For those interested, the minteye captcha has been broken by other methods as
well.

Speech recognition: <https://gist.github.com/4520930>

Laplace: <https://gist.github.com/4564489>

Fourier transform:
<http://nbviewer.ipython.org/urls/raw.github.com/rjw57/minteye-captcha/master/captcha-test.ipynb>

~~~
aidos
Could someone explain the general idea behind using FFT in these situations
(excusing my ignorance)? I recall it from my university days but that was
completely out of context. I know you can use it to decompose audio signals
into frequency components, I don't understand how it applies to images though.

~~~
kens
The basic idea is that just as an audio signal can be decomposed into
frequency components, an image can be decomposed into frequency components,
but in two dimensions. (Imagine stripes of varying frequency and direction
instead of sine waves of various frequencies.)

Sharp edges in an image have high frequency components, just like a sharp
transition in an audio signal. Filtering the high frequencies will blur the
image, while filtering low frequencies will enhance edges.
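
To make the blur/edge claim concrete, here is a minimal sketch in plain Python (standard library only). Everything in it is illustrative and not taken from the linked notebook: the 8x8 test image, the naive `dft2`/`idft2` helpers and the cutoff rule in `filter_image` are assumptions chosen to keep the example tiny. A real implementation would likely use an FFT library rather than this slow direct transform.

```python
import cmath

def dft2(img):
    """Naive 2-D discrete Fourier transform of a square list-of-lists image."""
    n = len(img)
    return [[sum(img[y][x] * cmath.exp(-2j * cmath.pi * (u * x + v * y) / n)
                 for y in range(n) for x in range(n))
             for u in range(n)]
            for v in range(n)]

def idft2(spec):
    """Inverse of dft2; returns the real part of each pixel."""
    n = len(spec)
    return [[(sum(spec[v][u] * cmath.exp(2j * cmath.pi * (u * x + v * y) / n)
                  for v in range(n) for u in range(n)) / (n * n)).real
             for x in range(n)]
            for y in range(n)]

def freq(k, n):
    """Distance of index k from zero frequency, allowing for wrap-around."""
    return min(k, n - k)

def filter_image(img, keep):
    """Zero every frequency component for which keep(frequency) is False."""
    n = len(img)
    spec = dft2(img)
    for v in range(n):
        for u in range(n):
            if not keep(max(freq(u, n), freq(v, n))):
                spec[v][u] = 0
    return idft2(spec)

# Hypothetical test image: dark left half, bright right half (one sharp edge).
N = 8
image = [[0.0] * (N // 2) + [1.0] * (N // 2) for _ in range(N)]

low = filter_image(image, lambda f: f <= 1)   # drop high frequencies: blur
high = filter_image(image, lambda f: f >= 2)  # drop low frequencies: edges remain
```

Printing `low[0]` shows the hard 0-to-1 step smeared into a gentle ramp, while the values in `high[0]` are largest right next to the step.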

This is very oversimplified, but will hopefully give you a bit of an idea. I
encourage people to learn more about Fourier theory, because it explains a
lot.

One introductory page that you may find useful is:
<http://cns-alumni.bu.edu/~slehar/fourier/fourier.html>

~~~
aidos
That makes perfect sense. Thanks for the link - that's given me a much better
intuitive understanding. I'll have a dig around some of the other
transformations mentioned now.

------
Goranek
Hhahahaha, comment on the blog

"Please post the Visual Basic codes for this. The language you post in the
article is not Visual Basic.

Thank you."

xD God !

~~~
borplk
It's totally sarcasm

~~~
raverbashing
Wasn't there something about how 'high' sarcasm is indistinguishable from the
real thing?

Unfortunately I can't make the distinction, and I think it's most likely the
commenter was serious.

~~~
mikeash
Sounds like Poe's Law to me. <http://en.wikipedia.org/wiki/Poes_law>

------
robomartin
This is nitpicking. Well, maybe not.

This code does not take a swirled image and solve it. It takes a set of images
with various swirl levels and finds the one with shortest sum of edge lengths.

The code does not do any un-swirling of the image. It also uses external
libraries to convert to grayscale, apply the Sobel filter and the sum of
edges.
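
For reference, the select-the-least-edgy-image step can be sketched without any libraries. This is not the article's code (which uses OpenCV): the `sobel_edge_sum` and `least_edgy` names are made up here, and the synthetic stripe patterns only stand in for more and less distorted candidates, on the assumption that distortion adds edges.

```python
import math

def sobel_edge_sum(img):
    """Sum of Sobel gradient magnitudes over the interior of a grayscale grid."""
    kx = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
    ky = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))
    h, w = len(img), len(img[0])
    total = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(kx[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            gy = sum(ky[j][i] * img[y + j - 1][x + i - 1]
                     for j in range(3) for i in range(3))
            total += math.hypot(gx, gy)
    return total

def least_edgy(candidates):
    """Index of the candidate image with the smallest edge sum."""
    return min(range(len(candidates)),
               key=lambda i: sobel_edge_sum(candidates[i]))

# Hypothetical 8x8 candidates: more stripes stands in for more distortion.
SIZE = 8
narrow_stripes = [[255 if (x // 2) % 2 else 0 for x in range(SIZE)]
                  for _ in range(SIZE)]
clean = [[255 if x >= SIZE // 2 else 0 for x in range(SIZE)]
         for _ in range(SIZE)]
diagonal_stripes = [[255 if ((x + y) // 2) % 2 else 0 for x in range(SIZE)]
                    for y in range(SIZE)]

candidates = [narrow_stripes, clean, diagonal_stripes]
best = least_edgy(candidates)  # index of the image with the fewest edges
```

Even this toy version needs a convolution loop per pixel, and it still skips decoding, grayscale conversion and any un-swirling.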

In other words, it is far from breaking the swirled CAPTCHA in 23 lines of
Python. If you had to write the un-swirling, gray-scaling, Sobel filtering and
summation code in Python you'd be looking at a much larger pile-o-code.

The demo and the intent is good. I just wish the title didn't say "23 lines
of". I don't think the article would have suffered at all if the title was
"Breaking the MintEye image CAPTCHA with Python".

Again, the intent and what is being demonstrated are excellent. The title, in
my not-so-humble opinion, is hugely misleading.

~~~
chmod775
>it is far from breaking the swirled CAPTCHA in 23 lines of Python

Except it does break the captcha in 23 lines. I don't get your point.

Why would he need to do any 'un-swirling' in the first place? That's not the
point of the captcha. The captcha provides you with multiple images and you
have to pick the least swirled one - which is exactly what his code does.

And you argue that because he did use external libraries it's not 23 lines of
python. Of course it is. _He_ had to write only 23 lines that make up the
logic to break the captcha. If you'd argue that one would have to add any
previous work (or lines) done by other people to the linecount of your own
project, I could also argue that 'print("Hello")' is not one line of python,
because it uses some standard python library, which in turn calls native code,
that native code would make a few syscalls and at some point resulting in
multiple hundred lines of compiled kernel C code being executed.

So the linecount of his 23 lines python script would actually be: 23 lines of
python + various libraries + the entire Linux kernel = roughly 20 million
lines of code. Good job.

And yes, that was nitpicking. And I'm angry for some reason.

~~~
robomartin
No need to be angry. None of this is going to remove food from your table or
affect your life in any way whatsoever. Take it easy. I am just opening the
topic for conversation. We can discuss things without pulling out semi-
automatic weapons, right?

When someone publishes code and says something like "solved <problem> in <n
lines> of <pick your language>" there generally is an implied "my language is
better than yours" subliminal message that, for some strange reason, is based
on line count.

Let's not debate the merit of line count as a measure of how good a language
may or may not be. If we are going to go there I'll write the same solution in
APL and we'll start counting characters instead. Again, line or character
count, in my opinion, is not a measure of the "superiority" of a language.

The 23-line solution is not pure Python. How do we define pure Python? Let's
just say that "solved <problem> in <n lines> of Python" to me means that you
download and install Python:

<http://www.python.org/download/releases/2.7.3/>

Install nothing else whatsoever and write code that solves <problem>.

OpenCV, which is what the author used for a couple of aspects of this code, is
an extensive (and really cool) C++ library that is being accessed from Python.
Here's the source:

<https://github.com/Itseez/opencv>

I'll leave it up to the reader to find the source for the Sobel method.

A much more honest title could have included something like "solving <problem>
with <n> lines of Python using OpenCV".

As I said before, the intent of the article and what is being demonstrated are
good, no, great. I just wish the title was a little more reflective of
reality. That's all. No need to get worked-up. This is not that serious of an
issue. Just a comment.

~~~
readme
Someone has disagreed with you -- which you anticipated would happen.
Therefore he must be angry?

I agree with him. If it's wrapped up in a library and it's distributed by a
package manager then it doesn't count towards the total SLOC of your project.

~~~
robomartin
Read the last line of his post please. He said he was angry, not me.

>If it's wrapped up in a library and it's distributed by a package manager
then it doesn't count towards the total SLOC of your project.

Think about what you are saying. With that logic I can write a library in C++
that evaluates all images in the current working directory for least edge
length and returns the name or index of the winning file. My Python program,
then, might look something like this:

    
    
        import magic
        print magic.evaluate()
    

And then I claim that I have written a program that solves a swirly CAPTCHA in
two lines of Python.

C'mon.

If you want to count true Python lines, download the language from python.org
and write a solver without the use of any add-on libraries. Then we can talk
about Python lines.

~~~
cidquick
<http://www.jwandrews.co.uk/2013/01/breaking-the-minteye-image-captcha-in-34-lines-of-python/>

OK, I didn't write the JPEG decoder, but how far do you want me to go?!

~~~
robomartin
That's great.

Please note that I did not have any issues with the method you used. My issue
was only with the title of your post not being accurate. You used Python +
OpenCV in addition to pre-un-swirled images.

To me at least, solving the problem with n lines of Python means that I email
you a single swirled image and you, using Python as downloaded from python.org
and nothing more, write a solver. The image doesn't even have to be JPG. It
can be an easy to read non-compressed format. That'd be fine. But you'd have
to un-swirl and do everything else, which is a lot more code.

Don't lose any sleep over this. It isn't important. Your original post was
fantastic and very informative. My comments were only about the title and how,
in every language camp, there's sometimes a tendency to look down upon other
languages by quoting such nonsense as line counts.

------
mistercow
That's a really clever way of detecting the swirl effect. As for simply
measuring overall sharpness, that could be thwarted by normalizing the FFT
after swirling. That wouldn't help against the sum-of-edges technique though
(in fact, it would make it worse).

------
flaviojuvenal
This reminded me of how Interpol wanted to keep its reverse-twirling
capability confidential:
<http://thelede.blogs.nytimes.com/2007/10/08/interpol-untwirls-a-suspected-pedophile>

------
buster
It's actually a super fun exercise to do. I did that years ago with a browser
game that showed images on login, also with Python. It was a fun learning
experience because it's so much more visual than your typical "todo list
tutorial" or "hello world".

------
wrt54g
Since the algorithm can only detect the relative swirled amounts of a set of
the same images, a solution could be to, rather than using a set of the same
image, use a set of different images swirled different amounts (or with some
other transformation), and the user must select the unmodified image.

------
borplk
I'd love to see MintEye respond.

~~~
paulgb
"Like all CAPTCHAs, sliding CAPTCHA can be cracked. Still, it's more usable
and mobile friendly"

<https://twitter.com/EyeOnTheMark/status/291837470486704129>

------
StavrosK
Very interesting article, and an insightful realization. I had no idea Sobel
could do that.

~~~
experiment0
What? Edge detection is pretty much the only use for the Sobel operator.

<http://en.wikipedia.org/wiki/Sobel_operator>

------
Goranek
Breaking captcha is bad, mkay :)

~~~
hcarvalhoalves
If you're using a CAPTCHA, you're doing it wrong already.

~~~
brownbat
Yeah, all these CAPTCHA stories are definitely making me think there's no
refuge left of simple tasks humans can do that machines can't.

What's the replacement? Tying every account to a mobile phone or credit card?

~~~
hcarvalhoalves
The thing is, CAPTCHAs are used to validate it's a human operating the
interface, not a robot.

That's a stupid idea, based on some weird assumption that it's somehow safe to
give access to your system for a user, but not for a robot. If the only thing
stopping people from messing with your system is automation, you should rethink it
from the get go.

~~~
brownbat
> weird assumption that it's somehow safe to give access to your system for a
> user, but not for a robot.

Nice point.

> If the only thing stopping people from messing with your system is automation,
> you should rethink it from the get go.

Are there systems that can't be abused through automation, or where that's
much less of a concern? (Serious question, not intended as snark.)

Maybe throttling activity is a better approach... it's volume that is harmful
to the system, not interactions with programs per se.

Don't know how you throttle spammers distributing activity across a number of
accounts and IP addresses though...
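
For the single-account case, throttling by volume can be sketched as a token bucket. This is only an illustration: the `TokenBucket` class, its parameters and the explicit `now` timestamps are assumptions made for the example, and it does nothing about the distributed case raised above.

```python
class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, now=0.0):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = now

    def allow(self, now):
        """Return True if one action is permitted at time `now` (seconds)."""
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# One bucket per key (account, IP address, ...), created on first use.
buckets = {}

def allow_action(key, now, rate=1.0, capacity=3):
    bucket = buckets.setdefault(key, TokenBucket(rate, capacity, now))
    return bucket.allow(now)

# Five rapid posts from one account: the first three pass, the rest are refused.
burst = [allow_action("account-1", 0.0) for _ in range(5)]
# Two seconds later the same account has earned new tokens.
later = allow_action("account-1", 2.0)
```

A human posting a few comments never notices the limit; a script hammering the form from one account hits it almost immediately.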

~~~
hcarvalhoalves
> Are there systems that can't be abused through automation, or where that's
> much less of a concern?

Let's analyze. One place CAPTCHA has been used a lot is in comment forms. It
increases the opportunity cost of posting a comment via automation by
introducing a complicated challenge.

Fair enough. But that doesn't solve the _underlying_ problem (abusive
comments). You'll _still_ have abuse and spam, it will just come from humans
instead of robots. If I'm a blackhat SEO with a budget, I pay people to solve
CAPTCHAs all day and spam the internet (that's what happens).

If you design the comment system to fight abuse from the ground up (e.g.,
markov chain spam filters, user flagging), you don't need to care anymore
whether it's a human or a robot behind the POST, because all the spam actually
_improves_ your training set. You defeat spammers with their own data.
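
A rough sketch of that feedback loop, assuming a word-bigram Markov model per class with add-one smoothing. The `BigramModel` and `SpamFilter` names and the sample comments are invented for illustration; a production filter would need far more data and care.

```python
import math
from collections import defaultdict

class BigramModel:
    """Counts word-to-next-word transitions seen in training text."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))
        self.vocab = set()

    def train(self, text):
        words = ["<s>"] + text.lower().split()
        self.vocab.update(words)
        for prev, cur in zip(words, words[1:]):
            self.counts[prev][cur] += 1

    def log_prob(self, text, vocab_size):
        """Log-likelihood of the text under this chain, add-one smoothed."""
        words = ["<s>"] + text.lower().split()
        total = 0.0
        for prev, cur in zip(words, words[1:]):
            following = self.counts.get(prev, {})
            seen = following.get(cur, 0)
            total += math.log((seen + 1) / (sum(following.values()) + vocab_size))
        return total

class SpamFilter:
    def __init__(self):
        self.spam = BigramModel()
        self.ham = BigramModel()

    def flag(self, text, is_spam):
        """User flagging: every flagged comment becomes training data."""
        (self.spam if is_spam else self.ham).train(text)

    def is_spam(self, text):
        vocab_size = len(self.spam.vocab | self.ham.vocab) + 1
        return (self.spam.log_prob(text, vocab_size)
                > self.ham.log_prob(text, vocab_size))

f = SpamFilter()
f.flag("buy cheap pills online now", True)
f.flag("cheap pills buy now online", True)
f.flag("the sobel filter detects edges", False)
f.flag("nice article about the fourier transform", False)

before = f.is_spam("win a free prize")   # unseen wording slips through
f.flag("win a free prize", True)         # a user flags it
after = f.is_spam("win a free prize")    # the same wording is now caught
```

The point is the last three lines: each new spam message that gets flagged makes the next copy easier to catch, whoever or whatever posted it.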

------
martinced
Love it. +1.

But wouldn't an obvious answer to this be to use a background full of swirl,
then add people onto it, then swirl again (so the background swirl is swirled
twice)?

Then when you'd be "deswirling" with your code, you'll be finding something
always swirled?

Maybe even use a background that's "swirled right" and then "swirl left" the
"background + people" (or nyan cat or whatever)?

~~~
hcarvalhoalves
> Then when you'd be "deswirling" with your code, you'll be finding something
> always swirled?

It wouldn't change what he is detecting, which is the sum of the edges.

