
Uncaptcha: Defeating Google's audio reCaptcha with 85% accuracy - nreece
https://github.com/ecthros/uncaptcha
======
bdcravens
> From there, each number audio bit is uploaded to 6 different free, online
> audio transcription services (IBM, Google Cloud, Google Speech Recognition,
> Sphinx, Wit-AI, Bing Speech Recognition)

Gotta love that they're using Google's cloud resources to defeat Google's
reCaptcha
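
For anyone wondering how the answers from six transcription services might be combined, here is a minimal sketch. It assumes a plain majority vote over normalized transcripts; the quoted passage does not say how the project combines results, so the voting rule, the `WORD_TO_DIGIT` table, and the function names are all hypothetical.

```python
from collections import Counter

# Hypothetical lookup table: speech services often return a word or a
# homophone instead of the digit itself. A real table would be much larger.
WORD_TO_DIGIT = {
    "zero": "0", "oh": "0", "one": "1", "won": "1",
    "two": "2", "to": "2", "too": "2", "three": "3",
    "four": "4", "for": "4", "five": "5", "six": "6",
    "seven": "7", "eight": "8", "ate": "8", "nine": "9",
}


def normalize(transcript):
    """Map one service's raw transcript to a digit string, or None."""
    if transcript is None:
        return None
    token = transcript.strip().lower()
    if len(token) == 1 and token.isdigit():
        return token
    return WORD_TO_DIGIT.get(token)


def vote_digit(transcripts):
    """Return the digit most services agree on (assumed rule: majority vote)."""
    votes = Counter(d for d in map(normalize, transcripts) if d is not None)
    if not votes:
        return None  # no service produced anything usable
    return votes.most_common(1)[0][0]


# One audio clip, six (made-up) service responses.
guess = vote_digit(["four", "for", "4", "five", None, "fore"])
```

Here three of the six responses normalize to "4", one to "5", and two are discarded, so the vote settles on "4".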

~~~
earenndil
Didn't people use google reverse image search to defeat the image-based
recaptcha a while back?

------
stevenwoo
There are some confounding factors (from my reading):

a.) The more likely Google thinks you are a human, the easier and fewer the
tests you have to pass. Are they repeating the test on the same computer all
the time, or are they mixing it up somehow?

b.) The README states that Google uses some of your interaction speed and your
history with the internet/Google to determine the difficulty of the captcha.
If they repeat the tests on the same computer, are they just getting easier
tests the better it gets? (In my experience with visual captchas, I would be
happy to get an 85% success rate on those.) They probably are not seeding the
computer with 9 days of simulated activity to make it seem human and get the
easiest captcha suite, or are they?

c.) Since there could be multiple tests if Google thinks you are a computer,
are they counting successful attempts within an ultimately unsuccessful run?
Does passing 5 and then failing the 6th count as 5 successes and 1 failure,
or as 1 failure?

------
hedora
I’d love for this to be a browser plugin. I don’t stay logged into google or
let them set cookies, and some days I think that instead of doing machine
learning in house, they just farm it out to people like me by holding random
sites hostage.

~~~
jdavis703
Actually you're helping to classify their datasets for ML purposes, so in a
sense you're tilling their farm for them (while also helping ensure the site
you're on has less spam than it would otherwise).

~~~
sildur
I usually make small mistakes identifying things, intentionally.

~~~
halflings
What's the point? If it's to "stick it to the man!", it won't have any effect
because they are probably using multiple ratings from different users to
classify things (on average, people will be right).
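
To put a rough number on that, here is a back-of-the-envelope sketch. It assumes each rater labels a yes/no item independently and is right with the same probability, and that the final label is a simple majority over an odd number of raters. None of that is known about how Google actually aggregates answers; the function name and the figures are illustrative only.

```python
from math import comb


def majority_correct(n_raters, p_correct):
    """Probability that a strict majority of independent raters is right.

    Assumes a binary label, an odd number of raters, and the same
    per-rater accuracy p_correct for everyone (all simplifications).
    """
    needed = n_raters // 2 + 1
    return sum(
        comb(n_raters, k) * p_correct**k * (1 - p_correct) ** (n_raters - k)
        for k in range(needed, n_raters + 1)
    )


# Raters who are each right 90% of the time.
three_raters = majority_correct(3, 0.9)
nine_raters = majority_correct(9, 0.9)
```

Under these assumptions the majority is more reliable than any single rater, and adding raters pushes it closer to 1, which is why one person's occasional wrong answer would move the result very little.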

~~~
thisacctforreal
And it will now be, on average, a hair less right, or less confident, or maybe
just less confident in sildur’s classifications.

Either way I think being a bit adversarial towards this kind of stuff is a
healthy, albeit small, protest.

------
cbhl
Suppose that audio captchas could be solved with 100% accuracy. Then what? How
do we make sure that visually impaired users can still access the internet /
solve captchas, but robots cannot?

~~~
harrisi
I suspect at some point in the near future it will generally be impossible to
differentiate between humans and computers on the internet. At that point I
wonder if we'll be required to use registered identifying information to use
certain sites and apps.

~~~
wybiral
This problem has been on my mind for some time now.

Requiring some kind of unique identifier seems like a difficult thing to sell
to users, but otherwise we may be at the mercy of bots.

And it's not the scammers and normal spammers using bots that worries me. It's
the potential to shepherd public opinion on social networks.

~~~
knodi123
> It's the potential to shepherd public opinion on social networks.

And the unique identifiers need to be made public, and publicly verifiable -
otherwise you've made it harder for, say, Russian politibots - but easier for,
say, Facebook, who could post all the fake comments they want, and people will
be more likely to trust that each commenter is unique and was vetted.

So basically, abolish private communication. I'm not mocking you, in case it
sounds like that - I see the same dilemma, and don't know the answer.
Personally, I prefer just making sure that the big communication sites are as
vigilant as they can be and make it prohibitively annoying to engage in spam-
like activities. It's not perfect, but it's the least awful choice (IMO).

~~~
wybiral
> So basically, abolish private communication.

Not _private_ communication. You and I should be able to talk privately all we
want. Just _anonymous_ communication.

And really our ability to communicate privately would be pointless if we
weren't sure about one another's identity.

But you wouldn't even have to abolish anonymity online entirely... Just on
major social influences.

The other option, imo, is that people become more skeptical and approach all
content as though it could be politibots and/or advertisement. Then the notion
of using the internet to "sample the crowd" is lost entirely because the crowd
is tainted by bots and skepticism.

This seems difficult because people seem to have built in systems for relying
on peer samples to establish norms.

------
boltzmannbrain
Also out this week in Science is George et al. "A generative vision model that
trains with high data efficiency and breaks text-based CAPTCHAs":
[https://www.vicarious.com/2017/10/26/common-sense-cortex-
and...](https://www.vicarious.com/2017/10/26/common-sense-cortex-and-captcha/)

------
modzu
can captcha go away now? it's such a lazy garbage solution to the problem

~~~
komaromy
What's better?

Hashcash is interesting, but it's probably not applicable to every scenario
and it hasn't seen much adoption. There might even be a bit of a stigma now
around proof-of-work algorithms because of cryptocurrency mining.
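
For readers who have not seen it, the idea can be sketched in a few lines. This is a Hashcash-style illustration only: it uses SHA-256 and a made-up `challenge:nonce` string layout rather than the real Hashcash stamp format, and the challenge value and difficulty are arbitrary assumptions.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest):
    """Count the zero bits at the start of a byte string."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(challenge, nonce, difficulty):
    """Cheap check the server runs: one hash, compare leading zero bits."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return leading_zero_bits(digest) >= difficulty


def solve(challenge, difficulty):
    """Expensive search the client runs: about 2**difficulty hashes on average."""
    for nonce in count():
        if verify(challenge, nonce, difficulty):
            return nonce


# Hypothetical per-request token issued by the site; 12 bits keeps the demo fast.
nonce = solve("example-form-token", 12)
```

The asymmetry is the point: the client burns CPU time searching for a nonce, while the server checks the answer with a single hash. Raising the difficulty by one bit roughly doubles the client's expected work.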

~~~
gsich
Hashcash. If you want to add money revenue (like reCaptcha does by making me
work without payment) use something like Monero mining (Coinhive for example).

I'll take the PoW captcha any day.

------
barbolo
Nice work. It also publishes a file mfcc.py which uses a Mel spectrogram to
solve the audio offline. With enough data, a model based on MFCC should work
much better than any cloud service (general speech recognizer).
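
As background on the "Mel" part, here is a small sketch of the mel-scale mapping that MFCC features are built on. It is not taken from the project's mfcc.py. It uses the common 2595 * log10(1 + f/700) formula, and the filter count and frequency range are arbitrary assumptions. A full MFCC pipeline would add framing, an FFT, triangular filters, a log, and a DCT on top of this.

```python
import math


def hz_to_mel(hz):
    """Convert frequency in Hz to mels (common 2595*log10 variant)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_filters, f_min, f_max):
    """Center frequencies (Hz) of filters spaced evenly on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * i) for i in range(1, n_filters + 1)]


# Assumed setup: 10 filters covering 0 to 4000 Hz.
centers = mel_center_frequencies(10, 0.0, 4000.0)
```

The centers are evenly spaced in mels but spread further apart in Hz as frequency rises, so the low frequencies get finer resolution than the high ones.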

~~~
barbolo
Another interesting fact is that TensorFlow 1.4 supports native MFCC
spectrogram tensors.

------
rajington
This is worthy of a bug bounty, and I feel it should be handled as such:
reported privately to them first. The people trying to hack reCaptcha at scale
are not good people.

The temporary fix Google might have to do in the meantime, however, is to stop
all audio reCaptchas, blocking the people who vitally depend on it.

~~~
barbolo
Blocking audio means blocking visually impaired people from accessing websites
with reCAPTCHA.

Hacking reCAPTCHA is not only for bad people. There are several use cases
where solving reCAPTCHA automatically is needed.

~~~
Chickenosaurus
What are these legitimate use cases for automatic reCAPTCHA solving?

~~~
barbolo
Automating searches on a government website that decided to use reCAPTCHA just
because it wants to look modern. There are dozens of them in Brazil for
example.

