
Cloudflare ReCAPTCHA De-Anonymizes Tor Users - walterbell
https://cryptome.org/2016/07/cloudflare-de-anons-tor.htm
======
mmaunder
"The Tor design doesn't try to protect against an attacker who can see or
measure both traffic going into the Tor network and also traffic coming out of
the Tor network. That's because if you can see both flows, some simple
statistics let you decide whether they match up."

[https://blog.torproject.org/blog/one-cell-enough](https://blog.torproject.org/blog/one-cell-enough)
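
A minimal sketch of what "simple statistics" could look like, assuming an observer has packet timestamps from both ends of a connection. The function names, bin width, and fixed `delay` parameter are illustrative assumptions, not taken from the blog post; real correlation attacks are more elaborate.

```python
import math

def bin_counts(timestamps, bin_width, n_bins):
    """Count packets per fixed-width time bin."""
    counts = [0] * n_bins
    for t in timestamps:
        i = int(t / bin_width)
        if 0 <= i < n_bins:
            counts[i] += 1
    return counts

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def flow_similarity(entry_times, exit_times, bin_width=0.1, n_bins=50, delay=0.0):
    """Correlate per-bin packet counts seen entering and leaving the network.

    `delay` is an assumed, fixed transit time subtracted from the exit side.
    """
    a = bin_counts(entry_times, bin_width, n_bins)
    b = bin_counts([t - delay for t in exit_times], bin_width, n_bins)
    return pearson(a, b)

# Made-up timestamps (seconds) purely for illustration.
entry = [0.05, 0.15, 0.16, 1.05, 1.06, 1.07, 3.25]
same_flow = [t + 0.3 for t in entry]
other_flow = [0.55, 0.95, 2.05, 2.15, 2.55, 4.05, 4.45]
print(flow_similarity(entry, same_flow, delay=0.3))
print(flow_similarity(entry, other_flow, delay=0.3))
```

The matching flow should score close to 1 and the unrelated one close to 0 or below; a bursty, distinctive pattern makes the gap wider.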

Work on a client to try and mitigate the risk of timing attacks:

[https://news.ycombinator.com/item?id=9585466](https://news.ycombinator.com/item?id=9585466)

~~~
djsumdog
I remember someone at a security conference talking about a kid at a
University who sent a bomb threat via Tor.

The University simply looked through their logs to see who was connecting to
known Tor nodes, narrowed it down by time and found the kid.

Source:
[http://www.theregister.co.uk/2013/12/18/harvard_bomb_hoax_ch...](http://www.theregister.co.uk/2013/12/18/harvard_bomb_hoax_charge/)

~~~
mmaunder
Good opsec involves multiple layers of security. There's a fun talk at defcon
next month on extending wifi range to avoid detection along with signal
'hiding' via SDR:

[https://www.defcon.org/html/defcon-23/dc-23-speakers.html#Gr...](https://www.defcon.org/html/defcon-23/dc-23-speakers.html#Graham)

~~~
_asummers
There's another good talk on more general opsec from Defcon a few years ago
called "Don't fuck it up".

[https://youtube.com/watch?v=J1q4Ir2J8P8](https://youtube.com/watch?v=J1q4Ir2J8P8)

~~~
rev_bird
This is great, thanks for linking it. I'm really interested in this kind of
stuff, but it's so hard to find resources on it. Lots of people on Twitter
making fun of mistakes people make, not a lot of folks giving advice.

~~~
j_s
[https://news.ycombinator.com/item?id=6521145](https://news.ycombinator.com/item?id=6521145)

[https://news.ycombinator.com/item?id=6521517](https://news.ycombinator.com/item?id=6521517)

These discussions are definitely post-mortem analysis but there is also
discussion on another OpSec presentation:

[http://www.youtube.com/watch?v=9XaYdCdwiWU](http://www.youtube.com/watch?v=9XaYdCdwiWU)

------
jgrahamc
This short piece doesn't have much detail. But if reCAPTCHA is usable to
deanonymize Tor users then I would like to know about it in detail so I can do
something about it.

~~~
tedunangst
I didn't see anything that makes it unique to recaptcha. Any fingerprintable
traffic pattern that can be observed coming and going will work.

I could make a website that adds random(1, 64) one pixel images to each page.
As you browse the site, you'll be broadcasting 6 bits of identifier with every
click.
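
A rough sketch of that idea: 64 possible image counts carry log2(64) = 6 bits per page load. The `/px/` path and all function names are hypothetical, and the sketch assumes the observer can count the image requests reliably.

```python
import math

MAX_IMAGES = 64  # the random(1, 64) from the comment above

def bits_per_page(max_images=MAX_IMAGES):
    """Information carried by one page load, in bits."""
    return math.log2(max_images)

def encode_tag(tag, max_images=MAX_IMAGES):
    """Map a tag in [0, max_images) to a number of 1x1 images (1..max_images)."""
    if not 0 <= tag < max_images:
        raise ValueError("tag out of range")
    return tag + 1

def render_page(tag):
    """Return HTML with as many one-pixel images as the tag requires."""
    n = encode_tag(tag)
    return "\n".join(
        f'<img src="/px/{i}.gif" width="1" height="1">' for i in range(n)
    )

def decode_tag(observed_image_requests):
    """What a network observer recovers by counting image requests."""
    return observed_image_requests - 1

print(bits_per_page())  # 6.0
```

A site would presumably pick a different tag per visitor or per page, so a few clicks are enough to tell visitors apart.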

~~~
jerf
I don't see anything that makes this unique to CloudFlare, either.

(You imply this in your point, but given the specificity of the accusation, I
think it's worth clearly pointing out.)

~~~
SwellJoe
I believe the "unique to CloudFlare" element is that CloudFlare effectively
sees traffic for significant portions of the web...but is one entity. So, a
powerful enough hostile actor (say, a state) would only need to compromise one
entity (CloudFlare) to exploit users of thousands of websites, including many
major ones. Er, well, two entities, because they also need entrance data. So,
if a state were to compromise an ISP and CloudFlare it would give that state a
lot of Tor users' identities.

Very few small-ish entities have such a large reach and can interject
themselves into so many connections on the web.

~~~
jerf
But if we're talking about The Adversary, then they're already deeper in than
CloudFlare will ever be, so... what's different?

~~~
yoo1I
The difference is that reCAPTCHA provides a detectable traffic pattern and is
already widely deployed. This provides plausible deniability. Other than that,
I don't see a difference.

------
pyromine
I didn't realize just how fragile TOR is. . . While I understand that
remaining anonymous requires adjusting your browser habits somewhat
extensively, the fact that a ReCAPTCHA is enough to (theoretically)
de-anonymize a user suggests to me that it's not able to anonymize at all when
browsing.

While TOR may be useful for evading firewalls, my general perception of the
project has changed from general anonymity tool to a tool tailored for very
specific use.

Granted, this is probably what my understanding always should have been.

~~~
0xmohit
> I didn't realize just how fragile TOR is. . .

It's JavaScript that causes it (you could choose to disable it [0]). The FAQ
[1] warns of it:

> But there's a third issue: websites can easily determine whether you have
> allowed JavaScript for them, and if you disable JavaScript by default but then
> allow a few websites to run scripts (the way most people use NoScript), then
> your choice of whitelisted websites acts as a sort of cookie that makes you
> recognizable (and distinguishable), thus harming your anonymity.
>
> ...
>
> Until we get there, feel free to leave JavaScript on or off depending on your
> security, anonymity, and usability priorities.

[0]
[https://www.torproject.org/docs/faq#DisableJS](https://www.torproject.org/docs/faq#DisableJS)

[1]
[https://www.torproject.org/docs/faq#TBBJavaScriptEnabled](https://www.torproject.org/docs/faq#TBBJavaScriptEnabled)

~~~
onecooldev24
It's not only JavaScript: an HTTP server that sends timed responses/packets
would still work, as long as network traffic is being monitored at both the
modified server and the ISP.

------
bostik
In other news, a global passive adversary can use traffic analysis, timing
data, and known patterns to deanonymise a Tor user.

The only "new" thing here was the rough traffic pattern analysis of CF captcha
page.

------
cuonic
One way around this is to disable javascript for ReCAPTCHA, the service
provides you with a rather primitive HTML form with checkboxes over the
images, generating only one request on submit.

~~~
happyslobro
Yeah, this again. You can't secure your system, if you are running your
adversary's code. Tor is upfront about this, this is why Javascript is
disabled by default, and why there is a warning if you enable it globally. I
suppose this does make for decent clickbait headlines though.

~~~
subliminalbrad
TorBrowser does not disable Javascript by default, and neither does TAILS.

~~~
happyslobro
It is shipped with the noscript plugin enabled, that part is pretty important.
You aren't just referring to the fact that JS is disabled via a plugin, are
you?

------
sp332
Isn't this explicitly outside Tor's threat model?
[https://svn.torproject.org/svn/projects/design-paper/tor-des...](https://svn.torproject.org/svn/projects/design-paper/tor-design.html)
See section 3.1

------
Johnny555
Why is this phrased as if it's Cloudflare's fault?

If it's this easy for a side effect of a reCAPTCHA image to de-anonymize a Tor
user, then this seems like a failing of the Tor protocol that they should fix.
Maybe they need to introduce more jitter, repackage requests into a single
stream with consistent (or randomized) packet size, or pad the packets with
random data.
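
As an illustration of the "consistent packet size, padded with random data" suggestion, here is a small sketch. `CELL_SIZE`, the 2-byte length header, and the function names are assumptions made for the example, not a description of Tor's wire format.

```python
import os

CELL_SIZE = 512  # hypothetical fixed size for every unit on the wire
HEADER = 2       # bytes used to record how much of the cell is real payload

def to_cells(data, cell_size=CELL_SIZE):
    """Split data into equal-size cells: length prefix, payload, random padding."""
    room = cell_size - HEADER
    cells = []
    # max(..., 1) makes empty input still produce one fully padded cell.
    for off in range(0, max(len(data), 1), room):
        chunk = data[off:off + room]
        padding = os.urandom(room - len(chunk))
        cells.append(len(chunk).to_bytes(HEADER, "big") + chunk + padding)
    return cells

def from_cells(cells):
    """Strip the padding again on the receiving side."""
    out = bytearray()
    for cell in cells:
        n = int.from_bytes(cell[:HEADER], "big")
        out += cell[HEADER:HEADER + n]
    return bytes(out)
```

Padding like this hides individual message sizes, but the number of cells and their timing are still visible, which is the part jitter would have to address.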

~~~
rohit89
Introducing jitter would mean increasing latency and generally slowing down
your browsing.

~~~
softawre
as a tradeoff for security? Ok..

I mean, using a VPN is slower than not, but tons of people use them

~~~
rohit89
The effect will be exaggerated since tor traffic goes through three nodes and
random jitter would need to be added for each. In general, beating traffic
analysis in a low latency network is a hard problem. You could have perfect
anonymity if you could add arbitrary delays but that would make the network
unusably slow.

~~~
Johnny555
So make it a configurable option that you can turn on and off when you need
it.

Having to wait a few seconds for a web page to load seems like a small price
to pay to avoid government agents banging on your door because you're looking
at "subversive content".

~~~
rohit89
The delays would have to be longer than that, not just a few seconds. You're
looking at minutes to hours for strong guarantees. Also having a configurable
option could make you more fingerprintable.

------
mikegerwitz
Traffic analysis is always a problem; this is a specific case, but I'm not
sure this is anything new.

Many attacks on Tor are facilitated by or require JavaScript. Consider
disabling it rather than executing arbitrary, untrusted software on your
computer automatically.

~~~
daxorid
_Traffic analysis is always a problem_

Not always. High-latency mix networks with fixed message size and randomized
transmission are very robust against traffic analysis.

The elephant in the room is, as usual, PEBKAC. The demand by users for low
latency will always be the killer for anonymity networks.
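
To make "fixed message size and randomized transmission" concrete, here is a toy threshold mix. The class name and parameters are invented for illustration, and a real mix would also re-encrypt messages and typically add delays or dummy traffic, all of which this sketch leaves out.

```python
import secrets

class ThresholdMix:
    """Hold fixed-size messages until a batch is full, then release them shuffled."""

    def __init__(self, batch_size, message_size):
        self.batch_size = batch_size
        self.message_size = message_size
        self.pool = []

    def submit(self, message):
        """Queue one message; return a shuffled batch once enough have arrived."""
        if len(message) != self.message_size:
            raise ValueError("all messages must have the same size")
        self.pool.append(message)
        if len(self.pool) < self.batch_size:
            return []  # nothing leaves yet: this waiting is the latency cost
        batch, self.pool = self.pool, []
        secrets.SystemRandom().shuffle(batch)
        return batch
```

The first message in a batch waits until the last one arrives, which is where the latency that users object to comes from.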

------
captainmuon
Huh, I always thought that Tor breaks up traffic in a random, but
deterministic (not data dependent) way - sometimes joining data from two
packets into one network packet, sometimes splitting packets and holding data
for a while [x]. That's how I explained the jitter to myself. Sometimes a
connection would be really fast, and sometimes it would hang on a single
packet for hundreds of ms. Seems I was mistaken.

In this case, it would have helped a bit, since an attacker would not have
seen the characteristic staccato of the reCAPTCHA exchange. They would have
seen a few kB in either direction, in 40-100 packets, over a period of a few
seconds. If the implementation is clever, one end would even have a different
signature than the other.

At least this is something I would have included in Tor. Now that I think
about it, randomly introduced delays (from the outside) might actually be a
technique to _deanonymize_ users....

([x] You'd generate packet sizes and minimum transmission times from a known
seed. First packet is 501 B, 24 ms later a packet of 2048 B, then 15 ms later
one of 1718 B, and so on. If there is not enough data after a grace period,
pad with junk. If you constantly need more time to send packets than allowed,
or need to pad, then adjust the model. Also choose the model to match regular
traffic if possible. Disclaimer: I'm just making this up on the spot and am no
expert, but it seems plausible and obvious to me.)
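
A quick sketch of the footnote's scheme: packet sizes and gaps come from a seeded generator, never from the data. Names and ranges are made up for the example; it pads with zero bytes where a real implementation would use random junk plus length framing, and it leaves out the "adjust the model" step.

```python
import random

def schedule(seed, n, min_size=200, max_size=2048, min_gap_ms=5, max_gap_ms=50):
    """Return n (gap_ms, size) pairs derived only from the seed."""
    rng = random.Random(seed)
    return [
        (rng.randint(min_gap_ms, max_gap_ms), rng.randint(min_size, max_size))
        for _ in range(n)
    ]

def shape(data, plan):
    """Cut data to the planned packet sizes, padding once the data runs out.

    Returns (packets, leftover), where packets is a list of (gap_ms, payload)
    and leftover is data that did not fit into this plan.
    """
    packets = []
    off = 0
    for gap_ms, size in plan:
        chunk = data[off:off + size]
        off += len(chunk)
        packets.append((gap_ms, chunk + b"\x00" * (size - len(chunk))))
    return packets, data[off:]
```

Because both ends could derive the same plan from the seed, an observer would see the same sizes and gaps whatever the payload was.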

~~~
baby
This is called a mix network and only one server currently does that in Tor's
network.

~~~
niij
Could you please expand on this? I tried to find more info of a Tor relay
running mixing, but couldn't find anything. I would like to turn it on for my
servers if possible.

------
hewhowhineth
I stopped visiting sites with image recognition reCAPTCHAs. It has to be one
of the worst UX patterns ever devised. It's dirt cheap to automate them away
so it doesn't really stop any self-respecting bot maker, and it comes at a
price of being a huge pain in the ass for a real user. Every time I run into
them I feel used and abused.

It's really sad. So much brain power and this is what they come up with.

Apologies for the rant, couldn't help it. ReCAPTCHA is one of very few things
I genuinely hate.

~~~
mpitt
> It's dirt cheap to automate them away

Have any examples?

------
tlrobinson
Are there any anonymity networks that transmit streams of packets between
nodes at a constant rate regardless of whether it's being actively used?

Obviously it would be a very bandwidth hungry network, though if exit node
bandwidth is currently the limiting factor (is it?) then maybe not entirely
impractical.

~~~
sp332
There are a few systems based on the Dining Cryptographers protocol, but the
ones I've seen implemented are very slow and don't support many users.
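
For reference, one round of a Dining Cryptographers network can be sketched as below. This is the textbook single-bit, ring-shaped version with honest participants; the function name is made up, and it is not the code of any deployed system.

```python
import secrets

def dc_round(n, sender, message_bit):
    """Run one DC-net round with n participants arranged in a ring.

    `sender` is the index of the participant transmitting `message_bit`,
    or None if nobody sends. Returns (announcements, recovered_bit).
    """
    # shared[i] is the secret coin shared by participant i and participant (i + 1) % n.
    shared = [secrets.randbits(1) for _ in range(n)]
    announcements = []
    for i in range(n):
        bit = shared[i] ^ shared[(i - 1) % n]
        if i == sender:
            bit ^= message_bit  # only the sender folds the message in
        announcements.append(bit)
    # Every shared coin appears in exactly two announcements, so they cancel.
    recovered = 0
    for b in announcements:
        recovered ^= b
    return announcements, recovered
```

Every participant announces something in every round whether or not anyone is sending, which gives the constant-rate property asked about, at the cost of bandwidth that grows with the number of participants.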

------
matt_wulfeck
It's bizarre that this article is critical of Cloudflare. If TOR can't stand
up to a recaptcha without leaking PII, then it sounds like TOR ultimately
needs to be fixed.

I stand by Cloudflare. So much malicious traffic comes through Tor that
administrators need to do a lot to protect themselves from it.

~~~
rvern
Almost all the pages I see CAPTCHAs on with Tor have _absolutely nothing_ to
protect. Arguably the website owners who use CloudFlare are far more to blame
than CloudFlare itself. Nevertheless, I believe CloudFlare should stop using
reCAPTCHA and create a challenge system that does not require JavaScript, and
does not require sending requests to any website other than the website being
visited. I like their proposed solution[0] of automatically creating an onion
service for websites using CloudFlare and redirecting Tor users there.

[0]: [https://blog.cloudflare.com/the-trouble-with-tor/](https://blog.cloudflare.com/the-trouble-with-tor/)

------
gnud
I wonder why these anti-abuse systems don't use proof-of-work. Instead of a
captcha, let the browser chug for 5 seconds, and then POST the solution in
order to gain a temporary access cookie.

Sure, this could be attacked - but not at scale, and that's the whole point of
the captcha anyway, right?
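
A hashcash-style sketch of such a proof-of-work challenge, assuming SHA-256 and a leading-zero-bits target. The function names and difficulty value are illustrative; in a browser the solving loop would have to run as script on the client.

```python
import hashlib
import itertools

def _meets_target(challenge, nonce, difficulty_bits):
    """True if SHA-256(challenge + nonce) has at least difficulty_bits leading zero bits."""
    digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
    return int.from_bytes(digest, "big") < (1 << (256 - difficulty_bits))

def solve(challenge, difficulty_bits):
    """Client side: brute-force a nonce. Expected work doubles with each extra bit."""
    for nonce in itertools.count():
        if _meets_target(challenge, nonce, difficulty_bits):
            return nonce

def verify(challenge, nonce, difficulty_bits):
    """Server side: a single hash, cheap regardless of difficulty."""
    return _meets_target(challenge, nonce, difficulty_bits)

challenge = b"server-issued-random-value"  # would be fresh per visitor
nonce = solve(challenge, 12)
print(verify(challenge, nonce, 12))  # True
```

The server would tune `difficulty_bits` until solving takes roughly the intended few seconds on typical hardware, then hand out the temporary cookie once `verify` passes.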

~~~
mikegerwitz
That requires JavaScript.

CloudFlare does have a JS-only challenge, which presumably does this type of
thing, but this has a couple different problems. From a security perspective,
you're executing arbitrary software, which is unwise, especially if you're
looking for anonymity. The other issue is that the software is also
proprietary.

[https://support.cloudflare.com/hc/en-us/articles/204191238-W...](https://support.cloudflare.com/hc/en-us/articles/204191238-What-are-the-types-of-Threats-)

"During a JavaScript challenge you will be shown an interstitial page for
about five seconds while CloudFlare performs a series of mathematical
challenges to make sure it is a legitimate human visitor."

Related: I have started trying to get into contact with webmasters of sites
that enable JS Challenges; my template is at the bottom of this page; it'd be
great if others could do the same:

[https://gitlab.com/mikegerwitz/dotfiles/blob/master/emacs.d/...](https://gitlab.com/mikegerwitz/dotfiles/blob/master/emacs.d/mail.org)

~~~
gnud
Good point. The Cloudflare captchas I've seen seemed to use Javascript, but
maybe it's just incredibly good CSS.

Interestingly, while a javascript calculation might leak more information to
CloudFlare (since they might collect other info besides the result of a
proof-of-work function), it would probably leak less to anyone trying to
analyze tor traffic from the outside? Seems to me like it would be harder to
correlate the two ends of the tor circuit.

~~~
mikegerwitz
It'd be harder to do traffic analysis, yes, though I really wonder why it
makes so many requests to begin with. I'd like to see an analysis of the
CAPTCHA.

Considering that they allow it without JS enabled, I wonder why they'd need
any requests at all.

------
the8472
Don't most recaptcha http requests go to google, i.e. wouldn't google be the
one with the information/control necessary to de-anonymize?

------
Illniyar
A lot of comments here talk about recaptcha having a distinctive traffic
signature, but I don't understand this.

Why does recaptcha have a distinct signature and if it does couldn't an
attacker just make a distinct signature without recaptcha?

And why does recaptcha have a traffic signature that can distinguish between
users? I mean how does a simple request response create a distinct traffic?

~~~
captainmuon
Right, include Javascript (or heck, just a bunch of images of different sizes)
that open requests in Morse code. Long request, long request, short request,
long request...

Or, my favorite, the binary search (assuming you control the server / the
network in front of the server / some exit nodes, and can monitor the traffic
of your targeted user): have sites that cause transmissions for some time
(long running JS / requests, or just a lot of content the user interacts
with). Freeze 50% of the server's connections. Is the user still connecting?
Then s/he is in the 50%. If not, in the other. Repeat until the user is
matched to activity on the server.
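
The binary search can be sketched as follows. `target_still_active` is a hypothetical callback standing in for "watch the targeted user's link while these connections are frozen"; the sketch assumes a non-empty connection list, a user who keeps transferring data, and perfectly reliable observations.

```python
def find_target(connections, target_still_active):
    """Narrow a list of server-side connections down to the watched user's.

    target_still_active(frozen) must return True if the watched user is still
    transferring data while the connections in `frozen` are stalled.
    Returns (connection, rounds_used).
    """
    candidates = list(connections)
    rounds = 0
    while len(candidates) > 1:
        mid = len(candidates) // 2
        frozen = candidates[:mid]
        rounds += 1
        if target_still_active(set(frozen)):
            # Traffic kept flowing, so the user is in the half left running.
            candidates = candidates[mid:]
        else:
            candidates = frozen
    return candidates[0], rounds

# Simulated example: 1000 connections, the user is secretly number 613.
secret = 613
found, rounds = find_target(range(1000), lambda frozen: secret not in frozen)
print(found, rounds)
```

Each round halves the candidate set, so about log2(N) freezes are enough, roughly 10 for a thousand concurrent connections.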

------
danthejam
Having a dynamic IP from the 3rd world, I can't help but notice how so many of
the sites I visit are behind CloudFlare. 70% of the time I have to solve a
captcha the first time I enter a domain during my browser session. This space
could really do with more competitors on their same level.

------
mabbo
>No one is that incompetent.

Well, I'm not sure I'd go that far.

------
libeclipse
If an attacker ran the entry and exit node for Alice's connection, they could
exploit this technique and not need access to the relay node.

------
muthdra
"No one is that incompetent." Yeah I don't think so. Beautiful article,
otherwise.

------
gcb0
does that captcha work without JavaScript?

~~~
mikegerwitz
The one they're describing does, yes.

~~~
gcb0
still can't understand why people use TOR with javascript enabled.

------
lumberjack
Browser signatures are probably easier still.

~~~
akerro
The point of Tor-browser is to make all signatures the same, when screensize.

~~~
akerro
*even screensize.

------
LinuxFreedom
It is a USA company - that is enough to not trust them.

We do not need any more evidence, there is enough out there about gag orders,
secret courts, worldwide compromise of network security.

USA tech company inhabitants and founders, read this: please move out of the
country, build your companies in other places, do it now. There is no time to
waste. You can not repair the system, that corrupt bureaucrats have
irreversibly destroyed.

It will take one or two generations to rebuild a freedom oriented democracy in
some other place. Currently Europe still seems to be a good starting point,
especially now that the main USA influence channel GB is out.

Please give up the false hope and act now. Get out of that failed state!
Freedom can not be rebuilt in a fascist system without help from the outside -
you can help much better from outside!

People who still stay in USA will be seen as cooperators by history, the
window of opportunity is closing, hurry on and get out asap. Help to defend
freedom in other places!

~~~
eeZah7Ux
Why the downvotes?

~~~
tedunangst
Seems a little tangential to the original article.

