
The Trouble with Tor - jgrahamc
https://blog.cloudflare.com/the-trouble-with-tor/
======
Klathmon
"To make sure our team understood what a pain CAPTCHAs could be, I blacklisted
all the IP addresses used in CloudFlare's office so our employees would need
to pass a CAPTCHA every time they wanted to visit any of our customers'
sites."

Say what you will about cloudflare, that's an impressive move.

~~~
mdip
Impressive, yes, but I'm going to hazard a guess that they didn't route all of
that traffic through Tor to experience the CAPTCHA with the bandwidth
constraints imposed by using an exit node.

The CAPTCHAs are (mostly) easy to solve, but all of the ones I was presented
with were "pick the right one out of 9 different images" and loading the
entire CAPTCHA in Tor Browser took several seconds (and many revealed a new
image after clicking one of the 9). This is then repeated _at least_ once (I
received three on one site, I'm guessing because I didn't know if the picture
was a store front or just the front of some building). After completing the
challenge I was given a connection error and had to repeat the entire thing
again in one case.

There are much lower bandwidth CAPTCHAs out there and those should be favored
over these large image-based ones for connections originating from the block
of addresses represented by Tor exit nodes.

~~~
ultramancool
Eugh, I hate the image based CAPTCHAs, they take way more mental power than
the old text based ones did and often take me 2 or 3 tries to get right. Is
there a way to permanently opt back to the text based ones?

~~~
Jordrok
Glad that I'm not the only one who dislikes the new style. They do have one
advantage - they seem to be solvable 100% of the time (at least for now). In
every other aspect though I find them to be way more annoying than the old
ones. They're slower to load, take longer to solve, require much more
concentration, and have extremely variable difficulty. The worst are the ones
which replace each image you click with a new one, requiring another 5-10
seconds for loading.

~~~
jeffasinger
I've gotten one or two wrong before. Sometimes the questions aren't well
defined for a given picture.

For example, the question was "Does this have a river in it?" with a picture
of the Grand Canyon, where you couldn't quite see down to the Colorado River.

------
lazyjones
What's the rationale for this weak crypto in Tor?

 _Tor uses hashes generated with the weak SHA-1 algorithm to generate .onion
addresses and then only uses 80 bits of the 160 bits from the hash to generate
the address, making them even weaker._

Other weaknesses include the use of RSA-1024; people have been complaining to
no avail since at least 2013:
[http://arstechnica.com/security/2013/09/majority-of-tor-cryp...](http://arstechnica.com/security/2013/09/majority-of-tor-crypto-keys-could-be-broken-by-nsa-researcher-says/)

~~~
petertodd
This isn't actually a vulnerability yet, as SHA-1 is only known to be
vulnerable to collision attacks (where you try to find two messages with the
same hash) rather than pre-image attacks (where you try to find a message with
a specific hash); almost no hash functions have ever been found to be
vulnerable to pre-image attacks:
[https://github.com/zooko/hash-function-survey/blob/master/pr...](https://github.com/zooko/hash-function-survey/blob/master/preimage-attacks-color.rst)
Secondly, the issue is moot because generating collisions on 80 bits takes
just 2^40 work - not hard at all.

Basically, what that means for Tor is that while it'd be pretty easy for an
Onion site operator to generate two keys corresponding to the same .onion
address, an _attacker_ still has to do 2^80 work to attack a site by
generating a key with the same Onion address. While that's not great - 2^128
work is considered "standard" in cryptographic work - 2^80 work is still hard
enough that there are probably cheaper ways of attacking Onion sites (for
reference, the cumulative total work done by all Bitcoin network miners in the
entire history of Bitcoin is about 2^80 hashes).

As for the 1024-bit pubkeys, I'm not sure what the status of that is; from what
I hear Tor is actively working towards an Onion redesign that will fix these
issues, and the pubkey length may have already been fixed.
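
To make the 80-bit point concrete, here is a minimal sketch of how a legacy .onion name is derived, assuming the scheme quoted above (SHA-1 over the DER-encoded public key, first 80 bits kept, base32-encoded). The key bytes are a placeholder rather than a real RSA key, and the function name is made up for illustration.

```python
import base64
import hashlib


def onion_v2_address(public_key_der: bytes) -> str:
    """Derive a legacy 16-character .onion name from DER-encoded key bytes."""
    digest = hashlib.sha1(public_key_der).digest()  # 160-bit SHA-1 digest
    truncated = digest[:10]                         # keep only the first 80 bits
    label = base64.b32encode(truncated).decode("ascii").lower()
    return label + ".onion"


# Placeholder bytes standing in for a real DER-encoded RSA public key.
example_key = b"placeholder bytes, not a real RSA key"
address = onion_v2_address(example_key)

ADDRESS_BITS = 80
# Birthday bound: an operator looking for ANY two keys with the same address.
collision_work = 2 ** (ADDRESS_BITS // 2)
# An attacker who must match ONE specific, already existing address.
preimage_work = 2 ** ADDRESS_BITS

print(address)
print(f"collision: 2^{ADDRESS_BITS // 2} = {collision_work}")
print(f"targeted:  2^{ADDRESS_BITS} = {preimage_work}")
```

Ten bytes encode to exactly 16 base32 characters with no padding, which is why these addresses are always the same length. The gap between the two work figures is the whole argument: colliding with yourself is cheap, hitting someone else's address is not.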

~~~
baby
nitpicking but:

> This isn't actually a vulnerability yet, as SHA-1 is only known to be
> vulnerable to collision attacks

should be:

* this isn't a vulnerability (there is no reason to believe that we might see it vulnerable to pre-image attacks one day)

* SHA-1 is thought to become vulnerable to collision attacks

> total work done by all Bitcoin network miners in the entire history of
> Bitcoin is about 2^80 hashes

without thinking of hashes, current cycles done per second by the bitcoin
network is around 2^90

~~~
dsp1234
_SHA-1 is thought to become vulnerable to collision attacks_

SHA-1 is actually known to be, not just thought to become, vulnerable to
collision attacks at less than the full bit strength of the hash[0].

The important part is the pre-image resistance, of which there is no known
attack.

[https://en.wikipedia.org/wiki/SHA-1#Attacks](https://en.wikipedia.org/wiki/SHA-1#Attacks)

~~~
baby
Right, it's more than certain now that with time we will be able to find a
collision ("an estimated cost of $2.77M to break a single hash value by
renting CPU power from cloud servers").

But there is a difference between the theory and actually finding a collision.
And then a huge difference as well on how to exploit that.

------
barrkel
I have never successfully completed a captcha served up by cloudflare (and
thus Google) on Tor. They are fiendishly difficult to the point I suspected
the mechanism is broken.

~~~
buro9
I use the audio captcha.

Works every time.

The ones that are impossible for me are the street signs, and after that all
of the cultural ones (I see US captchas and when asked to select a sandwich or
a recreational vehicle I'm doomed to not complete them).

An interesting side effect of failing a captcha is that to Google this looks
like proof the captcha is working, that you're likely to be a bot, and that
they should definitely give you the hard captchas.

As such, if you cannot complete a captcha the chances increase that you must
now complete multiple difficult captchas.

The audio captcha is delightfully simpler though it does take a moment longer
to complete.

~~~
MaddoScientisto
Doesn't the audio captcha work only if you have javascript enabled? If you
enable javascript on TOR then you are doing it wrong.

I could be wrong though

------
jobbleobble
It's also worth noting that they did all this whilst getting some serious
stick from some of the core Tor devs:
[https://trac.torproject.org/projects/tor/ticket/18361](https://trac.torproject.org/projects/tor/ticket/18361)

~~~
MichaelGG
I love Tor, and I like CF as a service though I'm not convinced they're a net
positive for society. But wow, the non-CF guy (cypherpunks) on that ticket was
really being a dick. Gotta hand it to the CF people that they kept engaging
and trying to figure out some solution.

Though it seems that the idea of allowing GETs when a site isn't under load
attack is probably the right solution?

~~~
mirimir
All anonymous comments show up as "cypherpunks".

------
dredmorbius
I'm glad to see CloudFlare addressing Tor users and issues with CAPTCHA, as
I've been a victim of this myself multiple times in recent years. One
particular issue is that CloudFlare assumes javascript-enabled browsers, a condition
which may well not be met. I recorded an exchange with CloudFlare support some
time back in which the CloudFlare rep was apparently unaware how or why this
might occur:

[https://plus.google.com/104092656004159577193/posts/H2sakaRx...](https://plus.google.com/104092656004159577193/posts/H2sakaRxMPA)

I'm also aware of some tools/approaches which address the question of fair
anonymity -- ensuring well-behaved clients while retaining anonymous status
for the client. Best I'm aware these are very experimental. I've also
forwarded the information to TK Hyponnen of F Secure, who may have some
impression of the approaches.

FAUST: [https://gnunet.org/node/1704](https://gnunet.org/node/1704)
(Efficient, TTP-Free Abuse Prevention by Anonymous Whitelisting | GNUnet)

Fair Anonymity:
[http://arxiv.org/pdf/1412.4707v1.pdf](http://arxiv.org/pdf/1412.4707v1.pdf)

Assessing these is beyond my skills, but the references may be useful to
CloudFlare (or others).

------
tofupup
I didn't have an opinion on Cloudflare until recently, when I used the net
from a network that was blacklisted - web surfing was reduced to constantly
filling in captchas. Bottom line for me is that no single organization should
have so much power, and I have stopped using CDNs and encourage everyone else
to do the same.

------
foxylad
I'm one of Cloudflare's customers who would blacklist all Tor traffic if I
could. I genuinely don't understand why so many people obviously use Tor for
_all_ their browsing, and not just for sites where remaining anonymous is
desirable. Why not simply switch to a normal browser for normal sites?

Some background - we run several SaaS services for schools, which are
politically and socially non-sensitive. The only realistic reasons anyone
would want to connect anonymously would be nefarious. Allowing Tor traffic is
like a bank having a special ATM round the back with no security cameras -
you're giving a free pass to attackers to try anything they want with
impunity.

I'm having a hard time seeing what the compensating advantage is. How does not
accepting Tor traffic to our "normal" sites lessen the anonymity of Tor
traffic to sites where it _is_ important?

~~~
voltagex_
The canonical answer is
[https://www.torproject.org/about/torusers.html.en](https://www.torproject.org/about/torusers.html.en)

Really, no one has any business looking at anything I do on the Internet. I
_don't_ use Tor for everything but I may use it when I'm connecting via a
network that I believe to be hostile (i.e. just about everything outside of my
house)

~~~
ashitlerferad
> (i.e. just about everything outside of my house)

Which ISP do you use? I would be surprised if a trustworthy ISP exists.

~~~
voltagex_
Internode, in Australia. They're subject to the data retention laws, so I use
HTTPS and VPNs as necessary.

When they're completely consumed by their new owners, TPG, then no, there
won't be a trustworthy ISP in Australia. I have high hopes for SkyMesh,
though.

------
amelius
CAPTCHAs are essentially a broken idea: it is easy for an attacker to send the
CAPTCHA image to another website and ask users to fulfill the CAPTCHA for a
completely unrelated goal. This trick has been used in the past on certain
pr0n websites (users are allowed to see a picture only after they complete the
CAPTCHA). Also, one could use a mechanical turk service to circumvent
CAPTCHAs.

~~~
basch
reCAPTCHA does things like track your mouse movement and a bunch of other
hocus pocus.

~~~
ryanlol
Yes and it's still ridiculously easy to automate.

~~~
pfg
To be fair, the "I'm not a robot"-one-click-thing wasn't done to make
automation harder or impossible, but rather to make things more convenient for
users. It _will_ fall back to a regular visual captcha if you're doing
anything suspicious like requesting captchas at the rate necessary to do
comment spam or vulnerability scanning efficiently, so that's probably not
going to reduce anyone's captcha typer farm bill too much.

------
noonespecial
I encountered an impossible situation working on a wordpress site the client
insisted needed to be fully reachable via Tor. _Parts_ of the page loaded from
a cloudflare CDN but the main site didn't. The user was never presented with
the CAPTCHA of course, 3/4 of the page was just missing with no explanation. I
never did find a way around that.

~~~
jobbleobble
As mentioned in the article, if you control the cloudflare CDN you can now
whitelist all tor access.

~~~
dsl
I think he is talking about [https://cdnjs.com/](https://cdnjs.com/)

I'd really hope that CloudFlare whitelists Tor for CDNJS.

~~~
rdl
Probably not cdnjs -- there are lots of people who use CloudFlare for an
assets domain (since it's free) -- if they are just serving images, it's a
problem to display the CAPTCHA. It is probably best practice, if none of those
assets are sensitive, to disable as much security as possible on that domain.
It might be worth having some packages of defaults for tuning that. (One of
the benefits for our enterprise customers is one of our staff works with them
to tune settings.)

------
vox_mollis
Something people need to understand: real anonymity is really, really hard.
Your COMSEC is a fairly small portion of the attack surface area, and the
consequence of this is that staying anonymous is, BY NECESSITY, going to be
very inconvenient.

From this perspective, captchas are a very minor concern. I'm as pro-privacy
as anyone, but this expectation that anonymous activity is supposed to be easy
or convenient will never be satisfied. Thousands of years of lessons from both
military and civilian clandestine operations bear out the critical lesson
that anonymity is, by default, very very inconvenient. Nothing is going to
change that.

~~~
nxzero
Agree, though pointless systems that likely extract the identity of a user,
force them to work for free, etc - and fail to counter the risk they
supposedly stop - are abusive.

------
hippich
Probably not really suitable for Cloudflare scale, but for others who are
forced to use CAPTCHA, consider [https://hashcash.io/](https://hashcash.io/)
for easy Proof-of-Work integration.

------
belorn
One should always be skeptical of security features whose benefits are not
presented clearly. The article, for example, claims that 18% of global email
spam comes from harvested email addresses that are collected using Tor, but
are those 18% exclusively using Tor?

Do users who publish their email address on a website that is hosted
through Cloudflare see less spam? It should be a fairly easy thing for
Cloudflare to test, while also testing vulnerability scans and login attempts.
As an aside, it would also be interesting to see if there is a qualitative vs.
quantitative difference in the malicious activity (i.e., if serious attempts
are made through botnets, and script kiddie activity goes through Tor).

The last and final test to verify a security measure that has such a high
cost as this one is to ask whether it has any meaningful impact on the end
result. A website with 10000 vulnerability scans per day is not going to be
meaningfully improved if that is reduced to 5000 per day, even if that is a 50%
reduction. If there is a known vulnerability, the site is going to get
hijacked either way.

------
monoclonal
Worth noting that a surprisingly large number of sites have other sections of
their site which are not routed through Cloudflare. Look for DNS records like
this:

    
    
        admin.example.com
    

Such a record is usually not routed through Cloudflare because the last thing
a webmaster wants is to solve captchas for their _own_ website. They don't,
however, care much for their visitors if they're subjecting them (possibly a
substantial number of them) to captcha-solving nonsense.

The content in the non-CF sections of a site can still be accessed because the
webmaster is lazy and didn't consider that a visitor can run a DNS lookup (dig)
on all their DNS records.
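
As a rough sketch of that check: resolve the hostname and see whether the address falls inside a Cloudflare range. The CIDR blocks below are a small sample of Cloudflare's published IPv4 ranges and may be incomplete or out of date, so treat them as an assumption. The function names are made up, and `admin.example.com` is just the placeholder from above.

```python
import ipaddress
import socket

# Assumption: a partial sample of Cloudflare's published IPv4 ranges.
# Check the current published list before relying on this.
SAMPLE_CLOUDFLARE_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "103.21.244.0/22",
        "104.16.0.0/13",
        "172.64.0.0/13",
        "173.245.48.0/20",
    )
]


def is_in_sample_ranges(ip: str) -> bool:
    """Return True if the IPv4 address falls inside one of the sample ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in SAMPLE_CLOUDFLARE_RANGES)


def host_is_proxied(hostname: str) -> bool:
    """Resolve a hostname (needs network access) and test its address."""
    return is_in_sample_ranges(socket.gethostbyname(hostname))


# Offline demonstration with literal addresses instead of live DNS lookups.
print(is_in_sample_ranges("104.16.1.1"))   # inside 104.16.0.0/13
print(is_in_sample_ranges("192.0.2.10"))   # documentation range, not Cloudflare
```

Calling `host_is_proxied("admin.example.com")` against a real subdomain would do the actual lookup. A result of `False` only suggests the record points somewhere else, since the range list here is partial.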

Or you can simply use TOR pluggable transports to pretend you're Googlebot,
and also hide all your traffic in Google-like traffic.

I would reserve this for rare cases as there are people in censorship prone
countries who really need this bandwidth :)

~~~
besselheim
The pluggable transports are for connections from your Tor client into the Tor
network, not connections from exit nodes to the rest of the Internet.
Cloudflare (or any destination host) would still be able to detect your
connection as originating via Tor.

~~~
monoclonal
There are in fact innumerable ways to make it appear to a website that you're
_not_ using TOR, and you can read up on these in the TOR documentation.

Ideally you're looking to use TOR as the first hop, and then you dial into the
wider Internet with a VPN, or as I mentioned: Using various Google services to
camouflage traffic instead of a VPN. This is where pluggable transports come
in, because Google doesn't like TOR, so you want to choose how you're
connecting to Google, and get to traverse the TOR network to find an optimal
route.

~~~
besselheim
Pluggable transports are intended to stop your local ISP detecting or blocking
your connection to Tor:
[https://gitweb.torproject.org/torspec.git/tree/pt-spec.txt](https://gitweb.torproject.org/torspec.git/tree/pt-spec.txt)

While I agree that you can mask your exit from the Tor network via an
additional proxy or VPN, that's not the role of pluggable transports. They're
only for connecting in to the Tor network, not out from it.

------
ownedthx
Anecdata: most of the scamming directed towards our startup comes from Tor
exit nodes.

------
tech-no-logical
I don't use Tor daily (although I run a small exit node), but I do surf via a
VPN. cloudflare's captchas drive me to the brink of insanity, I see at least
50 each day...

~~~
brianwawok
Have you considered setting up your own VPN? That would make the ip "clean"
and get less captchas, right?

~~~
amdavidson
> Have you considered setting up your own VPN? That would make the ip "clean"
> and get less captchas, right?

And would negate any anonymity offered by using a VPN.

~~~
mathrawka
Tor : Anonymity :: VPN : Privacy

VPNs do not provide anonymity.

~~~
startling
I think you misunderstood -- I parsed it as "And using a VPN would negate any
anonymity offered (by using Tor)".

------
beeboop
In addition to CAPTCHAs, why not just have a button that runs some JavaScript
that completes a proof-of-work similar to what mining bitcoins does? You could
make it only take 5 seconds on a modern laptop CPU, about as long as it'd take
to enter the CAPTCHA anyway, but it'd potentially be a very large road block
for spammers/DDOSers.

For those on phones, you can still opt for CAPTCHA if you don't want to kill
your battery.
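
A minimal sketch of what such a button could run, assuming a hashcash-style scheme over SHA-256. The challenge string, the difficulty value, and the function names are all hypothetical; this is not Cloudflare's or hashcash.io's API. It is written in Python for brevity, though in a browser the solver would be JavaScript.

```python
import hashlib


def _digest_value(challenge: bytes, nonce: int) -> int:
    """SHA-256 of challenge plus nonce, read as a 256-bit integer."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")


def verify(challenge: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Cheap server-side check: the digest must have enough leading zero bits."""
    return _digest_value(challenge, nonce) < (1 << (256 - difficulty_bits))


def solve(challenge: bytes, difficulty_bits: int) -> int:
    """Expensive client-side search: try nonces until one verifies."""
    nonce = 0
    while not verify(challenge, nonce, difficulty_bits):
        nonce += 1
    return nonce


# Hypothetical per-visitor challenge issued by the server.
challenge = b"example-challenge-issued-by-server"
difficulty_bits = 12  # about 2^12 hashes on average; a real deployment would tune this

nonce = solve(challenge, difficulty_bits)
print(nonce, verify(challenge, nonce, difficulty_bits))
```

The asymmetry is the point: solving takes about 2^difficulty_bits hashes on average, while verifying takes one. Each extra bit doubles the client's expected work, so the "5 seconds on a modern laptop" target is a matter of picking the right difficulty.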

~~~
dgfofd09fv
Because that wouldn't stop them. Let's rather say instead everyone gets a
fixed delay in seconds. Then the spammers will just wait out that delay and
then spam. Even if the delay is on every single page visit, that doesn't harm
a botnet, because they can still do _machine_count / delay_ visits per second.

~~~
beeboop
It doesn't stop them, but it raises the costs of it significantly. Instead of
hundreds of requests a second, they're down to one request every 5 seconds,
and they're having to run the computer at a full CPU load 100% of the time. It
would be more profitable for them to just mine bitcoins at this point, meaning
they wouldn't waste that CPU load on spam submissions.

~~~
dgfofd09fv
You're mixing the two cases. If every page is limited then yes they have to
work hard, but still get _machines / delay_ visits per second. But a human will
have to wait the full delay every time a page is visited. This is unacceptable
for modern browsing.

If the delay is only once per, let's say a domain, then you don't do anything
against the spammers, they only have to wait a full delay once.
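
The arithmetic behind that, as a quick sketch. Throughput is the number of machines divided by the per-visit delay. The 5-second delay is the figure from upthread; the botnet size is a made-up number for illustration.

```python
def visits_per_second(machines: int, delay_seconds: float) -> float:
    """Aggregate request rate when every machine waits out the delay each visit."""
    return machines / delay_seconds


DELAY_SECONDS = 5        # per-visit delay, as suggested upthread
BOTNET_MACHINES = 10_000  # hypothetical botnet size

human_rate = visits_per_second(1, DELAY_SECONDS)
botnet_rate = visits_per_second(BOTNET_MACHINES, DELAY_SECONDS)

print(f"one human:  {human_rate} pages/second")
print(f"the botnet: {botnet_rate} requests/second")
```

The human is held to one page every five seconds while the botnet as a whole still gets thousands of requests per second.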

~~~
beeboop
Maybe Cloudflare could have a browser plugin that preemptively creates
"tokens", or essentially just mines Cloudcoins that you then spend to bypass
CAPTCHAs. That way you could make it much more expensive than 5 seconds of CPU
and there'd be zero delay (or perhaps not even the Cloudflare splash page).
The use case for needing to constantly bypass CAPTCHAs is rare enough it seems
reasonable to ask those people to use a browser plugin.

------
hartator
> We also made a change based on the experience of having to pass CAPTCHAs
> ourselves that treated all Tor exit IPs as part of a cluster, so if you
> passed a CAPTCHA for one you wouldn’t have to pass one again if your circuit
> changed.

I don't get how this is different than a super cookie. Anyway, I think
globally that's a well-balanced reaction to the TOR issue.

~~~
schoen
> I don't get how this is different than a super cookie.

The supercookie would survive or be detectable across multiple browser
sessions (keep in mind that the Tor Browser automatically deletes regular
cookies when you quit). The behavior that CloudFlare is describing here works
within a single Tor Browser session but not across multiple sessions.

I believe the Tor Browser is willing to send some first-party session data to
a site after changing circuits, so that you wouldn't be logged out of an
account if you logged in over Tor and kept that session active for long enough
that Tor switched over to a different circuit. This behavior is basically what
should allow CloudFlare to recognize that a particular Tor user has recently
passed a CloudFlare CAPTCHA (on a particular site). However, if the user quits
and restarts Tor Browser, CloudFlare will no longer be able to detect that
it's the same visitor (if it could, that would be the supercookie case).
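
Purely as an illustration of that idea, and not a description of what CloudFlare actually does: a clearance cookie can be tied to the site and an expiry time rather than to the client IP, so it stays valid when the circuit (and exit IP) changes but vanishes when the browser discards its cookies. The secret, cookie format, and function names are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical server-side secret; never sent to the client.
SECRET_KEY = b"hypothetical-server-secret"


def _sign(payload: str) -> str:
    return hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()


def issue_clearance(site: str, now: float, ttl_seconds: int = 3600) -> str:
    """Issue a cookie after a passed CAPTCHA. Note: no client IP in the payload."""
    payload = f"{site}|{int(now) + ttl_seconds}"
    return f"{payload}|{_sign(payload)}"


def check_clearance(cookie: str, site: str, now: float) -> bool:
    """Accept the cookie from any exit IP, as long as it is authentic and fresh."""
    try:
        cookie_site, expires, signature = cookie.rsplit("|", 2)
    except ValueError:
        return False
    expected = _sign(f"{cookie_site}|{expires}")
    return (
        hmac.compare_digest(signature.encode(), expected.encode())
        and cookie_site == site
        and expires.isdigit()
        and now < int(expires)
    )


cookie = issue_clearance("example.com", now=1000.0)
print(check_clearance(cookie, "example.com", now=1500.0))    # same session, new circuit
print(check_clearance(cookie, "other.example", now=1500.0))  # different site
```

Because the token lives only in the session's cookie jar, quitting Tor Browser throws it away, which is the difference from a supercookie described above.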

------
MaddoScientisto
The one time I used TOR the captchas were the ones with both words as squiggly
letters which wouldn't work even after 20 attempts, I had to give up.

I was pleased when, recently, I found out they switched to the image based
one. Sure, sometimes it still refuses to accept that I selected all the street
signs but at least I don't have to give up in frustration after 30 consecutive
failed attempts

------
stegosaurus
Why do CAPTCHAs even exist for standard websites (as opposed to account
creation, etc)?

Wouldn't something like rate limiting or proof of work achieve the same
result? If you're simply allowing someone to browse, you don't actually care
whether a user is real or not. You care about stopping automated
comments/spam.

Is this just another tentacle of the advertising industry?

~~~
nodja
It's explained in the article.

 _On the other hand, anonymity is also something that provides value to online
attackers. Based on data across the CloudFlare network, 94% of requests that
we see across the Tor network are per se malicious. That doesn’t mean they are
visiting controversial content, but instead that they are automated requests
designed to harm our customers. A large percentage of the comment spam,
vulnerability scanning, ad click fraud, content scraping, and login scanning
comes via the Tor network. To give you some sense, based on data from Project
Honey Pot, 18% of global email spam, or approximately 6.5 trillion unwanted
messages per year, begin with an automated bot harvesting email addresses via
the Tor network._

Rate limiting would block legitimate users, and pow doesn't impede the
malicious uses of tor.

~~~
stegosaurus
That doesn't answer my question, though. All of those are reasons to suspect
Tor exit nodes, but not reasons to place CAPTCHAs on standard article pages on
a site.

The only vaguely reasonable one I can see there is 'ad click fraud', and I
think that fundamentally restricting the usefulness of a site for advertising
purposes is awful.

------
deegles
Are there any public projects that have used deep learning to defeat captchas?
If not, I'm sure it's only a matter of time.

~~~
Mahn
It's a cat and mouse game, of course captchas have been defeated in the past,
but captchas just keep getting increasingly complex. Currently, for anonymous
requests, reCAPTCHA by Google, which is what Cloudflare uses, asks you to
"choose bodies of water" or street signs from a series of images, with each
click sometimes revealing more options. It's fairly complex so I guess it
hasn't been broken yet. It's also a massive pain in the pass for legitimate
visitors.

------
cyphar
> If we provide a way to treat Tor differently by applying a rule to whitelist
> the network's IPs we couldn't think of a justifiable reason to not also
> provide a way to blacklist the network as well.

Yes there is: "we don't provide censorship as a service".

------
zhte415
My trouble with reCAPTCHA...

When a site uses reCAPTCHA, mentioned in the article as a preferred solution,
visitors from China are routinely blocked, as reCAPTCHA is now hosted by Google.

That's the trouble with CAPTCHA hosting if you intend to do anything with
China-based orgs.

~~~
pfg
This wouldn't apply to Tor users (which this article is about), as reCAPTCHA
can't identify users from China while they're using Tor.

~~~
TorKlingberg
Rather, the Great Firewall of China cannot identify reCAPTCHA (Google) when
they are using Tor. I think it blocks Tor specifically though.

~~~
pfg
Yep, that's it. I think "normal" relays get blocked almost immediately, but
bridges with OBFS work to a certain degree.

------
wolfgke
This is why people should browse a lot more over Tor, such that the relative
amount of malicious traffic is reduced and thus operators will not be able to
uphold the argument that Tor is often malicious traffic anymore.

~~~
seabee
Unfortunately, it won't work.

1\. It's a charity tax; you have to convince people to incur the cost of Tor
(i.e. CAPTCHAs everywhere) for activities that don't require Tor.

2\. You can't neutralise a poison by diluting it.

Firstly, from the operators' POV, if there's a widespread agreement that
people use Tor even though they don't need to, then they know voluntary users
can be pressured not to use Tor through sheer inconvenience. Even if you
wanted to boycott a service that blocked Tor, it's notoriously hard to make
good on that threat unless you wield a lot of power or annoyed a very large
number of people. So the consequences are minor.

Secondly, the percentage of malicious Tor traffic is a red herring. What
operators care about is the origins of malicious traffic. If 50% of your
attacks come from one particular country (or Tor) and the cost of losing that
traffic is less than the cost of that malicious traffic, there is a real
incentive to block that traffic. Combined with the first point, the cost of
losing voluntary Tor users is insignificant if they can easily choose not to
use it.

~~~
wolfgke
> 1\. It's a charity tax; you have to convince people to incur the cost of Tor
> (i.e. CAPTCHAs everywhere) for activities that don't require Tor.

People are willing to invest personal resources for charitable purposes. Why
not here?

People are willing to fight against discrimination. Why not against
discrimination of Tor users?

> 2\. You can't neutralise a poison by diluting it.

There is also poisonous traffic from non-Tor addresses.

> Combined with the first point, the cost of losing voluntary Tor users is
> insignificant if they can easily choose not to use it.

People would strictly avoid restaurants that don't serve coloured people. Why
don't they avoid services that don't serve Tor users?

~~~
Vendan
How easy is it for you to know they don't serve Tor users unless you are a Tor
user? This is like saying "Why don't colored people avoid restaurants that
don't serve colored people?"

------
lake99
HN is the only place I bother solving CAPTCHAs. For everything else, Firefox
(Tor Browser) has plugins to get a copy from archive.org, archive.is, or
google-cache. So if the page asks me to solve a CAPTCHA, I don't visit them.

This could be a benefit for the website (lower server load) or a harm (fewer
people appear to be reading their content, fewer people see their ads).
Whatever the case may be, I'm caught in the crossfire between crackers and
servers. I don't care about their war at all. As far as I'm concerned, I'm
winning.

Cloudflare wanted me to solve a CAPTCHA to read their article. I tried to
archive it, but archive.is already had a copy of it. This happens to me quite
often. So, obviously, I'm not the only one who has figured out a way around
their war.

> Security, Anonymity, Convenience: Pick Any Two

Nah, I usually have all three.

~~~
scrollaway
> I usually have all three

...

> Firefox (Tor Browser) has plugins to get a copy from archive.org, archive.is,
> or google-cache. So if the page asks me to solve a CAPTCHA, I don't visit
> them.

You don't have convenience.

~~~
Sir_Cmpwn
A one-click measure to circumvent a captcha is pretty good convenience-wise
imo.

~~~
scrollaway
Sure, but google cache/archive pages often lack some images, have broken
javascript, etc. Additionally, there's a huge difference between "point and
click", and "point, click, click again, look for the plugin, another click, do
this for every page".

~~~
Sir_Cmpwn
Smart Tor users have JavaScript turned off anyway. You might be lacking the
images, but lately this has been pretty good on archive.org.

~~~
scrollaway
Thus cementing the "no convenience" clause. I understand this is an
_acceptable_ tradeoff for some people (myself included) but you can't pretend
it's convenient.

~~~
aw3c2
Give it a try for a week. You might be amazed how fast, calm and content-rich
the web can be, if you disable Javascript by default and whitelist when
needed.

~~~
scrollaway
I'm well aware of what the web is like without JS. I know it's usable. I'm
saying it's not _convenient_.

Whitelisting is not convenient.

If people are pretending it is, they're doing a disservice to the security
community. Kinda how like people pretend GPG is usable and convenient, thus
holding back progress in the security UX front.

~~~
aw3c2
I find SUBSCRIBE TO OUR NEWSLETTER and LIKE US ON SOCIAL MEDIAS popups and ads
much more inconvenient.

Sorry for the late reply.

------
dev1n
For users who can afford it, routing a vpn through tor solves the captcha
issue with cloudflare. Also adds an extra layer of security

~~~
amdavidson
Can you elaborate?

I cannot imagine how Cloudflare could distinguish VPN traffic routed through
Tor and standard traffic routed through Tor. The only difference is a hop on
the front end, no change to what comes out the exit node.

~~~
dev1n
If you set up an access point to route all of your traffic through tor, then
connect to your VPN through that access point, your IP is the VPN IP, not the
exit node.

~~~
icebraining
What's the point of doing that over connecting directly to the VPN? Seems like
the benefit granted by Tor (avoiding leaking who connected to the VPN) would
be negated by the fact that there are now payment records from you to the VPN.

~~~
dev1n
If you pay with BTC, prepaid gift card (paid for in cash) etc.. then there are
no payment records from you to the VPN. The benefit of this is that the VPN
provider doesn't know who you are because you are accessing the VPN through
tor. Yet the VPN provides you a stable IP that won't be CAPTCHA'd like most
normal tor exit nodes are.

Edit: This is good if you are trying to maintain an Internet profile (i.e.
Facebook, twitter etc.) that isn't tied to your true identity.

~~~
jobbleobble
But you are losing out on some anonymity here. The VPN provider may not know
who you are but you are consistently making access with the same user ID and
from the same IP. Your activity can be correlated to that account and that IP.

If that's not your aim (like you say - being signed in to the same facebook
account all day gives you the same problem) then this isn't an issue.

But this isn't what a lot of tor users want tor for.

------
thraway235
Anyone else concerned that the captchas offer an ability to de-anonymize
someone completely?

For example, having a backend algo offer certain captchas that show up only in
certain areas of the world?

I feel like this is entirely possible and is part of the reason I will not
complete any captchas going forward.

~~~
TorKlingberg
When using Tor, reCAPTCHA cannot tell where you are.

------
mdip
I played around with Tor a couple of months ago when it hit the headlines
again, just to see what it was all about, and the experience was _awful_. As
an exercise, I decided to fire up Tor browser today just to see if it's gotten
any better. It's worse.

Here's what I found:

Large Image based CAPTCHA: Tor is slow. I'm on a 75/25Mbps internet connection
and it loads images similar to 24k dial-up. The CAPTCHA I was presented with
was the highest bandwidth CAPTCHA I've ever seen. I was given 9 pictures and
needed to select "Bodies of Water". Each click yielded a new square. I had to
wait for 4 additional images to load before I could click "Validate". This
took over a minute. Then, repeat, this time "Store Fronts", which were hard to
discern (is it a blurry Apartment Building front or Store Front?). I received
a connection error on one site so I had to repeat the process. With Javascript
features turned off, it was a little easier, but included the extra step of
having to paste a Base64 encoded string into a text box which failed twice.
Every site I tried gave me this CloudFlare CAPTCHA page.

One of the sites I pulled up had no images. I set my privacy settings to the
least protected, enabling JavaScript and HTML5, assuming this was the problem.
Nope. They used images from another site and I had to grab the image URL and
paste it into a browser to see what was going on. It was yet another CAPTCHA.
A few minutes later, the previous site displayed images properly.

On to Google. Privacy settings are still at the weakest setting. Type "Google"
into the search engine and I get a "wavy text in an image" CAPTCHA. This loads
quickly and is easy to answer, but just results in another CAPTCHA. I gave up
after 10 tries. Bing, Yandex and Yahoo all worked with Yandex only presenting
a CAPTCHA once after the third search I did (simple, like Google, but worked).

This is a _terrible_ experience for people seeking to get around oppressive
governments. While I applaud CloudFlare for dogfooding their CAPTCHA system, I
doubt they did it in a way that truly simulates the experience via an
extraordinarily slow internet connection which is what I ended up with when
using Tor Browser. I wonder how much slower things would be if my internet
connection was 1Mbps or being interfered with by government infrastructure. I
understand the trade-off between securing a site from "evil traffic" that is
more likely to originate from a Tor exit node, but why must they use such a
bandwidth intensive CAPTCHA? A browsing experience that would have taken
seconds to complete took me almost 5 minutes (and a lot of frustration) _not
including actually consuming the content I was looking for_. Are the text in
image based CAPTCHAs not good enough for this task? Are there other reasons
I'm missing?

~~~
the8472
> They used images from another site and I had to grab the image URL and paste
> it into a browser to see what was going on. It was yet another CAPTCHA.

Maybe browsers shouldn't request <img> src with Accept: */* and cloudflare
should use that to detect whether it can actually serve html?
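
A minimal sketch of that idea, assuming the server only looks at the Accept header: serve the HTML challenge page when the client explicitly lists an HTML media type, and fall back to a plain error otherwise. The function names and the two response labels are hypothetical, and nothing here reflects how CloudFlare actually decides.

```python
def accepts_html(accept_header: str) -> bool:
    """Return True if the Accept header explicitly lists an HTML media type.

    A bare wildcard (*/*) is deliberately not treated as "wants HTML",
    since that is what a request for an <img> src might send.
    """
    for part in accept_header.split(","):
        # Drop parameters such as ";q=0.8" and normalise case.
        media_type = part.split(";", 1)[0].strip().lower()
        if media_type in ("text/html", "application/xhtml+xml"):
            return True
    return False


def choose_challenge_response(accept_header: str) -> str:
    """Pick a response kind for a request that must be challenged."""
    if accepts_html(accept_header):
        return "html_captcha_page"  # hypothetical label
    # An HTML page cannot be rendered inside <img>, so fail plainly instead.
    return "plain_403"  # hypothetical label
```

With illustrative header values, `choose_challenge_response("text/html,*/*;q=0.8")` picks the CAPTCHA page, while `choose_challenge_response("*/*")` picks the plain error. The sketch ignores `q=0` exclusions and any other signal a real server might use.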

------
hueving
A "whitelist" tor solution that isn't whitelisting tor by default is really
lame. Approximately 0% of website operators will think to enable that so tor
users are generally still treated like garbage visiting any CF protected
website.

------
sajid
Great blog. Some of the best technical writing out there.

------
CiPHPerCoder
"We this is so important"

------
fosco
CAPTCHAs are evil.

Unfortunately I do not trust tor.[0][1] I'm unsure the internet will ever be
anonymous unless large completely private networks start gaining popularity.

[0] [http://fusion.net/story/238742/tor-carnegie-mellon-attack/](http://fusion.net/story/238742/tor-carnegie-mellon-attack/) (11/2015)
[1] [http://www.theguardian.com/technology/2014/jul/22/is-tor-tru...](http://www.theguardian.com/technology/2014/jul/22/is-tor-truly-anonymising-conference-cancelled) (7/2014)

------
nxzero
"Trouble with Cloudflare"

Treating TOR traffic the same as non-TOR traffic makes no sense; read the main
link for confirmation they do.

Case in point, and for starters, STOP repeatedly requiring a user from a
session to keep passing "I'm not a robot" tests. Set a global cookie that's
valid for the session, across all of Cloudflare's network, and honor it.

If the "I'm not a robot" test doesn't work unless it's repeatedly given, then
that is the problem, not TOR.

Please address this issue; thanks.
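
As a rough illustration of what a session-wide clearance could look like, here is a sketch of a signed, expiring token that any edge server holding the same key could verify without storing state. Everything in it (the key, the lifetime, the token layout, the function names) is an assumption made for the example, not a description of Cloudflare's cookies.

```python
import hashlib
import hmac

SECRET_KEY = b"hypothetical-secret"  # assumption: key shared by all edge servers
TOKEN_LIFETIME_SECONDS = 3600        # assumption: one-hour clearance


def _sign(expires: str) -> str:
    """HMAC-SHA256 over the expiry timestamp, hex encoded."""
    return hmac.new(SECRET_KEY, expires.encode(), hashlib.sha256).hexdigest()


def issue_clearance(now: float) -> str:
    """Issue a token after a solved challenge: '<expiry>.<signature>'."""
    expires = str(int(now) + TOKEN_LIFETIME_SECONDS)
    return f"{expires}.{_sign(expires)}"


def is_clearance_valid(token: str, now: float) -> bool:
    """Accept the token only if the signature matches and it has not expired."""
    expires, sep, sig = token.partition(".")
    if not sep or not expires.isdecimal():
        return False
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not hmac.compare_digest(sig.encode(), _sign(expires).encode()):
        return False
    return int(expires) > now
```

A token from `issue_clearance(time.time())` would be set as a cookie and checked with `is_clearance_valid` on later requests. The sketch only covers skipping repeat challenges; it says nothing about whether a cookie shared across sites can be used to link a user's activity.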

~~~
alex_duf
If you set a global cookie, then someone intercepting the public traffic
(think NSA, GCHQ,...) can identify what the user is reading. You just killed
anonymity.

~~~
nxzero
Global cookie is a session, if the user wipes the cookie, resets the TOR
connection, etc. that's their issue. TOR is not designed to hide sessions, nor
would setting a global cookie break anonymity unless the user doesn't
understand how TOR works. All exit nodes are watched and session device
fingerprints are correlated with or without a global cookie.

~~~
ikeboy
If nobody set a global cookie, a passive attacker cannot correlate different
tabs using different circuits. See
[https://trac.torproject.org/projects/tor/ticket/3455](https://trac.torproject.org/projects/tor/ticket/3455)
and
[https://www.torproject.org/projects/torbrowser/design/#ident...](https://www.torproject.org/projects/torbrowser/design/#identifier-linkability)

~~~
nxzero
If the same global cookie is accessible via two circuits, that's a bug in a
product that uses TOR, not TOR; I personally go above and beyond simply
creating a new circuit, never open two circuits at the same time or boot,
limit TOR sessions to single use, and locally compartmentalize data per session,
etc. TOR is not plug and play, it takes effort and discipline, and will never
be a fully automated solution.

~~~
ikeboy
Yes, TBB on default settings is vulnerable to associating multiple tabs (if
I'm reading the link above right), if an adversary sets a shared cookie. That
does not mean it's ok for someone to set a shared cookie.

The possibility of exploitation does not mean exploitation or making
exploitation easier is fine.

~~~
nxzero
Point is, Cloudflare giving abusive volumes of requests is ironic, they should
stop, and a global cookie won't harm anyone that knows how to use TOR, and they
could even give the option NOT to set the cookie. Not offering a solution
because "I'm not a robot" doesn't work (happy to prove this) and users don't
understand how to use TOR is not an excuse for their behavior and exploitation
of users.

~~~
ikeboy
How are they exploiting users?

~~~
nxzero
Requiring users to do work for free is the very definition. Google and
Cloudflare are very aware that their tests don't work for stopping bots, but
they're very good at extracting free labor.

~~~
ikeboy
Cloudflare gets no benefit from the captcha, so if they were useless as you
claim, they have no incentive to keep them.

~~~
nxzero
Unless you work at Cloudflare and are aware of its relationship with Google,
any comment on their relationship is speculation. That data is vital to
Google's future, and I think being valuable to Google is of value in itself,
beyond any direct benefit. I'm not aware of any company that provides more of
this type of data to Google. Google would have to pay 10k+ contractors $30+ an
hour to do this if it weren't being done for free; Google [Google Search
Quality Rater] if you're not aware of what I'm talking about.

~~~
eastdakota
I work for CloudFlare. We don't get anything from Google for using reCAPTCHA.

~~~
nxzero
Thanks, might be worth updating the blog post to reflect this, along with what
percent of Google's reCAPTCHA data comes from Cloudflare and why Cloudflare
doesn't roll their own to ensure data is not being leaked/given to Google.

------
rnhmjoj
Why is everyone obsessed with knowing whether the user is a bot or an actual
person? What difference does it make? A bot is not inherently malicious;
there are thousands of legitimate use cases: a bot may be downloading content
as part of a script or some application, checking updates or creating a cache.
Traffic generated by bots should not be blocked per se. There are certainly
lots of malicious bots scanning for vulnerabilities, DDoSing sites and so on
but this applies to people as well.

I think human generated traffic may have priority but blocking bots entirely
is nonsense: ultimately the user agent is always a "bot" acting on behalf of
an actual person: by blocking this traffic you may always break some user
workflow.

~~~
sigmar
> There are certainly lots of malicious bots scanning for vulnerabilities,
> DDoSing sites and so on but this applies to people as well.

Nah, a human isn't going to waste their time refreshing a page manually 50,000
times.

~~~
rnhmjoj
But a human surely can attempt cross-site scripting, send phishing and spam
messages, send broken requests and look for exploits. Also TOR is generally so
slow I don't think there is even the possibility of generating enough traffic
for a DoS.

~~~
lenish
It would depend on the website's resources and services. For example, a layer
7 DoS which just queries an expensive endpoint on the website over and over
may not need high traffic volume to overload the website's systems.
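
A back-of-the-envelope sketch of why that is, assuming a simple model where the site has a fixed pool of workers and each request occupies one worker for its full duration. The function name and all numbers are made up for illustration.

```python
def requests_per_second_to_saturate(workers: int, ms_per_request: int) -> float:
    """Request rate at which every worker is permanently busy.

    Simplified model: each request ties up one worker for ms_per_request
    milliseconds, and there is no caching or queueing.
    """
    return workers * 1000 / ms_per_request


# Hypothetical site with 8 workers.
cheap_page = requests_per_second_to_saturate(workers=8, ms_per_request=10)
expensive_endpoint = requests_per_second_to_saturate(workers=8, ms_per_request=2000)

print(cheap_page)          # rate needed when each request costs 10 ms
print(expensive_endpoint)  # rate needed when each request costs 2 s
```

Under these assumed numbers, saturating the site through the cheap page takes 800 requests per second, while the expensive endpoint needs only 4, a rate that even a slow connection could plausibly sustain.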

