
Handling bot attacks against a Tor hidden service - dsr_
http://www.hackerfactor.com/blog/index.php?/archives/762-Attacked-Over-Tor.html
======
abuani
I have never felt a greater sense of inadequacy in cyber security than
after reading this article. The level of sophistication Eddie showed in
attacking the service is simply amazing, and the mitigation techniques
used were things I never would have considered. I thought zip/tar bombs
were just relics of yesteryear that older folks talk about when they
discuss how fun it was to prank the new hires.

Serious question: how does one begin to gain the knowledge necessary to
mitigate such an attack, and is it something that developers should be more
familiar with?

~~~
WrtCdEvrydy
Zip bombs are a classic way to crash any web service that allows you to
upload files; modern AV will sometimes fuck up and bite into the file
(normally in legacy fields, since those require 'brand name', 'well known'
antivirus like 'Norton').

Especially if you know that they open the files to "extract info" from
them, you can modify the file extension to the correct type and let it rip.

~~~
abuani
I found this:
[https://github.com/abdulfatir/ZipBomb](https://github.com/abdulfatir/ZipBomb)
which I will be looking into today! Are zip bombs something a developer
should actively write protections against? Or does a library like Helmet
typically protect against these attack vectors?

~~~
WrtCdEvrydy
Protection generally comes down to the fact that either the AV or your
library will explode. You just need to ensure that such an explosion does
not take your service down with it.

~~~
SomeStupidPoint
And try to make it explode quickly -- if it fails slowly, it can be used
to DDoS you, by getting all your worker threads to spend most of their
time on those files.
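
A minimal sketch of that fail-fast idea in Python, assuming uploads are
plain zip files; the caps are illustrative numbers (not anything from the
article), and declared sizes can be forged, so a real implementation
should also count bytes during extraction:

    import zipfile

    # Illustrative limits -- tune for your own service.
    MAX_MEMBERS = 100
    MAX_TOTAL_BYTES = 50 * 1024 * 1024   # 50 MB declared uncompressed size
    MAX_RATIO = 100                      # reject absurd compression ratios

    def reject_zip_bombs(path):
        """Fail fast on a suspected zip bomb, before extracting anything."""
        with zipfile.ZipFile(path) as zf:
            infos = zf.infolist()
            if len(infos) > MAX_MEMBERS:
                raise ValueError("too many archive members")
            total = sum(i.file_size for i in infos)
            if total > MAX_TOTAL_BYTES:
                raise ValueError("declared uncompressed size too large")
            compressed = sum(i.compress_size for i in infos) or 1
            if total / compressed > MAX_RATIO:
                raise ValueError("compression ratio looks like a bomb")

This only inspects the central directory, so nested archives (a zip inside
a zip) and lying size fields still need a byte-counting pass at extraction
time.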

------
nikcub
Pretty easy explanation: you've bumped into a few of the ~dozen people who are
crawling hidden services for research or law enforcement purposes.

When you publish your descriptor to the HSDirs they'll come and crawl, and
chances are none of them were expecting a 50PB archive.org mirror and just
got stuck.

It's likely that once the operators of each crawler realized this HS was an
archive.org mirror they stopped the crawls.

The early version of a crawler I ran across hidden services would have tripped
up in exactly this way[0]

Everything else in this post is either a misunderstanding of Tor[1] or plain
paranoia.

[1] The top exit nodes have little to do with who is crawling or attacking
a hidden service; France and Germany feature heavily among nodes because
of the many cheap Tor-friendly hosts; there is nothing 'unusual' about
unnamed nodes; and the AS confusion is just someone doing a good job of
staying anonymous - thanks for reporting them

~~~
rrobukef
I'd say not honoring robots.txt and 403 Forbidden is quite malicious, just
not evil or bad. If you build a crawler, you should play nice. But bots
A-D were easily discouraged.

Eddie, however, is another problem. It overloads the network, doesn't
crawl, and doesn't parse the responses. This is not crawler behaviour...

The rest of the post is solid inductive reasoning (from my perspective):
the bot is identifiable by its behaviour. It has a faster response time
than a source-relay-source round trip, so the bot must originate there.

This is supported by the fact that the anonymous relays were set up just
before the attack, all at the same time, and that after the attack
stopped, the majority of all traffic through those relays stopped.

There are also ways to keep your registration private without resorting to
fraud. Though probably a number of people think of this as the 'easy'
solution.

~~~
nikcub
> I'd say not holding to the standards of robots.txt and 403-Forbidden is
> quite malicious

Most hidden services don't publish robots files. The only ones that do are the
proxy services (which are hidden services but not usually 'hidden'). The
purpose of the proxying is to find, discover and monitor what are usually
illegal or malicious services.

I don't think there are legitimate crawlers on hidden services - there are a
couple of drug market search engines but they identify themselves outside of
robots.txt

It's really difficult to run a large-scale hidden service because of
this - you need to be able to throttle or block connections, but not
based on the inbound circuit. You also need to set up guards (which OP
makes no mention of).

> It overloads the network, doesn't crawl and doesn't parse the responses.

It's likely adding those responses to a crawl queue that is tens of
thousands of URLs long.

Overloading the network is usually unintentional; normally your crawling
is throttled by your circuit.

------
apeace
> If I could easily tear down the entire tunnel from the remote client to my
> hidden service, then the delay to rebuild the tunnel would mitigate the
> resource exhaustion attack ... For example, if I see hostile activity from
> 127.0.0.1:12345, then I want to close the entire Tor connection associated
> with this port ... forcing him to renegotiate the entire tunnel.

This seems like a great suggestion for Tor. I hope the author will get in
touch via the mailing list and see what solutions might be possible.
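
The control-port half of this already exists: a controller can close any
circuit by ID. What's missing is the mapping from a hidden service's
local connection (e.g. 127.0.0.1:12345) back to a circuit ID. A sketch of
the closing half using the stem library, where looks_hostile() is a
hypothetical stand-in for that missing piece:

    from stem.control import Controller

    def looks_hostile(circ):
        # Hypothetical placeholder: Tor does not currently expose which
        # circuit a given local hidden-service connection arrived on,
        # which is exactly the feature being asked for here.
        return False

    # Assumes a ControlPort at 9051 with cookie authentication set up.
    with Controller.from_port(port=9051) as controller:
        controller.authenticate()
        for circ in controller.get_circuits():
            if looks_hostile(circ):
                # Tears down the whole circuit, forcing the client to
                # renegotiate the tunnel from scratch.
                controller.close_circuit(circ.id)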

------
irl_
This is the second time in a week now I've seen this guy talking about Tor on
HN. He just seems to not understand Tor.

> He's exploiting a vulnerability in the Tor daemon

This is a vulnerability present in literally any proxy, and it is a
limitation of the operating system: once you have opened as many sockets
as the OS allows, you can't open any more. You cannot have anonymous,
unlinkable connections and tracking pseudonyms at the same time.
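
(That per-process file descriptor limit can at least be inspected and
raised up to the hard cap at runtime; a minimal sketch in Python, assuming
a Unix-like OS:)

    import resource  # Unix-only stdlib module

    # Current soft/hard caps on open file descriptors (sockets included).
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print(f"soft={soft} hard={hard}")

    # Raise the soft cap to the hard cap; going beyond the hard cap
    # requires root or a change to the system-wide limits.
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))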

If you wanted to scale, try OnionBalance.

[https://onionbalance.readthedocs.io/en/latest/](https://onionbalance.readthedocs.io/en/latest/)

> The TorStatus page has no country associated with the ASN information

I know that for atlas.torproject.org we use the MaxMind GeoLite GeoIP
service. If an AS is not listed there, no information will show. This is
more common than you would think, especially for Tor relays, which are
often hosted on smaller ASs than MaxMind cares about.

------
DamonHD
The mitigation goes against my instinct to tar-pit, but in this case it is
the server, not the client, that is resource-limited. Very interesting,
and bloody annoying that someone would make such an effort to break a tool
like this.

------
libeclipse
Did anyone else get a notification to install that site's certificate?

It's a strange technique.

~~~
oger
Same here - but the site was asking me to identify myself with a cert, so
the other way round from what you are describing. I didn't have time to
follow through, but it's certainly _very_ weird behavior. Could be some
bad guys trying to identify who is stealing their show...

~~~
EvilTerran
Most likely there's just something hosted on the same IP address that makes
legitimate use of the "client certificates" feature of TLS. In order for that
to work, the server has to express an interest in client certs, and that
happens early in the TLS handshake, IIRC before SNI has been resolved - so
even if you only want to use them on one domain, your server will always ask
for them.

The way it's meant to work, the server can specify which certificate
authorities it accepts client certs from, and your browser will only prompt
you to pick a cert if you have one loaded from one of those CAs - if you
don't, you won't even know the server's asking; in practice, some browsers
will show the dialog in any case. ISTR some versions of Safari act like that.

(I ran into that at work - we were setting up a web API authenticated with TLS
client certs, and started getting bug reports from (largely non-technical)
users, completely befuddled by these dialogs that had started popping up for
them on our human-facing domains; we ended up provisioning a dedicated IP just
for the API to work around it.)
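
For reference, the server-side opt-in looks roughly like this with
Python's ssl module (a sketch; the file names are placeholders). Setting
CERT_OPTIONAL is what makes the server send a CertificateRequest during
the handshake, and the CAs loaded via load_verify_locations are the ones
advertised to the client:

    import ssl

    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.load_cert_chain("server-cert.pem", "server-key.pem")

    # "Express an interest" in client certs without requiring them;
    # this triggers the CertificateRequest that can make browsers prompt.
    ctx.verify_mode = ssl.CERT_OPTIONAL

    # CAs we accept client certs from; well-behaved browsers only prompt
    # for certs issued by these.
    ctx.load_verify_locations("accepted-client-cas.pem")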

------
belorn
The fake IP address whois was very interesting. While fake whois records
for domain names are common and not that big a deal, a fake IP address
whois means there was an ISP out there endangering its peering. It feels
similar to a CA handing out bad certificates, and I wish RIPE would act
the way browsers do. There are few enough IPv4 addresses that we can
afford to be picky in clear-cut cases like this.

------
Sami_Lehtinen
I played with Tor bot code at some point and got rates up to tens of
thousands of requests per second and hundreds of megabits per second, so
the attacks in this case don't sound too serious. Modifying Tor itself to
allow higher data rates is also possible if you're already using anonymous
systems; that way you avoid the slowness caused by the onion hops.

------
philamonster
I always find the mitigation sleuthing stuff to be so damn fascinating and
humbling. Inspirational stuff. Thanks to OP and Dr. Neal.

------
Matt3o12_
This might be a stupid question, but it sounds like those attackers access
the Tor server directly (without using relays). If this is in fact the
case, why does he not just ban the IPs of those offending, seemingly
private relays? Wouldn't that solve the problem until they get a new IP?

~~~
c22
They're connecting through Tor, not directly. I'm not sure what makes you
think that.

------
10165
"And it isn't like they were doing HTTP 'HEAD' requests -- no, they were doing
'GET' requests."

Some httpds treat them the same -- they still send the file after HEAD
requests instead of only the headers.

~~~
jimktrains2
1) Can you provide an example?

2) Wouldn't it still be best to do HEAD in any circumstance where you don't
want the body?

~~~
10165
1) nautil.us

2) Yes.

I was not implying one should do otherwise; I was just pointing out that
servers that respond to GET will not always respond to HEAD as expected.
Some treat it the same as GET; others may not allow it at all. For
example, Amazon responds with 405 Method Not Allowed.

~~~
jimktrains2
[http://nautil.us](http://nautil.us) appears to respect HEAD, and uses Apache.
Do you have a specific example of it not respecting it?

    % curl -vX HEAD http://nautil.us
    Warning: Setting custom HTTP method to HEAD with -X/--request may not work the
    Warning: way you want. Consider using -I/--head instead.
    * Rebuilt URL to: http://nautil.us/
    *   Trying 107.20.148.228...
    * Connected to nautil.us (107.20.148.228) port 80 (#0)
    > HEAD / HTTP/1.1
    > Host: nautil.us
    > User-Agent: curl/7.47.0
    > Accept: */*
    >
    < HTTP/1.1 200 OK
    < Access-Control-Allow-Origin: *
    < Cache-Control: post-check=0, pre-check=0, max-age=0
    < Cache-control: no-cache="set-cookie"
    < Content-Type: text/html; charset=utf-8
    < Date: Sun, 07 May 2017 04:55:46 GMT
    < Expires: Thu, 11 May 2017 00:00:00 GMT
    < Last-Modified: Sun, 07 May 2017 04:55:47 GMT
    < Pragma: no-cache
    < Server: Apache/2.4.25 (Amazon) PHP/5.5.38
    < Set-Cookie: lbh_session=%2B67OvEeIwXsDYbsLxSZLVjlyp%2BWUj%2BOntgIOlRdx6qoOLqyx3WuVpd2ZEH074o5bxTr7IebRTJsGpVdyaw75GEir4ZwZwrmiKAojkoOkvduxZAtpg8D4SAqwNb1EB0l3eOb1gMt%2FMuYpGZsouFJtPHTXssM82%2FKFkU7Gxm%2BTAheHa%2F7VyQ%2BAysgzthDcDyd9RYvU7NXmFAwh596ZEk7TtkwzAGVcoL%2FLjImPvk5q6Xl%2BKMWQDvOkVPIc0JtuC1rWIy3DUsOas8vCM%2BWYdv9KW9lElqzk5IHS6L7kkWSNb7U%3D44229d0bd83cc6954cf8ad73bc14d08a1d039d9a; expires=Wed, 17-May-2017 04:55:46 GMT; Max-Age=864000; path=/
    < Set-Cookie: lbh_session=eDfxDyIM%2BJFhuIpEml2KXA9B8Wcyc4Bo8GJfD0Xr3dNzGgh%2B2QdqgZWRhFVFBguslYrnQfnmrKorJjhwM47N969Qwx1NFLintVOKhP3ivrS5BVq4Kwos59OOpklaUifDEOH1FX9BG8%2BHGX9Fn8kb2duHS%2F1BRJFnGaEyOA1qmB7sFPhsjVPAL2%2BTYHNByRvxwnA2CqaY09uKs%2FC5ui6rnYCRYvI3Q7Z6KLL8QWVlT5rs71FQ%2BYXbdQyIHgiPR7yN8JnaHMgaz4qzETfr6heE04uLfUSKjIjQMM5v0YAEK0I%3D06738da1155f8af1a61c5d13cd8cee0513d4175c; expires=Wed, 17-May-2017 04:55:47 GMT; Max-Age=864000; path=/
    < Set-Cookie: lbh_session=9jlaxMJdXhuYik9BjvgSVfG2Xp8HJBLTUNeI8HNcw52ORZC5bbei%2F22YgBTWHMmym1fSQHljSs9dwUbQE5Zgx%2FIWki3S8aakHI%2BXac30JU5eI3FFLeWORwFrsJDniM%2BKCDyhUi5i2zad8aYF%2FNnndhh4yISYk0ASjKa4%2BAnQxR3fZjqK1iw44K3Oe%2FoVc4weHIYCra6ecNnMWkFzBZkLUuJ%2F1gJN0w%2FNdjFs8DERSHLteTbg2OnqjOSEmn62fYXUb%2FW6YRQblJB0J%2BElbJ%2BKIn5v5NRXAerGcIT2O%2F6t08s%3D70ec7ec53459a33b68c9fda357cfbf634fcada85; expires=Wed, 17-May-2017 04:55:47 GMT; Max-Age=864000; path=/
    < Set-Cookie: lbh_session=DUK2jE22vfFQmL5vZpV8LpqFsFD0%2F1aHV2mpi6MHNOw4oEastxJGbqL70Tlq79lpD%2F41%2Bl9P%2Bz4%2B8aNESLphAr4%2BlwkEn83jPGE2J83JazLGQJC07ndgXRL7Hf%2FsXbMnyaOwpFPGRwQ7AdLvuIfX8j0lQ7gEEoAF4NQmupcPo0PeQ41gTAf3tJbusD4ONNqkLVi3lGH1qhT%2FjXbu1mpPwYdcZyU18OU3qomqbWkx%2B1RsX8vsiHjoCADs%2FIHhZaY4rBH%2BDi6oDS8JR9vgBG5ll6jN3eTlXtvRblDHE1IMHMA%3D78fb89e5fda5a29bb58f6ab3b872d9150e7ecd9b; expires=Wed, 17-May-2017 04:55:47 GMT; Max-Age=864000; path=/
    < Set-Cookie: AWSELB=E93BBFC71E4DF46DDD850E2C67B1FBE52FEAA0E103B670233CA20FC7694721647519A155E8C10ED0C96618595B97A7D45BA1E9EE061A86361B235D0E008D08712CA9113D57;PATH=/;MAX-AGE=604800
    < X-Powered-By: PHP/5.5.38
    < X-UA-Compatible: IE=Edge,chrome=1
    < Connection: keep-alive
    * no chunk, no close, no size. Assume close to signal end
    <

~~~
10165
There is no need for keep-alive for a single HEAD request. Why use HTTP/1.1?

    
    
       cat << eof |nc -vv nautil.us 80
       HEAD / HTTP/1.0^M
       Host: nautil.us^M
       User-Agent: curl/7.47.0^M
       Accept: */*^M
       Connection: close^M
       ^M
       eof
    

Anyway, it looks like they fixed the problem or I was mistaken.

I will need to find another example.

Meanwhile, looking on Stack Exchange, one can still see people who run
websites asking whether to block or "turn off" HEAD, as recently as last
year.

If a user expects every website to respond properly to a HEAD request,
the user may occasionally be "surprised", because not every person
running a website understands or agrees on how HEAD can be useful. Sadly,
GET is the only method that a user can expect to work across _all_
websites.
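
For anyone who wants to script the hunt for such servers, here is a
raw-socket version of the nc check above (a sketch in Python; a compliant
server sends nothing after the blank line that ends the headers):

    import socket

    def head_response(host, port=80):
        """Send a raw HEAD request and return (headers, body).
        Any bytes after the header-terminating blank line mean the
        server is treating HEAD like GET."""
        request = ("HEAD / HTTP/1.0\r\n"
                   f"Host: {host}\r\n"
                   "Connection: close\r\n"
                   "\r\n").encode()
        with socket.create_connection((host, port), timeout=10) as sock:
            sock.sendall(request)
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:
                    break
                chunks.append(data)
        headers, _, body = b"".join(chunks).partition(b"\r\n\r\n")
        return headers, body

    headers, body = head_response("nautil.us")
    print("leaks a body" if body.strip() else "HEAD looks compliant")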

