Detecting Tor Communication in Network Traffic (netresec.com)
80 points by sbierwagen on Apr 6, 2013 | 27 comments

A few years ago, I wrote a client (tortunnel) that connects directly to Tor exit nodes and pretends to be a full circuit, allowing you to use Tor exit nodes as one hop proxies. I'm sure that things have changed since then, but I got to know the protocol while writing that, and my recollection was that Tor traffic would be easy to detect:

1) Most people aren't using bridge nodes, and are instead connecting to Tor nodes publicly listed in the consensus. No DPI necessary.

2) Tor clients/servers used to signal their intent by including "special" cipher suite combinations in the TLS handshake. IIRC, they later switched to doing a "normal" looking TLS handshake with an immediate TLS renegotiation once the outer handshake was complete. That's a very distinctive traffic pattern.

3) All writes were translated into cells that were padded to 512 bytes. So by design, all Tor traffic looks the same.

4) The circuits were much longer lived than a standard TLS connection.
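To make points 1 and 3 concrete, here is a rough Python sketch, not how any real tool does it. The names `is_known_relay` and `is_cell_multiple`, the `relays` set, and the `overhead` parameter are all made up for illustration. It assumes you have already extracted (IP, port) pairs from a consensus snapshot and can estimate the per-record framing overhead, which on the wire is not trivial because TLS wraps the cells.

```python
from ipaddress import ip_address

CELL_SIZE = 512  # fixed cell size from point 3


def is_known_relay(dst_ip, dst_port, relays):
    """Point 1: check a destination against publicly listed relays.

    `relays` is a hypothetical set of (ip_string, port) pairs that the
    caller built from a consensus snapshot.
    """
    # Normalise the address so equivalent spellings compare equal.
    return (str(ip_address(dst_ip)), dst_port) in relays


def is_cell_multiple(observed_len, overhead=0):
    """Point 3: does a length look like a whole number of 512-byte cells?

    `overhead` is an assumed number of framing bytes to subtract first.
    """
    body = observed_len - overhead
    return body > 0 and body % CELL_SIZE == 0


if __name__ == "__main__":
    relays = {("192.0.2.10", 9001)}  # documentation-range address, not a real relay
    print(is_known_relay("192.0.2.10", 9001, relays))
    print(is_cell_multiple(1024))
```

The lookup alone covers most clients, which is the "no DPI necessary" point. The size check is only a hint and would need tuning against real captures.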

My sense was that Tor was originally designed to use TLS for innocuous egress filtering compatibility, rather than for explicit censorship resistance.

Yes, at the moment points 1 and 3 are enough to distinguish Tor traffic from other HTTPS traffic. They're working on a few solutions though:


Meh, there's a reasonably large amount of research on disguising Tor traffic nowadays. Two recent interesting papers were StegoTorus [1], which conceals Tor traffic via steganography in a way robust to the above statistical analysis, and -- more exotically -- SkypeMorph [2], which obfuscates Tor traffic to mimic Skype traffic.

[1] http://www.owlfolio.org/media/2010/05/stegotorus.pdf

[2] http://www.cypherpunks.ca/~iang/pubs/skypemorph-ccs.pdf

I find it interesting that the article uses the term "TOR" even though the Tor FAQ clearly states the following:

Note: even though it originally came from an acronym, Tor is not spelled "TOR". Only the first letter is capitalized. In fact, we can usually spot people who haven't read any of our website (and have instead learned everything they know about Tor from news articles) by the fact that they spell it wrong.

They're just not doing it right, that's all. Few are. Nothing interesting about it.

Had the author used "Tor" instead of "TOR," I wouldn't be wondering what else they may have done wrong.

Kind of like the story about Van Halen and brown M&M's?

Also Linux should be written GNU/Linux.

You are, of course, right, but it's irrelevant to this point. There is a strong consensus that it's 'Tor' and not 'TOR'. There is _no consensus at all_ on 'gnu/' or not 'gnu'.

I was trying to be sarcastic, but OK.

tl;dr: TOR traffic looks like https, but can be detected via traffic analysis. There is more info on how traffic analysis can chip away at TOR users' anonymity in this paper: http://www.cl.cam.ac.uk/~sjm217/papers/oakland05torta.pdf

That's a rather old paper, though. It's 8 years old, and I could have sworn that it was shown not to be effective in detecting current TOR traffic. From what I understand, you have to have massive traffic shaping capabilities to even begin to "chip away", so it's just not a viable or efficient solution, as it takes huge resources. They're not going to devote that amount of time and money just to shut down a few nefarious drug resellers or other places.

Good call, I didn't notice that. Thanks for pointing it out.

Detected but not decrypted, unless you're an exit node. TOR is still fundamentally sound, especially in the past few years.

That depends on why you're interested in Tor. While my sense is that it was originally designed as an anonymity tool, it seems to have really exploded for use as a censorship circumvention tool.

Most of the latter types of users are probably much less interested in whether the destination website can identify them, and potentially much more interested in Tor traffic going undetected by censors.

TL;DR: Tor is a peer-to-peer anonymity system providing TCP-like connections over SSL/TLS on TCP port 443.

Anonymity can be used for good. It can be used for evil. A botnet has been seen utilizing Tor.

There's a program called "CapLoader" with a checkbox labeled "Identify protocols". Checking the box can identify traffic speaking the current vanilla Tor protocol.

What specifically about TOR's TLS stream allows it to be identified as TOR traffic? The article simply says to load pcap files into the tool ...

Makes me feel like reading the article was a waste of time. I want technical details.

Dead comment from h72a (brand new account; possibly double-posted and deleted the wrong one?):

h72a 14 minutes ago | link [dead]

Tor's TLS handshake exhibits a number of peculiarities which distinguish it from HTTPS. The cipher list inside the TLS client hello used to be (almost?) unique (see http://www.cs.kau.se/philwint/static/gfc/ ) and the SNI contains a random bogus domain.
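As a rough illustration of the "random bogus domain" point, here is a Python heuristic sketch. Everything in it is an assumption for illustration: the `www.<label>.com` shape, the vowel-ratio threshold, and the name `looks_random_sni` are not taken from Tor's source or from any detection tool. It also assumes the SNI string has already been pulled out of the client hello.

```python
import re

# Assumed shape of a bogus name: "www." + 4-25 letters/digits + ".com".
SHAPE = re.compile(r"^www\.([a-z0-9]{4,25})\.com$")
VOWELS = set("aeiou")


def looks_random_sni(sni, max_vowel_ratio=0.3):
    """Guess whether an SNI hostname looks randomly generated.

    Randomly drawn labels tend to contain few vowels compared with
    real words, so a low vowel ratio is used as a crude signal.
    """
    match = SHAPE.match(sni.lower())
    if not match:
        return False
    label = match.group(1)
    ratio = sum(ch in VOWELS for ch in label) / len(label)
    return ratio <= max_vowel_ratio


if __name__ == "__main__":
    for name in ("www.k5zxgq7w3hbv.com", "www.example.com", "mail.google.com"):
        print(name, looks_random_sni(name))
```

Expect false positives on legitimate names that happen to be consonant-heavy, so this would only be useful combined with other signals such as the cipher list.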

packet sizes and inter-packet timings. This paper might pique your interest: http://cacr.uwaterloo.ca/techreports/2012/cacr2012-08.pdf . It tries to obfuscate the network traffic by morphing it so that it statistically looks like Skype traffic.

They even open sourced their code at http://crysp.uwaterloo.ca/software/CodeTalkerTunnel.html

My guess is that the timing, relative sizes, and/or destinations of packets sent distinguish one from the other.

Exactly, there's nothing really useful in this article.

For 900 EUR, however, you can buy yourself a copy of their tool.

100% agree about the uselessness of the article.

If I were to take a stab in the dark about how the tool is doing it, though, based on their "statistical" analysis comment, my guess is they're measuring sustained traffic levels / TCP connection duration. Your average encrypted web session won't look anything like a command-and-control bot calling home over Tor to some IRC server (which is their example usage for the tool). They're possibly including "known" Tor node IP addresses as well.
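To show what that guess could look like in practice, here is a small Python sketch. It is purely speculative and not how the tool is known to work. The function names, the one-minute buckets, and both thresholds are arbitrary assumptions, and the input is assumed to be the packet timestamps (in seconds) of a single TCP connection.

```python
def flow_stats(timestamps, bucket_s=60.0):
    """Return (duration_seconds, fraction_of_buckets_with_traffic)."""
    if not timestamps:
        return 0.0, 0.0
    start, end = min(timestamps), max(timestamps)
    duration = end - start
    # Number of fixed-width time buckets spanned by the connection.
    n_buckets = int(duration // bucket_s) + 1
    # Buckets in which at least one packet was seen.
    active = {int((t - start) // bucket_s) for t in timestamps}
    return duration, len(active) / n_buckets


def looks_long_lived(timestamps, min_duration_s=600.0, min_active=0.8):
    """Flag connections that are both long and steadily active."""
    duration, active_fraction = flow_stats(timestamps)
    return duration >= min_duration_s and active_fraction >= min_active


if __name__ == "__main__":
    steady = [i * 30.0 for i in range(41)]  # one packet every 30 s for 20 min
    print(flow_stats(steady), looks_long_lived(steady))
```

A short page load or a connection that is idle for most of its lifetime would not be flagged, whereas a bot that keeps chatting over one connection would.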

In addition, there was that Ethiopian DPI filtering project against Tor that happened last summer (https://blog.torproject.org/blog/update-censorship-ethiopia), with the Tor Project thinking they'd somehow fingerprinted some aspect of their TLS handshake. Maybe this knowledge is spreading.

I suppose this points out a social function that malicious software can fill.

The discovery of methods to identify TOR traffic in the pursuit of reining in malicious software should encourage the TOR network to become less easily detectable before authoritarian governments manage to shut it down more effectively.

Seems unlikely that the antimalware scene is driving the Tor detection research. China seems to be putting lots of effort behind it for censorship purposes.

See: How Governments have tried to Block Tor https://www.youtube.com/watch?v=GwMr8Xl7JMQ

I was partly responding to my own irritation with the underlying premise of the article. If Tor is being used maliciously to deliver or receive an encrypted payload on your computer or network, it isn't a problem caused by Tor. Furthermore, Tor has an enormous social benefit.

In other words, my first reaction was that it is harmful to attack the technology, but I realized that is a silly argument for obscurity. Publishing a vulnerability, and having more people publicly search for vulnerabilities, is a good thing, since authoritarian actors will just exploit what they find without any disclosure.

Yes, the Tor developers are working on it. See e.g. https://www.torproject.org/projects/obfsproxy.html.en
