

Flaws in Tor anonymity network spotlighted - kmfrk
http://arstechnica.com/tech-policy/news/2010/12/flaws-in-tor-anonymity-network-spotlighted.ars

======
bengross
I wish the article's author had done a little bit of background work to find
references to the CCC presenter's research.

Here is the paper published last year describing the research on
fingerprinting. The second URL at uni-regensburg.de does not require an ACM
account to download the paper.

Website fingerprinting: attacking popular privacy enhancing technologies with
the multinomial naïve-bayes classifier
<http://portal.acm.org/citation.cfm?doid=1655008.1655013>
<http://epub.uni-regensburg.de/11919/1/authorsversion-ccsw09.pdf>

Dominik Herrmann, University of Regensburg, Regensburg, Germany; Rolf
Wendolsky, JonDos GmbH, Regensburg, Germany; Hannes Federrath, University of
Regensburg, Regensburg, Germany

"Privacy enhancing technologies like OpenSSL, OpenVPN or Tor establish an
encrypted tunnel that enables users to hide content and addresses of requested
websites from external observers. This protection is endangered by local
traffic analysis attacks that allow an external, passive attacker between the
PET system and the user to uncover the identity of the requested sites.
However, existing proposals for such attacks are not practicable yet.

We present a novel method that applies common text mining techniques to the
normalised frequency distribution of observable IP packet sizes. Our
classifier correctly identifies up to 97% of requests on a sample of 775 sites
and over 300,000 real-world traffic dumps recorded over a two-month period. It
outperforms previously known methods like Jaccard's classifier and Naïve Bayes
that neglect packet frequencies altogether or rely on absolute frequency
values, respectively. Our method is system-agnostic: it can be used against
any PET without alteration. Closed-world results indicate that many popular
single-hop and even multi-hop systems like Tor and JonDonym are vulnerable
against this general fingerprinting attack. Furthermore, we discuss important
real-world issues, namely false alarms and the influence of the browser cache
on accuracy."
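
To make the abstract a bit more concrete, here is a minimal Python sketch of
the general idea: count how often each packet size occurs in a traffic dump,
normalise the counts to relative frequencies, and feed them to a multinomial
naive Bayes classifier. This is not the authors' implementation. The class
name `PacketSizeNB`, the smoothing value, the convention of using negative
sizes for incoming packets, and the toy traces are all assumptions made for
illustration.

```python
import math
from collections import Counter, defaultdict


def normalised_frequencies(trace):
    """Turn a list of signed packet sizes into relative frequencies per size."""
    counts = Counter(trace)
    total = sum(counts.values())
    return {size: n / total for size, n in counts.items()}


class PacketSizeNB:
    """Toy multinomial naive Bayes over normalised packet-size frequencies."""

    def __init__(self, alpha=0.1):
        self.alpha = alpha  # additive smoothing; the value is an arbitrary choice

    def fit(self, traces, labels):
        self.sizes = sorted({size for trace in traces for size in trace})
        self.class_counts = Counter(labels)
        # Accumulate the frequency mass of every packet size per site.
        mass = {label: defaultdict(float) for label in self.class_counts}
        for trace, label in zip(traces, labels):
            for size, freq in normalised_frequencies(trace).items():
                mass[label][size] += freq
        # Smoothed log-probability of each packet size given the site.
        self.log_prob = {}
        for label, per_size in mass.items():
            denom = sum(per_size.values()) + self.alpha * len(self.sizes)
            self.log_prob[label] = {
                size: math.log((per_size.get(size, 0.0) + self.alpha) / denom)
                for size in self.sizes
            }
        return self

    def predict(self, trace):
        total = sum(self.class_counts.values())
        scores = {}
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # class prior
            for size, freq in normalised_frequencies(trace).items():
                if size in self.log_prob[label]:  # skip sizes unseen in training
                    score += freq * self.log_prob[label][size]
            scores[label] = score
        return max(scores, key=scores.get)


# Made-up traces: positive = outgoing packet size, negative = incoming.
traces = [
    [52, -1500, -1500, -1500, 52, -640],
    [52, -1500, -1500, 52, -640, -640],
    [52, 600, -300, -300, -1200],
    [52, 600, 600, -300, -1200],
]
labels = ["site_a", "site_a", "site_b", "site_b"]

model = PacketSizeNB().fit(traces, labels)
print(model.predict([52, -1500, -1500, -640, 52]))
```

Because only the relative frequencies of packet sizes are used, the sketch
never looks at payload, which is why encryption alone does not stop this kind
of classifier.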

Also related (no account required to download the paper):

Compromising Tor Anonymity Exploiting P2P Information Leakage
<http://fr.arxiv.org/abs/1004.1461>

Pere Manils, Chaabane Abdelberri, Stevens Le Blond, Mohamed Ali Kaafar, Claude
Castelluccia, Arnaud Legout, Walid Dabbous (All - INRIA Sophia Antipolis /
INRIA Rhône-Alpes)

"Privacy of users in P2P networks goes far beyond their current usage and is a
fundamental requirement to the adoption of P2P protocols for legal usage. In a
climate of cold war between these users and anti-piracy groups, more and more
users are moving to anonymizing networks in an attempt to hide their identity.
However, when not designed to protect users information, a P2P protocol would
leak information that may compromise the identity of its users. In this paper,
we first present three attacks targeting BitTorrent users on top of Tor that
reveal their real IP addresses. In a second step, we analyze the Tor usage by
BitTorrent users and compare it to its usage outside of Tor. Finally, we
depict the risks induced by this de-anonymization and show that users' privacy
violation goes beyond BitTorrent traffic and contaminates other protocols such
as HTTP."

------
mike-cardwell
This exact same flaw exists for HTTPS. Well, SSL in general. It's not Tor
specific.

~~~
jmillikin
HTTPS/TLS doesn't try to anonymize which site a client is accessing; any
intermediate can read the address of outgoing packets to determine the
server's identity. It's a little trickier when a server supports SNI, but few
do.

Tor is supposed to prevent intermediaries from determining which sites a
client is browsing, which is why this technique is interesting.

~~~
mike-cardwell
You can use this technique with Tor to make a reasonable guess if a user on
your LAN is visiting a certain website.

You can also use this technique with plain https to see if a user that visits
a certain website is downloading certain files from it, or accessing certain
pages inside the website.

It is an interesting attack, but it's not one to get seriously worried about.

~~~
getsat
> You can also use this technique with plain https to see if a user that
> visits a certain website is downloading certain files from it, or accessing
> certain pages inside the website.

How do you do this when all the HTTP headers (the Server: and actual GET/POST)
are part of the encrypted stream of data? You can't even see the specific
domain they're trying to access, only the host/ip of the server.

Am I missing something?

~~~
mike-cardwell
I explained this in a comment further up. I'll repeat here:

It's just simple traffic analysis. A page load generates a certain number of
request/responses. Each request and response is a specific size, and will be
transferred in a specific order. You create a fingerprint of that and it
doesn't matter if the page is opened via a plain http channel, or https, or
over Tor. The fingerprint will be the same (almost).
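
A rough Python sketch of that kind of fingerprint comparison, assuming a page
load is recorded as an ordered list of (direction, size) pairs. The function
name `similarity`, the 10% size tolerance (to absorb the "almost" caused by
encryption overhead), and the example numbers are all invented for
illustration, not taken from any real tool.

```python
def similarity(known, observed, tolerance=0.1):
    """Fraction of positions where direction matches and sizes are close.

    known, observed: ordered lists of (direction, size_in_bytes) pairs.
    tolerance: allowed relative size difference (an arbitrary assumption).
    """
    longest = max(len(known), len(observed))
    if longest == 0:
        return 1.0
    matches = 0
    for (dir_a, size_a), (dir_b, size_b) in zip(known, observed):
        close = abs(size_a - size_b) <= tolerance * max(size_a, size_b)
        if dir_a == dir_b and close:
            matches += 1
    return matches / longest


# Fingerprint recorded earlier by loading the page ourselves (made-up sizes).
known_page = [("out", 420), ("in", 15300), ("out", 380),
              ("in", 2210), ("out", 401), ("in", 48000)]

# The same page seen through an encrypted channel, with a little overhead.
seen_encrypted = [("out", 440), ("in", 15330), ("out", 400),
                  ("in", 2240), ("out", 430), ("in", 48040)]

# A different page: requests look alike, responses do not.
other_page = [("out", 410), ("in", 3100), ("out", 395), ("in", 90500)]

print(similarity(known_page, seen_encrypted))
print(similarity(known_page, other_page))
```

The position-by-position comparison is deliberately naive: it does not cope
with a reordered or missing request, so it only illustrates why size and
order alone already leak a lot.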

~~~
getsat
Thanks for the explanation. I misunderstood the context of your comment.

------
jmillikin
I'm doubtful of their 55-60% accuracy claim; how could a statistical analysis
of encrypted traffic differentiate between samizdat and benign text? Or, more
relevantly, whether someone browsing Wikipedia is looking up Tiananmen or just
porn?

~~~
mike-cardwell
It's just simple traffic analysis. A page load generates a certain number of
request/responses. Each request and response is a specific size, and will be
transferred in a specific order. You create a fingerprint of that and it
doesn't matter if the page is opened via a plain http channel, or https, or
over Tor. The fingerprint will be the same (almost).

~~~
justsee
But it's not that simple if the client is running as a relaying node as well,
is it? The mixture of client traffic and relay traffic would make traffic
analysis much more difficult.

Of course, if you're interesting enough that your ISP is doing traffic
analysis on your connection, you quite possibly have more pressing security
issues.

~~~
gwern
Merely reduces the statistical power, doesn't make the inferences go away
completely.

And there may be techniques for filtering out relayed material - perhaps relay
traffic emerges from the node quickly enough that an observer can then figure
out what entering traffic was just being relayed and remove it from
consideration (always a concern with a high-performance mix network since you
can't randomize retransmission as strongly as you could with email mix
networks like Mixmaster where you could wait hours, without rendering it
unusable) or relay traffic is constant enough that one can assume any 'spikes'
are the user's traffic.
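
As a toy Python illustration of the first idea only: if an observer sees a
packet enter the node and a packet of the same size leave it shortly
afterwards, they could guess it was relayed and discard it. The function name
`strip_relayed`, the 50 ms window, and matching on size are assumptions made
up for this sketch; since Tor cells are fixed-size, a real attack would have
to lean much more on timing.

```python
def strip_relayed(inbound, outbound, max_delay=0.05):
    """Drop inbound packets that look like they were simply relayed.

    inbound, outbound: lists of (timestamp_seconds, size) sorted by time.
    An inbound packet counts as relayed if an unused outbound packet of the
    same size leaves within max_delay seconds after it arrived.
    """
    unused = list(outbound)
    remaining = []
    for t_in, size in inbound:
        match = next(
            (i for i, (t_out, s_out) in enumerate(unused)
             if s_out == size and 0 <= t_out - t_in <= max_delay),
            None,
        )
        if match is None:
            remaining.append((t_in, size))  # probably the user's own traffic
        else:
            del unused[match]  # each outbound packet explains one inbound
    return remaining


# Made-up observation of one node.
inbound = [(0.00, 512), (0.01, 512), (0.02, 1400), (0.30, 512)]
outbound = [(0.02, 512), (0.03, 512), (0.90, 512)]

print(strip_relayed(inbound, outbound))
```

Whatever survives the filter is the candidate "client" traffic that a
fingerprinting classifier would then be run on, with the reduced statistical
power mentioned above.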

------
jondos
Note that we at JonDonym will have developed a strong countermeasure within
the next few weeks...

------
aresant
I don't get Tor - doesn't the risk that somebody does something illicit that
appears to originate from your IP render its value questionable?

E.g. the German arrested when a bomb threat was posted via Tor but traced back
to his IP?

<http://news.cnet.com/8301-13739_3-9779225-46.html>

~~~
getsat
A handful of the highest throughput Tor exit nodes (named "blutmagie") are run
by a single German fellow who is technically/legally his own ISP. Since it's
not the same person, I'm assuming the one mentioned in the article did not go
through all the same precautionary steps as the blutmagie admin.

<http://anonymizer2.blutmagie.de/>

