
On the Effectiveness of Traffic Analysis Against Tor Networks Using Flow Records [pdf] - uptown
https://mice.cs.columbia.edu/getTechreport.php?techreportID=1545&format=pdf
======
tedks
This is an easier version of a traffic analysis attack, an attack that Tor
expressly does not attempt to provide a strong defense against.

It relies on a malicious server and entry node. The contribution of this paper
is that if you have the malicious server and entry node, you can use a less
expensive data source (Cisco NetFlow data) rather than raw packets to perform
a correlation attack.
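For readers wondering why NetFlow is the "less expensive" data source: an exporter keeps no payloads and no per-packet timing. It keeps only one record per 5-tuple, holding packet/byte counters and first/last timestamps. Below is a simplified sketch of that aggregation. The `Packet` type and the single 60-second active timeout are assumptions; real exporters also apply inactive timeouts, sampling, and a binary export format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Packet:
    ts: float    # capture time in seconds
    src: str
    dst: str
    sport: int
    dport: int
    proto: int   # IP protocol number, e.g. 6 for TCP
    size: int    # bytes on the wire


def aggregate_flows(packets, active_timeout=60.0):
    """Collapse packets into NetFlow-style records keyed by the 5-tuple.

    Only counters and first/last timestamps survive; payloads and per-packet
    timing are discarded. A record is exported once it has been open for
    `active_timeout` seconds (a simplified active timeout).
    """
    open_flows = {}
    exported = []
    for p in sorted(packets, key=lambda pkt: pkt.ts):
        key = (p.src, p.dst, p.sport, p.dport, p.proto)
        rec = open_flows.get(key)
        if rec is not None and p.ts - rec["first"] >= active_timeout:
            exported.append(open_flows.pop(key))  # export the expired record
            rec = None
        if rec is None:
            rec = {"key": key, "first": p.ts, "last": p.ts, "packets": 0, "bytes": 0}
            open_flows[key] = rec
        rec["last"] = p.ts
        rec["packets"] += 1
        rec["bytes"] += p.size
    exported.extend(open_flows.values())  # flush records still open at the end
    return exported
```

If you feed it three packets of one TCP connection within a minute, you get a single record with `packets == 3`. A correlation attack then has to work from these coarse counters rather than from the raw packet stream.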

The correlation they achieve in a private Tor network is impressive; however,
if you look at the graphs in the actual paper[0], you can see that the
differences in correlations are actually quite small in the wild.

The title of this post and article is actually incorrect; the technique
demonstrated has an 81.4% accuracy. This means that the base rate fallacy will
make it nearly unusable in practice, and more so as the scale of Tor traffic
grows. For more on the Base Rate Fallacy, see [1].
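Here is a rough way to see the base-rate problem. Treat the 81.4% figure as the true-positive rate, pick a false-positive rate, and assume exactly one real flow among N candidate flows. The 1% false-positive rate and the candidate counts below are hypothetical and chosen only for illustration; they are not numbers from the paper.

```python
def posterior_true_match(tpr, fpr, candidates):
    """P(a flagged flow is the real one), assuming exactly one real flow among `candidates`."""
    prior = 1.0 / candidates
    true_alarm = tpr * prior
    false_alarm = fpr * (1.0 - prior)
    return true_alarm / (true_alarm + false_alarm)


def expected_false_positives(fpr, candidates):
    """Innocent flows expected to be flagged alongside (or instead of) the real one."""
    return fpr * (candidates - 1)


if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        p = posterior_true_match(tpr=0.814, fpr=0.01, candidates=n)
        fp = expected_false_positives(0.01, n)
        print(f"{n:>7} candidates: P(correct | flagged) = {p:.4f}, expected false positives = {fp:.1f}")
```

With those made-up rates, the chance that a flagged flow is the real one falls from about 90% with 10 candidates to under 8% with 1,000, and below 0.1% with 100,000. This is why growth in Tor traffic makes the attack less usable.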

So in summary:

* This is an incremental improvement of an already existing and known attack pattern on low-latency anonymity systems

* The technique presented in this paper is only a threat if your threat model is an adversary that can control your entry guard and the server you are trying to communicate with, but does not have the budget for packet-level correlation attacks

* This technique does not achieve sufficiently high accuracy and a sufficiently low false-positive rate to reliably identify arbitrary Tor users, but might be more successful if used in combination with a prior hypothesis that, say, a specific NSA employee is communicating with GlobaLeaks.

[0]
[https://mice.cs.columbia.edu/getTechreport.php?techreportID=...](https://mice.cs.columbia.edu/getTechreport.php?techreportID=1545&format=pdf)

[1]
[http://archives.seul.org/or/dev/Sep-2008/msg00016.html](http://archives.seul.org/or/dev/Sep-2008/msg00016.html)

~~~
Tepix
Do they really need to control the entry guard? Or is it "good enough" for
this attack if they merely have access to the netflow data?

~~~
tedks
They need to be able to access netflow data at the entry point.

To put it another way, let's say someone is posting bomb threats over Tor on
Google Plus. Google cooperates with the FBI and deterministically perturbs all
traffic going into the Tor network. What router does the FBI get netflow data
from in order to find the bomber?
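The scenario above can be sketched as a toy watermark-and-correlate loop. The server imposes a known on/off pattern on its traffic. The investigator pulls per-interval byte counts for flows at candidate entry points and ranks them by correlation with the pattern. Plain Pearson correlation, the router names, the interval count, and the traffic volumes are all assumptions for illustration; the paper's actual statistics and parameters may differ.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length series (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a flat series carries no correlation signal
    return cov / (sx * sy)


def rank_candidates(pattern, candidate_series):
    """Sort candidate flows by how well their per-interval byte counts track the injected pattern."""
    scores = {name: pearson(pattern, series) for name, series in candidate_series.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    rng = random.Random(42)
    intervals = 60
    # Hypothetical on/off pattern the server imposes on its responses (1 = burst, 0 = quiet).
    pattern = [rng.choice((0, 1)) for _ in range(intervals)]
    # Hypothetical per-interval byte counts taken from NetFlow at three entry routers.
    candidates = {
        "router-A": [rng.randint(0, 50_000) for _ in range(intervals)],
        "router-B": [40_000 * p + rng.randint(0, 15_000) for p in pattern],  # carries the victim
        "router-C": [rng.randint(0, 50_000) for _ in range(intervals)],
    }
    for name, score in rank_candidates(pattern, candidates):
        print(f"{name}: r = {score:+.3f}")
```

The series built from the pattern (router-B) should score far higher than the purely random ones. The catch, as noted above, is that the investigator only gets that signal if they can read flow data at the router that actually carries the victim's entry connection.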

~~~
tptacek
You should be aware that every tier 1 ISP already collects, aggregates, and
stores NetFlow, which means they have archives of historical data (probably
not raw flow archives, though it would be possible to keep those for
"anomalous" traffic, meaning anything not on port 80) and, more importantly,
the capability to very easily enable filtered high-fidelity collection from
most points on their network with very few keystrokes.

You can thank DDOS trolls for that, I guess, because that's why the ISPs paid
to build that out.

~~~
belorn
Tor can already use port 80 for all its traffic today by using the
FascistFirewall setting.

If tier 1 ISPs differentiate traffic at the port level, then switching that
setting on should bypass the filter.
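Per Tor's manual, FascistFirewall does not move Tor's traffic onto a new port. Instead, it makes the client connect only to relays whose ORPort is in FirewallPorts, which defaults to 80 and 443; both options are deprecated in favor of ReachableAddresses. Below is a toy model of that selection rule; the `Relay` type and relay nicknames are hypothetical.

```python
from dataclasses import dataclass

# Default value of Tor's FirewallPorts option, which FascistFirewall consults.
DEFAULT_FIREWALL_PORTS = frozenset({80, 443})


@dataclass(frozen=True)
class Relay:
    nickname: str  # hypothetical relay names in the example below
    or_port: int


def reachable_relays(relays, firewall_ports=DEFAULT_FIREWALL_PORTS):
    """Toy model of FascistFirewall: only relays listening on an allowed ORPort are usable."""
    return [r for r in relays if r.or_port in firewall_ports]


if __name__ == "__main__":
    relays = [Relay("relayA", 9001), Relay("relayB", 443), Relay("relayC", 80)]
    print([r.nickname for r in reachable_relays(relays)])
```

Note that the filter only changes which ports the connections use. Packet sizes and timing, which are what a flow-based correlation relies on, are untouched.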

~~~
tptacek
I hope you understood what I was saying. I wasn't saying ISPs are limited to
monitoring non-port-80 traffic. They can most certainly do sophisticated
monitoring of "HTTP" traffic as well. All I was saying is that it was unlikely
that most ISPs had high-fidelity raw flow archives for all traffic, but
possible that they might already have them for non-port-80 traffic.

Hiding in port 80 will protect you not-at-all from traffic analysis, and all
the major ISPs already have the infrastructure to deploy traffic analysis.

------
hawkice
So, I'm a motivated privacy-seeker with enough technical chops to configure
any not-yet-for-the-masses technology out there. I would very much prefer for
my ideal strategy not to be "poop my pants in terror". Any guidance from
the knowledgeable HN-osphere?

~~~
sbierwagen
Freenet.

[http://en.wikipedia.org/wiki/Freenet](http://en.wikipedia.org/wiki/Freenet)

Upsides: Much more secure than Tor, (since it doesn't try to be a low-latency
mixing router) not funded by USG.

Downsides: Real slow. _Real_ slow. SLOW. Think effective throughput in the
tens of kilobytes/s, latency measured in minutes.

~~~
tedks
>Upsides: Much more secure than Tor, (since it doesn't try to be a low-latency
mixing router) not funded by USG.

Most academic computer science is funded by the American government. Does that
mean that all academic computer science is backdoored by the military?

To put it more specifically, most compilers researchers I know have at some
point been on a DARPA grant, because DARPA has money and academics want money.
I'm sure plenty of LLVM contributors have been paid from DARPA grants. Is LLVM
backdoored?

===========================

On an entirely different point, how is Freenet even comparable to Tor? They
address totally different use cases (Tor is an anonymizing TCP overlay;
Freenet is a distributed censorship-resistant store), and have very different
threat models.

Further, it seems very unlikely that Tor (which is a piece of very
well-maintained software with some of the foremost privacy researchers working
on it) would have more bugs than Freenet, which is a sprawling Java program
maintained by one man. Moreover, the security of Freenet in the abstract
depends heavily on having a functional small-world network, which requires
Freenet to have been widely adopted to start with.

The parent comment's request was vague, but you simply cannot replace Tor with
Freenet because they do totally different things. It is nonsensical to compare
them because they address different threat models and accomplish different
goals.

~~~
rsync
I'm glad we're discussing this because there is a point I'd like to raise...

I always like to remind people that Tor is funded by the USG (currently,
actively). I think it's very important that people understand that and adjust
their threat models based on that.

However, it's not all bad news ... in fact, I think there's a very significant
upside to the USG funding of Tor:

It provides a very compelling defense in the event that _simply participating
in Tor_ begins to be prosecuted.

There is very often a worry put forth that simply participating in Tor (as a
client or by running a relay, etc.) can _in itself_ be considered an illegal
act. I think that as long as the USG is funding the development, people in the
US can rest easy that they can't be prosecuted in any way for simply
participating on the Tor network.

"If the USG is funding it and the state department is encouraging people to
use it (think arab spring) then how could my use of it be illegal ?"

IANAL.

~~~
simoncion
It's even better than that; tor was built to be used by spooks to cover their
open-source intelligence gathering efforts. [0] Tor continues to be used by
those same parties for those purposes.

Also, the Tor Project periodically sends out folks to remind the FBI and
friends that tor has many legitimate uses, and is routinely used by law
enforcement agencies as part of their day-to-day business. [1]

[0] [https://lists.torproject.org/pipermail/tor-talk/2011-March/0...](https://lists.torproject.org/pipermail/tor-talk/2011-March/019913.html)

[1] [https://blog.torproject.org/blog/trip-report-october-fbi-con...](https://blog.torproject.org/blog/trip-report-october-fbi-conference)

------
Tepix
As the summary mentions: Tor is susceptible to this kind of traffic analysis
because it was designed for low-latency.

You give up a certain amount of security, but gain a lot of comfort.

Perhaps a future version of tor can offer a "comfort level" setting that
introduces a varying amount of delays, bogus traffic and other concealment
methods (ideally at every hop).
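One such concealment method, constant-rate cover traffic, can be sketched in a few lines. Each hop emits the same number of bytes every interval, filling gaps with dummy padding and queueing bursts for later. The interval model and the fixed `rate` are assumptions, and this is not how Tor behaves today. The sketch shows the trade-off described above directly: bursts are delayed (lost comfort) and padding costs bandwidth, but the per-interval byte counts an observer sees no longer depend on the user's traffic.

```python
def constant_rate_shape(demand, rate):
    """Emit exactly `rate` bytes per interval: real data first, dummy padding for the rest.

    `demand` is the number of real bytes the user wants to send in each interval.
    Bytes that do not fit are queued and delayed to later intervals.
    Returns (wire_bytes, real_bytes, leftover_backlog).
    """
    wire, real_sent = [], []
    backlog = 0
    for d in demand:
        backlog += d
        real = min(backlog, rate)
        backlog -= real
        real_sent.append(real)
        wire.append(rate)  # an observer (or a NetFlow counter) always sees `rate` bytes
    return wire, real_sent, backlog


if __name__ == "__main__":
    demand = [0, 5000, 0, 12000, 0, 0]
    wire, real, left = constant_rate_shape(demand, rate=4000)
    print("on the wire:", wire)
    print("real data:  ", real, "still queued:", left)
    print("padding overhead:", sum(wire) - sum(real), "bytes")
```

With that demand and a 4,000-byte rate, every interval shows 4,000 bytes on the wire. The 12,000-byte burst is spread over three intervals, and 7,000 bytes of the 24,000 sent are padding.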

~~~
amelius
> You give up a certain amount of security, but gain a lot of comfort.

He who sacrifices security for comfort deserves neither :)

~~~
randomjoe2
Well if that's the case then that should apply to anyone on the internet. It'd
be much more secure to just not go on the web, so anyone on the web is
sacrificing a huge amount of security. These aren't black and white arguments.

~~~
ghodith
I think he is just taking the opportunity to quote Benjamin Franklin. Not so
much offering an argument of any hue.

------
colordrops
Even in incognito mode, Chrome on my machine puts out enough unique info to
identify it among 4 million other users. You can test your browser here:

[https://panopticlick.eff.org/](https://panopticlick.eff.org/)

How well does Tor do with panopticlick?

~~~
Forbo
Last I checked, using the browser bundle in Tails with JavaScript disabled
put the fingerprint at about 1 in 22,000, but it's been a couple of months since
I last tested. Going to fire up my machine and see what it shows now; will
report back with results shortly.

Update: Tor Browser in Tails 1.2 with javascript disabled returns a
fingerprint that's shared by 1 in 2,615 with 11.35 bits of identifying
information.

~~~
kissickas
How do I have more identifying bits of information, but it's shared by more
browsers?

>Within our dataset of several million visitors, only one in 4,955 browsers
have the same fingerprint as yours.

>Currently, we estimate that your browser has a fingerprint that conveys 12.27
bits of identifying information.

I figured it would be more unique due to my running Tor Browser on a Mac - but
I don't see how the math works out. Unless it actually has a count of machines
with information identical to mine?

~~~
mbrubeck
1/4,955 means your fingerprint is shared by _fewer_ browsers than the other
commenter's 1/2,615.

If your fingerprint is totally unique (like mine, with 22.16 bits of
information) then it is shared by only 1 in 4,697,672 browsers in their
dataset.
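This is the arithmetic behind kissickas's question. Panopticlick's "bits of identifying information" appear to be the base-2 logarithm of the "one in N" figure. The numbers quoted in this thread are consistent with that reading, but treat it as an assumption about the site. More bits therefore means a larger N, i.e. a rarer fingerprint:

```python
import math


def identifying_bits(one_in):
    """A fingerprint shared by 1 in N browsers carries log2(N) bits of identifying information."""
    return math.log2(one_in)


def one_in(bits):
    """Inverse: how many browsers, on average, share a fingerprint carrying `bits` bits."""
    return 2 ** bits


if __name__ == "__main__":
    for n in (2_615, 4_955, 4_697_672):
        print(f"1 in {n:,}: {identifying_bits(n):.2f} bits")
```

`identifying_bits(2615)`, `identifying_bits(4955)` and `identifying_bits(4697672)` come out at about 11.35, 12.27 and 22.16 bits, matching the figures quoted above.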

------
readmission
If you have the technical and OPSEC wherewithal, it appears that running your
own "private" (PublishServerDescriptor 0) exit node has become an extremely
attractive anonymity tool.

There are lots of downsides to this (epistemic attacks), but if your anon use
case makes sense for such a setup, it is a valuable tool to have in the
toolbox.

~~~
Torgo
Can you explain this? Bridge relays are set up using "PublishServerDescriptor
0", but exit nodes? How does that work?

~~~
readmission
[https://lists.torproject.org/pipermail/tor-relays/2011-Augus...](https://lists.torproject.org/pipermail/tor-relays/2011-August/000900.html)

If you just configure it with PublishServerDescriptor set to 0 and someone else
knows the IP (middle relays), they will be able to use it.

It's essentially a function of not announcing the node to anyone.

Edit: And to be clear, private exit nodes don't prevent the timing attack in the
article.

------
SoftwareMaven
So encrypting content as it travels through Tor is generally considered best
practice, but how does one get/manage/pay for a VPN such that the VPN itself
doesn't lead directly back to you?

~~~
hawkice
So, to clarify something: you connect to the VPN, and the VPN connects to TOR.
If you do it the other way around (as you point out) it's pretty worthless. If
you connect to TOR from the VPN then the most de-anonymized you are going to
get is "customer of this VPN provider".

As for paying for services such that "customer of this VPN provider" can't be
linked to you, I'd look at places that specifically accept bitcoin (I'm not a
fan in general, but in this case, it signals their willingness to avoid
collecting normal billing data like name, address, cc#, etc.).

[https://proxy.sh/](https://proxy.sh/)

I have no affiliation but they popped up in the googlings. Seems to be simple
enough to get their service without giving away anything personally
identifiable.

EDIT: the reason I tease out those two parts is because I'm concerned people
will think TOR can protect against entities that can do a global customer-of-
VPN dereference. TOR doesn't protect against global super-adversaries, consult
your local security practitioner before feeling safe, etc. etc.
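Here is a toy model of the ordering question, in which each hop on a path learns only its immediate neighbours at the IP level. Hop names are placeholders. The model deliberately ignores payloads, timing, and the account/payment metadata that is the other half of the concern here:

```python
def neighbours(path):
    """Map each hop to (previous hop, next hop): roughly what it sees at the IP level."""
    view = {}
    for i, hop in enumerate(path):
        prev_hop = path[i - 1] if i > 0 else None
        next_hop = path[i + 1] if i + 1 < len(path) else None
        view[hop] = (prev_hop, next_hop)
    return view


# Placeholder hop names for the two orderings discussed in this thread.
VPN_THEN_TOR = ["you", "vpn", "guard", "middle", "exit", "site"]
TOR_THEN_VPN = ["you", "guard", "middle", "exit", "vpn", "site"]

if __name__ == "__main__":
    for label, path in (("you -> VPN -> Tor", VPN_THEN_TOR), ("you -> Tor -> VPN", TOR_THEN_VPN)):
        print(label)
        for hop, (prev_hop, next_hop) in neighbours(path).items():
            print(f"  {hop:>6} sees {prev_hop} -> {next_hop}")
```

In the "you -> VPN -> Tor" ordering the guard sees the VPN rather than you, but the VPN itself sees you directly. That is the exposure if the provider gets owned or seized.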

~~~
onewaystreet
VPN -> TOR is wrong. Think about what happens if the VPN gets owned or seized.
The bottom line is if your need for anonymity isn't just fantasy (i.e. you are
actually a target of law enforcement) then no layer of your protection should
be an IP that is connected to your real identity or location.

~~~
escapologybb
I am unsure about this VPN -> TOR notation; would you mind clarifying it for
me please?

At the moment I start up my VPN client, wait for the connection to my very
trustworthy VPN provider, and only then open the Tor Browser Bundle. Is that
the right way round?

Thanks for the clarification.

------
Fogest
More and more just keeps coming out, and it honestly seems like Tor is not a
flawless method of anonymity at all, and not something I'd trust relying on.
I'd put more trust in using Tor in combination with a logless VPN.

~~~
AndrewKemendo
There is no flawless method of anonymity. The best you can do is have
plausible deniability.

~~~
lazaroclapp
Well, you can construct the following protocol for fully untraceable
communication between n nodes, numbered i ∈ {0, ..., n-1}: Divide time into
intervals of size t, where t is larger than the time it takes to propagate a
message of constant size s from any node to _all_ of the remaining n-1 (e.g.
by flooding). For interval j, node i = j % n _always_ transmits a message m of
size s to all other nodes with the following characteristic: m is either the
output of a PRG or a message encrypted with the key for a host k to which i
wishes to communicate a message, the choice of which is entirely up to i.
Under this scheme - assuming previously set up authentication between every
pair of nodes and an encryption scheme in which encrypted messages are
indistinguishable from random data without the decryption key - any node can
send a message to any other node in such a way that no one else inside or
outside this network can know the contents of the message or even that the
communication took place. For any node not receiving communication, the
protocol would be indistinguishable of one in which all transmissions are
random noise.

Of course, the issue is that latency in this scheme is O(n) and per-node
bandwidth is O(1/n), with large constants. Also, it's a reasonable suspicion
in practice that no one would set up this scheme and then actually have zero
communication going on over it, so it still reveals that "at least one of the
n nodes is talking to at least another of the n nodes".
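The scheme above can be simulated in-process, with nodes numbered from 0 so that the sender in interval j is j mod n. Everything concrete here is an assumption for illustration: the 64-byte message size, the pre-shared pairwise keys, and a SHAKE-256 keystream plus truncated HMAC tag standing in for "an encryption scheme in which encrypted messages are indistinguishable from random data". A plain loop also stands in for flooding. This is a sketch of the idea, not a vetted construction.

```python
import hashlib
import hmac
import secrets

MSG_SIZE = 64  # constant message size s in bytes (assumption)
TAG_SIZE = 16  # truncated HMAC tag so a recipient can recognise messages meant for it


def make_keys(n):
    """Pre-shared pairwise keys, assumed to be set up out of band."""
    return {frozenset((a, b)): secrets.token_bytes(32) for a in range(n) for b in range(a + 1, n)}


def keystream(key, interval, length):
    # SHAKE-256 keyed by the pairwise key and interval number, used as an illustrative stream cipher.
    return hashlib.shake_256(key + interval.to_bytes(8, "big")).digest(length)


def seal(key, interval, plaintext):
    if len(plaintext) > MSG_SIZE - TAG_SIZE:
        raise ValueError("message too long for one slot")
    body = plaintext.ljust(MSG_SIZE - TAG_SIZE, b"\0")  # pad to constant size
    ct = bytes(a ^ b for a, b in zip(body, keystream(key, interval, len(body))))
    tag = hmac.new(key, interval.to_bytes(8, "big") + ct, hashlib.sha256).digest()[:TAG_SIZE]
    return ct + tag


def try_open(key, interval, blob):
    ct, tag = blob[:-TAG_SIZE], blob[-TAG_SIZE:]
    expected = hmac.new(key, interval.to_bytes(8, "big") + ct, hashlib.sha256).digest()[:TAG_SIZE]
    if not hmac.compare_digest(tag, expected):
        return None  # random cover noise, or a message for someone else
    body = bytes(a ^ b for a, b in zip(ct, keystream(key, interval, len(ct))))
    return body.rstrip(b"\0")  # demo simplification: trailing NULs in a real message are lost


def run(n, intervals, outbox, keys):
    """outbox maps (sender, interval) -> (recipient, plaintext). Returns (wire log, inbox)."""
    wire, inbox = [], []
    for j in range(intervals):
        i = j % n  # the node whose turn it is always transmits
        if (i, j) in outbox:
            k, pt = outbox[(i, j)]
            blob = seal(keys[frozenset((i, k))], j, pt)
        else:
            blob = secrets.token_bytes(MSG_SIZE)  # PRG output as cover traffic
        wire.append((i, blob))  # stands in for flooding to all n-1 other nodes
        for r in range(n):
            if r != i:
                msg = try_open(keys[frozenset((i, r))], j, blob)
                if msg is not None:
                    inbox.append((r, i, j, msg))
    return wire, inbox


if __name__ == "__main__":
    n = 4
    wire, inbox = run(n, 8, {(1, 5): (3, b"hello")}, make_keys(n))
    print("wire sizes:", {len(blob) for _, blob in wire}, "delivered:", inbox)
```

Every interval puts exactly MSG_SIZE bytes on the wire from a predictable sender. As long as the ciphertext really is indistinguishable from the random padding, an outside observer sees the same pattern whether or not anything real was sent. The O(n) latency shows up in the fact that a node only gets a turn once every n intervals.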

------
comboy
It's obviously hard to verify, but I'd love to hear the Tor devs' take on this.

I may be paranoid, but it seems to me that there is some FUD around Tor
lately. And if many people stop using it, it will indeed become less
secure.

~~~
killface
I think it's largely a misinformation campaign to keep people who aren't doing
anything illegal from anonymizing their traffic. I like the idea of using Tor,
but from what I can see, there's nothing of value for me on there. I don't
need to buy drugs or get illegal porn, and I don't really care about
anarchist-cookbook style sites. I'm glad it exists for journalists and
whatnot, but I don't find it useful for an average person.

------
exo762
As I understand it, this attack is possible only if the connection between the
target server and client is not protected by SSL.

------
devindotcom
One more reason to have open-developed, ground-up-built endpoint hardware. I'm
surprised there hasn't been a credible Kickstarter or something. People
wouldn't mind slightly fewer features or lower performance due to reverse
engineering an older design - if it meant real, publicly auditable security at
that link in the chain. Still lots to do to improve security but that's one
that's always seemed neglected to me.

~~~
SwellJoe
While I agree that open hardware is a good idea, Cisco has no malicious intent
with NetFlow. It is a legitimate tool for maintaining and improving network
quality. This isn't an instance of a vendor providing a back door to the NSA
(or whoever). It's an accident of the protocol that this data is easier to
derive from it than from raw packet sniffing.

~~~
devindotcom
No, I certainly don't think they're malicious; I'm just surprised the option
doesn't exist.

------
tedks
This link is blogspam and should be redirected to the source:
[https://mice.cs.columbia.edu/getTechreport.php?techreportID=...](https://mice.cs.columbia.edu/getTechreport.php?techreportID=1545&format=pdf)

~~~
dchest
No, it's an article about research paper. Please don't mislead moderators.

~~~
dang
Hmm. This is a borderline case. The article mostly lifts from the paper,
especially the diagrams. On the other hand, HN often prefers the best popular
article on a story, with the specialized paper linked in the comments. The
reason is that specialized papers are less accessible to a general-interest
audience.

That said, papers in computing are an exception because the audience here is
informed in that field. And the thread, at this point, provides a lot of
context.

Given the above I think the paper is probably the better URL, and we've
changed to it from [http://thestack.com/chakravarty-tor-traffic-analysis-141114](http://thestack.com/chakravarty-tor-traffic-analysis-141114).

Marking this subthread as obviously off-topic.

------
Thaxll
Who had the terrible idea of the top scrolling menu...

------
paulhauggis
I stopped trusting Tor a couple of years ago.

