
Passive Deanonymization of Tor Hidden Services [pdf] - mikemoka
https://davidlazar.org/papers/torhs_fingerprint.pdf
======
elmotri
Here is the TOR Project's response:
https://blog.torproject.org/blog/technical-summary-usenix-fingerprinting-paper

~~~
peterwwillis
tl;dr "we don't think they can uniquely identify public websites that well
though they can obviously identify someone is using hidden services, so we
should probably make it more difficult to identify hidden services"

------
rdudek
So, big question. How anonymous is the TOR Project? Can it be trusted?

~~~
peterwwillis
I may trust my mother a lot, and maybe she wouldn't rat me out to the cops,
but that doesn't mean she can stand up to enhanced interrogation techniques.
Tor alone will not protect you completely, but it's better than nothing.

------
belorn
> Then he trains a supervised classifier with many identifying features of a
> network traffic of a website, such as the sequences of packets, size of the
> packets, and inter-packet timings.

I wonder how easy this is when so many websites share both infrastructure and
frameworks. How much uniqueness can you extract from a single blog running
WordPress when there are a hundred similar sites hosted on the same server?

~~~
peterwwillis
In general, it depends. Statistical correlation analysis works pretty well as
long as you have enough data, the data points you're looking at aren't
specifically designed to change constantly (such as with a strong boolean
function in a stream cipher correlation attack), and they don't introduce
false positives. But if the data points you're looking at are strong
indicators of unique data and your training method can correlate strongly with
a specific user's requests, it shouldn't be too difficult.
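To make the idea concrete, here is a toy sketch of the kind of supervised
traffic classifier being discussed, built on the features the paper mentions
(packet counts, packet sizes, inter-packet timings). The traces, site names,
and nearest-centroid rule are all illustrative assumptions, not the paper's
actual method, which is considerably more sophisticated:

```python
import statistics

def features(trace):
    """Summarize a trace (list of (timestamp, packet_size) tuples) into
    the feature kinds mentioned above: count, mean size, mean gap."""
    times = [t for t, _ in trace]
    sizes = [s for _, s in trace]
    gaps = [b - a for a, b in zip(times, times[1:])] or [0.0]
    return (len(trace), statistics.mean(sizes), statistics.mean(gaps))

def train(labeled_traces):
    """Average the feature vectors per site: a toy nearest-centroid model."""
    by_site = {}
    for site, trace in labeled_traces:
        by_site.setdefault(site, []).append(features(trace))
    return {site: tuple(statistics.mean(dim) for dim in zip(*vecs))
            for site, vecs in by_site.items()}

def classify(model, trace):
    """Label an unseen trace with the site whose centroid is closest."""
    f = features(trace)
    return min(model, key=lambda s: sum((a - b) ** 2
                                        for a, b in zip(model[s], f)))

# Synthetic training data: a "blog" serving small, slow packets vs. an
# "image" site serving large, bursty ones. Entirely made up for the demo.
training = [
    ("blog",  [(0.0, 600), (0.5, 580), (1.0, 610)]),
    ("blog",  [(0.0, 590), (0.6, 605), (1.1, 595)]),
    ("image", [(0.0, 1400), (0.05, 1500), (0.1, 1450), (0.15, 1480)]),
    ("image", [(0.0, 1450), (0.04, 1420), (0.09, 1500), (0.14, 1460)]),
]
model = train(training)
print(classify(model, [(0.0, 1430), (0.05, 1470), (0.1, 1490)]))  # image
```

Even this crude centroid rule separates the two synthetic sites, which is the
point of the comment above: if the features are strong indicators of unique
data, matching encrypted traffic to a known fingerprint isn't hard.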

Section 7 (page 12) goes over website fingerprinting and how they're able to
get a high true-positive rate with a low false-positive rate. They can
separate signal from noise more easily than previously thought, we're only
looking at hidden services (which number far fewer than public websites),
hidden service pages change much less frequently than public ones, and they
exclude data points that could raise the false-positive rate.

Also note that the classifiers here were even better at identifying specific
clients than at identifying which hidden services those clients were
accessing. So, not only is your hidden service not safe, neither are your
users :-)

------
deftnerd
One of the benefits of the Tor Browser Bundle is that they do a great job of
making all visitors to websites look identical: same version of Firefox, same
plugins, etc.

Perhaps something similar needs to be done for hidden services. A widely used,
pre-generated VM or Docker container would make it difficult to do software
package analysis on a hidden service.

