De-Anonymizing Web Browsing Data with Social Networks [pdf] (randomwalker.info)
11 points by MaurizioP 26 minutes ago | 3 comments

> Our approach is based on a simple observation: each person has a distinctive social network, and thus the set of links appearing in one’s feed is unique.

Compartmentalization mitigates that threat. I have multiple online personas, and Mirimir is the only one who goes on about privacy, anonymity, etc. The only one who visits HN, Wilders, etc. Mirimir and other online personas also share no contacts. However, Mirimir does use pseudonyms ;)
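The quoted observation suggests a simple matching procedure. Below is a minimal Python sketch. It assumes we already have each candidate's feed as a set of links and an anonymous browsing history as a list of URLs. It ranks candidates by overlap, weighting each shared link by log(N / number of feeds containing it) so that rare links count for more. This IDF-style score is a stand-in chosen for illustration, not the paper's actual model. The function names and example URLs are hypothetical. The sketch also shows why compartmentalization helps: a persona whose feed shares no links with the history scores zero.

```python
import math
from collections import Counter

def rarity_weights(feeds):
    """Weight each link by log(N / df), where df is how many feeds contain it.
    A link present in every feed gets weight 0; rare links get high weight."""
    n = len(feeds)
    df = Counter(link for links in feeds.values() for link in set(links))
    return {link: math.log(n / count) for link, count in df.items()}

def rank_candidates(history, feeds):
    """Return (user, score) pairs, best match first.
    score = sum of rarity weights of history links that also appear in the user's feed."""
    weights = rarity_weights(feeds)
    visited = set(history)
    scores = {
        user: sum(weights[link] for link in visited & set(links))
        for user, links in feeds.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical toy data: three candidate feeds and one anonymous history.
feeds = {
    "alice": {"news.example/a", "blog.example/x", "paper.example/p1"},
    "bob": {"news.example/a", "shop.example/s"},
    "carol": {"news.example/a", "blog.example/x", "video.example/v"},
}
history = ["news.example/a", "paper.example/p1", "blog.example/x"]

if __name__ == "__main__":
    for user, score in rank_candidates(history, feeds):
        print(f"{user}: {score:.3f}")
```

Here "alice" ranks first because her feed is the only one containing the rare link paper.example/p1. The link every feed shares (news.example/a) contributes nothing.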

At 33C3 (the 33rd Chaos Communication Congress), I showed how you can often uniquely identify a single anonymized user in a dataset of three million people, using only the links publicly posted on his/her Twitter timeline. I also showed that this works with other types of public information, such as YouTube video ratings or Google Maps reviews.

The math behind it is quite simple and very reliable for many datasets, which makes it easy to build robust fingerprints from browsing, location, or behavioral data. In my opinion, this is what most big companies rely on today to identify users. It is more robust than cookie-based tracking, which becomes less effective as people use more devices and more blockers.
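One way to see why a handful of data points is enough is to measure "unicity": the fraction of users who are the only match for k items drawn from their own record. This measure comes from research on location data and is offered here as an illustration, not as the method used in the talk. The function name, the `records` layout (user mapped to a set of items), and the toy data are all hypothetical.

```python
import random

def unicity(records, k, trials=1000, seed=0):
    """Estimate the fraction of users singled out by k items drawn from their own record.

    records maps user -> set of items (links, rated videos, reviewed places, ...).
    Each trial picks a random user with at least k items, samples k of them,
    and checks whether that user is the only one whose record contains all k.
    """
    rng = random.Random(seed)
    eligible = [u for u, items in records.items() if len(items) >= k]
    if not eligible:
        raise ValueError("no user has at least k items")
    unique = 0
    for _ in range(trials):
        user = rng.choice(eligible)
        sample = set(rng.sample(sorted(records[user]), k))
        # Everyone whose record contains every sampled item (always includes `user`).
        holders = [u for u, items in records.items() if sample <= items]
        if holders == [user]:
            unique += 1
    return unique / trials

# Hypothetical toy data: every single item is shared by two users,
# but every pair of items belongs to exactly one user.
records = {
    "u1": {"a", "b"},
    "u2": {"a", "c"},
    "u3": {"b", "c"},
}

if __name__ == "__main__":
    for k in (1, 2):
        print(k, unicity(records, k))
```

In the toy data no single item identifies anyone (unicity 0.0 at k=1), yet any two items identify everyone (1.0 at k=2). The brute-force scan keeps the logic visible. A dataset of millions of users would need an inverted index instead.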

Here's the link to the video (you can choose the language in the menu, by default it's German but the talk is also available in English and French):

https://media.ccc.de/v/33c3-8034-build_your_own_nsa

As I keep reminding people here, there's no such thing as "anonymized data". There's only "anonymized until combined with other data sets". Thanks for the demonstration, and thanks 'MaurizioP for posting this.

