The Underpants Project (cubiq.org)
215 points by tbassetto on Apr 23, 2012 | 81 comments

I'm the (co)author of the project. Please note that it's just a one-day proof of concept; imagine what a well-motivated corporation could do.

It's nothing new (someone correctly pointed to the EFF project), but I wanted to make a real-world demo out of it.

The demo doesn't store each bit of info separately; it simply creates a hash out of them. If I stored the data separately I could, for example, identify small user agent updates, screen resolution changes, newly installed plugins and so on.
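
To make the difference concrete, here is a minimal Python sketch of the two approaches. It is not the demo's code: the use of SHA-1 is an assumption (the 40-hex-character IDs quoted in this thread are merely consistent with it), and the attribute names and the `fingerprint` / `changed_fields` helpers are hypothetical.

```python
import hashlib


def fingerprint(attributes: dict) -> str:
    """Collapse all attributes into one opaque hash (assumed here to be SHA-1)."""
    # Sort the keys so the same attributes always serialize the same way.
    canonical = "|".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def changed_fields(old: dict, new: dict) -> list:
    """Only possible when attributes are stored separately: list what changed."""
    return sorted(k for k in old.keys() | new.keys() if old.get(k) != new.get(k))


# Illustrative values only.
before = {"screenSize": "1920x1080", "timezone": "-60", "plugins": "flash,pdf"}
after = {"screenSize": "1920x1080", "timezone": "-60", "plugins": "flash,pdf,java"}

print(fingerprint(before) == fingerprint(after))  # any change alters the whole hash
print(changed_fields(before, after))              # separate storage shows which field moved
```

With only the hash stored, a single new plugin makes the visitor look like a stranger; with the fields stored separately, the tracker can see that everything else still matches.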

Also, much of the info can be gathered without JS, and actually browsing with NoJS puts you in a very restricted niche, making you even more trackable ;)

The demo is far from perfect but I believe that even a 90% reliability is alarming. Anyway I'll put all the source code on GitHub for you to review. I hope to be able to add the NoScript code as well soon.

Surely an IP comparison achieves everything this does, and is probably more than 90% reliable?

Just IP comparison only works if the user's IP is constant, which rules out many phones, tablets, and laptops that frequent coffee shops and restaurant wifi. It also doesn't allow you to tell when IP $a and IP $b are really the same person at home and work. Further, IP comparison doesn't let you distinguish between multiple users coming from the same home or office network, or from a proxy.

Commercial implementations of this sort of tracking include the user's IP in their dataset, but track a number of other datapoints too. That way, if e.g. everything but a user's IP matches an entry in their database, they can tell it's probably the same person connecting from a different location.
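
A rough Python sketch of that kind of partial matching, purely as an illustration: the field names, the `match_score` helper and the 0.75 threshold are all made up, and real products presumably weight the datapoints rather than counting them equally.

```python
def match_score(candidate: dict, stored: dict) -> float:
    """Fraction of shared datapoints whose values are identical."""
    keys = candidate.keys() & stored.keys()
    if not keys:
        return 0.0
    same = sum(1 for key in keys if candidate[key] == stored[key])
    return same / len(keys)


def probably_same_visitor(candidate: dict, stored: dict, threshold: float = 0.75) -> bool:
    # The threshold is an arbitrary illustrative value, not taken from any product.
    return match_score(candidate, stored) >= threshold


# Same browser seen at home and at work: only the IP differs.
home = {"ip": "203.0.113.5", "screen": "1440x900", "timezone": "-60",
        "fonts": "a1b2", "plugins": "c3d4"}
work = dict(home, ip="198.51.100.7")

print(match_score(home, work))
print(probably_same_visitor(home, work))
```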

You can view source to see what they're using to generate the fingerprint: screenSize, devicePixelRatio, timezone, mimeTypes, plugins, httpAcceptHeaders, fonts. It's interesting that these are enough to generate a moderately unique fingerprint. I'm sure my fonts list is unique, so that's probably enough to ID me right there. However, not every computer I use has the same fonts installed, nor the same screen dimensions. This won't track me as I go from my desktop to mobile device.

Professional authentication managers such as RSA Adaptive Authentication can gather 40 or 50 data points from which to tell whether a user is roughly the same as before. They apply a ratio to the generated value, and depending on the result a user can be redirected to a challenge question. It's not foolproof, but it prevents a lot of automated phishing or botnet scams from being able to automatically log in with your credentials.

I think the font question was answered by http://panopticlick.eff.org/

Yes, I tried Panopticlick on a number of different computers and it was always the fonts that gave them the most bits of entropy. I wonder if installing a couple of new fonts and deleting a few others (which I do from time to time) would make them believe I'm a different person...

But really, browsers should stop allowing scripts to access the full list of available fonts. What use does a website have for that data, anyway? Any site that doesn't want to use one of the standard fonts should be using webfonts nowadays.

I would say that how unique the fingerprint really is matters a lot. I wonder how much traffic or time it takes before collisions start to occur. In the current setting it would probably suffice if the fingerprint took one of just 100 or 500 values: the traffic is probably rather low, you visit the page for maybe a few minutes, and you probably won't go back to it in 3 days or even in 1 hour just to check whether some other guy overwrote your secret word.

Regardless, it's a very interesting idea, and it also shows how difficult and counter-intuitive security can be if you do not study such issues. As an API designer I would surely have a hard time foreseeing that exposing the screen size or font list could turn out to be a security issue for users.
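
For a back-of-the-envelope feel, the two questions can be put into a few lines of Python. This is only a sketch and assumes fingerprints are uniformly distributed over the possible values, which real browsers certainly are not; the function names are mine.

```python
import math


def overwrite_probability(other_visitors: int, possible_values: int) -> float:
    """Chance that at least one other visitor lands on *your* fingerprint."""
    return 1.0 - (1.0 - 1.0 / possible_values) ** other_visitors


def any_collision_probability(visitors: int, possible_values: int) -> float:
    """Birthday-problem chance that any two visitors share a fingerprint."""
    if visitors > possible_values:
        return 1.0
    # log P(no collision) = sum over i of log(1 - i/N)
    log_no_collision = sum(math.log1p(-i / possible_values) for i in range(visitors))
    return 1.0 - math.exp(log_no_collision)


# With only 500 possible values and 100 other visitors in the same window:
print(overwrite_probability(100, 500))
# Chance that any pair among 30 visitors collides, again with 500 values:
print(any_collision_probability(30, 500))
```

The first number is the one that matters for "did someone overwrite my word"; the second grows much faster and is the one a tracker cares about.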

> This won't track me as I go from my desktop to mobile device.

May not even track from one browser to the next. Camino and Safari generate different fingerprints on my machine.

Also for me, Firefox and Chrome, different fingerprints.

That said, they can probably eventually start correlating the different fingerprints using other data, like device id, location patterns, etc. It would not be impossible to build a dossier of all your browsers and devices, especially if you ever log in to any online service from multiple machines.

Simply resizing my browser window before pasting the second URL seems to thwart this (but I don't have Flash installed). Without Flash, it falls firmly into the "kinda-works sometimes if everything goes perfect" camp.

So it's another demonstration of Flash being ridiculously insecure. These guys did it better, even defeating Tor to reveal the origin IP. http://dl.packetstormsecurity.net/0610-advisories/Practical_...

Resizing my browser didn't do it, but moving it to another screen changed it from "1ccf9e9301db4fb87b1d178d77edad5bfa598057" to "ab0e6beb449408b28473dd66a6f4501528087c0e". I don't think this method is perfect at all, or that it should be used for anything that needs to be reliable (like logins).

Reliability is not necessary. "Good enough" to sell ads: that's all they need.

Interesting, I had Chrome Flash Block enabled but it did not seem to thwart this.

Enabling click-to-play in Chrome (and not one of those extensions that hide or remove the element from the DOM once it's loaded) just makes the fingerprint have fewer bits, since it can't get the list of fonts installed on your computer.

I have a script blocker, an ad blocker and Ghostery installed. They did not thwart this.

If someone's foolish enough to run javascript and flash on Tor, of course it's trivially easy to defeat it!

I expect most users of Tor fall into that category.

Works on the iPad though. No Flash, and can't resize the window. Must be working off timezone? Surely there are some more iPads in the UK?

iPads may be rotated. Rotating my smartphone made it unrecognized.

How is this significantly different from http://panopticlick.eff.org/ ?

No difference, except they didn't just warn of a possible attack, they implemented it.

I hit a collision when I moved from one browser to another on my iPad. I'm guessing the number of permutations on mobile devices is quite small.

With NoScript, it does not provide a tracking ID, which is yet another reason to browse with NoScript.

   *The following is your unique fingerprint on the web:*

I'll bet my fingerprint isn't unique.


You'd be surprised.

Not really.

Currently, we estimate that your browser has a fingerprint that conveys 11.9 bits of identifying information.

That's a very low value. With JS off or blocked, my browsers convey between 16 and 21 bits (the latter meaning uniqueness in their dataset).

The amount of information conveyed by the HTTP_ACCEPT headers is especially worrying. There is nothing in there, apart from maybe the language, that should leak any info on a modern browser. And certainly not 10 or 16 bits of info.
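
For anyone wondering how to read those bit counts: n bits of identifying information means roughly one browser in 2^n shares your fingerprint. A quick Python check, using the 11.9-bit figure quoted above and the 2,155,876 dataset size quoted elsewhere in this thread (the helper names are mine):

```python
import math


def bits_to_population(bits: float) -> float:
    """n bits of identifying info means about 1 browser in 2**n looks like you."""
    return 2 ** bits


def population_to_bits(population: int) -> float:
    """Bits needed to single out one browser among `population` browsers."""
    return math.log2(population)


print(bits_to_population(11.9))       # how many browsers one 11.9-bit fingerprint is hidden among
print(population_to_bits(2_155_876))  # bits needed to be unique in a dataset of that size
```

The second value comes out at a bit over 21, which is why 21 bits corresponds to being unique in their dataset.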

Ok that’s a scary page.

"Your browser fingerprint appears to be unique among the 2,155,876 tested so far."

Huh, that's awesome ... in a bad way. According to that page, both my system fonts and browser plugin details are unique among the browsers they've tested thus far.

Not necessarily, they fingerprint you using the info the browser gives to them. This site for example uses JS to do the fingerprinting, but it would be just as easy (though perhaps less scalable) to do the fingerprinting server-side.

TL;DR serverside code can fingerprint you

Indeed. Similarly, if you barricade yourself inside your house you'll be well-protected against thieves.

It seems like the biggest information leak is installed fonts. If I install one extra font above the default for my OS, I am leaking a good deal of information: https://panopticlick.eff.org/

Would it not make sense for my OS to sandbox which fonts can be accessed by my browser? If a webpage wants to use a special font-family, I could be prompted to allow/block access to my greater font library.

Does anyone have any idea how well this works in corporate environments where the typical workstation is a clone of all the others behind the same (NATed) address?

I really question the name of this. Underpants Project? I just don't get it. It doesn't really fit with the point you're trying to make, IMO.

1st step create tracking software.

2nd step

3rd step profit

"The Underpant Gnomes"[1] were honestly the very first thing that came to my mind when I read the name.

[1]: http://en.wikipedia.org/wiki/Gnomes_%28South_Park%29

Underpants could be a synonym for underwear, in which case it could make sense: you'd like to keep your fingerprint private on the web, and your underwear private in real life.

This is exactly how I took it.

"Caught with your pants down" is the best I could come up with

It got us to click the link, didn't it? :)

Yeah, but at what cost? I already have an unfavorable view of it - just based on the name (that's human nature) - going into it. In this case, the project isn't really all that clear on what it does - I need to do a bunch of stuff to see its benefit. I'm busy, I've lost interest. See the problem?

That's just my point of view, of course, but how many others will share it? I'd wager a good number would.

You could accomplish the same thing using local storage in an iframe with postMessage, and it'd be a lot more robust, with fairly significant browser support (IE8+).

I built a demo a year ago that let you store personal data and exposed a postMessage API for storing and sharing permissions and personal data with sites, as kind of the beginnings of a poor man's client-side-only OAuth.

I think it doesn't work.

After typing "meow" and hitting enter, the copy-pasted URL had this to say about me: "It seems you didn't save the word. Go to lab.cubiq.org/underpants first."

The unique fingerprint is also different. "93615388f7f54cd79d2f806ac3795c182217aa9b" somehow became "f37ec3fdd05c27c13cbb7fcdef95cc004297f62d" after copy-pasting.

Other than that technical glitch for me (Linux, Chrome latest unstable version), I still think this is actually a pretty good idea. But will websites use it, now that the ones we actually want to worry about are injected into every website via Tweet and Like and + and whatever buttons?

Google in particular is everywhere with their gAnalytics tracking code.

edit: now that I think about it, I may have misunderstood the point. Was it a proof of concept of providing cross-site tracking without tying to a personal identity?

If not, insecurities in cross-site whatever hardly matter when I am logged into every little tidbit that is loaded via iframe and appears on almost every website. Even porn sites have like buttons these days.

I entered meow too. It also forgot my word and fingerprint.

You have to remember that it's not just about the physical tracking, it's also about time. If the page takes seconds to run it's not feasible on billions of requests because the tracking software sits in between the page getting paid and the target destination. If your tracking script takes seconds (this technique only works after the page is loaded), or provides a white screen jump period, it isn't something that will be desired. Unless of course all other options are removed.

Getting all pages on the web to remove the old image based cookie tracking in support of JS etc... also will never happen unless extreme circumstances occur. Most of the people running ad sites have no idea what you are talking about anyway in realms outside of Cookie and Tracking.

> This technique can be used to find out some of the softwares installed on your system. For example I can say that you probably don't have Adobe Creative Suite installed.

Sorry, I have CS3 installed and fully licensed. But still a nice demo.

Presumably they're looking at installed fonts to see if you have Adobe CS, so what did you do with the fonts that would fox this?

I'm guessing you're on Mac OSX? Can you have fonts available that aren't apparent to the browser somehow?

If cubiq is right it doesn't check for my old version. It's pretty much a vanilla install on OSX.

the demo checks CS4 and CS5

Does anybody know of a browser plug-in that attempts to defeat fingerprinting?

This fingerprinting can be defeated by NoScript. Or by turning off JavaScript. A sensible idea would be to make an extension that purges the unique elements from the set they're tracking (i.e. fonts, plugins, mimeTypes, screen size and pixel ratio, etc.) and provide a white-list for sites you want to have that information.

torbutton kills some of it -- they lock your window size to a multiple of some fairly large number of pixels. not sure what else they do....
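
The window-size trick is easy to picture in Python. This is just a sketch of the idea, not Torbutton's code; the 100-pixel step is a made-up placeholder since I don't know the number they actually use.

```python
def bucket_window_size(width: int, height: int, step: int = 100) -> tuple:
    """Round both dimensions down to a multiple of `step` (hypothetical value)."""
    return (width // step) * step, (height // step) * step


# Two slightly different windows end up reporting the same size.
print(bucket_window_size(1366, 768))
print(bucket_window_size(1399, 799))
```

The coarser the step, the more users fall into the same bucket and the fewer bits the window size leaks.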

Fingerprinting can be very useful to you as a user, as well. Imagine that you switch between devices and work locations all the time and you use a core suite of web applications. Fingerprinting could be used, with your permission, to uniquely identify you across all your devices, locations, and browsers for the suite of services that you depend upon. Not having to ever enter a password again while remaining secure sounds pretty nice to me.

It is not the technology, but the evil application of said technology that is evil.

I must have a plugin or something that breaks this.

When I went I saved the word "what" against the fingerprint of 0e24f67890fb99dfd6fa147adc5634224e6cf509

Then, I opened a new tab, copied and pasted the url to ghosttouch and was given this fingerprint: 0373164e6053f6d4d1e0cea156be83e5a45e13d4

So, out of curiosity I copied and pasted the url in the same tab where I had "saved" my word:


It generated the same code.

So then I went back and re-saved my word, went to the ghosttouch site again and this time it loaded my word.

Something wonky in there but I'm not sure what.

Today I learned that surfing with different firefox profiles out of privacy concerns is only useful if you disable Flash. Or any other plugin, for that matter.

In 2008 I worked at a major credit card company and they were building the exact same thing, only with more like 75 attributes. Of course it was all through a 3rd party so they wouldn't have any PII, but it was their design. They'd do this to build super-cookies and then track prospects across multiple products. It was awful.

What is the worst thing somebody can do with a fingerprint? Show you ads they think you will like?

It depends on what they are looking for. For example, a company could gather information about your browsing habits using advertising. If a company like Google distributes ads across a wide variety of sites, they can use your fingerprint to gather lots of information about your browsing habits.

This is scary not because it enables them to provide more relevant ads, but because it enables them to sell personal information to organizations like corporations and the government. Imagine your boss being able to buy a package that tells him what type of porn you like, how often you view porn, your most visited subreddits, etc.

And often from this information, these companies (based on aggregate data) can make more sweeping generalizations (that are often incorrect, but also often right on the mark) like income bracket, ethnicity, drug use habits, sexual orientation, etc. These approximations can also be bought.

Imagine that your 'package' has some information in it deducing that you regularly use cocaine. Even if this is not true and the person observing this information knows it might not be true, the fact that it has been stated might be enough to lose you some important opportunity.

I don't know how common it is for someone to buy this information, but I know that the information is already out there and the potential for things like this is very large.

Can anyone explain what use-case this technique enables that is not served by cookie tracking?

Or is the point that disabling cookies is not sufficient to avoid being tracked? Everyone has cookies enabled (or many sites don't work), so if that's all it is, nbd..

Cookies don't cross site boundaries - e.g., cnn.com can't read a cookie set by foxnews.com.

Just to emphasize that this is meant to demonstrate privacy risks, not to be taken as a feature suggestion...

Source site said Fingerprint: cb71811ba44d8d86755b03be8a83938d2946169b And I chose the word: Charismatic

Libellu.la said Fingerprint: e0349008d04cc0c73e5cc9a9ea15a246feebf3b1 Word: Chortle

So... no?

iPad 2, jailbroken with AdBlocket running, no idea if that was a factor or not.

Here you have the same thing, but IT WORKS AMONG BROWSERS: http://fingerprint.pet-portal.eu

Have to enable js/flash (I browse with NoScript), but it's interesting that it works through FF's Private Browsing (FF 11.0).

I can see that even in chrome "incognito mode"

I would like to remind you that when you open incognito, it specifically says:

"Going incognito doesn't affect the behavior of other people, servers, or software. Be wary of: - Websites that collect or share information about you"

Incognito only prevents information from being stored on your computer, not anyone else's.

But more interesting is that the fingerprints in the normal window and the incognito window end up being the same.

- Enter word in normal browser window
- In an incognito window, go to the URL and retrieve your word.

Same here. That's really creepy. :/

I can confirm this as well.

Me 3.. It doesn't work if you copied the link from chrome to firefox or IE.

Should we ban all analytics software (e.g. Google Analytics)?

Or should we regulate how analytics software handles data instead?

Easier: don't use your real identity online if you don't want to be tracked.

Interesting, but is this really more reliable than a simple IP comparison? Probably not.

It failed to track me even though I opened the link in a new tab beside the first one.

Failed spectacularly for me.

Doesn't work on the iPad...

No Flash FTW!

It worked for me on the iPad...

Sorry, it doesn't work. I saw different fingerprints when I accessed the link from a different machine and a different IP address.

"Your word has been saved! Now point your browser to any of the following addresses (copy-n-paste) and watch the magic."

Followed by, "It seems you didn't save the word." on the two connecting websites.

Safari 5.1.5, OS X 10.6.8, no add-ons installed. I do, however, have Flash configured to not allow any websites access to local storage unless I specifically say so.

I know this isn't Slashdot, but I have karma to burn and can't resist:

Step 1: Collect underpants

Step 2: ???

Step 3: Profit!!!
