The tool we built to do this research is open source: https://github.com/citp/OpenWPM/ We'd love to work with outside developers to improve it and do new things with it. We've also released the raw data from our study.
This is all assuming people don't run any third party plugins like Flash.
Are browser vendors on track to figure out a solution to this problem that combines user friendliness with privacy? Or will anonymous browsing remain a privilege for those with the right amount of technical know-how?
But there is one powerful step browsers can take: put stronger privacy protections into private browsing mode, even at the expense of some functionality. Firefox has taken steps in this direction https://blog.mozilla.org/blog/2015/11/03/firefox-now-offers-...
Traditionally all browsers viewed private browsing mode as protecting against local adversaries and not trackers / network adversaries, and in my opinion this was a mistake.
Don't you think this sort of thing warrants a separate sort of browsing mode? A lot of people who use the likes of incognito mode just use it for e.g. browsing porn where they don't want the local history to be preserved.
Obviously. Google is in the business of destroying your privacy: Advertising revenue is maximized when the consumer is/remains completely tracked and profiled at all times.
Other browser vendors which are not in the ad business could use this as an opportunity to differentiate themselves from Google:
Introduce a 3rd browsing mode which kills fingerprinting (with the "cost" of reduced user friendliness).
Technically speaking yes.
From a marketing/communication standpoint, I would separate this "feature" clearly from the two known browsing experiences. That not only communicates clearly to the user that a different browsing experience is about to start; by selling it as "the third browsing mode", it also adds perceived value to the product.
In fact it is in Google's best interest to remove these security holes so other advertisers lose whatever minor advantage they can get.
Think about it this way, would those using incognito mode for porn be OK with their normal browsing being peppered with ads claiming to "Improve your <fetish> with our range of <sexual implements>"?
Whilst I think incognito mode's warnings about not hiding data from network operators should remain (i.e. your boss can find out what sites you were visiting at work), that doesn't mean efforts to prevent such tracking shouldn't be made.
I disagree. If there were billions to be made from this new tech, secure browsing, then the browser vendors would be moving rapidly toward it. Certainly, more difficult technical challenges are overcome regularly.
Surveillance was implemented without users' knowledge and without public debate, presented as a fait accompli, and now the latest tactic is to say there's nothing that can be done about it. People accept that because they feel helpless, but I don't think we should be perpetuating this rhetoric of inevitability. There's no technical reason it can't be done.
For WebRTC, browsers could block local addresses. uBlock Origin can do this on Firefox already.
For battery: browsers could treat it like location and ask for permission. Why does the average site need to know my battery status?
For fonts: browsers could standardize a list of system fonts available on each platform. It's 2016 already: web fonts are here, are widely supported, and no legitimate website should be relying on some oddball manually installed font.
This problem is hard to solve, but the Tor browser has it mostly solved. Other browsers could learn from it.
It would probably make sense to completely disable support for local fonts unless permitted by the user (for legacy websites that depend on it). All modern browsers support @font-face, and without @font-face you can always depend on the special keywords serif, sans-serif, and monospace; these will load the system's default font for that category.
It would be a shame to have to keep re-downloading that every time.
The problem is that the behaviour you are describing is also one of the ways a fingerprinter gets its data on your fonts: by specifying an @font-face declaration that first tries for a local font, and only loads a remote font if that is not found. Do this for a short list of popular but distinct fonts (such as Roboto), and you have a nice number of bits of identifying data to add to the stack.
Also, tricks like these exist (using rendering metrics to detect fonts):
On some mobile platforms the browser cache can be evicted entirely by a few heavy pages!
Plus, even if we assumed "cached forever" actually worked for a significant amount of time, it still doesn't solve the problem that I am hoping to solve. I know many websites use the Roboto font. By installing it I no longer need to ever download that font again. It doesn't matter if it's the first time I'm seeing the site, if they use a CDN, if they link to the bold/light/regular version or their own packed font, etc...
I understand that it's a privacy issue, but I'm hoping there is a way to solve that privacy issue without removing that feature.
Really, in the end, all input accepted from the remote side (including text/html) needs to be vetted and processed by security conscious routines. I don't personally have a reason to assume a font library is more likely to be exploitable than an HTML+CSS parser and layout engine. Based on complexity, I would actually assume the opposite, which is probably right, except we've already found and fixed a lot of the exploits for the HTML parser and layout engine.
A set of standard page templates could do it. Clients could then choose their preferred client-side CSS to apply. Article, index page, image gallery, catalog entry, search result, etc.
Seems a finite set should cover most needs. Which ought to be available from a few fairly standard sources (CMS, blogging engines, frameworks).
Not to mention I don't think it's a solution that's viable given economic principle and how much people value expression. We'd just be back to the equivalent of Flash sites again, with whatever takes over for flash (canvas?).
GitHub and Gmail are prime examples of sites that broke a lot of things in the process.
Maybe what we need instead are real APIs and custom clients.
If we had locked down CSS five years ago, what CSS features would we not be capable of using today? If we lock it down today, what would we be missing out on that would come five years from now?
Design by committee is horribly inefficient, and rarely takes into consideration the full needs of the users. What's more, it can't take into consideration future needs. Design by committee gets us XML. Adoption by iteration and evolution gets us JSON. XML has its place, but JSON is overwhelmingly more popular in certain contexts for a reason, it fits the domain better.
That said, embedded interfaces (where choice is restricted, the interface needs to live a long time, and sane accessibility features are needed) would do well with better standards. I view that as a separate problem.
Github and Gmail are both tools which now face the dilemma of gratuitous changes -- many of the recent innovations haven't done much for usability, for numerous reasons (familiarity itself is a key factor, GUI offers limited capacity for improved functionality, jwz has commented on this from his Mozilla experiences).
But most changes to default styles are pants.
Hell, much of the problem is that default styles are pants. If browsers had a set of presentation styles that did work well (see the "readability" modes offered by Safari, Firefox, Readability, Pocket, Instapaper, etc.), then we'd have slightly less of a problem.
Github, Gmail, Google Maps, etc., are largely the exception to long-form informational content pages. I'm OK with an explicit "app mode" for such sites. But 99.999999% of what I read would do vastly better with uniform presentation.
More attention to content and semantic construction. Less to layout frippery.
Something tells me you'll not be convinced.
If your stance is "provide well established default templates, but don't enforce their use", then I have no disagreement. That's not how I interpreted "I've considered what might be necessary to dispose of server-side CSS."
> Github, Gmail, Google Maps, etc., are largely the exception to long-form informational content pages. I'm OK with an explicit "app mode" for such sites. But 99.999999% of what I read would do vastly better with uniform presentation.
I think that depends heavily on what you use the web for. You and I likely read a lot on the web. Some people might stick largely to Facebook and Gmail. There are people that spend a lot of time in Github, and others that spend very little. Some people use a lot of online organizational and collaboration tools, others none.
> More attention to content and semantic construction. Less to layout frippery.
What you call layout frippery, someone else desires. This sounds suspiciously like remaking the web for your use cases, not for general use cases (which are always changing). But I'm not sure there's even a problem to address, you already addressed through referencing "readability modes" as an example of presentation styles that do work well. Why isn't that your solution to this perceived problem?
It feels like you're trying to achieve the equivalent of forcing all the printers to agree to not print magazines that don't conform to someone's opinion of what a good magazine is. I'm just not sure why that's even desirable.
> Something tells me you'll not be convinced.
No, not yet, if I understand your position correctly.
Among the problems of present Web design is that the Web is an error condition (there's a wonderful essay exploring this), and browsers default to allowing broken behavior, even adapting themselves to it, explicitly.
The lack of a publishing gateway to the Web, even a minimal one which enforces markup correctness, is a problem.
Layout frippery, as it pertains to textual content, has a rather well-supported basis for criticism. Complexity is the enemy of reliability, and more complex layouts offer far more ways for sites to break. That's a well-established fact that successive generations eventually learn (or fail to learn) at their peril.
(The phrase "Complexity is the enemy" itself dates to the 1950s. I'd have to check the year, but have remarked on it before. Source is The Economist newspaper.)
I've seen what happens when documents and other media are aimed at very specific readers. Eventually, they rot.
Bog-standard HTML (or some alternative markup -- I'm increasingly partial to LaTeX) tends, strongly, to avoid this.
You're also going back to ignoring points raised earlier in this conversation about security, privacy, and usability.
And yes, if there's a call for an app-based runtime environment, which Google seem quite bent on producing, well, that's a thing. But no need to fuck up the game for the rest of us.
And models which prove useful could and should be incorporated.
I'm pretty gobsmacked, for example, that 25 years after its introduction there's no affordance in HTML for notes (e.g., footnotes, endnotes, sidenotes; presentation is a client issue), or for hierarchical presentation, e.g., of comment threads.
One can create nested hierarchies, but one with integrated expand/collapse/sort/filter functionality doesn't exist. This was extant in Usenet newsreaders and mail clients 20 years ago. Why not the Web?
> You're also going back to ignoring points raised earlier in this conversation about security, privacy, and usability.
I was just working off your points, which all seemed to be about usability. I've been treating this discussion as somewhat distinct from that one. I can definitely make arguments about conformity having its own negative aspects with regard to security.
We can begin by actually reviving browser user style sheets; having a well-known and respected set of names would allow for appropriate styling on the client.
GitHub has been, like Twitter, grabbing key bindings that previously belonged to the web browser, like Ctrl-K, and their comment edit box got limited in its resizability, forcing me to edit outside the website and paste into it often enough that it's an annoyance.
I'm not really sure what you are referring to here. Gmail does attempt to give you an editor for emails, but in my experience it's extremely simple to get it to do what you want most of the time. Perhaps your complaint is that you want it to just send a plain-text email, and not a multipart message with a plain-text version and an HTML version, in which case I have to question why, as all it does is add choice and allow people to view it in the format they prefer, and it should look the same either way.
> grabbing more key bindings that existed before in a web browser like Ctrl-K and their comment edit box got limited in its resizability
Re: key binding, yeah, I can see that as somewhat annoying. I suspect they are trying to match some standard usability map and thinking of their site as an application, but it's annoying that it interferes with the browser (but only when within a text input, from what I can tell).
To some degree, I have to agree with what's probably Github's stance, which is that it's their site, and while it may seem annoying in some respects, they may have specific reasons they do things. They obviously aren't going to be able to make every change something everyone likes, but I don't necessarily think they are making change for change's sake. It's likely in response to pressure from gitlab and competitors. Presumably they are audience testing. The best way you can speak to this is to not use them when possible, or urge others to not use them.
The big issue is that they start hijacking keys that were free before. It's hard to impossible to sway developers to use anything but Github. I've tried and been treated as if I'm in the luddite camp.
These all just work for me. I'm not sure what the specific complaints are, maybe it's a Firefox thing, but it's not like there's a lot in chrome that FF doesn't support.
> On top of that, you cannot resize it.
A little convoluted, but there is a way. In the subject of the thread, to the right, along with the collapse-all control, there's the option to open the thread in a new window. This window can be resized, and the input is the size of the window. Also, I suspect Gmail is meant to be viewed as more of an app than a site, so if the window size matters not just for composing but for general use, it might be worth using it as a freestanding browser window, distinct from and sized differently than other tabs, if you aren't already. I might actually play around with doing that now that I've said it.
> Re Github: The big issue is that they start hijacking keys that were free before. It's hard to impossible to sway developers to use anything but Github. I've tried and been treated as if I'm in the luddite camp.
Yeah, that's unfortunate, and I would have hoped Github would do better. I don't really think it's the norm though.
There's a risk/frequency trade-off with all of these. Privacy failures can be quite costly, possibly even fatal, though somewhat more rare. Not so rare, though: 20% of all Web users in a US Department of Commerce survey (see my recent comment history) report known credit card fraud. That's many tens of millions of affected users.
The security risks are similar but also extend to organisations which might stand to lose control over their own (validly) private information, or control over systems (see for example concerns over SCADA infrastructure, or industrial process control).
Usability and adaptability issues pose lower risks, but have a much larger affected field.
It goes well beyond the visually disabled, illiterate, and cognitively challenged. Anyone who's landed on a desktop site that's unusable on mobile has encountered a usability challenge. Google, Apple, Facebook, and Amazon are all rapidly pushing us, some kicking and screaming (I include myself) to an audible Web -- one in which the primary control and response interfaces are spoken.
What landing on a small set of templates does is provide for clearly parseable and understandable content. In a world where the goal isn't to read a full page but to extract and convey a useful item of information from it, wading verbally through megabytes of unparsable and nonexcludable content isn't particularly useful (and yes, figuring out how much a data reference is worth to the data-reference intermediary is another question worth considering).
More generally, in my case, with only modest perceptual impairments (reading small, low-contrast type is among the earlier signs of your impending death), I've concluded for some years now that Web design isn't the solution, Web design is the problem. There are only so many ways you can present content that doesn't fuck with readability. I try, very hard, to ensure I'm not doing this on my own modest designs (look up "Edward Morbius's Motherfucking Web Page", a riff on a popular refrain, for my own principles in action).
My most common response when landing on a website is to sigh, roll my eyes, and dump it to something more readable. Firefox's Reader Mode. Pocket. Straight ASCII text. w3m.
And no, "novel graphic design" isn't conveying vast new amounts of information. I grew tired of hearing that argument 30 years ago; it's not got fresher since. Bloomberg, The New York Times, and the BBC are all experimenting with high-concept article formatting. In my experience, without exception, it simply Gets In The Fucking Way.
My half-serious response to this is to create a new web browser embodying these and a few other principles. The working title is "the fuck your web design browser". FYWD for short.
Ninnies may opt to call it the Fine Young Western Dinosaurs browser as an alternative.
> My half-serious response to this is to create a new web browser embodying these and a few other principles.
In all seriousness, I wonder if spoofing a mobile client (easily done through most browsers' developer consoles or an extension) might immediately result in a more useful experience for you on the majority of sites. Given the viewing constraints of most mobile platforms, and the focus on mobile accessibility (it's supposed to account for over 50% of traffic now), I imagine many sites try to put some minimum level of effort in to at least make it usable.
Even sites which are otherwise well-designed (Aeon and Medium come to mind) insist on dark-pattern behavior such as fixed headers/footers. Again: straight to reader-mode for that.
Except for the sites which break that. Violet Blue's Peerlyst comes to mind:
(Screenshots contrasting site and a Reader Mode session included.)
I've written directly with the site designer who seems utterly insensate to why 14pt font isn't in fact a majickal solution to all readability problems.
HN itself is only barely usable.
Really? That seems unlikely. I mostly see people use phones, and small tablets, so <= 7".
> Except for the sites which break that. Violet Blue's Peerlyst comes to mind
There will always be someone thwarting best practices, just as there will always be those that skirt or break the rules in systems that are less lenient. There's not a lot of recourse, you want what they've got, so you are at their whim unless you can work around their imposed difficulties or find another source.
> I've written directly with the site designer who seems utterly insensate to why 14pt font isn't in fact a majickal solution to all readability problems.
See above :/
> HN itself is only barely usable.
Yeah, but I think the reasoning behind HN is slightly different. I suspect HN assumes you will take some appropriate steps to optimize your use of the platform. Instead of "we will tailor the view to our artistic vision and you shall not besmirch it!" it's more of a "we believe in user agency, so get off your ass and make it better for yourself." Depending on your point of view, skill level, and site usage, you might find one more appealing than the other.
Personally, I use one of the browser extensions that allows collapsible comments, inline replying, and user info on hover over username.
> Really? That seems unlikely. I mostly see people use phones, and small tables, so <= 7".
Ignore that, I misread the sentence. I thought you were saying most mobile browsing is with a 10" tablet. I'm not trying to tell you that you're wrong about your own reported habits...
The complexity and size of a modern web browser, and the need for better engineering tools to combat this, are often touted as part of the reason the Rust project was started.
I agree that the most complex parts are HTML+CSS+JS+DOM+GFX, but some parts cannot be reasonably disabled without breaking it completely.
I would go further and suggest that really no site needs to know it (I am sure there could be a few reasonable uses, but still). Which makes me wonder if we could strike back by abusing the WebRTC spec and fuzzing values like these, instead of simply blocking them.
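To make the fuzzing idea concrete, here's a minimal sketch (written in Python just to illustrate the logic an extension might implement; the function name and address ranges are made up): instead of blocking the local-address probe outright, return a plausible but randomized RFC1918 address each session, so the value carries no stable identifying bits.

```python
import random

def fake_local_ip(rng: random.Random) -> str:
    """Return a plausible private (RFC1918) address to feed a WebRTC
    local-IP probe. Re-randomized each browsing session, so trackers
    see a normal-looking home network but no stable identifying bits."""
    return "192.168.{}.{}".format(rng.randint(0, 1), rng.randint(2, 254))

session = random.Random()         # fresh, unseeded per browsing session
spoofed = fake_local_ip(session)  # a different 192.168.x.y each session
```

Blocking outright is itself a detectable (and therefore mildly identifying) signal, which is one argument for fuzzing over simply blocking.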
That would defeat a huge selling point of WebRTC, the ability to create in-browser p2p connections over the user's local network.
You'd want to use it any time you want a high-speed network connection with another user. For example, a multiplayer game or video teleconference.
Others have pointed out this behavior has changed in Chrome 48. You don't get the local IP unless the page asks for access to the mic/camera which the user has to give permission for.
These two things, of course, go hand-in-hand, but us techies tend to look, I think, for the technical solution because that's the place where it's easiest to see how we could have any sort of impact. The other stuff is a lot of talking to and listening to people, consensus-building, being persuasive, etc.
I have tried to convince the Dutch banks I use (ING and ABN AMRO, i.e., big banks) to stop employing tracking beacons and third party tracking services on their secured internet banking environments, but the responses I get range from 'yeah we need those to improve your customer experience' to 'you are welcome to block these trackers yourself' (I already do, thank you very much).
Other stuff like GPS, camera, and microphone already require permission before being used.
* If I have a fleet of Chromebooks running the same version of Chrome OS, will they all have the same fingerprint?
* Will, say, all iPhones 6 with the same hardware parts, running the same Mobile Safari and iOS version, have the same fingerprint?
The working group recommendation that we linked in the paper (https://datatracker.ietf.org/doc/draft-ietf-rtcweb-ip-handli...) addresses some of the concerns that arise from that (namely the concern that a user behind a VPN or proxy will have their real, public address exposed), but still recommends that a single private IP address be returned by default and without user permission.
However that's still quite identifying for some network configurations, e.g. a network which assigns non-RFC1918 IPs to users behind a NAT. Seems to me that putting access to the local IP address behind a permission would both remove the tracking risk and still allow the performance gains after the user grants permission.
I had to dig around; from the paper it sounds like a stateless form of tracking.
The audio example made sense:
1. the mic comes on, and it identifies a particular background noise.
2. I browse to another site, or a different page without a cookie.
3. The mic comes on again, matches the ambient noise and realizes I am the same person.
Is that what you mean? If this is the case, how can the "canvas fingerprinting" work since I had to browse to a new page and all the old pixels from the previous page are no longer there.
Anyway, if it is what I understand it to be, then it sounds very interesting. I bet some science fiction author wishes they had thought to use it.
> "This page tests browser-fingerprinting using the AudioContext and Canvas API.
> Using the AudioContext API to fingerprint does not collect sound played or
> recorded by your machine - an AudioContext fingerprint is a property of your
> machine's audio stack itself. If you choose to see your fingerprint, we will
> collect the fingerprint along with a randomly assigned identifier, your IP
> Address, and your User-Agent and store it in a private database so that we can
> analyze the effectiveness of the technique. We will not release the raw data
> publicly. A cookie will be set in your browser to help in our analysis. We
> also test a form of fingerprinting using Flash if you have Flash enabled."
What is the typical use case for AudioContext?
The capabilities of AudioContext used in audio fingerprinting seem like they're beyond what is really necessary?
Check out https://panopticlick.eff.org - this will attempt to fingerprint your browser and see if it's unique.
As a developer, you can take advantage of the spec only if you're building a native app. There are frameworks that you can use if you do. But within Safari or Chrome you have zero WebRTC support.
It's supported in modern versions of chrome on Android but won't be supported on iOS until apple does something about it.
Chrome is just a skin on Safari for iOS, because Apple doesn't allow third-party browser engines, right? I would think FF (or any other browser) wouldn't be able to support it on iOS either, given that constraint.
The linked page answers this: "Differences in font rendering, smoothing, anti-aliasing, as well as other device features cause devices to draw the image differently."
Put differently, the function measureText(canvas full of text with various fonts and bizarre features with varying implementation) is a pretty good hashing function for a population of web users, because each of these web users have a pretty-unique [canvas rendering engine, underlying OS, installed fonts] combination.
Combine several of these techniques (webrtc, audio, list of plugins installed and their version, etc), and you go from a "pretty unique" to a "guaranteed unique" hash, which you can follow across the web.
To do that, you first try to identify APIs that have different results depending on the browser or the device, and then track their results. For example, the user agent has some identifying information. It's not unique for each person, but you can start having a bit of identifying information. Do that with multiple APIs (available fonts, installed plugins, ...), and you start having enough identifying information to uniquely identify some browsers, without having an actual ID provided by the browser.
To test your browser, you can visit https://panopticlick.eff.org/
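To make the combination step concrete: each attribute on its own is shared by many users, but hashed together they form one identifier. A minimal sketch in Python (the attribute names and values are purely illustrative; a real tracker does this in JavaScript by reading the navigator/screen/plugin APIs):

```python
import hashlib

def fingerprint(attributes: dict) -> str:
    """Combine several weakly identifying attributes into one ID.
    Keys are sorted so the result doesn't depend on collection order."""
    canonical = "|".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

fp = fingerprint({
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Firefox/45.0",
    "fonts": "Arial,DejaVu Sans,Roboto",
    "plugins": "Shockwave Flash 21.0",
    "timezone": "UTC+1",
    "screen": "1920x1080x24",
})
```

No browser-provided ID is involved: change any one attribute and the hash changes completely, but for a stable configuration it comes out the same on every site that computes it.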
"Your browser fingerprint appears to be unique among the 135,054 tested so far."
Shouldn't it tell me that my browser is not unique on my 10th attempt, considering it has recorded my previous attempts? This warning actually never changes, regardless of the duration between consecutive attempts. That can only mean that Panopticlick is flawed, or my browser signature is in constant flux (which would essentially make it useless from a tracking perspective).
Turns out they put a bunch of tracking cookies on your machine without asking you (it is mentioned in the about page though), which seems rather naughty for an organisation promoting online privacy.
When I removed all 4 of them, I get down to being "almost unique". I'm currently down to having the same fingerprint as 1 in 45132.3333333 browsers.
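For what it's worth, Panopticlick's "one in N browsers" figure converts directly into bits of identifying information: bits = log2(N). A quick sanity check using the numbers in this thread:

```python
import math

def identifying_bits(one_in_n: float) -> float:
    """Surprisal of a fingerprint shared by 1 in N browsers."""
    return math.log2(one_in_n)

# Roughly 33 bits suffice to single out one person among ~8.6 billion.
print(identifying_bits(135054))    # unique among 135,054 tested: ~17 bits
print(identifying_bits(45132.33))  # 1 in ~45k after cookie removal: ~15.5 bits
```

So removing the cookies shaved off only about a bit and a half; the rest of the fingerprint is doing the heavy lifting.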
Edit: The fingerprint test at https://panopticlick.eff.org/ shows my System Fonts
src: local("Roboto"), url("https://example.com/user-does-not-have-roboto") format("woff2");
Edit: Disable Flash or make it click-to-activate and https://panopticlick.eff.org/ shouldn't list your fonts anymore.
But now I see that it is just checking which fonts are available.
Thanks for the explanation. It's just hard to believe devices are so different. I would think most versions of iOS would have roughly the same set of fonts etc.
I don't know the research definition, but fingerprinting is a technique to uniquely track a user across multiple sites without a tracking beacon.
The most basic form of fingerprinting is to use the browser-supplied headers (user agent, version, OS). Canvas fingerprinting works because identical browser versions across different machines may render slightly differently, but consistently. IIUC, canvas fingerprinting doesn't rely on any pixels shown to the user or anything unique to the site, but if the same canvas is rendered exactly the same on two different sites, that's another indication that both visits were from the same user.
I don't think the AudioContext fingerprinting uses the actual microphone: it uses the browser's (and possibly OS's) audio engine to generate an audio stream, then fingerprints the resulting data stream.
I do want to state for the record that instinctiveads.com was testing augur.io and that's why we're listed there. We don't use them anymore but unfortunate timing, especially considering we're trying to be a better ad network than the rest.
Also I'd like to point out that one of the most pervasive tracking methods is done through form submissions. Anywhere you submit an email (login, purchase, etc) can be used as identification and first-party cookie matching.
A reliable ID allows for storing your ad history and interests to show you better ads and less of the same. This is proven since it's all math and data science and we can see the increase in metrics with better targeting. By the way, clicks are not the most important metric either, there's much more that goes into an ad campaign. Ironically, reliable IDs also allow for storing any opt-out settings since it's just a value attached to that ID.
The email login I mentioned above is the most common way to track online, most of the big sites actually sell login data and fire tracking tags when you're logged in with the email address passed through (usually hashed but not always) so that providers can set their own cookies and recognize you again. Since emails are strongly unique, this is really effective.
This tech is also used to combat ad fraud (which is what we were using it for). Fraud is a massive problem since it's so easy to start up botnets and churn through millions of ad impressions quickly.
Unfortunately a lot of this new age of tracking is the result of politics, bad incentives, and a lack of regulation that's led to a wild west situation where these companies can do anything. Clearly the technical talent is capable (as seen in this research) but it's being put to the wrong use. The DNT (do not track) header was a compromise but lacked any real regulation to make it effective. 3rd party cookies were fine but unfairly demonized and the default blocking of them pushed the industry to these deeper tactics.
Ultimately this is a business process issue: if there was a standardized ID like IDFA but for browsers (or even better at the OS level) and privacy regulation that's actually enforced, that would be a good compromise. Sites and ad networks get a reliable ID and you get control over when and how that ID is refreshed.
EDIT - All this stuff used by independent ad companies is just a tiny fraction of the industry. This barely covers ISPs who have very refined tracking abilities that you really cant avoid since they control the traffic. Comcast/Verizon has the AOL ad network using this. And the 2 biggest ad companies are Google and Facebook, both of which don't need fingerprinting because they already know who you are from just being logged in.
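The hashed-email matching described above is easy to illustrate. Hashing hides the literal address, but since every party that sees the same normalized address derives the same token, it still links visits across sites (a sketch, not any particular network's actual scheme):

```python
import hashlib

def email_id(address: str) -> str:
    """Cross-site identifier derived from a login email. Normalizing
    first (trim, lowercase) means the same inbox yields the same token
    everywhere it is seen, so the hash links visits together without
    exposing the address itself."""
    normalized = address.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# The same person logging in on two unrelated sites produces one ID:
assert email_id("Alice@example.com") == email_id(" alice@example.com ")
```

This is why "usually hashed" offers little protection: the token is as strongly unique as the email address it was derived from.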
To detect ad fraud, would the ID need to be the same on all sites? Instead of sites dropping cookies on clients, what if browsers generated their own random per-site IDs? Users and browsers would have more control over managing and clearing cookies and user IDs.
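The per-site ID idea above could be implemented deterministically from one local secret, so each site sees a stable ID that can't be correlated across sites, and clearing the secret rotates every ID at once. A sketch (the names and key size here are made up for illustration):

```python
import hashlib
import hmac
import secrets

# One random secret per browser profile.
DEVICE_SECRET = secrets.token_bytes(32)

def per_site_id(site: str, secret: bytes = DEVICE_SECRET) -> str:
    """Stable for repeat visits to one site; unlinkable between sites
    unless they share the device secret (which never leaves the browser)."""
    return hmac.new(secret, site.encode("utf-8"), hashlib.sha256).hexdigest()

assert per_site_id("news.example") == per_site_id("news.example")  # stable
assert per_site_id("news.example") != per_site_id("shop.example")  # unlinkable
```

The HMAC construction means a site learns nothing about the secret or about its ID on any other site, which is exactly the property that separates this from a third-party cookie.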
If it's unique to every site then it's nothing new, networks can already set IDs today with 1st party cookies. It's being able to have a internet-wide ID that's valuable and is what 3rd party cookies allow(ed).
The ID itself doesn't matter, it's just random characters and mapped in various ways by networks. It's the reliability and consistency on a device level that's needed. Having something like this would make a massive difference - all the cookies/tracking junk would be obsolete, along with the hundreds of pixel sync tags, and would make everything faster, more accurate, more private and more secure.
But the point of fingerprinting is that practically no two "browsers" are the same:
- browser software and exact version
- installed plugins
- size of browser window
- OS software and exact version (think of patches!)
- time zone
- screen resolution
- (and all the stuff mentioned in the submitted article!)
If you find some attributes change too often to be relied upon, you can either take that into account or simply not use that specific fingerprinting technique.
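The combining step is simple: serialize the collected attributes deterministically and hash them. A minimal sketch (hypothetical hard-coded values; a real fingerprinting script would collect these via JavaScript APIs):

```python
import hashlib
import json

def fingerprint(attributes: dict) -> str:
    """Combine browser/OS attributes into a single fingerprint hash."""
    # Serialize deterministically so the same attributes always hash alike.
    canonical = json.dumps(attributes, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

# Example visitor; each field corresponds to an item in the list above.
visitor = {
    "user_agent": "Mozilla/5.0 ... Firefox/47.0",
    "plugins": ["Shockwave Flash"],
    "window_size": "1280x720",
    "os": "Linux 4.4.0",
    "timezone": "UTC+2",
    "screen": "1920x1080",
}
print(fingerprint(visitor))
```

Change any single attribute (say, the timezone) and the hash changes completely, which is why trackers that want stability either drop volatile attributes or use fuzzy matching instead of an exact hash.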
> They wouldn't.
The `AudioContext` API exposes several details about the host which may depend on the hardware (sound card, sound chip), software stack (OS, on Linux e.g. PulseAudio vs. ALSA), the sound driver and its version, and connected peripherals (speakers? headphones?).
Additionally, the audio API is used to generate a sound (which is muted before being played, but is still generated). Sound is hard, so browser vendors don't necessarily generate the "sound bits" themselves but ask the OS to do so. Which might in turn ask its sound system to do so. Which might ask its sound driver...
Some of these properties are fairly common or likely to change often. But chances are that, combined, they give you more bits of information than, say, the user agent string alone (which is shared by thousands, if not more, of other browsers).
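"Bits of information" here is just self-information: an attribute value seen in a fraction p of browsers contributes -log2(p) bits, and (roughly, assuming independence) the bits of separate attributes add up. A small sketch with hypothetical frequencies:

```python
import math

def bits_of_information(share: float) -> float:
    """Self-information of an attribute value seen in `share` of browsers."""
    return -math.log2(share)

# Hypothetical frequencies, for illustration only.
ua_share = 1 / 5000        # a common user agent, shared by thousands
audio_share = 1 / 200_000  # a rarer combined audio-stack fingerprint
print(bits_of_information(ua_share))     # ~12.3 bits
print(bits_of_information(audio_share))  # ~17.6 bits
```

This is the same arithmetic Panopticlick uses to score how identifying each attribute is.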
"When using the headless configuration, we are able to run up to 10 stateful browser instances on an Amazon EC2 “c4.2xlarge” virtual machine."
Also it seems like you ran the crawl only in the month of January this year, and crawled about 90 million pages. Were you able to do that on the single AWS instance, using Firefox via Selenium? What do you think the performance would have been just issuing raw requests?
Just interested because I'm currently building a crawler and am trying to decide if Selenium would be worth it performance wise.
Changing common settings might in fact even make you stand out _more_.
Check the EFF's Panopticlick to see how your specific configuration leaks identifying information.
IOW, most fingerprinting techniques fail on mobile devices, and sensors, e.g. batteries, are among the few remaining avenues for fingerprinting on iOS. Do you disagree with Heise? Could you please substantiate your statements regarding iOS fingerprinting?
See here for a paper on the subject: https://crypto.stanford.edu/gyrophone/sensor_id.pdf
Put these in your user prefs.js file on Firefox:
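(The original list didn't survive. As an illustration, commonly recommended privacy prefs of this kind look like the following; these are real Firefox pref names, but the selection is an example, not the poster's list.)

```
// Commonly recommended privacy prefs (examples, not the poster's original list):
user_pref("media.peerconnection.enabled", false);  // disable WebRTC (local IP leak)
user_pref("dom.battery.enabled", false);           // disable Battery Status API
user_pref("webgl.disabled", true);                 // disable WebGL
user_pref("geo.enabled", false);                   // disable geolocation prompts
user_pref("privacy.trackingprotection.enabled", true);
```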
Here's my full Firefox config currently:
Privacy on the web keeps getting harder and harder. Of course this should only be used in conjunction with maxed out ad blockers, anti-anti-adblockers, privacy badger and disconnect.
We need browsers to start asking permission. When you install an app on Android or iOS it says "here's what it's going to use, do you want this?". The mere presence of the popup would annoy people and prevent them from using these APIs.
Google only pitches the idea of multiple identities in the context of sharing devices among several people: https://support.google.com/chrome/answer/2364824?hl=en
and even then doesn't do much to surface the idea. https://www.google.com/search?hl=en&as_q=multiple+identities...
Zooming is such a basic thing... I don't understand why they implement it in such a crappy way. Certainly doesn't attract users.
In Safari, this is what it does: https://up1.ca/#Lu0r_cI_v0vXvzpa9nUmEg
So in Safari it lets me zoom all the way in and/or out with 1 smooth movement.
This is what the same movement does in Firefox: https://up1.ca/#SEKWNOm1BSQnkntxj_v53w
In Firefox, if I want to zoom all the way in, I have to pinch in like 10 times (very annoying) and then, to zoom out, pinch out another 10 times...
They're competitors for data? To see how Microsoft's "sign in to the web" is playing out, one might be tempted to Bing with IE, but statistically the odds favor another browser and search service combination.
Note that you can use your webserver logs for analytics and that doesn't require the cookie banner.
In the EU, tracking user IPs actually requires consent. Even logging them does.
If the cookies are used for tracking, like Google Analytics, then yes, it needs to ask the user for consent.
And that's not a warning, but an actual "yes/no", and in the "no" case, it may not set a tracking cookie, nor may it have set one already.
Most sites (except for a few dozen German and Dutch ones) just redirect you somewhere else, though, if you refuse to be tracked.
The law requires user consent, in the form of a click on a banner or scrolling the page, before setting any cookie.
Complete law: http://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX...
Paragraph 66 talks about cookies.
A later exception was made by the EU for session cookies.
Guidelines for webmasters:
It has a sample banner which is similar to those that most sites display.
Spanish official directives (with further protection because of a local law called LSSI): https://www.agpd.es/portalwebAGPD/canaldocumentacion/publica..., page 17. Also comes with a sample banner
Did you really think that everyone else was wrong or didn't read the law and is programming these banners as some sort of fad?
However, OP is right, governments spy on our webcams and analyze our traffic, and that's ok, but we need a stupid banner that overrides browser preferences to avoid all but session cookies. Duh.
This is the official stance of the ICO, the UK national authority: there was a need to educate users about what cookies were when the directive was passed. No such need exists now. The ICO itself briefly used consent overlays, but does not anymore (EDIT: aaand they apparently use them again; I'll try to find the policy release where they say this is not necessary). Cookies not used for tracking of persons never needed any consent, as they have no privacy implications.
People who make their living creating cargo-cult UI designs, have predictably added cargo-cult law-compliance to their toolset. It is beyond stupid.
Wrong. If I disable cookies in my browser, I can't log in to websites anymore, so they need to be allowed. A whitelist would be very inconvenient. On top of that, it's not explicit allowance, it'd be implicit (i.e. opt-out instead of opt-in).
I don't know if British legislation is different, but this is illegal at least in the Netherlands.
It has never been enforced that way to my knowledge, anywhere in the EU. Which law or court decision says that it is actually illegal?
How does my browser know that one PHPSESSID is used for tracking, and another is a session? You probably mean "until I close the browser", which would be never -- at least, I would never want to, but I do every few months for browser updates. (My laptop always goes into suspend/sleep mode.)
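This is a fair objection: the browser can only distinguish session cookies from persistent ones by their expiry attributes (per RFC 6265, a cookie with no `Expires`/`Max-Age` dies when the browser closes), not by their purpose. A minimal sketch of that check:

```python
def is_session_cookie(set_cookie_header: str) -> bool:
    """A cookie with no Expires/Max-Age attribute lives only for the session."""
    # Everything after the first ';' is attributes (Path, Secure, Expires...).
    attrs = [p.strip().lower() for p in set_cookie_header.split(";")[1:]]
    return not any(a.startswith(("expires=", "max-age=")) for a in attrs)

print(is_session_cookie("PHPSESSID=abc123; Path=/"))          # True
print(is_session_cookie("id=xyz; Max-Age=31536000; Path=/"))  # False
```

Whether either cookie is used for tracking is invisible to this check, which is exactly the commenter's point.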
> Ditto for third-party cookies
I don't know what third-party cookies are anyway, and I bet my peers could not give me an accurate description either. We're all in the software business, be it game development or general software development or something.
Two gave a rough description but couldn't answer a question about whether embedded Like buttons would work if the user is logged into Facebook. Another just said "I don't know".
I'm not sure "the public is informed about all their options by now". The ones who really care generally use uBlock, ABP, Self-Destructing Cookies, Ghostery, etc., the rest just click "ok" because the sites do not inform them about these aforementioned possibilities: that wouldn't be in their interest.
> Duplicating UI in a website is a solution looking for a problem
Oh I agree it's an issue, I hate this cookie wall as much as anyone. I would love for there to be no need to ever see this wall.
> It has never been enforced that way to my knowledge, anywhere in the EU. Which law or court decision says that it is actually illegal?
I am not sure fines have been handed out, but the Dutch ACM ("authority for consumers and markets", literally translated) did give out warnings to non-compliant sites, and they subsequently placed cookie walls.
The law simply says no such cookies may be placed, it doesn't say "for a few months while users are unaware, and after that, oh well, have some fun picking your own privacy laws as you wish."
And yes, I know functional cookies and simple tracking is allowed if you don't invade a person's privacy. This means practically every major website knowingly tries to invade your privacy, because they have these walls in place. What do people say? "Fucking government does not understand the internet, look at all these walls." What should we be saying? "Wait why are they trying to create detailed profiles of me in the first place?"
Conversely, going after that small set of APIs and ripping them out or slapping permission prompts in front of them is unlikely to meaningfully improve your privacy when visiting adversarial websites.
A few years back, we put together a less-publicized paper that explored the fingerprintable "attack surface" of modern browsers:
Overall, the picture is incredibly nuanced, and purely technical solutions to fingerprinting probably require breaking quite a few core properties of the web.
Is anyone aware of the existence of one?
How do you prevent that, apart from working on 'fixing' browsers to create pixel-perfect renders across different browsers/platforms/configurations? Would that even be possible?
> Tor Browser notifies the user for canvas read attempts and provides the option to return blank image data to prevent fingerprinting.
Huh. I guess that's one attempt, but being able to read pixel data out of a canvas is completely reasonable.
Not for every website. Most websites don't need canvas at all. One option would be to ask users to activate canvas support for a website that does need it, so users can judge for themselves if the request is legitimate. This is how the geo-location API works after all.
I am not convinced that this will work very well though.
> apart from working on 'fixing' browsers to create pixel-perfect renders
For Firefox and Chrome there are canvas fingerprint blockers. These are heuristic-based, so you'll likely see some false positives. uBlock Origin includes an option to prevent the leakage of local IP addresses.
> A solution would probably be a browser where every version, on every platform reports the exact same things, always the same way.
That's exactly what Tor Browser does.
I'm glad I disabled WebRTC when I first discovered it could be used to expose local IP on a VPN.
These "extension" technologies should all be optional plugins. Preferably install on demand, but a simple, obvious way to disable would be acceptable. (ie more obvious than about:config)
Not a great deal can be done about font metrics other than my belief that websites shouldn't be able to ferret around my fonts to see what I have. Not like it's a critical need for any site.
Having these features as optional plugins means they are basically impossible to count on having in the basic web platform, meaning you're going to fight a losing battle to gain adoption for any applications that need them.
And the open web platform is the only platform right now that is enabling developers to create cross-platform applications outside of the restrictions of walled-garden app stores.
> Having these features as optional plugins means they are basically impossible to count on having
Funny. That didn't seem to prevent Flash, Acrobat or others from becoming extensively adopted. If I want browser video chat I can install WebRTC etc.
If the cost of having that universal platform is compromising everyone's privacy, on any site that wants to check, it's not a fair or acceptable trade.
Seems to me we have this ass backwards.
Seems to me you're just being paranoid.
See https://github.com/diafygi/webrtc-ips or https://www.purevpn.com/blog/disable-webrtc-in-chrome-and-fi...
I can also assume that your router lives at .1 or .254 or similar, and use your browser to pivot and brute force the password while you browse cat pictures.
Of course, I keep webrtc disabled in Firefox anyway except when i need it, defense in depth like you said.
But then still whether you installed an extension would contribute a bit of information to your fingerprint.
I mean, using a web app for the first time would be no different than installing a mobile app - I wouldn't be surprised if I had to give it a few permissions.
It would only work if many users have disabled exactly the same APIs as you and all other non-disabled APIs don't provide any information useful for fingerprinting.
If I could just feed it random data instead of fully disabling, that would also be fine.
It would need to remember your choice so you don't get prompted every time, but that's a small/easy change to make.
It might be a fun project to start though. I've been really enjoying testdouble's API (and have started using that for my unit tests).
Perhaps instead of a site probing for capabilities, they should instead publish a list of what the site/page can leverage and what it absolutely needs to work. Maybe meta tags in the head or something like the robots.txt. Browsers can then pull the list and present it to the end user for white-listing.
You could have a series of tags similar to noscript to decorate broken portions of sites if you wanted to advertise missing features to users and, based on what features they chose to enable/disable for the site, the browser would selectively render them.
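As a sketch of this proposal (entirely hypothetical; no such manifest standard exists), a site might publish its required and optional capabilities, and the browser would intersect the optional ones with the user's whitelist:

```python
# Hypothetical "capabilities manifest" a site might publish,
# analogous to robots.txt or meta tags in the head.
manifest = {
    "required": ["cookies"],                    # site breaks without these
    "optional": ["canvas", "webrtc", "audio"],  # site degrades gracefully
}

def features_to_enable(manifest: dict, user_whitelist: set) -> set:
    """Grant required features plus any optional ones the user allows."""
    granted = set(manifest["required"])
    granted |= set(manifest["optional"]) & user_whitelist
    return granted

print(features_to_enable(manifest, {"canvas"}))  # grants cookies plus canvas
```

The browser could then render noscript-style fallback markup for any portion of the page whose optional feature the user declined.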
I mean, how many people are dealing with the hassle of noscript? That's probably most of the users that are going to do anything other than tell the browser to stop asking questions.
Why is it unrealistic to expect the same for other interfaces like audio, video, WebRTC, and other potentially exploitable functionality?
I'm arguing it won't help a huge number of users because they will default to granting them.
Also a great paper on this topic: http://research.microsoft.com/pubs/209989/tr1.pdf
There are a lot of people trying to earn money by clicking ads with bots.
Edit: and by the way disabling JS is an effective method against most of the fingerprinting techniques.
There's still zero (0) use cases to have WebRTC data channels enabled in the background with no indicator.
If all these APIs are added, the web will turn into a bigger mess than it is. They can't prompt for permissions too much. So they'll skip that, like WebRTC does.
To interpret this research as reason for crippling web or browsers would be a giant mistake. Crippling browsers will only work against users, who will be then forced into installing apps by companies.
Two popular shopping companies in India did exactly this: they completely abandoned their websites and went native-app-only. This, combined with the large set of permissions requested by the apps, led to a worse privacy experience for consumers. As the announcement of Instant Apps at Google I/O demonstrates, the web as an open platform is in peril, and its demise will only be hastened by blindly adopting these types of recommendations.
Essentially, the web as an open platform will be destroyed in the name of perfect privacy, only to be replaced by inescapable walled gardens. Rather, consider that the web allows a motivated user to employ evasion tactics while still offering usability to those who are not interested in privacy, whereas native apps, where Apple needs a credit card on file to install anything, offer no such opportunity.
I am happy that Arvind (author of the paper) in another comment recommends a similar approach:
Personally I think there are so many of these APIs that for the browser to try to prevent the ability to fingerprint is putting the genie back in the bottle.
But there is one powerful step browsers can take: put stronger privacy protections into private browsing mode, even at the expense of some functionality. Firefox has taken steps in this direction https://blog.mozilla.org/blog/2015/11/03/firefox-now-offers-....
Traditionally all browsers viewed private browsing mode as protecting against local adversaries and not trackers / network adversaries, and in my opinion this was a mistake.
I'm surprised nobody has commented on your comment yet. I was in a meeting just this morning where my interlocutor assured me that over 70% of advertising in 10 years will be native apps since everything else is getting blocked or abandoned (and presenting it as an opportunity to do all the stuff you "can't do anymore" on browser).
Each font is probably associated with a non-trivial caching scheme and other OS resources, not to mention the use of anti-aliasing in rendering, etc. So a web page, doing something you don’t even want, is able to cause the OS to devote maybe 100x more resources to fonts than it otherwise would?
A simple solution would be to set a hard limit, such as “4 fonts maximum”, for any web site; and, to completely disallow linked domains from using more.
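Enforcing such a cap would be straightforward in principle. A naive sketch (hypothetical policy and regex; real CSS parsing is more involved):

```python
import re

FONT_LIMIT = 4  # the hard cap suggested above; the number is arbitrary

def fonts_requested(css: str) -> set:
    """Collect distinct families from @font-face rules (naive sketch)."""
    return set(re.findall(r'font-family:\s*"([^"]+)"', css))

css = '''
@font-face { font-family: "A"; src: url(a.woff); }
@font-face { font-family: "B"; src: url(b.woff); }
'''
requested = fonts_requested(css)
assert len(requested) <= FONT_LIMIT  # within the hypothetical budget
```

A browser applying this policy would simply refuse to load any family beyond the limit, falling back to a system default.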
The downside of having no JS compared to accidentally getting fingerprinted is a no-brainer for me. The modern web is pretty useless without JS for me.
E.g. a geocaching app could benefit by signaling the user if battery goes low (Ingress is a battery hog, for example!)
Shops can do the same with baskets: you find that people are identified either by one very rare feature which recurs often, or by their little graph of 4-5 items which correlates 99% to them.
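A toy illustration of that idea (hypothetical data): find the smallest item combinations that occur in exactly one customer's basket.

```python
from itertools import combinations

# Toy purchase histories, for illustration only.
baskets = {
    "user1": {"milk", "bread", "rare_tea", "eggs"},
    "user2": {"milk", "bread", "eggs", "butter"},
    "user3": {"milk", "bread", "butter", "jam"},
}

def identifying_subsets(baskets: dict, size: int) -> dict:
    """Item combinations of `size` that occur in exactly one user's basket."""
    result = {}
    for user, items in baskets.items():
        for combo in combinations(sorted(items), size):
            owners = [u for u, b in baskets.items() if set(combo) <= b]
            if owners == [user]:
                result.setdefault(user, []).append(combo)
    return result

# user1 is given away by a single rare item (rare_tea); user2 has no
# unique single item but is pinned down by the pair (butter, eggs).
print(identifying_subsets(baskets, 1))
print(identifying_subsets(baskets, 2))
```

Scale this up to thousands of items and the "4-5 item graph" the comment mentions becomes a near-certain identifier.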
Web apps are definitely getting better, I haven't used an actual email client in 10 years, but they have a long way to go before they can replace dedicated clients entirely.
And yet, just yesterday there was a great discussion on Virtual Desktop Infrastructures, where entire operating systems are accessed and operated virtually through just the browser.
The current top comment indicates that while there are some setup hoops to jump through to use a specific OS, the performance itself "works very well". Does this not qualify as a web app replacing a client entirely?
If you can pull up a video stream from a surveillance camera in your house then you no longer need a home?
When you watch Daredevil on the Netflix App on your phone do you think that the actors are inside your phone performing live action for you?
What they're discussing is a web app that allows you to interact with a remote client. That client OS still exists and the UI/UX is still being rendered by a nonweb technology, the pixels rendered are just being streamed to your web browser instead of to a monitor and your inputs are being captured and transmitted to that client OS.
Ideally I'd like to have a minimal OS and file set on my local machine (for offline and poor connectivity scenarios), that automatically syncs with my own, encrypted cloud system, such that I can (at my own discretion) update the OS from controlled sources (e.g. git). But I don't think there is enough interest from others for such a system, and I'm occupied with enough other projects that I won't be able to set up such a system.
I do use Github for some projects, but I also maintain local copies and maintain my own backups for all my projects.
If pinboard.in ever disappeared, it'd be like losing an appendage! It might not be as bad as losing an entire arm or leg, but its loss would be equivalent to at least a finger or two!
Heck, what would be really interesting would be hardware acceleration for the final version of WebAssembly. That should (?) make it competitive with regular assembly.
Oddly enough, just yesterday I started using an email client once more (emacs+gnus, for Gmail). It just felt so _nice_. And fast too!