Hacker News
Decentraleyes: A Firefox addon to prevent tracking via free CDN providers (addons.mozilla.org)
281 points by throwaway2048 on Feb 1, 2016 | 111 comments

I read over a bit of the source. It uses a hard-coded list of CDNs and files, so it does nothing unless both the CDN and the file are on this list: https://github.com/Synzvato/decentraleyes/blob/master/lib/ma...

Edit: someone asked how this works:

1) It looks up the resource in the mapping linked above, matching the CDN host and file path.

2) If found, it replaces the request with the copy bundled in the addon: https://github.com/Synzvato/decentraleyes/tree/master/data/r...

So for those files, requests are never made to the CDN.
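The lookup in step 1 can be sketched roughly like this (a simplified illustration; the mapping shape and the example entry are my own, and the real table in lib/mappings.js is much larger and more detailed):

```javascript
// Simplified sketch of the addon's lookup step. The example hostname and
// path follow the Google Hosted Libraries format; entries are illustrative.
const mappings = {
  "ajax.googleapis.com": {
    // request path -> file bundled with the addon
    "/ajax/libs/jquery/1.11.1/jquery.min.js":
      "resources/jquery/1.11.1/jquery.min.js"
  }
};

// Returns the bundled file to serve instead of the network request,
// or null to let the request go out to the CDN as usual.
function localCandidate(requestUrl) {
  const { hostname, pathname } = new URL(requestUrl);
  const hostEntry = mappings[hostname];
  return (hostEntry && hostEntry[pathname]) || null;
}
```

If `localCandidate` returns a path, the request is answered from the addon's own data directory and the CDN is never contacted; otherwise the request proceeds normally.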

If the website uses a different CDN, a library the addon doesn't recognize, or an unrecognized version, then the request is still made.

I thought CDNs were supposed to send proper caching headers, so the browser saves the cached version after the first request and never hits the CDN again.

They do, but the browser has a limited cache size. And because of the gigantic size of even the smallest websites these days, the cache is maxed out every day and your files are purged again and again. This is basically a super-cache of files you know you never want to invalidate. Also, it prevents revalidation (and OPTIONS/HEAD) requests from ever being sent.

Are browsers really this stupid? Seems like an obvious strategy to have several cache buckets, with one dedicated to smaller assets with long expiry times.

It's not stupidity. There is really no way to know which files you want to keep longer than others without risking breaking things. CDNs are actually a good, manually updated source of files that have this quality. But basing yourself on proprietary CDNs is not a move any browser in its right mind would make.

That it has been looked at at /any level/ (e.g. this plugin) is great, and it does so without waiting for a decade of W3C and browser standards back-and-forth.

You still need to check whether the file has changed, hence the 304 HTTP status code.

If the server sends an "Expires" header in the response, then the client doesn't even need to do that check. With an Expires header, the server has effectively told the client that the data won't change until at least a particular date, and so the client honours that information.

Last-Modified/If-Modified-Since is an optimisation trick which exists for the situation where the person running the website hasn't bothered to explicitly define expiry periods for content.

That depends on what type of caching headers the CDN uses. If it uses max-age and no ETag/Last-Modified, the browser won't send the If-Modified-Since request and will just use the cached resource without asking the server.
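As an illustration (the header values here are made up for the example), the two cases look like this:

```http
# Cached for a year; no revalidation request is sent until it expires:
Cache-Control: public, max-age=31536000

# Validators only: the browser periodically sends a conditional request
# (If-Modified-Since / If-None-Match) and gets a 304 back if unchanged:
Last-Modified: Mon, 01 Feb 2016 00:00:00 GMT
ETag: "abc123"
```

In the first case there is no network traffic at all until expiry; in the second, the CDN still sees a request (and your IP, Referer, etc.) on every revalidation.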

It's hard to tell if it's good or bad. It works the same way Adblock/uBlock and HTTPS Everywhere do: they block what they know about.

Wait, does that mean that I have to blindly trust the versions the author put on github?

Does that also mean I don't get the up to date javascript library when they change?

Yes, it does! And, IMO, it's dangerous! Note that the scripts are hosted on GH, which belongs to... Google!

This is brilliant, not only for privacy but for speed. Seeing this makes me wonder why I haven't built it myself yet. I've often thought that JavaScript script tags could include a hash of the desired resource, so your browser fetches it only once across a thousand page loads on a thousand websites. This is not that, but it is extra local caching, and on top of that it stops most tracking by CDNs. I guess I always thought of it as something my browser should have rather than an addon.

IPFS does this by design. Everything is content-addressed so you immediately know if you've seen the resource before. This also enables chunk-level deduplication.

Of course, the P2P nature of the project means other people can find out exactly which of those resources you're looking at...

Only your direct peers can, and they can't tell if you got the content to increase your fitness score or because you wanted the asset for yourself. Peers are incentivized to pull as many assets as they can (which prevents torrent death) in order to build reputation.

I think this is overly optimistic; as soon as you pull down a rare asset you have leaked information, since a peer that's farming would presumably work down the list of assets ranked by some measure of popularity, and would be unlikely to bother collecting obscure content.

This sort of system helps against some sorts of snooping, but certainly not nation-state adversaries.

> This is brilliant, not only for privacy but for speed.

But these resources are probably already cached by the browser anyway (using the appropriate http headers). So how can this solution add any improvements to that, once the resources have been loaded for the first time?

If your browser usage patterns include frequent "private browsing" or frequent cache clearing, this can be a noticeable speed boost.

I often use "private browsing" as a way to get another login session (e.g., logging in a test user while still having the admin user logged in).

I actually worked on a prototype for this 4 years ago.... -> https://github.com/cdnjs/browser-extension (Speed not privacy)

The obvious problem was that storing scripts locally got a bit out of control when considering having to store all versions.

The libraries can't be that big; surely it would work fairly well if you just dedicated a preset portion of disk space and deleted the least-used files when the total exceeds that size.

Actually, now that I'm using this and seeing the results, I think Chrome caches assets much like this extension does. It gives Firefox a huge perceptible speed boost. Mozilla devs need to look at including this in Firefox by default; it's huge in terms of speed and Firefox's competitiveness with Chrome.

I believe it still sends a network request to check the status of the resource. But an extension can bypass this and assume that the asset has not changed.

Indeed. Or check it every x hours in the background. Or the developer keeps up with jQuery news (and a few other big ones) and pushes updates. Or developers can push updates themselves. Many simple solutions that give a speed boost on many sites already.

Looks good. I've added it to my ever growing list of privacy extensions:

  * Privacy Badger
  * Disconnect
  * CanvasBlocker
Can anyone recommend any more?

Expire history by days - removes your history when it's X days old, prevents storing too much data about your visit habits https://addons.mozilla.org/en-US/firefox/addon/expire-histor...

Clean links - removes redirects from search engines, Facebook, Twitter, etc. to hide the fact that you clicked a link. Google doesn't know which link you clicked in the search results, so if you also block GA, it can't track you. https://addons.mozilla.org/en-US/firefox/addon/clean-links/

I personally use this triple combo:

HTTPS Everywhere https://www.eff.org/https-everywhere

uBlock Origin : https://github.com/gorhill/uBlock

uMatrix : https://github.com/gorhill/uMatrix

I have used NoScript, Adblock Plus and Ghostery before but found they were lacking in functionality, flexibility and performance. I used Privacy Badger too but, if I remember correctly, it is based on the same engine as ABP and suffers from the same performance problems.

uBlock Origin's advanced mode does pretty much the same thing as uMatrix.

Which engine? This is news to me. I thought ABP was slow because the code is crappy (that's also what I see when I look at the code).

Ditto except uMatrix

What exactly does it do?

uMatrix provides a tabular view sorted by host featuring toggleable category columns e.g. Dis/Allow iframe, script etc. It's granular client side resource whitelisting.

uBlock Origin does that too, just need to turn on Advanced Mode via options.

uMatrix offers more granularity. You can choose exactly what each third-party site can do in terms of cookies, CSS, images, plugins, javascript, XHR(!), frames and media (audio, video, pdf, etc).

After years of using uMatrix (formerly HTTP Switchboard), many sites "just work" wrt YouTube, Vimeo and similar, even without first-party JavaScript enabled.

I've considered sharing parts of my global ruleset so others can just copy-paste the sections/sites they want to whitelist without having to discover what's required themselves.

The more privacy, the more of a unique fingerprint your browser has, the less privacy.

Check out Panopticlick [1] to see your fingerprint. Someone should make a series of common configurations, and those who care should stick to just those.

[1] https://panopticlick.eff.org/

Considering the amount of variables that contribute to the browser fingerprint, you would be forced to conclude that the only way to prevent being so unique is to run a browser in a vanilla VM (although the OS is already a variable in itself).

I think this is a topic that gets discussed by (for example) the Firefox developers, but I get the feeling that this is one of the hardest problems to fix.

I would like to see a browser mode akin to the privacy mode most browsers feature that reduces the number of identifying variables (at the cost of features). So instead of telling the world that my time zone is CET and I prefer English (GB) as language, it would select a random time zone and locale (although this does inconveniently mean that sites might suddenly serve me content in Portuguese).

Come to think of it, Tor Browser probably does a couple of these things. Disabling JavaScript is surely the biggest factor, although that does make the modern web pretty much unusable.

> Considering the amount of variables that contribute to the browser fingerprint, you would be forced to conclude that the only way to prevent being so unique is to run a browser in a vanilla VM (although the OS is already a variable in itself).

It'd have to be more like a VM running the OS with the highest market share (Windows), the browser with the highest market share (Internet Explorer), with the most common language used, with the most common time zone of users of the site you're accessing (varies by site and time of day), etc.

Anything else and you could stand out in the crowd. Using Linux or OS X, for example, really makes fingerprinting easier for sites, which is quite disturbing.

Randomizing the values of certain attributes, as you've described, may help a lot if more people adopt it and make fingerprinting a futile exercise to those using it. :) If the people doing the fingerprinting see millions being successfully tracked with just a handful they're unable to track, they wouldn't even care. It's kinda like ad blocking. A few do it and it's not seen as a problem. If the majority does it, then the sites take notice. For a larger scale effect, browser makers should get into this. Mozilla, Apple, Microsoft and Google, in that order (with Opera somewhere in the middle), may be interested in thwarting browser fingerprinting.

Mozilla yes, Apple and Microsoft maybe but I'm unsure, Google I don't think so. They're the ones selling the most ads.

Maybe you need to turn it around and change the fingerprint every minute? Would that help?

A lot of factors that make up the total fingerprint have an influence on how sites react to your browser, so I would have it change per-session and per-domain to prevent weirdness.

I like your idea, but wouldn't it be easier to create a constantly changing unique fingerprint?

Your fingerprint might become more unique but at least only 2 instead of 60 sites get to inspect it.

Request filters: These add-ons filter requests to 3rd party hosts, effectively blocking everything (if set to default to deny all). Most sites, other than web applications or ecommerce, only need to connect to at most a single remote host to pull down their CSS files; the next most common requirement is Google Hosted Libraries.

* RequestPolicy: No longer developed, but still works for me

* RequestPolicy Continued

* Policeman: Haven't tried it, but AFAIK it also filters by data type (e.g., allow media requests from x.com to y.com, but not scripts)

uMatrix, if you are a control freak. I usually last two weeks before giving up. And yup, I've trained it to know the sites I regularly visit; thing is, I also surf new sites all the time.

Now I use Self Destructing Cookies, uBlock Origin and HTTPS Everywhere. That works just fine without taking the fun out of the web.

I've found NoScript a bigger part of my browsing habits lately. I wait for the moment when a site just goes haywire, maxing CPU cores, at which point I just nix its script privileges. This isn't nearly as disruptive as distrusting all websites.

uBlock Origin and uMatrix allow you to do this. Just kill third-party resources.

There's really no need to use both Privacy Badger and Disconnect as they both do pretty much the same thing. I'd ditch them both and just use uBlock Origin with the "Privacy" filter lists enabled.

I thought Disconnect was based on preexisting lists and Privacy Badger automatically worked out which sites seemed to be setting cookies and using them for tracking across sites. I'll need to look into it more, thanks.

You're correct. I'm not sure where people get this idea that Privacy Badger's supposed "lists" are included in uBlock, but I've seen it around here a lot. Personally I use both.

CsFire: Protects against Cross-Site Request Forgery (CSRF) by stripping authentication information from cross-domain requests.



CsFire is the result of academic research, available in the following publications: CsFire: Transparent client-side mitigation of malicious cross-domain requests (published at the International Symposium on Engineering Secure Software and Systems 2010) and Automatic and precise client-side protection against CSRF attacks (published at the European Symposium on Research in Computer Security 2011)

There are also about:config settings that improve privacy:

//no 3p cookies

//less referer headers
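The indented pref lines appear to have been lost in formatting. My reconstruction of the settings usually meant by those two comments (verify the pref names yourself before applying):

```js
// Reconstruction of the lost lines; these are assumptions, not the original.
// no 3p cookies (0 = allow all, 1 = block third-party, 2 = block all)
user_pref("network.cookie.cookieBehavior", 1);
// less referer headers (0 = never send, 1 = send only on clicked links, 2 = always)
user_pref("network.http.sendRefererHeader", 1);
```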
From what I have read, that can break quite a few things in Firefox. Be careful.

* Self-Destructing Cookies

* BetterPrivacy

* uBlock with the EasyPrivacy list

* HTTPS Everywhere

Self destructing cookies

This looks amazing. Are there any particular settings one should be aware of or must change to keep things smooth and less annoying?

I turned off the notifications. Didn't need to know every time a cookie blows up.

Self destructing cookies

Chrome gives you the option to delete cookies on quit and has exceptions for whitelisting, is "Self destructing cookies" any different?

Firefox lets you do that natively too. Self-Destructing Cookies deletes cookies after you close the browser tab, not the entire window. If the browser doesn't provide complete isolation between tabs, this makes it so the cookie isn't there to harvest. You can also set each site to tab/browser/never.

This is good, one could add Certificate Patrol to combat MITM attacks.

I use the Perspectives extension for this purpose.

There are DuckDuckGo privacy settings you can set. I'm not a big fan of a cloud store for your settings, and thankfully DuckDuckGo allows passing settings as parameters in the URL [0]. You can do things like require POST instead of GET, disable redirecting, force HTTPS, etc. Once you get your search and privacy settings how you like them, take the resulting URL and make an OpenSearch plugin out of it, manually or with something like the Mycroft Project [1]. Now any search uses your settings, no account/cloud settings needed, and it's easy to throw into every browser you use. Technically a plugin.

PS. If you need a favicon from any site for a plugin or otherwise, the easiest way I've found is https://www.google.com/s2/favicons?domain=duckduckgo.com. It will grab it from wherever the sysadmin decided to put it.

[0]https://duckduckgo.com/params [1]http://mycroftproject.com/submitos.html

I wish there was a way to isolate third-party cookies/HTML5 data per SiteVisited/ThirdPartySite pair instead of the current ThirdPartySite-only model. The third-party site could still track you within the site visited, but you would appear as a different user when visiting a different site. There would be no way to track you across websites.

There's no need to use Disconnect anymore. Disconnect's list is now included in Firefox's native tracking protection feature (https://support.mozilla.org/en-US/kb/tracking-protection-pbm), and is also available through uBlock Origin subscriptions.

Use Tor (or just the TBB) if you want proper privacy. All the other add-ons are just plugging holes in the sieve that is modern browser privacy.

It wouldn't hurt to get SQLite Manager and periodically check what's in the browser databases. For example, if you buy anything online you might find your credit card number in there.

I'm using:

HTTPS Everywhere

uMatrix (used to use NoScript, but I prefer this now)

Self-Destructing Cookies

Better Privacy


Try uBlock Origin instead. It's faster, lighter, and replaced both Adblock Plus and Ghostery for me.

Don't they sell your blocking data?

They claim to only sell your data if you opt in to something called "GhostRank" [1]. It's proprietary software, so there's no way to actually confirm that, though.

There's really no reason for privacy conscious individuals to use Ghostery when uBlock Origin can do the exact same thing.

[1] https://www.ghostery.com/intelligence/consumer-blog/consumer...

In addition, it asks you a series of invasive questions when you uninstall, which, I suspect, means they sell that data once you're no longer using the software.

Could someone explain what this does in a clear, step-by-step way, please?

Here, I'll explain what I think it does so you can at least correct what I'm missing:

(1) User visits web site example.com and needs to get file foo.jpg from example.com.

(2) foo.jpg is available at some content delivery network, let's say Akamai.

(3) User's browser gets foo.jpg from Akamai.

(4) Akamai now knows the user's IP address, the Referer (example.com), and the user agent info (browser version, OS version, etc.)

So what does the Decentraleyes add-on do? I think it does the following:

First, this add-on apparently cuts out the Referer when the browser asks for foo.jpg, but Akamai would still get the IP address (and the user agent info unless the user is disguising that). With the IP address you've been tracked, so does this really help?

Second, this add-on apparently gives you a local copy of foo.jpg if it exists (i.e., a copy of foo.jpg already cached on your own computer). Well, the first copy of foo.jpg had to have come from somewhere (either example.com or Akamai), so you've already been tracked.

NOTE: I'm not criticizing the add-on at all! I'm just trying to understand it.

It's simpler than you think.

The extension ships with copies of common files (jQuery etc.) and a list of CDN URLs that serve those files. Every time the browser makes a request to one of those URLs, the extension serves the local file instead.

How does that help? It speeds up browsing, since you have a local version of the requested file. It also increases privacy. For example, many sites are lazy and use, say, Google's CDN for jQuery; when you visit such a site, Google can still track you, because you make a request to them.

The only weakness of this approach is that it only works for URLs known to the author. A request to an unknown CDN, or even to a known CDN but for a new file, will still be made. (AFAIR there is an option to block unknown files on known CDNs, but that would often break many websites.)

I thought one of the benefits of using something like Google's CDN to serve jQuery was meant to be that a person's browser is much more likely to already have it cached than anything from mylittlewebsite.com?

That's true, but you don't really have to let Google know which sites you visit in order to pull the jQuery library. This extension provides the files from a local cache so that you avoid the requests to a great extent and thus minimize any tracking. If mylittlewebsite.com and yourlittlewebsite.com both use Google's CDN for jQuery, Google would know that you visited these two sites. With this extension, there's less chance of Google or another CDN seeing all instances of the jQuery download requests (unless each site is using a different, newer version of jQuery that's not locally cached yet).

Some of the things they mention, like the Google libs, are a CDN for a small, mostly-static set of stuff. So you could, say, hardcode most/all of the JavaScript libraries that Google hosts right into the plugin and never fetch them from Google.

With that said, I'm not 100% sure what it's doing for "normal" CDN files either. What you're saying sounds like a flaw, and I don't know if I don't see the obvious answer, or if you're right and that's a significant problem.

(1) User connects to example.com and the page includes jQuery.

(2) The developer at example.com has included the version of jQuery from Google Hosted Libraries or another CDN so the request goes to Google's CDN.

(3) Google adds this request to what they already are tracking.

The addon includes a bunch of versions of popular libraries: https://github.com/Synzvato/decentraleyes/tree/master/data/r...

Ah! It looks like it also prevents additional "is this up to date?" requests for normal CDN resources. So you'll still load them on the first request, but you won't send a "hey, I'm using this" trackable request to the CDN after the first time.

That would be bad if the content changed, but in some cases you can be sure it won't.

It's not focused on unique assets like images; it's focused on stuff like jQuery, which uses the same file from the same CDNs across many websites. Many sites hook into a JS CDN and nothing else from those domains, so this saves you from being tracked by them.

I'd add Google Fonts to the list. It's a massive privacy leak as well.

The method this extension uses would require downloading the ~1GB font archive upon installation for that to work, since it bundles all the covered libraries in the extension itself. Perhaps it could bundle the 100 or so most commonly used Google fonts and serve those...

  # cat /etc/hosts | grep google
  fonts.googleapis.com

Problem solved.

The problem this solves is even worse than it sounds - there's no reason why the NSA couldn't force a CDN to silently concatenate their own analytics onto a site's jQuery. Is there a good way of signing your assets?

There is, and it's called Subresource Integrity: http://www.w3.org/TR/SRI/

MaxCDN's Bootstrap CDN implements it for example: <link href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.6/css/bootstra... rel="stylesheet" integrity="sha256-7s5uDGW3AHqw6xtJmNNtr+OBRJUlgkNJEo78P4b0yRw= sha512-nNo+yCHEyn0smMxSswnf/OnX6/KwJuZTlNZBjauKhTK0c+zT+q5JOCx0UFhXQ6rJR9jg6Es8gPuD2uZcYDLqSw==" crossorigin="anonymous">

Here is a one line Linux/OSX script that builds the sub-resource integrity hash string of js & css files.
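The one-liner itself didn't survive the formatting here. An equivalent of my own (using openssl, which the original may or may not have used) is:

```shell
# Reconstruction, not the original script: print the SRI integrity value
# ("sha384-<base64 digest>") of a file passed as the first argument.
sri() { printf 'sha384-%s\n' "$(openssl dgst -sha384 -binary "$1" | openssl base64 -A)"; }

# Usage: sri jquery.min.js
```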


Can anyone experienced with the code speak to how this is a 5MB addon? That smells like an order of magnitude more code than would be needed to block a fairly simple behaviour on fewer than 200 predefined URLs.

Cool plugin, but it's quite ironic that their test utility uses (and needs) jQuery hosted by Google in order to work.

Check out the source here: https://decentraleyes.org/test/

It would be much better to serve jQuery from the decentraleyes.org domain and run the test with that.

tl;dr: It needs jQuery from Google to test if jQuery from Google can be loaded via $.ajax.

A version of Firefox with the modifications from the Tor Browser bundle, minus the Tor network, would be splendid. They fix privacy and fingerprinting issues as well. On top of that, you don't have to deal with a website's decision to use certain fonts anymore. Is this available somewhere as a patchset/branch for mozilla-release.hg?

Does this work on localhost? Apart from the increased privacy, it sounds like this would also improve offline web development. I could be offline, and still have all the CDNJS libraries on the page load correctly.

Does Privacy Badger cover these CDNs as well?

So does this cache files on your local machine that are often served via CDN?

Doesn't the browser do that automatically for you?

Headers (Referer etc.) are still sent every time the resource is loaded, to check for freshness.

Is there something similar but for Chrome?

Do you use Chrome and still care about privacy? Or maybe you are asking for the speed enhancement.

Isn't it ironic that this page itself has a google.com dependency as shown by uBlock :D

well it's the mozilla addons site - they couldn't really control that

I know, it was much more ironic when a page by mozilla which was talking about online tracking and privacy had a google tracker :D

Why not just strip the referer header when getting static resources from the CDNs?

You'd have to block IP address, and er...

Blank the cookies and mangle the user agent too.

How does it know where the local files would be?

Looks interesting. Will try and post reviews.

Just use Tor. Anything that doesn't function like Tor (or can guarantee such security properties) isn't worth your time.

Tor isn't worth my time for normal usage. (Although it has gotten better, and is certainly an option in many cases)

How does Tor prevent cookie-based tracking?

The Tor Browser bundle clears cookies on exit. There are some plugins you can add to clear them on tab close too. As for something like CDN cookies, I'm actually not sure. But if you have cookies that are set by the CDN for their domain, then it's not trivial to link the loading page (assuming Referrer headers are stripped) to the resource being loaded because TBB uses different Tor circuits for different websites.

I think running any addons or plugins within Tor Browser is a bad idea. Even if it's from a "respected" source, the risk of it somehow becoming compromised is not worth it. IIRC the bundle even advises you that addons may be risky. Considering that the purpose of Tor is to remain anonymous, one should keep in mind that any addon could de-anonymize you.

> IIRC the bundle even advises you that addons may be risky.

The reason for this is that by installing various non-default addons, you're actually making your browser more unique. As a consequence, you're making it easier to link all of your Tor activity back to a single person.

Great, I've added it. I wonder whether a fully free service is trustworthy or not.

Question: where do you get your info from? I'm trying to gather twitter lists into this repo to know the best sources of info. Please collaborate: https://github.com/davidpelayo/twitter-tech-lists

