Tales of Favicons and Caches: Persistent Tracking in Modern Browsers [pdf] (uic.edu)
179 points by amenghra 43 days ago | 53 comments



"More importantly, the caching of favicons in modern browsers exhibits several unique characteristics that render this tracking vector particularly powerful, as it is persistent (not affected by users clearing their browser data), non-destructive (reconstructing the identifier in subsequent visits does not alter the existing combination of cached entries), and even crosses the isolation of the incognito mode."

Why are favicons cached separately? I assume it is just code from pre-commercial www days that no one has since bothered to examine or rewrite?

I feel like privacy within modern browsers is a Sisyphean struggle. Their vast and ever-expanding API surface can never be brought under sensible control without splitting the browser into several unrelated tasks that must cross strictly locked down interprocess communication channels. The existing multi-process architecture must be taken to the next level, but who will do the difficult work involved given that of the major players only Mozilla and Apple have a stated incentive for privacy and even there their stated incentive is on fairly weak grounds since one is a profitable corporation while the other is funded by profitable corporations?


Maybe we should consider legal solutions?

There are real-world threats that you can't 100% defend against and yet we are mostly safe because the law is an effective deterrent.

Why not apply the same on the web? How come we have draconian anti-hacking laws (that are sometimes abused), but none of them are used against this tracking where it's essentially the same result as installing spyware?


Strongly this.

Privacy is not a technology problem. It's a business problem. As long as the adtech industry is allowed to thrive, as long as people build companies with ad-based or data-resale-based business models, this will be an endless game of whac-a-mole, with the browsers only ever growing in complexity, and building anything on the web only becoming more difficult.

We have to address the root cause: advertising as a business model. My suggestion: let's apply regulatory measures to kill this business model entirely.


> We have to address the root cause: advertising as a business model.

Isn't the root cause advertising that depends on data sharing, rather than advertising itself? I think it's fine if a site wants to display advertising that it serves from its own domain, without passing on any data to third parties.


The way I see it: advertising as a business model => want for better ads; targeted ads are more effective than untargeted ads, therefore the business model drives increased privacy violations.

I agree that in terms of privacy alone, an intervention point could be to get rid of third-party ad targeting. But advertising itself - not just targeted one - causes so many pathologies on the web that I'm in favor of focusing on the common cause.


> targeted ads are more effective than untargeted ads, therefore the business model drives increased privacy violations

That's what the current lions (Google, Facebook) keep saying, but the few unbiased, unsponsored studies keep showing, and nearly a century of advertising "common sense" already knew, that they are wrong on multiple levels. Targeted advertising is "preaching to the choir" at best and calculated harassment of your target (micro-)demographic at worst. Neither of those extremes, and largely nothing in between them, is actually good for growing a brand.

I have a slow burning boycott of companies that target me too directly, and if trends continue I wouldn't be surprised if that becomes a more general boycott/movement/backlash among the populace.

Advertising can be reformed if we regulate the business models without killing advertising as a whole. It should be as easy as a reboot to pre-DoubleClick/Google advertising best practices that served reasonably well for a century or more.


It's not just ad-tech. You have full-fledged business models based on siphoning off and selling your privacy, like Plaid and Visa. In fact, every CEO asks their company a very important question: how do we weaponize our data? It's a revenue stream for everyone.


One idea I’ve chewed on: make it a law to have to pay people for their private data, i.e. charged by the minute (by the second would be best) to the tune of minimum wage or an organization's WTP (willingness to pay) as a salary for 24/7 access. The idea is to price and legislate it at the point where it makes sense for the average citizen/user. Creating the notion of private property and reinforcing it is a fundamental purpose of government.


This would raise many problems: do you pay based on how long you store the data for? Does derived data qualify too (otherwise you can keep the original data for a minute, derive an intermediate data format out of it, then discard the original)? Do you charge for how long is spent processing that data (so they just throw more CPU at it so the data is only processed for milliseconds)?

I don't think this is a solution. Not only is this hard to implement & enforce, but it still ends up legalizing the unwanted processing of consumers' data as long as the processors can pay the fee. Those users should be allowed to decline regardless of how much the processor is willing to pay.


Those questions would be addressed in the actual legislation. Yes, enforcement becomes the tough point. I actually suspect that might be why it makes sense. Companies would be incentivized to not keep data around any longer than they need it. And companies like facebook would have incentives to decrease the size of their networks as a mitigation from class actions. They might even restructure to match a franchise model where you only ever interact with your “local” FB. That local FB would be much easier to police both within FB and externally.


I mean, you fundamentally "pay" with your private data already in the sense that those services use your "payment" in order to offer you the service in the first place.

You could force businesses to put a number on it. But in general websites don't just sit on the money they make from ads - they spend it on hosting costs and whatever other business expenses. Note that I'm not arguing whether some of them are still making a fortune with it - but requiring that users are paid for using a service that incurs costs for the other party is... backwards?


Yes, you’re paying with your data already. But you’re generally not able to ask to be paid for it. We’re treating people’s data like a public good that anyone can profit from, except the individual who owns and generates it. That’s why there has to be a way for the market to manage this properly on its own, once the participants are given a legally reinforced way to interact as a market with supply and demand.


The legal system cannot keep up with the complexity and rapid change of software. You will end up with one regulation that says you cannot track users, and another regulation that says you have to prove that you are effectively blocking Iranian users from using your product. If you log IPs connecting to your servers you'll be accused of tracking users, and if you don't log IPs you'll be accused of not doing your due diligence in confirming you blocked foreign users.


Is this an actual problem or is this a typical knee-jerk argument people make when someone is talking about regulation? (despite the current situation being so bad that it's hard to imagine regulation making it worse)

Regarding your specific example, the GDPR appears to deal with it easily: any data processing needed to comply with the law is allowed and does not require explicit consent. This seems to work well (of course, the GDPR is bad because it's not being enforced seriously, but if it were, the scenario you describe wouldn't be a problem).

Also, when I talk about regulation, I'm talking about regulating the intent and/or outcome rather than a particular implementation. If you track someone without their explicit consent for the purposes of targeted advertising or marketing you are in breach of the regulation, regardless of whether you obtained that data online, in the real-world (mobile phone tracking, facial recognition, loyalty cards, etc), by using Tarot cards or even a fortune-telling goldfish.


"Why are favicons cached separately?"

That's a great question. My guess is that it's because they are used for things like bookmarks and the chrome page that shows frequently visited websites. And that something about those uses made a separate cache logical. A bit of googling does show lots of confusion and bugs because of it though.


> Why are favicons cached separately?

Likely because they are used for bookmarks and you don't want clearing the cache to remove all of the icons from your bookmarks.

Of course you could only do this for URLs which are bookmarked however it would be more work (probably why it wasn't done) and would remove icons from your browser history (probably a minor loss).

TL;DR Because they are used outside the context of browsing.


why couldn't we solve that by having a separate cache for bookmarks sandboxed away from web content processes?


Presumptive user feedback: "How come when I click this bookmark which has the icon nice and right there it sometimes takes minutes for the icon to show up on the tab?"

The principle of least surprise for the user is probably at play here. Bookmarks and tab icons seem like reasonably similar "chrome" to the user.

Separating the caches isn't necessarily easy either: at that point it is just as likely to hand trackers a good signal for who bookmarked a site, depending on whatever heuristic ends up being used to refresh that cache once the site is no longer among the "recently accessed tabs".


Maybe we should consider switching to alternative browsers like Kingpin. Chrome is too powerful, and Google has its own (business) goals.


"Firefox: ... However, it never actually uses the cache to fetch the entries. As a result, Firefox actually issues requests to re-fetch favicons that are already present in the cache. We have reported this bug to the Mozilla team, who verified and acknowledged it. At the time of submission, this remains an open issue. Nonetheless, we believe that once this bug is fixed our attack will work in Firefox..."

Gosh, I hope the favicon cache bug the authors filed isn't fixed until a broader mitigation against this is implemented.


Bugzilla link since I didn't see it in paper: https://bugzilla.mozilla.org/show_bug.cgi?id=1618257

I find it kinda weird that Solomos reported it as a normal defect and even prompted for a fix update months later without making it clear that the fix would make FF vulnerable to the issue...


> " However, it never actually uses the cache to fetch the entries."

I doubt the "never" because it regularly shows me the wrong favicon. This has been true for so many years that I consider it a familiar quirk more than a bug...


Firefox bug is being tracked here: https://bugzilla.mozilla.org/show_bug.cgi?id=1618257


Later in the paper:

> we have disclosed our research to all the browser vendors.

Please consider that the researchers apparently submitted TWO bug reports. One because functionally the cache is broken, one because there's a potential privacy issue.


The account used to file that bug has not filed any other bug reports, so it isn't clear to me if they did report the underlying security issue that they found. (Disclaimer: I work on Firefox, but I'm just speaking for myself.)


It looks like someone else posted their paper to their bug.


Good. This sums it up pretty well:

    I also think that it would have been appropriate to notify about the
    ulterior motive behind this defect report at the latest when the paper got
    published. This underhanded approach of reporting a defect just leaves a bad
    taste, really.

    The behavior may be an actual defect in the classical sense, but I'm just
    wondering what would have happened, had this been addressed "in time" by the
    developers. It would seem that the researchers would then have triumphantly
    proclaimed that all major browsers are prone to their newly found attack.
    Must be somewhat disappointing that it didn't get fixed "in time" to make it
    into the paper that way.


I wonder if that behaviour is misconduct under the rules of the researcher's university. It seems at least highly questionable for a university employed researcher to, in effect, feature request a privacy vulnerability in order to later be able to publish an academic paper on that vulnerability.


Seems like this is a pretty clear ethics violation on the part of the authors.


Straight up Black Hat work. Not cool.


It's unbelievable that any form of unclearable cache is allowed to exist.

"Clear Browsing Data" must clear ALL browser data, as if I was doing a completely fresh install of my browser but maintaining my settings, extensions, bookmarks, and auto-fill.

That is IT. Yes, Google Chrome, you must also delete Google cookies (which they do not do).


That's why I set up my Linux install to work like a live-CD, with a two-layer filesystem: a read-only base, and a read-write overlay that lives in RAM. The files that I know I want to keep are bound from a read-write partition on the disk to the RAM filesystem, and all the rest gets deleted every time I shut down my PC.

A lot of pieces of software non-maliciously keep records of everything you do with them through logs or caches that aren't straightforward to delete and it's the only way I found to have control over it.


How do you persist files you care about? Another separate partition?

This is an interesting approach. Do you have any documentation on how it was set up? Also, how do you change a setting in your browser? Do you have to rebuild your base layer?


Yes, the read-write disk partition also holds my files.

No docs, I'm afraid, and I set it up too long ago to remember the exact details. I used overlayroot; there are some really good resources on Google for setting it up like this. If I remember correctly, it's just a matter of setting the overlayroot.conf file to:

  overlayroot_cfgdisk="disabled"
  overlayroot="tmpfs:swap=1,recurse=0"

And then a grub option to mount the base in read-only:

  linux /boot/vmlinuz-5.3.0-51-generic root=UUID=... ro $vt_handoff

Then you add your mounts in fstab for persistent stuff.

I think this blog post describes it well: https://spin.atomicobject.com/2015/03/10/protecting-ubuntu-r...

For modifications to the base, installing or modifying software, etc I have a grub option to disable the overlay system and mount the base partition in read-write so it can be used normally. So I reboot into this option, do my changes, then reboot immediately in overlay mode.

  linux /boot/vmlinuz-5.3.0-51-generic root=UUID=... rw overlayroot=disabled $vt_handoff

It took me about a month to get used to it, sometimes I'd apt-get something then the next day I'd facepalm after realizing I had done it in overlay-mode and had to do it all over again. I haven't lost any personal files though, it's pretty easy to remember to avoid saving them to your home and instead go to the persistent partition.


Clearing browser history should be interpreted as "nuke my browser container's cache directory please". This also requires that all "cache" gets into the cache directory though, which might not be the case.

Unfortunately, nuking the whole container, while effective, is probably not desired, as it contains various browser settings and browser extensions.


The open-source BleachBit does an excellent job of clearing out caches and vacuuming SQLite databases, and can also remove icons and thumbnails.

bleachbit.org


What's unbelievable is the audacity of these companies. Programmers want to improve performance for everyone so they come up with caching mechanisms. So what do the companies do? They abuse the feature in order to track users.

The ability to clear browser data is not quite enough. Caching should be disabled by default in all browsers due to the potential for abuse. Oh no, now companies are getting less conversions and sales due to the loss in performance... Sucks to be them. Actually the more their abuse costs them the better.


I wonder if it is possible to implement an "out-of-band" cache clearing command(s). On Linux it would be quite straightforward, but I know next to nothing about Windows or OSX.
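On Linux, an out-of-band clear can be as small as deleting directories while the browser is closed. A rough Python sketch, assuming the caller already knows which paths inside the profile hold cached data; the function name and any directory names passed to it are hypothetical, and real browsers spread caches (favicon stores included) across locations this sketch knows nothing about.

```python
import shutil
from pathlib import Path


def clear_cache_paths(profile_dir, cache_names):
    """Delete the named files/directories inside profile_dir.

    Returns the list of paths that were actually removed. Anything not
    named in cache_names (settings, bookmarks, extensions) is left alone.
    Run this only while the browser is not running.
    """
    profile = Path(profile_dir)
    removed = []
    for name in cache_names:
        target = profile / name
        if target.is_dir() and not target.is_symlink():
            shutil.rmtree(target)  # real directory: remove recursively
        elif target.exists() or target.is_symlink():
            target.unlink()  # plain file or symlink: remove the entry only
        else:
            continue  # nothing to remove for this name
        removed.append(target)
    return removed
```

The hard part is not the deletion but knowing the complete list of names to pass in, which is exactly where an "unclearable" cache hides.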


Check out BleachBit[0]. It’s available for Windows and Linux.

[0]: https://bleachbit.org


Browser vendors don't take clearing browser data seriously, see how Firefox has implemented the browsingData extension API [1]. These bugs compromise the security and privacy of Firefox users, but fixing them has not been a priority over the years.

Built-in clearing options in Firefox will also leave classes of cached data behind. The only reliable way to wipe everything has been to delete specific files from the Firefox profile folder before the browser launches.

[1] https://armin.dev/blog/2019/03/firefox-extensions-browsing-d...


So two things:

1) this is insane! It even breaks the “sandbox” of incognito mode.

2) Based on how it works I would assume it absolutely decimates the back button functionality, which depending on what you’re trying to accomplish might be a good thing, and 2 seconds isn’t a short period of time. Ppl wouldn’t be that ok with waiting 2 secs even with today’s js heavy loads.


First of all, thanks for sharing because it's such an insightful paper!

Some thoughts/doubts on it:

1. It's unbelievable that in a world where we promote privacy and freedom of individuals such cross-country trackers exist. It seems more an Orwellian story rather than reality.

2. I'm a bit ignorant on this theme on a technical level (I have a business background, even if working at a tech startup focused on security). There is a growing concern globally over an increasing sensitisation over privacy and over the importance of security. Even Google has promised to remove third party cookies within 2 years, and there is going to be a migration from Whatsapp to Signal (even if Whatsapp clarified a bit on that). Do you think that such fresh tools like these "favicons" or simple tracking will remain long term?


Do you speak Spanish by any chance? The way that you are using "doubts" (dudas) sounds slightly off to me. I live in Spain and I hear it a lot :)


I am Italian actually, pero hablo un poco español también (but I speak a little Spanish too)! The idioms are really similar.


> Specifically, websites can create and store a unique browser identifier through a unique combination of entries in the favicon cache. To be more precise, this tracking can be easily performed by any website by redirecting the user accordingly through a series of subdomains. These subdomains serve different favicons and, thus, create their own entries in the Favicon-Cache. Accordingly, a set of N-subdomains can be used to create an N-bit identifier, that is unique for each browser. Since the attacker controls the website, they can force the browser to visit subdomains without any user interaction. In essence, the presence of the favicon for subdomain i in the cache corresponds to a value of 1 for the i-th bit of the identifier, while the absence denotes a value of 0.

So the bulk of it is: caching favicons, timing requests, multiple redirects through controlled subdomains.
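The encoding in that quote can be sketched as a small simulation. This is a toy model in Python, not the paper's implementation and not something that talks to a real browser: the `FaviconCache` class, the `bit{i}.tracker.example` host names, and the 8-bit width are all made up for illustration.

```python
N_BITS = 8  # assumed identifier width, chosen only for this example


def subdomain(i):
    # Hypothetical host naming scheme, one subdomain per identifier bit.
    return f"bit{i}.tracker.example"


class FaviconCache:
    """Toy stand-in for a browser's favicon cache (a set of host names)."""

    def __init__(self):
        self.entries = set()

    def visit(self, host, server_serves_icon):
        """Simulate a page load; return True if the favicon was requested."""
        if host in self.entries:
            return False  # cache hit: no request reaches the server
        if server_serves_icon:
            self.entries.add(host)  # only a served icon creates an entry
        return True


def write_id(cache, ident):
    # First visit: redirect only through the subdomains whose bit is 1.
    for i in range(N_BITS):
        if (ident >> i) & 1:
            cache.visit(subdomain(i), server_serves_icon=True)


def read_id(cache):
    # Later visit: redirect through every subdomain but serve no icons,
    # so reading leaves the cache unchanged. A missing request means bit = 1.
    ident = 0
    for i in range(N_BITS):
        requested = cache.visit(subdomain(i), server_serves_icon=False)
        if not requested:
            ident |= 1 << i
    return ident
```

In this model, calling `write_id(cache, some_id)` once and `read_id(cache)` any number of times afterwards returns the same value, which is the "non-destructive" property the abstract mentions.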


This paper references https://www.ndss-symposium.org/wp-content/uploads/2019/02/nd...

They in turn reference my 2015 take on this: http://dnscookie.com/

With homage to Moxie's Cryptographic Doom Principle, I propose the Cache Doom Principle: If a system's behaviour can be influenced by a cache, eventually someone will figure out a way to use that cache to leak data.


Does this make much of a difference? My impression was that we already lost against fingerprinting and browser vendors keep adding more and more feature crap which makes it only worse.


Could one perform this attack without redirects by changing the page's DOM.head.link(rel=icon).href value with JavaScript?


Well apparently javascript can be used to modify the favicon dynamically: https://stackoverflow.com/questions/260857/changing-website-... - presumably this will then have the same interactions with the cache.

Perhaps you could just rely on the user navigating across a number of pages on your attack site.


Does anyone know if you can disable favicons in firefox?


You can on iOS Safari. I never understood why it was disabled by default.


I block favicon requests with a forward proxy. One could probably block them using DevTools or an ad blocker.
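A filter rule for such a proxy could look roughly like the Python sketch below. This is a guess at one possible heuristic, not the commenter's setup: it only matches conventional icon file names, and since a page can point its `<link rel="icon">` at any URL, path matching alone will miss some requests. The function name and the name list are illustrative.

```python
from urllib.parse import urlsplit

# Conventional icon file names; an assumed, non-exhaustive list.
ICON_NAMES = ("favicon.ico", "favicon.png", "favicon.svg", "apple-touch-icon.png")


def looks_like_favicon_request(url):
    """Return True if the URL path ends in a conventional icon file name."""
    path = urlsplit(url).path.lower()  # query string and fragment are ignored
    last_segment = path.rsplit("/", 1)[-1]
    return last_segment in ICON_NAMES
```

A proxy would call this on each outgoing request URL and answer matches itself (for example with an empty response) instead of forwarding them.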


For me favicons are a huge UX requirement, I can barely use my browser tabs without favicons.



