Cookieless cookies (2013) (lucb1e.com)
149 points by dedalus 5 months ago | 62 comments



That's a surprise to see pop up again! Author here. Now I wish I had improved the demo in the past years... It still generates the ETag from a few static parameters to make it work without JavaScript (as the page notes near the bottom). A real implementation doesn't have that limitation, because it doesn't need to echo data (such as your note and visit count) back to you before the image with the ETag has even loaded. I should have switched it to display the data in the image, so it works more accurately.


I was a little surprised as well to see your domain pop up, considering you're phasing out this username. Anyways, cool demo, this is the first time I came across it :)


Yeah, I had the same reaction


Hey guys, fun to see you around ^^


The reason I submitted this now, dated as it is, is that GDPR is now in force, CCPA is in the works, and the overall privacy environment has changed to kill the cookie in advertising as we know it. So there will be a need to explore alternatives for doing the same things without cookies (e.g. analytics systems such as GA).


No, that's not a solution. It's the tracking that counts, not the cookies. I commented elsewhere in this thread with more details:

https://news.ycombinator.com/item?id=20394661


And as I said six years ago https://news.ycombinator.com/item?id=6233362 :p (not to diminish your comment!)


The GDPR is about privacy overall and has nothing to do with cookies in particular. This approach would also be in breach of the regulation.


I was surprised to see that this tracking works across both regular and private browsing in Firefox (67.0.4 on macOS). I can see the number of visits increment and whatever message I've saved on either side is displayed to both.


It is creating a new session; it's just that the server code uses a deterministic session key derived from your IP address and user agent. So as long as you use the same browser/IP combo, you will get assigned the same ETag. (At least the user agent is in there; otherwise it would be extra trippy to load the page in FF, go to Safari, and see the visit count and message displayed there too.)

  $etag = substr(sha1($secret . sha1($_SERVER["REMOTE_ADDR"]) . sha1($_SERVER["HTTP_USER_AGENT"])), 0, 18);
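For anyone who doesn't read PHP, here is a rough Python rendering of that line. It is only a sketch: the function names and the way the three inputs are passed in are assumptions for illustration, not the demo's actual code.

```python
import hashlib


def sha1_hex(text: str) -> str:
    """Hex SHA-1 digest of a string, like PHP's sha1()."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def deterministic_etag(secret: str, remote_addr: str, user_agent: str) -> str:
    """First 18 hex characters of sha1(secret + sha1(ip) + sha1(user agent))."""
    combined = secret + sha1_hex(remote_addr) + sha1_hex(user_agent)
    return sha1_hex(combined)[:18]
```

The same IP and user agent always map to the same value, which is why a private window on the same machine ends up with the same ETag.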


You're right! That sort of negates the opening statement though:

> This tracking method works without needing to use: [...] Your IP address or user agent string

The author even notes in the source, "Normally you would derive this from randomness." I wonder what the reasoning was for this strategy?
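As a point of comparison, a randomness-based variant could look roughly like the sketch below. Everything here is hypothetical (`handle_image_request`, the `visits` dict standing in for server-side storage, the chosen `Cache-Control` value); it only illustrates the flow of minting an ID once and recognising it again when the browser revalidates.

```python
import secrets


def handle_image_request(request_headers: dict, visits: dict) -> tuple:
    """Return (status, response_headers) for the tracking image.

    `visits` maps an ETag value to a visit count (stand-in for a database).
    """
    etag = request_headers.get("If-None-Match")
    if etag is None or etag not in visits:
        # First visit, or an ETag we never issued: mint a fresh random ID.
        etag = '"' + secrets.token_hex(9) + '"'
        visits[etag] = 1
        # no-cache lets the browser store the image but forces revalidation.
        return 200, {"ETag": etag, "Cache-Control": "private, no-cache"}
    # The browser echoed our ID back: count the visit, send no body.
    visits[etag] += 1
    return 304, {"ETag": etag}
```

With this shape nothing depends on the IP address or user agent; the identifier lives only in the browser's cache.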


Why is private browsing mode not emptying the cache then? Seems to be contrary to the very purpose of the mode...


Private browsing mode wasn't (originally) intended to prevent tracking by websites. Its goal was to keep your browsing history private locally (i.e. keep your pr0n viewing out of the local browsing history, so your friends wouldn't see what kind of porn you like when they borrow your laptop to check their email).

That definition has expanded a bit (very) recently, but preventing website tracking is usually a separate feature (adblockers, NoScript, automatic cookie deletion tools, Firefox's recent "tracking blocker").


Private browsing was and still is mostly about hiding from other users on the device, not hiding from the people you are connecting to.

---

You’ve gone incognito,

Now you can browse privately, and other people who use this device won’t see your activity.

Your activity might still be visible to:

* Websites you visit

* Your employer or school

* Your internet service provider


I just tested Firefox 67.0.4 and Chromium 75.0.3770.100 on Arch and both have the same behavior. Definitely not something I expected.


It also works even if you use uMatrix and block the loading of images and css, after the image was loaded for the first time.


Does this interfere with lazy loading?


Same for me with Firefox and chromium on Ubuntu. Guess they share the cache with private windows?


Same for Chrome. Wow. Truck-sized hole, this one.


Same for Safari on OSX and iOS.


Should only work for first party though. Safari partitions its cache by the origin in the address bar[1] so your typical advertising tracking should not work.

[1] https://andydavies.me/blog/2018/09/06/safari-caching-and-3rd...
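To make the partitioning idea concrete, here is a toy model of a cache keyed by the site in the address bar as well as the resource URL. The class and the example hostnames are made up for illustration and say nothing about how Safari actually implements it.

```python
class PartitionedCache:
    """Toy HTTP cache keyed by (top-level site, resource URL)."""

    def __init__(self):
        self._entries = {}

    def put(self, top_level_site: str, url: str, etag: str) -> None:
        self._entries[(top_level_site, url)] = etag

    def get(self, top_level_site: str, url: str):
        # A copy stored while browsing one site is invisible on another site.
        return self._entries.get((top_level_site, url))
```

A tracker image cached while visiting one site would then be fetched fresh, with no ETag to echo, when embedded on a different site.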


Same, actually. Firefox 67.0.4 (64-bit) Windows.

Edit: Submitted a bug


Can you add a link to the bug you created?


Wikipedia has a list of websites known to use this technique already: https://en.wikipedia.org/wiki/HTTP_ETag#Tracking_using_ETags

Looks like KISSmetrics are getting sued with a class action lawsuit over using this technique.


There are a lot of very cool things that have been in or allowed by the HTTP and HTML specs that have had to go because people can't be trusted to play nice.

ETags are one of the biggest disappointments that way.

I wonder if there's potential to fix this by suggesting a deterministic version of ETags that's based on information the user agent can see, like the Vary information, response headers and payload.
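A minimal sketch of what such a check might look like on the client side, assuming (purely for illustration) that the ETag is defined as a truncated SHA-256 of the payload alone. The function names and the hash choice are assumptions, not part of any spec.

```python
import hashlib


def content_etag(body: bytes) -> str:
    """Hypothetical deterministic ETag: quoted, truncated SHA-256 of the body."""
    return '"' + hashlib.sha256(body).hexdigest()[:32] + '"'


def accept_etag(body: bytes, server_etag: str) -> bool:
    """Client-side check: only store an ETag it can recompute itself."""
    return server_etag == content_etag(body)
```

A client that refuses to store anything failing this check would no longer echo arbitrary server-chosen text, though a server could still vary the body itself.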



In Firefox, setting

    browser.cache.disk.enable
    browser.cache.memory.enable
to false seems to stop some of this from working. The last visit date still works, but the text storage and number of visits do not.


It's not a perfect solution, though. You're going to stick out as one of those people who disabled their cache entirely. You mitigated this exploit, at the cost of increasing your browser fingerprint entropy. Ideally you want to clear your cache when you clear your cookies.


How can a website detect a disabled cache vs. the cache being full, for example?


I am shocked firefox still has such gaping privacy holes:

It's just one of many: https://samy.pl/evercookie/

In the old discussion (https://news.ycombinator.com/item?id=6231039) it was revealed that some parties were sued and settled for $500,000. Relatedly British Airways was just fined £183m. That's a beginning.


"Even when you disabled cookies entirely, have Javascript turned off and use a VPN service, this technique will still be able to track you."

It doesn't work if you disable image loading.

For 90%+ of the web browsing I do, I don't need to see images at all, and browsing using emacs-w3m which I have set up to show only text and not load any images suffices. Occasionally there might be some image I want to see on a website and then I'll usually load it and view that one (or handful of images) manually. Very very rarely, I'll visit a site with an image gallery, where loading and viewing images one at a time is too painful, and then I'll just open it in Firefox, which I have set up to load images.

I know not loading images by default is not a solution for most people, but it's worked great for me for many years.

Update: A lot of replies are mentioning CSS. Just for the record: emacs-w3m does not process CSS


You're easily trackable too: how many people use a browser that doesn't load images, or that sends an emacs-w3m user agent? I'm guessing not many.


I don't know why people are mentioning CSS as if only special types of resources can be cached with ETags. ETags can be used on HTML documents or TXT files or whatever. It doesn't matter what type of file it is.


You can generate ETags for CSS files. Even normal HTML content can carry an ETag.


I'm not going to have time to experiment for the next few weeks, but I bet I could set up a version of this that worked via CSS as well.

It wouldn't affect you, since I believe w3m also doesn't support CSS. But if anyone is reading this and thinking, "ah, disabling images in uMatrix will do the trick" -- probably not?


emacs-w3m does not process CSS


More to the point, emacs-w3m does not have a cache at all.

That's the real thing to take away -- that images are not the issue, the cache is the issue, and there are lots of things that can go into a cache beyond just images. If you're not caching anything, you could probably load images by default and you'd still be fine.


ETags don't work just on images, you can use any resource, like stylesheets.


emacs-w3m doesn't process CSS either, nor Javascript for that matter.


This is a fundamental property of caching.

To avoid an extra fetch, you have to explicitly tell the data source that you already have this piece of data, and sending it is not required.


Etags make it really easy though, by storing not a hash, but arbitrary text. Of course arbitrary text is much more flexible.

Even without etags you could devise a similar scheme probably, if you can get a browser to cache a request indefinitely. Then no request would need to be made to get the “cookie,” though JS would be needed to make use of it.


Same tracking is possible by embedding data in last-modified/if-modified-since, or location of cacheable redirects.
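As a rough illustration of the Last-Modified variant, an identifier can be hidden in the timestamp as a number of seconds past some base date. The base date and function names below are arbitrary choices for this sketch.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Arbitrary base date (an assumption for this sketch).
EPOCH = datetime(2000, 1, 1, tzinfo=timezone.utc)


def id_to_last_modified(user_id: int) -> str:
    """Encode an integer ID as an HTTP date for the Last-Modified header."""
    return format_datetime(EPOCH + timedelta(seconds=user_id), usegmt=True)


def last_modified_to_id(header_value: str) -> int:
    """Recover the ID from the If-Modified-Since value the browser sends back."""
    delta = parsedate_to_datetime(header_value) - EPOCH
    return int(delta.total_seconds())
```

The browser stores the date with the cached resource and returns it unchanged in If-Modified-Since, so the server gets its ID back.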

There are also other variations that just use resource bodies, e.g. cache HTML that opens an iframe with a unique URL, JS that contains constants storing data, CSS that opens user-specific URLs via imports, etc.

For tracking you just need any unique identifier, so (lack of) flexibility doesn't do much. You can even track users with HSTS cache that caches only 1 bit per domain.
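The HSTS trick can be sketched as plain bit packing: one subdomain per bit, with HSTS set only on the subdomains whose bit is 1. The hostnames and function names here are hypothetical, and the sketch leaves out the actual HTTP requests.

```python
def hosts_to_pin(user_id: int, bits: int, base_domain: str) -> list:
    """Subdomains on which the server would send an HSTS header to store user_id."""
    return [f"bit{i}.{base_domain}" for i in range(bits) if (user_id >> i) & 1]


def id_from_upgrades(upgraded_hosts: set, bits: int, base_domain: str) -> int:
    """Rebuild the ID from the subdomains the browser upgraded to HTTPS on its own."""
    return sum(
        1 << i for i in range(bits) if f"bit{i}.{base_domain}" in upgraded_hosts
    )
```

On a later visit the page requests every subdomain over plain HTTP and notes which ones arrive as HTTPS; those are the 1 bits.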


Is there a way to design the http cache so that this is not true?

One way is for clients to first ask for the latest checksum/Etag from the server, and compare it to what they have locally before requesting the resource.

But maybe that's an extra round-trip. I wonder if a different design of HTTP could have avoided all this.


Using Subresource Integrity, the server can tell the browser about the latest checksum without having an extra round-trip.
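For reference, an SRI value is just a hash algorithm name followed by the base64 digest of the resource. The sketch below shows how one is computed and compared; using it to decide whether a cached copy is still current is the proposal above, not something this code demonstrates about real browsers. The function names are made up.

```python
import base64
import hashlib


def sri_value(body: bytes) -> str:
    """Integrity string in the form used by the `integrity` attribute (sha384)."""
    digest = hashlib.sha384(body).digest()
    return "sha384-" + base64.b64encode(digest).decode("ascii")


def matches_integrity(body: bytes, integrity: str) -> bool:
    """True if a (cached) body corresponds to the checksum named in the page."""
    return sri_value(body) == integrity
```

Because the checksum travels in the page markup, the client would learn the expected version without asking the resource's server anything.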


No, there is no way around it.

Either a page requests a linked resource, or not. If it does not, we can assume that it has it cached. We can tell these two outcomes apart.

But we can make tracking much harder by not having the client send an ETag. E.g. the server could send the ETags of all of the page's resources in the initial response, and let the client decide. This assumes that the server knows about all the related resources, about the page structure, etc. This is not unreasonable, but it does special-case HTML, and it does complicate the server quite significantly.
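A sketch of the client side of that idea, assuming the page response carries a manifest mapping each resource URL to its current ETag. The manifest format and the function name are invented for illustration.

```python
def resources_to_fetch(manifest: dict, cache: dict) -> list:
    """Decide locally which resources are stale, without sending any ETag.

    manifest: url -> current ETag, as announced in the page response
    cache:    url -> ETag of the copy already held by the client
    """
    return [url for url, etag in manifest.items() if cache.get(url) != etag]
```

The client then issues plain unconditional requests only for the returned URLs, so no server-chosen value is ever echoed back.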


Subresource integrity solves this issue.


I tried this on the 0.66.99 version of Brave and it works.


I wasn't familiar with etags, so for anyone else curious I'll save you the google search:

https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/ET...

>If the resource at a given URL changes, a new Etag value must be generated. Etags are therefore similar to fingerprints and might also be used for tracking purposes by some servers. A comparison of them allows the determination of whether two representations of a resource are the same. They might also be set to persist indefinitely by a tracking server.



Interesting. You can store the data in a regular browser window, and it's still there if you open the URL in a private window.

The private mode should have its own cache that is initially empty.

Idea: since private sessions are typically short-lived, they never have to validate ETags. Basically, just cache resources indefinitely and never ask "is this item still valid?". The cache is thrown away when the private session is closed; that's what invalidates it.
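The idea can be modelled in a few lines. This is a toy sketch, with `fetch` standing in for whatever actually performs the network request; none of it reflects how any browser implements private mode.

```python
class PrivateSessionCache:
    """Cache that starts empty, never revalidates, and is dropped on close."""

    def __init__(self):
        self._entries = {}

    def get(self, url: str, fetch):
        # Fetch unconditionally on a miss; never send If-None-Match afterwards.
        if url not in self._entries:
            self._entries[url] = fetch(url)
        return self._entries[url]

    def close(self) -> None:
        # Ending the private session is the only invalidation.
        self._entries.clear()
```

Since no conditional request is ever sent, there is nothing for the server to read an identifier from during the session.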


Also read: clear gifs / web beacons

You may see this in your email to track how many times you open newsletters and other items.


Firefox 69.0b3 here, works across refreshes but not when restarting the browser or opening a private window.


It's worth noting that when you do a hard refresh, obviously you won't get a 304 Not Modified, which is what this tracking relies on. So, if you wanted to clear the state, that would be one way.


Interesting technique, I learned something new about ETags.

Probably a dumb question but: any reason this couldn't be reliably used instead of cookies to track sessions, e.g. login status etc.?


The developer ergonomics are certainly less convenient. The ETag is tied to a particular resource, not an origin.


Before anyone thinks this (and similar) approaches are a way around the GDPR's cookie consent tracking crackdown: It's not.

The GDPR talks about online identifiers, of which cookies, IP address and fingerprints are examples. If you read any regulator's guidance carefully, you'll see they talk about "cookies and similar technologies", with just "cookies" being used alone for brevity.

To rephrase: tracking of any kind is the issue, not cookies. Don't mistake the implementation for the activity.

Disclosure: Founder of a non-tracking web analytics service because of this exact issue.


All true. Small note: whereas cookies are easy to identify as tracking, ETags have a legitimate purpose and you might not know that you're being tracked by this method. It would be illegal not to disclose it, so I doubt any self-respecting company would do it, but it would also be hard to detect.


Clever, but as you can’t analyze cross-domain images, I don’t think you can use this to track people across the web.


You can analyze cross-domain images (and other resources) thanks to CORS. There are many other variants of persisting data in cache via JS and CSS, so only cache partitioning by top-level domain can stop it.


On Chrome for Android it does not work with data saver enabled. Disabling it causes it to work for me.


Data Saver makes the images go through Google's servers, so the browser doesn't get the ETag. On the other hand, it's much easier for Google itself to track you.


Doesn’t appear to work with DuckDuckGo browser on iOS.



