$etag = substr(sha1($secret . sha1($_SERVER["REMOTE_ADDR"]) . sha1($_SERVER["HTTP_USER_AGENT"])), 0, 18);
> This tracking method works without needing to use: [...] Your IP address or user agent string
The author even notes in the source, "Normally you would derive this from randomness." I wonder what the reasoning was for this strategy?
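For anyone comparing the two approaches, here is a rough sketch in Python rather than the demo's PHP. `derived_etag` mirrors the quoted line (it does depend on the IP address and user agent), while `random_etag` is what "derive this from randomness" might look like. Both function names and the 9-byte length are my own assumptions, not taken from the demo's source.

```python
import hashlib
import secrets


def sha1_hex(text: str) -> str:
    """Hex SHA-1 digest, comparable to PHP's sha1() on a string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def derived_etag(secret: str, remote_addr: str, user_agent: str) -> str:
    # Same shape as the quoted PHP: sha1(secret . sha1(ip) . sha1(ua)),
    # truncated to the first 18 hex characters.
    return sha1_hex(secret + sha1_hex(remote_addr) + sha1_hex(user_agent))[:18]


def random_etag() -> str:
    # Hypothetical alternative: 9 random bytes give 18 hex characters,
    # with no dependence on IP address or user agent.
    return secrets.token_hex(9)
```

The derived version gives the same value to every visitor sharing an IP address and user agent, which is one reason the random version seems like the more natural choice for an identifier.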
That definition has expanded a bit (very) recently, but preventing website tracking is usually a separate feature (adblockers, NoScript, automatic cookie deletion tools, Firefox's recent "tracking blocker").
You’ve gone incognito
Now you can browse privately, and other people who use this device won’t see your activity.
Your activity might still be visible to:
* Websites you visit
* Your employer or school
* Your internet service provider
Edit: Submitted a bug
Looks like KISSmetrics is being sued in a class action lawsuit over using this technique.
ETags are one of the biggest disappointments that way.
I wonder if there's potential to fix this by suggesting a deterministic version of ETags that's based on information the user agent can see, such as the Vary information, response headers, and payload.
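A rough sketch of what that could look like, in Python. This is a hypothetical scheme, not something HTTP defines today: the ETag is a hash of the payload plus the header values named by Vary, so the client can recompute it and refuse to store any ETag that does not match. The function names, the use of SHA-256, and the 32-character truncation are all assumptions for illustration.

```python
import hashlib


def deterministic_etag(body: bytes, headers: dict[str, str], vary: list[str]) -> str:
    """Hypothetical ETag computed only from data the client can also see."""
    lowered = {name.lower(): value for name, value in headers.items()}
    digest = hashlib.sha256()
    # Fold in the header values listed in Vary, in a fixed order.
    for name in sorted(n.lower() for n in vary):
        digest.update(f"{name}:{lowered.get(name, '')}\n".encode("utf-8"))
    digest.update(body)
    return '"' + digest.hexdigest()[:32] + '"'


def etag_is_verifiable(claimed: str, body: bytes,
                       headers: dict[str, str], vary: list[str]) -> bool:
    # A client could drop any ETag that fails this check, which would
    # rule out per-user random values smuggled in as ETags.
    return claimed == deterministic_etag(body, headers, vary)
```

Under this sketch two users fetching the same bytes with the same Vary inputs always get the same ETag, so the value carries no per-user information.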
It's just one of many:
In the old discussion (https://news.ycombinator.com/item?id=6231039) it was revealed that some parties were sued and settled for $500,000. Relatedly British Airways was just fined £183m. That's a beginning.
It doesn't work if you disable image loading.
For 90%+ of the web browsing I do, I don't need to see images at all, and browsing using emacs-w3m which I have set up to show only text and not load any images suffices. Occasionally there might be some image I want to see on a website and then I'll usually load it and view that one (or handful of images) manually. Very very rarely, I'll visit a site with an image gallery, where loading and viewing images one at a time is too painful, and then I'll just open it in Firefox, which I have set up to load images.
I know not loading images by default is not a solution for most people, but it's worked great for me for many years.
Update: A lot of replies are mentioning CSS. Just for the record: emacs-w3m does not process CSS
That wouldn't affect you, since I believe w3m also doesn't ship with CSS support. But if anyone is reading this and thinking, "ah, disabling images in uMatrix will do the trick" -- probably not?
That's the real thing to take away -- that images are not the issue, the cache is the issue, and there are lots of things that can go into a cache beyond just images. If you're not caching anything, you could probably load images by default and you'd still be fine.
To avoid an extra fetch, you have to explicitly tell the data source that you already have this piece of data, and sending it is not required.
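To make that concrete, here is a minimal sketch in Python of the conditional-request exchange and how a tracker can piggyback on it. The browser echoes the stored ETag back in `If-None-Match`, so if the server handed out a unique value on the first visit, the echo identifies the visitor. The `handle` function, the placeholder body, and the `Cache-Control` value are assumptions for illustration, not taken from the demo.

```python
import secrets

PIXEL = b"GIF89a"  # placeholder body; a real tracker would serve a tiny image


def handle(request_headers: dict[str, str]) -> tuple[int, dict[str, str], bytes]:
    """Return (status, response_headers, body) for one request."""
    presented = request_headers.get("If-None-Match")
    if presented is not None:
        # Returning visitor: the echoed ETag is the identifier. Answering
        # 304 keeps the cached copy (and the ETag) alive for the next visit.
        return 304, {"ETag": presented}, b""
    # First visit: hand out a unique value disguised as a cache validator.
    etag = '"' + secrets.token_hex(9) + '"'
    # "no-cache" lets the browser store the response but asks it to
    # revalidate on every use, so the ETag is sent back each time.
    return 200, {"ETag": etag, "Cache-Control": "private, no-cache"}, PIXEL
```

Nothing here needs cookies or JavaScript; the identifier lives entirely in the HTTP cache.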
Even without ETags you could probably devise a similar scheme, if you can get a browser to cache a response indefinitely. Then no request would need to be made to get the “cookie,” though JS would be needed to make use of it.
There are also other variations that just use resource bodies, e.g. cache HTML that opens an iframe with a unique URL, JS that contains constants storing data, CSS that opens user-specific URLs via imports, etc.
For tracking you just need any unique identifier, so (lack of) flexibility doesn't do much. You can even track users with HSTS cache that caches only 1 bit per domain.
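A sketch of how one bit per domain adds up to an identifier, in Python. The idea is that the tracker sets HSTS on a chosen subset of subdomains, one subdomain per bit, and later observes which ones the browser upgrades to HTTPS on its own. The `b{i}.tracker.example` naming and both function names are made up for illustration.

```python
def domains_to_pin(user_id: int, bits: int, base: str = "tracker.example") -> list[str]:
    """Subdomains that should send an HSTS header to encode user_id."""
    return [f"b{i}.{base}" for i in range(bits) if (user_id >> i) & 1]


def decode_user_id(upgraded: set[str], bits: int, base: str = "tracker.example") -> int:
    """Rebuild the id from the subdomains the browser upgraded to HTTPS."""
    value = 0
    for i in range(bits):
        if f"b{i}.{base}" in upgraded:
            value |= 1 << i
    return value
```

For example, `domains_to_pin(5, 4)` yields `b0.tracker.example` and `b2.tracker.example`, and feeding that set back into `decode_user_id` recovers 5. With 32 subdomains that is enough room for about four billion distinct identifiers.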
One way is for clients to first ask for the latest checksum/Etag from the server, and compare it to what they have locally before requesting the resource.
But maybe that's an extra round-trip. I wonder if a different design of HTTP could have avoided all this.
Either a page requests a linked resource, or not. If it does not, we can assume that it has it cached. We can tell these two outcomes apart.
But we can make tracking much harder by not having the client send an ETag. E.g. the server could send the ETags of all of the page's resources in the initial response, and let the client decide. This assumes that the server knows about all the related resources, about the page structure, etc. This is not unreasonable, but it does special-case HTML, and it does complicate the server quite significantly.
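Client-side, the decision could be as small as the Python sketch below. The manifest format (a mapping from URL to current ETag, delivered with the page) and the function name are assumptions; nothing like this exists in HTTP as far as the thread establishes. The client compares locally and never sends an ETag to the server.

```python
def resources_to_fetch(manifest: dict[str, str], local_cache: dict[str, str]) -> list[str]:
    """URLs whose cached ETag is missing or differs from the server's manifest.

    manifest:    URL -> current ETag, as sent with the initial response.
    local_cache: URL -> ETag of the copy the client already holds.
    """
    return [url for url, etag in manifest.items() if local_cache.get(url) != etag]
```

As noted above, the server can still tell a fetched resource from a skipped one, so this only makes tracking harder: an identifier would have to be spread across which URLs get requested instead of sitting in a single header.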
>If the resource at a given URL changes, a new Etag value must be generated. Etags are therefore similar to fingerprints and might also be used for tracking purposes by some servers. A comparison of them allows the determination of whether two representations of a resource are the same. They might also be set to persist indefinitely by a tracking server.
The private mode should have its own cache that is initially empty.
Idea: since private sessions are typically short-lived, they never have to validate ETags. Basically, just cache resources indefinitely and never ask "is this item still valid?" The cache is thrown away when the private session is closed; that's what invalidates it.
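A toy version of that idea in Python, just to show the shape. The class name and the injected `fetch` callable are hypothetical, and a real browser cache is far more involved. The cache starts empty, fetches each URL at most once, never revalidates, and is wiped when the session closes.

```python
from typing import Callable


class PrivateSessionCache:
    """Session-scoped cache that never sends conditional requests."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch                # performs a plain, unconditional GET
        self._store: dict[str, bytes] = {}  # starts empty for every session

    def get(self, url: str) -> bytes:
        # Fetch once, then serve from memory without asking the server
        # whether the copy is still valid, so no ETag is ever sent back.
        if url not in self._store:
            self._store[url] = self._fetch(url)
        return self._store[url]

    def close(self) -> None:
        # Ending the private session is the only invalidation step.
        self._store.clear()
```

The trade-off is staleness within a long-running private session, which is why this leans on sessions being short-lived.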
You may see this in your email to track how many times you open newsletters and other items.
Probably a dumb question but: any reason this couldn't be reliably used instead of cookies to track sessions, e.g. login status etc.?
The GDPR talks about online identifiers, of which cookies, IP address and fingerprints are examples. If you read any regulator's guidance carefully, you'll see they talk about "cookies and similar technologies", with just "cookies" being used alone for brevity.
To rephrase: tracking of any kind is the issue, not cookies. Don't mistake the implementation for the activity.
Disclosure: Founder of a non-tracking web analytics service because of this exact issue.