
Persistant and unblockable browser cookies using last-modified HTTP header - nikcub
http://nikcub.appspot.com/persistant-and-unblockable-cookies-using-http-headers
======
mike-cardwell
As long as caching exists, there will always be tricks like this. Here's
another example: Encode a unique ID in an image and send that to the browser
with a long cache time. Then on subsequent requests, use JavaScript canvas to
read the image and decode the unique ID from it.
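
A rough sketch of the encoding half of this trick, assuming the ID is packed directly into the RGB channel values of a losslessly stored image (lossy compression would likely corrupt the bytes). The function names and the two-pixel default are made up for illustration. In a browser the read-back would happen in JavaScript via canvas; it is only mirrored here in Python.

```python
def id_to_pixels(uid: int, n_pixels: int = 2) -> list:
    """Pack an integer ID into RGB triples, 3 bytes per pixel, big-endian."""
    raw = uid.to_bytes(n_pixels * 3, "big")  # raises OverflowError if uid is too large
    return [tuple(raw[i:i + 3]) for i in range(0, len(raw), 3)]


def pixels_to_id(pixels) -> int:
    """Inverse of id_to_pixels: flatten the channels and rebuild the integer."""
    raw = bytes(channel for pixel in pixels for channel in pixel)
    return int.from_bytes(raw, "big")
```

Two pixels already carry 48 bits, which is far more than is needed to tell visitors apart.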

I've already completely disabled disk caching in Firefox by setting
browser.cache.disk.enable to false. I'm seriously considering disabling the
in-memory cache too, via browser.cache.memory.enable.

EDIT: In fact, I've just gone and disabled in-memory caching. It will be
interesting to see whether this change is noticeable in my normal usage.

~~~
Erwin
Even the Canvas/JS is not necessary. Have your website load up a styles1.css
file which is really generated by a script and has long expiry; that can then
in turn @import a styles2.css embedding some unique ID in the imported URL
(alternatively reference an image with the unique ID). As usual you would try
to identify the user by existing cookies first, so this is a fallback.

The styles2.css has little or no caching enabled, so that it's requested
reasonably often by the browser; it can then set a cookie or you can just
correlate the information on the server side.

That doesn't require JavaScript, just CSS (if requiring JS is OK, you could
serve some innocently named "helper_functions.js" that sets window.uuid =
"some unique value" and force it to be cached).

The site that started this: <http://samy.pl/evercookie/>
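
The styles1.css / styles2.css trick described above might look roughly like this on the server side. This is only a sketch: the handler names, the `uid` query parameter, and the exact Cache-Control values are assumptions, not anything taken from a real site.

```python
def render_styles1(uid: str):
    """Long-lived stylesheet whose only job is to pin uid in the browser cache."""
    headers = {
        "Content-Type": "text/css",
        "Cache-Control": "max-age=31536000",  # cache for about one year
    }
    body = f'@import url("/styles2.css?uid={uid}");\n'
    return headers, body


def render_styles2(query: dict):
    """Uncached stylesheet; each request reveals the uid baked into styles1.css."""
    headers = {"Content-Type": "text/css", "Cache-Control": "no-store"}
    seen_uid = query.get("uid")  # the server would log or correlate this value
    return headers, "/* intentionally empty */\n", seen_uid
```

As long as styles1.css stays in the cache, every page view re-requests styles2.css with the same `uid`, whether or not cookies are enabled.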

------
ck2
The plugin mentioned at the end won't work - not unless it's going to have a
tedious whitelist.

Since many sites use "3rd party requests" like remotely hosted images, fonts,
and even, for example, Google-hosted jQuery, blocking them would most
certainly break the page.

What I would like is a warning when the last-modified appears to be a
malformed date or not a date at all. PHP's strtotime can convert almost
anything to a date/time (if it really is a date). Can it be ported? Optionally
when it doesn't appear to be a date within the past 30 days, never cache it.
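
As for porting, Python's standard library already has a fairly lenient date parser, so the check could be sketched as below. The function name `should_cache` and the 30-day default are illustrative, and treating a timezone-less date as UTC is an assumption.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def should_cache(last_modified: str, now: datetime, max_age_days: int = 30) -> bool:
    """Return True only if the header parses as a date within the last max_age_days."""
    try:
        stamp = parsedate_to_datetime(last_modified)
    except (TypeError, ValueError):
        return False  # not a date at all: worth a warning, and not worth caching
    if stamp.tzinfo is None:
        stamp = stamp.replace(tzinfo=timezone.utc)  # assumption: treat naive as UTC
    return now - timedelta(days=max_age_days) <= stamp <= now
```

`now` is expected to be a timezone-aware datetime, for example `datetime.now(timezone.utc)`.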

Another solution for a pure privacy mode is to never send back last-modified,
ever. It would hurt servers and page load times, because things would never be
revalidated and would always be served as "200" instead of "304", but for a
pure privacy mode it may be necessary.

Last but not least, we could just dial down our cache time limit to, say, an
hour max. It would still give some info to the trackers, but not across
browser sessions or after the computer is turned off. Since Firefox doesn't
have a time limit on the cache, using a memory-only cache is the only easy
workaround for now.

~~~
justincormack
HTTP defines the date format precisely (although there are 3 valid formats)
and browsers can already parse it for other purposes, like cache expiry.
<http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html> I imagine this bug
will be fixed pretty quickly, as the HTTP replies being sent are out of spec.
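
For reference, a strict parser for the three formats in RFC 2616 section 3.3.1 can be sketched with `strptime`. The helper name is made up, and `%a`/`%b` depend on the process locale, so this assumes the default C/English locale.

```python
from datetime import datetime
from typing import Optional

# The three date formats allowed by RFC 2616 section 3.3.1, as strptime patterns.
HTTP_DATE_FORMATS = (
    "%a, %d %b %Y %H:%M:%S GMT",  # RFC 1123: Sun, 06 Nov 1994 08:49:37 GMT
    "%A, %d-%b-%y %H:%M:%S GMT",  # RFC 850:  Sunday, 06-Nov-94 08:49:37 GMT
    "%a %b %d %H:%M:%S %Y",       # asctime:  Sun Nov  6 08:49:37 1994
)


def parse_http_date(value: str) -> Optional[datetime]:
    """Return a naive datetime (implicitly GMT), or None if no format matches."""
    for fmt in HTTP_DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    return None
```

Anything that comes back as `None` is out of spec and could simply be dropped rather than echoed back in If-Modified-Since.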

------
heelhook
Very interesting find, one of those where, as soon as you read the title, you
realize it's so obvious; you just never thought about it before!

One thing to note: a well-formed date can still serve as an identifier, say
01/17/2157 identifies you. What could be done is to restrict the date to
(present time - 3.months) <= last-modified <= present time.

That effectively reduces the number of people they can track to 7776000.
Rounding off the seconds would reduce it to 129600.
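
The arithmetic behind those two figures, taking 3 months as 90 days (the helper name is just for illustration):

```python
from datetime import timedelta


def distinct_values(window: timedelta, resolution: timedelta) -> int:
    """Number of distinct timestamps that fit in the window at a given resolution."""
    return window // resolution


per_second = distinct_values(timedelta(days=90), timedelta(seconds=1))
per_minute = distinct_values(timedelta(days=90), timedelta(minutes=1))
print(per_second, per_minute)
```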

------
senko
Ensuring that if-modified-since is a proper date might not help much there.
The evil site could just encode the unique ID in the time (and/or date)
portion of the header. A possible workaround would be for the browser to send
a time randomly shifted a few seconds into the past, so the cache still mostly
works but the value can't be reliably used for anything else.
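
Both halves of this can be sketched in a few lines: a site hiding an ID in a perfectly valid date, and a browser blurring the value before echoing it back. The base date, the function names, and the 30-second jitter are all assumptions made for the example.

```python
import random
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime

BASE = datetime(2000, 1, 1, tzinfo=timezone.utc)  # arbitrary reference point
ONE_SECOND = timedelta(seconds=1)


def id_to_last_modified(uid: int) -> str:
    """Site side: hide the ID as a number of seconds after BASE."""
    return format_datetime(BASE + uid * ONE_SECOND, usegmt=True)


def last_modified_to_id(header: str) -> int:
    """Site side: recover the ID from the echoed If-Modified-Since value."""
    return (parsedate_to_datetime(header) - BASE) // ONE_SECOND


def jitter(header: str, max_seconds: int = 30, rng=random) -> str:
    """Browser side: shift the date 1..max_seconds into the past before sending."""
    shifted = parsedate_to_datetime(header) - rng.randint(1, max_seconds) * ONE_SECOND
    return format_datetime(shifted, usegmt=True)
```

After `jitter`, the low bits of the ID are unreliable, but a site could still spend only the higher-order bits (hours, days), so jitter alone narrows the channel rather than closing it.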

~~~
caf
If we force If-Modified-Since to be in the last year and limit it to 1 minute
resolution, that leaves 19 bits worth of uniqueness for the site to play with
- probably not enough for tracking on a large affiliate network?
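
A sketch of that restriction and the bit count it leaves, taking a year as 365 days. `normalise` is a hypothetical browser-side filter, not an existing API; it expects timezone-aware datetimes.

```python
import math
from datetime import datetime, timedelta
from typing import Optional

WINDOW = timedelta(days=365)
RESOLUTION = timedelta(minutes=1)


def normalise(stamp: datetime, now: datetime) -> Optional[datetime]:
    """Drop dates outside the last year; truncate the rest to whole minutes."""
    if not (now - WINDOW <= stamp <= now):
        return None  # outside the window: do not send If-Modified-Since at all
    return stamp.replace(second=0, microsecond=0)


def bits_of_uniqueness() -> float:
    """log2 of the number of distinct values that survive normalisation."""
    return math.log2(WINDOW // RESOLUTION)
```

There are 525,600 minutes in 365 days, just over 2^19, hence the 19 bits.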

~~~
socratic
Can n image GETs (or similar) be used to produce 19*n bits of uniqueness,
presuming that the affiliate network is willing to handle a factor of n more
requests?
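
In principle yes: a wider ID can be cut into 19-bit pieces, one per resource, and glued back together on the server. A minimal sketch, with made-up function names:

```python
BITS_PER_RESOURCE = 19
MASK = (1 << BITS_PER_RESOURCE) - 1


def split_id(uid: int, n: int) -> list:
    """Cut uid into n chunks of 19 bits each, least significant chunk first."""
    if uid >= 1 << (BITS_PER_RESOURCE * n):
        raise ValueError("uid does not fit in n chunks")
    return [(uid >> (BITS_PER_RESOURCE * i)) & MASK for i in range(n)]


def join_chunks(chunks) -> int:
    """Rebuild the uid from the chunks recovered from n separate requests."""
    return sum(chunk << (BITS_PER_RESOURCE * i) for i, chunk in enumerate(chunks))
```

Each chunk would then be stored as the Last-Modified value of a different URL, so three images give 57 bits.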

~~~
sirclueless
Oof, you're right. These are better than cookies because you can set them for
specific URLs. Correlating requests wouldn't be too hard, just look for
consecutive requests from the same IP with the same referrer.
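
The correlation step could be as simple as bucketing log entries by (IP, referrer). This sketch ignores the "consecutive" part and any timing window, and the tuple layout of the log is an assumption.

```python
from collections import defaultdict


def group_requests(log):
    """Group (ip, referrer, url, if_modified_since) entries by (ip, referrer)."""
    groups = defaultdict(list)
    for ip, referrer, url, if_modified_since in log:
        groups[(ip, referrer)].append((url, if_modified_since))
    return dict(groups)
```

Each bucket then holds the per-URL values that together make up one visitor's ID.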

------
Tichy
I am surprised that it works like that. I would have expected the browser to
just remember the last time they accessed a resource and send that time along.

~~~
FaceKicker
That seems like a pretty good idea to me; would make the last-modified
response header unnecessary. I'm guessing it's not done this way because of
synchronization issues? (e.g., if someone has their computer's clock set to
the year 2025, they can never refresh on servers that use the header)

------
aurynn
That is a surprisingly clever and evil hack.

------
copypasteweb
>The privacy plugin that I am working on, Parley, would solve the cross-site
tracking aspect of this bug, since it blocks all third party requests.

<https://addons.mozilla.org/en-US/firefox/addon/requestpolicy/>

~~~
nikcub
should have mentioned that it is a Chrome extension

and the reason why I haven't released it is because I am experimenting with a
number of features such as cookie rewriting, cache invalidation by rewriting
requests, forcing SSL, etc.

------
pilooch
As the maintainer of an open source proxy (Seeks), I'm playing with
randomizing the last-modified header, on demand. The randomizing procedure is
triggered by a regexp over the requested URL.
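
This is not how Seeks does it, just a guess at the general shape of such a filter: match the URL against a pattern list and, on a hit, swap Last-Modified for a random recent date. The pattern, the function name, and the 30-day range are all hypothetical, and header names are assumed to arrive in canonical case.

```python
import random
import re
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# Hypothetical pattern list; a real proxy would load these from configuration.
TRACKER_PATTERNS = [re.compile(r"^https?://([^/]+\.)?tracker\.example/")]


def randomise_last_modified(url, headers, now, rng=random):
    """Return headers with Last-Modified replaced by a random recent date
    when the URL matches a pattern; otherwise return them unchanged."""
    if "Last-Modified" not in headers:
        return headers
    if not any(p.search(url) for p in TRACKER_PATTERNS):
        return headers
    offset = timedelta(seconds=rng.randrange(30 * 24 * 3600))  # up to 30 days back
    rewritten = dict(headers)
    rewritten["Last-Modified"] = format_datetime(now - offset, usegmt=True)
    return rewritten
```

`now` has to be a UTC-aware datetime such as `datetime.now(timezone.utc)`, because `format_datetime(..., usegmt=True)` rejects anything else.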

Does anyone here know if (and where) a useful list of websites that use such
tracking methods has been compiled?

------
spaghetti
Just remove the Last-Modified header from all incoming server responses? And
remove the If-Modified-Since header from all browser requests? If these can be
set to arbitrary strings without affecting browser performance, user
experience, etc., why have these headers at all?
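
Stripping both headers is about a one-liner in a proxy or extension. A minimal sketch, with a made-up function name and headers modelled as a plain dict:

```python
STRIPPED = {"last-modified", "if-modified-since"}


def strip_date_validators(headers: dict) -> dict:
    """Return a copy of headers without Last-Modified / If-Modified-Since."""
    return {name: value for name, value in headers.items()
            if name.lower() not in STRIPPED}
```

The cost, as noted elsewhere in the thread, is that revalidation can no longer answer with a 304, so such resources get re-downloaded in full.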

------
prodigal_erik
I'm not a fan of virtualization for apps that are maintained and should be
able to coexist under one kernel, but I may reconsider since browser and
plugin vendors are still not offering a permanent Chinese wall between my
activities on separate sites.

~~~
T-hawk
Because the sites themselves don't work with a Chinese wall between sites.
Yourfavoritesite.com is probably loading images from
images.yourfavoritesite.com, or possibly akamai.net, and scripts from
jquery.com. Making a browser able to distinguish between that and black-hat
cross-site stuff is an extremely difficult task that we still haven't gotten
quite right.

~~~
mike-cardwell
I wish we had sandboxed tabs. I.e., you create a sandboxed tab instead of a
normal tab, and it has its own cache, its own cookie store, etc. I could open
my bank website in a separate sandboxed tab and not have to worry about sites
in other tabs hitting it with CSRF attacks, etc.

------
5h
As a UK-based web developer, all these esoteric tracking methods make me hate
on the ICO's recent idiocy even more.

