
Unsanctioned Web Tracking - cpeterso
http://www.w3.org/2001/tag/doc/unsanctioned-tracking/
======
codezero
I work for an analytics company that doesn't share data with third parties and
doesn't use any questionable persistence techniques (we set a domain cookie,
that's it), and I can say that unsanctioned tracking hurts everyone.

We get lumped in with super questionable networks in tools like Adblock and
Ghostery, and when I tell people I work in analytics I have to explain that we
aren't evil, because the default assumption is increasingly that analytics and
ads are all doing shady things. That sucks. I wish there were a way to combat
unsanctioned tracking without diminishing the value that legitimate,
user-controlled tracking and attribution tools provide.

As long as people are willing to do anything to squeeze out more money or beat
their competition, questionable techniques will find their way into the
market, and that's bad for us all :(

~~~
dkfsls
It's extremely difficult to tell good analytics companies from bad ones. Every
analytics company is tracking data for multiple websites and therefore can
track people across the web. How is the user supposed to know what you are
doing with their data? Even if you aren't doing anything now, how can they be
sure that won't change, especially if the company is sold?

~~~
codezero
This is sort of true, but it depends. We set a first-party, domain-specific
cookie. We can't track a user across different domains or customers.
Technically we could correlate based on IP and activity times, but that's not
the same as setting a super cookie that is shared between sites.
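For concreteness, a first-party, domain-scoped analytics cookie of the kind described can be sketched like this (the `_visitor` name and `setVisitorId` helper are illustrative, not codezero's actual product; the `doc` parameter stands in for `document`):

```javascript
// Minimal first-party analytics cookie: it is scoped to the current
// domain, so scripts running on other sites cannot read it.
function setVisitorId(doc) {
  const match = doc.cookie.match(/(?:^|;\s*)_visitor=([^;]+)/);
  if (match) return match[1]; // returning visitor: reuse the existing ID

  // New visitor: mint a random ID and store it on this domain only.
  const id = Math.random().toString(36).slice(2, 12);
  doc.cookie = `_visitor=${id}; path=/; max-age=${60 * 60 * 24 * 365}; SameSite=Lax`;
  return id;
}
```

Because such a cookie never leaves the one domain, cross-site correlation would require exactly the out-of-band matching (IP addresses, activity times) mentioned above.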

You are still right, though: how is the user supposed to know whether one tool
is reputable and another isn't? Worse than that, one may be fine today, but
then it gets acquired and someone starts putting the pieces together and uses
your historical data in good_tool to link you in bad_tool.

I don't have a good answer. As the other commenter mentioned, regulation may
help.

Beyond that, some kind of standardized policy that can be checked and tested
would be nice.

~~~
teej

    We can't track a user across different domains, or customers.

To be super clear: yes, you can. You don't. That's very different. With full JS
access on a site you have the ability to collect a lot of information. As
another poster mentioned, it only takes ~30 bits of entropy to identify all 3
billion internet users.
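As a sanity check on that figure: n bits of entropy distinguish at most 2^n users, so the bits needed for a population is ceil(log2(n)) — strictly about 31.5 bits for 3 billion people, so the quoted 30 is a slight underestimate but the right order of magnitude:

```javascript
// Bits of entropy needed to uniquely distinguish a population of n users.
const bitsToDistinguish = (users) => Math.ceil(Math.log2(users));

console.log(bitsToDistinguish(3e9));     // 32 (log2(3e9) ≈ 31.48)
console.log(bitsToDistinguish(2 ** 30)); // 30, i.e. ~1.07 billion users
```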

~~~
codezero
You're right. My bad. We don't. Not we can't.

Technically, though, we can't right now, since we'd have to dedicate
engineering time to making the changes necessary for that kind of tracking,
and we're not going to :P

------
frik
Browsers should, as the default setting, disallow third-party JS. Problem
solved!

Analytics and ad-network companies could create on-premise products that can
be installed on the website's server. Problem solved.

Win-win.

------
mhkool
Tracking started with single site tracking and worked well with a simple
cookie. There was no overhead and no bloat like heatmaps and 5-level nested
scripts that display an invisible pixel. It was not evil except that users did
not know that they were tracked.

Then the evil started with multi-site tracking. Trickery was required to
implement cross-site tracking, and the advertisers became so obsessed with
'knowing the user' that they overdid it. I would like to see proof that
today's excessive tracking really pays off with higher click rates.

A lot of trickery uses javascript, and only a few users disable javascript
since almost all sites use it and sites get (seriously) crippled if javascript
is disabled. I am therefore waiting for a browser vendor who recognises this
problem and comes up with a 'Javascript light' where only a small subset of
javascript -- just enough to build great responsive websites -- is supported.
Javascript light would not allow generating heatmaps, creating invisible
pixels, or uploading system information. Surely you will say 'web programmers
need system information', but this can be provided in a different way, e.g.
browsers can send a header with the information that they want to give.

Another evil is the social networks, since they have their 'like' button on
almost every website that you visit and hence know _exactly_ what a user does
24 hours per day. Just like the 'do not track me' feature, the web needs a new
'no like buttons for me' feature.

~~~
Jgrubb
What features would this light version of Javascript support, and what
features would it not? Obviously `new Image()` is out, as is any kind of
cookie support, right? So that means no ability to log in? I don't think any
browser vendor would bother spending developer time on a project that would
severely break pretty much every site out there.

Not to mention, you must realize that any and all methods that were available
would eventually be exploited for tracking users in the absence of the usual
methods.

This is a situation that is only going to be "fixed" with legislation and
regulation.

~~~
mhkool
I am not against legislation but also not optimistic about politicians being
able to make a proper law.

You are correct, 'new Image' in javascript would be out. And the question is
whether that really hurts. Do you want to dynamically create an image in
javascript and let the decision-making happen inside the browser of an end
user? Or can the 'new Image' functionality of javascript be considered bloat,
because the decision to put an image somewhere on the page can be made
perfectly well on the web server?

And we can go on, should it be possible to do a POST from javascript code, or
should a browser only allow/do a POST when a user presses a button?

------
userbinator
_Encourages browser vendors to expose appropriate controls to users who wish
to minimize their fingerprinting surface area._

This is something everyone should be encouraging - unfortunately, browser
vendors seem to be slowly removing and/or making more opaque any
configurability, in the name of "simplicity". I agree it certainly is simpler
to not think about web tracking or privacy at all, but perhaps these are
things worth thinking about.

~~~
meowface
It's really difficult, though. There are dozens of different ways someone can
be fingerprinted [1], and you may have to make serious or even crippling
changes to browser features to mitigate some techniques.

It's not fair to say "privacy is dead", but I think it's safe to say "browser
fingerprint evasion is dead". Tools that easily and selectively block unsavory
companies and networks (ad or otherwise) while letting users allow some
things, like uBlock and uMatrix, are probably the only feasible solutions.

[1] [https://www.chromium.org/Home/chromium-security/client-identification-mechanisms](https://www.chromium.org/Home/chromium-security/client-identification-mechanisms) (This still isn't a totally comprehensive list.)

------
Animats
Returning bogus information to trackers is one way to fight back. If the
tracking data quality can be destroyed, it will become useless to advertisers
and tracking companies will go broke.

Meanwhile, use Ghostery and block everything. A few sites won't work, but
there are better alternatives for most of them. With all tracking blocked, you
can't watch ABC-TV, but in exchange, commercials are skipped on CBS-TV shows.

------
biturd
Can someone tell me how these super cookies work? Cookies can only be read on
the domain/origin that set them, so how is a cookie passing data off to other
domains?

Or is it as simple as two companies working together, fingerprinting and
matching users, and that being what's called a super cookie?

~~~
nl
SuperCookies are generally considered to be cookies that recreate themselves
after deletion. A variety of techniques are used for this, ranging from
cooperating domains to flash cookies to local storage.

Information is passed between sites using JavaScript and backend cooperation
on ID matching.
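The respawning behaviour can be sketched as: mirror one ID across several storage mechanisms and, on every visit, restore it from whichever copy survived deletion. This is a simplified, evercookie-style illustration; `respawnId` and the minimal get/set store interface are stand-ins for `document.cookie`, `localStorage`, Flash LSOs, and so on:

```javascript
// Simplified supercookie respawn: the ID is mirrored across several
// storage backends, so clearing only one of them does not remove it.
function respawnId(stores, makeId) {
  // Recover the ID from any store that still has a copy...
  let id = null;
  for (const s of stores) {
    if (s.get("uid")) { id = s.get("uid"); break; }
  }
  if (!id) id = makeId(); // ...or mint a fresh one for a new visitor.

  // Re-copy it into every store, undoing any partial deletion.
  for (const s of stores) s.set("uid", id);
  return id;
}
```

Fully deleting such an ID requires clearing every backend at once, which is exactly why these cookies are considered unsanctioned.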

------
Sephr
I agree that supercookies and header enrichment should be prevented whenever
and however possible (e.g. header enrichment will be solved by requiring
encryption à la Let's Encrypt), but fingerprinting is a lost battle that we
should all give up on.

We will never be able to solve fingerprinting without upending the entire web
platform as we know it. So many web APIs are simply not possible without
exposing some UA capability and configuration variance. For example, it is
impossible to support WebGL without exposing additional UA variance for
fingerprinting.

Computers will always have varying capabilities and configurations, and
developers will always need to consider some of them. A world without
fingerprinting is a world without the modern web.

It only takes ~30 bits of entropy to uniquely fingerprint _all ~3 billion
internet users_. We already expose this much variance entropy, and it is only
going to increase as the web gets new features.

I implore you all to simply _give up_ on fighting fingerprinting. Try to stop
worrying about it, as there's almost nothing we can do short of the nuclear
option of removing every API that exposes UA variance (which will make the web
less useful).

We have already lost, and every new feature makes the hole a little deeper.
The hole is already too deep to escape, so accept that you will be tracked by
colluding websites whenever you browse the web.

---

The W3C page comes to roughly the same conclusion, but recommends a very
drastic and dangerous solution: legislation. I fear that legislating this
issue will legitimize only select pre-approved uses of UA variance entropy,
and will hinder developer innovation in the long run. I'd rather be
fingerprinted than be held back by legislation as to what browser data I'm
allowed to read, and in what manner I can act on that data.

Please do not lobby for legislation in order to fix this problem. W3C's
proposed solution will most likely only cause more harm than good. I would be
deeply upset if "intent to fingerprint" became an actual crime.

One example of something useful that such legislation may make illegal is my
navigator.hardwareConcurrency polyfill[1] that runs a timing attack on your
CPU (not unlike "The Spy in the Sandbox" linked to in the W3C page) to figure
out how many cores you have. This information is actually useful for
optimizing heavy multi-threaded webapps, but it is also directly useful for
fingerprinting. Future legislation could make it so that using my polyfill,
even for benign purposes, counts as "intent to fingerprint".

People do not deserve jail time or fines based on whether a tech-illiterate
jury judges them to harbor "intent to fingerprint". The future will be a very
scary place for developers if you actually have to worry about this.

[1]: [http://wg.oftn.org/projects/core-estimator/demo/](http://wg.oftn.org/projects/core-estimator/demo/)

~~~
pdkl95
This defeatist attitude is dangerous, especially when the solution is _simple_
: just stop browsers from leaking >= ~30 bits of entropy.

Deprecate HTTP headers that leak entropy (like the user agent). Rewrite fields
like If-Modified-Since so they can only express values quantized no finer than
whole days. Remove JS APIs that leak information (like the ability to read CSS
attributes). Impose stricter same-origin policies to eliminate 3rd-party
cookies and javascript. Some people will complain that this breaks some use
cases. As Dan Geer put it when discussing software liability, "Yes, please!
That was exactly the idea."

Unless a platform puts user safety _first_ - without exception - it inevitably
creates moral hazard. If for some reason this does not entirely fix the
problem, then we apply the force of law - just as we do in every other area of
society. If this concerns you, you should encourage self-policing and the
removal of business models based on any kind of fingerprinting, so that no
legal remedy is necessary.

People may indeed deserve jail time (or other legal remedy) for _stalking_.
Technical literacy does not exempt you from social responsibility. As for your
concerns about a jury: the problems with our legal system are far broader than
your concerns over "technical literacy". A lot of work is needed in that area,
with great urgency. That aside, a jury is also not expected to be expert in
advanced kinematics when it hears a case involving cars that crashed into each
other at an intersection. It is the responsibility of the lawyers involved to
explain such technical details to the jury. My grandfather - a physicist who
reconstructed accidents and was a frequent expert witness - gave quite a few
remedial lessons in physics from the witness box.

I understand the concern about having to worry about this kind of legal
threat. It _is_ scary, but you will learn to live with it, just as surgeons
learn to live with the possibility of malpractice charges, or civil engineers
with liability if the building they design falls down. Really, the concerns of
a developer shouldn't be that bad compared to those of a doctor or civil
engineer, who has to worry about people _dying_ if they make some kinds of
mistakes.

What I find a far scarier future would be the future where people are not only
afraid to speak their mind out of fear of being recorded, but where they are
afraid to even seek out knowledge because of the trail it leaves. Our judicial
system certainly has problems, but I'll take it over _de facto_ feudalism,
where the only people that can freely speak their mind are the lords that
control the aggregate databases of everything their peasants do.

By the way - while it certainly isn't perfect, the EFF's Panopticlick tool
reports my browser as leaking only 14.03 bits of entropy. The user agent
accounts for ~9 of those bits, and ~4 more bits come from the HTTP accept
headers. Both of those are trivially removable, and the remaining entropy
would not be easy to fingerprint with. I'm sure this analysis misses some
entropy sources, but it should be sufficient to show that _it is possible_ to
fix this problem.
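To put those bit counts in concrete terms: k bits of fingerprint entropy partition users into 2^k buckets, so each extra bit halves the anonymity set — the crowd of users indistinguishable from you. Using the figures above (illustrative arithmetic only, assuming uniformly distributed entropy):

```javascript
// Expected anonymity-set size: the number of users who share your
// fingerprint, given a population and `bits` bits of fingerprint entropy.
const anonymitySet = (population, bits) => Math.floor(population / 2 ** bits);

console.log(anonymitySet(3e9, 14)); // 183105: 14 bits still leaves a crowd
console.log(anonymitySet(3e9, 30)); // 2: ~30 bits makes you nearly unique
```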

~~~
Sephr
Panopticlick doesn't use everything available. WebGL alone adds an additional
5.11 bits of entropy[1]. Other things such as your local network address from
WebRTC, core count, etc. all can add a lot more entropy for fingerprinting.

[1]:
[http://arxiv.org/pdf/1503.01408.pdf](http://arxiv.org/pdf/1503.01408.pdf)

------
MichaelCrawford
The easiest way to spot web bugs is to use a very old build of Safari. There
are many other ways, but if you have an old Safari, select Window -> Activity
and leave it open as you browse a few different domains.

You will see some 43-byte documents with huge long URLs full of query
parameters, and also one-byte javascript sources. I block the ones I find in
my hosts file:

    127.0.0.1 www.hosted-pixel.com

On some operating systems it may be better to use 0.0.0.0, but I'm not
completely sure. Alternatively, block them in your firewall.

Among the reasons I don't install any mobile apps other than those I
absolutely require - not even free ones - is mobile analytics. The developer
SDKs are all free as in beer, but I have seen a photo of one of their data
centers. Data centers are expensive; someone must be paying for all that.

You can edit the hosts file on iOS with iFile from the Cydia app store if you
jailbreak; alternatively, you can maintain the hosts file on your box and then
install it on your device with scp.

Similarly for Android, but I don't know what text editors you can use to edit
system files. Some Android devices let you install your own firmware build,
but I don't have a current list.

I am concerned about the impact analytics may be having on democracy. It's not
really a secret ballot if the candidates all know what pages I read.

~~~
MichaelCrawford
Just last night it occurred to me to write browser add-ons that would scramble
those query parameters and also send bogus user-agent headers, but only for
the web bugs.

That is, the add-on would discover one-pixel transparent gifs, take note of
what query parameters they used, then every time it found that same gif in the
future it would issue a GET with randomly selected parameters drawn from the
instances of that same gif it had seen in the past.

For extra crispy electronic warfare, that same add-on could issue GET requests
at randomly selected intervals. If those intervals were reasonably far apart
(say one hour apart for any one gif) then it would not obviously be an attack
on the analytics server.

What it would do is make it completely useless to correlate your visits to
different domains.
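The scheme described above — remember the parameter values seen for each tracking pixel, then replay random recombinations of them — might look like this (a hypothetical add-on sketch; in a real extension, `seen` would be populated by observing actual network requests):

```javascript
// For each tracking-pixel URL, remember every value observed for each
// query parameter across past page loads.
function recordParams(seen, url) {
  const u = new URL(url);
  const key = u.origin + u.pathname;
  const params = seen.get(key) || {};
  for (const [k, v] of u.searchParams) {
    (params[k] = params[k] || []).push(v);
  }
  seen.set(key, params);
}

// Build a bogus request for a pixel: draw each parameter independently
// from past observations, producing combinations that never really happened.
function bogusUrl(seen, pixel) {
  const params = seen.get(pixel) || {};
  const u = new URL(pixel);
  for (const [k, values] of Object.entries(params)) {
    u.searchParams.set(k, values[Math.floor(Math.random() * values.length)]);
  }
  return u.toString();
}
```

Issuing such URLs at widely spaced random intervals, as suggested, would poison the tracker's dataset without looking like a flood attack.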

~~~
pdkl95
Why bother randomizing the query parameters, when you could just permanently
cache the image?

~~~
MichaelCrawford
If I permanently cache the image or block the server with my hosts file then
the web analytics services will not track me.

If I randomize the query parameters, then they will not track anyone, because
they won't provide useful information to those who presently purchase it. Of
course that will require far more people than me to also randomize their query
parameters.

