

Introducing Collusion: Discover who’s tracking you online - tbassetto
http://www.mozilla.org/en-US/collusion/

======
JonoXia
Hello! Collusion main developer here. I am excited to be going live with this
project and thrilled to see it being discussed on Hacker News (where I am a
long time lurker, first time poster). I value the input from the community
here.

To answer an issue which Nostromo and others have brought up, I am well aware
that the addon is incomplete until it also includes data on Flash cookies,
tracking pixels, localstorage, iframes, useragent fingerprinting, etc. I plan
to add all of these things; the bug for adding Flash cookies is at
<https://github.com/toolness/collusion/issues/22> and I would greatly
appreciate help with implementation from anyone who's interested (hint, hint!)

I'm also working on making the graph actionable, i.e. you should be able to
click any node and say "Block" (or "whitelist" for sites you are OK with).
Firefox already has the ability to set site-specific 3rd party cookie
policies, but the interface to it can charitably be described as "for experts
only". Collusion could provide a much more usable way to control your
browser's policies.

The graph, for those who asked, is drawn using d3.js and SVG.

The demo does not require flash; it uses SVG. You just have to click "click
here".

~~~
gghh
Hi JonoXia. I have been using Collusion for quite a while; I guess I got
the pointer here at HN some time ago: <http://collusion.toolness.org/> So what
is happening? Is the "new" Collusion somehow more "official", i.e. released by
Mozilla?

EDIT: found the old HN post, <http://news.ycombinator.com/item?id=2741249>

~~~
JonoXia
Hi gghh, Collusion.toolness.org was started as a personal project/experiment
by my good friend Atul Varma. It's recently gathered a lot of interest within
Mozilla; we're talking about maybe eventually making it into a built-in
Firefox feature (no promises). As described on mozilla.org/collusion, we've
got a grant from the Ford Foundation to support development on it. So yeah,
what's happening now is basically that Mozilla is committing some real
development resources to turn it into a proper product.

------
nostromo
All of these services / plugins / demos are incomplete without analyzing Flash
cookies. I worry the result is people thinking they're more anonymous than
they really are. (You can view your Flash cookies here:
[http://www.macromedia.com/support/documentation/en/flashplay...](http://www.macromedia.com/support/documentation/en/flashplayer/help/settings_manager07.html))

I'd really like to see the browsers take on this practice. Safari, for example
disables 3rd party cookies by default, but leaves open this huge hole via
Flash.

~~~
blasterford
Even stranger to see no mention that cookies aren't needed at all to track
users successfully.

Check your browser at <http://panopticlick.eff.org/>

Basically, using useragent, plugins, time zone, language, screen size, etc etc
etc, you can fingerprint a user pretty reliably without using cookies.
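
As a rough illustration of the idea (not Panopticlick's actual method), a server can combine whatever attributes the browser reveals into one stable identifier by hashing them. The attribute names and sample values below are assumptions made up for the sketch:

```python
import hashlib


def fingerprint(attributes: dict[str, str]) -> str:
    """Hash a set of browser attributes into one stable identifier."""
    # Sort the keys so the same attributes always yield the same string,
    # regardless of the order in which they were collected.
    canonical = "\n".join(f"{key}={attributes[key]}" for key in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical attribute values, for illustration only.
visitor = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64; rv:10.0.2) Gecko/20100101 Firefox/10.0.2",
    "timezone_offset": "-480",
    "language": "en-US",
    "screen": "1920x1080x24",
    "plugins": "Flash 11.1;Java 1.6",
}
same_visitor_later = dict(visitor)                   # no cookie needed to match
other_visitor = dict(visitor, screen="1366x768x24")  # one attribute differs

print(fingerprint(visitor) == fingerprint(same_visitor_later))
print(fingerprint(visitor) == fingerprint(other_visitor))
```

The more attributes go into the hash, the fewer browsers share the same value, which is why unusual setups stand out.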

Cookies are just an easy way to track users client side, but if there is an
'assault' on cookies, then people will just start relying more on server side
tracking of users instead.

Disabling 3rd party cookies etc really achieves nothing.

Also, there are numerous methods you can use to store "cookies" in the browser
these days (localStorage API, HTTP cookies, Flash, cache, etc.).

If you _really_ don't want to be tracked for some reason, disable javascript,
clear out your user-agent, and use TOR.

~~~
pnathan
I would love to see a plugin that mimics the statistically normal setup. (all
3 of the browsers I currently have open read in as unique)

~~~
lwat
Incognito mode does that fairly successfully. Try the Panopticlick link in
incognito, you'll see what I mean.

~~~
toyg
"only one in 1,024,434 browsers have the same fingerprint as yours."

 _fairly successful_? eh.
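
For scale, a one-in-N fingerprint carries log2(N) bits of identifying information, which is the unit Panopticlick also reports. A minimal sketch of that arithmetic (the function name is made up for illustration):

```python
import math


def identifying_bits(one_in_n: float) -> float:
    """Bits of identifying information carried by a one-in-N fingerprint."""
    return math.log2(one_in_n)


bits = identifying_bits(1_024_434)
print(f"{bits:.2f} bits")
```

The figure quoted above works out to just under 20 bits.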

~~~
Jimmie
I think that you are that one, as well. Do the test in a normal window and
then in an incognito window. Each "Browser Characteristic" had the same result
in both.

------
magicalist
I do find it somewhat ironic that the demo page includes the old

> _If you're not paying for something, you're not the customer; you're the
> product being sold._

<http://www.mozilla.org/en-US/collusion/demo/>

I also find the name "collusion" unfortunate. Part of my paycheck comes from
advertising, and I include google analytics on my site. However, I also work
hard to have a crystal clear privacy policy and I don't opt in to the shared
analytics logging for my site, so the data only goes to me. But I'm lumped in
with the scummiest of ad networks.

Mozilla, of course, offers a free browser. Their funding ultimately comes from
the "collusion" they're talking about here and the search traffic they
generate feeds it. More directly, you can argue that they sell our search data
to the highest(?) bidder. Why isn't DDG the default search provider? Why
aren't third party cookies disabled by default and the Do Not Track header
enabled by default?

These are actually hard questions, and trite soundbites that ignore actual
economics and the tensions inherent in the internet we have today do us no
favors. Transparency _is_ the answer in many of these problems we've created
for ourselves, I believe, but we need to be able to talk about them with equal
intellectual clarity.

edit: as an example, I really liked EFF's Peter Eckersley's quotes in the ars
technica article on DNT today:

[http://arstechnica.com/tech-policy/news/2012/02/can-do-
not-t...](http://arstechnica.com/tech-policy/news/2012/02/can-do-not-track-
tame-the-webs-cookie-monsters.ars)

~~~
gcp
What is your criticism here exactly, given that you find this "ironic"? You
call for transparency - that's exactly what the published tool provides.

I think the understanding is that advertising is required for many free things
on the net (including Firefox), but that unless advertising behaves in a
reasonable manner, and the user has some control and understanding over it,
it'll be self-defeating as everybody will go start running Adblock, Ghostery,
etc.

Do Not Track works due to exactly the same economics. I think there was a
public statement that if Do Not Track were enabled by default - no-one would
respect it.

There is no conflict of interest here. You either self-police or you're shut
down.

~~~
magicalist
My criticism is exactly what I wrote. If that "you're the product" statement
is always true, then we are necessarily Mozilla's product that they sell to
Google. If it's not always true or is in no way an informative reductive
statement (in that all economic gradients could be "products" in a trivial
sense), then it should be dropped from the page because it serves as nothing
more than a (disingenuous) marketing slogan. The purpose of this tool is
ostensibly activism from transparency, not activism from the rabble-roused.

It's a similar argument for the name "collusion". Good marketing, yes, and
maybe that's important to get attention and the name isn't that negative. At
the same time, it paints everyone with the same brush when, again, the
ostensible goal is not to shut down all ad networks on the internet, or the
sites that are funded by them, but to give transparency into the links between
the ad networks and the sites we visit, in order to hopefully force
responsible behavior and give plain choices to end users.

And that's why I pointed to the EFF's statements in that article. They
acknowledge the tensions inherent in the internet we've built and inherited,
and postulate that it's possible to force everyone to be better actors without
having to burn the house down. Moreover that meaning is exactly what they say,
without resorting too much to rhetoric.

------
nobody_nowhere
In case anyone is curious, a couple of the key use cases for the tracking
pixels are:

1\. Tracking ad frequency and performance (e.g., did you buy something after
you saw an ad; don't show you more than X ads for a given product)

2\. ID synchronization between ad exchanges, ad buyers, and data targeters

3\. Retargeting (e.g., showing you an ad after you've been to a site)

4\. 3rd party data: things like guessing whether you're interested in cars or
ceramic figurines and selling the ability to target you with ads

5\. Site performance data (omniture, google analytics etc)

------
redstripe
I find that unlike disabling cookies altogether, the web is very usable with
cookies set to expire when you close your browser. So all cookie based
tracking is reset whenever I restart. In firefox you can alter rules on a per
site basis. I only enable permanent cookies on my banking site which asks
annoying "secret questions".

This demo only showed me 3 cookies from IMDB (which I browsed in this session
I guess).

Note: the "restore my last session" link in firefox will work against this
setting and reload your cookies from the last session.

~~~
rabidsnail
Anyone who cares will be using flash cookies, local storage (or the user
storage behavior thingy in IE < 8), etags, etc. Web browsers are full of tiny
crevices for tracking tokens to hide.

~~~
stingraycharles
Premium advertisers generally avoid these kinds of practices; basically they
don't do business with agencies that use flash cookies and/or browser hashing.

Typically, an ad network wants to conform to the guidelines set by the IAB
[1], which explicitly recommends against flash cookies, calling them illegal
[2].

So, all in all, if something can hurt a brand's reputation among consumers,
advertisers generally don't spend their money there. Shady practices like
these are among them.

[1] <http://www.iab.net/>

[2] [http://www.iabeurope.eu/news/iab-europe-
condemns-%E2%80%98re...](http://www.iabeurope.eu/news/iab-europe-
condemns-%E2%80%98re-spawning%E2%80%99-as-an-illegal-marketing-practice.aspx)

~~~
rabidsnail
That only applies to organizations that have a presence in the EU (which
probably includes most of the big ad networks, since they have international
sales teams). Smaller players domestic to the US don't have the same
constraints. Also, it's not just ad networks that do browser tracking.

------
nthitz
Introducing? Here's a post about it from >6 months ago
[http://threatpost.com/en_us/blogs/collusion-firefox-add-
pain...](http://threatpost.com/en_us/blogs/collusion-firefox-add-paints-
picture-web-tracking-070811)

~~~
gkoberger
Mozilla works in the open, so our "launches" are rarely new or secret
projects. Everything is on GitHub the day it's written. Most things we launch
have been around for a while; a launch just signifies it's ready for public
consumption.

(Note: I work at Mozilla, however have no clue specifically about Collusion.)

------
jakubw
I couldn't find a link to the source code but it seems to be this project on
GitHub: <https://github.com/toolness/collusion>

~~~
Wilya
It is indeed this one. It is linked to at the end of the demo.

------
jeremyarussell
This is great; I'm a big fan of transparency. I wonder how long it will take
for the sites in question to either a) throw a fit about Collusion, or b)
start making some workarounds to prevent it from working right. Which of
course will be followed by the Collusion people figuring out how to keep up
the transparency. It's times like this I wish I had tons of money; I'd donate
a grip to Collusion and anyone else helping prevent sneaky corporations from
being, well, sneaky.

~~~
draggnar
The one I consider the worst is IMR Worldwide. The best I can make of it is
they are affiliated with Nielsen. They don't even have a website. How do I opt
out of their tracking? How do I tell this private corporation to stop
monitoring me? DoubleClick is very clear. They have a website. I can opt-out.
The issue here is transparency.

------
instakill
I just, erm, browsed around in private mode on FF for a bit and when I went
back to normal mode, some of those sites are on the graph.

~~~
JonoXia
Hi, Collusion dev here. I don't think that's supposed to happen, as cookies
should not be written when in private browsing mode. I'm guessing it's a bug
in Collusion's visualization.

If you can reproduce the bug using 'generic' websites (I don't want to know
about your private browsing habits), would you mind filing a bug report in
<https://github.com/toolness/collusion/issues> ? It would be a big help.
Thanks a lot.

~~~
mbrubeck
Issue filed and pull request sent. Thanks for the bug report, instakill! :)

<https://github.com/toolness/collusion/pull/67>

------
glinkov
Chrome is probably not going to match this service, as google would prefer to
track us. Will this drive people back to Mozilla?

~~~
brown9-2
1\. There are plenty of Chrome extensions which block trackers like this, such
as Ghostery

2\. You can disable third-party cookies in Chrome's preferences if you like

------
rohit89
Creepy. I wonder if a possible solution (for the tracking across sites part)
is to make sure that each website can access only the cookies it creates. For
example, Chrome creates a separate process for each tab and the process would
have a lock on its cookies such that other processes can't access it. A VM for
each process in a way.

~~~
gbhn
The cookie protocol already works this way, and has for as long as cookies
have existed. The issue is that a website can load content from a third-party
domain, and cookie implementations have historically treated those requests
as cookie-able. (Hence "tracking pixels," third-party
ad services and stats collection, etc.)

The demo mentions the privacy concerns of third-party cookies, but does not
mention that there are significant positive uses for such cookies. In the
demo, reference.com sets cookies for thesaurus.com and dictionary.com, which
are different domains run by the same operator. Such cookies allow the
operator to provide customized services across the domains. Third-party
cookies allow for richer embedded-content experiences. Video is a great
example of that.

The revenue sites can get from third-party cookied ad networks is
significantly higher than from unpersonalized third-party ads. Large sites
like HuffPo and NYT are able to sell a lot of their premium inventory
directly, so a big reduction there is likely to disproportionately affect
smaller sites.

~~~
Sander_Marechal
I believe rohit89 means something else. Imagine you visit IMDB, which contains
resources from doubleclick.net that set a doubleclick.net cookie. Next, surf
to the WSJ, which also has doubleclick.net content. Your browser could simply
not send the doubleclick cookie because it knows that it came _through_ IMDB
and not through the WSJ.

In order to do this, a browser (or extension) would need to track through
which sites a 3rd party cookie was set, and only send the 3rd party cookie if you're
on that same site again. Else it should just pretend there is no 3rd party
cookie.
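
A minimal sketch of that bookkeeping, assuming a cookie jar keyed by both the site in the address bar and the domain that set the cookie. The class and method names are hypothetical and not taken from any browser or extension:

```python
from collections import defaultdict


class PartitionedCookieJar:
    """Store cookies per (first-party site, cookie domain) pair."""

    def __init__(self) -> None:
        # (first_party_site, cookie_domain) -> {cookie name: value}
        self._jar: dict[tuple[str, str], dict[str, str]] = defaultdict(dict)

    def set_cookie(self, first_party: str, cookie_domain: str,
                   name: str, value: str) -> None:
        """Record a cookie set by cookie_domain while visiting first_party."""
        self._jar[(first_party, cookie_domain)][name] = value

    def cookies_for(self, first_party: str, cookie_domain: str) -> dict[str, str]:
        """Return only the cookies that were set through this same first party."""
        return dict(self._jar.get((first_party, cookie_domain), {}))


jar = PartitionedCookieJar()
# doubleclick.net sets a cookie while the user is on IMDB.
jar.set_cookie("imdb.com", "doubleclick.net", "id", "abc123")

on_imdb = jar.cookies_for("imdb.com", "doubleclick.net")  # cookie is sent
on_wsj = jar.cookies_for("wsj.com", "doubleclick.net")    # pretend none exists
print(on_imdb, on_wsj)
```

With this layout the third party still sees a returning visitor on each individual site, but it cannot join the IMDB and WSJ visits through the cookie alone.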

I'd definitely install an extension that would implement this process.

~~~
rohit89
Yup, that's what I meant. Sandboxing on a per tab basis. To put it another
way, each new tab you open is like a new incognito session.

------
dubcanada
I actually don't mind being tracked... It doesn't really bother me at all. I'd
much rather have ads and stuff geared towards what I may be interested in
than some ad for a girls' pantyhose sale.

I know it sounds strange, but it doesn't bother me one bit.

------
newman314
I wonder if RequestPolicy helps block this.

My config is as follows:

1\. AdBlock (+privacylist)

2\. Ghostery

3\. RequestPolicy

4\. HTTPS Everywhere

5\. /etc/hosts with common tracking hosts pointing to 127.0.0.1

6\. Disconnect.me

7\. Disable 3rd party cookies

8\. Uninstall Flash (when I need Flash, I use Chrome)

9\. Configure Chrome Flash to not allow any local storage

10\. about:config set dom.storage.enabled to false.

This is just a start and it would be nice to have some consistent way to
disable localstorage.
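
For item 5, a small sketch of generating the hosts entries from a list of tracking hostnames. The hostnames are examples picked from this thread, and the helper name is made up; note that a hosts file matches exact hostnames only, so every subdomain needs its own line:

```python
def hosts_entries(tracking_hosts, sink="127.0.0.1"):
    """Return hosts-file lines that point each tracking hostname at the sink address."""
    # De-duplicate and sort so the output is stable between runs.
    return "\n".join(f"{sink} {hostname}" for hostname in sorted(set(tracking_hosts)))


# Example hostnames only; a real list would be much longer.
entries = hosts_entries([
    "www.google-analytics.com",
    "ad.doubleclick.net",
    "ad.doubleclick.net",
])
print(entries)
```

The printed lines would then be appended to /etc/hosts by hand or by a script run with the needed privileges.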

~~~
DellOrange12
My OS is Ubuntu and my Firefox config is:

* AdBlock Plus (with 16 filter subscriptions)

* BetterPrivacy

* User Agent Switcher

* RequestPolicy

* NoScript

* PrivacyChoice TrackerBlock

* Ghostery

* QuickJava

After installing Collusion and going to both the BBC webpage and a random
selection of Gawker Media's webpages, the Collusion graph is empty. I can now
confirm my selection of security plug-ins prevents tracking. _Sweet!_ :-D

------
jonpaul
Tip of the hat to Mozilla for innovating and trying to make Firefox the best
browser possible. It seems like Chrome usage keeps going up; I'm really happy
to see Mozilla not sitting around.

------
Sthorpe
Here is my version of a tracker I wrote. <https://github.com/sthorpe/tracked>

It sniffs your local packets and tells you all the sites that connect to your
computer while you browse.

Then I organize them by number of times connected to your computer. It reveals
some really weird sites. Like somehow pandora knows my age and sex....

~~~
computerbob
That is because in the settings of your pandora account it asks for your age
and your sex as a preference. They tell you as a side note that it is for ad
targeting. Pretty honest (IMHO)

~~~
Sthorpe
Ahh, I signed up so long ago I had forgotten.

------
brown9-2
Very interesting idea and well executed, but the layout of the graph nodes
makes it hard to tell who the worst sites are:

[http://f.cl.ly/items/3A2C0x2a1f0F370Y0d3E/Screen%20shot%2020...](http://f.cl.ly/items/3A2C0x2a1f0F370Y0d3E/Screen%20shot%202012-02-28%20at%201.55.05%20PM.png)

It also seems to want to push some graph nodes off the screen.

~~~
JonoXia
In a previous version of the visualization, the nodes would increase in size
proportionally to the number of incoming links. It was dropped during a UI
overhaul as it didn't fit the new visual style. But I'm looking for another
way to make the "worst" sites stand out in a crowded graph. I would welcome
your suggestions.
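
For readers picturing the earlier sizing rule, here is a sketch that counts each node's incoming links and grows the radius linearly with that count. The edge format, `base_radius`, and `per_link` are assumptions for illustration, not Collusion's actual code:

```python
from collections import Counter


def node_radii(edges, base_radius=5.0, per_link=2.0):
    """Map each node to a radius that grows linearly with its incoming links.

    edges: iterable of (visited_site, third_party) pairs.
    """
    unique_edges = set(edges)  # count each site-to-tracker link once
    incoming = Counter(target for _source, target in unique_edges)
    nodes = {node for edge in unique_edges for node in edge}
    # Counter returns 0 for nodes that have no incoming links.
    return {node: base_radius + per_link * incoming[node] for node in nodes}


radii = node_radii([
    ("imdb.com", "doubleclick.net"),
    ("wsj.com", "doubleclick.net"),
    ("imdb.com", "google-analytics.com"),
])
print(sorted(radii.items(), key=lambda item: -item[1]))
```

Sorting by the same count also gives a plain ranked list, which may be easier to read than node size in a crowded graph.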

------
zerop
Great. How does it work...

~~~
xpose2000
My guess is that it looks at HTTP requests on the site and looks to see if
cookies are created. Then matches them to a database of sites/services that
are known trackers.
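
If that guess is roughly right, the core of it could be sketched as below: walk a log of requests, keep the ones that go to a different host than the page and set a cookie, and record which pages each third party was seen on. The log format is an assumption, and comparing full hostnames is a simplification (a real tool would compare registrable domains):

```python
from collections import defaultdict
from urllib.parse import urlparse


def host(url: str) -> str:
    """Return the hostname of a URL, or an empty string if it has none."""
    return urlparse(url).hostname or ""


def build_tracker_graph(requests):
    """Map each cookie-setting third-party host to the page hosts it appeared on.

    requests: iterable of (page_url, resource_url, sets_cookie) tuples.
    """
    graph = defaultdict(set)
    for page_url, resource_url, sets_cookie in requests:
        page, resource = host(page_url), host(resource_url)
        # Keep only cookie-setting requests that leave the page's own host.
        if sets_cookie and resource and resource != page:
            graph[resource].add(page)
    return dict(graph)


# Hypothetical request log for two page visits.
log = [
    ("http://www.imdb.com/", "http://www.imdb.com/logo.png", True),
    ("http://www.imdb.com/", "http://ad.doubleclick.net/pixel.gif", True),
    ("http://online.wsj.com/", "http://ad.doubleclick.net/pixel.gif", True),
    ("http://online.wsj.com/", "http://cdn.example.com/style.css", False),
]
graph = build_tracker_graph(log)
print(graph)
```

A third party that appears under several different pages is exactly the kind of node that links sites together in the visualization.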

Essentially, just about any banner advertisement will show up on Collusion.
Also, any type of javascript visitor tracking like GetClicky or Google
Analytics.

~~~
Karunamon
As if there wasn't a compelling case for Adblock and the like already..

~~~
gcp
Adblock doesn't protect you against this. But something like Ghostery sure
does.

------
jdangu
Publishers are lacking control over who tracks what on their site. My full
time project <http://www.clarityad.com> monitors third-party ad tags and
reports pixels dropped.

------
donohoe
If anyone plans on porting this to a Chrome extension please let me know!

------
akashshah
Does anyone know the library that is used to plot the graphs dynamically?

~~~
JonoXia
It uses the Force-Directed Graph library from d3.js.

<http://mbostock.github.com/d3/ex/force.html>

------
newman314
Actually, my bigger fear is for mobile browsers which are much further behind
in terms of such features as compared to desktop browsers.

~~~
Teapot
And it's a growing problem. For now I guess the only protection is to clear cache
and cookies, and then load the browser with opt-out cookies.

------
figure8
Does my accepting Google's new privacy policy allow them to link information
possibly gained via Google Analytics (using their ubiquitous cookies like
_utma, _utmb, _utmz) to my other Google product activities?

~~~
tonfa
I don't think so, as far as I understand analytics, there is nothing to link
with other Google products since it only uses first party cookies.

[http://code.google.com/apis/analytics/docs/concepts/gaConcep...](http://code.google.com/apis/analytics/docs/concepts/gaConceptsOverview.html)

On a side note, the doubleclick cookie is explicitly excluded from linking
without explicit consent:

> We will not combine DoubleClick cookie information with personally
> identifiable information unless we have your opt-in consent.

~~~
figure8
Thanks. This was helpful.

In theory, I see your point about Google not having direct access to a site's
first party cookies, but Google is clever. If you monitor the requested
resources when you visit any site with those cookies, you'll notice a request
to <http://www.google-analytics.com/__utm.gif>? followed by parameters sending
the values of those cookies to Google. So, they are tracking said cookies.
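
Anyone who wants to check this for themselves could decode such a request along these lines. The sample URL is invented for the sketch, and the `utmhn` and `utmcc` parameter names in it are assumptions; the function does not rely on them and simply looks for any parameter whose decoded value mentions a `__utm` cookie:

```python
from urllib.parse import parse_qs, urlparse


def cookie_values_in_request(url: str) -> dict[str, str]:
    """Return query parameters whose decoded value contains a __utm cookie name."""
    query = parse_qs(urlparse(url).query)
    found = {}
    for name, values in query.items():
        for value in values:
            if "__utm" in value:
                found[name] = value
    return found


# Invented example request; real requests carry many more parameters.
sample = (
    "http://www.google-analytics.com/__utm.gif"
    "?utmhn=example.com"
    "&utmcc=__utma%3D1.2.3.4.5.6%3B%2B__utmz%3D1.2.3.4.utmcsr%3D(direct)%3B"
)
found = cookie_values_in_request(sample)
print(found)
```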

~~~
figure8
Furthermore, going by a quick estimate based on my current Firefox cookies,
Google is (through Google Analytics) aware of 80%-90% of my browsing activity.

~~~
tonfa
But the cookie is per analytics domain, right? And not directly linked to
other google cookies.

So I think Analytics "sees" 80% of your browsing but it's multiple fragments,
each going to different analytics domains. It's never seen as a single user.

~~~
figure8
Hmm. Thanks for your help; I'm sorry if I'm not understanding.

Can't Google make certain (admittedly imperfect) inferences based on seeing
the same IP address visit gmail.com, then AllThingsD.com, then CA.gov, etc?
(All of which use Google Analytics) It doesn't take rocket science to place a
high probability that the IP address that visited gmail is also me across
all those domains.

~~~
tonfa
Just because something is technically possible doesn't mean it should (or can)
be done.

Why go through all the pain of setting up analytics so that it uses a first
party cookie per analytics domain and serves the javascript and the tracking
gif from google-analytics.com (so no google.com cookies are transmitted), only
to then try to reconstruct user behaviour based on IPs?

That would be quite deceptive (and might not be allowed by the FTC and the
privacy policy).

And they can already do global tracking with the doubleclick cookie. But it is
explicitly not allowed to link it with other data without consent.

Edit: by the way, thanks for your questions, I never really dug into this
and also at first assumed they would have access to a lot more information.
But analytics is actually pretty well designed.

------
joejohnson
Does the demo require flash? I can't see anything.

~~~
brown9-2
You have to click "click here"

------
twapi
Use Adblock Plus. Stay Happy. :)

------
Craiggybear
Very interesting, however it makes Firefox 10.0.2 crash if you refresh the
page or click off the collusion tab then back on to it. At least it does in
Linux.

Will try it out on OS X later ...

~~~
gcp
Please file a bug if this is reproducible.

