Interesting that they use the list of trackers from TrackerBlock. The license provided is:
> We reserve our copyright as to commercial applications but please contact us if you are interested in licensing for non-profit or educational uses.
> Our source code is available to review for your assurance.
In their extension, the "trackers.json" file is dated 8/Feb/2012, so it is almost two years old now.
Looks interesting. To take it for a quick spin, I tested it with a small set of bookmarks. Then I deactivated noScript and Disconnect and reactivated them individually. (Screenshots at
So the result is that three sites do not incorporate any third-party connections whatsoever (DDG, HN, fefe). Without the add-ons, the other sites form a connected graph. With Disconnect, the graph is less strongly connected. With only noScript, it starts to fall apart. With both activated, the primary sites are disconnected. (But the combination apparently breaks something, since a second Guardian primary node appears.)
A few caveats: first of all, this is of course not reproducible, since it depends on my whitelists for noScript and Disconnect. The test set is also not representative of anything except itself. And the absence of an edge in the graph does not mean the absence of a connection. With this in mind, I still found it quite interesting how connected even a small test set is.
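For anyone who wants to check "connected" versus "falls apart" numerically rather than by eye, here is a minimal sketch. It assumes each observed request from a first-party site to a third-party host can be treated as an undirected edge; this is a simplification and not necessarily how Lightbeam models its data. The function names and the example hosts are hypothetical. Blocking a tracker removes its edges, which is how one big component can split into several.

```python
from collections import defaultdict


def connected_components(edges):
    """Group hosts into connected components of an undirected graph.

    `edges` is an iterable of (first_party, third_party) host pairs.
    """
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Iterative depth-first search from a not-yet-visited host.
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        components.append(component)
    return components


def primary_sites_linked(edges, primaries):
    """Return True if every primary site lies in one shared component."""
    return any(primaries <= comp for comp in connected_components(edges))


# Hypothetical hosts for illustration, not the data from the screenshots.
edges = [
    ("news.example", "tracker.example"),
    ("shop.example", "tracker.example"),
    ("blog.example", "cdn.example"),
]
print(len(connected_components(edges)))  # 2
```

Sites with no third-party requests at all (like the three mentioned above) never appear in `edges`, so they simply do not show up in any component.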
This looks like a cool upgrade to Mozilla's Collusion add-on, which is no longer available. https://www.mozilla.org/en-US/collusion/ Edit: It even gave me a pop-up warning me that it's overwriting my Collusion data.
> Lightbeam began in July 2011 as Collusion, a personal project by Mozilla software developer Atul Varma. Inspired by the book The Filter Bubble, Atul created an experimental add-on to visualize browsing behavior and data collection on the Web.
> In September 2012, Mozilla joined forces with students at Emily Carr University of Art + Design to develop and implement visualizations for the add-on. With the support of the Ford Foundation and the Natural Sciences and Engineering Research Council (NSERC), Collusion has been re-imagined as Lightbeam and was launched in the fall of 2013.
FWIW, the RefControl add-on gives you more fine-grained control.
I use it to spoof the referrer as the root of the site when I link in, and to send the correct referrer when navigating within the site. In some rare cases I force the referrer to be Google, which gets you past some paywalls.
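The rule described here can be sketched as a small decision function. This is not RefControl's code or configuration format (the add-on is set up through its own options UI); it is a hypothetical Python function showing the logic, assuming that "same site" means an exact hostname match, so `www.example.com` and `example.com` count as different sites. The `override` parameter stands in for the per-site "pretend to be Google" case.

```python
from urllib.parse import urlsplit


def spoofed_referrer(target_url, real_referrer=None, override=None):
    """Choose the Referer to send for a request to `target_url`.

    - `override` set: always send it (e.g. a forced search-engine referrer).
    - No real referrer (typed URL, bookmark): send nothing.
    - Same host as the target: keep the real referrer.
    - Different host (linking in): send the target site's root instead.
    """
    if override:
        return override
    if not real_referrer:
        return None
    target = urlsplit(target_url)
    source = urlsplit(real_referrer)
    if source.netloc == target.netloc:
        return real_referrer
    return f"{target.scheme}://{target.netloc}/"


print(spoofed_referrer("https://example.com/a/b", "https://other.org/page"))
# https://example.com/
```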
Your enemy is 3rd-party cookies, not referer headers. This just makes it harder for sites to do mostly harmless analytics and in some cases it actually prevents security features from working correctly.
Indeed, there are a couple of sites that don't work at all or require an extra step; if I remember correctly, most of them mumble something about CSRF and are based on Django. For example, I can't log in to Launchpad or Coursera, and on the Fedora Accounts System website I'm required to push an "I'm a human" button.
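The Django connection is plausible because Django's CSRF middleware does strict Referer checking on HTTPS requests: a missing or cross-origin Referer on a POST is rejected (recent Django releases also consult the Origin header first). The function below is a simplified, hypothetical model of that check, not Django's actual code; it ignores ports beyond a plain string comparison and ignores `CSRF_TRUSTED_ORIGINS`. It shows why an add-on that strips the header breaks login forms.

```python
from urllib.parse import urlsplit


def referer_check_passes(is_https, host, referer):
    """Simplified strict Referer check for an unsafe (e.g. POST) request.

    Loosely modeled on the documented behaviour of Django's CSRF
    protection; not Django's implementation.
    """
    if not is_https:
        # The strict Referer check only applies to HTTPS requests.
        return True
    if not referer:
        # An add-on that strips the header ends up here.
        return False
    parsed = urlsplit(referer)
    if parsed.scheme != "https":
        # Insecure referer while the request itself is secure.
        return False
    # The referer must point at the same host that received the request.
    return parsed.netloc == host


print(referer_check_passes(True, "example.com", None))  # False
```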
Yeah, you do still get tracked, but you really screw up any sort of referrer tracking: you'll appear to be coming directly to every page you visit. Also, in some cases you may be blocked because you look like a bad crawler...
As a way to demonstrate to a lay user the insidious relationships on the web, it is pretty cool.
However, this doesn't seem like a good way to collect high-quality crowd-sourced data. It can easily be poisoned, and there are simpler alternatives, such as crawling the web and analyzing the links in-house (I am assuming that an entity like Mozilla would have sufficient resources for that).
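As a rough idea of what the crawling alternative involves, the sketch below takes a page's HTML and lists the third-party hosts it embeds resources from, which is the raw material for the same kind of graph. It uses only the standard library; the choice of tags (`script`, `img`, `iframe`) and the function names are assumptions for illustration. Because it only parses the initial HTML, it misses requests that scripts add later and stylesheets loaded via `<link href>`.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Tags whose src attribute triggers a request when the page loads.
RESOURCE_TAGS = {"script", "img", "iframe"}


class ResourceCollector(HTMLParser):
    """Collect absolute URLs of embedded resources found in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag in RESOURCE_TAGS:
            for name, value in attrs:
                if name == "src" and value:
                    # Resolve relative and protocol-relative URLs.
                    self.urls.append(urljoin(self.base_url, value))


def third_party_hosts(page_url, html):
    """Return the sorted hosts referenced by the page other than its own."""
    collector = ResourceCollector(page_url)
    collector.feed(html)
    own_host = urlsplit(page_url).hostname
    hosts = {urlsplit(u).hostname for u in collector.urls}
    return sorted(h for h in hosts if h and h != own_host)


page = ('<script src="https://tracker.example/t.js"></script>'
        '<img src="/logo.png"><iframe src="//ads.example/f"></iframe>')
print(third_party_hosts("https://news.example/story", page))
# ['ads.example', 'tracker.example']
```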
Crawling is certainly a complementary data collection strategy, but it's harder to avoid IP-based "filter bubble" effects without also deploying something akin to a bot. The hope is that by having real people use real browsers, we'll collect data that reflects actual behavior in the wild.
You're right that poisoning is a potential problem if/when the data ends up useful enough to warrant poisoning.