So the result is that three of the sites make no third-party connections whatsoever (DDG, HN, fefe). Without the addons, the remaining sites form a connected graph. With Disconnect, the graph is less strongly connected. With only NoScript, it starts to fall apart. With both activated, the primary sites are disconnected. (But the combination apparently breaks something, since a second Guardian primary node appears.)
A few caveats: first of all, this is of course not reproducible, since it depends on my whitelists for NoScript and Disconnect. The test set is of course not representative of anything except itself. And the absence of an edge in the graph does not mean the absence of a connection. But with this in mind, I found it quite interesting how connected even a small test set is.
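If anyone wants to poke at their own Lightbeam data the same way, here's a minimal sketch of the connectivity check using networkx. The edge list is made up for illustration (it is not my actual data), and I'm assuming you can export your (site, third-party) pairs in some form:

```python
# Minimal sketch: check how connected a set of primary sites is via
# shared third parties. The edges below are illustrative placeholders,
# not real Lightbeam output.
import networkx as nx

edges = [
    ("theguardian.com", "doubleclick.net"),
    ("reddit.com", "doubleclick.net"),
    ("reddit.com", "google-analytics.com"),
    ("linuxreviews.org", "google-analytics.com"),
]

G = nx.Graph()
# Sites with no third-party connections stay as isolated nodes:
G.add_nodes_from(["duckduckgo.com", "news.ycombinator.com", "blog.fefe.de"])
G.add_edges_from(edges)

# Each connected component is a cluster of sites that can, in principle,
# be linked to each other through common third parties.
for component in nx.connected_components(G):
    print(sorted(component))
```

With blockers enabled you'd expect the big component to break into more, smaller ones, which is exactly what the graphs show.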
http://natmonitor.com/2013/10/24/ghostly-shape-of-coldest-pl... (from reddit)
http://linuxreviews.org/kde/screenshot_in_kde/ (from DDG search)
> Lightbeam began in July 2011 as Collusion, a personal project by Mozilla software developer Atul Varma. Inspired by the book The Filter Bubble, Atul created an experimental add-on to visualize browsing behavior and data collection on the Web.
> In September 2012, Mozilla joined forces with students at Emily Carr University of Art + Design to develop and implement visualizations for the add-on. With the support of the Ford Foundation and the Natural Sciences and Engineering Research Council (NSERC), Collusion has been re-imagined as Lightbeam and was launched in the fall of 2013.
We reserve our copyright as to commercial applications but please contact us if you are interested in licensing for non-profit or educational uses.
Our source code is available to review for your assurance.
I use it to spoof the referrer as the root of the site when I follow a link in, and the correct referrer when navigating within the site. In some rare cases I force the referrer to be Google, which gets you past some paywalls.
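For illustration, the same trick outside the browser is just setting the header yourself. A rough sketch with Python's requests library; the URL is a placeholder, not a specific paywalled site:

```python
# Sketch: request a page with a forged Referer header, pretending
# we arrived from a Google search result.
import requests

url = "https://example.com/some-article"  # placeholder URL
headers = {"Referer": "https://www.google.com/"}

resp = requests.get(url, headers=headers)
print(resp.status_code, len(resp.text))
```

Paywalls that let search traffic through only see that header, which is why forcing it to Google sometimes works.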
However, this doesn't seem like a good way to collect quality crowd-sourced data. It can easily be poisoned, and there are simpler alternatives, such as crawling the sites and analyzing the links directly. (I am assuming that an entity like Mozilla would have sufficient resources for that.)
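To be concrete about the crawling alternative: even a toy crawler can list which third-party hosts a page pulls resources from. A rough stdlib-only sketch (the URL is a placeholder, and since there's no JS execution it misses dynamically injected trackers):

```python
# Toy crawler: fetch one page and list the third-party hosts that its
# src/href attributes point at. No JS execution, so this undercounts.
from html.parser import HTMLParser
from urllib.parse import urlparse
from urllib.request import urlopen

class ResourceHosts(HTMLParser):
    def __init__(self):
        super().__init__()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value and value.startswith("http"):
                self.hosts.add(urlparse(value).netloc)

url = "https://example.com/"  # placeholder URL
page_host = urlparse(url).netloc

parser = ResourceHosts()
parser.feed(urlopen(url).read().decode("utf-8", errors="replace"))

# Anything on a different host than the page itself is a third party.
print(sorted(h for h in parser.hosts if h != page_host))
```

Run over a list of sites, that already gives you the same kind of edge list as the crowd-sourced approach, without the poisoning problem.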
You're right that poisoning is a potential problem if/when the data ends up useful enough to warrant poisoning.