
The Guardian Is Being Swamped with 'Dark Traffic' - xvirk
http://uk.businessinsider.com/the-guardians-dark-social-traffic-problem-2014-10
======
randunel
I built my personal extension which blocks all the traffic from Dan Pollock's
list [1], then blocks all the traffic from major service providers (google
analytics, etc) and social networks (fb, tw, google, etc) when not on their
website. Referral and user agent headers removed, I haven't found the need to
remove other headers. Currently working on preventing (and manually allowing)
all xhr/script/image requests 2 seconds after the main frame has loaded.

The internet is a lot faster for me, and the battery appears to last longer.
So far, I've only had problems logging into instagram, but once the cookie is
set, I can re-enable blocking ads and tracking.

I guess I am one of those 'dark traffickers' :P

[1] [http://someonewhocares.org/hosts/](http://someonewhocares.org/hosts/)
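
A minimal sketch of the blocking and header stripping described above, not the actual extension. The host set is a small illustrative stand-in for a full hosts list, and `filter_request` is a hypothetical name.

```python
from urllib.parse import urlsplit

# Illustrative stand-in for a hosts-file blocklist (assumption, not the real list).
BLOCKED_HOSTS = {"hits.theguardian.com", "www.google-analytics.com"}

# Headers the comment says are removed from outgoing requests.
STRIPPED_HEADERS = {"referer", "user-agent"}


def filter_request(url, headers):
    """Return None if the host is blocked, else the headers minus the stripped ones."""
    host = (urlsplit(url).hostname or "").lower()
    if host in BLOCKED_HOSTS:
        return None
    return {k: v for k, v in headers.items() if k.lower() not in STRIPPED_HEADERS}
```

A blocked request is signalled with `None`; anything else goes out with whatever headers survive the filter.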

~~~
cantlin
Looks like that list hasn't been updated since our domain switch. If you
really want to block our internal analytics (which are in practice fairly
harmless) replace "hits.guardian.co.uk" with "hits.theguardian.com".

~~~
jacquesm
Nominated for the most classy comment of the month. That the product manager
of The Guardian would take the time out from his (no doubt) busy day to help a
user to block tracking is an amazing display of trust in that the user knows
best what is good for them. Thank you.

~~~
pmoriarty
It's more like he knows that helping one single user like this isn't going to
make a dent in the tracking they do on virtually all other users.

Now, if they voluntarily stopped tracking all or a significant portion of
their users, I would be shocked.

Of course, that isn't going to happen.

~~~
jacquesm
Of course there is always a way to put a negative slant on just about
anything.

------
anewhnaccount
"""The Atlantic first identified "dark social" traffic back in 2012 to
describe traffic coming messaging apps that had been stripped of referrer data
because messaging and email use the secure "HTTPS" system rather than the open
"HTTP" system used by web pages."""

Excuse me?! How does this trash get published?

~~~
aikah
How can one trust anything that website says after this kind of statement?
Because clearly, these guys don't know what they are talking about.

Pretty sure they don't know what HTTP is and what a header is.

How many factual errors are in other articles that don't deal with the web?

------
Cthulhu_
This is another reason why a lot of links to certain pages have the referral
or campaign information hardcoded into their URLs, which removes the need for
a referral header. I guess if the Guardian really wants that information they
can make a deal with app publishers or Reddit or whoever to add a header like
that.
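
A rough sketch of hardcoding campaign information into a link so that no referral header is needed. The parameter names `utm_source` and `utm_campaign` are an assumption (a common analytics convention, not something named in this thread), and both function names are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode


def tag_url(url, source, campaign):
    """Append campaign parameters to a URL, keeping any existing query string."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query += [("utm_source", source), ("utm_campaign", campaign)]
    return urlunsplit(parts._replace(query=urlencode(query)))


def read_campaign(url):
    """Recover (source, campaign) from a tagged URL; None where a value is absent."""
    params = dict(parse_qsl(urlsplit(url).query))
    return params.get("utm_source"), params.get("utm_campaign")
```

The publisher of the link calls `tag_url`, and the receiving site calls `read_campaign` on the landing URL instead of inspecting the referral header.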

But on the other hand, given how it was the Guardian that published all of the
original Snowden articles and information, they of all newspapers should
applaud this rapid increase in privacy from their readers.

I can't actually read the dates on the charts, but I assume the increase
started shortly after the Snowden revelations, when more sites were enabling
https by default and people became more privacy-aware.

So the Guardian's a bit inconsistent here. On the one side they go "Big
Brother is watching you!", on the other (in this article) they go "We're Big
Brother and we can't see you anymore!"

~~~
danmaz74
The funny thing is that there are scores of people here who are actively
trying to get the Guardian out of business, because they regard anything
related to advertising as evil. Will we be more or less free without the
Guardian and similar independent sources of information?

~~~
jacquesm
You could of course simply pay for your online news(paper), the same way you
used to pay for your paper based one.

Online newspaper is a bit strange, something like a 'plastic glass'.

I'm all for it, but most business models revolve around advertising somehow.

~~~
danmaz74
Yes, but advertising plays a big role on paid newspapers too. They would be
just too expensive without ads. Moreover, before the free newspapers on the
internet, the only mainstream free way to get news was from TV, which is much
more superficial and easier to control by governments. I think we are much
more free and more informed thanks to ad-supported online news sources. That's
why I find all this hate for that business model wrong.

Final remark: A website like Hacker News would not be possible if all the
content was behind a paywall. Would that really be a better internet?

~~~
jacquesm
That's the beauty of hackernews, it doesn't need a paywall or advertising. The
presence of the users _is_ the payment.

~~~
danmaz74
But hackernews links to free information sources. No free information sources
=> no hackernews...

~~~
jacquesm
Plenty of the information sources linked to are free of advertising and not
behind paywalls. In fact, those are probably the better information sources.

That's how the web started, remember: no ads. Just free sources of
information.

~~~
danmaz74
Not behind paywalls, I agree. Free of advertising... it would be interesting
to have stats at hand, but I'm not so sure.

Anyway, thank you for reminding me I'm old enough to remember how the web
started :D

------
seren
Interesting, I had never made the relation between referer and advertising,
which seems pretty obvious in retrospect.

This part makes me a bit uneasy.

> The frustration here is that search, apps and HTTPS traffic all represent
> different types of readers arriving at The Guardian for different reasons —
> and not knowing that data hurts the Guardian's ability to serve those
> readers relevant content.

I am not sure I would be interested in a "tailored" news experience. (Or
"relevant content" is a weasel word for "relevant ads")

~~~
ninjaplease
You get it anyway. Are you kidding me? Were you born yesterday?

------
spindritf
It's pretty crazy that browsers send referral in the first place. Getting rid
of it, accidentally or not, is not a bug.

I use refcontrol[1] to spoof the referral. I'm always visiting from the front
page of the website even though I'm almost never visiting from the front page.

[1] [https://addons.mozilla.org/en-US/firefox/addon/refcontrol/](https://addons.mozilla.org/en-US/firefox/addon/refcontrol/)
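
A small sketch of the "always visiting from the front page" behaviour described above. This only illustrates the idea and is not how the add-on itself is implemented; `front_page_referer` is a hypothetical name.

```python
from urllib.parse import urlsplit


def front_page_referer(target_url):
    """Build a referral value that points at the root of the site being visited."""
    parts = urlsplit(target_url)
    return f"{parts.scheme}://{parts.netloc}/"
```

Whatever page is requested, the site only ever sees its own front page as the origin of the visit.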

~~~
mahouse
Exactly. On Chrome, I use Referer Control [1], which I guess does the same.

[1] [https://chrome.google.com/webstore/detail/referer-control/hn...](https://chrome.google.com/webstore/detail/referer-control/hnkcfpcejkafcihlgbojoidoihckciin)

It's also very good for avoiding those annoying image hosts that serve you an
.html page with ads when they detect that you clicked a .png link from another
web page, or those that just show you an image asking you not to hotlink content.

------
stuartmemo
"executives at the company cannot figure out where it is coming from"

Maybe get the engineers to have a look instead.

------
zecg
That's what you get from forcing your app on facebook.

P.S. remember this: [http://rational.pdimension.net/2011/10/11/do-not-use-the-gua...](http://rational.pdimension.net/2011/10/11/do-not-use-the-guardian-facebook-app/)

This is backlash, enjoy it.

~~~
Ntrails
I will never forget it, and I still don't click on links to the guardian.

------
kbart
_" <...>not knowing that data hurts the Guardian's ability to serve those
readers relevant content."_

Actually one of the main reasons why I use various anonymizers is that _I don't
want relevant content_, for the same reason I don't want to see Facebook's
"top stories" -- most often it turns out to be totally irrelevant, clickbait
or complete bullshit. Leave me the choice of what's interesting to me and
what I want to see.

~~~
rasz_pl
But how are they (FB, Google, Guardian) going to make money if you won't let
them tell you what to think????

------
aw3c2
I saw this illustration yesterday and it perfectly illustrates why I won't
ever be a "normal" visitor to spying websites. Dear journalists, please
consider your integrity to a better society. If you can't publish your
thoughts without selling your readers to tracking and other evil, then I won't
be crying after you. :\

[http://i.imgur.com/AqL7C28.jpg](http://i.imgur.com/AqL7C28.jpg)

------
troels
Funny how the tone suggests that this is something malicious being done to the
guardian.

~~~
throwaway900
It kind of is. The Guardian is funded by advertising and this limits their ad
sales story. As the article points out, the main beneficiary of this is Google
- who are essentially competing with the Guardian for ad sales. I don't always
agree with the Graun, but I believe in plurality rather than a Google-
dominated world.

Throwaway because of inevitable downvotes from the privacy crowd.

~~~
SideburnsOfDoom
> The Guardian is funded by advertising and this limits their ad sales story
> ... inevitable downvotes from the privacy crowd.

The privacy crowd are right; and so are you. This is a huge internal conflict
in the web today: how do you make it pay, keep it free and not have it track
users?

~~~
crdoconnor
This presumes that 'keep it free' is actually desirable. Free journalism means
the person reading it is the product. Perhaps the less of that, the better.

~~~
thisGuysAccount
The existing model has been to subsidize newspaper sales with advertising for
a very long time. This same model carried forward through radio and
television.

The alternative is a pay-per-view service, or subscribing to wires. I haven't
done any research into the viability of that type of service, but it is a
paradigm shift.

------
underlines
For me as a Web Analytic Consultant this sounds so wrong:

We track campaigns through campaign-parameters but apps are a blind spot of
course, that's nothing new. Relying on the referrer is stupid, most apps don't
provide a referrer because they are Apps and not websites! Browsers provide a
referrer if you are coming from another site. An App isn't a website, so
there's no referrer.

Of course there's no referrer, it's a new browser window! Campaign-Parameters
here would also not be very helpful. If the Visitor copies the link not from a
guardian.com visit, but after coming to the story through a campaign-URL, he
would copy the URL with the campaign parameters and paste it into the app.
This would be even more wrong, but happens daily!
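
A hedged sketch of how a visit might be bucketed from the landing URL and the referrer, including the misattribution case just described. The `utm_source` parameter name and the three labels are assumptions made for illustration.

```python
from urllib.parse import urlsplit, parse_qs


def classify_hit(landing_url, referer):
    """Label a visit 'campaign', 'referral' or 'dark' from what the server can see."""
    query = parse_qs(urlsplit(landing_url).query)
    if "utm_source" in query:
        return "campaign"  # campaign parameters win, even when copied into an app
    if referer:
        return "referral"
    return "dark"
```

A campaign URL that was copied and pasted into a messaging app arrives with no referrer but still has its parameters, so it is counted as "campaign" even though the real source was the app.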

We Web Analysts should get used to it: People are becoming aware of privacy
more than in the past and we can't always measure everything and everyone. Get
over it!

~~~
mtbcoder
Bot traffic is more likely a bigger culprit for "dark traffic" than people
becoming aware of privacy tools.

[http://www.bbc.com/news/technology-25346235](http://www.bbc.com/news/technology-25346235)

------
ChuckMcM
_" The frustration here is that search, apps and HTTPS traffic all represent
different types of readers arriving at The Guardian for different reasons —
and not knowing that data hurts the Guardian's ability to serve those readers
relevant content."_

I think they meant to say 'relevant advertising' there not 'relevant content'
as the content should, in theory, be the same regardless of how you got there.
The interesting bit is that I've seen advertising contracts where you can't
advertise with unapproved networks on a referred link from a Google SERP. Only
on the second click can you do that pop-under or egregious flying frisbee ad.
So if you are trying to be 'safe' you don't do any of that nonsense if you
can't tell the difference, and I'm guessing that cuts into revenue.

------
gojomo
Much could also be fake traffic, intended to defraud advertisers or others
measuring audiences.

Even assuming The Guardian itself is not a knowing participant in such
schemes, its sites could receive such traffic when fraudsters try to make the
full behavior of their sources look more legitimate.

------
ishener
I think it's clear that almost all of the dark traffic is simply https sites.
It's not necessarily apps, it can just be gmail...

Someone really should make it a default to pass the referrer even for https.
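
For context, a sketch of the long-standing downgrade rule from the HTTP specification: a browser does not send the referrer when moving from a page loaded over HTTPS to a plain HTTP destination. The function name is hypothetical, and real browsers apply further referrer policies on top of this.

```python
from urllib.parse import urlsplit


def sends_referer(from_url, to_url):
    """Downgrade rule only: omit the referrer when going from HTTPS to HTTP."""
    src = urlsplit(from_url).scheme.lower()
    dst = urlsplit(to_url).scheme.lower()
    return not (src == "https" and dst == "http")
```

Under this rule a click from an HTTPS webmail page to an HTTP news page arrives with no referrer, while HTTPS to HTTPS or HTTP to HTTP still passes it.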

~~~
toothbrush
Haha, wat?

edit (less obtuse): there should be _less_ passing of referral headers, not
more. Browsing is already such a leaky experience privacy-wise, we shouldn't
be clamouring for it to become worse...

------
zecg
Also relevant:
[http://www.geekculture.com/joyoftech/joyimages/2066.jpg](http://www.geekculture.com/joyoftech/joyimages/2066.jpg)

------
ig1
They could just randomly survey a sample of the dark traffic users to find out
where they're coming from.
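
One possible sketch of that sampling decision, assuming "dark" means a visit with no referrer and that `rate` is the fraction of such visits to survey. Both the function name and the parameters are hypothetical.

```python
import random


def should_survey(referer, rate, rng=random):
    """Pick a random fraction `rate` of referrer-less visits for a survey prompt."""
    if referer:
        return False  # source already known, nothing to ask
    return rng.random() < rate
```

Passing a seeded `random.Random` instance as `rng` makes the selection reproducible for testing.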

