

Google Analytics, Casualty of Spam - amitmittal1993
http://www.ditherandbicker.com/posts/2015-01-10-rip-google-analytics.html

======
notlisted
I think there's more to it than referrer spam: unscrupulous SEO/SEM people
artificially pumping up their performance to justify their rates.

A friend's analytics showed an amazing number of visits for a tiny site. While
traffic was up, it did not lead to new clients. She fired the company she'd
engaged for SEO/SEM because they kept raising their rates as traffic
milestones were reached (hundreds of dollars a month).

Immediately after terminating that relationship, she noticed a 95% drop in
traffic and panicked (see
[http://i.imgur.com/WwJ0vYo.png](http://i.imgur.com/WwJ0vYo.png) ). I was
asked to fix it for her. One look in the referrers showed that this 95% all
originated from China (ads.acesse.com) and was useless/fake (very few page
views, very short durations). While we have no proof to support a lawsuit, the
timing was too much of a coincidence to ignore.

~~~
eterm
The moral of the story is that all targets will be gamed, so only ever target
against revenue or something incredibly close to it.

~~~
Scoundreller
Are the relative rankings in search results (e.g. moving from 25th to 15th or
2nd to 1st) for a set of pre-defined searches considered poor metrics?

Sure, it will be hard to determine whether the improvements are due to SEO or
some other update, but using revenue as a metric has the same issue.

(Ideally one would use a "difference in differences"[1] approach, but it could
be difficult without good comparators).

[1]
[https://en.wikipedia.org/wiki/Difference_in_differences](https://en.wikipedia.org/wiki/Difference_in_differences)
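
To make the "difference in differences" idea concrete, here is a minimal sketch. The revenue figures and the function name `diff_in_diff` are made up for illustration; the only real content is the arithmetic: subtract the change seen by a comparable site that did not buy SEO from the change seen by the site that did.

```python
def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Return the difference-in-differences estimate of a treatment effect."""
    treated_change = treated_after - treated_before
    control_change = control_after - control_before
    # Whatever moved both sites (seasonality, algorithm updates) cancels out.
    return treated_change - control_change


# Hypothetical numbers: average monthly revenue for a site that bought SEO
# (treated) and a comparable site that did not (control).
effect = diff_in_diff(
    treated_before=1000, treated_after=1300,
    control_before=900, control_after=1050,
)
print(effect)  # 150
```

The estimate is only as good as the comparator: if the control site is not really comparable, the subtraction removes the wrong baseline.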

~~~
hluska
Any metric that you choose will have the same problem. The metric may improve
over a period because of actions that you take, or it may improve because of
something entirely unrelated. This is an important thing to be aware of.

However, there is another problem with metrics. Sometimes, it is too easy to
pay too much attention to vanity metrics that won't add anything to your
bottom line. Will moving from a 25th place to 1st place for a search help your
business? That depends - does that search generate traffic, and does that
traffic generate revenue? On the other hand, increasing revenue is never a bad
thing.

I'd argue that unless decision makers have too much time on their hands,
they're better off focusing on metrics like revenue than metrics like search
results. Validity will be a problem across all metrics, but at least there
isn't a tendency to optimize useless metrics.

------
charlieirish
Google have responded to this (in the past) by implementing an automated
spam/bot/spider filtering service:

[https://plus.google.com/+GoogleAnalytics/posts/2tJ79CkfnZk](https://plus.google.com/+GoogleAnalytics/posts/2tJ79CkfnZk)

If you're seeing nefarious traffic/referrers you may want to tick this box
which I believe is unticked by default.

~~~
angry-hacker
This doesn't work in this case.

I have 100s of clients whose analytics are useless. I can't block the spammers
on the server side since they never visit the sites.

It's also impossible to keep up with creating filters to exclude them in
Google Analytics.

This technique is new; it has been going on for a month or so.

~~~
sounds
Google is definitely aware of this kind of abuse, and it's not too small to
notice, too far into the long tail, etc.

The real problem here is that Analytics has that real-time view. If a spammer
creates a test account and then tries to spam it, they can get real-time
feedback on what works / what doesn't.

The solution is straightforward but not "easy": put the analytics frontend[1]
on the same host that serves the rest of the application; use the same session
auth and spam filtering that is already there.

[1] By "frontend" I mean the first step of the analytics data-gathering
pipeline. And this part could be as simple as a new logging module that
consumes log data in realtime, anonymizing it and aggregating it, then
uploading it to one or more analytics services of your choice.

This solution sacrifices the ease-of-use that analytics currently enjoys. No
more "just drop in this <script> tag."
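
A rough sketch of what such a logging module might look like, assuming the server writes access logs in the common/combined log format. The names `LOG_LINE`, `anonymize`, and `aggregate`, and the salt value, are hypothetical; the upload step to an analytics service is deliberately left out.

```python
import hashlib
import re
from collections import Counter

# Prefix of the common/combined log format:
# ip identity user [time] "METHOD path protocol" status ...
LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|POST) (\S+) [^"]*" (\d{3})'
)


def anonymize(ip, salt):
    """Replace an IP address with a salted, truncated hash."""
    return hashlib.sha256((salt + ip).encode("utf-8")).hexdigest()[:16]


def aggregate(lines, salt="rotate-me"):
    """Count successful page views and unique (anonymized) visitors per path."""
    pageviews = Counter()
    visitors = {}
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        ip, path, status = match.groups()
        if status != "200":
            continue  # only count requests the server actually served
        pageviews[path] += 1
        visitors.setdefault(path, set()).add(anonymize(ip, salt))
    return {
        path: {"pageviews": count, "unique_visitors": len(visitors[path])}
        for path, count in pageviews.items()
    }


sample = [
    '203.0.113.5 - - [10/Jan/2015:10:00:00 +0000] "GET /pricing HTTP/1.1" 200 512',
    '203.0.113.5 - - [10/Jan/2015:10:01:00 +0000] "GET /pricing HTTP/1.1" 200 512',
    '198.51.100.7 - - [10/Jan/2015:10:02:00 +0000] "GET /pricing HTTP/1.1" 200 512',
    '198.51.100.7 - - [10/Jan/2015:10:03:00 +0000] "GET /missing HTTP/1.1" 404 0',
]
print(aggregate(sample))
```

Because the counts come from requests the server really handled, a spammer who never connects to the site cannot add to them.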

~~~
simbolo
This system already exists: it is the Measurement Protocol
([https://developers.google.com/analytics/devguides/collection...](https://developers.google.com/analytics/devguides/collection/protocol/v1/))
allowing you to send activity data via your webserver, so you won't need the
javascript tool at all.

The problem is still there however as this is how the spammers send fake data.
Essentially, even the server side API is still unauthenticated.
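
To illustrate why the protocol is unauthenticated, here is a sketch that only builds a pageview payload and never sends it. The parameter names (`v`, `tid`, `cid`, `t`, `dh`, `dp`, `dr`) are the Measurement Protocol v1 ones as far as I know; the helper `build_pageview` and all the values are made up.

```python
from urllib.parse import urlencode

COLLECT_URL = "https://www.google-analytics.com/collect"


def build_pageview(tracking_id, client_id, hostname, path, referrer=None):
    """Build the form-encoded body of a pageview hit (nothing is sent)."""
    params = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # the public tracking ID, e.g. UA-XXXXX-Y
        "cid": client_id,    # an arbitrary client identifier
        "t": "pageview",     # hit type
        "dh": hostname,      # document hostname, chosen by the sender
        "dp": path,          # document path, chosen by the sender
    }
    if referrer:
        params["dr"] = referrer  # document referrer, chosen by the sender
    return urlencode(params)


body = build_pageview(
    "UA-XXXXX-Y", "555", "example.com", "/home", "http://spam.example/"
)
print(body)
```

Every field is supplied by whoever builds the request, and none of them is a secret, which is the whole problem being described.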

~~~
ploxiln
But could you get a new tracking ID and keep it a secret? Since it's now only
server-side, and not client-side?

------
gk1
Oh boy... Declaring a world-class analytics tool dead because you haven't
figured out how to prevent script hijacking.

Just create a view filter that ignores traffic on any hostname other than
yours. That's it.
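
The rule is simple enough to sketch in a few lines. This is only an illustration of the filter's logic, not how GA implements it; the hit dictionaries and the `hostname` key are assumptions made for the example.

```python
def keep_own_hostname(hits, allowed_hostnames):
    """Keep only hits whose reported hostname is one of ours."""
    allowed = {name.lower() for name in allowed_hostnames}
    return [hit for hit in hits if hit.get("hostname", "").lower() in allowed]


# Hypothetical hits: one real, one recorded against somebody else's domain,
# and one with no hostname at all.
hits = [
    {"hostname": "www.example.com", "referrer": "https://news.ycombinator.com/"},
    {"hostname": "spam.example", "referrer": "http://semalt.semalt.com/"},
    {"referrer": "http://darodar.com/"},
]
print(keep_own_hostname(hits, ["example.com", "www.example.com"]))
```

Note that this only helps while the spammers leave the hostname wrong or empty; a hit that reports your own hostname passes straight through.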

~~~
pmelendez
You still would like to know who is referring you. You can still filter those
out one by one but it becomes a tedious war against spammers (very similar to
the one on emails before the spam filters era)

~~~
corin_
If those sites hijacking your code aren't actually linking to you, then the
visits that show as referred by them are presumably visits staying on those
spam sites. In which case by filtering out those visits, you'll also filter
out the referral sources for those visits, no?

(It's been a while since Analytics was anywhere near my personal work, so
could be wrong here.)

~~~
ascorbic
They're not real visits. They're directly sending requests via the analytics
api. The spammers can very easily spoof the domain so it looks like it was a
visit to your site, not their domain.

------
gingerlime
I'm not sure what level of sophistication goes into GA's anomaly detection,
but if those spammy domains show up, then I'm guessing it's not that difficult
to cause much more damage using similar techniques.

Scenario:

I want to annoy / confuse / distract my competitor by making their analytics
data less-effective (potentially totally unusable). I grab their tracking ID
and send tons of fake events / requests / page views. Now my competitor can't
really figure out what actual traffic they're getting and what's real and
what's fake... Plus they spend time trying to figure out what's going on,
clean up their data etc.

It can go way beyond referring domains - think custom events, ecommerce
tracking, site speed... anything that analytics tracks can be faked.

------
aselzer
It seems like they are targeting smaller sites desperate for traffic. They are
trying to make the owners monitoring (Google) analytics look at their own site
offering "SEO", "marketing", "social optimizations" and similar services that
are probably as shady as their way of "contacting" the owners of low-traffic
sites.

I have a site with less than 200 "visits" per month. 20% of traffic apparently
comes from the site semalt.semalt.com. 10% comes from this site:
buttons-for-website.com. Another 6% is from this one:
make-money-online.7makemoneyonline.com

~~~
Menge
I don't think they are targeting anyone, I have always seen a few hundred a
month on company websites. Some small sites simply aren't found yet by these
spiders.

If you are seriously using analytics then you aren't really working in
absolute numbers and the only problem is sporadic noise.

~~~
aselzer
The target audience for these sites is obviously people who want more traffic
and want to make money with their sites.

Bigger sites couldn't be targeted cost-effectively because they would have to
make a lot more noise for themselves to even show up in the analytics reports.
Also, the people reading the reports are more likely professional and aware of
those techniques.

Their algorithm is probably not so advanced, so they just shoot lots of
requests to any site they can find. Luckily for them, most sites are small and
unsuccessful.

------
jarcane
I've been noticing this for several years. A significant enough bulk of my
logged traffic to my publishing label is this kind of spam that Ukraine shows
up as my second largest source of traffic. My 9th most frequent referrer is,
indeed, semalt.com, as mentioned in the article.

Generally, unless there's a major traffic spike from one source or another, I
largely consider my traffic reports complete fiction because of this level of
spam referrers.

------
andrewstuart2
For the record, blurring (even when applied appropriately) is still a pretty
bad idea for hiding information [1]. I know this has been on HN a few times.

I'm not sure what you can do with the GA key or if it's even private, but just
adjusting levels in gimp shows the numbers.

[1] [http://dheera.net/projects/blur](http://dheera.net/projects/blur)

~~~
athenot
I was going to make the same comment and then I realized blurring it is pretty
useless: that info is in the JS snippet for GA on that site (and yes the key
is the same).

~~~
andrewstuart2
That's what I figured, but since the author still blurred it I figured it was
still worth mentioning.

~~~
kevin_thibedeau
The technique from the dheera.net article won't work well in this case because
the filter is applied non-uniformly as a targeted spot. It becomes much more
difficult to generate pixelated patterns to compare against. The Gaussian blur
does however enable the simpler use of deconvolution to reveal the obscured
digits.

------
gii
You can create global filters in Google Analytics by going to Administration
-> Global Filters.

Create a new custom filter for the Referrer field and exclude the spammy site
there (do not forget to escape the dot: \.).
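
A small sketch of why the escaping matters, using Python's `re` module as a stand-in for the filter's regex field. The helper names `build_exclude_pattern` and `is_spam_referrer` are made up, and the domains are the ones mentioned in this thread.

```python
import re


def build_exclude_pattern(domains):
    """Build one case-insensitive pattern with every dot escaped."""
    return re.compile("|".join(re.escape(d) for d in domains), re.IGNORECASE)


def is_spam_referrer(referrer, pattern):
    """Return True if the referrer matches one of the excluded domains."""
    return pattern.search(referrer) is not None


pattern = build_exclude_pattern(["semalt.com", "buttons-for-website.com"])

print(is_spam_referrer("http://semalt.semalt.com/crawler.php", pattern))
print(is_spam_referrer("http://semaltxcom.example/", pattern))

# An unescaped dot matches any character, so this pattern also matches
# hostnames that merely look similar.
unescaped = re.compile("semalt.com")
print(unescaped.search("http://semaltxcom.example/") is not None)
```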

~~~
Aardwolf
Wouldn't the spammers constantly have random different referrers?

~~~
jarcane
Yes. The Ukrainian spam I get is almost entirely unique referrers each time,
so individual results rarely show up enough times to even rank. Semalt.com and
something called speedfox are the only ones that really show up consistently
enough from the same source for host-by-host blocking to do any good. The
others just rotate through different hosts on a routine enough basis that it'd
be more work than it was worth blocking them one by one.

------
zer0defex
Data cleanliness is never a solved problem, it's just a fact. Depending on how
severe the problem is, a simple way to combat this is by adding a custom
key/value pair to all client-side GA requests (custom dimensions are great for
this) and then adding a filter to your profile within the Google Analytics
admin to exclude all requests without the appropriate key value. Change the
value on a recurring basis, how often is your preference. Though always be
sure you have at least 2 profiles for any GA property, one filtered
(Production) and one unfiltered as the C.Y.A. profile so that should anything
go wrong, you can still get to all data.
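
A minimal sketch of that filter's logic, assuming hits are available as dictionaries and the token travels in custom dimension 1 (`cd1`). The function name and the token values are hypothetical. Accepting both the current and the previous token gives cached pages time to pick up a rotation.

```python
def filter_by_token(hits, valid_tokens, dimension="cd1"):
    """Split hits into (kept, dropped) by whether they carry a valid token."""
    kept, dropped = [], []
    for hit in hits:
        if hit.get(dimension) in valid_tokens:
            kept.append(hit)
        else:
            dropped.append(hit)
    return kept, dropped


# Hypothetical tokens: the current value plus the one it replaced.
valid_tokens = {"2015-01-blue", "2014-12-green"}

hits = [
    {"dp": "/home", "cd1": "2015-01-blue"},    # current token
    {"dp": "/about", "cd1": "2014-12-green"},  # page cached before rotation
    {"dp": "/", "dr": "http://darodar.com/"},  # spam hit with no token
    {"dp": "/", "cd1": "guess"},               # spam hit with a wrong token
]

kept, dropped = filter_by_token(hits, valid_tokens)
print(len(kept), len(dropped))  # 2 2
```

Returning the dropped hits as well mirrors the advice about keeping an unfiltered profile: nothing is thrown away, it is just kept out of the production view.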

------
dohertyjf
"A person who went through the trouble of setting up analytics tracking is
probably a person with just enough vanity to immediately check up who's
referring to their site."

Wait, what? I'm pretty sure if you go to the majority of sites on the Internet
you will find some sort of analytics tracking code, whether it's Omniture, GA,
or another. They don't do this out of vanity - they do it because they want to
know where traffic is coming from so that they can monetize it.

BTW, you have GA implemented on your site as well. Does that make you vain, or
simply smarter than the average bear?

------
artursapek
Ah, so it's not just me. I guess I'll be putting more efforts into my server-
side tracking/logging...

------
countryqt30
So 300 "spammers" are visiting your site regularly. Why are they doing that?
Only so that you visit _their_ website, which is usually offline or doesn't
offer anything?

I don't get the point of this spam.

~~~
blfr
I don't get the point of this spam either but they don't actually visit your
site. They never connect to your server. They only send an event to Google
Analytics using your (and probably any other) ID.

------
zuck9
This is not just with Google Analytics. I saw these same referrers in my
WordPress.com Stats too.

Looks like spammers are deploying spiders browsing the internet with fake spam
referrers.

------
dazc
I have compiled a list of persistent offenders over the past 12 months and
block them using SetEnvIfNoCase. It's a surprisingly short list with semalt
being the winner by a long way.

------
PinguTS
It is not only Google Analytics. I discovered the same with my Piwik
installation.

~~~
taf2
Any client-side tracking solution is vulnerable to this, or "exposed" is a
better word. It's difficult for me to say it's a security issue, because by
design everything about a JavaScript-based tracker is public. Even server-side
trackers are not immune if someone decides to inflate your numbers or mess
with referring traffic information: it's all based on what the client sends
you. I think in this case Google may be able to add some sort of machine
learning to indicate in the result sets that certain links/visitors appear to
be either bots or explicit spammers. Perhaps someone could even create a
third-party tool to do the analysis against a GA account using
[https://github.com/twitter/AnomalyDetection](https://github.com/twitter/AnomalyDetection)
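
As a very rough illustration of the idea, here is a spike detector based on the median and the median absolute deviation (a "modified z-score"). This is a much simpler technique than what the linked library does, and the daily visit counts are invented.

```python
from statistics import median


def flag_spikes(daily_visits, threshold=3.5):
    """Return the indexes of days whose modified z-score exceeds threshold."""
    med = median(daily_visits)
    deviations = [abs(v - med) for v in daily_visits]
    mad = median(deviations)
    if mad == 0:
        return []  # no spread at all, so nothing can be called an outlier
    # 0.6745 rescales the MAD so the score is comparable to a z-score.
    return [
        i for i, dev in enumerate(deviations)
        if 0.6745 * dev / mad > threshold
    ]


# Hypothetical week of visits with one suspicious day.
visits = [100, 104, 98, 101, 950, 99, 102]
print(flag_spikes(visits))  # [4]
```

A real tool would also need to handle weekly seasonality and genuine traffic spikes, which is where something like the linked library comes in.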

------
steventhedev
For the server-side analytics, it's simple. Just use the GA Measurement
Protocol, or a wrapper like staccato[1]. You can push the cid through ajax and
javascript so you can even make proper reports, and just send everything from
the JS to a dummy property.

[1] -
[https://github.com/tpitale/staccato](https://github.com/tpitale/staccato)

------
josteink
I don't think the spammers are targeting Google Analytics specifically as much
as they are trying to get links for their domains onto the internet.

Lots of websites post their visitor logs or stats on a special status page
(or at least used to). If those links aren't rel=nofollow, then
congratulations, your referrer spamming just gained you some SEO bonus.

------
kmfrk
Reminds me of this video on the bogus Facebook engagement you pay for:
[https://www.youtube.com/watch?v=oVfHeWTKjag](https://www.youtube.com/watch?v=oVfHeWTKjag).

------
simbolo
Spammers likely use the public API to send the fake traffic
([https://developers.google.com/analytics/devguides/collection...](https://developers.google.com/analytics/devguides/collection/protocol/v1/)).
The issue is that Google needs to provide a way to authenticate the data
rather than rely on the public tracker ID. Then at least the data could be
relayed server-side after the server has already filtered out spam; it would
also make it less trivial to generate fake traffic reports.

One tip is to set a filter on the main view to include only your actual domain
name. I notice a lot of the fake traffic is recorded against other domains. I
don't think these spammers are crafting fake data specific to your website.
Much like comment spam, the same HTTP GET is executed millions of times
against a list of defined tracking ids they have obtained.

------
grigio
I would just add that it is done from Russia, and the links often redirect to
Amazon referral IDs.

------
binarymax
It's not clear to me why spammers would do this. Can someone please explain
how a 3rd party benefits from incrementing hits on an unrelated site?

~~~
taf2
They get their links in your Google Analytics reports... you'll likely check
to see what the site is, causing you to visit the spammer's site...

~~~
chii
wow, that's quite a long winded way to get a site admin to visit a spam site -
and plus it's very likely to be ad-blocked (savvy admins).

~~~
wlkr
Based on article dates it would seem this spam hasn't been around all that
long, certainly I hadn't seen it before December. I imagine people would click
the link because it's the source of an unexpected spike in perceived traffic.
Certainly spammers are testing the waters with this; I suppose we'll soon know
how successful it is from the number of copycats and Google's eventual
response.

~~~
SyneRyder
The SEMalt spam has been around since at least June 2014, I got incensed
enough to block them from my server and write up how to block them on Apache
servers:

[http://kohanikin.com/2014/filtering-semalt-referrer-
spam.htm...](http://kohanikin.com/2014/filtering-semalt-referrer-spam.html)

This post seems to describe a new technique where spammers never even visit
your site in the first place, spamming the Google Analytics servers directly.

------
joshschreuder
I had the Darodar variety of this barely 2 days after setting up my new blog.
Turns out when you visit the link in Analytics it redirects to Amazon and sets
the affiliate cookie meaning they get money when you buy something.

A similar money making venture was done on Pinterest a couple of years ago
with affiliate cookies.

------
Magicstatic
This website actually sums it up pretty well as to WHY these websites are
doing this: [http://www.wiyre.com/google-analytics-darodar-forum-spam-
wha...](http://www.wiyre.com/google-analytics-darodar-forum-spam-what-is-it/)

------
jerrac
Heh... I have an old Analytics site that hasn't had live code available on the
web for years, and it got 4 hits last month. Are they randomly generating the
ids?

~~~
rikkipitt
Yes, it seems like they are to me. I created a test analytics account that has
a fake URL and is not on the web but it has now accrued over 35 "hits" from
forum.topic5768xxxx.darodar.com where 5768xxxx = the GA tracking code ID that
is private and not exposed on the internet. Very annoying.

------
bhouston
Just this week we started to get stupid priceg.com and blackhatworth.com hits
from nowhere. Good to know I'm not the only one with this issue.

------
Kiro
Can't Google just check on their end that the key was called from the website
of the GA account?

------
BMorearty
I've been wondering since like forever why hackers hadn't figured this out
yet.

------
elberto34
maybe this can be fixed by requiring some mouse over action before the hit is
registered

~~~
gingerlime
those spammers don't need to visit your site to send bogus analytics data. All
they need is your unique GA tracking id, and they can fire data straight into
google.

------
arb99
Referrer spam is something that has been happening for years, and GA is
actually quite good at filtering it compared to a lot of other stats programs
out there. This is really not an issue.

------
cornewut
Adding to this - most privacy/adblock plugins also block analytics. So I
really have to doubt that Google Analytics is of much value.

~~~
13
Almost no mobile users do.

