Google Analytics, Casualty of Spam (ditherandbicker.com)
207 points by amitmittal1993 on Jan 11, 2015 | 83 comments



I think there's more to it than referrer spam: unscrupulous SEO/SEM people artificially pumping up their performance to justify their rates.

A friend's analytics showed an amazing number of visits for a tiny site. While traffic was up, it did not lead to new clients. She fired the company she'd engaged for SEO/SEM because they kept raising their rates as traffic milestones were reached (hundreds of dollars a month).

Immediately after terminating that relationship, she noticed a 95% drop in traffic and panicked (see http://i.imgur.com/WwJ0vYo.png ). I was asked to fix it for her. One look in the referrers showed that this 95% all originated from China (ads.acesse.com) and was useless/fake (very few page views, very short durations). While we have no proof to support a lawsuit, the timing was too much of a coincidence to ignore.


The moral of the story is that all targets will be gamed, so only ever target against revenue or something incredibly close to it.


Are the relative rankings in search results (e.g. moving from 25th to 15th or 2nd to 1st) for a set of pre-defined searches considered poor metrics?

Sure, it will be hard to determine whether the improvements are due to SEO or some other update, but using revenue as a metric has the same issue.

(Ideally one would use a "difference in differences"[1] approach, but it could be difficult without good comparators).

[1] https://en.wikipedia.org/wiki/Difference_in_differences
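
For anyone who hasn't met the idea, here is a minimal sketch in Python. The function name and the ranking numbers are made up for illustration; the point is only that you subtract the change seen in a comparison set of searches (ones the SEO work did not touch) from the change seen in the targeted set.

```python
def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Change in the treated group minus change in the comparison group."""
    treated_change = treated_after - treated_before
    control_change = control_after - control_before
    return treated_change - control_change


# Hypothetical average ranking positions (lower is better):
# targeted keywords moved 25 -> 15, untouched comparison keywords moved 24 -> 20.
estimate = diff_in_diff(25, 15, 24, 20)
print(estimate)  # -6
```

Here only 6 of the 10 positions gained would be attributed to the SEO work, and only if the comparison keywords really are comparable, which is the hard part.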


Any metric that you choose will have the same problem. The metric may improve over a period because of actions that you take, or it may improve because of something entirely unrelated. This is an important thing to be aware of.

However, there is another problem with metrics. Sometimes, it is too easy to pay too much attention to vanity metrics that won't add anything to your bottom line. Will moving from a 25th place to 1st place for a search help your business? That depends - does that search generate traffic, and does that traffic generate revenue? On the other hand, increasing revenue is never a bad thing.

I'd argue that unless decision makers have too much time on their hands, they're better off focusing on metrics like revenue than metrics like search results. Validity will be a problem across all metrics, but at least there isn't a tendency to optimize useless metrics.


Revenue is the most famous of the easily gamed. Target free cash flow.


Errr - can you explain? I would like to increase my revenue by 20%, so if you have a simple solution I'm all ears!


Drop your price by 50%. Or start selling dollar bills for 95 cents - bet your revenue numbers will go off the charts.

I think that's what the previous poster is getting at - getting a lot of revenue isn't hard if you're willing to throw core parts of your business under the bus (say, profitability).

More generally though optimizing for any kind of financial target (whether revenue, profit, or otherwise) can still be gamed like any other metric.

Your salesperson promises the moon to a client, closes a bunch of deals, looks great, and your company takes on a boatload of liability in the process - for example.

Or, to take a real life example - drop the price of your software to 99 cents and ride a huge sales wave but find your clientele unwilling to pay anything more than 99 cents from then on (hellllo mobile apps).

Lots of ways raw revenue/profit isn't by itself a great metric.


Ah - I may have been looking at the wrong end of the conversion funnel - I was thinking more on the lines of how do I find new leads at all?

Oddly, I was reading how Buffett's first insurance company was focused on writing profitable premiums only - to the extent of seeing shrinking business. It was unusual then and now, so I do take your point.


I increase revenue $1 and increase expense $1.50. If you target profit I could increase profit, but you take 18 months to collect. Free cash flow targets revenue, leading to profit that is collected ahead of expenses. There may be a more technical definition, but that's the gist.


Except my job is to get you traffic. I don't control how good or crappy your product is. So the biz can think of it in terms of rev but not fair to pay the SEO person on it


I have been doing Internet marketing a long time. Optimizing for rankings/position is a waste of everyone's resources for a number of reasons.

1. The keywords you want may be worthless.
2. Google/Bing/Yahoo stopped sending the keyword data about a year ago (so you have almost no insight into what keywords did what).

What is important is driving meaningful traffic. This usually means increasing visits that increase revenue.

Here is the hard part: besides ecommerce sites, most businesses do NOT track at the level needed to know.

Even if you educate people, there is a tiny set that actually care to track and an even smaller set that actually put in the effort to do so.

A lot of people still care about the vanity side. "I am #1 for <keyword>"

Pay-for-performance marketing only works if you close the loop. And even then it's hard to do and may not be best for everyone.

To me, it does sound like your friend was a victim of fraud.


Missed your response, yes she was. She was told that she would get better positioning (which she didn't) and receive loads of extra traffic (which she did), but none of that traffic was qualified traffic (that converts).

By the way, she does not sell a product, she sells a service (herself) and that's a little more difficult to track.


So, what levels do you need to track at?

Let's say I am selling a saas service, Facebook for dogs. I advertise on keywords like "share my puppy pics". Each keyword hits a different landing page. I track users with a cookie

Is there any more or less to do?


Do you also track churn and lifetime user value based on the keyword or source that brought the user to your site? Not all page views or users are created equal.


Hence why business owners need to be educated that "all the traffic" is worthless if it's not making you money. Vanity metrics are called thus for a reason.

Disclosure: professional marketer and former FT SEO who is tired of people getting fed snake oil.


Google have responded to this (in the past) by implementing an automated spam/bot/spider filtering service:

https://plus.google.com/+GoogleAnalytics/posts/2tJ79CkfnZk

If you're seeing nefarious traffic/referrers you may want to tick this box which I believe is unticked by default.


I have had that setting turned on ever since I started tracking my website's traffic. But I am sorry to say, it doesn't work reliably. I still see traffic from sites such as semalt.com and a few others, as you can see here.

http://i.imgur.com/XGRvLKa.png


This doesn't work in this case.

I have 100s of clients whose analytics are useless. I can't block the spammers on the server side since they never visit the sites.

It's also impossible to keep up with creating filters in Google Analytics to exclude them.

This technique is new; it has been going on for a month or so.


Google is definitely aware of this kind of abuse, and it's not too small to notice, too far into the long tail, etc.

The real problem here is that Analytics has that real-time view. If a spammer creates a test account and then tries to spam it, they can get real-time feedback on what works / what doesn't.

The solution is straightforward but not "easy": put the analytics frontend[1] on the same host that serves the rest of the application; use the same session auth and spam filtering that is already there.

[1] By "frontend" I mean the first step of the analytics data-gathering pipeline. And this part could be as simple as a new logging module that consumes log data in realtime, anonymizing it and aggregating it, then uploading it to one or more analytics services of your choice.

This solution sacrifices the ease-of-use that analytics currently enjoys. No more "just drop in this <script> tag."


This system already exists: it is the Measurement Protocol (https://developers.google.com/analytics/devguides/collection...), which lets you send activity data from your web server, so you won't need the JavaScript tool at all.

The problem is still there, however, as this is how the spammers send fake data. Essentially, even the server-side API is still unauthenticated.


But could you get a new tracking ID and keep it a secret? Since it's now only server-side, and not client-side?


That's interesting, but judging by the description, it's based on User-Agent header matching. Sounds trivial to circumvent...

> The backend will exclude hits matching the User Agents named in the list as though they were subject to a profile filter


thanks for the hint - no more blackhats and vitaly shown in the sources - seems to work


Oh boy... Declaring a world-class analytics tool dead because you haven't figured out how to prevent script hijacking.

Just create a view filter that ignores traffic on any hostname other than yours. That's it.


I'm not entirely sure how someone can prevent the publicly available Google Analytics code from getting hijacked. The author is claiming the sites responsible aren't even going to his site, just using, again, publicly available information (GA JS code and his very public API key).

Really, it sounds like the author is claiming that JavaScript only analytics solutions are the problem, not that GA is inherently bad (clickbait title aside).

Beyond that, as a few people have stated ITT, most of your GA reports are pure fiction already and it's worse the larger your site is. If a significant fraction of your total data is garbage, you aren't going to get much out of it, even if you can clean it up.


You still would like to know who is referring you. You can still filter those out one by one but it becomes a tedious war against spammers (very similar to the one on emails before the spam filters era)


If those sites hijacking your code aren't actually linking to you, then the visits that show as referred by them are presumably visits staying on those spam sites. In which case by filtering out those visits, you'll also filter out the referral sources for those visits, no?

(It's been a while since Analytics was anywhere near my personal work, so could be wrong here.)


They're not real visits. They're directly sending requests via the analytics api. The spammers can very easily spoof the domain so it looks like it was a visit to your site, not their domain.


You know that by forging their own HTTP requests, they can send whatever hostname they would like.


I'm not sure what level of sophistication goes into GA's anomaly detection, but if those spammy domains show up, then I'm guessing it's not that difficult to cause much more damage using similar techniques.

Scenario:

I want to annoy / confuse / distract my competitor by making their analytics data less-effective (potentially totally unusable). I grab their tracking ID and send tons of fake events / requests / page views. Now my competitor can't really figure out what actual traffic they're getting and what's real and what's fake... Plus they spend time trying to figure out what's going on, clean up their data etc.

It can go way beyond referring domains - think custom events, ecommerce tracking, site speed... anything that analytics tracks can be faked.


It seems like they are targeting smaller sites desperate for traffic. They are trying to make the owners monitoring (Google) analytics look at their own site offering "SEO", "marketing", "social optimizations" and similar services that are probably as shady as their way of "contacting" the owners of low-traffic sites.

I have a site with less than 200 "visits" per month. 20% of traffic apparently comes from the site semalt.semalt.com. 10% comes from this site: buttons-for-website.com. Another 6% is from this one: make-money-online.7makemoneyonline.com


I've had the same bogus traffic, and referrers, exactly the same. I wrote about this here:

http://blog.steve.org.uk/paying_attention_to_webserver_logs....

Then put together a simple tool to filter out bogus IPs based on their requests, so that I can firewall them:

https://github.com/skx/webserver-attacks


I don't think they are targeting anyone, I have always seen a few hundred a month on company websites. Some small sites simply aren't found yet by these spiders.

If you are seriously using analytics then you aren't really working in absolute numbers and the only problem is sporadic noise.


The target audience for these sites is obviously people who want more traffic and want to make money with their sites.

Bigger sites couldn't be targeted cost-effectively because they would have to make a lot more noise for themselves to even show up in the analytics reports. Also, the people reading the reports are more likely professional and aware of those techniques.

Their algorithm is probably not so advanced, so they just shoot lots of requests to any site they can find. Luckily for them, most sites are small and unsuccessful.


I can confirm this. Got a site with very low traffic and one with 138K sessions a month. These russian spam referral sites are all over the low traffic site but they barely appear on the high traffic one. Not sure why Google hasn't blocked these yet. The same domains (ilovevitaly.com, darodar.com, pricer.com etc.) have been referral spamming since at least mid December. In fact the range of domains doing this was so limited it was relatively easy to set up exclusion rules in GA. The only annoying thing is that the exclusions aren't retroactive.


Actually, I was referring to small, desperate sites rather than the SEO motivation. Being spam, it makes sense to target less popular sites. I'm not going to investigate every referrer on a busy site, but if I've only got a dozen referrers then I might check out one site sending me 50 visits.


I have some low activity Google code sites with analytics enabled. 50% of the referrals now come from a Russian SEO site starting a few months ago. They're generating unique bogus subdomains such as forum.topic52148208.darodar DOT com for each tracker ID they target.

Can't Google just blacklist these sites and make them disappear?
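
Until they do, one rule per registered domain rather than per generated subdomain keeps the list manageable. A rough Python sketch of the matching logic, assuming you have the referrer hostname as a string (the function name is mine, not anything GA provides):

```python
def referrer_matches_domain(referrer_host, blocked_domain):
    """True if the host is the blocked domain itself or any subdomain of it."""
    host = referrer_host.lower().rstrip(".")
    domain = blocked_domain.lower()
    return host == domain or host.endswith("." + domain)


print(referrer_matches_domain("forum.topic52148208.darodar.com", "darodar.com"))  # True
print(referrer_matches_domain("notdarodar.com", "darodar.com"))  # False
```

Matching on the leading dot matters: a plain suffix check would also catch unrelated hosts that merely end in the same letters.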


I've been noticing this for several years. A significant enough bulk of my logged traffic to my publishing label is this kind of spam that Ukraine shows up as my second largest source of traffic. My 9th most frequent referrer is, indeed, semalt.com, as mentioned in the article.

Generally, unless there's a major traffic spike from one source or another, I largely consider my traffic reports complete fiction because of this level of spam referrers.


For the record, blurring (even when applied appropriately) is still a pretty bad idea for hiding information [1]. I know this has been on HN a few times.

I'm not sure what you can do with the GA key or if it's even private, but just adjusting levels in gimp shows the numbers.

[1] http://dheera.net/projects/blur


I was going to make the same comment and then I realized blurring it is pretty useless: that info is in the JS snippet for GA on that site (and yes the key is the same).


That's what I figured, but since the author still blurred it I figured it was still worth mentioning.


The technique from the dheera.net article won't work well in this case because the filter is applied non-uniformly as a targeted spot. It becomes much more difficult to generate pixelated patterns to compare against. The Gaussian blur does however enable the simpler use of deconvolution to reveal the obscured digits.


You can create global filters in Google Analytics by going to Administration -> Global Filters.

Create a new custom filter for the Referrer field and exclude the spammy site there (do not forget to escape the dot: \.).
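
To see why the escaping matters, here is a small demonstration using Python's re module. GA's filter syntax is not identical to Python's, so treat this as an illustration of the general regex behaviour rather than of GA itself; the test strings are invented.

```python
import re

spam_domain = "semalt.com"

# Unescaped, the dot matches any single character.
unescaped = re.compile(spam_domain)
# re.escape turns the dot into a literal dot (semalt\.com).
escaped = re.compile(re.escape(spam_domain))

print(bool(unescaped.search("semaltXcom.example")))  # True (unintended match)
print(bool(escaped.search("semaltXcom.example")))    # False
print(bool(escaped.search("semalt.semalt.com")))     # True
```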


Wouldn't the spammers constantly have random different referrers?


Yes. The Ukrainian spam I get is almost entirely unique referrers each time, so individual results rarely show up enough times to even rank. Semalt.com and something called speedfox are the only ones that really show up consistently enough from the same source for host-by-host blocking to do any good. The others just rotate through different hosts on a routine enough basis that it'd be more work than it was worth blocking them one by one.


Data cleanliness is never a solved problem, it's just a fact. Depending on how severe the problem is, a simple way to combat this is by adding a custom key/value pair to all client-side GA requests (custom dimensions are great for this) and then adding a filter to your profile within the Google Analytics admin to exclude all requests without the appropriate key value. Change the value on a recurring basis, how often is your preference. Though always be sure you have at least 2 profiles for any GA property, one filtered (Production) and one unfiltered as the C.Y.A. profile so that should anything go wrong, you can still get to all data.
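
Roughly, the filter logic amounts to the following Python sketch. The hit records, the token value, and the choice of custom dimension 1 (sent as cd1) are all assumptions for illustration; in practice the exclusion happens in the GA admin, not in your own code.

```python
# Hypothetical token; rotate it on whatever schedule you prefer.
EXPECTED_TOKEN = "2015-01-rotating-value"
# Assumed: the token travels in custom dimension index 1.
TOKEN_FIELD = "cd1"


def keep_hit(hit):
    """Keep only hits that carry the current token."""
    return hit.get(TOKEN_FIELD) == EXPECTED_TOKEN


hits = [
    {"dh": "example.com", "dr": "", "cd1": "2015-01-rotating-value"},
    {"dh": "example.com", "dr": "http://semalt.semalt.com/"},  # no token at all
    {"dh": "example.com", "dr": "", "cd1": "2014-12-old-value"},  # stale token
]

filtered = [hit for hit in hits if keep_hit(hit)]
print(len(filtered))  # 1
```

The token is still visible in your page source, so this only stops spammers who fire at tracking IDs blindly without ever loading your pages.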


"A person who went through the trouble of setting up analytics tracking is probably a person with just enough vanity to immediately check up who's referring to their site."

Wait, what? I'm pretty sure if you go to the majority of sites on the Internet you will find some sort of analytics tracking code, whether it's Omniture, GA, or another. They don't do this out of vanity - they do it because they want to know where traffic is coming from so that they can monetize it.

BTW, you have GA implemented on your site as well. Does that make you vain, or simply smarter than the average bear?


Ah, so it's not just me. I guess I'll be putting more efforts into my server-side tracking/logging...


So 300 "spammers" are visiting your site regularly. Why are they doing that? Only so that you visit _their_ website, which is usually offline or doesn't offer anything?

I don't get the point of this spam.


I don't get the point of this spam either but they don't actually visit your site. They never connect to your server. They only send an event to Google Analytics using your (and probably any other) ID.


I have compiled a list of persistent offenders over the past 12 months and block them using SetEnvIfNoCase. It's a surprisingly short list with semalt being the winner by a long way.
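
For anyone not on Apache, the same idea (a case-insensitive regex match on the Referer header) can be sketched in Python. My actual list isn't reproduced here; the two patterns below are just domains mentioned elsewhere in this thread, and the function name is made up.

```python
import re

# Illustrative patterns only, taken from domains named in this thread.
BLOCKED = [r"semalt\.com", r"buttons-for-website\.com"]
PATTERN = re.compile("|".join(BLOCKED), re.IGNORECASE)


def is_blocked_referer(referer_header):
    """True if the Referer header matches any blocked pattern, ignoring case."""
    return bool(referer_header) and PATTERN.search(referer_header) is not None


print(is_blocked_referer("http://SEMALT.com/crawler.php"))  # True
print(is_blocked_referer("https://news.ycombinator.com/"))  # False
print(is_blocked_referer(None))  # False
```

This only helps against crawlers that actually hit your server; it does nothing for fake hits sent straight to the analytics provider.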


This is not just with Google Analytics. I saw these same referrers in my WordPress.com Stats too.

Looks like spammers are deploying spiders browsing the internet with fake spam referrers.


It is not only Google Analytics. I discovered the same with my Piwik installation.


Any client-side tracking solution is vulnerable (or "exposed" is a better word) to this. It's difficult for me to call it a security issue, because by design everything about a JavaScript-based tracker is public. Even server-side trackers are not immune if someone decides to inflate your numbers or mess with referring traffic information: it's all based on what the client sends you. I think in this case Google may be able to add some sort of machine learning to flag in the result sets that certain links/visitors appear to be either bots or explicit spammers. Perhaps someone could even create a third-party tool to do the analysis against a GA account using https://github.com/twitter/AnomalyDetection


For the server-side analytics, it's simple. Just use the GA Measurement Protocol, or a wrapper like staccato[1]. You can push the cid through ajax and javascript so you can even make proper reports, and just send everything from the JS to a dummy property.

[1] - https://github.com/tpitale/staccato
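
If you'd rather not pull in a wrapper, a pageview hit is just a URL-encoded payload. A minimal Python sketch that builds one without sending it; the tracking ID is a placeholder, the helper name is mine, and the hostname and path are invented:

```python
import uuid
from urllib.parse import urlencode

# Measurement Protocol collection endpoint (the payload would be POSTed here).
ENDPOINT = "https://www.google-analytics.com/collect"


def build_pageview(tracking_id, client_id, hostname, path):
    """Return the URL-encoded body for a single pageview hit."""
    params = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # property / tracking ID
        "cid": client_id,    # anonymous client ID
        "t": "pageview",     # hit type
        "dh": hostname,      # document hostname
        "dp": path,          # document path
    }
    return urlencode(params)


# "UA-XXXXX-Y" is a placeholder, not a real property.
payload = build_pageview("UA-XXXXX-Y", str(uuid.uuid4()), "example.com", "/pricing")
print(payload)
```

Note that nothing in the payload proves who sent it, which is exactly the weakness discussed elsewhere in this thread.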


I don't think the spammers are targeting Google Analytics specifically as much as they are trying to get links for their domains onto the internet.

Lots of websites post their visitor logs or stats on a special status page (or at least used to). If those links aren't rel=nofollow, then congratulations, your referer-spamming just gained you some SEO bonus.


Reminds me of this video on the bogus Facebook engagement you pay for: https://www.youtube.com/watch?v=oVfHeWTKjag


Spammers likely use the public API to send the fake traffic (https://developers.google.com/analytics/devguides/collection...). The issue is that Google needs to provide a way to authenticate the data rather than rely on the public tracker ID. Then at least the data could be relayed server-side after the server has already filtered out spam, and it would be less trivial to generate fake traffic reports too.

One tip is to set the main view to filter only to include your actual domain name. I notice a lot of the fake traffic is for traffic on other domains. I don't think these spammers are crafting fake data specific for your website. Much like comment spam, the same HTTP GET is executed millions of times against a list of defined tracking ids they have obtained.


I would just add that it is done from Russia and that the links often redirect to Amazon referral IDs.


It's not clear to me why spammers would do this. Can someone please explain how a 3rd party benefits from incrementing hits on an unrelated site?


According to this article, when the site owner visits the spam link, affiliate cookies are set.

http://www.wiyre.com/google-analytics-darodar-forum-spam-wha...


I think it's aimed at CMS platforms like WordPress, as some users have a list of top referrers in their sidebar.


they get their links in your google analytics reports... you'll likely check to see what the site is - causing you to visit the spammers site...


The number of people in possession of a Google Analytics dashboard is way too low for spammers to generate significant click counts.


wow, that's quite a long winded way to get a site admin to visit a spam site - and plus it's very likely to be ad-blocked (savvy admins).


Based on article dates, it would seem this spam hasn't been around all that long; certainly I hadn't seen it before December. I imagine people would click the link because it's the source of an unexpected spike in perceived traffic. Certainly spammers are testing the waters with this. I suppose we'll soon know how successful it is from the number of copycats and Google's eventual response.


The SEMalt spam has been around since at least June 2014. I got incensed enough to block them from my server and write up how to block them on Apache servers:

http://kohanikin.com/2014/filtering-semalt-referrer-spam.htm...

This post seems to describe a new technique where spammers never even visit your site in the first place, spamming the Google Analytics servers directly.


Referer spam has been around almost as long as web analytics. What's new is that they found a way to infiltrate Google Analytics, which previously avoided them somehow. I'm surprised Google hasn't come up with a countermeasure yet.


I had the Darodar variety of this barely 2 days after setting up my new blog. Turns out when you visit the link in Analytics it redirects to Amazon and sets the affiliate cookie meaning they get money when you buy something.

A similar money making venture was done on Pinterest a couple of years ago with affiliate cookies.


This website actually sums it up pretty well as to WHY these websites are doing this: http://www.wiyre.com/google-analytics-darodar-forum-spam-wha...


Heh... I have an old Analytics site that hasn't had live code available on the web for years, and it got 4 hits last month. Are they randomly generating the ids?


Yes, it seems like they are to me. I created a test analytics account that has a fake URL and is not on the web but it has now accrued over 35 "hits" from forum.topic5768xxxx.darodar.com where 5768xxxx = the GA tracking code ID that is private and not exposed on the internet. Very annoying.


Just this week we started to get stupid priceg.com and blackhatworth.com hits from nowhere. Good to know I'm not the only one with this issue.


Can't Google just check on their end that the key was called from the website of the GA account?


I've been wondering since like forever why hackers hadn't figured this out yet.


maybe this can be fixed by requiring some mouse over action before the hit is registered


those spammers don't need to visit your site to send bogus analytics data. All they need is your unique GA tracking id, and they can fire data straight into google.


this is something google should take care of, not the site owners that use GA.


that might be a problem for mobile devices.


Referrer spam is something that has been happening for years, and GA is actually quite good at filtering it compared to a lot of other stats programs out there. This is really not an issue.


Adding to this - most privacy/adblock plugins also block analytics. So I really have to doubt that Google Analytics is of much value.


Almost no mobile users do.



