
The Google Analytics Setup I Use on Every Site I Build - uptown
https://philipwalton.com/articles/the-google-analytics-setup-i-use-on-every-site-i-build/
======
jameslk
It's pretty annoying that I have to create spam filters for Google Analytics
to be useful. Every site I've installed GA on has required me to filter out
spam. I don't understand why something isn't done about it at an engineering
level. If site owners can set up filters against spammers, is it really that
hard for Google to do it? Especially since they can see it across their
accounts. Seems like it's the same type of issue that plagues email, yet
Google seems to have that under control.

~~~
throwawaydbfif
You can get around this with some fairly simple hacks. Write some JavaScript
that evals a part of your page or something crazy like loading part of itself
from a Rot13 text file. Have this JS generate an ID you can identify as 'real'
or 'fake'. Filter your analytics by this ID. If you want to be extra funny
make real and fake IDs look indistinguishable to human eyes.

99.9% of spammers are too lazy to spend any time figuring this out for a
single site, and their tools won't even tell them spam isn't working. I've
gotten away with adding a simple static ID to everything; except for really
large, juicy targets, spammers don't even waste time on this.

All of my sites get zero spam with this filter
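
A minimal sketch of the static-ID variant, assuming analytics.js and a custom dimension created at index 1. The index, the token value, and the helper names `tagHits` and `isTaggedHit` are made up for illustration. The idea is that a view filter then includes only hits whose dimension matches the token.

```javascript
// Hypothetical token; anything the spam tools won't guess will do.
const SITE_TOKEN = 'a7f3-real';

// Attach the token to every hit this tracker sends.
// `ga` is the analytics.js command function; 'dimension1' assumes a
// custom dimension exists at index 1 in the GA admin.
function tagHits(ga, token) {
  ga('set', 'dimension1', token);
  ga('send', 'pageview');
}

// The check an include filter on that dimension would apply,
// written out so the logic is visible.
function isTaggedHit(hit, token) {
  return hit.dimension1 === token;
}

// In the browser: tagHits(window.ga, SITE_TOKEN);
```

Spam sent straight to the collection endpoint without running the page's JavaScript never carries the token, so it falls outside the filtered view.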

~~~
joshdance
Can you elaborate on that? How do you make the real and fake ids?

~~~
throwawaydbfif
The spammer bots are so dumb that using anything besides default "pageview"
events seems to work

------
kristianc
Echoing what others are saying, I much prefer Google Tag Manager. Many clients
use a CMS which makes injecting dynamic variables into a page a bit of a pain
if it's not done via rules at runtime.

The Next Web has open-sourced its Google Tag Manager setup
([https://github.com/thenextweb/gtm](https://github.com/thenextweb/gtm)),
which has things like Scroll Tracking, Engagement Tracking (riveted.js),
Outbound Link Tracking and lots of other things that are not in the default GA
setup. They have recently added support for AMP.

In my experience it allows clients to get up and running with a useful GA
setup in a couple of hours and means that you as a developer don't get
bothered to make trivial changes.

~~~
aorth

> Scroll Tracking, Engagement Tracking (riveted.js), Outbound Link Tracking
> and lots of other things that are not in the default GA setup.

I understand why a site owner would want those things, but as a user it is
terrifying! This is why I run an ad blocker.

~~~
opdev8
Tracking & analytics are great, but respect for user privacy should be
greater. The current state of analytics is too intrusive to user privacy.
Hence we decided to get rid of all analytics from our website
([https://ipapi.co](https://ipapi.co)). </Shameless plug>

------
caleblloyd
With the surge in Ad Blocking recently, part of me wonders how accurate the
Google Analytics JavaScript tracker is today, and how accurate it will be in 5
years. I wonder if we'll see a trend back to server-side analytics soon.

~~~
coderdude
The percentage of visitors with an ad blocker depends on your site's audience.
Outside of computer geeks and gamers, almost no one uses ad blockers. I
wouldn't buy into the hype that the whole world is installing ad blockers.

~~~
f4rker
This is correct. The vast vast majority of people are not using adblock.

~~~
achairapart
Some time ago I read an interesting interview[1] with the Economist deputy
editor, Tom Standage, saying things like:

> The other thing about ads is that 41 percent of millennials are using ad
> block. My daughter has ad block and she goes around infecting every machine
> she gets to. She puts it on everything.

> But the other thing is that she lives in incognito mode. She’s a total
> nightmare for advertisers, because she’s not leaving any cookies and she’s
> not seeing any ads.

Digital privacy is an undeniable rising trend. Just stating _the vast majority
of people are not using adblock_ is, at minimum, shortsighted.

[1]: [http://www.niemanlab.org/2015/04/the-economists-tom-standage...](http://www.niemanlab.org/2015/04/the-economists-tom-standage-on-digital-strategy-and-the-limits-of-a-model-based-on-advertising/)

~~~
ionised
> But the other thing is that she lives in incognito mode. She’s a total
> nightmare for advertisers, because she’s not leaving any cookies and she’s
> not seeing any ads.

Seems like she has the right idea to be honest.

------
Sir_Cmpwn
Please don't contribute to Google's tracking dominance over the web. How
insane is it that one company runs their javascript on 90% of the web?

~~~
chishaku
What are the best alternatives?

~~~
Sir_Cmpwn
piwik or just grep your nginx logs.
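
A rough sketch of the log-based approach, assuming nginx's default "combined" access log format. The function name `countPageViews` is made up, and the rule for what counts as a page view (a GET answered with 200) is a simplification.

```javascript
// Matches the request and status fields of a "combined" log line, e.g.
//   ... "GET /about HTTP/1.1" 200 512 ...
const REQUEST = /"(GET|POST|HEAD) (\S+) [^"]*" (\d{3}) /;

// Count successful GET requests per path. Static assets and bots are
// not filtered out here; a real report would need to handle both.
function countPageViews(logText) {
  const counts = new Map();
  for (const line of logText.split('\n')) {
    const m = REQUEST.exec(line);
    if (!m) continue;
    const [, method, path, status] = m;
    if (method !== 'GET' || status !== '200') continue;
    counts.set(path, (counts.get(path) || 0) + 1);
  }
  return counts;
}
```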

~~~
rhizome
Grep? Is that something I need a sysadmin for? /s

------
tombrossman
Remember that Google's T&Cs
[https://www.google.com/analytics/terms/us.html](https://www.google.com/analytics/terms/us.html)
(section 7, 'Privacy') make it mandatory to disclose to visitors that your
site uses Google Analytics. I don't see a privacy policy on this Google
employee's page, but perhaps they have a special exemption?

Anyhow, for many websites you'll get more accurate traffic data with GoAccess
parsing your logs and showing you page views and basic demographic data. Use
it alongside Google Analytics if you must, to see the exact difference between
what Google tells you your page views were versus what your server tells you.

~~~
peterhartree
> for many websites you'll get more accurate traffic data with GoAccess
> parsing your logs and showing you page views and basic demographic data

Yes but remember that bot traffic may be more of an issue when analysing
server side logs (a lot of bots still don't execute JavaScript).

It's hard to know how effective the bot filtering features in GoAccess are
compared with those of Google Analytics.

~~~
pierrefar
> a lot of bots still don't execute JavaScript

I operate a service that measures this (see another comment on this
discussion), and all I'll say is you'll be very surprised how many bots
actually execute JS, especially stealth bots. You have to be careful either
way.

~~~
peterhartree
> all I'll say is you'll be very surprised how many bots actually execute JS

Interesting. Do you have any numbers you can share?

~~~
pierrefar
I don't have access to the raw log files from the customers, so can't give you
a percentage. All I'll say confidently is that my service processes a lot of
bot traffic that needs to be filtered out before reporting.

BTW, are you the same Peter Hartree on this Segment thread?
[https://community.segment.com/t/1889n1/how-common-is-client-...](https://community.segment.com/t/1889n1/how-common-is-client-side-tracker-blocking-how-can-i-find-the-percentage-of-visitors-that-block-google-analytics-on-my-website)
It would appear we've crossed paths before on this
topic. Please do email me if you want to talk properly. That Segment thread
has my email.

------
largehotcoffee
Not many people know about this feature of GA, but add the following to
anonymize your users' IP addresses before sending the information to Google.

> ga('set', 'anonymizeIp', true);

~~~
pdkl95
> anonymize your users IP addresses before sending the information to Google

That's a nice placebo that does almost nothing. Even if the packet body
doesn't contain the IP address, it's still available in the IP header's Source
Address field.

However, even if we assume Google - in a reversal of their general focus on
gathering as much data as possible - _doesn't_ recover the address from the
IP header, their own documentation[1] for analytics collection URLs with the
&aip=1 parameter (which should be present when 'anonymizeIp' is true) says:

    
    
        "... the last octet of the user IP address
         is set to zero ..."
    

Zeroing the least interesting 8 bits of the address doesn't make it anonymous.
They still get to record the ASN, and they are recording at _least_ 8 bits of
fingerprintable data from other sources. It should be _trivial_ to recover
mostly-unique users, and calling this "anonymization" is at best naive and,
for Google, an obvious lie.
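
To make the arithmetic concrete, here is a small sketch of what zeroing the last octet does to an IPv4 address. It only illustrates the masking described in the quoted documentation; it is not Google's code, and `maskLastOctet` is a made-up name.

```javascript
// Zero the last octet of a dotted-quad IPv4 address.
function maskLastOctet(ip) {
  const parts = ip.split('.');
  if (parts.length !== 4) throw new Error('expected an IPv4 address');
  parts[3] = '0';
  return parts.join('.');
}

// Two visitors on the same /24 end up with the same address,
// but the remaining 24 bits still identify the network.
console.log(maskLastOctet('203.0.113.77'));  // 203.0.113.0
console.log(maskLastOctet('203.0.113.201')); // 203.0.113.0
```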

Their documentation even betrays their intentions:

    
    
        "This feature is designed to help site owners comply
         with their own privacy policies or, in some countries,
         recommendations from local data protection authorities,
         which may prevent the storage of full
         IP address information."
    

Actually making the data anonymous isn't the goal. They just want a rubber-
stamp feature that lets them comply with the letter of the law.

[1]
[https://support.google.com/analytics/answer/2763052?hl=en](https://support.google.com/analytics/answer/2763052?hl=en)

------
sync
It looks like navigator.sendBeacon is not very well supported across browsers.
[1]

Is this really a good idea?

1: [https://developer.mozilla.org/en-US/docs/Web/API/Navigator/s...](https://developer.mozilla.org/en-US/docs/Web/API/Navigator/sendBeacon)

~~~
philipwalton
analytics.js falls back to the older methods in browsers that don't support it.
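
The general feature-detection pattern looks something like the sketch below. This is not the actual analytics.js source; `sendHit` and the `fallback` callback are hypothetical names, and the image-request fallback in the usage comment is just one of the older methods a library might use.

```javascript
// Send a hit with sendBeacon when the browser has it, otherwise hand
// the hit to a caller-supplied fallback. `nav` is the navigator object.
function sendHit(nav, url, payload, fallback) {
  if (nav && typeof nav.sendBeacon === 'function') {
    // sendBeacon returns false if the browser refuses to queue the data.
    if (nav.sendBeacon(url, payload)) return 'beacon';
  }
  fallback(url, payload);
  return 'fallback';
}

// In the browser, something like:
// sendHit(navigator, '/collect', 'v=1&t=pageview', (u, p) => {
//   new Image().src = u + '?' + p;
// });
```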

------
cyborgx7
Alternative title: The Spyware I Use on Every Site I Build

------
thomasthomas
Tag Manager is definitely preferable in my experience if you want to empower
non technical people such as marketing to make their own changes on the fly
without having to bother developers.

~~~
sjeanpierre
Yup, GTM is great until the folks in marketing add a script to the site,
without first testing it in preprod, that stops the app's UI from rendering.

------
niutech
Don't feed Google with your visitors' data, respect their privacy, use open
source Piwik instead.

------
ns8sl
What's the deal with stats delayed over 24 hours? Man, I hate that.

------
shostack
Beyond this info, I'd add my own suggestions from having spent a good portion
of my career digging around in GA...

- If you have multiple domains, sub domains, etc. make sure to spend plenty
of time reviewing the cross-domain setup documentation and test it thoroughly.

- If you have high volume, frequently do deep segmentations, use lots of
custom dimensions, etc., make sure you have a clear understanding of how
sampling in GA works, how to tell if you are being sampled, and find ways to
avoid it by pulling reports in different ways. Otherwise you can end up in a
situation where you are making decisions off of 0.3% of your traffic, and while
Google's sampling algorithm thinks it is fine, comparison against other data
sources often shows it is not.

- Make sure any reporting you do across things like GA vs. AdWords is done
with a clear understanding of how they each report on paid search. GA reports
on it by default on a last non-direct click basis. AdWords just counts
everything AdWords touches. This means that AdWords can give you a good sense
of where you are gaining traction, whereas GA can help you understand how it
works in conjunction with other touch points, and perhaps how you might change
the way you weight things and measure success.

- GTM is powerful and free, but with great power comes great responsibility.
Also, it can be a real PITA sometimes.

- Annotations are a highly underutilized tool in GA and can save you a lot of
headaches. I just wish there was a way to bulk import/export them via
spreadsheet or API.

- You can't currently create goal funnels from event-based conversions
(please Google add this!), but the workaround for the time being is to push
virtual page views at the same time as the event fires, and then create
funnels off of those.

- User stitching sounds awesome, but is actually much more limited than you'd
think from reading the overview. You need a separate view (which means your main
GA view you use can't segment for the stitched sessions for comparison--just
the new view which only contains the stitched users). And there's a 90 day
rolling data retention window, so you need some sort of export process if you
care about that data. Unfortunately, this is pretty important data if you have
lots of cross-device tracking issues.

- Depending on your volume, you can reach the hit limits of the free tier
pretty quickly if you start tracking a ton of events (since they all count as
hits). Here's a good overview [1] of what these limits are, how they work, and
what they mean for you. When I got the scary notification, Google was
exceptionally unhelpful in working with me to resolve the problem, despite
considerable ad spend. After reducing them to what we thought would be fine,
they were unable to assure me that our data would not be nuked, and basically
couldn't give me any real info beyond "this is the policy." Super frustrating.

- If you have good logging of events that tracks both server and client-side,
it is healthy to compare for variances monthly or quarterly. You'd expect
client-side tracking to break more often than server-side, but it is important
to see how much that can alter your numbers.

[1] [https://www.e-nor.com/blog/general/hit-count-in-google-analy...](https://www.e-nor.com/blog/general/hit-count-in-google-analytics)
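
The virtual page view workaround from the funnel point above can be sketched roughly as follows, assuming analytics.js. The helper name `trackFunnelStep` and the `/virtual/...` path scheme are made up for illustration.

```javascript
// Send an event and, alongside it, a virtual page view that a goal
// funnel can use as a step. `ga` is the analytics.js command function.
function trackFunnelStep(ga, category, action) {
  ga('send', 'event', category, action);
  ga('send', 'pageview', '/virtual/' + category + '/' + action);
}

// In the browser: trackFunnelStep(window.ga, 'signup', 'submitted-form');
```

Each call sends two hits, and the virtual page views show up in page view totals, so both the hit limits and the reporting views need to account for them.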

------
Roger_Jones
Filtering out GA sessions with the language of "C" (versus actual languages
like en-us, fr, etc.) goes a long way in filtering out GA spam.

This language code is 99% of the time associated with bots. I had one site
where 20% of all the sessions in a given month were such fake traffic!
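
Before adding such a filter, it may help to measure how much traffic it would remove. A small sketch, assuming a hypothetical export of session counts per language (the row shape and the name `languageShare` are made up):

```javascript
// `rows` is a hypothetical export: [{ language: 'en-us', sessions: 120 }, ...]
// Returns the fraction of all sessions reported with the given language.
function languageShare(rows, language) {
  const wanted = language.toLowerCase();
  let matched = 0;
  let total = 0;
  for (const row of rows) {
    total += row.sessions;
    if (row.language.toLowerCase() === wanted) matched += row.sessions;
  }
  return total === 0 ? 0 : matched / total;
}
```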

~~~
shostack
Isn't that a relatively easy thing for a spammer to change? Also, I'm seeing
some valid traffic coming in with that language (conversions and everything).

~~~
snowwrestler
Most won't bother

~~~
shostack
Sure, but even still, if I'm seeing valid traffic with this, then on its own
it isn't sufficient to use as a filter.

