
The Dangers of the “Google Analytics-Powered Startup” - redredredred
http://simontorring.com/google-analytics/
======
stdbrouw
The criticism of sampling in this blog post is way overblown: (1) there's
very little difference between the accuracy of a sample size of 10k and one
of 100k, and those are the sample sizes you're usually working with; (2) in
particular, when working with custom reports you get from the API, you can
actually specify the accuracy you want (high, normal, low) as part of your API
request.
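
To put rough numbers on point (1), here is a minimal sketch using the textbook margin-of-error formula for a proportion. It assumes a simple random sample and the worst case of a 50% share; the function name is just for illustration, and GA's real sampling is more involved than this.

```python
import math

def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Half-width of an approximate 95% confidence interval for a proportion."""
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

# Compare the two sample sizes mentioned above.
for n in (10_000, 100_000):
    print(f"n={n:>7,}: +/- {margin_of_error(n) * 100:.2f} percentage points")
```

Under those assumptions the interval shrinks from about ±1.0 to about ±0.3 percentage points, which rarely changes a decision.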

To boot, 9 times out of 10 what you're interested in as an analyst is not the
absolute numbers anyway, but ratios and trends. So whether or not e.g. every
precious little pageview gets counted is irrelevant, as long as the way it is
counted is stable over time.

This is endemic to so many discussions about big data: "if we don't have every
individual data point ever, all hell will break loose and we lose the ability
to make any sense of the data." Take an intro to basic probability and
statistics, will ya.

There are legitimate problems with using Google Analytics for a startup, but
they're mostly related to the fact that it doesn't provide good tooling around
A/B testing, customer lifecycle management and custom metrics – they're possible
but you're not making it easy on yourself. These things are the bread and butter
of SaaS and app analytics (as opposed to ecommerce or media) so it makes sense
to invest in something like Mixpanel/Heap/Keen/KISSMetrics. But those are
issues with Google Analytics as a product, not with the quality of its data.

~~~
crdb
Yes and no.

Exporting visit trends over 5 landing pages and a month? Sure.

Exporting page views for 100,000 products, each of which got 5-100 views? Then
that 5% sample is going to exclude most products. The latter approach is,
however, necessary if you're trying to determine how each product category is
really performing.
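
A back-of-the-envelope sketch of that effect, assuming (as a simplification) that each view independently has a 5% chance of landing in the sample. GA actually samples sessions rather than individual views, so treat the numbers as illustrative only.

```python
def chance_of_zero_sampled_views(total_views, sample_rate=0.05):
    """Probability that none of a product's views land in the sample."""
    return (1 - sample_rate) ** total_views

# Low, middle and high ends of the 5-100 views range.
for views in (5, 20, 100):
    missing = chance_of_zero_sampled_views(views)
    print(f"{views:>3} views: {missing:.1%} chance of vanishing from the report")
```

Under that assumption a product with 5 views is missing from the report about 77% of the time, while one with 100 views is missing less than 1% of the time, so the long tail is what gets hurt.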

Two alternatives I prefer to Google Analytics Premium (once you get to that
size): Webtrekk, a small but competent German company whose product costs
around 1/10th as much per year, has a fraction of the bugs, and does reliable
unsampled daily dumps (moving to hourly, I believe), although the UI is a
little less intuitive; and a self-hosted Piwik instance, so you don't need to
worry about data exports. The truth is modern relational databases are
incredibly powerful and will easily scale even with information like
impressions in onsite search. There are multi-TB instances of Postgres out
there. I really suggest installing either in parallel to GA or on their own
when you set up tracking.

I do agree with you that anybody involved in any kind of job that includes
"analytics" in the title, or indeed most people in management, should take an
intro stats course. I particularly like Introduction to Statistical Learning
because of its brevity, relatively high abstraction level, and lack of maths.

------
jordanthoms
Google Analytics is really designed for, and works well for, _websites_. If
your startup is a website, as opposed to an app or service which happens to be
on the web, then it's a good option.

For our web app, we use Mixpanel, along with tracking events into our own
database. This allows you to track custom events for the things that matter in
your app - think 'someone added something to the cart' or 'someone clicked the
reply button', not 'someone visited this page'.

Yes, Google Analytics does let you track custom events, but it's extremely
limited compared to Mixpanel, which lets you attach a properties object with as
many custom properties as you like to each event, and then do retroactive
analysis on them instantly.
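
For the "tracking events into our own database" half, a minimal sketch of the idea might look like this. The `track` and `count_events` helpers and the table layout are hypothetical, not Mixpanel's API; SQLite stands in for whatever database you actually use.

```python
import json
import sqlite3

# Hypothetical schema: one row per event, properties stored as JSON text.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (name TEXT, user_id TEXT, properties TEXT)")

def track(name, user_id, **properties):
    """Record one event with an arbitrary set of custom properties."""
    db.execute(
        "INSERT INTO events VALUES (?, ?, ?)",
        (name, user_id, json.dumps(properties)),
    )

def count_events(name, **filters):
    """Count stored events whose properties match every given filter."""
    rows = db.execute("SELECT properties FROM events WHERE name = ?", (name,))
    matches = 0
    for (raw,) in rows:
        props = json.loads(raw)
        if all(props.get(key) == value for key, value in filters.items()):
            matches += 1
    return matches

track("added_to_cart", "u1", product="mug", price=12.5)
track("added_to_cart", "u2", product="shirt", price=30)
track("clicked_reply", "u1", thread_id=42)

# Retroactive question, decided after the events were recorded.
print(count_events("added_to_cart"))
print(count_events("added_to_cart", product="mug"))
```

Because the raw events are kept, a filter you only think of later can still be applied to old data.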

------
mbesto
> _It’s a bit like how Gallup can summarise Indonesians’ smartphone habits by
> calling 1,500 of them; it works fine if you’re looking for a general
> pattern, but it might skew the data if you’re looking for data about a tiny
> niche of smartphone users or if Gallup happened to call up relatively too
> many Nokia users that day._

This is an incorrect statement and interpretation of how statistics work. Is
there a chance that the 1,500 Indonesians they call that day are not
representative of the overall population? Of course, but the probability of
that is very low. This concept is closely related to statistical
significance[0]. The conclusion is: sampling error can lead to incorrect
conclusions, but if you can eliminate any biases in your sampling, then it can
indeed be representative. Personally, the more important takeaway is this:
before you start deriving conclusions from your metrics, it's necessary to
fully grok basic statistics.

[0] -
[http://en.wikipedia.org/wiki/Statistical_significance](http://en.wikipedia.org/wiki/Statistical_significance)
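
To illustrate how low "very low" is, here is a rough sketch using the normal approximation. The 30% Nokia share is an assumed number for the sake of the example, and the calculation only holds for an unbiased random sample.

```python
import math

def chance_sample_is_off(true_share, sample_size, tolerance):
    """Two-sided normal-approximation probability that the sampled share
    misses the true share by more than `tolerance`."""
    standard_error = math.sqrt(true_share * (1 - true_share) / sample_size)
    z = tolerance / standard_error
    return math.erfc(z / math.sqrt(2))

# Assumed numbers: 30% of the population uses Nokia, 1,500 people are called.
print(chance_sample_is_off(0.30, 1500, tolerance=0.05))
print(chance_sample_is_off(0.30, 1500, tolerance=0.02))
```

Being off by more than 5 points comes out well below 1 in 10,000, while being off by more than 2 points is on the order of 1 in 10. The sample is fine for the general pattern and weaker for fine-grained claims.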

------
exelius
Completely agree. Sampling is fine; but you need to understand that the free
version of Google Analytics isn't a replacement for a proper BI tool. GA also
doesn't even do sampling until your traffic is over a certain threshold, so at
low traffic levels it's actually accurate (though those levels are also low
enough that you probably shouldn't try to draw too many insights).

This is not to say that GA is bad; in fact I would call you stupid if your web
startup went out and bought a tool before you launched. GA is free, it works
well, it's super-easy to implement, and people should use it; but once you've
reached the point where you can afford something better (usually in addition
to GA, not in replacement of) you should look into doing so.

~~~
redredredred
I think Google Analytics is much better than nothing, and until you reach a
certain scale (I used the estimate of ~1 million monthly sessions) I think
it's sufficient.

What I am mainly arguing is that relying completely on GA for reports and
performance measurements is dangerous and frustrating.

\- Simon

------
fiatjaf
I have various low traffic websites, and just because I hated having to go
through the burden of creating accounts for all of them in Google Analytics, I
wrote a very simple web analytics engine called Microanalytics[1].

It is a Couchapp (which means it only takes a CouchDB database to work, no
other server or backend) and it allows for emitting of custom events with a
simple `ma(event, [optional_value])`.

Every event is tied to a session, so later you can analyse and filter events
based on session, see exactly who did what on your site, see if the same user
came back on another day, and so on.

So, in my small websites I can clearly see when a person enters the site and
all that. Also, 1 visitor makes a difference, and when I tested having Google
Analytics alongside Microanalytics what I saw was that Google Analytics
statistics showed a lot more visitors than Microanalytics. I know
Microanalytics can't be wrong, because it literally counts me in real time
when I enter the site, so I don't know what to think. The only thing it
doesn't count is visitors without JavaScript, but are there that many of them?
Does Google Analytics count them? I think not.

\---

Also good to say: the way Microanalytics does data visualization is through a
command line tool that prints to STDOUT, so you can do all sorts of things with
Unix pipes. For example, for doing an A/B Test experiment once, I just called
`ma('version', versionName)` in each tested page, `ma('conversion',
'converted')` when appropriate, and later ran the following:

    
    
        for name in versionA versionB
            echo $name
            # sessions that saw this version
            set v (microanalytics identifier inspect sessions --limit 300 | grep $name | wc -l)
            # of those, sessions that also converted
            set c (microanalytics identifier inspect sessions --limit 300 | grep $name | grep converted | wc -l)
            # print visits, conversions and the conversion rate
            echo $v $c (echo "$c / $v" | bc -l)
            echo
        end
    

(This example is in the fish shell, but you can do the same in bash,
obviously.)
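
The same counting can be sketched in Python. This assumes, as the grep pipeline implies, that each session is printed on one line containing the version name and the word `converted`; the sample lines below are made up and are not the tool's real output format.

```python
def conversion_rates(session_lines, versions):
    """Return {version: (sessions, conversions, rate)} from text lines."""
    results = {}
    for version in versions:
        matching = [line for line in session_lines if version in line]
        converted = [line for line in matching if "converted" in line]
        # Guard against a version that never appears in the output.
        rate = len(converted) / len(matching) if matching else 0.0
        results[version] = (len(matching), len(converted), rate)
    return results

# Made-up lines standing in for the tool's session output.
sample_lines = [
    "session 1 version=versionA conversion=converted",
    "session 2 version=versionA",
    "session 3 version=versionB conversion=converted",
    "session 4 version=versionB conversion=converted",
]

for version, (sessions, conversions, rate) in conversion_rates(
    sample_lines, ["versionA", "versionB"]
).items():
    print(version, sessions, conversions, rate)
```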

[1]:
[https://github.com/fiatjaf/microanalytics](https://github.com/fiatjaf/microanalytics)

~~~
j_s
Awesome work!

You'd have to dig into the details (referer, user agent, etc.) on the Google
Analytics side to see the differences... GA probably tracks every random web
scraping (search engine) hit by monitoring the loading of the JavaScript file.

~~~
fiatjaf
Oh, I forgot they could monitor the loading of the tracking file!

So that's it, I can't do that in CouchDB.

------
harryf
What drives me nuts about Google Analytics is their foot-dragging on mobile.

Despite the mobile version of Google Analytics, trying to do anything
meaningful, like analysing app retention, is a huge pain.

All the newer players in mobile analytics, like Flurry or Localytics, have this
stuff nailed but Google Analytics leaves you with out-dated, web-oriented
reports like the "New vs. Returning" and "Loyalty" and generally prefers to
push you towards looking at sessions instead of actually understanding what
your users are doing.

OK time to stop here before I start to rant...

~~~
redredredred
I haven't explored the current functionality for mobile apps, but we're
currently working on multi-funnel analysis and it's a complete nightmare given
that most purchase journeys happen across more than one device. And that's
with GA Premium..

\- Simon

------
raverbashing
So, what happened to collecting traffic information yourself?

You know, from Apache logs and similar tools.

Yes, I know Google Analytics gives a lot of bells and whistles and tools and
whatnot.

But it's still YOUR website, YOUR data, YOUR traffic. YOU can get (most of)
the info that GA is sampling down.
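
As a rough sketch of what that looks like, the snippet below counts successful page requests per path from lines in Apache's "combined" log format. The sample lines are made up, and a raw count like this includes bots and is not the same thing as a GA pageview or session.

```python
import re
from collections import Counter

# Apache "combined" log format: host, identity, user, [time], "request",
# status, bytes, "referer", "user agent".
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def page_views(log_lines):
    """Count successful GET requests per path, skipping unparseable lines."""
    views = Counter()
    for line in log_lines:
        match = LOG_PATTERN.match(line)
        if match and match["method"] == "GET" and match["status"] == "200":
            views[match["path"]] += 1
    return views

# Made-up log lines for illustration.
sample_log = [
    '203.0.113.7 - - [10/Oct/2014:13:55:36 +0000] "GET /pricing HTTP/1.1" 200 2326 "http://example.com/" "Mozilla/5.0"',
    '203.0.113.8 - - [10/Oct/2014:13:56:01 +0000] "GET /pricing HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
    '203.0.113.9 - - [10/Oct/2014:13:57:12 +0000] "GET /missing HTTP/1.1" 404 512 "-" "Mozilla/5.0"',
]

print(page_views(sample_log).most_common())
```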

~~~
taf2
Collecting data from YOUR website costs time/money. Google Analytics is FREE
and takes little time/money to install.

It's true the data is there and we can get to it... but this takes some
foresight that having installed analytics does not require. e.g. a few months
after the original apache logs have rotated - you realize it would be nice to
know how many mobile safari users are coming to your site vs chrome android
users... because you're trying to determine the impact of releasing that next
feature that requires webrtc. You can add the analysis to future traffic and
give it a few extra weeks but wouldn't it be nice to know right now?
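
For what it's worth, the analysis itself is small once you have the User-Agent strings; the hard part is still having the logs. A very rough sketch, where the substring rules are simplifications and the sample strings are only illustrative:

```python
from collections import Counter

def browser_family(user_agent):
    """Very rough classification based on substrings of the User-Agent."""
    if "Android" in user_agent and "Chrome/" in user_agent:
        return "chrome_android"
    is_ios = "iPhone" in user_agent or "iPad" in user_agent
    if is_ios and "Safari/" in user_agent and "CriOS" not in user_agent:
        return "mobile_safari"
    return "other"

# Illustrative User-Agent strings of the kind found in access logs.
agents = [
    "Mozilla/5.0 (iPhone; CPU iPhone OS 8_0 like Mac OS X) AppleWebKit/600.1.4 "
    "(KHTML, like Gecko) Version/8.0 Mobile/12A365 Safari/600.1.4",
    "Mozilla/5.0 (Linux; Android 5.0; Nexus 5 Build/LRX21O) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/40.0.2214.89 Mobile Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0",
]

print(Counter(browser_family(agent) for agent in agents))
```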

GA is nice because it lets you avoid the investment in time/money and you can
still go back and look at historical data points you might not have imagined
being useful before...

I think in this way sampling is really not a big deal... if it is, you're right
- absolutely right. Collect the data points that require this level of accuracy
yourself... but I don't think that, if you need that level of accuracy early on,
you're focusing on the right things...

~~~
jlarocco
Not sure I agree with that.

There are, or used to be, plenty of tools for parsing the web logs, crunching
the numbers, and spitting out pretty graphs.

Again, maybe not as pretty as GA, and not as many bells and whistles, but good
enough.

~~~
taf2
But that can't be as easy as copy-pasting a snippet of JavaScript code into
your website, can it?

~~~
raverbashing
Oh wow, yeah, reading files in /var/log/httpd/ is soooo hard.

I agree that not all people have the capability to work with this (read:
front-end developers, WordPress customizers, etc.). Still, I mean, even cPanel
lets you download it from past months.

~~~
richardbrevig
cPanel has multiple log analyzers built in. So the user doesn't have to go
through that work. If your site is on a shared host and it uses cPanel, just
log into cPanel and you'll find them.

------
nlh
A quick spin through Segment.com's list of integrations shows a TON of GA-like
services. Anyone have any suggestions for an alternative to GA that provides
the same type of data without the sampling / bloat / etc of GA?

~~~
gk1
Hey Noah, an unfortunate side effect of GA's power is its steeper learning
curve. A new driver sitting in a Ferrari might consider all the paddle
shifters and gauges to be "bloat," but in the right hands the car is a beast
...

Don't get me wrong, I do wish Google spent more time improving GA, but I think
many of the alternatives sacrifice too much just to be beginner-friendly. What
we need is a Tesla for web analytics.

~~~
nlh
Agreed - I've used GA for many years and it's indeed extremely powerful
(especially for a free tool). And the tradeoff is fair, I think -- Google
gives us a ton of power to analyze site data in exchange for, well, Google
being able to analyze site data :) But it does take a bit of a kitchen-sink
approach to things.

I guess I should rephrase my question to be less broad -- of the companies
that purport to be competitive, do any of them do a particularly good job
(even if just at subsets of GA's features)?

------
kumarm
tl;dr: Google Analytics is a great tool and informs you exactly what it does
very well. You need to read what it does though.

I would say it's clickbait.

~~~
grey-area
Having dealt with lots of customers who take the stats generated by GA as
gospel despite multiple problems, I disagree.

Some example problems: referrals unreliable, countries unreliable, sampling
distorting figures, no warnings when data displayed is based on very little
data, sessions often misinterpreted as clicks by users, inexplicable
disparities with other methods of tracking because their methods are pretty
opaque and because of sampling, in-page analytics looking deceptively like
click-tracking when in most cases it uses page load data. Some of that you can
attribute to user error, but it is not good for the market that Google
dominates tracking like this, and GA is sometimes misleading.

Just as an example of a problem I ran into recently - the free GA doesn't
offer referral stats for https websites, but this isn't made clear to end
users. As a result they simply trust that referrals have collapsed if a
referring site switches to https.

~~~
gk1
The https referral issue isn't unique to GA, nor is it their fault. The
browser doesn't pass a referrer value for HTTPS->HTTP traffic. Best way around
this is to use HTTPS yourself, or use custom UTM tags on your link (if you
have any say).
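
A minimal sketch of tagging a link that way, using the standard `utm_source`, `utm_medium` and `utm_campaign` parameters. The helper name, URL and values are made up for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def add_utm_tags(url, source, medium, campaign):
    """Append utm_source/utm_medium/utm_campaign to a URL's query string."""
    parts = urlsplit(url)
    # Keep any query parameters the link already carries.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query += [
        ("utm_source", source),
        ("utm_medium", medium),
        ("utm_campaign", campaign),
    ]
    return urlunsplit(parts._replace(query=urlencode(query)))

print(add_utm_tags("http://example.com/pricing?plan=pro",
                   "partner-blog", "referral", "launch"))
```

Since the attribution travels in the URL itself, it survives even when the browser sends no referrer.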

~~~
grey-area
GA gives prominence to referrals in a world where more and more sites use
https. People are making real business decisions based on this flawed stat,
and most people have no idea that this is in fact a useless stat without https
if any of your referrers are secure pages. Google says website x was your top
referrer, and they just take that as truth.

I think they should just take it out and recommend using campaign links or
landing pages or at the very least make it very clear that this is only a
partial and distorted view. Same goes for in-page analytics etc.: the
presentation of overlays on page items (implying clicks) is misleading.

------
h1fra
The real issue I find with this is that startups with this kind of thinking
tend to put numbers before people or actual results. And most of the time, you
cannot reduce a problem or a solution to crunching numbers.

Once the breach is opened, it becomes a nightmare explaining to your coworkers
that a small drop in sessions probably does not call for a complete
restructuring of the website.

I'm currently working in this kind of startup, and doing a good job is actually
impossible, as we change everything every 3 months or so because a number
changed in analytics.

~~~
redredredred
Yeah that's exactly the kind of frustrations that I've experienced countless
times and hope to get better at combatting by exploring how GA really works
and how/where it's limited. Slowly getting there..

\- Simon

------
fndrplayer13
I loved the article. I think it tracks very much in line with the kind of
experiences that I have personally had in working with Google Analytics data
-- both via their reporting API and in the dashboard itself. I think GA is a
great service though, when the caveats are considered and understood. It's
great that you took the opportunity to state and consolidate this information
in one place.

I also just want to shout out that I work for a startup called Narrative
Science that offers a free product called Quill Engage
([https://quillengage.narrativescience.com/](https://quillengage.narrativescience.com/))
that can help identify some of the key insights from your GA data in a free
weekly and monthly automated report.

------
BradRuderman
I agree with most of the points. We use GA for general traffic patterns,
definitely not conversion tracking. It is horrible at true conversion
tracking.

~~~
gk1
What kind of conversions are you trying to track? Although it takes some
configuration, I've had no problems tracking all kinds of conversions for
different sites--ecommerce, SaaS, leadgen, etc.

~~~
jordanthoms
The worst thing about GA for conversion tracking is it's not retroactive, so
you have to go through a slow configure/wait a few days/see if the data looks
right/reconfigure cycle.

------
totoroisalive
Not free: you're giving your users to Google's massive tracking machine.

