

What to do when your site is "too large" for Google Analytics? - mwsherman
http://webmasters.stackexchange.com/questions/2141/site-too-large-to-officially-use-google-analytics

======
lmkg
Professional Google-certified web analyst here.

According to the terms of use I'm familiar with, linking an AdWords account to
your GA account, and spending money on it (even nominal sums like a buck
fifty) has been sufficient to be exempt from any hit limits on GA. Your
AdWords account does need to be explicitly linked to the GA account, and you
do need to be spending some non-zero amount of money, but they're easy to
connect and that's honestly still quite cheap for an analytics implementation.
Omniture SiteCatalyst will run you in the six figures per year, minimum. It's
possible that Google has changed its policy, as Google has a reputation for
being somewhat fickle as an analytics provider, but I haven't heard anything
via official channels.

That aside, only having your data update once a day should not be a problem.
There is surprisingly little value in data updating multiple times per day.
There's no reason to have data update more frequently than you can take action
upon it, and most sites of most companies move at the speed of a glacier.
There are a few business models where it is beneficial, for example news sites
can benefit from knowing what stories are really popular right this minute.
Besides, if you need data more often than once a day, you shouldn't be on
Google Analytics anyways. Average data delay is a few hours, but it's not
unheard of for data to be stale by a day or two.

------
jacquesm
I got the exact same email today, and had to read their terms of service to
know for sure I wasn't being had.

Apparently you need an adwords account 'in good standing', I'm not sure what
they mean by that because we do have an adwords account, and as far as I know
and can see it is in 'good standing', only we're not advertising any more
using google because we found that it isn't making enough money in return.

See:

<http://news.ycombinator.com/item?id=1593898>

I'd pay for analytics, it's a very useful service and I can't see the adwords
angle, after all with something as vague as 'good standing' as the deciding
factor how about simply charging for analytics?

I always figured that analytics was free because it provides google with
useful data as well.

Maybe it's time to get my old stats suite dusted off and re-installed, I don't
like operating against the TOS even if it is apparently something they'll let
you get away with.

~~~
tibbon
Good standing doesn't seem to be vague to me really. Your account isn't
violating some ToS, delinquent, etc... Most accounts are probably in good
standing.

I wouldn't worry about this too much. It doesn't cut you off, just limits it
to updating once per day (basically just a day behind). It's possible that your
business/company is dependent on information from hourly reporting in
realtime, and if that's the case you should probably augment GA with some
other paid service or roll your own BI internally.

~~~
jacquesm
I think it is a bit weird (and points to a fairly serious lack of 'Chinese
walls') that google would change the behavior of their analytics package based
on your account for a completely unrelated offering.

I see adwords, adsense and analytics (as well as search) as unrelated
offerings by google.

~~~
lmkg
Analytics has always been tied to AdWords. There used to be a page-view-per-
month hard cap that could be removed by linking to an active (spending $)
AdWords account. AdWords integration is also pushed pretty heavily in the GA
training material, and even in the certification exam. In the real world,
there are quite a lot of people whose only experience with GA is optimizing
CPC campaigns. Personally, I've never used it for that purpose, but Google
pushes them as a bundle.

~~~
liedra
That's curious, because I was just at a talk by Alma Whitten who said that
Google didn't use any of the GA data for profiling or ad-related stuff, and
simply provided it as a service. Did I misunderstand her, or, possibly, you?

~~~
124816
Alma Whitten might be uninformed. (Was this talk recorded? Do you have a link
to it?) There are some options in the account UI that, if enabled, allow
Google to do some analysis of your data.

Link directly to their help center:
<http://www.google.com/support/analytics/bin/answer.py?hl=en&answer=87515>

~~~
liedra
Ooh, okay, I think I see where the misunderstanding is: I was asking Alma (in
questioning time at a talk at the IFIP/PrimeLife summer school (
<http://www.cs.kau.se/IFIP-summerschool/program.html> ), which was
unfortunately not recorded) about website users' information, not the data of
the website itself. Though it's a pretty fine line there, isn't it!

I told her that I blocked google analytics because I didn't want it to find
out things about me for advertising etc. She responded saying that Google
doesn't do that, but now that I think about it she was probably saying that
Google doesn't profile _me specifically_, nothing about anonymised data...
yeah, now I'm not so sure. :)

------
drosenthal
I designed and built high-end web analytics software for years.

Sampling your data by visitor/cookie (not by page view) is the number one
thing to do that no one does. Just sample your data down by a factor of
10 or 100. The costs drop by a factor of 10 or 100, query speeds improve, and
the business value of the data drops by very, very little.

The only unfortunate prerequisite for this approach is that you need a
company/boss that understands that sampled numbers are OK even if they are not
exact. (So, explain, if you must, that your industry insider source tells you
that the notion of 'exact' in web analytics world is very loose indeed.)
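
A minimal sketch of what sampling by visitor rather than by page view could look like, assuming each hit carries a stable cookie or visitor ID. The names `keep_visitor`, `sample_hits`, and the `visitor_id` field are hypothetical and are not taken from any analytics product. Hashing the ID means a given visitor is either always kept or always dropped, so sessions stay intact:

```python
import hashlib


def keep_visitor(visitor_id: str, rate: float = 0.1) -> bool:
    """Return True if this visitor falls inside the sample.

    The decision depends only on the visitor ID, so every hit from the
    same visitor gets the same answer.
    """
    digest = hashlib.sha256(visitor_id.encode("utf-8")).digest()
    # Treat the first 8 bytes as an integer in [0, 2**64).
    bucket = int.from_bytes(digest[:8], "big")
    return bucket < rate * 2**64


def sample_hits(hits, rate=0.1):
    """Keep only the hits that belong to sampled visitors.

    `hits` is assumed to be an iterable of dicts with a "visitor_id" key.
    """
    return [h for h in hits if keep_visitor(h["visitor_id"], rate)]
```

With `rate=0.1` roughly one visitor in ten is kept, and counts from the sample are multiplied by ten to estimate the full totals.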

~~~
pierrefar
Yep. For queries with large volumes of data, Analytics does do sampling and
gives you error margins (+/- x%) next to each number. I found that having
these margins helps convince higher ups that it's OK, and also tells me when
it's not OK and I have to dig a bit more.
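
For a rough feel of where such margins come from, here is a sketch under a simple assumption: each visitor is kept independently with probability `rate`, and the margin is a normal-approximation 95% interval. This is not a description of how Analytics computes its own figures, and `scale_up` is a hypothetical helper name:

```python
import math


def scale_up(sampled_count: int, rate: float, z: float = 1.96):
    """Estimate the full count from a sampled count.

    Returns (estimate, relative_margin), where relative_margin is the
    +/- fraction around the estimate at roughly 95% confidence (z=1.96),
    assuming independent sampling at probability `rate`.
    """
    if sampled_count <= 0 or not 0.0 < rate <= 1.0:
        raise ValueError("need sampled_count > 0 and 0 < rate <= 1")
    estimate = sampled_count / rate
    relative_margin = z * math.sqrt((1.0 - rate) / sampled_count)
    return estimate, relative_margin
```

Under these assumptions, 10,000 sampled visitors at a 10% rate give an estimate of 100,000 with a margin a little under 2%. The margin shrinks as the sampled count grows, which is why small segments are the ones that need a closer look.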

~~~
124816
I think drosenthal is talking about sampling the number of hits you send to
analytics. GA's tracking api supports this:

<http://code.google.com/apis/analytics/docs/gaJS/gaJSApiBasicConfiguration.html#_gat.GA_Tracker_._setSampleRate>

------
papa
I've run into problems with hitting a number of limits on Analytics as well
and still have no good solution.

A couple of solutions I'm aware of:

1) Omniture Site Catalyst: A lot of big sites use this but it will cost you
dearly. My site runs about 85mm pvs/month and they quoted me close to a
million dollars a year at those traffic levels! I didn't even bother
negotiating it down, even the negotiated price would still have been way more
than I'm willing to pay.

2) Urchin: I don't know how similar/dissimilar Urchin is from Analytics, but
the Urchin 6 software runs about $3k. This option could prove labor intensive
since you would effectively be running and logging your own metrics on your
own server. So the actual costs could go up (in time, equipment, etc.)

There are plenty of smaller players in this space, but the problem that always
comes up is, "Will these guys be around in a year?". When looking for a
business solution, it's nice to know that your data won't up and disappear
should the company go bankrupt.

I do wish Google would provide a paid service for those of us that could use
it and would pay for it. I'm already using Google Doubleclick for Publishers
for ad serving (which costs money), you'd think they'd also want to expand
their other complementary offerings (like metrics).

~~~
landyman
Other services I've seen that seem to not fall into that "smaller players"
category: Coremetrics, Webtrends, Unica, OneStat, and Clicky. I've worked with
all of these (and some smaller players too), and I must say that for the
value, you really can't beat Google Analytics. Even Omniture will have some
times when the data is not available right away; especially as you start doing
some more complex things with it.

------
tcarnell
Show off on Hacker News?

On that theme does anybody know what the maximum limit of money is before your
bank account crashes? (or you have to start buying hotels instead of houses)?
:-)

~~~
bl4k
I once tried to deposit enough money in my bank account to cause a buffer
overflow. I could never get the shell code to exec though.

------
nostromo
Buy Urchin and run it yourself -- it's virtually identical to Google Analytics
(or at least used to be) because it's the company that Google acquired to make
GA.

Avoid Omniture SC. It has a terrible interface, is too expensive, harder than
GA to implement, and Omniture doesn't seem to be improving it much, if at all
anymore.

------
loup-vaillant
Do not use it?

More seriously, what does Google Analytics do that you can't do yourself when
you have complete access to the logs of your web site?

~~~
jasonkester
The reason that Google is choking on this 30M pageview/month site is because
analytics for a 30M pageview/month site is _hard_.

Things like unique visitor, unique referer, unique whatever tracking just
explode when you have to compare each IP address you get with a 20M-entry list.

If you try to do it on your webserver, you'll have a 3 hour window each day
where a significant portion of your CPU and Memory is not available for
serving web traffic. If you try to do it elsewhere, you get to ship 20gb daily
logfiles around.

It's just not any fun. Which is why even _Google_ has difficulty with it.

Running your own analytics is great for relatively small sites. For big ones
it starts to bog down once you hit a certain size.

~~~
124816
Most stats providers compute unique visitors using client-side cookies; not
entirely accurate, but quite cheap. You're right though, it's a lot easier to
use an existing (and free or nearly free) tool than to build your own.

~~~
loup-vaillant
I agree it's a lot easier. But Google Analytics has a drawback I can't accept:
several sites I visited make me process scripts from Analytics, and that takes
a noticeable amount of time and bandwidth. When someone visits my site, she's
giving me her time. I'd rather not waste it.

That, and the fact that I prefer to process my logs on my computers. As a
matter of principle (decentralization, privacy, freedom, blah blah).

Now, I barely know Analytics. But it looks cool, so I'll check, and see if I
can reap most of their benefits myself. (If I can't, I'll have a tough choice
indeed.)

~~~
124816
Yep, GA used to have the occasional problem with that, though I've been told
some of it was due to a bug in the browser (Firefox's) code that decided what
to put in the "Waiting for xyz..." spot -- if that's what you're referring to.

(Something about that bar not being updated until the first byte of the stream
was read -- so if a static.foo.com was hanging connections, it would get stuck
blaming whatever loaded right before it.)

GA now has asynchronous tracking though, which solves this pretty nicely.
Though I'm not a JS guy, my understanding of it is that instead of sourcing
the script and then issuing the tracking calls, you put some instructions into
a global variable, and then source the script. Nothing blocks on the script's
loading, so even if GA is down the rest of the page is supposed to load
correctly. Once the GA script loads, it looks at the global and runs the
tracking commands.
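
The queue-then-replay idea described above can be sketched in a language-neutral way. This is an illustration in Python of the pattern only, not GA's actual JavaScript; `Tracker`, `command_queue`, and `on_script_loaded` are hypothetical names:

```python
class Tracker:
    """Stand-in for the tracking library that loads later."""

    def __init__(self):
        self.sent = []

    def set_account(self, account_id):
        self.sent.append(("account", account_id))

    def track_pageview(self, path):
        self.sent.append(("pageview", path))


# The page fills this global before the library exists. Appending to a
# list cannot block, and nothing breaks if the library never arrives.
command_queue = []
command_queue.append(("set_account", "UA-XXXXX-X"))
command_queue.append(("track_pageview", "/home"))


def on_script_loaded(queue):
    """Run once the library has loaded: replay the queued commands in order."""
    tracker = Tracker()
    for name, *args in queue:
        getattr(tracker, name)(*args)
    queue.clear()
    return tracker


tracker = on_script_loaded(command_queue)
```

The page only ever touches the plain list, so its rendering does not depend on when, or whether, the tracking script finishes loading.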

~~~
loup-vaillant
I do use Firefox, so I probably hit that bug. Also, with NoScript, forbidding
GA causes no problem.

But still, I would have liked the analysis process to be completely invisible
from the client.

------
naturalized
We do not want Google to know the true size of our service, so we have
analytics code in every 5th pageview (when templates are constructed by
scripts, the script inserts the analytics code only when the UNIX time is
divisible by 5).
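
A minimal sketch of the check described here, assuming the template code can call a helper at render time. The name `should_include_analytics` and its parameters are hypothetical:

```python
import time


def should_include_analytics(now=None, modulus=5):
    """Return True when the current UNIX time (whole seconds) is divisible
    by `modulus`, which is the case for about one second in every `modulus`.
    """
    ts = int(time.time() if now is None else now)
    return ts % modulus == 0
```

Note that this selects by render time rather than by visitor, so a single visitor's page views are tracked on some requests and not on others.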

~~~
rodh257
though that would skew the referrals and other metrics wouldn't it?

~~~
oasisbob
If your numbers are large enough, no. As long as the selection isn't biased,
it doesn't matter.

Another commenter on this topic has a good post about convincing management of
the validity of sampled metrics. I'd read it, then tip your chair back and
stare at the ceiling for a while.

