
We ditched Google Analytics - felipebueno
https://spideroak.com/articles/yeah-we-ditched-google
======
eponeponepon

        It took us only a few weeks to write our home-brew 
        analytics package. Nothing super fancy yet now we have 
        an internal dashboard that shows the entire company much
        of what we used analytics for anyway - and with some 
        nice integration with some of our other systems too.
    

I never quite grasp how the above isn't just a matter of intuition to anyone
working in the tech sector. Google Analytics thrives on developers' laziness
in my opinion.

And to echo other posters: SpiderOak deserve thanks. If I find myself with any
need for a service like theirs, I know I'll be looking at them.

~~~
rplnt
Aren't there self-hosted analytics anyway? Piwik[1] comes to mind first, but
I'm sure there are many.

1\. [https://piwik.org/](https://piwik.org/)

~~~
Albright
Unsurprisingly, Wikipedia has a list:
[https://en.wikipedia.org/wiki/List_of_web_analytics_software](https://en.wikipedia.org/wiki/List_of_web_analytics_software)

~~~
nateguchi
Who makes these lists?!

~~~
blowski
Well you can see the list of users here:

[https://en.wikipedia.org/w/index.php?title=List_of_web_analy...](https://en.wikipedia.org/w/index.php?title=List_of_web_analytics_software&action=history)

------
Veratyr
Not strictly on topic so I apologise if this is unwanted but I thought I'd
share my experience with SpiderOak in case anyone here was thinking of
purchasing one of their plans.

In February SpiderOak dropped its pricing to $12/month for 1TB of data. Having
several hundred gigabytes of photos to back up, I took advantage and bought a
year long subscription ($129). I had access to a symmetric gigabit fibre
connection so I connected, set up the SpiderOak client and started uploading.

However I noticed something odd. According to my Mac's activity monitor,
SpiderOak was only uploading in short bursts [0] of ~2MB/s. I did some test
uploads to other services (Google Drive, Amazon) to verify that things were
fine with my connection (they were) and then contacted support (Feb 10).

What followed was nearly __6 months__ of "support", first claiming that it
might be a server side issue and moving me "to a new host" (Feb 17) then when
that didn't resolve my issue, they ignored me for a couple of months then
handed me over to an engineer (Apr 28) who told me:

"we may have your uploads running at the maximum speed we can offer you at the
moment. Additional changes to storage network configuration will not improve
the situation much. There is an overhead limitation when the client encrypts,
deduplicates, and compresses the files you are uploading"

At this point I ran a basic test (cat /dev/urandom | gzip -c | openssl enc
-aes-256-cbc -pass pass:spideroak | pv | shasum -a 256 > /dev/zero) that
showed my laptop was easily capable of hashing and encrypting the data much
faster than SpiderOak was handling it (Apr 30) after which I was simply
ignored for a full month until I opened another ticket asking for a refund
(Jul 9).
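
For anyone who wants to repeat this kind of check without the shell tools, here is a rough Python sketch. It is a stand-in under stated assumptions, not the test above: it only compresses and hashes random data (the standard library has no AES, so the encryption stage is left out), and the function name and data sizes are made up for illustration.

```python
import hashlib
import os
import time
import zlib


def measure_throughput(total_bytes=8 * 1024 * 1024, chunk_size=1024 * 1024):
    """Compress and hash random data; return throughput in MB/s.

    Hypothetical helper: it mirrors only the gzip and shasum stages of the
    shell pipeline. The AES stage is omitted (no AES in the stdlib).
    """
    compressor = zlib.compressobj()
    digest = hashlib.sha256()
    processed = 0
    start = time.perf_counter()
    while processed < total_bytes:
        chunk = os.urandom(chunk_size)  # stand-in for /dev/urandom
        digest.update(compressor.compress(chunk))
        processed += len(chunk)
    digest.update(compressor.flush())  # hash whatever the compressor buffered
    elapsed = time.perf_counter() - start
    return processed / elapsed / 1e6


if __name__ == "__main__":
    print(f"{measure_throughput():.1f} MB/s")
```

If the number printed is far above the observed upload rate, local CPU work is unlikely to be the bottleneck, which is the same argument the shell test makes.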

I really love the idea of secure, private storage but SpiderOak's client is
barely functional and their customer support is rather bad.

[0]: [http://i.imgur.com/XEvhIop.png](http://i.imgur.com/XEvhIop.png)

~~~
Someone1234
Many of these types of services seem to intentionally cap upload speeds to
reduce their potential storage liability (since they're likely over-selling
storage to be able to offer 1 TB for $12 with the level of redundancy,
staffing costs, etc, needed).

I wonder if that is happening in this specific case? Although if it were the
case the vendor should still be honest about it. Just saying they limit
uploads to 2 Mbps is better than giving the run-around.

~~~
toomuchtodo
> reduce their potential storage liability

It's to reduce the maximum bandwidth capacity they require. I don't see it as a
problem, considering their price points. They're selling you storage, not
"slam 1TB of your data into our storage system in a day". If you're looking
for that, ship a hard drive to Iron Mountain.

EDIT: Even AWS limits how fast you can upload to S3, _and built an appliance
for you to rent and ship back and forth if you need to move data faster_. That
station wagon full of tape is still alive and well.

~~~
X-Istence
The appliance is so that you don't need to send terabytes of data over a 10
Gbit/sec connection for example to their datacenter.

The limitation is actually the pipe that connects you to Amazon, not an
inherent limitation within S3 or other services within Amazon on connection
speed. If you have a good enough connection, or peering with Amazon things go
amazingly fast.

When I worked at an ISP, we slammed about 20 Gbit/sec into S3 without issues,
but even then data we were backing up -- about 300 TB of data a day -- at that
rate took 1.4 days to upload to the cloud, so we ended up backing it up in-
house instead. (we needed to store the data for 7 days, after that it went bye
bye).
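
The 1.4-day figure can be checked with a few lines of arithmetic. This sketch assumes decimal units (1 TB = 10^12 bytes, 1 Gbit = 10^9 bits) and a link that stays fully saturated; the function name is just for illustration.

```python
def upload_days(data_terabytes, link_gbit_per_s):
    """Days needed to push data_terabytes through a saturated link."""
    bits = data_terabytes * 1e12 * 8            # terabytes -> bits
    seconds = bits / (link_gbit_per_s * 1e9)    # bits / (bits per second)
    return seconds / 86400                      # seconds -> days


print(round(upload_days(300, 20), 2))  # 1.39
```

Since each day produces another 300 TB, any result above 1.0 means the upload can never catch up at that rate.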

~~~
toomuchtodo
> When I worked at an ISP, we slammed about 20 Gbit/sec into S3 without
> issues, but even then data we were backing up -- about 300 TB of data a day
> -- at that rate took 1.4 days to upload to the cloud, so we ended up backing
> it up in-house instead. (we needed to store the data for 7 days, after that
> it went bye bye).

Seems like the perfect usecase for S3; inbound transfer is free, and you're
only paying for a rolling 7 day window of storage with lifecycle rules :/
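
A rolling 7-day window like this is normally expressed as an expiration rule. The sketch below only builds the configuration; the rule ID and bucket name are hypothetical, and applying it assumes boto3 is installed with credentials configured.

```python
import json


def seven_day_expiry_rule(prefix=""):
    """Build an S3 lifecycle configuration that expires objects after 7 days."""
    return {
        "Rules": [
            {
                "ID": "expire-after-7-days",   # hypothetical rule name
                "Filter": {"Prefix": prefix},  # empty prefix = whole bucket
                "Status": "Enabled",
                "Expiration": {"Days": 7},
            }
        ]
    }


config = seven_day_expiry_rule()
print(json.dumps(config, indent=2))

# With boto3 available, the rule would be applied roughly like this
# (bucket name is a placeholder):
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="my-backup-bucket", LifecycleConfiguration=config)
```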

------
buro9
Why not move to pushing GA data server-side?

Trivial to set-up, immune to adblockers affecting the completeness of data,
prevents the write of tracking cookies, leaves data and utility of the GA
dashboard mostly complete (loss of user client capabilities and some session-
based metrics).

This is the route I'm preferring to take (being applied this Christmas via
[https://pypi.python.org/pypi/pyga](https://pypi.python.org/pypi/pyga) ).

One may argue that Google will still be aware of page views, but the argument
presented in the article is constructed around the use of the tracking cookie
and that would no longer apply.

I'm shifting to server-push to restore completeness; I'm presently estimating
that client-side GA represents barely 25% of my page views (according to a
quick analysis of server logs for a 24hr period). I'm looking to get the
insight of how my site is used rather than capabilities of the client, so this
works for what I want.

~~~
JupiterMoon
People don't care about the cookie or any of the details of the
implementation. They care about being tracked across the whole internet. If
you are still contributing to that then you are disrespecting your customers.
I hope that I am not one of them.

~~~
buro9
Did you read my comment?

You will have no GA cookie from any of my sites, I am not recording client
identifying things or capabilities. It is a server-side push of GA and avoids
all client-side interactions.

It is merely, "A page has been viewed, this one: /foo/bar?bash".

There's nothing in there that is tracking you. I'm not even embracing the
session management aspect.

I get to use the tool that is best-in-class, in a way that lacks capability to
track you.
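
As a rough illustration of a page-view-only hit, here is what the request body could look like in Universal Analytics Measurement Protocol form. This is a sketch under assumptions, not the pyga code path: that protocol requires a `cid` field, so the sketch sends one fixed placeholder value for every hit (so it says something about the server, not the visitor), and the tracking ID is a dummy. Nothing is actually sent.

```python
from urllib.parse import urlencode

GA_ENDPOINT = "https://www.google-analytics.com/collect"
# Assumption: the same constant for every hit, so it cannot tell visitors apart.
FIXED_CID = "00000000-0000-4000-8000-000000000000"


def pageview_payload(tracking_id, path):
    """Return the form-encoded body for a bare page-view hit."""
    return urlencode({
        "v": "1",            # protocol version
        "tid": tracking_id,  # GA property ID (dummy value below)
        "cid": FIXED_CID,    # required field; constant, not per-visitor
        "t": "pageview",     # hit type
        "dp": path,          # the page that was viewed
    })


body = pageview_payload("UA-XXXXX-Y", "/foo/bar?bash")
print(body)
# Sending would be an HTTP POST of `body` to GA_ENDPOINT from the server.
```

No user agent, IP override, cookie, or referrer appears in the payload; the only request-specific value is the path.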

~~~
huhtenberg
Without any "client identifying things" how would GA be able to chain several
page hits into a session then? That is, do basic visits vs. hits split.

If you are in fact anonymizing everything about a client as you claim you do,
then it won't be able to. Unless, of course, you are feeding GA some opaque
client ID that you then internally map to and from actual clients that hit
your server. However something tells me that you aren't doing that, or you
would've mentioned it already.
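
To make the opaque-ID idea concrete, a minimal sketch follows. It is only an illustration with hypothetical names; nothing in this thread says anyone is actually doing this. The mapping lives in memory on the origin server, and only the random token would ever be handed to the third party.

```python
import uuid


class OpaqueIdMap:
    """Map real client keys to random opaque IDs and back, in memory only."""

    def __init__(self):
        self._to_opaque = {}
        self._to_client = {}

    def opaque_id(self, client_key):
        """Return a stable random token for this client, creating it if needed."""
        if client_key not in self._to_opaque:
            token = uuid.uuid4().hex  # random, carries no client information
            self._to_opaque[client_key] = token
            self._to_client[token] = client_key
        return self._to_opaque[client_key]

    def client_for(self, token):
        """Reverse lookup, possible only on the server that holds the map."""
        return self._to_client.get(token)


ids = OpaqueIdMap()
token = ids.opaque_id("203.0.113.5|session-abc")  # made-up client key
print(token == ids.opaque_id("203.0.113.5|session-abc"))  # True
```

With such a token, 10 hits from one visitor share an ID while 10 bounced visitors get 10 different ones, which is exactly the distinction that is lost when no ID is sent at all.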

(edit) I re-read your comment. You aren't apparently interested in session
counts. But what good is the GA summary then if you can't tell 10 bounced
visitors from one visitor with 10 hits? This makes no sense. If you want to
look at just page hit numbers, there are dramatically simpler ways to do that.

~~~
buro9
In the test I've done, sending no session/user data over, I lose all sense of
a "session".

But I do retain insight into what content has been viewed, how much, what is
rising and falling, etc.

The question really is what info are you really reporting on? AdBlockers make
us blind and tracking is horrible, but I get to have a far more complete view
over the simple stuff Urchin used to be great at.

~~~
huhtenberg
Ah, so you _are_ passing some client IDs over to GA after all. An IP address
perhaps? You know that's a leading question, right?

Incidentally, I ran a similar experiment with gaug.es a few years ago - pulled
on their tracking API from our server side. While it worked as expected, these
sorts of shenanigans are good for only one thing - hiding the fact that you are
using 3rd party analytics from your visitors.

On a more general note - the thing is that you either care about other
people's privacy or you don't. It's not a grayscale, it's binary. And if you
do, there's no place for GA in the picture.

~~~
buro9
No.

I am not passing IP. I am not passing a client-id. I am not passing any kind
of correlation identifier from which a session can be inferred or created. I
am not passing user-agent information. I am not passing a cookie ID.

I am only passing a page view event. "Page /foo/bar?bash has been viewed".

Take a look here:
[https://code.google.com/p/serversidegoogleanalytics/](https://code.google.com/p/serversidegoogleanalytics/)

Tell me where in that example (mine is similar) you see any client identifying
information.

There is none. If GA deduces anything, it will be a property of my origin
server and not a client.

I do not agree that using GA in the way I have described allows Google to
invade privacy at all. Please explain clearly how it does in your opinion.

~~~
ccozan
But isn't that the same kind of data you could extract from Apache logs? From
what you describe, it is basically a log of all your requests.

GA has many uses: mainly following users to see the funnel they go through,
and secondly monitoring marketing campaigns. If you don't need those, then
Apache logs + webalyzer is perfect.
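
For a sense of how little is needed for plain page-view counts, here is a minimal sketch that tallies successful requests per path from access-log lines. It assumes the Common Log Format; the sample lines and addresses are invented, and real logs would also need bot and asset filtering.

```python
import re
from collections import Counter

# Matches the request and status part of a Common Log Format line, e.g.
# ... "GET /foo HTTP/1.1" 200 2326
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD) (\S+) [^"]*" (\d{3}) ')


def count_page_views(lines):
    """Count successful (2xx) requests per path, ignoring client details."""
    views = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if match and match.group(2).startswith("2"):
            views[match.group(1)] += 1
    return views


sample = [
    '203.0.113.5 - - [08/Dec/2015:10:00:00 +0000] "GET /foo/bar HTTP/1.1" 200 512',
    '203.0.113.9 - - [08/Dec/2015:10:00:05 +0000] "GET /foo/bar HTTP/1.1" 200 512',
    '203.0.113.9 - - [08/Dec/2015:10:00:09 +0000] "GET /missing HTTP/1.1" 404 128',
]
print(count_page_views(sample).most_common())  # [('/foo/bar', 2)]
```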

~~~
buro9
I persist with GA, because every now and then I work with partners who would
like to verify the activity on my websites (and yes my user agreements and
privacy policy allow this) and have a means to compare this with historical
data or data from other sites.

Those partners frustrate me, in that they won't trust me to provide stats
generated from server logs, but they all default trust GA.

This technique allows me to use GA, produce the view of the content they need,
export the PDF, and share that... and they trust it.

GA is the de facto store of trusted data when it comes to web site activity.
For my sites that is tracking content page views.

------
oneJob
How about open-sourcing your product before worrying about improving other
products? SpiderOak has been "investigating a number of licensing options, and
do expect to make the SpiderOak client code open source in the not-distant
future" for a very, very long time now. It's no trivial thing to have a closed
source client for a "zero knowledge" service.

[https://spideroak.com/faq/why-isnt-spideroak-open-source-
yet...](https://spideroak.com/faq/why-isnt-spideroak-open-source-yet-when-
will-it-be)

EDIT: I'd welcome discussion, in addition to your up/down votes

~~~
prajjwal
I came here for this exact thing. They said they were going to go open source
in 2014 IIRC, and failed to deliver. I have stopped using SpiderOak - how am I
supposed to trust them with my most private files when I can't verify that
they're not doing anything shady on my machine?

The opening line of this post is amusing. They ought to give thought to fixing
their core product first.

------
cm2187
The other thing is that google analytics is on many adblockers lists,
precisely for that reason. As adblockers are getting widespread, the analytics
is going blind.

~~~
Albright
I've been running a blocker to block GA and other junk on my PC, but I imagine
I'm in a statistically insignificant minority. And I still can't block them on
my iPhone unless I disable JavaScript entirely (though I'm running iOS 9, I'm
not able to install a blocker for some reason; I guess Apple arbitrarily
doesn't support them on my older iPhone model).

~~~
djrogers
>I guess Apple arbitrarily doesn't support them on my older iPhone

It's not arbitrary - it requires a 64 bit CPU (of which Apple has now shipped
3 generations).

~~~
Albright
Ah, is that the differentiator? I see. Still strikes me as somewhat arbitrary,
though - is content blocking such a strenuous task that it requires a 64-bit
CPU? Wouldn't using a blocker cause the CPU to do _less_ work in most cases
since it doesn't have to download so many ad media files or execute as much
JavaScript?

Yeah, I guess it's just time to get a friggin' new phone already, but this one
ain't broke yet, ya know?

------
nateberkopec
An open-source, self-hostable solution providing 80% of common Google
Analytics functionality seems doable to me.

Is there anything out there in this realm? If not, why not?

~~~
dataewan
[http://snowplowanalytics.com/](http://snowplowanalytics.com/) is worth
considering if you have larger volumes of traffic

~~~
nkuttler
Why not for low volume sites?

~~~
pg1
Well, it takes a lot of time to set up, and it needs a few extra servers: an
event collector, log cleanup and enrichment, data loading, and a database
server.

------
trebor
To any of the SpiderOak team: thank you.

It's more than just the tracking cookie, though. It's also about Google
aggregating all its website data into a unified profile. The data they have on
everyone is frightening—all because of free services like GA.

~~~
tajen
Yes, thank you SpiderOak, even though I don't use you: high-profile companies
quitting GA means we become aware of alternative solutions. Today, I've learnt
about [http://piwik.org](http://piwik.org) .

------
c0achmcguirk
Spideroak user here. I stopped using Dropbox and started using Spideroak about
18 months ago. I really like the product. It's not as good as Dropbox in
some ways (like automatically syncing photos from my phone) but it really is
easy to use. I still have a mobile client on Android and I can keep my files
in sync across multiple computers. I pay for the larger storage size and I'm
not even close to using it all.

It syncs fast too. Just thought I'd share my experience with people.

------
eljimmy
Is it just me or is this a click-bait title with hollow content?

~~~
sparkzilla
It is. It's no big deal to stop using Google Analytics. It is, however, a big
deal not to use Google Search, something I am considering for my company.

------
lukeqsee
> Like lots of other companies with high traffic websites, we are a technology
> company; one with a deep team of software developer expertise. It took us
> only a few weeks to write our home-brew analytics package.

I'm a little curious why they decided to go this route instead of using one of
the open-source solutions. Aren't there good solutions to this problem
already?

~~~
BinaryIdiot
I was curious as well and just assumed the usual NIH (not invented here)
syndrome. Web analytics was already mature before Google bought Urchin and
turned it into Google Analytics. Since that time countless open source
projects have sprung up (Piwik was the first that came to mind). Googling for
open source alternatives brings up thousands of pages of projects.

Writing your own is easy for the basic stuff. When you want to move beyond the
basics, as SpiderOak will find, it becomes much more difficult.

------
rogeryu
I'm doing my part. I'm moving to DuckDuckGo for searching more and more. It's
a process. Google does have better results. For work I still rely on Google,
for private stuff I use [https://duckduckgo.com/](https://duckduckgo.com/)

And for the sake of ducks, I'm eating less meat as well. No more chicken - too
much antibiotics, and as little meat as possible, only when it's worth it, so
great taste and good quality.

~~~
Albright
I'm a big DDG fan too. I don't really notice their results being "worse" than
Google's (but maybe that's just because I haven't used Google for so long).
The Bang feature is also very handy once you get in the habit of remembering
to use it. [https://duckduckgo.com/bang](https://duckduckgo.com/bang)

------
kordless
> Sadly, we didn’t like the answer to that question. “Yes, by using Google
> Analytics, we are furthering the erosion of privacy on the web.”

The only thing "wrong" with using an analytics service to better understand
your customers is that it places all knowledge of visits, including ones that
wished to be private, in a centralized location. This can be useful in
providing correlation data across all visitors in aggregate, such as which
browser you should make sure your site supports most of the time.

In other words, there exists some data in aggregate that is valuable to all of
us, but the cost is a loss of privacy for smaller sets of personal data.

If individuals don't want certain behaviors analyzed by others, then they
shouldn't use centralized services which exist outside their realm of control.
These individuals would be better off using a "website" that is hosted by
themselves, inside their own four walls, running on their own equipment. A
simple way for SpiderOak to address this is to put their website on IPFS or
something similar.

I appreciate the fact that SpiderOak is thinking about these things. It's
important!

------
cpncrunch
>why does Google and their advertisers need to know about it I would ask

Google is pretty clear about this. The only reason they track you is for
advertising, and there isn't any evidence of them using the info for anything
else. In fact there is a lot of evidence pointing the other way, such as their
insistence on encrypting data flowing between their datacenters.

This is Google we are talking about, not Kazakhstan, China or Russia.

~~~
Paul-ish
Google could eventually use this information to determine your eligibility for
a home loan. They have already dipped their toes in this area [1]. With all
this data, we have to ensure that it is used fairly (or not at all). There is
enough concern about digital redlining that a 2014 report to the White House
addresses it [2]. As we know, machine learning is quite capable of inferring
sensitive attributes [3].

This inference doesn't even need to be intentional; machine learning is
capable of accidentally picking up on latent variables. Even if your
neighborhood (the original redlining) isn't among the original features, it
could be inferred from the other variables.

TL;DR: Your surfing behavior could be used to deny you a home loan one day.

[1] [http://techcrunch.com/2015/11/23/google-launches-mortgage-
sh...](http://techcrunch.com/2015/11/23/google-launches-mortgage-shopping-
tool-in-california-more-states-coming-soon/)

[2]
[https://www.whitehouse.gov/sites/default/files/docs/big_data...](https://www.whitehouse.gov/sites/default/files/docs/big_data_privacy_report_may_1_2014.pdf)

[3]
[http://www.pnas.org/content/110/15/5802.abstract](http://www.pnas.org/content/110/15/5802.abstract)

~~~
cpncrunch
Lots of speculation about what they might do. You could also say that the US
government could use all of your data to spy on people who criticise the
government, so they shouldn't have any of that data either.

------
eridal
Kudos for this!!

It's interesting that there's still a meta tag, probably a leftover:

    
    
        <meta name="google-site-verification" content="pPH9-SNGQ9Ne6q-h4StA3twBSknzvtP9kfEB88Qwl0w">
    

EDIT: wow, thanks for your answers guys!! so nice to see Cunningham's law in
action ;)

~~~
artursapek
That's for Google's webmaster tools, which is another service Google offers
(although in this case, you get to peer into _their_ data)

------
rbinv
> It took us only a few weeks to write our home-brew analytics package.

Unfortunately, there's no way to replicate what Google Analytics currently
offers (for free!) within a couple of weeks (or even months). Not with big
data sets. Yes, GA does enforce sampling if you don't pay for GA Premium, but
the free edition is still one hell of a deal (if you don't care about
privacy).

If you only use Google Analytics as a hit counter, sure, you can do that
yourself within a couple of minutes. The advanced features are way more
complicated, though (think segmentation and custom reports).

This also begs the question: why not use Piwik?

~~~
ssharp
I suspect most of the people saying "you don't need Google Analytics! Do it
yourself!" have never used GA for anything that meaningful. As you begin to
really familiarize yourself with your website traffic and understand how to
look at your clickstream data in a more investigative and analytical way,
you'll start to see how nice GA is and how easy it is to answer your
questions.

You also underestimate how ubiquitous GA is because it's free and extremely
popular. I'd consider myself an intermediate to advanced user of GA, but for
people less experienced, I can easily share stuff with them for complicated
tasks or they know how to do a lot of the basics themselves.

In hiring digital marketing people, GA is pretty much on par with Word in
terms of familiarity. It's something a lot of people have a basic competence
with.

~~~
rbinv
I agree completely.

GA has become very, very capable in the last five years or so. Combined with
their current APIs, you can do pretty much anything you want.

------
ksec
To me, it is the cost that matters. Most other analytics services cost $30 -
$50 per million pageviews / data points. To me this is expensive. Even when
you scale to 100M it will still cost ~$20/million.

Piwik doesn't scale. At least it doesn't scale unless you spend lots of
resources to tinker with it. Its Cloud Edition is even more expensive than
GoSquared, which I consider to be a much better product.

What we basically need is a simple, effective, and cheap enough alternative to
GA. And so far there simply is none.

------
api
Instead of rolling your own look at Piwik. It works very well and is basically
a GA clone. I actually like it _better_ than GA in some ways. It's easy to set
up and you can run it on your own site so you're not contributing to a global
tracking fabric.

------
sghiassy
I don't get it. SpiderOak states that they dropped GA because it furthers "the
erosion of privacy on the web.”, but then they just started tracking in house.

How is tracking in house more private than GA? The user is still being
tracked.

~~~
softyeti
I believe their point was that they want to track their own traffic, but when
they use a third party like Google to provide tracking services for SpiderOak,
Google also tracks you, which SpiderOak has no control over.

------
kevin_thibedeau
I haven't checked my GA in months since it became clear that Google won't
bother doing anything to fix the referer spam problem that makes the stats
useless if you don't have a high-volume site. It's not like these abusers are
hard to track down but I'll be damned if I'm going to manually add filters to
get rid of them every time they come in from a new domain.

~~~
emilburzo
Admin -> select account -> select property -> View Settings (on the view you
want) there's a checkbox:

Exclude all hits from known bots and spiders

------
patrickfl
For anyone here looking for a really good, free, self-hosted, hackable, open
source alternative to Google Analytics that's been around for a long time,
please consider Piwik.org.

I've been using it for prob 8-10 years and it has never missed a beat. I use
it on all my personal / business sites as well as some client websites that
are super high traffic.

------
binaryapparatus
Analytics, fonts, css. We include it everywhere by default. Then I realized
hey we are all giving away too much. My sites now happily run self-hosted
piwik, for the last six or so months.

I won't be surprised if in the coming years we hear much more about Google
Fonts being used as a basis for counting site access when there is no
analytics in place.

------
justinkramp
It should also be noted that SpiderOak has opensourced many components of
their product stack, including Crypton, which is the encryption framework
underpinning many of their clients.

The source is at [https://github.com/SpiderOak](https://github.com/SpiderOak)

------
TheAndruu
Make your custom analytics library into a separate product in its own right
and sell it or open-source it!

------
jliptzin
Usually I start with Google Analytics but continue to add to our own in-house
analytics solution targeting the specific metrics we're interested in
tracking. GA often doesn't provide us with the real insights we're looking
for, but it's good for the vanity stats.

------
Something1234
What about awstats or goaccess? Both are great log analyzers, although I like
goaccess better.

[http://goaccess.io/](http://goaccess.io/)

[http://www.awstats.org/](http://www.awstats.org/)

~~~
X-Istence
The look and feel of awstats hasn't changed since I last used it back in
2004...

~~~
mattab
Piwik is the new AWStats, check out [http://piwik.org/log-
analytics/](http://piwik.org/log-analytics/)

------
iamleppert
I'd hate to be the intern or the guy in charge of keeping that thing running.

------
qihqi
Random fact: the GA cookie is distinct from the AdWords (google.com) cookie, and it is
illegal for Google to join those (not sure if it is even technically
feasible).

------
rwbt
I've had good experience with Statcounter
([http://www.statcounter.com](http://www.statcounter.com))

------
gramakri
At Cloudron, our vision is to allow companies to host their own apps easily.
We dogfood and don't use Google. We don't use analytics on our website (a
conscious decision). Our emails are based on IMAP servers and we use
thunderbird. We selfhost everything other than email (which is on gandi).

We just entered private beta yesterday -
[https://cloudron.io/blog/2015-12-07-private-
beta.html](https://cloudron.io/blog/2015-12-07-private-beta.html)

~~~
dschep
How's this compare to Sandstorm[0]? It seems like a Closed/SaaS equivalent.

[0] [https://sandstorm.io/](https://sandstorm.io/)

~~~
gramakri
Cloudron and sandstorm are similar projects. I think the main difference is
the user experience (also how we handle domains, how apps are packaged etc).
You can see a demo of the cloudron here - [https://my-
demo.cloudron.me/](https://my-demo.cloudron.me/) (username: cloudron password:
cloudron). All apps use same credentials (because of single sign on)

------
zachh
Is there any analytics solution that allows for on-prem data storage? Only one
I know of is Adobe Analytics (formerly Omniture).

~~~
mattab
Yes, check out Piwik.org / Piwik.pro

------
pmoriarty
Now if they could also just ditch Facebook, Microsoft, and Apple, they'd be
getting somewhere.

------
electriclove
Why not just use Piwik?

------
sparkzilla
Yet they are still listed on Google search
[[http://bit.ly/1ICwud0](http://bit.ly/1ICwud0)]. I guess it's easy to have
principles when it doesn't cost you money.

~~~
AdmiralAsshat
How exactly would they stop themselves from being listed on Google? A right to
be forgotten request? Their decision was to ditch Google Analytics, not to
disappear from the web.

~~~
sparkzilla
Robots.txt. I think it's interesting that you say that removing yourself from
Google is the same as "disappearing from the web" implying that the web is
Google. It's not. Perhaps they should use alternative forms of outreach,
something I am considering with my company.
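
For reference, the robots.txt approach amounts to two lines. The sketch below checks such a rule set with Python's standard `urllib.robotparser`; the domain and the second crawler name are only examples, and a robots.txt rule is a request that well-behaved crawlers honor, not an enforcement mechanism.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that asks Google's crawler to stay away from the whole site
# while leaving every other crawler unrestricted.
RULES = """\
User-agent: Googlebot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

print(parser.can_fetch("Googlebot", "https://example.com/"))    # False
print(parser.can_fetch("DuckDuckBot", "https://example.com/"))  # True
```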

~~~
AdmiralAsshat
Google is not the web, you are correct; but for all intents and purposes, it's
the web's phonebook. You remove yourself from the phonebook, you make it very
difficult for people to find you, or your business.

~~~
sparkzilla
There are other ways to promote a business outside of search engines. The
point is you can't say you have "ditched Google" while still being part of
their systems that collect user data.

~~~
macintux
Should they block Google's public DNS servers from resolving their domain,
too? Good grief.

~~~
macintux
For a less obnoxious answer: if users choose to use Google, it's not
Spideroak's problem. By dropping GA, they're no longer forcing users to be
subject to Google's tracking.

------
gvb
The irony for me is that I am mostly invisible to Google Analytics, and thus
the companies that rely on GA, because I mostly browse with JavaScript
disabled. (When I need JavaScript, I usually crank up an incognito instance of
Chrome and close it immediately when done, so I'm mostly anonymous to GA even
then.)

When they go "old fashioned" and datamine their web server logs, they uncloak
me. :-/

