
Migrating from Google Analytics - gorkemcetin
https://thomashunter.name/posts/2018-12-28-migrating-from-google-analytics
======
georgewfraser
If you are thinking of migrating away from GA, I highly recommend you move to
a data warehouse based solution, where you store each _event_ permanently in a
data warehouse like Snowflake or BigQuery. There are two client-side pixels
that you can self-host: snowplow.js and Segment. It's hard to find
instructions for self-hosting Segment, but I've made an example at
[https://github.com/fivetran/self-hosted-analytics-
js](https://github.com/fivetran/self-hosted-analytics-js)

The advantage of doing it this way is you preserve the event level data
forever and you can write arbitrary SQL against it. You will need a BI tool to
visualize; there are several excellent ones with free or cheap tiers for small
companies.
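
To make the "arbitrary SQL against event-level data" point concrete, here is a minimal sketch with SQLite standing in for a real warehouse (the schema and column names are made up for illustration):

```python
import sqlite3

# SQLite stands in for Snowflake/BigQuery; the schema is illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE events (
        event_time   TEXT,   -- ISO-8601 timestamp
        anonymous_id TEXT,   -- visitor identifier
        page_path    TEXT    -- page that was viewed
    )
""")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("2018-12-27T10:00:00", "a", "/posts/hello"),
        ("2018-12-27T11:30:00", "b", "/posts/hello"),
        ("2018-12-28T09:15:00", "a", "/about"),
    ],
)

# Because every raw event is kept forever, any question can be answered
# later with plain SQL -- e.g. pageviews and unique visitors per page.
rows = conn.execute("""
    SELECT page_path,
           COUNT(*)                     AS pageviews,
           COUNT(DISTINCT anonymous_id) AS visitors
    FROM events
    GROUP BY page_path
    ORDER BY pageviews DESC
""").fetchall()
print(rows)  # [('/posts/hello', 2, 2), ('/about', 1, 1)]
```

The same query shape works unchanged on any of the warehouses mentioned; only the connection differs.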

~~~
buremba
That's the path many data-driven companies have taken lately. Most companies
start with plug-and-play solutions such as GA or Mixpanel, but as they start
to dig into their customer data, they either hire data engineers or use ETL
solutions to collect their raw event data into a data warehouse, plus BI
tools, in order to track their own metrics.

That way, you will have more control and be able to ask any question you want.
We have been working on a tool that basically connects to your Segment or
Snowplow data and runs ad-hoc analysis similar to Mixpanel and GA, so that you
don't need to adopt generic BI tools and write SQL every time you create a new
report. I was going to create a Show HN post, but since your comment is quite
relevant to the topic, I wanted to share the product analytics tool that we
have been working on: [https://rakam.io](https://rakam.io). The announcement
blog post is also here: [https://blog.rakam.io/update-we-built-rakam-ui-from-
scratch-...](https://blog.rakam.io/update-we-built-rakam-ui-from-scratch-and-
made-it-even-better/)

P.S.: I also genuinely appreciate the work the people at Countly have done.
It's often not easy to use ETL tools to set up your own data pipeline and
create your own metrics, so they're a great alternative if you don't want to
get stuck with GA or third-party SaaS alternatives.

~~~
karmelapple
We’re thinking of storing data into Elastic - any thoughts on that approach,
or recommendations on how / what to do?

~~~
sologoub
This also depends on why you are choosing this technology - is it part of your
existing stack or do you have in-house experts in this tech?

While Lucene syntax is very powerful, it is not SQL, as pointed out by the OP.
If you have a lot of spare developer time and people skilled in this, it will
likely work well for a while (potentially a really long while).

Going with something like BigQuery or Redshift enables you to utilize other
tech to supplement or accelerate skill sets, such as paying a SaaS for
visualization/analysis tools (Looker, Mode, etc).

In particular, separation of storage and processing helps avoid costs when you
are not using the data. With Elastic, I believe you need the cluster running
whether you are using it or not. BigQuery will only charge you for the compute
you use and pennies for storage. Same goes for Athena/Redshift Spectrum.
Snowflake is similar, but I believe for enterprise contracts it's a bit more
complex, with minimums and such.
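
As a rough back-of-the-envelope illustration of that cost difference (all rates below are assumptions for illustration, not quotes; check current pricing before deciding):

```python
# Hypothetical comparison for a small site; every rate here is an assumption.
storage_gb = 50              # event data accumulated so far
queries_tb_per_month = 0.2   # data actually scanned each month

# Assumed on-demand rates (illustrative only):
bq_storage_per_gb = 0.02     # $/GB/month
bq_scan_per_tb = 5.00        # $/TB scanned

bigquery_monthly = (storage_gb * bq_storage_per_gb
                    + queries_tb_per_month * bq_scan_per_tb)

# An always-on 3-node Elasticsearch cluster bills whether you query or not:
es_node_per_hour = 0.10      # assumed $/node/hour
elastic_monthly = 3 * es_node_per_hour * 24 * 30

print(round(bigquery_monthly, 2), round(elastic_monthly, 2))  # 2.0 216.0
```

The point is not the exact numbers but the shape: pay-per-query pricing scales with usage, while a cluster is a fixed cost even when idle.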

Structuring data is another really important consideration - will you need to
normalize and will you need to update labels/dimensions? That’s just the
start.

~~~
karmelapple
> is it part of your existing stack or do you have in-house experts in this
> tech?

Nope and nope. That's definitely the scary / unknown part!

> If you have a lot of spare developer time and people skilled in this, it
> will likely work well for a while (potentially a really long while).

We don't have a lot of spare developer time and people. However, the Elastic
docs make our fairly straightforward use case - dump lots of events in and
filter them by date and about 4 different properties - seem not too daunting.

The way it's sold on the Elastic site these days seems like a very "batteries
included" kind of approach - am I interpreting their marketing a little too
positively? Are we kidding ourselves? Would love to hear more about your
thoughts!

> Structuring data is another really important consideration - will you need
> to normalize and will you need to update labels/dimensions? That’s just the
> start.

We don't expect much.

Thanks for mentioning the other technologies, too! The way we're structuring
it allows us to either use Elastic, SQL, or something else entirely, so
thankfully we're fairly agnostic on the data store. If Elastic turns out to be
a time sink, we'll have no hesitation trying something else out.

------
goodroot
Nice post! I have been thinking much about this for my own written works...

My strange conclusion was to not capture analytics at all. My loose metric is
now: which posts inspire people to email me directly?

I have decided I do not want to know what you read, how long you read it,
where you came from... Only whether or not you felt something. And data will
not tell me this.

~~~
tlavoie
Interesting! I was reading all the comments, wondering if anyone would express
the concept that maybe, just maybe, all of this might not be that useful after
all.

More specifically, how often does all of this really need to be down to the
individual person (or IP address) at all? Even if you know that piece of
information at an ephemeral level, my own suspicion is that aggregate data
should be sufficient for any non-creepy use case.

Perhaps one way to phrase the question: how would having personal-level
details in the analytics change the actions you might take based on the
available data?

------
harianus
I think it's great that we are moving away from big data collectors and
running our own servers. What I usually see is people storing their users'
data on servers situated in locations where governments can get access to that
data without the owner knowing about it. Guarding against this is maybe going
far in protecting the privacy of your users, but it's something a lot of
solutions on the internet don't think about.

That's why I moved Simple Analytics'
([https://simpleanalytics.io](https://simpleanalytics.io)) servers to Iceland
where the law forbids peeking into data without first informing the owner of
the server. I encrypted my server, so if anything happens I can just turn it
off and it will be nearly impossible to get any user data.

~~~
akudha
_I encrypted my server_

Could you please share which Iceland host you are using? Also, what is the
process you're following to encrypt your server?

~~~
harianus
I'm using 1984 [1]. I'm writing a blog post on this soon [2], but in short: I
use Ubuntu Server with LVM [3], unlock my system via Dropbear (a very small
SSH server) [4], and moved the entry point of incoming data to another server,
which stores the incoming data only while the main server is down. This is
because I want to keep recording incoming data, and my encrypted server can't
boot without me entering a password. So in case of a power failure, my
customers' data will still be recorded.

[1] [https://1984.is](https://1984.is)

[2] [https://blog.simpleanalytics.io](https://blog.simpleanalytics.io)

[3]
[https://en.wikipedia.org/wiki/Logical_Volume_Manager_%28Linu...](https://en.wikipedia.org/wiki/Logical_Volume_Manager_%28Linux%29)

[4]
[https://matt.ucc.asn.au/dropbear/dropbear.html](https://matt.ucc.asn.au/dropbear/dropbear.html)
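
For anyone wanting to replicate the remote-unlock part: on Ubuntu this is typically done with the dropbear-initramfs package. A rough sketch from memory (package and file names may differ between releases, so double-check the current docs):

```shell
# Install the Dropbear initramfs hook (assumes Ubuntu with encrypted
# LVM/LUKS already set up by the installer):
sudo apt install dropbear-initramfs

# Authorize your SSH key for the pre-boot environment:
echo "ssh-ed25519 AAAA... you@laptop" | \
    sudo tee -a /etc/dropbear-initramfs/authorized_keys

# Rebuild the initramfs so Dropbear is included at boot:
sudo update-initramfs -u

# After a reboot, SSH into the initramfs and unlock the disk:
#   ssh root@your-server 'cryptroot-unlock'
```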

------
leowoo91
Good to see new tooling for analytics. Another alternative is Matomo (formerly
Piwik): [https://github.com/matomo-org/matomo](https://github.com/matomo-
org/matomo)

~~~
gorkemcetin
Maybe not that new, as Countly has been around for 7 years :-)

~~~
ksec
But hardly anyone has heard of it or recommends it around the popular Internet
forums. (Although there have been various submissions, none ever gained any
traction.)

One reason could be that Countly doesn't offer transparent pricing. People
tend to switch off when they see a call for a quote.

~~~
gorkemcetin
More than 15,000 mobile apps are already using it, and Gartner recommends it -
but you are right, more traction is always better. Re transparent pricing: it
is a complicated process when you sell an enterprise product with many
add-ons, support options, configurations, etc., which is why customizable
on-prem solutions don't usually have price tags.

------
teej
> Which pages were featured and on which social media platforms?

This was the “killer feature” of the Google Analytics replacement I built for
myself a few years back. I would grab referrers for social and scrape them to
give a more useful report of the source of traffic.

Unfortunately this is challenging or impossible for many social platforms due
to HTTPS everywhere, the prevalence of outbound link scrubbing, and app-driven
embedded web views.

I still think it would be a killer feature to know who tweeted you out or
which subreddit you are trending on but it’s just not feasible to do based on
a website pixel alone.
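
For anyone curious, the classification step of such a report is the easy part; the hard part is that, for the reasons above, you often only receive a bare domain or nothing at all. A minimal sketch (the domain list is illustrative):

```python
from urllib.parse import urlparse

# Map referrer hostnames to a friendly source label. With HTTPS referrer
# stripping and in-app webviews, often only the bare domain (or nothing
# at all) survives -- hence the coarse matching here.
SOCIAL_DOMAINS = {
    "t.co": "Twitter",
    "twitter.com": "Twitter",
    "reddit.com": "Reddit",
    "old.reddit.com": "Reddit",
    "news.ycombinator.com": "Hacker News",
    "facebook.com": "Facebook",
}

def classify_referrer(referrer: str) -> str:
    """Return a traffic-source label for a raw Referer header value."""
    if not referrer:
        return "direct/unknown"          # stripped or absent referrer
    host = urlparse(referrer).hostname or ""
    if host.startswith("www."):
        host = host[4:]
    return SOCIAL_DOMAINS.get(host, "other")

print(classify_referrer("https://t.co/abc123"))        # Twitter
print(classify_referrer(""))                           # direct/unknown
```

Knowing *which* tweet or subreddit sent the traffic would require the full referrer path plus scraping, which is exactly what the platforms now make infeasible.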

~~~
ndnxhs
How does HTTPS stop this? I understand many websites will redirect before
loading a link to clear the referrer data, which is a good thing, since the
referrer header has been abused far too much.

I imagine its original intent was for website owners to see what other
websites linked to them, but now it gets used to track users.
------
dceddia
I always like to hear about new (to me) analytics options. Currently I use GA
+ Clicky, and I've considered getting rid of GA for a while. One option that
looks attractive is Fathom, which can be used either as a SaaS or self-hosted,
and is more focused on privacy than most.

Clicky: [https://clicky.com](https://clicky.com)

Fathom: [https://usefathom.com/](https://usefathom.com/)

~~~
ksec
I have always wondered: either something is dodgy about Clicky, or the others
are really expensive. Clicky can do a million page views for $20/month, while
others - even simple ones that aren't as feature-rich, such as Fathom - charge
anywhere from $29 or $50 up to Matomo, which is over $250 per month.

What's the catch?

~~~
gorkemcetin
Check whether your analytics vendor can do drill-downs and segmentation, and
let you dig into the raw data. If it can (and that is a hard thing to
implement), the costs are way higher.
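
"Drilling and segmentation on raw data" just means being able to go from an aggregate number back to the exact events behind it. A toy sketch of both steps (the data and field names are made up):

```python
# Toy event list; a real tool would run this against warehoused raw data.
events = [
    {"user": "a", "platform": "ios",     "event": "purchase", "amount": 10},
    {"user": "b", "platform": "android", "event": "purchase", "amount": 5},
    {"user": "c", "platform": "ios",     "event": "view",     "amount": 0},
]

# Segmentation: aggregate purchase revenue per platform.
totals = {}
for e in events:
    if e["event"] == "purchase":
        totals[e["platform"]] = totals.get(e["platform"], 0) + e["amount"]
print(totals)  # {'ios': 10, 'android': 5}

# Drill-down: from the aggregate, recover the raw events behind one segment.
ios_purchases = [e for e in events
                 if e["platform"] == "ios" and e["event"] == "purchase"]
print(ios_purchases)
```

Tools that only store pre-aggregated counters cannot do the second step, which is why vendors that keep raw events cost more.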

------
pablo-massa
I have the same concerns with Google Analytics and tried to install Matomo [1]
on my personal website [2] a few months ago; it seems a more robust tool than
Countly [3] to me, but maybe I'm wrong.

But I hit an error installing Matomo and got no help in the official forum
[4]. If someone here can help me, I would appreciate it - I'm not a developer.
(I'm also considering getting rid of analytics tools for good - anyone?)
Thanks.

[1] [https://matomo.org/](https://matomo.org/)

[2] [https://pablomassa.com/](https://pablomassa.com/)

[3] [https://count.ly/](https://count.ly/)

[4] [https://forum.matomo.org/t/fatal-error-on-
installation/29949](https://forum.matomo.org/t/fatal-error-on-
installation/29949)

~~~
chinathrow
From [4] I learn:

"I have a Hostgator shared hosting with PHP 5.5."

PHP 5.5 has been EOL since mid-2017. Can't you switch to a newer version such
as 7.2?

Matomo itself does not even display the required version on [1]. Their FAQ
talks about 5.3 - which is horribly ancient too.

[1]
[https://matomo.org/docs/installation/](https://matomo.org/docs/installation/)

~~~
Findus23
Hi,

I have to agree that one should be using PHP 7.2. It also gives a nice
performance boost to Matomo. The required PHP version for Matomo is shown in
[1] (5.5.9 or greater). Can you please send me a link to the FAQ page
mentioning 5.3 (e.g. to lukas@matomo.org) so it can be updated?

[1]
[https://matomo.org/docs/requirements/](https://matomo.org/docs/requirements/)

~~~
chinathrow
My mistake - I was only skimming the docs and overlooked the specifically
linked requirements page. I would go further, though, and remove outdated PHP
versions from that page, recommending only maintained versions.

Additionally, the error described by the OP looks like broken autoloading.

------
blakesterz
Anyone else miss Urchin? Google bought them and used it to build Analytics.
For me at least, Urchin was vastly superior. I feel like Analytics is built
for sites that run Google Ads to sell more ads. Urchin was for people who ran
websites who wanted to learn more about their site and readers.

~~~
toomuchtodo
I loved the hosted version of Urchin that processed web server logs locally
(or would grab them remotely with ssh/sftp/ftp). A shame that product was
deprecated post-acquisition.

------
darekkay
> What posts are popular this week?

> How is the site doing compared to last week?

If this is really all the information you need, try a server-side solution
like GoAccess [0]. It is a little harder to set up than copy-pasting an HTML
snippet, but it is well worth it (privacy, performance, etc.).

[0] [https://goaccess.io/](https://goaccess.io/)
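
(For context: GoAccess and similar tools simply parse the access log your web server already writes. A stripped-down sketch of the same idea, against NCSA combined-format log lines:)

```python
import re
from collections import Counter

# Matches the request line of an NCSA combined-format access log entry.
LOG_LINE = re.compile(r'"(?:GET|POST|HEAD) (?P<path>\S+) HTTP/[\d.]+"')

sample_log = [
    '1.2.3.4 - - [28/Dec/2018:10:00:00 +0000] "GET /posts/hello HTTP/1.1" 200 1234 "-" "Mozilla/5.0"',
    '5.6.7.8 - - [28/Dec/2018:10:01:00 +0000] "GET /posts/hello HTTP/1.1" 200 1234 "-" "Mozilla/5.0"',
    '9.9.9.9 - - [28/Dec/2018:10:02:00 +0000] "GET /about HTTP/1.1" 200 512 "-" "curl/7.61"',
]

# "What's popular this week?" is just a Counter over requested paths.
popular = Counter()
for line in sample_log:
    m = LOG_LINE.search(line)
    if m:
        popular[m.group("path")] += 1

print(popular.most_common(2))  # [('/posts/hello', 2), ('/about', 1)]
```

No tracking snippet ever reaches the visitor's browser, which is the privacy and performance win.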

~~~
danyork
The challenge with any server-side solution is the huge amount of _caching_
that takes place within the web infrastructure. Many sites sit behind CDNs,
but even without CDNs, many ISPs and enterprises run their own caching proxy
servers on the edge of their networks. Browsers also have their own caching of
recent pages. The result of all the caching is that it is possible that very
few of the visitors actually connect to the origin server.

For that reason, most analytics systems use some kind of client-side tracking
code that runs within the client web browser. That way it works regardless of
whatever caching happens.

~~~
darekkay
Server-side analytics is not a silver bullet, but it's great for the quoted
use case. While it has the mentioned disadvantages, so do client-side
solutions. Many people run adblockers and it's easy - sometimes even a default
- to simply block Google Analytics & co. So yes, if you really need exact
statistics, you should probably run both client-side and server-side
analytics. But for a rough "what's popular?" estimate, there's no need to
impose (one more) JS file on your users. I switched from GA to GoAccess a few
years ago (running both for a few months), and while the absolute numbers were
off by ~10%, the _ratio_ was almost the same. But then again, I'm just
running a blog and not a business.

------
firatdemirel
That's a great migration for the open-source community. As Thomas mentioned in
his post, Google is a cornerstone of the "centralized internet" and we should
make these kinds of migrations to stop them. #2pac

------
radium3d
I'm curious how these alternative options (countly, matomo, snowplow, etc)
handle fake referral traffic compared to how Google analytics handles it.

~~~
georgewfraser
Not well! Your best bet is to:

* Sync your (snowplow, segment, ...) data to a data warehouse

* Try to clean up your data with SQL

* Keep running GA so you can compare your numbers to GA
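
The "clean up with SQL" step mostly amounts to maintaining a blocklist of known spam referrer domains. A sketch, with SQLite standing in for the warehouse (schema illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pageviews (page TEXT, referrer_host TEXT)")
conn.executemany("INSERT INTO pageviews VALUES (?, ?)", [
    ("/posts/hello", "news.ycombinator.com"),
    ("/posts/hello", "semalt.com"),               # a classic referrer spammer
    ("/about",       "buttons-for-website.com"),  # ditto
])

# Known-bad referrer hosts; in practice this list is curated over time
# and cross-checked against what GA reports for the same period.
conn.execute("CREATE TABLE spam_referrers (host TEXT)")
conn.executemany("INSERT INTO spam_referrers VALUES (?)",
                 [("semalt.com",), ("buttons-for-website.com",)])

clean = conn.execute("""
    SELECT page, referrer_host FROM pageviews
    WHERE referrer_host NOT IN (SELECT host FROM spam_referrers)
""").fetchall()
print(clean)  # [('/posts/hello', 'news.ycombinator.com')]
```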

~~~
coderintherye
That's exactly what we do. Snowplow does a good job, but having the comparison
points in GA is very helpful.

------
pawal
Is there a privacy-friendly analytics tool that does not set cookies and store
data forever? I don't really care about perfect user analytics, just good
enough - maybe by analysing logs. In the 90s there were a lot of good tools
like this, but now everybody has gone cloud. I can't imagine the tools offered
today are compatible with GDPR.

~~~
Findus23
Hi,

You can configure Matomo [1] to both not use any cookies [2] and to
automatically delete just the raw data or all data that is older than x
months. Log Analytics is also possible.

If you want something that is far more minimalistic, but also Open Source and
self-hostable, you can take a look at [3]. (Not sure about how they use
cookies)

(Disclaimer: I am part of the Matomo team)

[1] [https://matomo.org/](https://matomo.org/) [2]
[https://matomo.org/faq/general/faq_157/](https://matomo.org/faq/general/faq_157/)
[3] [https://usefathom.com/](https://usefathom.com/)

------
RA_Fisher
Lately I've been pondering whether client-side analytics scripts generate
truncated data by design. That is, a payload goes to the client, but the
client terminates (one example being a tab close) before the analytics code
can send a response to some backend server. If this is the case, my
understanding is that it would result in an unknowable underestimation of HTTP
responses. I'm still exploring the space and would be interested to know if
others have thought about this.

~~~
lmkg
There is a browser API called Beacon whose purpose is to address this specific
edge-case. Google Analytics supports it as an option.

[https://developer.mozilla.org/en-
US/docs/Web/API/Beacon_API](https://developer.mozilla.org/en-
US/docs/Web/API/Beacon_API)

------
jagthebeetle
A small point (although I don't claim it militates against the central impulse
of the article): serving analytics.js through gtag.js is obviously going to
inflate the tag size unnecessarily, since the latter is a unified interface
for all of Google's various page snippets. Also, though not recommended,
there's no reason you can't self-host the GA script, although practically
speaking the Google-hosted version is likely cached more often and served from
pretty snappy geolocated servers.

------
_up
Is anyone using Yandex.Metrica? It seems to offer a lot of features and is
even completely free, but nobody seems to be talking about it. What's the
catch?

------
Tsubasachan
Considering GA is blocked by every adblocker, I wouldn't rely too much on it
anyway.

------
johnchristopher
Hmm. Most viewed pages, time spent... okay.

But GA provides segments that marketers need. How do you get that kind of data
without Google or the FB pixel?

------
evantahler
Any interest in helping me build a Countly alternative?

~~~
chvid
I think there is, but what would you do differently?

~~~
justinclift
Not have a one-liner installation script that's a security nightmare? :)

eg, this is the Countly one (specifically run as root):

    
    
      wget -qO- http://c.ly/install | bash
    

If someone manages to break into the c.ly redirection service, or the
website/CDN/etc. serving that script, new users would likely be in for a bad
time. And the problem could be very subtle if it's done by someone clueful.

Just for added er.. goodness (/s), the Countly instructions also need people
to:

    
    
      Disable SELinux on Red Hat or CentOS if it's enabled.
      Countly may not work on a server where SELinux is enabled. 
      In order to disable SELinux, run "setenforce 0".
    

_sigh_

------
Envision_Envi
Waiting for new tools... we'll see how accurately they work compared to Google
Analytics.

------
envisionintel
A good one! Everything has some advantages and some disadvantages.

------
Ylodi
Is there any cookieless analytics solution?

~~~
pkz
Matomo can be set up for cookieless tracking. It will of course impact some
metrics. More here:
[https://matomo.org/faq/general/faq_157/](https://matomo.org/faq/general/faq_157/)

------
Brosper
Is the only advantage that it weighs less?

I don't feel convinced to switch :(

~~~
jstanley
The advantage is that you're not selling out your users' browsing habits to
the biggest surveillance company in the world.

That your pages are now lighter too is just a bonus.

------
homero
Why Countly and not Piwik?

------
alexnewman
Yea, it's crippleware. Anything good is paid-for.

~~~
the_common_man
What is crippleware? GA?

~~~
alexnewman
countly

