
Analytics Without Google - luxurytent
https://www.justbartek.ca/analytics-without-google/
======
epoch_100
Props to the author for ditching Google. I just want to point out that there
are a LOT of non-Google analytics options out there; leaving Google Analytics
really isn’t too hard if you’re willing to take the leap.

Plug: those who want a modern client-side analytics tool that's free, self-hosted, and open source might consider Shynet [0]. (Disclosure: I maintain it.) It's a bit simpler/cleaner than Matomo, but exists in the same category.

[0] [https://github.com/milesmcc/shynet](https://github.com/milesmcc/shynet)

~~~
tjbiddle
Google Ads still dominates though, and if you're doing paid advertising -
you're not just shooting yourself in the foot, but you're lopping off a limb -
or two. I wish it wasn't the case.

Any non-Google analytics options integrate well with Google Ads, especially
for retargeting?

~~~
sdoering
For retargeting you would probably still need the DoubleClick code on your
site. But at least when serving European users, you would only be able to
activate it after a clear opt-in via a consent manager.

Using marketing parameters (like utm_... in GA or Matomo) or the like in any
other tool still gives you clear marketing performance metrics in your
analytics solution of choice, though.
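
As a concrete sketch of that idea (the domain and parameter values here are made up), campaign tagging is nothing more than query parameters that the analytics tool parses back out of the landing URL:

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Tag an outbound link with the standard utm_* parameters
# (all values are illustrative).
params = {
    "utm_source": "newsletter",
    "utm_medium": "email",
    "utm_campaign": "spring_sale",
}
url = "https://example.com/landing?" + urlencode(params)

# Any analytics tool (GA, Matomo, ...) recovers the campaign data
# by parsing the same parameters out of the requested URL.
campaign = parse_qs(urlparse(url).query)
print(campaign["utm_campaign"][0])  # spring_sale
```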

And a lot of "pure" tracking tools would - at least currently - not fall under
the opt-in rules of the GDPR. But the moment you link them with your
advertising profiles (like in GA with the DoubleClick integration) your whole
tracking setup also becomes opt-in. So you would probably lose at least some
share of your traffic in analytics and no longer be able to do clear marketing
analysis. At least not as exact as before.

I had clients losing 80% of their traffic stats in analytics after shifting
to opt-in.

------
slykar
The thing I'm interested in when using Google Analytics is tracking user path
to see the bounce rate in a checkout process for example. You can calculate
conversion rates for different user segments. People who just want to see "how
many visits I got" don't benefit from GA. Developers often miss the point of
GA, because they do not work in sales.

~~~
marcus_holmes
OK, so let's pull this apart:

> The thing I'm interested in when using Google Analytics is tracking user
> path to see the bounce rate in a checkout process for example

There are lots of ways to get this data direct from the servers, and much,
much more reliably and effectively.

> You can calculate conversion rates for different user segments.

Maybe. You only have data on those people who don't have adblockers that block
GA. Which is a sizable segment of the internet. Also that they're not using a
VPN, etc. Stats collected via GA are inherently less reliable than stats
collected from your own site because GA is easily blocked.

Plus, there have been reports since forever that GA's stats are just not that
reliable in the first place.

I've experienced this myself - I used to admin a WP blog, and the numbers from
the site log and GA were around 20% different.

If you're relying on GA stats to calculate the results of any a/b testing,
then you need to put in at least a +/- 25% error factor (i.e. if the A test
converts 10% better than the B test, you have no idea whether that's a real
thing, or a product of GA giving you inaccurate data - you'd need at least a
25% swing to begin to think it might be a real customer preference).

> Developers often miss the point of GA, because they do not work in sales.

Yes. But have you listened to their objections, rather than dismissing them as
"not working in sales"? Developers do have some knowledge of this subject. And
there are lots of ways of getting this data that doesn't increase page load
and compromise security like GA does.

~~~
cameronbrown
> Maybe. You only have data on those people who don't have adblockers that
> block GA. Which is a sizable segment of the internet. Also that they're not
> using a VPN, etc. Stats collected via GA are inherently less reliable than
> stats collected from your own site because GA is easily blocked.

You'd be surprised how many marketers don't understand this key point. Those
that do just don't care. I'd argue it's part of why the internet is being
dumbed down (because people without adblockers are the only ones with a
voice).

~~~
marcus_holmes
Interesting point. "our audience is all older people with little education, we
should cater to them" -> no, actually that's just the only bit of your
audience who let you track them, and you're annoying the younger, tech-savvy
people who'd like your product if you stopped being technically ignorant ;)

------
sputr
Annoyed with cookie consent on your page?

Easy: use a first-party self-hosted tracker (Matomo/Piwik), anonymize the last
digits of the IP, respect DNT and provide an opt-out (Matomo provides an
embeddable widget) on your privacy policy page.
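
For reference, a first-party Matomo setup along those lines looks roughly like the snippet below. The `setDoNotTrack` and `disableCookies` calls are from Matomo's documented JS API; the hostname is a placeholder, and the IP anonymization itself is configured server-side in Matomo's privacy settings, not in this tag:

```javascript
var _paq = window._paq = window._paq || [];
// Skip visitors whose browser sends the DNT header.
_paq.push(['setDoNotTrack', true]);
// No cookies at all; sidesteps most cookie-consent triggers.
_paq.push(['disableCookies']);
_paq.push(['trackPageView']);
(function () {
  var u = 'https://stats.example.com/'; // your self-hosted Matomo (placeholder)
  _paq.push(['setTrackerUrl', u + 'matomo.php']);
  _paq.push(['setSiteId', '1']);
  var d = document, g = d.createElement('script'),
      s = d.getElementsByTagName('script')[0];
  g.async = true; g.src = u + 'matomo.js';
  s.parentNode.insertBefore(g, s);
})();
```

Masking the last octet(s) of visitor IPs is done in the Matomo admin UI (privacy settings), so nothing identifying needs to appear on the page itself.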

And bam, no legal need for cookie consent OR notification! No popup at all!
And you still get perfectly usable statistics for most applications.

~~~
kevingrahl
I’m no expert here, but wouldn’t that go against the ePrivacy Directive
2009/136/EC, according to which you must receive users’ consent before you use
any cookies except strictly necessary ones?

~~~
JeanMarcS
It seems that there is currently no clear answer to this [0].

Couldn’t find the forum post it’s referring to.

[0]: [https://github.com/matomo-org/matomo/issues/15425#issuecomme...](https://github.com/matomo-org/matomo/issues/15425#issuecomment-617790122)

------
esperent
One possible benefit that I don't see discussed much in this context is
bypassing ad blockers. I run a tech focused website and Google analytics
registers about 10k visits/month. I figure that a good chunk of my visitors
have ublock and so they don't show up. Presumably, alternative analytics or
self hosted analytics are not blocked, so I'd get more accurate stats. Is this
a correct assumption?

~~~
dmkii
It’s true, but GA also does a pretty good job at filtering out bot traffic,
for example, so it depends on your definition of "more accurate". There are
also ways to send GA hits through a subdomain to make it look self-hosted and
bypass ad blockers.
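
The subdomain/first-party trick is usually just a reverse proxy in front of Google's collect endpoint. A rough nginx sketch (paths and hostnames are illustrative; blockers that inspect the payload rather than the hostname will still catch it):

```nginx
# Serve GA hits from your own domain so hostname-based blocklists miss them.
location = /stats/collect {
    proxy_pass https://www.google-analytics.com/collect;
    proxy_set_header Host www.google-analytics.com;
    proxy_ssl_server_name on;  # send SNI for the upstream TLS handshake
}
```

You would also serve (or proxy) the tracking script itself from your own domain and point it at this endpoint.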

~~~
markdown
> It’s true, but GA also does a pretty good job at filtering out bot traffic
> for example

Surely you jest. I had to quit using GA because the tech behemoth with all
its tens of thousands of software and algo experts couldn't figure out how to
filter out referrer spam from their analytics.

Such a simple problem, you'd think they'd solve it in a day. I gave them two
years and they still couldn't solve it, so I left.

~~~
kevin_thibedeau
Parent speaks the truth. I quit because of the garbage referers coming out of
Russia that Google would do nothing about.

~~~
esperent
How do you know they were garbage?

~~~
markdown
[https://www.google.com/search?q=%22buttons-for-websites%22+r...](https://www.google.com/search?q=%22buttons-for-websites%22+referrer+spam)

------
AdriaanvRossum
Thanks for the mention of Simple Analytics [1], Bartek. We're happy to be a
paid service, even though we know it doesn't suit everybody's needs. This way
we don't need to find any other way of making money (with the data of our
customers).

[1] [https://simpleanalytics.com](https://simpleanalytics.com)

------
gpanders
I noticed this post mentions GoatCounter but says that it doesn't meet his
needs because it isn't self-hosted.

It is not by default, but you certainly can self host it, and the author has
been quite open about that being a viable path for people interested in doing
so. I self host GoatCounter myself and it works very well.

~~~
galfarragem
+1 to GoatCounter. Not affiliated with it, just a happy (free) user on a
low-traffic hobby project. Simple and no BS. More than enough if you just want
to know the big picture and care about privacy. This is the data I get (my
data is public, but that's up to you):
[https://slowernews.goatcounter.com/](https://slowernews.goatcounter.com/)

------
johnchristopher
Nice. Can it be used to compare this year's first-quarter mobile usage for
Dutch-speaking visitors who came from Facebook to last year's same profiles,
but who came from the newsletter?

~~~
luxurytent
GoatCounter and others use non-identifiable hashes to track a unique visit,
but they only retain that hash for some time[1]. I think in your example,
you'd have to use a solution that uniquely identifies a session and indeed,
keeps track of it.

[1]
[https://github.com/zgoat/goatcounter/blob/master/docs/sessio...](https://github.com/zgoat/goatcounter/blob/master/docs/sessions.markdown#goatcounters-solution)

~~~
Carpetsmoker
I'm not sure that's even necessary for the parent's requirement? You should be
able to get the parent's data out of it since the location and campaign are
stored; there just isn't a UI for it.

~~~
johnchristopher
> there just isn't a UI for it

Exactly, the marketing/communication/sales people need those bits in a UI;
they need tools to analyze those bits easily, mark them for specific analyses,
and to share reports.

~~~
Carpetsmoker
Sure, and there's definitely value in that. I can't speak for other products,
but GoatCounter intentionally _doesn't_ try to solve every possible analytics
use case.

Adding advanced features frequently comes with the trade-off that it makes
things harder for people who _don't_ use those features, so by limiting the
scope it gets easier/better for some, and worse/harder for others. I've found
this to be the case for software in general.

There certainly _should_ be a UI to quickly compare how the current quarter
is doing against the last quarter (I wrote some code for that last week,
though it's not done yet), but adding extra search parameters like
"from mobile" and "from the Netherlands" seems awfully specific to me. While
certainly useful for some, most people are not hard-core marketers, and stuff
like that is just "noise" that makes the tool harder to use for them.

Matomo can do all of that, and I also found it hard to use.

Or, tl;dr: Matomo is already good at being Matomo, let's do something
different :-)

Related comment from a while ago:
[https://lobste.rs/s/e8zgt0/github_is_now_free_for_teams#c_r4...](https://lobste.rs/s/e8zgt0/github_is_now_free_for_teams#c_r4vyje)

------
wldlyinaccurate
I'm not associated with them at all, but Fathom are worth looking into if you
want an analytics platform that respects user privacy:
[https://usefathom.com/](https://usefathom.com/)

~~~
JackWritesCode
Hey, thanks for the shoutout. We're a great option for so many reasons but
here are a few recent things:

* We allow our users to bypass ad-blockers and exclude themselves from tracking their own page views ([https://usefathom.com/blog/custom-domains-embed-code](https://usefathom.com/blog/custom-domains-embed-code))

* We are both full time on Fathom, bootstrapped (we've turned down millions in VC) and profitable ([https://usefathom.com/blog/quit](https://usefathom.com/blog/quit))

* We are adding in unlimited uptime monitoring on Friday (SMS, Telegram, Slack and email notifications)

* We run on auto-scaling infrastructure and are used by individuals, small businesses, governments and some of the biggest companies in the world

~~~
saagarjha
Without launching into moral judgement, I’d just like to mention that people
who block your analytics code from loading are often sending you a pretty
strong signal that they would not like to be tracked. Do you have something
that takes this into account?

~~~
JackWritesCode
We don’t share that stance. Most people install ad-blockers to block dodgy
advertisers, Facebook and other privacy-invasive websites.

~~~
gxnxcxcx
Does Fathom honor DNT requests from visitors?

~~~
JackWritesCode
DNT has been abandoned, so we don't support it by default, but you can honor
it by adding honor-dnt="true" to your script.

------
joe5150
GoAccess is really cool. Matomo is free and self-hosted if you want more of
the features of Google Analytics like tracking time on page.

~~~
XCSme
I am also working on a product similar to Matomo; it's not free, but it also
provides some of Matomo's premium features at a much lower price:
[https://usertrack.net](https://usertrack.net)

------
AndrewStephens
I like this blog post and I support anyone who removes a third party tracker
from their site.

There are companies that live and die by analytics and demographics but your
personal blog doesn't need the information that GAnalytics sucks up.

GoAccess (suggested in the article) is a fine choice, although I found it did
not do a good job of filtering out bot hits. For most people this might not be
a big deal, but it annoyed me.

In the end I just wrote[0] a simple hit counter that triggers off a js beacon.

[0]
[https://github.com/andrewstephens75/visitlog](https://github.com/andrewstephens75/visitlog)

------
celsoazevedo
Something like Matomo running on a cheap VPS (less than $5/month) is a good
Google Analytics replacement for small projects.

It's cheaper than Simple Analytics (mentioned in the article) and no data is
sent to a service operated by a 3rd party because it's self-hosted.

~~~
XCSme
I am also working on a product similar to Matomo; it also provides some of
Matomo's premium features at a much lower price:
[https://usertrack.net](https://usertrack.net)

------
chatmasta
GoAccess is cool but it won't help you much if you have a static website or if
you serve the majority of requests from a CDN. In those cases, Matomo
(previously piwik) is a good solution for client-side JS-based analytics
similar to Google Analytics.

~~~
PenguinCoder
How does Matomo work better with a static site, than GoAccess? GoAccess reads
the server logs and creates metrics. Matomo requires javascript; that isn't
'static website'.

~~~
doh
GoAccess reads the logs of a server you have access to. So if you are
distributing from a CDN, or from GitHub or some other hosted solution, then
you have no logs to read.

Also, using JavaScript doesn't make a page non-static. "Static" usually refers
to a page that is rendered in full before a request comes in, so the content
doesn't change based on the request. More simply: the content isn't generated
dynamically per request. Having JavaScript to collect metrics doesn't make the
page dynamic.

------
zubspace
I like GoAccess for the following reasons:

* No javascript.

* Does not need a database.

* Realtime stats in html format.

I use it on my blog, but I also believe the numbers are completely inflated. I
don't trust them. This has been discussed on GitHub a few times [1], so don't
expect accurate numbers (yet).

It's also hard to see what's been going on recently on your server, because
you only get totals. I'd love it if I could change the time interval of the
shown HTML stats.

I like the way GoAccess is going though and I hope it will improve.

[1]
[https://github.com/allinurl/goaccess/issues/964#issuecomment...](https://github.com/allinurl/goaccess/issues/964#issuecomment-350607068)

~~~
darekkay
Interesting. I switched from Google Analytics to GoAccess ~5 years ago. I let
both run for a month and compared the results. The _relative_ numbers were
very similar (so I get the same information about which blog posts are most
popular), but the _absolute_ numbers were in fact _lower_ for GoAccess. It
might be because tech blog visitors use ad blockers more often (and hence
block GA).

> _I'd love it if I could change the time interval of the shown HTML stats._

GoAccess displays the data that you pass it. So while it doesn't have a date
filter option (at least the last time I checked), you can just filter your
logs beforehand. There's an even simpler solution that I'm using: set
logrotate to a specific time frame (e.g. weekly), so you can pass "access.log"
to GoAccess to get only the latest stats. You can still pass "access.log*" to
get ALL stats at once.
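
The logrotate trick described above amounts to something like this (paths and retention counts are illustrative; adjust to your distro's existing config):

```conf
# /etc/logrotate.d/nginx (sketch)
/var/log/nginx/access.log {
    weekly          # access.log then only ever holds the current week
    rotate 12       # keep ~3 months of history as access.log.1, .2.gz, ...
    compress
    delaycompress   # leave the most recent rotation uncompressed
    missingok
    notifempty
}
```

Then `goaccess /var/log/nginx/access.log -o report.html` covers just the latest window, while `zcat -f /var/log/nginx/access.log* | goaccess -o report.html -` feeds every rotation (compressed or not) through for the all-time view (you may also need `--log-format=COMBINED` depending on your setup).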

~~~
zubspace
About the numbers: thanks for your input. I guess you get more accurate
results the more visitors you have. On my small blog I doubt the numbers and
think that GoAccess does not filter out some bots. You can try to identify and
filter them out, but that takes some time.

However, even if the numbers may not be accurate, you still see overall
trends, which is valuable.

And thanks for the log-rotation trick. I will definitely make use of it.

~~~
sdoering
Most bot filtering in analytics tools (GA or Adobe and the like) is quite
effective. So you would expect lower traffic in these tools than from a tool
using your server's log files.

On the other hand, a lot of browser plugins and privacy/incognito modes kill
analytics but have no effect on your log files. This also pushes the numbers
in your log files higher.

So I would expect somewhere between 10% and 25% higher numbers from your log
files, depending on the audience you serve and the overall traffic volume your
site has.

At least these were the numbers some years back, when we did some additional
backend tracking for some clients, where we linked the frontend tracking tool
ID (from the cookie) to the tracking hit sent from the backend with additional
information. Back in 2015 it was between 7% and 19%; in 2018 (before GDPR
kicked in) it was between 15% and 27% of backend tracking hits that did not
have a frontend ID associated with them. So we knew the share of tracking
calls that had frontend tracking blocked.

~~~
zubspace
Very interesting. I just checked my logfiles and as expected most traffic
seems to originate from search crawlers, feeds and bots running through all
kinds of exploit urls. Hard to tell, but a wild guess is that more than 75% of
hits and visitors are bot-related.

I learned that it is possible to exclude bots through browsers.list [1], and
in goaccess.conf you can exclude IP ranges. Unfortunately, updating those
entries is very time consuming and probably not worth it.

[1]
[https://github.com/allinurl/goaccess/issues/560#issuecomment...](https://github.com/allinurl/goaccess/issues/560#issuecomment-441166690)
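
For reference, the relevant knobs look roughly like this in goaccess.conf (the option names exist in GoAccess's configuration; the IP range is a placeholder):

```conf
# goaccess.conf (sketch)

# Drop hits whose User-Agent matches the built-in crawler list.
ignore-crawlers true

# Example range (placeholder); the option can be repeated per block.
exclude-ip 192.0.2.0-192.0.2.255
```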

------
darekkay
Similar HN Posts from the past:

* [https://news.ycombinator.com/item?id=22813168](https://news.ycombinator.com/item?id=22813168)

* [https://news.ycombinator.com/item?id=21890027](https://news.ycombinator.com/item?id=21890027)

* [https://news.ycombinator.com/item?id=19883876](https://news.ycombinator.com/item?id=19883876)

* [https://news.ycombinator.com/item?id=18810035](https://news.ycombinator.com/item?id=18810035)

------
girzel
I feel like all I need is a good script to filter out bots and spiders. I
could take the rest from there.

~~~
luxurytent
This is an interesting thought. I wonder if I could pre-process my log files
to get rid of this noise before piping them to a log analyzer.

~~~
Carpetsmoker
I worked on a library to do this for my own analytics tool[1]; adding a CLI
so you can filter bots from logfiles isn't too hard: you just need to parse
the log lines into an http.Request{} with the correct fields filled in (mostly
User-Agent, but it also looks at the IP address).

One of the signals is from various JS properties though (navigator.webdriver,
window._phantom), so you'll miss out on that. I have some other ideas as well.

Either way, it's better than nothing?

[https://github.com/zgoat/isbot](https://github.com/zgoat/isbot)
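
As a rough illustration of the log-file side of this (a toy User-Agent check, nowhere near isbot's real coverage, and missing the JS-based signals mentioned above):

```python
import re

# Tiny illustrative subset of bot User-Agent markers; real libraries like
# isbot ship far larger lists plus IP-range checks.
BOT_UA = re.compile(r"bot|crawl|spider|slurp|curl|wget", re.IGNORECASE)

def is_probably_bot(log_line: str) -> bool:
    """Very rough check on a combined-format log line: the User-Agent
    is the last quoted field."""
    fields = log_line.rsplit('"', 2)
    if len(fields) < 3:
        return False  # malformed line: keep it rather than drop silently
    user_agent = fields[-2]
    return bool(BOT_UA.search(user_agent))

line_bot = ('1.2.3.4 - - [10/May/2020:10:00:00 +0000] "GET / HTTP/1.1" '
            '200 512 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"')
line_human = ('5.6.7.8 - - [10/May/2020:10:00:01 +0000] "GET / HTTP/1.1" '
              '200 512 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/76.0"')
print(is_probably_bot(line_bot), is_probably_bot(line_human))  # True False
```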

~~~
luxurytent
This is cool! I've bookmarked it for future review. Thanks for sharing.

------
monus
I’m not associated with Countly ([https://count.ly/](https://count.ly/)),
but I’ve heard good things from friends using it. It’s open source and makes
money with an enterprise edition.

------
tobilg
If you run on AWS and want cheap, privacy-focused website analytics, you
might also look at [https://ownstats.cloud/](https://ownstats.cloud/)

~~~
wharfjumper
Thanks. That fits our use-case. Now just need to figure out how to do the
visualizations...

------
traceroute66
I was going to say "what about Matomo?". But Matomo is still lurking around in
the dark ages of Python 2 and isn't worthy of recommendation until they pull
their finger out and sort out the analytics scripts that rely on that obsolete
thing ([https://github.com/matomo-org/matomo-log-analytics/pull/242](https://github.com/matomo-org/matomo-log-analytics/pull/242),
[https://github.com/matomo-org/matomo-log-analytics/issues/3](https://github.com/matomo-org/matomo-log-analytics/issues/3),
etc.)

~~~
sdoering
I am totally not a friend of anyone still using Python 2.x (especially as it
has been EOLed [1]).

Nonetheless, you don't need the log analysis (and Python 2.x) if you want to
use Matomo as a GA replacement on the frontend.

So yeah, they should port the log analysis part from Python 2 to 3. But that
doesn't invalidate the whole tool, imho.

[1]:
[https://www.python.org/dev/peps/pep-0373/](https://www.python.org/dev/peps/pep-0373/)

------
fareesh
Has anyone had the experience of seeing over-reporting of some metrics in
Google Analytics as compared to an internal tracking system? Is this data
generally seen to be 100% reliable?

------
markosaric
Nice work! Every time I see one of these "de-Google-ing" posts, there are
people in the comments asking whether it can do this, or integrate with that
Google product.

I'm working on a Google Analytics alternative myself [1], and we make it clear
that it is not meant as a clone or a full-blown replacement for Google
Analytics.

Some people are fine running GA and are happy to integrate with Google Ads and
the rest of the Google ecosystem.

On the other hand, some would prefer to focus more on the privacy of their
visitors, on not having to get cookie/GDPR consent, on having a faster-loading
website, or on supporting a more independent web. Alternative solutions to
Google products such as these are meant for those use cases.

[1]: [https://plausible.io/](https://plausible.io/)

------
87zuhjkas
There's also Cloudflare Analytics; no JS tracking or server-side logging
required if you're behind CF.

------
spossy
[https://statcounter.com/](https://statcounter.com/) has been around forever
and offers a good alternative as well.

------
vladoh
Very cool project! It is good that more people are trying to go Google-free.
Your project does server-side analytics, though, while Google Analytics is
client-side. Both have different advantages and disadvantages.

Server-side analytics advantages:

* No overhead on your website, because no external resources are loaded

* Not affected by content blockers

* Can track resources other than web pages, like images or redirects

* Shows resources that could not be found

Client-side analytics advantages:

* Can capture more data about the user, like device size, browser and OS

* Better tracks user behavior, like time spent on each page

* Provides real-time feedback about visitors

* Usually easier to set up

I've been playing around with different analytics options on my blog recently
- Netlify on the server side and Plausible Analytics on the client side. The
results from a 7-day comparison turned out to be quite different [0].

The server-side tool shows many more views across all pages in my blog. I
think this is because the pages are accessed many times by automated systems
that don't run JavaScript and are therefore not counted on the client side.
This happens when you post a link on Twitter or WhatsApp, for example, because
their websites or clients then make a request to fetch the metadata of the
page and display a card preview. That counts on the server, but not on the
client side. A perfect example is the draft article, which I only sent via
WhatsApp to my wife for proofreading. The difference there is just 2 views (I
guess one preview each on my phone and on hers). I sent all other links to
several people and posted them on Twitter and Reddit.

You can also somewhat see this effect in the referrers [1]. The server-side
tool counted many hits from an RSS reader and Google Translate (my
parents-in-law reading my blog in Bulgarian :D). The preview requests from
Twitter and WhatsApp are probably counted as direct traffic.

I think client-side analytics count the number of people reading my blog more
realistically (even if there aren't many). However, there are cases that can
be captured only on the server side (Google Translate, for example). So the
best approach would be to use both, or maybe even a service that combines the
client and server side - I think there is a lot of potential there!

[0] [https://pbs.twimg.com/media/EY-IqtdXsAAJvu9?format=jpg&name=...](https://pbs.twimg.com/media/EY-IqtdXsAAJvu9?format=jpg&name=medium)

[1]
[https://pbs.twimg.com/media/EY-J0c8WoAEwn2J?format=jpg&name=...](https://pbs.twimg.com/media/EY-J0c8WoAEwn2J?format=jpg&name=medium)

~~~
akdas
There's one more advantage to client-side analytics, though maybe you meant to
cover that under "Usually easier to set up": it's the only option if you host
your site on a platform like GitHub Pages, where you don't have access to
server-side logs.

------
johnghanks
So... more work, slower, and fewer features than GA.

~~~
b20000
google is only interested in selling you ads. They don't care whether you
actually get anything out of their analytics product. If it's free, you ARE
the product.

~~~
kinkrtyavimoodh
Can we not have this mindkill of a line posted in every frigging thread on HN
about Google or FB?

Google IS interested in making Google Analytics better, because that ensures
that more people use it on their websites, because then, with your cynical
framing, Google can use it to sell you better ads.

It's bizarre how people here pretend like Google is simultaneously zealous
about selling you ads but also stupid enough to not realize what parts of the
Google eco-system synergize with the whole machinery.

~~~
b20000
I submitted a feature suggestion years ago to add an API for annotations which
by now has been voted up a million times and they still ignore it. Really,
they don't care about making products better.

