
Using GoAccess for self-hosted web analytics - bjoko
https://b4d.sablun.org/blog/2019-12-23-own-your-website-stats/
======
pilif
While I do like GoAccess and I like owning my own data, there's still a huge
difference between what tools like GoAccess offer and what the elephant in the
room, Google Analytics, offers.

It's not just the data available by default, it's also the very advanced user
interface allowing even somewhat non-technical people to produce their own
reports and dashboards.

Yes, you only co-own the data together with Google, but at least you _have_
the data compared to using a home-grown solution which will either not provide
your product managers with what they need or which will consume all of your
life as you slowly re-implement your own Google Analytics which will never be
as good and/or featureful as the original.

Of course, if it's just about your personal blog and if you're willing to
spend some time on the tooling itself, then yes, tools like these don't have
the privacy issues otherwise accompanying third-party analytics.

~~~
JeanMarcS
For me, the problem with GA (or Matomo or any JS analytics) is that it doesn't
accurately count the people visiting your website.

You lose everyone with a blocker and everyone with JS turned off.

Of course it’s not the biggest percentage, but still.

So you’re right, it depends on what the website is. It’s easier to follow
individuals with a JS solution.

~~~
tyingq
GA does support using Google Tag Manager and a NOSCRIPT iframe for recording
visits from people with JavaScript disabled.

You can also either proxy GA through your own domain, or use server side APIs
to track those with blockers. Both support passing the browser IP as a
parameter.

Also, fwiw, just noting both are technically possible. I'm not speaking to
whether either is okay ethically.

~~~
joshyi
I think a big problem with Google Analytics is accuracy, especially with the
now so popular adblockers. Log analysis such as GoAccess should be able to
track these down fine since it works at the server level.

I believe tracking visitors at the client level deflates the actual number of
visitors. On the other hand, server-side tracking gives you a more accurate
number at the cost of not knowing for sure if the client is a human behind a
browser.
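
To illustrate that trade-off, here is a minimal sketch of counting unique visitors straight from an access log, assuming the common "combined" log format. The `BOT_HINTS` list and the `count_visitors` helper are hypothetical names made up for this example, and matching substrings in the user agent is only a rough heuristic, not a reliable bot detector.

```python
import re

# Combined log format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical, deliberately short list of substrings seen in crawler user agents.
BOT_HINTS = ("bot", "crawler", "spider", "curl", "wget")


def count_visitors(lines):
    """Return (all_visitors, likely_humans) as counts of unique ip+agent pairs."""
    everyone, humans = set(), set()
    for line in lines:
        m = LINE_RE.match(line)
        if not m:
            continue  # skip lines that are not in combined log format
        key = (m["ip"], m["agent"])
        everyone.add(key)
        if not any(hint in m["agent"].lower() for hint in BOT_HINTS):
            humans.add(key)
    return len(everyone), len(humans)
```

The gap between the two numbers is a rough measure of how much non-human traffic the server sees; a client-side tracker would miss both that traffic and any humans running a blocker.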

~~~
tyingq
You can programmatically send hits to GA from your server, and get the benefit
of the nice UI.
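
A rough sketch of what that could look like, assuming the Universal Analytics Measurement Protocol (the version current at the time of this thread). The parameter names `v`, `tid`, `cid`, `t`, `dp`, `uip` and `ua` come from that protocol; the `build_pageview` and `send_hit` helpers and the placeholder tracking ID are made up for this example.

```python
import urllib.request
from urllib.parse import urlencode

# Universal Analytics Measurement Protocol endpoint.
GA_COLLECT_URL = "https://www.google-analytics.com/collect"


def build_pageview(tracking_id, client_id, path, client_ip=None, user_agent=None):
    """Build the form-encoded body for one pageview hit."""
    params = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # property ID, e.g. "UA-XXXXX-Y" (placeholder)
        "cid": client_id,    # anonymous client ID chosen by your server
        "t": "pageview",     # hit type
        "dp": path,          # document path
    }
    if client_ip:
        params["uip"] = client_ip   # IP override: the visitor's IP, not the server's
    if user_agent:
        params["ua"] = user_agent   # user agent override
    return urlencode(params)


def send_hit(body):
    """POST the hit to GA. Not called here; network errors are left to the caller."""
    req = urllib.request.Request(GA_COLLECT_URL, data=body.encode("ascii"))
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.status
```

Passing the visitor's IP and user agent along is what keeps the geo and browser reports meaningful, since otherwise every hit appears to come from your server.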

------
djhworld
If you're hosting a static site on shared infrastructure (e.g. github pages,
S3) where access logs are hard to come by, you can achieve some level of
statistics gathering using a 1x1 pixel GIF and putting a CDN in front of it,
then putting the CDN logs through GoAccess.

EDIT: this was a blog post I loosely followed:
[https://benhoyt.com/writings/replacing-google-analytics/](https://benhoyt.com/writings/replacing-google-analytics/).
One detail I forgot to mention is that you have to do some post-processing on
the log files to get them into a format GoAccess can read, but it's not too
difficult.
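
A minimal sketch of that kind of post-processing, assuming a CloudFront-style tab-separated log (check the column order against your CDN's documentation) and assuming the pixel URL carries the page path in a `u` parameter and the referrer in an `r` parameter. Both parameter names and the `to_combined` helper are hypothetical choices for this example, not something the linked post or GoAccess prescribes.

```python
from datetime import datetime
from urllib.parse import parse_qs, unquote

# Assumed column order of a CloudFront-style, tab-separated log line.
FIELDS = ["date", "time", "edge", "bytes", "ip", "method", "host",
          "uri", "status", "referer", "agent", "query"]


def to_combined(line):
    """Convert one tab-separated CDN log line into a combined-log-format line.

    Returns None for header/comment lines or lines with too few columns.
    """
    if line.startswith("#"):
        return None
    cols = line.rstrip("\n").split("\t")
    if len(cols) < len(FIELDS):
        return None
    rec = dict(zip(FIELDS, cols))
    # Hypothetical pixel parameters: u = page path, r = referrer of that page.
    params = parse_qs(rec["query"]) if rec["query"] != "-" else {}
    path = params.get("u", [rec["uri"]])[0]
    referer = params.get("r", ["-"])[0]
    when = datetime.strptime(rec["date"] + " " + rec["time"], "%Y-%m-%d %H:%M:%S")
    stamp = when.strftime("%d/%b/%Y:%H:%M:%S +0000")  # CDN timestamps assumed UTC
    agent = unquote(rec["agent"])  # the CDN percent-encodes spaces in the agent
    return (f'{rec["ip"]} - - [{stamp}] "GET {path} HTTP/1.1" '
            f'{rec["status"]} {rec["bytes"]} "{referer}" "{agent}"')
```

The rewritten lines record the page the pixel was embedded in rather than the pixel itself, so the reports show real page paths.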

~~~
awinter-py
yeah, sharing logs for shared hosting is a small but interesting & unsolved
problem

it's still weird to me that we rely on the frontend to manage access logging

------
hardwaresofton
I personally run Fathom[0] for my very simple needs and it's pretty good as
well. Not quite as simple to deploy as GoAccess, but it's got a nice UI as
well.

I used to use Matomo[1] (which is a lot closer to a full analytics suite) but
stopped using it since it felt heavier than what I needed.

[0]:
[https://github.com/usefathom/fathom](https://github.com/usefathom/fathom)

[1]: [https://matomo.org/](https://matomo.org/)

------
mikro2nd
Like the other solutions I've seen, this assumes you have access to your web
server's log files. For many (most?) low-end web hosting plans this is likely
not true, afaik. I suspect that this is precisely the window of opportunity
that enabled the dominance of Google Analytics and similar solutions.

------
UglycupRawky
GoAccess is great and all, but static log analysis =/= Google Analytics.
You can only get so much data via log analysis, and if you have a SPA, or even
just a lot of client-side stuff, static log analysis just cannot provide you
with the same type of data you get from the client side.

If you really want to replace Google Analytics and have the same level of
features and tracking you need a client side system - Matomo is the closest
I've seen to that.

All that being said, I've used GoAccess, I like it, but I haven't quite
mastered the log format to enable my more robust AWStats logs with GoAccess. I
have a bunch of subdomains all as VHosts, and I loved the feature in GoAccess
that rolls it all up into a VHost table/chart. However I haven't figured out
the log parser settings to get both AWStats and GoAccess to like it with the
%v_host field in the logs. Any thoughts/help there?

~~~
joshyi
Google Analytics keeps track of visitors using cookies, so if a browser has
cookies or JavaScript disabled, then it won't keep track of it. This includes
the now so popular adblockers and bots as well. Log analysis such as GoAccess
should be able to track these down fine since it works at the server level.

I believe tracking visitors at the client level deflates the actual number of
visitors (due to reason listed on #2). On the other hand, server-side tracking
gives you a more accurate number at the cost of not knowing for sure if the
client is a human behind a browser.

------
alicorn
Does anyone else work on a relatively large internal web project in a global
corp where using GA internally is, for obvious reasons, forbidden? How do you
do statistics? Which statistics do you capture and why?

------
JackPoach
Yeah, good luck integrating this with Adwords, YouTube, Facebook, etc. I
understand why this is good for personal projects, but if you are spending
lots of money on advertising (which probably goes mostly toward Google
anyway), Google Analytics is the way to go.

~~~
cuu508
Sadly.

It would be nice if there was a way to do conversion attribution without the
tracking cookies.

One idea I've been thinking about lately:

* In server-side access logs, include the visitor's fingerprint as hash(client_ip + user_agent + maybe_salt).

* Look for access log entries with ad campaign specific Referrer values. Call these "Set A".

* Look for access log entries with the conversion event (user signs up, user upgrades to paid plan, user makes purchase, ...). Call these "Set B"

If any given visitor's fingerprint appears in both Set A and Set B, we
can assume they came from an ad campaign and converted.

Of course, user's IP can change, and many users can have the same IP. So this
would be imprecise. But could be better than nothing. And, IMO at least,
better than having the tracking cookies, cookie warnings, consent screens,
extra sections in privacy policies etc.
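
A minimal sketch of the idea, assuming the access log has already been parsed into dicts with `ip`, `agent`, `referer` and `path` keys. That shape, the helper names and the choice of SHA-256 are assumptions for this example. It also ignores ordering: a stricter version would require the campaign visit to come before the conversion.

```python
import hashlib


def fingerprint(client_ip, user_agent, salt=""):
    """hash(client_ip + user_agent + maybe_salt), here using SHA-256."""
    raw = f"{salt}|{client_ip}|{user_agent}".encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def attributed_conversions(entries, campaign_marker, conversion_paths, salt=""):
    """Return fingerprints seen both arriving from the campaign and converting.

    entries: iterable of dicts with keys ip, agent, referer, path.
    campaign_marker: substring identifying the campaign in the Referer value.
    conversion_paths: set of request paths that count as a conversion.
    """
    set_a, set_b = set(), set()
    for entry in entries:
        fp = fingerprint(entry["ip"], entry["agent"], salt)
        if campaign_marker in entry["referer"]:
            set_a.add(fp)  # Set A: arrived via the ad campaign
        if entry["path"] in conversion_paths:
            set_b.add(fp)  # Set B: triggered a conversion event
    return set_a & set_b
```

Rotating the salt (say, daily) would limit how long a fingerprint stays linkable, at the cost of missing conversions that happen after the rotation.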

------
reidjs
This looks awesome for a personal/portfolio site. I use Matomo for mine and
it’s sort of overkill. You have to set up your own LAMP/LEMP stack just to get
analytics and I use maybe 1% of the features.

~~~
rambojazz
Matomo unfortunately is unnecessarily difficult to set up for reading web
server access log files.

~~~
cpach
Isn’t the primary goal of Matomo to be a replacement for Google Analytics?

~~~
rambojazz
Yes, I think so, which means comparing it with GoAccess is a bit apples and oranges.

