Show HN: Self-hosted, privacy-friendly and open-source analytics for WordPress (kokoanalytics.com)
187 points by dvko on Dec 6, 2019 | 62 comments


I have zero interest in running WordPress but I just wanted to say congrats for releasing something that ticks a lot of “objectively good” boxes otherwise: open source, self hosted by design, and privacy friendly.

Well done, and good luck!


Hah, I somewhat understand the former and fully agree with the latter. For what it’s worth, I hoped building it for WordPress would allow me to reach the most people who would otherwise be left out by other self-hosted tools (through a lack of server skills or interest). Anyway, thanks for the kind words!


We've been considering building a WordPress plugin. How was the developer experience?


I’ve been doing it for the last 10 years and have built other analytics tools before as well (Ana in 2016, then Fathom in 2018), so it was super easy for me to build.

But if you’re new to WordPress, I think the process won’t be that enjoyable. Users are usually on EOL PHP versions and WordPress itself is very different from modern day PHP projects.


Reminds me of the programs that would make summary pages from your web logs.

(Webalizer)

http://www.webalizer.org/

The usual question is how this works if you are behind a CDN like Cloudflare, but since WordPress creates each page dynamically it might work.


GoAccess is a modern log based analyser. I’ve just deployed it analysing logs coming out of varnish cache (on two servers, using syslog to keep log data the same across machines) for a client. It works well, no additional client side code or cookies or changes of any kind required on the (web) application layer.
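For anyone who hasn't seen the log-based approach, here is a minimal Python sketch of the general idea. It is not how GoAccess works internally. It assumes access logs in the common "combined" format, and the sample lines and the `count_pageviews` helper are made up for illustration.

```python
import re
from collections import Counter

# Matches the request and status fields of a "combined" format access log line.
LOG_LINE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) ')

def count_pageviews(lines):
    """Count successful GET requests per path, ignoring obvious static assets."""
    static_suffixes = (".css", ".js", ".png", ".jpg", ".ico")
    views = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # skip lines that do not look like access log entries
        if match["method"] != "GET" or match["status"] != "200":
            continue
        path = match["path"].split("?", 1)[0]  # drop the query string
        if path.endswith(static_suffixes):
            continue
        views[path] += 1
    return views

# Hypothetical sample lines using documentation-only IP addresses.
sample = [
    '203.0.113.7 - - [06/Dec/2019:10:00:00 +0000] "GET /blog/ HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
    '203.0.113.7 - - [06/Dec/2019:10:00:01 +0000] "GET /style.css HTTP/1.1" 200 900 "-" "Mozilla/5.0"',
    '198.51.100.2 - - [06/Dec/2019:10:01:00 +0000] "GET /blog/?page=2 HTTP/1.1" 200 4096 "-" "Mozilla/5.0"',
    '198.51.100.2 - - [06/Dec/2019:10:02:00 +0000] "GET /missing HTTP/1.1" 404 120 "-" "Mozilla/5.0"',
]
print(count_pageviews(sample))
```

Everything here comes from the server's own log files, which is why no client-side code or cookies are needed.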


AWStats still gets frequent updates. I use it for all of my sites' nginx logs since I prefer that over anything that requires JavaScript.

https://www.awstats.org/


You can make it work with Cloudflare. It just needs a bit of configuration so that CF sends you the real visitor IPs; otherwise every hit looks like it comes from distributed CF machines.
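A rough Python sketch of what that configuration boils down to. Cloudflare passes the original client address in the `CF-Connecting-IP` request header. The proxy range below is a placeholder from the documentation address space, not Cloudflare's real published list, and `real_visitor_ip` is a hypothetical helper.

```python
import ipaddress

# Placeholder range standing in for the CDN's published proxy addresses (assumption).
TRUSTED_PROXIES = [ipaddress.ip_network("203.0.113.0/24")]

def real_visitor_ip(remote_addr, headers):
    """Return the visitor IP, trusting CF-Connecting-IP only from known proxies."""
    peer = ipaddress.ip_address(remote_addr)
    if any(peer in network for network in TRUSTED_PROXIES):
        forwarded = headers.get("CF-Connecting-IP")
        if forwarded:
            return forwarded
    return remote_addr  # direct connection, or header missing

# Request relayed by a trusted proxy: the header value is used.
print(real_visitor_ip("203.0.113.10", {"CF-Connecting-IP": "198.51.100.23"}))
# Request from an unknown peer: the header is ignored because it could be spoofed.
print(real_visitor_ip("192.0.2.55", {"CF-Connecting-IP": "198.51.100.23"}))
```

Only trusting the header when the connection comes from a known proxy matters, since anyone connecting directly could otherwise send a forged value.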


The plugin doesn't use IP addresses at all, so this isn't needed. There is an option to determine returning visitors using a cookie (easiest, reliable) but it can be turned off.

When not using a cookie, the plugin will look at the `document.referrer` value to determine whether a visitor is returning and whether a pageview is unique (less reliable, but should be good enough for some).

IP addresses are not looked at at all (let alone stored).


Hey, this uses a tracking request that is initiated client-side. So even if the page is not generated dynamically, Koko Analytics will work just fine.


Webalizer was clumsy and cluttered. I used it like 20 years ago. Koko is way cleaner.


Speaking of self hosted analytics, I used to use Piwik and it was great. It's called Matomo now https://matomo.org/


This project seems to use Matomo's referer blacklist, so there is hopefully some cross-pollination between the similar projects: https://github.com/ibericode/koko-analytics/commit/d53077392...

(Both are GPLv3, so there's no licensing issue or anything; Just caught my eye.)


And speaking of WordPress analytics: there is now a new official WordPress plugin that contains a whole Matomo instance, so you don't need to set it up separately.

https://matomo.org/blog/2019/10/matomo-analytics-for-wordpre...


> ... I used to use...

I actually started using Matomo recently (moving away from Google Analytics), so I'm curious why you no longer use it. Is there something I should be wary of now that I'm using Matomo?


I am a great fan (as much as one can be a fan of an analytics tool) and advocate for matomo. They even have server log analytics with no client-side tracking whatsoever.

We're using it for a big variety of clients small to big and sure, it's less capable than Google Analytics, but then again – most of our clients are just interested in simple visitor statistics to validate certain assumptions, not whether or not James Doe's engagement with our site has been 4.32% better after his second Flat White at Starbucks.

Whenever there's a viable alternative to a Google product, go for it. Matomo is 100% viable and I feel much better tracking (anonymous!) user behavior this way.


Oh, no, I just have no use for it now :)

I expect Matomo to be even better, after all this time.


Ah ok, understood; whew. Thanks for the clarification!


I have not found any issue with the software since the name change.


Looks lovely - I'll be testing this out on a few clients' websites to give them a simpler stats viewing experience.

I do have Google Analytics running on them but GA can be complicated for some users (especially starting out a basic blog), and they love being able to see everything in one place. I've tried a few "display GA stats in WP admin" plugins but they always feel a bit half-baked.


This is a great alternative to Google Analytics. It's free and it works as soon as you activate the plugin.

Most bloggers set up Google Analytics without thinking too much but never use much of the data it collects and tracks.

This plugin has all the basics that the average blogger cares about, and it does so without adding extra load and without collecting intrusive behavioral data.


> Option to not use any cookies while still being able to determine returning visitors and unique pageviews.

This strikes me as a potentially non privacy-friendly feature. How is this implemented? I assume some form of fingerprinting must be used?


Hey, glad you ask because I feel strongly about this!

We're absolutely against fingerprinting as I believe it's better (for the visitor) to just use a cookie in those cases, since they retain control that way. (Delete cookie = be forgotten.)

That's why we opt for a cookie by default, as I believe it is more in "the spirit of the law" than fingerprinting the visitor and storing information about them server-side.

So, when you disable the cookie the plugin falls back on checking the `document.referrer` value for determining returning visitors and unique pageviews. It's a lot less reliable obviously, but some price has to be paid if you do not want to rely on storing something client-side.
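A rough sketch of what such a referrer check could look like. It is written in Python for brevity, whereas the plugin does this in client-side JavaScript, and the exact rules below are an assumption for illustration, not copied from the plugin's source. `classify_pageview` is a hypothetical name.

```python
from urllib.parse import urlsplit

def classify_pageview(referrer, page_url):
    """Guess (is_new_visitor, is_unique_pageview) from the referrer alone."""
    if not referrer:
        return True, True  # no referrer: treat as a new visitor and a unique view
    ref, page = urlsplit(referrer), urlsplit(page_url)
    # Referred from the same origin: the visitor was already on the site.
    same_site = (ref.scheme, ref.netloc) == (page.scheme, page.netloc)
    # Referred from the very same page (e.g. a reload): not a unique pageview.
    same_page = same_site and ref.path == page.path and ref.query == page.query
    return (not same_site), (not same_page)

print(classify_pageview("https://www.google.com/", "https://example.com/post"))
print(classify_pageview("https://example.com/", "https://example.com/post"))
print(classify_pageview("https://example.com/post", "https://example.com/post"))
```

With rules like these, someone who arrives from a search engine twice is counted as new both times, which is the reliability cost mentioned above.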


I don't think it's at all clear from the term that returning visitors wouldn't include someone who visited your page from a search page, went back, made a new search, and clicked on a different page on the same site. That seems contrary to the industry standard definitions. I'd encourage you to call it something else.


I agree. A returning visitor is someone who visited your site before and now they're back again. The "return" in returning visitor is the fact that they left and came back.

What they're talking about is usually described as tracking unique sessions. So, I would see the use of the referrer as a low-complexity, less reliable, but more privacy-friendly way of tracking unique sessions.


Sorry, this is indeed what I meant to say. Will update the text!


Do you store it as the same record type as the cookie-based metric? If I decided to stop using cookies that would make aggregates of the data very misleading.


We do - I’ll see if we can add in a warning message if changing that setting while a lot of data has already been tracked. Or perhaps mark it in the graph, so the difference comes with an explanation. What would you do?


I'm not sure. It seems like they are totally different metrics and visitors who came from another page on your site might be an interesting metric to cookie users.

What if you completely separated them, and tracked and displayed both for cookie users? Users who disable cookies would then have historical data in the same unit.

Or a warning would probably work fine.
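One way to picture the "completely separated" option, as a small Python sketch. The `DailyStats` record and its field names are hypothetical and say nothing about how the plugin actually stores data.

```python
from dataclasses import dataclass

@dataclass
class DailyStats:
    pageviews: int = 0
    cookie_visitors: int = 0    # new visitors according to the cookie
    referrer_sessions: int = 0  # views that were not referred from the same site

    def record(self, cookies_enabled, has_cookie, same_site_referrer):
        self.pageviews += 1
        # The referrer-based metric is always tracked, so it stays comparable
        # over time even if the cookie setting is switched off later.
        if not same_site_referrer:
            self.referrer_sessions += 1
        # The cookie-based metric is only tracked while cookies are enabled.
        if cookies_enabled and not has_cookie:
            self.cookie_visitors += 1

stats = DailyStats()
stats.record(cookies_enabled=True, has_cookie=False, same_site_referrer=False)
stats.record(cookies_enabled=True, has_cookie=True, same_site_referrer=True)
stats.record(cookies_enabled=False, has_cookie=False, same_site_referrer=False)
print(stats)
```

Because the two counters never share a column, turning cookies off only stops one series instead of silently changing what the other one means.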


You probably can also use IP addresses to match at least some of the returning users.


Not an option as the plugin does not store anything visitor specific, not even for a second. Would make it considerably harder to scale as well.


A cookie is the only reliable way to determine returning visitors, which is fine if you are setting cookies anyway as part of your page but I think it is a bad choice to introduce a cookie to an otherwise non-persistent page just for tracking.

Fingerprinting is the next most reliable way but raises all sorts of data-retention issues. Maybe it is acceptable if the fingerprints are only stored for a very short period of time (say 1 hour).
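A minimal Python sketch of the short-retention idea, assuming a fingerprint built from IP address and user agent. The class name and the hourly rotation scheme are illustrative assumptions, not taken from any tool mentioned in this thread.

```python
import hashlib
import secrets

class HourlyFingerprints:
    """Remembers hashed fingerprints for the current clock hour only."""

    def __init__(self):
        self.window = None
        self.salt = b""
        self.seen = set()

    def is_new(self, ip, user_agent, now):
        window = int(now // 3600)
        if window != self.window:
            # New hour: drop every stored hash and pick a fresh random salt,
            # so old fingerprints cannot be matched against new ones.
            self.window, self.salt, self.seen = window, secrets.token_bytes(16), set()
        digest = hashlib.sha256(self.salt + f"{ip}|{user_agent}".encode()).hexdigest()
        if digest in self.seen:
            return False
        self.seen.add(digest)
        return True

fingerprints = HourlyFingerprints()
print(fingerprints.is_new("203.0.113.7", "Mozilla/5.0", now=0))     # first visit
print(fingerprints.is_new("203.0.113.7", "Mozilla/5.0", now=1800))  # same hour
print(fingerprints.is_new("203.0.113.7", "Mozilla/5.0", now=3600))  # next hour
```

Here `now` is a Unix timestamp in seconds. Nothing is kept past the end of the hour, and the raw IP address is never stored, only the salted hash.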

For my home-brewed analytics system I eventually decided that the trade off wasn't worth it. I just count page views rather than visitors. I know it isn't industry standard, but I don't care and nobody in "the industry" is going to trust self-hosted analytics for anything anyway.


The only downside I see is for people who plan on selling their website at a later time. The industry standard is GA and people want access to this before buying a website, many don't trust other stats.


> many don't trust other stats

May I ask what exactly is wrong with other stats?


Maybe trust wasn't the best word. Most marketplaces where you can sell a website have an option to link your GA account directly to the sales page, showing the site stats directly on the page. No other stats program gets this option.

Since people are not familiar with other programs and their numbers can't be confirmed, it may make them skeptical.


For the unfamiliar, what marketplaces exist? I don't think I've come across these.


Think he's referring to flippa.com?


Interesting point to consider. Could they maybe be run side by side?


But then you lose the main point of the increased privacy. If you are simply using this plugin for simple metrics and don't care about privacy, this would work.


This looks really great. Thank you for making it. Will be trying it out on a WordPress site very soon.


Looks really nice. I wish there was something similar for static websites. A super simple, minimalist, privacy-friendly statistics tool.


A necessary plugin for the moment we live in, thanks. I just got the translation for Brazilian Portuguese done.


That’s how I feel too. Thank you so much for your contribution, I’ll make sure to get it reviewed soon so the language pack can be shipped.


When a WordPress site is required, I use the REST API to build the front end so that it's not bloated to hell by WP's own theming system. Is this plugin available to those using this methodology? I.e., can I put a tracking code for it into my pages?


With the plugin active, you can load the tracking script manually and it will work. That said, it could probably be made easier. How would you like to see this?


A JS or PHP include depending on how it works would be grand.


Do you have any public examples of how you have done this?


I tried Koko on two of my blogs. I really like it. You just need a current PHP version.


Thanks! Current though? The plugin supports PHP versions all the way back to 5.3 (which has been EOL for nearly 5 years).

Although I certainly recommend running a current PHP version of course.


Yeah, I know. I'm a bit late on updating. The service I host the blog on is also not the best one obviously or they would take care of PHP.


This is really great. I'm trying it now on my local dev environment.

I'm looking for a GDPR-friendly analytics solution for a media property I am building, so I am very excited about this space.

For something like this to be viable for us (am not saying we are your target user type) there are some notable things missing which I'm guessing you may add over time.

  * Browser type / UA
  * Viewport/Screen sizes
  * Time on site/page
  * Bounce rate
  * Pathing

The other thing that you should look into is custom dimensions, so I can label certain templates as a page type (say 'posts' versus 'pages' versus 'homepage' versus 'tags' etc).

There is more beyond that (event-based metrics like subscribing to a newsletter) but that would be within reach without too much extra effort.

Given the code is available, I'm going to dig into the plugin to see if I can help on that.


Nice! I use WPStatistics currently. You should do a side-by-side comparison explaining why I should make the switch.


Looks really nice and neat, excited to try it!


Hey, developer here. Thank you, would love to hear your thoughts and suggestions for it.

PS. Since this is brought up all the time: in case anyone wonders why this feels similar to Fathom Analytics, it is because I built the initial open-source version of Fathom too.

Koko Analytics is my attempt at a truly open-source and self-hosted tool that is easy to install (ie not just for those who know their way around a server).


Well, since I saw on HN all the fuck-fest with Fathom v2 and the founders talking down to legitimate users asking about the open source version, I'm glad you released it. I'll try it on my new WP site. Will it work with WooCommerce too?


Sure, without issues. All (standard and custom) post types are supported!


It still uses JavaScript. I'd rather see something completely server-side that sets Etag evercookies.


Installed this on a few sites I run, and it really seems like exactly what I want for simple "is anyone viewing this page" type of info. I had previously left GA and analytics altogether around the time GDPR passed.

One question, does this plugin have the ability to filter out traffic from web crawlers? The sites I run are low volume, so a non-trivial amount of traffic comes from crawlers and other bots.


The plugin issues a tracking request using a tiny client-side JS script, so most of those bots will already be omitted (as lots will most likely not be executing JavaScript or have `navigator.doNotTrack` enabled).

That said, I believe some of the smarter ones may still be tracked right now so I will add some user agent filtering to the script to make sure they aren't!
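To illustrate the kind of filtering being described, here is a hedged Python sketch. The plugin's actual check lives in its JavaScript and may look quite different. The pattern list is a tiny illustrative sample and `should_track` is a hypothetical helper.

```python
import re

# Illustrative substrings only; real crawler lists are much longer.
BOT_PATTERN = re.compile(r"bot|crawl|spider|slurp|headless", re.IGNORECASE)

def should_track(user_agent, do_not_track):
    """Skip tracking for Do Not Track visitors and for user agents that look like bots."""
    if do_not_track == "1":
        return False  # the browser reports Do Not Track as the string "1"
    if not user_agent:
        return False  # a missing user agent is treated as a bot here
    return BOT_PATTERN.search(user_agent) is None

googlebot = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
firefox = "Mozilla/5.0 (X11; Linux x86_64; rv:70.0) Gecko/20100101 Firefox/70.0"
print(should_track(googlebot, None))
print(should_track(firefox, None))
print(should_track(firefox, "1"))
```

Substring matching like this only catches crawlers that identify themselves honestly, so it complements the JavaScript requirement and does not replace it.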


Ever thought of using ClickHouse for data storage?


Not really. What problem would it solve?


It's a database designed precisely for OLAP, and for web analytics in particular. It's what powers Yandex.Metrika.



