Show HN: PiratePx – Open Source Privacy focused analytics (piratepx.com)
30 points by wilsonjohn 10 days ago | 10 comments

Very few people will object to a _site operator_ knowing exactly what they did on the site, whether they were a returning visitor, how much time they spent per page, etc.

The principal privacy-related concern with any form of analytics is that of tracking by _a 3rd party_ across _unrelated sites_. And this concern is fully addressed by simply not using external analytics services and relying on a self-hosted one instead.

So I really don't understand the whole exercise of avoiding assorted techniques that 3rd party analytics services may abuse and then claiming a pro-privacy focus, when the solution is to NOT make yet another analytics service.

The only right way to do "privacy-focused" analytics is to offer a self-hosted option. Whoever makes a proper clone of the original Urchin, before it mutated into GA, will strike pure gold.

There are already several good self-hosted analytics tools depending on your needs (Matomo, userTrack, Plausible, etc.)

What features are missing from those platforms that the original Urchin had?

I don't understand your point. PiratePx seems to be MIT licensed and open source, so it may be self-hosted. Are you taking issue with the tracking technique used here, i.e. a pixel-sized GIF?

The point is that if it can be self-hosted, then there's no reason to not use cookies and JS, because that's what allows for collecting actually actionable metrics.

And if it's a 3rd party service, then it's not an option for anyone who genuinely cares about visitors' privacy.

My first experience with 'analytics' was awstats[1]. I felt like I discovered god mode!

Always wanted to experiment with the empty_gif[2] module from nginx, and process the logs offline. A quick search shows a bunch of guides offering exactly that.

1. https://www.awstats.org/
2. https://nginx.org/en/docs/http/ngx_http_empty_gif_module.htm...

I've often wondered why there are so few server-side log parsing libraries (for the likes of Nginx and Apache). Putting that 1x1 pixel image on your front end and serving it from a host that only collects and aggregates anonymous logs of pageviews per URL would be all the analytics most website owners need. Basically the Netlify analytics without the $9/month/site price tag.

(Yes, it would obviously need a bit more than just a log parser, but that would be the easy part, IMO. Separate the backend from the front end and let JS folks write their own UI for This Week's New JS Framework. Maybe you could even use PiratePx as the frontend.)
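
For what it's worth, the log-parsing half of that idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how PiratePx or Netlify work: it assumes the server writes nginx's default "combined" log format, that the pixel is served at a hypothetical path `/px.gif`, and that the page being viewed shows up in the Referer header of the pixel request. The names `count_pageviews` and `PIXEL_PATH` are made up for the example.

```python
import re
from collections import Counter

# Matches nginx's default "combined" access log format:
# addr - user [time] "METHOD path protocol" status bytes "referer" "user-agent"
LINE_RE = re.compile(
    r'(?P<addr>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical location of the 1x1 tracking pixel (an assumption).
PIXEL_PATH = "/px.gif"


def count_pageviews(lines, pixel_path=PIXEL_PATH):
    """Count successful pixel requests, grouped by the referring page URL."""
    views = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        path = match.group("path").split("?", 1)[0]  # drop any query string
        if path != pixel_path or match.group("status") != "200":
            continue
        referer = match.group("referer")
        if referer and referer != "-":  # "-" means no Referer was sent
            views[referer] += 1
    return views
```

Feeding it an open log file, e.g. `count_pageviews(open("access.log"))`, returns a `Counter` keyed by page URL. No IP address or user agent is stored in the result, which is roughly the "anonymous logs" property described above.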

There are indeed not a lot, but with Matomo (Piwik in the old days) it is possible. You can create analytics based on logs. When you host it yourself and use the correct privacy settings, it's a great solution.

As others have mentioned, you of course lose a little bit of accuracy, and you don't have front-end tag manager features (like listening to scroll positions and element events). You gain a bit of accuracy too, though, because some privacy tools block analytics calls, which isn't possible with the access logs approach :)

The reason is that health checks, bots, retries, and spam are hard to distinguish from real visits without bespoke whitelisting.

Cloudflare analytics suffer heavily from the problems I mentioned.
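
To make the filtering problem concrete, here is a rough Python sketch of the kind of bespoke rules this implies. Everything in it is an assumption for illustration: the marker list, the health-check paths, and the helper name `is_noise` are hypothetical, and a real deployment would need rules tuned to its own traffic. It also does nothing about retries or spam that send a browser-like user agent.

```python
# Illustrative substrings that often appear in non-browser user agents.
# This list is an assumption and is far from complete.
BOT_MARKERS = ("bot", "crawler", "spider", "curl", "healthcheck")

# Hypothetical endpoints that a load balancer or uptime monitor might poll.
HEALTH_PATHS = {"/health", "/healthz", "/ping"}


def is_noise(path, user_agent):
    """Guess whether a logged request is monitoring or bot traffic."""
    if path.split("?", 1)[0] in HEALTH_PATHS:
        return True
    agent = user_agent.lower()
    if not agent or agent == "-":
        return True  # a missing user agent is treated as non-human here
    return any(marker in agent for marker in BOT_MARKERS)
```

A substring check like this is easy to evade and easy to get wrong in both directions, which is more or less the point being made above.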

So, false positives. Is there any reasonable measure of how many false negatives one would get with traditional 3rd party tracking due to e.g. blocked JavaScript, ad blockers, etc.?

Nice :)
