Using analytics on my website (azan-n.com)
41 points by azan-n 12 months ago | 55 comments



Hi HN, PostHog employee here. I'm working on our Web Analytics product, which is currently in beta. It's fun to see us mentioned here :)

I should mention that we have a ton of SDKs (see https://posthog.com/docs/libraries) for back end frameworks and languages, so if you wanted to use PostHog without any client-side JS you could send pageviews and other events manually, but for the vast majority of people it makes more sense to use our JS snippet.
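To make that concrete, here's roughly what a server-side pageview looks like with our posthog-node SDK. This is just a sketch: the project key, host, and distinct ID below are placeholders you'd swap for your own.

    // Sketch: capture a pageview from a Node/TypeScript backend with posthog-node.
    // '<project-api-key>' and the distinct ID are placeholders; pick the host for your region.
    import { PostHog } from 'posthog-node'

    const posthog = new PostHog('<project-api-key>', { host: 'https://us.i.posthog.com' })

    posthog.capture({
      distinctId: 'some-stable-user-or-session-id',
      event: '$pageview',
      properties: { $current_url: 'https://example.com/blog/my-post' },
    })

    // Flush the queued events before the process exits.
    await posthog.shutdown()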

Hijacking this comment to share the roadmap for web analytics https://github.com/PostHog/posthog/issues/18547. It's very much in the launch-early-and-be-embarrassed phase, but I would love to hear any feedback or suggestions that people have, particularly if you're already a PostHog user.


Hello there. Thanks for taking the time to read this.

I figured I could use PostHog on a server as well, but if I were doing that anyway, I'd probably use something better suited for the use case instead of PostHog. I did notice the Web Analytics feature and turned the toggle on for my workplace, but I haven't gotten around to using it and haven't heard about it from my team either. Will get back to you if I have anything to share. Cheers.


If you already use PostHog, Web Analytics has been in public beta for quite some time.[1]

If I remember correctly, CloudFlare Analytics does not require you to register your domain with them. I personally feel that keeping domain registration coupled with your DNS provider is not a good idea.

Plausible[2] has an open-source, self-hostable version, but it is not updated in sync with their SaaS version.

Umami[3] is another simple, clean one. And, of course, as many have suggested, Matomo is the other well-established one. If you want to avoid maintaining your own hosting, plenty of providers offer managed hosting for these out of the box these days. PikaPods[4] was good when I tried it and played around for a while.

1. https://posthog.com/docs/web-analytics

2. https://github.com/plausible/analytics

3. https://umami.is

4. https://www.pikapods.com


I feel like the blog post confused domain registration and DNS on the root domain. CloudFlare absolutely demands the latter by asking you to change the nameservers for your root domain. The only way out is on the enterprise plan.


Correct. Thanks for pointing that out, I might not have mentioned it correctly in my post.


I suggest using analytics that you can self-host, like https://www.goatcounter.com/, and renting a cheap VM to run it on alongside your blog. It's way better: you have more control, and since you serve the script yourself from your own domain, it's far less likely to be blocked by ad blockers, so the JavaScript tracking works for close to 100% of visitors.


I use Google Analytics (I know, I know, it's bad, sorry for that, but I tried Matomo and it didn't work for me). I noticed that in recent years the number of visitors has gone down by a lot. Is there a way to know whether the reason is that users are blocking the Analytics JS?

To compare, my website gets around 600 visitors per day according to Analytics, while Cloudflare says something around 4k. Who should I trust?

Also, if Cloudflare would add visitors per page, I would completely remove Analytics. I only want to know visitors per day, country, and the most visited webpages. That's more than enough.


Could you share why you didn't like Matomo? I am building UXWizz, a Matomo alternative, that is focused only on self-hosting. What parts of Matomo made you stop using it?


Yeah, I suspect a lot of that has to do with ad blocking. The problem with using log-based analytics is filtering out bot traffic.


I agree. With PostHog, I know that my ad blocker blocks the tracking URL, rendering the service useless, so I set up a reverse proxy using Cloudflare Workers to prevent ad blockers from blocking it.
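The Worker itself is tiny; roughly something like this (a sketch, not my exact setup: the /ingest prefix and the us.i.posthog.com ingestion host are assumptions you'd adjust for your own routing and region):

    // Sketch of a Cloudflare Worker that forwards an innocuous path on your own
    // domain to PostHog's ingestion host, so ad blockers don't see a known tracker URL.
    export default {
      async fetch(request: Request): Promise<Response> {
        const url = new URL(request.url)
        // Strip the local prefix (here '/ingest') before forwarding upstream.
        const path = url.pathname.replace(/^\/ingest/, '') || '/'
        const upstream = 'https://us.i.posthog.com' + path + url.search
        // Re-issue the request with the original method, headers and body.
        return fetch(new Request(upstream, request))
      },
    }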


JS analytics are increasingly susceptible to inaccuracies as the ways to manipulate the data multiply, producing unreliable events from JavaScript endpoints. Following the industry, there's also a rising trend of blockers using AI-driven detection, which makes JavaScript analytics less useful in those situations.


I can understand the need for analytics on an advertising-supported site with >1M DAU, or on a complex web application.

But anything smaller than that and analytics are mostly vanity metrics. A lot of the traffic is going to be bots anyway.

Write for pleasure. If you must, look for qualitative signals like discussions of the article on HN or Twitter.


> But anything smaller than that and analytics are mostly vanity metrics

I removed any analytics from my blog because I'm already obsessed enough with silly star counts. It manages to completely suck away all the joy; ignorance is bliss here. I found that the rare email or comment from some random person saying that they found something useful is much more fulfilling and at the same time less invasive of my brain activity.


I don't use analytics on my personal sites for vanity metrics. I use it to see where people are coming to my site from and the stuff they seem to resonate with.


Upon a bit of introspection I found that for me, such things were just ways of stroking my ego. I just find it liberating to not really care or give much thought about what my audience cares about or who they are. I write first and foremost for myself, second to help others and third to have something to show to prospective employers.

What those metrics mean to oneself is different for everybody. Also, being a vanity metric and being useful are not mutually exclusive either.


I like to know where my users are coming from and where they're discussing what I wrote, so I can engage with them and have discussions with them too.

I'm not saying ignoring all that is bad; these are just my reasons for wanting analytics beyond ego stroking and vanity metrics.


Vanity or not, knowing that a few hundred people checked out a post I made when I linked it from an HN comment is valuable to me. It helps motivate me to write the next one, because I'm writing for others rather than myself.

Maybe others create in a void, but I'll play my piano poorly rather than write about software documentation if I'm doing something for pleasure.


The most important metric I want to track at the moment is the number of people who come to my website using the link on my resume. It helps me understand how many of my applications might have been met with interest (which so far has been 1, unfortunately).

With everything else, I agree. I write as a means of documenting my findings and sharing them with others, and the discussion section here on HN is where it becomes most useful to me, since I can engage in constructive conversation and learn new things.

Thanks for reading the article, cheers.


I’ve been looking to add some analytics to the endoflife.date API (JS obviously doesn’t work) and my no-JS options are:

1. Netlify analytics, which I’ve tried in the past and found under-powered

2. CloudFlare analytics (yes, there’s a No-JS variant in the Pro plan)

3. get your raw access logs from your host, and throw them at GoAccess/…. BunnyCDN seemed like the simplest provider for this with access to raw logs in the standard tier. Both CloudFlare and Netlify restrict this to the Enterprise tier.
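For what it's worth, option 3 at its simplest boils down to parsing log lines and counting; here's a toy TypeScript sketch of the idea (assuming the standard Combined Log Format and a local access.log, which real tools like GoAccess handle far more thoroughly):

    // Toy sketch: count 2xx hits per path from a Combined Log Format access log.
    // 'access.log' is a placeholder; GoAccess and friends do this properly.
    import { readFileSync } from 'node:fs'

    const pattern = /^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|POST|HEAD) (\S+) [^"]*" (\d{3})/
    const hits = new Map<string, number>()

    for (const line of readFileSync('access.log', 'utf8').split('\n')) {
      const m = pattern.exec(line)
      if (!m) continue
      const [, , path, status] = m
      if (status.startsWith('2')) hits.set(path, (hits.get(path) ?? 0) + 1)
    }

    // Print the ten most-requested paths.
    console.log([...hits.entries()].sort((a, b) => b[1] - a[1]).slice(0, 10))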


Maybe Pirsch is an option for your use case?

https://docs.pirsch.io/get-started/backend-integration


It's a static site hosted on Netlify, so no. We could trigger a Netlify function on every request, but that wouldn't scale.


For a personal website with few readers (say one view per day), it's difficult to know if the analytics report actual viewers or bots. Therefore it's useless.

For a personal website with many readers (say more than 100 views per day), it's vanity in the best case, and bad influence in the worst case (one should not produce content just to make more views, that's the lesson we get from social media). Therefore it's useless, at best.

For commercial websites I don't know. If I had to guess, I would say it's not making the world a better place, but maybe it helps someone justify their job (which I see as a reason, but not as an excuse).


If you have the know-how and time/resources to maintain a self-hosted deployment, and can take care of security and privacy requests, that sounds like a sound plan. PostHog/Matomo are good choices.

If you want it done for you, then:

Matomo Cloud

Piwik Pro

Wide Angle Analytics

...

plenty of choices


The reason I am still working on uxwizz.com and keeping the current pricing model (source-available, paid license, yearly updates) is that it allows me to focus on making self-hosting as easy as possible and on building tools that reduce the time it takes to set up and maintain the platform. Most other tools, which provide a cloud offering, have no incentive to make self-hosting easier, as that would cost their business/cloud offering money.


Of all the things that I have, time and resources surely are scarce. Thanks for taking the time to read, cheers.


I was also looking for server-side analytics, created my own, and now it's a product! The idea is that tracking can be done from both a JS snippet (for easy integration) and an API. Both rely on fingerprinting and provide almost the same set of features; the API just lacks screen resolution. The method is GDPR (and CCPA and whatnot) compliant [0].
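Roughly, cookie-less identification of this kind boils down to a short-lived salted hash; a simplified sketch (illustrative only, not the product's exact implementation, and the salt handling here is deliberately naive):

    // Simplified sketch of cookie-less visitor identification: a salted hash of
    // IP + User-Agent that only holds for one day, so no durable ID is stored.
    // Illustrative only, not the actual implementation.
    import { createHash, randomBytes } from 'node:crypto'

    const dailySalt = randomBytes(16).toString('hex') // rotate this salt every day

    function visitorId(ip: string, userAgent: string): string {
      const day = new Date().toISOString().slice(0, 10) // e.g. "2024-05-01"
      return createHash('sha256')
        .update(`${dailySalt}:${day}:${ip}:${userAgent}`)
        .digest('hex')
    }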

Original article: https://marvinblum.de/blog/server-side-tracking-without-cook...

Product: https://pirsch.io

[0] Before this comes up again: yes, we had it checked professionally by an external DPO, and it was also checked externally by some companies you've probably heard of.


Pirsch was also recommended by others and looks like a product I'll be considering for projects with the resources to use a non-free tool. For my website, however, I think it might not be worth the spend for me.

Thanks for reading the post, cheers.


I went the other way entirely: no analytics, 100% static site, no logging.

It's super fast, and it takes away one more thing to monitor and keep up with; it's not as if I'm running an e-commerce site.


This is the way. However, if you do want to view some stats about your site, you can get them by checking the domain on Google Search console [0].

[0]: https://search.google.com/search-console/about


I've done exactly the same. I must admit, there's a sense of FOMO — if I weren't hosting on GitHub Pages, I would 100% do some analysis of server logs to learn something about site traffic. However, on balance, this is still the best option!


I went this way even while running a design-service website. It works great and lets me focus on what matters: serving my actual customers. And that's the best metric of all, actual paid orders; that's all I need to know :)


I do this as well for most of my sites, and I stopped using analytics. I just want to write what I want to write...to hell with maximizing views.


Good idea - one "analytic" is to put your email on the site, and see how many people say hello!


I have visitor statistics enabled on my site and I like to see how many visits my articles get. The point is not to read more into it than it can tell you, so I don't really understand the 'remove all statistics because it doesn't matter' argument.

However, I must say your idea is a very good one!


I see it like this:

- If the analytics don't change the way you write, then they are useless overhead.

- If they change the way you write (because you try to maximize the number of views), then they may be toxic.

IMO you should write what you have to say, not what you believe people will want to read (with the help of tracking).

When I realized that, I removed GoatCounter from my website, which is now a minimal, static, no-javascript, no-tracking personal blog. And it feels good.


That's exactly my take.


I think the argument is "remove 3rd-party analytics because it is not worth the downsides".


Completely agree. No 3rd party code on my sites.


[flagged]


> Just use GoAcces for fuck's sake. You don't need more than access logs. What are you going to do with that analytics data anyway? How much can you even trust that it's real and not bots that are getting better every day?

I can provide you a very simple example. You have a blog, with articles. On a multi-thousand-word article, you know that visitors who spent <30 seconds on it are either bots or people who didn't read it through. If that's most of your traffic, your articles aren't very good or your introductions suck. If almost all mobile traffic is like that, your site probably looks bad on small screens.

You get the point, I hope. Access logs are useless for this kind of basic information.
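To be concrete, collecting that kind of engagement signal needs something running client-side; a tiny sketch (the /collect endpoint is a placeholder for whatever backend receives the events):

    // Sketch: report time-on-page when the tab is hidden, so very short visits
    // (< 30 s) can be separated from engaged readers. '/collect' is a placeholder.
    const start = performance.now()

    document.addEventListener('visibilitychange', () => {
      if (document.visibilityState === 'hidden') {
        const seconds = Math.round((performance.now() - start) / 1000)
        navigator.sendBeacon(
          '/collect',
          JSON.stringify({ event: 'time_on_page', path: location.pathname, seconds })
        )
      }
    })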

Another very good example: there was a blog post from someone involved in the UK GDS initiative that described a woman sitting with a handheld console in the waiting room of a government office. Initially you think she's just playing, but peeking over her shoulder, you see she's filling in an application for unemployment. The browser on the handheld is horribly outdated, but she probably doesn't have another choice. Knowing your audience and adapting to it is therefore crucial. From them:

> people now access GOV.UK in many different ways - 16,500 visits came from games consoles in the last month (Xboxes/PlayStations/Nintendos) - including 65 sessions from a handheld Nintendo 3DS (this is in 2015 https://gds.blog.gov.uk/2015/11/20/2-billion-and-counting/)

Whether you need analytics depends entirely on what your website is doing. But a lot of orgs and people actually do, and they can and should make use of them.


While JS analytics may suit the needs of marketing professionals concerned with product sales and article visibility, those focused on precise traffic patterns, such as day-to-day devops work, find GoAccess to be a more effective tool.

JS analytics face an escalating risk of inaccuracies because of the many ways the data can be manipulated, resulting in unreliable events from JavaScript endpoints. Keeping up with the industry also reveals an emerging trend of blockers using AI-driven detection, diminishing the relevance of JavaScript analytics in such scenarios.


You don't need analytics to know that you should build a responsive website and use best practices to keep its resource usage small enough for old devices.


> You don't need analytics to know that you should build a responsive website

You do need to know if you have users on their fridge's custom browser, developed back in 2012 by some intern long since gone, that doesn't support HTML5 properly and needs weird hacks to actually be responsive.

> resource usage small enough for old devices

Which can prevent you from adding new features, or even just good-quality video, which can choke an old device's browser. Graceful degradation is nice and should be preferred whenever possible, but sometimes it isn't possible. So knowing how many of your users, if any, are on such things that you have to go out of your way to support is important.


I don't think the Nintendo DS / handheld gaming device comments were about keeping resource usage small. Outdated browsers don't support all the same APIs that the ones today do. Typically, businesses will choose a cutoff that supports most traffic they receive from their users. Without analytics, you can't determine this cutoff.

It varies wildly from product to product.


GoAccess is nice but when I tried it a few years ago I found that the results were not accurate. Maybe on a large site the signal rises way above the noise but I found for my blog/project site (with maybe 20 hits a day) the counts consisted of obvious bots that GoAccess didn't filter correctly.

Google Search Console is also wildly inaccurate in the other direction.

I ended up just implementing a simple hit counter[0] which was fine for what I wanted.

[0] https://sheep.horse/visitor_statistics.html


If accuracy refers solely to human-vs-bot detection, there might be a point, but for comprehensive traffic analysis, access logs are unparalleled in accuracy.

JavaScript analytics are increasingly prone to inaccuracies because of the many methods available for manipulating the data, leading to inaccurate events from JavaScript endpoints. Staying current in the field shows a growing trend of blockers employing AI-driven detection, which makes JavaScript analytics less relevant in these cases.


The benefit of using JS instead of access logs is that scraping bots, security scanners, etc. usually don't call your JS analytics, so for a small site bots can make up 90% of the logs.

And you can use your analytics data to better understand your users because again most bots won't trigger your analytics script (because why would they execute it and waste their CPU cycles on that?). And yes, you can't trust the exact numbers (because adblockers, etc) but you can see the trends:

Page A has 1000 visitors per day. Page B has 10 visitors per day. We can conclude that A is more popular than B (why is another question).

Or Page A had 100 visitors on average last month and now it has 1000 visitors on average. We can conclude it got more popular. Etc.

Do you need that for your personal blog? Probably not. Do you need that on your e-commerce shop to verify whether your 1-million-dollar ad spend makes a difference or not? Probably.


I think many developers overlook this aspect. If you're not in marketing, it's not just about tallying human visitors on your sites. It's essential to be vigilant against possible daily attacks and avoid excessive reliance on basic JS analytics.

Even for personal websites, relying solely on JS can compromise security. We require precise data, and logs prove more dependable than JS, especially with the increasing number of tools tampering with JS data.

We should steer clear of the trap of exclusively focusing on the count of human visitors.


> because why would they execute it and waste their CPU cycles on that?

The analytics script is not run using your CPU cycles; those belong to your victim/target.


> It's bad enought that marketing departments has convinced 99% of companies (or has overruled product / engineering) that it's good to track and make websites slower in the process.

Because there are no product tools and analytics to track and make websites slower in the process?


I love the concept of GoAccess, but last time I checked it still didn't have the ability in the web UI to present statistics for a subset of dates relative to the entire captured set.


> Just use GoAcces for fuck's sake.

GoAccess seems pretty cool and is probably a good tool for the job when you need something simple, thanks for recommending it: https://goaccess.io/

Even if you have analytics of some sort already in place, I think it'd probably still be a nice idea to run GoAccess on your server, behind some additional auth, so you can check up on how the web servers are performing (in lieu of other options that aggregate your web server logs).

That said, I'd still say that the analytics solutions out there, especially self-hostable ones like Matomo, are quite nice and can have both UIs that are very easy to interact with for the average person (e.g. filtering data by date range, or by page/view that was interacted with), as well as have a plethora of different datasets: https://matomo.org/features/

I think it can be useful to have a look at what sorts of devices are mostly being used to interact with your site, what operating systems and browsers are in use, how people navigate through the site, where do they enter the site from and how they find it, what the front end performance is like, or even how your e-commerce site is doing, at a glance, in addition to seeing how this changes over time.

As for performance, I guess it depends on whether you care about any of the above, whether they actually help you make your site better. If performance was the top goal, we probably wouldn't be using Angular, React or anything like that in the first place, but only very lightweight options. Or, you know, not putting ads or auto playing videos on the sites either.

People have also said good things about Plausible Analytics as well: https://plausible.io/


I believe JS analytics tools serve marketing well, providing estimates like who's purchasing my product or how many are reading my article.

However, for devops, relying too much on these tools can be a trap, potentially causing security and server issues without immediate detection. Our team opts for goaccess for this reason. We prioritize accurate data from logs and scrutinize traffic patterns, focusing beyond just distinguishing between human and bot counts, a task we leave to the marketing/product department.


Actually, you don't even need GoAccess running all the time. It parses nginx logs directly, so you can run it only when you want to (although in practice it takes some time to parse the logs and create the graphs).


I would contend the contrary. Our company consistently runs GoAccess because we are primarily concerned with unusual traffic patterns. This involves monitoring for potential attacks or abnormal bot traffic that might impact our servers, etc.

While JS analytics tools might be of more interest to the marketing team, they are not as crucial for the devops team.


If you use access logs for user analytics, you need all the same consents and must adhere to the same ePrivacy/GDPR rules as JavaScript/cookie-based tools. Once you build all the necessary tooling to use logs lawfully for user analytics, it won't be simpler than the commercial tools.

> It's bad enought that marketing departments has convinced 99% of companies (or has overruled product / engineering) that it's good to track and make websites slower in the process.

Marketing departments need the tracking to use the commercially available advertising, primarily from Google, Meta, and Bytedance. The tracking increases the return on advertising spend by 10-100x and for many businesses is required to be profitable.

This is the typical HN comment from an overconfident developer who believes everything they don't personally like must be the result of idiocy and incompetence.



