Counting website visitors is hard (bobbiechen.com)
32 points by bobbiechen on Dec 24, 2021 | 19 comments



I work in website analytics. Cloudflare's numbers are laughably bad. Really, they are not even tracking visits, they are tracking times the link is requested. So every time the HN page gets updated, that link is getting clicked by the dozen or so bots that trawl that page.

It's a remarkably bad way to do tracking. At the very least, a page view shouldn't be tracked until the page is finished loading for the user. Otherwise you are collecting garbage data.

But fwiw, the specific number of page views is never that important in the grand scheme. You should really be looking at trends, and so long as you pick a number that is consistently measured as your baseline, you can leave well enough alone.
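
To make the request-vs-visit distinction concrete, here is a rough sketch (not how Cloudflare or any analytics product actually works). It compares a raw request count with a count that drops probable bots by User-Agent. The marker list and the `is_probable_bot` / `count_requests` helpers are made up for illustration; real bot detection is far more involved.

```python
from collections import Counter

# Hypothetical substrings that flag a request as automated; real lists are much longer.
BOT_MARKERS = ("bot", "crawler", "spider", "curl", "python-requests")


def is_probable_bot(user_agent: str) -> bool:
    """Return True when the User-Agent is empty or contains a known bot marker."""
    ua = user_agent.lower()
    return not ua or any(marker in ua for marker in BOT_MARKERS)


def count_requests(requests):
    """Count raw requests and probable-human requests per path.

    `requests` is an iterable of (path, user_agent) pairs.
    """
    raw, human = Counter(), Counter()
    for path, user_agent in requests:
        raw[path] += 1
        if not is_probable_bot(user_agent):
            human[path] += 1
    return raw, human


sample = [
    ("/blog/post", "Mozilla/5.0 (X11; Linux x86_64) Firefox/95.0"),
    ("/blog/post", "Mozilla/5.0 (compatible; Googlebot/2.1)"),
    ("/blog/post", "curl/7.79.1"),
    ("/blog/post", ""),
]
raw, human = count_requests(sample)
print(raw["/blog/post"], human["/blog/post"])  # prints: 4 1
```

Either number works as a baseline for trends as long as it is measured the same way every time.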


>Really, they are not even tracking visits, they are tracking times the link is requested. So every time the HN page gets updated, that link is getting clicked by the dozen or so bots that trawl that page.

Very true. I do think metrics are always a balance between precision and ease of collection, and counting IP addresses is definitely towards the ease of collection side. Regarding the bots crawling HN submissions, that would be backed up by an even higher rate of Cloudflare:Squarespace visitors for this submission (about 10:1 vs about 8:1).

>You should really be looking at trends, and so long as you pick a number that is consistently measured as your baseline, you can leave well enough alone.

Thanks for the perspective here - a kind reader sent me a similar message via email as well! The nice thing is that I don't have particular goals for my personal blog, so these analytics numbers feel more like interesting trivia than some OKR metric that I need to increase somehow.


> a page view shouldn't be tracked until the page is finished loading for the user

Isn't this a little more client-driven than Cloudflare would have visibility into, unless we narrow "page load" down to something like "probably read all of the input stream"?


I think that's the thing - Cloudflare's metrics are flawed in their premise.

If you owned a store at a mall, there are many decent thresholds for constituting foot traffic. But you should probably at least count people who step into the store and not just people who stop and look at it.


The advice I give more and more to my clients is to stop focusing on absolute numbers. Instead, pick a decent analytics package and use it to monitor changes over time. Focus on the relative, not the absolute. (You still need to be careful to make sure your changes over time aren’t because of an increase/decrease in bots.)

Similarly, focus on a few core metrics and don’t go down the rabbit hole that something like Google Analytics tends to pull you into. Usually those metrics are a few basics (uniques) plus a small number of domain- or operation-specific conversion metrics.
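
As a minimal sketch of "relative, not absolute": take the mean of the first few periods as a baseline and express each later period as a change against it. The `relative_change` helper, the four-week baseline window, and the weekly numbers are all invented for illustration.

```python
def relative_change(series, baseline_periods=4):
    """Return each later period's change relative to the mean of the first `baseline_periods`."""
    if len(series) <= baseline_periods:
        raise ValueError("need more periods than the baseline window")
    baseline = sum(series[:baseline_periods]) / baseline_periods
    if baseline == 0:
        raise ValueError("baseline is zero, so relative change is undefined")
    return [(value - baseline) / baseline for value in series[baseline_periods:]]


# Made-up weekly unique-visitor counts from one consistently measured source.
weekly_uniques = [1000, 1100, 900, 1000, 1200, 1500]
changes = relative_change(weekly_uniques)
print([f"{c:+.0%}" for c in changes])  # prints: ['+20%', '+50%']
```

The absolute counts could be off by a constant factor (bots, blockers) and the percentages would still be comparable, provided that factor stays roughly stable.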


Google Analytics was a pain to use.

At Appsmith, we use Plausible, which is laughably simple (and open source too).


Ask HN: How do you analyze your web server logs?

ELK is too heavy and GoAccess is too simple and I haven't found anything useful in between.


For a very long time, I've been using shynet [0] on my personal blog. I've not played around with the alternatives but there are quite a few:

- Plausible: https://plausible.io/

- umami: https://umami.is/

- Fathom: https://usefathom.com/

[0]: https://github.com/milesmcc/shynet


Early Plausible customer here. Happy so far; the most important feature for me is being able to proxy myhost.com/randomstats.js to their server to circumvent blocking.

Ultimately, these new-gen solutions all look and work the same, so it's mostly about which UI you prefer. You can't do magic without cookies and the like.


These tools can't be used to analyze "web server" (e.g. nginx) logs.


Author of Shynet here! Hope you like it :)


Lovely! Thanks for building it.


What's the question you want to answer? And do you want to monitor specific metrics, or run ad hoc queries?

FWIW I'm trialing Grafana's Loki for this.


4xx/5xx status codes, response-time percentiles, referrers, etc.
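
For a sense of scale, those three can be pulled out with a short script. This is only a sketch and assumes a log format of nginx's "combined" with `$request_time` appended as the last field, which is not the default. The regex, the `summarize` helper, and the sample lines are illustrative.

```python
import re
from collections import Counter
from statistics import quantiles

# Assumed format: nginx "combined" with $request_time appended as the last field.
LINE_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "[^"]*" (?P<request_time>[\d.]+)$'
)


def summarize(lines):
    """Return status-class counts, referrer counts, and p50/p95 request times."""
    statuses, referrers, times = Counter(), Counter(), []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed format
        statuses[match["status"][0] + "xx"] += 1
        if match["referrer"] not in ("", "-"):
            referrers[match["referrer"]] += 1
        times.append(float(match["request_time"]))
    # quantiles(n=100) returns 99 cut points; index 49 is p50, index 94 is p95.
    cuts = quantiles(times, n=100) if len(times) >= 2 else []
    percentiles = {"p50": cuts[49], "p95": cuts[94]} if cuts else {}
    return statuses, referrers, percentiles


sample_lines = [
    '203.0.113.5 - - [24/Dec/2021:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "https://news.ycombinator.com/" "Mozilla/5.0" 0.120',
    '203.0.113.6 - - [24/Dec/2021:10:00:01 +0000] "GET /missing HTTP/1.1" 404 153 "-" "Mozilla/5.0" 0.030',
    '203.0.113.7 - - [24/Dec/2021:10:00:02 +0000] "GET /api HTTP/1.1" 502 0 "https://news.ycombinator.com/" "Mozilla/5.0" 1.500',
]
statuses, referrers, percentiles = summarize(sample_lines)
print(dict(statuses), referrers.most_common(1), percentiles)
```

This does not scale the way Loki or ELK do, but it covers ad hoc questions on a single server's logs.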


I've been experimenting with Grafana + Loki and it has been a pleasant experience.


I'd love to see this in its own thread!


> It's likely lower on mobile, where Firefox is the only major browser that has any sort of adblocker integration (that I know of).

Note to author, iOS Safari has adblockers available via the App Store. I believe the same is true for whatever browser(s) Android uses.


Oh wow, thank you for pointing this out! Added a correction here: https://bobbiechen.com/blog/2021/12/21/counting-website-visi...


I use goatcounter, self-hosting the script and proxying the metrics endpoint through my website's server. This way there are no 3rd party assets to block. Also, there's a noscript image tag in case JS is disabled that just lets me know how many times the page was accessed. The script filters out most bots, but the noscript tag probably causes a bunch of bots to be counted.

https://kar.goatcounter.com



