Hacker News
Show HN: I've built my own simple website analytics (jmmv.dev)
63 points by jmmv on Feb 3, 2022 | 24 comments



FYI: if you start on an endeavour like this in order to get around uBlock Origin blocking your existing analytics platform, you will probably end up disappointed; if you do it just for your one site, you may well get away with it, but if you share it with others or if your site is popular, it will be added to popular blocking lists before long. In fact, if writing such a thing myself I would be strongly inclined to deliberately style my URLs so that they matched an existing rule in EasyList or similar. I have nothing but scorn for the likes of https://github.com/plausible/analytics/discussions/387 where site owners are saying “oh no! our tracking script has been blocked! how can we work around the visibly-expressed intent of the end users and give them something that does not serve them and which they have demonstrated they don’t want?” Same deal with tricks like CNAME cloaking. (Related: https://plausible.io/docs/proxy/introduction.)

Yes, your tracker may behave better than Google Analytics, but something that’s still better behaved is “nothing”. :-)

(That you may assess where I sit on this spectrum, and how seriously or otherwise to take me: I have more than the default set of lists enabled in uBlock Origin, and I block JavaScript by default with uMatrix in normal windows. If I want to use something that depends on JavaScript, I mostly just open a Private Browsing window briefly, where I have uMatrix disabled. But I do all this far more out of concern for performance than privacy.)


Analytics helps authors understand their visitor demographics and whether what they are creating is right or wrong (not in the moral sense). I don't publish free content for myself. If you want to anonymously consume others' work for free, use Tor.

Site visitors are saying "how can we work around the visibly-expressed intent of the site authors and avoid giving them the data which they have visibly demonstrated they need".

I totally understand blocking trackers and ads, but the holier-than-thou attitude you demonstrate above only helps to solidify the hold the incumbents already have. By all means block all the trackers from the demonstrably evil corps whose motives are to exploit you... but asking small blog authors to act guilty so you can block their analytics while consuming their content for free is entitled and exploitative on your part.

That said, I'm aware this is a slippery slope and I am glad people like you exist and push for privacy, even if I think it's a bit extreme sometimes.


> oh no! our tracking script has been blocked! how can we work around the visibly-expressed intent of the end users and give them something that does not serve them and which they have demonstrated they don’t want?

This idea that website owners are not allowed to see what people are doing on their website, as if it's some egregious invasion of privacy, is so tiresome. It's akin to having security cameras, or matching inventory and transaction details to credit card/member information, which happens every time you purchase anything at a store in real life.

Yet this piece of JavaScript that turns your web session into an anonymized set of numbers is some kind of breaking point? When will the internet nerd community grow out of this phase?


Cameras don't have to be hooked up to my person in order to work. Transaction and card details are only exchanged when an actual transaction is processed, which I must do knowingly and thus consent to (pick up the item, bring it to the counter, pay).

In contrast, the "virtual camera" equivalent involves using my computer's resources, and every move I make on the website is considered a transaction even if no purchase is made. All this data is collected and used most often without the user's knowledge!

The real life equivalent is more like a strip search before entering the store, and a pat down every time you enter a new store department along with a paragraph from you about your intents in this area of the store, and your signature on the bottom.

I'd say that's pretty tiresome for the user.


> The real life equivalent is more like a strip search before entering the store, and a pat down every time you enter a new store department along with a paragraph from you about your intents in this area of the store, and your signature on the bottom.

How is it like this at all? You claim the data is collected without user knowledge, so how could it be compared to such invasive procedures? The camera metaphor is far more apt, other than the resource usage (which is negligible in most cases, let's be real).


Author here. Yeah, I'm aware that this can/will happen, but as I hinted in the opening of the post: "wanted the warning from uBlock to go away if at all possible" (note the "if possible" part). In other words, if it does happen in the end, well, it's understandable, but at least I know what I'm doing in detail.


> if you share it with others or if your site is popular, it will be added to popular blocking lists before long

Are the block lists sophisticated enough to block it if the JavaScript part of the analytics is, say, shared as ‘host-it-on-your-server’ with an arbitrary file name?

I suppose the block lists would just be amended to block the outgoing connections that that JavaScript makes?

But if the JavaScript were obfuscated and routed the collected data through a first-party server that simply forwarded it on to ‘the analytics server’, is there anything uBlock etc. could do to counter that?


Most lists that include analytics services will block it in a way that will capture the most possible instances.

My privately hosted Plausible instance is blocked because the filename is still "plausible.js", even though it's on a first-party domain.


A visibly expressed intent means nothing at all. I have an intent to get paid without doing work, but it's not working so far.

The intent is a signal, and unless it's one that a site owner is legally required to comply with, it's nothing more than a wish. And if we were to grant the wishes of the typical entitled internet user, nothing would be sustainable.

Analytics are a primary tool for site owners, not some far-fetched optional thing. They are needed to run any non-trivial website. Analytics don't even require consent under the strict GDPR legislation, so even the legislators acknowledge that analytics are essential.


> As annoying as writing these might be, there is no way I would go without writing tests even for a side toy project. For one, the tests have already saved me from obvious bugs in the SQL queries. For another, extensive test coverage is critical for quick iteration. I’ve been able to prototype and refactor this service in place over and over again to add new features because I have enough test coverage.

Is it critical for a toy project (or any project for that matter)?

> The best thing about this setup is that it is simple. There is no Docker involved, no complex local database configuration, nothing. cargo test or git push to a temporary branch and I am good.

I guess it depends on your viewpoint. Maybe I'm an outdated dinosaur here, but we wrote production code for decades without tests, let alone side projects. Not having tests could sometimes cause failures, true. But it also forced you to examine what you pushed to production.

I do think the project is very cool though.


Tests can form a normal part of the development process in some languages. You don’t need to go full TDD, but when your language’s tooling and IDE are designed for testing, it can speed up your feedback loop.

For example, when writing Go in VS Code, tests often sit in the same category of tooling as Storybook and JS hot reloading do.

Being able to update a function and then run it with a set of inputs already defined automatically is really great.
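For readers who haven't seen the pattern: a rough Python equivalent of such table-driven tests, using the standard library's unittest with subTest. The helper normalize_path and its cases are hypothetical, made up purely for illustration; a runner such as `python -m unittest <file>` picks the test class up.

```python
import unittest
from urllib.parse import urlsplit


def normalize_path(url: str) -> str:
    """Hypothetical helper under test: reduce a URL to its path, dropping
    the query string and fragment, and using "/" for an empty path."""
    return urlsplit(url).path or "/"


class NormalizePathTest(unittest.TestCase):
    # The table of inputs is defined once; after editing normalize_path,
    # re-running the test checks every case again automatically.
    CASES = [
        ("https://example.com", "/"),
        ("https://example.com/blog/", "/blog/"),
        ("https://example.com/blog/?ref=hn#top", "/blog/"),
        ("/about", "/about"),
    ]

    def test_cases(self) -> None:
        for url, want in self.CASES:
            # subTest reports each failing row separately instead of
            # stopping at the first mismatch.
            with self.subTest(url=url):
                self.assertEqual(normalize_path(url), want)
```

Adding a case is one more tuple in CASES, which is what keeps the feedback loop short.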

It also lets you workshop the API design (of a library) while you're developing it.

In some other languages I’ve found the tooling so painful to set up that it’s rarely worth the effort.


I run a (pretty successful) self-hosted analytics platform[0] and I am no longer writing tests for it. I did have tests at some point, but they were slowing down development too much: whenever something was changed, the tests had to be updated, and whenever a new feature was added, new tests had to be written.

I do plan to create some automated tests, but mostly for the set-up part, so the installation always goes as expected on different environments.

[0]: https://www.uxwizz.com/


> As mentioned earlier, my needs for analytics are very limited. I think I’d be fine generating statistics based on raw web server logs (like GoAccess does), but I do not have access to those and I don’t fancy running my own web server. (Spoiler alert: it’d have been much easier to do that.)

This is actually what I did here: https://github.com/fitzn/sieve

But your project is much more legit and prettier looking :)
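The log-based route from the quote can start out very small. A minimal Python sketch, assuming access logs in the Apache/nginx "combined" (or "common") format; the function name page_views is hypothetical, this is not how sieve or GoAccess are actually implemented, and a real tool would also need to filter out bots and static assets.

```python
import re
from collections import Counter

# Matches the leading fields of a "common"/"combined" access-log line:
# client IP, identity, user, [timestamp], "METHOD path PROTOCOL", status.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)


def page_views(lines):
    """Count successful (2xx) GET requests per path, ignoring query strings."""
    views = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # skip malformed lines instead of failing
        if m["method"] == "GET" and m["status"].startswith("2"):
            views[m["path"].split("?", 1)[0]] += 1
    return views


sample = [
    '203.0.113.7 - - [03/Feb/2022:10:00:00 +0000] "GET /blog/?ref=hn HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '203.0.113.8 - - [03/Feb/2022:10:01:00 +0000] "GET /blog/ HTTP/1.1" 200 512 "-" "curl/7.81.0"',
    '203.0.113.9 - - [03/Feb/2022:10:02:00 +0000] "GET /missing HTTP/1.1" 404 0 "-" "-"',
]
print(page_views(sample))  # Counter({'/blog/': 2})
```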


Nice writeup. I also wrote my own stats module for my website package, and I must say: it seems pretty accurate (never tested it next to GA by the way). I basically track the same things as you, but I don't fingerprint. I just explain to my users why and what, and they understand completely. They don't even care about unique visitors that much, just the broad scope and where the users come from and when.

The only cookies my system leaves behind are set when a visitor fills in a contact form, so it is easier to block bad actors (emails get sent, and I care about the email server's reputation). It's flabbergasting to see how much form spam takes place.

P.S. I 'follow' your blog now, by RSS.


I did the same. It's a dead simple thing, in fact: https://tildegit.org/severak/millions/

It helps me answer questions like "did anyone click on the link to my pet project I posted on HN yesterday?" (yes, 15 people :-D). For this purpose it does well.

But I want to rewrite it, as in its current incarnation there is a slight possibility of tracking users by manually querying the visit records in the database (if I suddenly turn evil). I want to split these numbers into separate data structures, but this project has low priority.
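One possible reading of splitting the numbers into separate data structures, sketched in Python: keep only independent per-day counters instead of raw visit rows, so no stored record ties a page, a referrer and a single visit together. The class, field names and example path are hypothetical, and on a very low-traffic site even aggregate counts can still hint at individual visits.

```python
from collections import Counter


class AggregateStats:
    """Keep only per-day totals; individual visits are never stored.

    Page views and referrers live in separate counters, so nothing
    records which referrer led to which page for any single visit.
    """

    def __init__(self) -> None:
        self.views = Counter()      # (day, path) -> number of page views
        self.referrers = Counter()  # (day, referrer host) -> number of visits

    def record(self, day: str, path: str, referrer_host: str = "") -> None:
        self.views[(day, path)] += 1
        if referrer_host:
            self.referrers[(day, referrer_host)] += 1


stats = AggregateStats()
stats.record("2022-02-03", "/projects/millions", "news.ycombinator.com")
stats.record("2022-02-03", "/projects/millions")
print(stats.views[("2022-02-03", "/projects/millions")])        # 2
print(stats.referrers[("2022-02-03", "news.ycombinator.com")])  # 1
```

The "did anyone come from HN?" question is still answerable from the referrer counter alone.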


We built a project and decided to go with anything but Google Analytics. With the average indie hacker in mind, we focused on:

- Size: it needs to be unusually small
- No analytics-authorization prompt for users
- Some API support
- Tracking click counts

We went with GoatCounter. Even though we don't have any traffic, it is very nice to see how much good work is being done in web analytics. It is really surprising, tbh, because I wouldn't have imagined that something like Google Analytics would face competition, as it more or less seems like the standard of the internet.


Nice write-up. I am a bit concerned with the part about fingerprinting. It sounds so easy to make that it made me realize that everyone must be doing it.

I am not sure if the use case here is to track people who delete their cookies or if the author found it easier than using a simple UUID stored in a cookie.


The fingerprinting part is to avoid double counting and using cookies - most analytics and feature flag services have something equivalent to:

hash(salt + ip_address + user_agent + some_other_unique_characteristics)

For analytics, you could rotate the salt daily if you're only concerned about daily unique visitors.
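A minimal Python rendering of that formula, assuming SHA-256 for `hash`, the Accept-Language header as an example of "some other characteristic", and a separator between fields. visitor_id is a hypothetical name, not the API of any particular service. The salts in the example are placeholders; a real one should be a random secret, because IPv4 addresses are few enough that a hash with a guessable salt can be brute-forced.

```python
import hashlib


def visitor_id(salt: str, ip_address: str, user_agent: str, *extra: str) -> str:
    """Pseudonymous visitor ID: hash(salt + ip_address + user_agent + extra).

    Parts are joined with a NUL separator so that different splits of the
    same characters (e.g. "ab" + "c" vs "a" + "bc") hash differently.
    """
    material = "\0".join((salt, ip_address, user_agent, *extra))
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


ua = "Mozilla/5.0 (X11; Linux x86_64)"
same_day = visitor_id("salt-2022-02-03", "203.0.113.7", ua, "en-US")
print(same_day == visitor_id("salt-2022-02-03", "203.0.113.7", ua, "en-US"))  # True
print(same_day == visitor_id("salt-2022-02-04", "203.0.113.7", ua, "en-US"))  # False
```

Once the salt changes, the same visitor gets an unrelated ID, which is what caps tracking at one salt period.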


Precisely. To be honest, the fingerprinting part was a very late addition and... I don't like it. The original version used a uuid plus a cookie, but "aaah, cookies!" so I looked for an alternative. Which sounds worse than the simple cookie approach.

One thing I'd like to do is try something like what you said server-side and compare how closely a hash of those properties matches the "accuracy" of fingerprinting. I'm only looking for something of reasonable precision during short periods of time (to avoid obvious double-counting) so if that tracked reasonably enough, it'd be a very interesting alternative.


What about a time-limited fingerprint, built with a seed that rotates daily?

That way, you can only track any given visitor for max 24 hours, which often would mean tracking them across a single "visit". I'm pretty privacy conscious, but personally would be absolutely OK with this.
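A sketch of the rotating seed in Python, assuming it lives only in process memory and rotates at UTC midnight; the DailySalt class is hypothetical. Its output would be used as the salt when hashing the visitor's properties.

```python
import datetime
import secrets
from typing import Optional


class DailySalt:
    """A random salt that is replaced whenever the UTC date changes.

    The old salt is overwritten and never persisted, so fingerprints from
    a previous day can be neither recomputed nor linked to today's.
    """

    def __init__(self) -> None:
        self._day: Optional[datetime.date] = None
        self._salt = b""

    def get(self, today: Optional[datetime.date] = None) -> bytes:
        if today is None:
            today = datetime.datetime.now(datetime.timezone.utc).date()
        if today != self._day:
            self._day = today
            self._salt = secrets.token_bytes(32)  # fresh, unguessable seed
        return self._salt


salts = DailySalt()
first = salts.get(datetime.date(2022, 2, 7))
print(first == salts.get(datetime.date(2022, 2, 7)))  # True: stable within a day
print(first == salts.get(datetime.date(2022, 2, 8)))  # False: rotated at midnight
```

Two trade-offs of this choice: a process restart also rotates the seed, and a visit that spans midnight is counted as two visitors.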


One part he mentions is that avoiding tracking cookies means avoiding the "I consent to tracking cookies" popup.

> However, GA has grown exceedingly complicated, installs cookies (thus requiring the utterly annoying cookie warning in the EU)


Fingerprinting for this purpose requires user consent in the same way a tracking cookie would. Admittedly, fingerprinting is harder to prove from the user side.


At least in this case, there’s a blog post pointing out how it violates GDPR ;)


Nice project! Regarding view counts being limited to a few days: you could run a daily job that computes the running total of all previous days, and then compute the total as of today as that historical sum plus today's count. Just a thought.
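A minimal sketch of that idea using Python's built-in sqlite3 module, assuming one row per visit with ISO YYYY-MM-DD day strings; the schema and function names are hypothetical, not taken from the project. A cutoff date is stored next to the cache so the live part of the query stays correct even if the daily job runs late.

```python
import sqlite3

SCHEMA = """
CREATE TABLE visits (day TEXT NOT NULL, path TEXT NOT NULL);
CREATE INDEX visits_by_path_day ON visits (path, day);
-- Cached per-path sum of all visits on days strictly before cutoff.day.
CREATE TABLE history (path TEXT PRIMARY KEY, views INTEGER NOT NULL);
CREATE TABLE cutoff (day TEXT NOT NULL);
INSERT INTO cutoff (day) VALUES ('');  -- '' sorts before every date
"""


def daily_rollup(db: sqlite3.Connection, today: str) -> None:
    """Daily job: cache the per-path totals of every day before `today`."""
    with db:  # single transaction: readers never see a half-built cache
        db.execute("DELETE FROM history")
        db.execute(
            "INSERT INTO history (path, views) "
            "SELECT path, COUNT(*) FROM visits WHERE day < ? GROUP BY path",
            (today,),
        )
        db.execute("UPDATE cutoff SET day = ?", (today,))


def total_views(db: sqlite3.Connection, path: str) -> int:
    """Cached historical sum plus a live count of visits since the cutoff."""
    (total,) = db.execute(
        """
        SELECT COALESCE((SELECT views FROM history WHERE path = ?), 0)
             + (SELECT COUNT(*) FROM visits
                WHERE path = ? AND day >= (SELECT day FROM cutoff))
        """,
        (path, path),
    ).fetchone()
    return total


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
db.executemany(
    "INSERT INTO visits (day, path) VALUES (?, ?)",
    [("2022-02-01", "/a"), ("2022-02-02", "/a"), ("2022-02-03", "/a")],
)
daily_rollup(db, "2022-02-03")
print(total_views(db, "/a"))  # 3: two cached days plus one visit today
```

Per-request queries then only scan rows since the cutoff; the nightly job still scans everything, which could itself be made incremental later.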



