Umami: Self-hosted open-source alternative to Google Analytics (umami.is)
820 points by bananaoomarang on Aug 18, 2020 | 227 comments



Hi everyone!

Author of Umami here. I totally did not expect this response, so it looks like you all hugged my little server to death. The demo should be back up now.

A little background: this is a side project I started 30 days ago because I was tired of how slow and complicated Google Analytics was. I just wanted something really simple and fast that I could browse quickly without diving through layers of menus. So I created Umami to track my own websites and then open sourced it. The stack is React, Redux, and Next.js with a PostgreSQL backend.

Would be happy to answer any questions you have.


This is a really cool project. I’m happy to see that you are using Prisma for data access. If you are interested we can set up a shared slack channel so you can provide feedback and we can make sure we support everything you need for this project :-)


This is my first time ever using Prisma and I'm a huge fan of it already. I did run into a few gotchas and would love to discuss them.


Contact info is in my profile. Please send an email so we can set something up :-)


Since it's self-hosted, is there a reason you went with Postgres rather than something simpler like SQLite or even flat files?


It uses prisma.io for the database connections, and SQLite is supported. I just haven't had the time to implement it yet and make sure all the custom queries work. I would welcome a PR.


Is there any reason you didn't use the currently available open-source solutions and decided to create your own? (Other than it being fun to do it yourself :D)

I am wondering why, in the past two years, we went from having little to zero GA alternatives to all of a sudden having dozens of them.

I am genuinely curious.


This may just be me, but I'm very particular about my software. I want it to look and flow a certain way. So I wrote Umami mainly for my needs first. Plus it was just a fun project.

I always start side projects so I can learn something new. In this case it was Prisma.io, Chart.js, Next.js authentication, JWT, and PostgreSQL, none of which I had used before this project.


Pretty impressive that you learned all this and implemented it in just 30 days.


Looks really neat! It would be really interesting if the live demo were the actual live stats of your umami.is :)


I will switch it over at some point. I've been running it on my own sites for a month so I just wanted to provide an example with more data to play with.


Just a little remark: A "Page not found" label flashes by when loading eg. https://app.umami.is/share/ISgW2qz8/flightphp.com


How does a post on HN that has 591 points and is on the front page only have 1,184 views and 567 visitors in the last 24 hours, according to the live demo? Something is not right. It should be seeing lots more page views and users, right?

EDIT: just noticed the demo is for another site flightphp.com not the landing page umami.is which is sort of weird. That explains it. The demo should really be demoing the metrics for umami.is. Which is a shame, because that would prove how scalable umami.is is. Unfortunately umami.is is not eating its own dog food.


I actually am using it to record metrics for umami.is:

https://app.umami.is/share/8rmHaheU/umami.is

I'm using it for all my websites. The reason I went with another site for the demo is because I wanted something with at least 30 days of data so users can play around with the different settings. Once I get enough data, I'll switch it over.


Just FYI, some mobile optimization is needed.

https://imgur.com/a/j9MYG9z


I recommend creating an issue on GitHub: https://github.com/mikecao/umami


Nice! Question for you, how did you make that nice 3D image on the front page with various screenshots overlaid over each other? :)


It's all done in Photoshop. Just take screenshots, then transform, rotate, and distort them to look flat. Then add some drop shadows for a 3D look.


This would definitely be really interesting if it had the ability to create analytics in response to UI interactions. There are a lot of SPAs out there that use analytics for feature tracking, and the self-hosted aspect would fit well with those users.


Event tracking is already supported in the build. I just haven't completed the UI components yet. You simply add a custom CSS class on an element and it will automatically be tracked.
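The class-based approach can be sketched roughly like this. The `umami--<event_type>--<event_name>` naming convention and the function names here are illustrative assumptions, not taken from Umami's docs:

```javascript
// Sketch of CSS-class-based event tracking. A tracker script scans an
// element's class list for a tracking class and extracts the event type
// and name from it. The "umami--type--name" format is an assumption.
function parseEventClass(className) {
  for (const cls of className.split(/\s+/)) {
    const parts = cls.split('--');
    if (parts.length === 3 && parts[0] === 'umami') {
      return { type: parts[1], name: parts[2] };
    }
  }
  return null; // no tracking class present
}

// On a real page the tracker would attach listeners to matching elements,
// e.g. <button class="umami--click--signup">Sign up</button>
```

The appeal of this design is that site authors add events with markup alone, without writing any JavaScript.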


Oh OK, I didn't see that when I scanned the documentation. CSS is an interesting solution for that


I have a feeling this requires the database to be available to collect data. That's a bad pattern. No database can be up all the time and an application like this should not lose data.

Have a look at patterns that resolve this like Snowplow Analytics.


Matomo (formerly Piwik) is an alternative, imho.


Unsure about Umami's performance, but Piwik was a non-starter for several higher traffic sites I worked with due to performance issues (even after throwing big hardware at it).


Of the similar alternatives, I think Plausible is currently leading in scaling and performance.

I think this is also the reason why they are the most affordable in pricing.

Note: not affiliated with them; just an observation from researching GA alternatives.


One big difference is that Matomo doesn’t support PostgreSQL and has said they don’t plan to.


Great project. I'm going to follow its development and consider using it in the near future.


Oh, thank you for flightphp, I recognized you from the url used in the demo :D


what are your thoughts on using ui-frameworks like material/ant etc. I checked the github and it looks like you have written all components including css by yourself.


For personal projects I tend to write all the CSS and components myself. I just like being able to control everything down to the pixel without reading some documentation. But that's just my workflow. I say just use whatever gets the job done. The only thing I used was Bootstrap grid for responsive layouts. Tailwind CSS is pretty popular.


Can confirm, Tailwind is legit.


That's very impressive, Mike. Very fast, looks beautiful


This is an awesome project. Is Postgres good enough for high traffic or would it be better to switch to Redis?


Though it's very rare, why would you use a database that is at risk of being erased completely, and that has a limited set of queries, as the main DB? Honest question.


My concern was the performance of Postgres when it's receiving thousands of writes per second. I assumed that such a task would be more suited for Redis, then the data could be filtered and sent to Postgres for longer storage (or some storage solution such as S3).
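The buffering pattern described above can be sketched roughly like this; `EventBuffer` and all the names are illustrative, and a plain in-memory array stands in for Redis:

```javascript
// Sketch: accept events into a fast in-memory queue (standing in for
// Redis) and flush them to the durable store (Postgres) in batches,
// turning thousands of small writes into a few bulk inserts.
class EventBuffer {
  constructor(flushFn, batchSize = 100) {
    this.queue = [];
    this.flushFn = flushFn;     // e.g. a batched INSERT into Postgres
    this.batchSize = batchSize;
  }

  record(event) {
    this.queue.push(event);     // cheap write, no DB round-trip
    if (this.queue.length >= this.batchSize) this.flush();
  }

  flush() {
    if (this.queue.length === 0) return;
    const batch = this.queue.splice(0, this.queue.length);
    this.flushFn(batch);        // one bulk write instead of many
  }
}
```

In production you would also flush on a timer so low-traffic periods don't hold events indefinitely, and the Redis variant gives you durability across process restarts that this in-memory stand-in does not.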


I think the bigger issue would be time-related queries as the dataset grows, as opposed to write speeds.

I think a time-series database is best suited for this kind of project (TimescaleDB, built on top of Postgres; InfluxDB; ...).


One of the claims of Umami is that it's GDPR compliant:

> Umami does not collect any personally identifiable information so it is GDPR and CCPA compliant. No cookie notices are needed because Umami does not use cookies.

From auditing the source code, this doesn't seem to be the case. First, it claims it doesn't use cookies, but it clearly uses localStorage to store a "sessionKey"[0].

The other claim, that Umami is GDPR and CCPA compliant because it does not collect any personally identifiable information, is only half true. While the data collected isn't PII (because you can't use it on its own to identify a user), it's still "personal data". This is because the "sessionKey" stored alongside all events is actually a pseudonymous user identifier. It's really just a hash of the user's IP along with a few other properties[1]. Because the data Umami collects, when combined with some other data, can be attributed back to the user, the data is still considered "personal data". That means you're still subject to most of GDPR, such as GDPR deletion requests[2].

[0] https://github.com/mikecao/umami/blob/f4ca353b5c68750bf391e5...

[1] https://github.com/mikecao/umami/blob/master/lib/session.js#...

[2] https://gdpr-info.eu/art-17-gdpr/


I am not a lawyer so I cannot say for sure what constitutes PII and what breaches GDPR. I am using the same techniques as Fathom Analytics, Plausible.io and other products. Everything is hashed into a unique session id and none of the actual data like user agent or IP address is actually stored. It is the same data that is found in server log files. In the strictest interpretation of GDPR, I don't think any analytics product can exist.

As for the localStorage, it's just for performance so I don't have to recompute the session hash. The product will work the same without it. But seeing as it is a cause of contention, I am probably going to remove it.


Both Fathom and plausible generate a unique salt every day. By getting rid of the old salts, they've anonymized any data older than a day. From [0]:

> We do not attempt to generate a device-persistent identifier because they are considered personal data under GDPR.

> Instead, we generate a daily changing identifier using the visitor’s IP address and User Agent. To anonymize these datapoints, we run them through a hash function with a rotating salt.

[0] https://plausible.io/data-policy


I will probably implement the daily salt and remove the localStorage code as well just to be safe.

But again, I'm not a lawyer here; where do you draw the line? Why not hourly salts? Five-minute salts? What is considered a reasonable effort? At some point you're storing data that can identify a user for the purpose of analytics. Still, I'm going to try to lean to the safer side as best I can.


There are two paths to compliance with GDPR.

Option 1: Accept that you're collecting Personal Data, and satisfy the obligations GDPR places on that. This means disclosing the use of analytics in your privacy policy (what data's being collected and why), listing retention periods, and figuring out how to satisfy requests like Access or Deletion (which may include "we can't identify you in the data we previously collected").

Option 2 is to "comply" with GDPR by finding a loophole so that it technically doesn't apply.

The Option 2 approach is more common when dealing with American data privacy laws. It doesn't work out so well with GDPR. It's very difficult to not be processing personal data at some point. Even if you fully anonymize your data before doing any non-trivial processing, the anonymization itself is still covered by GDPR, which means you need to include it in your privacy policy and provide an opt-out.

It's also high-risk. If a court decides that you didn't quite thread the needle through the loophole in their country and GDPR therefore applies in full, then you haven't done any of the compliance groundwork.

For GDPR compliance, I would be much more inclined to trust a tool that describes how to opt users out of tracking than one that claims they're immune from obligations to opt-out.

As another commenter mentions, the ePrivacy Directive is a whole different kettle of fish. Strong consent needed to read or write any data not strictly necessary to provide the services requested by the user. That law should get updated with more sanity soon... it's been that way for a few years now.


GDPR gives you 30 days to comply with deletion requests; that’s a good starting point to ensure you don’t keep PII past the regulated cutoff.


Doesn’t using the website id in the hash mean the key is no longer PII, since it can’t follow you between websites? Or is being identifiable within a single site enough to meet the threshold?


> I am not a lawyer so I cannot say for sure what constitutes PII and what breaches GDPR

If you don't feel fit to judge whether something breaches GDPR, then maybe you shouldn't say "so it is GDPR and CCPA compliant".


Fair point. I was simply following the "common practice" from other products making these claims, which is to not store personal user data and only generate anonymous ids.

Maybe that's not fully compliant, I don't know, so I went ahead and removed any mention of GDPR from the website. It's not really my goal anyway. I'm just trying to release free software, while they are charging money and making these claims.


Thank you for removing GDPR mentions, but mostly for building this in the first place!

It looks really nice.


The IDs that you generate aren't anonymous like Plausible.io. You simply need to address that issue and you should be mostly there for GDPR compliance.


Fair point also! Great job on the product, and congrats on shipping. I immediately spotted Inter :)


An IP address is considered personally identifiable information in at least Germany. If you're storing that, you already have to think about the GDPR.

This is just another misguided attempt to adhere to the letter of the law while going against its spirit. It is misguided because it's based on a wrong understanding of what the letter of the law actually is. You see this a lot with adtech and analytics companies that try to skirt regulations through elaborate mechanisms, ultimately in vain.


>This is just another misguided attempt to adhere to the letter of the law while going against its spirit.

It's easy to say this and hard to draw a line between PII and what I can store without consent. "yesterday I sold 5 products on my website" is not PII (I hope). If I store the timestamps for each purchase I'm already in the grey area. One could combine the timestamps with other data to identify my customers.


So, effectively, you're saying you aren't allowed to have a server that logs requests?


It's considered PII in the United States as well. PII is a very easy standard to meet.


I've listened to a podcast interview with a lawyer specializing in EU privacy laws, and he said that it does not matter if the personal data is hashed or encrypted; it's still personal data. This was about data stored in a database, though, but browser local storage is a database.

This was mentioned when the guest spoke about right to be forgotten. The law is really weird, because you need to delete user's data from your database, but it's OK to keep backups.

> It is the same data that is found in server log files. In the strictest interpretation of GDPR, I don't think any analytics product can exist.

It can exist as long as the user agrees to be tracked. There is a category of "metrics" cookies the user needs to agree to before you can track them for metrics. That's the whole point of the law. You need the user's permission.


> it does not matter if the personal data is hashed or encrypted

That sounds odd. If there is no way to go back from the hash to the data it is no different from a random string of letters and numbers.


It’s different because it allows reidentification. It prevents you from coming up with an IP or what have you out of thin air, but you or another party you give it to can effectively use it as a perfect proxy of whatever you hashed.


Let’s take a hashed IP address. There are 4.3B IPv4 addresses, so it takes a few minutes on an old laptop to generate a rainbow table; with decent hardware it would be seconds. The rainbow table could then be used to identify all the IPs you store. If they are salted, each IP would need to be brute-forced individually, but that still takes only seconds on good hardware.


That would still require correlating data from another dataset outside this product. Compliance would be up to whoever hosted this, and whoever holds that other dataset, to comply with the request anyway.


Do you remember when an old dataset from AOL was released where the user ids had been pseudonymized by some hashing?

The users could be re-identified just by their behavior.


Without correlating data it really isn't "personal", though. You could delete the user account and related data without touching this product, and you've complied, because this data could then never be correlated. Also, if nothing in the activities leaks the user's own identity, then again it wouldn't really be personal.

IANAL


If you don't want to get dragged into a lawsuit when a user gets sued on a GDPR claim, you probably shouldn't make any statements about your product's GDPR compliance. Stick to the facts about how your product works, and leave the legal speculation to the lawyers.


"In the strictest interpretation of GDPR, I don't think any analytics product can exist." That's the point. Unless you aggregate the data.

Besides, it's not only GDPR you should consider, but also the latest cookie verdict by the CJEU. You need a consent if you drop cookies, session storage or any other tracking technology, no matter if you process personal data or not.


Maybe this will help you. It is roughly two hours long, but as far as I am concerned it is the best explanation of GDPR I have ever seen, done mostly in non-legal speech. It's actually fun to watch (the part about borrowing a car is hilarious):

https://www.youtube.com/watch?v=-stjktAu-7k


It doesn't matter whether the UA or IP is stored; even using them to fingerprint a user requires GDPR consent.


Consent is only one potential basis for processing under GDPR. There are others such as "legitimate interest" which the controller and/or processor may rely on.


Since this is about cookies and IP addresses, GDPR is not the most relevant EU law. Instead, we have to look at the old ePrivacy Directive.

For cookies or any other access to information stored on the user's device, that access must either be strictly necessary for performing the service explicitly requested by the user, or consent is required (ePD Art 5.3). This is where those annoying cookie banners come from. LocalStorage isn't any different and would require the same consent as cookies.

For traffic data such as IP addresses, processing is allowed if it's technically necessary for the “transmission”, if the data has been anonymized, if it's required for billing purposes, or if the user has consented (ePD Art 6). There is an argument that security logs might be necessary, other uses like analytics are more dubious. The good news is that Umami seems to properly anonymize the IP address, so this part seems fine.
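One common IP-anonymization technique, assumed here for illustration rather than verified against Umami's source, is truncating the last octet before anything is stored or hashed:

```javascript
// Zero the host portion of an IPv4 address so the stored value maps to a
// whole /24 block rather than an individual machine. This is the same
// style of truncation GA's "anonymize IP" option performs.
function anonymizeIpv4(ip) {
  const parts = ip.split('.');
  if (parts.length !== 4) throw new Error('not an IPv4 address');
  parts[3] = '0';
  return parts.join('.');
}
```

The trade-off is losing per-host granularity, which is usually acceptable for aggregate analytics like country or region breakdowns.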

In cases where ePD mandates using consent, we cannot fall back to another GDPR legal basis such as legitimate interest. Of course this discrepancy between ePD and GDPR is a huge problem, and the promised ePD update has yet to materialize.


That's true but not relevant for a random user visiting a website.


Users have the right to object to Legitimate Interest too. A vendor just declaring LI as a Legal Basis for processing isn't enough (legally).


Is there any legal precedent for whether analytics constitute a legitimate interest?


Would randomly generating the session key instead of hashing client IP and other properties satisfy GDPR’s requirement of no PII?

The definition in GDPR Art. 4 reads: [1]

> ‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;

[1]: https://gdpr-info.eu/art-4-gdpr/

My intuition is that a randomly generated session key could not be tied back to the identity of a natural person, as long as client IP, user agent, etc., are also excluded from the analytics data.


My understanding is that it counts as an “online identifier”. It’s not all that different from a user ID, except the user didn’t ask you to create it (which certainly doesn’t help under GDPR).


As long as you can connect the id to one single client/user, it is PII. It does not matter where this id comes from: a random hash, an encrypted IP address. If it's unique, it's PII.

If you only save it on the server, not on the client side, it's not PII. But then it's almost useless for analytics. Because next time the user comes around, you create another hash and therefore another user.


> If you only save it on the server, not on the client side, it's not PII. But then it's almost useless for analytics.

You can get a lot of useful information by only saving stuff on the server. e.g.: number of visits per day, user behavior on certain pages, etc etc...


> If it's unique, it's PII.

Ok, I'll just reuse each generated ID twice and I'm safe. The data only gets a little blurry.


yep. And then you just run this tool twice, with different initial nonces, and then combine the data...


What would session key mean? How does the session end? Where do you store the session key?


I think this problem is still unresolved.

If you do something like Plausible.io with daily changing salts, you know only about daily visitors. This might be GDPR compliant.

If you do something like Fathom with chaining requests, you can see daily uniques, bounce rates, and click speed. Not sure this is GDPR compliant, though. I would feel better if they ran this by a European GDPR watchdog, which AFAIK they haven't.

If you do something like SimpleAnalytics, using the referrer to find uniques, you can see daily unique visits but with some statistical error. This should be GDPR and ePrivacy compliant without your customers needing to declare your usage or have a data processing agreement with you, but it gets you the least analytical data. (We use SimpleAnalytics.)

None of these can do cohorts, the holy grail of VC analytics.

For cohorts I would think you could make something GDPR compliant with Bloom (Cuckoo) filters.


Lots of home-grown analytics are very privacy-focused these days and do not use cookies. That's a good thing.

For simple sites like blogs, simple low-volume ecommerce, etc., that works.

But for more "serious" ecommerce, SaaS applications, and sites that are concerned with marketing on email, social, and web, then with optimizing what you show users, and finally with generating leads for salespeople to call, or actual sales...

Cookies, local storage, or some way of tracking the customer and their actions across all the channels is essential.

If one can avoid using Google Analytics, then that's a good thing also.

But let's get real: the idea of a cookie-less future is not going to happen, because people actually do business on the web.


Exactly, other than very minimal metrics you can't do much of anything without cookies. It's great that there are now many alternative analytics services available, but I feel like they all just do the exact same thing – stick a two-line script on your website, then get some very minimal data about your website. This is probably good enough for most people, but it becomes very hard to actually do anything with this data if you're running a more "serious" project.

But I'm always amazed at how much popularity these projects seem to gather. I myself made a very simple landing page [1] for a similar service (but one that caters more to SaaS applications), and it's managed to gather some interest even though I've barely done any promotion for it.

[1]: https://tinylens.io


"But let's get real -- the idea of a green future is not gonna happen because people need fossil energy & to pollute to do business"

Sure, business-wise and cost-wise it might be better, but should we accept it?

Also, none of this is "essential" at all. It is only needed in a world where the competition does it too, because they think it will give them a competitive advantage.

If we decided that this kind of tracking was illegal, then all those big companies would be totally fine. We'd still buy the products we need from them.


I have been using GoatCounter [0] and love the simplicity. I used to use Matomo, but they want a lot of money to see the referrals from Google search etc., and it's a heavier dependency. GoatCounter is a drop-in Go binary.

[0]: https://github.com/zgoat/goatcounter


I've seen a bunch of these simple self-hosted log dashboards here on HN, but I don't think they directly compare with google analytics, which is just a much more powerful and much much more complicated product. Not to say this isn't a great product, but it really isn't an alternative to GA.


I wonder how many users actually use those advanced features. As someone who has only ever used GA to help provide insight into developmental priorities (i.e. not for marketing), this doesn't help too much. For example, this tells you the browser but it doesn't tell you the browser version. It tells you the device being used, but it doesn't tell you the resolution of that device. It tells you the country of your visitors, but it doesn't tell you the user's language. It tells you pages users visit, but it doesn't tell you the order in which they visit them.

This isn't a criticism of Umami. It looks like a nice clean app that accomplishes what it is trying to do. But if this is all you needed from Google Analytics, then that tool was overkill in the first place.


Agreed, saying it's a one-to-one alternative to Google Analytics is probably inaccurate. I think a lot of people, myself included, used GA because there were no simpler alternatives, and overkill was better than nothing.


Alternative doesn't have to mean it offers exactly the same - for example a bike is an alternative travel option compared to a bus.


Do you know any good resources to learn the intricacies of Google Analytics and its related marketing concepts?


GA's paradigm is based on the Acquisition/Behavior/Conversion model championed by Avinash Kaushik. His blog and Google's own courses are great starting points.

https://www.kaushik.net/avinash/

https://www.kaushik.net/avinash/digital-marketing-and-measur...

https://analytics.google.com/analytics/academy/


This looks really nice! If... you’re only looking for high level numbers for something like a personal blog or a simple landing page for a mobile app.

I wouldn’t call this a replacement to Google Analytics.

The reason to have something like Google Analytics is to track traffic at a more granular level, and with very specific intent.

Some of the things I _rely_ on include:

- custom parameters
- segments
- goals
- A/B testing
- specific views

And that’s just the short list.

Now, I use Analytics heavily because we spend a lot of effort on growth, both organic (content, seo) and paid (ads), so knowing what’s going on at that level is essential.

If you don’t, there’s not much reason to use something like GA.




So many choices... Anyone have any favorites?


Thanks for mentioning https://www.userTrack.net, I'm the author and still working full-time on improving it. Let me know if you have any questions/remarks about userTrack.


Hey XCSme. Your product is one of the best I have seen. Very deep insights, with a good interface.

PS: You probably need a better name. Since your website says it is privacy-respecting, 'userTrack' doesn't exactly convey that. Maybe something without 'Track' in the name.

And adblockers like uBlock Origin tend to block everything like track.domain.com.

And just a bug/overlap I noticed when hovering over the delete button - https://i.imgur.com/aBA7cpr.png


Thank you for the feedback!

I did consider changing the name, but that's a lot harder than it seems (I'd have to rebrand, change the domain, probably lose all SEO, etc.). So far I haven't encountered any issues with ad blockers (userTrack is self-hosted, so you can host it on any domain, so the name doesn't matter there). I also rank highly for terms like "user tracking", which I think is good, as people will stumble upon a self-hosted alternative instead of some 3rd-party platform like Google Analytics. In the end, it does track stats and users on your website, but if I were to start again I would indeed choose a friendlier name.

I am aware of that visual bug, and I do have a better solution in mind for it; unfortunately I have to write hacky code to make it work (due to limitations of the material-ui library used). I think it's a very minor issue though, and there are more important issues I want to fix first, especially since it's not an easy fix.


Wow, that’s actually the first one (besides matomo which is rather enormous) that looks like a decent alternative to me with more than just bare-bones features. I’ll keep it in mind. And I really like the clear and to-the-point website.

No, XCSme did not pay me for this comment ;)


Thanks for the kind words!

To be honest, I did work a lot on it: 6-7 years as a side project and one year full-time. I think feature-wise userTrack is pretty comparable to Matomo (including some of their premium features that cost €400+/year). I also recently recreated the entire front-end from spaghetti jQuery to TypeScript+React+MaterialUI and implemented an auto-updater system. This means that I can now implement new features, fix bugs, and distribute updates to users very fast.

I am really glad that you like the landing page! I probably changed it about 200 times in the last 2 months (the last change was 2 minutes ago). I still want to improve it (e.g. a hero video actually showcasing the product, so you don't have to spend time understanding the demo).

PS: I hope that the BTC transfer was successful and thanks again for the comment! (jk)


> am really glad that you like the landing page! I probably changed it like 200 times in the last 2 months (last change was 2 minutes ago).

Hilarious, I have changed our new home page literally hundreds of times over the last few days and having looked at yours, I see inspiration for yet another change.


https://github.com/PostHog/posthog looks great for product analytics. If you've used it, can you please share your experience?


I currently use the self-hosted version on Heroku and am impressed with its functionality. It's quite similar to Heap Analytics. My favorite feature is auto-tracking. That said, there are some scaling limitations currently if you have a high-traffic site. We have a couple hundred thousand users monthly, so we are likely on the larger side of PostHog deployments. The team is cranking out features and improvements incredibly fast, and I'd expect these to be resolved soon. Feel free to DM; happy to answer any more questions.


Thanks!


The problem with matomo (not their fault) is that Microsoft flags your site as distributing malware and you disappear from search engines. You have to fill out a bunch of forms to fix it. It’s listed in the matomo faq and is basically either from a bot falsely reporting you, or some other glitch. It’s why my blog is still invisible to bing users: if you visit in edge, you get huge menacing red warnings.


This is unfortunate. I've been using them for years and wasn't aware of this. I wonder how much it's affecting the sites I use it on...


There are a bunch of Github "awesome software" lists.

One thing I haven't seen is someone categorize open-source web traffic analytics into client-side analytics (via JavaScript) and web server log analytics, since each approach drastically changes the data collected and reported.


Are there any all-in-one-ish solutions that attempt to do both?

i.e. collects both (or one of either) server logs and client side analytics, normalize them, etc.


Matomo does provide an alternative that leverages web server log files (beyond the usual client-side JavaScript), using a Python script: https://matomo.org/faq/log-analytics-tool/

When I first migrated (my personal sites) away from GA, I was concerned about performance, so I was considering using server logs and stumbled upon this feature of Matomo. The JavaScript approach ended up not being the performance issue that I thought it would be, so I never ended up using the Python script. Your mileage may vary, but to answer your question, this does exist.


Fathom started as open source, but the founders stopped supporting the open source project. It's basically abandoned at this point, with no new releases in almost two years and only updates to the README.


I’ve been using GoatCounter for a couple of months now, it’s great! Super simple interface with all the data you need.


Aye, same here. It's really good!


+1 for Plausible.

I've been using it for our name generator product Mashword (https://mashword.com) and it was really straightforward to implement. It's reasonably priced, has a clean interface and graphs, is privacy protecting and supports using your own domain for pulling in the js include.


Here is another to add to that list. They just hit 2.0 this week.

https://github.com/electerious/Ackee


> https://github.com/usefathom/fathom

Be careful of this one. It started out as OSS, but switched to proprietary once they'd achieved traction.


Did they have a reason for changing from OSS to free/pro?


money?



It’s not open source.


True, but it at least respects the privacy of your visitors by doing very minimal tracking. Basically, all you get is country, some device stats, and a time stamp. Last time I used it, it didn’t even track return visits and used no cookies iirc. It felt nice


Here's one we are working on: https://volument.com — it focuses on conversion optimization.


+1 for matomo. Very easy to set up a self-hosted instance, good documentation, and works well. NB: My site is pretty low-traffic.


Are we only looking at little ones that have their own user interface? If not, Snowplow is the prime mover of web and event tracking.

https://github.com/snowplow/snowplow


Of these, do any have a funnel tracking feature that shows which visitors went through a specific series of pages/events? Not being able to see how users moved about the site and how many converted is a deal breaker for me.


https://volument.com might be a good pick since it focuses strictly on conversion optimization. It attempts to measure the more general conversion flow, known as the AIDA funnel (awareness, interest, desire, and action).


Pretty sure posthog does.


(PostHog founder) Yes we do! PostHog gives you full funnel capabilities + ability to see exactly what users dropped out where


Confirmed! (I'm one of the founders)


Matomo has this feature I believe (if I understand correctly from your description).


Not out of the box but snowplow does if you model the data on your own.


Any privacy oriented analytics tools not purely focused on website analytics?


https://getinsights.io has event tracking for webapps but is not open source


There's also https://ackee.electerious.com/, quite simple but a good option.


posthog is really unfortunately named


PostHog founder here. Only if you split the name ;-)


I don't get it either. Not a native speaker, though. So could someone explain it?


I believe they are referring to posting a picture of one's hog


How so?


A comparison of Umami and Matomo (formerly Piwik) would be helpful since they seem very similar. I looked at both websites and didn't see any mention of the other project.


Is there a similar product that does this server side (without injected javascript telemetry) with http logs?


Keep in mind if you're using a CDN (e.g. Cloudflare), your absolute numbers will be way off.


This. Just got bitten by this by using Vercel’s serverless functions edge caching. More details: https://twitter.com/rayshan/status/1295521974479798274?s=21



For Rails, there's Ahoy: https://github.com/ankane/ahoy


Saw this link in another comment, might be what you’re looking for https://goaccess.io/


You end up with a lot of noise from bots and crawlers (using bogus user agents) if you're just looking at server logs.


Any reasonable server side processing thing will exclude the obvious bots, which almost always have some kind of "Bot" wording in their user agent header.


There are a LOT of bots and crawlers with bogus browser user agents.

Some of the bad ones you can see indirectly in logs because they pick UAs that almost no one uses any more. Go search your logs for IE8 or Firefox <= 70.0. Most just pick a random modern User Agent though, and that's awfully hard to spot in server logs.
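A rough sketch of that kind of filtering — the patterns here are illustrative examples, not a maintained blocklist:

```javascript
// Crude user-agent filter: catch declared bots plus a few
// suspiciously outdated browser strings (illustrative patterns only).
const declaredBot = /bot|crawler|spider|slurp/i;
const staleBrowser = /MSIE [6-8]\.|Firefox\/(\d|[1-6]\d|70)\./;

function looksLikeBot(userAgent) {
  if (!userAgent) return true; // an empty UA is almost always automated
  return declaredBot.test(userAgent) || staleBrowser.test(userAgent);
}
```

Declared bots are the easy part; as noted, a bot sending a bogus modern UA sails right through a check like this.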


Yeah, but so what? There are plenty of blacklists maintained for all sorts of things, and no metric is perfect. Some really rudimentary filtering or AI methods could get you pretty good data.


To be honest, if you are using nginx, just run https://goaccess.io/ - it collects the same information as Umami and is even more lightweight, since it just runs whenever you tell it to.

just add the command as a cron job, and you get an auto generated static dashboard. very neat.


Apache too (and first).


I'm very excited to see this space heating up. It seems for years we defaulted to Google Analytics and no one wanted to enter the market. Now there are plenty of alternatives, many of them open source.


It needs more granularity of OS versions and browser versions. Knowing which iOS version your users have is important to decide on what base level version you need for an iOS app, for example.


When I've seen GA used or recommended to people, it's because their use case is tracking the marketing performance of their website.

Tackling the privacy focus of GA is great, but there are a good deal of products out there that already fill that niche, not to mention that the requirements of the privacy crowd are usually a venture in themselves.

If you wanted to make it relatively competitive for marketing, the simplest addition would be adding labelling via regex for referrers.

i.e. - Some users want to be able to group Baidu, Google, DuckDuckGo, into a single bucket for comparison. Some users want to break them down into common market segments by country. "https://www.baidu.com/link?url=FyYbCZqj65Vc7A4XeSNrOcQCS2qFX...

is from your live demo referrers, and makes it difficult to actually assess the amount of traffic from Baidu. Using a regex label means that users can break down traffic from Paid/Organic marketing fairly quickly, and start to build up dashboards they can use.

If you ever extended it to allow multiple labels for each hit, could re-run the regex over past data, and could build reports off it, you'd easily have a benefit over GA that would start to wean the marketing crowd off it.
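A hedged sketch of what that labelling pass could look like — the rule names and patterns are made up for illustration:

```javascript
// Hypothetical referrer-labelling rules: the first regex that matches
// the raw referrer URL decides the bucket; everything else is "Other".
const referrerRules = [
  { label: "Search / Baidu",      pattern: /^https?:\/\/(www\.)?baidu\.com\// },
  { label: "Search / Google",     pattern: /^https?:\/\/(www\.)?google\.\w+/ },
  { label: "Search / DuckDuckGo", pattern: /^https?:\/\/(www\.)?duckduckgo\.com\// },
];

function labelReferrer(referrer) {
  const rule = referrerRules.find((r) => r.pattern.test(referrer));
  return rule ? rule.label : "Other";
}
```

Because the rules run over the stored raw referrer, re-running them over past data is exactly what makes retroactive re-labelling possible.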


For something this simple, I was hoping to see an option for SQLite, not just MySQL and PostgreSQL.


The app is using prisma.io which does support SQLite. I just haven't had the time to implement it yet.


Congrats on launching -- really impressive. One important issue that these self hosted analytics solve is ad blocking. Ad blocking by users really undermines the ability of a site or app to figure out what is working and not working. When you host your own analytics, you can get usability information for all of your users, not just those that don't block. That allows you to make a better product.

I have been working on something similar at https://argyle.cc -- we combine cloud analytics with a self-hosted analytics collector js. That gives you the best of both worlds: privacy focused, user respecting analytics, but full featured reporting in the cloud and ad-blocker resistance. It also allows event tracking to be done over js/web or in-line/server side.


I'd love to use this. But 34 dependencies?

I know ~10 of them are React, and there are some in there that make sense. But I haven't got the time to audit them all, and re-audit every time any of those dependencies updates.

And escape-string-regexp? Really? it's literally 2 lines of code [0]. Why have I got to give the maintainer of that project commit access to this program that will be seeing potentially sensitive data?

Why, if the developer couldn't come up with those 2 lines themselves, isn't this a Stack Overflow copy/paste?

[0]https://github.com/sindresorhus/escape-string-regexp/blob/ma...


Would you also criticize someone for using Apache Commons StringUtils? The fetishization of critiquing npm package choices is hilarious.


yes. And no, it's a major security problem that we're only just beginning to realise is a major security problem.


Is there a way for me as a user to opt out of this tool other than relying on third party tools like uBlock? I'm starting to get annoyed by so many "privacy focused" tools with literally no consent options at all.


I haven't implemented it yet, but I plan to make it read the user's do not track setting and automatically opt out.
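A minimal sketch of that check, assuming the standard `"1"`/`"yes"` opt-out values (the property lived in different places in older browsers, hence the fallback):

```javascript
// Read the Do Not Track preference; "1" or "yes" means the user
// opted out. Takes a navigator-like object as a parameter so it
// can also be exercised outside a browser.
function dntEnabled(nav = typeof navigator !== "undefined" ? navigator : {}) {
  const dnt = nav.doNotTrack || nav.msDoNotTrack;
  return dnt === "1" || dnt === "yes";
}

// In the tracking snippet, the tracker would bail out early:
// if (dntEnabled()) return;
```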


While this is much appreciated, keep in mind this does not work for Safari users.


Wow, Piwik is now Matomo, how fast time flies!


If you want to respect user privacy while collecting analytics data, I recommend using Local Differential Privacy (via Randomized Responses) when collecting information from browsers.

https://en.wikipedia.org/wiki/Local_differential_privacy and https://en.wikipedia.org/wiki/Randomized_response
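For a single yes/no attribute, the classic randomized-response scheme is tiny — this sketch uses the 50/50 variant:

```javascript
// Randomized response: with probability 1/2 report the truth,
// otherwise report a fair coin flip. No individual answer can be
// trusted, which is the local privacy guarantee.
function randomizedResponse(truth, rand = Math.random) {
  return rand() < 0.5 ? truth : rand() < 0.5;
}

// Server side, the observed "yes" rate p relates to the true rate t
// by p = t/2 + 1/4, so the aggregate can be debiased:
function estimateTrueRate(observedYesRate) {
  return 2 * observedYesRate - 0.5;
}
```

Individual responses are noise-protected, but the site still recovers usable population-level statistics.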


Am I wrong for thinking that Google Analytics has bad UI?

As a noob at UI it was bizarre and unintuitive for me.

Just finding the region locations of the traffic was odd and didn't make immediate sense.


It's a pleasure to work with compared to one or two years ago.


Ok have to check it out. Last I looked was years ago. And I was like...there's no excuse for this bad UI from such a great company.


The whole script is 502... https://stats.umami.is/umami.js


I'll throw a shoutout for a top tier project - https://github.com/zgoat/goatcounter

Using it for some personal stuff, and does absolutely everything I need it to, and then some.

I love the ethos of the project, and whilst it's open source, there's a hosted option that looks super reasonable too.


This looks great! For what it’s worth, I also maintain an open source (and self hosted) website analytics tool called Shynet [0] (someone else mentioned it in this thread, but thought I’d share here as well). Really great to see more options in this area!

[0] https://github.com/milesmcc/shynet


How trusted are metrics from non-GA sources when presenting validated traffic to various groups like investors, advertisers, etc.?


Quite intriguing! I have no experience pitching to investors or advertisers (but i do have web analytics exp.) and never would have thought that this would even be a question! Curious - is this something that you encountered, or is it hypothetical?


I had heard in the past that if your numbers were not GA, then they did not put much weight into them. Since you can grant access to other people directly in GA, they can validate the data. Using awstats or other metrics was deemed less trustworthy since they required someone gathering the data (which allows for potential manipulation). Before the days of 3rd party advertising, people tried to sell local ads just like a newspaper. The website with more visitors could charge more for the ad banner space. Some "little" blog would have to prove they received that amount of traffic.


I didn't know that. Unfortunately I can see the rationale in that...which saddens me. <sigh>

I appreciate your teaching me something I didn't know. But now I feel worse for us "little blogs". (Not your fault of course.)


This would be amazing if, out of the box, it sent data to BigQuery and/or Redshift. Postgres is fine, but for most companies this data is most useful in the warehouse. If this was a simple, drop-in solution to get well formatted data into BQ plus a bit of easy vis, that would be cool and VERY useful.


I've used https://count.ly/ instead of Google Analytics to gather exception data and business analytics from mobile and web apps. Relatively cheap for decent scale and they're very nice and helpful.


Slightly off-topic: Does anyone have recommendations for self-hosted open source analytics that can handle a large volume site (think 500.000.000 impressions per month)? I can't imagine systems with MySQL/PostgreSQL as database can handle this.


Countly can do that, and much, much more.


> I can't imagine systems with MySQL/PostgreSQL as database can handle this.

out of curiosity, why so?


500,000,000 requests per month is just about 200 requests per second. Why should that be a problem for any DB?

As for the question - I've seen a lot of great reviews of ClickHouse DB.
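The back-of-the-envelope math:

```javascript
// Average request rate over a 30-day month.
const perMonth = 500_000_000;
const secondsPerMonth = 30 * 24 * 60 * 60; // 2,592,000 s
const avgRps = perMonth / secondsPerMonth;
console.log(Math.round(avgRps)); // ≈ 193 requests/second on average
```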


> 500,000,000 requests per month is just about 200 requests per second

Not if you assume that some hours will have more web traffic than others.


Yes, there are some peaks of 10-20,000 per sec.

ClickHouse seems very suitable as a database. Does anybody know open source analytics tools that use it? Two parts would be needed: the client-side JavaScript tracker that writes into the database, and a GUI for reports.


Snowplow. 500 million events a month is nothing.


I'd like to see a line about the backend platform in the installation documentation. Yes, it's simple, but IMO no one would find a sentence like "Umami requires bla bla platform on bla bla operating system." useless.


From a quickscan of the GitHub repo [1], this is a JavaScript client, like Google Analytics, that sends data to a self-hosted Node.js backend that stores the data in MySQL or PostgreSQL using the Prisma database toolkit:

[1] https://github.com/mikecao/umami


Ironic how big the web is on advertising, yet when the time actually comes to tell what the product is and does, they forget the words.


I always like seeing competitors to GA but the website could really use some more information on why you should use it and the features it gives you. It's hard to beat top competitors in a saturated space.


Are there any "Google Analytics" alternatives that aren't based on Python, Node, Go, etc but something with a PHP back-end that can be deployed to any commodity LAMP hosting provider?


Would awstats meet this criteria? Not PHP but even simpler. I have data from it going back to 2006 (maybe 2002 if I dig up backups), which is a lot of fun.

https://www.awstats.org/


I remember once at a previous job one of the devs forgot to set up Google Analytics, which was the go-to tool at the time. The client calls in wanting stats for their site after 3 months, and we had nothing... thankfully awstats comes with cPanel without any additional setup, so we had something to show. Not great, but better than nothing.


You could check the ReverseEagle list: https://developers.reverseeagle.org/replace/google-analytics...

Also, anyone with a Tedomum account? It'd be nice if you could open an issue about adding umami.is. https://forge.tedomum.net/ReverseEagle/developers/-/issues


Thanks! PS. there is a typo here (should be 'statisfy'): https://developers.reverseeagle.org/replace/google-analytics...


Oh thanks, we'll edit that in


The advantage of Go is that you can compile it to a single (static) binary, and then it doesn't really matter what the rest of the backend is running. Unlike Python, Node, PHP, etc. you don't need to set up an environment.


Matomo uses php


And it's "Wordpress easy" to setup and doesn't require special access or server config?


Yes. On the first run it will take you to a wizard to set it up.


There's even a plugin for wordpress that embeds matomo itself if memory serves right.


Totally! Just set an instance up the other day. Done in five minutes, you'll only need database credentials and a somewhat "normal" PHP setup.


It's as easy as setting up GA in WordPress. Just inject the JavaScript snippet built by your Matomo instance when creating a new tracker.


I think the question was whether the software itself is actually simple to set up on a commodity web host. As in upload-via-FTP-and-configure-in-the-browser easy.


Oops, my bad. You are totally right.


I built one: https://usertrack.net

It has a plain PHP + MySQL backend, so it's really easy to install (on a LAMP server, as a WordPress plugin or one-click install on a DigitalOcean droplet).

When I started building it 8 years ago, the idea was exactly this: it should run on any basic shared hosting that can run PHP, so any site can just have its own analytics dashboard, without relying on 3rd parties.


For userTrack, the LAMP installation process has 3 main steps:

1. Upload the script files.

2. Create a MySQL database for the script to use.

3. Run the auto-installer (to set up DB connection and create the tables in DB).

https://docs.usertrack.net/installation/uploading-the-script


Awesome project! Google is definitely fading out for sure. I know many businesses and developers are tired of it. Looking forward to seeing what other inventions will give Google some competition.


I've been happily paying for GoatCounter for several months. I don't imagine I'll ever need to self host but it's nice knowing I can if necessary.


If you have an AWS account, you can use https://ownstats.cloud to self-host website analytics


noticed you didn't write any tests: https://github.com/mikecao/umami

What was your reasoning? Personally, I write tests for all my projects, it forces me to really think hard about how to break down the different components and functionalities and it helps others feel more confident to contribute.


Can anyone tell how this holds up against Matomo?


I liked the image on the home page. How can I create such an image to showcase my product? Is Photoshop the only way?


From the screenshots, the design looks very slick and I can’t wait to give it a try!


+1 to https://goatcounter.com/ - I use it for my personal blog https://dannysalzman.com . This is a good reminder to donate.


This is super cool!

FlightPHP looks nice, too. Why didn't you use that for the backend?


Any suggestions for collecting server side logs via nginx pods in k8s?


Is there any way to create and track custom events like clicks?


Yes, event tracking is already in the current build. I just haven't finished the UI components or documentation yet. But basically all you have to do is add a CSS class to an element and it will automatically start tracking. Like this:

  <div class="button umami--onclick--signup-button">Signup</div>
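If the convention is `umami--<event>--<name>`, the tracker could pull both parts out of the class with a regex — a hypothetical sketch, not Umami's actual implementation:

```javascript
// Parse a tracking class of the form umami--<event>--<name>.
// Returns null for classes that don't follow the convention.
function parseEventClass(cls) {
  const m = /^umami--([a-z]+)--([\w-]+)$/.exec(cls);
  return m ? { event: m[1], name: m[2] } : null;
}

// The tracker would then scan element classLists and bind a listener
// per parsed event, e.g. addEventListener("click", ...) for onclick.
```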


Would be great to see official custom event tracking support in the future.


Demo is throwing back "502 Bad Gateway"

Hacker News hug of death?


Good job. Getting 502 error!


Seemingly no api? :(


Demo gives a 502


Should be back up now.


Does this use cookies or similar to warrant a GDPR prompt?


No, it does not use cookies so no cookie prompt is needed.


I just checked your tracking code. It looks like you're using local storage to set a session id to track uniqueness. According to this [0] Stackexchange answer you will still have to display a cookie banner.

[0] https://softwareengineering.stackexchange.com/questions/2905...


The local storage is mainly for performance. It's to prevent a round-trip to the database to figure out the session again. The session id will be the same regardless and it can function without local storage. But I do see your point. I may consider removing it just to be safe.
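That cache-only role could look something like this sketch — the fields and the toy hash are illustrative, not Umami's real derivation:

```javascript
// Derive a deterministic session id from visit-stable attributes,
// so it can be recomputed if localStorage is unavailable. The hash
// here is a toy 32-bit rolling hash, purely for illustration.
function sessionKey({ hostname, userAgent, language }) {
  const input = `${hostname}|${userAgent}|${language}`;
  let hash = 0;
  for (let i = 0; i < input.length; i++) {
    hash = (hash * 31 + input.charCodeAt(i)) | 0;
  }
  return hash >>> 0; // same inputs always reproduce the same id
}
```

localStorage is then just a cache of this value; clearing it changes nothing, because the same inputs reproduce the same id.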


Clicked on the live demo and got a 502 Bad Gateway. I hope they analyzed it


Demo website 502's for me :/


Same, maybe we already hugged it to death?


Should be back up now, give it another try.


Same here. :-/



