Author of Umami here. I totally did not expect this response so it looks like you all hugged my little server to death. The demo should be back up now.
A little background. This is a side project I started 30 days ago because I was tired of how slow and complicated Google Analytics was. I just wanted something really simple and fast that I could browse quickly without digging through layers of menus. So I created Umami to track my own websites and then open sourced it. The stack is React, Redux, and Next.js with a PostgreSQL backend.
This is a really cool project. I’m happy to see that you are using Prisma for data access. If you are interested we can set up a shared Slack channel so you can provide feedback and we can make sure we support everything you need for this project :-)
It uses prisma.io for the database connections, and Prisma supports SQLite. I just haven't had the time to implement it yet and make sure all the custom queries work. I would welcome a PR.
Is there any reason why you didn't use currently available open source solutions and decided to create your own? (Other than it being fun to do it yourself :D)
I am wondering why in the past 2 years we went from having little to zero GA alternatives to all of a sudden having dozens of them.
This may just be me, but I'm very particular about my software. I want it to look and flow a certain way. So I wrote Umami mainly for my needs first. Plus it was just a fun project.
I always start side projects so I can learn something new. In this case it was Prisma.io, Chart.js, Next.js authentication, JWT, and PostgreSQL, none of which I had used before this project.
I will switch it over at some point. I've been running it on my own sites for a month so I just wanted to provide an example with more data to play with.
How does a post on HN that has 591 points and is on the front page only have 1184 views and 567 visitors in the last 24 hours according to the live demo? Something is not right. We should be seeing lots more page views and users, right?
EDIT: just noticed the demo is for another site, flightphp.com, not the landing page, umami.is, which is sort of weird. That explains it. The demo should really be showing the metrics for umami.is. Which is a shame, because that would prove how scalable Umami is. Unfortunately, umami.is is not eating its own dog food.
I'm using it for all my websites. The reason I went with another site for the demo is because I wanted something with at least 30 days of data so users can play around with the different settings. Once I get enough data, I'll switch it over.
This would definitely be really interesting if it had the ability to create analytics in response to UI interactions. There are a lot of SPAs out there that use analytics for feature tracking, and the self-hosted aspect would fit well with those users.
Event tracking is already supported in the build. I just haven't completed the UI components yet. You simply add a custom CSS class on an element and it will automatically be tracked.
I have a feeling this requires the database to be available to collect data. That's a bad pattern. No database can be up all the time and an application like this should not lose data.
Have a look at patterns that resolve this like Snowplow Analytics.
Unsure about Umami's performance, but Piwik was a non-starter for several higher traffic sites I worked with due to performance issues (even after throwing big hardware at it).
What are your thoughts on using UI frameworks like Material/Ant, etc.? I checked the GitHub repo and it looks like you have written all the components, including the CSS, yourself.
For personal projects I tend to write all the CSS and components myself. I just like being able to control everything down to the pixel without reading some documentation. But that's just my workflow. I say just use whatever gets the job done. The only thing I used was Bootstrap grid for responsive layouts. Tailwind CSS is pretty popular.
Though it's very rare, why would you use a database that is at risk of being erased completely, and that has a limited set of queries, as the main DB? Honest question.
My concern was the performance of Postgres when it's receiving thousands of writes per second. I assumed that such a task would be more suited for Redis, then the data could be filtered and sent to Postgres for longer storage (or some storage solution such as S3).
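To sketch the pattern I mean (ioredis and pg assumed; the table and field names here are mine, not Umami's):

    // Hot path: queue the event in Redis and return immediately, so a slow
    // or briefly unavailable Postgres doesn't drop data.
    const Redis = require('ioredis');
    const { Pool } = require('pg');

    const redis = new Redis();
    const db = new Pool();

    async function collect(event) {
      await redis.lpush('pageview_queue', JSON.stringify(event));
    }

    // Background worker: drain the queue into Postgres in batches.
    async function drain() {
      for (let i = 0; i < 500; i++) {
        const raw = await redis.rpop('pageview_queue');
        if (!raw) return;
        const ev = JSON.parse(raw);
        await db.query(
          'INSERT INTO pageview (website_id, url, created_at) VALUES ($1, $2, $3)',
          [ev.websiteId, ev.url, ev.createdAt]
        );
      }
    }
    setInterval(drain, 1000);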
One of the claims of Umami is that it's GDPR compliant:
> Umami does not collect any personally identifiable information so it is GDPR and CCPA compliant. No cookie notices are needed because Umami does not use cookies.
From auditing the source code, this doesn't seem to be the case. First, it claims it doesn't use cookies, but it clearly uses localStorage to store a "sessionKey"[0].
The other claim, that Umami is GDPR and CCPA compliant because it does not collect any personally identifiable information, is only half true. While the data collected isn't PII (because you can't use it on its own to identify a user), it's still "personal data". This is because the "sessionKey" stored alongside all events is actually a pseudonymous user identifier. It's really just a hash of the user's IP along with a few other properties[1]. Because the data Umami collects, when combined with some other data, can be attributed back to the user, it is still considered "personal data". That means you're still subject to most of GDPR, such as GDPR deletion requests[2].
I am not a lawyer so I cannot say for sure what constitutes PII and what breaches GDPR. I am using the same techniques as Fathom Analytics, Plausible.io, and other products. Everything is hashed into a unique session ID, and none of the raw data like user agent or IP address is actually stored. It is the same data that is found in server log files. In the strictest interpretation of GDPR, I don't think any analytics product can exist.
As for the localStorage, it's just for performance so I don't have to recompute the session hash. The product will work the same without it. But seeing as it is a cause of contention, I am probably going to remove it.
Both Fathom and plausible generate a unique salt every day. By getting rid of the old salts, they've anonymized any data older than a day. From [0]:
> We do not attempt to generate a device-persistent identifier because they are considered personal data under GDPR.
> Instead, we generate a daily changing identifier using the visitor’s IP address and User Agent. To anonymize these datapoints, we run them through a hash function with a rotating salt.
I will probably implement the daily salt and remove the localStorage code as well just to be safe.
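Roughly what I have in mind (just a sketch of the rotating-salt scheme, not final code):

    // A fresh random salt is generated each UTC day and the old one is
    // discarded, so yesterday's session ids can no longer be linked to anyone.
    const crypto = require('crypto');

    let salt = crypto.randomBytes(32);
    let saltDay = new Date().toISOString().slice(0, 10);

    function dailySalt() {
      const today = new Date().toISOString().slice(0, 10);
      if (today !== saltDay) {
        salt = crypto.randomBytes(32); // old salt is gone for good
        saltDay = today;
      }
      return salt;
    }

    function sessionId(ip, userAgent, websiteId) {
      return crypto.createHash('sha256')
        .update(dailySalt())
        .update(ip + userAgent + websiteId)
        .digest('hex');
    }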
But again, I'm not a lawyer here, where do you draw the line? Why not hourly salts? 5 minute salts? What is considered a reasonable effort? At some point you're storing data that can identify a user for the purpose of analytics. Still, I'm going to try to lean to the safer side as best I can.
Option 1: Accept that you're collecting Personal Data, and satisfy the obligations GDPR places on that. This means disclosing the use of analytics in your privacy policy (what data's being collected & why), listing retention periods, and figuring out how to satisfy requests like Access or Deletion (which may include "we can't identify you in the data we previously collected").
Option 2 is to "comply" with GDPR by finding a loophole so that it technically doesn't count.
The Option 2 approach is more common when dealing with American data privacy laws. It doesn't work out so well with GDPR. It's very difficult to not be processing personal data at some point. Even if you fully anonymize your data before doing any non-trivial processing, the anonymization itself is still covered by GDPR. Which means you need to include it your privacy policy and provide opt-out.
It's also high-risk. If a court decides that you didn't quite thread the needle through the loophole in their country and GDPR therefore applies in full, then you haven't done any of the compliance groundwork.
For GDPR compliance, I would be much more inclined to trust a tool that describes how to opt users out of tracking than one that claims they're immune from obligations to opt-out.
As another commenter mentions, the ePrivacy Directive is a whole different kettle of fish. Strong consent is needed to read or write any data not strictly necessary to provide the services requested by the user. That law is supposed to get updated with more sanity soon... though it's been "soon" for a few years now.
Doesn’t using the website id in the hash mean the key is no longer PII, since it can’t follow you between websites? Or is being identifiable within a single site enough to meet the threshold?
Fair point. I was simply following the "common practice" from other products making these claims, which is to not store personal user data and only generate anonymous IDs.
Maybe that's not fully compliant, I don't know, so I went ahead and removed any mention of GDPR from the website. It's not really my goal anyways. I'm just trying to release free software while they are charging money and making these claims.
The IDs that you generate aren't anonymous the way Plausible.io's are. You simply need to address that issue and you should be mostly there for GDPR compliance.
An IP address is considered personally identifiable information in at least Germany. If you're storing that you'll already have to think about the GDPR.
This is just another misguided attempt to adhere to the letter of the law while going against its spirit. It is misguided because it's based on a wrong understanding of what the letter of the law actually is. You see this a lot with adtech and analytics companies who try to skirt regulations through elaborate mechanisms, but ultimately in vain.
>This is just another misguided attempt to adhere to the letter of the law while going against its spirit.
It's easy to say this and hard to draw a line between PII and what I can store without consent. "yesterday I sold 5 products on my website" is not PII (I hope). If I store the timestamps for each purchase I'm already in the grey area. One could combine the timestamps with other data to identify my customers.
I've listened to a podcast interview with a lawyer specializing in EU privacy laws, and he said that it does not matter if the personal data is hashed or encrypted. It's still personal data. This was about data stored in a database though, but browser local storage is a database.
This was mentioned when the guest spoke about the right to be forgotten. The law is really weird, because you need to delete the user's data from your database, but it's OK to keep backups.
> It is the same data that is found in server log files. In the strictest interpretation of GDPR, I don't think any analytics product can exist.
It can exist as long as the user agrees to be tracked. There is a category of "metrics" cookies the user needs to agree to before you can track them for metrics. That's the whole point of the law. You need the user's permission.
It’s different because it allows reidentification. It prevents you from coming up with an IP or what have you out of thin air, but you or another party you give it to can effectively use it as a perfect proxy of whatever you hashed.
Let’s take a hashed IP address. There are 4.3B IPv4 addresses, so it takes a few minutes on an old laptop to generate a rainbow table; with decent hardware it would be seconds. The rainbow table could then be used to identify all the IPs you store. If they are salted, then each IP would need to be brute forced, but that's still only seconds on good hardware.
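To make that concrete, a sketch of the unsalted case (SHA-256 assumed):

    // Brute-forcing an unsalted hash of an IPv4 address: just enumerate
    // all 2^32 addresses and compare. Trivially parallelizable on a GPU.
    const crypto = require('crypto');

    function recoverIp(targetHash) {
      for (let n = 0; n <= 0xffffffff; n++) {
        const ip = [n >>> 24, (n >>> 16) & 255, (n >>> 8) & 255, n & 255].join('.');
        const hash = crypto.createHash('sha256').update(ip).digest('hex');
        if (hash === targetHash) return ip;
      }
      return null;
    }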
That would still take correlating data from another dataset outside this product. Compliance would be up to whoever hosts this, and whoever holds the correlating dataset, to satisfy the request anyway.
Without correlating data it really isn't "personal" though. You could delete the user account and related data without touching this product and you've complied, because this data could then never be correlated. Also, if nothing in the activities leaks the user's own identity, then again it wouldn't really be personal.
If you don't want to get dragged into a lawsuit when a user gets sued on a GDPR claim, you probably shouldn't make any statements about your product's GDPR compliance. Stick to the facts about how your product works, and leave the legal speculation to the lawyers.
"In the strictest interpretation of GDPR, I don't think any analytics product can exist."
That's the point. Unless you aggregate the data.
Besides, it's not only GDPR you should consider, but also the latest cookie verdict by the CJEU. You need a consent if you drop cookies, session storage or any other tracking technology, no matter if you process personal data or not.
Maybe this might help you. It is roughly 2 hours long, but as far as I am concerned it is the best explanation of GDPR I have ever seen, done in mostly non-legal speech. It is actually fun to watch (the part about borrowing a car is hilarious):
Consent is only one potential basis for processing under GDPR. There are others such as "legitimate interest" which the controller and/or processor may rely on.
Since this is about cookies and IP addresses, GDPR is not the most relevant EU law. Instead, we have to look at the old ePrivacy Directive.
For cookies or any other access to information stored on the user's device, that access must either be strictly necessary for performing the service explicitly requested by the user, or consent is required (ePD Art 5.3). This is where those annoying cookie banners come from. LocalStorage isn't any different and would require the same consent as cookies.
For traffic data such as IP addresses, processing is allowed if it's technically necessary for the “transmission”, if the data has been anonymized, if it's required for billing purposes, or if the user has consented (ePD Art 6). There is an argument that security logs might be necessary, other uses like analytics are more dubious. The good news is that Umami seems to properly anonymize the IP address, so this part seems fine.
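(For comparison, the other common way to anonymize traffic data is truncation, roughly what Google Analytics' anonymizeIp setting does. A sketch:)

    // Zero the host octet so the stored address no longer identifies
    // a single machine: 203.0.113.42 -> 203.0.113.0
    function anonymizeIp(ip) {
      const parts = ip.split('.');
      parts[3] = '0';
      return parts.join('.');
    }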
In cases where ePD mandates using consent, we cannot fall back to another GDPR legal basis such as legitimate interest. Of course this discrepancy between ePD and GDPR is a huge problem, and the promised ePD update has yet to materialize.
Would randomly generating the session key instead of hashing client IP and other properties satisfy GDPR’s requirement of no PII?
The definition in GDPR Art. 4 reads: [1]
> ‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;
My intuition is that a randomly generated session key could not be tied back to the identity of a natural person, as long as client IP, user agent, etc., are also excluded from the analytics data.
My understanding is that it counts as an “online identifier”. It’s not all that different from a user ID, except the user didn’t ask you to create it (which certainly doesn’t help under GDPR).
As long as you can connect the ID to one single client/user, it is PII. It does not matter where this ID comes from, a random hash or an encrypted IP address: if it's unique, it's PII.
If you only save it on the server, not on the client side, it's not PII. But then it's almost useless for analytics. Because next time the user comes around, you create another hash and therefore another user.
If you do something like Plausible.io with daily changing salts, you know only about daily visitors. This might be GDPR compliant.
If you do something like Fathom with chaining requests, you can see daily uniques, bounce rates, and click speed. Not sure this is GDPR compliant though. I would feel better if they ran this through a European GDPR watchdog, which AFAIK they haven't.
If you do something like SimpleAnalytics with using the referrer to find uniques, you can see daily unique visits, but with some statistical errors. Should be GDPR and ePrivacy compliant without your customers needing to declare your usage or have a data processing agreement with you. But it gets you the least analytical data. (We use SimpleAnalytics.)
None of these can do cohorts, the holy grail of VC analytics.
For cohorts I would think you could make something GDPR compliant with Bloom (Cuckoo) filters.
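The idea, sketched: add each visitor's (already pseudonymous) ID to a per-cohort Bloom filter and store only the bit array, so you can later ask "was this visitor part of last week's cohort?" without retaining any IDs. Parameters below are illustrative:

    const crypto = require('crypto');

    class BloomFilter {
      constructor(bits = 1 << 20, hashes = 7) {
        this.bits = bits;
        this.hashes = hashes;
        this.bitmap = new Uint8Array(bits >> 3);
      }
      // Derive k bit positions from the id; sha256 used for simplicity.
      positions(id) {
        const out = [];
        for (let i = 0; i < this.hashes; i++) {
          const h = crypto.createHash('sha256').update(i + ':' + id).digest();
          out.push(h.readUInt32BE(0) % this.bits);
        }
        return out;
      }
      add(id) {
        for (const p of this.positions(id)) this.bitmap[p >> 3] |= 1 << (p & 7);
      }
      has(id) {
        // May return false positives, but can never list the members.
        return this.positions(id).every(p => (this.bitmap[p >> 3] >> (p & 7)) & 1);
      }
    }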
Lots of home-grown analytics are very privacy focused these days and do not use cookies. That's a good thing.
That works for simple sites like blogs, simple low-volume ecommerce, etc.
But for more "serious" eCommerce, SaaS applications, and sites that are concerned with marketing on email, social, and the web, then optimizing what you show visitors, and finally generating leads for salespeople to call, or actual sales...
Cookies or local storage, or some way of tracking the customer across all the channels and their actions, are essential.
If one can avoid using Google Analytics, then that's a good thing also.
But let's get real -- the idea of a cookie-less future is not gonna happen, because people actually do business on the web.
Exactly, other than very minimal metrics you can't do much of anything without cookies. It's great that there are now many alternative analytics services available, but I feel like they all just do the exact same thing – stick a two-line script on your website, then get some very minimal data about your website. This is probably good enough for most people, but it becomes very hard to actually do anything with this data if you're running a more "serious" project.
But I'm always amazed at how much popularity these projects seem to gather. I myself made a very simple landing page [1] for a similar service (but one that caters more to the saas based applications), and it's managed to gather some interest even though I've barely done any promotion to it.
"But let's get real -- the idea of a green future is not gonna happen because people need fossil energy & to pollute to do business"
Sure, business-wise & cost-wise it might be better, but should we accept it?
Also none of this is "essential" at all. It is only needed in a world where the competition does it too because they think it will give them a competitive advantage.
If we could decide that that kind of tracking became illegal, then all those big companies would be totally fine. We'd still be buying the products we need from them.
I have been using goatcounter [0] and love the simplicity. I used to use Matomo, but they want a lot of money to see the referrals from google search/etc. And it's a heavier dependency. Goatcounter is a drop-in golang binary.
I've seen a bunch of these simple self-hosted log dashboards here on HN, but I don't think they directly compare with google analytics, which is just a much more powerful and much much more complicated product. Not to say this isn't a great product, but it really isn't an alternative to GA.
I wonder how many users actually use those advance features. As someone who has only ever used GA to help provide insight into developmental priorities (i.e. not for marketing), this doesn't help too much. For example, this tells you the browser but it doesn't tell you the browser version. It tells you the device being used, but it doesn't tell you the resolution of that device. It tells you the country of your visitors, but it doesn't tell you the user's language. It tells you pages users visit, but it doesn't tell you the order in which they visit them.
This isn't a criticism of Umami. It looks like a nice clean app that accomplishes what it is trying to do. But if this is all you needed from Google Analytics, then that tool was overkill in the first place.
Agreed, saying it's a one-to-one alternative to Google Analytics is probably inaccurate. I think a lot of people, myself included, used GA because there were no simpler alternatives, and overkill was better than nothing.
GA's paradigm is based on the Acquisition/Behavior/Conversion model championed by Avinash Kaushik. His blog and Google's own courses are great starting points.
Now, I use Analytics heavily because we spend a lot of effort on growth, both organic (content, seo) and paid (ads), so knowing what’s going on at that level is essential.
If you don’t, there’s not much reason to use something like GA.
Thanks for mentioning https://www.userTrack.net, I'm the author and still working full-time on improving it. Let me know if you have any questions/remarks about userTrack.
Hey XCSme. Your product is one of the best I have seen. Very deep insights, with a good interface.
PS: You probably need a better name. Since your website says it is privacy respecting, 'UserTrack' doesn't exactly convey that. Maybe something without 'Track' in the name.
And adblockers like uBlock Origin tend to block everything like track.domain.com.
I did consider changing the name, but that's a lot harder than it seems (have to rebrand, change the domain, probably lose all SEO, etc.). So far I haven't encountered any issues with ad-blockers (userTrack is self-hosted, so you can host it on any domain, so the name doesn't matter there). I also rank highly for terms like "user tracking", which I think is good, as people would stumble upon a self-hosted alternative instead of some 3rd party platform like Google Analytics. In the end, it does track stats and users on your website, but if I were to start again I would indeed choose a friendlier name.
I am aware of that visual bug, and I do have a better solution in mind for it; unfortunately I have to write hacky code to make it work (due to the limitations of the material-ui library used). I think that's a very minor issue though, and there are more important issues I want to fix before it, especially since it's not an easy fix.
Wow, that’s actually the first one (besides matomo which is rather enormous) that looks like a decent alternative to me with more than just bare-bones features. I’ll keep it in mind. And I really like the clear and to-the-point website.
To be honest, I did work a lot on it: 6-7 years as a side project and one year full-time. I think feature-wise userTrack is pretty comparable to Matomo (including some of their premium features that cost €400+/year). I also recently recreated the entire front-end from spaghetti jQuery to TypeScript+React+MaterialUI and implemented an auto-updater system. This means that I can now implement new features, fix bugs, and distribute the updates to users very fast.
I am really glad that you like the landing page! I probably changed it like 200 times in the last 2 months (the last change was 2 minutes ago). I still want to improve it (e.g. some hero video actually showcasing the product, so you don't have to spend time understanding the demo).
PS: I hope that the BTC transfer was successful and thanks again for the comment! (jk)
> I am really glad that you like the landing page! I probably changed it like 200 times in the last 2 months (the last change was 2 minutes ago).
Hilarious, I have changed our new home page literally hundreds of times over the last few days and having looked at yours, I see inspiration for yet another change.
I currently use the self-hosted version on Heroku and am impressed with its functionality. It's quite similar to Heap Analytics. My favorite feature is auto-tracking. That said, there are currently some scaling limitations if you have a high-traffic site. We have a couple hundred thousand users monthly, so we are likely on the larger side of PostHog deployments. The team is cranking out features and improvements incredibly fast and I'd expect these to be resolved soon. Feel free to DM - happy to answer any more questions.
The problem with Matomo (not their fault) is that Microsoft flags your site as distributing malware and you disappear from search engines. You have to fill out a bunch of forms to fix it. It's listed in the Matomo FAQ and is basically either from a bot falsely reporting you, or some other glitch. It's why my blog is still invisible to Bing users: if you visit in Edge, you get huge menacing red warnings.
There are a bunch of GitHub "awesome software" lists.
One thing I haven't seen is someone categorize open source web traffic analytics into client-side analytics (via JavaScript) and web server log analytics, since each approach drastically changes the data collected and reported.
Matomo does provide an alternative that leverages web server log files (beyond the usual client-side JavaScript), using a Python script: https://matomo.org/faq/log-analytics-tool/
When I first migrated (my personal sites) away from GA, I was concerned about performance, so I was considering using server logs, and stumbled upon this feature of Matomo. The JavaScript approach ended up not being the performance issue that I thought it would be... so I never ended up using the Python script. So your mileage may vary, but to your question, this does exist.
Fathom started as open source, but the founders stopped supporting the open source project. It's basically abandoned at this point, with no new releases in almost two years and only updates to the README.
I've been using it for our name generator product Mashword (https://mashword.com) and it was really straightforward to implement. It's reasonably priced, has a clean interface and graphs, is privacy protecting and supports using your own domain for pulling in the js include.
True, but it at least respects the privacy of your visitors by doing very minimal tracking. Basically, all you get is country, some device stats, and a time stamp. Last time I used it, it didn’t even track return visits and used no cookies iirc. It felt nice
Of these, do any have a funnel tracking feature that shows what visitors went through a specific series of pages/events? Seeing how users moved about the site and seeing how many converted is a deal breaker for me.
https://volument.com might be a good pick since it focuses strictly on conversion optimization. It attempts to measure the more general conversion flow, known as the AIDA funnel (awareness, interest, desire, and action).
A comparison of Umami and Matomo (formerly Piwik) would be helpful since they seem very similar. I looked at both websites and didn't see any mention of the other project.
Any reasonable server-side processing will exclude the obvious bots, which almost always have some kind of "Bot" wording in their user agent header.
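e.g. something along these lines (the pattern list is illustrative, not exhaustive):

    // Cheap first-pass filter; only catches well-behaved crawlers that
    // identify themselves in the User-Agent header.
    const OBVIOUS_BOT = /bot|crawler|spider|crawl|slurp|facebookexternalhit/i;

    function looksLikeBot(userAgent) {
      return !userAgent || OBVIOUS_BOT.test(userAgent);
    }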
There are a LOT of bots and crawlers with bogus browser user agents.
Some of the bad ones you can see indirectly in logs because they pick UAs that almost no one uses any more. Go search your logs for IE8 or Firefox <= 70.0. Most just pick a random modern user agent though, and that's awfully hard to spot in server logs.
Yeah, but so what? There are plenty of blacklists maintained for all sorts of things, and no metric is perfect. Some really rudimentary filtering or AI methods could get you pretty good data.
To be honest, if you are using nginx, just run https://goaccess.io/. It collects the same information as Umami and is even more lightweight, since it just runs whenever you tell it to.
Just add the command as a cron job, and you get an auto-generated static dashboard. Very neat.
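For example (paths and schedule are just illustrative):

    # Regenerate the static HTML report from the nginx access log every hour.
    0 * * * * goaccess /var/log/nginx/access.log --log-format=COMBINED -o /var/www/html/report.html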
I'm very excited to see this space heating up. It seems for years we defaulted to using Google Analytics and no one else wanted in on the market. Now there are plenty of alternatives, many of them open source.
It needs more granularity on OS versions and browser versions. Knowing which iOS version your users have is important for deciding what minimum version you need to support for an iOS app, for example.
When I've seen GA used or recommended to people, it's because their use case is tracking the marketing performance of their website.
Tackling the privacy focus for GA is great, but there are a good deal of products out there that already fill that niche, not to mention the requirements of the privacy crowd usually being a venture unto itself.
If you wanted to make it relatively competitive for marketing, the simplest addition would be labelling via regex for referrers. For example, the live demo currently shows several distinct referrer entries for Baidu, which makes it difficult to actually assess the total amount of traffic from Baidu. Using a regex label means that users can break down traffic from paid/organic marketing fairly quickly and start to build up dashboards they can use (sketch below).
If you ever extended it to allow multiple labels for each hit, could re-run the regex over past data, and could build reports off it, you'd easily have a benefit over GA that would start to wean the marketing crowd off it.
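A sketch of the labelling idea (labels and patterns are made up):

    // Collapse raw referrers into marketing channels via regex labels.
    const LABELS = [
      { label: 'Baidu Organic',  pattern: /(^|\.)baidu\.com/i },
      { label: 'Google Organic', pattern: /(^|\.)google\./i },
      { label: 'Paid Social',    pattern: /utm_medium=paid/i },
    ];

    function labelReferrer(referrer) {
      const hit = LABELS.find(({ pattern }) => pattern.test(referrer));
      return hit ? hit.label : 'Other';
    }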
Congrats on launching -- really impressive. One important issue that these self hosted analytics solve is ad blocking. Ad blocking by users really undermines the ability of a site or app to figure out what is working and not working. When you host your own analytics, you can get usability information for all of your users, not just those that don't block. That allows you to make a better product.
I have been working on something similar at https://argyle.cc -- we combine cloud analytics with a self-hosted analytics collector js. That gives you the best of both worlds: privacy focused, user respecting analytics, but full featured reporting in the cloud and ad-blocker resistance. It also allows event tracking to be done over js/web or in-line/server side.
I know ~10 of them are React, and there are some in there that make sense. But I haven't got the time to audit them all, and re-audit them every time any of those dependencies updates.
And escape-string-regexp? Really? it's literally 2 lines of code [0]. Why have I got to give the maintainer of that project commit access to this program that will be seeing potentially sensitive data?
Why, if the developer couldn't come up with those 2 lines themselves, isn't this a Stack Overflow copy/paste?
Is there a way for me as a user to opt out of this tool other than relying on third party tools like uBlock? I'm starting to get annoyed by so many "privacy focused" tools with literally no consent options at all.
If you want to respect user privacy while collecting analytics data, I recommend using Local Differential Privacy (via randomized response) when collecting information from browsers.
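A sketch of the simplest randomized-response scheme, for a single yes/no attribute:

    // Each browser answers honestly with probability 1/2 and randomly
    // otherwise, so no individual report is trustworthy on its own.
    function randomizedResponse(truth) {
      if (Math.random() < 0.5) return truth;
      return Math.random() < 0.5;
    }

    // Server side: if p is the observed "yes" rate, the true rate is
    // approximately 2 * p - 0.5, recoverable only in aggregate.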
This looks great! For what it’s worth, I also maintain an open source (and self hosted) website analytics tool called Shynet [0] (someone else mentioned it in this thread, but thought I’d share here as well). Really great to see more options in this area!
Quite intriguing! I have no experience pitching to investors or advertisers (but I do have web analytics experience) and never would have thought that this would even be a question! Curious, is this something that you encountered, or is this hypothetical?
I had heard in the past that if your numbers were not from GA, then they did not put much weight into them. Since you can grant other people access directly in GA, they can validate the data. Using awstats or other metrics was deemed less trustworthy since they required someone gathering the data (which allows for potential manipulation). Before the days of 3rd party advertising, people tried to sell local ads just like a newspaper. The website with more visitors could charge more for the ad banner space. Some "little" blog would have to prove it received the amount of traffic.
This would be amazing if, out of the box, it sent data to BigQuery and/or Redshift. Postgres is fine, but for most companies this data is most useful in the warehouse. If this was a simple, drop-in solution to get well-formatted data into BQ plus a bit of easy vis, that would be cool and VERY useful.
I've used https://count.ly/ instead of Google Analytics to gather exception data and business analytics from mobile and web apps. Relatively cheap for decent scale and they're very nice and helpful.
Slightly off-topic: Does anyone have recommendations for self-hosted open source analytics that can handle a large-volume site (think 500 million impressions per month)? I can't imagine systems with MySQL/PostgreSQL as the database can handle this.
ClickHouse seems very suitable as the database. Does anybody know open source analytics tools that use it? Two parts would be needed: the client-side JavaScript tracker that writes into the database, and a GUI for reports.
I wish the installation documentation had a line about the backend platform. Yes, it's simple, but IMO no one would find a sentence like "Umami requires bla bla platform on bla bla operating system." useless.
From a quick scan of the GitHub repo [1], this is a JavaScript client, like Google Analytics, that sends data to a self-hosted Node.js backend that stores the data in MySQL or PostgreSQL using the Prisma database toolkit:
I always like seeing competitors to GA but the website could really use some more information on why you should use it and the features it gives you. It's hard to beat top competitors in a saturated space.
Are there any "Google Analytics" alternatives that aren't based on Python, Node, Go, etc but something with a PHP back-end that can be deployed to any commodity LAMP hosting provider?
Would awstats meet these criteria? It's not PHP but even simpler. I have data from it going back to 2006 (maybe 2002 if I dig up backups), which is a lot of fun.
I remember once at a previous job one of the devs forgot to set up Google Analytics, which was the go-to tool at the time. The client called in wanting some stats for their site after 3 months, and we had nothing... Thankfully awstats comes with cPanel without any additional setup, so we had something to show. Not great, but better than nothing.
The advantage of Go is that you can compile it to a single (static) binary, and then it doesn't really matter what the rest of the backend is running. Unlike Python, Node, PHP, etc. you don't need to set up an environment.
I think the question was whether the software itself is actually simple to set up on a commodity web host. As in upload-via-FTP-and-configure-in-the-browser easy.
It has a plain PHP + MySQL backend, so it's really easy to install (on a LAMP server, as a WordPress plugin or one-click install on a DigitalOcean droplet).
When I started building it 8 years ago, the idea was exactly this: it should run on any basic shared hosting that can run PHP, so any site can just have its own analytics dashboard without relying on 3rd parties.
Awesome project! Google Analytics is definitely fading out. I know many businesses and developers are tired of it. Looking forward to seeing what other inventions will give Google some competition.
What was your reasoning? Personally, I write tests for all my projects, it forces me to really think hard about how to break down the different components and functionalities and it helps others feel more confident to contribute.
Yes, event tracking is already in the current build. I just haven't finished the UI components or documentation yet. But basically all you have to do is add a CSS class to an element and it will automatically start tracking. Like this:
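    <!-- Hypothetical markup: the tracker picks up the class and records a
         "signup-button" click event. -->
    <button class="umami--click--signup-button">Sign up</button>

(The exact class format may still change before I finish the docs.)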
I just checked your tracking code. It looks like you're using local storage to set a session ID to track uniqueness. According to this [0] Stack Exchange answer, you will still have to display a cookie banner.
The local storage is mainly for performance. It's to prevent a round-trip to the database to figure out the session again. The session id will be the same regardless and it can function without local storage. But I do see your point. I may consider removing it just to be safe.
Would be happy to answer any questions you have.