The author also gives a rationale for choosing the EUPL (European Union Public Licence) here: https://www.arp242.net/license.html.
ksec posted this link: https://github.com/zgoat/goatcounter, but, at the time of writing, my comment is higher...
It explains why GoatCounter exists and comments on _why not_ other solutions like Fathom, Open Web Analytics, KISSS, Ackee, Countly, analysing log files, Google Analytics, Simple Analytics, getinsights.io, statcounter.com, and plausible.io.
> Have a “strong” copyleft, including the so-called “network protection”, which mandates that people submit changes even if they operate the code as a service (rather than sending people binaries).
However, the EUPL allows you to redistribute under other "compatible" licenses, most of which don't provide that "network protection". Effectively, the EUPL is only as strong as the weakest "compatible" license listed in its appendix.
These "compatible" licenses wouldn't otherwise be compatible, except that the EUPL explicitly allows re-licensing under them.
ksec's link is: https://github.com/zgoat/goatcounter/blob/master/docs/ration...
BTW I love your website's design!
Thank you for making this
Also, a11y support isn't just for "blind" or "disabled" users; it tends to make the page better for all users. This applies to everything really; for example while being able to tell coins apart by touch is critical for you, it's also pretty convenient for me at times, so this kind of coin design is better for everyone.
Yes! Although I'm not "officially" disabled, I'm "blind" to my screen when I'm driving (you'll be happy to know), I'm half-blind to a message that pops on my screen when I'm drying off from a shower (no glasses in the shower is my motto), I'm "mute" when surrounded by strangers on a train, my fingers can't operate a mouse or keyboard when I'm doing dishes, etc., etc.
We're ALL disabled, and our circumstances change over the very short to the very long term as well. Having things designed with flexible interface options was one of the original goals of the web. Some of us remember that before CSS, the publisher was supposed to specify semantics and the user was supposed to specify presentation. I don't think we should go to that extreme, but I'd like to see our browsers, tools, and frameworks designed to make multi-UI flexibility easier and more common.
One tool I'd suggest looking into when getting started is Accessibility Insights for Web. A team at Microsoft developed this free, open source browser extension for automatically detecting the most common accessibility issues on your site: https://accessibilityinsights.io/docs/web/overview
Disclaimer: I do work at Microsoft, but my only affiliation with Accessibility Insights is as a happy customer :)
I immediately thought: how come none of the other browser vendors have something like this after all of this time?
Google develops Lighthouse, which is available as an extension and, I believe, includes some a11y checks: https://developers.google.com/web/tools/lighthouse/
Similarly, Mozilla has promoted Webhint: https://webhint.io/ (which is cross-browser)
I'd also recommend Khan Academy's tota11y, which just works as a bookmarklet: https://khan.github.io/tota11y/
I haven't had much luck with their "live" report, though, so I have a cron job running every few minutes to regenerate it.
The reason it uses copyleft is to prevent people from taking my work and operating a competing SaaS with it; I don't think that's very "crazy".
Perhaps I could add a clause to make this unambiguous, but that strikes me as rather redundant since it seems fairly clear to me. Did I miss something?
> Monetizing open source projects is incredibly difficult, and simply pasting a GPL or EUPL license text into the project doesn't make them easier to monetize
Sure, I don't disagree with that. But as mentioned, a non-copyleft license carries the risk of a certain kind of abuse that I don't really want to take, either.
> "You will not (and You will not allow any third party to) (i) copy, modify, adapt, translate or otherwise create derivative works of the Software"
As you wrote, the interpretation of what is or is not a derivative work in this context is legitimately complex. It hasn't been tested, beyond the fact that the companies behind a third of the largest websites have had their lawyers green-light software with such language in its license. So far, the bet that a website does not constitute a derivative work of the analytics software it uses is holding.
I actually have a local branch, made after your last comment, that changes the license of count.js to MIT, but then I thought about it some more and wasn't sure it was the right thing to do. My concern is that "EUPL with clarifications/exceptions" would be more complex than "just EUPL".
While "telling customers they're wrong" would not be good, changing things on a whim after a single complaint would not be best for the product, either.
Also, providing feedback by calling stuff "crazy" is probably not the best way to get people to listen ;-)
There is a growing trend in the EU that handing over traffic data is not really acceptable (or legal) if it constitutes personal data. Here in Sweden there were a lot of embarrassing leaks where classified information was mishandled and government contracts with IBM broke the law as data left the country. Just a few months later a major medical scandal happened where audio recordings of patients slipped out and medical confidentiality was broken. The costs are climbing, and together with GDPR this is really pushing demands that data not leave the country.
Naturally, one can always spend the money to develop a custom system, but then we come back to the problem of competition and budget constraints. It is not easy to get such projects green-lighted, especially if some engineer comes along and suggests that they can just use some free software and put that developer time toward more important things.
But looking at data is fun, so I ended up creating my own super light counter that I run on my site so I can see hits. My goal was to store as little information as possible: only hit counts are stored, and no cookies are used at all.
I don't have any fancy graphs, but the numbers are interesting.
EDIT: all I discovered is that my blog gets pathetically few hits.
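For anyone curious what a counter that stores "only hit counts" and uses no cookies might look like, here is a minimal Go sketch. It is not the commenter's code; the `HitCounter` type and the `p` query parameter are hypothetical, and counts live in memory only.

```go
package main

import (
	"net/http"
	"sync"
)

// HitCounter keeps one integer per path and nothing else: no IPs,
// no user agents, no cookies. (Hypothetical type, in-memory only.)
type HitCounter struct {
	mu     sync.Mutex
	counts map[string]int
}

func NewHitCounter() *HitCounter {
	return &HitCounter{counts: make(map[string]int)}
}

// Hit records one view of path.
func (c *HitCounter) Hit(path string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.counts[path]++
}

// Count returns how many views path has had.
func (c *HitCounter) Count(path string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[path]
}

// ServeHTTP counts the path passed in the hypothetical "p" query
// parameter and replies 204 No Content. Cookies are never read or set.
func (c *HitCounter) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	path := r.URL.Query().Get("p")
	if path == "" {
		http.Error(w, "missing p parameter", http.StatusBadRequest)
		return
	}
	c.Hit(path)
	w.WriteHeader(http.StatusNoContent)
}
```

Mounting it with something like `http.Handle("/count", NewHitCounter())` and requesting `/count?p=/about` from each page would record a hit without touching cookies; a real deployment would also need to persist the counts.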
Analytics are a very useful tool to use in order to improve a product.
I find this really interesting, and am amazed that more sites and services don't surface some of their analytics data to users. Look at the success of yearly "wrap up" campaigns (disclosure: I work for a company with one of the most famous versions of that mechanic, but don't work on it).
You'll get users opting into some data and tracking if there's some tangible benefit to them on the other end. It seems like people love learning about their usage of products, and there's a lot of data that people would be happy to share if they got some benefit too.
For example, I know Google tracks when I click a link in a SERP - but now that they surface the "you've visited this X times, last time on Y", I'd happily opt into that data collection because of the pseudo-utility/interest factor of it.
The one that seems most natural is that organizations don't want people to know how much data they have on them. If too much of it were customer-facing and not wrapped up in a cool "2019 Wrap Up" video, pressure would mount for organizations to be more transparent about, and eventually accountable for, the data they collect.
I think there are a few others, like the value it offers to the bottom line. Most companies optimize heavily there, so the only real applications are the ones that drive more revenue, such as "Only 2 seats left!" or "Last One In Stock!" messaging based on urgency and fear. One-dimensional stuff.
I also look at it from a resources perspective. I think lots of companies spend time and resources pretty poorly. Companies I've worked with outside of startups often forget how and why they make money and end up spending lots of resources on things that might not matter. Service professionals, for example, usually rely on a network connection like the local Chamber of Commerce for business. Despite 80%+ of their business coming through that channel, they insist on trying social media or PPC ads instead of doubling down or identifying a similar network when they explore growth. This is natural ignorance that they can learn to overcome.
I really hope we get more data-sourced initiatives in the future. I use a few apps that do a little bit of it but leave a lot to be desired: Goodreads, Strava, Nike Run Club, Spotify, Audible, Kindle, & YouTube come to mind.
My dream is to have a Life Dashboard. I had designed it with some of these apps in mind, but the APIs and the output I'd get weren't enough to keep pursuing it when life got busy.
But I would hate to start seeing "You, personally, have visited this site 14 times" start cropping up because it would remind me how much information on me is available. Intellectually I know this data exists in Google Analytics, but actually seeing it would creep me out.
Maybe you should look at the analytics and determine your traffic sources. What pages are the most popular and in what markets?
But of course, there goes that dirty word "analytics" which "does nothing to help the users of your site"...
Log parsing seems like the logical choice for the static site crowd but it seems like there's little interest there. I must be missing something.
Any site is constantly being accessed by bots, only some of which announce themselves in the user agent. Some are deliberately designed to mimic human browsing, and you can only tell by carefully following their access pattern.
We host websites and the bots are super annoying, because even the well-behaved ones throttle requests per domain, which means they just hit all of our customers at once. If our cache architecture were a little more rotten, like I’ve seen on other jobs, then bot-driven evictions would get ugly, instead of just spiking our traffic, increasing our overhead, and making it harder to get clear metrics.
As others have pointed out:
- Client-side SPAs sometimes don't hit server logs
- Some static sites are hosted in places where you don't have access to the logs (GitHub Pages, Netlify, etc.)
- Bots are sometimes defeated by a simple JS file
Elevator pitch from `rational.markdown`
GoatCounter aims to give meaningful privacy-friendly web analytics for business purposes, while still being easy for non-technical users to use on personal websites. The choices that currently exist are between freely hosted but with problematic privacy (Google Analytics), hosting your own complex software or paying $19/month (Matomo), or extremely simplistic "vanity statistics" (Fathom).
GoatCounter attempts to strike a good balance between various interests. Major features include a free hosted version so people can easily add analytics to their personal website, an easy-to-run self-hosted option, an intuitive user interface, and meaningful statistics that go beyond "vanity stats" but still respect your users' privacy.
- what channels / sites / campaigns is my traffic coming from?
- what pages are people landing on?
- what pages are driving conversions?
- what do my conversion goals look like (percentage and total conversions)?
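The first question (which channel or campaign did a visit come from?) can be answered without cookies from two fields most trackers already see: the landing URL and the referrer. A rough Go sketch follows; the bucket names and the short host lists are illustrative assumptions, and real attribution rules are usually far longer.

```go
package main

import (
	"net/url"
	"strings"
)

// Channel buckets a visit by where it came from. Campaign tags win,
// then the referrer host decides between search, social and referral.
func Channel(pageURL, referrer string) string {
	if u, err := url.Parse(pageURL); err == nil {
		if src := u.Query().Get("utm_source"); src != "" {
			return "campaign:" + src
		}
	}
	ref, err := url.Parse(referrer)
	if referrer == "" || err != nil || ref.Host == "" {
		return "direct"
	}
	host := strings.TrimPrefix(ref.Hostname(), "www.")
	switch {
	case strings.HasPrefix(host, "google.") || host == "bing.com" || host == "duckduckgo.com":
		return "search"
	case host == "news.ycombinator.com" || host == "twitter.com" || host == "reddit.com":
		return "social"
	}
	return "referral:" + host
}
```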
I'm a little bit hesitant to look too much at GA, since I don't want to just make an "open source GA". At a lot of jobs I worked at we were essentially just "making a shit copy of a shit product", to put it crudely. I really want to avoid doing that.
So the approach may be quite different, but the goal of providing meaningful business insights is definitely there.
The four scenarios I listed above are important to understand when running the marketing side of a B2B SaaS company.
I don't care about the implementation details (other than that they're straightforward). But I need to understand that info. And today there's no obvious choice other than GA. This surprises me.
I agree that info is important, I just meant that the UI might turn out quite different from GA.
For startups there are plenty of options; Mixpanel is probably my favorite.
Google Analytics probably has more users because of ease of use for small business but I wouldn't say it's a space without competition.
If a company is serious about learning about its customers and how they use their products, then it invests in Adobe Analytics.
If a company is looking for something that's free, quick, and "good enough," then it goes with Google Analytics.
Just like if a company is serious about advertising, it hires a professional ad agency. If it's looking for something cheap and "good enough," it goes with Google AdWords.
- "Show me where my visitors are coming from"
- "Show me what landing pages are most popular... at driving conversion... by channel"
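Once each visit carries a landing page, a channel and a converted/not-converted flag, the "landing pages driving conversion, by channel" report is a group-by. A minimal Go sketch, assuming a hypothetical `Visit` record that some earlier step has already produced (building such records needs some way of linking page views into a visit):

```go
package main

import "sort"

// Visit is a hypothetical per-visit record: the page the visit landed
// on, the channel it came from, and whether it reached a goal.
type Visit struct {
	Landing   string
	Channel   string
	Converted bool
}

// Row is one line of a "landing page by channel" report.
type Row struct {
	Landing     string
	Channel     string
	Visits      int
	Conversions int
	Rate        float64 // Conversions / Visits
}

// ConversionReport groups visits by (landing page, channel) and orders
// the rows by number of conversions, highest first.
func ConversionReport(visits []Visit) []Row {
	type key struct{ landing, channel string }
	agg := make(map[key]*Row)
	for _, v := range visits {
		k := key{v.Landing, v.Channel}
		r, ok := agg[k]
		if !ok {
			r = &Row{Landing: v.Landing, Channel: v.Channel}
			agg[k] = r
		}
		r.Visits++
		if v.Converted {
			r.Conversions++
		}
	}
	rows := make([]Row, 0, len(agg))
	for _, r := range agg {
		r.Rate = float64(r.Conversions) / float64(r.Visits)
		rows = append(rows, *r)
	}
	// Ties are broken by landing page, then channel, so output is stable.
	sort.Slice(rows, func(i, j int) bool {
		a, b := rows[i], rows[j]
		if a.Conversions != b.Conversions {
			return a.Conversions > b.Conversions
		}
		if a.Landing != b.Landing {
			return a.Landing < b.Landing
		}
		return a.Channel < b.Channel
	})
	return rows
}
```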
Also, the hosted version isn't free, and self-hosting is comparatively expensive (vs. free) and time-consuming. IMHO any serious GA alternative should have a free hosted option. I wrote about that a bit more in depth yesterday over here: https://lobste.rs/s/ooag4u/goatcounter_1_0_release#c_o76csv
If you want to build analytics software on moral grounds, for privacy and the like, you will either bleed out or end up running a very niche or indie business. That's great for nomadic makers, but not for serious business.
Look at Matomo, Simple Analytics, or Fathom. They are all great (besides Matomo), but they can't compete in any market other than small business. And yes, I know that Matomo has enterprise clients, but they are also small compared to GA. :)
Want to compete with them? Have a great plan and support from a major search engine like DDG. If not, then you can make another Mixpanel (which is great!).
It's hard to get statistics on conversion when you don't even track users across pages on your own site. It looks like GoatCounter can't show you unique visitors or how they move around on your site because it doesn't track them. There are no cookies on the main page! This seriously limits the kinds of features that can be implemented.
I might make it an optional feature, too. Again, I need to look into it in depth.
There are a zillion-and-one things to do, and thus far other things have taken priority :-)
That said, having a unique but anonymous cookie within a single site isn't the end of the world, but Google only provides GA for free because it is useful for determining other things.
Having said that, there is no need to create an exact copy of Google Analytics, because most people probably use only 20% of its features anyway. Each business has its own use case and data sources, so it would be much more convenient to ingest all the raw event data into your data warehouse, using either third-party tools such as Segment or open-source tools such as Snowplow and Rakam. This is the only way to have full control over your data.
1. If you don't want to store sensitive user data, just don't send it to your servers.
2. Create the reports either using SQL or something like Rakam that provides you an interface similar to Amplitude / Mixpanel but on top of your data-warehouse so that you don't need to share your data with a third party service.
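Point 1 can be enforced in code right where events leave your app. A minimal Go sketch, assuming a hypothetical `Event` shape and an allowlist of property names: the IP is dropped, the URL is cut down to scheme, host and path, and unknown properties are discarded.

```go
package main

import "net/url"

// Event is a hypothetical analytics event about to be sent to a
// collector or data warehouse.
type Event struct {
	Name  string
	URL   string
	IP    string
	Props map[string]string
}

// allowedProps is an assumed allowlist of property names that are safe
// to send; everything else is dropped.
var allowedProps = map[string]bool{"plan": true, "button": true}

// Scrub returns a copy of e with the IP left empty, the URL reduced to
// scheme, host and path, and only allowlisted properties kept.
func Scrub(e Event) Event {
	out := Event{Name: e.Name, Props: make(map[string]string)}
	// An unparseable URL is dropped rather than sent as-is.
	if u, err := url.Parse(e.URL); err == nil {
		u.User = nil    // credentials embedded in the URL
		u.RawQuery = "" // query strings often carry emails or tokens
		u.ForceQuery = false
		u.Fragment = ""
		u.RawFragment = ""
		out.URL = u.String()
	}
	for k, v := range e.Props {
		if allowedProps[k] {
			out.Props[k] = v
		}
	}
	return out
}
```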
Shameless plug: I'm working for the company behind Rakam. (https://rakam.io)
Am I missing something?
GoatCounter seems really promising, I especially like the simple web interface, the selected programming language (Go) and that it should work with a simple SQLite database.
Keep up the great work!
If you need some ideas for features that are currently implemented in KISSS but not in GoatCounter (AFAIK):
- Request stats from multiple domains (e.g. see the number of page views for all domains combined)
- Request stats by different criteria (e.g. only show stats with referrer of Hacker News or Browser Firefox)
- Reports: Daily email or Telegram message with stats
- Telegram bot: Request stats via Telegram
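The first two items boil down to filtering stored hits and then aggregating per domain and across domains. A minimal Go sketch, assuming each hit is stored with its domain, path, referrer and browser; the `Hit` fields and filter helpers are hypothetical, not KISSS's or GoatCounter's API.

```go
package main

import "strings"

// Hit is a hypothetical stored pageview.
type Hit struct {
	Domain, Path, Referrer, Browser string
}

// Filter reports whether a hit should be counted.
type Filter func(Hit) bool

// ReferrerContains matches hits whose referrer contains s,
// e.g. "news.ycombinator.com".
func ReferrerContains(s string) Filter {
	return func(h Hit) bool { return strings.Contains(h.Referrer, s) }
}

// BrowserIs matches hits from the named browser, e.g. "Firefox".
func BrowserIs(name string) Filter {
	return func(h Hit) bool { return h.Browser == name }
}

// Views counts the hits that pass every filter, per domain and across
// all domains combined.
func Views(hits []Hit, filters ...Filter) (perDomain map[string]int, total int) {
	perDomain = make(map[string]int)
	for _, h := range hits {
		keep := true
		for _, f := range filters {
			if !f(h) {
				keep = false
				break
			}
		}
		if keep {
			perDomain[h.Domain]++
			total++
		}
	}
	return perDomain, total
}
```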
There's definitely a need for privacy-respecting analytics services that don't collect personal data. I hope you succeed with your project!
Matomo is quite good at this.
I will add docs on how to run it in "server mode", where instead of using a JS script you add an HTTP request in your app's middleware. This is an idea I had the other day and I did some research on it; it should work quite well (I haven't started work on it yet, though).
Regarding log analysis, it can be tricky to get right due to logrotate edge cases.
Say you run a SaaS and install this on your marketing site. You're still sending IP addresses and potentially identifiable information to a third-party processor.
We (on HN) don't consider IP addresses as PII, but from a purely practical standpoint ad data brokers are selling/bidding on IP addresses all the time which makes them more than nothing.
You (as the controller) also need to validate that processors (services that you are using) are in fact doing what they're saying. You'd still need a Data Processing Agreement in place with GoatCounter because otherwise Goat could start collecting additional information without your knowledge, start generating more metadata (GeoIP/Company) from the IP, etc.
I'm just saying it's not only about not collecting data, but the processes that surround it and safeguarding users and their privacy.
The current setup is similar to Fathom in that I temporarily track a user session by generating a unique hash for the user; then, if that hash has already been seen in the past 30 minutes, we move the hash to the latest page view instance and can increment a pages-viewed counter for the session. We can't tell which pages you've been on in the past, only that you started a session at time X and viewed N pages, with the last view at time Y.
Incidentally, I had a PoC for the data ingest running as a Cloudflare Worker using their KV storage. What could be interesting about that is that there'd be zero third-party widget code to inject into a webpage: You'd log the pageview in the worker and pass on the request.
But the market of WordPress users who want to paste a few lines of JS snippet into their site would be lost. And it would add a few hundred milliseconds to each request you want to log.
Hit me up on email and we can arrange something: email@example.com
I like privacy-aware analytic tools. So far I have been using https://www.privalytics.io
Similar in functionality, from what I can tell, but different in style.
For anyone interested, the site is https://www.videogamesbyyear.com, and if you want to see the statistics screen, I made them public here: https://videogamesbyyear.goatcounter.com/
Both solve an important problem, in my eyes.
It's nice if someone has already done some of that sort of stuff for you.
Better data export facilities is something that I intend to do soon-ish, and you could build your own UI with that, but it'll be based on data/DB sync rather than querying an API, which is quite a different workflow.
Perhaps I should clarify the README on this a little bit.
It might be worth clarifying by bringing this line up, especially as there are tons of "GDPR compliant" products that get fuzzy on the details pretty quickly.