Creator here. As a developer, I install analytics for clients, but I never feel comfortable installing Google Analytics because Google creates profiles for their visitors, and uses their information for apps (like AdWords). As we all know, big corporations unnecessarily track users without their consent. I want to change that.
So I built Simple Analytics. To ensure that it's fast, secure, and stable, I built it entirely using languages that I'm very familiar with. The backend is plain Node.js without any framework, the database is PostgreSQL, and the frontend is written in plain JavaScript.
I learned a lot while coding, like sending requests as JSON requires an extra (pre-flight) request, so in my script I use the "text/plain" content type, which does not require an extra request. The script is publicly available (https://github.com/simpleanalytics/cdn.simpleanalytics.io/bl...). It works out of the box with modern frontend frameworks by overwriting the "history.pushState"-function.
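A minimal sketch of the two techniques mentioned above, not the actual Simple Analytics script. The function names `buildPageviewRequest` and `hookPushState` and the payload shape are hypothetical. It relies on "text/plain" being one of the CORS-safelisted content types, which is why a cross-origin POST with it skips the preflight.

```javascript
// Build fetch() options for a pageview. "text/plain" is a CORS-safelisted
// content type, so a cross-origin POST with it does not trigger a
// preflight OPTIONS request the way "application/json" would.
function buildPageviewRequest(payload) {
  return {
    method: "POST",
    headers: { "Content-Type": "text/plain" },
    body: JSON.stringify(payload), // still JSON, only labelled as plain text
  };
}

// Wrap pushState so client-side route changes can be reported.
// historyObj is passed in (window.history in a browser) so the function
// can also be exercised outside a browser.
function hookPushState(historyObj, onNavigate) {
  const original = historyObj.pushState;
  historyObj.pushState = function (state, title, url) {
    const result = original.apply(this, arguments);
    onNavigate(url); // called after the original pushState has run
    return result;
  };
}
```

In a browser this would be wired up roughly as `hookPushState(window.history, (url) => fetch(endpoint, buildPageviewRequest({ url })))`, where `endpoint` is whatever collection URL you use. The server then has to parse the body as JSON itself, since the content type no longer says so.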
I am transparent about what I collect (https://simpleanalytics.io/what-we-collect) so please let me know if you have any questions. My analytics tool is just the start for what I want to achieve in the non-tracking movement.
We can be more valuable without exploiting user data.
First off: hats off for making a product that takes the rights of the end user seriously!
However, I am a bit confused as to who would want this product. The sort of questions this product answers seem quite limited:
1. What URLs are getting lots of hits?
2. What referrers are generating lots of hits?
3. What screen sizes are those hits coming from?
What decisions can be drawn from those questions? This seems useful only to perhaps some blog, where they're wondering what sort of content is successful, where to advertise more, and whether to bother making a mobile website.
Without the ability to track user sessions -- even purely in localStorage -- you can't correlate pageview events. For instance, how would I answer a question like:
- How many high-interest users do I have? By "high interest", I mean someone who visited at least three pages on my website.
- Is a mobile website really worthwhile? How much of an effect does being on mobile have on whether someone will be "high-interest"?
I should think some anonymized user ID system -- even if it rotates anonymous IDs -- should be able to answer these questions without compromising privacy.
Also, I'll leave it to others to point out it's unlikely this product is exempt from GDPR.
Since the creator points out that he doesn't store any IP addresses, he doesn't store any data that allows identifying an individual. For the GDPR to be applicable you need to store data that allows you to identify an individual.
Thus when you use this, you don't have to think about GDPR.
I'm not so sure. By putting this service's code on your website, you transmit personal data (IP addresses) to this third party. That appears to make the GDPR applicable here? Transmission is considered "data processing" under the GDPR.
Really, the central point that should be clear is that this is a question for lawyers. The GDPR is incredibly far-reaching.
The IP necessary for the connection itself is covered under necessary data, you can process it for the purpose of a request without needing consent at all. Same applies to shopping cart cookies or anything else that is essential to running a website and isn't being used for secondary purposes like data mining.
The key is to determine under which lawful basis you are processing that data. "Necessary data" is not a get out of jail free card. Because the analytics are not necessary to perform the contract (in any way that I can imagine), you can't claim contract lawful basis. Probably you are stuck with legitimate interest.
So I think you would have to notify the user that you are sending their IP address to the processor under legitimate interest and have a way for them to "object" to that use (i.e. turn off analytics). For legitimate interest, the objection can be after the fact, so having a configuration screen that stores a cookie that allows them to turn off analytics when they are on the site would probably do it.
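As a rough sketch of the opt-out cookie idea, assuming a settings screen that writes a cookie named `analytics_opt_out` (a made-up name) with the value "1". Whether this is enough for legitimate interest is still a question for a lawyer; the code only shows the mechanics.

```javascript
// Hypothetical cookie name; use whatever your settings screen writes.
const OPT_OUT_COOKIE = "analytics_opt_out";

// Parse a Cookie header (or document.cookie) into a plain object.
function parseCookies(cookieString) {
  const cookies = {};
  for (const part of (cookieString || "").split(";")) {
    const index = part.indexOf("=");
    if (index === -1) continue; // skip fragments without a value
    const name = part.slice(0, index).trim();
    cookies[name] = part.slice(index + 1).trim();
  }
  return cookies;
}

// Analytics are sent unless the visitor has objected.
function shouldSendAnalytics(cookieString) {
  return parseCookies(cookieString)[OPT_OUT_COOKIE] !== "1";
}
```

The check would run before any request to the analytics processor is made, so an objecting visitor's IP address is never transmitted for that purpose.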
Since in my case, the processor is me, there is no data being sent elsewhere. I don't have a notification since all data collected is either heavily anonymized in client if possible or on the server side or simply not identifying. Since the data I collect is used to optimize the website experience, I think it's a good enough legitimate interest with no privacy impact.
Here's a gdpr compliant system that answers complex questions. Hint: if your content is worthy, a part of readers will agree to reasonable analytics, and you can extrapolate from this.
I might be able to help because I wrote an analytics tool a while back that tracks these three properties and some other stuff
1. Knowing which URLs are being visited allows me to see if a particular campaign or blog site is popular
2. The referrer tells me where a user came from, this is helpful to know if I'm being linked to reddit and should allocate more CPU cores from my host to the VMs responsible for a particular service
3. The screen size allows me to know what aspect ratios and sizes I should optimize for. My general rule is that any screen shape that can fit a 640x480 VGA screen without clipping should allow my website to be fully readable and usable.
4. I also track a trimmed down user agent; "Firefox", "Chrome", "IE", "Edge", "Safari" and other. All will include "(recent)" or "(old)" to indicate version and other will include the full user agent. This allows me to track what browsers people use and if people use outdated browsers ("(old)" usually means 1 year out of date, I try to adjust it regularly to keep the interval shorter)
5. Page Load Speed and Connection. This is a number in 10ms steps and a string that's either "Mobile" or "Wired", which uses a quick and dirty heuristic to evaluate based on if a connection is determined to be throttled, slow and a few other factors. Mobile means people use my website with devices that can't or shouldn't be drawing much bandwidth, Wired means I could go nuts. This allows me to adjust the size of my webpage to fit my userbase.
6. GeoIP: This is either "NAm", "SAm", "Eur", "Asi", "Chin", "OcA", "NAf", "SAf", "Ant" or "Other". I don't need to know more than the continent my users live on, it's good enough data. I track Chinese visitors separately since it interests me.
Overall the tool is fairly accurate and high performance + low bandwidth (a full analytics run takes 4KB of bandwidth including the script and POST request to the server). It doesn't collect any personal data and doesn't allow accurate tracking of any individual.
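Items 4 and 5 in the list above could be coarsened along these lines. This is a guess at the approach, not the commenter's proprietary code: the function names are made up, the "(recent)"/"(old)" labelling is left out because it needs a version table, and rounding down (rather than to nearest) is an assumption.

```javascript
// Reduce a full user-agent string to a browser family. Order matters:
// Edge's UA also contains "Chrome", and Chrome's also contains "Safari".
// Browsers not listed here (e.g. Opera) will fall into a neighbouring family.
function browserFamily(userAgent) {
  if (/Edge?\//.test(userAgent)) return "Edge";
  if (/Firefox\//.test(userAgent)) return "Firefox";
  if (/MSIE |Trident\//.test(userAgent)) return "IE";
  if (/Chrome\//.test(userAgent)) return "Chrome";
  if (/Safari\//.test(userAgent)) return "Safari";
  return userAgent; // "other": the full string is kept, as described above
}

// Round a page load time in milliseconds down to 10 ms steps.
function loadTimeBucket(ms) {
  return Math.floor(ms / 10) * 10;
}
```

Doing this reduction before anything is stored means the precise values never reach the database.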
If I want to track high interest users, I collate some attributes together (e.g., Screen Size, User Agent, Continent), which gets me a rough enough picture of high interest stuff for what I care. You don't need to track specific user sessions; that stuff is covered under the GDPR and not necessary.
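One way to read "collate some attributes together" is sketched below. The field names (`screen`, `browser`, `continent`, `url`) and the function name are assumptions. Visitors who share all three attributes are merged into one group, so the result is only the rough picture the comment describes, and in practice you would run it over a short time window.

```javascript
// Count coarse visitor groups that viewed at least minPages distinct URLs.
// Each pageview is { screen, browser, continent, url } (hypothetical fields).
function countHighInterestGroups(pageviews, minPages = 3) {
  const urlsByGroup = new Map();
  for (const view of pageviews) {
    const key = [view.screen, view.browser, view.continent].join("|");
    if (!urlsByGroup.has(key)) urlsByGroup.set(key, new Set());
    urlsByGroup.get(key).add(view.url);
  }
  let count = 0;
  for (const urls of urlsByGroup.values()) {
    if (urls.size >= minPages) count += 1;
  }
  return count;
}
```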
Before anyone asks if they could have this tool; nope. It's proprietary and mine. The code I've written for it isn't hard, very minimal and fast. I wrote all this over a weekend and I use influx + grafana for the output. You can do that too.
Both mine and the product of the HN post are likely not in the scope of the GDPR since no data is collected that can specifically identify a user.
It absolutely isn't privacy-first if it requires running on someone else's machine and giving your users' data to them - another issue would be that while your server is in the EU, the hosting company is subject to US law, and all the stuff that comes with it (https://en.wikipedia.org/wiki/CLOUD_Act f.e.)
This looks great; I have bookmarked it for future projects.
I would, however, be a little more skeptical with tools claiming to be privacy-first than I would be with GA (who I presume are not privacy-first). On that note, some quick questions:
- Any plans to open source? I've used Piwik/Matomo in the past, and while I'm not a massive fan of the code-quality of that project, it's at least auditable (and editable).
- You say you're transparent about what you collect—IPs aren't mentioned on that page[0]. Are IPs stored in full or how are they handled? I assume you log IPs?
- How do you discern unique page-views? You seem to be dogfooding and I see no cookies or localStorage keys set.
- No plans to go open source with the backend, but I do show the code that is run in the browser. The visualisation of the data is not super important I think.
- I don't save IP addresses, not even in the logs.
- I don't have unique pageviews at the moment. I will in the future. If the referrer is the same as the current page, I will measure that as a non-unique. What do you think?
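The referrer heuristic could look something like this. Reading "the same as the current page" as "same hostname" is an assumption, and `isUniquePageview` is a made-up name. Note the limits: a visitor who arrives twice from external links counts twice, and a stripped referrer always counts as unique.

```javascript
// Treat a pageview as "unique" when the visitor did not arrive from
// another page on the same host.
function isUniquePageview(referrer, currentUrl) {
  if (!referrer) return true; // direct visit, or the referrer was stripped
  try {
    return new URL(referrer).hostname !== new URL(currentUrl).hostname;
  } catch {
    return true; // unparseable URL: treat the visit as external
  }
}
```

This needs no cookie, localStorage key, or IP address, which is the point of the proposal.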
If you don't go open source, will you at least offer paid self-hosting (similar to what e.g. Atlassian offers).
The idea of privacy is much easier to sell if the data never leaves your own server, instead of using some analytics provider that might be run by the CIA or the Russian mafia for all we can prove.
Apart from the unfortunate non-open-source answer, this sounds great!
I get others' concerns about wanting unique pageviews, but that metric is always a bit of a sketchy either-or for extremely privacy-conscious people. It's both an incredibly valuable metric, and also one that's difficult to square with complete privacy (basically it's always going to be pseudonymous at best).
If you need an open-source solution that truly cares about privacy yet can still keep track of unique pageviews, there's always Fathom Analytics (https://github.com/usefathom/fathom).
Have you considered using a shared-source license where they can inspect and build from source that they have to pay for? And where people can obtain the source freely for academic research and/or security reviews?
Shared-source proprietary goes as far back as Burroughs B5000 mainframe whose customers got the source and could send in fixes/updates. Microsoft has a Shared Source program. Quite a few suppliers in embedded do it. There's also a company that sells UI software which gives the source to customers buying higher-priced version.
I will warn that people might still rip off and use your code. Given it's JavaScript, I think they can do that anyway with reverse engineering. It also sounds like they could build it themselves anyway. Like most software bootstrappers or startups, you're already in a race with other players that might copy you with clean slate implementations. So, I don't know if the risk is that big a deal or not. I figured I should mention it for fairness.
Doesn't seem like a very useful measure of uniqueness.
What if you had one-day retention of IP addresses for per-day unique views? Seems like too important of a metric to eliminate completely, and one-day retention seems like a decent trade-off at the expense of being able to do unique analysis over longer time periods.
Not private enough as the space of IP addresses is too small.
Removing the last octet of IPv4 addresses before storing them should provide better privacy.
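For illustration, dropping the last octet is a one-liner. `truncateIPv4` is a hypothetical helper, and it only handles dotted-quad IPv4 strings.

```javascript
// Zero the last octet of a dotted-quad IPv4 address before storing it,
// so up to 256 addresses map to the same stored value.
function truncateIPv4(ip) {
  const octets = ip.split(".");
  if (octets.length !== 4) throw new Error("not an IPv4 address: " + ip);
  octets[3] = "0";
  return octets.join(".");
}
```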
I solved this in my SaaS by internally logging all the requests and then using the Measurement Protocol (https://developers.google.com/analytics/devguides/collection...) to send them from the server-side. While doing that I also set the last digit to 0 and unify user agents and other data that's not important for me.
When you can trivially crawl the input space like ipv4 addresses, you'd have to expire a fresh per-day salt as well.
But to my eyes, expiring salts isn't much different than deleting ip addresses after one day. Just more machinery. People have to trust that you're doing either, so why bother beyond being able to use the word "hashing" in marketing language?
You'd at least want per record salts. But even then it's trivial to check if a given ip is in the dataset. Better, but not great. (ie: you have access to the dataset, you want to check if a given ip/time match the log - read the salt, check the hash).
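A sketch of the per-day salt idea from this sub-thread, using Node's built-in `crypto` module. The function names are made up. `wasSeen` shows the weakness raised above: anyone holding both the salt and the stored hashes can test whether a given IP address is in the dataset, so the salt has to be destroyed when the day ends.

```javascript
const crypto = require("crypto");

// Generate a fresh random salt; the caller is expected to discard it
// (not archive it) once the day is over.
function newDailySalt() {
  return crypto.randomBytes(16).toString("hex");
}

// Hash an IP address together with the day's salt.
function hashVisitor(ip, salt) {
  return crypto.createHash("sha256").update(salt + ":" + ip).digest("hex");
}

// Count unique visitors for one day without storing raw addresses.
function countUniques(ips, salt) {
  return new Set(ips.map((ip) => hashVisitor(ip, salt))).size;
}

// With the salt in hand, membership of a specific IP is trivial to check.
function wasSeen(ip, salt, storedHashes) {
  return storedHashes.has(hashVisitor(ip, salt));
}
```

Because the IPv4 space is small, a leaked salt also lets an attacker hash every possible address and reverse the whole table, which is why hashing alone adds little over simply deleting addresses after a day.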
> You say that you do not store IP addresses, but why should anybody believe it?
I can show the code, I will probably do this in my next blog post, but that does not guarantee anything.
> Modern security is based on proof, not on trust.
Is it? So if there is a hosted version of an open source tool, you are sure they use the same code on the hosted version as in the open source tool? It's still based on trust.
Regardless of your intentions, you are collecting enough data to track users.
> I am transparent about what I collect ([URL])
That page doesn't mention that you are also collecting (and make no claim about storing) the globally-visible IP address (and any other data in the IP and TCP headers). This can be uniquely identifying; even when it isn't unique you usually only need a few bits of additional entropy to reconstruct[1] a unique tracking ID.
In my case, you're collecting and storing more than enough additional entropy to make a decent fingerprint because [window.innerWidth, window.innerHeight] == [847, 836]. Even if I resized the window, you could follow those changes simply by watching analytics events from the same IP that are temporally nearby (you are collecting and storing timestamps).
[1] An older comment where I discussed how this could be done (and why GA's supposed "anonymization" feature (aip=1) is a blatant lie): https://news.ycombinator.com/item?id=17170468
I think there's value in at least distributing the data that's collected. I may not like that the analytics provider has my data, but it seems like a lesser evil if that provider isn't also the world's largest ad company and they aren't using it to build profiles behind the scenes to track my every move across a significant part of the Internet.
Given the choice between a lot of data about me given to a small provider and somewhat less data about me given to Google, I'd generally choose the former.
That's not a good way to make a decision. Big or small doesn't matter. What matters is who is providing better security. When two parties, big and small, are collecting data, then the party which can act on security vulnerabilities quickly and has great security engineers and dedicated teams like Project Zero is the much better choice. People nowadays assume that a small, indie developer is a good guy. I am just pointing out that this is a very bad bias to have. Technicalities matter, security robustness matters. Google might be collecting data, but their security is really good. Good effort by this dev though.
I totally agree on the security aspect, but I think we're talking about different threat models.
Security matters if your concern is the data leaking to a potential malicious actor. The concern that I'm speaking to is the intended use of the data. Google is definitely going to use it for ad targeting and building a "shadow profile", but a small developer probably won't. This one says they won't, but even if they do they're likely to be much less effective than Google would be.
Probably. Wow, you used the word "probably". I guess you aren't aware of the many cases where, once a Chrome extension gets popular, indie developers are contacted by some company, and many have sold their extension or let the company collect data. Also, this data gets sold to third parties; many such cases with small-to-medium websites have occurred. Remember Unroll.me.
Also, Google knows how to make profiles, and it knows the importance of that data and of keeping it safe. It is also somewhat answerable to consumer groups, users, shareholders, and regulatory bodies.
An indie dev doesn't know how to make a good profile and is more likely to sell the data to make revenue. I'm not ridiculing indie devs, just your assumption that a solo dev is an angel.
I'm curious what your concern with Google building this 'shadow profile' is if you're not worried about this data being leaked to a malicious actor - Is Google simply having this data a bad thing, and if so, why?
I know Google creates global profiles for tracking - and my question (which is the same as my original question) is why do you care? If that data is only used internally by google to serve you better ads why are you concerned with them having your data?
Even if a user trusts Google, because the data is digital and therefore permanent, there's no guarantee it will remain internal forever, whether that's because of a hack, a rogue employee, police/government pressure, or a change of ownership.
It seems to me that, with the exception of a rogue employee, all of those examples are at a greater risk of occurring with a small, independent provider. Google almost certainly has more security resources, more legal resources and political clout, and isn’t likely to be acquired any time soon.
I can’t say I love having Google track me, but I don’t feel any better about someone else doing it either.
If the marketplace was full of independent trackers (which I'm not suggesting is a good idea, because third party trackers are bad in the first place), then as they get compromised, only a small subset of data is lost... The chance of losing everything or enough data to pair to your real identity is a lot lower. It's like IDs in physical activity. If you visit your bank they track you by a different id to the library, your medical record, etc, each might be lost individually and be upsetting, but do they reveal data about all the others? No.
Why is Google security better than anyone else? Monopolies often have more resource, but lack motive, because they are a monopoly. Without transparency we have no idea how secure Google's systems are, but we do know Google has been hacked before.
It's not just bad from a privacy PoV. By giving away signals to GA, you're actually underselling your users' data. Google can correlate your analytics with others' to place highly targeted ads for your visitors on other sites, stealing the attention your high-quality content generates, such that sites with deep pockets for Google ad bidding and placement but otherwise only low-effort content (and Google itself, of course) make all the money.
> When two parties, big and small, are collecting data, then the party which can act on security vulnerabilities quickly and has great security engineers and dedicated teams
This cannot be stressed enough. At my day job I write reasonably secure software on a team for big clients, then at home I write reasonably secure software independently for small clients.
Come new security issue, the big clients at day job get first priority. Not because they are big and not because they are paying more, but rather because as a team we can reallocate resources and work on issues in parallel. At home, there is only one Dotan to work on each independent client in series.
I think how the data is used is also a big factor.
There is 'justice' in the blog creator using analytics data to improve the experience of blog visitors: a user's data will, theoretically and in aggregate, create a better experience for that user in the future. The class of 'users who browse this page' gets a benefit from the cost of providing data.
Selling browsing information to advertisers is sort of 'anti-justice'. Using blog visitor data to track and more effectively manipulate those visitors elsewhere on the internet into paying people money. The blog visitor's external online experience is made worse by browsing that blog.
Good comment! I only store the window.innerWidth metric. I updated the what we collect page (https://simpleanalytics.io/what-we-collect) to reflect the IP handling. We don't store them. And fingerprinting is something that would be definitely tracking, not on my watch!
First, "IPs" might be confusing; "IP addresses" would be more accurate.
More importantly, you have to collect IP addresses (or any other value in the packet headers[1][2]), even if you don't store them, if you want to receive any packets from the rest of the internet. Storage of those values is a separate issue entirely, and it's good to hear that you are intending to NOT store IP addresses (and updating the documentation)!
Also, I strongly recommend using Drdrdrq's suggestion to lower the precision of the collected window dimensions, which should be done on the client i.e. "Math.floor(window.innerWidth/50)*50". This kind of bit-reduction makes fingerprinting a lot harder.
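Wrapped in a small helper (the name `bucketDimension` is made up), the expression above looks like this. Using the window size quoted earlier in the thread, 847 and 836 both become 800.

```javascript
// Round a pixel dimension down to a multiple of `step` on the client,
// so the precise value never leaves the browser.
function bucketDimension(pixels, step = 50) {
  return Math.floor(pixels / step) * step;
}

// In a browser: bucketDimension(window.innerWidth)
```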
I would argue that in the conversational context "collect" is more a synonym for "store" than for "receive" or "see". Moreso in the context of a tracking system. In my opinion anyway.
There is absolutely no reason to collect and store window dimensions, other than for fingerprinting and tracking. Sure it might be an interesting piece of trivia for the dev, but it's not necessary for the dev to "make sure the website works great on all of those dimensions", since that much is already obvious and presumed when making websites these days.
Actually there is, this is one of the most interesting metrics. It doesn't have to be precise though, rounding to nearest 50px would be more than enough. I would argue that height and aspect ratio should be collected too. (I didn't downvote you FWIW)
Could you round to buckets as well - take the 10 (say) most common screen sizes, and round users to the nearest? That way users with odd screen sizes aren't identified.
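A sketch of that variant: snap each width to the nearest entry in a short list of common sizes. The list below is purely illustrative, not measured data; in practice it would come from your own aggregate stats.

```javascript
// Hypothetical list of common viewport widths, in ascending order.
const COMMON_WIDTHS = [360, 375, 414, 768, 1024, 1280, 1366, 1440, 1536, 1920];

// Return the candidate closest to `width`; ties go to the smaller one.
function nearestCommonWidth(width, candidates = COMMON_WIDTHS) {
  return candidates.reduce((best, candidate) =>
    Math.abs(candidate - width) < Math.abs(best - width) ? candidate : best
  );
}
```

Compared with fixed 50px steps, an unusual window size disappears into a popular bucket instead of sitting alone in a rare one.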
Could there not be value in knowing how many pixels your users have available to view your things? You could presumably get that information from device characteristics but then could also presumably use that for fingerprinting.
You as the developer have access to and are aware of all possible display dimensions and aspect ratios. It's not that hard to prioritize the sizes you want to support and then work based off that. There are plenty of tools out there that let you simulate different screen sizes for testing too. I don't see this information providing any extra value.
But you are assuming the users browse the website in full screen mode/maximised. Whilst true for most mobile devices, this is certainly not given on desktops.
Besides... optimizing a site for specific window dimensions? If I see conversion rate is lower on a certain band of dimensions, something likely doesn't display properly. It'd be impossible to test every dimension.
> That page doesn't mention that you are also collecting (and make no claim about storing) the globally-visible IP address (and any other data in the IP and TCP headers). This can be uniquely identifying; even when it isn't unique you usually only need a few bits of additional entropy to reconstruct[1] a unique tracking ID.
This is true. The legal department for the healthcare web sites I maintain doesn't let me store or track IP addresses, even for analytics.
I'm only allowed to tally most popular pages, display language chosen, and date/time. There might be one or two other things, but it's all super basic.
> That page doesn't mention that you are also collecting (and make no claim about storing) the globally-visible IP address
I’m not the OP, but where is there evidence that they’re storing the IP? Sure it’s in the headers that they process but that doesn’t mean they’re storing it.
How are you storing all the information that analytics users want to know i.e. (What devices, what languages, what geolocations, what queries, what page navigations and clicks, etc.)
After reading what you collect, I'm assuming you are doing a lot of JS sniffing of browser properties to gather this information along with IP address analysis, is that correct? Or what are your plans for these features if you don't have them now?
Overall though I'd say great design + sales pitch. I think if the product delivers on enough features you will have something here. Great job!
Just a heads up, HN comments only use a (I think) small subset of Markdown for formatting, but your link will work as is without having to wrap it in [] and adding the ().
Anyway, cool project! I've always felt the same about using GA given I actually like to pretend I have some sort of privacy these days, and always have an adblocker on, so I hated setting it up for people. Definitely will be keeping an eye on this the next time someone asks me to set up GA.
Good Sir, props to you for including a noscript/image tag in the default code. Google Analytics didn't do it for the longest time, and in fact may still not do it.
Whether on purpose or by accident (or simply by mental bias) they seriously misrepresent the amount of people for whom JavaScript is blocked, not loading, disabled by default for unknown websites (me) or not available for any other reason.
Website owners and creators should at least have that information as a reliable metric to base their development choices on.
This is pretty much exactly what I have been looking for. I recently ditched Google Analytics and all other possible third party resources (except for YouTube which I implemented a click to play system) on my blog (consto.uk).
I just have a quick question. What subset of the javascript implementation does the tracking pixel provide? If all that is missing is screen size, I might just choose that to avoid running third party code. For performance, I combine, minify, and embed all scripts and styles into each page, which lets me achieve perfect scores in the Chrome Auditor.
Could I ask what tech you're using for the graph data? I'm working on a similar SaaS (not analytics) which requires graphs. I'm a DevOps engineer for an ISP, and I do a lot of work with things like Graphite/Carbon, Prometheus and so on - but I can't seem to settle on what to use for personal projects. Do you use a TSDB at all? Or are you just storing it in SQL for now?
> like sending requests as JSON requires an extra (pre-flight) request, so in my script I use the "text/plain" content type, which does not require an extra request.
At my work (The New York Public Library), we created a “Google Analytics Proxy” that receives requests and then proxies them to Google’s Measurement Protocol so you still get the benefit of using Google Analytics but can control exactly what’s sent/saved in real-time.
It’s intended as a mostly drop-in replacement for the GA analytics.js API and to be used as an AWS Lambda.
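To show what "control exactly what's sent" could mean, here is a sketch of an allowlist filter in front of the (Universal Analytics) Measurement Protocol. It is not the library's actual code. `buildProxiedHit` and the choice of allowed fields are assumptions; `v`, `tid`, `cid`, `t`, `dp` and `dr` are standard Measurement Protocol parameters, and `uip` is the one that would pass the visitor's IP address along.

```javascript
// Measurement Protocol parameters this hypothetical proxy forwards:
// protocol version, property ID, client ID, hit type, page path, referrer.
const ALLOWED_PARAMS = ["v", "tid", "cid", "t", "dp", "dr"];

// Build the form-encoded body to forward. Anything not on the allowlist
// (for example "uip", the IP override, or "ua") is silently dropped.
function buildProxiedHit(incoming) {
  const params = new URLSearchParams();
  for (const name of ALLOWED_PARAMS) {
    if (incoming[name] !== undefined) params.set(name, String(incoming[name]));
  }
  return params.toString();
}
```

Because the request to Google originates from the proxy, Google sees the proxy's address rather than the visitor's unless the proxy chooses to forward it.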
I've moved away from using any kind of script embedded in my webpages for tracking and instead just use Goaccess (https://goaccess.io/) to analyze my logs. Though there are obvious caveats with this, you need to install it, configure the server logging to match it and so on. But personally the benefits outweigh the cons: it all runs on the server, you are the sole owner of all the data, and this tracking doesn't require any kind of JS on the webpage.
Wow, it's been a long time since I've seen one of these. It's like the olden days with Urchin (what eventually became Google Analytics). They analyzed log files prior to the Google acquisition. IIRC you could buy whatever the current version was (e.g., Urchin 2) for a flat fee and use it forever. There were free alternatives, but I liked Urchin's UI and features the best at the time.
Anyone remember what the price was? I want to say it was something like $60-$100, but my memory could be conflating it with something else.
Isn't there a problem with GDPR compliance if you want to serve European pages? You are allowed to log IP addresses for security reasons. However, as far as I understand the situation, you need the agreement of the users if you use their personal data, which includes IP addresses, for anything else.
Has somebody figured out how to resolve this situation with log files?
You can use goaccess to create a log every day to json, excluding IP while retaining stats for geolocation.
For this you can logrotate daily and run goaccess before rotation. I believe you can keep the server logs for a week for debugging while respecting GDPR.
For today's "realtime" data you can use goaccess on today's log on demand and use a cache.
You can write your custom stat viewers or use goaccess to view time range data from multiple json files.
Goaccess is amazing and, in a world where seemingly every technology touts itself as "lightweight" (whether they really are or not), truly is very light weight.
I don't understand why Google Analytics works at all nowadays: A large percentage of visitors uses an adblocker and don't they block tracking and analytics by default?
Users like me must be complete ghosts unless one looks in their real server logs!
I LOVE Goaccess and highly recommend it as well. My single complaint is the lack of ability to filter/define a date and time range. I know there is an issue for it but last time I checked it had been open for quite some time :(.
There's a kind of a workaround. I rotate my logs with logrotate weekly, so the current week's logs are in access.log (and access.log.1) and past logs are in access.log.x.gz files. Then I run goaccess twice (once for .log and once .gz) to get both "all" and "latest" stats. It's not as flexible as a real filter, but it works for me.
Just curious, are you using the web based UI to look at your data or the CLI? I use the web UI so I'm wondering how something like this might work with that. I'll have to poke around. Thanks for sharing!
I'm also using the web UI. I create "example.html" and "example-latest.html" with a daily cron job (the way I described) and move it into my web directory (behind a http auth).
I planned to do a write-up for a while now, I should finally get it done (my blog link is in my profile)
This looks awesome - I'm curious if anyone has found a good way to use this with Kubernetes. You can choose where to ship your cluster logs, so it should be possible.
From what Simple Analytics says they collect on their website, it sounds like the only information missing from GoAccess (or server logs in general) is screen width.
To everyone complaining about the price point for this service.
You are part of The Problem.
This is a solo dev's venture, that has a relatively pure and straightforward goal. If you can't afford it, don't use it and pick one of the others.
Do NOT compare this with a B2C offering that has nothing to do with analytics.
Do NOT compare this with a B2B offering that's free and feeds your user's data into the parent corporation's advertising revenue stream.
Do NOT compare this with a B2B offering that is open-source, with a team of a dozen core contributors that has had a decade of development under its belt.
Heh...I had the opposite reaction to the price. As someone building something in the analytics space, $12/mo seems so low that it won't get traction beyond the hobbyist demographic. If you want to sell to business, the price point needs to be at least $200/mo.
Plus, I have zero confidence that someone using a naive postgres implementation can scale an analytics backend with customers paying only $12/mo unless all those customers get barely any traffic. Perhaps if he was using Timescale on top of postgres, but even then, $12/mo seems awfully low.
But as it is, the price point signals that he doesn't think it's a particularly valuable service.
How do you know the postgres implementation is naive? I've worked on several analytics platforms...including offshoots of google analytics within Google itself, and this problem domain is ridiculously easy to shard on natural partitions. And after sharding, you can start to do roll-ups, which Google Analytics does internally.
By 2014 when I left, we had a few petabytes of analytics data for a very small but high traffic set of customers. Could we query all of that at once within a reasonable online SLA? No. We partitioned and sharded the data easily and only queried the partitions we needed.
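To make "shard on natural partitions, then roll up" concrete, here is a toy in-memory sketch. It is not how any of the systems mentioned were built. It assumes events shaped like `{ siteId, timestampMs, path }` and uses one partition per site per UTC day; a real system would do the same grouping in the database.

```javascript
// Natural partition key: one bucket per site per UTC day.
function partitionKey(siteId, timestampMs) {
  const day = new Date(timestampMs).toISOString().slice(0, 10); // "YYYY-MM-DD"
  return siteId + ":" + day;
}

// Roll raw events up into per-partition, per-path counters. After this,
// the raw events for a closed day can be dropped or archived.
function rollUp(events) {
  const partitions = new Map();
  for (const { siteId, timestampMs, path } of events) {
    const key = partitionKey(siteId, timestampMs);
    if (!partitions.has(key)) partitions.set(key, new Map());
    const counts = partitions.get(key);
    counts.set(path, (counts.get(path) || 0) + 1);
  }
  return partitions;
}
```

A dashboard query for one site and one week then touches seven small partitions instead of the whole event table, which is the property the comment relies on.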
If I were to do this now and didn't need near real-time (what is real-time?) I'd use sqlite. Otherwise I'd use trickle-n-flip on postgres or mysql. There are literally 10+ year-old books[1] on this wrt RDBMS.
And yes, even with 2000 clients reaching billions of requests per day, only the top few stressed the system. The rest is long tail.
There's a comment elsewhere in this thread where he talks about his backend. He didn't explicitly say it was naive, but he definitely gave off that vibe. Is it possible to use postgres in a sophisticated way to work as an analytics store? Sure...Timescale does it and gives you the majority of what you'd need. But it's hard to get right and the creator hasn't given the impression that he's well-versed in this space.
Incrementing counters for pageviews, visited page, referrer and page width, and putting that into chartjs is something I can put together myself in two hours. It wouldn't be nearly as polished, but it would be 90% there and good enough. Plus I would have a much better idea how well it scales, and generally have fewer unknowns and risks.
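For reference, the counter part of that could look roughly like the sketch below. It keeps everything in memory and the `hit` payload shape (`path`, `referrer`, `width`) is an assumption; persistence, charting, and bot filtering are left out, and those are arguably where the remaining effort goes.

```javascript
// Minimal in-memory counters; a real service would persist these somewhere.
function createCounters() {
  return {
    pageviews: 0,
    pages: new Map(),
    referrers: new Map(),
    widths: new Map(),
  };
}

// Add one to the count stored under `key`.
function increment(bucket, key) {
  bucket.set(key, (bucket.get(key) || 0) + 1);
}

// `hit` is a hypothetical payload: { path, referrer, width }.
function recordHit(counters, hit) {
  counters.pageviews += 1;
  increment(counters.pages, hit.path);
  if (hit.referrer) increment(counters.referrers, hit.referrer);
  if (hit.width) increment(counters.widths, String(hit.width));
}

const counters = createCounters();
recordHit(counters, { path: "/", referrer: "news.ycombinator.com", width: 1280 });
recordHit(counters, { path: "/", referrer: "", width: 1440 });
recordHit(counters, { path: "/pricing", referrer: "news.ycombinator.com", width: 1280 });
console.log(counters);
```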
The goal is great, the design is sleek, but at the current price point (which is already lowered to $9) and feature set, it's just not worth it to me. For that price the tool has to provide more actionable data or other value.
There's probably a market out there, but most of that market is probably not the type of person you will find on hacker news.
Dropbox is a glorified FTP client, Slack is IRC with a nice skin and inline pictures, Spotify is bittorrent without the hassle of downloads (and more legal, but that hasn't stopped people). Convenience matters, and convenience sells. But it is only one of many factors that play into the decision to purchase. Dropbox for example wouldn't sell all that great without the free tier.
If you honestly think that Dropbox is just glorified FTP, or Slack is just IRC with pictures, or Spotify is JUST bittorrent (or Napster even), I question if you've ever actually used any of those services. They provide so much more than the alternatives you listed.
I made an argument about this offering vs other offerings, not this offering vs rolling your own, which is a whole different issue.
It's incredible how many developers undervalue their own time, effort, and liability. I believe you're mistaken if you think this'll only take two hours of your time. Even if that's true, I believe you're mistaken that your two hours of time is cheaper than $9/mo. I'm also certain that being responsible for analytics sets you up for liabilities and maintenance that distracts from your main value proposition.
There's definitely a market for this, and that market is absolutely here, but apparently a lot of developers don't know how to pick and choose their battles.
> I made an argument about this offering vs other offerings
No you didn't. You argued what it shouldn't be compared to, so I compared it to something else.
> I'm also certain that being responsible for analytics sets you up for liabilities
Outsourcing analytics opens me to the same or worse legal liabilities.
> Even if that's true, I believe you're mistaken that your two hours of time is cheaper than $9/mo.
I don't live in Silicon Valley, so $9/mo pays for two hours within a few months. Sure, there's hosting and an uncertain maintenance burden, but on the other hand buying a service has its own uncertain overheads.
What happens if the service is down, what if it fails to scale, what if it gets hacked? What if it just disappears because there weren't enough customers? Most of these are much easier to answer and take up less time with a self-built service.
You can choose to delegate the uncertain overheads to a company that relies (and specializes) on dealing with them. Or you can roll it yourself and add to your burden.
Code is not an asset, it's a liability. And I mean that from a pure responsibility standpoint, not just from a legal responsibility standpoint.
Sure, code is a liability. Having a dependence on something outside your control is also a liability. Those two have to be weighed against each other. I'm not against buying services on principle; I buy services from plenty of SaaS providers. I'm simply arguing that in this case the scales tip in the wrong direction for me, and likely a lot of other people like me.
It looks like anyone can see the stats for any domain using the service without any authentication. I added the tracking code to my domain and was able to hit https://simpleanalytics.io/[mydomain.co.uk] without signing up or logging in. I was also able to see the stats for your personal site.
Is that intentional? If it is, it seems like an odd choice for a privacy-first service. If not, it seems like quite a worrying oversight in a paid-for product.
It says there's the ability to make them public, but it doesn't mention that they'll be public by default. Maybe it's different if you sign up first before adding the tracking code, but it's odd that I can use the tracking code without signing up for the trial.
There are open source alternatives that do similar things, but I want to spare people the hassle of setting up servers, maintaining their versions, and missing updates if they don't. See it as a non-self-hosted solution, like Heroku is for deployment. I believe it should be as simple as installing a Google Analytics code.
I think there are a lot of misconceptions about how Google Analytics tracking works. I'm pretty sure a vanilla GA setup does not, in fact, create profiles that track you across the web. For one thing, all the cookies it creates are first-party (on your domain).
Agreed, the first-party cookie is pretty self-evidently not a web-wide tracker.
There are lots of config options. Here's what I like to use:
// Google Analytics Code.
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','https://www.google-analytics.com/analytics.js','ga');
window.ga=window.ga||function(){(ga.q=ga.q||[]).push(arguments)};
// https://developers.google.com/analytics/devguides/collection/analyticsjs/field-reference
ga('create', 'UA-XXX-XX', 'auto', {
  // The default cookie expiration is 2 years. We don't want our cookies
  // around that long. We only want just long enough to see analytics on
  // repeat visits. Instead, limit to 31 days. Field is in seconds:
  // 31 * 24 * 60 * 60 = 2678400
  'cookieExpires': 2678400,
  // We don't need a cookie to track campaign information, so remove that.
  'storeGac': false,
  // Anonymize the ip address of the user.
  'anonymizeIp': true,
  // Always send all data over SSL. Unnecessary, since the site only loads on
  // SSL, but defense in depth.
  'forceSSL': true
});
// Now, record 1 pageview event.
ga('send', 'pageview');
> When a customer of Analytics requests IP address anonymization, Analytics anonymizes the address as soon as technically feasible at the earliest possible stage of the collection network. The IP anonymization feature in Analytics sets the last octet of IPv4 user IP addresses and the last 80 bits of IPv6 addresses to zeros in memory shortly after being sent to the Analytics Collection Network. The full IP address is never written to disk in this case.
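To illustrate what that masking amounts to, here is a small sketch that follows only the quoted description (zero the last octet for IPv4, zero the last 80 bits for IPv6). Google's actual implementation is not public, so the function names and parsing details here are assumptions.

```javascript
// Zero the last octet of a dotted-quad IPv4 address.
function anonymizeIPv4(ip) {
  const octets = ip.split(".");
  if (octets.length !== 4) throw new Error(`not an IPv4 address: ${ip}`);
  octets[3] = "0";
  return octets.join(".");
}

// Expand "::" so the address always has eight 16-bit groups.
// Assumes plain hex groups (no embedded IPv4 suffix, no zone id).
function expandIPv6(ip) {
  const [head, tail] = ip.split("::");
  const headGroups = head ? head.split(":") : [];
  if (tail === undefined) {
    if (headGroups.length !== 8) throw new Error(`not an IPv6 address: ${ip}`);
    return headGroups;
  }
  const tailGroups = tail ? tail.split(":") : [];
  const missing = 8 - headGroups.length - tailGroups.length;
  if (missing < 1) throw new Error(`not an IPv6 address: ${ip}`);
  return [...headGroups, ...Array(missing).fill("0"), ...tailGroups];
}

// Keep the first 48 bits (three groups) and zero the remaining 80 bits.
function anonymizeIPv6(ip) {
  const groups = expandIPv6(ip);
  return [...groups.slice(0, 3), "0", "0", "0", "0", "0"].join(":");
}

console.log(anonymizeIPv4("203.0.113.42"));
console.log(anonymizeIPv6("2001:db8:85a3:8d3:1319:8a2e:370:7348"));
```

Note that the masking only helps if it really happens before anything is logged or stored, which is exactly the part an outsider cannot verify.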
Why do we trust this statement? It's coming from a company that plays loose with the law and has had some of the biggest fines ever thrown at it. Sorry, but with no way to validate this claim, it is meaningless.
I'm well aware that all we have is "certification" and "audit" programs to verify their claims. I am also aware that these are less than perfect, and that they have been found to be misleading/"lying" before and appeared to prefer large fines rather than fix the issue. It is 100% likely that their public statements don't match reality perfectly.
I posted the quote because there seemed to be a lack of understanding that this feature even exists in GA. The author of the Show HN post didn't even have a statement on how IP address logging (and various other PI in the GDPR sense) was handled when it was originally posted.
BTW, I think it's great that someone is starting fresh with privacy in mind, but even with them we will still have no way of trusting what they do with the packets sent their way...
Google doesn't need cookies. They can profile your browser and check the source IP to track you around the web. Given that Google's business is information, it seems unlikely that they aren't doing this very easy form of tracking.
I assume that it's more of a feeler/prototype than a real product, but even then it is really basic and through that it's ultimately useless.
A Summary page should show traffic volume, who exactly is driving it and where it arrives. That's the bare minimum needed to make shown information actually _useful_ and _actionable_. Things like "Top Domain Referrers" and "Top Pages" are aggregate vanity metrics, their effective utility is zero. If you have a spike in traffic, you want to know the reason and with your current design you can't.
These are helpful comments, I will make it more actionable, but please also understand I need to test if there is a market for it, first. So that is what I'm doing now and I will improve the product to show actionable information. Just give me some time.
I am using fathom [1] for this. They allow hosting the backend yourself and your analytics are not publicly accessible. Biggest con is that each installation can only track one domain as of now.
Are there any plans to support SRI? It's a pretty big security risk to incorporate 3rd party JS onto all pages - if someone compromises your CDN account then they have full control over every site that's using this code.
This is one of the top ways that credit card breaches are happening lately - e-commerce sites include tons of 3rd party tracking / analytics / remarketing / etc code on their checkout pages, one of them gets hacked and the modified JS posts the credit card form to some compromised server.
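For anyone unfamiliar with SRI: the `integrity` attribute is a base64 digest of the exact script bytes, and the browser refuses to run the script if the digest does not match. A minimal sketch of computing one with Node's built-in `crypto` module follows; the script body and the CDN URL are placeholders, not the real Simple Analytics script.

```javascript
const crypto = require("crypto");

// Compute a Subresource Integrity value for a script's exact bytes.
function sriHash(content, algorithm = "sha384") {
  const digest = crypto.createHash(algorithm).update(content).digest("base64");
  return `${algorithm}-${digest}`;
}

// Placeholder script body; in practice this is the exact file the CDN serves.
const scriptBody = "console.log('hello');";
const integrity = sriHash(scriptBody);

// The URL below is a placeholder, not a real script location.
const tag =
  `<script src="https://cdn.example.com/app.js" ` +
  `integrity="${integrity}" crossorigin="anonymous"></script>`;
console.log(tag);
```

The trade-off is that the hash pins one exact version of the file, so the provider can no longer ship updates to the script without every customer updating their tag.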
I don't doubt your intentions, but I simply don't believe that any kind of user analytics as-a-service is ever going to be good enough privacy wise.
Do you know what isn't creepy and privacy invading? Analysing the attributes of the visitors to FranksKebabShop.com, as part of the tooling that runs FranksKebabShop.com.
This could be analysing web server/cache logs. It could be a more active piece of software that operates via JS and reports back to a service running on the same domain.
I know, I know "everything is SaaS now, nobody installs software". Nobody can install it if you don't make it installable. Be part of the solution not part of the problem.
I also don't understand what this javascript library does. "We just show you the essentials: pageviews, referrers, top pages, and screensizes." The information is already there in the server logs, which gives even more accurate numbers. Why would I want a slower website, with less precise tracking? The only new information is screensizes, so I don't see how it's worth any effort to install a library like this.
I personally wouldn't use one that isn't OSS, but plenty of people don't care about that, but do care about privacy, including the privacy of their site visitors.
You already have very brief comments at strategic points. If you would explain these one by one, I would learn a lot about optimizing for number of requests, skipping stuff to load, etc. Maybe a technical blog post at a later time when the dust settles?
For a project of mine I created an 'actions' table in my database. For every visit (only server-side data) I make an entry into that table. That way I keep track of key metrics that I am interested in (basically which page is loaded and where did the visit come from?). I also store the request id so that I can differentiate between different visits. Entries into this table are made in an new thread in order to prevent any issues or slow-downs on that end to influence load-times, etc too much. Works very well.
I like this kind of approach way more. It's simple, you don't need to rely on anyone else, and you have complete control over what data you ingest and how you assess it.
Thank you, I like it a lot too. The nice thing about 'just' SQL is that I can write queries for the insights that I need, when I need them. I can store the queries for later use, I could automate some stats into my inbox, etc. With other tools you are just stuck with whatever they give you. The only thing I would like is an app where I can quickly get to my stats when I go around. Might be a fun side-project for the weekend.
Haha, yes I just use SQL queries as harianus suggested. How does he know? I assume getting to the top of the front page and staying there for a few hours makes you a bit quick on the draw with responses to comments. Good for him.
Screen sizes is ambiguous here - are you measuring viewport width (`window.innerWidth` - helpful) or the display the window happens to be on (not too helpful)? Also something to make that data useful would be show the range of sizes, instead of the top specific size. E.g. 1280 may be _the most popular_ but there may be more users using larger width windows, just more variation in those sizes (1320, 1440, etc), so a top level range could be a nice differentiator here.
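A quick sketch of the range idea, with made-up breakpoints and sample data (in a browser the widths would come from `window.innerWidth`; valid numeric widths are assumed):

```javascript
// Hypothetical breakpoints; pick ranges that match your own layouts.
const BUCKETS = [
  { label: "<1024", max: 1023 },
  { label: "1024-1280", max: 1280 },
  { label: "1281-1600", max: 1600 },
  { label: "1601+", max: Infinity },
];

// Find the first range whose upper bound covers the width.
function bucketFor(width) {
  return BUCKETS.find((bucket) => width <= bucket.max).label;
}

// Count how many widths fall into each range.
function countByBucket(widths) {
  const counts = {};
  for (const { label } of BUCKETS) counts[label] = 0;
  for (const width of widths) counts[bucketFor(width)] += 1;
  return counts;
}

// Sample data: 1280 is the single most common width...
const sample = [1280, 1280, 1280, 1320, 1366, 1440, 1536, 1600, 1920];
const counts = countByBucket(sample);
console.log(counts);
```

In this sample 1280 is the top individual size with three hits, yet the 1281-1600 range holds five visitors, which a "top sizes" list would hide.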
But, how useful are these stats going to be without being able to see user journeys through a path of pages / actions? Yes, it's good to know which pages are getting how many views. But, in order to improve the UX, we often need to know how many users are able to go from Page A to Page C and whether they went through Page B first. Or e.g. if 90% of sessions that start on Page A (so we know what their purpose was), end on Page B but the main (perhaps beneficial) action for the user was on Page C. You can't just look at the pageviews for each, because you don't know where the session started.
I fear that this would reduce people to "inferring" (guessing) too much about the data that they see, and making decisions they feel are backed with data when there's not enough data to conclude. Then again, I'm sure that happens when the data is there too :-)
It doesn't even come close to offering the features Google Analytics offers, and it costs $12/month. That is the same as a service such as Netflix costs.
The idea is nice but looking at the actual product here: https://simpleanalytics.io/simpleanalytics.io
It disappoints in every way; you can't even check yesterday's stats.
Usually, early on, you don't want many low-paying customers, since that would imply too much support while you iterate quickly. A small number of high-paying customers is much better.
It's also easier to lower the cost later than increasing it. It's hard early on to find the right price point.
If he doesn't sell as much as he wants, he will adapt.
$12 / month is a low-paying customer. An ultra low paying customer. You can only offer minimal support at that price point.
Also, it’s easier to raise prices than to lower them. If you lower prices, you need to do it for your current users too or they’ll complain. If you raise prices, you can grandfather people in AND it can be used to incentivize people to buy before the price goes up.
The early stage is the perfect time to start charging. You get to see if it's got any real-world value for users. If nobody signs up then you can change direction or move on to something else.
Besides you don't want to be dealing with desperately trying to convert free users later, or worse, having to grandfather everyone in at free forever from your initial launch. Gotta pay the bills somehow.
Exactly what I thought as well, for what it is at the moment it isn't worth anything. Also I doubt that Analytics is a B2B only product, the majority will be private persons running their blog or hobby forum using it.
From the user perspective, being an early adopter of something that isn't really worth anything now can have external value of increasing the likelihood that you can drive the product direction towards something that will be very useful to you in the future (and without having to pay upfront costs of hiring a contractor and figuring out exactly what you want now). Especially if you're paying, people tend to listen to paying customers' feedback a lot more than web randos.
I'm a potential user/customer. I support two small scale websites that give my two businesses a presence on the web. By 2013 I guess I started to feel too anxious when accessing Google Analytics because the service was getting bigger and bigger. I could not see its "UI boundaries" anymore, and with that I got the impression I was leaving useful views/analysis behind. Unfortunately I am the kind of user who needs somebody to provide a set of pre-built views/analysis I could make sense of. I don't have the time to rationalize on what I need at various levels and then build the views.
With that said, a minimalist approach to web analytics is attractive to me, especially if I can see its "boundaries", the set of reports etc.
The argument on privacy (or lack of it) has no impact on my perception about this service's value proposition.
Hi I work in digital analytics and have a question. A problem with Piwik is if that PSQL database goes down (a database is NEVER 100% up) what happens to the data your JavaScript snippet is sending?
I will also add that a lot of comments here are very unfair; I hope you take them with a grain of salt.
First: really slick site. I'm not so into the video which takes a while to get to the point, but the site makes it really easy to understand the point of your product (and that's something a lot of sites lack).
I do have some questions/comments and I apologize if they seem a bit rapid-fire.
* When I look at the "Top Pages", there are links. When I click the link, it brings me to that page on your site not a chart of hits for that page. Is that how it's meant to work?
* If I sign up for your service, do my stats become public? https://simpleanalytics.io/apple.com just says "This domain does not have any data yet" (presumably because Apple doesn't have your script installed). But that kinda indicates that any domain with your script installed would show up there. It might just be an error in the messaging, but probably something to fix.
* What's your backend like? I'm mostly curious because analytics at scale isn't an easy problem. Do you write to a log-structured system with high availability (like Kafka) and then process asynchronously? How do you handle making the chart of visitors? Do you roll up the stats periodically?
* Speaking of scale, if I started sending thousands or tens of thousands of requests per second at you, would that be bad? Is this more targeted at small sites?
* What do you do about bots? Bot traffic can be a large source of traffic that throws off numbers.
* How long before numbers are available? It's September 19th, but the last stats on the live demo are September 18th. Is it lagged by a day?
* Do you not want to track user-agents for privacy reasons as well? Seems like a UA doesn't really identify anyone, but it can be useful for determining if you want to support a browser.
* You're not counting anyone that has the "Do Not Track" header. To me, DNT is more about tracking than counting (which is different). Even if you counted my hit, it wouldn't be tracking me if you didn't record information like IP address and there were no cookies.
Kudos for launching something. I think my biggest suggestions would be fixing the live-demo page so it doesn't look like it's leaking other site's data and providing some guidance about limits. It's easy to think that you don't want to put limits on people, but any architecture is made with a certain scale in mind. There's no shame in that. Sometimes what you want is a "let us know if you need more than X" message. At the very least, it lets you prepare. People sometimes use products in ways you wouldn't imagine and ways you didn't intend which the system doesn't handle gracefully.
Data collection for legitimate purposes came up in our GDPR compliance review.
This product (https://truestats.com) collects the I.P. address and user agent for the purpose of detecting fraud (not selling data or profiling users). It is used for frequency checking and other patterns that would indicate fraud. We are still going through the legal analysis of how to deal with this, even though we have no idea who the visitors are.
I think considering the I.P. address as PII is a little much if you are not using it in a way that would violate privacy or selling the data.
Looks good! I'm the founder of a similar service (Blockmetry). Obviously non-tracking web analytics is the future!
I'm curious why you chose to host the data yourself instead of giving customers the data immediately at the point of collection. That's the path we chose for Blockmetry, as it is genuinely required to be a non-tracking web analytics service and makes it impossible to profile users. Any service that hosts its data would still be open to being untrusted on the "no tracking no profiling" argument.
Thanks,
Pierre
PS - YC Startup School founders: ping me via the forums and get an extended-period free trial.
> I'm curious why you chose to host the data yourself instead of giving customers the data immediately at the point of collection.
I want to build a brand around trust. If people self-host and say they use my software, but are doing different things behind the scenes, it would hurt my brand.
Simple Analytics does host the data themselves, so people know the numbers are not tampered with and the data is handled the same for every customer. If people use our scripts directly, visitors of those websites can be sure that we respect their privacy with our values.
I like to give people the benefit of the doubt. If criticism is necessary, then might as well make it constructive.
Sure, the blockmetry site has some issues. The menu is unusable on my mobile (android) and there are no screenshots or explanation of how it actually works (server/client side, self/cloud hosted?). There are some style choices that I don't agree with, like the binary background pattern.
But I like to assume good faith unless I have some solid evidence otherwise. Do you know for a fact that there is no product? If so, please share :)
It is, however, poor form to plug your competing product in a Show HN. It's a fine line between mentioning and plugging, but I think offering a discount falls on to the wrong side of the line.
Executing third party JS on your website is an access to the page content, so unless the customer never had any user data or sensitive data on the page, they'll have to categorise simpleanalytics as a data processor.
Referrers are often private data on their own; for example, https://www.linkedin.com/in/markalanrichards/edit identifies not just that you looked at this user, but that you are this user, as it is the profile editing page, unique to this account.
The difference between whether simpleanalytics get or store data might remove a GDPR issue for them, but it certainly is for customers. Having access to the IP addresses is sufficient for privacy to be invaded at any point or by accident (wrong logging parameter added by the next new dev), malice (how can we illegally use this and lie to customers) or compromise (hackers take control of the analytics system) and therefore puts users at risk of full tracking at any point. As mentioned earlier GDPR is also about access, it is definitely about storage but the part in between of being given data (not just access to take it and not putting it on disk) is definitely included too.
In summary, simpleanalytics need to stop lying and redo their privacy impact assessments. Meanwhile don't use third party analytics (I have no idea how you maintain security control on third party JS) and if you're silly enough to, then it definitely is a GDPR consideration that needs to be assessed, added to audit, added to privacy policies, etc.
"We don't use cookies or collect any personal data."
An IP address is considered personal data. So when the browser visits a page with the JS, the IP address of the user is transferred to your server. So that means the website I am visiting is sharing my IP address with a third party (you).
I don't think that is relevant. What matters is that it is transferred to a third-party. And regardless if it is stored in a database, the servers are still processing the data (and maybe storing the log of it).
How would you do analytics without the IP address being "transferred to a third party"? Outside of self-hosting, either the user's browser is going to be making a request to the analytics provider (and therefore exposes their IP), or you're going to have to have some sort of proxy mechanism on the site's server that strips that information and sends it from there.
It's exactly as you state. This is the problem. The IP address needs to be stripped before storing or sending to anyone else, or it's still something you need to consider as personal data. This matters for GDPR. So in effect, this service still has to adhere to GDPR, because it is in fact receiving IP addresses, regardless of them getting stored or not.
GDPR bores the hole off of me so I haven't done much reading, but I do remember a court dismissing a piracy case recently because 'IP addresses alone are not enough to identify an individual' - how would this play in to this scenario?
Then, clients that help keep the lights on start asking for this and that.
And suddenly you end up providing a service with user level insights, cross-device tracking and advanced behavioral segments powered by ML because why not.
Good thing that you mention this. I get a lot of requests from users asking me to add support for custom events. I'm very strict in what I allow. If it could be called tracking, I say no. Custom events could allow tracking (whether people use them like that or not), so I will say no to those requests.
Is there a way to track country and language as an aggregate? For businesses this information is extremely useful as it gives an idea of what countries to expand to or what languages should be supported.
I was able to add the tracking code to my site without signing up and could see the stats without any authentication (see my other comment: https://news.ycombinator.com/item?id=18024886). Is that by design?
That is by design. For a very short period I supported a free plan which had only public data. So that’s why you see the behavior. It will be gone soon.
My feedback: someone else mentioned making the tiny live demo button bigger. I suggest scrapping it entirely... and embedding the demo statistics directly under the video, or very close to it, to go straight from "why" to "what it looks like". The chart/stats page design is sufficiently clean that shoving the whole thing onto the homepage won't actually be an information overload.
Speaking of the video, it's ridiculously professionally done, by the way; excellent acting to begin with and perfect line delivery (confident, well-timed, no hesitancy/awkwardness) as far as I'm concerned.
-
Apart from this, my only other advice is - reject buy offers, reject partner offers, sleep on VC offers for as long as you can (if, ideally, you don't outright reject these as well), and take this as far as possible on your own. I say this considering two standpoints.
a) Considering the developer: this is incredibly well done and you clearly have the competency to drive this forward without assistance. The website and video presentations are both great; the product defaults easily tick "sane enough"; and the only thing stopping me throwing money at the screen is that I have no projects that need this right now - but others definitely will, and I look forward to seeing this go viral.
b) Considering the product: "oooo internet privacy" is a well-trodden path with a thousand and one different options which are all terrible in their own way. You have the opportunity to differentiate by offering something that gains a reputation for actually not compromising, even months and years down the track by working to eliminate some of the sociopolitical cascade that can contribute to dilution of quality. Customers have sadly had good reason to associate buyouts with rapid decline in quality, so that sort of thing just looks bad at face value too.
To clarify what I mean by taking this as far as you can on your own: it's obvious others have already provided assistance - filming and acting in the video, and for all I know beta testing and maybe other development support - and I'm not pointing at that and suggesting it will bite you. I mean that, if you ever bring help on, find a good lawyer who will ensure the project remains _yours_ and make sure there are no implicit "50/50" partnership agreements or the like.
I can't find the references right now but I've read of a couple of projects/products that have exploded sideways (very sadly) because of jealousies and impedance mismatches creating imbalances that provoke partners brought onto projects to assume control and pivot things out of a creator's control, without the creator having any legal recourse.
I made the live button huge under the video, thanks for the feedback! Thank you for the kind words, means a lot! I will read this comment thread a hundred times after today, for sure!
A few questions: How likely is this to be blocked by uBlock Origin/Firefox private mode (easy-list etc)? Do they have any rules about what they consider to be 'ethical analytics'? How much overhead does this analytics package have on page load?
Have you considered a free tier for up to 1k page views a month for example?
How can this track conversions for A/B testing? This is one of the most common usages of analytics in my experience. Is there a way to have user based conversion tracking whilst still being GDPR compliant?
If people want to block you, they should. I also respect the Do Not Track setting. If it is on, I just don't register the visit. I have considered the free version, but I only want to do this when I have enough customers. A/B testing is not simple anymore, so probably not doing that.
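For readers wondering what honoring Do Not Track looks like in practice, here is one possible server-side check. It is only a sketch, not the actual Simple Analytics code: the function names and the `store` object are hypothetical. It relies on the browser sending a `DNT: 1` request header, which Node's `http` module exposes with a lower-cased name.

```javascript
// Returns true when the visitor sent "DNT: 1".
// `headers` is a plain object with lower-cased header names, which is how
// Node's http module exposes request headers.
function shouldSkipVisit(headers) {
  return headers["dnt"] === "1";
}

// Count the visit unless the visitor asked not to be tracked.
// `store` is a hypothetical counter object.
function registerVisit(headers, store) {
  if (shouldSkipVisit(headers)) return false;
  store.pageviews += 1;
  return true;
}

const store = { pageviews: 0 };
registerVisit({ dnt: "1" }, store); // skipped
registerVisit({ "user-agent": "example" }, store); // counted
registerVisit({ dnt: "0" }, store); // counted: the visitor opted in explicitly
console.log(store.pageviews);
```

The same check can be done in the browser via `navigator.doNotTrack`, which avoids sending the request at all.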
This feels like a rant, but I've posted my https://trackingco.de/ here multiple times, which has a very similar proposal (and is cheaper) but never got a single line of feedback.
The example (https://trackingco.de/public/9ykvs7rk) does not work for me. Also, the first time I visited the site I saw Lightning Bitcoin and then left. You lost me as soon as I read that because I'm not interested in that. I was just trying to find a simple (but useful) analytics service that's easy to use.
Well, it didn't have anything to do with Bitcoin until some months ago. I just changed that because no one was using it anyway so it might as well serve as another Bitcoin experiment no one uses.
The example should work, however. Well, I guess your feedback was very useful. Thanks!
Some more (constructive) feedback: way too much text. The design of the landing page does matter (even for developers :) some of us at least). There should be a better way to convey the message.
I've changed the landing page twice. Before the current "minimalist" design I had one that was full of colors and partial screenshots. Made no noticeable difference in engagement :P
The idea is great, but the price is way too high for a simple site. Many people are interested in anonymised data like pageviews and geographical distribution, for example, but these people pay 10€/year for a domain and often 0 for hosting with static site generators. 12€/month is just really expensive at this level, but good luck and I’m sure for many people it’s a totally fine price.
I disagree. If he’s a solo developer he doesn’t need to worry about free or cheap people. He needs to find people who value what he has built at a higher level and tailor it to them. The pricing looks great to me.
I agree. It would be better to make the pricing proportional to traffic and have a free tier. With a single price you're both pricing out small people who just host blogs or whatever and aren't going to pay more than $10/year, and also way undercharging businesses who don't really differentiate between $10/month and $50/month.
I thought about this, but I love the unlimited part. Competitors start at $9 a month for limited visits; with mine, your credit card charge will always be the same, no matter how popular your website gets. No surprises.
If it's a single developer just starting out, the cheap customers aren't going to be the ones keeping development going. Increasing the price isn't a terrible idea though.