Stop donating your customers' data to Google Analytics (dev.to)
433 points by bakztfuture on Jan 22, 2020 | 156 comments



So the solution is to use Amazon's tracker?

I appreciate the problem and I would like to stop using GA in my static pages as well, but trading one privately-owned software from a tech giant for another privately-owned software from a different tech giant seems a bit ludicrous. I would readily swap GA for some decent open-source solution though.


I'd say the following to this (very reasonable) argument against using AWS: AWS makes money by selling services, not collecting data. Should Amazon make the leap and start harvesting data from AWS for marketing purposes, the data from their analytics platform will be the least of our worries.

Thus far, AWS has proven to be safe for companies to host their data on, and there have been no leaks of data stored in AWS into Amazon's marketing program. The HIPAA, PCI, and FedRAMP certifications help back up their claims that a company's AWS data stays in AWS.


People said the same about Apple, and then it turned out to be bullshit.[0] And, big surprise, their customers did not seem to hold them accountable.

So in the end, paying for stuff just shows that you're a more valuable product to sell. And gives them a great primary key to track you by.

[0]: https://news.ycombinator.com/item?id=22106536


I've said this to most naysayers, but AWS aims itself at businesses, who are much more forward with their pocketbook and lawyers in protecting their privacy. AWS can't safely market data stored in their systems without running the risk of being abandoned and sued into oblivion.

Even Google doesn't dare monetize data stored in their enterprise customers' databases.

EDIT: Failing to offer a privacy-enhancing feature, and actively compromising and mining your customers' data are quite different scenarios.


Google doesn't dare directly monetize data stored in their customers' databases. There's zero doubt in my mind they are piping everything somewhere, completely anonymized, and feeding their models.


> their customers did not seem to hold them accountable.

That story is 48 hours old; it remains to be seen what the effects will be.


>People said the same about Apple, and then it turned out to be bullshit.

I don't understand this claim at all. It's always been clear that if you're paranoid about the state, you should turn off iCloud backups. And it's not like Apple is selling your backups either.


This is a tech oriented forum where we’re (as a group) more privacy conscious and more generally aware of how things are stored than the general public.

And yet even here, there is widespread surprise at the news that iCloud backups are not fully encrypted in such a way that keeps them private from Apple.

If we (as a group) have in some factions been caught by surprise, what are the chances that the general public are also not aware?


>And yet even here, there is widespread surprise at the news that iCloud backups are not fully encrypted in such a way that keeps them private from Apple.

Anyone who thought Apple kept iCloud backups fully encrypted was being willfully ignorant. Apple has been fully open about the fact that they share iCloud backups with the FBI. This exact same situation happened in the San Bernardino case, where Apple provided the backups but the FBI cried about how they wouldn’t unlock the phone.

The tinfoil hat advice has always been to turn off iCloud backups. I don’t buy that anyone privacy conscious should have missed the fact that even Snowden was saying “use an iPhone but turn off iCloud backups” for the last several years.

If you got caught by surprise, then you weren’t paying attention. Apple wasn’t keeping it a secret that they would share your backups with law enforcement. The only reason this is possible is because today they don’t require you to enter a password to restore your iPhone to a new phone. Even the way the technology works today implies that Apple can read your iCloud backups without you knowing.


Or is that wishful thinking? AWS and Amazon are in the business of collecting, storing, and processing data.


Can you provide a citation about that with AWS and its customers? Amazon itself, I have no problem believing (and have seen it in action). However, AWS is run separately from Amazon('s storefront, specifically).

If AWS were really aggregating customers' data stored in AWS' platform, I think we'd be seeing a lot more about it in the news. And there would be a lot more than just Walmart advocating against its use.


Is that relevant? Nobody can provide such a citation for Google Cloud Platform either; the discussion is entirely around reputation of the parent company, not what the company claims it does with cloud services.


AWS mining their customers' data is a huge liability for their cloud business and a blatant violation of their service agreements. Something that would fundamentally break the concept of using cloud services like AWS or Azure.

This is why this comparison doesn't make sense. Google Analytics is a 3rd party service where you have no control at all of your data. You put a script in your app, and then you basically funnel data to them. That's it.

Using Pinpoint, in this case, is the equivalent of using EC2 and S3. You can control the flow and the lifecycle of data (deleting data forever, for example).

You should be able to trust that your customer data is safe there, otherwise, why use AWS at all? or better yet, why use any public cloud infrastructure provider at all?

If you're concerned about that, you probably should build your own self-managed server infrastructure.


There is one exception to this that I know of: AWS Rekognition will re-use the faces you scan with it for training and "to improve and develop the quality of Amazon Rekognition and other Amazon machine-learning/artificial-intelligence technologies." That's so vague, they can reuse it anywhere Amazon (not just AWS) uses machine learning.

You can opt out though.

First point under Data Privacy: https://aws.amazon.com/rekognition/faqs/


This is why I use Prime photos, instead of Google photos. I need one, so I’ll take the paid one.


I wouldn't conflate their general motives nor their data handling policies between their b2b offerings and their b2c offerings.


The idea that just because you're paying for something it means they're not sharing your data or won't sell your data in the future seems a bit naive. If you don't control the server you have no idea what they are doing. You're just hoping they don't sell your data.


Over the long term it's possible that maximizing shareholder value requires monetization of additional resources.

The approach of 'today, company X is known to make money via Y, so we can trust them with private data' only works until that data becomes valuable enough for the company to invest in extracting.


Monetizing data stored in AWS would destroy their credibility with corporations, who are much more proactive in protecting themselves (both legally and with their wallets) than consumers are.


Upvoted, and I don't disagree with your premise - we'll see how this plays out over the long term.


Well, AWS launched 13 years ago, and so far so good. It's no IBM, but it is a track record to consider.


I'm only aware of https://matomo.org/ (formerly Piwik) as a good open source alternative to Google Analytics. Are there others?


I can vouch for Countly ( https://github.com/countly ) which is open source, supports a few different platforms (web, iOS, Android), and has a nice administrative web interface.

The web SDK also supports collection of client-side JavaScript errors, which is neat for tracking down bugs and things which might harm user experience.


I'm a fan of Fathom[0]. The amount of data and insight is light compared to GA and others but if it meets your needs it's pretty great.

[0] https://github.com/usefathom/fathom


Do you know if fathom will remain opensource and/or self-hostable ? (https://github.com/usefathom/fathom/issues/268)


Nice answer from developer!:

> We are keeping this version open-source, forever, and committing to maintain it. We also have a business to run, and while we love open-source, it isn't paying our bills (and Fathom takes a lot of work from 2 people to keep going) and we're not a charity.

> If this repo was full of contributions and other folks pitching in, this would be a different story, but it's not—which is totally fine and accepted. But, since we want to keep going with Fathom, we have to separate V1 and V2 so we can make it sustainable. Otherwise we'd have to abandon it (which serves no one).

> If you truly want your complaint heard, maybe contribute to what you're complaining about (financially, time, effort, etc). My wife always tells me that I'm not allowed to gripe unless I'm also taking action.


Kind of a nice answer. Their comment about contributing is not really valid; I'm not going to contribute to something they have already solved and not released to open source. They already have the answer. And I understand the fact that it is incredibly difficult to make money from open source and that maintaining it sucks. But just stop being open source and be another analytics company. Trying to pretend you are both and want contributions to things you've already written in your private repo doesn't work.


I think we read it differently. In my view this is as about as nice as one can be:

> We are keeping this version open-source, forever, ...

Here they are committing to keeping their existing Open Source around instead of taking their toys and going home.

> If this repo was full of contributions and other folks pitching in, this would be a different story, but it's not—which is totally fine and accepted.

I think they aren't asking for contributions to v2 but rather for contributions to v1 which they are committed to keeping Open Source. But even then they acknowledge that users are free to use it without contributing in any way.


The open-source (self-hosted) version of Fathom is dead. The only commits they have been pushing to it in the last 12 months have been upsells for their closed-source & centralized paid offering.


Agreed. They were super nice on their GitHub issues about removing cookies as a requirement (required on mobile) and then it was released to non open source. Don't use Fathom if you don't want to pay. I made that mistake for home projects.



> I would like to stop using GA in my static pages

Why don't you? Do you really need this tracking at all?


Depends on the job, purpose of site, and goals being accomplished. Hard to demonstrate value and improve digital business outcomes without GA (and other tools like HEAP Analytics).


I think it depends on what type of business you conduct...

Just like people are putting Alexa in their homes and businesses, it can be potentially used for anti-competitive reasons, or to get inside information. This will get worse over time as we know...

A simple example is hacking a CEO's Alexa to listen to his phone calls at home to get insider trading tips...

The companies that make these products are not bound by an official code of ethics, and Governments barely understand the implications of technology, much less the corruption of technology. Laws to prevent misuse and manipulation of consumer products are weak, but proper investigation and enforcement of those laws are even weaker.

Google has also been changing Chrome Browser to suit its information gathering needs as well. If they own the majority of market share, they won't really need their analytic tools on every site. We need to start thinking on a larger scale about how companies can influence culture, markets, and lives, and how to ensure there are proper rules in place to prevent catastrophe.


Yeah, the "value" being shown rarely actually investigates causal impact. Else they'd recognize that the literature shows that digital advertising has no measurable (provable) impact.[0]

I recognize that there is more to digital business than ads, such as paternalistic commercial guidance, dark patterns, web traversal, and so on. However, I haven't seen proof that these patterns matter, especially given recent critiques on A/B testing (relative to multi-armed bandit).

[0] https://www.gwern.net/docs/traffic/2015-lewis.pdf


> Else they'd recognize that the literature shows that digital advertising has no measurable (provable) impact.[0]

This is a gross mischaracterization of what the linked paper says.

"It's extremely difficult to measure the impact" (the claim that the cited paper puts forward) has quite a different meaning from "no measurable impact." The paper is entirely about the difficulty of measuring the impact, and studiously avoids, as far as I can tell, any inference about what the impact actually was in these experiments. For example, Table I gives the mean of the control group sales and a standard deviation, but no mean of the treatment group sales, which you would need to do a statistical test of whether there was an impact; similarly, Table II reports standard deviations of the sales effect and ROI, but not means; Table III presents power calculations based on hypothetical ROIs and the real measured standard deviations, but gives no clue to the real ROIs were. Nowhere does the paper support your claim that there is no measurable impact.

In addition, the situation is very different for small companies that most people have never heard of. The article is citing studies done with major corporations with millions of customers and that are already well established in the collective consciousness. Measuring the effect of an advertising campaign taking you from 3.23 million customers to 3.231 million customers is indeed very difficult, particularly when you might fluctuate by tens of thousands of customers on a weekly basis. Measuring the effect of an AdWords campaign taking you from 200 customers to 250 customers is much easier.
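To put rough numbers on that contrast, here is a small sketch. The customer counts are the ones quoted above. The weekly fluctuation of 20,000 is only an assumed stand-in for "tens of thousands", and since no noise figure is given for the small company, only its relative lift is computed.

```python
def relative_lift(before: float, after: float) -> float:
    """Fractional change in customers attributed to the campaign."""
    return (after - before) / before


def signal_to_noise(before: float, after: float, weekly_fluctuation: float) -> float:
    """Campaign effect divided by the typical week-to-week swing."""
    return (after - before) / weekly_fluctuation


# Figures quoted in the comment above.
large_lift = relative_lift(3_230_000, 3_231_000)
small_lift = relative_lift(200, 250)

# ASSUMPTION: "tens of thousands" is modeled as 20,000 customers per week.
large_snr = signal_to_noise(3_230_000, 3_231_000, 20_000)

print(f"large company lift: {large_lift:.4%}")
print(f"small company lift: {small_lift:.0%}")
print(f"large company signal-to-noise: {large_snr:.2f}")
```

Under that assumption the large company's campaign effect is a small fraction of its ordinary weekly swing, while the small company's 25% jump is hard to miss.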


And the freebie GA doesn't limit you to a 25% sample across the board; basically, the bigger the date range and the number of sessions, the more sampling you get.

I'd be interested to see how Amazon's setup would handle the setup I am the backup analytics nerd for.

It's a major beverage brand with hundreds of websites and multiple locales per site (a dozen or more for the big brands), and it has to handle roll-up as well as custom metrics and dimensions.


In this case, AWS is contractually obliged not to use your data. My criticism of the article is that the author mentions that users are blocking access to third party trackers, but that means AWS Pinpoint set up in the way he suggests will get blocked as well. People who have that concern have to expose Pinpoint endpoints on their own domains or implement some other first party tracking solution.


I see a lot of suggestions for free or open-source analytics packages, but I would refrain from recommending anything you haven't personally used.

I've tried to separate myself from Google in various ways, and one of those was to replace Google Analytics with open source software. I tried several; they're all either non-functional out of the box, or require significant time investment to even start approaching Google Analytics.

After losing about a month of stats (which matters when you're also running AdSense), I ended up going back to Google. It took the same amount of time to set up as when I initially set it up: around 2 minutes of adding the tracking code and uploading it.


I’ve used Piwik (Matomo) and the open source version seems to work pretty well, but I’ve never had any particularly sophisticated use cases.


I also use open source Matomo [1] self-hosted with a simple LEMP stack but there is also a hosted option [2].

[1]: https://matomo.org/docs/installation "Installing Matomo On-Premise"

[2]: https://matomo.org/pricing/ "Matomo Pricing"


I'm building an analytics service and thank you for the feedback! I'm currently building a service that's nearly as fast as Google Analytics and simple as can be (although there's going to be a ton of new features soon).

Here's the link: https://sdan.io/pingpong. If you want to sign up or give feedback, I would greatly appreciate it: https://forms.gle/MhojBWWfdiWjZatC7 !


This caught my eye:

https://storage.googleapis.com/pingponganalytics/pingone.js

So... I should install a script that loads from Google servers?

Apart from that, you probably don't want to tie yourself to google like that. Once the users have this in their pages they will _never_ update it. You should use your own domain.


Yes. You're absolutely right. I'm just hacking stuff together right now (and recently moved every single piece of infra onto a local server from GCP). I'm still in that migration process.


> I tried several; they're all either non-functional out of the box, or require significant time investment to even start approaching Google Analytics.

It's almost as if you need to be a software engineer and do actual software engineering, to responsibly use tools like analytics.

> I ended up going back to Google. It took the same amount of time to set up as when I initially set it up: around 2 minutes of adding the tracking code and uploading it.

So how much effort is the privacy of your visitors worth, then?

It sounds like deep down you know the right thing to do, which is a lot of work, but seeing everybody (in your bubble of technical peers) just as easily use Google Analytics, makes you feel like you're owed the difference to these profits.

Maybe there shouldn't be a 2-minute turnkey solution to analytics, because even if it's self-hosted, your next excuse is going to be that it requires a significant time effort to keep it secure and act responsibly with the data.


> require significant time investment to even start approaching Google Analytics.

I think that for a lot of the "alternative" analytics tools, feature parity with Google Analytics isn't necessarily a goal, so this may explain your disappointment. I think the only exception here is Matomo, which is the only "advanced" OSS analytics as far as I know.


That's swell and all, but you are going to find businesses still going to Google Analytics because it is easy to set up and free. The cost to consumers of having their data shared everywhere isn't even a thought on the horizon.


I'm not even sure how that comment relates to mine. I just explained the goals of various OSS projects as I understand them. I never said anything about setup costs or price.


They mean that people are going to make excuses regardless, because you get more features for less time, never mind the consequences.

It's people that struggle with the conundrum: "X, Y, or ethical. Pick two".


If you're using AdSense, you're already giving Google your visitor data, no matter what visitor analytics package you use.


If you're using AdSense, you're already -selling- Google your visitor data, no matter what visitor analytics package you use.

FTFY


> I've tried to separate myself from Google in various ways

I'm aware of this. The next logical step is to find a better ad network.


Please also bear in mind that any developer's personal time (whether your own or someone else's), while important and valuable - is also a trade-off against the aggregated time and value of your users' privacy.


... which, if users actually valued, they'd signal by ceasing to use one's site.

There is basically no strong indication right now of any large segment of users boycotting sites because the users care about privacy. There's the same small amount that have always been present and the number doesn't appear to be growing.


It seems to me like a person could care about “these n people lacking privacy in this way” more than n times as much as they would care about any one of them marginally gaining or losing privacy in that way?

Or, idk if that is quite the right formulation for what I mean.

But, at the least, it seems likely that some people will sometimes be willing to take an amount of effort to protect the privacy of a large number of people, when they wouldn’t take that same amount of effort to just protect their own to the same degree.

It seems likely to me that a major impact of lack of privacy comes from many people lacking privacy, in ways that wouldn’t happen if it were only a few lacking it, and also where a few re-gaining it doesn’t influence the impact all that much.

If so, then people not avoiding something because of privacy concerns in sufficient numbers to substantially influence the amount of use, doesn’t seem to entirely rule out that they care about privacy. Perhaps their behavior could be attributed to a collective action problem, where they each would prefer that all of them avoid it, but don’t find it worthwhile to be among only a small number of people avoiding it.


The headline is wrong, it should be changed to "Stop donating your customers' data to Google Analytics ... donate to another large corporation instead!"

There are much better options out there. Quite apart from the solutions listed in these comments, a better option is to reconsider whether you really need analytics at all. Maybe the answer is yes if you are a business trying to understand your customers. But not every blog and project page needs analytics.


Or you could write the 10 lines of JavaScript that'll do what 99% of people use Google analytics for


If you think you can replicate that with 10 lines you have no idea what Google Analytics is used for.


I'm ashamed to admit that I use GA on my blog to essentially count page views. The other information is interesting but mostly unused (by me). I would be far better served by a tool or service handling server logs (any recommendations?). But GA is 0 friction, so it's what I picked up back in the day. I suspect there are a lot of people in this boat.


Even for page views, 10 lines of code won't replicate GA. Try counting how many hits, and you will find that all the bots and spiders quickly make the numbers meaningless.

Of course, if that is all you are doing, you should be using Matomo or Fathom or whatever, but it is not fair to say GA could easily be replaced.
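To illustrate how quickly crawlers inflate a naive hit count, here is a minimal sketch that counts page views from access-log lines twice: once raw, and once with requests dropped when the user agent looks like a crawler. The "combined" log format, the sample lines, and the marker list are all assumptions made for the example, and this is nowhere near the bot filtering that GA does.

```python
import re
from collections import Counter

# ASSUMPTION: lines follow the common "combined" access-log format.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"$'
)

# ASSUMPTION: a crude, hypothetical marker list; many bots do not identify themselves.
BOT_MARKERS = ("bot", "spider", "crawl", "slurp")


def is_probable_bot(user_agent: str) -> bool:
    """Very rough check: does the user agent contain a known crawler marker?"""
    agent = user_agent.lower()
    return any(marker in agent for marker in BOT_MARKERS)


def count_page_views(lines):
    """Return (all_hits, human_hits) Counters keyed by request path."""
    all_hits, human_hits = Counter(), Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None or match["method"] != "GET":
            continue  # skip unparseable lines and non-GET requests
        all_hits[match["path"]] += 1
        if not is_probable_bot(match["agent"]):
            human_hits[match["path"]] += 1
    return all_hits, human_hits


# Made-up sample lines: one browser, two self-identified crawlers.
SAMPLE = [
    '203.0.113.5 - - [22/Jan/2020:10:00:00 +0000] "GET /post HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64) Firefox/72.0"',
    '198.51.100.7 - - [22/Jan/2020:10:00:01 +0000] "GET /post HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '198.51.100.9 - - [22/Jan/2020:10:00:02 +0000] "GET /post HTTP/1.1" 200 512 "-" "ExampleSpider/1.0"',
]

all_hits, human_hits = count_page_views(SAMPLE)
print(all_hits["/post"], human_hits["/post"])
```

Even this only catches bots that announce themselves in the user agent, which is part of why a handful of lines does not replace GA's numbers.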


Many of the common web log analyzers are a bit long in the tooth.

I've used GoAccess for a while now and am mostly happy with it. It's fast enough and can generate pretty good looking static html which is mostly what you want for those simple use cases.

A side effect of processing log files is that you can freely try software on historical data.


Do you have a recommendation on log format for GoAccess? I run a lot of custom services with no nginx etc in front, so I'll have to figure out the logging myself.


You can always create an issue on their github page, lots of help in there: https://github.com/allinurl/goaccess/issues


Matomo can also be used to analyze server logs.


19USD/mo for 3 sites seems pretty steep, and I have 0 interest in managing my own PHP service. Is there anything cheaper in this space?


Just a shot in the dark, but if you use Cloudflare already, their stats include view counts. It's also probably more accurate than relying on JS tracking tools because they can count the actual requests to your site.


My CloudFlare stats show multiple times as many hits as GA. I'm guessing it's not handling bots/spiders or something.


We put this down to people with JavaScript disabled or ad blockers. CF does filter bot traffic but will capture all hits as it’s tracking through network requests. For us it’s around a 35% increase in what GA reports.


For me CF is showing 3x GA unique visitors. Though that's almost all HN traffic, so I suppose it's certainly possible that 2/3 HN users have blockers.
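The arithmetic behind those two ratios, as a quick sketch. It assumes the whole gap between the server-side count and the JavaScript count comes from visitors the script never ran for (blockers, JavaScript disabled) and none of it from unfiltered bots, which may well not hold.

```python
def share_missed_by_js_tracker(server_side_count: float, js_count: float) -> float:
    """Fraction of server-side visitors that the JavaScript tracker never saw."""
    if server_side_count <= 0:
        raise ValueError("server_side_count must be positive")
    return 1 - js_count / server_side_count


# CF showing 3x the GA figure (this comment).
print(f"{share_missed_by_js_tracker(3.0, 1.0):.1%}")
# CF showing 35% more than GA (the parent comment).
print(f"{share_missed_by_js_tracker(1.35, 1.0):.1%}")
```

So a 3x gap corresponds to two thirds of visitors being invisible to GA, while a 35% gap corresponds to roughly a quarter.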


Those 10 lines probably depend on an npm package that relies on 100 other npm packages.


Feel free to post those lines


I reckon you'd need more than 9 "\n" characters to get it done.

But in seriousness the 10 lines would just use local storage or wotnot to store a tag, then call tracker.com?tag=... on each page load. "Rest is done on the server (TM)"
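For what it's worth, a minimal sketch of the "rest is done on the server" half might look like the following. The client side (browser JavaScript that stores the tag and requests the URL) is not shown. The `TagCounter` class and the tag values are hypothetical, and state is kept in memory only.

```python
from collections import defaultdict
from urllib.parse import parse_qs, urlsplit


class TagCounter:
    """Aggregates page loads reported as tracker.com?tag=<visitor tag>."""

    def __init__(self):
        self.loads_per_tag = defaultdict(int)

    def record(self, request_url: str) -> bool:
        """Record one page load; return False if the URL carries no tag."""
        query = parse_qs(urlsplit(request_url).query)
        tags = query.get("tag")
        if not tags:
            return False
        self.loads_per_tag[tags[0]] += 1
        return True

    @property
    def page_loads(self) -> int:
        """Total number of tagged page loads seen."""
        return sum(self.loads_per_tag.values())

    @property
    def unique_visitors(self) -> int:
        """Number of distinct tags, used as a stand-in for visitors."""
        return len(self.loads_per_tag)


counter = TagCounter()
for url in (
    "https://tracker.com?tag=a1b2",
    "https://tracker.com?tag=a1b2",
    "https://tracker.com?tag=c3d4",
    "https://tracker.com",  # no tag, ignored
):
    counter.record(url)
print(counter.page_loads, counter.unique_visitors)
```

Anything beyond raw counts (sessions, referrers, bot filtering) is extra work on top of this.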


This really reminds me of the infamous "you can just use FTP instead of Dropbox" comment.


I don't disagree, but in a bit of fairness, a lot of people just use GA for page-counts and basic correlation stuff...you could do that in a relatively small amount of frontend JS stuff and a slightly-more-complex backend API to handle the basic correlation stuff.

That said, that will only do about .01% of all the features of GA; like the infamous "FTP vs. Dropbox" the premise itself isn't exactly "wrong", just missing a bigger point.


Could you link me to the FTP vs. Dropbox discussion? I am curious. I haven't used either for years and the implication seems to be that there is a profound difference, so I wonder what it is.


This is the 'infamous' original thread in question: https://news.ycombinator.com/item?id=8863


That is cute, thank you.


A bit tangential but a quick click on the author's name in the article and their bio reads:

> ex-Amazon contractor, front-end lover, accessibility nerd, down for building cool shit, especially Vue.js and Amplify.js consulting

My alarm bells ring when the answer to "stop using X" is to "start using Y" where Y == company I worked for.

This isn't to say GA is or isn't problematic, but the article's bias is problematic.


It would be handy if people listed here all the alternatives that don't steal your customer info.



I gave many of these a visit.

fathom - Looks great. I am OK with closed source products (my motivation is self-hosting/privacy) but the direction is not clear to me. Maybe they will have a blog about it at some point - https://github.com/usefathom/fathom/issues/268. Having multiple code bases is going to be super hard.

goaccess.io - this analyses web logs

google-analytics-proxy - project is dead

matomo - this is what i use now and it works great. has a lot of quirks but if you spend some time, you can make it work.

ackee, goatcounter - simple but looks like this does not track users/sessions. it's mostly for page hits.

countly - looks good if you are enterprise. there is no pricing :(

freshlytics (from another thread) - page says it's in beta and not production ready


GoatCounter author here: doing some form of session tracking is on the roadmap; check back in a few months. The project is still quite new, with the first "real" release only being last week :-)

As for Fathom, I find that last "since that people are confused"-comment rather funny, since their messaging on this has been confused for almost a year, haha


Will do! I am already following your project :)


- Simpleanalytics[0] (also commented elsewhere in this thread)

- Goatcounter also came up a few days ago on HN and got a lot of traction/discussion[1]

I can't find the comment it was posted in, but a HN user did a good comparison of a few privacy-focused analytics tools[2]

[0] https://simpleanalytics.com/

[1] https://news.ycombinator.com/item?id=22044854

[2] https://dev.to/hmhrex/a-comparison-of-the-top-3-privacy-focu...

Edit: I found the OG comment for the blog post: https://news.ycombinator.com/item?id=21716544


If you're using WordPress, there is https://wordpress.org/plugins/koko-analytics/. Open-source, self hosted, no external services & very performant.


Building https://sdan.io/pingpong. We're dedicated to building the fastest client-side load analytics engine available. If you want to stay in the loop on how development is going please do so! https://forms.gle/MhojBWWfdiWjZatC7


I think including prices in this list would significantly improve it.


Or just use server log analytics. Client-side analytics are a significant contributor to the proliferation of JavaScript bloat and unnecessary 3rd party cookies.


Depending on which solution you use; some of them are just a few KB, which is not so much.

Doing log-analysis has its own drawbacks: not everyone has access to them, bot traffic will be a lot higher, and certain information is hard to access (like screen size). You can't always "just" use it.


I've been surprised lately by how much more pleasant, readable and usable the news/blog-web is with JavaScript turned off. JavaScript is basically just used for the user-hostile ad-tech.


I've been wondering about this recently: perhaps we'd actually be better off with simpler browsers that did less so that the websites would do less?


I've been thinking the same thing, and started[0] experimenting[1] with some ideas. I think it would be fun to make a web browser that implements a few HTML tags, flexbox and a few other CSS primitives (no animations), and no JS. Sites that are compatible with it would still work on current browsers.

[0] https://anderspitman.net/19/#netcatable

[1] https://anderspitman.net/17/#curlable


It's why I'm somewhat against WASM, even though it's very cool from a technical standpoint. It makes the web even more of an operating system, where I'd like it to be less.


I recently found https://text.npr.org/ (linked down in the footer of the main NPR site) and it made me very happy.


You automatically get that if you visit them from the EU and decline tracking, it’s amazing :)


Agreed if you're already self-hosting. However, I don't know of a JS-free solution if you're hosting on a 3rd party like GitHub pages or Netlify for example.

(Netlify does sell access to log data but it looks expensive for most hobby / personal sites)


You can probably use the "tracking pixel" method with at least some analytics tools. This is a very old technique which probably predates even the invention of JavaScript.

Basically, if it accepts a GET with query parameters, it should work.


Right, but don't you still need to own a server in order to log these GET requests?


You can do it with one of the hosted services. I don't know which ones support it exactly, but GoatCounter does (although it's kind of an undocumented feature until I merge PR #122).


Cool, I saw GoatCounter on HN a few days ago. I've looked into ways to get a user count without JS and without paying for hosting, but no luck yet. If you find out what free services support it I'd love to know.


Well, GoatCounter does :-)

But any service that sends something like "GET example.com/collect?path=/foo" can be loaded with an <img>. It's perhaps not an explicitly documented feature, but it will still work :-)
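To make the receiving end concrete, here is a rough sketch of what such an endpoint does: it accepts a GET with a `path` query parameter, counts it, and answers with a tiny GIF so the `<img>` has something to load. This is not GoatCounter's code or API. The `/collect` route just mirrors the example URL above, and the in-memory `hits` dict, the handler name, and port 8000 are assumptions for illustration.

```python
import base64
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlsplit

# A commonly used 1x1 GIF, base64-encoded.
PIXEL = base64.b64decode("R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7")

hits = {}  # path -> count; in-memory only, for illustration


def collect(request_target: str):
    """Handle 'GET /collect?path=/foo'; return (status, content_type, body)."""
    parts = urlsplit(request_target)
    paths = parse_qs(parts.query).get("path")
    if parts.path != "/collect" or not paths:
        return 404, "text/plain", b"not found"
    hits[paths[0]] = hits.get(paths[0], 0) + 1
    return 200, "image/gif", PIXEL


class PixelHandler(BaseHTTPRequestHandler):
    """Thin HTTP wrapper around collect()."""

    def do_GET(self):
        status, content_type, body = collect(self.path)
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.send_header("Cache-Control", "no-store")  # so repeat views are counted
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000):
    """Run the endpoint; a page would embed <img src="http://host:8000/collect?path=/foo">."""
    HTTPServer(("", port), PixelHandler).serve_forever()
```

Calling `serve()` starts the listener. With a hosted service, this part runs on their side and the page only needs the `<img>` tag.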


Perfect! I'll give it a go :)



Disclaimer: I am the founder of RudderStack (https://github.com/rudderlabs/rudder-server)

With an open-source data collection framework like RudderStack (an alternative to Segment), dumping data into a warehouse (Redshift/BigQuery/Druid etc) and sticking another open-source visualization layer on top (e.g. SuperSet), it is possible to put together an alternative to Google Analytics. One of our early users did it and we wrote a blog post about it:

https://rudderstack.com/blog/open-source-analytics/




This seems like more of a promotion for AWS Pinpoint than a criticism of GA.


Interesting that the page breaks if you're using Adblock because of Google Analytics being in the URL.

Shows me a fun 'You are not connected to the internet' page that lets you doodle on the page.


Not on Firefox with uBlock Origin.


I'd check your filters - EasyPrivacy has -google-analytics- blocked in the URL

I got to it by adding the following filter

    @@||dev.to/goatandsheep/stop-donating-your-customers-data-to-google-analytics-191?i=i$xhr,1p


EasyPrivacy was enabled.

My Firefox was a minor update earlier than the one on the sibling comment (72.0.1). It has updated now, and the site claims that my connection is down on my machine too.

Now that I have seen the message... It's a funny thing for a web page to claim.


Using Edge (Chromium) with uBlock Origin


Same with FF 72.0.2 + uBlock Origin


The problem is usually competing with "free", and Google knows this. There are privacy-respecting alternatives like https://www.visitor-analytics.io though.


> Tracker blockers are increasing in popularity so consumers can protect themselves against this tracking, reducing the effectiveness of your analytics.

More to the point: there is probably going to be a bias in the analytics. Different people have different reasons for protecting themselves against tracking, but it is highly unlikely that people who are unaware of or uninterested in the issue will use a blocker.


"My competitor's tracking solution is ridiculous. You should get your head examined if you use it. You should use mine instead."

Terrible argument.


You seem to have missed the arguments made, though. You get to avoid the cookies and, as someone else pointed out, Amazon doesn't use the data. It's your data.


Did not read through, but from a quick look, I suspect anyone can grab the code and fill your AWS account with terabytes of garbage data, which will end up as an enormous amount of dollars in AWS billing.

Am I missing something?


Machine learning. How can you say no to machine learning? Did I mention machine learning?


Interesting topic. This among others is one of the reasons we started building Harvest. Just as with Google Analytics, you can start tracking data with just a small snippet of Javascript.

We use Splunk as our data engine and you can install it on your own server. This way you have full control, access and ownership of your data without letting third parties get any data. In that sense Harvest is basically the infrastructure that allows you to collect, store, use and visualize your data.

Besides that, we have been focusing on features that will help companies comply with privacy regulations. Experience shows that this is not always easy in the complex world of online data.

For more information check https://harvest.graindata.com/en.


The suggested Google Analytics implementation today is a collection of three separate Google technologies: the original GA, Doubleclick cookies to track demographics and interest, and Tag Manager to manage them.

The original GA does not give Google useful cross-site user data because it uses only first-party cookies and anonymizes data as it collects it. To my knowledge you can still implement GA this way if you want to. Such an implementation would be GDPR compliant in not tracking any personal data, although your counsel might still say you need to list them as “analytics” cookies in a cookie banner (mine did).


> anonymizes data as it collects it

No, they don't anonymize the collected data (for any reasonable definition of "anonymous"). The IP address alone gives GA a very close approximation of a unique key, and their own documentation[1] explains the "anonymization" process:

    "... the last octet of the user IP address
     is set to zero ..."
(If the logged event doesn't opt in to this behavior by adding &aip=1, then GA presumably saves the entire IP. How many GA users bother setting that option?)

The 8 least significant bits of an IPv4 address are the least interesting. The remaining 24 bits give GA the ASN and are a lot of entropy for fingerprinting. It would be trivial to recover a unique key from the "anonymized" address by combining it with other analytics data, other cookies, and timestamps.

[1] https://support.google.com/analytics/answer/2763052?hl=en
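
To make the quoted behaviour concrete, here is a minimal Python sketch of what zeroing the last octet does. It only illustrates the arithmetic described in the documentation quote; it is not Google's code, and the function name and the sample addresses (taken from the reserved documentation ranges) are made up for the example.

```python
import ipaddress


def zero_last_octet(ip: str) -> str:
    """Set the last octet of an IPv4 address to zero (keep the top 24 bits)."""
    # strict=False lets us pass a host address and get its enclosing /24 back.
    network = ipaddress.ip_network(f"{ip}/24", strict=False)
    return str(network.network_address)


# Illustrative addresses only.
visitors = ["203.0.113.7", "203.0.113.200", "198.51.100.42"]
for ip in visitors:
    print(ip, "->", zero_last_octet(ip))

# Number of distinct values the truncated address can still take.
print(2 ** 24)
```

Two visitors on the same /24 collapse into one value, but the 24-bit prefix is still there to be combined with everything else in the hit.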


Yes, you can configure Google Analytics so that no data is shared with other Google services, at least no data about single visitors. I also came to the conclusion that using GA this way complies with the GDPR and I don't really understand what all the fuss is about.


As someone going through this right now, the main difficulty in being GDPR compliant with GA is the cookie problem.

You can either disable cookies to run GA in cookieless mode [1], which presumably will affect how GA performs, since it can't determine repeat visits (but this might be fine, depending on the type of site you have). Or you need to gain active consent to enable analytics cookies [2], which isn't much good if you want metrics for all users, not just those that opted in.

If someone has solved this reasonably, I'd love to hear how! For now it seems like cookieless is my only option.

[1] https://developers.google.com/analytics/devguides/collection...

[2] https://ico.org.uk/for-organisations/guide-to-pecr/cookies-a...


> Such an implementation would be GDPR compliant in not tracking any personal data, although your counsel might still say you need to list them as “analytics” cookies in a cookie banner (mine did).

Your counsel should also have advised you that you need active consent in your cookie banner, since GDPR raised the standard for consent, which is the stumbling block I'm facing. [1]

[1] See "In brief": https://ico.org.uk/for-organisations/guide-to-pecr/cookies-a...


I didn't know about AWS Pinpoint before, but from what I can see, it only offers analytics for email and other messages, not for web pages, so presenting it as a full alternative for Google Analytics is misleading.


The article doesn't even seem to mention anything for which GA is nearly indispensable. E-commerce analytics, conversion funnel visualization, customer segmentation, etc.


The author wrote a tool to make it do that I guess.


I have fun looking at these stats (sites with Google tracking vs. sites without) in a Firefox addon I made: https://bitbucket.org/tayler/google-spy/src/master/, https://addons.mozilla.org/en-US/firefox/addon/googley-eyes/


While running my first business, GA was not really useful (we used internal tools that were easier to integrate into the code and adapt to our needs).

However, GA data showed its usefulness when selling the business. The data was considered a trusted source of information for the buyer. And all the definitions (unique user, etc.) were aligned with the buyer's, so it was easier for them to assess the metrics.


I tried Matomo (Piwik) recently, but I only do log analysis and it doesn't really treat log access as a first class citizen. If you use Javascript tracking, it's probably the right way to go.

I switched back to AWStats for my personal stuff. It's probably too basic for business or company apps, but for your personal stuff without javascript/cookies, it's still a great analytics tool.
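
For anyone curious what log-only analytics looks like at its most basic, here is a minimal Python sketch that counts successful page views per path. It assumes access logs in the common/combined log format and is in no way how AWStats works internally; the regular expression, function name and sample lines are made up for illustration.

```python
import re
from collections import Counter

# Matches the start of a common/combined log format line (an assumed format).
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)


def count_page_views(lines):
    """Count successful GET requests per path, ignoring lines that don't parse."""
    views = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match and match.group("method") == "GET" and match.group("status") == "200":
            views[match.group("path")] += 1
    return views


# Sample lines invented for the example.
sample = [
    '203.0.113.7 - - [22/Jan/2020:10:00:00 +0000] "GET /foo HTTP/1.1" 200 512',
    '203.0.113.8 - - [22/Jan/2020:10:00:05 +0000] "GET /foo HTTP/1.1" 200 512',
    '203.0.113.7 - - [22/Jan/2020:10:00:09 +0000] "GET /missing HTTP/1.1" 404 128',
]
print(count_page_views(sample).most_common())
```

No JavaScript or cookies are involved; everything comes from requests the server already records.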


I have created a simple workflow using AWS Lambda + Kinesis + S3 to track our customers and not to have any 3rd party dependency. It took roughly 2 weeks, but it is worth it since we do not leak customer data and we have much tighter control over what we collect (no PII except the source IP, which gets hashed in the process).
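
The comment doesn't say how the hashing is done, so the following is only a sketch of one way to do it in Python: a keyed hash (HMAC-SHA256) of the source IP. The key and the function name are assumptions for the example. A keyed hash is used here because a plain unkeyed hash of an IPv4 address can be reversed by simply trying all 2^32 addresses.

```python
import hashlib
import hmac

# Assumption: a secret key stored separately from the analytics data,
# e.g. loaded from configuration. This placeholder is not a real key.
SECRET_KEY = b"replace-with-a-random-secret"


def hash_ip(ip: str, key: bytes = SECRET_KEY) -> str:
    """Return a hex digest that identifies the IP without storing the IP itself."""
    return hmac.new(key, ip.encode("utf-8"), hashlib.sha256).hexdigest()


# The same IP always maps to the same digest, so repeat visits can still be counted.
print(hash_ip("203.0.113.7") == hash_ip("203.0.113.7"))
```

Whether a hashed IP still counts as personal data is a legal question this sketch does not answer.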


FYI, if your setup relies on API Gateway, you probably could use VTL / Mapping Templates to directly send from API Gateway to Kinesis and skip the lambda altogether, like some do for dynamodb

See https://hackernoon.com/serverless-and-lambdaless-scalable-cr... and https://aws.amazon.com/blogs/compute/using-amazon-api-gatewa...


Woo thanks! I did not know it. I might re-architect the workflow to have this.


> I have created a simple workflow using AWS Lambda + Kinesis + S3 to track our customers and not to have any 3rd party dependency.

Except for each of the 3 components you listed that make up your system. They are 3rd party dependencies.


Everything is a 3rd party dependency then. The only way to not have a 3rd party dependency is to build your own infrastructure and use open source solutions (and even with OS you're still dependent).

I think OP was clearly referring to a self-managed solution as opposed to a set of 3rd party services like GA, Segment, etc., where the flow of data is out of your control.


I meant no 3rd party dependency for storing customer data that requires extra legal work in GDPR land. Maybe we need to include AWS in that, though. I need to look into how cloud vendors count as 3rd parties in that sense. Is there a difference between Google Analytics and storing data on S3 even if we do not collect PII?



Why not spin up your own? This one comes with a lot of tools out of the box and can also run personalization techniques and more: https://harvest.graindata.com/en/store


A few weeks ago my blood needed a checkup. They sent me the results by mail. The results were on a non-password-protected but 'unguessable' URL. And the page of course contained Google Analytics. I'm in the EU, so I wonder if this is legal.


If they haven’t notified you, and hence you can’t/didn’t comply, it probably isn’t. Medical companies especially are scrutinized for following GDPR. You could make a case here to either the company's privacy officer or your country's privacy watchdog.

https://gdpr-info.eu/art-39-gdpr/


You really have 2 choices:

Are you relying only on data you can get from your app? There is no reason not to build your own solution.

Are you relying on data you can't get from your app/website? Then you can only use GA, since FB does not have a service like this.


The main issue is that competition is basically cut out due to the free pricing of GA.

Very few businesses/people would choose to pay for something when GA is free. Why do that? To tell your customers "we value your privacy"?


Don’t you think that, in today’s climate, customer knowledge and involvement about online tracking, fingerprinting, filter bubbles etc. is at an all time high? And that makes this the best time ever to indeed be proud to tell your customers that “we value your privacy”.

It’s one of Apple’s strongest marketing pillars.


Related: the discussion about Goatcounter https://news.ycombinator.com/item?id=22044854


I'm currently building a free analytics service that's the fastest. Ever. Faster than Fathom, Simple Analytics, pretty much everything except Google Analytics you can think of.

https://sdan.io/pingpong

Still building it, but you can sign up for when it launches here: https://forms.gle/MhojBWWfdiWjZatC7 (I know it's ironically on google forms and I'll move away soon)



> I'm currently building a free analytics service that's the fastest. Ever.

How do you intend to make money from this free service?


If you're getting over a million hits I might add some incentive to donate, but mainly I take this as my payback to the developer community. I'm launching another product in a different domain at the moment and hoping that can compensate.

At the end of the day, if anything goes wrong, I'll always be happy to open source the whole thing.


I have used simpleanalytics for a while. It offers a lot less granular information because it deliberately collects less for privacy reasons.


The problem is GA is orders of magnitude better than the competition for price to performance ratio.


Is there a free alternative that has feature parity with GA and isn't very difficult to set up?


Yeah right, stop using GA and start using Amazon analytics. Very good suggestion.


And Firebase analytics is much the same as GA, right? In fact I think it’s even viewed on the GA portal.


Connecting Firebase to Google Analytics is optional, for now.


> And if you think that's okay, you should take your head out of the sand because consumers are demanding it. Please tell me how many of your users like the large cookie agreement popups that they have to dismiss...I-I mean read and accept just to consume your content. Agreements that you're forced to have them agree to because you're using cookie-based trackers like GA.

I think that's the heart of why I so despise the GDPR. In an intent to change site behavior, politicians passed a law putting a burden on sites that did an undesirable thing (rather than, say, making the undesirable thing itself illegal).

Perhaps they thought sites would avoid the burden.

Did they not anticipate full shifting of burden onto end-users? Because being able to know how a site is used is extremely valuable to the site's owners.


Anybody ever heard of server logs?



