Hacker News
How many users block Google analytics? (2017) (wesleyac.com)
265 points by luu 30 days ago | 234 comments



One of the corollaries to this is that if you are basing your decisions on analytics, you are probably under-representing people who block them, who often tend to be a non-representative subset of your users. I have seen multiple projects do things like stop supporting features that their analytics “showed nobody was using” but failed to realize that a significant portion of their (often technical and very vocal) userbase did in fact use the feature.

Oh, and by the way, responding to these kinds of issues with “if you wanted your voice to be heard, you should have turned on analytics” is inexcusable.


War story: we turned off the “print” button on MSN because it was unused. Users in Japan complained: they were printing articles to read on the train. We turned it back on in that geo.


Why not turn it back on for everyone? If you have to maintain the feature anyway, might as well offer it.


page load times go up


Unless you’re doing something silly a button can’t be more than a hundred bytes or so. That’s only significant on a 90s era cell phone.

Then again most browsers have a print button.


They also won't have to include the print stylesheet (https://developer.mozilla.org/en-US/docs/Web/Guide/Printing#...)


Can’t you just load that when they click it?


1 9 9 5


Back in 1995 all websites shared one global stylesheet.

Every time a new rule was added, you had to wait up to 24 hours for the css servers to propagate.


Yes, the browser has a print function, although you may need print-specific CSS rather than your screen CSS if you want a good result. (If you do not use any CSS at all, which is not so common these days, then you probably don't need print-specific CSS either.)


There is also bloat from the UI perspective. If you keep adding features they will get harder for users to discover.

You have to delete something every once in a while to make room for something new.


I'm glad you reposted your comment.

If you want to read the data behind many of the decisions, read these papers [0].

[0] https://exp-platform.com


That page has 30 papers so it is basically a (hopefully unintentional) "go fuck yourself".


you're right. I was responding to the initial, deleted comment.


Additionally, that page tripped Canvas Blocker a half dozen times, so I intentionally told it where to go.


Probably faxed them to their friends and family too. I sympathize - paper is really nice


Are adblockers very prevalent in Japan?


Dunno. I haven't worked on a Microsoft property since 2009.


What year was this?


2007? 2008?


Wow, thanks for answering! More recent than I expected!


Sure. If you want to read more, search for ["Jim Pierson" Microsoft "page load time"]. A lot of his work informed what we did at MSN.


When I turned off my Windows 10 Telemetry, I expressed a clear choice that I'd rather not give my data, and it's fine for my voice to not be heard. I just really don't care about Windows 10 that much. On the other hand, I turned on all the telemetry for OctoPi, Steam, and a bunch of other software I care about. I want my voice to be heard for those cases.

Just curious, what's so inexcusable about that?


How I use a software is not an indication of how I want to use a software. It’s what I’m doing with the options the software has given. If I’m really lucky, I’ll be like everyone else, and an inefficiency in the software will be spotted and fixed so the software will be slightly easier to use. In practice, I’m just doing workarounds to accomplish things the telemetry won’t explain on its own; and even if other users would want to do these things, I’m one of very few actually doing them. So... I don’t really want to trade potentially a lot of personal information just for... I don’t know... to make it easier for developers to develop for the lowest common denominator rather than me?


"How I use a software is not an indication of how I want to use a software."

Exactly. The ergonomics of much software is pretty damn terrible with users struggling to get done what they want to do. Thus, developers who rely on telemetry only cement-in bad problems.

It's for this and privacy reasons I always turn telemetry off.


It's not even the options the software has given. It's the options you have awareness and recall of being given.

It's very valuable feedback to hear, "wait, it does that?" Yeah, we spend a quarter million dollars on a feature that our users don't even know exists.


"How I use a software is not an indication of how I want to use a software."

Sure it is. It's not a perfect indicator all the time, but can definitely pinpoint a common workaround users perform due to a lacking or unfriendly interface. It's just a helpful data point though, not a silver bullet.

I'm with you on the trust issues though when 90% of software companies abuse our data. Just another reason why we can't have nice things.


It really sounds like you’ve never worked on a product team that actually used analytics tbh.

> In practice, I’m just doing workarounds to accomplish things the telemetry won’t explain on its own

This is one of the most obvious things people are looking for in analytics. People will always use your product in unpredictable ways to get whatever functionality it is that they really wanted. This is one of the things product managers are most interested in knowing about because they either want to properly implement that functionality, or they already have it and they want to know if they’re doing a bad job of getting their customers to adopt it.

It’s incredibly unlikely that you’re such an ultimate power user that what you want out of the product is so unique that you’re inventing your own usage patterns that others aren’t also following.

I completely agree that you should have the right to not share your information with anybody you choose. But if you choose not to share any information about your use case with the company that makes the product, then you don’t have much right to complain about it not doing what you want it to.


> It really sounds like you’ve never worked on a product team that actually used analytics tbh

Please don't cross into personal attack. You don't need to, it poisons discussion, and it evokes worse from others. Your post would be fine without that bit.

https://news.ycombinator.com/newsguidelines.html


Analytics driven development is like flying a plane on instruments only with all the windows covered up. I mean, sure, you'll get there, but compared to flying normally you are missing out on so much.

Watching individual user sessions from session recordings provides more depth, but you are still missing out on great swathes of information.

Analytics and session recordings don't give anything near as much concrete data as taking the time to engage with your users. You then use analytics to confirm behaviour.


> sure, you'll get there

Hum... In real life you would quickly discover how vulnerable all of your instruments are to both random and ill-intentioned interference.

You can apply the same phrase as a metaphor for software.


> Hum... In real life you would quickly discover how vulnerable all of your instruments are to both random and ill-intentioned interference.

Well no, commercial pilots have to be rated IFR. Flying via instruments only is completely normal and part of the training. Visual is perhaps easier for many people, but you lose the use of visual navigation in all kinds of circumstances, like at night, over the ocean, or in a storm.


Analytics driven development is a strawman. It’s simply one of many factors that can influence development decisions. Depending on the particular circumstances, it may be incredibly valuable, or not valuable at all.


> But if you choose not to share any information about your use case with the company that makes the product, then you don’t have much right to complain about it not doing what you want it to.

I generally disable the analytics in software I buy, but I do participate in surveys they send my way. Would that count for you?

Forums can also be a rich source of use case info for software developers who really care.


I haven't worked on a product that used analytics to drive feature development.

What you wrote here sounds very naive to me.

I might be completely wrong, but i assume if you have a large enough user base you would use telemetry only in an aggregated way, and then prioritize the majority of the user base. So I might totally be underrepresented but still have the same loss of privacy as everyone else.

I also feel i always have the right to complain about a product, publicly and vocally (it's just a different channel for analytics/telemetry, imo). Much like product owners have the right to just change it or remove features that break my workflow.


What I have written here is based 100% on working with product teams who use analytics very competently to influence feature development (which is almost never driven by one single factor).

> I might be completely wrong, but i assume if you have a large enough user base you would use telemetry only in an aggregated way, and then prioritize the majority of the user base.

In some cases yes, but you need to get that aggregate data from somewhere. It’s also naive (and probably a bit arrogant) to assume that your requirements aren’t in fact well aligned with the majority of the user base.

This is however a somewhat immature way of using analytics data. There’s plenty of incredibly valuable data you can get from peculiar and unpredicted usage patterns. Any product manager will tell you that most customers are quite bad at expressing what they actually want. A good product manager can gather a fair amount of insight from what they choose to complain about, but analysing the most immediately obvious trends really only gets you so far. You’ll often get the most insight from the rare customers who are able to concisely articulate their feedback, and those that are frustrated and ingenious enough to subvert your application to their will.

For any motivated organisation, there are countless ways to get insight out of their analytics. You’ve just described the lowest-hanging fruit.

> I also feel i always have the right to complain about a product

Depending on where you live, you likely have the right to complain about anything you like. But if you want to prevent an organisation from analysing your usage of their service, then complaining about them not meeting the needs of your use case is hypocritical.


It’s also naive (and probably a bit arrogant) to assume that your requirements aren’t in fact well aligned with the majority of the user base.

No. In any high-dimensional space (in this case the dimensions are features used by a given user), the vast majority of the elements will be outliers in at least one dimension. Everyone has idiosyncrasies. It is arrogant to assume that everyone is the same as oneself.
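To put a rough number on that argument, here is a minimal sketch. It assumes each feature dimension independently gives a user the same chance p of being an outlier; both the independence and the 5% figure are illustrative assumptions, not measurements from any product.

```python
def prob_outlier_somewhere(p: float, dims: int) -> float:
    """Chance of being an outlier in at least one of `dims` independent dimensions.

    p is the (assumed) chance of being an outlier in any single dimension.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    if dims < 0:
        raise ValueError("dims must be non-negative")
    # Complement of "typical in every single dimension".
    return 1.0 - (1.0 - p) ** dims


for dims in (1, 10, 50, 100):
    print(dims, round(prob_outlier_somewhere(0.05, dims), 3))
```

Under those assumptions, roughly 40% of users are outliers on at least one of 10 features, and over 99% on at least one of 100.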


Because telemetry, by default, subverts consent of what I’d like to share: I may care about feature X and would like you to know I am using it, but I may not trust that you’re not also logging me when I do sensitive operation Y that I might not want to share (or even want to share with you that I used feature X at 10 PM on Friday). Perhaps it would be better if programs let you review telemetry data before you sent it, but I am unsure if anyone will actually do this or if people can meaningfully take the time to essentially review and opt out of things they don’t want to share.


I disagree because telemetry collects _your_ data. This means, in an ideal world, if you are pissed off enough about the company collecting it, you would have a chance to alter it and mess up their metrics. Your data => you can, and have the right to, alter it.

The problem is that those companies view it as their data, which you have no right to modify, let alone choose whether to transmit at all.

Before anyone replies this would invalidate telemetry, ask yourself this question: "If you see a spike in data that makes no sense, isn't that an indication that somehow your data collection policies are pissing off people?"


> This means, in an ideal world, if you are pissed off enough about the company collecting it, you would have a chance to alter it and mess up their metrics.

I really don't have the time or malice to do this. I can and do move on to something else.


Customers should not have to surrender their information to be accounted for by the business they pay.

Giving a business your data is an act of charity. It should never be an expectation.

Yes, it costs more to collect the data manually. It usually costs more to behave ethically when you aren't legally required to.


I get where you're coming from, and this is a problem, but this isn't a practical or pragmatic stance. Typical web/app tracking is orders of magnitude cheaper than research.

> Yes, it costs more to collect the data manually.

Not just that, but you also get data biased to people willing to spend two hours answering questions for a $25 Amazon gift card.


> Not just that, but you also get data biased to people willing to spend two hours answering questions for a $25 Amazon gift card.

Sure, but the whole point of this article was that the telemetry data is also biased.

There is no magic bullet. Your options are to get biased data the cheap way or the ethical way. Either way, the data is biased and you have to consider that when making decisions.


> Typical web/app tracking is orders of magnitude cheaper than research.

It isn't just about cost; there's ethics too, and I think it should be "first, do no harm", as in, first, be ethical.

Btw, opt-in metrics are ethical and cheap afaik. As for biased? all metrics are biased...


It's not charity, I allow them to collect my data so that they can use it to improve the service.


Then you are being charitable.


No, since I'm not giving my data for free. I expect service improvement as a return.


You're already paying for the service. The service improves because the business providing it has to stay competitive to keep getting paid.

A business is not obligated to improve a service just because you donated your telemetry data.


Yes and in addition I also pay with my data. I didn't simply donate my data. I expect the company to use my data and my money to improve the service. Even with money, the company is not obligated to improve service just because you pay.

If they stop improving, I'll stop paying them as well as stop giving them my data.


> Giving a business your data is an act of charity. It should never be an expectation.

Alternatively the business could pay us to do that but obviously they resist the idea.


Most web telemetry packages are extremely creepy, either by sharing all the data with tons of third parties, or by slurping up insane details to try to specifically fingerprint each user. Trying to tell users that they have to opt in to that sort of tracking for their voice to be heard is inexcusable.

Not all software telemetry is that creepy, but things like Google Analytics definitely are.


Windows 10 telemetry is pretty scary. I used to do the Windows Insider program, but noticed that they want to collect every website you visit, which seemed a little much to me. Now every week or so Windows complains that I'm not getting insider builds anymore and asks me to reenable the telemetry. The hilarious thing is that I still am getting those builds; when my computer "blue screens" it's green and claims to be a prerelease build. Makes me wonder if they really turned off telemetry.


Office telemetry is worse. There are many cases where your data goes up to Microsoft.


There are many cases where your DOCUMENT is uploaded to Microsoft.

Much clearer.


I haven't heard of this, do you have any references?


It’s hard to find now with O365, but the process that reported bugs to Microsoft post crash would always do it by default.

It’s not as big of a deal now that most users upload everything anyway.


What's your build number? You may still be on an Insider build and since a new "stable" build hasn't been released, Windows had nothing to upgrade to.


18363.657


Yeah, exactly. "Inexcusable" seems a bit extreme.

Are you suggesting companies spend 10x on manual user research? Are you willing to pay more for a product because of increased costs to understand what users actually want?


Yes, I expect them to do manual user research. And “making the users pay” is the wrong way to frame this - product pricing and labor aren’t that tightly correlated.

Besides, when we bought magazines, the editors didn’t know what stories people were reading or what words you were spending more time reading. When you buy a tool, the manufacturer doesn’t know how you use it. Why software gets a free pass at getting all that data for free without asking is beyond me. What I expect is for regulations to eventually hit this industry.


> wrong way to frame this

this is the reality of how businesses operate.

> product pricing and labor aren’t that tightly correlated.

If you're the staff accountant, sure. You're technically correct.

If you're the CFO and I'm trying to convince you we need to spend $1M on user research instead of $100K, you can be sure this is taken into account when modeling monetization strategies to recoup R&D costs.


> this is the reality of how businesses operate.

It's the reality of how a subset of industry operates, which doesn't make it less wrong.


Call me stuffy, but somehow I think that, for example, MS Office managed to have more user-friendly UX (Clippy and performance aside) back when they had 0 telemetry.


>Are you suggesting companies spend 10x on manual user research? Are you willing to pay more for a product because of increased costs to understand what users actually want?

Move your analytics to the server and stop tracking my every mouse move and page scroll. Needing to know that stuff suggests you're either creepy or that your organisation is saturated with marketing and SEO wonks.

User research and analytics are two entirely different things. Your product will suffer if you neglect the first.


> Are you suggesting companies spend 10x on manual user research?

if they want to make their product better, that's what they should do, instead of drawing wrong conclusions from the limited data that telemetry gives you.

> Are you willing to pay more for a product because of increased costs to understand what users actually want?

It's your business. Add telemetry, and have it turned off by a significant number of your users, who might not be representative of your user base.

So you go ahead and optimize for a biased subset, maybe even interpret some of that data wrong, and before you know it, four iterations later, some fancy startup is stealing the show, because they simply read all the detailed complaints about your software on reddit and HN instead of jacking off to analytics data.


I wish there was a simple one-button toggle to turn it off completely, but there's not.


Try hosting your own dnscrypt-proxy in combination with https://github.com/notracking/hosts-blocklists. That will turn off most trackers on your entire network.


Some MS apps will explode if they can't send telemetry. I tried it with MS Teams and it just kept bloating memory until it had to be killed. A mitmproxy config worked there to fake a 200 response to the telemetry endpoint and keep memory down.


Serial numbers filed off: an author announced on his newsletter that he was going to turn on the "feature" of his mail service that would automatically unsubscribe anyone who didn't read the newsletter.

It tracked this, of course, by assuming that the mail would be opened in a web browser that would make requests for images.

He cancelled that decision:

"Logged opens for each newsletter are between 53% and 60% -- but an experiment a while back revealed that hundreds of you aren't logged, for various reasons around email security and that one guy who reads everything on a 1980's greenscreen monitor. Clicking of links depends massively on what's in the newsletter, plus the note in the previous sentence, but has gone as high as 41% of readers on one week this year."


> Oh, and by the way, responding to these kinds of issues with “if you wanted your voice to be heard, you should have turned on analytics” is inexcusable.

Some folks seem to think analytics is the only way to get feedback from users.

Just ask me. Survey, email me directly, anything but siphoning all my data off to Google in the process.


"Oh, and by the way, responding to these kinds of issues with “if you wanted your voice to be heard, you should have turned on analytics” is inexcusable."

Authentic question: how would a business know this otherwise in an actionable and effective way?


Server-side analytics will tell you in most cases, and if they don’t, they’re easy to add in a way that makes them unlikely to be blocked?


Server-side analytics are overwhelmed by bots, crawlers, and other non-humans, in my experience.


Maybe we should be whitelisting for analytics. Instead of perpetually trying to identify bots, pick the 20% of your users who you are pretty sure are actually humans. Nielsen-style, if you will.

Maybe two cohorts. The people we're really sure about, and the ones we're pretty sure about. If they diverge, ask why.


> Instead of perpetually trying to identify bots, pick the 20% of your users who you are pretty sure are actually humans.

Note that reCAPTCHA thinks you’re a bot if you block analytics…


Not that that matters to me since I have (re)captcha blocked, too...


Blocking reCAPTCHA has lost me hours of work, repeatedly. And I still do it.

The Textarea Cache Firefox extension https://addons.mozilla.org/en-US/firefox/addon/textarea-cach... is incredibly useful, but I keep forgetting to install it when I get to a new machine. I forget only once.


In general it's quite easy to filter out bots and crawlers from your basic access logs, as most bots and crawlers will identify themselves as such.

If you're running anything with an API, then unless something's horribly wrong it's even easier: look at the number of requests being made to an API endpoint and spot check a few of the user identifiers (tokens, keys, whatever you're using) to see the variety of users.

All of this is assuming you're trying to merely investigate the volume of use of a feature, not trying to diagnose demographics. If you're trying to extract more fine-grained detail, I don't have as many answers; I hope others will chime in with constructive ways to get things like geographic demographics via server logs.
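As a minimal sketch of the first approach, the snippet below counts hits per path from access-log lines while skipping user agents that identify themselves as bots. It assumes the common "combined" log format; the marker list is illustrative only, and it does nothing about bots that disguise themselves as browsers.

```python
import re
from collections import Counter

# Combined Log Format: the user agent is the last quoted field on the line.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"$'
)

# Illustrative substrings only; a real deployment would need a longer list.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")


def is_self_identified_bot(agent: str) -> bool:
    """True if the user agent string announces itself as a bot or crawler."""
    agent = agent.lower()
    return any(marker in agent for marker in BOT_MARKERS)


def count_human_hits(lines):
    """Count hits per path, ignoring unparseable lines and self-identified bots."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None or is_self_identified_bot(match.group("agent")):
            continue
        hits[match.group("path")] += 1
    return hits
```

Calling `count_human_hits(open("access.log"))` (the file name is hypothetical) gives a per-path tally that can be compared against whatever the client-side analytics report for the same feature.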


A very sizable portion of bot traffic does not identify itself as such. I don’t know if it’s a majority now, but it could be.


Many bots and crawlers are designed to be indistinguishable from humans.


That sounds like a server-side problem, not a client-side one. Don't expect me to solve it for you at my end.


Just ask people!


People are bad at analyzing their own behavior.

I don't use Google Analytics, but I have seen time and again vocal users who seem ignorant of their own usage of an application... and very much not representative of the majority.


People are bad at analysing metrics, too. You make a change and users spend half as much time on a page. Did you make the page twice as easy to use, or so bad that they gave up?


I honestly think that a simple “we are considering removing feature X, please let us know if you’re still using this” can work extremely well. Everyone who cares about the feature will be sure to write in.


How? In surveys? No one does those.

Plus, people often don't give valuable feedback when asked questions about features they want and use... people are poor judges of what they actually want, and will list things they think they care about and then end up never using or which don't affect their choices.


In surveys? No one does those.

We do. We have real customers do real surveys of our real web sites. In person. We even do shadowing to see how real people use our sites in their daily work.

It's uncommon in SV due to laziness and an unwillingness to talk to actual human beings. Which is dumb because there are companies that will handle this for you.

But SV is stuck in this mindset that everything can be solved by an algorithm. It can't. The tech echo chamber really needs to get over itself.


How? In surveys? No one does those.

Our customers certainly do. We get excellent results from asking a few simple questions now and then, providing both a good source of actionable feedback on feature requests and any current problems, and often some encouraging comments that reassure us we are basically doing things that our customers like.

It doesn't just have to be surveys with lots of participants, though. For example, we've known for decades that a simple observational study with just a handful of people is often enough to identify most of the serious usability problems with an interface.

The idea that everything important must be reduced to automated analytics and number-crunching is a very strange disease. Even if the numbers don't lie -- and as we see here, that is far from guaranteed -- you still need to be asking the right questions and comparing useful alternatives for the results to be valuable.


"...someone recently explained to me how great it is that, instead of using data to make decisions, we use political connections, and that the idea of making decisions based on data is a myth anyway; no one does that." -- from https://danluu.com/wat/

Sorry, your comment just reminded me of that. Are surveys perfect? No, but they have their uses, and plenty of companies find real value by making use of them.


Ask thousands of people?


Or have a company do it for you. There are plenty of them out there.


You ask a sample.


I don't understand why first-party analytics and JavaScript aren't more popular, but I suppose it's a matter of cost in the serverless and freemium world. But for VC stuff... There are plenty of pre-built analytics kits out there that can run from your site and you would have none of these problems.


Is there something that will integrate with nginx or Rails? I am having a hard time finding these.


Matomo can easily share nginx with whatever site you are running. It's PHP/MySQL, which these days, for a low-impact site, will run perfectly well out of the box, no tuning required.


https://matomo.org/faq/how-to-install/faq_116/

Just searched "nginx 1st party analytics"


A/B testing is where beloved products go to die.

You can't just follow analytics. You have to understand your users.


Any specific stories?


Depends on the service, but if they are ad supported, people who block analytics probably also block ads, so they don't really have any incentive to cater to them.


> “if you wanted your voice to be heard, you should have turned on analytics”

It's true though. If you want to be recognized, you can't be incognito. That's like refusing to vote and then complaining about politics.


If I'm a product manager, and I cut a feature based on inaccurate analytics and lose paying customers because of it, blaming the customers ain't going to bring back the money I was making off them.

Chrome's analytics ought to say nobody uses Incognito Mode, but they'd be dumb to remove it.


I don't usually get to pick and choose what I share when you run analytics. For example, I might be fine with telling you that I saw a certain article on Hacker News; I however might not be OK with you seeing all the articles I click on.


But the vote is secret. Analytics is not.


Analytics should be one of your sources. Server logs are another. I know of a company that has a WhatsApp group for people who've contacted them via support, and one more for "fans". Each of these brings novel perspectives, and it's up to the org to extract signal out of them.


They should have analyzed their logs.


Not everything is visible in normal server side logs, especially if they have a lot of client side actions.


Then create a custom endpoint that records those client side interactions in a database.
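A minimal sketch of what such a first-party endpoint could look like, using only the Python standard library. The `/events` path, the table layout, and the event names are all hypothetical, and it deliberately stores nothing but an allow-listed event name and a timestamp.

```python
import json
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical event names; anything not listed here is rejected.
ALLOWED_EVENTS = {"print_clicked", "share_clicked"}


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the events table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "name TEXT NOT NULL, "
        "recorded_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def record_event(conn: sqlite3.Connection, raw_body: bytes) -> bool:
    """Store one event; return False for malformed or non-allow-listed input."""
    try:
        payload = json.loads(raw_body)
    except ValueError:  # covers invalid JSON and invalid UTF-8
        return False
    name = payload.get("event") if isinstance(payload, dict) else None
    if not isinstance(name, str) or name not in ALLOWED_EVENTS:
        return False
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO events (name) VALUES (?)", (name,))
    return True


def make_handler(conn: sqlite3.Connection):
    """Build a request handler class bound to the given database connection."""

    class EventHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/events":
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", 0))
            ok = record_event(conn, self.rfile.read(length))
            self.send_response(204 if ok else 400)
            self.end_headers()

    return EventHandler
```

To try it locally, something like `HTTPServer(("127.0.0.1", 8000), make_handler(open_db("events.db"))).serve_forever()` would start it, and the page would POST `{"event": "print_clicked"}` to `/events`. Because the endpoint lives on your own domain and records no user identifier, it is less likely to be blocked and easier to describe honestly to users.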


So... client side analytics? Like Google Analytics?


Obviously not a third party like Google, especially not Google.

Inform your users honestly and do not hide things. Make promises about tracking data usage and stick to them. Tell them the truth and give them a choice. Stop acting like you got something to hide and do your tracking yourself.


I run a non-tech eCommerce site in North America that does $6-8m a year and I tested two 90 day periods year over year and found 12.4% of transactions didn’t show up in Google Analytics. I haven’t segmented them by geo yet but about 75% of my sales are in North America. There was only a 0.1% difference year over year.


This is very similar to my experience. I recently measured the difference on my site as 10%-15% (also a non-tech site, though sadly not at $8m a year yet!)


This test compares:

1. Number of visitors as recorded in Google Analytics

2. Number of loads of a 1x1 pixel served on a different domain

They see higher numbers for (2) than (1), and attribute the difference to users blocking Google Analytics.
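The arithmetic behind that comparison can be sketched as follows. The function name and the counts in the example are hypothetical, not figures from the article.

```python
def estimated_block_rate(pixel_loads: int, ga_visitors: int) -> float:
    """Share of pixel loads that have no matching Google Analytics visit.

    pixel_loads: hits on the 1x1 pixel served from a different domain.
    ga_visitors: visitors reported by Google Analytics for the same period.
    """
    if pixel_loads <= 0:
        raise ValueError("pixel_loads must be positive")
    # If GA somehow reports more than the pixel, treat the gap as zero.
    missing = max(pixel_loads - ga_visitors, 0)
    return missing / pixel_loads


# Hypothetical counts for illustration.
print(f"{estimated_block_rate(1000, 876):.1%}")  # 12.4%
```

Anything that fetches the pixel without running the analytics script counts as "blocking" in this estimate, so the result is only as good as the filtering applied to both inputs.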

I don't see them describing how they excluded bot traffic, however, and for my sites the majority of hits I get are from bots. Only some bots run JS, so I suspect their numbers for blocking users are thoroughly diluted by these bots.

(Disclosure: I work for Google, speaking only for myself)


Only some bots execute JS, but even fewer bots fetch images.


It's not just two numbers of total hits the article is comparing.

The author extracted the browser information from the server logs (presumably from the User-Agent header). If they were able to do this, I'd assume they also filtered out bots from the tally :)


I'd expect serious bots to masquerade as whatever the latest chrome user agent is.


Then you would expect for Chrome to show a higher number of GA blocking "users". But that's not the case: the article mentions that the percentage of users blocking GA on Chrome is on par with Safari.

And I don't know what you mean by "serious". The most common crawlers (Google, Baidu, Yandex, etc) identify themselves as bots in the User-Agent very clearly. Personally, those are the ones that I'd call the most "serious". And also the ones which I've seen generating the most traffic on servers.


The net is full of unidentified bots scraping content or looking for vulnerabilities (contact forms, WordPress logins, etc). On many occasions I had traffic issues and had to check logs, and these were very hard to block, because they ignore the robots file, don’t advertise themselves in the User-Agent, and use a large pool of IPs.


I don't know why this comment is downvoted, it mirrors my experience. I'm responsible for a few domestic high-traffic websites and have done some analysis from log files to find suspicious traffic, i.e. user agents saying they are Chrome but not loading images or CSS files, having many page views (i.e. 50 where our average user has 2) etc. It wasn't foolproof, but the false positives were < 10% in my random checks. These bots made up ~10-15% of page views.
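A rough sketch of that kind of heuristic, assuming sessions have already been aggregated from the logs. The `Session` fields and the threshold of 50 page views are hypothetical choices loosely based on the numbers above, not the commenter's actual rules.

```python
from dataclasses import dataclass


@dataclass
class Session:
    user_agent: str
    page_views: int
    asset_requests: int  # images and CSS files fetched during the session


def looks_like_disguised_bot(session: Session, max_page_views: int = 50) -> bool:
    """Flag sessions that claim to be Chrome but do not behave like a browser."""
    claims_chrome = "Chrome/" in session.user_agent
    # A real browser normally fetches at least some images or stylesheets.
    skipped_assets = session.page_views > 0 and session.asset_requests == 0
    too_many_pages = session.page_views >= max_page_views
    return claims_chrome and (skipped_assets or too_many_pages)
```

Expect false positives from a rule like this (for example, returning visitors with everything cached), so spot-checking flagged sessions by hand is still worthwhile.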


I meant bots with sufficiently sophisticated adversarial motives (ad fraud? blog comment spam? automated wordpress exploitation?), who I'd expect want to avoid being recognized as such.


Ah, I see what you mean. Those are serious-ly malicious bots then! But yeah, completely agreed; those can be a PITA on sites with user-generated content.

But, is your experience that these kinds of bots cause much traffic? Because, from what I've seen, they can make a mess with fake accounts, fake content, fake clicks, etc, but as far as traffic goes, they were completely dwarfed by search engine crawlers and real users' traffic.

Thanks for the clarification :D


Mhm, we have a bunch of non-crawled content that sees a significant minority of request volume from disguised bots. Overall, it is definitely dwarfed by traffic from real users, but it still forces a lot of work to prevent the bots from gaming metrics/analytics.


Yeah, the numbers in the article are so far off my intuition that I'm happy to latch onto any explanation for why they're weird. Being unable to effectively discount bots seems likely.


I'm reconsidering my intuition in the face of the fact that the sample is from OP's blog and not a customer-facing business.


How do you identify bots on your personal sites?


++ for checking that you were using the correct pronoun for the author!


I didn't check; I just use "they" when I don't know


I run a geek-tech blog in Hebrew. I used analytics for a while in the past, but the numbers were completely off. There were posts with almost more comments than visitors based on the analytics.

Took me a while to realize that most of my readers block analytics since they're super privacy-savvy. I shut the tool down; it's useless for some crowds.

Nowadays I also doubt if it's ethical for any crowd.


What's interesting to me about the number of people who block Google analytics, is the number of people working with those analytics in product management, marketing or SEO, that are apparently unaware of anyone being able to block Google Analytics.


Everyone I know who works in digital marketing (a lot of people) use ad block on their personal devices.


I often wonder whether Page and Brin install an ad blocker.


I wouldn’t be surprised if it was mandated by company policy at google and Facebook.

Any company that does machine images for employees should ship them with adblockers preinstalled. There are zero downsides.


The Timberland website used to block people from adding any items to the shopping basket if they had Google Analytics blocked... and that's when I switched brands.


How would they? Their analytics don't show it.


I think very few people who block Google Analytics are actually aware that they do it. They install an adblock to block annoying ads and enable privacy options to block scary trackers because they read somewhere that if they don’t do it they can be hacked and have their nudes leaked. Those are stuff done by shady companies and bad guys. Google is not one of those, it’s a respectable company. Only paranoid nerds who use that weird duck site go out of their way to block Google.


This is why I have analytics on the JavaScript side and server side. I can calculate those who block my analytics and at the same time capture a lot of the relevant information


Subverting the desires of people who would like privacy is not a good look.


I mean, on the one hand I can see that view point —

On the other hand, that’s a fairly entitled viewpoint.

Users are using infrastructure I finance, to do things on the website(s) I created. If you want privacy from the services you’re using, create or host your own.

I keep the data secure and don’t give it to 3rd parties. That information is used to fix bugs, improve / build services, etc.

I’m not “subverting” privacy, users are coming to my house and playing with my things, so to speak.


Yeah, at some point we're going to have to address that (internet) communication is a two way street.

Until then, your HTML is interpreted by my browser, any resources are going over my network (and only my network if they're 3rd party) and the JavaScript is running on my machine. So I'll be having them obey my rules.


I’m not sure I understand your point. It’s you who requests those resources, nobody pushes that data towards you. Are we debating adblockers? Because that is a different topic than server side log analysis and I agree that we should be picky about what we process (but it’s a thin line, you are using resources from the owner of that service though).


So a cryptominer won't bother you then.


It’s you who requests those resources,

That does not matter, at least in the EU where citizens are protected by the GDPR. Article 6 of the GDPR:

Processing shall be lawful only if and to the extent that at least one of the following applies:

(a) the data subject has given consent to the processing of his or her personal data for one or more specific purposes;


You are now changing the argument from a moral one to a legal one. The GDPR may or may not proscribe certain things, and it may be a good law or an unjust one.

However, OP does not understand why you are so entitled to want to use their website for free, and then also want to tell them that they are not allowed to make note of you having done so.


You are changing directions with your comment.

"his or her personal data"

Analytics is not personal. No one shares their names or social security number. And I strongly agree with people above mentioning The Visitor is the requester. We provide services, they use it, we want to understand what we are doing by checking analytics. Nothing immoral about this.


You don’t get to redefine ‘personal data’ ;). The GDPR is clear about that:

*‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;*

In the legal opinions that the EU provides with the GDPR, random tokens that are associated with a person are PD. An IP address is also considered personal data:

https://ec.europa.eu/info/law/law-topic/data-protection/refo...

Nothing immoral about this.

What is ethical is a personal opinion, but collecting personal data (following the definition above) is simply not legal without consent in the EU.


IP addresses might not even get associated with analytics events - just used for basic counting of different users before aggregation? At that point you have no right to it.


Item (f) in that same Article 6 can be applied to collect data without explicit consent:

(f) processing is necessary for the purposes of the legitimate interests pursued by the controller or by a third party, except where such interests are overridden by the interests or fundamental rights and freedoms of the data subject which require protection of personal data, in particular where the data subject is a child.


IANAL, but privacy regulators in various countries (but also the GDPR) have been fairly explicit that this cannot be used for blanket collection of data. E.g. GDPR recital 47 states that there must be a reasonable expectation of the data subject that such data is collected, e.g. because the data subject is a client (in the non-technical sense) of the controller. The purpose for the collection should be specified and properly communicated. Also AFAIK all the rights of the data subject are retained. E.g. they can request the data and ask that the data is removed.

General analytics on a website are probably not covered under f, since it is not necessary and not what you’d expect when you visit a website to which you have no customer relation.

There are clear cases where one has to collect data, even without consent, such as fraud prevention.


On the other hand, that’s a fairly entitled viewpoint.

Users are using infrastructure I finance, to do things on the website(s) I created. If you want privacy from the services you’re using, create or host your own.

IANAL, but this is not legal under the GDPR. The GDPR requires opt-in. Moreover, you cannot make non-anonymous[1] data collection mandatory to use a product, unless it is necessary for the product to function (in a strict sense).

[1] Anonymous does not seem to include pseudonyms like a random identifier in the GDPR, since pseudonyms could be linked to real identifiers in the future.


[flagged]


Good luck taking a vacation in the EU or (maybe) having to change planes after getting diverted.

As long as you stay away from the EU, they'll probably not touch you for GDPR.


Sending a User-Agent header unsolicited seems like user error, not bad intent on the part of the site owner.

I do the opposite, my webserver sends back some information unsolicited about my production environment that you can track if you want.


If you don’t want to give your information to a site, don’t use that site? That’s a very different proposition from not wanting to give your information to a third party.


It’s not always clear that a site is logging things server-side.


Yes it is; all sites log.

Not everyone does smart things with the data.


This argument works equally well in both directions: if you don’t want people to modify your site with client-side scripting, don’t serve it to them.


Sure. I’m not seeing the link?


"X is not a good look" should be left in the echo chambers of Twitter. It comes across as smug, because you assume you're an authority on what a "good look" is, that is, what the correct thinking is. But you're doing this while the reader is noting that you failed to express a real argument, so they're not sure you even know what the correct thinking is, let alone why it ought to be correct.


> "X is not a good look" should be left in the echo chambers of Twitter.

I'm not on Twitter…

> It comes across as smug, because you assume you're an authority on what a "good look" is, that is, what the correct thinking is. But you're doing this while the reader is noting that you failed to express a real argument, so they're not sure you even know what the correct thinking is, let alone why it ought to be correct.

I think it's fairly obvious what my argument is: people who are blocking client-side analytics have expressed a clear interest in not being tracked. Saying that you're tracking them anyways on your server shows that you don't really care about their wishes, and as others have mentioned, it might fall afoul of active consent legal requirements. Considering that my comment is currently at the top of this thread, I think it would be fairly difficult to not know what my argument is.


You can have analytics and respect users' privacy. As long as your data are anonymized.


We do ~1.1m sessions per month according to GA; compared to our Cloudflare visitor data, the real number is ~35% higher. I think CF is going to be as accurate as possible, and both say they filter bots out.
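
To turn a "35% higher" figure into a share of sessions that GA never sees, a quick worked calculation helps. This is only a sketch: it assumes the server-side (Cloudflare) count is the true total and that both tools filter bots equally well, which is unlikely to hold exactly. The function name is made up for illustration.

```python
def missing_share(client_side_count, server_side_count):
    """Fraction of server-side sessions that never showed up client-side."""
    if server_side_count <= 0:
        raise ValueError("server_side_count must be positive")
    return 1 - client_side_count / server_side_count


ga_sessions = 1_100_000            # sessions per month reported by GA
cf_sessions = ga_sessions * 1.35   # server-side figure, ~35% higher

print(f"{missing_share(ga_sessions, cf_sessions):.1%}")  # prints 25.9%
```

So a server-side count that is 35% higher corresponds to roughly a quarter of sessions being invisible to GA, not 35% of them.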


I recently got tired of Google Analytics (for a number of reasons) and switched to Fathom Analytics.

Now, Ghostery says I have "zero trackers" on my personal site.

Fathom doesn't collect personal info about my visitors. They just show me my aggregate metrics (popular pages, top referrers) which is all I need anyway.


Unfortunately it's just a matter of time. I have no doubt they will get added to uBlock pretty soon. Most people don't make a difference between tracking anonymously for data aggregation and tracking individual users. Many are paranoid and think that any type of tracking is bad.


It's because it only takes one flip of a switch to go from collecting aggregate data to collecting heavily personalised data.


Do the browser ad/tracker blockers also block Fathom? If I switch, will I see higher numbers than with Google Analytics?


> I was expecting Firefox to be the largest, but it was a bit surprising to me that Safari and Chrome were approximately equal

Don't forget about all those smartasses spoofing their user agent just to make your life even harder.

https://addons.mozilla.org/en-US/firefox/addon/random_user_a...


Other smartasses run sites that block Firefox with an "unsupported browser" message while the site works perfectly fine.


Note that the 0% for Android is because Google banned ad blockers on the Play Store.


Try Firefox mobile with ublock origin. I wish more Android users knew about this.

Sidenote: ads on sites degrade the experience so badly, whenever I browse without adblock I am honestly shocked at how bad it is, and I am amazed others don't seem to care.


I'm confused, on my stock Android phone, I use Firefox with ublock origin. So the Play Store blocking ad blockers cannot be that effective?


Root users like myself use tools like AdAway to manage hosts-file blocking. This will help cover a lot of ads in apps as well. You also have alternative WebView implementations like Bromite (although I wish it were easier to install and worked on Android 10) that can give you ad blocking in apps using web views too.

This is also missing the part where you can run Android without the Play Store using microG and F-Droid + Aurora. It's not a huge number, but there are still a lot of people blocking ads in one form or another--even if it's as simple as using Firefox or Brave for basic browsing.


You don't even need root, you can have an ad blocker as VPN (AdGuard, Blokada).


True! Do those work while simultaneously actually using a VPN?


On the Samsung Galaxy you just install Samsung Internet and from within it you can add AdBlock Plus (and most of the other adblockers). They are delivered through Samsung's apk store.

It saved the mobile surfing experience for me (was using chrome on the mobile before)..


So, is this why Google keeps complaining about Samsung's bloatware?

Because I don't think the amount of bloat on their phones is even unusual. Yet, Google keeps singling them out.


I am certain many jailbroken androids are out there using adblockers. Firefox also has plugins internally blocking ads.

https://addons.mozilla.org/en-US/android/addon/adblock-plus/


You are doing HN readers a disservice by recommending ABP! It is mafia software.

https://news.ycombinator.com/item?id=17639624

https://medium.com/@trybravery/please-stop-using-adblock-but...

https://translate.google.com/translate?sl=auto&tl=en&u=https...

Recommend uBlock Origin instead. "Free. Open source. For users by users. No donations sought." https://github.com/gorhill/uBlock#installation


Please take a look at Blokada: https://f-droid.org/en/packages/org.blokada.alarm/

You'll have to get it from the F-Droid app store, but this doesn't require a rooted phone. It uses a local VPN to block ads at the network level, hence no ads in your apps as well

I've been using this for a month and it works wonderfully


DNS66 from F-Droid works well. It routes all web traffic through a tunnel applying hosts file(s) to block domains associated with ads and other dodgy cruft.
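
For readers unfamiliar with hosts-file blocking, here is a rough Python sketch of how a hosts-format blocklist can be read. It is not DNS66's code; the function name, the set of sink addresses and the sample entries in the usage note are assumptions for illustration.

```python
def parse_hosts_blocklist(text):
    """Collect hostnames that a hosts-format blocklist maps to a sink address."""
    sink_addresses = {"0.0.0.0", "127.0.0.1", "::", "::1"}
    blocked = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        address, *hostnames = line.split()
        if address in sink_addresses:
            # Keep the machine's own loopback name out of the blocklist.
            blocked.update(h.lower() for h in hostnames if h.lower() != "localhost")
    return blocked
```

Feeding it a line such as `0.0.0.0 www.google-analytics.com` yields that hostname in the returned set. Hosts files match exact hostnames only, with no wildcards, which is why these lists tend to be very long.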


Turned it off via PiHole, so my ChromeOS laptops and Androids are covered at home, still trying to get Wireguard set up on the PiHole to allow remote VPN, while suffering double-NAT/CGNAT through my AT&T hotspot as home connection. Turns out it's not a slam-dunk!


> There was also one BSD user in the sample, who blocked google analytics.

There's always one


That one needs to up their anti-fingerprint game. Showing up as BSD narrows your identity group very significantly, and I would wager is enough to identify you uniquely in many cases when correlated with all other information.


FWIW the numbers for my blog are broadly similar, with a larger sample taken when I had a blog post on HN, with about 2x higher unique user numbers from Cloudflare analytics (server side) than Google Analytics.


Is there a free server-side analytics service that’s not a pain to set up? I’m using Netlify, but enabling analytics is 9$/site/month, so I tell myself that GA is just fine.


I like GoAccess: https://goaccess.io/


Side note: looking at the 'live demo' (https://rt.goaccess.io/) I see a very large number of people using MSIE (9%) vs Edge (1%).

I wonder if this is actual data?


Probably is. There was a very long period of time where I used to use Internet Explorer 11 religiously (though obviously Firefox when that was installed), since both Edge and Chrome were the only other options. I only stopped when the lack of ES6 support started causing me major problems.


Cloudflare if you want basic analytics.


Not only advanced users. At least as of 2019 we, for example, have Pi-hole installed LAN-wide at home and Lockdown on all portable devices. I have also recommended them to, or installed them on the devices of, anyone I know, technical or not, and of course my number one recommendation to anyone is to install Firefox and avoid Chrome.


I refuse to implement client-side analytics; for one thing, it is a waste of bandwidth, and for the other thing, it is not how software should be designed, and for the other other thing, it is unethical.

I do log some stuff sent to the server, as well as some stuff about the response (such as the HTTP status code, data size, timing, etc), although I do not sell this data to anyone else, and this logged data is reduced further if the client sends a "DNT:1" header.
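
As an illustration of that last point, a minimal Python sketch of DNT-aware logging might look like the following. Which fields get dropped is an assumption here (my actual field list differs), and the function and key names are invented for the example.

```python
def build_log_entry(headers, status, size_bytes, duration_ms, path, client_ip):
    """Build a log record, keeping fewer fields when the client sent DNT: 1."""
    entry = {"status": status, "size_bytes": size_bytes, "duration_ms": duration_ms}
    # HTTP header names are case-insensitive, so normalise before lookup.
    normalised = {name.lower(): value.strip() for name, value in headers.items()}
    if normalised.get("dnt") == "1":
        # Assumed policy: keep only response metadata, nothing about the client.
        return entry
    entry.update(
        {
            "path": path,
            "client_ip": client_ip,
            "user_agent": normalised.get("user-agent", ""),
        }
    )
    return entry
```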

But decisions about how to make something would normally be based on actual comments by the users, rather than analytics, I should think.


Does this take into account bots and other non-human stuff GA might block? Server logs and GA are measuring different things.


This. Does it take into account scraping bots? I remember looking at my server logs many years ago, finding a surprising amount of Chinese and Russian visits for my personal website. Eventually I realized it was Baidu/Yandex-type stuff.


I don't think there is an effective way to distinguish bots from real users when a UA is faking what it is. IP filtering or other measures can help, but you ultimately rely on the UA string from the UA.


That's true for the server logs but GA has access to much more information to make a decision like that. My point is that comparing server logs to GA results is invalid.


How many users who block GA are also going to not be honest about the OS or browser they claim to be? I suspect that number is going to be a much higher percentage than of those who don't block GA...

I'd imagine most users who block GA/ads on desktop would also want to on mobile, but can't just because it's so difficult to set up an adblocker on mobile.

A HOSTS file or blocking DNS server will easily do that, my whole network in fact has GA and a bunch of other crap blocked this way. On the other hand, setting up a MITM proxy/VPN is much much harder on mobile. However I am surprised at the 0% for Android and 17% for iOS blocking GA --- I was expecting it to be the opposite, with the former being historically much less of a walled-garden than the latter.

On the other hand, perhaps everyone who blocks GA and uses Android is in the aforementioned situation of not saying that they're using Android --- they may be reporting a Linux or some other user-agent.


Blocking ads on iOS is super easy. There is a content-blocking API for Safari that apps can use. I like to use https://giorgiocalderolla.com/wipr.html


Note that this only works in Safari and in certain contexts inside of apps.


I find it quite surprising how many companies don't invest enough in reaching a certain maturity in web analytics, despite having invested heavily in CRM, machine learning and other fields. Ad blockers are still not treated as a factor in data gathering or in QA, or are at best treated as a minor anomaly. Although different sources agree on 30%+ usage, at least in Europe, they're still ignored. Many business decisions are still based on solutions like GA. If decisions were based on any other source, say polls or marketing studies, people warned of a 30% uncertainty would be very reluctant to choose A over B. This does not seem to be the case for web analytics. Technology has evolved, and any medium-size company should now harvest its own data (yes, stop giving away some of your most valuable data to third parties, please!).


would it be naive to say that sites can get around the blockers by proxying their analytics through their own domain? Isn't the lazy way of sending everything directly to ga from the client the cause here -- and any thoughtful site owner should be able to circumvent this if they truly care to?


ssshhhhh. Don't give people ideas


That's actually (as far as I understood) something ublock origin seems to address in one of their most recent updates. See 1.25.0 release description:

> From now on uBO will CNAME-uncloak network requests.

https://github.com/gorhill/uBlock/releases


That only blocks CNAME-based domains, not proxying. Proxying is pretty hard to block unless the request is using paths that obviously go to analytics/trackers.
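
To make the distinction concrete, here is a small Python sketch of the idea behind CNAME uncloaking. It is not uBO's implementation: the DNS records are modelled as a plain dict, and all hostnames are made-up examples.

```python
def cname_chain(hostname, cname_records, max_depth=10):
    """Follow CNAME records from `hostname`, returning every name visited."""
    chain = [hostname]
    while chain[-1] in cname_records and len(chain) <= max_depth:
        chain.append(cname_records[chain[-1]])
    return chain


def is_blocked(hostname, cname_records, blocklist):
    """Block if the hostname or any CNAME target falls under a blocked domain."""
    def matches(name):
        return any(name == d or name.endswith("." + d) for d in blocklist)

    return any(matches(name) for name in cname_chain(hostname, cname_records))


# Hypothetical setup: a first-party subdomain cloaking a third-party tracker.
records = {"metrics.shop.example": "shop.collector.tracker.example"}
blocklist = {"tracker.example"}

print(is_blocked("metrics.shop.example", records, blocklist))  # True: CNAME reveals the tracker
print(is_blocked("www.shop.example", records, blocklist))      # False: a reverse proxy leaves no DNS trace
```

With a server-side proxy the browser only ever sees the site's own hostname, so the only things left to match on are the URL path and the payload.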


I started this project some weeks ago: https://github.com/iris-analytics It's a small JS snippet that gathers data and sends it to a Go backend, which then stores it in ClickHouse. Although there's lots to do, we use it in production successfully. Remember that ClickHouse was born precisely for web analytics, where a single instance can handle hundreds of millions of inserts per day with no effort. I did this because stats say adblocker penetration in Europe is beyond 30%, and this would give us real-time insights with no sampling, plus ad-hoc queries.

If you want to help me out, you are very welcome!!!


Perhaps you should post it on "Show HN".


Thanks for the suggestion. As soon as I document it better I certainly will. This was more of a call for help :)


I actually noticed this when testing what kind of users Facebook ads brought to my site. The numbers from Google Analytics didn't seem to match the number of clicks reported by Facebook.

As a side note, those users were useless anyway as all of them were bots: https://www.reddit.com/r/marketing/comments/4smisl/facebook_...


I run a web-based tool for devs and about 40% of customers aren’t showing up in Google Analytics.


Expect at least 5% of traffic to be bots if you have a large amount of traffic.


Is it really that hard to self host analysis service natively from the web server?

Why would you handover the data to third party when you don't have to?


How many “users” of google analytics are worth it anyway?


Good luck explaining not having google analytics to your client when a 3rd party company begs you to add them so they can track their marketing campaigns.


everybody should :D


2017


Why are we talking about a blog post from 2017?


I think this is a brusquely-worded request to put (2017) in the title.


Narrator: not enough...


everyone who understands how google is tracking them across the web hates it


It's from 2017 but even more valid today. GA is blocked by most ad blockers, thus no company can rely confidently on it anymore. uBlock also blocks a few other tools like MixPanel that help companies understand their users' behavior and how they use their product.

I can't believe how ignorant people that say "use server logs" are. They clearly haven't done any online marketing or run an online business, yet they want better and cheaper products - even free if possible.

How do you think a company gets to improve and optimize their product? By surveys? I think many assume the entire analytics required for a business is just reading a few GET requests from the server logs and categorizing them by user agent.


> They clearly haven't done any online marketing or run an online business

As far as I'm concerned online marketing is cancer. It acts against me, wastes my time, etc. Marketing should be about presenting your product in the best light possible, and stop there. You shouldn't be allowed to track or waste other people's time to promote your product - being in business is not a right after all.

> How do you think a company gets to improve and optimize their product? By surveys?

Yes exactly - companies were in business just fine for over a century and they didn't have analytics, why should it suddenly be required?


I really hope one day you start your own online business and see how it's actually run.

It's one thing to imagine how a business should be run in a utopian world where people respond to surveys and know exactly what they need to solve their issues (Henry Ford said it perfectly: "If I had asked people what they wanted, they would have said faster horses") and another thing to see how it works in the real world.

> Yes exactly - companies were in business just fine for over a century and they didn't have analytics, why should it suddenly be required?

This is a very ignorant response so I assume all your answers are the same.

> As far as I'm concerned online marketing is cancer.

I think you meant "advertising"... if yes, in some cases you're right.


> This is a very ignorant response so I assume all your answers are the same.

If that is ignorant then I guess most companies founded in the 20th century (without analytics, stalking and targeted advertising) must not be real then? In fact I'd argue most of the money that funds the cancer that is modern advertising, marketing, etc was made before such things were actually invented.

> I think you meant "advertising"

I kind of agree, and this is why I clarified my definition of marketing. For me, marketing is about putting your product in the best light possible on your website (or physical space if you're into retail), so that people who stumble upon your product (randomly or by searching for it) will be encouraged to buy it.

The modern definition of marketing however seems to be stalking (aka analytics), spam (newsletters, push notifications), creepy targeted advertising (often relying on the previous items) and so on, essentially forcing your product onto people who haven't asked anything. I consider this modern marketing to be cancer.


You know, before UX was captured by A/B test fanatics, there was a discipline known as "user testing". It had all kinds of interesting tools, for dealing with all kinds of user distribution and sample size.


so that makes spying and pervasive tracking okay? because it makes money? no! businesses have managed without previously. the world won't end without GA.


Google Analytics spies you? Lol. Those tools that you use daily on the browser that saves you time and makes your life easier got to that point because they had a way to learn how they were used. Seems that you don't make a difference between ad related tracking and product based tracking.


> Those tools that you use daily on the browser that saves you time and makes your life easier got to that point because they had a way to learn how they were used.

As far as I am aware, Hacker News has never used Google Analytics.


Hackers News doesn't need GA. A product that relies on data to make decisions does. People here are delusional, sorry. I don't care I get downvoted, but that's the reality.

I'm totally against ad tracking where a company can identify what I do on multiple websites and link my browsing behavior. But I have no problem to let a company track my clicks and what I do on their website if it's used to improve their product and ultimately to benefit from it.


Google is an advertising company. Google Analytics permits it to identify what you do on multiple websites and link your browsing behaviour.

And yet you have no problem using it when it's included as a third-party script, because you want to give the first-party that information. Why do you suddenly forget about the third-party (Google) in this framing?


> A product that relies on data to make decisions does.

My experience tells me that most "data-driven" products are absolute trash. The best products I've used were developed in the good, old-school way of someone having a concept of how the product should be, common sense, user research and feedback.

Nowadays I see shit changing all the time for no good reason because I guess somewhere there must be a 0.1% short-term increase in some metric, while annoying a bunch of people and eroding their goodwill.

I also don't feel "free" using a product with analytics because I don't always want my usage pattern to skew the data and make the product change in any way. There are times when I do weird things for good reasons, but it doesn't mean I want the weird thing to become the primary way to interact with the product because someone wanted to make a "data driven" decision.

> I'm totally against ad tracking where a company can identify what I do on multiple websites and link my browsing behavior

You do understand that third-party analytics services can do exactly that? And those provided by advertising companies like Google probably do use the data for their own purposes as well.


>Hackers News doesn't need GA. A product that relies on data to make decisions does.

That doesn't mean you have a right to collect it from me, and if your business model fails as a result, well that's a shame for you.


Or the ones commenting never used. I am certain many people read, chuckle and don't bother explaining. Any small/medium size company running an e-commerce platform will be using Google products, including GTM, analytics, ads, youtube ads.

It is impossible not to use it at this point. If you don't use it, you will be forced to add it for another 3rd party working with you or with your client.


> It is impossible not to use it at this point.

Maybe the reason is that a lot of products are just useless trash and don't actually fill a legitimate need? They only sell because of the advertising exploiting a weakness in people's mind, but wouldn't sell on their own because it turns out people don't actually need this product?

If you make a product that truly solves a problem, it should sell itself. A good example of this is Monzo - they founded a new modern bank that addresses all the problems of the legacy banking industry. The product sold itself and they reached a million customers without any marketing.

In any case, whether you need ads and tracking to be in business is one thing, but being in business and profitable is not a right, so if you can't do so while respecting the law such as the GDPR then it shouldn't be our problem as users.


> Google Analytics spies you?

That is literally its purpose, yes.

> Seems that you don't make a difference between ad related tracking and product based tracking.

Now that's an interesting question - does Google combine data from GA into their other tracking, or does that data stay separate?


It isn't that hard to get around ad blockers for doing analytics.

You can hide third party tools behind a proxy for example or write your own custom tracking software.


Oh yes, you can explain that to the advertising agencies which mostly run Google Ads for clients. Google ads need google analytics 100%. Any online marketing will go thru GTM (google tag manager). They are dominant, a monopoly.

YC news tech gurus will come up with all sorts of geeky server side solutions but people need to accept Google Ads is a must for most companies out there, and one will use google analytics one way or another. No escape.


Well let's try in small ways to change that, for instance by blocking it and encouraging others to do the same.


> yet they want better and cheaper products - even free if possible.

I want good things, I have them, and now I want greedy, needy people to stop pushing their mediocre, poorly understood imitations to drown out the really good things we might use and nurture instead, hurting us all and even themselves in the process.

> How do you think a company gets to improve and optimize their product? By surveys?

If what you make serves a purpose other than generating needs to make a profit from, then you'll probably be fine with mostly simply paying attention to what you're making, using it yourself, and occasionally making surveys and collecting metrics from volunteers to see if there's anything you missed.



