Hacker News new | past | comments | ask | show | jobs | submit login
13% of my website visitors block Google Analytics (markosaric.com)
187 points by markosaric 21 days ago | hide | past | favorite | 172 comments

Seeing that Plausible Analytics does not use cookies and instead hashes fields including the IP address, its measure of "unique site visitors" may overcount. For example, a user who visits once via their mobile data connection and once via wifi may be seen by Plausible as two different users, where Google's cookie would pin them as one.

Another interesting result is the 16% of total page views - I don't think false uniques could change this. But it will be distorted by any correlation between views-per-user and blocking settings.

I think this is probably correct. I wrote my own hit-tracker for my website and I spent a lot of time looking at logs to see if I could reliably track unique users by hashing headers and the IP address. My conclusion is that you can't - you will always double count some users.

It is not uncommon for consecutive requests from what is obviously the same user to come in from different IP addresses. Usually these are either cell phone users whose network changes from second to second, or browsers behind big proxies with multiple servers fetching content.

For example, I would see this quite often (all fields made up, the requests arriving from several different IP addresses):

  index.html   mobile-browser-v25
  style.css    mobile-browser-v25
  header.jpg   mobile-browser-v25
  code.js      mobile-browser-v25

Plausible would plausibly see that as 3 different users.
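The double-counting described here follows directly from how cookieless uniques are computed. A minimal sketch (assuming a salted hash of IP and User-Agent, roughly what Plausible's data policy describes; the field names and salt handling below are illustrative, not their actual implementation):

```python
import hashlib

def visitor_id(salt: str, ip: str, user_agent: str, domain: str) -> str:
    # Cookieless "unique visitor" fingerprint: a salted hash of
    # IP + User-Agent + site. With no cookie there is no stable
    # identifier, so any change in the inputs mints a new "visitor".
    raw = f"{salt}|{ip}|{user_agent}|{domain}".encode()
    return hashlib.sha256(raw).hexdigest()

salt = "rotated-daily"   # providers rotate this to limit linkability
ua = "mobile-browser-v25"

# The same person, but carrier NAT hands out a new IP mid-session:
a = visitor_id(salt, "10.0.0.1", ua, "example.com")
b = visitor_id(salt, "10.0.0.2", ua, "example.com")
print(a == b)  # False: one user, counted as two uniques
```

Google's cookie survives the IP change, which is why the two tools can disagree in opposite directions.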

Another problem is bots. So many bots. Some that take the trouble to mimic real browsers - they don't know they are bots. And they have a plan.

Most bots don't bother to execute JavaScript, but analytics that rely on single-pixel web bugs and the like will likely show inflated numbers.

> they don't know they are bots. And they have a plan.

Nice subtle Battlestar Galactica reference there.

Business users behind an HTTP proxy will be another problem, as they will all have the same IP, and be using the same OS and browser (mostly).

That turned out to be less of a problem than I anticipated. Browsers spew all kinds of mostly unique information into the User-Agent header. This is slowly getting better as the browser companies wise up.

Aren't Google planning on killing the User-Agent header altogether?

I don't see why the focus is on "unique" visitors. To compare to older media, if someone buys two newspapers then they are a double-good customer. Either they are spending so much time reading it themselves that they need two copies (i.e. reading via two devices/IP addresses) or they are giving that second copy to a friend (two people, one device). So imho page views should be valid regardless of uniqueness (absent bot detection etc).

If the purpose of your website is to get people to sign up for the trial of your SaaS product, then you have very different reactions to "ten people looked at my product once and one signed up" vs "one person looked at my product ten times before finally signing up." In the first case, you work on improving your landing page. In the second case, you let people sign up for an email newsletter.

This depends on monetization.

If you're selling copies of newspapers, 2 purchases by 1 customer = 2 purchases. The number of customers is irrelevant.

If you're monetizing via on-page ads or affiliate links (as the author is), revenue is much more likely to correlate to unique visitors than to page views.

>> revenue is much more likely to correlate to unique visitors

That's the bit I don't understand. Why does an advertiser not value a customer seeing an ad twice, or for a longer time? In fact I would say that those customers are much more valuable because their multiple views bookend an extended period of viewing an ad. If I view a website at home in the morning, then again while at work, I have been exposed to that ad for many minutes, as opposed to one-time readers who click away within seconds. That has to be worth something more.

You're on to more than you think.

In traditional media, it's pretty much an axiom that people have to see the same ad at least seven times in order to get the message. There is no sin in showing the same advertisement more than once. It's why, for example, advertisers pay extra to have their ad appear at both the beginning and end of a commercial break on television. It's called "bookending."

When digital advertising started, the ad companies decided that someone seeing the same ad more than once was a bad thing, and then convinced their advertisers this was true.

I don't know why this happened. Maybe the digital people had some data that proved it. Maybe the digital people were just computer people and not advertising people and so didn't know about the decades of prior research into this.

But it's where we are now. Digital advertising companies value unique visitors, so web site owners do, too.

I don't know why either, but to me viewing traditional TV with the same ads every 15 or 30 minutes is not an annoyance, while on YouTube or a website it is goddamn annoying. To the point where I think the brand is trying too hard and I develop negative feelings toward it.

There are huge differences in online/digital media consumption compared to traditional media like TV and newspapers. And maybe because I am old, I tend to value the latter a lot more.

I always thought this was mostly to prevent fraud. If you pay for multiple views by the same user, it's much easier to generate revenue from fake views.

The value is screen time. If a user spends ten minutes on a site and comes back on a different IP for another ten minutes they're twice as valuable even without knowing they're the same person.

Well, if I were buying an ad I guess I would pay more to reach 10 people one time, rather than 1 person 10 times. It would increase my chances of getting a receptive customer.

Afaik if you visit the site from your phone and laptop on the same wifi they will count as 1, which wouldn't be the case with Google's way.

None of my website users need to block Google Analytics.

Same here. Also, I don't regret it: I realized that most data from GA is actually garbage. Both because of its dubiousness and because "page visits" do not correspond to any meaningful actionable business metric, unless you are in the adtech business (in that case, rethink your life choices).

These days, I worry about signups and subscriptions, and run my own A/B (bayesian bandit) experiments.

This is exactly how you recognize a good internet marketeer. They don't focus on getting a lot of traffic, they focus on getting the right traffic.

Not sure what you mean by "from GA" given that the tool collects what you tell it to collect. The data is only garbage if you don't have or hire the skillset to use it, and when you do you'll have proper leading indicators for your business metrics and quantitative prioritization support.

Mine either. They also don't have to worry about Facebook analyzing click behavior via the SDK.

Sigh - the number of websites which were broken when browsing the internet from China because trackers from Facebook were not wrapped in exception handling and the endpoints were blocked.

Literally couldn't give the companies business.

Because when I visit a theatre website to book tickets, whether or not Facebook can stalk me is an orthogonal concern to the primary one of "let me give you my money for your tickets"

The number of pages that silently fail or time out after 5-10 minutes due to these assumptions is really disappointing.

We live in a globally connected world; assuming that a person in China today will still be in China tomorrow may be a bad assumption. And in the meantime you may have left money on the table.

Pretending that any resource will always be available is shoddy programming. Check your return values, trap non-fatal exceptions, and always assume remote network resources might not be available. Forget China; there are many reasons an HTTP request to facebook might fail.

A website that ignores errors when fetching a remote resource is bad programming for the same reason that ignoring "file not found" errors when calling open() is bad programming.

Maybe they shouldn't. But they probably should care about users blocking Facebook if they are at all ethical. They should probably care about Facebook breaking their site if they care about uptime or might compete against Facebook.

Are you not tracking your site's performance at all, or are you just not using Google Analytics or Facebook Analytics for that?

Are there any (open source) privacy friendly alternatives with similar basic features?

Piwik is probably the most “out of the box” open source GA alternative, there’s also Snowplow if you want it to be a full edge-through-to-data-warehouse solution.

Piwik is now Matomo

Sorry to say, but without analytics and campaign analytics, you probably don't know what your users want or need. I don't like the analytics auto hate behavior.

Edit: I don't justify Google and Facebook's wrongdoing in the past and present. But hating on those who are (forced into) using them, and calling to remove them from websites, does not help without an alternative. It will surely take a toll on your digital presence. Showing better alternatives is the way to go.

I don't like the analytics auto hate behavior.

It's not "auto hate", it's 25 years experience of making web stuff. I used to track users. I can read Apache logs by sight alone. I was an Urchin stats user before Google bought them. I had hit counters.

I realised a long time ago that massive data gathering didn't really help me. I had a ton of data and no idea what to do with it. If you launch a site and find the engagement time is 5.54s and users scroll 40% down the page before the bounce when they hit the site from a Google SERP for "my special page" that doesn't tell you why they left. That only tells you that they did. If you want to fix the problem you need to actually reach people and engage with them. To build what users want you need to actually talk to users.

Maybe some people can get value out of GAnalytics data, but I reckon for most users it's "metrics theatre". They're randomly poking at their website and seeing if the numbers change. That's stupidly inefficient when you can just talk to some people instead and immediately know where the problems are because people will tell you if you ask. Oh boy, will they tell you. It's hard to make them stop.

In the case of your 40% scroll map example, you don’t need to know why they left. You can start by making the page shorter or putting your CTA 35% of the way down the page.

Does anybody make the argument that analytics is supposed to tell you the why or what to do next? That’s up to you. The data just guides those decisions.

"I can read Apache logs by sight alone"

And actual log file parsing is what I'm missing in his comparison.

Often I try to help sites with more generic, borked issues. Your comment about "Oh boy, will they tell you" brought one to the fore.

I sent the following to a site. Do you think I received a response? A reply?

Either they don't actually read emails from customers, or they employ a person who became 'confused', whose eyes glazed over once they failed to understand the message.

(Note that I sent it to about 5 email addresses: the one used after I purchased the pillow, a website info@ address, whois email addresses, and I even took 2 minutes to find the CEO's email address. Yup. No response from any of them.

And more than 18 months later (I just checked), the SPF record remains unchanged.)

I wonder.

While metrics may be helpful, I wonder how many only pay attention to metrics, and then ignore emails such as mine.

I'm terribly sorry, but unfortunately you've sold a pillow to a SysAdmin. And now that you've done something wrong "with computers", I'm going to prattle on whilst your eyes glaze over. :P

But in all seriousness, I recently ordered a pillow from you. Yet your confirmation email bounced! Here's why:

May 13 07:10:54 XXX postfix/smtpd[32163]: NOQUEUE: milter-reject: RCPT from smtp2.shopify.com[]: 551 5.7.1 SPF verification failure: sender host 'smtp2.shopify.com'[] not among explicitly allowed origin hosts for domain 'canadiandownandfeather.com', and misses are forbidden; REJECT; from=<info@canadiandownandfeather.com> to=<xxx> proto=ESMTP helo=<smtp2.shopify.com>

What's happening is that your SPF record does not list Shopify's email servers as valid for your domain name canadiandownandfeather.com...

You do have outlook.com there:

canadiandownandfeather.com. 300 IN TXT "v=spf1 include:spf.protection.outlook.com -all"

But not Shopify. In fact, the "-all" part at the end means "and nothing else can send mail!".

So, you are telling people that Outlook is the only acceptable place to send mail from. Until you fix this:

- some mail servers, like mine, will completely reject your mail from Shopify's platform
- almost all remaining mail servers will place this type of mail in SPAM folders

I must stress, that due to your 'outlook' line, normal email should work perfectly fine. We're only talking about email, using your domain, but from shopify's platform.

If you go here:


And search for "SPF record", you'll see this:

-- To verify your customer email address, you need to add Shopify's SPF record — v=spf1 include:shops.shopify.com ~all — to the TXT record in your custom domain settings. --

Now.. you want to KEEP your outlook domain too.

So, just add "include:shops.shopify.com" to your TXT record, so both are there.. something like this:

"v=spf1 include:shops.shopify.com include:spf.protection.outlook.com -all"

Do this, and your problem should be fixed.

You can go here:


And check if things are done correctly. --
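The fix the email above asks for is a one-token change to the TXT record. A small sketch of merging the missing include mechanism (the mechanism strings come from the email; the helper function itself is hypothetical):

```python
def add_spf_include(record: str, include: str) -> str:
    # Insert an "include:" mechanism into an SPF TXT record while
    # keeping the trailing "all" qualifier in last position, where
    # SPF's left-to-right evaluation requires it.
    parts = record.split()
    mech = f"include:{include}"
    if mech in parts:
        return record  # already authorized, nothing to do
    return " ".join(parts[:-1] + [mech, parts[-1]])

current = "v=spf1 include:spf.protection.outlook.com -all"
fixed = add_spf_include(current, "shops.shopify.com")
print(fixed)
# v=spf1 include:spf.protection.outlook.com include:shops.shopify.com -all
```

The order of the two includes doesn't matter for verification; what matters is that both are present before the final "-all".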

I think the humour at the start might have been enough to make many people not read the email TBH. Something a little less tongue-in-cheek and simply nice/approachable is likely to net a better chance of response.

Hard agree.

You don't describe the business problem, but go into incredible detail about technical details that nobody reading the email[1] cares about.

Even more than that - you're basically asking them to run random bash commands from the internet.

They shouldn't trust your advice; anything you say would have to be verified by their own team, so the level of technical info is useless to them. In fact, including all the technobabble makes it look even more like a scam.

Remember, not that long ago it was a common scam to ask employees to make changes to the phone system, which would give the scammer free international calls. Your email is identical[2] to one of those scams.

A better email would be (and this isn't perfect):

"Hello, I am a recent customer of yours and noticed that all your confirmation emails aren't being sent properly and won't be delivered at all for most users.

I'm a sysadmin, so I had a look at the problem using public data, and it looks like your tech team overlooked a setting when they set up your email system. It's an easy mistake to make and would take them 5 minutes to fix.

The technical details are that there is an "SPF" record which says who is allowed to send emails as your company - this is correctly set up for your office emails through outlook, but not set up for the automated emails from shopify.

If you let your tech team know that Shopify is missing from the SPF records, they'll quickly see the issue and fix it within minutes.

If you would like more information about the issue, I'd be happy to advise further[3]"

[1] The people who do care won't see your email. [2] To the untrained user. [3] I mean, I wouldn't, but you seem like you would ;)

I get it, please see my other response.

RE: random bash commands. Part of the reason I linked shopify's SPF page to them -- an attempt to provide external validation.

But I get it. Which is why, as I mention in that other response, it doesn't matter what I send. Even pleas to forward to a technical person = ignored.

(I bet lots of tech people get forwarded all sorts of crap that IS a scam, so yell "DON'T FORWARD THIS STUFF TO ME".)

In fact, I have even called some places. One started screaming at me. I'm guessing some phone scam "got them", and they therefore equated me with the same.

What can you do?

I mean, seriously?

Tell them that you placed an order, and did not receive a confirmation email. If you really feel obligated, you can mention something about SPF settings, but I literally would not provide more detail than "it might be related to your SPF settings."

Ah yes. Which is only sometimes responded to, and if so, with a "Check your SPAM folder" email.

If you respond to that without technical data, then you're basically saying "No, I did not receive it", and their response is "Your gmail lost it" (because mail = gmail).

The problem is that emails sent with "I didn't get the confirm email" are immediately thought to be "user error". And you know what?

99.9999% of the time, they surely are. Bad email providers, lost in SPAM folders, even missed in INBOX.

So if you don't include tech info, you're stuck in this category, or else face many exchanges back and forth.

Because once I then respond with technical info?

You get the same glassy-eyed type of incomprehension, with often zero response. Or a response that invites another 5 back and forths, with the other side upset, because now they feel ignorant.

I feel there really is no proper answer here... except, I still have to try. :P

While I know why you say this, I find it matters little.

I've started emails off with (very close to):

"NOTE: this is a technical issue. You may not understand it. Please forward to a technical person", with a lot of variety on that line.

These too garner little response.

I've started out with the problem first, then tech info. It seems to matter little, and my humour in that first line was an attempt to amuse non-technical types into reading further.

I must have concocted > 100 over the years. Perhaps 10% responded, regardless of the layout or form of the email. Even 10% is generous.

The reality? Anything technical, no matter how presented, is too confusing for many.

Yeah that first paragraph would have been better spent giving an overview of the information that was laid out after.

'I have found a problem. The problem has this negative impact. Brief description of problem.'

Be serious if you want to be taken seriously, especially in your initial communication with an unknown third-party.

This is not true. One can make a product better without relying on analytics. Sure, it takes more time (and probably more skilled people) but it's possible. There are companies out there doing well without GA.

GA is just one tool, and it’s not even the best one (though the better tools are often worse privacy-wise). But client side analytics will always be able to deliver UX insights that you’ll simply never be able to get any other way. If you had two equivalently skilled and resourced product teams, one would always be at a disadvantage to the other if only one of them was using client-side analytics tooling.

That said, product teams that are actually good enough to gather and productively use that sort of data aren’t very common, so switching off client-side analytics probably wouldn’t make much difference to most organisations. Page hits and error stats can be pretty easy to make use of, but you can get that from server logs.

Why is analytics synonymous with Google analytics?

Duopoly (Duo-monopoly) with Facebook Analytics.

More like duopoly with Adobe Analytics, at least among companies that invest in their analytics instead of just implementing GA or FB Analytics because they're free and never really use them.

Adobe Analytics is enterprise grade though, and is usually only used by companies large enough to have their own analytics department or contract out to analytics consultants.

This is a very limited solution. For instance, how can you track which search keywords showed your clicked ads on Google? The advertiser keeps, and will keep, all the power (= all the context) to themselves for exactly that reason.

Is there any difference in privacy from the user's perspective between Google Analytics and self-hosted analytics? If anything, the latter is more prone to being hacked or misused than the former, and cannot be blocked as easily as Google's.

I'd argue that yes, there is a difference between having lots of small data silos with private data and having one huge database that tracks complete behavior chains. I don't really care whether the bartender at my usual pub knows what I like to drink, but I would care if there were a global database tracking my drinking behavior.

Yes there is, it's harder to track you around the web when the data is scattered around multiple places with no way to group the identifiers.

There's no analytics hate per se; the problem is the centralization of all usage data at Google, and analytics without consent.

Analytics are fine, the problem is analytics without consent.

The article explains very well that he is using Plausible Analytics, which works without cookies or storing strong PII. https://plausible.io/data-policy

It doesn't seem like even with analytics they know

It just gives a false confidence in bullshit.

Most internet analytics and the billions spent on data mining is all for naught.

You can collect analytics yourself, no Google needed.

Not ad insights, which are crucial in digital marketing.

Ad insight can be done on your end by having each ad campaign redirect to your website with a "campaign ID" URL parameter, and you can measure the click/conversion rate by seeing how many people arrive on your site with each campaign ID.
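A sketch of that approach: count arrivals per campaign ID pulled out of the landing URL. (The `cid` parameter name here is an arbitrary choice, not a standard; UTM parameters work the same way.)

```python
from collections import Counter
from urllib.parse import urlparse, parse_qs

def campaign_stats(landing_urls):
    # Tally arrivals per campaign ID found in the landing-page URL;
    # visits without the parameter are grouped under "(none)".
    hits = Counter()
    for url in landing_urls:
        params = parse_qs(urlparse(url).query)
        cid = params.get("cid", ["(none)"])[0]
        hits[cid] += 1
    return hits

logs = [
    "https://example.com/landing?cid=summer-sale",
    "https://example.com/landing?cid=summer-sale",
    "https://example.com/landing",
]
print(campaign_stats(logs))  # Counter({'summer-sale': 2, '(none)': 1})
```

Conversion rate per campaign follows by doing the same tally over the subset of sessions that reached the signup page.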

Why do you think it increases user's privacy? I don't think Google uses analytics data for tracking the user.

> Why do you think it increases user's privacy? I don't think Google uses analytics data for tracking the user.

You've got to be kidding

Google uses every datapoint they can get their hands on to track their users. No matter if it's a share button, Google Analytics, a YouTube embed, custom search, Gmail or any other Google product. They can and will use it to track users.


Why would you trust them though? Their whole business model is based on them knowing as much as possible about everyone so they can better target ads to them.

Facebook also tried the promise of "trust us to not use your 2FA phone number for ads" and broke it with no/little ill effects, so there's no reason Google can't do the same.

Building & running a service such as Google Analytics is not cheap so they wouldn't be giving it away for free unless they got something out of it.

Chrome and Android have a lot more data than Analytics, and if you go to the extent of assuming Google is lying and using our data, then Google using Analytics data for displaying ads would be the least of my issues. I also don't think such sparse and context-dependent data as analytics would help Google that much for displaying ads. They just want to know how much time you spent doing what, which Google already has.

> I also don't think such sparse and context-dependent data as analytics would help Google that much for displaying ads.

Sorry but I think that is a very weird statement. Google Analytics collects a LOT of datapoints. Not only can they track users no matter how they get to that page, they can track how they behave on the page itself. They can track what content is interesting for the user (they know what the page contains after all) and how long you actually stay on the website. They can collect all of that data, without the user knowing and the user himself doesn't even need to use a single Google product.

I wouldn't exactly describe that as sparse data collection and the context dependent tracking is what makes Google Ads context relevant.

The two aren't mutually exclusive. They can use data both from GA as well as Chrome/Android.

Ultimately, their business model relies on violating people's privacy and thus the safest course of action is to treat them as hostile and avoid them.

Think again.

It's expected that when I go to a website, the owner of the website knows I've visited.

It's not expected that when I visit a website unrelated to Google or Facebook, they know about it.

Google Analytics has a fair amount of filtering by default - including bots/spiders.

I went to Plausible's website; it makes no mention of this type of blocking. That 13% could be eaten into significantly by removing that type of activity.

More convincing would be matching up activity across sites, and seeing for what sessions they differ. If this was a decrease in users who had significant sessions (>30 secs, for example), there'd be more meaningful conclusions to draw.

1. the OP works for Plausible

2. The OP uses a non-standard call to Plausible which probably isn’t in many block lists

3. It’s in mine

So, the domain in question is ms.markosaric.com. Which is a CNAME to custom.plausible.io. uBlock Origin is able to block based on CNAMEs, but it is not a default configuration. PiHole V5 blocks based on CNAME as well, and it is actually enabled by default.

I run a little blocklist project [1] and I've had custom.plausible.io blocked in my list since April 8th [2]. So, although I didn't have ms.markosaric.com blocked directly in my list, the PiHole still would have blocked it via CNAME blocking. Also uBlock origin if you have CNAME blocking enabled.

[1] https://www.github.developerdan.com/hosts/

[2] https://github.com/lightswitch05/hosts/commit/21fd108ffd2996...
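The CNAME check those blockers apply amounts to matching every name in the resolution chain, not just the hostname that appears in the page source. A simplified illustration (not the actual uBlock Origin or Pi-hole code):

```python
def blocked(chain, blocklist):
    # Given a hostname plus its resolved CNAME chain, report whether
    # any name along the chain appears on the blocklist. This is the
    # essence of "CNAME blocking": the first-party alias doesn't help
    # if the canonical name it points to is listed.
    return any(name in blocklist for name in chain)

blocklist = {"custom.plausible.io"}

# ms.markosaric.com itself is not listed, but it CNAMEs to a listed host:
chain = ["ms.markosaric.com", "custom.plausible.io"]
print(blocked(chain, blocklist))  # True
```

A resolver that doesn't follow CNAMEs would only ever see the first name in the chain, which is why blockers without this feature miss the cloaked tracker.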

> uBlock Origin is able to block based on CNAMEs, but it is not a default configuration

It is a default configuration.

Thank you for the info, I'm sorry I misrepresented your project. For some reason I thought you had to enable advanced settings.

Thanks for everything you do with uBlock Origin and uMatrix. uMatrix is one of the primary tools I use when researching domains to add to my blocklist.

But this doesn't work on Chrome right?

> uBlock Origin is able to block based on CNAMEs, but it is not a default configuration.

I believe this is the default now, isn't it? [0]

[0] https://github.com/gorhill/uBlock/releases/tag/1.25.0

https://www.reddit.com/r/uBlockOrigin/comments/f8qnpc/ublock... The latest uBlock Origin (>= 1.25) blocks CNAMEs

So website owners are now back to self-hosting with their own domain name, possibly with a non-default tracking JavaScript name?

That is a recent change, probably prompted by uBlock Origin [1] or others adding his tracker to the blocklist. This morning it was using the standard tracking call.

(The site owner submitted the site to HN, and is taking part in the discussion.)

[1] https://news.ycombinator.com/item?id=23819934

Yes, correct. I did the study with June numbers using regular GA and regular Plausible. Plausible was not on any blocklists until yesterday from what I have learned in this thread.

Incorrect. Plausible has been blocked in my list since April 8th, including custom.plausible.io: https://github.com/lightswitch05/hosts/commit/21fd108ffd2996...

Only 13%? According to some independent sources, up to 30% (and in some countries 40%) of Europeans use ad blockers. If you're interested, I can try to find the sources. This is why I use my own tracking system.

Ad blocking and tracker blocking overlap sometimes, but clearly not always.

Is that 30-40% on all platforms or just desktop though?

I don't have an ad blocker on my phone, yet use ublock origin on my Macbook + Chrome. I suspect this is pretty common, as it's harder to set up ad blocking for your mobile browser than desktop.

It's actually super easy now, with PiHole if you'd like to self-host or NextDNS if you want something that Just Works.

I wouldn't describe either of those as "super easy", especially for non-technical people, compared to going to the Chrome Web Store and clicking install on uBlock Origin.

uBlock Origin and Privacy Badger work on FF for Android.

FF's market share on Android is very small. Chrome is the default browser and does not support any kind of adblocking. Oddly enough, Samsung's built-in browser does support adblock through an extension.

The most popular ad blocker is AdBlock, which does not block Google ads or analytics by default

Can you please tell the sources? I would be interested in reading more about this.

I've been trying to find the reports I had at the time but I'm having no luck. In the meanwhile: - https://strikesocial.com/blog/everything-you-need-to-know-to... - https://www.emarketer.com/content/ad-blocking-growth-is-slow...

> How I implemented my study

> I installed Google Analytics alongside Plausible Analytics on three sites in June.

All he measured was people who load Plausible Analytics and not Google Analytics. Anybody running noscript (like me) wouldn't have shown up here at all.

Even when I run JavaScript (I don’t by default), uBlock Origin blocks https://plausible.io/js/plausible.js through Peter Lowe’s Ad and tracking server list. Not sure if that’s one that’s enabled by default or if it’s one I’ve enabled.

So yeah, I would expect the true Google Analytics blockage figure to be markedly higher than the 13% reported here.

It wasn't blocked for me, also with uBlock Origin.

I found that it was only added to _Peter Lowe's_ list yesterday/today, so it wouldn't have affected the statistics for this blogpost


Yeah, I added this yesterday.

For the record, you can also view exact timestamps on the detail page: https://pgl.yoyo.org/adservers/details.php?hostname=plausibl...

The Peter Lowe list is enabled by default.

Is plausible also blocked when used with a custom domain so that tracking results are POSTed to your own domain?

If you use a CNAME (tracker.mysite.example → tracker.example), uBlock Origin on Firefox will still be able to block it. (I have no idea whether Plausible offers this mode of operation. I don’t see “CNAME” mentioned on their site, so probably not.)

If you run some kind of proxy yourself on an unconventional name (e.g. “ads.mysite.example” may well be blocked) and make sure that any script it needs loads from your own site, again with an unconventional name (e.g. “piwik.js” would be blocked as Matomo), then your tracker won’t be blocked until someone notices.

If you want to head far down this route, smuggle analytics data in with legitimate requests and separate and forward it on your server.

Fortunately, taking it to this extreme takes enough effort that it’s decidedly uncommon.
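To illustrate that last step: a hypothetical first-party endpoint could accept events bundled with the page's own requests and forward them server-to-server, where no browser-side blocker can see the upstream host. Everything below is invented for illustration except the plausible.io event URL, which is mentioned elsewhere in this thread:

```python
import json
import urllib.request

# Assumption: the public event endpoint mentioned later in the thread.
UPSTREAM = "https://plausible.io/api/event"

def forward_event(event: dict) -> urllib.request.Request:
    # Build the server-to-server request that relays an analytics event
    # extracted from a legitimate first-party request. Actually sending
    # it (urlopen) is deliberately left out of this sketch.
    body = json.dumps(event).encode()
    return urllib.request.Request(
        UPSTREAM,
        data=body,
        headers={"Content-Type": "application/json"},
    )

req = forward_event({"name": "pageview", "url": "https://example.com/"})
print(req.full_url)  # https://plausible.io/api/event
```

From the browser's point of view, only first-party traffic ever happens; the relay to the tracker is invisible, which is exactly why this approach defeats client-side blocklists.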

> I have no idea whether Plausible offers this mode of operation

The site of the article cloaks `custom.plausible.io` as `ms.markosaric.com`.

Sometime between the mention here of plausible.io being blocked, and your comment, markosaric.com was changed to use CNAME cloaking.

A technical note: it is possible to do limited client-side tracking of such users:

  <noscript><img src=//tracker.example/pixel.gif style=position:absolute></noscript>
This won’t activate if some of the scripts on the page get run, only if none of them get run. Also any ad blockers that block tracker.example will still block this.

Some of Google’s tracking things include (or used to include?) such a snippet. But broadly the technique has fallen into disuse. I don’t see Plausible advertising any way for it to work, and their main snippet POSTs to https://plausible.io/api/event, which an image won’t be able to achieve.

Also doesn't count anyone who uses uMatrix, I think. I highly recommend it for technologically literate people. It's a bit more work than uBlock, but you gain a lot of control over what you allow to load.

Yup, would have been interesting to compare both vs a log analytics tool like Matomo.

How hard is it these days to tell humans from bots in server logs?

My idea was to use server logs too, but AWStats showed a more than 100% higher number of unique visitors and a more than 18 times higher number of page views (both compared to Plausible's numbers), so I excluded it from the study as I thought it was very inaccurate.

It's a tough problem.
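A first pass at filtering bots out of server logs is simple user-agent matching. A naive sketch (the marker list here is made up), which by definition misses exactly the bots that mimic real browsers:

```python
# Naive user-agent heuristic; honest bots self-identify, dishonest ones don't.
BOT_MARKERS = ("bot", "crawler", "spider", "curl", "wget", "python-requests")

def looks_like_bot(user_agent):
    """Flag a request as a bot if its user agent contains a known marker."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)
```

Beyond this, people typically fall back to behavioral signals (request rate, whether assets like CSS and images were fetched, whether JS executed), none of which are conclusive on their own.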

I am not surprised by the low iOS numbers; ad blockers are a pain to set up and I haven't found one as good as uBlock Origin. Would anyone have a recommendation for a good iOS ad blocker?

> I haven’t found one as good as ublock origin.

That's because ublock origin uses an API that is not available on iOS/Safari

If you want to be generous to Apple, they're trying to stop the 90% of trash adblockers like ABP, and great adblockers like UBO are collateral damage.

A cynic would say it's a combination of an admission that their App Store review process can't catch evildoers, and a result of the fact that the App Store is plastered with (presumably very profitable) ads.

90% of everything is crap (Sturgeon's law) so by that reasoning, you can block everything.

And I don't know what ABP on iOS does or how it changed but the original extension wasn't trash. Sure, it has that controversial "acceptable ads" policy but it can be disabled by just unchecking a box in the settings once. I took a look at the code, compared it to a "cleaned" extension called AdBlock Edge and there wasn't much that unchecking that box didn't do.

Like most people, I switched to uBlock, then uBO, but performance was the main driver. Unless there have been some major changes, ABP still fits my needs; uBO is better for me, but that doesn't mean the other is trash.

Pi-Hole at home, NextDNS on the go. That’s my setup.

You install the blocker and that’s it. I don’t see how it can get any easier than that. At least with Safari, you can officially have ad blocking. :D

I use Adguard. It has a customized combination of the blocking lists you know from desktop browsers. Trust doesn’t really matter all that much—the blocker cannot see what you’re doing in the browser.

I was pleasantly surprised to discover that iOS has “content blockers” built in. If you google “iOS ad blocker”, AdGuard should be the first result, just install the app and flip a few switches in settings to enable it. I still haven’t actually used ad blockers yet on my other browsers, only iOS Safari.

I use Wipr. So far so good.

Wipr here as well, on macOS and iOS. Never have any problems, or ads for that matter.

1Blocker works fine here. One filter (Ads) is for free.

> Safari was a big surprise to me. With all the marketing Apple does focused on privacy and with them even highlighting Google Analytics as being blocked, I was expecting numbers closer to Firefox.

Safari does not block anything (not even in the upcoming release) - it will still happily load all of the tracking and fingerprinting JS. All it does is prevent some information from being sent back (3rd party cookies, stripped referrer etc.) but the requests are still being made and the received code executed.

Perhaps a better way would be to compare Cloudflare visitors to GA visitors over the same time frame. The method used here doesn't measure people with JS disabled.

I think it's important to remember that with large numbers, a percentage of users blocking GA is likely not a big deal - you just want something generally accurate. Getting slightly more data isn't in itself a reason to switch analytics providers.

If people care about payload size of tracking scripts on their site then I see a good opportunity here for CDNs to offer analytics.

My idea was to use server logs too, but AWStats showed more than double the number of unique visitors and more than 18 times the page views (both compared to Plausible's numbers), so I excluded it from the study as I thought it was very inaccurate.

As I understand it, unique visitors would be higher (AWStats, IIRC, considers that after an hour the same IP is a new visitor), but I don't get how you can have that number for page views. I guess it happens when people go back to pages: the JS is not re-executed, but the logs are updated?


It's strange. My best guess is that despite AWStats filtering bots, many do get through. It was easy to see as most viewed pages according to AWStats were back end pages etc. I tried Webalizer with similar results too. I published the stats difference here: https://plausible.io/blog/server-log-analysis

I haven't found an answer to this elsewhere and I've also asked Cloudflare but it's unclear if Cloudflare's analytics filters known bots in any way which AFAIK GA does.

How many sites break because of Google Analytics being blocked? Do they even realize that their content appears broken?

Many sites use JavaScript client-rendering, even for static content. Sometimes, the analytics failure blocks the rendering. I've experienced this recently, with a blank page for a French governmental site (securite routiere) and with truncated content for an American newspaper (the Boston Globe, IIRC).

When you send a server-rendered page, you know the content is present, and very probably it renders correctly on the client. Transmission errors are logged by the web server. With a SPA, many factors could break the rendering, and the web server won't hear about it unless hard work is done to collect JS errors.

I think the most common and critical failure mode I’ve seen is when websites wrap click handlers for important conversion events (ie “Complete Purchase” or “Join Mailing List” buttons) in a call to a tracking script.

Google Tag Manager has a callback system that makes it really easy to trigger an event and wait for all the tracking pixels to fire before you advance someone to the next page, “guaranteeing” that you capture the events. If GTM is blocked and no effort has been made to handle a case where it didn’t load, however, the Complete Purchase button just straight up won’t work. I debugged this behavior in a high traffic site several years ago and have since noticed it all over the internet. With how widespread ad blockers are these days–especially in some demographics–I’d consider it an important QA step to ensure that customers can actually pay you even if they’re blocking ads and trackers.

And umatrix blocks both Google Analytics and Plausible Analytics since it blocks all third party scripts by default.

I know this was just written as content marketing, but a more accurate insight wouldn't involve the conversion to uniques at all. On one side you'd have count of unique IPs from the access logs at your web server. And on the other side you'd have count of unique IPs seen by the tracking tools you use.

And even then you wouldn't be able to map this quotient to human behavior. But you would have an upper bound on how inaccurate your analytics tools are.

“Unique IPs” is a terrible metric, and has been for more than a decade.

Carrier grade NAT, corporate networks, VPN service users, etc. all will share the same source IPv4 address.

In IPv6, the opposite is the case: clients will change their IPs frequently for privacy reasons resulting in an overcount.

That number is wrong. It's greater than 13%. He installed another javascript tracker. So, he didn't catch any of the users that block javascript.

Didn't read his whole lengthy post, but not sure why he just didn't use a standard 20-year old log analyzer (like AWStats or similar) to just compare _any_ visit by a single IP address over a set 24-hour period.

Forget about page views, A/B tests, metrics, customer journeys, etc... that's what G.A. is designed to be used for (along with its competitors like P.A.).

The question is very, very simple - especially for standard websites. How many different IPs requested what volume (in bytes) of resources from your server?

We've been measuring that number on the web for nearly 30 years and it requires ZERO installs, scripts, 3rd-party services, etc.

None of this would be so bad except that all the alarmists decrying abuse or overuse of G.A. and looking at alternatives are (generally speaking) not anyone that Google cares about. Google's attitude "Uh, go ahead and use a different spyware/tracker/script tool. We don't care about your crummy blog. We are still installed on 499 of the Fortune 500 websites. That's what we care about."
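The number described above, distinct IPs and bytes served, really can be pulled straight out of access logs. A rough sketch, assuming the usual Combined Log Format field layout:

```python
# Count distinct client IPs and total bytes served from access log lines
# (Combined Log Format assumed; adjust the regex for other layouts).
import re

LOG_RE = re.compile(r'^(\S+) \S+ \S+ \[.*?\] ".*?" (\d{3}) (\d+|-)')

def summarize(lines):
    """Return (unique_ip_count, total_bytes) for an iterable of log lines."""
    ips, total_bytes = set(), 0
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue
        ip, _status, size = m.groups()
        ips.add(ip)
        if size != "-":  # "-" means no body was sent (e.g. 304 responses)
            total_bytes += int(size)
    return len(ips), total_bytes
```

Of course, as other commenters note, these raw counts include every bot and shared NAT, which is exactly where the disagreement with JS-based tools comes from.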

Tangentially, is there some webpage you can visit that confirms which means of tracking your browser/extensions are blocking? I have installed Privacy Badger, uBlock Origin and NoScript on Firefox, but I don't know how I can verify that they are doing what I hope they are.

EDIT: I suppose there's https://panopticlick.eff.org/

I don't care about being tracked, but my ad blocker automatically blocks trackers as well, and I have no motivation to unblock them manually.

Only 13%? (Just kidding!)

Serious question now: how do we get this number to rise to 50%? How about 80% or 95%?

I've been on a quest for a long time now to convince people I know to install uBlock Origin on their browsers, but it seems to be a hard sell; even among more tech-literate people, the effort to convince them is non-trivial. They stick with it after I install it for them, use it everywhere themselves after a while, and thank me for it; it's just that the initial pitch makes them go defensive. What wins them over is not improved privacy or tracking removal, but no longer having to deal with video ads on YouTube, or those embarrassing naked-people or viagra banner ads on websites like Yahoo Mail.

So basically privacy is an afterthought, a nice to have consequence, as it's not an immediate visible part of the experience for them.

The problem is that people still think of ad blocking as similar to piracy instead of as a security or privacy tool

The other problem is that "AdBlock" used to be the standard ad-blocking tool, but it has now been taken over by advertisers and allows "non-intrusive" ads, such as Google's.

True, some more daring people go to Google and search for "adblock" when I tell them they should install an ad blocker, and I have to correct them and pitch "ublock origin", and that's when the defensive stance happens. They ask: "Why? Why not AdBlock? What's the difference? There's also a uBlock without the Origin part in the name, why is that, why use the other? It has the same logo..."

Do it at the nameserver: OPNsense's Unbound blacklists do most of the job for my whole network - in particular it covers tablets, where ad blockers are a rarer occurrence than on desktop.

Slightly off topic: Plausible claim that their cookieless tracking is GDPR compliant but this seems a bit shaky to me. Unless they are doing something that isn't specified in their docs their fingerprinting seems reversible which would make their user identifiers PII like any session ID you'd store in a cookie.

AIUI the GDPR doesn't particularly care about cookies, it cares about you tracking people without their consent. If you have a cookieless way of tracking individual people that is as accurate as cookies then clearly you still need consent?

Even if this hash included a salt and the salt were rotated daily, you'd still be able to reverse information for the last day (there's an interesting doc by the European Union itself on that topic: https://edps.europa.eu/data-protection/our-work/publications...).

I feel most of these "privacy"-focused analytics tools are really about working around the need for a cookie banner. I also don't get how they can continue to show off public dashboards full of data collected without user consent, boasting about how many hits they have.

Gives me the feeling "Privacy" is more of a marketing tactic here than a mission.

The thing is the "cookie banner" is not required simply because you use cookies. Consent is required for any processing of PII regardless of the technical means through which you get it.

I don't know if "privacy" is just a marketing tactic but I have definitely seen a lot of misplaced good deeds based on not understanding the regulation correctly.

Correct, the rulings to date have generally involved cookies, but it's the act, not the tool.

It seems on-topic, since the post is essentially advertising the Plausible.io service.

> In summary, here’s how we assign a hash that we use for unique user counting:

  hash(website_domain + ip_address + user_agent)
Would adding a daily salt to this make it compliant? You'd no longer be able to track if someone returns to the website the next day.
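As a sketch of what that daily-salt variant might look like (this is an assumed scheme written for illustration, not Plausible's actual implementation):

```python
# Daily-salted visitor hash: stable within a day, unlinkable across days
# once the previous day's salt is discarded.
import hashlib
import secrets
from datetime import date

_salts = {}  # day -> salt; in practice, old salts would be deleted

def daily_salt(day=None):
    day = day or date.today().isoformat()
    if day not in _salts:
        _salts[day] = secrets.token_hex(16)
    return _salts[day]

def visitor_id(domain, ip, user_agent, day=None):
    salt = daily_salt(day)
    return hashlib.sha256(f"{salt}{domain}{ip}{user_agent}".encode()).hexdigest()
```

With the salt deleted at day's end, yesterday's IDs can no longer be regenerated from IPs, at the cost of every returning visitor counting as new each day.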


You also have to adhere to the ePrivacy directive, which specifies, among other things, how you are allowed to identify users on their devices (e.g. using cookies).

I don't have a link on hand, but I posted about this recently in an HN article about Fathom.

There is an ePrivacy Directive amendment that specifically prohibits the use of hashed identifiers (e.g. IP address, browser, OS) to work around cookies, even if a timestamp is introduced to limit the life of a hashed ID.

I'm a privacy advocate myself, yet find this a bit OTT - but that's the ePrivacy regulations, and I think Plausible could get burned here at some point. Presumably they know about this amendment and are betting that there will be no enforcement, or their lawyers have found a weasel-wordy loophole (I doubt the latter, as the directive is very specific).

I think this is the document you are referring to: https://ec.europa.eu/newsroom/article29/document.cfm?action=..., Revision of Article 5(3)

Yes. The GDPR is about the collection and processing of personal information. How you collect it is not important. HTTP cookies, or fingerprinting and subsequent deanonymization, are the same as getting a name and address with a form on paper. You need consent.

GDPR doesn't care about tracking per se, it's about data processing. How cookies are handled in a compliant manner is covered under something like PECR (in the UK at least).


If data is being processed without containing PII, then GDPR becomes a non-issue.


My bad, because this tool uses cookies it does in fact require consent if it's to be used in a properly GDPR compliant manner.

It contains IP addresses, which are personally identifiable under the GDPR.

The answer is: it depends.

The GDPR allows for collection and processing without consent for several reasons including (but not limited to) legal requirement (e.g. anti-fraud) and legitimate interests (e.g. app install conversions).

The GDPR is also quite clear that consent is not required to collect _anonymous_ data, i.e. data which in no way can be traced back to the individual, but this requires balancing with the other principles of the regulation. Recital 26 of the GDPR states:

“…The principles of data protection should therefore not apply to anonymous information, namely information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable. This Regulation does not therefore concern the processing of such anonymous information, including for statistical or research purposes.”

It's not so easy, as anonymization is also considered a processing activity (there is still some debate around this, but legal opinion seems to be solidifying), so you cannot just say "I'm anonymizing this data so I can collect it without asking". You will need a legal basis for processing your customers' information.

Also, the ePrivacy directive (and the upcoming ePrivacy regulation) imposes additional constraints on what kinds of data collection require consent, so again you usually can't just refer to the GDPR's exemption for anonymous data when collecting telemetry from your users, even if you anonymize the data on your server.

I feel like you've moved the goalposts on me since the parent was referring to GDPR, and so was I, and now you've brought up the ePrivacy stuff.

Furthermore, it only counts as processing under the GDPR if anonymisation was performed as a step on data already covered by the GDPR. Ergo, if the data originally collected qualifies as anonymous data, then processing it is not covered by the GDPR.

I think OP's question was whether you can really collect "anonymous" analytics data from a client device without asking for consent, which you cannot really answer based on the GDPR alone. Yes, anonymous data is exempt from the GDPR, but the question is whether an HTTP request to a tracking service that contains the IP address and user agent of a person's browser can be considered anonymous already (in most cases not). Some companies like Matomo (and Fathom, Simple Analytics, Plausible, ...) argue that you can collect tracking data without consent if you offer an opt-out to the user; I haven't been able to find a legal basis for this though. Even Microsoft asks you for consent before enabling their anonymous telemetry collection in Windows now.

Yes, I think opt-out is specifically forbidden when consent is used as a legal basis, at least under the GDPR...

It's not impossible to fingerprint people by using the GDPR consent cookie itself.

I've used Simple Analytics since day one of my site. Thoroughly recommended for privacy.

Fathom Analytics is another option.

OP works on Plausible.io

This should be the top comment

What kinds of actionable insights have you gained through analytics? Have you changed your site with the goal of improving those numbers?

I also use Simple Analytics. It's great.

I'd consider this a little on the low side, but perhaps that makes sense given the nature of the site and the potential type of audience.

On publisher sites where I've had access to analytics we'd see an Ad-Blocker rate of 8% on the sites with a 'good' ad experience (low programmatic, less crappy positions and UX) but as high as 30% on the crappier ones (saturated with banner ads, had Outbrain/Taboola modules, etc).

(That was about 2 years ago, so I would have thought that ad-blocking has continued its slow increase.)

Rightfully so. Shameless plug, a few months ago I began working on a privacy conscious UX/UI research and qualitative analytics tool [0] based on open source (MIT) technologies, which aims at offering a self hostable backend. It's in alpha, feel free to reach out if you want to give it a spin.

[0] https://www.sessionforward.com

YSK that cloudflare removed the contact email on your site.

Thank you, will check! Indeed, website is still in progress. :-) Meanwhile, you can find my personal email in my HN profile!

Of course another way to phrase this is “13% of my website visitors are actively blocking my support for google’s privacy invasive tracking”

On my website [0] I opted not to use Google Analytics or any other invasive trackers.

We use SimpleAnalytics and so far I really like it. Yes, it's "simpler" in that you get less data but honestly, I think people over-value some of the data they get from GA. A lot of it is superfluous.

[0] https://makely.me

I'd like to see the spread in statistics of which OS, platform, and browser are doing the most blocking, and who's most underrepresented. Not wanting to be tracked is very reasonable, but these, I would guess, are mostly your power users, and now you're making decisions without input from the people who may understand your product best.

> We run it through a one-way hash function to scramble the raw IP addresses and make them impossible to recover.

It is very much possible. The IPv4 address space is only about 4×10^9 addresses, so a precomputed lookup table of 256-bit hashes would be on the order of 150 GB.

Including the target website in the hash would scale the table size linearly with the number of sites.
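A sketch of why an unsalted per-IP hash is recoverable by enumeration. For brevity this scans a single /24 (the real attack enumerates the whole IPv4 space, which is entirely feasible):

```python
# Unsalted hashing of a small input space is reversible by brute force.
import hashlib

def hash_ip(ip):
    """One-way hash of an IP address, as a site might store it."""
    return hashlib.sha256(ip.encode()).hexdigest()

def recover(target_hash, prefix="192.0.2"):
    """Find the IP behind a hash by trying every address in one /24."""
    for last in range(256):
        ip = f"{prefix}.{last}"
        if hash_ip(ip) == target_hash:
            return ip
    return None
```

This is why a hash of an IP (or IP plus user agent, both drawn from small, enumerable spaces) is generally not considered anonymization without a secret, rotated salt.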

>But Linux is also the least popular of the operating systems with only 8% of the total laptop/desktop market.

That sounds very high, I thought it was still ~1-2%?

Shouldn't Firefox block GA by default now since they enabled enhanced tracking protection by default? I'm surprised the number for Firefox is so low.

If only it was a higher percentage...

Path of least resistance will unfortunately make it lower, once social networks, media groups and browser vendors play the usual dark UI patterns

What happened to using your web logs? It requires absolutely no javascript and no third parties and there's no invasion of privacy.

It is also completely useless because of the tens of thousands of requests random bots and hacking tools make each day on most websites, the many network hops for mobile users, the utterly impossible task of counting time spent per page (on the active tab), conversion tracking, and all the data that web logs aren't designed to collect or present in any useful way.

They're actually pretty useful for me despite, and even because of, all the things you mention. I don't know why a corporation couldn't achieve the same as my single person plus a bit of perl. I guess for-profit organizations just have unique goals that can't be met unless they're acting unethically.

Being able to reflect on what works and what doesn't is actually the ethical thing to do so that the business' decision makers don't spend time, money and human resources on efforts that are at best useless and at worst detrimental to the company.

Employees or (worse) freelancers who deliberately retain data and keep it away from the company are utterly unethical in that they prevent the company from assessing their actions and taking adequate measures based on them, including firing them if they cost more than they provide.

You're assuming anything that saves time/money is ethical because it is beneficial for the company. But many things that are beneficial for a company are unethical and bad for the actual humans the corporate "person"/entity interacts with.

https://markosaric.com/speed-up-wordpress/ says:

> Steps I took to make my site speedy and green

> To not add much additional footprint, I’ve decided not to use Google Analytics, ....

.. this is too weird for me.

I installed GA for one month only last month in order to check the data. Now it's GA-free again.

This is like reading books with one's reading glasses on for a while, then off for a while, then on again for a while.

In Web analytics the medium to long term trend almost always trumps individual data points. Those don't mean a thing and can actually be bad for the business owners if "corrective" actions get taken based upon them.

Could we establish a convention of adding "(ga'd)", "(pixel'd)" and/or "(GDPR'd)" here on HN, similar to what we do with rehashed stories by putting the year in parentheses after the title? While we're at it, maybe we should also use "(paywalled)", because that's a frequent complaint.

You could do an alternative HN frontend that puts badges behind the title.
