Another interesting result is the 16% of total page views - I don't think false uniques could change this. But it will be distorted by any correlation of views-per-user with blocking settings.
It is not uncommon for consecutive requests from what is obviously the same user to come in from different IP addresses. Usually these are either cell phone users whose network changes from second to second, or browsers behind big proxies with multiple servers fetching content.
For example, I would see this quite often (all fields made up):
18.104.22.168 index.html mobile-browser-v25
22.214.171.124 style.css mobile-browser-v25
126.96.36.199 header.jpg mobile-browser-v25
188.8.131.52 code.js mobile-browser-v25
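A toy illustration of the dedup problem (invented entries using documentation IPs, mirroring the made-up fields above): counting by IP sees four visitors, while grouping by user agent, a crude stand-in for a real fingerprint, collapses the burst back into one.

```python
# Hypothetical parsed log entries: (ip, path, user_agent).
# The IPs differ, but the identical user agent and request pattern
# suggest a single visitor loading one page and its assets.
requests = [
    ("203.0.113.10", "index.html", "mobile-browser-v25"),
    ("203.0.113.11", "style.css",  "mobile-browser-v25"),
    ("198.51.100.7", "header.jpg", "mobile-browser-v25"),
    ("198.51.100.9", "code.js",    "mobile-browser-v25"),
]

# Counting "visitors" by IP alone overcounts this one user fourfold.
unique_ips = {ip for ip, _, _ in requests}

# Grouping by user agent collapses the burst into a single visitor.
by_agent = {}
for ip, path, agent in requests:
    by_agent.setdefault(agent, []).append(path)

print(len(unique_ips))  # 4 "visitors" by IP
print(len(by_agent))    # 1 visitor by user agent
```

In real logs the user agent alone is far too coarse for this (millions of people share popular UA strings), which is exactly why dedup heuristics go wrong in both directions.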
Another problem is bots. So many bots. Some that take the trouble to mimic real browsers - they don't know they are bots. And they have a plan.
Nice subtle Battlestar Galactica reference there.
If you're selling copies of newspapers, 2 purchases by 1 customer = 2 purchases. The number of customers is irrelevant.
If you're monetizing via on-page ads or affiliate links (as the author is), revenue is much more likely to correlate to unique visitors than to page views.
That's the bit I don't understand. Why does an advertiser not value a customer seeing an ad twice, or for a longer time? In fact I would say that those customers are much more valuable because their multiple views bookend an extended period of viewing an ad. If I view a website at home in the morning, then again while at work, I have been exposed to that ad for many minutes, as opposed to one-time readers who click away within seconds. That has to be worth something more.
In traditional media, it's pretty much an axiom that people have to see the same ad at least seven times in order to get the message. There is no sin in showing the same advertisement more than once. It's why, for example, advertisers pay extra to have their ad appear at both the beginning and end of a commercial break on television. It's called "bookending."
When digital advertising started, the ad companies decided that someone seeing the same ad more than once was a bad thing, and then convinced their advertisers this was true.
I don't know why this happened. Maybe the digital people had some data that proved it. Maybe the digital people were just computer people and not advertising people and so didn't know about the decades of prior research into this.
But it's where we are now. Digital advertising companies value unique visitors, so web site owners do, too.
There are huge differences in online/digital media consumption compared to traditional media like TV and newspapers. Maybe it's because I'm old, but I tend to value the latter a lot more.
These days, I worry about signups and subscriptions, and run my own A/B (bayesian bandit) experiments.
Literally couldn't give the companies business.
The number of pages that silently fail or time out after 5-10 minutes due to these assumptions is really disappointing.
We live in a globally connected world; assuming that a person in China today will still be in China tomorrow may be a bad assumption. But in the meantime you may have left money on the table.
A website that ignores errors when fetching a remote resource is bad programming, for the same reason that ignoring "file not found" errors when calling open() is bad programming.
Are there any (open source) privacy friendly alternatives with similar basic features?
Edit: I'm not justifying Google's and Facebook's wrongdoing, past or present. But hating on those who are (forced into) using them and calling for removing them from websites doesn't help without an alternative. It will surely take a toll on your digital presence. Showing better alternatives is the way to go.
It's not "auto hate", it's 25 years experience of making web stuff. I used to track users. I can read Apache logs by sight alone. I was an Urchin stats user before Google bought them. I had hit counters.
I realised a long time ago that massive data gathering didn't really help me. I had a ton of data and no idea what to do with it. If you launch a site and find the engagement time is 5.54s and users scroll 40% down the page before the bounce when they hit the site from a Google SERP for "my special page" that doesn't tell you why they left. That only tells you that they did. If you want to fix the problem you need to actually reach people and engage with them. To build what users want you need to actually talk to users.
Maybe some people can get value out of GAnalytics data, but I reckon for most users it's "metrics theatre". They're randomly poking at their website and seeing if the numbers change. That's stupidly inefficient when you can just talk to some people instead and immediately know where the problems are because people will tell you if you ask. Oh boy, will they tell you. It's hard to make them stop.
Does anybody make the argument that analytics is supposed to tell you the why or what to do next? That’s up to you. The data just guides those decisions.
And actual log file parsing is what I'm missing in his comparison.
I sent the following to a site. Do you think I received a response? A reply?
Either they don't actually read emails from customers, or they employ a person who became 'confused', whose eyes glazed over once they failed to understand the message.
(Note that I sent it to about 5 email addresses: the one used after I purchased the pillow, the website's info address, the whois email addresses, and I even took 2 minutes to find the CEO's email address. Yup. No response from any of them.
And more than 18 months later (I just checked), the SPF record remains unchanged.)
While metrics may be helpful, I wonder how many only pay attention to metrics, and then ignore emails such as mine.
I'm terribly sorry, but unfortunately you've sold a pillow to a
SysAdmin. And now that you've done something wrong "with computers", I'm
going to prattle on whilst your eyes glaze over. :P
But in all seriousness, I recently ordered a pillow from you. Yet your
confirmation email bounced! Here's why:
May 13 07:10:54 XXX postfix/smtpd: NOQUEUE: milter-reject:
RCPT from smtp2.shopify.com[184.108.40.206]: 551 5.7.1 SPF verification
failure: sender host 'smtp2.shopify.com'[220.127.116.11] not among
explicitly allowed origin hosts for domain 'canadiandownandfeather.com',
and misses are forbidden; REJECT; from=<email@example.com>
What's happening is that your SPF record does not list Shopify's email
servers as valid for your domain name canadiandownandfeather.com...
You do have outlook.com there:
canadiandownandfeather.com. 300 IN TXT "v=spf1
But not Shopify. In fact, the "-all" part at the end means "and nothing
else can send mail!".
So, as it stands, you are telling people that Outlook is the only
acceptable place to send mail from. Until you fix this:
- some mail servers, like mine, will completely reject your mail
from Shopify's platform
- almost all remaining mail servers will place this type of mail in SPAM
I must stress that, due to your 'outlook' line, normal email should work
perfectly fine. We're only talking about email, using your domain, but
from shopify's platform.
If you go here:
And search for "SPF record", you'll see this:
To verify your customer email address, you need to add Shopify's SPF
record — v=spf1 include:shops.shopify.com ~all — to the TXT record in
your custom domain settings.
Now.. you want to KEEP your outlook domain too.
So, just add "include:shops.shopify.com" to your TXT record, so both are
there.. something like this:
"v=spf1 include:shops.shopify.com include:spf.protection.outlook.com -all"
Do this, and your problem should be fixed.
You can go here:
And check if things are done correctly.
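For what it's worth, the fix that email describes, splicing Shopify's include into the existing record while keeping the outlook.com one, is mechanical enough to sketch. A rough Python helper (not official tooling; the record strings are the ones quoted above):

```python
def add_spf_include(record: str, include: str) -> str:
    """Insert an include: mechanism into an SPF TXT record,
    keeping existing mechanisms and the trailing 'all' qualifier last."""
    parts = record.split()
    mech = f"include:{include}"
    if mech in parts:
        return record  # already present, nothing to do
    # The 'all' mechanism (e.g. -all, ~all) must remain the final term.
    if parts and parts[-1].lstrip("+-~?") == "all":
        parts.insert(len(parts) - 1, mech)
    else:
        parts.append(mech)
    return " ".join(parts)

current = "v=spf1 include:spf.protection.outlook.com -all"
print(add_spf_include(current, "shops.shopify.com"))
# v=spf1 include:spf.protection.outlook.com include:shops.shopify.com -all
```

The order of include: mechanisms doesn't matter for SPF evaluation; what matters is that both are present before the "-all", which otherwise forbids every sender not listed.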
You don't describe the business problem, but go into incredible detail about technical minutiae that nobody reading the email cares about.
Even more than that - you're basically asking them to run random bash commands from the internet.
They shouldn't trust your advice; anything you say would have to be verified by their own team, so that level of technical info is useless to them. In fact, including all the technobabble makes it look even more like a scam.
Remember, not that long ago it was a common scam to ask employees to make changes to the phone system, which would give the scammer free international calls. Your email is identical to one of those scams.
A better email would be (and this isn't perfect):
"I am a recent customer of yours and noticed that all your confirmation emails aren't being sent properly and won't be delivered at all for most users.
I'm a sysadmin, so I had a look at the problem using public data, and it looks like your tech team overlooked a setting when they set up your email system. It's an easy mistake to make and would take them 5 minutes to fix.
The technical details are that there is an "SPF" record which says who is allowed to send emails as your company - this is correctly set up for your office emails through outlook, but not set up for the automated emails from shopify.
If you let your tech team know that Shopify is missing from the SPF records, they'll quickly see the issue and fix it within minutes.
If you would like more information about the issue, I'd be happy to advise further"
 The people who do care won't see your email.
 To the untrained user.
 I mean, I wouldn't, but you seem like you would ;)
RE: random bash commands. Part of the reason I linked shopify's SPF page to them -- an attempt to provide external validation.
But I get it. Which is why, as I mention in that other response, it doesn't matter what I send. Even pleas to forward to a technical person = ignored.
(I bet lots of tech people get forwarded all sorts of crap that IS a scam, so yell "DON'T FORWARD THIS STUFF TO ME".)
In fact, I have even called some places. One started screaming at me. I'm guessing some phone scam "got them", and they therefore equated me with the scammers.
What can you do?
I mean, seriously?
If you respond to that without technical data, then you're basically saying "No, I did not receive it", and their response is "Your gmail lost it" (because mail = gmail).
The problem is that emails sent with "I didn't get the confirm email" are immediately thought to be "user error". And you know what?
99.9999% of the time, they surely are. Bad email providers, lost in SPAM folders, even missed in INBOX.
So if you don't include tech info, you're stuck in this category, or else you face many exchanges back and forth.
Because once I then respond with technical info?
You get the same glassy-eyed type of incomprehension, with often zero response. Or a response that invites another 5 back and forths, with the other side upset, because now they feel ignorant.
I feel there really is no proper answer here... except, I still have to try. :P
I've started emails off with (very close to):
"NOTE: this is a technical issue. You may not understand it. Please forward to a technical person", with a lot of variety on that line.
These too garner little response.
I've started out with the problem first, then tech info. It seems to matter little, and my humour in that first line was an attempt to amuse non-technical types into reading further.
I must have concocted > 100 over the years. Perhaps 10% responded, regardless of the layout or form of the email. Even 10% is generous.
The reality? Anything technical, no matter how presented, is too confusing for many.
'I have found a problem. The problem has this negative impact. Brief description of problem.'
Be serious if you want to be taken seriously, especially in your initial communication with an unknown third-party.
That said, product teams that are actually good enough to gather and productively use that sort of data aren’t very common, so switching off client-side analytics probably wouldn’t make much difference to most organisations. Page hits and error stats can be pretty easy to make use of, but you can get that from server logs.
Adobe Analytics is enterprise grade though, and is usually only used by companies large enough to have their own analytics department or contract out to analytics consultants.
It just gives a false confidence in bullshit.
Most internet analytics and the billions spent on data mining is all for naught.
You've got to be kidding
Facebook also tried the promise of "trust us to not use your 2FA phone number for ads" and broke it with no/little ill effects, so there's no reason Google can't do the same.
Building & running a service such as Google Analytics is not cheap so they wouldn't be giving it away for free unless they got something out of it.
Sorry but I think that is a very weird statement. Google Analytics collects a LOT of datapoints. Not only can they track users no matter how they get to that page, they can track how they behave on the page itself. They can track what content is interesting for the user (they know what the page contains after all) and how long you actually stay on the website. They can collect all of that data, without the user knowing and the user himself doesn't even need to use a single Google product.
I wouldn't exactly describe that as sparse data collection and the context dependent tracking is what makes Google Ads context relevant.
Ultimately, their business model relies on violating people's privacy and thus the safest course of action is to treat them as hostile and avoid them.
It's not expected that when I visit a website unrelated to google or facebook that they know about it
I went to Plausible's website; it makes no mention of this type of blocking. That 13% could be eaten into significantly by removing that type of activity.
More convincing would be matching up activity across sites, and seeing for what sessions they differ. If this was a decrease in users who had significant sessions (>30 secs, for example), there'd be more meaningful conclusions to draw.
2. The OP uses a non-standard call to Plausible which probably isn’t in many block lists
3. It’s in mine
I run a little blocklist project and I've had custom.plausible.io blocked in my list since April 8th. So, although I didn't have ms.markosaric.com blocked directly in my list, the PiHole still would have blocked it via CNAME blocking. Also uBlock Origin, if you have CNAME blocking enabled.
It is a default configuration.
Thanks for everything you do with uBlock Origin and uMatrix. uMatrix is one of the primary tools I use when researching domains to add to my blocklist.
I believe this is the default now, isn't it? 
(The site owner submitted the site to HN, and is taking part in the discussion.)
I don't have an ad blocker on my phone, yet use ublock origin on my Macbook + Chrome. I suspect this is pretty common, as it's harder to set up ad blocking for your mobile browser than desktop.
> I installed Google Analytics alongside Plausible Analytics on three sites in June.
All he measured was people who load Plausible Analytics and not Google Analytics. Anybody running noscript (like me) wouldn't have shown up here at all.
So yeah, I would expect the true Google Analytics blockage figure to be markedly higher than the 13% reported here.
I found that it was only added to _Peter Lowe's_ list yesterday/today, so it wouldn't have affected the statistics for this blogpost
For the record, you can also view exact timestamps on the detail page: https://pgl.yoyo.org/adservers/details.php?hostname=plausibl...
If you run some kind of proxy yourself on an unconventional name (e.g. “ads.mysite.example” may well be blocked) and make sure that any script it needs loads from your own site, again with an unconventional name (e.g. “piwik.js” would be blocked as Matomo), then your tracker won’t be blocked until someone notices.
If you want to head far down this route, smuggle analytics data in with legitimate requests and separate and forward it on your server.
Fortunately, taking it to this extreme takes enough effort that it’s decidedly uncommon.
The site of the article cloaks `custom.plausible.io` as `ms.markosaric.com`.
<noscript><img src=//tracker.example/pixel.gif style=position:absolute></noscript>
Some of Google’s tracking things include (or used to include?) such a snippet. But broadly the technique has fallen into disuse. I don’t see Plausible advertising any way for it to work, and their main snippet POSTs to https://plausible.io/api/event, which an image won’t be able to achieve.
That's because ublock origin uses an API that is not available on iOS/Safari
If you want to be generous to Apple, they're trying to stop the 90% of trash adblockers like ABP, and great adblockers like UBO are collateral damage.
A cynic would say that's a combination of an admission their app store review process can't catch evildoers; and a cynical result of the fact the app store is plastered in (presumably very profitable) ads.
And I don't know what ABP on iOS does or how it changed but the original extension wasn't trash. Sure, it has that controversial "acceptable ads" policy but it can be disabled by just unchecking a box in the settings once. I took a look at the code, compared it to a "cleaned" extension called AdBlock Edge and there wasn't much that unchecking that box didn't do.
Like most people, I switched to uBlock, then UBO, but performance was the main driver. Unless there have been some major changes, ABP still fits my needs; UBO is better for me, but that doesn't mean the other is trash.
I use Adguard. It has a customized combination of the blocking lists you know from desktop browsers. Trust doesn’t really matter all that much—the blocker cannot see what you’re doing in the browser.
Safari does not block anything (not even in the upcoming release) - it will still happily load all of the tracking and fingerprinting JS. All it does is prevent some information from being sent back (3rd party cookies, stripped referrer etc.) but the requests are still being made and received code executed.
I think it's also important to remember that, at large numbers, a percentage of users blocking GA is likely not a big deal: you just want something generally accurate. Switching analytics providers to get more data isn't in itself a reason to switch.
If people care about payload size of tracking scripts on their site then I see a good opportunity here for CDNs to offer analytics.
When you send a server-rendered page, you know the content is present, and very probably it renders correctly on the client. Transmission errors are logged by the web server. With a SPA, many factors could break the rendering, and the web server won't hear about it unless hard work is done to collect JS errors.
Google Tag Manager has a callback system that makes it really easy to trigger an event and wait for all the tracking pixels to fire before you advance someone to the next page, “guaranteeing” that you capture the events. If GTM is blocked and no effort has been made to handle a case where it didn’t load, however, the Complete Purchase button just straight up won’t work. I debugged this behavior in a high traffic site several years ago and have since noticed it all over the internet. With how widespread ad blockers are these days–especially in some demographics–I’d consider it an important QA step to ensure that customers can actually pay you even if they’re blocking ads and trackers.
And even then you wouldn't be able to map this quotient to human behavior. But you would have an upper bound on how inaccurate your analytics tools are.
Carrier grade NAT, corporate networks, VPN service users, etc. all will share the same source IPv4 address.
In IPv6, the opposite is the case: clients will change their IPs frequently for privacy reasons resulting in an overcount.
Didn't read his whole lengthy post, but I'm not sure why he didn't just use a standard 20-year-old log analyzer (like AWStats or similar) to simply compare _any_ visit by a single IP address over a set 24-hour period.
Forget about page views, A/B tests, metrics, customer journeys, etc... that's what G.A. is designed to be used for (along with its competitors like P.A.).
The question is very, very simple - especially for standard websites. How many different IPs requested what volume (in bytes) of resources from your server?
We've been measuring that number on the web for nearly 30 years and it requires ZERO installs, scripts, 3rd-party services, etc.
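That number really can come straight from a standard access log. A minimal sketch over combined-format lines (field positions are assumed; real log formats vary):

```python
import re
from collections import defaultdict

# Combined log format: ip - - [timestamp] "request" status bytes ...
# The bytes field is "-" when nothing was sent (e.g. a 304 response).
LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} (\d+|-)')

def tally(lines):
    """Map each client IP to the total bytes served to it."""
    bytes_by_ip = defaultdict(int)
    for line in lines:
        m = LINE.match(line)
        if not m:
            continue  # skip malformed lines
        ip, size = m.group(1), m.group(2)
        bytes_by_ip[ip] += 0 if size == "-" else int(size)
    return bytes_by_ip

sample = [
    '203.0.113.5 - - [10/Jul/2020:12:00:01 +0000] "GET / HTTP/1.1" 200 5120',
    '203.0.113.5 - - [10/Jul/2020:12:00:02 +0000] "GET /a.css HTTP/1.1" 200 880',
    '198.51.100.7 - - [10/Jul/2020:12:00:03 +0000] "GET / HTTP/1.1" 304 -',
]
totals = tally(sample)
print(len(totals))             # distinct IPs seen
print(totals["203.0.113.5"])   # bytes served to that IP
```

No script tag, no third party: just the log the server was writing anyway. (With the caveats about shared and rotating IPs raised elsewhere in this thread.)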
None of this would be so bad, except that all the alarmists decrying abuse or overuse of G.A. and looking at alternatives are (generally speaking) not anyone that Google cares about. Google's attitude: "Uh, go ahead and use a different spyware/tracker/script tool. We don't care about your crummy blog. We are still installed on 499 of the Fortune 500 websites. That's what we care about."
EDIT: I suppose there's https://panopticlick.eff.org/
Serious question now: how do we get this number to rise to 50%? How about 80% or 95%?
I've been on a quest for a long time now to convince people I know to install uBlock Origin in their browsers, but it seems to be a hard sell; even among more tech-literate people, the effort to convince them is non-trivial. They stick with it after I install it for them, use it everywhere themselves after a while, and thank me for it; it's just that getting them to try it makes them go defensive. What wins them over is not improved privacy, nor tracking removal, but no longer having to deal with video ads on YouTube, or those embarrassing naked-people or viagra banner ads on websites like Yahoo Mail.
So basically privacy is an afterthought, a nice to have consequence, as it's not an immediate visible part of the experience for them.
The other problem is that "AdBlock" used to be the standard ad-blocking tool, but it has since been taken over by advertisers and allows "non-intrusive" ads, such as Google's.
AIUI the GDPR doesn't particularly care about cookies, it cares about you tracking people without their consent. If you have a cookieless way of tracking individual people that is as accurate as cookies then clearly you still need consent?
I feel most of these "privacy" focused analytics tools are mostly about working around the need for a cookie banner. I also don't get how they can continue to show off their public dashboards full of data that has been collected without user consent, boasting about how many hits they have.
Gives me the feeling "Privacy" is more of a marketing tactic here than a mission.
I don't know if "privacy" is just a marketing tactic but I have definitely seen a lot of misplaced good deeds based on not understanding the regulation correctly.
> In summary, here’s how we assign a hash that we use for unique user counting:
hash(website_domain + ip_address + user_agent)
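A minimal sketch of that recipe using SHA-256 (the function name and inputs here are illustrative; Plausible's docs also describe a rotating salt, which is omitted):

```python
import hashlib

def visitor_id(website_domain: str, ip_address: str, user_agent: str) -> str:
    """Derive a visitor identifier per the quoted recipe.
    Only the digest is stored; the raw inputs are discarded."""
    raw = website_domain + ip_address + user_agent
    return hashlib.sha256(raw.encode()).hexdigest()

a = visitor_id("example.com", "203.0.113.5", "Mozilla/5.0")
b = visitor_id("example.com", "203.0.113.5", "Mozilla/5.0")
c = visitor_id("other.org",   "203.0.113.5", "Mozilla/5.0")
print(a == b)  # True: same inputs, counted as the same visitor
print(a == c)  # False: the domain is hashed in, so no cross-site linking
```

Note that because the input space (IPs and common user agents) is small, the digest is not irreversible in practice, which is exactly the ePrivacy concern raised below.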
There is an ePrivacy Directive amendment that specifically prohibits the use of hashed identifiers (e.g. IP address, browser, OS) to work around cookies, even if a timestamp is introduced to limit the life of a hashed ID.
I'm a privacy advocate myself, yet find this a bit OTT - but that's the ePrivacy regulations, and I think Plausible could get burned here at some point. Presumably they know about this amendment and are betting that there will be no enforcement, or their lawyers have found a weasel-wordy loophole (I doubt the latter, as the directive is very specific).
If data is being processed without containing PII, then GDPR becomes a non-issue.
The GDPR allows for collection and processing without consent for several reasons including (but not limited to) legal requirement (e.g. anti-fraud) and legitimate interests (e.g. app install conversions).
The GDPR is also quite clear that consent is not required to collect _anonymous_ data, i.e. data which in no way can be traced back to the individual, but this requires balancing with the other principles of the regulation. Recital 26 of the GDPR states:
“…The principles of data protection should therefore not apply to anonymous information, namely information which does not relate to an identified or identifiable natural person or to personal data rendered anonymous in such a manner that the data subject is not or no longer identifiable. This Regulation does not therefore concern the processing of such anonymous information, including for statistical or research purposes.”
Also, the ePrivacy Directive (and the upcoming ePrivacy Regulation) impose additional constraints on what kind of data collection requires consent, so again you usually can't just refer to the GDPR's exemption for anonymous data when collecting telemetry from your users, even if you anonymize the data on your server.
Furthermore, it only counts as processing under the GDPR if anonymisation was performed as a step on data already covered by the GDPR. Ergo, if the data as originally collected qualifies as anonymous data, then processing it is not covered by the GDPR.
Fathom Analytics is another option.
On publisher sites where I've had access to analytics we'd see an Ad-Blocker rate of 8% on the sites with a 'good' ad experience (low programmatic, less crappy positions and UX) but as high as 30% on the crappier ones (saturated with banner ads, had Outbrain/Taboola modules, etc).
(That was about 2 years ago, so I would expect ad-blocking to have continued its slow increase.)
We use SimpleAnalytics and so far I really like it. Yes, it's "simpler" in that you get less data but honestly, I think people over-value some of the data they get from GA. A lot of it is superfluous.
It is very much possible. The IPv4 address space is only 2^32, about 4.3 x 10^9 addresses, so a precomputed table of 256-bit (32-byte) hashes would be under 150 GB. Adding the target websites scales the size linearly.
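The back-of-the-envelope arithmetic, assuming the full 2^32 IPv4 space and 32-byte (256-bit) digests:

```python
ipv4_space = 2 ** 32        # ~4.3 billion possible IPv4 addresses
digest_bytes = 32           # e.g. a SHA-256 digest is 32 bytes
table_bytes = ipv4_space * digest_bytes

print(table_bytes / 10**9)  # ~137 GB of digests per target website
```

Storing the preimages alongside adds only 4 bytes per address, so a full lookup table for one site fits comfortably on a single disk, which is why hashing an IP does not meaningfully anonymize it.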
That sounds very high, I thought it was still ~1-2%?
Shouldn't Firefox block GA by default now since they enabled enhanced tracking protection by default? I'm surprised the number for Firefox is so low.
Employees or (worse) freelancers who deliberately retain data and keep it away from the company are utterly unethical, in that they prevent the company from assessing their actions and taking adequate measures based on them, including firing them if they cost more than they provide.
.. this is too weird for me.
In Web analytics the medium to long term trend almost always trumps individual data points. Those don't mean a thing and can actually be bad for the business owners if "corrective" actions get taken based upon them.