99.9% of spammers are too lazy to spend any time figuring this out for a single site, and their tools won't even tell them the spam isn't working. I've gotten away with adding a simple static ID to everything, and except for really large, juicy targets, spammers don't even waste time on this.
All of my sites get zero spam with this filter
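A minimal sketch of the static-ID idea: the form carries a site-specific constant in a hidden field, and the server rejects anything without it. The field name and token value here are hypothetical, since the comment doesn't give the actual scheme:

```javascript
// Hypothetical static token -- any site-specific constant works, since
// generic spam tools submit the same payload everywhere and never notice
// their submissions are being dropped.
const SITE_TOKEN = 'my-site-2017';

// Server-side check: reject any submission that lacks the static token.
function isLikelySpam(formFields) {
  return formFields.form_token !== SITE_TOKEN;
}

// The matching client-side form would carry the token in a hidden input:
//   <input type="hidden" name="form_token" value="my-site-2017">
```

Because the token is static, there's nothing to compute per-visitor; the point is only that a spammer would need to inspect this one site's form to get past it.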
The Next Web has open-sourced its Google Tag Manager setup (https://github.com/thenextweb/gtm), which has things like Scroll Tracking, Engagement Tracking (riveted.js), Outbound Link Tracking and lots of other things that are not in the default GA setup. They have recently added support for AMP.
In my experience it allows clients to get up and running with a useful GA setup in a couple of hours and means that you as a developer don't get bothered to make trivial changes.
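Scroll tracking of the kind bundled in that container can be sketched as a pure function. The 25% thresholds and the wiring comment are assumptions for illustration, not The Next Web's actual implementation:

```javascript
// Compute how far down the page the user has scrolled, as a percentage.
function scrollDepthPercent(scrollTop, viewportHeight, documentHeight) {
  if (documentHeight <= viewportHeight) return 100;
  return Math.min(100,
    Math.round(((scrollTop + viewportHeight) / documentHeight) * 100));
}

// Which reporting thresholds has this scroll position crossed?
function crossedThresholds(percent, thresholds = [25, 50, 75, 100]) {
  return thresholds.filter((t) => percent >= t);
}

// In a browser you would wire this to the scroll event and push a GA event
// (e.g. via dataLayer.push) the first time each threshold is crossed.
```

Note this is exactly why scroll data is only meaningful in aggregate: each hit is just "this anonymous pageview reached 50%".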
> Scroll Tracking, Engagement Tracking (riveted.js), Outbound Link Tracking and lots of other things that are not in the default GA setup.
In GA's case, none of this is personally identifiable (Google actively strips info which could be PII out of listings), and the way Scroll Tracking is implemented could only ever be used in aggregate. So that is something.
I'm a more technically inclined marketer, but I make damn sure to check with an engineer before trying anything fancy with JS, and I make sure to test with QA.
But having stuff in GTM more or less means you have separate workflows for code outside of your existing repo. Yes, you can dump a JSON export of the container and commit that, but it can definitely cause headaches when engineers who aren't super familiar with GTM or how it is set up have to touch things that impact it (or vice versa).
Aside from GA, most companies have a shit load of 3rd party scripts that run on their pages, so GTM provides a central place to manage everything.
One problem is that anyone with editor access to GTM can inject just about anything (unless you block custom scripts within GTM which has other implications) and those deployments are real-time and done directly within the GTM web interface. I'm a developer on the marketing team, so I'm a fan, but it's risky in the wrong hands.
People who block my tracking scripts don't want to be tracked, so I won't track them.
I use that info to see how people use a product, how they interact with it and what I can do to improve it. Where my time and money will be best spent.
If people want to block them, that's fine, I'm not going to try and get around them, but their "voice" is also muted here. I'm no longer factoring in their usage patterns, their usage at all.
I don't disagree with your mindset at all, but could you be missing out on a large percentage of users and not know about it?
I've had it happen before; I'm not going to make the same mistake again.
It's kind of a ham-fisted approach, but if you block tracking, I'm going to treat you like you don't exist. It's not that I'm trying to punish anyone; it's just that analytics seems to be the ONLY reliable source of information about what people use.

Surveys only reach an extremely small subset, or nobody at all (I've had them get a 0% response rate while there were tens of thousands of daily users). Unsolicited feedback is almost 100% negative, and a large percentage of it is nonsensical (things like "I hate the new update"... what am I supposed to do with that?). Requests for feedback on new features or changes might get a few good responses, but I have no way of comparing those responses with actual usage (especially when one gets linked on Reddit and suddenly gets 10x the number of responses simply because it was linked on Reddit). And then if you decide to go against what your 3 responses to a problem requested, you'll get more blog posts like "x asks for feedback, does whatever they want anyway".

I'm sorry, but I'm done with that. I go by usage numbers and patterns only now.
Are you saying that you ignore server logs?
The short answer is that it's significant on an aggregate level worldwide, but the reality is that it varies _massively_ by country, device, day of the week, and even between different sections of the same site. Additionally, there is a small percentage of pageviews with JS disabled that you have to account for. An analysis on HN earlier today said 0.2% of pageviews worldwide have JS disabled, but, again, with huge variation (notably Tor, but elsewhere too).
Q4 numbers are not released yet, but the trend is generally up, with some notable drops. Get in touch if you want more info or to set it up on your site.
With larger businesses, you'll probably see more server-side implementations as they have the budgets to ensure the data they're collecting is accurate. For a blogger or a small publisher without a dedicated tech team, there's nothing easier than dropping in a script tag and watching the data roll in.
The folks at Segment.io warn their users to expect ~20%, with the caveat that blocking rates vary wildly between demographics.
It's not technically difficult at all. It takes a few clicks to install and one click to disable on the minority of sites that don't work with ad blockers.
I highly doubt I'm entirely atypical.
I have also personally seen around 30% users use ad-blockers, for a site with around 100,000 visitors a day. However, most of the audience for that site is people in twenties, so it's not surprising to see higher than average ad-blocker usage.
Purely for self-protection/anti-aggravation I absolutely recommend it to every casual user I advise.
Realize that the link, in either the Chrome or Firefox case, will be to the official add-on site, and in the mind of the user it is safe, doubly so since it came from someone trusted.
What percentage of people who see it will spend the 3 clicks to install it?
* Note: I know you have no incentive to actually do so; it's hypothetical. But many people are encouraging non-techies to do so, and have been for a very long time, so the percentage who are aware of ad blocking is increasing.
My non-technical friends have never mentioned ads to me before in the context of the web. I doubt that means they appreciate ads on sites but I don't think it occurs to them that they need to find a way to remove them. I think they appreciate that Hulu lets you pay to remove them, or that Netflix doesn't have ads, but I never hear "this website sucks because of ads." They just assume it's the way of things.
> The other thing about ads is that 41 percent of millennials are using ad block. My daughter has ad block and she goes around infecting every machine she gets to. She puts it on everything.
> But the other thing is that she lives in incognito mode. She’s a total nightmare for advertisers, because she’s not leaving any cookies and she’s not seeing any ads.
Digital privacy is an undeniable rising trend. Just stating the vast majority of people are not using adblock is, at minimum, shortsighted.
Seems like she has the right idea to be honest.
Poor things, back to not knowing which 50% of the budget they're wasting, like it's the 20th century.
We can only hope.
And I run everything through Segment rather than embedding individual trackers directly. Segment either relays data to other services or loads required JS on page load.
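Segment-style loaders work by exposing a stub that queues API calls until the real library arrives on page load, then replaying them. Here's a simplified sketch of that relay pattern (illustrative, not Segment's actual snippet):

```javascript
// Stub: an array that also exposes the tracking API. Calls made before the
// real library loads are queued as [method, ...args] tuples.
const analytics = [];
analytics.track = (...args) => analytics.push(['track', ...args]);
analytics.page = (...args) => analytics.push(['page', ...args]);

// Page code can call the API immediately, before any script has loaded:
analytics.page('Pricing');
analytics.track('Signed Up', { plan: 'free' });

// When the real library loads, it drains this queue and relays each call
// to whichever downstream services are configured.
```

The upside of this pattern is that you embed one loader and manage destinations centrally, rather than sprinkling individual tracker snippets through your pages.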
Of course my homegrown analytics reporting is far from Google's, but at least I have found a great balance between getting useful usage data on my sites, and at the same time respecting the visitors' privacy.
The one thing I didn't like, and the reason I stopped using it, was the pricing, which looks like it has now been updated to be more realistic.
Definitely checking it out again now.
Anyhow, for many websites you'll get more accurate traffic data with GoAccess parsing your logs and showing you page views and basic demographic data. Use it alongside Google Analytics if you must, to see the exact difference between what Google tells you your page views were versus what your server tells you.
It's hard to know how effective the bot filtering features in GoAccess are compared with those of Google Analytics.
I operate a service that measures this (see another comment on this discussion), and all I'll say is you'll be very surprised how many bots actually execute JS, especially stealth bots. You have to be careful either way.
Interesting. Do you have any numbers you can share?
BTW, are you the same Peter Hartree on this Segment thread? https://community.segment.com/t/1889n1/how-common-is-client-... It would appear we've crossed paths before on this topic. Please do email me if you want to talk properly. That Segment thread has my email.
> ga('set', 'anonymizeIp', true);
That's a nice placebo that does almost nothing. Even if the packet body doesn't contain the IP address, it's still available in the IP header's Source Address field.
However, even if we assume Google - in a reversal of their general focus on gathering as much data as possible - doesn't recover the address from the IP header, their own documentation for analytics collection URLs with the &aip=1 parameter (which should be present when 'anonymizeIp' is true) says:
> "... the last octet of the user IP address is set to zero ..."
Their documentation even betrays their intentions:
> "This feature is designed to help site owners comply with their own privacy policies or, in some countries, recommendations from local data protection authorities, which may prevent the storage of full IP address information."
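To see how weak last-octet zeroing is, here is the transformation the documentation describes. GA does this server-side, so the code itself is a sketch, but the effect is as documented:

```javascript
// GA's documented IP "anonymization": zero the last octet of an IPv4
// address. The result still identifies a /24 block of at most 256
// addresses, often a single ISP neighborhood.
function anonymizeIpv4(ip) {
  const octets = ip.split('.');
  octets[3] = '0';
  return octets.join('.');
}

// anonymizeIpv4('203.0.113.42') yields '203.0.113.0'
```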
Is this really a good idea?
- If you have multiple domains, sub domains, etc. make sure to spend plenty of time reviewing the cross-domain setup documentation and test it thoroughly.
- If you have high volume, frequently do deep segmentations, use lots of custom dimensions, etc., make sure you have a clear understanding of how sampling in GA works, how to tell if you are being sampled, and find ways to avoid it by pulling reports in different ways. Otherwise you can end up in a situation where you are making decisions off of 0.3% of your traffic, and while Google's sampling algorithm thinks it is fine, comparison against other data sources often shows it is not.
- Make sure any reporting you do across things like GA vs. AdWords is done with a clear understanding of how they each report on paid search. GA reports on it by default on a last non-direct click basis. AdWords just counts everything AdWords touches. This means that AdWords can give you a good sense of where you are gaining traction, whereas GA can help you understand how it works in conjunction with other touch points, and perhaps how you might change the way you weight things and measure success.
- GTM is powerful and free, but with great power comes great responsibility. Also, it can be a real PITA sometimes.
- Annotations are a highly underutilized tool in GA and can save you a lot of headaches. I just wish there was a way to bulk import/export them via spreadsheet or API.
- You can't currently create goal funnels from event-based conversions (please Google, add this!), but the workaround for the time being is to push virtual page views at the same time as the event fires, and then create funnels off of those.
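The virtual-pageview workaround looks roughly like this with the classic analytics.js API. The `/virtual/...` path scheme is a made-up convention (pick anything that won't collide with real URLs), and the queue stub stands in for the real `ga` defined by the standard snippet:

```javascript
// In a real page, `ga` is defined by the standard analytics.js snippet.
// This queue stub stands in so the sketch is self-contained.
const calls = [];
function ga(...args) { calls.push(args); }

// Alongside each funnel event, send a virtual pageview that goal funnels
// *can* be built from.
function trackFunnelStep(category, action, virtualPath) {
  ga('send', 'event', category, action);
  ga('send', 'pageview', { page: virtualPath });
}

trackFunnelStep('signup', 'submitted-email', '/virtual/signup/step-1');
```

Keep the virtual paths on a distinct prefix so they're easy to exclude from regular content reports.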
- User stitching sounds awesome, but is actually much more limited than you'd think from reading the overview. You need a separate view (which means the main GA view you use can't segment for the stitched sessions for comparison--just the new view, which only contains the stitched users). And there's a 90-day rolling data retention window, so you need some sort of export process if you care about that data. Unfortunately, this is pretty important data if you have lots of cross-device tracking issues.
- Depending on your volume, you can reach the hit limits of the free tier pretty quickly if you start tracking a ton of events (since they all count as hits). Here's a good overview of what these limits are, how they work, and what they mean for you. When I got the scary notification, Google was exceptionally unhelpful in working with me to resolve the problem, despite considerable ad spend. After reducing our hits to what we thought would be fine, they were unable to assure me that our data would not be nuked, and basically couldn't give me any real info beyond "this is the policy." Super frustrating.
- If you have good logging of events that tracks both server and client-side, it is healthy to compare for variances monthly or quarterly. You'd expect client-side tracking to break more often than server-side, but it is important to see how much that can alter your numbers.
This language code is associated with bots 99% of the time. I had one site where 20% of all the sessions in a given month were such fake traffic!
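A hedged sketch of the kind of filter that catches this: flag sessions whose reported language isn't a plausible BCP 47-style tag. The specific spam code referenced above isn't given here, so the pattern below is a generic assumption, and the regex is a simplification of BCP 47, not a full validator:

```javascript
// Well-formed browser language codes look like "en", "en-US", "pt-BR", etc.
// Ghost-spam sessions often carry junk in this field (keyword spam, odd
// single-letter codes), which a simple shape check catches.
const PLAUSIBLE_LANG = /^[a-z]{2,3}(-[A-Za-z]{2,4})?$/;

function isSuspiciousLanguage(lang) {
  return !PLAUSIBLE_LANG.test(lang);
}
```

In GA itself the equivalent move is a view filter or segment excluding the offending language values, but checking shape rather than a blocklist also catches new spam variants.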