
Show HN: Bypassing ad blockers for Google Analytics - StefanoC
https://analytics-bypassing-adblockers.netlify.com/
======
tjpnz
Those who would consider doing this deserve a special place in hell right next
to devs who don't respect user privacy and the crooks in the advertising
industry who turn a blind eye to the fact they're distributing malware. By
installing an adblocker I've made a conscious decision to not have your BS
running inside my browser. Forcing it on me will at the very least result in
me disabling JavaScript on all your pages.

~~~
heliodor
If we're going to use ad blockers, at least let's admit to what we're doing
and not claim a moral high ground.

You're implying the creator of the website is okay letting you receive the
service or content on your terms. They are not. Ads and tracking are there
because they earn the creators some amount of money.

One day when our tech will limit you to a binary choice of ads+tracking versus
paying money, which way are you going to swing once your hand is forced?

~~~
mathnmusic
This is quite silly. Am I also supposed to compulsorily watch all the ads that
come up on TV in the commercial break and not mute it or leave the room?

~~~
joshmanders
Just like ads on your TV you can "leave the room" by closing the page.

What's with this entitlement that you shouldn't have to endure ads but also get
to have the content too?

~~~
mathnmusic
I said "leave the room during the commercials". If what you meant is that I
can't leave ONLY during the commercials, then this comes across as an extremely
user-hostile approach.

Should a user be forced to NOT mute the commercials? I'm clearly in the "NO"
camp on this issue.

~~~
joshmanders
Can you fast forward through commercials? All you can do is mute the TV and/or
leave the room. Even in systems like TiVo where you can rewind and fast
forward they still mostly block you from fast forwarding commercials... No
different than circumventing adblocks, but nobody complains about that.

Hell even TV networks track you. They know how many viewers are on certain
shows and that. That's how they're able to garner high prices too.

Advertising is just something we always had to deal with. You don't have to
watch commercials. You do that by not going to channels that have commercials,
or using different services that you pay to not see commercials.

My point is, you are entitled to not be tracked. You're entitled to not have
to see ads. But you are _NOT_ entitled to the content without those if the
website decides the trade-off of you getting that content for free is by
enduring those ads.

Close your window. Go somewhere else for content if the site you're visiting
displays ads.

------
beagle3
Meh.

If you're using GA to prove your site's worth, e.g. in some M&A deal, this is
useless - your proxying means that you can fudge numbers and thus is no better
than anything else you say. (This is a significant use case among looking-for-
exit startups).

If you're using GA to get insight about your website, it would be somewhat
useful, but not really - because GA would not be able to correlate the cookies
to figure out the demographics, etc (and I don't know how much it would trust
Via / Proxy-for headers, so other statistics it gives you are also limited).

Also, if you have non trivial traction, you're going to get flagged by their
fraud filters.

You're probably better off running a local Piwik or whatever it's called these
days.

~~~
StefanoC
Could you please expand on fudging numbers and fraud filters?

The original question that I was trying to answer was if the numbers that I
was seeing for mobile users were skewed by how much more difficult it is to
get an ad blocker for mobile.

~~~
Nextgrid
Fraud filters are about GA not expecting such a large number of events from a
single IP. You’d be sending _all_ your visitors’ events from that single IP -
at some point GA will ignore your traffic or give you a captcha (effectively
blocking the analytics because it’s not designed to handle the captcha
response).

------
Nextgrid
This is akin to bypassing antimalware protection by hosting the malware on
your own reputable site.

What are you trying to achieve here? Your entire domain will just end up
blocked if you do this at scale, not to mention Google themselves would ban
your reverse proxy’s IP because of too many queries (since you’ll be proxying
all your visitors’ requests from a single IP).

~~~
taneq
To be fair, self-hosted ads are a thing on some sites and often don't get
blocked by adblockers. I know I don't specifically go out of my way to block
such ads because they're generally on sites that I'd like to support.

~~~
Nextgrid
Do these self-hosted ads also embed malware (stalking/tracking code)? If they
do not then I'm 100% with you and would totally support this kind of self-
hosted advertising.

However this example is a bit different, the site in question is going out of
their way to being a reverse-proxy for a spyware command & control server, and
the entire domain should be considered & blocked as such.

~~~
taneq
If we're just talking about tracking then they don't need to because the site
already gets all of your requests. They inherently know what pages you're
viewing on their site because they gave them to you. A great many sites run
this kind of analytics (often including client-side ones to track user actions
- think medium.com's "most highlighted paragraph") and it's not considered
malware.

If you're talking about them selling the data gathered by these, then that'd
be less common but certainly not unheard of. If you're talking about them
doing something more nefarious on your machine (keylogging/cracking) then
hopefully that's pretty hard to do against a modern browser and any site
caught doing so would never get any traffic from me again.

~~~
Nextgrid
The problem we're discussing here is not about the site having a record of all
the _legitimate_ requests needed to load a page.

The problem is that the site is now serving a piece of (third-party, but
that's beside the point) malware explicitly designed to monitor events that
would normally not cause a network request (and thus wouldn't be logged), and
then sending that to a malicious third-party through a reverse-proxy.

------
kevingadd
If you're hosting the analytics on your own domain, is it really even
something an ad blocker should be blocking? It's not coming from a known
third-party service domain (for ads or tracking or otherwise) so there's no
real reason a blocker should be blocking it. It's first-party analytics on
your own website. The fact that you're implementing it via reverse proxying is
kind of an implementation detail, because at any point it could stop being
Google Analytics, or an existing first-party analytics solution on a website
could become GA.

It is kind of unfortunate that third-party tracking can 'hide' this way but in
this case there's not really much you can do if the content author is going
out of their way to pull a fast one...

~~~
reitanqild
> The fact that you're implementing it via reverse proxying is kind of an
> implementation detail, because at any point it could stop being Google
> Analytics, or an existing first-party analytics solution on a website could
> become GA.

I think you (probably unintentionally if I understand you correctly) actually
just pointed out a good reason why those who really really care should block
analytics even from the same domain as the site they are visiting : )

Not that it will help against a determined web site owner trying to track
though: a great deal of the tracking can be done on the server side (and even
proxied from the server side to another third party).

~~~
kevingadd
Right, my point is essentially that I don't think it's realistic to try and
block first-party trackers. They're indistinguishable from page content. The
closest you could get would be the 'disable javascript' hammer but there are
non-script-based ways to do first party tracking pretty well, I'm sure.

I get why people would want or expect tracking blockers to work on reverse
proxying but it seems silly to try. On the bright side, if the tracking is
being done first-party it makes it much clearer who's taking your data and
who's responsible for where it goes - it's going through them even if they're
just bouncing it to another server.

------
rvnx
Nice try but doesn't work on Kiwi Browser ;) Shows "This content should be
overridden by GTM". This is because a heuristic is used instead of a
blacklist. So to answer, yes this can be blocked easily.
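
For contrast, here is a minimal sketch of the blacklist side, assuming a
hostname-only blocklist (real lists such as EasyList use a richer rule syntax,
and `BLOCKED_HOSTS`, `is_blocked` and `example.com` are made up for
illustration). It does not try to reproduce the heuristic, only to show why a
pure host blacklist misses a first-party proxy:

```python
from urllib.parse import urlsplit

# Hypothetical blocklist; real filter lists carry many more rules than this.
BLOCKED_HOSTS = {"www.google-analytics.com", "www.googletagmanager.com"}


def is_blocked(url, blocked_hosts=BLOCKED_HOSTS):
    """Return True when the URL's hostname is on the blocklist."""
    host = urlsplit(url).hostname or ""
    return host in blocked_hosts


# The usual third-party script is caught by its hostname...
print(is_blocked("https://www.google-analytics.com/analytics.js"))
# ...but the same script served from the site's own domain is not.
print(is_blocked("https://example.com/proxy/analytics.js"))
```

A filter that inspects what the script does, rather than where it comes from,
would not be fooled by the hostname change alone.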

~~~
StefanoC
That's interesting, and good to know! I wonder if the heuristic can be
bypassed by changing the code (e.g. adding a semicolon) or changing the URL
further.

~~~
rvnx
Of course it can be bypassed and it's not very difficult. It's just that the
way of filtering is different (many browsers / extensions are just
Easylist/Disconnect clones)

To go further on the proxy idea, I think that the best strategy could be to
actually do server-side calls to GA:
[https://ga-dev-tools.appspot.com/hit-builder/](https://ga-dev-tools.appspot.com/hit-builder/)
(yes there is an API for server-side hits).
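
A rough sketch of what such a server-side hit could look like, assuming the
Universal Analytics Measurement Protocol (v1) that the hit builder targets.
The parameter names (`v`, `tid`, `cid`, `t`, `dp`, `uip`) come from that
protocol; the function names and the `UA-XXXXX-Y` tracking ID are placeholders,
not anything from the linked project:

```python
import urllib.request
from urllib.parse import urlencode

GA_COLLECT_URL = "https://www.google-analytics.com/collect"


def build_pageview_hit(tracking_id, client_id, page_path, user_ip=None):
    """Build the form-encoded body of a Measurement Protocol v1 pageview."""
    params = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # property ID, e.g. "UA-XXXXX-Y" (placeholder)
        "cid": client_id,    # anonymous client ID chosen by the server
        "t": "pageview",     # hit type
        "dp": page_path,     # document path
    }
    if user_ip:
        params["uip"] = user_ip  # optional IP override for geo reports
    return urlencode(params)


def send_hit(payload, timeout=5):
    """POST the payload to GA from the server (performs a network call)."""
    request = urllib.request.Request(
        GA_COLLECT_URL, data=payload.encode("ascii"), method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


print(build_pageview_hit("UA-XXXXX-Y", "555", "/index.html"))
```

Nothing is loaded in the visitor's browser here, so there is nothing for a
client-side blocker to match.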

The minus of the proxy idea, is that since you don't have access to
*.doubleclick.net (which should be blacklisted by any decent track/adblocker)
you don't get demographics info back into GA.

But after all, like other comments said, aren't you simply a first party
tracker ? GA is just a more evolved storage point than, let's say using
goaccess on raw logs.

~~~
StefanoC
> To go further on the proxy idea, I think that the best strategy could be to
> actually do server-side calls to GA:
> [https://ga-dev-tools.appspot.com/hit-builder/](https://ga-dev-tools.appspot.com/hit-builder/)
> (yes there is an API for server-side hits).

Yes, probably big players would like to use server side analytics! But that's
a bit too involved for small websites.

> The minus of the proxy idea, is that since you don't have access to
> *.doubleclick.net (which should be blacklisted by any decent
> track/adblocker) you don't get demographics info back into GA.

When I pull down Google Analytics I also change its content to make it point
to the reverse proxy itself. I didn't find any call to that domain being
blocked, so I didn't do it for that particular case.

I think that the data collection is done via
[https://www.google-analytics.com/r/collect](https://www.google-analytics.com/r/collect),
which I do proxy. Notice however that sometimes an EasyList filter kicks in and
blocks that just because it happens to match "r/collect". I think there is a
race condition somewhere that makes it not work sometimes, because I couldn't
replicate it consistently. Anyway, it would be as simple as changing that
domain specifically to something else. I tried doing so, but Netlify's
redirects were playing up (possibly because I'm on the free tier) so I gave
up. The concept of masking the domain/URL still applies.
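
To make the masking idea concrete, here is a minimal sketch, not the code
behind the demo site. It assumes the proxy rewrites the downloaded script with
a plain string replacement and maps the alias back to the upstream URL when a
request comes in; the `/proxy/ga` alias and both function names are invented
for illustration:

```python
# Hypothetical mapping from upstream origin to a first-party alias path.
ALIASES = {"https://www.google-analytics.com": "/proxy/ga"}


def mask_urls(js_source, aliases=ALIASES):
    """Rewrite upstream URLs inside the script so they point at the alias."""
    for upstream, alias in aliases.items():
        js_source = js_source.replace(upstream, alias)
    return js_source


def unmask_path(path, aliases=ALIASES):
    """Proxy side: translate an aliased request path back to the upstream URL."""
    for upstream, alias in aliases.items():
        if path == alias or path.startswith(alias + "/"):
            return upstream + path[len(alias):]
    return None  # not an aliased path, serve it as normal site content


print(mask_urls('send("https://www.google-analytics.com/r/collect")'))
print(unmask_path("/proxy/ga/r/collect"))
```

Note that the rewritten path still ends in "r/collect", so a filter matching
on that fragment would keep working until the path is renamed as well.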

------
maaaats
Since it goes through a reverse proxy, wouldn't it _not_ leak personal data
the way using it directly would? If using GA directly, the browser uses my
google-session data which GA can track between sites/domains. But here the
proxy only gets the unique session for this proxy, so it doesn't know who I
am. Or?

~~~
StefanoC
I checked the analytics dashboard yesterday and updated the website: the only
data that I'm not getting though is the users' country/city and their provider.
So in a sense it's better for your privacy: the IP is not your own!

I'm not an expert on Analytics but I'm also assuming that since the cookies
are different (because the HTTP call to analytics happens on a different
domain than usual) it shouldn't be able to track you just as well: Google
Analytics doesn't know your IP and has no trace of your previous anonymous IDs
set in your cookies!

------
userbinator
It's an ongoing cat-and-mouse game. This is like the inverse of people using
VPNs and proxies to get around filtered Internet, except it's now the _server_
that does the tunneling instead of the client.

Personally, I've found that JS off and all the GA/GTM domains (along with many
others) blacklisted is sufficient in daily use; no JS gets rid of most of the
crap, and the blocked domains clean up the rest. My goal is not to become
completely untrackable (I believe that's next to impossible), but just to stop
slow-loading pages full of junk I don't care about (which is what I suspect
most people using _ad_-blockers are aiming for).

------
Cynddl
> Hello from Google Tag Manager. This text is being added by a tag running
> from GTM.

One should note that this inclusion, without an opt-in consent banner for
instance, is not GDPR compliant. The URL [https://analytics-bypassing-
adblockers.netlify.com/proxy/htt...](https://analytics-bypassing-
adblockers.netlify.com/proxy/https://www.google-analytics.com/r/collect?..).
sends personal data to a third party (Google) without my explicit consent. See
Article 7 and Recital 32 of the GDPR:

> Consent should be given by a clear affirmative act establishing a freely
> given, specific, informed and unambiguous indication of the data subject’s
> agreement to the processing of personal data relating to him or her, such as
> by a written statement, including by electronic means, or an oral statement.

~~~
ddebernardy
> One should note that this inclusion, without an opt-in consent banner for
> instance, is not GDPR compliant.

IANAL but as I understand GDPR, this is incorrect. The paragraph you cite
discusses _personal_ data. Google's FAQ on GA is instructive (emphasis mine)
[0]:

> When using Google Analytics _Advertising_ Features, you must also comply
> with the European Union User Consent Policy.

They admittedly keep things as vague as they can, but to me it kind of reads
like: using GA to collect site usage analytics is actually fine and requires
no explicit consent as long as you've configured it to anonymize the IP
addresses (toggle this in GA) and you're not tracking e.g. user IDs and such.

Similarly, using GTM to deliver a paragraph like OP did is also fine.

In both cases the spirit and the letter of the law would seem to be respected
if you add some notice about tracking going on in your footer. No explicit
consent is needed here, because no personal data is getting tracked.

Edit: clarity.

[0]:
[https://support.google.com/analytics/answer/2700409](https://support.google.com/analytics/answer/2700409)

~~~
Cynddl
This website does collect personal data. Google's FAQ on GA simply states that
the first party should obtain consent before transferring data to a third
party (and transfer of consent might not be GDPR compliant, but that's another
issue).

Here, the first party (analytics-bypassing-adblockers.netlify.com) has to
obtain consent before collecting personal data. And IP addresses are not the
only personal data that GA can collect.

------
Xelbair
I remember when modern telemetry gathering practices were labeled
malware/adware...

~~~
distances
Especially the phone home of ZoneAlarm, that blew up quite big. And to think
that's what basically every application does nowadays.

------
tex5
[https://rrregain.com](https://rrregain.com) does this as a service. There are
others as well but most do not use your own domain.

~~~
StefanoC
Interesting, do you know if they rely on the same principle of using several
domains, making it harder to block?

~~~
tex5
I'm not sure, it uses your own domain, thus www.google-analytics.com becomes
yourdomain.com/analytics.js. Not all requests are proxied, only the ones
blocked by adblockers.

Taking this further, you could have your server send an event to GA when
/index.html is requested, this can even be from tail -f access_log. No one
will know GA was requested.
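
A minimal sketch of the log-tailing half of that idea, assuming the access log
is in the Common Log Format; the regex, the `pageviews` helper and the sample
lines are illustrative, and what you do with each match (e.g. forward it to
GA) is left out:

```python
import re

# Matches the start of a Common Log Format line:
# host ident user [timestamp] "METHOD path protocol" status ...
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3})'
)


def pageviews(lines, tracked_paths=("/index.html",)):
    """Yield (ip, path) for successful GET requests to tracked pages."""
    for line in lines:
        match = LOG_LINE.match(line)
        if (
            match
            and match["method"] == "GET"
            and match["status"] == "200"
            and match["path"] in tracked_paths
        ):
            yield match["ip"], match["path"]


sample = [
    '203.0.113.7 - - [10/Oct/2019:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '203.0.113.8 - - [10/Oct/2019:13:55:40 +0000] "GET /style.css HTTP/1.1" 200 512',
]
for ip, path in pageviews(sample):
    print(ip, path)
```

Passing `sys.stdin` instead of `sample` would let it consume the output of
`tail -f access_log` through a pipe.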

~~~
rbinv
In general, you wouldn't be able to access third-party cookies this way,
though.

------
highace
I implemented something like this on a site visited almost exclusively by
developers, assuming that developers must have amongst the highest adblock
usage, and that my real visitor numbers according to GA would be much higher.

I saw a boost of about 7-8%. Remember, most adblockers (like Adblock Plus)
don't block Google Analytics. uBlock and Ghostery are probably the 2 main GA
adblockers, but as a % of adblockers as a whole they're not that large.

It's probably not worth it.

------
everdrive
This is unfortunate, but it simply means that we have three options:

- Block entire domains
- Prevent JavaScript from running
- Use the internet less, read books, use your local library.

Happily, I was able to get my browser from the default message: Hello from
Google Tag Manager. This text is being added by a tag running from GTM.

To the blocked message: This content should be overridden by GTM.

But, how far will this game of cat and mouse go?

------
ionised
No personal offence intended, but I hope this project dies on its arse.

It's malicious software, circumventing the protections afforded to me by my
ad/tracker blocking software.

I'll contribute in any way I can to adblocking tech, and to any impotency of
this kind of technology.

~~~
StefanoC
None taken. Believe it or not I'm mostly on your side. I published this
because I managed to do this in 4 hours, for fun. It exploits the URL-based
blocking which is so prominent but so easily subverted, and if I've done it
anybody can, so I wanted people to know.

Having said that, I must add, I don't think this is malicious software. Besides
the legalities and the GDPRities which I may have overlooked, when you ask a
website for its content, which comes with analytics, but you want to block the
analytics, I don't think you can complain about the content provider bypassing
your attempt at blocking it. Don't get me wrong, when I come across websites
that stop me from browsing them because I use uBlock I usually bypass their
block, or close the tab, but I can hardly complain at their attempt, or deem
it as malicious, IMHO.

------
judge2020
Would like to know, does Google Analytics actually use data for tracking/ad
targeting? I thought it would only track users if they embedded the AdWords
script. If so, why is it blocked by UBO and Ghostery?

~~~
mcintyre1994
I've always just assumed it does, in the same way I assume Facebook's like
etc. buttons do plenty of tracking even if you don't interact with them.

------
deca6cda37d0
I blocked GTM and GA with Little Snitch... your bypass doesn't work

~~~
StefanoC
Please explain, I use Little Snitch too!

------
stunt
:popcorn:

------
pdkl95
> [ This content should be overridden by GTM. ]

lol... pages look better if you send the actual document instead of _assuming_
you have permission to run software in my browser.

~~~
StefanoC
It's a proof of concept. If it doesn't work for you then you are meant to know
that :)

It's not a bug, it's a feature!

