Google presents method to circumvent automatic blocking of tag manager (developers.google.com)
145 points by iamacyborg 9 months ago | hide | past | favorite | 81 comments



First-party tracking has been increasing over the last couple of years, or so I've been told.

When I first heard about it I decided from now on I'll run uBlock Origin in default JavaScript off mode. It's a checkbox in the settings.

I only re-enable on a per-site basis if I have no other choice.

I prefer this over NoScript, since it's one click and I still get all the uBlock filters even after enabling JS.

I've been using the web like this for over a year now, and while it's painful, it's also enlightening.


I do that, too. cmd-J as shortcut to enable JavaScript without needing to reload. In my experience once I hardcode which websites I need JavaScript on, browsing becomes painless. And much faster obviously.


Alright, but the last time I did this, the answer to "which websites I need JavaScript on" was "all of them". Is that still the case?


You don't need all of the javascript for all websites. In my experience, I enable javascript to load from probably 10-20% of domains (using noscript).


How do you set the shortcut? When I press cmd-J, the Library opens in Firefox. I have checked "Disable JavaScript" in the uBO settings. I guess that setting will be disabled just for the site I'm visiting when I press cmd-J?


This shows the keyboard shortcut configuration: https://github.com/gorhill/uBlock/wiki/Keyboard-shortcuts#ac...

The setting is called "Relax blocking mode" and will enable JavaScript (but keep all ad blockers) if JavaScript is disabled by default.

In summary, for Firefox it’s under

  about:addons > gear icon > Manage Extension Shortcuts
and for Chrome you can set it under

  chrome://extensions/shortcuts


Not to mention how much faster the web feels with Javascript turned off.


Sometimes, if a web page claims to require JavaScript, the text can still be viewed if you disable CSS as well as JavaScript. Another thing you could try is viewing the source; sometimes the text or data will be visible in there, or you can find a link.

An idea that might help more is a script substitution feature, allowing the user to substitute their own scripts for some or all of the existing ones (and enable, disable, or limit the rest of the scripts), by URL or by hash (if there is an integrity attribute, then it will not be necessary to download the file in order to determine its hash).


Thanks for telling me about this. I was already blocking most web fonts and suspicious third parties, but now I can also selectively unblock the sites I really need.


> Choose any path you want for setting up first-party mode. Examples of paths you might want to use include: /metrics, /securemetric, /analytics, or preferably a random alphanumeric string that you don't use on your website already.

Ah, yes, _preferably_ some random string that is hard to block rather than the descriptive ones.
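For anyone curious what the setup the docs describe amounts to, here is a rough sketch of a reverse-proxy rule in nginx. The random path /a7f3k9 is a made-up placeholder, and GTM-123456 stands in for a real container ID; this is an illustration of the technique, not Google's reference configuration.

```nginx
# Hypothetical first-party-mode rule: requests to a random-looking
# path on your own domain are forwarded to Google's endpoint with
# the Host header overridden, as the docs describe.
location /a7f3k9/ {
    proxy_pass https://GTM-123456.fps.goog/;
    proxy_set_header Host GTM-123456.fps.goog;
    proxy_ssl_server_name on;   # SNI must match the upstream host
    # The docs say to forward all cookies and query strings; a more
    # cautious operator might strip cookies at the proxy instead:
    # proxy_set_header Cookie "";
}
```

From a blocker's perspective, the request now goes to the site's own domain on an arbitrary path, which is exactly what makes it hard to filter.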


To be honest, I wasn't sure why sites weren't doing this already; it seemed very obvious to me at the time, when I learned that uBlock Origin has a hardcoded list of paths to block.


It is becoming common. For example, Segment has documentation for a similar setup: https://segment.com/docs/connections/sources/catalog/librari...

I've seen this implemented at least five years ago, so a lot of sites probably already do it.


Yeah, it’s just that some trackers are more ubiquitous than others and gtag is one of the most deployed.


You know, like malware.


It's more like the opposite of what malware does: malware usually names things so they will look like something you want, like naming the directory /images or /fonts.


"Like" is being too generous when we know what, why, and how they do what they do.


Wow, just in time for Manifest V3, which prevents clever heuristics like looking at the entropy of the path segment.
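As a sketch of the kind of heuristic meant here: score the last path segment by its per-character Shannon entropy and flag random-looking ones. The threshold and length cutoff are invented for illustration, not taken from any real blocker.

```python
import math
from collections import Counter
from urllib.parse import urlparse

def shannon_entropy(s: str) -> float:
    """Bits per character of the string's empirical character distribution."""
    counts = Counter(s)
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def looks_random(url: str, threshold: float = 3.5) -> bool:
    """Flag URLs whose last path segment is long and high-entropy.
    Both the 8-char cutoff and the 3.5-bit threshold are made up."""
    segment = urlparse(url).path.rstrip("/").rsplit("/", 1)[-1]
    return len(segment) >= 8 and shannon_entropy(segment) >= threshold

print(looks_random("https://example.com/metrics"))       # False: short, descriptive
print(looks_random("https://example.com/x7k2q9fz4m1b"))  # True: long, near-uniform chars
```

The catch is that this needs arbitrary per-request code, which Manifest V3's declarative rule model doesn't allow.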


Well yes, how else will you

> The setup in this documentation will help you recover X% more measurement signals on your website.


Obviously "recover" is not the correct word here. Maybe "swindle" is a better word, considering they are actively circumventing the users' wishes to block tracking.


It used to be that codepen used puppies-and-kittens... https://news.ycombinator.com/item?id=35981467


Obfuscation is the 'Google' way /s

"Privacy centric users hate this 1 trick"


As I read these comments, there is plenty of outrage at how Google keeps invasively tracking us. And rightly so. But I wonder about the other side of it: in my SaaS business I made a point of not using any tracking whatsoever and not letting Google peek into my users' behaviors. So, no analytics, no Google Fonts, no tag manager (obviously).

But I don't think any of my users noticed, and I don't think any of them care.


I think that we wear different hats during the day. As a product manager of a SaaS Enterprise application, I value data to improve the product; as a person outside the work environment, I value privacy.

Tough call.

On the other hand, I believe that browser tracking is only a small percentage of data being gathered about us, aggregated, and processed.

There are so many ways to track us, and being clever ain't really a way to get out of this system. Too late, too sophisticated.


> As a product manager of a SaaS Enterprise application, I value data to improve the product.

You can gather essentially all of the data that's useful for improving a SaaS Enterprise application without selling out every single one of your users to Google, and it's not even hard.


Since you have clearly solved this problem, please tell the rest of the class how to accomplish this.


If they installed stuff to block it, that would be a sign they care about it at least in part?

It's hard for me to tell the difference between a site with no ads and a site with some ads if I'm always using uBlock and all. I just care that it ends up not showing me ads.


It’s less about the ads for me so much as having some mechanism in place to prevent unauthorised data slurping.

Enough orgs have misconfigured cookie consent that this is fairly critical to me.


> I don't think any of my users noticed, and I don't think any of them care

Did marketing ever run with it?


> But I don't think any of my users noticed, and I don't think any of them care.

You're probably right (unless you make a big deal of it). But you're doing something good. Nobody needs to notice it for it to still be good.

Also, do you get any benefit from not having cookie banners, or a (small) performance win?


> Override the Host header to be equal to GTM-123456.fps.goog. Allow all cookies and query strings to be forwarded.

Did a security team review this? This leaks session cookies for your domain to Google in a way GTM did not previously capture.


> This leaks session cookies for your domain to Google in a way GTM did not previously capture.

Only if you set up your session handler to emit cookies that apply to all subdomains instead of using the __Host- prefix and the SameSite=strict attribute [1].

[1] https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Se...
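For reference, a session cookie scoped the way the MDN page describes might look like this (the cookie name and value are illustrative):

```http
Set-Cookie: __Host-session=opaque-token; Secure; Path=/; HttpOnly; SameSite=Strict
```

The __Host- prefix requires Secure and Path=/ and forbids a Domain attribute, so browsers will never send the cookie to sibling subdomains.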


I think the load balancer is the one forwarding all cookies to Google with this configuration. The browser has already sent this to your own domain/LB as first-party mode introduces yourdomain.com/page and yourdomain.com/metrics.


I don't think this would prevent the session cookie from being sent to tag manager. The tag manager document describes setting up a specific path on the website's normal domain, not using a subdomain.


You can issue cookies on a sub path though.


You can, but it's typical to use / for login cookies. And I don't think you can issue cookies that exclude a sub path.


This is incorrect, the documentation in the article involves configuring an L7 load balancer to route a path on the same domain as the origin to Google Tag Manager. This means even `SameSite=strict`, `Secure`, `HttpOnly` cookies will be sent to GTM, if the instruction I quoted is followed to pass all cookies and query strings.

It's weird that the document specifically says "all cookies" - that gives GTM access to every cookie sent to your application.


> Setting up first-party mode may help your tag setup perform better, resulting in recovery of lost measurement signals, in a privacy-safe way.

Does Google explain what "privacy" means in this context? I'm unfamiliar with Google jargon, but I'm sure it's completely different from the dictionary definition.


They mention full IP obfuscation.


That's hardly reassuring, considering they still build detailed profiles of everyone regardless of whether they even have a Google account.


this ^^


The only weird thing is why they are so late to the party; first-party proxying is such an obvious (and not novel) solution that they could have pushed this out five or ten years ago already.

This is also the reason why I have always been dubious about PiHole and other DNS-based or network-level blocking solutions. Blacklisting DNS entries or IP addresses was never going to be a workable approach.


In the last 24h my PiHole blocked 6500 queries. Maybe I don't catch some first-party analytics, but I am certainly catching loads of other things.


This leads to an interesting phenomenon that increases tracking in some poorly designed browsers, despite the wishes of both site owners and visitors.

Sites will end up using this, and some site owners will also install privacy compliance solutions to regulate these trackers in accordance with user rights and consent. Misguided browsers such as Brave bluntly block some privacy compliance solutions, resulting in an increase in nonconsensual tracking from first-party-fronted trackers.


Does this not change who is responsible for the data shared with Google?

With this, the site owner is serving the scripts that collect the data, where before it was Google's servers.


I guess this would defeat DNS-based blocking approaches, but more sophisticated tools like uBlock can still cope with it, albeit requiring larger blocklists.


That becomes a problem when Google Chrome limits the number of filter rules extensions can add. https://developer.chrome.com/docs/extensions/develop/concept...
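For context, a single static rule under Manifest V3's declarativeNetRequest API looks roughly like this (the filter pattern is illustrative):

```json
{
  "id": 1,
  "priority": 1,
  "action": { "type": "block" },
  "condition": {
    "urlFilter": "||fps.goog^",
    "resourceTypes": ["script", "xmlhttprequest"]
  }
}
```

Chrome caps how many such rules an extension may register, which is what makes ever-growing blocklists a problem; and a rule like this one wouldn't catch first-party-proxied requests anyway, since those go to the site's own domain.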


Only for people using chrome.


What do you think the point of creating Chrome was?


We're a web company. We should own a critical technology for getting on the web. Better experience surfing means more traffic for Google. But also, if we own the browser, we can push Google services and make them better and more convenient. We can also influence web standards to our favor. And a nice side-effect is we can better track users.

So: one part we're an ad company, and one part MS-style "embrace and extend" the web.


To not allow Microsoft IE to dictate web standards? To gather more search queries? The Chrome beta was released 15 years ago; that's such a long con I'm not sure I buy it.


When Chrome came out, Microsoft was in no position to dictate anything web-related; they barely managed to get IE7 released and it was playing massive catchup to Firefox (and Opera) in regards to standards, and didn't really bring any new web features that weren't already in other browsers.

MS really dropped the ball after IE6. Notably in Europe Firefox actually temporarily surpassed IE in market share, before Google came and stomped everyone.


Maybe you were not using IE at that time, but you surely had to create IE-compatible content if you worked as a web developer, and it was really hell. If window.forms, if ie6, if opera, if ns6: iffing hell there was, I tell you!


> Microsoft was in no position to dictate anything web-related; they barely managed to get IE7 released

I recall a blog from that time titled Chrome is the new IE6.

On the corporate side, we paid attention when a SaaS site had a "Built For Internet Explorer" badge. Elements could and did falter in other browsers (because devs only developed for IE).


It's not a "long" con; it's a con that keeps on giving.


AKA the vast majority of browsers out there.


There's never been any reason to think Google wouldn't pull something like this with Chrome though. Which is why people should be advocating switching to Firefox.


> Which is why people should be advocating switching to Firefox.

Pity about Mozilla having recently switched to the dark side themselves, becoming an advertising company as well.


Well, if it’s okay for them, then that is still their decision.


It's what Google was counting on, user ignorance and apathy, to defeat ad blockers.


I mean, sure, but ad blocking is still a killer feature. If users see ads, the frog jumps out of the pot.


> Well, if it’s okay for them, then that is still their decision.

A decision reached because both

    informed & consent
are minimized as much as possible. This is a core method of unethical marketing.


it will convince more people to switch


Frankly, this just gives more reasons to switch.


Sadly, most people don't care about or understand these "tech whiz" things. Here in our little corner of the EU, I see massive Chrome adverts everywhere about how convenient it is to save your password in the Chrome password manager and sync bookmarks across devices, how all other browsers are baddies leaking your private details and Chrome can keep your data safe, and other fear-mongering adverts.

Basically, these adverts are blasted at you 5-8 times a day depending on how many YouTube videos/shorts you watch. I also recall seeing the same advert on someone's FB timeline, shared by their friends urging them to switch to Chrome.


AFAIK uBlock wouldn't be able to block these in most scenarios, since the resource requests look like any others; uBlock is still based on huge lists.


I'm sure it's only a matter of time before folks are able to fingerprint the responses that come back from the actual service's origin vs Google's service, or perhaps by abusing the health-check that exists at /healthy


Which wouldn't help much when the tracking happens server-side. By the time your ad blocker is able to analyze the response, Google will already have tracked your visit.


The end game for ad blocking is to render each page in its own vm/container, have some AI blank out things that look like ads, and stream the transformed video to the user


It depends how literally you take “ad blocking”. A lot of folks use an ad blocker for privacy purposes.

I use an ad blocker but would have no issue with static image ads. I just don’t want to allow ads that are shown via RTB.


But then your ad blocker would need to introspect and run rules on the contents of every request payload. The impact on web browsing performance would be prohibitive.

And if it got to that point Google would just randomise the payload. It's pretty easy to do with obfuscation tools.


> your ad blocker will need to introspect and run rules on the contents of every request payload. The impact to web browsing performance would be prohibitive.

Could ad blockers run WebAssembly? I suppose it would be up to the task, because it means minimal work for a GC and no overhead from JS's dynamic types. With JIT compilation its performance is comparable to native code, and native code has no issue dealing with every payload byte by byte.

> And if it got to that point Google would just randomise the payload.

And then ad blockers start to measure entropy.

> It's pretty easy to do with obfuscation tools.

It is easy to do, but obfuscation really only works when no one is targeting you specifically, when you are defending yourself from bots that try random targets in hopes of finding vulnerable ones. Against targeted attacks it becomes an arms race: you'd need to change constantly, and eventually you will need to spend a lot of time discovering how your obfuscation is being defeated, so it ends up roughly equally difficult for both sides.

On a side note, I wonder whether it's possible to poison Google's stats by sending fake data from the website. Probably Google's trick to counter this threat is controlling the CDN, so it only gets data from a trusted server.


In general I see more and more websites switching to using the actual domain to bring in ads. For example, IMDb hosts ads on the actual domain (no subdomain) making it impossible to block that script with DNS blocking.

And on iOS they employ certificate pinning so you can't even inspect the traffic with MITMProxy.


I'd been waiting for something like this since the "death of the third-party cookie".


Seems a bit simpler than setting up a Server Container. Am I correct in understanding that this is meant to run on a CDN or load balancer? I'm curious what kind of effect it has on performance vs making direct requests to endpoints.


"The setup in this documentation will help you recover X% more measurement signals on your website."

Caught! The conclusion was written before they actually knew if it worked.


"X%" here likely means "single digit number of percentage points", as opposed to something like "XX%".


Google has had a similar approach for a while now with server side containers.

Works flawlessly to prevent ad-blockers from blocking tracking tags and integrates with platforms like Shopify to allow their user tracking to be hidden and unblockable.

And because Chrome allows long-lasting cookies, sites can track you for retargeting for over a year, compared to a week on Safari. And because they add every API under the sun, fingerprinting is 99% effective. Thanks, Google!


> Thanks Google

I have to ask because you never know: was this sarcasm?


Yes


Ok, thank you. I thought it was, but I read some wild takes on this site, so I had to ask before typing an unnecessary rant.


Is the filename still gtag.js?



