AdGuard publishes a list of 6K+ trackers abusing the CNAME cloaking technique (github.com/adguardteam)
210 points by KoftaBob on March 4, 2021 | 77 comments



Related ongoing thread:

The CNAME of the Game: Large-scale Analysis of DNS-based Tracking Evasion - https://news.ycombinator.com/item?id=26347110 - March 2021 (51 comments)


This is also part of AdGuard Home. I think this was the issue that tracked it: https://github.com/AdguardTeam/AdGuardHome/issues/1185

If you haven't tried out AdGuard Home, I can highly recommend it. It has the same feature set as Pi-hole and supports DoT as well. It's also super trivial to install since it's just a Go binary. I have been using it for ages now and love it!


I want to try these DNS-based blockers (AGH or Pi-hole) but am always wondering: is it easy to temporarily disable, or "debug" them?

I have encountered multiple times (not common, but not negligible) that a filter blocks something it shouldn't. With a traditional ad blocker running as an extension, I can quickly find the culprit using the built-in logger, and then either temporarily disable it or add the site to a whitelist with a single click (and if I feel like it, I can write my own rule too).

If I have to change my DNS settings every time this happens with these DNS-based blockers, I feel like sticking with extensions, since I don't really use my phone to browse the internet that much anyway.


PiHole has a web admin UI that's pretty slick. It has options to disable the entire thing indefinitely/for a set period of time if you need to, and it can log all DNS queries, so you can override/manually block anything you need. There's also nifty charts and metrics to show you how much traffic has been blocked.

I found that, after tinkering with blocklists for a bit, I turned off logging altogether and just let it run. The one thing that gives us grief occasionally is (unsurprisingly) tracking links from promo emails and social media. These are usually easy enough to bypass, but it can be a pain for non-tech-savvy people.


Thanks!

Does it have "cosmetic filters" (the ones that block certain elements on page) or similar feature?


This is a feature that has been requested, but isn't implemented yet. I suggest you give it a thumbs-up on GitHub, as they implement things in order of the number of thumbs-up.


Please don't; as explained in a sibling comment, it's not possible with DNS blocking.


It wouldn't be DNS blocking. The feature request on the project is about adding an HTTPS MITM Proxy that would then do the cosmetic filters among other things.

AGH already supports adding AdGuard filter lists, but for obvious reasons it only applies the domain-based rules. Adding the MITM proxy would allow processing the cosmetic filters too.

More info:

- https://github.com/AdguardTeam/AdGuardHome/issues/391

- https://github.com/AdguardTeam/AdGuardHome/issues/1228


Ah - fireattack's question was regarding Pi-Hole.


No. It inherently cannot. Pi-hole isn't a proxy where all traffic flows through it and has a chance to be modified. Pi-hole is strictly answering the question, "What is the IP address for this hostname?" If a given hostname is known to host trackers or something undesirable, Pi-hole will claim it's an unknown host so the device is unable to reach it.
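To make the sinkhole idea concrete, here's a toy sketch of that question-and-answer model (all hostnames, addresses, and the blocklist below are made up for illustration; a real resolver forwards unknown names upstream):

```python
# Toy model of DNS sinkholing: a blocked hostname resolves to 0.0.0.0,
# so the client never opens a connection to the real tracker.
BLOCKLIST = {"tracker.example.com", "ads.example.net"}
UPSTREAM = {"news.example.org": "93.184.216.34"}  # stand-in for a real upstream resolver

def resolve(hostname: str) -> str:
    if hostname in BLOCKLIST:
        return "0.0.0.0"  # sinkhole answer: client can't reach the tracker
    return UPSTREAM.get(hostname, "NXDOMAIN")

print(resolve("tracker.example.com"))  # -> 0.0.0.0
print(resolve("news.example.org"))     # -> 93.184.216.34
```

The client's HTTP stack never sees the tracker's real address, which is the entire mechanism; nothing about the page content itself can be inspected or altered.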


Yep, literally a button that says Disable / Enable.

The only problem is browsers like Chrome that are pretty aggressive with DNS caching.


I set up two wireless VLANs on my network, and one uses filtered DNS. I just swap between networks as needed. Of course, most people aren't going to have that capability.


Another AdGuard home user here!

I actually discovered it when I wanted to install Pi-hole on my Mac server and it just wouldn't work except via the Docker container, which had other issues, like not being able to see the client IP that made the request.

Been running AdGuard Home for a couple months now and it's really nice!


Totally! I used to use pi-hole, and I genuinely much prefer AGH.


AdGuard is all around great. Been using their native apps, too.


Worth noting that PiHole also resolves CNAMEs and blocks them if they're on your filter list.
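The CNAME-following behavior described above can be sketched like this (the record map, names, and blocklist are invented here; real resolvers also cap chain depth to avoid loops):

```python
# Toy model of "deep CNAME inspection": follow the CNAME chain and block
# if ANY name in the chain is on the filter list.
CNAMES = {
    "metrics.shop.example": "eviltracker.example",       # first-party-looking alias
    "eviltracker.example": "cdn.eviltracker.example",
}
BLOCKLIST = {"eviltracker.example"}

def is_blocked(name: str, max_depth: int = 10) -> bool:
    for _ in range(max_depth):
        if name in BLOCKLIST:
            return True
        if name not in CNAMES:
            return False        # end of chain, nothing matched
        name = CNAMES[name]     # follow one CNAME hop
    return False                # give up on suspiciously long chains

# The first-party-looking name is blocked because its chain hits the tracker:
print(is_blocked("metrics.shop.example"))  # -> True
```

This is why plain domain blocklists miss cloaked trackers: the name the browser asks for never appears on the list, only the target it aliases to.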


I had thought that all their software was proprietary, but it's OSS and looks good.


I wonder if the end game of adblocking is simply to proxy the entire page to the client - completely obscuring who you are or where the request originated.

edit: I almost wonder if this could be done safely in a decentralized manner. Everyone runs a BitTorrent like service and when you make a request the swarm proxies it to a specific node who then serves the page back to you.

Under this scheme it would become impossible to trace who you are.


Hiding every URL behind a proxy seemed like the obvious response ~5 years ago[1] when the popularity of DNS filters like the pi-hole was rapidly rising. The proxy could even increase the obfuscation by rewriting URLs and filenames so every HTTP request to the proxy was just random noise:

  https://proxy.example.com/ZGQgaWY9L2Rldi91cmFuZG9tIGJzPTEgY291bnQ9NTIgfCBiYXNlNjQgCg==
However, the "end game" of adblocking is far worse - the entire page becomes this:

  <!doctype html>
  <html lang=en>
    <head>
     <meta charset=utf-8>
     <title>null</title>
    </head>
    <body>
     <canvas>Run the WebAssembly blob to render the page</canvas>
     <script src="load_webasm_blob.js"></script>
    </body>
  </html>
The entire page becomes a giant obfuscated WebAssembly blob that renders the page into the canvas tag. The technologies of the open web like HTML become legacy baggage; a "web page" is just a stub loader for what would effectively be a statically linked executable binary that uses the canvas tag as a generic framebuffer.

In this "end game", URL based blocking is irrelevant; most page assets become part of the blob. The question "is this an ad/tracker/virus?" becomes undecidable; answering any question about a page's behavior requires running potentially hostile code or solving the Halting Problem. Some people don't want an open web that respects things like user agency. They want control. They want the "web page" to be an opaque binary blob that nobody can investigate or modify. They want full control over what the user sees and is allowed to do. They want TV.

[1] https://news.ycombinator.com/item?id=10294187
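The opaque proxy path in the comment above is just noise, but the URL-rewriting idea it illustrates is simple to sketch. This is a hypothetical scheme (the proxy hostname and functions are invented): the proxy base64url-encodes each real asset URL so filters see only an undistinguishable token, then decodes it server-side.

```python
import base64

# Hypothetical first-party proxy that hides real asset URLs behind
# opaque base64url tokens, so URL-based filters see only noise.
def obfuscate(url: str) -> str:
    token = base64.urlsafe_b64encode(url.encode()).decode()
    return f"https://proxy.example.com/{token}"

def deobfuscate(path_token: str) -> str:
    return base64.urlsafe_b64decode(path_token.encode()).decode()

p = obfuscate("https://tracker.example/pixel.gif")
print(p)                                  # opaque token path on the proxy
print(deobfuscate(p.rsplit("/", 1)[1]))   # proxy recovers the original URL
```

Any reversible encoding works equally well, which is exactly why pattern-matching the path is a losing game once a site commits to proxying.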


That’s not the endgame, because advertisers don’t trust publishers: they need their ads to load from, and phone back to, servers that they control. If they trusted publishers enough to do something like you describe, they’d already be evading adblockers by just serving the ads from the publisher’s page.


I think you have it backwards. The advertisers can serve the publishers' content themselves, rather than the publishers serving the advertisers' content. See Google's AMP.

Of course, the publishers will need to trust the advertisers, but I don't see how they have much choice...


Pretty sure that website is not ADA-compliant.


But that'll be addressable when WASM gets its DOM bindings, won't it? No different than lazy-loading a page's content while placeholders appear first.


Actually, the endgame is just serving you all the content from one machine, and tracking you on the backend.


Maybe you want to take a look at Stealth [1], as you seem to have understood the real strength of a browser with a decentralized (and delegatable) request system.

Personally I don't think the strength of p2p is only trust; it's delegation. If you have peer-to-peer encryption running, the possibilities are endless.

Add a statistical "proof of authenticity" and you have an unbeatable anti censorship mechanism that can also identify in-page modifications and weed out malicious MITMs in between.

[1] https://github.com/tholian-network/stealth


It will be the 99% of the internet roaming the censored walled gardens in ignorant bliss, and the 1% in a constant arms race with ad tech, writing scripts to separate the content from the adware and viewing it with their hardware on their own terms. Who knew The Matrix would be so prophetic?


The idea you propose in your edit is basically Tor, no?


I am familiar with Tor but wasn't sure how it was implemented. After briefly checking the Tor wiki, it appears you are right.

I guess the only thing that's necessary now is to set it up so that the node could be another mobile client. Maybe it's possible with WebRTC or an extension?


Just deliver the goods through the advertisers' systems. I think it will work until we have an AI-based ad blocker that can identify and remove the parts of the content that are advertisements.


"Near-path NAT"[1] has been suggested as a mechanism that browsers can use to proxy requests through an intermediate server, similar to what you suggest.

[1] https://github.com/bslassey/ip-blindness/blob/master/near_pa...


Wouldn't this just create a whole new set of privacy concerns if you're having random people MITM your traffic?

Even if it's encrypted, I feel like someone smarter than me would figure out how to do bad things with this.


Yeah, that's the main challenge. With TLS, in theory it shouldn't matter, but not all pages are served over HTTPS. Managing self-signed certificates would also be an issue.


> I almost wonder if this could be done safely in a decentralized manner. Everyone runs a BitTorrent like service and when you make a request the swarm proxies it to a specific node who then serves the page back to you.

> Under this scheme it would become impossible to trace who you are.

This is how i2p works, in a nutshell. Every i2p user is a node in the network for others to proxy through.

Unfortunately, hiding your IP and geographic location isn't enough to stop fingerprinting and other forms of de-anonymization.


The most prolific trackers already require authentication; this approach will lead to every website requiring authentication. Impossible to trace = easy to attack.


I think the endgame is machine-learning-based blocking and Apple/Mozilla disabling the APIs used for fingerprinting.

I think Google pre-emptively locked down the Chrome extension API to prevent ML-based blocking from working. This would explain the fixed URL block list and the removal of the interactive API.


I've been thinking about this too, but taking it a step further: streaming not only web pages but apps from a virtualized environment.


I think this is a better approach than what I described, but how would we make it so the virtualization doesn't expose fingerprints? With VirtualBox, for example, it's possible to figure out you're in a VM from inside the guest.


You would just need a lot of users / be very popular, and then have a standardized VM.


There already exist Linux distros which do this natively.


> the end game of adblocking

In the future, given sufficient CPU resources on the client, another method could be browser page-rendering engines that use advanced machine-learning image recognition to categorize and blank out ads, no matter where they come from.


The next step is simply proxying the tracker through the original website.

/img/logo.png?uuid=..&res=1920&os=MacOS&osv=11.2

and when heuristics catch up with that ...

/img/logo.png?757569643D2E2E267265733D31393230266F733D4D61634F53266F
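The hex blob above is just the same query string re-encoded byte by byte; a round-trip sketch (using the exact string from the comment) shows how trivially such "obfuscation" is produced and reversed:

```python
# Round-trip for the hex-obfuscated query string shown above.
def encode(qs: str) -> str:
    return qs.encode("ascii").hex().upper()

def decode(blob: str) -> str:
    return bytes.fromhex(blob).decode("ascii")

blob = "757569643D2E2E267265733D31393230266F733D4D61634F53266F"
print(decode(blob))  # -> uuid=..&res=1920&os=MacOS&o  (truncated in the comment)
```

Easy for a human to reverse, but a filter can't safely assume every hex-looking path segment is a tracker, which is the point of the "heuristics catch up" remark.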


In the limit, they can always send visitor statistics directly to whoever is paying, without involving your browser. This can be done entirely on the server side.

But these changes are still good for privacy. These direct-proxy endgame methods will hopefully make it harder for ad companies to detect ad-fraud, making the ads less economical to begin with.


The game isn't between the host wanting to push ads and the client wanting to block ads. It's a 3-way conflict between the client wanting to see the fewest ads, the advertiser wanting their ads to be seen by the most clients, and the site host wanting the most ads to be seen by each client. And neither side trusts either of the others. If the entire transaction was only carried out between the site and the client then the advertisers would have to believe that the reports being delivered to them by the site owner are accurate. And they know from experience not to believe it. So they insist on having clients tracked themselves or by a third-party (fourth-party?) which is where all the complicated connections, alternate domains, and bloated Javascript come from.


You could do the same thing with a separate subdomain. The CNAME strategy is just using DNS to point to your provider so you never have to serve analytics requests yourself. If you self-host analytics, CNAME blocking was never intended to prevent that, nor will it.


OK, we can still run some user script or have a browser setting, which removes the ?757569643D2E2E267265733D31393230266F733D4D61634F53266F part for things like images. What would be the level after that?


The endgame is that the browser is only a thin client (think VNC) and the actual website is rendered server-side.


I think for ads this might actually be an improvement (compared to no blocking). A big reason why ads continue to stink is because they always seem to be much more poorly optimized than the actual site. I'm probably being too optimistic though, they'll probably just move the slowdown to the server rather than on the client...


Then nightly releases or A/B versions of pages that change the image file names, their place in the DOM, and the structure of the tracking links, so user scripts are difficult to maintain. Or the tracking company could just develop a browser and convince 90% of internet users to use it, infiltrate the web standards committees to gain control of web features, release new features that are not standard but get widely adopted, get many websites to check user agents to ensure visitors are using the tracking company's browser so most other viable browsers fail, then implement their own tracking in that browser where it cannot be blocked by user JavaScript, and release a whole new OS whose browsers have no extensions, nor the basic ability to manage the flow of your information.....


But wait, that is today! fakes shocked expression


You could just have unique names for images; isn't that already used for email tracking pixels?

e.g., se09d.png

It's also sometimes used for standard cache-invalidation purposes, so the pattern is common for non-tracking reasons too.

Worst case, I'd imagine sites will just install a middleware layer on their stack to proxy all their requests and traffic back to the ad-tech machines.


https://en.wikipedia.org/wiki/Steganography

Think along these lines: website.com/blog/f5/babBys-first-bl0g-post-2f.html

Can you (or a computer program) answer definitively whether there's a tracking ID in that URL?
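One hypothetical way such a slug could carry an ID (this scheme and its function names are invented here, purely to illustrate the point): encode bits in the capitalization of the letters, so the URL still reads as an ordinary blog slug.

```python
# Hypothetical steganographic slug: hide bits in letter case.
def embed(slug: str, bits: str) -> str:
    out, i = [], 0
    for ch in slug:
        if ch.isalpha() and i < len(bits):
            out.append(ch.upper() if bits[i] == "1" else ch.lower())
            i += 1
        else:
            out.append(ch)  # non-letters carry no payload
    return "".join(out)

def extract(slug: str, nbits: int) -> str:
    bits = ["1" if ch.isupper() else "0" for ch in slug if ch.isalpha()]
    return "".join(bits[:nbits])

tagged = embed("babbys-first-blog-post", "0100100010")
print(tagged, extract(tagged, 10))
```

A blocker can't strip the ID without knowing the scheme, and normalizing every URL's case would break legitimate case-sensitive paths, which is why the question above has no clean answer.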


DNS is cheap, host proxying is less so


Not exactly. If the site is already using a CDN it shouldn't be too hard to add a custom route or something. Even better is if the ad company is also a CDN. That doesn't seem too hard to do, just rent a few VPSes around the world and you should be set.


still more than just pointing to an IP address


Be careful what you wish for. Ultimately you can't stop publishers from using third-party tracking code in a first-party context. This is just the easiest (laziest) way for publishers to do so.

It's not like publishers will give up on integrating third-party tracking code into their tech stacks once this technique is dead. They'll have to do so at a deeper level, e.g. reverse proxies, NPM modules, etc.

Next battle in this war: adtech code becomes indistinguishable; tracking IDs start appearing in all exit links from major publishers; adtech firms & publishers coordinate on the backend to sync those IDs. What then, get rid of URLs?


>Next battle in this war: adtech code becomes indistinguishable; tracking IDs start appearing in all exit links from major publishers; adtech firms & publishers coordinate on the backend to sync those IDs.

that can be easily filtered out eg. https://addons.mozilla.org/en-US/firefox/addon/link-cleaner/


They mean the link won't work unless it has the ID, which, as mentioned, could be indistinguishable.


It would be easy to come up with an encoding scheme that evades detection.


For fellow Pi-hole users: deep CNAME inspection was added in v5.0 (https://github.com/pi-hole/FTL/releases/tag/v5.0).


For me this broke Google Docs, but I'm happy with it otherwise. It makes troubleshooting harder, though, since querying the blacklists will not show you the actual domain being blocked.


I think that treating this as a tracking problem is misstating the issue. First-party isolation largely solves this particular tracking problem. On the other hand, this is a gaping security problem in which a website (the first party) willfully grants unlimited privilege to an unrelated third party. Let’s solve it with regulatory/legal fixes. If you willfully grant your origin's permissions to a third party (by CNAME cloaking, subdomain delegation by other means, loading third-party JS, etc.), then, unless the third party has the appropriate certification and/or contracts in place, the first party suffers the natural consequences:

- Loss of HIPAA and PCI compliance.

- Loss of trade secret protection for contents of the website.

- Liability for security breaches due to third party capture of sensitive information.

And anything else that follows. If e-commerce sites with dubious trackers and ad networks can’t take credit card payments, they’ll quit the trackers.


For anyone else unfamiliar with CNAME cloaking, here's what I think is a much better explanation of the issue: https://webkit.org/blog/11338/cname-cloaking-and-bounce-trac...


I'm embarrassed that this really obvious and straightforward means of evading third-party-cookie blocking never occurred to me before.


I confirmed that my bank, credit card, and stock broker are on the list (they mainly use Adobe Experience Cloud). Horrible.


I just proposed that this be merged into the reference blocklists of NextDNS.io (a kind of Pi-hole in the cloud): https://github.com/nextdns/metadata/issues/601


We can never win this game. If the CNAME strategy is blocked, the web sites will themselves write the trackers into their code and funnel everything through the backend to the trackers. There is no way to detect this.


Only a matter of time until one of these trackers gets owned or bought by black hats and starts impersonating other sites with their subdomain credentials. Green lock means I'm safe, right?


Just asked if this can be added to the adblocker in OpenWrt.


Lots of "first-party" domains use the same IP addresses as known adtech servers; it's pretty easy to find if you have the data.


In the future, ad networks will serve content instead of content networks serving ads.


We should just make third party tracking illegal. We also need a GDPR for the US.


Wait, SaaS is evil now?

Even if track.a.com and track.b.com are both CNAMEd to eviltracker.com, the server at eviltracker.com doesn't have any special ability to crossreference traffic between those two domains. At this point, you're effectively blocking first party tracking.

Except, if you make it hard for sites to employ even SaaS-based first-party tracking solutions, then you probably encourage them to roll their own or use hosted first-party tracking, which is less likely to support features like cookie opt-outs or GDPR compliant data handling.


> server at eviltracker.com doesn't have any special ability to crossreference traffic between those two domains

Sure it does. It generates the IDs in the first place, so it knows if they match between the sites. Not perfectly precisely (at least not for now), but with high confidence, from the source of traffic and the browser fingerprint.

> use hosted first-party tracking, which is less likely to support features like cookie opt-outs or GDPR compliant data handling

On the other hand it doesn't aggregate data between unrelated services. And the first party can access a lot of the same data just from standard logs.


Who cares? These are useless as third-party cookies -- you can't track users across domains! It's a first-party cookie!

It's just a hack around analytics software needing an on-prem deployment or server-to-server integration. Whatever!


I dunno if the downvotes are deserved. I think you are correct in that the amount of data exfiltrated is limited.

But it's still exfiltrated data, and if the user was given the choice they would almost certainly choose to block such requests.

Further, it strikes me that trackers engaging in this behavior are knowingly subverting user wishes, which sort of demonstrates bad faith. I believe demonstrating the bad faith of tracking companies was a motivator for the DNT flag. They slithered out of that one by arguing that because browsers were setting it by default, users hadn't effectively consented to not being tracked. This case seems clearer: they are actively subverting anti-tracking plugins.


I just don't see a functional difference between doing this and sending your server logs to Datadog or Cloudwatch for analysis. It's sorta convenient, but nothing you can't do on the server side with slightly more fuss.

There's no cross-site aggregation (much less in-browser tracking) possible in this model, unless I'm missing something major.


> These are useless as third-party cookies -- you can't track users cross domains

http://uniquemachine.org/

As long as you can run scripts, you can be matched between sites. But even before you get into completely unique identifiers, you can get lots of data from repeating ip/location - people are fairly predictable that way.
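A naive sketch of how script-collected attributes become a cross-site ID (the attribute names and values below are invented; real fingerprinting uses far more signals, e.g. canvas and font rendering):

```python
import hashlib

# Hash a few browser attributes into a stable identifier: two sites that
# run the same script on the same browser derive the same ID, no cookie needed.
def fingerprint(attrs: dict) -> str:
    canonical = "|".join(f"{k}={attrs[k]}" for k in sorted(attrs))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

site_a = fingerprint({"ua": "Mozilla/5.0", "screen": "1920x1080", "tz": "UTC-5"})
site_b = fingerprint({"tz": "UTC-5", "ua": "Mozilla/5.0", "screen": "1920x1080"})
print(site_a == site_b)  # -> True: same browser, same ID on both sites
```

This is why first-party isolation alone doesn't end cross-site matching: no shared cookie ever changes hands, only independently recomputed hashes.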



