
WebBundles are built for content-addressable networks - adlrocha
https://adlrocha.substack.com/p/adlrocha-webbundles-are-built-for
======
jefftk
The Brave post was discussed here extensively a few days ago, with several
people (including me) pointing out how the author misunderstands what can be
done today and what bundles make easier:
[https://news.ycombinator.com/item?id=24274968](https://news.ycombinator.com/item?id=24274968)

Afterwards I wrote up a response, explaining how bundles don't facilitate
adblocker circumvention: [https://www.jefftk.com/p/webbundles-and-url-
randomization](https://www.jefftk.com/p/webbundles-and-url-randomization)

(Disclosure: I work on ads at Google)

~~~
csande17
There is a very simple way in which WebBundles will be used to bypass ad
blockers:

The tool to create WebBundles will, in all likelihood, be created and
maintained by Google. Google will program this tool to detect script tags with
a src of ads.google.com/js/adscript.js and replace them with a local copy of
the ad script embedded in the bundle. (They may do this by calling the feature
something like "embed common third party resources" and also use it for files
like jQuery and Google Fonts.) Then, adblockers will be unable to block the ad
script, because it appears with a different path in each bundle.
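The scenario being described could be sketched roughly like this. Everything here (the ad-script URL, the `/vendored/` path, the fetch stub) is hypothetical, invented for illustration; no real bundling tool is being quoted:

```python
import secrets

# Hypothetical "embed common third party resources" step. The ad-script
# URL, bundle layout, and fetch stub are illustrative only.
AD_SCRIPT_SRC = "https://ads.google.com/js/adscript.js"

def fetch(url: str) -> bytes:
    # Stand-in for an HTTP download of the script.
    return b"/* ad script bytes */"

def inline_known_scripts(html: str, bundle: dict) -> str:
    """Replace a well-known third-party script src with a local copy
    stored under a per-bundle randomized path, so a URL-based filter
    rule that matched the original src no longer matches anything."""
    local_path = "/vendored/" + secrets.token_hex(8) + ".js"
    bundle[local_path] = fetch(AD_SCRIPT_SRC)
    return html.replace(AD_SCRIPT_SRC, local_path)
```

Each bundle built this way carries the ad script under a different path, which is the crux of the claim.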

In other words, you're right that randomizing URLs to evade ad-blockers
requires server-side coordination. Conveniently, though, adopting WebBundles
_also_ requires server-side coordination, with the same company that stands to
benefit from ad-blocker evasion.

~~~
jefftk
I don't think what you're describing is at all likely, but let's accept the
scenario for the sake of argument. The bundle ends up including the
advertising JS, ok. This is the JS that inspects the environment on the page
and sends the ad request. Then the ad blocker can just block the ad request.

~~~
csande17
I'm curious which part you think is unlikely: that Google will make a tool to
create WebBundles (what, are they just going to sit on their hands and wait
for someone else to write one?), that that tool will include well-known third-
party resources in the bundle as a performance optimization (reduce the number
of connections needed to load your bundle! browsers no longer cache resources
across origins!), or that the Google Ads script is such a third-party
resource?

The difference between blocking an ad script by URL and blocking a request
sent by an embedded ad script is that the script can include arbitrarily
complex logic to generate the URL to request. I would not be surprised at all
if that logic starts getting more and more complex and hard to analyze --
perhaps under the guise of "reducing ad fraud" -- and starts generating URLs
that can't easily be blocked by Manifest v3 ad-blockers without breaking non-
ad functionality.
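For concreteness, a toy version of the hard-to-block URL generation being described might look like this. The derivation scheme is invented for illustration and does not reflect any real ad script:

```python
import hashlib
import time

def ad_request_path(session_seed: str) -> str:
    """Toy example of client-side URL generation: the request path is
    derived from a per-session seed plus the current hour, so it changes
    constantly and no static filter-list pattern can match it without
    also matching ordinary-looking API paths."""
    bucket = int(time.time() // 3600)  # rotates every hour
    token = hashlib.sha256(f"{session_seed}:{bucket}".encode()).hexdigest()[:16]
    return f"/api/v1/{token}"
```

A static rule would have to block `/api/v1/*` outright, which is exactly the "breaking non-ad functionality" risk mentioned above.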

~~~
jefftk
_> that Google will make a tool to create WebBundles_

The tool exists:
[https://github.com/WICG/webpackage/tree/master/go/bundle](https://github.com/WICG/webpackage/tree/master/go/bundle)

 _> that tool will include well-known third-party resources in the bundle as a
performance optimization_

The tool doesn't do that. You tell gen-bundle what you want included and it
builds the bundle.

You could imagine someone writing an easier-to-use tool that inspects your
website and packages up everything it needs to make your website run, but
including third-party resources that are cache-control:private (as Google ads
scripts are) wouldn't work! For example, many third parties use user-agent
sniffing to serve JS optimized for that particular browser.

 _> the Google Ads script is such a third-party resource?_

Imagining for the moment there was a tool like this and a way for scripts to
indicate whether they should be includable, I don't think Google Ads scripts
would be likely to opt in for the same reasons that Google Ads discourages
publishers from rehosting these scripts today.

 _> I would not be surprised at all if that logic starts getting more and more
complex and hard to analyze._

As long as third-party cookies are still a thing, ad requests are still going
to go to well known domains, because otherwise they won't get the cookie.

I see the next step in the ad blocking arms race as sites CNAMEing over to an
ad network which reverse proxies their site. The network is then in a position
to rewrite anything they want as inscrutably as they choose. This is a service
you can buy (from others) today. I really hope this doesn't become widespread:
even though the particular ad network I work for would likely do very well if
the world went this way, it would even further promote centralization in the
industry.

~~~
csande17
> [...] including third-party resources that are cache-control:private (as
> Google ads scripts are) wouldn't work! For example, many third parties use
> user-agent sniffing to serve JS optimized for that particular browser.

This is relatively easy to work around: only embed resources that are included
with SRI or match a whitelist of commonly-used third-party resources that
don't do this.
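That embedding policy could be as simple as this sketch. The whitelist entry is a hypothetical example; a real tool's list and checks would be more involved:

```python
from typing import Optional

# Hypothetical whitelist of third-party scripts assumed to be stable and
# identical for every client (so "safe" to snapshot into a bundle).
EMBED_WHITELIST = {
    "https://code.jquery.com/jquery-3.5.1.min.js",
}

def should_embed(src: str, integrity: Optional[str]) -> bool:
    """Embed a third-party resource only if the page pins it with an SRI
    integrity attribute, or it is on the known-stable whitelist."""
    return integrity is not None or src in EMBED_WHITELIST
```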

And yes, Google Ads would either need to change its stance on rehosting or
write a "wrapper" that uses the arbitrarily-complex JavaScript to generate the
path from which to download the _real_ ad script. Both of these seem like
things they could plausibly do.

> As long as third-party cookies are still a thing, ad requests are still
> going to go to well known domains, because otherwise they won't get the
> cookie.

Google owns several well-known domains, such as "google.com", that ad-blockers
can't just blanket deny access to because that would break non-advertising
functionality.

> I see the next step in the ad blocking arms race as sites CNAMEing over to
> an ad network which reverse proxies their site.

This could work too (Google's currently developing their own version of this,
which I think they call the "AMP cache"), but you're right about the drawbacks
and it makes sense for them to be trying multiple approaches to see what
sticks. After all, ad blockers are an existential threat to their primary
revenue source!

~~~
jefftk
_> only embed resources that are included with SRI or match a whitelist of
commonly-used third-party resources that don't do this_

Essentially zero SaaS third-party resources are set up to be included with
SRI, because (as you might expect) none of these third parties want to commit
to serving one version indefinitely.

 _> ad-blockers can't just blanket deny access to because that would break
non-advertising functionality_

Good point! I wasn't thinking about how that played out with Google (since
display ads today are served from a doubleclick domain). Pretty much anyone
running an ad blocker is also going to be blocking third-party cookies,
though, so routing the ad request via a proxy on the publisher domain would
work even better.

 _> Google's currently developing their own version of this, which I think
they call the "AMP cache"_

AMP doesn't contain anti-adblocking, and an anti-adblocking-via-inscrutable-js
approach would be incompatible with AMP's policy of requiring that all AMP
code, including extensions, be open source. For example, this is the code that
requests AdSense ads and renders the responses:
[https://github.com/ampproject/amphtml/blob/master/extensions...](https://github.com/ampproject/amphtml/blob/master/extensions/amp-
ad-network-adsense-impl/0.1/amp-ad-network-adsense-impl.js)

~~~
csande17
> Essentially zero SaaS third-party resources are set up to be included with
> SRI

Right, but jQuery often is, which lets you sell people on the feature by
saying it's for jQuery. The whitelist is just in case people forgot the SRI on
their jQuery, and hey, why not throw a couple other trustworthy scripts on
there while we're at it?

> AMP doesn't contain anti-adblocking, and an anti-adblocking-via-inscrutable-
> js approach would be incompatible with AMP's policy of requiring that all
> AMP code, including extensions, be open source.

Again, "Google currently has internal policies forbidding it" isn't a good
argument that Google won't do something, especially when they are working on
setting up systems that give them the ability to do that thing. Google used to
have a policy saying that ads could never go in the same column as search
results, but ad revenue turned out to be important to Google, so they changed
that policy.

(In AMP's specific case, you might not even need inscrutable JS since pages
are served from google.com and thus already include the tracking cookies. Just
swap in the personalized ads server-side.)

~~~
jefftk
_> Right, but jQuery often is, which lets you sell people on the feature by
saying it's for jQuery. The whitelist is just in case people forgot the SRI on
their jQuery_

There's a huge difference between libraries (versioned, can use SRI, open
source, don't have canonical URLs) vs services (none of those, change a lot,
JS you reference on your page is generally only the first link in a long chain
of JS). When I worked on mod_pagespeed (disclosure: at the time a Google
project) we had the "Canonicalize JavaScript Libraries" feature
([https://www.modpagespeed.com/doc/filter-canonicalize-
js](https://www.modpagespeed.com/doc/filter-canonicalize-js)) which similarly
only operated on libraries.

 _> why not throw a couple other trustworthy scripts on there while we're at
it?_

I'm still confused how you're imagining this working. The bundler inspects the
site (really hard!) and packages up the resources? If you're doing that,
"trustworthy" isn't a thing the bundler would decide: the site owner has
indicated their trust by referencing the JS in their page source (directly or
indirectly).

 _> "Google currently has internal policies forbidding it"_

While Google started the AMP project and is its largest contributor, it's now
an external project at the OpenJS foundation:
[https://openjsf.org/blog/2019/10/10/openjs-foundation-
welcom...](https://openjsf.org/blog/2019/10/10/openjs-foundation-welcomes-amp-
project-to-help-improve-user-experience-on-the-web/) Modifying AMP policies to
allow ad networks to run non-open source JS at the top level where it can't
easily be blocked would be completely politically impractical.

 _> In AMP's specific case, you might not even need inscrutable JS since pages
are served from google.com and thus already include the tracking cookies_

As sites adopt SXG this wouldn't work. From the browser's perspective a site
that is served from site A and signed for site B is not connected to site A,
and site A's cookies won't be included on the request.
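The cookie behavior described can be modeled in a few lines. This is a simplified sketch, not actual browser code; the point is only that cookie attachment follows the signed origin, not the transport origin:

```python
def cookies_for_sxg_request(signed_origin: str, transport_origin: str,
                            cookie_jar: dict) -> dict:
    """Simplified model of SXG cookie handling: content fetched from
    transport_origin but signed for signed_origin is treated as belonging
    to signed_origin, so only that origin's cookies are attached.
    transport_origin is deliberately ignored."""
    return cookie_jar.get(signed_origin, {})
```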

------
marijn
> Brave concerns about WebBundles are legit in a location-based addressing
> Internet, _but all of them would immediately be removed the moment we switch
> from a location-based addressing to a content-based addressing approach for
> the Internet._

I failed to find any good case being made for why content-addressable content
would be any less likely to try to perform malicious actions than URL-
addressed content. Is this just utopian wishful thinking or did I miss
something?

~~~
spankalee
Brave's concerns aren't legit though. WebBundles don't change the
request/response system or origin model of the web. They really don't change
URLs or blocker abilities at all. Brave is ascribing them either powers they
don't have, or that you can already do with plain servers.

~~~
hinkley
What’s the origin for addressable content?

~~~
yencabulator
Whatever origin its Signed HTTP Exchange can successfully claim. The whole
point of this work is to separate the origin from wherever you managed to
download the bytes from.

Imagine CDNs that cannot forge content for your site.

~~~
hinkley
Okay, so for single-source content I can either derive or assert an origin.
Composition, not so much...

------
outsomnia
I don't really get what the advantage is of making a big atomic blob as the
resource vs independently updateable pieces as h2 streams / etags / client
cacheable.

It's just PUSH gone crazy?

~~~
aclelland
I think one of the reasons larger companies like them is that they may end
ad blockers once and for all.

You can read a discussion about some of the issues that it causes here -
[https://github.com/WICG/webpackage/issues/551](https://github.com/WICG/webpackage/issues/551)
Of course, the Brave browser also has some concerns about it -
[https://brave.com/webbundles-harmful-to-content-blocking-
sec...](https://brave.com/webbundles-harmful-to-content-blocking-security-
tools-and-the-open-web/)

~~~
rektide
Widely disputed.

If this really did prevent ad blocking no one would expect it to ship. Inside
the bundle are the same resources as without a bundle.

~~~
spankalee
Same resources with the same URLs, accessed with the same request/response
pipeline that's just as visible to blockers. If you can change URLs in the
bundle, you can change them outside the bundle.

UAs are also completely free to request a resource from the URL rather than
the bundle.

------
thorum
WebBundles seem like they would be a useful format for distributing PWAs -
like a web standard alternative to Android's APK files.

------
dsun179
Isn't that already possible with MHTML?

[https://en.m.wikipedia.org/wiki/MHTML](https://en.m.wikipedia.org/wiki/MHTML)

~~~
kinlan
MHTML tends not to run JavaScript because there is no origin attached to it.
That's one of the benefits of web bundles: they can run with an origin
attached, so they have access to the correct storage and other sandboxing
primitives.

------
cordite
The repo [1] appears to have been silent for a few months. Is this being
actively developed inside the Google org, or is it just an experiment left
abandoned?

[1]:
[https://github.com/google/webbundle](https://github.com/google/webbundle)

------
skybrian
I'm wondering if IPFS or other content-addressable networks handle version
updates for documents as well as git handles code?

It might be nice to make websites that are more like PDF's that can be
redistributed, downloaded, and stored. But when there are many versions of
immutable content, the result is a mess, with people having random versions
distributed all over the place. Having built-in history and being able to sync
to HEAD would make this a lot easier.

~~~
matt_kantor
I believe mutable pointers like that are outside the scope of IPFS itself,
falling instead within the realm of naming systems like IPNS[1] and
DNSLink[2]. I'm not sure if/how those systems track history.

Unsurprisingly, some people want to use blockchains[3][4] (those definitely
have history).

[1]: [https://docs.ipfs.io/concepts/ipns](https://docs.ipfs.io/concepts/ipns)

[2]: [https://dnslink.io](https://dnslink.io)

[3]: [https://www.namecoin.org](https://www.namecoin.org)

[4]: [https://ens.domains](https://ens.domains)

------
bonfire
So basically we can give up HTTPS for origin authentication (leave it for
encryption/privacy) assuming the bundles are all signed?

------
Ericson2314
While I get the argument that things vaguely like AMP could be used with IPFS,
these WebBundles in particular catting everything together would seem to
undermine IPFS's ability to dedup. No thanks.

------
gumby
WebBundles are the CD-ROMs of 2020 and solve nothing for the end user. It’s all
about those ads.

~~~
spankalee
Are you claiming that everyone should stop using Webpack and Rollup too?

~~~
gumby
I know that Rollup in particular is intended to collect all your JS modules,
somewhat like an .a file in your filesystem, but Webpack can easily combine
all the assets.

In both cases you’re downloading globs of stuff that could already be in your
cache, or that you don’t want to download at all (e.g. ads). I have never
approved of efforts to turn the web into a “TV remote with a ‘Buy’ button”
(which stretches back decades) but indeed, I consider such projects
pernicious.

