Have you considered raising a spec issue?
I don't see any good way of enabling a target origin to opt into allowing source origins to share caches with it that wouldn't also reintroduce the privacy leaks. (After all, even if the only things malicious-site-X can see in your cache are ad-tech providers' origins that opted into allowing anyone to interface with them, that's likely still enough to fingerprint you.)
Just so I understand it correctly:
* The iframe loads resources, e.g. /static/bundle.js and /public/index.css (does this include user-defined resources?).
* But because the iframe is embedded on its own subdomain of framercanvas.com, the cache key includes that subdomain. So all resources are fetched again for every project?
Is there a large enough audience visiting multiples sites that would make the effort worth it?
Your website runs terribly on Firefox. Multiple periods of several hundred milliseconds where the viewport went blank.
<link rel=prefetch href=url>
As mentioned in similar comments, the observed behavior for this particular test case is potentially a problem if you are building Modern Web Apps by following the received wisdom of how you're supposed to do that. There are lots of unstated assumptions in the article in this vein. One such assumption is that you're going to do things that way. Another assumption is that the arguments for doing things that way and the plight of the tech professionals doing the doing are universally recognized and accepted.
From the Web-theoretic perspective—that is, following the original use cases that the Web was created to address—if that resource is so important to your organization, then you can mint your own identifier for it under your own authority.
Ultimately, I don't have a lot of sympathy for the plight described in the article. It's fair to say that the instances where this sort of thing shows up involve abusing the fundamental mechanism of the Web to do things that are, although widely accepted by contemporaries as standard practice, totally counter to its spirit.
The issue is that you're immediately deciding that domain X and domain Y are different entities.
In practice, I find that there are a HUGE number of use cases where two domains are actually the same organization, or two organizations that are collaborating.
There is basically no way to say to the browser: I am "X" and my friends are "Y" and "Z"; they should have permission to do things that I don't allow "A", "B", and "C" to do.
We actually have a functioning standard for this on mobile: both iOS and Android support /.well-known paths for manifests that allow apps to couple more tightly with sites that list them as "well-known" (aka friends).
The browser support for this kind of thing is basically non-existent, though, and it's maddening. SameSite would have been a PERFECT use case. We're already doing shit like preflight requests, why not just fetch a standard manifest and treat whitelisted sites as "SameSite=None" and everything else as "SameSite=Lax"? Instead orgs were forced into a binary choice of None (forfeiting the new CSRF protections) or Lax (forfeiting cross-site sharing).
Isn't that CORS? That sounds like CORS.
CORS is a stronger security layer in the browser than cache segregation; some people want to keep the CORS security model but weaken cache segregation.
That's... literally what domain means.
"A domain name is an identification string that defines a realm of administrative autonomy, authority or control within the Internet." - Wikipedia
The entire security policy of the Internet is built on this definition. It's not an assumption. It's a core mechanism.
Entities are allowed to control many assets.
Simplest possible case in the wild for you, since you're being obtuse.
I'm company "A". I just bought company "B". Now fucking what?
I wouldn't expect any less.
> In practice [...]
There's your problem. Try fixing that.
Having a domain is identical to having a driver's license: This org says I am "X".
It is fundamentally different from uniquely identifying me.
I am still the same person if I give you my library card - a different ID from a different org that says I am "Y".
> Having a domain is identical to having a driver's license: This org says I am "X".
Nope. You just described two different documents.
It's very odd the position you're taking here, given the sentiment in your other tirade about being digital sharecroppers to Facebook and Google <https://news.ycombinator.com/item?id=27369652>. Your proposed solution is dumping more fuel into their engines (which is why it's the kind of solution they prefer themselves) and completely at odds with e.g. Solid and other attempts to do things that would actually empower and protect individual users. I'm interacting with your digital homestead, why are you so adamant about leaking my activity to another domain?
Speaking as an engineer... the Firefox folks don't really get it. You can't just break what sites like StackOverflow and Wikipedia have been doing for years (and in some cases decades) and then say "you were doing the wrong thing." Some version of FPS will ship in browsers, probably in the next 2 years.
Quoting Apple's position directly "[...] Given these issues, I don’t think we’d implement the proposal in its current state. That said, we’re very interested in this area, and indeed, John Wilander [Safari lead] proposed a form of this idea before Mike West’s [Google] later re-proposal. If these issues were addressed in a satisfactory way, I think we’d be very interested. [...]"
Also it was a W3C TAG review. The W3C and IETF are different organizations.
No, they allowed an origin to list other origins whose cookies would be sent back to the serving origin correctly even if they were iframes loaded in the parent origin DOM.
I.e. this is the expected behavior for iframes until Safari decided that there was such a thing as "third party" origins whose web semantics could be broken in their war against advertising.
Google is trying to (partially) restore the expected behavior of iframes so that named origins get their own cookies sent to them, which is how things worked for the first two decades of the web.
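In concrete terms that just means markup like this (domain made up for illustration):

<iframe src="https://comments.example/embed?thread=123"></iframe>

where the request for the embedded document used to carry comments.example's own cookies even though the top-level page lives on a different site; that's the behavior being (partially) restored for origins that have listed each other.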
Because I can't comment on an RFC I haven't seen, and a quick google search of my own based on your comment turns up nada.
That said - I'm fully aware of the downsides of this approach, but I want my browser to be (to put it crudely) MY FUCKING USER AGENT. I want to be able to allow sharing by default in most cases, and I want a little dropdown menu that shows me the domains a site has listed as friendly/same-entity, and I want a checkbox I can uncheck for each of them.
Then I want an extension API to allow someone else to do the unchecking for me, based on whether the domain is highly correlated with tracking (Google Analytics, Segment, Heap, Braze, etc.)
The way I see it, the road to hell is paved with good intentions. If the web were being developed in our current climate of security/privacy focus, how likely is it that even a fucking <a href=[3rd party]> would be allowed? Because I see us driving toward a spot where this is verboten. Which also happens to be the final nail in the coffin for any sort of real open platform.
Welcome to the world where the web is literally subdomains of facebook/google. What a fucking trash place to be.
You are attributing a lot of intention to a mechanism. You don't know if it's a 3rd-party tracker or the news link in a discussion page.
The proposal in the article is actually quite good, since I should always know perfectly well whether something will be loaded into a frame or followed as a link.
Interestingly, I think this remark is a signal that you've read something out of my comment that's just not there (and thus attributing a lot more intention to me than you should).
> The resource is not that important to me.
These are a class of resources that are important enough that folks would pause what they're doing to deliberately mark them with a magic incantation that they expect will cause the user agent to do something, notice that it doesn't do the thing they want, and then go and either write a blog post to complain about it or throw their support behind someone else's complaints about it. The argument that you don't find them particularly important is pretty much self-defeating.
A.test prefetches b.test/visited_a.js, b.test/unique_id.js, and log(n) URLs that bisect unique_id.js so that you can search the cache for the unique id.
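Concretely, and assuming for the sake of the attack that a.test's prefetches land somewhere b.test can later observe, the markup on a.test might be as simple as (all URLs made up):

<link rel=prefetch href="https://b.test/visited_a.js">
<link rel=prefetch href="https://b.test/unique_id.js">
<!-- plus ~log(n) probe URLs bisecting the identifier space, whose cached/uncached
     pattern lets the identifier be recovered later by timing loads -->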
Have to be careful to balance performance and “this is useful to me” with abuse prevention at scale. It's also important to realize we have to tread carefully with browser features that seem useful, as the graveyard of deprecated features that didn't survive privacy attacks is quite large.
B obviously knows what resources A prefetched because they were requested from B in the first place. And if A wants to pass information to B, they don't need to do a complex prefetch dance, they can just load an img src.
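(The img trick being nothing more exotic than, say:

<img src="https://b.test/beacon?from=a_test&uid=12345" width=1 height=1>

with the uid value made up here; b.test learns whatever a.test chooses to encode in the URL.)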
So I don't see any way for A or B to learn anything about the user's behavior on one another's site without the other site's cooperation?
But I don't see that problem. In this case the a.test domain cannot see what is in the cache; only b.test sees it. (At least as I understood it.)
While Google also states that requests do not contain cookies, Google Chrome will automatically send a high-entropy, persistent identifier (X-Client-Data) on all requests to Google properties, and this cannot be disabled. Google can use this X-Client-Data, combined with the user agent's IP address, to uniquely identify each Chrome user, without cookies.
So, perhaps the privacy statement is more of a sneakily worded non-denial?
A sample: `X-client-data: CIS2yQEIprbJAZjBtskBCKmdygEI8J/KAQjLrsoBCL2wygEI97TKAQiVtcoBCO21ygEYq6TKARjWscoB` - looks very high entropy to me!
X-Client-Data indicates which experiment variations are active in Chrome:
Additionally, a subset of low entropy variations are included in network requests sent to Google. The combined state of these variations is non-identifying, since it is based on a 13-bit low entropy value (see above). These are transmitted using the "X-Client-Data" HTTP header, which contains a list of active variations. On Android, this header may include a limited set of external server-side experiments, which may affect the Chrome installation. This header is used to evaluate the effect on Google servers - for example, a networking change may affect YouTube video load speed or an Omnibox ranking update may result in more helpful Google Search results. -- https://www.google.com/chrome/privacy/whitepaper.html#variat...
Google doesn't use fingerprinting for ad targeting, though, as with IP, UA, etc., it receives the information it would need if it were going to. I don't see a way Google could demonstrate this publicly, though, except an audit (which would show that X-Client-Data is only used for the evaluation of Chrome variations).
(Disclosure: I work on ads at Google, speaking only for myself)
Doesn't mean that won't change in the future though. But log retention is only a matter of days, so they can't retrospectively change what they do to invade your privacy.
I hope Google doesn’t do this, but I would not be entirely surprised if they did.
Well, obviously I can't say for sure they don't have any. I didn't look it up, and if I had I wouldn't be able to tell you. But since I didn't, I can tell you that the concept seems completely infeasible. There's too much traffic, and nowhere to put them.
Besides that, not everything is legal to log. The frontends don't know what they're seeing, though; they're generic reverse proxies. So...
If there’s one company in the world for whom bandwidth and storage are not an issue, it’s Google.
But as it stands I don't want to trust Google, Facebook etc. more than absolutely necessary. They have lost every right to that a long time ago and are incentivized by their business model to not change anything.
Apologies that I'm not a front-end person so this may be naive, but it would be great to hear your thoughts!
With HTML resources, the goal of prefetch is typically not to get a head start on loading enormous amounts of data, but instead to knock a link off of the critical path. The HTML typically references many different resources (JS, CSS, images, etc.) and, if the HTML was successfully prefetched, then when the browser starts loading the page for real it can immediately kick off the requests for those resources.
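For a likely next navigation that can be as simple as (path illustrative):

<link rel=prefetch href="/checkout.html">

so that by the time the user actually clicks through, the HTML is already sitting in the cache and the browser can immediately start fetching the scripts, styles, and images it references.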
A couple of related techniques are also useless: domain sharding and cookieless domains. HTTP/2 multiplexing and header compression made them obsolete, and now they're just an overhead for DNS+TLS, and often break HTTP/2 prioritization.
You should be careful with prefetch too. Thanks to preload scanners and HTTP/2 prioritization there are few situations where it is really beneficial. But there are many ways to screw it up and cause unnecessary or double downloads.
I'd argue there are few 'shared' dependencies on websites nowadays.
Besides being useless in terms of caching, it also incurs other overhead: another DNS request, another TCP handshake, another TLS connection. With HTTP/1.1 this might still make sense because you don't get multiplexing, but with HTTP/2 the extra overhead is simply extra overhead. With HTTP/3 it becomes even less useful to have domains sharded. Generally speaking, the best use of resources on the modern web is to serve everything from the same domain.
The only solution is not to allow cross-origin document preloads. Which is lame because the impact on user experience is reasonably substantial.
With preload you can do this in the background very efficiently.
That seems fine to me. Implement it and if the users don't want it then it doesn't occur. You should still code as if it works.
> Apparently prefetched resources aren’t filtered by extensions
This sounds like a browser bug. It should probably be raised against the browsers.
> as I’d rather avoid the trackers.
Again, this is just a result of the browser bug. I see no reason to throw away a nice declarative prefetch simply because browsers forgot to allow filtering.
Please correct me if I'm misinterpreting this statement. Are you saying it is acceptable if the code breaks if prefetch fails?
Isn't this just a performance optimization?
I take the original statement to mean that the worst-case scenario is extra time to load.
A primary issue, as I see it, is caching of third-party assets (as dmkil posted elsewhere, think jQuery, Google Fonts, Facebook Pixel, etc).
Could this not be solved using the Cache-Control header, or maybe some HTML attribute variation of this? Maybe something like:
<!-- Use a site-specific cache for its stylesheet, default behavior -->
<link rel=stylesheet href=index.css cache-key="private">
<!-- Use a global cache for jQuery -->
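Presumably the jQuery half would then carry a shared key, something like this (the cache-key attribute and its values are hypothetical, and the CDN URL is made up):

<script src="https://cdn.example.com/jquery.min.js" cache-key="shared"></script>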
The cache-key idea would only work if the user themselves could specify it for every resource.
When I found out, I wrote a blog post about HTTP cache partitioning and hosting jQuery, or any library, from a CDN.
The problem described in the blog post is that prefetch loads the resource into cache, which when combined with per-site cache segmentation means that it's ambiguous which cache a resource should be loaded into when it's prefetched across sites.
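For instance, if a page on a.test contains something like (URL made up):

<link rel=prefetch href="https://b.test/next-page.html">

the prefetched document could plausibly be stored under a.test's partition, in which case a later top-level navigation to b.test gains nothing, or under b.test's partition, which would reopen exactly the kind of cross-site channel the partitioning was introduced to close.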
How does one detect when everything is loaded? I've seen some websites break when UI interaction occurs before all the JS is loaded.