Very good points. We proposed a way to deal with DOM manipulation in
the paper, but Stefan omitted this from the blog post. Specifically,
Section 4 of the paper (the "Page access" paragraph) briefly describes
this. (Sorry for referring you to the paper, but our wording in the
paper is probably better than my attempt to paraphrase here.)
Of course there are other ways malicious extensions can be used to leak
data---pick your favorite covert channel. But the idea was to propose
APIs (and mechanisms) that are not overtly leaky. (We are early in the
process of actually building this though.)
"To ensure that extensions cannot
leak through the page’s DOM, we argue that extensions
should instead write to a shadow-copy of the page
DOM—any content loading as a result of modifying the
shadow runs with the privilege of the extension and not
the page. This ensures that the extension’s changes to the
page are isolated from that of the page, while giving the
appearance of a single layout"
Could you elaborate more on this? Do you mean that you'll compare the network requests made from the main and the shadow pages? What if the main script is sensitive to the request receive time? Then the shadow DOM may act differently.
From a more practical standpoint, having two DOMs for every page will eat even more of my laptop's RAM.
"Do you mean that you'll compare the network requests made from the main and the shadow pages?"
Essentially, yes. Requests from the extension should be treated as if
they come from an origin different from the page's. (We could potentially
piggy-back on existing notions of security principals (e.g., those that
Firefox has) to avoid huge performance hits.) And if the extension is
tainted, the kinds of requests it can make will be restricted according
to the taint (as in COWL, likely using CSP for the underlying
enforcement).
"What if the main script is sensitive to the request receive time? Then the shadow DOM may act differently."
If by main script you mean a script on the page, then there should be no real difference.
"From more practical standpoint, having two DOMs for every page will eat even more of my laptop's RAM."
I hope this won't be so bad down the line (assuming we'd be able to
leverage some underlying shadow DOM infrastructure and that it performs
well).
So in principle, we already have the context from which we can decide which network requests to allow or block (this is already used today to allow cross-origin XHR from content scripts).
However, it is super gnarly to implement your idea in practice because:
1. There has to be a connection traced from every network request back to the JS context which ultimately caused the request (e.g., by mutating the DOM). This is doable, it's just a lot of work. (A toy sketch follows this list.)
3. Even if you do 1 and 2, there are still channels such as hyperlinks. The extension could add a hyperlink and get the user to click on it. I suppose you could try and tie that back to the script context that created or modified the hyperlink.
4. Even if you do 1-3, if you can induce the page script (or any other extension, or the browser, or the user) to request a URL of your design, you win.
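Here is the toy sketch for point 1 (hypothetical API; a real browser would do this inside the engine, not in page-visible JS):

```typescript
// Remember which script context last mutated each node, so that a
// network request initiated by that node can be charged back to the
// responsible context.
const mutationOwner = new WeakMap<Node, string>(); // node -> context id

function recordMutation(node: Node, contextId: string): void {
  mutationOwner.set(node, contextId);
}

function contextForRequest(initiator: Node): string {
  // e.g. an <img> fetch is attributed to whoever last set its src;
  // fall back to the page itself when nothing else touched the node.
  return mutationOwner.get(initiator) ?? "page";
}
```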
Sigh. Still seems fun as a research project to see how close you could get.
CSP does help, but it is document-specific, not js-context specific. At least in Chrome, tracing the js context that was responsible for some network request would be significantly more difficult to implement.
People have been building multiple applications from a single codebase for a very long time. Put all the targets on a continuous build, add tests, and you're set.
The only sense in which versioning matters at all is that the protocol the applications use to talk to each other has to be versioned, and an application must understand the oldest version of the protocol it could potentially be spoken to over.
But things like Thrift and Protocol Buffers have this feature built in. You shouldn't need to version the actual code...
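For example (a toy sketch of the property, not protobuf's or Thrift's actual wire handling): as long as every new field is optional, a reader and writer on different protocol versions still understand each other:

```typescript
// v1 of the message had only "name"; v2 added an optional "locale".
interface GreetRequest {
  name: string;    // present since v1
  locale?: string; // added in v2; absent when an old client writes
}

function handleGreet(req: GreetRequest): string {
  // The reader tolerates the oldest version it can be spoken to over
  // simply by defaulting the missing field.
  const locale = req.locale ?? "en";
  return locale.startsWith("fr") ? `Bonjour, ${req.name}` : `Hello, ${req.name}`;
}

console.log(handleGreet({ name: "Ada" }));                  // old v1 client
console.log(handleGreet({ name: "Ada", locale: "fr-CA" })); // new v2 client
```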
If you have multiple components that depend on the same code and you try to split them into multiple independently deployed services, you need to keep that shared code in sync.
So making a change to code shared by multiple services means you have to deploy multiple services for a single change.
I completely agree with you about the protobuf/thrift angle though. If you're in that situation, you are already doing it right.
It would be possible to generate basic API documentation from the IDL the browsers actually use internally. It would not necessarily be completely accurate because sometimes browsers don't use IDL for their features, but it would be another tool, and would have helped in this case.
You wouldn't need to even be a member of the browser team to do this, since the relevant data is all open source.
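As a rough sketch, assuming the open-source webidl2 npm parser and its parse() API (any WebIDL parser would do), basic doc stubs fall out almost for free:

```typescript
import { parse } from "webidl2";

// A tiny IDL fragment standing in for what a browser tree actually ships.
const idl = `
interface Geolocation {
  undefined getCurrentPosition(PositionCallback successCallback);
  undefined clearWatch(long watchId);
};
`;

for (const def of parse(idl)) {
  if (def.type === "interface") {
    console.log(`## ${def.name}`);
    for (const member of def.members) {
      // Each member reports its kind (operation, attribute, ...) and name.
      const name = (member as { name?: string }).name ?? "(anonymous)";
      console.log(`- ${member.type}: ${name}`);
    }
  }
}
```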
In Chrome, extensions are sandboxed using techniques similar to those used for web pages.
However, just because they are sandboxed doesn't mean they are safe. It just means they won't be able to get privileges beyond those explicitly granted to them.
Many extensions request very powerful privileges though, like the ability to read and write to web pages you visit. The browser will dutifully grant this privilege if the extension requests it and you allow it. It won't get the ability to run arbitrary native code though (unless it also requests that privilege).
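For instance, here is a hypothetical excerpt of such a manifest (in a real extension this lives in the JSON file manifest.json; it is shown as a TypeScript literal here, using today's Manifest V3 field names):

```typescript
// Two fields are enough to request read/write access to every page the
// user visits; running native code is a separate, rarely granted
// capability.
const manifest = {
  manifest_version: 3,
  name: "Example extension",
  version: "1.0",
  permissions: ["scripting"],       // may programmatically inject scripts...
  host_permissions: ["<all_urls>"], // ...into any site the user visits
} as const;
```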
That isn't declarative in the same way... that's markup. The kind of declarative that is used in React is more like functional programs. You write programs, using actual code, that compute ("declare") exactly what the UI should look like from top to bottom, given each possible input.
This is in contrast to a model where you mutate an existing UI model each time something interesting happens.
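A contrived contrast, if it helps (hypothetical code; a plain string stands in for JSX):

```typescript
// Mutate-in-place style: patch the existing UI when something happens.
function onIncrementRetained(label: HTMLElement, state: { n: number }): void {
  state.n += 1;
  label.textContent = `Count: ${state.n}`; // surgically update one node
}

// React-style: a pure function that computes the whole UI from its input.
function Counter(props: { n: number }): string {
  return `<button>Count: ${props.n}</button>`;
}
// The framework re-runs Counter after every state change and reconciles
// the result against the real DOM, so you never mutate it yourself.
```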
Declarative is a vacuous word that means anything you want it to depending on context. We used to call functional programming "functional" and logic programming languages like Prolog "declarative." Then declarative started meaning markup, then declarative started meaning...immutable functional code? In PL, we mostly just avoid the word altogether these days since everyone has a different idea about what it means.
React is closer to an immediate-mode UI model: you write programs that compute exactly what the UI should look like on each frame, rather than mutating a scene graph each time something interesting happens (as occurs in retained-mode UI models). Substitute DOM for scene graph, and the distinction might hold.
I think that the reason people started calling React 'declarative' was because 'functional' was interpreted to mean (purely) functional, with no side effects.
But yes, exactly. React is similar to an immediate mode graphics API. Except that also has weird connotations, because people think of things like canvas that are very low-level: all you get are lines, arcs, and fills. React's primitives are at the same level of abstraction as the DOM, you just work with them in immediate mode, not retained mode.
If I understand correctly, the DOM is retained, and React efficiently abstracts it back into an immediate-mode API with some state retention, which has benefits since things stay consistent automatically.
In contrast, a UI model like WPF uses (declarative) data binding to achieve something similar, but without as much flexibility and with more verbosity.
I'm working on a system that allows for state retention in an immediate mode model, though wrapping WPF rather than HTML: