DOMPurify, Security in the DOM, and Why We Really Need Both [pdf] (usenix.org)
72 points by jorangreef 20 days ago | 37 comments

Trusted Types are the solution for this: https://developers.google.com/web/updates/2019/02/trusted-ty...

With Trusted Types on, unsafe strings are disallowed directly at the unsafe sink level, i.e. innerHTML no longer accepts strings, only instances of TrustedHTML. TrustedHTML can only be created by a Trusted Types policy, and by isolating policies from user-generated and other untrusted content you guarantee that you can't have XSS holes.
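A minimal sketch of what that looks like (the policy name and escaping rule here are illustrative, not from the spec):

```javascript
// Assumes a browser serving the CSP header:
//   Content-Security-Policy: require-trusted-types-for 'script'
// With that header set, assigning a plain string to innerHTML throws.

// The transform behind the policy is just a function, so the trust
// boundary is exactly this code (here: escape everything).
const escapeHtml = (s) =>
  s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');

if (typeof window !== 'undefined' && window.trustedTypes) {
  // Only TrustedHTML minted by a registered policy is accepted at sinks.
  const policy = window.trustedTypes.createPolicy('escape-only', {
    createHTML: escapeHtml,
  });

  // document.body.innerHTML = '<img onerror=alert(1)>';  // TypeError
  document.body.innerHTML = policy.createHTML('<img onerror=alert(1)>');
  // Renders the markup as inert text instead of creating an <img>.
}
```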

* Note for the curious: This is how we're locking down lit-html so that it's completely safe from XSS. We have a simple policy that's only accessible to the template strings processor, so that the only strings trusted in an application are the template literals written by developers. All other strings will not be allowed at unsafe sinks. We don't even trust the other internals of lit-html. See https://github.com/Polymer/lit-html/blob/ceed9edc0aecdf82588...

Trusted Types are good for most cases, but the case from the PDF is one where you're given a blob of untrusted HTML that you still want to render (an HTML-formatted email).

Trusted Types will prevent a dependency or careless developer from setting innerHTML without going through a policy you've evaluated and decided to trust, but it doesn't have an HTML sanitizer, so for those cases a library like DOMPurify is still necessary.
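A sketch of how the two compose (the policy name is illustrative; DOMPurify and its sanitize call are real):

```javascript
// The one place in the app allowed to turn untrusted HTML into
// TrustedHTML: a policy whose createHTML runs DOMPurify first.
// Dependencies are passed in so the trust boundary is explicit.
function makeSanitizerPolicy(trustedTypes, purifier) {
  return trustedTypes.createPolicy('sanitize-html', {
    createHTML: (dirty) => purifier.sanitize(dirty),
  });
}

// In the browser, everything else is forced through it:
//   const policy = makeSanitizerPolicy(window.trustedTypes, DOMPurify);
//   emailBody.innerHTML = policy.createHTML(untrustedEmailHtml);
```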

I'm skeptical about trusted types. It seems very heavy and needs 100% buy-in to work.

Reminds me of how CSP was seen as the glorious savior. But so far it only helps the big sites (Google, Facebook, Twitter, etc.).

I am always irrationally(?) scared of using these sanitizers despite their successful history. As soon as new html/js/css syntax/features are introduced, won't your security model need to be reevaluated? Which seems like a lost cause at the rate new capabilities are introduced to the web. E.g., when CSS Shaders lands, you might be able to execute arbitrary gpu code with just css (hypothetically speaking, I don't actually know how it will work. I am sure it'll be sandboxed pretty well. But the problem remains that there are too many new possibilities to keep up with!).

DOMPurify (as a client-side sanitizer) uses a whitelist. There's also CSP for defense-in-depth.
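For the defense-in-depth part, an illustrative strict CSP (directive names are real; the exact policy obviously depends on the app) might look like:

```
Content-Security-Policy: default-src 'self'; script-src 'self'; object-src 'none'; base-uri 'none'
```

With no 'unsafe-inline' in script-src, even a payload that slips past the sanitizer can't execute as an inline script.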

I would be more concerned about using server-side sanitizers, due to the impedance mismatch between client and server HTML parsing algorithms.

Security models are constantly being re-evaluated as new threats and attack vectors emerge.

What you said can be generically applied to every security control, which is why security is hard.

Isn't that like saying there's no point in using an anti virus as viruses are always evolving?

You're still catching entire classes of existing issues.

> Isn't that like saying there's no point in using an anti virus as viruses are always evolving?

You're very close to understanding something.

(Though in defense of DOM purifiers, they can use a whitelist.)

You mean, you are catching exploits for vulnerabilities that don't exist anymore, and you pay for that with a gigantic attack surface that can be used to compromise you? Yeah, that sounds about right.

Bad example. Antivirus software is a scam. It just adds another attack vector: when the antivirus software has a bug in its file parsing, you can be compromised just by downloading a malicious file.

Windows Defender is sufficient & bundled with Windows

I mean, I never said anything about buying one; you just assumed that. I also just use Windows Defender, part of which is an antivirus.

Make it a whitelist. :)

It wouldn't help if new features extend the capabilities of existing stuff (which happens all the time). For example, the CSS Shaders example from before adds new syntax to the existing 'filter' CSS property, which you might've already whitelisted because it is safe today.

I guess a nested, parameter-granularity whitelist would work in that case :)

You can do that with DOMPurify using hooks.
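For instance, a sketch of parameter-granularity filtering with a hook (the hook name `uponSanitizeAttribute` is DOMPurify's; the specific policy here is made up): keep `style` attributes in general, but drop any that set the `filter` property, so a hypothetical future `filter: shader(...)` never gets through.

```javascript
// Hook callback: DOMPurify calls this for every attribute it is about
// to keep; setting data.keepAttr = false drops just that attribute.
const dropFilterStyles = (node, data) => {
  if (data.attrName === 'style' && /(^|;)\s*filter\s*:/i.test(data.attrValue)) {
    data.keepAttr = false; // strip the style attribute, keep the element
  }
};

// In the browser, with DOMPurify loaded:
//   DOMPurify.addHook('uponSanitizeAttribute', dropFilterStyles);
//   const clean = DOMPurify.sanitize(dirtyHtml);
```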

In modern browsers that support the Shadow DOM[1] standard, this is a somewhat solved problem with one caveat: it wasn't built for this use case.

Architecturally, however, it does the job, but the challenge is integration with dated browsers. Polyfills for Shadow DOM inherently break the security features it provides.

Better cross-browser Shadow DOM support would be a step in the right direction to making things like DOMPurify safer, but unfortunately it seems like we're a while away from that according to Can I Use[2].

[1]: https://developer.mozilla.org/en-US/docs/Web/Web_Components/...

[2]: https://caniuse.com/#feat=shadowdomv1

I don't think Shadow DOM can be used for security purposes... https://blog.revillweb.com/open-vs-closed-shadow-dom-9f3d742... makes it seem trivial to access closed shadow roots via side channels like prototype manipulation.

Wait, how exactly does iframe sandbox not solve everything? Emails should definitely be shown in them; even with client-side decryption, you can create an iframe from a data: URI. iframe sandbox is the strongest sandbox possible: unique origin, no JS execution…

I used to think the same, except iframe sandboxes:

1. Don't resize dynamically to fit the email content, not unless you enable unique origin JS execution and do message passing to the parent window. But if you do that then you open the door to crypto-mining, tracking, spectre variants, and browser zero-days.

2. Don't play well with keyboard shortcuts since they steal keyboard events from the parent window when focused. Proxying keyboard events to the parent is even more dangerous since an attacker could then spoof keyboard events to control the parent.

3. Don't let you whitelist allowed HTML tags, attributes and CSS properties, which means there's no way to block email tracking.

And that's just for viewing email content. How would you sanitize and whitelist unsafe email content when replying/forwarding?

DOMPurify combined with CSP is safer and stricter. And if you wanted to, there's nothing to prevent you from putting the result in a sandboxed iframe once sanitized anyway. But it needs to be sanitized.
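A belt-and-braces sketch of that last suggestion (function and variable names are illustrative; `sandbox=""` and `srcdoc` are real iframe attributes): sanitize first, then render the result in a fully restricted sandboxed iframe, so the content gets a unique origin and no JS even if sanitization missed something.

```javascript
// Assumes DOMPurify is loaded in the page.
function renderEmail(container, dirtyHtml) {
  const clean = DOMPurify.sanitize(dirtyHtml);
  const frame = document.createElement('iframe');
  frame.setAttribute('sandbox', ''); // empty value = all restrictions on
  frame.setAttribute('srcdoc', clean); // no network fetch for the document
  container.appendChild(frame);
  return frame;
}
```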

What you need is something like https://bugzilla.mozilla.org/show_bug.cgi?id=80713 that lets you make the <iframe> act more like an autosizing <div> than a fixed-size frame. https://github.com/w3c/csswg-drafts/issues/1771 is a suggestion for adding this sizing into CSS, but there's concern about the ability to leak information through the size of the container.

What does this have to do with E2E? I don't see how filtering HTML is harder to do. Even if somehow server-side algorithms are better (which this presentation seems to imply), can't the same algorithm be used client-side?

In a way, the situation is better client-side, because when running code on the client's side, you can check how exactly the browser parses the HTML code.

It's on page 18. If you have end-to-end encryption you can't sanitize in the client.

I mean, you're really just summarizing the presentation. It should be an API that's in the browser. It isn't. So people need to use a library. That's OK. But not great.

"If you have end-to-end encryption you can't sanitize in the client."

I think you meant to type that you can't sanitize in the "server"? Because with end-to-end encryption the server has no access to the plaintext to be sanitized. Only the client can sanitize, only the client has the plaintext.

oops yes.

"even if somehow server-side algorithms are better (which this presentation seems to imply)"

The slides provide several reasons why server-side algorithms are worse.

"the situation is better client-side, because when running code on the client's side, you can check how exactly the browser parses the HTML code."

Yes, and for this reason, DOMPurify is a client-side sanitizer.

This looks like a good idea, but what happens when the user disables this by inspecting the code (or something)?

DOMPurify is a great library, BTW. Super small, super safe by default, no security holes found in months/years

They just did a security update last week.

I am hesitant about articles that bash the DOM over merely stylistic concerns.

I don't see any evidence of alleged DOM-bashing in Mario's slides?

In fact, rather than bash the DOM, Mario wants the DOM to subsume his own DOMPurify project, so that users don't have to trust him as a third-party module developer. That paints the DOM in a favorable light, if you ask me.

It's slide 27.

That's not referring to stylistic concerns, and it's not bashing the DOM per se.

The context of "The DOM is a mess!" on slide 27 is specifically in terms of security, namely "DOM Clobbering" where an attacker can rewrite DOM methods from underneath you, and impedance mismatch owing to parser differences and bugs ("HTML elements implemented in completely different ways, different attribute handling" in the context of defending against XSS).

It's an honest assessment that's more a statement of fact than anything intended to be hurtful. It's not even a harsh statement of truth at that. I find it hard to believe that Chrome or Firefox engineers would find that offensive. I think they would well agree.

DOMPurify is really fantastic security work. It would make for a brilliant contribution to the DOM.

> namely "DOM Clobbering" where an attacker can rewrite DOM methods from underneath you

I don't see that as a valid security concern in this case. Yes, it will break your code or do unintended things. But in order for this to happen, an attacker must have access to the page in your user's security context, which means some other preventable security violation has already transpired. This applies equally to any application/language. Even if you could freeze the DOM so that nothing can be assigned to object properties, and thereby ward off DOM clobbering, there would still be a malicious user in your security context reading all your secure and private details. If you prevent the malicious agent from gaining access, this security concern with the DOM is eliminated.

In other words, whether or not DOM clobbering occurs, a prerequisite security violation is necessary, and hardening the DOM won't provide the necessary solution.

Aside from malicious third parties intentionally overwriting event handler assignments, DOM clobbering really comes down to poor code management, which is the real security problem here. That makes this a stylistic concern. Additional layers of concern aren't going to make people instantly less lazy. There are better ways to solve for this.

> HTML elements implemented in completely different ways

HTML is not the DOM. These are separate and unrelated technologies, maintained in very different specifications. This separation is not an accident; it is by design. I know the degree of separation between HTML and the DOM is a contentious point.

"I don't see [DOM Clobbering] as a valid security concern in this case"

It is for sure a valid security concern when doing client-side XSS filtering, which is what the presentation is about. And no, DOM Clobbering does not require an attacker to "have access to the page in your user's security context". Fastmail have an introduction here: https://fastmail.blog/2015/12/20/sanitising-html-the-dom-clo.... Simply put, there's no way to do safe client-side XSS filtering without addressing DOM Clobbering as a valid security concern.

"hardening the DOM won't provide the necessary solution."

And the author is not suggesting or waiting for that. On the contrary, the premise is that XSS sanitizers need to be client-side exactly because the DOM is not hardened and has so many different implementations (even across browser versions). It's counter-intuitive I know, but server-side XSS sanitizers really can't address cross-browser parser differences safely. So again, it's not a question of "stylistic concerns" or "code management" but of doing secure XSS filtering wherever it is best done.

"There are better ways to solve for this."

And if you go on to the next slide, 28, the point is that despite the difficulties, this has been solved in DOMPurify, which should be added to the DOM so that developers can finally have a first-class client-side XSS sanitizer, without having to trust DOMPurify as third-party code.

There are not many people who know more about client-side XSS filtering than Mario Heiderich. And I know of no better client-side solution than DOMPurify.

How does the security aspect of DOM clobbering occur without injecting malicious code into a page?

Again, see Fastmail's introduction: https://fastmail.blog/2015/12/20/sanitising-html-the-dom-clo...

No code injection is required. DOM Clobbering simply presents an ambiguous view of the content being sanitized.
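A minimal sketch of the trick: no script anywhere, just markup whose names shadow DOM APIs a naive sanitizer might rely on (the specific property names are classic examples, not from the slides).

```html
<!-- HTMLFormElement resolves named child controls before built-in
     properties, so these inputs shadow real DOM properties of the form: -->
<form id="f">
  <input name="attributes">
  <input name="parentNode">
</form>
<!-- After parsing, f.attributes is the first <input> rather than the
     form's attribute map, and f.parentNode is the second <input>.
     A sanitizer walking the tree through those properties now sees an
     attacker-controlled view of the document. -->
```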

The article makes some false assumptions. My first job out of college was writing HTML in email. HTML embedded in email, presented in webmail, was the toughest. That was learning CSS through the school of hard knocks, particularly when IE7 was released with a different box model.

Again, the problem here is injection, specifically HTTP injection. Email doesn't have an injection problem because it has a more robust protocol: RFC 2821, RFC 2822 and their descendants. To make emails pretty, somebody had the really bad idea of embedding HTML in email messaging. HTML is reliant on the simplified architecture of the HTTP protocol: when you want that pretty content in email, you make an HTTP request and some server issues a response.

If they simply took the HTML out of email, this security problem would be instantly solved for email. Therefore this isn't an email problem. It isn't even an HTML problem. It's a problem of unregulated HTTP requests.

> HTML is essentially just a serialisation format for the Document Object Model (DOM)

They are separate things.

I can speak to all of this with confidence. I passed the Security+, CASP, and CISSP exams on the first try just from reading a book. I did security for the military for 10 years, have been developing web technologies for 20 years, and have been writing JavaScript/TypeScript for more than a decade.

The real problem is that lazy developers are punishing their users under pressure from business marketing leaders. There are two simple solutions to this problem:

1. Don't do stupid things that punish your users.

2. Create a web standard ACL that limits all HTTP traffic to/from a browser.

These are both sane and simple solutions. Nobody wants them, because bad developers don't want to own the liability for implementing somebody else's (probably a marketing executive's) bad decisions. Also, an ACL standard in the browser would kill the web media business.
