
Trusted Types Help Prevent Cross-Site Scripting - spankalee
https://developers.google.com/web/updates/2019/02/trusted-types
======
comex
I’m a little skeptical. The example in the post, where you validate a URL
against a regex before string-interpolating it into an HTML fragment, is
essentially an anti-pattern. It’s too easy to screw up the regex and end up
with an injection vector, especially in cases (probably the majority of them)
where the thing being validated is less inherently constrained than
“alphanumeric”. Instead, it’s best to use APIs that are safe by construction -
in this case, using DOM APIs to create elements, without ever going through
the intermediate string representation of HTML. See also, “Never sanitize your
inputs!”:

[http://blog.hackensplat.com/2013/09/never-sanitize-your-inpu...](http://blog.hackensplat.com/2013/09/never-sanitize-your-inputs.html?m=1)

But if you use DOM APIs for everything, this “Trusted Types” API seems largely
unnecessary. It would be enough to expose a switch to disable innerHTML and
similar APIs entirely.

On the other hand, DOM APIs are rather unergonomic to use raw. Many wrappers
exist, but React's JSX takes the cake by letting you write code that _looks_
like string interpolation, yet compiles down to type-safe node creation. (Kind
of like what parameterized queries do for SQL.) If we’re looking at browser-
based approaches to solving XSS… how about standardizing something like JSX as
a built-in browser feature? That way it could be used by everyone, even those
who want to minimize dependencies or code directly for the browser.

(Yes, I know it’s been tried before, in the form of E4X. But that was a very
different era…)
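
As a sketch of what such a built-in could feel like, here's a toy tagged
template (the names and escaping rules are mine for illustration, not any
proposed standard): literal chunks pass through as trusted markup, and every
interpolated value gets escaped, so the injection in the example below is
neutralized automatically.

```javascript
// Toy escaping helper — illustrative only, not a production sanitizer.
function escapeHTML(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Tagged template: literal chunks are kept, interpolations are escaped.
function html(strings, ...values) {
  return strings.reduce((out, chunk, i) => out + escapeHTML(values[i - 1]) + chunk);
}

const userInput = '<img src=x onerror=alert(1)>';
const markup = html`<p>${userInput}</p>`;
// markup === '<p>&lt;img src=x onerror=alert(1)&gt;</p>'
```

Real libraries in this space (lit-html, hyperHTML) go further and build DOM
nodes instead of strings, which is closer to what a standardized version would
presumably do.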

~~~
rictic
Trusted Types are definitely designed with safe-by-construction APIs in mind.
A primary use case is to write a safe-by-construction library that internally
declares and uses a Trusted Types Policy that your CSP headers declare that
they trust.

If none of the policies in your CSP headers declare a createHTML method, then
you can be confident that innerHTML can't be used anywhere in your app.

You still want the other policy methods because there are other unsafe sinks
in the DOM. For example:

    
    
        const scriptElem = document.createElement('script');
        scriptElem.src = someUntrustedInput;
        document.body.appendChild(scriptElem); // arbitrary code execution!
    

There's a number of these sinks, and they have legit important use cases, but
you want to be able to sustainably review all such uses. For example, you
could make a Trusted Types policy that will only accept a small number of
constants for script urls. That way you can still create script elements to
e.g. implement lazy loading of code, but you're certain that those APIs will
not be used by an attacker to load unknown code.
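
A sketch of what that constants-only policy could look like (the policy name,
URLs, and helper function here are made up for illustration):

```javascript
// Hypothetical allowlist: the only script URLs this app may ever load.
const ALLOWED_SCRIPT_URLS = new Set([
  '/static/lazy-feature.js',
  '/static/editor.js',
]);

// The check the policy runs for every script-URL sink: unknown URLs throw.
function createScriptURL(url) {
  if (!ALLOWED_SCRIPT_URLS.has(url)) {
    throw new TypeError(`Blocked script URL: ${url}`);
  }
  return url;
}

// In a browser with Trusted Types enforced (and this policy name listed in
// the page's CSP headers), the policy would be registered like so:
if (globalThis.trustedTypes) {
  const policy = trustedTypes.createPolicy('static-scripts', { createScriptURL });
  // scriptElem.src = policy.createScriptURL('/static/lazy-feature.js'); // OK
  // scriptElem.src = policy.createScriptURL(someUntrustedInput);        // throws
}
```

The point is that the review surface shrinks to this one small function: audit
the allowlist and you've audited every script-creating sink in the app.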

------
jrockway
I am surprised more programming language research hasn't focused on problems
like this. Perl had taint mode back in the day (presumably it still exists),
but it didn't quite do enough to really be helpful. I am glad to see this idea
resurfacing because I think it can solve a lot of problems, not just security-
related.

A long time ago, I remember people having an insane amount of trouble with
character handling; when you read binary data from a TCP socket or UNIX file,
you're reading bytes, not characters. But many people would treat the bytes
like characters, causing all sorts of trouble. My favorite was the double-
encoding, where you read UTF-8 encoded characters as bytes, treat the bytes as
Latin-1, then encode the Latin-1 characters as UTF-8. This was a perl quirk
because Latin-1 was the default, but the same bug happens in other languages.
Anyway, a good tainting system could prevent this sort of bug. The language
can say "hey, this is a TCP socket, you can't treat those bytes as
characters!" But it doesn't. And the bug occurs again and again.
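
The double-encoding bug is easy to reproduce outside Perl too; here is the
same mistake in Node, with Buffer standing in for the raw socket bytes:

```javascript
// 'é' encoded as UTF-8 is the two bytes 0xC3 0xA9.
const rawBytes = Buffer.from('é', 'utf8');

// Mistake: treat those raw bytes as Latin-1 characters...
const misread = rawBytes.toString('latin1'); // 'Ã©'

// ...then re-encode the mangled string as UTF-8: four bytes instead of two.
const doubled = Buffer.from(misread, 'utf8');

console.log(doubled.length); // 4 — the classic mojibake double-encoding
```

A tainting system that distinguished "bytes off the wire" from "decoded text"
would reject the `toString('latin1')` step unless you asserted the encoding
explicitly.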

(The corner cases that people don't think about are the real problems. What
charset are those bytes in your URL? What about filenames? The answer is: it's
often undefined. So rather than hope for the best, a compiler error would be
ideal.)

The state of the art, as far as I can tell, is to just treat everything as
UTF-8 these days. Since everyone seems to love UTF-8, it just works. Maybe
that was the real solution. But I know there are a lot of Japanese-speakers
with names that can't be encoded as UTF-8. I wonder what they're doing about
that.

~~~
Sohcahtoa82
I feel like Python 3 has largely resolved the bytes vs characters problem.
Byte arrays and strings are different classes in Py3. File and socket I/O
deals only in byte arrays which are treated more like lists than strings. To
perform a lot of string-like operations, you have to explicitly decode the
byte array into a string. If you get it wrong, you'll likely get either a
TypeError or a UnicodeDecodeError.

~~~
jrockway
This is good, but I'd prefer the error to occur at compile time. Consider the
case where you do something like creating an error message because you can't
open a file, and want to include the filename. The filename is bytes, the
error message is a string, so instead of being able to print the error
message, you instead throw an undebuggable exception.

Dunno if that in particular is a problem in Python or not... but it is the
sort of thing to watch out for.

~~~
Sohcahtoa82
In Python, filenames can be either bytes or strings.

Also, I made a mistake in my previous post. File I/O can deal with bytes OR
strings depending on the parameters sent to the `open()` function.

    
    
      file_handle = open('somefile.txt')
    

This opens the file in text mode (so `file_handle.read()` returns strings)
using a platform-dependent encoding. On Windows, this will probably be
CP-1252. On Linux, UTF-8. Of course, you can always explicitly choose the
encoding:

    
    
      file_handle = open('somefile.txt', encoding='utf8')
    

If you know you want to only deal with bytes, you can explicitly set that:

    
    
      file_handle = open('somefile.ext', mode='rb')
    

I agree that it'd be nice to be able to see bytes vs strings errors at compile
time, but this is difficult or even impossible with a duck-typed interpreted
language like Python. IDEs and linters can make a good attempt at it, but
they're not perfect.

------
rubbingalcohol
How about just don't use innerHTML? It's slow, insecure and lazy. We don't
need Google pushing more proprietary fake web standards in their IE6 browser.
They should fix the existing issues in CSP (like WebAssembly being completely
broken) before adding new crap to it.

~~~
arkadiyt
There's a lot more XSS sinks than innerHTML, and "just don't do it" isn't
helpful security engineering. Mike Samuel published a great post with lots of
context on the design of Trusted Types at Google and why they think it works
well:

[https://github.com/w3c/webappsec-trusted-types/wiki/design-h...](https://github.com/w3c/webappsec-trusted-types/wiki/design-history)

~~~
rubbingalcohol
It's a lot of handwaving. So Google improperly saved HTML into database string
fields and now needs to figure out how to safely render it in a template. We
don't need a new web "standard" to help them wallpaper over their first-party
bugs.

For the longest time Firefox add-on developers were prohibited from submitting
extensions with eval or innerHTML precisely because it is ~not safe~! Adding a
bunch of browser-enforced regex checks to your strings is the wrong solution
here. The solution is to not write code that writes itself.

~~~
jamesgeck0
> So Google improperly saved HTML into database string fields and now needs to
> figure out how to safely render it in a template.

FWIW, "store anything, sanitize at render time" is the preferred approach for
some popular web frameworks, including Rails.

~~~
rubbingalcohol
And that's fine. To the extent that stored data represents a security risk,
sanitization can and should be done on the server side.

------
zer0faith
Am I reading this correctly... creating a regexp, introducing it into a
template, then applying that template to a value in the web request?

~~~
stevekemp
Pretty much. Interestingly it is a very similar approach to Perl's "taint
mode". The intention is obviously that you can't blindly use/output values
that are user-provided. Instead you must validate, and convert them to a
"trusted type", at which point you can use them in your DOM tree, or wherever
you wish.

The big difference is that if you forget a place here you'll get a type error
- rather than the current situation where if you forget to validate/sanitize
you get an XSS attack.
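
That "type error instead of XSS" behavior can be modeled in a few lines (this
is a toy stand-in for the mechanism, not the real `trustedTypes` API):

```javascript
// Wrapper type: the only way to get one is through a vetting policy.
class TrustedHTML {
  constructor(value) { this.value = value; }
}

// A vetting step — here just escaping '<', standing in for real validation.
function vet(untrusted) {
  return new TrustedHTML(String(untrusted).replace(/</g, '&lt;'));
}

// Stand-in for a DOM sink like innerHTML under Trusted Types enforcement:
// plain strings are rejected loudly instead of silently injected.
function setInnerHTML(element, input) {
  if (!(input instanceof TrustedHTML)) {
    throw new TypeError('innerHTML requires TrustedHTML');
  }
  element.html = input.value;
}

const el = {};
setInnerHTML(el, vet('<script>'));   // fine: el.html === '&lt;script>'
// setInnerHTML(el, '<script>');     // would throw a TypeError
```

Forgetting the `vet()` call fails at the sink every time, which is exactly the
taint-mode-style guarantee described above.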

