

Solve XSS by signing SCRIPT tags - jgrahamc
http://blog.jgc.org/2009/09/solving-xss-problem-by-signing-tags.html

======
tptacek
What about attribute-based XSS? How do you sign "onmouseover"?

What about dynamically-generated scripts that include user input? People do
this all the time in the real world. Even if you sign it, you're still screwed
if you get quoting wrong.

The crypto is also superfluous; you'd get the same protection by having the
server set a long random nonce in a header, and then require every <script>
element to bear the same nonce. No crypto required.
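
A rough sketch of the nonce idea in Python, to make the mechanics concrete. The header name `X-Script-Nonce`, the `nonce` attribute, and the `ScriptNonceChecker` class are all hypothetical; the checker only stands in for the test a browser would have to perform, and `render_page` deliberately echoes user input unescaped to simulate an injection bug.

```python
import secrets
from html.parser import HTMLParser


def new_nonce():
    # A fresh, unpredictable value for every page render.
    return secrets.token_urlsafe(32)


def response_headers(nonce):
    # Hypothetical header name; the thread does not specify one.
    return {"X-Script-Nonce": nonce}


def render_page(nonce, user_comment):
    # user_comment is inserted unescaped on purpose, to simulate an XSS hole.
    return (
        f'<script nonce="{nonce}">init()</script>'
        f"<div>{user_comment}</div>"
    )


class ScriptNonceChecker(HTMLParser):
    """Stand-in for the browser-side check on each <script> element."""

    def __init__(self, expected_nonce):
        super().__init__()
        self.expected = expected_nonce
        self.allowed = 0
        self.blocked = 0

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        if dict(attrs).get("nonce") == self.expected:
            self.allowed += 1
        else:
            self.blocked += 1


def check(page, nonce):
    checker = ScriptNonceChecker(nonce)
    checker.feed(page)
    checker.close()
    return checker.allowed, checker.blocked


nonce = new_nonce()
page = render_page(nonce, "<script>steal()</script>")
print(check(page, nonce))  # (1, 1): the site's script runs, the injected one does not
```

An input stored during an earlier render cannot carry the right value either, because each render gets its own nonce.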

~~~
jgrahamc
Clearly you ban onmouseover etc. inline. These have to be set inside a
<script>. Agree about the user input part.

How do you securely tell the browser what the nonce is to check?

~~~
tptacek
It doesn't matter how secure the nonce is. All that matters is that an
attacker can't predict what it'll be on each page render (just like a CSRF
token), which means they can't craft an input that'll render as Javascript ---
and, in particular, they can't ever expect to store such an input and have it
render as JS.

I'm not endorsing "script tokens" either --- I think it's a bad idea to change
all browsers for a half-measure --- but I think you can execute your core idea
far more simply than with crypto.

------
geal
XSS isn't only about SCRIPT tags. What about javascript URIs, IMG and IFRAME
tags, flash elements that you can add in a page by XSS? Filtering what is
shown to the user is still the best solution IMHO.

~~~
joe_the_user
Could you explain what you mean by this?

Do you mean filtering what is sent to the browser - validating libraries so
user input cannot generate URIs, IMGs, etc?

~~~
mike-cardwell
This is a hard problem. There are all sorts of places you can hide JS. My
favourite is inside a data uri in an iframe. Eg:

<iframe
src="data:text/html;base64,PHNjcmlwdD5hbGVydCgnRk9PQkFSJyk8L3NjcmlwdD4="></iframe>
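
To see what that iframe actually carries, the payload can be decoded with a few lines of Python. The helper name `decode_data_uri` is made up for this illustration, and it only handles the base64 form used above.

```python
import base64


def decode_data_uri(uri):
    """Split a base64 data: URI into (media type, decoded text)."""
    if not uri.startswith("data:"):
        raise ValueError("not a data: URI")
    header, _, payload = uri[len("data:"):].partition(",")
    media_type, _, encoding = header.partition(";")
    if encoding != "base64":
        raise ValueError("only base64 payloads are handled in this sketch")
    return media_type, base64.b64decode(payload).decode("utf-8")


uri = "data:text/html;base64,PHNjcmlwdD5hbGVydCgnRk9PQkFSJyk8L3NjcmlwdD4="
media_type, body = decode_data_uri(uri)
print(media_type)  # text/html
print(body)        # <script>alert('FOOBAR')</script>
```

A filter that only looks for a literal "<script" in the markup never sees the tag, because it only exists after decoding.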

------
chime
Thanks John for posting this as a main thread of its own. I noticed that your
comment was lost in a flood of other comments in the YT thread. I don't know
if you saw my comment, but I mentioned that your solution, while technically
very solid, requires a lot more work for everyone.

Why wouldn't this work: Have a meta-tag (or something in the <head>) that
says: do not allow in-line script tags on this HTML page. Only run scripts
from external .js files. No onclick/onmouse code allowed in HTML. For
additional security (I don't know if this is possible or not), external .js
files cannot do document.write("<script>...") or
$("#foo").innerHTML("<script>...") i.e. if external .js file makes an Ajax
call and document.write() the response, the response should be handled as non-
executable. Moreover, only external .js files can attach events and that too,
using attachEvent etc. and not via the on* attributes.
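
As a rough illustration of the "external scripts only" rule, here is a Python sketch that audits a page for the two things the rule would forbid: `<script>` elements without a `src`, and `on*` attributes. The `InlineScriptAuditor` class is hypothetical and runs outside the browser; the proposal itself would need the browser to enforce this, and the sketch says nothing about `document.write()` or `innerHTML`.

```python
from html.parser import HTMLParser


class InlineScriptAuditor(HTMLParser):
    """Collects markup that an 'external scripts only' rule would reject."""

    def __init__(self):
        super().__init__()
        self.violations = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        attr_map = dict(attrs)
        if tag == "script" and "src" not in attr_map:
            self.violations.append("inline <script>")
        for name in attr_map:
            if name.startswith("on"):
                self.violations.append(f"{name} attribute on <{tag}>")


def audit(page):
    auditor = InlineScriptAuditor()
    auditor.feed(page)
    auditor.close()
    return auditor.violations


print(audit('<script src="app.js"></script><p>hi</p>'))
# []
print(audit('<a onclick="x()">y</a><script>z()</script>'))
# ['onclick attribute on <a>', 'inline <script>']
```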

------
billpg
1. Browsers can only run JS inside application/javascript files.

2. Browsers will ignore any 'on' attributes.

3. HTML may link to JS with one or more LINK tags.

4. JS may assign 'on' attribute event handlers to individual HTML tags.

5. JS will have access to only the cookies of domain hosting the JS file.

There's still a risk of bad LINK tags creeping in, but the code will only be
able to see the cookies of the domain hosting the dodgy JS.

------
keefe
So, I've been using this guide as the gold standard:
http://www.owasp.org/index.php/XSS_%28Cross_Site_Scripting%29_Prevention_Cheat_Sheet

What it seems to boil down to is: sanitize user input.

In what circumstances is this not sufficient?

~~~
Groxx
When your sanitizer fails against a previously-untested kind of input, such as
the recent problems on YouTube. As audiodude on reddit[1] points out, all you
needed was "<script><script>PAYLOAD" to break their sanitizer. I'm honestly
surprised this wasn't discovered sooner; odds are someone just got lucky.
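
A toy Python example of how that kind of input can slip through. The `naive_sanitize` function is a hypothetical filter invented for this illustration, not YouTube's actual code; it assumes a filter that neutralizes only the first `<script>` it meets, and compares it with escaping every character of the input.

```python
import html


def naive_sanitize(text):
    # Hypothetical flawed filter: neutralizes only the first "<script>".
    return text.replace("<script>", "&lt;script&gt;", 1)


def escape_all(text):
    # Entity-encode everything, so no markup from the input survives.
    return html.escape(text)


payload = "<script><script>PAYLOAD"
print(naive_sanitize(payload))  # &lt;script&gt;<script>PAYLOAD
print(escape_all(payload))      # &lt;script&gt;&lt;script&gt;PAYLOAD
```

The second tag survives the naive filter untouched, which is the "previously-untested kind of input" problem in miniature.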

But yes. Sanitizing user input (assuming a perfect sanitizer) _can_ assure
perfect XSS protection, but at a rather severe loss of features (no images /
URLs / formatting in your comments). A separate, limited DSL is often the
solution for this (BBCode, Markdown, etc), but those sometimes make mistakes
too, especially when they're younger.

[1]: http://www.reddit.com/r/programming/comments/cluc5/html_injection_vulnerability_in_youtube_comments/

~~~
ashearer
If you're going with a DSL anyway, making that language happen to look like a
strictly-defined subset of HTML doesn't necessarily reduce its security (plus
it allows for a WYSIWYG editor option). But too often what happens next on the
back end is an attempt to "sanitize" that input in place and output whatever's
left, rather than to fully parse it. That just leads to an arms race between
sanitizers and XSS exploiters.

Instead, parse the HTML just as you'd have to with BBCode or Markdown. Store
it in an internal representation that's only capable of the minimum needed
formatting features. (There's no actual HTML left at this stage. It's
equivalent to any other DSL.) Then render HTML out of that parse tree, so that
every bit of user input is HTML entity-encoded, and everything else (tags and
attributes) comes from constants in the program.
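
A minimal Python sketch of that parse-then-render approach, using only the standard library. The `ALLOWED_TAGS` whitelist, the `Node` class, and the function names are assumptions made for this example; a real system would pick its own feature set. Attributes are never stored, so nothing like `onclick` can reach the output.

```python
from html import escape
from html.parser import HTMLParser

# Assumed whitelist for illustration: the only formatting the tree can express.
ALLOWED_TAGS = {"b", "i", "em", "strong"}


class Node:
    def __init__(self, tag=None, text=None):
        self.tag = tag        # one of ALLOWED_TAGS, or None for root/text
        self.text = text      # raw user text, or None for element nodes
        self.children = []


class SubsetParser(HTMLParser):
    """Builds a tree holding only whitelisted tags and plain text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.root = Node()
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            node = Node(tag=tag)          # attributes are dropped entirely
            self.stack[-1].children.append(node)
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Only close the innermost open element; stray end tags are ignored.
        if len(self.stack) > 1 and self.stack[-1].tag == tag:
            self.stack.pop()

    def handle_data(self, data):
        self.stack[-1].children.append(Node(text=data))


def render(node):
    if node.text is not None:
        return escape(node.text)          # user text is always entity-encoded
    inner = "".join(render(child) for child in node.children)
    if node.tag is None:
        return inner
    # Tag names come from the whitelist constants, never from the input.
    return f"<{node.tag}>{inner}</{node.tag}>"


def clean(user_html):
    parser = SubsetParser()
    parser.feed(user_html)
    parser.close()
    return render(parser.root)


print(clean('<b onclick="evil()">hi</b><script>alert(1)</script>'))
# <b>hi</b>alert(1)
```

The script's body survives only as inert text, and unclosed tags are closed by the renderer because the output is generated from the tree rather than copied from the input.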

This can be even more secure than regexp-based DSL translators that build up a
result from input in multiple passes, since they tend to lack such a well-
defined separation between input text and output HTML.

~~~
keefe
This makes perfect sense and it's a cogent explanation besides.

Do you know of good libraries for this type of work off the top of your head?
Bonus points for native Java.

