This bug probably existed because some developer thought "this is an internal application, I don't need to apply the same rigorous input/(edit: and output, as replies point out) sanitation as I do with normal sites because it's only accessible by VPN."

As a consultant who gets to see a lot of "internal only" applications, this is one of the misconceptions that my coworkers and I try to fight against. XSS is effective even if the attacker doesn't have access to the internal application, because it's not the attacker's computer making the requests; it's the browser of an employee who does have access.

Output sanitization is what you want to bet on. Only your website / app knows where a piece of data will be displayed, so that is when you should apply appropriate encoding of the output stream.
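A minimal Python sketch of that idea, assuming a hypothetical comment store: the raw string is saved untouched, and each output path applies the encoding for its own destination (`html.escape` from the standard library for an HTML page, nothing for a plain-text email).

```python
import html

# Hypothetical in-memory store: values are kept exactly as the user typed them.
comments: list[str] = []

def save_comment(text: str) -> None:
    comments.append(text)  # no encoding on input

def render_html(text: str) -> str:
    # Hypothetical HTML output path: encode at output time for an HTML text node.
    return f"<p>{html.escape(text)}</p>"

def render_plain_text_email(text: str) -> str:
    # Hypothetical plain-text output path: no markup, so the raw value is correct.
    return text

save_comment("<img src=x onerror=alert(1)> O'Brien & co")
print(render_html(comments[0]))
print(render_plain_text_email(comments[0]))
```

The same stored value serves both outputs correctly because neither format's encoding was baked in at input time.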

Yup. React does this by default for everything you render. You have to override it with “dangerouslySetInnerHTML” if you don’t want it.

As have done other frameworks for quite some time, btw.

As other frameworks have done for quite some time* :)

As done framework time some other.

Actually, input might be the better option as one rarely needs to accept HTML or such special characters.

It's also more common to display or use data than to store it, so there aren't that many places where you can fail if you just convert the input before storing it.

It's nice to be able to trust all data coming from the server.

Trust but verify. The only correct option is to do both.

Can you (or another commenter) give an illustration of this sort of output sanitation?

The term you probably want to look for in your web framework is "encoding".

I don't like "sanitization" personally, because it sounds like you're removing "bad stuff", but in general, "bad stuff" is not identifiable or removable because "bad stuff" is highly context-dependent, plus a lot of times the "bad stuff" is perfectly legitimate [1]. Apostrophes are "bad stuff", because they can break out of SQL queries and HTML tags, but they are also parts of people's names. Double-quotes are "bad things", but they are legitimately part of all sorts of real data. Any "sanitize(string)" function is by definition wrong because it has no place for a context to go, and it will do bad things to your data.

One of the items on my short checklist for examining an HTML templating language is "Does the simplest possible way to dump out a string to the user at least HTML-encode the value?" That is,

    x = "<>"
    template = compileTemplate("{x}")
    template.Dump({x: x})
for whatever the simplest output of a value is, should output "&lt;&gt;"; if it outputs "<>", you've got a templating language that you ARE going to write XSS attacks in, no matter how careful you are. The time you want to dump out non-encoded text is the exception, not the rule.
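To make the checklist item concrete, here's a toy Python renderer (hypothetical, not any real templating library) that passes the test: a plain `{x}` placeholder is HTML-encoded, and unencoded output requires explicitly wrapping the value in a made-up `Raw` marker. For contrast, the standard library's `string.Template` does no encoding at all and fails the check.

```python
import html
import re
from string import Template

class Raw(str):
    """Hypothetical marker type: the caller asserts this value is already safe HTML."""

def render(template: str, values: dict[str, object]) -> str:
    # Toy renderer: replace each {name}; HTML-encode unless the value is Raw.
    def substitute(match: re.Match) -> str:
        value = values[match.group(1)]
        if isinstance(value, Raw):
            return str(value)  # explicit, greppable opt-out
        return html.escape(str(value))
    return re.sub(r"\{(\w+)\}", substitute, template)

x = "<>"
print(render("{x}", {"x": x}))           # passes the check: &lt;&gt;
print(render("{x}", {"x": Raw("<b>")}))  # explicit opt-out: <b>
print(Template("$x").substitute(x=x))    # fails the check: <>
```

The design point is that the safe path is the short one, and the dangerous path has a name you can search for in code review.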

Bonus points for being even more aware of the context and correctly encoding things in a JavaScript context vs. an HTML context, etc. This isn't a magic wand that fixes everything, but in general, if it does default to a blind HTML-encode, it at least means that if you screw up the encoding, the result is the user seeing some ugly stuff on their screen, like &lt;, rather than a security failure.
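As a rough illustration of why the context matters, this Python sketch encodes the same value two ways: `html.escape` for an HTML text node, and a JSON string literal with `<`, `>` and `&` turned into `\u` escapes for a value embedded in an inline `<script>` block. The helper names are hypothetical; HTML entities would be the wrong encoding inside a script, because script content isn't entity-decoded.

```python
import html
import json

def encode_for_html_text(value: str) -> str:
    # Hypothetical helper: HTML text-node context.
    return html.escape(value)

def encode_for_inline_script(value: str) -> str:
    # Hypothetical helper: JavaScript string inside <script>...</script>.
    # json.dumps yields a valid JS string literal; escaping <, > and & keeps
    # a value like "</script>" from closing the surrounding tag.
    literal = json.dumps(value)
    return (literal.replace("<", "\\u003c")
                   .replace(">", "\\u003e")
                   .replace("&", "\\u0026"))

name = "</script><script>alert(1)</script>"
print(f"<p>{encode_for_html_text(name)}</p>")
print(f"<script>const name = {encode_for_inline_script(name)};</script>")
```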

[1]: Although, technically, I think it's acceptable for an HTML encoding function to just eliminate the ASCII control characters other than newline, carriage return, and tab, rather than encode them. Those are just asking for trouble, even if you encode them. Especially NUL. Even in 2019, best to keep NULs out of places they don't belong.
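A sketch of the footnote's suggestion, assuming you're writing your own HTML-encoding helper (the name `html_encode` is hypothetical): drop ASCII control characters other than tab, newline, and carriage return, then HTML-encode what remains. DEL (0x7F) is included because it is also an ASCII control character.

```python
import html
import re

# ASCII control characters (0x00-0x1F and 0x7F) except tab, LF, and CR.
_CONTROL_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f]")

def html_encode(value: str) -> str:
    # Hypothetical helper: remove control characters outright, then encode.
    return html.escape(_CONTROL_CHARS.sub("", value))

print(repr(html_encode("a\x00b\tc<d>\r\n")))  # 'ab\tc&lt;d&gt;\r\n'
```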

Exactly, sanitization is a misnomer. If you are concatenating plain text together with HTML, then you have an app which is functionally broken when someone with an apostrophe in their name tries to use it; it's not just a matter of security. The strings must be in the same format (i.e., both valid HTML fragments) before you concatenate them, or the result will be unparsable garbage.

And the idea of "sanitization at input" is especially ridiculous: how can you know what you will be concatenating that input with until you actually do it? Is it being inserted into some HTML? Is it going in an attribute value or a text node? What about outputting JSON?
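To illustrate with Python's standard library: the same input needs a different encoding depending on where it ends up, which is exactly what an input-time sanitizer can't know. The variable names are made up for the example.

```python
import html
import json

value = 'Tom "the Cat" <O\'Brien> & Jerry'

# Text node: <, > and & must be encoded; quotes are harmless here.
text_node = f"<span>{html.escape(value, quote=False)}</span>"

# Quoted attribute value: quotes must be encoded too, or the value can
# break out of the attribute. Always quote the attribute.
attribute = f'<input value="{html.escape(value, quote=True)}">'

# JSON response: HTML entities would corrupt the data; JSON has its own escaping.
json_body = json.dumps({"name": value})

print(text_node)
print(attribute)
print(json_body)
```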


This is why we typically speak about defense in depth. Input sanitization works best when applied to known expected inputs, like a phone number or date of birth.
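For a narrowly shaped input like a date of birth, validation can be a strict allow-list rather than an attempt to strip out bad characters. A minimal Python sketch, assuming ISO `YYYY-MM-DD` input and an arbitrary plausibility window (both assumptions; `parse_date_of_birth` is a hypothetical name):

```python
import re
from datetime import date

_ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")

def parse_date_of_birth(raw: str) -> date:
    # Hypothetical validator: allow-list the exact shape first.
    if not _ISO_DATE.fullmatch(raw):
        raise ValueError("expected YYYY-MM-DD")
    dob = date.fromisoformat(raw)  # raises ValueError for e.g. 1990-02-30
    # Assumed plausibility window; adjust to your own requirements.
    if not date(1900, 1, 1) <= dob <= date.today():
        raise ValueError("date of birth out of range")
    return dob
```

This complements output encoding rather than replacing it: free-text fields like names can't be validated this tightly.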

Output encoding is the real solution, because we know where we intend any data to end up (this is how it’s displayed), so we can ensure that it’s in the correct format and that that format's parser won’t interpret it as code instead of data, i.e. an HTML attribute, HTML, JSON, JavaScript, etc.

If a user has a bracket character in any field, it's OK to allow it, as long as you don't render it directly in any HTML. You have to make sure that when you render it, you render it as `&lt;` or `&gt;`, which get displayed as `<` or `>` but aren't interpreted as HTML.

Correct. And one reason to properly format for output, rather than sanitise input, is that you do not know how the string might be used. I mean, you can sanitise for HTML output, but it won't cover output to a shell command (e.g. when you pass the string as a parameter to a tool via --vehicle-name=). Thus input is to be stored as is, and NEVER trusted, even if some input sources "sanitise" it.
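For the shell case, the output encoding is shell quoting, or better, not going through a shell at all. A Python sketch, where `vehicle-tool` is a hypothetical program and `--vehicle-name=` is the flag from the comment above:

```python
import shlex

def build_argv(vehicle_name: str) -> list[str]:
    # Preferred: an argument list for subprocess.run() without shell=True.
    # No shell parses it, so the name arrives as a single argv entry.
    # "vehicle-tool" is a hypothetical program name.
    return ["vehicle-tool", f"--vehicle-name={vehicle_name}"]

def build_shell_command(vehicle_name: str) -> str:
    # Fallback when a shell string is unavoidable: quote for POSIX shells.
    return f"vehicle-tool --vehicle-name={shlex.quote(vehicle_name)}"

name = "O'Brien; rm -rf ~"
print(build_argv(name))
print(build_shell_command(name))
```

HTML-encoding the name would do nothing here: the `;` passes through untouched and would end the command in a shell.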

This mistake is what causes the incredibly common HTML entities in plain-text emails, as well as in RSS article titles.
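A small Python illustration of how that happens, assuming a hypothetical pipeline that HTML-encodes on input and stores the result: the plain-text channel then shows the entity literally, and an HTML page that correctly encodes on output double-encodes it.

```python
import html

title_typed_by_user = "Tom & Jerry"

# Hypothetical pipeline: encoding on input bakes HTML into the stored data.
stored = html.escape(title_typed_by_user)

plain_text_email = f"Subject: {stored}"        # shows the entity literally
html_page = f"<h1>{html.escape(stored)}</h1>"  # double-encoded

print(plain_text_email)  # Subject: Tom &amp; Jerry
print(html_page)         # <h1>Tom &amp;amp; Jerry</h1>
```

A browser renders that heading as "Tom &amp; Jerry", which is the familiar symptom.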

Output formatting, example: "><script src="// becomes &quot;&gt;&lt;script src=&quot;// for the web. For other types of applications, where any of these characters have special meaning and might be interpreted, these might be formatted differently for output.
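Python's standard `html.escape` performs exactly this transformation for the web case (it also encodes `'` as `&#x27;` by default), which makes it easy to check:

```python
import html

payload = '"><script src="//'
encoded = html.escape(payload)  # quote=True by default, so " becomes &quot;
print(encoded)  # &quot;&gt;&lt;script src=&quot;//
```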

Hell, even in my past career supporting hardware network products, a lot of companies had (and still have) management ports that are vulnerable to all sorts of stuff. The industry-standard response from engineers was "well, that should be behind a firewall".

It's time we stop pretending the big bad internet is just "out there" because that's where it should be; it is everywhere.

Normally, it would not be the input to be sanitised, but rather the output properly formatted. It's easier to make sure that ANY type of input is shown properly, as opposed to eliminating SOME of the known issues.

Note that even if it's only accessible by VPN, attackers can still make HTTP requests to it: when an employee connected to the VPN visits attacker.com, a script on attacker.com can make the employee's browser send XHR calls to internalsite.com. The attacker can't read the response (unless there are other vulnerabilities), but if you don't have CSRF protection, the attacker can perform actions on the internal site.
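The usual defence is a per-session anti-CSRF token. Since the attacker's page can't read responses from the internal site, it can't learn the token to include in a forged request. A minimal Python sketch using only the standard library; the session dictionary and function names are hypothetical, and a real application should prefer its framework's built-in CSRF protection.

```python
import hmac
import secrets
from typing import Optional

def issue_csrf_token(session: dict) -> str:
    # Hypothetical helper: store a random token in the server-side session,
    # embed it in every form, and require it on state-changing requests.
    token = secrets.token_urlsafe(32)
    session["csrf_token"] = token
    return token

def is_valid_csrf(session: dict, submitted: Optional[str]) -> bool:
    # Hypothetical helper: reject requests with a missing or wrong token.
    expected = session.get("csrf_token")
    if not expected or not submitted:
        return False
    # Constant-time comparison; compare bytes so non-ASCII input can't raise.
    return hmac.compare_digest(expected.encode(), submitted.encode())

session: dict = {}
token = issue_csrf_token(session)
print(is_valid_csrf(session, token))  # True: our own form sent it back
print(is_valid_csrf(session, None))   # False: a forged request lacks the token
```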

Could just be because the application was written by a less experienced programmer, or even outsourced?

It could be, or it could even be that whatever process that brings code from development to production is less stringent on internal applications. Maybe people don't review the code as closely (or at all!), maybe they have fewer tests for internal code. "Internal only" applications almost universally have less scrutiny applied to them in my experience.

I've seen very experienced developers make mistakes with input/output sanitation.

I work as a contractor for a bank, and while investigating a small security issue reported by a third-party audit firm, we discovered that the clever, bytecode-weaving, autogenerated, declarative security had been overridden by someone who added his own, equally fancy security module directly in a parent project.

I cannot describe the shock when I realized what information an attacker could have gained during the 6 months the bug was active.

All of this code was written by experienced programmers; it's just that nobody ever wrote any tests to ensure the fancy security code was still in place.
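One way to catch that kind of silent override is a test that exercises the security behaviour itself instead of trusting the configuration. A hedged Python `unittest` sketch; the route table and `handle_request` function are hypothetical stand-ins for a real application's request pipeline.

```python
import unittest

# Hypothetical application: protected routes and the role each one requires.
PROTECTED_ROUTES = {"/accounts": "teller", "/admin": "admin"}

def handle_request(path: str, user_roles: set[str]) -> int:
    # Hypothetical stand-in for the real pipeline; returns an HTTP status code.
    required = PROTECTED_ROUTES.get(path)
    if required is not None and required not in user_roles:
        return 403
    return 200

class AccessControlRegressionTest(unittest.TestCase):
    def test_anonymous_user_is_denied_everywhere(self):
        for path in PROTECTED_ROUTES:
            with self.subTest(path=path):
                self.assertEqual(handle_request(path, set()), 403)

    def test_wrong_role_is_denied(self):
        self.assertEqual(handle_request("/admin", {"teller"}), 403)

    def test_correct_role_is_allowed(self):
        self.assertEqual(handle_request("/accounts", {"teller"}), 200)
```

Run it with `python -m unittest` against the real entry point; if someone replaces the security layer, these tests fail instead of the bug sitting there for months.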

Tesla and SpaceX are both pretty maniacal about not outsourcing programming, to my knowledge.

Interesting. Obviously they view it as a core competency. This would seem like a non-obvious and unnecessary expense to many, but (on the Tesla side) differentiates them from other automakers. Whether that results in a barrier to competition... we'll see.

Although, if you believe these anecdotes from a supposed ex-employee, then competency is not the word to use:


I only know of low-level tools like service meshes, RPC clients, event buses, and metric servers being open sourced. I’ve never seen internal applications open sourced. Do you have an example?

OP said outsourced, not open sourced.

This stuff should be taken care of by your web framework wherever possible.
