Hacker News new | past | comments | ask | show | jobs | submit login
[dupe] Go Language – Web Application Secure Coding Practices (gitbooks.io)
266 points by rbanffy on June 23, 2017 | hide | past | web | favorite | 50 comments

When I see anything that touches web coding practices for Go I always look up an area that I know best: sanitization.

I wrote https://github.com/microcosm-cc/bluemonday which is a pure Go HTML sanitizer inspired by https://github.com/owasp/java-html-sanitizer .

The key things to understand about HTML sanitizers:

* They must be whitelist based

* They must be aware of context

* You must sanitize ALL user input even if you don't think you're going to render it on a web page.

The book linked to in the article does not seem to understand any of the above.

The section on sanitization has the equivalent of "string replace" as the primary recommendation. Elsewhere in the XSS section a focus is on escaping content before it is rendered.

Sanitization needs to know not to run on <pre> blocks, and to escape HTML entities automatically, and to understand what links are safe and which are not.

XSS can be really interesting and quite targeted. It can be that a user-agent contains the XSS, because the target may not be the person reading the page but the admin looking at a web page of their web server logs through an analytics program on the same domain.

The bluemonday package I wrote can deal with all of these things but that isn't the point, the point is that this is an area I know and the book falls way short of a decent standard for creating a secure and safe web application. And if it falls short in this area (the first 2 chapters) then I would assume that it falls short in all areas.

> You must sanitize ALL user input even if you don't think you're going to render it on a web page.

I'm not able to make sense of this. Sanitize it for what context? SQL? JSON? HTML? Inclusion as a command-line argument? All of these, and hope that sanitizing it for one context doesn't un-sanitize it for others?

Context here means the context of the output page.

Usually this means the HTML context. Different sanitization is needed depending on _where_ in the HTML document the input is used.

For instance, if the input is used in between HTML tags (let's say $foo is user input in this PHP example):

    ... <body><?php echo $foo ?></body>
Here, the input that you need to transition to JavaScript execution is a < character (among other things): <script>alert(1)</script>.

Therefore, to correctly sanitize this, you would call the PHP `htmlentities` function:

    ... <body><?php echo htmlentities($foo) ?></body>
Now, this XSS vulnerability is fixed.

What if foo is used in a different context?

    ... <body><a href='<?php echo htmlentities($foo) ?>'>...
Here, what we need to transition the HTML parser to executing JavaScript is a ' character, and this can be exploited by the following input (in between the double quotes): "' onclick='alert(1)"

The key problem is that `htmlentities` is not valid sanitization in the context of an HTML attribute value. In this example, you need to use `urlencode`

    ... <body><a href='<?php echo urlencode($foo) ?>'>...
The general idea also applies to CSS, JSON, and JavaScript. SQL is a different vulnerability class (SQL injection).

I highly recommend the following research paper from 2011 that discusses the context-sensitivity of JavaScript in depth: http://www.comp.nus.edu.sg/~prateeks/papers/scriptgard-ccs11...

In my mind, the context-sensitivity of XSS is one of the key reasons why it is so prevalent.

I've always had issue with this advice for this very reason. Worse, when people sanitize in and out of the database. Double encoded html just cracks me up.

I'm a big fan of not necessarily sanitizing, but treating it appropriate in context. This may mean removing characters, or mapping them, or just delimiting the entire thing.

To that end, I argue that you should not necessarily sanitize on the way to the storage mechanism. You should only sanitize at the boundaries. So, a web view should make sure any strings are treated as strings. A database layer should make sure query parameters are not able to alter the query. Etc.

(All of this is trying to simply reinforce your point.)

Right. It should be SQL sanitized (parameterized) going into the database, and and then HTML/PDF/whatever Santosh when it is put in one of those formats.

Honest and earnest question for you (as bluemonday's author): Where should gophers draw the line between Go's standard html/template library for rendering "safe" HTML, vs. needing to use bluemonday? The question assumes that any and all user inputs are checked/validated at the point of receipt. Is bluemonday just the "on steriods" version (e.g. provides more granular control, whitelist of specific tags, etc)?

Perhaps it is more of a use-case question - I haven't worked on comment or wiki systems but that seems like it would be much harder to validate user inputs, and thus maybe that is where the line should be drawn? When user inputs are expressly expected to have markup?

If you're using the Go (text|html)/template and you are not rendering any user or untrusted input... i.e. this is 100% a little blog where you are provably the only one providing content... you're gold. No need to do much else.

If you accept any user input and this will be displayed or processed anywhere (be it in HTML fragments, input to SQL, etc)... then sanitize it and use bluemonday or any other sanitizer applicable to your content.

For user generated content, like comments on Hacker News, there is a pretty rich policy provided by bluemonday so you don't have to think about it and are going to be alright.

That looks like this:

    p := bluemonday.UGCPolicy()
    htmlOut := p.Sanitize(htmlIn)
If you have custom needs beyond "I accept UGC"... then I'm afraid you're going to have to think about your needs and how to express your own policy, but even then, you can use bluemonday's built in policies as a starting point.

Nice. I use bluemonday, and I also took a lot of inspiration from how microcosm was written. Just wanted to say thanks for the effort!

>You must sanitize ALL user input even if you don't think you're going to render it on a web page

Not really.

Yes, really. Otherwise, you must take extra care not to reflect any input data back in any response to the user, whether it's in the HTML body or not. See: HTTP response splitting.

No, not really.

In fact, you have it totally backwards: you're not supposed to sanitize all user input before storing it. Instead you're supposed to sanitize any user input before you output it back to your webpage.

Even more so: it's the output that dictates what sanitization you should perform, not the input. You don't do input sanitization for HTML (for XSS etc) when you store your data in your DB. Instead you should sanitize the input for SQL Injection issues. And similarly for whatever other output -- if you take user input and run a shell command, you should sanitize for shell safety, not run html sanitization.

Sanitization does not refer to escaping HTML entities before rendering to avoid cross site scripting. That is just escaping of entities done when rendering templates.

I believe by sanitization most people mean processing content which will be rendered and not escaped, a good example is content from WYSIWYG editors. And this is where sanitization libraries would come into play.

You would sanitize HTML fragments before storing them in database because you don't escape them during rendering. Text content is not sanitized before saving to database as you can just escape it when rendering.

You've said nothing that contradicts my post.

As long as the data is sanitized before it can affect the storage/transport mechanism for its content type, you're good.

> As long as the data is sanitized before it can affect the storage/transport mechanism for its content type, you're good.

No, not really. Storing the user's data as is is almost always of paramount importance. The fact that it may be output as HTML/XML/MarkDown/whatever means that it really is at output-time that you must sanitize/escape/quote.

That's why the moral of the Bobby Tables story isn't: "Oh, just remove all semicolons". It's "use prepared queries".

I don't disagree with sanitizing data at output time when it's clear that A) the input won't affect anything else and B) output is going to happen. But realize not all input winds up in a SQL database, not all input will be considered valid in all contexts, and not all input eventually becomes output.

Sometimes, data really does need to be sanitized at the point of submission. If you disagree, that's more of a point about application design than appsec.

> But realize not all input winds up in a SQL database, not all input will be considered valid in all contexts, and not all input eventually becomes output.

That was the point I was trying to make: Sanitizing input is fail-from-the-start. There's no way to know ahead of time what outputs you're going to be producing 5 years from now. Conclusion: Store all input exactly as received. (We can do that these days with form/url encodings and whatnot).

Ok, so now you have the data stored accurately.

Next step: You need to output to, let's say, HTML. Ok, so you just escape/quote everything appropriately and nobody gets hurt. If you just do the escaping/quoting properly there is no XSS attacks. It's really just that simple.

However, it is NOT about sanitizing at the "input" point. Do you get what I'm saying now?

(I realize that that sounds aggressive, but I really just want to force this point home. Please tell me if you disagree or find some detail in my explanation confusing. This is important for the security of the web and either I'm wrong or you're wrong or I didn't understand what you said. Let's figure out which is the case.)

[1] There are caveats here.

You're all saying the same thing.

I didn't specify whether the sanitize occurred on receiving user input or displaying it.

I only said, sanitize all user input.

> I didn't specify whether the sanitize occurred on receiving user input or displaying it.

I'm sorry, but you basically did. You said:

> You must sanitize ALL user input even if you don't think you're going to render it on a web page

Which implies that sanitizing input at display time, when you know you're rendering it to a web page, is too late. That's why people are jumping on you. Keeping a clean database is the absolute most important thing you can do. The database isn't contextual. The data it stores can find its way into HTML pages, REST responses, SQL queries, PDF reports, XML/JSON data exports and a ton of other formats. Each of these output formats will require a different form of sanitizing. Sanitizing before the data hits disk creates a nightmare for anyone displaying the data in a context other than the sanitization that was performed. So what you said originally is precisely incorrect. Only sanitize input when you know it's going to be rendered to a webpage. Otherwise, leave it alone.

Now, you should be using view-layer frameworks to make that sanitization easy, automatic and the default action. When rendering to HTML, the templating language should sanitize by default and give a way for template authors to opt-out when they know the data did not come from user input. Likewise, in the SQL context, prepared statements also make it easy for the developer to do the right thing. But at no point are you speculatively sanitizing all user input. You're getting user input to disk in as pristine a format as possible and sanitizing contextually depending on how the data is outputted.

Also really it should be less about sanitising, more about escaping. If possible, the input should be stored essentially as-is.

Last time this was discussed: https://news.ycombinator.com/item?id=14192383

Still not convinced about the usefulness of the source, given the criticisms raised here and there.

This guide still has some issues. It's missing common classes of web app vulns I've seen in Go code (e.g. CSRF, SSRF) and has some weird advice here and there (scan uploaded files with AV? Really?)

Fail to scan with AV and you might be an unwitting malware distributor.

If you allow binary uploads, you're going to be a malware distributor whether you scan or not. AV just introduces complexity and attack surface and doesn't really belong in a guide about Golang secure coding practices.

Under Validation and Storage > Storing password securely: the theory

That code -- using sha256 to hash a password+salt, and store it in a db -- should not be there. Someone WILL copy paste it.

Don't make it easy for people to do something stupid.

To be fair to the author, the paragraph immediately following that code snippet says:

"However, this approach has several flaws and should not be used. It is given here only to illustrate the theory with a practical example. The next section explains how to correctly salt passwords in real life."

and goes on to use bcrypt.

The fear is that people won't read that or just use the SHA256 example because it's before the bcrypt one, or maybe because they find it runs faster.

IMHO, I think examples of dangerous code should implement one of the following mindless-script-kiddie-consultant defenses (in order of most sane to least sane):

1. use screenshots of the snippet

2. use pseudo-code

3. put in deliberate syntax mistakes

4. have a javascript alert on right-click or ctrl/cmd+c with the warning about it being unsafe (a.k.a. going full 90s)

Yes the problem with "unsafe" examples is that people will use them -- perhaps thinking, "this is a POC, we'll come back later and do this properly" -- only it gets forgotten and ends up in production.

To be fair to the person you're responding to, do you think people looking for copypasta are going to check the fine print?

The code sample in question is in no way clearly labeled as being problematic, and a tiny statement buried in a large body of text does not change that fact.

To be fair to the person you are responding to, I don't think copypasta developers are going to create secure applications anyway.

Which is exactly why the only copy-paste-able code should be secure by default. Much like the idea behind the existence of NaCl, the easiest thing to do should be the correct thing to do.

Go has also an amazing static-analysis tool, vet, that is not mentioned in the article: https://golang.org/cmd/vet/

It can find printf-format errors, invalid shifts, unreachable code, etc.

Even though go vet is very helpful, it is sometimes scary that the compiler allows to build such incorrect code. For instance: https://play.golang.org/p/2AVHUt5Wcf

That's a weird criticism to read when most of the time people moan about the Go compiler being too strict for failing to compile code with errors that other compilers might only warn or even if ignore.

Go does already have one of the strictest compilers (by default) out there. But it is pretty typical for a great many language frameworks to support optional additional strictness. Eg "use strict" in Perl or the gcc flags of which I cannot recall off hand. I see God's vet as akin to those. ie there when you need it but keeps out of your way when you just need to get some prototyping done.

What by the language spec makes your code incorrect? The only question is, does the compiler generate a constant call to print "false"? But this would be an optimization, not a correctness question.

I'm not sure how `fmt.Printf("%s\n", i) // invalid type` is really that "scary". Expecting the compiler to introspect that Printf's `...interface{}` argument's first value is incorrect, even though `i` does fulfill the stated parameter's type, is a FAR more scary thought.

The important thing here is, that fmt.Printf is just a function which takes a string and a variable amount of interface{} parameters. There is nothing in the language spec which creates a type correlation. the "%s" denotes a string parameter to print is solely some inner working of the Printf function. Yes, you can special case fmt.Printf, as some compilers do, but given the standard go vet tool, I rather think it belongs there than into the compiler, which, following the Go philosophy, exactly understands the Go language and precisely compiles that.

That's exactly my point. Special casing the compilation of 1 specific set of functions is a scary concept, esp. as the tool that can check the types is already in the go toolchain, AND there are plenty of edge cases that can't be caught anyways, so it can create a false sense of security.

Ah, had not quite caught your point as you had phrased it (and that by the time I answered your post was difficult to read for me, as for some downvotes it was printed in a very light font). Then we fully agree.

Overall I thought it was a good read. Does anyone know if using bindvars is foolproof against SQL injection? I'm currently using the sqlx library because some of the ORMs available for Go are rather clunky to deal with.

I once made a tragic security error with

     http.Handle("/o/", http.StripPrefix("/o/", http.FileServer(http.Dir("/"))))
It's a bit too easy to expose to much of the FS aka "Directory traversal attack". My advice, always deploy into a container.

Theres a whole section on stored procedures - it was my understanding that the database/sql package in the sodlib didn't support stored procedures (something to do with multiple rows or output var support)

The database/sql package is just bad anyway. The idea is sound but the execution is catastrophic and not many thought went into the design of this package. This is especially true with Postgresql where you are kind of forced to use db.Query instead of db.Exec on inserts or updates. Go std lib have a lot of issues like these. On the top of my head the log package and the flags package are pretty useless in real world apps.

is there an equivalent of this book for python?

and Ruby as well

People who are not happy with the current content may contribute to make it better. This is an open source and collaborative work. There is obviously room for improvement. But instead of spitting in the soup, provide merge requests or issues. This book is shared for free. If everybody is giving a little of its time to make it better, then we may get a really usefull and informative book. It's a huge work to write a book.

See here how to contribute : https://checkmarx.gitbooks.io/go-scp/content/howto-contribut...

Don't read it, always start with using a good framework. It will do everything for you

Hi homakov, given your experience with security, could you please expand on _why_ we should not this particular guide? I do understand why we should start with using a good framework, and I have so far felt that reading guides also help us understand things to be aware of.

Because just reading a book or tutorial won't give you enough info to be comfortable with all terms there. For hobby or if you have spare time - yes go ahead, for practical business needs - waste of time. Good framework handles most of them better, and if it doesn't, it's not good one. So step 1 is always framework. Optional 2 is tutorials.

Given he got kind of famous for breaking GitHub using a Rails bug, I'm pretty sure he was being sarcastic.

For reference:

http://homakov.blogspot.fr/2012/03/how-to.html https://news.ycombinator.com/item?id=3663197

That's a pretty bold and ultimately misguided assumption.


> Don't read it

and listen to you instead? Wrong.

> always start with using a good framework.


> It will do everything for you

Extra wrong.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact