In my experience it's pretty rare for template language bugs to cause errors when user-entered content includes one of their special characters. The template would have to be evaluated twice for any problems to occur: once to insert the user's template code into placeholders within the original template, and then once again to execute the resulting combination.
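To make the double-evaluation point concrete, here is a minimal sketch using Python's `string.Template` purely as a stand-in for a `$`-based template language. The variable names and the `secret` value are made up for illustration and say nothing about how any real site renders its pages.

```python
from string import Template

page = Template("Results for: $query")
user_input = "$secret"  # hypothetical hostile input containing the magic character

# Single evaluation: the user's text is inserted verbatim and never re-scanned.
once = page.substitute(query=user_input)

# Double evaluation: treating the rendered output as a template again
# lets the user's "$secret" resolve against server-side values.
twice = Template(once).substitute(secret="hunter2")

print(once)
print(twice)
```

Only the second pass is dangerous; the first leaves the user's `$secret` as inert text.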
I disagree! It does not make sense that submitting any text via a form input should in any way interfere with a templating engine, in the same way we don't expect to be able to affect a database by entering SQL into a form field.
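For the database half of that analogy, a rough sketch with Python's built-in `sqlite3` and an in-memory database. The table and the input string are invented for the example; the point is only that a bound parameter is stored as data and never parsed as SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")

# Hypothetical form input that would be destructive if spliced into the SQL text.
form_input = "x'); DROP TABLE users; --"

# The "?" placeholder passes the value out-of-band, so it is treated as plain data.
conn.execute("INSERT INTO users (name) VALUES (?)", (form_input,))

rows = conn.execute("SELECT name FROM users").fetchall()
print(rows)
```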
The fact that Google brings back an empty result set to me indicates the problem is a bit deeper...
Guys, (and I don't mean Google, I mean all of us), don't fix injection by plugging injection bugs; put together some framework that actually avoids all of these problems (or at least doesn't let you add bugs).
This is actually a hard problem in the general case, and it is an active area of research. One promising approach is static taint analysis, wherein the source code of a web app is analyzed to detect whether "tainted" output is given to a sensitive "sink" without being properly sanitized. See, e.g., Omer Tripp et al., "TAJ: Effective Taint Analysis of Web Applications" (PLDI 2009) (http://www.cs.tau.ac.il/~omertrip/pldi09/paper.pdf).
As an example of a difficult case, consider the following pseudocode snippet:
That's a poor example. I would never send a document as HTML without tags. html_sanitize() should really be generate_html(), which adds structure to the document.
What the GP is saying (and I agree with) is that generate_html() should use a library which understands HTML structure and only allows content to be generated using a strict API (no doc+="<foo>bar</foo>" garbage).
Such a discipline greatly reduces the chance of injections, to the point where you have to actively write code to create injection points. And it's simple to follow: any time you write HTML, use the library.
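As a rough illustration of that discipline, here is a sketch that builds markup through Python's standard `xml.etree.ElementTree` instead of string concatenation. The `render_comment` function and its arguments are hypothetical, and a real project would likely pick a dedicated HTML library; this only shows the shape of the idea.

```python
import xml.etree.ElementTree as ET

def render_comment(author: str, body: str) -> str:
    """Build a comment block through the tree API; no markup is concatenated by hand."""
    div = ET.Element("div", {"class": "comment"})
    ET.SubElement(div, "b").text = author
    ET.SubElement(div, "p").text = body  # text nodes are escaped on serialization
    return ET.tostring(div, encoding="unicode", method="html")

markup = render_comment("mallory", "<script>alert(1)</script>")
print(markup)
```

Because user text only ever enters as a text node, you would have to go out of your way to create an injection point.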
Taint analysis sounds nice in theory, but you can get the same effect by writing code modularly (i.e. only one small module can actually access the raw output stream) and using libraries to create structured data.
You can't tell where "doc" is coming from in a snippet like that. I use logic moderately similar to that for my blog software. If I'm writing the blog post, pretty much pass through what I wrote. If it's a comment (back when I had them), process the heck out of it. The blog post itself is a blob of HTML, basically.
I do not currently write my blog posts in a templating language (any more than anyone else does), though Hamlet has me sort of tempted as it is so close to what I write anyhow.
It's not a poor example, and it's not about sending the document without tags. It's about whether special characters should be escaped, and the answer depends on the Content-Type that the client requested.
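A minimal sketch of that dependency, assuming a hypothetical `render` function (this is not the snippet under discussion, just an illustration of why one fixed escaping function cannot be right for every response type):

```python
import html
import json

def render(doc: str, content_type: str) -> str:
    """Escape `doc` according to the content type the client asked for."""
    if content_type == "text/html":
        return html.escape(doc)   # <, >, &, and quotes become entities
    if content_type == "application/json":
        return json.dumps(doc)    # quotes and control characters are backslash-escaped
    if content_type == "text/plain":
        return doc                # no markup to break out of
    raise ValueError("unsupported content type: " + content_type)

print(render("a < b", "text/html"))
print(render('say "hi"', "application/json"))
```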
You're making the same mistake made by people who confuse good static type systems with type inference. Yes, it would be nice to infer the correct escaping function; but forcing people to escape somehow would be sufficient.
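One way to picture "forcing people to escape somehow" is a wrapper type that the output function insists on. Everything here (`Escaped`, `as_html_text`, `as_trusted_markup`, `emit`) is a hypothetical sketch rather than any framework's API, and Python can only enforce it at runtime, where a statically typed language would reject the raw string at compile time.

```python
import html
from dataclasses import dataclass

@dataclass(frozen=True)
class Escaped:
    """Text that has passed through some explicitly chosen escaping function."""
    text: str

def as_html_text(raw: str) -> Escaped:
    return Escaped(html.escape(raw))

def as_trusted_markup(raw: str) -> Escaped:
    # The author vouches for this markup; the choice is explicit and greppable.
    return Escaped(raw)

def emit(*parts: Escaped) -> str:
    """Join output fragments, refusing raw strings outright."""
    for part in parts:
        if not isinstance(part, Escaped):
            raise TypeError("raw str passed to emit(); choose an escaping function first")
    return "".join(part.text for part in parts)

print(emit(as_trusted_markup("<p>"), as_html_text("1 < 2"), as_trusted_markup("</p>")))
```

Nothing is inferred here; the caller simply cannot emit a string without saying how it should be treated.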
Nice article. In response to the last part, I can think of a way to achieve the sort of smart escaping via template you talk about using Haskell and Hamlet (among other templating systems used by Yesod). I believe, although I can't absolutely confirm, that Hamlet already performs context-appropriate escaping, based mostly on the type signatures and the names of a few of the functions.
I disagree with the article's premise that injection is always a display issue. In the [Yesod web framework](http://www.yesodweb.com), which uses Hamlet, we sanitize, not strip, HTML by default before it is ever put in the database. The more you can make injection not a display issue, the better; you just have to know your options.
I don't know what you mean by "90% problem", but unlike what your blog article suggests, Django's template engine escapes everything by default. You have to explicitly pass content through a filter to request that it not be escaped.
Based on the fact that the suggestions in your blog article could easily support someone forgetting the "|escape" on a variable, I would accuse your methodology of only solving the "90% problem".
Due to Django's "we want the templating system to be general, to be usable for stuff other than HTML", it can't provide support for such 'guarantee that the output is well formed / valid / has no injection attack entry points' features.
I think that Django uses sha for password hashes. They should use bcrypt, right? Did you turn on XSRF protection (which I think is off by default)? Are cookies secure?
Web security is not as simple as 's.replace("<","&lt;")' (escaping by default).
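A small sketch of one case where entity escaping alone does not help: a user-supplied URL placed in an `href`. The `is_safe_link` helper and its scheme allowlist are assumptions for illustration, not a complete defense.

```python
import html
from urllib.parse import urlsplit

user_url = "javascript:alert(document.cookie)"  # hypothetical hostile profile link

# Entity escaping leaves this string untouched: it contains no <, >, &, or quotes.
escaped = html.escape(user_url, quote=True)
link = '<a href="{}">profile</a>'.format(escaped)

def is_safe_link(url: str) -> bool:
    """Allow only absolute http(s) URLs; relative URLs are rejected in this sketch."""
    return urlsplit(url).scheme in {"http", "https"}

print(link)
print(is_safe_link(user_url))
```

The escaped link is well-formed HTML and still runs script when clicked, which is why the context matters as much as the characters.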
How do you generate slugs? Could someone put something nasty in a pathname or URL?
Django is not secure. You can secure it, with minimal effort, if you keep things radically simple. But you do need to know what can go wrong, so you don't introduce any "features" that are actually "gaping security holes".
Even if you do everything right, it doesn't mean that Django is magically secure no matter what people use it for (obvious, yes, but this is HN and sometimes failing to point out the obvious can get you downvoted). That's why people are objecting.
Handling user-submitted image tags is (in my opinion) way outside the scope of the framework. Which tags and attributes to whitelist, or whether to use html markup at all compared to a different language like markdown, is very project dependent. If you have to, just install BeautifulSoup or any of the other great libraries that have cropped up in the last year or so to handle the sanitizing.
Django uses sha for password hashes because until recently there hasn't been a better library to ship with natively across all the platforms that Django supports. If you know you'll only be working on *nix, django-bcrypt can enhance the default password hashing behavior. As other commenters have noted, they're moving to PBKDF2 in the near future as a better included hashing library.
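For reference, PBKDF2 is available in Python's standard `hashlib`. The sketch below is not Django's hasher; the iteration count, salt size, and function names are illustrative assumptions only.

```python
import hashlib
import hmac
import os
from typing import Tuple

ITERATIONS = 100_000  # illustrative value, not Django's setting

def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, digest) using PBKDF2-HMAC-SHA256 with a fresh random salt."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected)

salt, digest = hash_password("s3cret")
print(verify_password("s3cret", salt, digest))
```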
CSRF is on by default. If you need secure cookies and HSTS headers, there's a package that provides them called django-secure, which last I heard is being rolled into Django proper in the near future.
Django prevents path traversal and anything else you can imagine that might be nasty in a URL. The auto slug generation included.
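To show why a generated slug leaves little room for mischief, here is a rough sketch of a slug function in the same spirit. It is not Django's actual code; the name `slugify` and the exact rules are assumptions for illustration.

```python
import re
import unicodedata

def slugify(title: str) -> str:
    """Reduce a title to lowercase ASCII word characters separated by hyphens."""
    ascii_text = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    # Drop everything except word characters, whitespace, and hyphens.
    cleaned = re.sub(r"[^\w\s-]", "", ascii_text).strip().lower()
    # Collapse runs of whitespace and hyphens into a single hyphen.
    return re.sub(r"[-\s]+", "-", cleaned)

print(slugify("Hello, World!"))
print(slugify("../../etc/passwd"))
```

Slashes, dots, angle brackets, and quotes never survive, so there is nothing left to traverse or inject with.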
So how exactly is Django not secure again? Where are the "gaping security holes"? Or do you have no idea what you're talking about?
CSRF is on by default.
Cookies could be more secure, and it's being worked on.
Django is moving to PBKDF2 (there's no pure-Python bcrypt lib).
There's not really opportunity to do anything interesting with slugs.
Like any framework, there will always be room to improve security, but it does do very well out of the box. At least it makes you work to expose anything obvious.
...which you can do simply by posting a link anywhere.
Edit: I guess it would be more helpful to explain why for those not familiar with XSS. If all it takes is a specially crafted URL to your site to exploit it, your site is toast. The security model of the web assumes that people can open even the shadiest of links without negative consequences. I could have obscured the URL with a shortener and named the link "Cutest cat pic ever!" I could have hosted a page on a totally separate domain and put the crafted URL in a hidden iframe. All I have to do is send document.cookie over to my server and now I control your account.
On a related note, searches for many symbols do not produce results. Searching for '&' will bring up results for ampersand, but most others that map well to words do not, e.g. $ => dollar, % => percent, etc.
Everyone seems to avoid mentioning the obvious. This breaks only the page layout; it is not really a critical issue for Google's integrity/security.
If your templating language is going to use a magic character it would seem useful to pick something less common than $. There are several odd characters on my keyboard (§`~±|¤) and if you are willing to use the ALT key there are really obscure characters that you can safely filter from the input instead of going through the trouble of escaping them. Filtering is so much more efficient/easier/safer than escaping.
Imagine how much easier life would be if in HTML we only had to filter for § instead of escape every <, > and ".