

Ask HN: What do you use for HTML sanitization? - waldrews

There are some established HTML sanitization solutions, like HTML Purifier in PHP, but I can't seem to find their counterparts in either Java or .NET. Code samples in C# and Java are floating around, but I haven't seen a project that has undergone serious testing and is updated for new security threats on an ongoing basis.

Heck, I'd pay for a commercial component if it were well maintained. Any suggestions? Or, this being Hacker News, would someone want to create such a component as a mini-ISV startup?
======
cperciva
I don't do HTML sanitization. I escape everything except [A-Za-z0-9,. ].

Attempts to sanitize HTML (or SQL...) by eliminating a set of "dangerous"
inputs inevitably end up as whack-a-mole processes as people keep discovering
new evil things to throw at you. The only secure solution is to escape
everything except a small set of "safe" inputs -- this is analogous to the
situation with packet filtering firewalls, where "default deny all" is widely
accepted as the right way to do things.
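
A minimal sketch of the "default deny" escaping described above, in Python. The
function name escape_all_but_safe is hypothetical, and the choice of numeric
character references as the escape form is an assumption; the comment only
specifies the safe set [A-Za-z0-9,. ].

```python
import re

# Characters passed through unchanged. This mirrors the safe set quoted in
# the comment above; everything else is treated as untrusted.
_SAFE_CHAR = re.compile(r"[A-Za-z0-9,. ]")


def escape_all_but_safe(text: str) -> str:
    """Replace every character outside the safe set with a numeric
    character reference (assumed escape form, e.g. '<' becomes '&#60;')."""
    return "".join(
        ch if _SAFE_CHAR.fullmatch(ch) else f"&#{ord(ch)};"
        for ch in text
    )


if __name__ == "__main__":
    print(escape_all_but_safe("<b>Hi, there.</b>"))
```

Because the rule is "escape unless known safe", a newly discovered dangerous
character needs no code change: it was never in the safe set to begin with.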

~~~
waldrews
Escaping is important for all text fields, and I never use user text to build
SQL strings without using some kind of typed parameter API.

But I'd like to let the user submit formatted HTML via something like
FCKEditor and have it checked against a limited whitelist.

I don't trust my own knowledge of browser-specific HTML rules to do this on my
own. It's got to be a common enough scenario, right?
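
To illustrate what checking against a limited whitelist could look like, here
is a rough sketch using Python's standard html.parser module. The tag list, the
class name AllowlistFilter, and the function name sanitize are all hypothetical,
and the policy (drop every attribute, drop disallowed tags but keep their text,
re-escape all text) is an assumption chosen for simplicity. It is not a vetted
sanitizer and has not been tested against browser-specific parsing quirks.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allowlist: plain formatting tags only, no attributes at all.
ALLOWED_TAGS = {"b", "i", "em", "strong", "p", "br", "ul", "ol", "li"}
VOID_TAGS = {"br"}  # allowed tags that never take a closing tag


class AllowlistFilter(HTMLParser):
    """Re-emits only allowlisted tags; everything else is dropped or escaped."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # output fragments
        self.open_tags = []  # stack of allowlisted tags still open

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            # Attributes are discarded entirely (no href, style, on* handlers).
            self.out.append(f"<{tag}>")
            if tag not in VOID_TAGS:
                self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Only close tags this filter opened, closing inner ones first
        # so the output stays balanced.
        if tag in self.open_tags:
            while self.open_tags:
                top = self.open_tags.pop()
                self.out.append(f"</{top}>")
                if top == tag:
                    break

    def handle_data(self, data):
        # Text arrives unescaped, so it is escaped again on the way out.
        self.out.append(escape(data))


def sanitize(html_text: str) -> str:
    f = AllowlistFilter()
    f.feed(html_text)
    f.close()  # flush any buffered text
    while f.open_tags:  # close anything the input left open
        f.out.append(f"</{f.open_tags.pop()}>")
    return "".join(f.out)


if __name__ == "__main__":
    print(sanitize('<a href="javascript:alert(1)">x</a><b onclick="e()">bold</b>'))
```

Comments, doctype declarations, and processing instructions are dropped
because the default HTMLParser handlers ignore them. A real deployment would
still want a maintained, security-reviewed library, as the question says.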

------
ScottWhigham
I use PeterBlum.com's controls for ASP.NET plus some hand-rolled stuff.

