I mentioned it in a comment to his article, but I'll repost here - "Markdown" is not a StackOverflow core competency. People don't go to StackOverflow to write Markdown. They go to get their questions answered or to answer others' questions. I don't think his "core competency" argument floats.
His "I couldn't find libraries for C#" on the other hand may well be valid.
Agreed, and I think Markdown's inclusion of HTML makes it a bad choice for this application* as well. Why put yourself in the position of being in an arms race with spammers and scammers when there are other valid choices (WikiText being the most obvious example)?
(* This is not a slam on Markdown, which I quite like, but John Gruber created it for himself and his uses as a writer with complete ownership over his servers.)
The problem with sanitizing all possible XSS exploits in all HTML accepted by browsers is similar to being the outlaw vs. being the cop. As an outlaw, you have to make good your escape every single time. The cop only has to catch the outlaw once.
Admittedly, this is confusing because Jeff, by being an HTML cop, has put himself in the position of the outlaw.
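To make the asymmetry concrete, here is a toy blacklist filter in Python (hypothetical, not anything Stack Overflow is known to use) that strips `<script>` elements with a regex. It handles the obvious payload and misses two well-known variants, and one miss is all an attacker needs.

```python
import re

def naive_blacklist(html_text: str) -> str:
    """Toy filter (illustration only): remove <script>...</script> blocks."""
    return re.sub(r"<script\b.*?</script\s*>", "", html_text,
                  flags=re.IGNORECASE | re.DOTALL)

# The obvious payload is caught.
caught = naive_blacklist("<script>alert(1)</script>hello")

# An event-handler attribute never mentions <script> at all.
missed_attr = naive_blacklist("<img src=x onerror=alert(1)>")

# Removing the inner tag pair splices the outer fragments back together.
missed_nested = naive_blacklist("<scr<script></script>ipt>alert(1)</script>")

for label, result in [("caught", caught),
                      ("missed_attr", missed_attr),
                      ("missed_nested", missed_nested)]:
    print(label, "->", result)
```

Each new rule added to a filter like this closes one hole; the attacker only has to find the next one.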
Atwood can treat Markdown as critical to his business, he can roll his own, and he can learn to create a parser, and he should STILL start with something like Beautiful Soup.
Then he can modify it, extend it, refactor it, whatever.
But starting from scratch, all that will happen is that he will, after a LOOONG time, end up with something almost as good as Beautiful Soup, but with a much uglier code base.
And he will not learn more doing it that way. He will perhaps learn just as much, but not more.
Reinventing the wheel is stupid.
If X is critical to your business, get a wheel for X, then own the damn wheel and X.
Joel himself started with an open source VPN project for Copilot.
And I'm guessing that's a pretty critical part of Joel's business.
The best solution here is to just not allow HTML and require all markup to be Markdown. It's not hard to do and is actually a feature in the Python Markdown module. I had to make a public form Markdown compatible the other day in a Django project; it's brain-dead easy:
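A rough sketch of the escape-everything idea, using only the Python standard library. It is not the snippet from the Django project. The names `render_minimal` and `markdown_only` are made up for illustration, and `render_minimal` is a deliberately tiny stand-in for a real renderer. With the Python Markdown module installed, `markdown.markdown` could presumably be passed as `render` instead.

```python
import html
import re

def render_minimal(text: str) -> str:
    """Hypothetical stand-in for a real Markdown renderer.

    Handles only **bold**, *emphasis*, and blank-line-separated paragraphs.
    """
    text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
    text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return "\n".join(f"<p>{p}</p>" for p in paragraphs)

def markdown_only(text: str, render=render_minimal) -> str:
    """Escape every piece of raw HTML first, then render the Markdown.

    After escaping, the only tags in the output are ones the renderer emitted.
    """
    escaped = html.escape(text, quote=False)  # &, <, > become entities
    return render(escaped)

print(markdown_only("**hi** <script>alert(1)</script>"))
```

Escaping before rendering is crude: with a full renderer it would interfere with `>` blockquotes and double-escape text inside code spans, and it does nothing about `javascript:` URLs in Markdown links, so treat it as a starting point rather than a complete defense.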
I made a similar comment on the Buffett story on the front page, but Atwood's post is good fodder to extend my argument.
Atwood says: If it's core to your business, don't go shopping, do the hard work and make it yourself.
Buffett says: America's seen worse and prospered in response. Go buy some America today.
It's debatable whether Atwood's example is core to his business (I think it might be, since it's the freakin' user interface for making comments), and the comments provide some helpful follow-up to Buffett's argument. Here's my synthesis of the two:
As a patriotic, peace-loving American I want to do the hard work and bring my passion and vision to a business that will make our economy more efficient and save folks money. I'll do that in the area of my expertise, technology. Everywhere else, where I have the money I'll invest in the companies that can and will rebuild America.
Those are the investment criteria, personal and financial, that I believe Atwood and Buffett are driving at. Of course, they don't just apply to Americans; it's probably just more obvious to those folks living outside of our parochial bubble ;)
I find it hard to believe that HTML sanitization is central to the Stack Overflow site. (At least, not central in the "worth-reinventing-the-wheel" sense. Note that Atwood is depending on Markdown rather than inventing his own markup syntax and implementing a library for translating it into HTML.) Does the quality or efficiency of HTML sanitization make such a difference in the overall performance of the site?
Let me know if I got this straight: Markdown doesn't have its own HTML sanitization mechanism, so Stack Overflow is rolling their own general HTML sanitization solution?
Why use Markdown in the first place? Is this a time when Markdown doesn't solve the problem Stack Overflow has? Look for another markup language?
Markdown doesn't even try to solve the HTML sanitization issue because it was designed for use when you have complete control over the content. It passes all HTML through in the clear, so you can use Markdown to make the usual/trivial stuff easier and leave the complex stuff to HTML itself.
Because it passes HTML through untouched, you need a separate sanitization process if you only want a subset of HTML to be usable.
The story is the same with most other simple markup languages, like Textile.
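For a sense of what that separate pass might look like, here is a minimal whitelist sketch built on Python's standard `html.parser`. The tag set in `ALLOWED_TAGS` and the names `WhitelistSanitizer` and `sanitize` are assumptions chosen for illustration, not what Stack Overflow does. It would run on the HTML that the Markdown converter produces.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist; a real site would choose its own subset.
ALLOWED_TAGS = {"b", "i", "em", "strong", "code", "pre", "p", "blockquote"}

class WhitelistSanitizer(HTMLParser):
    """Keep whitelisted tags (without attributes); drop every other tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")  # all attributes are discarded

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text is re-escaped so it can never be read back as markup.
        self.out.append(escape(data, quote=False))

def sanitize(html_text: str) -> str:
    parser = WhitelistSanitizer()
    parser.feed(html_text)
    parser.close()  # flush any buffered text
    return "".join(parser.out)

dirty = '<b onclick="evil()">hi</b><script>alert(1)</script><img src=x onerror=alert(1)>'
print(sanitize(dirty))
```

Dropping every attribute means links and images are lost too; allowing them safely requires checking URL schemes, which is where much of the real work lies. The sketch also does not rebalance unclosed tags.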