Is HTML so complex that people honestly cannot be expected to write legal markup? Because, if the great majority of people who have made websites are perfectly capable of writing to-spec HTML, having a stricter parser really wouldn't reduce the size of the web in any significant way.
I also posted it to HN:
> Is HTML so complex that people honestly cannot be expected to write legal markup?
If you're already a programmer, HTML syntax is easy enough to understand and produce in valid form.
However, most people who create web content aren't programmers - particularly in the early days of the internet, when it grew exponentially and established the network effects that ultimately drew more professional developers onto the platform.
Rather, they were amateur enthusiasts exploring a new technological domain. Thanks to HTML, the number of people who could create web pages was vastly higher than the number of people who could write computer programs.
It was the rapid democratization of HTML made possible by forgiving parsers that accounts for much of its success as a language - and of the success of the internet as a platform.
When an HTML parser encounters markup so broken that it can't render it, the parser just skips past it and keeps going, in the manner of VB's `on error resume next`. (Contrast the strictness of XML parsers, which fail on malformed input and produce no output at all.)
Since rendering HTML in response to an HTTP GET request is essentially free of side effects, there's no real harm in continuing to parse after encountering an error - and the network effects from doing so are huge.
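The contrast between the two error-handling philosophies can be sketched with Python's standard library - a small illustration of the principle, not a browser engine: the HTML parser accepts malformed input and reports what it can, while the XML parser rejects the same input outright.

```python
# Contrast forgiving HTML parsing with strict XML parsing,
# using only the Python standard library.
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Deliberately malformed: neither tag is ever closed.
malformed = "<p>unclosed paragraph <b>bold never closed"

class TagCollector(HTMLParser):
    """Records every start tag the parser manages to recognize."""
    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

# The HTML parser raises no exception and still extracts structure.
collector = TagCollector()
collector.feed(malformed)
print(collector.tags)  # ['p', 'b']

# The XML parser fails on the same input and yields nothing at all.
try:
    ET.fromstring(malformed)
except ET.ParseError as exc:
    print("XML parse failed:", exc)
```

The HTML side degrades gracefully - it recovers the two tags and moves on - while the XML side is all-or-nothing, which is exactly the trade-off the forgiving-parser argument turns on.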
If only programmers had possessed the arcane ability to produce well-formed, valid markup, the web would never have experienced the early growth that transformed it into the reigning standard of a huge and growing public network.
Edit - one more thing: HTML provided a gently sloping pathway into programming for many people who might otherwise never have overcome the steep barriers to entry of, say, C or C++.
If not, then I'm not sure what the point of the piece is.
I realize that many places obfuscate their JS source. But there is a significant step from that to advocating changing the web's infrastructure so that we send what are essentially binaries instead of plain-text source.
It'd still be a horrible thing for the webdev community to lose, or throw away, the ability to view the source of any site, but I don't think the effect would be quite as bad as it would have been ~20 years ago.