Might be worth clarifying this a little higher up on the page.
While this is just the opener for discussing XXE-style and injection attacks in XUL, I'm not sure it isn't satire. HTML neither was nor is unparseable, nor have XHTML and XML on the web replaced it. XML was just a proper subset of SGML that did away with element-specific markup declarations and parsing rules, such as those required for `img` and other elements in HTML, and with tag inference/omission (also required for parsing HTML).
HTML in the wild was a mess, and parsing it required a lot of error recovery and empirical quirks. Browser vendors were all on board with the idea of rebooting it as well-defined XHTML, which has XML structure and HTML semantics. There was quite a bit of work towards that goal.
But the actual content creators did not care, and vendors relented. HTML in the wild is not quite as bad as it was and is much more easily parseable as well.
> HTML in the wild is not quite as bad as it was and is much more easily parseable as well
I beg to differ with the "HTML in the wild" concept and the need for heuristics. HTML, including HTML5, was and is easily parseable using SGML, the markup meta-language in which HTML versions up to HTML 4 were specified. My DTD for HTML 5 (W3C's current HTML 5.2) can parse up to 97.31% of the test suite normatively referenced by the HTML spec (the rest being somewhat in a grey area containing constructs to make test automation happy, which is designed to never fail, but rather to produce something still in a predictable way).
: http://sgmljs.net/docs/sgml-html-tutorial.html (reported on the slides reached via the "TALK" link)
I'm having difficulty unpacking your claim about a DTD "parsing" HTML5, but I find it hard to believe that a primarily DTD-based parser would get all the rules right around, say, foster-parenting, or handling of -- in script elements, or pseudo-entities, or any of the other weird quirks that are specific to HTML and not shared with SGML languages. Perhaps you can link the code and details about the claimed test suite pass rate?
document.innerHTML = '';
.textContent = "";
And it has the same end result.
Also, technically, this should be faster, since the empty string is not put through the HTML parser (though browsers might already optimize for this special case).
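For anyone skimming, here is a minimal sketch of the two ways to empty an element being compared. The helper names are hypothetical, and `el` is assumed to be any DOM Element; whether the second form is measurably faster will depend on the browser.

```javascript
// Hypothetical helpers for illustration; `el` is assumed to be a DOM Element.

// Assigning to innerHTML hands the string to the HTML fragment parser,
// even when that string is empty.
function clearWithInnerHTML(el) {
  el.innerHTML = "";
  return el;
}

// Assigning an empty string to textContent removes all children
// without any HTML parsing.
function clearWithTextContent(el) {
  el.textContent = "";
  return el;
}
```

Either one can be called as `clearWithTextContent(document.getElementById("report"))`, where the `report` id is made up for the example.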
As an example, there was a lot of cargo-cult performance discussion around React in the first few years. A coworker was really excited about it, but the first time we ran a benchmark on code that updated a report table, React was 5 orders of magnitude slower because it used innerHTML instead of the DOM, and even with keyed updates that was doing tons of extra work.
There were compromise solutions, such as using document fragments, though almost nobody used them. A document fragment allowed constructing a DOM subtree without touching the document object, so there was no bottleneck until that subtree was inserted into the page. This was the fastest way to construct large, complex things, but it imposed a lot of work on the developer, it wasn't widely supported (even though it was standard), and the API was less clear and familiar.
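A rough sketch of the document fragment approach, for a report table like the one mentioned earlier. The `appendRows` helper and its parameters are hypothetical: `doc` is assumed to be a DOM Document, `tbody` an element already in the page, and `rows` an array of arrays of strings.

```javascript
// Hypothetical helper: builds all rows off-document, then attaches them in one step.
// `doc` is assumed to be a DOM Document and `tbody` an element in that document.
function appendRows(doc, tbody, rows) {
  const fragment = doc.createDocumentFragment();
  for (const cells of rows) {
    const tr = doc.createElement("tr");
    for (const text of cells) {
      const td = doc.createElement("td");
      td.textContent = text; // set as plain text, not parsed as HTML
      tr.appendChild(td);
    }
    fragment.appendChild(tr); // still detached from the page
  }
  // Appending a fragment moves its children into the target,
  // so the live document is touched only once here.
  tbody.appendChild(fragment);
  return tbody;
}
```

In a browser this would be called as something like `appendRows(document, document.querySelector("tbody"), [["a", "1"], ["b", "2"]])`.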