That algorithm - for what it is, namely pretty basic tree construction - is absu...

That algorithm - for what it is, namely pretty basic tree construction - is absurdly complicated. Did you actually read and grok all that stuff about the stack of open elements, and how all kinds of elements have special gotcha clauses? How you can't nest stuff like `<div>`s in `<p>`s, even in descendants - except in those weird corner cases where you can? And let's not forget that the page you linked to is only one of a few pages you need to parse html; stuff like tokenization also has a bunch of weird, legacy modes, and so on. It's pretty crazy; worse than quite a few programming languages, and they have a conceptually much, much more complex domain. Worst perhaps isn't just the sheer size, it's the haphazard inconsistency. If you don't know all the exceptions, it's hard to predict which part will be exceptional.

Seriously, XML is arguably bad, but html5 is absurdly horrible. The only reason it's acceptable is because it's so widely used that parsers are huge shared projects and most bugs are shallow.

I guess it's all a matter of perspective: sure, you might argue, hey, the spec isn't hundreds of pages, so it's humanly comprehensible. But from my perspective that's a really low bar, and it clears it just barely.

It matters too, because these weird quirks have often hidden things like performance issues, misparses, and security issues due to incorrect normalization.