
> A rendering engine for clean, valid XHTML could be very simple. But everyone wants to be able to see web pages created by technically incompetent designers with "tag soup" HTML.

The HTML5 parsing algorithm is not _that_ hard:

https://www.w3.org/TR/2011/WD-html5-20110113/parsing.html

Do you really think that swapping it for a full XML implementation, plus the half dozen specs required to implement XHTML, would be a significant savings compared to the features browser developers actually spend their time on?

You're correct that complex software is what people want, but that's complex as in an advanced document layout system with advanced language support, rich media, forms, etc., rather than the format those features are implemented in.




That algorithm - for what it is, namely pretty basic tree construction - is absurdly complicated. Did you actually read and grok all that stuff about the stack of open elements, and how all kinds of elements have special gotcha clauses? How you can't nest stuff like `<div>`s in `<p>`s, even in descendants - except in those weird corner cases where you can? And let's not forget that the page you linked to is only one of a few pages you need to parse HTML; stuff like tokenization also has a bunch of weird, legacy modes, and so on. It's pretty crazy; worse than quite a few programming languages, and they have a conceptually much, much more complex domain. The worst part perhaps isn't the sheer size, it's the haphazard inconsistency. If you don't know all the exceptions, it's hard to predict which parts will be exceptional.
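To make the `<p>`/`<div>` point concrete, here's a toy sketch of just that one rule: a block-level start tag closes an open `<p>` if there's one "in button scope", and a stray `</p>` creates an empty `<p>`. This is nowhere near a conforming parser. It leans on Python's stdlib `html.parser` for tokenizing, the `CLOSES_P` and `BUTTON_SCOPE_BOUNDARY` sets are abbreviated versions of the spec's lists, and `ToyTreeBuilder`, `Node`, `serialize` and `parse` are names I made up for illustration.

```python
from html.parser import HTMLParser

# Abbreviated, assumed subsets; the spec's actual lists are longer.
CLOSES_P = {"address", "blockquote", "div", "dl", "h1", "h2", "h3",
            "ol", "p", "pre", "section", "ul"}
BUTTON_SCOPE_BOUNDARY = {"applet", "button", "caption", "html", "marquee",
                         "object", "table", "td", "template", "th"}


class Node:
    def __init__(self, tag):
        self.tag = tag
        self.children = []  # child Nodes or text strings


class ToyTreeBuilder(HTMLParser):
    """Hypothetical toy builder: implements only the 'close a p element' rule."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root")
        self.stack = [self.root]  # the "stack of open elements"

    def p_in_button_scope(self):
        # Walk down from the current node; a boundary element hides any <p> below it.
        for node in reversed(self.stack):
            if node.tag == "p":
                return True
            if node.tag in BUTTON_SCOPE_BOUNDARY:
                return False
        return False

    def close_p(self):
        # Pop open elements up to and including the nearest <p>.
        while self.stack.pop().tag != "p":
            pass

    def insert(self, tag):
        node = Node(tag)
        self.stack[-1].children.append(node)
        self.stack.append(node)

    def handle_starttag(self, tag, attrs):
        if tag in CLOSES_P and self.p_in_button_scope():
            self.close_p()
        self.insert(tag)

    def handle_endtag(self, tag):
        if tag == "p":
            if not self.p_in_button_scope():
                self.insert("p")  # a stray </p> creates an empty <p>
            self.close_p()
            return
        # Simplification: pop to the nearest matching element, ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                return

    def handle_data(self, data):
        self.stack[-1].children.append(data)


def serialize(node):
    if isinstance(node, str):
        return node
    inner = "".join(serialize(child) for child in node.children)
    return inner if node.tag == "#root" else f"<{node.tag}>{inner}</{node.tag}>"


def parse(html):
    builder = ToyTreeBuilder()
    builder.feed(html)
    builder.close()
    return serialize(builder.root)


print(parse("<p>a<div>b</div></p>"))
# <p>a</p><div>b</div><p></p>
print(parse("<p><button><div>x</div></button></p>"))
# <p><button><div>x</div></button></p>
```

So the `<div>` kicks the `<p>` closed and the leftover `</p>` becomes an extra empty paragraph, but wrap the same `<div>` in a `<button>` and it happily stays a descendant of the `<p>`. That's one rule out of many in the real algorithm.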

Seriously, XML is arguably bad, but HTML5 is absurdly horrible. The only reason it's acceptable is that it's so widely used that parsers are huge shared projects and most bugs are shallow.

I guess it's all a matter of perspective: sure, you might argue, hey, the spec isn't hundreds of pages, so it's humanly comprehensible. But from my perspective that's a really low bar, and it clears it just barely.

It matters too, because these weird quirks have often hidden things like performance issues, misparses, and security issues due to incorrect normalization.


Again, it's not like there's no work to implement it but … do you think that's more or less complexity than implementing CSS Grid, text layout systems for complex scripts, building a high-performance JavaScript engine, etc.? The browser teams are not shy about saying when they think changes are important for security, performance, or reliability, so I'd take the fact that they aren't pushing to replace the parser as an indicator that it's not a huge part of their ongoing efforts.


Browsers do much more than just HTML5 parsing.

https://en.wikipedia.org/wiki/Comparison_of_lightweight_web_...


Yes, that was my point? Even if it was plausible to switch to XHTML, that’s a small and rather insignificant part of browser code bases.



