
>You can make your HTML as malformed as you like and the web-browser will do its best to display the page for you. I love the todepond website, but the source-code makes me break out in a cold sweat. Yet it renders just fine.

It renders just fine because it is syntactically valid HTML. HTML is not, and is not supposed to be, XML. It was originally an SGML application, described by its Document Type Definition and SGML Declaration (https://www.w3.org/TR/html4/HTML4.decl). HTML has always used many SGML features not found in XML, such as tag inference (<html><title> becomes <html><head><title>, <p><p> becomes <p></p><p>); others, like SHORTTAG, were never even implemented in browsers. These days HTML is defined by the WHATWG ‘living standard’, which largely just restates the old SGML DTD rules in plain language.
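
You can watch the inference happen from a browser console. Quick sketch (DOMParser is the standard API; the serialized output is what current engines produce, but treat it as illustrative rather than normative):

    // Minimal source: <html>, <head> and <body> omitted, no </p> end tags.
    const src = '<title>hi</title><p>one<p>two';
    const doc = new DOMParser().parseFromString(src, 'text/html');
    console.log(doc.documentElement.outerHTML);
    // -> <html><head><title>hi</title></head><body><p>one</p><p>two</p></body></html>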

(Okay, https://validator.w3.org/nu/?doc=https%3A%2F%2Fwww.todepond.... shows a few minor errors, bet you couldn’t spot them.)

This is independent of the fact that browsers do try their best to render objectively broken markup, usually by ignoring the broken parts. In principle they could do the same with XHTML, but someone decided it would be ‘helpful’ to show the parser’s diagnostic output instead, and the rest is history.
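
You can see the contrast from a console too. Quick sketch (the text/html result is spec-mandated; the exact shape of the XML error document varies by browser, so take that part as illustrative):

    const broken = '<p>unclosed';
    // text/html: the parser recovers and hands back a usable DOM.
    new DOMParser().parseFromString(broken, 'text/html').body.innerHTML;
    // -> "<p>unclosed</p>"
    // application/xhtml+xml: not well-formed XML, so you get an error document instead.
    const xml = new DOMParser().parseFromString(broken, 'application/xhtml+xml');
    xml.getElementsByTagName('parsererror').length;
    // -> 1 (the error document's exact markup differs between engines)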





> that browsers do try their best to render objectively broken markup

And it's a cancerous engineering principle. People call NullPointerException the billion-dollar mistake, but (the misuse of) Postel's Law on the web frontend is a multi-billion-dollar one. Once one mainstream browser decided to "tolerate" an error, websites would start relying on that behavior, making it a permanent feature of the web.

If browsers were less tolerant, frontend development as a whole would be much smoother. The past decade of JavaScript framework farce probably would never have happened.

The proper way to deal with syntax errors is to build better tools: linters, interpreters and compilers that emit clear error messages. Not to 'tolerate' errors.


Postel’s law allows a degree of forward compatibility. This was important before continuous software updates were practical. User-facing code is the best place to apply it: I want my text editor to highlight invalid source code on a best effort basis, whereas the compiler should promptly bail out.

The tolerance is now precisely specified in the HTML5 parsing algorithm, far from "try their best". This is good, because browsers fail in mostly the same ways as each other, humans do not need a CS degree to handwrite Web content, and your tools can still write perfectly valid HTML5.
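
A concrete example of that sameness, as a console sketch (every conforming HTML5 parser has to recover from this parse error in the same way; the serialization shown is illustrative):

    // A <ul> may not sit inside <p>, and the trailing </p> then matches nothing.
    const doc = new DOMParser().parseFromString('<p><ul><li>x</li></ul></p>', 'text/html');
    console.log(doc.body.innerHTML);
    // Spec-mandated recovery: the <p> is closed before the <ul>,
    // and the stray </p> becomes an empty paragraph:
    // -> "<p></p><ul><li>x</li></ul><p></p>"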

Obviously things are better now than in the IE5/6 era, but I can't help thinking that people without CS degrees have to hand-tweak HTML because the people with them failed to design proper tools and abstractions for them.

Typing valid XML does not require even 1% of a CS-degree level of skill, come on.

You're overdoing it.


More than that, HTML5 specifies how browsers handle "broken" HTML. There's a super-precise algorithm that dictates what to do with unclosed tags, how to fix up the DOM for incorrect nesting, specific fallbacks and behavior for invalid attributes, and so much more. I would say this algorithm, along with its element- and attribute-specific components, is where most of the HTML5 effort was applied, and continues to be for newer Web APIs.
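
The "incorrect nesting" part includes the adoption agency algorithm for mis-nested formatting tags. A console sketch of what it does (output shown as current engines serialize it; treat it as illustrative):

    // <b> and <i> overlap instead of nesting properly.
    const doc = new DOMParser().parseFromString('<b>1<i>2</b>3</i>', 'text/html');
    console.log(doc.body.innerHTML);
    // The parser re-opens <i> after the misplaced </b>:
    // -> "<b>1<i>2</i></b><i>3</i>"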

Browsers "trying their best" was like half of the Browser Wars, and what HTML5 was largely created to address. The other half being nonstandard ActiveX crap and IE-specific JavaScript.



