HTML loves its special cases. XML is overly complex, but at least your editor doesn't need to know anything special about what document type you're writing in order to indent it properly. Throw in HTML's special cases, and now it needs to know that <br> is different from <foo>.
I guess since HTML is so common it doesn't really matter, but really? We need 5 different types of markup when one would have been fine?
I prefer HTML over XHTML because it is easier to write. I don't get the reasoning behind mandatory closing tags. LIs close before the next LI, or before the UL. <BR> saves two characters over <BR /> and causes no harm. XHTML feels like trying too hard to make the machine overlord happy.
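For instance, a minimal sketch of the closing rules in question; both lists parse to the identical DOM, because each <li> is closed implicitly by the next <li> or by the parent </ul>:

    <ul>
      <li>First
      <li>Second
    </ul>

    <ul>
      <li>First</li>
      <li>Second</li>
    </ul>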
It is plain wrong to make a standard easier for machine-parsing at the expense of humans who are typing it in.
EDIT: Another example. I write some HTML in a text-editor/textarea and send it across to someone. If I missed a </LI>, should the parser reject it? If not, the standard should be accommodating enough so that this is valid.
Personally I see saving a few characters here and there as a completely inadequate reason for making the spec less consistent.
I don't think any of the things you mention actually make HTML considerably more legible or easier to write for a person. Just harder to parse for a machine.
I would rather have a strict language and solid parsers that can thoroughly and decisively reject improper markup and help stop people from making mistakes while writing it, rather than trying to interpret what they really meant after the fact.
You're only addressing half my argument. Just because the wheel only needs to be invented once doesn't mean it's a good idea to make the problem it solves needlessly complicated.
I'm of the opinion that nothing he was describing is all that much more convenient, and definitely not to the point where it makes a more complex specification worthwhile; whether one or a million parsers need to be written isn't really the point. There is some virtue in having a clean, consistent, and well-defined specification, whether for writing simpler parsers or just for being able to learn the language and be sure of how to parse it in your head. Fewer edge cases in the specification = fewer mistakes and bugs in source. Attempting to fix mistakes and bugs after the fact by guessing what someone meant, however, will just create new mistakes, hide bugs, and do less to encourage a solid understanding of the language.
> It is plain wrong to make a standard easier for machine-parsing at the expense of humans who are typing it in.
It is far worse to sacrifice consistency of the mental model behind the language for the sake of not typing two extra characters.
Core XML is much easier to read than modern HTML, because you can read it without knowing the context of what you're looking at and without memorizing tons of exceptions. It's easier to parse for the same reason.
Also, the savings from avoiding " /" are offset by the need to needlessly close some tags in HTML5 (an empty <div></div> where XML would allow <div/>).
The only really stupid things in XML I remember are the need to write checked="checked" and people using namespace prefixes on every tag. It's pretty obvious how to fix the former. The latter is entirely avoidable if you have a fully working parser.
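To illustrate the checked="checked" point:

    <!-- HTML allows a bare boolean attribute: -->
    <input type="checkbox" checked>
    <!-- XML has no value-less attributes, hence the redundant XHTML form: -->
    <input type="checkbox" checked="checked" />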
But the problem is that you now have a specific behaviour that depends on the tag name. DIV tags don't need to close before the next DIV, but LI tags do. So you've gone from a simple tag parser to one that needs to know the intricacies and rules surrounding every element type.
Personally, I feel that two extra characters per <BR/> tag is worth it.
In addition, not having to close some tags might make HTML easier to write, but it makes it more difficult to learn. I remember back in the day I had to keep looking up which tags need to be closed and which ones do not.
Nowadays I just close everything because my OCD outweighs my laziness.
Wouldn't having a reasonable schema specification solve this? It's not like you can invent arbitrary tags (there are arbitrary attributes, but an expressive enough schema language could capture that as well).
Sure, but presumably that schema will change with time. Then you'll have a parser built around HTML5, but another new release for HTML6, so on and so forth. It just needlessly complicates things.
It doesn't only make the machine overlord happy; it also helps humans when they do make a mistake.
Humans throw some HTML-like stuff at the browser and the browser tries hard to make sense out of it. If the browser misinterprets, you have a hard time finding out what went wrong.
Whereas with XML and XHTML you get told immediately what's wrong and you don't have to hope that every browser implementation works the same way.
It's also a bit strange to argue about "easier to write manually" in this day and age of Markdown, HAML, etc.
> If the browser misinterprets, you have a hard time finding out what went wrong.
This is what HTML validators are for.
Browsers should do their best to interpret the page author's intention and actually display a page. The developer doesn't always have 100% control over the page markup (think user-generated content, ads, etc.).
Sorry, that's silly. Web pages break all the time.
If pages broke because of invalid HTML, they would already have broken during your tests.
If you use HTML-injecting services like ads or analytics that turn valid HTML invalid, then it's great that the page breaks, because your tests will show you immediately that these services suck.
If browsers errored out on invalid content, those services would either provide valid content or they would go bust.
And XML/XHTML is much easier to parse, and to produce valid content for, than HTML, because it is much more consistent and carries less historical baggage.
It is just that today we still have tag soup and error-tolerant parsers in the browsers, and of course lots and lots of HTML producers putting out shitty HTML, so you can't just switch on strict parsing.
But if history had taken a slightly different turn, we would be talking about XHTML5 and not HTML5.
That's a good point, but I wasn't necessarily thinking it should error out to render nothing at all.
What I would like is if somehow the browser could insert an error class name into the offending element so that in my CSS I could give it a set of rules to make it stand out after render.
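Something like this, say (purely hypothetical: no browser sets such a class, and the "parse-error" name is made up):

    <style>
      /* Imagined behaviour: the parser tags offending elements
         with a class so CSS can make them stand out. */
      .parse-error { outline: 3px dashed red; }
    </style>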
What I'm saying is that it is "not a mistake" to omit the closing tag in some cases. There are things that are hard when it comes to parsing HTML, but when to close open tags is not one of them. Rules for closing tags are trivial to implement and well documented. (Add: also, omitting unnecessary tags such as <html> and <body>)
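For example, this is a complete, valid HTML5 document; the parser inserts the omitted html, head, and body elements into the DOM anyway:

    <!DOCTYPE html>
    <title>Hello</title>
    <p>The html, head, and body tags are implied.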
> Humans throw some HTML-like stuff at the browser and the browser tries hard to make sense out of it. If the browser misinterprets, you have a hard time finding out what went wrong.
I haven't seen any modern browser misinterpret HTML's (simple) closing rules. As for being harder to debug, I haven't seen any real evidence of that either.
> Whereas with XML and XHTML you get told immediately what's wrong and you don't have to hope that every browser implementation works the same way.
Again, do you have any evidence of such incompatibility with current or recent browsers?
> It's also a bit strange to argue about "easier to write manually" in this day and age of Markdown, HAML, etc.
Way more HTML is written by hand than Markdown and HAML. The issue isn't just saving keystrokes. The point is that whenever possible, technology should accommodate simple mistakes people make.
> It's also a bit strange to argue about "easier to write manually" in this day and age of Markdown, HAML, etc.
But, by that logic, isn't it also strange to argue about making HTML's syntax more consistent if we should be using Markdown/HAML/etc to generate it anyway?
BTW, I do agree with you that having a more consistent syntax is better than having a syntax that aims to save a few keystrokes at the expense of adding special rules. As a user, I find it more difficult to remember which cases are special than to read or write a more consistent syntax. I just don't see how your comment on Markdown/HAML helps the case for a simpler HTML syntax ;)
There is nowadays less need for writing HTML by hand. HTML is more often generated from other formats and things like Markdown, HAML, and other lightweight markup languages helped in that.
But in the end the output is HTML and having a consistent syntax makes it easier to generate, read, and debug it.
Its syntax doesn't need to be dumbed down for casual users because casual users have other options. In that sense I think my comment is in support for a better HTML syntax. :)
The tradeoff is that allowing quirks like that means that either you need to make a massive specification to deal with each way someone might goof in their code (edit: which is what HTML5 tries to do), or you end up with each browser engine reacting differently.
The main push behind stricter document control is making it easier to make all the browsers render documents consistently.
Also, in the age of XHTML, manual document editing was seen as dead: tools like DreamWeaver were popular, and XSLT was touted as the answer to server-side templates. The web refuses to be anything but a pile of dirty hacks upon dirty hacks, though, which, while frustrating, may have a hand in its popularity :)
That's why HTML5 specifies the parser, so that every browser extracts the same DOM tree from the same input.
The specification is strict in the sense that any parser has to behave the same while also allowing for human error.
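For example, misnested formatting tags like the following are a parse error, but the spec (via what it calls the "adoption agency algorithm") defines the recovery exactly, so every conforming parser builds the same tree:

    <p><b>bold <i>bold italic</b> italic</i></p>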
> ... so that every browser extracts the same DOM tree from the same input
That is, every browser whose engine has been updated for HTML5 and which also implements the parser specification correctly and without bugs. Which is most, but not all, of them.
My personal preference is to include the optional closings, because XHTML has been around a lot longer and therefore a larger proportion of browsers have been coded to handle it properly and have had more time to work out bugs.
I do like that HTML5 browsers can work around invalid markup in a well-specified way, which is much better than XHTML browsers just showing an error. It's the best of both worlds, especially when the browser's Developer Tools give you warnings about the invalid markup too so you don't need to use an external validator to find them.
You've got a system with lots of exceptions and special behaviour. I don't see that as easier for humans at all. (The machine doesn't care; it parses anything you can put in rules. But the more complex the rules are, the harder it is for you to understand the error message. XML is really easier on humans.)
"It is plain wrong to make a standard easier for machine-parsing at the expense of humans who are typing it in."
No, it isn't - especially not when the intended use case is for literally every viewer of the document to use "machine-parsing" to read it, doubly so when a significant fraction of the users will actually BE machines...
The two main advantages were XML parsing performance, and the ability to embed XML directly in the XHTML. For phones of the era, the performance benefits are obvious. As for XML embedding, it'd give you the ability to embed SVG, MathML, and any other XML language directly. This avoids a second retrieval/parsing step, and allows extensibility without changing the XHTML spec.
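For instance, in XHTML served as application/xhtml+xml, another XML vocabulary can be embedded directly via namespaces (a minimal sketch):

    <div xmlns="http://www.w3.org/1999/xhtml">
      <svg xmlns="http://www.w3.org/2000/svg" width="20" height="20">
        <circle cx="10" cy="10" r="8"/>
      </svg>
    </div>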
I think it's excellent that HTML5 completely specifies the parsing in a very clear, and most backwards-compatible way; judging by what the big browser vendors have been doing, they seem to be following it. (It also gives a nice starting point that makes it easier for anyone to write their own parser, and have it behave the same as any other mainstream browser - and having the possibility of making more browsers available, with the same standard parsing behaviour, is a good thing.)
XHTML lost; we decided that we preferred tag soup and keeping our past documents readable. Besides that, most XHTML came with an HTML mimetype, which meant it wasn't being read as XHTML. So the best bits (lighter parsing and XML embedding) were never usable.
I don't understand how there's much of any difference in readability between html and xhtml style of coding. Do closing tags really make it that much more difficult to read for some people?
There isn't a difference in readability, XHTML is just stricter. Much of the HTML that exists is invalid HTML/XHTML, and the XML parser used for XHTML would simply error out. Most XHTML pages were served as HTML, due to mimetypes being wrongly configured, so no-one ever noticed.
The XML parser was supposed to be faster, and allow any XML to be embedded in the XHTML (SVG, MathML, etc). This stuff was designed to change the shape of the web (especially since mobile phones weren't very powerful in those days).
I can see your point for the post I was responding to, but I've seen numerous posts throughout the page that seem to be debating human readability as well.
I'm sure many at the W3C would have loved to develop only a new version of XHTML, but the problem is that it breaks backwards compatibility, and that's almost impossible to impose. Any browser that tried it would see its lunch eaten by its more permissive competition.
And if they were really going to break the standard like that, at least they could have broken it in such a way that it fixed all the stupid legacy decisions.
Have it support <img>alt text</img> and <meta>content</meta>, as well as the old way, and then the developers can decide if they want to support legacy browsers. (They probably do, but at least we're looking at a future where html will be just a tad cleaner and more consistent.)
Indenting HTML is a terrible practice. HTML is not a programming language - it is document markup. Source files should read like a line-wrapped text document sprinkled with embedded tags. Let your editor keep track of open/close pairs with highlighting (the way most already do.)
When XHTML came to replace HTML4 it was such a huge relief for all OCD developers, and I thought I had seen the last non-XML-compliant web page. Now I'm encouraged to write tag soup again because void elements? Humbug.
It's the HTML5 standard that is complex and pedantic; it breaks silently when you violate one of hundreds of rules (e.g. the list of void elements that can't be closed).
XML is simple. Sure it's pedantic in the sense that it breaks, but HTML5 breaks too, only subtly.
It's like the difference between Java and JavaScript. Java isn't more "pedantic" than JS in ANY way, it just breaks in a more understandable way (breaking loudly, early, and understandably is, in my view, "better").
Which probably goes to show that most developers are not afflicted with OCD but would rather have a more lenient spec. After all, XHTML2, which is even more strict, sold like hot cakes...
Don't take the term lenient to mean "pedantic, but with very silent failures". The failures caused by forgetting to close tags in HTML5 are often catastrophic, which is why it isn't "lenient".
If HTML5 fails on some seemingly valid input (e.g. produces a strange layout when you self-close a div tag) then it isn't lenient; it's still pedantic. It's just as pedantic as XML is about closing tags, only the specification for closing tags is dozens of pages instead of three words.
In fact, I think most developers agree that an error message would be preferable to a corrupt layout in the case of the self-closed div.
I think the author's recommendations at the end, on making <meta> and <img> and <script> more sane, are good examples of where the "implement then standardize" process that the W3C uses falls down. In fact, XHTML2 (which was never implemented) had some good ideas. On the other hand, as we've seen so many times, implement then standardize reduces foot-dragging and needless bike-shedding. You take the good with the bad, I guess.
I've been burned before by using <script src="..." /> and assuming it would work in all browsers. Instead, it subsumed later tags in a horrible way. I've never used empty elements in HTML since.
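Roughly the failure mode, as I understand it (file name made up): in an HTML (non-XML) parser the trailing "/" is ignored, so the script element never closes, and everything after it is consumed as script content until a literal </script> turns up:

    <script src="app.js" />
    <p>This paragraph never renders; it was swallowed as script text.</p>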
`<script src="foo" />` only works the way you’d expect it to in XHTML. Proper XHTML, that is — served with the correct `Content-Type` header. http://mathiasbynens.be/notes/xhtml5
Read the article: there's no way to specify "optional closing tag depending on whether a `src` attribute is present", so it's mandatory. You can, of course, write the parser to do it, but there isn't a way to express it in the HTML grammar.
I've always wished that the script inside the tag would be executed if the `src` couldn't be loaded, which is something John Resig suggested years ago [1].
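Something like this hypothetical behaviour, i.e. not how any browser actually works (URLs made up); today you need a separate feature test instead:

    <!-- Imagined: the inline body runs only if src fails to load. -->
    <script src="//cdn.example.com/jquery.js">
      document.write('<script src="/js/jquery.js"><\/script>');
    </script>

    <!-- The real-world workaround: -->
    <script src="//cdn.example.com/jquery.js"></script>
    <script>
      window.jQuery || document.write('<script src="/js/jquery.js"><\/script>');
    </script>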
I think it's awesome, actually. Good neighbourship. Not that they should do this all the time (there would be no end to it), but for the most egregious problems, yes, why not?
XHTML2 was largely fantastic, imo, and would have been an excellent successor to html. If it had been what started out the XHTML process I think it would have been more successful, but XHTML1 was such a foot-in-both-worlds mess that it needed to be put out of its misery.
The big problem with XHTML2 was that it was designed by people who hated HTML. So they went and made it purposefully incompatible with HTML and XHTML1 in various ways (e.g. tags with the same localName and in the same namespace were supposed to have different behavior).
That made it impossible for a browser to implement both XHTML2 and XHTML1 at once (which was in fact the goal of some of the committee members). And when browsers were faced with the choice of implementing XHTML2 (no content at all out there) or XHTML1+HTML (lots of content out there) but not both, they picked the one you'd expect them to pick...
Actually hardly anyone wanted XHTML2, because it was a purely academic exercise in making established things harder (e.g. no more <a href="..." target="_blank">) without compelling features.
I tried to use it but then completely reverted to HTML4. Thank god we have HTML5 now.
>Optionally, a "/" character, which may be present only if the element is a void element.
>There is absolutely no difference between <br> and <br />.
>Actually, one might argue that adding / to a void tag is an ignored syntax error.
>every browser and parser should not handle <br> and <br /> any differently
If it's optional and has absolutely no effect and makes no difference, how exactly would one argue that it's an error?
To me, this is like saying `print ${SHELL}` is erroneous because the braces don't do anything and `print $SHELL` does exactly the same thing. It may be superfluous, but it's not erroneous.
It is erroneous. It only makes no difference because the error is ignored (or rather, the rendering is wrong). In HTML parsed strictly according to its SGML definition, <br/> would produce an extra ">" on each occurrence.
IIRC there was a browser (some reference implementation) that did this correctly. Also, I remember Gecko used to flag these slashes in source view too.
HTML5 has a list of tags for which a trailing solidus does not produce an error. It's still not something the parser ever uses for anything besides sometimes producing an error: "<br/>" is tolerated, whereas "<script/>", for instance, is a parse error.
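In other words:

    <br/>      <!-- void element: the slash is allowed and ignored -->
    <script/>  <!-- non-void element: the slash is a parse error and is
                    still ignored, so the element stays open -->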
It is erroneous because one of the stated goals of HTML5 is semantic value, and under any line of logic you cannot close something that isn't open; therefore it is an error, albeit not (yet?) a technical one.
"Well, for those of you who are really addicted to X(HT)ML, you might think, «yeah, it's optional, but <br /> is still 'more correct'», but I have to tell you: it is not. Actually, one might argue that adding / to a void tag is an ignored syntax error. The possibility to write it has mostly been added for compatibility reasons and every browser and parser should not handle <br> and <br /> any differently.
Google's styleguide on that subject is also very clear that you should indeed not close void tags."
It's written as the last line of that section/paragraph. It's essentially a closing argument; the reader can be drawn in no other direction. If they had said "interestingly" or something to make it obviously a small aside, then maybe I wouldn't draw this conclusion.
> Google's styleguide on that subject is also very clear that you should indeed not close void tags.
Only because it results in smaller files. For example it also recommends omitting optional tags for the same reason. I'm really skeptical that omitting these things helps readability (if that's what the guide is referring to when it says "scannability"). If size is at such a premium why not simply preprocess and minify HTML? Recently I tried briefly omitting "/>" from <br> and friends and I wasn't impressed as far as legibility goes. Maybe I just didn't try hard enough... :)
and I'm saying: if THAT is what you call "works just fine", then the whole concept of a hybrid document is broken and it's great that we've abandoned it.
So really, you have no argument, except for a distaste of XHTML. Gotcha.
It seems better to mix that with SVG than HTML5, since SVG happens to also be XML-based. Otherwise you have an impedance mismatch leading to some weird corner cases.
My argument is that you might be able to make it work through a lot of effort, but you can't actually use it in the real world, so it's worse than worthless.
Know why everyone writes <br /> instead of <br/>? IE5 on the Mac had a parser that broke if it found an empty tag without a space before the closing slash. Funny how software can vanish into the mists of time yet still have an effect on current coding.
I used the SGML NET trick a few years back in an attempt to create the shortest possible valid HTML documents for different versions of HTML: http://mathiasbynens.be/notes/minimal-html
Note: “valid” here is defined as “theoretically valid as per the relevant spec” and doesn’t reflect what browsers actually support(ed).
This is from Ian Hickson in 2006, regarding the emergence of HTML5:
"Regarding your original suggestion: based on the arguments presented by the various people taking part in this discussion, I’ve now updated the specification to allow “/” characters at the end of void elements."
To which Sam Ruby responded:
"This is big. PHP’s nl2br function is now HTML5 compliant. WordPress won’t have to completely convert to HTML4 before people who wish to author documents targeting HTML5 can do so using this software. Such efforts can now afford to proceed much more incrementally. This is much more sensible and practical possibility."
Remember that both men played fundamental roles in shaping HTML5. And I think this one sentence sums up the mindset that shaped HTML5:
"The truth is that most HTML is authored by pagans."
and this was Sam Ruby's view at the time:
"When all the religion was stripped away from the trailing slash in always-empty HTML elements discussion, only one question remained: I think basically the argument is “it would help people” and the counter argument is “it would confuse people”. This is a eminently sane way to approach discussions such as these. I would argue that it would both help people and reduce confusion if a void <a/> element continued to be invalid HTML5 and, by implication, be invalid in XHTML5. By invalid, I simply mean that a parse error would be reported by a conformance checker whenever such constructs are found in a document. Non-draconian user agents can, of course, chose to recover from this error."
People with real lives have perhaps missed the sad slow way that the argument for XML on the Web, and therefore XHTML, has imploded. But the sad souls (such as me) who have followed this story are aware that the case against XHTML has developed slowly over the years.
The first salvo against XML on the web was launched by Mark Pilgrim way back in 2004. This is when the mania for XML was at its peak (before JSON had appeared), a time when people felt XML/XPATH would eventually replace SQL and RDBMS (an idea promoted by no less an authority than Sir Timothy Berners-Lee, who, at that time, could make a believable case that RDF was the future of the Web).
This is Pilgrim's article, "XML on the Web Has Failed":
"There are things called "transcoding proxies," used by ISPs and large organizations in Japan and Russia and other countries. A transcoding proxy will automatically convert text documents from one character encoding to another. If a feed is served as text/xml, the proxy treats it like any other text document, and transcodes it. It does this strictly at the HTTP level: it gets the current encoding from the HTTP headers, transcodes the document byte for byte, sets the charset parameter in the HTTP headers, and sends the document on its way. It never looks inside the document, so it doesn't know anything about this secret place inside the document where XML just happens to store encoding information. So there's a good reason, but this means that in some cases -- such as feeds served as text/xml -- the encoding attribute in the XML document is completely ignored."
The article we are talking about "To close or not to close" states:
"XHTML is basically the same as HTML but based on XML."
This is stated as a fact, but in fact many people have made the argument that XHTML never fully functioned as XML, partly for the reasons that Pilgrim talks about, but also because only the strict versions of XHTML ever triggered the strict draconian error handling that has always been part of XML. However, there are other ways in which XHTML was difficult to treat the same as XML. For instance:
"Note that the reason to do this is to deal with bad browser sniffing where sites send HTML/XHTML markup meant to be served as text/html as application/xhtml+xml, application/xml or text/xml only to Opera, which causes Opera to encounter an XML parse error that breaks the site for Opera."
Sam Ruby is a co-chair of the W3C's HTML Working Group, and if you've read his blog over the years, you are aware of the many problems that arise when treating XHTML as XML.
Some of the debates that have happened over the years simply reveal how much reality differs from the specs:
If it was easy to develop a version of HTML that truly acted as a form of XML, would such debates have been necessary?
Please understand me: I am not criticizing all of the intelligent people who worked very hard on the specs for HTML and XML and XHTML. I am pointing out that after 15 years of effort, no one has found an easy way to treat XHTML as a form of XML under all circumstances. Surely if the brightest minds in the tech industry fail to make this work after 15 years, this is a circle that cannot be squared?
Consider the fact that companies like Google felt they had no choice but to ignore the mime type "application/xhtml+xml":
Sam Ruby also makes clear that the concessions to an XML style, including closing void elements, were thought of as an effort to ease the transition:
"I believe that if those that had created XHTML had the courage of their convictions, both Google and Microsoft would have had no choice. I also believe that there should have been a maintenance release or two of HTML4. In HTML5, the root element MAY have an xmlns attribute, but only if it matches the one defined by XHTML; and void elements may have terminating slash characters in their start element. It is these small touches that make transition easier."
Also, in another blog post Sam Ruby makes the point that the draconian error checking that is mandatory for XML also makes it impossible to develop those technologies that supporters of XML were excited about. He gave the example of sending an SVG image to his daughter, and her wanting to post it to her MySpace page: but SVG is XML, and so it should not render on a malformed page, and MySpace was permanently malformed. Sam Ruby could send a gif or a jpeg to his daughter, and she could post that, without a problem, to MySpace, but SVG was limited to well-formed, correctly served pages -- in a world where few pages are well-formed and correctly served. See the comments here:
Finally, in a post I can not find, Sam Ruby makes the point that, for some strange reason, people seemed to very much want something called XHTML, even though it would not be able to act like real XML, for all the reasons that had been discussed in thousands of blog posts and chat rooms. He seemed puzzled by it.
Anyone who advocates for XHTML needs to think long and hard about what it is, exactly, that they are advocating for. If you want an HTML that has an XML style, can you say why?
> If you want an HTML that has an XML style, can you say why?
Because I think that section 12.2 of the current HTML specification is outrageous. (The section is "Parsing HTML documents", if anyone is not familiar with it make sure to look at the subsections "Tokenization", "Tree Construction", etc.)
(That said, I appreciate your detailed comment; this is important history that too few people are aware of.)
(Also overenthusiasm for all things XML had nothing to do with RDF. RDF is not XML.)
I do it for a simple reason: layout cleanup with auto-indent. I've found HTML layout cleanup to be unreliable in most editors, whereas XML layout works 99% of the time.
HTML5 is a huge improvement over the HTML4.01/XHTML madness that was going on back in the day. And it's fine with me to allow non-closed singleton tags.
There's perhaps no strong logical argument either way, but from a style perspective, I prefer to use closing slashes to make it absolutely clear what's going on.
Allowing the void tags to be unclosed is the lesser evil of the two; I can even accept the argument behind it (they can't have content), even though it complicates the syntax.
The really evil one is not making <div /> exactly equivalent to <div></div>, which is just batshit crazy. When I want a placeholder tag (to be populated later) I have to write <div></div>, which feels completely unnatural.
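Concretely, a sketch of what an HTML parser does with it: the "/" is ignored, the div stays open, and what follows becomes its child rather than its sibling:

    <div class="placeholder" />
    <p>I end up inside the div, not after it.</p>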
There is an advantage to writing your HTML as well formed XML, and that's being able to parse it as XML if you want to. There's no disadvantage to writing your HTML as well formed XML.
The polyglot syntax gets weird in CDATA elements and you have to add a bunch of talismans to the code.
If you don't want to accidentally break it, you shouldn't be writing XML by hand or gluing it together from strings (https://hsivonen.fi/producing-xml/), so you need to produce your output with a polyglot-compatible XML+HTML serializer.
That's a lot of work for the off chance that somebody will parse your markup as XML. All bots support HTML.
I think that now with HTML5 standardising the parsing behaviour ( http://www.w3.org/TR/html5/syntax.html ), looking at that is very useful too - it shows that void elements get closed automatically by the parser whether or not "/" is included, some other extraneous end tags get ignored completely, and also shows that "</br>" gets parsed as "<br>". So the example given in the article, "<br>Hello!</br>", does have a defined meaning in HTML5 - equivalent to "<br>Hello!<br>".
I'm talking about the specs in the article (not how browsers interpret errors). So </br> may be interpreted as <br> but is actually a syntax error.
I quoted the HTML5 specification in the VALIDITY section of the article.
The fact that HTML5 basically completely specifies the parsing for any string of input, even "syntax error" cases, raises an interesting point: if these errors still result in some DOM, and all browsers that implement the (also standardised) error handling exhibit the same behaviour, are they really true "errors" anymore? We usually think of error cases (e.g. in a programming language) as ones which have no meaning or could cause implementation-defined/undefined behaviour, but these have been completely defined by the standard.
I don't see any good reason to use "</br>", but there's some other cases that could be useful, like not requiring spaces between quoted attributes (name1='value1'name2="value2"). I see a parallel with this and the evolution of natural languages: words and syntax that used to be incorrect gradually become accepted as part of the language and attain a normative meaning, because everyone still understands.
Very good article. I've been doing most of my web development in the .NET area; starting with ASP.NET and its strict XHTML, I picked up the habit of always writing the /> variant, so it's nice to read about which one to use in the HTML5 age :)
Nice write-up!
I have never thought about shrinking the closing tags to </>; if it were supported, it would shrink large HTML pages quite nicely.
Has there been a proposal at W3C to use that kind of a format back in the good old days of HTML 1.0?
I don't know about the historical aspect but I do know that the HTML5 parsing spec explicitly ignores the "</>" sequence. More interestingly, "</ >" (with an extra space) is parsed as a "boguscomment" which means it basically adds a comment node.
I feel so much better that I don't have to bother typing out <br /> anymore, a habit I picked up after running my HTML through a validator when I first began serious HTML coding (self-taught). It is by far one of the most difficult, finger-stretching pieces of code to write. Nowadays I don't even have much use for breaks, but it's going to be a relief to just throw in a <br> ... Ahhh, that was so easy to type.
One of two things will happen, depending on your browser.
If your browser is following the WebIDL spec, so all the accessors are on the prototype, this will produce "{}".
If your browser is WebKit-based, this will throw an exception, because body.firstChild.parentNode == body and JSON.stringify throws on object graphs with loops.
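A sketch of the kind of thing presumably being discussed (assuming the example was JSON.stringify on a DOM node):

    <script>
      var body = document.body;
      // The DOM is a cyclic graph: every child links back to its parent.
      console.log(body.firstChild.parentNode === body); // true
      // With accessors on the prototype (per WebIDL), stringify finds no
      // own enumerable properties and yields "{}"; with own-property
      // accessors (old WebKit), it walks the cycle and throws.
      JSON.stringify(body);
    </script>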
Not sure your example would work in a real world situation. The UL would probably have an array attached as it can have multiple children. Then we get into the fact that all tags can have attributes, not just a text node.
You're right, I didn't consider attributes. In my simplified scheme the parser would need to know which keywords were attributes (based on the parent element) versus keywords that are just new child elements, which would defeat the purpose.
All of this has been clearly outlined in the spec for decades and many articles have been written over the years talking about this same issue. Why this is a problem for any professional developer, I just don't have a clue.
I appreciate the amount of research that went into it, but in reality this all falls squarely into domain of pedantry, because you close void tags either way and move on to more important matters.
How to close void tags is more of a leitmotif to learn more about the whole subject, and the reason for investigating it. If you're not interested in understanding the core features of the markup language you're using, then this article is definitely not for you.
> I guess since HTML is so common it doesn't really matter, but really? We need 5 different types of markup when one would have been fine?
https://xkcd.com/927/