
Note that closing </p> tags are optional, so one can be an HTML purist and still write a decent HTML document with a relatively clean markup like this:

    <!DOCTYPE html>
    <html lang="en">
    <title>Lorem Ipsum</title>
    <h1>Lorem Ipsum</h1>
    <p>
    Lorem ipsum dolor sit amet, consectetur adipiscing elit.
    Duis id maximus tortor. Sed nisi ante, fermentum vel nunc
    et, tincidunt sagittis magna. In ultrices commodo lacus, id
    tristique ipsum euismod laoreet.
    <p>
    Maecenas at neque posuere, aliquet erat at, vehicula est.
    Duis aliquet elit et arcu laoreet, id pulvinar eros pretium.
    Quisque consectetur, enim semper facilisis feugiat, velit
    sapien semper arcu, eu mollis libero est et odio.
    <p>
    Curabitur fringilla interdum ante vel ultricies. Mauris
    volutpat nisi sed turpis elementum elementum. Mauris nec
    eleifend lorem. Sed ac vulputate libero.

A valid HTML5 document does not require explicit <head>, <body>, or the closing </p>, </html> tags. See the spec for optional tags at https://html.spec.whatwg.org/multipage/syntax.html#optional-... for more details. Similarly, the markup for lists and tables can be cleaned up too because the closing </li>, </tr>, </th>, </td> tags are optional.

Note that the opening <html> tag is optional too but I retained it in the above example to specify the lang attribute; otherwise the W3C markup validator warns, "Consider adding a lang attribute to the html start tag to declare the language of this document."

† These tags are optional provided certain conditions are met. See the spec for full details. In practice, one rarely has to worry about these conditions.

And since we’re talking about optional things:

  <link rel="stylesheet" href="/style.css" />
  <meta name="viewport" content="initial-scale = 1.0,maximum-scale = 1.0" />

Trailing slashes on HTML tags are useless. They’re allowed on void elements, for XML compatibility, but are by definition simply ignored. I recommend against including them, because they’re simple visual noise, and misleading because they don’t actually close tags: you can only use them on elements that are defined as having no children.

(Note that I say HTML tags; on foreign elements—meaning inline SVG and MathML—trailing slashes do make tags self-closing, XML-style.)

Also since I’m writing, that viewport declaration is wonky. It should have width=device-width, and it should not have maximum-scale, which is user-unfriendly.

All up:

  <link rel="stylesheet" href="/style.css">
  <meta name="viewport" content="width=device-width,initial-scale=1">

And for completeness, https://html.spec.whatwg.org/multipage/syntax.html#elements-... defines void and foreign elements.

They're optional in the same way that braces around single-statement "if" clauses are optional in most curly brace languages—and indentation, for that matter—i.e. they still serve a purpose for many humans who are going to be tasked with upkeep and will choose to include them for personal reasons. Not every decision is rooted in trying to satisfy the machine. But speaking of machines, on that note...

If you ever have to write any tooling for HTML processing, you'll realize that they can be useful for machines, too. If your team doesn't make use of any of these "features" that trigger corner cases in the spec, then you can adopt an XML-like parsing strategy (where these aren't optional) and your parser can be simpler than if you were to implement the entirety of the HTML5 parsing algorithm. You don't need any notion of void elements, and your parser doesn't need hardcoded lists of which elements are among them. You can write a "dumb" parser that can derive the node structure without needing any intimate knowledge of HTML. It's like the difference between parsing S-expressions and parsing an ALGOL-like language.
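To make the "dumb" parser idea concrete, here is a minimal Python sketch of a stack-based parser that assumes every element is either closed explicitly or written as a self-closing tag. The name parse_strict and the regular expression are illustrative assumptions, not part of any real library; comments, CDATA, and unquoted attribute values are not handled.

```python
import re

# One tag: </name>, <name attrs>, or <name attrs/>.
# Assumption: attribute values are quoted and never contain '<' or '>'.
TAG = re.compile(r"<(/?)([A-Za-z][\w:-]*)([^<>]*?)(/?)>")


def parse_strict(markup):
    """Return a list of (name, children) nodes using nothing but a stack.

    No void-element list is consulted: an element ends only at its own
    closing tag or at the '/>' of a self-closing tag.
    """
    root = ("#root", [])
    stack = [root]
    pos = 0
    for match in TAG.finditer(markup):
        # Text between the previous tag and this one belongs to the open element.
        text = markup[pos:match.start()].strip()
        if text:
            stack[-1][1].append(text)
        pos = match.end()
        closing, name, _attrs, self_closing = match.groups()
        if closing:
            if len(stack) == 1 or stack[-1][0] != name:
                raise ValueError(f"unexpected </{name}>")
            stack.pop()
        else:
            node = (name, [])
            stack[-1][1].append(node)
            if not self_closing:
                stack.append(node)
    if len(stack) > 1:
        raise ValueError(f"unclosed <{stack[-1][0]}>")
    # Any text after the last tag stays at the top level.
    text = markup[pos:].strip()
    if text:
        root[1].append(text)
    return root[1]


tree = parse_strict('<ul><li>one</li><li>two<br/></li></ul>')
print(tree)
```

With this input convention the list above parses into a nested tree, while something like `<p>one<br>two</p>` is rejected with a ValueError, because the parser has no idea that br is a void element.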

I do think there’s an important difference from the optionality of curly braces on if statements in many languages: syntax highlighting will normally make it very obvious what is attribute name and what is attribute value, so that any error will be obvious, more obvious than the probable incorrectness of something like `if (a) b; c;` on one line.

Your point on HTML processing is a nice idea, but quite useless in practice for HTML. XHTML failed: people didn’t want to go to the effort of getting it all rigorously correct; they rather wanted the browser to guess what they meant, because it got their intent right most of the time, and now they could forget about various details like tbody elements and trailing slashes. I regularly look at page sources, and I regularly encounter people putting these trailing slashes on various void elements, but I can’t remember when I last found a page that actually applied that consistently; invariably they have at least one void element without a trailing slash. My conclusion is that the whole thing is misguided. I would be curious to see the result of attempting to parse all the HTML in something like Common Crawl with an XML parser. I suspect that barely any pages would succeed.

The fact of the matter is that the HTML parsing algorithm is well-defined and very nuanced, so if you’re processing HTML you should use a real HTML parser, and to do anything else is folly—unless you are assiduous about maintaining XML correctness, which you can do, but it’ll be even more of a footgun for others than omitting quotes on attribute values. But if you’re writing something new, then by all means, strongly consider the comparatively simple and principled XML philosophy over the organic and complicated philosophies of the HTML serialisation.

This comment comes off as very, very, very... weird. A little condescension and a lot of pretending that the goalposts are there when before they were here (and still are, too); almost no "situational awareness" or acknowledgement of the actual context, constraints, and motivation behind making a choice about what to do. Syntax highlighting (esp. for attributes[???]) has _nothing_ to do with what we're talking about (trailing slashes, void elements), for example—and it gets worse from there.

The logical coherence of this response is in "not even wrong" territory.


Frontend developer here --

Although you could omit the closing tags, I don't see the benefit of doing so. If you know HTML, nesting is fundamental and not explicitly closing DOM nodes would lead to confusion. You would also need to concern yourself with the "certain conditions" that must be met for it to work. Consistency and clarity over brevity!

I am not a frontend developer. I agree with you. I write my blog posts with handwritten HTML because that is how I began writing blog posts many years ago when Markdown was not as popular as it is now. Indeed I never omit any optional tags while writing my blog posts or blog layout.

I am not necessarily recommending that one should omit the optional tags. However, it is worth noting that the option to do so while conforming to the HTML5 spec is there. The "certain conditions" are not really much to worry about. I think they are drafted quite carefully and are quite sensible. If one is writing simple HTML documents, say, for blog posts, text-based articles, etc. one can safely omit the optional tags without running into issues due to the "certain conditions".

> Indeed I never omit any optional tags while writing my blog posts or blog layout.

Do you type <tbody> every time you write a table? That is an optional implicit tag that can be left out just like <html>, <head>, and <body>.

I don't type <thead> and <tbody>. I believe that's an exception to the practice of never omitting optional tags. Maybe there are a few more exceptions like that but none that I can remember right now. Thanks for posting this comment. It made me realize that my previous claim was inaccurate.

I write plenty of HTML by hand, for myself. I prefer to omit things like </p>, </li>, </td> and </tr>, because it takes less effort (and I hate text editor plugins that automatically add closing delimiters of any form, because they always do the wrong thing a meaningful fraction of the time in a way that I have to think about, more than if I just type the delimiters myself, though an accurate “insert at the cursor whatever is needed to close the last thing” shortcut might be handy), and reduces visual noise.

I also normally omit quotes on attribute values if correct to do so.

I mostly do these things when working on things of my own, when I know no one else needs to worry about them. When working on things others will touch, I don’t drop quite as many closing tags, and will normally leave attribute values quoted.

> I hate text editor plugins that automatically add closing delimiters

Glad I'm not the only one. The less an editor does automatically to "help" me the better. I agree, a shortcut to close the last opening syntax element would be nice, but for this to work properly the editor needs to be aware of how all the syntax elements interact, e.g. to differentiate between '<' used in a logical comparison and the same symbol as start of an XML tag. Or to correctly close an XML tag no matter if it has attributes.

I have a shortcut in Vim for three different kinds of brackets but it's a bit janky. Still better than the automatic stuff the typical IDE and modern editor does without asking, though.
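A rough sketch of the "close the last thing" shortcut for HTML, in Python, assuming the editor can hand over the text before the cursor. The names closing_tag_for and OpenTags are hypothetical; tokenizing is left to the standard library's html.parser, which copes with attributes. The sketch knows nothing about optional end tags, so it always proposes an explicit close tag for the innermost open element.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they are never "open".
VOID = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
})


class OpenTags(HTMLParser):
    """Track which elements are still open after the text fed so far."""

    def __init__(self):
        super().__init__()
        self.stack = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching start tag; ignore end tags with no match.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def closing_tag_for(text_before_cursor):
    """Return the close tag for the innermost open element, or ''."""
    parser = OpenTags()
    parser.feed(text_before_cursor)
    return f"</{parser.stack[-1]}>" if parser.stack else ""


print(closing_tag_for('<ul class="x"><li>one<br>'))
```

Binding something like this to a key keeps the decision with the person typing, which seems to be the point of the complaint about automatic insertion.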

> You would also need to concern yourself with the "certain conditions" that must be met for it to work.

You have to concern yourself with them anyway. If you do something that automatically closes an element, it's automatically closed at that point whether you put a close tag somewhere later on (that will be ignored) or not. This is like semicolon insertion in JS: the fact you're using semicolons does not mean you can ignore the rules for how they're inserted.
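The point about an element being closed wherever the rules say so can be illustrated with a small Python sketch on top of the standard library's html.parser tokenizer. The CLOSED_BY table is a hand-picked, heavily simplified subset of the rules, chosen only for illustration; the real HTML parsing algorithm has many more rules, scoping conditions, and special cases for stray end tags, which this sketch simply drops.

```python
from html.parser import HTMLParser

# Simplified assumption: an open element (key) is closed implicitly when one
# of the listed start tags arrives. The spec's actual rules are much richer.
CLOSED_BY = {
    "p": {"p", "div", "ul", "ol", "h1", "h2", "h3", "table", "pre", "blockquote"},
    "li": {"li"},
    "td": {"td", "th", "tr"},
    "th": {"td", "th", "tr"},
    "tr": {"tr"},
}
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


class Outline(HTMLParser):
    """Record each start tag, indented by its nesting depth."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.lines = []

    def handle_starttag(self, tag, attrs):
        # Close open elements whose end is implied by this start tag.
        while self.stack and tag in CLOSED_BY.get(self.stack[-1], ()):
            self.stack.pop()
        self.lines.append("  " * len(self.stack) + tag)
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # A matching end tag pops back to its element; a stray one is dropped.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def outline(markup):
    parser = Outline()
    parser.feed(markup)
    parser.close()
    return parser.lines


print(outline("<p>one<p>two<ul><li>a<li>b</ul>"))
print(outline("<p>one<div>two</div></p>"))
```

In the second call the div ends up as a sibling of the p rather than a child, even though a </p> appears after it: the paragraph was already closed when the div started, and the late close tag cannot change that.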

Especially with HTML, as it's going to get minimized and hacked up to reduce the file size. Including the closing tags makes the transpiler's job easier and less error-prone.

A transpiler that gets confused when optional tags are missing (a feature explicitly allowed by the spec) is a broken transpiler and it needs to be fixed. This is like the JavaScript automatic semicolon insertion debate[1][2] all over again. These things are spelled out in the standards, and tools that do not adhere to the standards are broken.

[1] https://web.archive.org/web/20201206065632/http://inimino.or...

[2] https://blog.izs.me/2010/12/an-open-letter-to-javascript-lea...

Yeah, fine. The tool is broken. Right.

But YOU still have an issue.

The point is: there are a lot of broken tools out there, and you can't know which of them will be used in the future. Just save your future self and your colleagues a lot of headaches by not testing the spec compliance of all those tools you'll probably use at some point.

> But YOU still have an issue.

I disagree.

> There are a lot of broken tools out there

A tool that incorrectly handles optional tags may handle other parts of the spec incorrectly too. Such a tool may provide incorrect results for even perfectly well-written HTML. There is no knowing what it takes to make all the broken parsers out there happy.

I know you made a point about ETL tools[1] where XML parsers are used to parse HTML, but there is no way to cater to such absurd use cases. Using an XML parser to parse HTML5 is not going to work correctly even if you do retain the optional tags, because it would fail on other HTML5 tags that do not have closing tags, such as <meta>, <link>, <img>, etc., and on empty attributes like <input disabled>, <input required>, etc. Web developers from all around the world are not going to start writing self-closing <img /> tags just because these broken ETL tools have decided to use an XML parser to parse HTML5.
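A quick way to see this is to feed a few fragments to an actual XML parser. The sketch below uses Python's xml.etree.ElementTree; the helper name xml_error and the sample fragments are made up for illustration.

```python
import xml.etree.ElementTree as ET


def xml_error(markup):
    """Return None if markup is well-formed XML, else the parser's message."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as exc:
        return str(exc)
    return None


# Hypothetical fragments: the first three are fine as HTML, the last is XML-style.
samples = {
    "void element without slash": '<p><img src="a.png"></p>',
    "attribute without value": "<form><input disabled/></form>",
    "omitted </li>": "<ul><li>a<li>b</ul>",
    "XML-style markup": '<p><img src="a.png"/></p>',
}

for label, markup in samples.items():
    print(f"{label}: {xml_error(markup) or 'ok'}")
```

Only the last fragment gets through. Keeping the optional closing tags would fix the third case but not the first two.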

There are plenty of good HTML5 parsers out there for almost every mainstream programming language. Just use them.

[1] https://news.ycombinator.com/item?id=25708209

Most of the "plenty of good HTML5 parsers out there" are broken. No wonder as the spec is nuts. (It took years before there was even a correctly working validator).

Also I was explicitly talking about XML compatible HTML. It's called so because it's XML compatible.

Btw, have you ever seen HTML in the web browser dev tools? Guess why it shows always the "optional" tags. ;-)

> Most of the "plenty of good HTML5 parsers out there" are broken.

Can you name a few popular and widely used HTML5 parsers that are broken and tell us what the bugs are in those parsers? I would be surprised if you can find or name even two such parsers that are popular but cannot handle optional tags correctly as required by the spec.

> Also I was explicitly talking about XML compatible HTML.

There is no such thing as XML compatible HTML (unless you mean XHTML, which we are not discussing here). Maybe you mean XML-serialized HTML5. I can only guess since the terminology you are using is vague and unclear. In any case, HTML5 by itself is incompatible with XML. I mentioned this in my previous comment. Void elements such as <meta> and <img> need not be written as self-closing tags in HTML5, which makes such markup incompatible with XML. XML-serialized HTML5 is, however, compatible with XML by definition, and in that case one would use an XML parser, not an HTML5 parser. More importantly, you can safely omit the optional tags and still convert your HTML5 document into an XML-serialized HTML5 document without any issues whatsoever. This was explained to you by anjbe here at https://news.ycombinator.com/item?id=25706163. He is absolutely right.

> Btw, have you ever seen HTML in the web browser dev tools? Guess why it shows always the "optional" tags. ;-)

You see all the tags there because it shows the entire DOM. The browser automatically creates the elements when optional tags are not explicitly present in the HTML. This is all spelled out in the spec very clearly. Any HTML5 parser worth its name follows the spec. I am not sure what your point is here.

See https://html.spec.whatwg.org/multipage/syntax.html#optional-... for details, especially:

"Omitting an element's start tag in the situations described below does not mean the element is not present; it is implied, but it is still there. For example, an HTML document always has a root html element, even if the string <html> doesn't appear anywhere in the markup."

I hope that explains why you always see the elements for the optional tags in a web browser's developer tools.

The minimizer should output HTML without unnecessary tags, and it should probably expect to run on its own output, so I don't see why adding unnecessary tags would help it work.

The man who collects rocks in his pockets eventually drowns.

That's right, but it's probably more interesting that HTML 5 simply hard-coded these rules based on the tag inference features of SGML and the particular per-element tag omission indicators of HTML 4 and earlier SGML DTDs for HTML (see links on how head and body elements in your example document are inferred by SGML in detail).

[1]: https://www.youtube.com/watch?v=jy-b4jeJSas&list=PLQpqh98e9R...

[2]: http://sgmljs.net/blog/blog1701.html (the "Talk" link for slides)

> the opening <html> tag is optional too but I retained it in the above example to specify the lang

> <html lang="en">

Yeah, no. The text is in Latin (la), not in English (en).

la-gb? (Latin, Gibberish?)

~Any references to support your claim?~

[Edit: I notice now that the content is indeed written in Latin. It contains the "Lorem Ipsum" placeholder text. Nice catch! :-)]

To be fair, you did write your post in Latin:

> Lorem ipsum dolor sit amet, consectetur adipiscing elit.

> Duis id maximus tortor. Sed nisi ante, fermentum vel nunc

> et, tincidunt sagittis magna. In ultrices commodo lacus, id

> tristique ipsum euismod laoreet.

Hmm I always thought lorem ipsum was a text with letter usage histograms similar to English, without distracting the reader with meaning. I could have sworn I read that somewhere...

Lorem ipsum is words/phrases copied from one of Cicero's writings, "De finibus bonorum et malorum". It's not a straight copy, kind of like if you copied every other word in places, but it's definitely Latin.

It does likely have relatively similar usage patterns to English in terms of letter distribution, if only because they're both Indo-European languages and English has deep vocabulary ties to Latin through Norman influences.

This is true. And still it is (nonsensical) Latin. It's chosen because Latin looks quite similar to English text. They use the same alphabet and word length tends to be similar.

You are almost right: The distribution of letters and length of words is similar to that of the Latin language. But it is not Latin and the text does not make sense. It's gibberish.

Those are Latin words, drawn from a Latin source.

“Pencil what comma building twenty section human fedora”

is English, even if it makes no sense.

> Those are Latin words, drawn from a Latin source

No, "Lorem" is "dolorem" with the first part chopped off, "adipiscing" and "elit" are mangled non-words, too, and so on.

"Pencil what correct horse battery staple" can be called a sequence of English words. If someone who doesn't know what they're doing writes "ncil what correctando taple", though, then you're no longer in a position to say, "Those are English words".

The words are English words, but many would argue it's not an English sentence.

Please never ever break XML compatibility!

Not having valid XML in the first place complicates any further processing quite a lot. Also you're going to run into annoying and / or strange issues with tooling.

Those HTML shortcuts are just not worth it. Their value is "questionable" (to put it kindly) but down the road their cost can become surprisingly high.

You don’t seem to realize that your post only applies to xhtml doctypes and your concern is extremely outdated to boot.

Use an html parser to parse html.

You also are extremely off on your estimation of how common xhtml is on the web since you thought this would be a useful PSA and you seem unaware of what <!doctype html> means here, as it specifically is not xml. I’m not trying to be mean, but you came in with guns blazing with weird advice and it seems very misled.

Note that the XML serialisation of HTML is still a thing: if you open a .xhtml file or navigate to something served with the content-type application/xhtml+xml, the XML parser will be used instead of the HTML parser.

Implicit and optional tags have been part of HTML for decades. A valid HTML page that uses them is completely unambiguous. Nothing prevents it from being parsed and converted to an XML‐compatible document on the fly by tools.

Writing HTML in an XML‐like fashion has its own quirks since HTML is not parsed the same way. Can you add elements within an <img></img>, or self‐close a <span/>?
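For comparison, here is what an XML parser makes of those two cases, using Python's xml.etree.ElementTree. The helper name first_child is an assumption for illustration. In XML, <span/> is an empty element and the text after it becomes a sibling, and an img element may contain text. An HTML parser, as noted elsewhere in this thread, ignores the slash on <span/>, and img is a void element that cannot have children.

```python
import xml.etree.ElementTree as ET


def first_child(markup):
    """Parse markup as XML and describe the root's first child element."""
    child = ET.fromstring(markup)[0]
    return {"tag": child.tag, "text": child.text, "tail": child.tail}


# XML: the span is empty, and "text" follows it as a sibling (its tail).
print(first_child("<p><span/>text</p>"))

# XML: the img element contains the text "alt".
print(first_child("<p><img>alt</img></p>"))
```

So the same bytes can yield different trees depending on which parser reads them, which is the quirk being described.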

Automatic conversion to XML is not always possible cleanly as those documents tend to not follow all rules (which can happen for example by template nesting). At that point you have a headache.

The point of being XML compatible is not about the browser environment. It's about tools and processes that work with XML (and therefore assume valid XML) and use your HTML as input. That's still quite a common thing. Even if you don't do it today, you can't know whether you or someone else is going to need it tomorrow.

Being XML compatible also opens up the possibility to use powerful tools like transformations and queries right on the raw HTML data in ad-hoc scenarios.

Sure, that's nothing you would usually do on a private blog, but in more enterprisey settings all kinds of processing happens on all kinds of data, quite often including web page contents. My experience after working in such environments is that not keeping HTML XML compatible will cause some serious trouble eventually.

I haven't encountered a tool that uses an XML parser against HTML in over a decade.

Maybe you just didn't do any ETL things lately. :-)

Only one branch of the HTML family tree was ever XML compatible. I was a heavy user of XHTML in the day, so I'm sympathetic... But it's misleading to characterize non-XML variants of HTML as questionable shortcuts.

Strictly speaking, HTML is still XML-compatible—the XML serialisation is still a thing, though pretty rare these days. But yeah, the HTML serialisation is not XML-compatible, though you can easily write documents that will be parsed identically by the XML and HTML parsers.

This is a great point. Doesn't it mean that "markdown" is kind of redundant?

> † These tags

Could you explain the use of that "cross" symbol, versus what I use normally, [0] [1] etc.?

The dagger (the typographic name of that symbol) is usually employed as a footnote indicator if an asterisk is already present.

It would be valid, but wouldn't Google penalize it for being poor html?

Under what definition is leaving off optional closing tags “poor html”? It is perfectly valid, part of the spec, very widely used, and completely unambiguous.

If it doesn't, it should.

It makes the life of parsers a lot harder.
