A decade ago, in an act of extreme futility, I wrote a book about HTML5. I did the mailing list archaeological dig to discover the logic behind these and other new (at the time) elements. There really wasn’t any. The spec editor just made them up on a whim with very eccentric definitions. I found that very frustrating as I saw it as inflicting a whole array of meaningless choices on front end folks for years to come. A decade later, and folks are still earnestly trying to divine the wisdom of the spec. No need — there isn’t any. There’s no there there. I don’t blame the author for trying, but I do blame the spec author for a very silly rabbit hole that people are still falling down to this day.
The ill-fated XHTML 2.0 was where the academics with actual interest in the semantic meanings of tags got too busy trying to cardinalize their semantic meanings. My understanding was that HTML5 "imported" some of the tag names but never had an interest in the intended semantic meanings as that was part of the schism that killed XHTML 2.0 and was something HTML5 wanted to avoid entirely for pragmatic reasons.
Under that understanding I think you can probably still find interesting semantic versions of these tags in the XHTML 2.0 mailing lists and schism discussions. They aren't relevant to HTML's present, but might be interesting for someone truly curious about the path not taken in semantic HTML (the path unlikely at this point to ever be taken).
I was curious, so I compared the lists of element tags in HTML 4.0 and XHTML 2.0. Leaving the XForms module out of XHTML 2.0, HTML 4.0 has 91 tags and XHTML 2.0 reduces that to 67.
AFAICT, XHTML 2.0 reorganized tags into modules, yes, but didn't actually try to expand the set of semantic tags, except for XForms--the XForms module looks really complex. And those module groupings were more concerned with functionality, not content semantics, per se.
As far as I recall, that "final" draft of XHTML 2.0 that the W3C posted is "post-schism", put out just to have something to compete with the growing momentum of HTML5 and to kick the semantic can down the road again to XHTML 3.0 (after most of the damage of the schism was already done). I recall early XHTML 2 drafts had at least article, aside, section, hgroup, and others. I don't know where you would track down such drafts other than combing ancient mailing list archives.
Section and article make sense as "parts of a book". However, unlike in HTML, an article there is always below a section in the hierarchy; it is actually below paragraphs as well. This scheme is common in legal texts in many languages; I don't know if that is the case in the USA.
The hgroup element also seems to be related to this.
My reasoning has always been that an article is a separable entity, which can do without the given context. (E.g., you can share it, or you can present multiple of them in varying order.) So a document may have sections, which may include articles, which in turn include sections, like the table of contents, a section of images, etc. So there's no distinctive hierarchy to them, as each may contain the other. (Mind that this is somewhat different from the use of articles in legal documents, which are integral elements of that document and lose meaning, if provided out of context.)
While any such interpretation is somewhat funny in the context of the parent comment, it may still turn out useful. E.g., if we were to scrape any content from an existing site in order to reintegrate it for a relaunch or a similar purpose.
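To make the scraping idea concrete, here is a minimal sketch using only Python's standard html.parser. The sample markup, the ArticleExtractor class and the extract_articles helper are all made up for illustration; it assumes reasonably well-formed markup and simply treats each outermost <article> as one separable unit, whatever sections it sits in or contains.

```python
from html.parser import HTMLParser

# Hypothetical sample page: a section holding two articles, one of which
# contains a section of its own, so neither element strictly outranks the other.
SAMPLE = """
<section>
  <h1>News</h1>
  <article><h2>First story</h2><p>Alpha.</p></article>
  <article>
    <h2>Second story</h2>
    <section><h3>Images</h3><p>Beta.</p></section>
  </article>
</section>
"""


class ArticleExtractor(HTMLParser):
    """Collect the text of each outermost <article> element."""

    def __init__(self):
        super().__init__()
        self.depth = 0       # number of <article> elements currently open
        self.chunks = []     # text pieces of the article being read
        self.articles = []   # finished articles, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "article" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                # The outermost article just closed: store its text.
                self.articles.append(" ".join(self.chunks))
                self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        # Keep only non-empty text that sits inside an article.
        if self.depth > 0 and text:
            self.chunks.append(text)


def extract_articles(html):
    """Return the text of every outermost <article> in the given markup."""
    parser = ArticleExtractor()
    parser.feed(html)
    parser.close()
    return parser.articles


if __name__ == "__main__":
    for body in extract_articles(SAMPLE):
        print(body)
```

The "News" heading is skipped because it belongs to the surrounding section, while the "Images" section travels along with the article that contains it. Real-world pages would need more care (nested articles such as comments, broken markup), so take this as a sketch rather than a scraper.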
And, while we're at it, a div is really just a technical means for applying something to a group of elements (e.g., in its original use, an attribute for centered text presentation); think of it like a block in programming. Nothing semantic to see here, keep calm and carry on…
BTW, thanks for mentioning the hgroup, which is often overlooked, but really makes sense, when combining headings and subheadings, which are to be understood as a single item (like the head of an article, yes, an article in the common sense).
The actual specification of article and section elements in HTML is pretty much what you said.
My issue with them is not with their roles, but with their names. And, from the article and from OP, it seems I'm not the only one. I think "region" as it is used by WAI-ARIA would be a better name. Also something like "contentinfo" instead of "footer". And "complementary" instead of "aside"...
> E.g., if we were to scrape any content from an existing site in order to reintegrate it for a relaunch or a similar purpose.
The spec calls this "outline".
Related to divs: I find it ironic that building pages with tables was frowned upon 20 years ago, yet it is hot again now, except we are calling them "grids".
They say the lack of usage of hgroup is due to the lack of support by screen readers. Another common use case is <h1>Chapter 1</h1><h2>Foobar</h2>.
Regarding the table irony, see also the common use of table, table-row, and table-cell display styles for anything but actual tables. ("If I'm using divs, it's fine!") :-)
(Tables should even be more accessible, since there is <th>, both in <thead> and with `scope="row"` for table rows.)
Something I've been guilty of (sometimes) for emulating hgroup: <h1>Heading<br /><small>Subhead</small></h1>.
In both cases article means something like "an atom of content". In legalese each statement is a separate article; in other contexts an entire book can be an article.
Heh, that's a little surprising. I never paid too much attention to the HTML5 element bikeshedding; I always assumed it was (like html) cribbed from sgml/docbook - but simplified (rather than randomly dumbed down).
So normally I'd probably go look at something like:
However - tfa was a lot more interesting and pragmatic - giving good advice on accessibility; something that is actually worthwhile and not just silly bikeshedding...
Yeah, in this case, from memory, the spec author had a list of class names from a scraped HTML data set. He looked at the most common classes — nav, header, footer, and so on — and declared they should be made native elements.
Which would have been fine, except even the most obvious ones (header, footer) were given very idiosyncratic definitions, and others (article, section, aside) were seemingly thrown in at random.
This led to absurd examples where, as is still in the spec, a blog comment is an <article> and the comment's header is a <footer>. This of course undermined the original premise -- that these elements were just 'paving the cowpaths' of how people were authoring HTML, but the spec author would have none of it and shipped them all the same. And here we are. :)
This has been a thought that has been kind of recurring for me the last few weeks.
I think the problem of bad architectural decisions has been understated, possibly since the dawn of computing.
Examples of greatness are HTTP and SMTP. I'd argue most cases don't even need HTTP/2 or HTTP/3 -- getting rid of all the tracking/bloatware on the modern internet would deliver more value for the end-customer but that's not the direction we're going in. The industry has boomerang'd back to nearly static sites (I think there was progress made, IMO SSR is a local maximum), so honestly HTTP/1 is often good enough.
An example of sadness might be Bluetooth. Every time I sit down to look at some docs that involve it (mobile, otherwise) I am horrified anew.
Bad standards basically set the entire industry back person-decades.