Regardless of the original intent (or lack thereof) behind them, HTML5 tags are more or less syntactic sugar for ARIA roles.
Which means, if you care about accessibility and want to pick the best element for the circumstances, you must know that you are deciding not on a tag but on an element’s role.
Which means, the moment you find yourself asking “should I use <section> or <article>?”, or “should I use <nav> or <toolbar>?”, etc., you… might as well throw a <div> there for right now, and as soon as you’re done with the task head to ARIA docs. It’s a warning sign that you are lacking clarity as to the content hierarchy you’re dealing with.
Consult the source of truth (ARIA docs) instead of picking a role by proxy by reading HTML tag docs; give your div a well-informed choice of a role, and leave it to your linter to complain if it could be made briefer by changing it to a different tag that has the same role implicitly.
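To make the "leave it to your linter" part concrete, here is a minimal sketch in Python using only the standard library. The role-to-tag table is a partial, hand-written assumption based on the implicit role mappings (it is not a complete or authoritative list), and the names `ROLE_TO_TAG`, `RoleLinter`, and `lint_roles` are made up for illustration.

```python
from html.parser import HTMLParser

# Partial table of ARIA roles that an HTML element carries implicitly.
# Illustrative only: <section> is only a region when it has an accessible
# name, and <header>/<footer> only map to banner/contentinfo when they are
# not nested inside sectioning content.
ROLE_TO_TAG = {
    "article": "article",
    "navigation": "nav",
    "main": "main",
    "complementary": "aside",
    "region": "section",
    "banner": "header",
    "contentinfo": "footer",
}


class RoleLinter(HTMLParser):
    """Collects suggestions for <div role=...> that a native tag could replace."""

    def __init__(self):
        super().__init__()
        self.suggestions = []

    def handle_starttag(self, tag, attrs):
        role = dict(attrs).get("role")
        if tag == "div" and role in ROLE_TO_TAG:
            self.suggestions.append(
                f'<div role="{role}"> could be <{ROLE_TO_TAG[role]}>'
            )


def lint_roles(html):
    """Return one suggestion string per replaceable <div role=...>."""
    linter = RoleLinter()
    linter.feed(html)
    linter.close()
    return linter.suggestions
```

A role such as toolbar has no matching HTML element, so a `<div role="toolbar">` passes through without a suggestion.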
The main mental model I use for this is pretty straightforward. Is this content something that could arrive standalone in my RSS reader and still make sense? Then it might be an <article>. Would this content have a styled box around it in the UI? Then it might be a <section>, unless something more specific like <nav> applies. At the end of the day though, this is a fine yak to shave and most any option you go with is fine so long as you've at least thought about what you're doing and why.
Aren't the names <article> and <section> pretty descriptive already? In plain English it seems reasonable to say "I read the first section of this article", while "I read the first article of this section" just doesn't feel right.
I realize that not everyone is a native English speaker, but in the case of these two words it seems to me that the non-technical, casual definitions of article and section are enough to differentiate them.
> while "I read the first article of this section" just doesn't feel right.
Articles can go in a section, but an article "of" a section implies that being contained by a section is an ~identifying feature of the article, which is at odds with our intuitions of how articles and sections work. "I read the first article in the Editorials section" is, at least to my brain, a perfectly cromulent usage that puts the "article" subordinate to the "section", though.
The linked article explicitly claims that <article> does not correspond to our first intuition from plain language and that it's more like an "article of clothing" than a written article.
While we're on the subject of structuring content in HTML, guess what folks? <div> tags are now entirely unnecessary in almost every circumstance! They carry no semantic meaning in and of themselves and the only way they differ from <span> (which equally offers no meaning) is default display:block styling.
*Use custom elements.* =) We now have carte blanche in modern HTML specs and web browsers to invent our own elements when the built-in semantic variety won't cut the mustard. In one example, the article used a <div> for a newsletter subscription area instead of <section> or <article>. Why div? You could actually add <newsletter-subscription> and now you have a tag you can style and utilize PLUS it conveys meaning within the DOM.
Note that I haven't yet mentioned web components…that's because you don't need to author a web component to use a custom element. A custom element with a hyphenated name starts life out as a plain HTMLElement object in the DOM (a name without a hyphen would give you an HTMLUnknownElement instead). If you want to keep it there, that's perfectly fine! As an optional step, maybe you do want to attach some additional behavior via JavaScript. Now you can upgrade it using customElements.define and boom! It's a web component.
I've actually set up linters in some projects now where HTML is checked for <div>/<span> tags and you get an error unless you consciously opt-in for those particular cases. Again, in almost every respect (perhaps a little more relaxed if you're deep in the bowels of a component shadow DOM, but still…), I find reaching for a true semantic element or a custom element is really the way to go in 2022+.
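A rough idea of what such a check could look like, as a Python sketch with the standard library only. The opt-in marker `data-generic-ok` is a hypothetical attribute invented for this example, not something from a real linter or from the HTML spec.

```python
from html.parser import HTMLParser

GENERIC_TAGS = {"div", "span"}
OPT_IN_ATTR = "data-generic-ok"  # hypothetical opt-in marker, not a standard attribute


class GenericTagLinter(HTMLParser):
    """Reports every <div>/<span> that was not consciously opted in."""

    def __init__(self):
        super().__init__()
        self.errors = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a bare attribute has value None.
        if tag in GENERIC_TAGS and OPT_IN_ATTR not in dict(attrs):
            self.errors.append(f"<{tag}> used without {OPT_IN_ATTR}")


def lint_generic(html):
    """Return one error string per generic element lacking the opt-in marker."""
    linter = GenericTagLinter()
    linter.feed(html)
    linter.close()
    return linter.errors
```

Semantic elements and custom elements pass untouched; only the generic ones need the explicit marker.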
<div class=article-section>
  <div>Hammer</div>
  <div>
    Buy this and everything will start to look like a nail
  </div>
</div>
Rather than this:
<article-section>
  <article-name>Hammer</article-name>
  <article-description>
    Buy this and everything will start to look like a nail.
  </article-description>
</article-section>
2: Web scraping. Div soup makes scraping harder than a doc of nicely named elements.
But I can usually grasp very quickly "Ah yes, inside an article-section, the name comes first, and the description comes second".
Writers of scraping software don't like ":nth-of-type(3)" though. They prefer ".price", as it is easier to write and will survive more changes to the site.
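A small Python sketch of why the class-based selector tends to survive longer. It uses `xml.etree.ElementTree`, whose limited XPath support stands in for CSS selectors here; the "redesigned" markup is a hypothetical change invented for the example, and the snippets are written as well-formed XML (quoted attributes) so the parser accepts them.

```python
import xml.etree.ElementTree as ET

ORIGINAL = """
<div class="article-section">
  <div>Hammer</div>
  <div>Buy this and everything will start to look like a nail</div>
  <div class="price">123.45</div>
</div>
""".strip()

# Hypothetical redesign: a badge is inserted before the product name.
REDESIGNED = """
<div class="article-section">
  <div>New!</div>
  <div>Hammer</div>
  <div>Buy this and everything will start to look like a nail</div>
  <div class="price">123.45</div>
</div>
""".strip()


def price_by_position(root):
    # Rough equivalent of div:nth-of-type(3): the third <div> child.
    return root.find("./div[3]").text


def price_by_class(root):
    # Rough equivalent of .price; this matches the exact attribute value only,
    # whereas a CSS class selector matches one token among several.
    return root.find("./div[@class='price']").text


original = ET.fromstring(ORIGINAL)
redesigned = ET.fromstring(REDESIGNED)
```

On the original markup both functions return the price. After the redesign the positional lookup silently returns the description text, while the class lookup still finds the price.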
<div class=article-section>
  <div>Hammer</div>
  <div>
    Buy this and everything will start to look like a nail
  </div>
  <div class=price>123.45</div>
</div>
How would you use XPath, and how would it be more robust than '.price'?
It is the most robust. It picks the price in the article and not some other price that might be on the page. And if there is no price, you get an exception that there is no price, rather than an empty result.
First off, I would probably want to use price as the starting point: if .price is there and its contents, once trimmed, parse as a decimal, cool.
But maybe there isn't a .price because someone has changed it to .amount for some reason. Or they started using Material-UI or some other framework and no longer have any meaningful classes anywhere, which is what I most often see people complaining about.
Then you would probably get an absolute XPath to the item you want and pass that element to the getRobustXPath(element, document) function from the linked library under discussion, https://github.com/cyluxx/robula-plus, to get a robust XPath. This XPath will probably be pretty unreadable, but you can then test that it gets your stuff.
How will it be more robust - well that depends but if I have something like this
<div class=article-section>
  <div>Hammer</div>
  <div>
    Buy this and everything will start to look like a nail
  </div>
  <div class=price>123.45</div>
</div>
I might first off want to get products, their descriptions, and their prices together. I would probably say: they may be under this kind of class-based structure, but that can change. What is less likely to change is that a 'price' will be:
A number with either . or , as the potential decimal delimiter, with potential currency symbols either directly in the element with the number, not found, or in a directly preceding or following element.
It would be useful to find prices on the page before everything else, and from there to figure out where product names and descriptions are in relation to the price.
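The "what does a price look like" test can be sketched as a small Python function. This is only an illustration under stated assumptions: the currency symbol set is an arbitrary sample, thousands separators are not handled, currency symbols in neighbouring elements are ignored, and `PRICE_RE` and `parse_price` are names made up for the example.

```python
import re
from decimal import Decimal

# Optional currency symbol, digits, an optional decimal part using "." or ",",
# and an optional trailing currency symbol. Thousands separators are not handled.
PRICE_RE = re.compile(r"^\s*[$€£¥]?\s*(\d+)(?:[.,](\d{1,2}))?\s*[$€£¥]?\s*$")


def parse_price(text):
    """Return the price as a Decimal, or None if the text does not look like one."""
    match = PRICE_RE.match(text)
    if match is None:
        return None
    whole = match.group(1)
    fraction = match.group(2) or "0"
    # Normalise both delimiters to "." before building the Decimal.
    return Decimal(f"{whole}.{fraction}")
```

Running this over the text of every element and keeping the ones that return a value gives candidate price nodes, without relying on any class names.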
Dependent on language of page product layout will probably change, but in a western language the price will probably come 'last' in our product container, which is exactly what you have done here.
A non-robust xpath for matching the above might look something like
The above might have an error in the XPath because, hey, I'm writing without testing in an environment, and it is obviously really expensive since I have not optimized it. But it should be clear enough that by no longer basing matches on the incidentals of how we style those elements, we can end up with data extraction that is less likely to break with site redesigns.
on edit: obv. article or main check would be optimized beforehand, just some sites use both, some use one, and some neither so would have to do all sorts of things first, etc. etc.
yes, I said it probably would have since not testing it before writing it, think of it as pseudocode to make a point, not code we're going to put in production.
Only if you need to define (JS) behavior for them. If you don’t, any arbitrary name will do (and will be an instance of HTMLUnknownElement[1]). Using hyphenated names is still advised, to avoid name collisions with future standard elements.
Without JavaScript they are unknown to the browser. So the browsers fall back to treating them like a generic inline element (so, a span).
It's not because custom elements "work without Javascript". It's because browsers are trying to work even in the presence of errors and invalid/unknown markup.
Thanks for the pointers. Although I think <div> is still incredibly useful, because in many circumstances people simply don't know which element to use or don't care to come up with custom elements.
<div> to me is like the generic building "block", not meant to convey meaning most of the time, and you can use the role attribute when needed.
I don't think we needed <section> and <article>, many things in HTML-CSS naming/conventions don't make sense to me, after 15 years. Too many ways to do the same thing.
one of the things that people not used to a11y get confused by is that since div and span elements carry no information you can't add aria-labels to them (or shouldn't be able to, some readers may be non-compliant on this matter) without an explicit role. However you wouldn't be able to add them to any random element either.
A decade ago, in an act of extreme futility, I wrote a book about HTML5. I did the mailing list archaeological dig to discover the logic behind these and other new (at the time) elements. There really wasn’t any. The spec editor just made them up on a whim with very eccentric definitions. I found that very frustrating as I saw it as inflicting a whole array of meaningless choices on front end folks for years to come. A decade later, and folks are still earnestly trying to divine the wisdom of the spec. No need — there isn’t any. There’s no there there. I don’t blame the author for trying, but I do blame the spec author for a very silly rabbit hole that people are still falling down to this day.
The ill-fated XHTML 2.0 was where the academics with actual interest in the semantic meanings of tags got too busy trying to cardinalize their semantic meanings. My understanding was that HTML5 "imported" some of the tag names but never had an interest in the intended semantic meanings as that was part of the schism that killed XHTML 2.0 and was something HTML5 wanted to avoid entirely for pragmatic reasons.
Under that understanding I think you can probably still find interesting semantic versions of these tags in the XHTML 2.0 mailing lists and schism discussions. They aren't relevant to HTML's present, but might be interesting for someone truly curious about the path not taken in semantic HTML (the path unlikely at this point to ever be taken).
I was curious so I compared the list of element tags between HTML 4.0 and XHTML 2.0. Excluding the XForms module from XHTML 2.0, the former has 91 tags, reduced to 67 in the latter.
AFAICT, XHTML 2.0 reorganized tags into modules, yes, but didn't actually try to expand the set of semantic tags, except for XForms--the XForms module looks really complex. And those module groupings were more concerned with functionality, not content semantics, per se.
As far as I recall, that "final" draft of the XHTML 2.0 that W3 posted is "post-schism" just to get something out to compete with the growing momentum of HTML5 and kick the semantic can down the road again to XHTML 3.0 (after most of the damage of the schism was already done). I recall early XHTML 2 drafts had at least article, aside, section, hgroup, and others. I don't know where you would track down such drafts other than combing ancient mailing list archives.
Section and article make sense as "parts of a book". However, unlike in HTML, an article is always below a section in the hierarchy; it is actually below paragraphs. This schema is common in legal texts in many languages; I don't know if this is the case in the USA.
The hgroup element also seems to be related to this.
My reasoning has always been that an article is a separable entity, which can do without the given context. (E.g., you can share it, or you can present multiple of them in varying order.) So a document may have sections, which may include articles, which in turn include sections, like the table of contents, a section of images, etc. So there's no distinctive hierarchy to them, as each may contain the other. (Mind that this is somewhat different from the use of articles in legal documents, which are integral elements of that document and lose meaning, if provided out of context.)
While any such interpretation is somewhat funny in the context of the parent comment, it may still turn out useful. E.g., if we were to scrape any content from an existing site in order to reintegrate it for a relaunch or a similar purpose.
And, as we're at it, a div is really just a technical means for applying something to a group of elements (e.g., in its original use, an attribute for centered text presentation); think of it as blocks in programming. Nothing semantic to see here, keep calm and carry on…
BTW, thanks for mentioning the hgroup, which is often overlooked, but really makes sense, when combining headings and subheadings, which are to be understood as a single item (like the head of an article, yes, an article in the common sense).
The actual specification of article and section elements in HTML is pretty much what you said.
My issue with them is not with their roles, but with their names. And, from the article and from OP, it seems I'm not the only one. I think "region" as it is used by WAI-ARIA would be a better name. Also something like "contentinfo" instead of "footer". And "complementary" instead of "aside"...
> E.g., if we were to scrape any content from an existing site in order to reintegrate it for a relaunch or a similar purpose.
The spec calls this "outline".
Related to divs: I find it ironic that making pages with tables was frowned upon 20 years ago, yet it is hot again now, but we are calling them "grids".
They say the lack of usage of hgroup is due to the lack of support by screen readers. Another common use case is <h1>Chapter 1</h1><h2>Foobar</h2>.
Regarding the table irony, see also the common use of table, table-row, and table-cell display styles for anything but actual tables. ("If I'm using divs, it's fine!") :-)
(Tables should even be more accessible, since there is <th>, both in <thead> and with `scope="row"` for table rows.)
Something, I've been guilty of (sometimes) for emulating hgroup: <h1>Heading<br /><small>Subhead</small></h1>.
In both cases article means something like "an atom of content". In legalese each statement is a separate article, in other context an entire book can be an article.
Heh, that's a little surprising. I never paid too much attention to the HTML5 element bikeshedding; I always assumed it was (like html) cribbed from sgml/docbook - but simplified (rather than randomly dumbed down).
So normally I'd probably go look at something like:
However - tfa was a lot more interesting and pragmatic - giving good advice on accessibility ; something that is actually worthwhile and not just silly bikeshedding...
Yeah, in this case, from memory, the spec author had a list of class names from a scraped HTML data set. He looked at the most common classes — nav, header, footer, and so on — and declared they should be made native elements.
Which would have been fine, except even the most obvious ones (header, footer) were given very idiosyncratic definitions, and others (article, section, aside) were seemingly thrown in at random.
This led to absurd examples where, as is still in the spec, a blog comment is an <article> and the comment's header is a <footer>. This of course undermined the original premise -- that these elements were just 'paving the cowpaths' of how people were authoring HTML, but the spec author would have none of it and shipped them all the same. And here we are. :)
This has been a thought that has been kind of recurring for me the last few weeks.
I think the problem of bad architectural decisions has been understated, possibly since the dawn of computing.
Examples of greatness are HTTP and SMTP. I'd argue most cases don't even need HTTP/2 or HTTP/3 -- getting rid of all the tracking/bloatware on the modern internet would deliver more value for the end-customer but that's not the direction we're going in. The industry has boomerang'd back to nearly static sites (I think there was progress made, IMO SSR is a local maximum), so honestly HTTP/1 is often good enough.
An example of sadness might be Bluetooth. Every time I sit down to look at some docs that involve it (mobile, otherwise) I am horrified anew.
Bad standards basically set the entire industry back person-decades.
Imagine a parallel world where HTML block-level tags are useful and sensible. Let’s think through the ones which would actually be nice to have…
I would start with <block>, which would take the place of <div>. But, big twist, you could put anything inside of <HERE> and get default block behavior. That way you get <article> and <aside> for free and people would know that it’s arbitrary, thus feeling more free to be descriptive with their tags.
Has there ever been a benefit to all these new elements which are just divs with another name? Yes I know there was that aim to make the web entirely machine understandable but at this point it would be much easier to just use ML to view the page like a human than to convince every website maintainer to switch to specific elements.
Does accessibility technology do anything at all with <article>? I'd be very surprised if it did.
<section> once drove the model where we were going to kind of soft-kill the h1-h6 elements and have only h1s where their heading level depended on the section nesting level... but my understanding is that this didn't really work out and is functionally dead, and the current recommendations are all to just use the range of numbered headings as before.
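For anyone who never saw that model, here is a simplified Python sketch of the idea: every heading is an h1, and its effective level comes from how many sectioning elements enclose it. This is a toy illustration, not the outline algorithm from the spec, and the names `OutlineLevels` and `outline_levels` are invented for the example.

```python
from html.parser import HTMLParser

SECTIONING = {"section", "article", "aside", "nav"}


class OutlineLevels(HTMLParser):
    """Records [effective_level, text] for every <h1>, based on nesting depth."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.in_h1 = False
        self.headings = []

    def handle_starttag(self, tag, attrs):
        if tag in SECTIONING:
            self.depth += 1
        elif tag == "h1":
            self.in_h1 = True
            # An h1 outside any sectioning element is level 1.
            self.headings.append([self.depth + 1, ""])

    def handle_endtag(self, tag):
        if tag in SECTIONING:
            self.depth -= 1
        elif tag == "h1":
            self.in_h1 = False

    def handle_data(self, data):
        if self.in_h1:
            self.headings[-1][1] += data


def outline_levels(html):
    """Return (effective_level, heading_text) pairs in document order."""
    parser = OutlineLevels()
    parser.feed(html)
    parser.close()
    return [(level, text.strip()) for level, text in parser.headings]
```

With explicit h1 to h6 the author writes the level directly, which is what the current recommendations fall back to.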
Aside from a single h1-h6 tree, I could imagine separate semantic tags having more traction if they contained matching alt text, as with an image tag. A sidebar might have a single-sentence summary, etc. ... Just speculating here about where more value might be added.
I've always just dreamed of an entirely accessibility focused API that doesn't rely on the DOM and html elements. Most accessibility concerns are for visual impairment, so why do we still base it on what in practice is a visual model?
Do screen readers use a different user agent? Seems like a11y would be simpler in many cases by just delivering a text based result that's tailored to screen readers. Most of the issues seem to arise from the structure chosen in order to style things rather than the content itself.
I think waiting until screen readers adopt technological affordances to start using those affordances is putting the cart before the horse. One could ask why a screen-reader-tech maker would bother paying attention to tags no-one uses—might as well put as much semantic information out there as we can, and let different people consume it in the ways that make most sense for them.
It’s an impossible situation. We are expecting website owners and screen reader companies to come to some kind of consensus without ever collaborating or establishing a standard.
And just expecting that vaguely guessing what the other wants will work.
If they're used well, they could enable non-site-specific custom stylesheets. Imagine if you could view all sites with the same design instead of the author pushing theirs on you.
I think this blog post makes things more complicated than they need to be. You don't need to use either <article> or <section>. Outside of accessibility, semantic HTML isn't really worth thinking a lot about. Is it going to matter if you accidentally use <article> and <section> like any <div>? Probably not.
> Should I Bother? Yes, you absolutely should. [...] Firstly, this article by Mandy Michael reveals that browsers pay attention to your HTML structure to generate a Reader mode which strips the page of unnecessary information, images and background.
Reader mode is not a universally good reason to be using article and section elements. It only matters if you care that much about Reader mode, and Reader mode is not a part of any web standard. Make your webpage so readable that Reader mode isn't necessary.
> Secondly, It helps you actively think about how you present your content.
Depends on if it does. I've used both article and section since time immemorial and I honestly couldn't tell you if they helped me create better markup. I just use them because they seem logical and they're there, but I wouldn't really miss them if they were deprecated. Article is only really useful if you have multiple article-like places in your markup, since it actually has an "article" role, but sections are just like any other region in terms of accessibility and aren't that useful.
You mean people don't read webpages without a screen-reader? There's no such thing as Reader mode? Web developers don't read HTML and want to write it in an organized way?
Sure, there should be nothing outside of accessibility. /s
If you read the post (and I can't really blame you for not reading it all the way through), you'd see that the author is suggesting that there are reasons besides supporting accessibility features for using these semantic HTML tags. I just don't agree that they are objectively good reasons for why one "should" do so.
Oh yeah, and semantic HTML is by no means the only way to write pages that are accessible. Everything that a semantic tag can do, the role attribute can do, too. I never said to not care about accessibility.
If you use main, article, etc., it is easier to write apps that process the page. Firefox Reader mode uses the main element to work out how to get the text from the page.
That's kind of ridiculous. If it actually was true, that accessibility is not only an important concern when designing a website, but the singular overriding concern, then surely all websites ought to consist of a blank page with code like
<html/>
That's the most accessible website in the world. You can print it without paper, character encoding is no issue, RTL is no issue, you can see it when you are blind, you can listen to it spoken without connecting your headphones, even if you are deaf you can hear it spoken, it's printed in braille on the screen (or any flat surface), it works in all languages spoken and written and sign language. It really is the pinnacle of accessibility. It's even available in dead languages such as Linear A and Etruscan. Every website with any sort of content is strictly less accessible than this website (which also is as utterly useless as it is accessible).
Just to take a ridiculous idea seriously, because it’s fun⸺
The minimal “valid” HTML page is actually this:
<!DOCTYPE html>
… though it’s only valid in contexts that don’t require an in-document title, such as email text/html parts (the Subject header from the MIME message is used) or frames (no title is needed); for a top-level page in a web browser, to be “valid” it needs a non-empty title, such as
<!DOCTYPE html><title>x</title>
In both of these, the <html> open and close tag are omitted, as they’re both optional in HTML syntax.
If you’re not caring about validity or quirks mode, then you might as well just serve a zero-byte text/html body.
As for XML syntax which you seem to have tried to use, <html/> wouldn’t be a valid HTML document because you haven’t used the http://www.w3.org/1999/xhtml namespace.
Accessibility matters. If nothing else because it is a legal requirement with laws such as US's Americans with Disabilities Act, UK's Disability Discrimination Act, EU's European Accessibility Act, Australia's Disability Discrimination Act, and Ontario's Accessibility for Ontarians with Disabilities Act.