Hacker News

I've never understood the purpose of "semantic" html. It seems to be bandied about as some sort of unquestioned good, but why? If the code is easy to understand who cares that the "correct" tags have been used?

Obviously there are some benefits for screen readers and search engines, but these uses should be explicitly checked for as part of a QA process instead of relying on some vague standard of semanticness.



Screen readers, distilled/simplified views, other automated parsing.

Marking up things like addresses and phone numbers, for example, means that a search engine can associate a website with a particular address, and a browser can turn a displayed number into a link that calls it directly.

If, for example, there were a microformat (or other semantic markup) for opening times, then search engines and social sites could read and display (and update) that info without owners having to go on 20 sites if they change opening times.

Of course it can also help infer which parts of a page are advertising, or impressum, and not render that.

A plugin could offer to work on tables, or import them to a spreadsheet.

Lots of scope for advanced automations.
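
On the opening-times idea above: schema.org's openingHours property is one existing vocabulary for this, usually embedded as JSON-LD. Here is a rough sketch of how a consumer might read it, using only Python's standard library. The page, the business name and the helper names (JsonLdExtractor, opening_hours) are made up for illustration, and a real crawler would need to handle far more variation than this.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        # Script contents may arrive in several chunks, so buffer them.
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.items.append(json.loads("".join(self._buffer)))


def opening_hours(html):
    """Return the openingHours values of every top-level JSON-LD object that has one."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    hours = []
    for item in parser.items:
        if not isinstance(item, dict):
            continue  # this sketch ignores top-level arrays
        value = item.get("openingHours")
        if value is None:
            continue
        hours.extend([value] if isinstance(value, str) else value)
    return hours


# Hypothetical page for a made-up business.
PAGE = """
<html><body>
<h1>Example Bakery</h1>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "LocalBusiness",
 "name": "Example Bakery",
 "openingHours": ["Mo-Fr 08:00-17:00", "Sa 09:00-13:00"]}
</script>
</body></html>
"""

print(opening_hours(PAGE))
```

The owner updates one block on their own site, and anything that parses it picks up the change.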


Why do anything properly?

I look at 'div soup' HTML and shudder. The authors of it have not got a clue. They also make it hard for people who do write proper HTML with the correct tags, styling the elements and keeping the separation of concerns to do anything with it.

It is about organising your content, if everything is in div tags you might as well just upload JPGs of your web pages, with different ones for desktop and mobile.

Another game changer is CSS grid. You no longer need wrapper divs to do very basic tasks such as centering content. In fact content with horrible divs everywhere is a nightmare to style up with CSS grid.

An example of this is a basic form. With just the form elements and labels you can get it looking sweet in next to no time with CSS grid.

However, if some zombie has put lots of spans and divs around the form elements and done the labels wrong then you have to choose either to rebuild the backend thing that churns out the form or just style it up lame block layout style.

Pseudo-elements are also cool: there is no need to have silly 'i' elements and spans to put that asterisk after 'required'.

Some people like to keep it simple, doing HTML properly; others want to pootle along with their divs and class attributes. I know what will look good in ten years' time and what will look outdated.

HTML5 introduced many features but the main course was the semantic elements. You can and you should write web pages with these elements and also be thinking in terms of them.

You may scorn the accessibility aspect but accessibility is easy if you use the right elements. Why would you not want to have it? There is also the mindset that goes with it - the web is for everyone, not just rich, white, English speaking males with perfect eyesight.

The article is spot on and extremely well said. Styling elements rather than make believe classes is particularly well said.

I would say that people who create div soup HTML should be banned from the internets; the work practice of overcomplicating everything has been going on for too long.

I don't even regard div soup web pages as proper web pages; it is like comparing beige, deep-fried processed food with food freshly prepared from ingredients.


> you might as well just upload JPGs of your web pages

That was not so uncommon in 1999 :-(

Seriously, it's 2019 and some developers still don't get the value of semantic markup, even though this has been beaten to death for at least 20 years. This profession is cluttered with too many tourists. I even talk with devs who brag about their lack of care for markup.


I wonder how well it would work to do that with JSON-LD micro-format markup.

Google don't care about HTML as they are way too clever for that. If I had more time I would like to see how they would rank two identical pages, one done properly and one that was a JPG image with a dangleberry of JSON-LD tacked on. I bet the latter would fare better!


If you do semantic html vaguely right (using <button> for buttons, <table> for tables, <input> for input fields) you have something you know will work decently on screen readers, plugins and other assistive technology (which includes password managers in the case of login forms, etc).

If you don't follow semantic html at all I don't see how QA can save you, except for QA telling you to fix that <div> by replacing it with a <button> (or <div role="button">, or <a> or whatever you prefer).
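
To make the "QA telling you to fix that <div>" step concrete, here is a minimal sketch of such a check in Python's standard library. It is deliberately naive: it only sees inline onclick attributes (handlers attached from JS are invisible to it), the list of natively interactive elements is partial, and the names ClickableLint and lint are invented for this example.

```python
from html.parser import HTMLParser

# Elements that are interactive by default (not an exhaustive list).
NATIVE_INTERACTIVE = {"a", "button", "input", "select", "textarea"}


class ClickableLint(HTMLParser):
    """Flags elements with an inline onclick that are neither natively
    interactive nor given an explicit role attribute."""

    def __init__(self):
        super().__init__()
        self.problems = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if "onclick" not in attributes:
            return
        if tag in NATIVE_INTERACTIVE or "role" in attributes:
            return
        line, _ = self.getpos()
        self.problems.append((line, tag))


def lint(html):
    """Return (line, tag) pairs for every suspicious clickable element."""
    checker = ClickableLint()
    checker.feed(html)
    checker.close()
    return checker.problems


# Hypothetical markup: one div pretending to be a button, two acceptable variants.
SNIPPET = """<div onclick="save()">Save</div>
<button onclick="save()">Save</button>
<div role="button" tabindex="0" onclick="save()">Save</div>
"""

for line, tag in lint(SNIPPET):
    print(f"line {line}: <{tag}> has onclick but is not a button, link, or form control")
```

Even this only tells you where the problem is; the fix is still to use the right element.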


When done properly, it increases the separation between the actual content, and the styling on top of it. Any time you get closer to that ideal separation, accessibility is enhanced (and it’s also forward-thinking towards yet-unknown methods of accessibility assistance).


IMO this is an example of extreme overengineering and premature optimization. Unstyled html is so ugly and inconsistent across browsers that it’s just not a realistic scenario that it would be consumed without the css and js that make up the rest of the code.

When an alternate display method comes out and wants my semantic html, I will add it to my QA workflow and make sure that it looks good instead of relying on some vague standard of semanticness from one of many bloggers.


For a web application, unstyled HTML is almost certainly a mess even if it's perfectly semantic, but for a web page you absolutely don't need any CSS at all for it to be at the bare minimum readable on almost any device.

Sure it's mostly black text on white backgrounds with only minor typesetting differences between elements but as a means to present information it has worked since the mid 1400s...


Firefox's reader view is a realistic system that lets us consume html pages without their original CSS.

The more page authors use html 'properly', the more incentive there will be to improve systems like reader view.


> Unstyled html is so ugly

This comment made me realize how relative everything is. It seems it was yesterday, when you would open Mosaic and slowly discover completely different worlds, one after another, and you'd never think of complaining HTML was aesthetically unpleasant...


> explicitly checked for as part of a QA process

After you've written the application? That makes absolutely no sense, unless you enjoy renaming lots of elements.

> instead of relying on some vague standard of semanticness

It's not vague, there is an actual international standard... WCAG!


Not sure why you'd think semantic HTML is for people reading the code?

It's for software reading the page and making use of the extra information, e.g. a screen reader being able to tell a user that there's a navigation element on the site and being able to jump there. Reader modes knowing what the main article on the page is. ...

(EDIT: I see you edited that point in, so yes, that's the main one, and IMHO a good enough one)
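
A small sketch of what "software reading the page" can look like, again with only Python's standard library. It lists the elements that are typically exposed as landmarks and pulls out the text inside <main>, roughly the way a reader mode might start. The LANDMARKS set is a simplification (header and footer only count as landmarks in some contexts), and PageOutline and the sample page are hypothetical.

```python
from html.parser import HTMLParser

# A few HTML5 elements that assistive technology typically exposes as landmarks.
LANDMARKS = {"header", "nav", "main", "aside", "footer"}


class PageOutline(HTMLParser):
    """Records landmark elements in document order and collects the text inside <main>."""

    def __init__(self):
        super().__init__()
        self.landmarks = []
        self._main_depth = 0
        self._main_text = []

    def handle_starttag(self, tag, attrs):
        if tag in LANDMARKS:
            self.landmarks.append(tag)
        if tag == "main":
            self._main_depth += 1

    def handle_endtag(self, tag):
        if tag == "main" and self._main_depth:
            self._main_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that appears while we are inside <main>.
        if self._main_depth and data.strip():
            self._main_text.append(data.strip())

    @property
    def main_text(self):
        return " ".join(self._main_text)


# Hypothetical page.
PAGE = """
<header><h1>Example Site</h1></header>
<nav><a href="/">Home</a> <a href="/about">About</a></nav>
<main><h2>Semantic HTML</h2><p>Tags carry meaning for software.</p></main>
<aside>Sponsored links</aside>
<footer>Contact us</footer>
"""

outline = PageOutline()
outline.feed(PAGE)
outline.close()
print(outline.landmarks)
print(outline.main_text)
```

Run the same thing over a page built only from divs and both results come back empty, which is the whole point.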


Because the idea that one can separate content from presentation appears as an elegant Platonic ideal to a lot of folks. The trouble seems to be that content and presentation are not actually disjoint sets. And (a perhaps different) set of folks want a slice of the juicy middle.


Ever try using your page with JS disabled?


As with most things like this in tech (and every other economic sector that has ever existed) it's largely a guild protection and enforcement matter. When economic guilds use an unnecessarily strict 'only proper way of doing a thing,' it is for insulation against outsiders and to enforce their control over the domain they have a personal stake in. To the extent that it's truly petty, that is the extent to which the person shouting about it has very little protection from, e.g., an endless flood of low priced competition (Web developers have historically been very exposed); or otherwise it's just pedantry run wild and a personal issue.

You see this in all professions and without exception, from the exercise industry, to trades like plumbing, to healthcare and everything in between. Most of it is bullshit, process-based job security theater. Most major guilds just end up developing higher level cartel-like certification protections to keep them economically secured from competition.


What a terrible attitude towards accessibility. Try surfing the web with your eyes closed and only a screen reader to guide you. I guarantee you'll cheat and open your eyes, maybe then you'll understand.


Honestly, there isn't, except for a small subset related to screen readers.

At one point there was a kind of hope that semantic tags would make web content easily machine-parseable, unlocking a bunch of meaningful content reuse somehow that better semantics would make possible. (Big data, ML and all that.)

The two canonical examples being that lists were important so software could extract meaning from list items, and not using tables for layout so software could trust tables actually had meaningfully tabular data.

But... that never really happened, it's not clear it ever will, and it doesn't benefit the content author directly anyways, so... yeah.

(Like you say, screen readers are the main thing, so use the tags and attributes that are important to screen readers, like buttons and alt text... but that's a very specific subset. A screen reader certainly doesn't care if you use a <div> or an <li> or a <td>.)

Edit: I stand corrected on my example, for <td> specifically I forgot <table> lets screen readers navigate vertically and also read column/row labels.


> A screen reader certainly doesn't care if you use a <div> or an <li> or a <td>.)

It absolutely does. Try firing up a screen reader before you make such comments spreading misinformation.

For example, it's maddening to see people now making tables out of flexbox grids and such. In a screen reader you can navigate a <table> in two dimensions (rows and columns). The flexbox variety completely breaks navigating up/down columns.
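
To illustrate the two-dimensional navigation point, here is a rough Python sketch (standard library only) that rebuilds the grid from a real <table> and walks down one column by its header. It assumes a simple table with a header row and no colspan, rowspan or nesting; TableGrid and column are names invented for the example, and this is not how any particular screen reader is implemented.

```python
from html.parser import HTMLParser


class TableGrid(HTMLParser):
    """Builds a list of rows from the <tr>/<th>/<td> cells of a simple table."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._cell = None  # text chunks of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("th", "td") and self.rows:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None


def column(rows, header):
    """Walk down one column, identified by the text of its header cell."""
    index = rows[0].index(header)
    return [row[index] for row in rows[1:]]


# Hypothetical pricing table.
TABLE = """
<table>
  <tr><th>Plan</th><th>Price</th></tr>
  <tr><td>Basic</td><td>5</td></tr>
  <tr><td>Pro</td><td>12</td></tr>
</table>
"""

grid = TableGrid()
grid.feed(TABLE)
grid.close()
print(grid.rows)
print(column(grid.rows, "Price"))
```

A flexbox "table" made of divs gives this kind of code nothing to hold on to: there are no rows, cells or headers in the markup, only boxes.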


> A screen reader certainly doesn't care if you use a <div> or an <li> or a <td>.

In which current screen readers is that statement true? (I'm fairly sure the answer is "none", but ...)


You deserve downvotes. Semantic HTML is a key part of making your website accessible for people with disabilities. Ignore it, and you shouldn't be making things for the web.



