HTML loves its special cases. XML is overly complex, but at least your editor doesn't need to know anything special about what document type you're writing in order to indent it properly. Throw in HTML's special cases, and now it needs to know that <br> is different from <foo>.
I guess since HTML is so common it doesn't really matter, but really? We need 5 different types of markup when one would have been fine?
I prefer HTML over XHTML because it is easier to write. I don't get the reasoning behind mandatory closing tags. LIs close before the next LI, or before the UL. <BR> saves two characters over <BR /> and causes no harm. XHTML feels like trying too hard to make the machine overlord happy.
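For instance, a minimal sketch of the closing rules in question; both lists parse to the identical DOM, because each <li> is closed implicitly by the next <li> or by the parent </ul>:

    <ul>
      <li>First
      <li>Second
    </ul>

    <ul>
      <li>First</li>
      <li>Second</li>
    </ul>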
It is plain wrong to make a standard easier for machine-parsing at the expense of humans who are typing it in.
EDIT: Another example. I write some HTML in a text-editor/textarea and send it across to someone. If I missed a </LI>, should the parser reject it? If not, the standard should be accommodating enough so that this is valid.
Personally I see saving a few characters here and there as a completely inadequate reason for making the spec less consistent.
I don't think any of the things you mention actually make HTML considerably more legible or easier to write for a person. Just harder to parse for a machine.
I would rather have a strict language and solid parsers that can thoroughly and decisively reject improper markup and help stop people from making mistakes while writing it, rather than trying to interpret what they really meant after the fact.
You're only addressing half my argument. Just because the wheel only needs to be invented once doesn't mean it's a good idea to make the problem it solves needlessly complicated.
I'm of the opinion that nothing he was describing is all that much more convenient, and definitely not to the point where it makes a more complex specification worthwhile; whether one or a million parsers need to be written isn't really the point. There is some virtue in having a clean, consistent, and well-defined specification, whether for writing simpler parsers or just for being able to learn the language and be sure of how to parse it in your head. Fewer edge cases in the specification = fewer mistakes and bugs in source. Attempting to fix mistakes and bugs after the fact by guessing what someone meant, however, will just create new mistakes, hide bugs, and do less to encourage a solid understanding of the language.
> It is plain wrong to make a standard easier for machine-parsing at the expense of humans who are typing it in.
It is far worse to sacrifice consistency of the mental model behind the language for the sake of not typing two extra characters.
Core XML is much easier to read than modern HTML, because you can read it without knowing the context of what you're looking at and without memorizing tons of exceptions. It's easier to parse for the same reason.
Also, the savings from avoiding " /" are offset by the need to needlessly close some tags in HTML5 (an empty <div></div> where XML would allow <div/>).
The only really stupid things in XML I remember are the need to write checked="checked" and people using namespace prefixes on every tag. It's pretty obvious how to fix the former. The latter is entirely avoidable if you have a fully working parser.
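To illustrate the checked="checked" point:

    <!-- HTML allows a bare boolean attribute: -->
    <input type="checkbox" checked>
    <!-- XML has no value-less attributes, hence the redundant XHTML form: -->
    <input type="checkbox" checked="checked" />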
But the problem is that you now have a specific behaviour that depends on the tag name. DIV tags don't need to close before the next DIV, but LI tags do. So you've gone from a simple tag parser to one that needs to know the intricacies and rules surrounding every element type.
Personally, I feel that two extra characters per <BR/> tag is worth it.
In addition, not having to close some tags might make HTML easier to write, but it makes it more difficult to learn. I remember back in the day I had to keep looking up which tags need to be closed and which ones do not.
Nowadays I just close everything because my OCD outweighs my laziness.
Wouldn't having a reasonable schema specification solve this? It's not like you can invent arbitrary tags (there are arbitrary attributes, but an expressive enough schema language could capture that as well).
Sure, but presumably that schema will change with time. Then you'll have a parser built around HTML5, but another new release for HTML6, so on and so forth. It just needlessly complicates things.
It doesn't only make the machine overlord happy; it also helps humans when they do make a mistake.
Humans throw some HTML-like stuff at the browser and the browser tries hard to make sense out of it. If the browser misinterprets, you have a hard time finding out what went wrong.
Whereas with XML and XHTML you get told immediately what's wrong and you don't have to hope that every browser implementation works the same way.
It's also a bit strange to argue about "easier to write manually" in this day and age of Markdown, HAML, etc.
> If the browser misinterprets, you have a hard time finding out what went wrong.
This is what HTML validators are for.
Browsers should do their best to interpret the page author's intention and actually display a page. The developer doesn't always have 100% control over the page markup (think user-generated content, ads, etc.).
Sorry, that's silly. Web pages break all the time.
If pages broke because of invalid HTML, they would already have broken during your tests.
If you use HTML-injecting services like ads or analytics that turn valid HTML invalid, then it's great that the page breaks, because your tests will show you immediately that these services suck.
If browsers errored out on invalid content, those services would either provide valid content or they would go bust.
And XML/XHTML is much easier to parse, and to produce valid content for, than HTML, because it is much more consistent and carries less historical baggage.
It is just that today we still have tag soup and error-tolerant parsers in the browsers, and of course lots and lots of HTML producers putting out shitty HTML, so you can't just switch on strict parsing.
But if history had taken a slightly different turn, we would be talking about XHTML5 and not HTML5.
That's a good point, but I wasn't necessarily thinking it should error out to render nothing at all.
What I would like is if somehow the browser could insert an error class name into the offending element so that in my CSS I could give it a set of rules to make it stand out after render.
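Something like this, say (purely hypothetical: no browser sets such a class, and the "parse-error" name is made up):

    <style>
      /* Imagined behaviour: the parser tags offending elements
         with a class so CSS can make them stand out. */
      .parse-error { outline: 3px dashed red; }
    </style>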
What I'm saying is that it is "not a mistake" to omit the closing tag in some cases. There are things that are hard when it comes to parsing HTML, but when to close open tags is not one of them. Rules for closing tags are trivial to implement and well documented. (Add: also, omitting unnecessary tags such as <html> and <body>)
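For example, this is a complete, valid HTML5 document; the parser inserts the omitted html, head, and body elements into the DOM anyway:

    <!DOCTYPE html>
    <title>Hello</title>
    <p>The html, head, and body tags are implied.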
> Humans throw some HTML-like stuff at the browser and the browser tries hard to make sense out of it. If the browser misinterprets, you have a hard time finding out what went wrong.
I haven't seen any modern browser misinterpret HTML's (simple) closing rules. As for being harder to debug, I haven't seen any real evidence of that either.
> Whereas with XML and XHTML you get told immediately what's wrong and you don't have to hope that every browser implementation works the same way.
Again, do you have any evidence of such incompatibility with current or recent browsers?
> It's also a bit strange to argue about "easier to write manually" in this day and age of Markdown, HAML, etc.
Way more HTML is written by hand than Markdown and HAML. The issue isn't just saving keystrokes. The point is that whenever possible, technology should accommodate simple mistakes people make.
> It's also a bit strange to argue about "easier to write manually" in this day and age of Markdown, HAML, etc.
But, by that logic, isn't it also strange to argue about making HTML's syntax more consistent if we should be using Markdown/HAML/etc to generate it anyway?
BTW, I do agree with you that having a more consistent syntax is better than having a syntax that aims to save a few keystrokes at the expense of adding special rules. As a user, I find it more difficult to remember which cases are special than to read or write a more consistent syntax. I just don't see how your comment on Markdown/HAML helps the case for a simpler HTML syntax ;)
There is nowadays less need for writing HTML by hand. HTML is more often generated from other formats and things like Markdown, HAML, and other lightweight markup languages helped in that.
But in the end the output is HTML and having a consistent syntax makes it easier to generate, read, and debug it.
Its syntax doesn't need to be dumbed down for casual users because casual users have other options. In that sense I think my comment is in support for a better HTML syntax. :)
The tradeoff is that allowing quirks like that means that either you need to make a massive specification to deal with each way someone might goof in their code (edit: which is what HTML5 tries to do), or you end up with each browser engine reacting differently.
The main push behind stricter document control is making it easier to make all the browsers render documents consistently.
Also, in the age of XHTML, manual document editing was seen as dead: tools like DreamWeaver were popular, and XSLT was touted as the answer to server-side templates. The web refuses to be anything but a pile of dirty hacks upon dirty hacks, though, which, while frustrating, may have a hand in its popularity :)
That's why HTML5 specifies the parser, so that every browser extracts the same DOM tree from the same input.
The specification is strict in the sense that any parser has to behave the same while also allowing for human error.
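For example, misnested formatting tags like the following are a parse error, but the spec (via what it calls the "adoption agency algorithm") defines the recovery exactly, so every conforming parser builds the same tree:

    <p><b>bold <i>bold italic</b> italic</i></p>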
> ... so that every browser extracts the same DOM tree from the same input
That is, every browser whose engine has been updated for HTML5 and which also implements the parser specification correctly and without bugs. Which is most, but not all, of them.
My personal preference is to include the optional closings, because XHTML has been around a lot longer and therefore a larger proportion of browsers have been coded to handle it properly and have had more time to work out bugs.
I do like that HTML5 browsers can work around invalid markup in a well-specified way, which is much better than XHTML browsers just showing an error. It's the best of both worlds, especially when the browser's Developer Tools give you warnings about the invalid markup too so you don't need to use an external validator to find them.
You've got a system with lots of exceptions and special behaviour. I don't see that as easier for humans at all. (The machine doesn't care; it parses anything you can put in rules. But the more complex the rules are, the harder it is for you to understand the error message. XML is really easier on humans.)
"It is plain wrong to make a standard easier for machine-parsing at the expense of humans who are typing it in."
No, it isn't - especially not when the intended use case is for literally every viewer of the document to use "machine-parsing" to read it, doubly so when a significant fraction of the users will actually BE machines...
The two main advantages were XML parsing performance, and the ability to embed XML directly in the XHTML. For phones of the era, the performance benefits are obvious. As for XML embedding, it'd give you the ability to embed SVG, MathML, and any other XML language directly. This avoids a second retrieval/parsing step, and allows extensibility without changing the XHTML spec.
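For instance, in XHTML served as application/xhtml+xml, another XML vocabulary can be embedded directly via namespaces (a minimal sketch):

    <div xmlns="http://www.w3.org/1999/xhtml">
      <svg xmlns="http://www.w3.org/2000/svg" width="20" height="20">
        <circle cx="10" cy="10" r="8"/>
      </svg>
    </div>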
I think it's excellent that HTML5 completely specifies the parsing in a very clear, and most backwards-compatible way; judging by what the big browser vendors have been doing, they seem to be following it. (It also gives a nice starting point that makes it easier for anyone to write their own parser, and have it behave the same as any other mainstream browser - and having the possibility of making more browsers available, with the same standard parsing behaviour, is a good thing.)
XHTML lost; we decided that we preferred tag soup and keeping our past documents readable. Besides that, most XHTML came with an HTML mimetype, which meant it wasn't being read as XHTML. So the best bits (lighter parsing and XML embedding) were never usable.
I don't understand how there's much of any difference in readability between html and xhtml style of coding. Do closing tags really make it that much more difficult to read for some people?
There isn't a difference in readability, XHTML is just stricter. Much of the HTML that exists is invalid HTML/XHTML, and the XML parser used for XHTML would simply error out. Most XHTML pages were served as HTML, due to mimetypes being wrongly configured, so no-one ever noticed.
The XML parser was supposed to be faster, and allow any XML to be embedded in the XHTML (SVG, MathML, etc). This stuff was designed to change the shape of the web (especially since mobile phones weren't very powerful in those days).
I can see your point for the post I was responding to, but I've seen numerous posts throughout the page that seem to be debating human readability as well.
I'm sure many at the W3C would have loved to develop only a new version of XHTML, but the problem is that it breaks backwards compatibility, and that's almost impossible to impose. Any browser that tried it would see its lunch eaten by its more permissive competition.
And if they were really going to break the standard like that, at least they could have broken it in such a way that it fixed all the stupid legacy decisions.
Have it support <img>alt text</img> and <meta>content</meta>, as well as the old way, and then the developers can decide if they want to support legacy browsers. (They probably do, but at least we're looking at a future where html will be just a tad cleaner and more consistent.)
Indenting HTML is a terrible practice. HTML is not a programming language - it is document markup. Source files should read like a line-wrapped text document sprinkled with embedded tags. Let your editor keep track of open/close pairs with highlighting (the way most already do.)
When XHTML came to replace HTML4 it was such a huge relief for all OCD developers, and I thought I had seen the last non-XML-compliant web page. Now I'm encouraged to write tag soup again because void elements? Humbug.
It's the HTML5 standard that is complex and pedantic; it breaks silently when you violate one of hundreds of rules (e.g. the list of void elements that can't be closed).
XML is simple. Sure it's pedantic in the sense that it breaks, but HTML5 breaks too, only subtly.
It's like the difference between Java and JavaScript. Java isn't more "pedantic" than JS in ANY way, it just breaks in a more understandable way (breaking loudly, early, and understandably is, in my view, "better").
Which probably goes to show that most developers are not afflicted with OCD but would rather have a more lenient spec. After all, XHTML2, which is even more strict, sold like hot cakes...
Don't take the term lenient to mean "pedantic, but with very silent failures". The failures caused by forgetting to close tags in HTML5 are often catastrophic, which is why it isn't "lenient".
If HTML5 fails on some seemingly valid input (e.g. produces a strange layout when you self-close a div tag) then it isn't lenient; it's still pedantic. It's just as pedantic as XML is about closing tags, only the specification for closing tags is dozens of pages instead of three words.
In fact, I think most developers agree that an error message would be preferable to a corrupt layout in the case of the self-closed div.
I think the author's recommendations at the end, on making <meta> and <img> and <script> more sane, are good examples of where the "implement then standardize" process that the W3C uses falls down. In fact, XHTML2 (which was never implemented) had some good ideas. On the other hand, as we've seen so many times, implement then standardize reduces foot-dragging and needless bike-shedding. You take the good with the bad, I guess.
I've been burned before by using <script src="..." /> and assuming it would work in all browsers. Instead, it subsumed later tags in a horrible way. I've never used empty elements in HTML since.
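Roughly the failure mode, as I understand it (file name made up): in an HTML (non-XML) parser the trailing "/" is ignored, so the script element never closes, and everything after it is consumed as script content until a literal </script> turns up:

    <script src="app.js" />
    <p>This paragraph never renders; it was swallowed as script text.</p>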
`<script src="foo" />` only works the way you’d expect it to in XHTML. Proper XHTML, that is — served with the correct `Content-Type` header. http://mathiasbynens.be/notes/xhtml5
Read the article: there's no way to specify "optional closing tag depending on whether a `src` attribute is present", so it's mandatory. You can, of course, write the parser to do it, but there isn't a way to express it in the HTML grammar.
I've always wished that the script inside the tag would be executed if the `src` couldn't be loaded, which is something John Resig suggested years ago [1].
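Something like this hypothetical behaviour, i.e. not how any browser actually works (URLs made up); today you need a separate feature test instead:

    <!-- Imagined: the inline body runs only if src fails to load. -->
    <script src="//cdn.example.com/jquery.js">
      document.write('<script src="/js/jquery.js"><\/script>');
    </script>

    <!-- The real-world workaround: -->
    <script src="//cdn.example.com/jquery.js"></script>
    <script>
      window.jQuery || document.write('<script src="/js/jquery.js"><\/script>');
    </script>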
I think it's awesome, actually. Good neighbourship. Not that they should do this all the time (there would be no end to it), but for the most egregious problems, yes, why not?
XHTML2 was largely fantastic, imo, and would have been an excellent successor to html. If it had been what started out the XHTML process I think it would have been more successful, but XHTML1 was such a foot-in-both-worlds mess that it needed to be put out of its misery.
The big problem with XHTML2 was that it was designed by people who hated HTML. So they went and made it purposefully incompatible with HTML and XHTML1 in various ways (e.g. tags with the same localName and in the same namespace were supposed to have different behavior).
That made it impossible for a browser to implement both XHTML2 and XHTML1 at once (which was in fact the goal of some of the committee members). And when browsers were faced with the choice of implementing XHTML2 (no content at all out there) or XHTML1+HTML (lots of content out there) but not both, they picked the one you'd expect them to pick...
Actually hardly anyone wanted XHTML2, because it was a purely academic exercise in making established things harder (e.g. no more <a href="..." target="_blank">) without compelling features.
I tried to use it but then completely reverted to HTML4. Thank god we have HTML5 now.
>Optionally, a "/" character, which may be present only if the element is a void element.
>There is absolutely no difference between <br> and <br />.
>Actually, one might argue that adding / to a void tag is an ignored syntax error.
>every browser and parser should not handle <br> and <br /> any differently
If it's optional and has absolutely no effect and makes no difference, how exactly would one argue that it's an error?
To me, this is like saying `print ${SHELL}` is erroneous because the braces don't do anything and `print $SHELL` does exactly the same thing. It may be superfluous, but it's not erroneous.
It is erroneous. It only makes no difference because the error is ignored (or rather, the rendering is wrong). In HTML parsed strictly according to its SGML definition, <br/> would produce an extra ">" on each occurrence.
IIRC there was a browser (some reference implementation) that did this correctly. Also, I remember Gecko used to flag these slashes in source view too.
HTML5 has a list of tags for which a trailing solidus does not produce an error. It's still not something the parser ever uses for anything besides sometimes producing an error: "<br/>" is tolerated, whereas "<script/>", for instance, is a parse error.
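In other words:

    <br/>      <!-- void element: the slash is allowed and ignored -->
    <script/>  <!-- non-void element: the slash is a parse error and is
                    still ignored, so the element stays open -->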
It is erroneous because one of the stated goals of HTML5 is semantic value, and under any line of logic you cannot close something that isn't open; therefore it is an error, albeit not (yet?) a technical one.
"Well, for those of you who are really addicted to X(HT)ML, you might think, «yeah, it's optional, but <br /> is still 'more correct'», but I have to tell you: it is not. Actually, one might argue that adding / to a void tag is an ignored syntax error. The possibility to write it has mostly been added for compatibility reasons and every browser and parser should not handle <br> and <br /> any differently.
Google's styleguide on that subject is also very clear that you should indeed not close void tags."
It's written as the last line of that section/paragraph. It's essentially a closing argument; the reader can be drawn in no other direction. If they had said "interestingly" or something to make it obviously a small aside, then maybe I wouldn't draw this conclusion.
> Google's styleguide on that subject is also very clear that you should indeed not close void tags.
Only because it results in smaller files. For example it also recommends omitting optional tags for the same reason. I'm really skeptical that omitting these things helps readability (if that's what the guide is referring to when it says "scannability"). If size is at such a premium why not simply preprocess and minify HTML? Recently I tried briefly omitting "/>" from <br> and friends and I wasn't impressed as far as legibility goes. Maybe I just didn't try hard enough... :)
and I'm saying: if THAT is what you call "works just fine", then the whole concept of a hybrid document is broken and it's great that we've abandoned it.
So really, you have no argument, except for a distaste of XHTML. Gotcha.
It seems better to mix that with SVG than HTML5, since SVG happens to also be XML-based. Otherwise you have an impedance mismatch leading to some weird corner cases.
My argument is that you might be able to make it work through a lot of effort, but you can't actually use it in the real world, so it's worse than worthless.
Know why everyone writes <br /> instead of <br/>? IE5 on the Mac had a parser that broke if it found an empty tag without a space before the closing slash. Funny how software can vanish into the mists of time yet still have an effect on current coding.
I used the SGML NET trick a few years back in an attempt to create the shortest possible valid HTML documents for different versions of HTML: http://mathiasbynens.be/notes/minimal-html
Note: “valid” here is defined as “theoretically valid as per the relevant spec” and doesn’t reflect what browsers actually support(ed).
This is from Ian Hickson in 2006, regarding the emergence of HTML5:
"Regarding your original suggestion: based on the arguments presented by the various people taking part in this discussion, I’ve now updated the specification to allow “/” characters at the end of void elements."
To which Sam Ruby responded:
"This is big. PHP’s nl2br function is now HTML5 compliant. WordPress won’t have to completely convert to HTML4 before people who wish to author documents targeting HTML5 can do so using this software. Such efforts can now afford to proceed much more incrementally. This is much more sensible and practical possibility."
Remember that both men played fundamental roles in shaping HTML5. And I think this one sentence sums up the mindset that shaped HTML5:
"The truth is that most HTML is authored by pagans."
and this was Sam Ruby's view at the time:
"When all the religion was stripped away from the trailing slash in always-empty HTML elements discussion, only one question remained: I think basically the argument is “it would help people” and the counter argument is “it would confuse people”. This is a eminently sane way to approach discussions such as these. I would argue that it would both help people and reduce confusion if a void <a/> element continued to be invalid HTML5 and, by implication, be invalid in XHTML5. By invalid, I simply mean that a parse error would be reported by a conformance checker whenever such constructs are found in a document. Non-draconian user agents can, of course, chose to recover from this error."
People with real lives have perhaps missed the sad slow way that the argument for XML on the Web, and therefore XHTML, has imploded. But the sad souls (such as me) who have followed this story are aware that the case against XHTML has developed slowly over the years.
The first salvo against XML on the web was launched by Mark Pilgrim way back in 2004. This is when the mania for XML was at its peak (before JSON had appeared), a time when people felt XML/XPATH would eventually replace SQL and RDBMS (an idea promoted by no less an authority than Sir Timothy Berners-Lee, who, at that time, could make a believable case that RDF was the future of the Web).
This is Pilgrim's article, "XML on the Web Has Failed":
"There are things called "transcoding proxies," used by ISPs and large organizations in Japan and Russia and other countries. A transcoding proxy will automatically convert text documents from one character encoding to another. If a feed is served as text/xml, the proxy treats it like any other text document, and transcodes it. It does this strictly at the HTTP level: it gets the current encoding from the HTTP headers, transcodes the document byte for byte, sets the charset parameter in the HTTP headers, and sends the document on its way. It never looks inside the document, so it doesn't know anything about this secret place inside the document where XML just happens to store encoding information. So there's a good reason, but this means that in some cases -- such as feeds served as text/xml -- the encoding attribute in the XML document is completely ignored."
The article we are talking about "To close or not to close" states:
"XHTML is basically the same as HTML but based on XML."
This is stated as a fact, but in fact many people have made the argument that XHTML never fully functioned as XML, partly for the reasons that Pilgrim talks about, but also because only the strict versions of XHTML ever triggered the strict draconian error handling that has always been part of XML. However, there are other ways in which XHTML was difficult to treat the same as XML. For instance:
"Note that the reason to do this is to deal with bad browser sniffing where sites send HTML/XHTML markup meant to be served as text/html as application/xhtml+xml, application/xml or text/xml only to Opera, which causes Opera to encounter an XML parse error that breaks the site for Opera."
Sam Ruby is a co-chair of the W3C's HTML Working Group, and if you've read his blog over the years, you are aware of the many problems that arise when treating XHTML as XML.
Some of the debates that have happened over the years simply reveal how much reality differs from the specs:
If it was easy to develop a version of HTML that truly acted as a form of XML, would such debates have been necessary?
Please understand me: I am not criticizing all of the intelligent people who worked very hard on the specs for HTML and XML and XHTML. I am pointing out that after 15 years of effort, no one has found an easy way to treat XHTML as a form of XML under all circumstances. Surely if the brightest minds in the tech industry fail to make this work after 15 years, this is a circle that cannot be squared?
Consider the fact that companies like Google felt they had no choice but to ignore the mime type "application/xhtml+xml":
Sam Ruby also makes clear that the concessions to an XML style, including closing void elements, were thought of as an effort to ease the transition:
"I believe that if those that had created XHTML had the courage of their convictions, both Google and Microsoft would have had no choice. I also believe that there should have been a maintenance release or two of HTML4. In HTML5, the root element MAY have an xmlns attribute, but only if it matches the one defined by XHTML; and void elements may have terminating slash characters in their start element. It is these small touches that make transition easier."
Also, in another blog post Sam Ruby makes the point that the draconian error checking that is mandatory for XML also makes it impossible to develop those technologies that supporters of XML were excited about. He gave the example of sending an SVG image to his daughter, and her wanting to post it to her MySpace page: but SVG is XML, and so it should not render on a malformed page, and MySpace was permanently malformed. Sam Ruby could send a gif or a jpeg to his daughter, and she could post that, without a problem, to MySpace, but SVG was limited to well-formed, correctly served pages -- in a world where few pages are well-formed and correctly served. See the comments here:
Finally, in a post I can not find, Sam Ruby makes the point that, for some strange reason, people seemed to very much want something called XHTML, even though it would not be able to act like real XML, for all the reasons that had been discussed in thousands of blog posts and chat rooms. He seemed puzzled by it.
Anyone who advocates for XHTML needs to think long and hard about what it is, exactly, that they are advocating for. If you want an HTML that has an XML style, can you say why?
> If you want an HTML that has an XML style, can you say why?
Because I think that section 12.2 of the current HTML specification is outrageous. (The section is "Parsing HTML documents", if anyone is not familiar with it make sure to look at the subsections "Tokenization", "Tree Construction", etc.)
(That said, I appreciate your detailed comment; this is important history that too few people are aware of.)
(Also overenthusiasm for all things XML had nothing to do with RDF. RDF is not XML.)
I do it for a simple reason: layout cleanup with auto-indent. I've found HTML layout cleanup to be unreliable in most editors, whereas XML layout works 99% of the time.
HTML5 is a huge improvement over the HTML4.01/XHTML madness that was going on back in the day. And it's fine with me to allow non-closed singleton tags.
There's perhaps no strong logical argument either way, but from a style perspective, I prefer to use closing slashes to make it absolutely clear what's going on.
Allowing the void tags to be unclosed is the lesser evil of the two; I can even accept the argument behind it (they can't have content), even though it complicates the syntax.
The really evil one is not making <div /> exactly equivalent to <div></div>, which is just batshit crazy. When I want a placeholder tag (to be populated later) I have to write <div></div>, which feels completely unnatural.
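Concretely, a sketch of what an HTML parser does with it: the "/" is ignored, the div stays open, and what follows becomes its child rather than its sibling:

    <div class="placeholder" />
    <p>I end up inside the div, not after it.</p>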
There is an advantage to writing your HTML as well formed XML, and that's being able to parse it as XML if you want to. There's no disadvantage to writing your HTML as well formed XML.
The polyglot syntax gets weird in CDATA elements and you have to add a bunch of talismans to the code.
If you don't want to accidentally break it, you shouldn't be writing XML by hand or gluing it together from strings (https://hsivonen.fi/producing-xml/), so you need to produce your output with a polyglot-compatible XML+HTML serializer.
That's a lot of work for the off chance that somebody will parse your markup as XML. All bots support HTML.
I think that now with HTML5 standardising the parsing behaviour ( http://www.w3.org/TR/html5/syntax.html ), looking at that is very useful too - it shows that void elements get closed automatically by the parser whether or not "/" is included, some other extraneous end tags get ignored completely, and also shows that "</br>" gets parsed as "<br>". So the example given in the article, "<br>Hello!</br>", does have a defined meaning in HTML5 - equivalent to "<br>Hello!<br>".
I'm talking about the specs in the article (not how browsers interpret errors). So </br> may be interpreted as <br> but is actually a syntax error.
I quoted the HTML5 specification in the VALIDITY section of the article.
The fact that HTML5 basically completely specifies the parsing for any string of input, even "syntax error" cases, raises an interesting point: if these errors still result in some DOM, and all browsers that implement the (also standardised) error handling exhibit the same behaviour, are they really true "errors" anymore? We usually think of error cases (e.g. in a programming language) as ones which have no meaning or could cause implementation-defined/undefined behaviour, but these have been completely defined by the standard.
I don't see any good reason to use "</br>", but there's some other cases that could be useful, like not requiring spaces between quoted attributes (name1='value1'name2="value2"). I see a parallel with this and the evolution of natural languages: words and syntax that used to be incorrect gradually become accepted as part of the language and attain a normative meaning, because everyone still understands.
Very good article. I've been doing most of my web development in the .NET area; starting with ASP.NET and its strict XHTML, I picked up the habit of always writing the /> variant, so it's nice to read about which one to use in the HTML5 age :)
Nice write-up!
I have never thought about shrinking the closing tags to </>; if it were supported, it would shrink large HTML pages quite nicely.
Has there been a proposal at W3C to use that kind of a format back in the good old days of HTML 1.0?
I don't know about the historical aspect but I do know that the HTML5 parsing spec explicitly ignores the "</>" sequence. More interestingly, "</ >" (with an extra space) is parsed as a "boguscomment" which means it basically adds a comment node.
I feel so much better that I don't have to bother typing out <br /> anymore, a habit I picked up after running my HTML through a validator when I first began serious HTML coding (self-taught). It is by far one of the most difficult, finger-stretching pieces of code to write. Nowadays I don't even have much use for breaks, but it's going to be a relief to just throw in a <br> ... Ahhh, that was so easy to type.
One of two things will happen, depending on your browser.
If your browser is following the WebIDL spec, so all the accessors are on the prototype, this will produce "{}".
If your browser is WebKit-based, this will throw an exception, because body.firstChild.parentNode == body and JSON.stringify throws on object graphs with loops.
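A sketch of the kind of thing presumably being discussed (assuming the example was JSON.stringify on a DOM node):

    <script>
      var body = document.body;
      // The DOM is a cyclic graph: every child links back to its parent.
      console.log(body.firstChild.parentNode === body); // true
      // With accessors on the prototype (per WebIDL), stringify finds no
      // own enumerable properties and yields "{}"; with own-property
      // accessors (old WebKit), it walks the cycle and throws.
      JSON.stringify(body);
    </script>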
Not sure your example would work in a real world situation. The UL would probably have an array attached as it can have multiple children. Then we get into the fact that all tags can have attributes, not just a text node.
You're right, I didn't consider attributes. In my simplified scheme the parser would need to know which keywords were attributes (based on the parent element) versus keywords that are just new child elements, which would defeat the purpose.
All of this has been clearly outlined in the spec for decades and many articles have been written over the years talking about this same issue. Why this is a problem for any professional developer, I just don't have a clue.
I appreciate the amount of research that went into it, but in reality this all falls squarely into domain of pedantry, because you close void tags either way and move on to more important matters.
How to close void tags is more of a leitmotif to learn more about the whole subject, and the reason for investigating it. If you're not interested in understanding the core features of the markup language you're using, then this article is definitely not for you.
> I guess since HTML is so common it doesn't really matter, but really? We need 5 different types of markup when one would have been fine?
https://xkcd.com/927/