So everything after HTML 2.0 (RFC 1866)? The STYLE and FONT elements were added in HTML 3.2, but when tables were added to HTML 2 in RFC 1942 they already included presentational attributes. Heck, RFC 1866 already included the IMG element with an ALIGN attribute, as well as typographic elements like HR, BR, B, I and TT (for monospace text).
It sounds like you're being nostalgic about a time that never was, especially for XHTML (which the Semantic Web crowd loves to misremember as being 100% about semantics and not just a hamfisted attempt to make HTML compatible with the W3C's other XML formats).
Really people didn't like XHTML because they didn't want to close their elements. That's it. And now most web pages don't even parse in an XML parser. What elements are standard and which are not is completely arbitrary, what the browser does with them doesn't really matter either. What matters is that you can extract the data from it, if you know the structure (either by following some standard, or by having out-of-band documentation). In that respect JSON and X(HT)ML are similar, except now you can't scrape web pages with a single GET request from the canonical URI, and instead need to run a fully fledged browser that parses javascript and/or read some site specific JSON docs (if any are provided at all).
> because they didn't want to close their elements
That is an incredibly uninformed view of early 2000s web content authoring. The reason people didn't like XHTML was that it provided no tangible benefit.
In fact, my experience was quite the opposite: technical people loved XHTML because it made them feel more legitimate, they just had to sprinkle a few slashes over their markup. Validators were the hot new thing and being able to say your website was XHTML 1.0 Strict Compliant was the ultimate nerd badge of pride.
But these same people didn't use the XHTML mime type because they wanted their website to work in as many browsers as possible.
> And now most web pages don't even parse in an XML parser.
Again with the nostalgia for a past that never was: most web browsers never supported XHTML, they supported tag soup HTML with an XHTML DOCTYPE but an HTML mime type. Why? Because they supported tag soup HTML and only used the DOCTYPE as a signal for whether the page tried to be somewhat standards compliant.
Did you reply to the correct comment? Because nowhere do I say I'm nostalgic or it was a panacea.
I don't remember HTML 3.2 being such a problem (I wasn't around for HTML 2.0) because either people didn't do such complex things with their sites, or they didn't redesign much.
I did enjoy how simple publishing documents online was back then, and I'm nostalgic for that. Though definitely not for dial-up internet!
The reason you're remembering HTML 3.2 being nice is that you're remembering a period when people didn't try to do anything complicated. Using tables for layout had nothing to do with HTML 4 and everything to do with the web becoming more mainstream.
The shift that made HTML the mess it is today isn't technology but target audience and sponsorship. The early web was mostly hobbyists and academia, people who didn't care much about layout and were fine with just having a way to make something bold. The equivalent of people writing their blog posts in markdown these days.
I'm saying you're being nostalgic because you claim that there was a point when XHTML was about semantics and not presentation. That time never was. It's true that today it's easier to use HTML for presentation than back in the day, mostly thanks to CSS and the DOM, but what held HTML back initially wasn't technical.
That said, there were numerous progressions and many of them overlapped. Flash and Java applets were infinitely worse than the interactive blobs of web technologies we have today. Table layouts were followed by a second semantic renaissance led by the CSS Zen Garden (which for the first time really popularised the idea of separating markup and presentation).
So everything after HTML 2.0 (RFC 1866)? The STYLE and FONT elements were added in HTML 3.2, but when tables were added to HTML 2 in RFC 1942 they already included presentational attributes. Heck, RFC 1866 already included the IMG element with an ALIGN attribute, as well as typographic elements like HR, BR, B, I and TT (for monospace text).
It sounds like you're being nostalgic about a time that never was, especially for XHTML (which the Semantic Web crowd loves to misremember as being 100% about semantics and not just a hamfisted attempt to make HTML compatible with the W3C's other XML formats).