That was fine when (X)HTML was about semantics, but once it got used for presentation too, the contract was broken. I scrape a few sites (due to the lack of RSS feeds and the desire of UK organisations to treat Facebook pages as their 'News feed') and it's easy... until they do a redesign and the HTML changes.

For that reason alone, an API response format that stays constant no matter how the site looks appeals to me.

Well, technically that's possible with HTML, too. In addition to semantic tags like <summary>, <nav> or <article> we could use IDs and classes and define those as the API contract.

It's just that nobody uses HTML like this, so IDs and classes in particular have ended up being used almost exclusively for the visual / UI aspects of a website (via CSS).

There's nothing in principle though that keeps you from mandating that specific IDs and classes have a meaning beyond what's represented visually and that those signifiers therefore have to both stay the same and must not be used for entities other than those they were meant for.
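As a rough sketch of what that could look like, assuming Python and only its standard library: the class name `news-title` and the helper `extract_titles` are made up for illustration and are not any real site's contract. The point is that a scraper keyed on a contract class gets the same answer before and after a redesign, as long as the class itself is kept stable.

```python
from html.parser import HTMLParser

# Hypothetical contract: every headline carries the class "news-title",
# regardless of which element or layout the site currently uses.
CONTRACT_CLASS = "news-title"

# Elements that never have an end tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ContractExtractor(HTMLParser):
    """Collects the text of every element carrying the contract class."""

    def __init__(self, contract_class):
        super().__init__()
        self.contract_class = contract_class
        self.depth = 0      # nesting depth inside a matching element
        self.items = []     # extracted texts, in document order
        self._buf = []      # text fragments of the element being read

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.contract_class in classes:
            self.depth = 1
            self._buf = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.items.append("".join(self._buf).strip())

    def handle_data(self, data):
        if self.depth:
            self._buf.append(data)


def extract_titles(html):
    parser = ContractExtractor(CONTRACT_CLASS)
    parser.feed(html)
    parser.close()
    return parser.items


# The same content before and after a (made-up) redesign.
old_layout = '<table><tr><td><b class="news-title">Library opens</b></td></tr></table>'
new_layout = ('<main><article><h2 class="headline news-title">'
              'Library <em>opens</em></h2></article></main>')

print(extract_titles(old_layout))
print(extract_titles(new_layout))
```

Both calls return the same list even though the surrounding markup is completely different, which is the whole "contract" idea: the class name is the API, and the tags around it are free to change.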

> but once it got used for presentation too

So everything after HTML 2.0 (RFC 1866)? The STYLE and FONT elements were added in HTML 3.2, but when tables were added to HTML 2 in RFC 1942 they already included presentational attributes. Heck, RFC 1866 already included the IMG element with an ALIGN attribute, as well as typographic elements like HR, BR, B, I and TT (for monospace text).

It sounds like you're being nostalgic about a time that never was, especially for XHTML (which the Semantic Web crowd loves to misremember as being 100% about semantics and not just a hamfisted attempt to make HTML compatible with the W3C's other XML formats).

Really, people didn't like XHTML because they didn't want to close their elements. That's it. And now most web pages don't even parse in an XML parser. Which elements are standard and which are not is completely arbitrary, and what the browser does with them doesn't really matter either. What matters is that you can extract the data from it, if you know the structure (either by following some standard, or by having out-of-band documentation). In that respect JSON and X(HT)ML are similar, except now you can't scrape web pages with a single GET request from the canonical URI, and instead need to run a fully fledged browser that parses JavaScript and/or read some site-specific JSON docs (if any are provided at all).
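To make the XML-parser point concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The two markup strings and the helper names `paragraph_texts` and `parses_as_xml` are invented for this example. Well-formed XHTML can be queried like any other XML document, while a single unclosed `<br>` is enough to make a strict XML parser give up.

```python
import xml.etree.ElementTree as ET

# XHTML elements live in this namespace; ElementTree spells it in braces.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Made-up sample documents: one well-formed XHTML, one typical tag soup.
well_formed = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<p class="intro">Hello<br/> world</p>'
    '</body></html>'
)
tag_soup = '<html><body><p class="intro">Hello<br> world</body></html>'


def paragraph_texts(markup):
    """Return the text of every XHTML <p> element, in document order."""
    root = ET.fromstring(markup)
    return ["".join(p.itertext()) for p in root.iter(XHTML_NS + "p")]


def parses_as_xml(markup):
    """True if a strict XML parser accepts the markup at all."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(paragraph_texts(well_formed))
print(parses_as_xml(well_formed), parses_as_xml(tag_soup))
```

The tag-soup string is rejected because `<br>` and `<p>` are never closed, which is exactly the kind of markup browsers have always accepted.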

> because they didn't want to close their elements

That is an incredibly uninformed view of early 2000s web content authoring. The reason people didn't like XHTML was that it provided no tangible benefit.

In fact, my experience was quite the opposite: technical people loved XHTML because it made them feel more legitimate, and all they had to do was sprinkle a few slashes over their markup. Validators were the hot new thing, and being able to say your website was XHTML 1.0 Strict compliant was the ultimate nerd badge of pride.

But these same people didn't use the XHTML MIME type because they wanted their website to work in as many browsers as possible.

> And now most web pages don't even parse in an XML parser.

Again with the nostalgia for a past that never was: most web browsers never supported XHTML. They supported tag soup HTML served with an XHTML DOCTYPE but an HTML MIME type. Why? Because they supported tag soup HTML and only used the DOCTYPE as a signal for whether the page tried to be somewhat standards compliant.

Did you reply to the correct comment? Because nowhere do I say I'm nostalgic or that it was a panacea.

I don't remember HTML 3.2 being such a problem (I wasn't around for HTML 2.0) because either people didn't do such complex things with their sites, or they didn't redesign much.

I did enjoy how simple publishing documents online was back then, and I'm nostalgic for that. Though definitely not for dial-up internet!

The reason you're remembering HTML 3.2 being nice is that you're remembering a period when people didn't try to do anything complicated. Using tables for layout had nothing to do with HTML 4 and everything to do with the web becoming more mainstream.

The shift that made HTML the mess it is today isn't technology but target audience and sponsorship. The early web was mostly hobbyists and academia, people who didn't care much about layout and were fine with just having a way to make something bold. The equivalent of people writing their blog posts in markdown these days.

I'm saying you're being nostalgic because you claim that there was a point when XHTML was about semantics and not presentation. That time never was. It's true that today it's easier to use HTML for presentation than back in the day, mostly thanks to CSS and the DOM, but what held HTML back initially wasn't technical.

That said, there were numerous progressions and many of them overlapped. Flash and Java applets were infinitely worse than the interactive blobs of web technologies we have today. Table layouts were followed by a second semantic renaissance led by the CSS Zen Garden (which for the first time really popularised the idea of separating markup and presentation).