
Now, I don't want to be a downer, but we collectively seem to have forgotten that HTML, as a markup language with sufficient semantic elements, is a perfect API in itself. In fact, if we had stuck with XHTML, I would argue it would have been an even better API than JSON thanks to XPath, XQuery and XSLT.
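
To make the XPath point concrete, here is a minimal sketch in Python. It assumes a hypothetical, well-formed XHTML page (the ids, class names and URLs are invented for illustration) and uses the limited XPath subset in the standard library's xml.etree.ElementTree rather than a full XPath engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML document; ids, classes and hrefs are made up for this sketch.
PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <article id="post-1">
      <h1 class="title">Hello</h1>
      <a class="author" href="/users/alice">alice</a>
    </article>
    <article id="post-2">
      <h1 class="title">World</h1>
      <a class="author" href="/users/bob">bob</a>
    </article>
  </body>
</html>"""

# XHTML elements live in this namespace, so queries need a prefix mapping.
NS = {"x": "http://www.w3.org/1999/xhtml"}


def list_posts(xhtml: str) -> list[dict]:
    """Read the page as if it were an API response listing posts."""
    root = ET.fromstring(xhtml)
    posts = []
    for article in root.findall(".//x:article", NS):
        title = article.find("x:h1[@class='title']", NS)
        author = article.find("x:a[@class='author']", NS)
        posts.append({
            "id": article.get("id"),
            "title": title.text,
            "author": author.text,
            "author_url": author.get("href"),
        })
    return posts


if __name__ == "__main__":
    for post in list_posts(PAGE):
        print(post)
```

The same document can still be rendered by a browser, which is the dual use the comment is describing.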

"HyperText is a way to link and access information of various kinds as a web of nodes in which the user can browse at will. Potentially, HyperText provides a single user-interface to many large classes of stored information such as reports, notes, data-bases, computer documentation and on-line systems help. We propose the implementation of a simple scheme to incorporate several different servers of machine-stored information already available at CERN, including an analysis of the requirements for information access needs by experiments… A program which provides access to the hypertext world we call a browser — T. Berners-Lee, R. Cailliau, 12 November 1990, CERN "

Web apps are somewhat backwards in my opinion. We completely lost the idea of a markup language and decided “wait, we need an API”. So we started using JSON instead of the actual document (HTML or XML) to represent the endpoints. And then we patted ourselves on the back claiming “accessibility”.




Were you there when XHTML 2 was born and died a slow death? The Semantic Web died with the rise of search engines. Not that the well-formed, validated markup we had back then was generally even remotely semantic. Heck, even Google wasn't able to save semantic markup.

XHTML wouldn't have fixed anything. To make XHTML work on the web, browsers would have needed to make so many compromises we'd just have ended up with something even worse and less specified than HTML is now. HTML5 won because it codified all the compromises browser vendors had to make to deal with broken, real-world markup.

Sure, there are HATEOAS implementations that use HTML as a data format but the semantics of HTML are so generic and insufficient that all you're really gaining is being able to use an HTML parser instead of something easier to implement and more widely supported, like JSON.

Let me repeat that point: HTML semantics are insufficient and too generic. That is why microformats were a thing and embedded metadata was standardised in HTML5. So you can bolt on your custom semantics in a standardised fashion if you really want people to parse your markup.
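
As a rough illustration of that bolted-on metadata, the sketch below reads HTML5 microdata `itemprop` attributes with Python's standard html.parser. The markup is a made-up example using schema.org's Person vocabulary, and the parser is deliberately simplified: it only collects plain text content and ignores nested items and attribute-valued properties such as href or content.

```python
from html.parser import HTMLParser

# Hypothetical markup: plain HTML annotated with microdata attributes.
SNIPPET = (
    '<div itemscope itemtype="https://schema.org/Person">'
    '<span itemprop="name">Jane Doe</span> works as '
    '<span itemprop="jobTitle">Professor</span>'
    '</div>'
)


class ItempropParser(HTMLParser):
    """Collects the text of elements that carry an itemprop attribute.

    Simplification: assumes each property element contains plain text only,
    so any end tag closes the property currently being read.
    """

    def __init__(self):
        super().__init__()
        self.props = {}
        self._current = None

    def handle_starttag(self, tag, attrs):
        name = dict(attrs).get("itemprop")
        if name:
            self._current = name
            self.props[name] = ""

    def handle_data(self, data):
        if self._current:
            self.props[self._current] += data

    def handle_endtag(self, tag):
        self._current = None


def extract_props(markup: str) -> dict:
    parser = ItempropParser()
    parser.feed(markup)
    parser.close()
    return parser.props


if __name__ == "__main__":
    print(extract_props(SNIPPET))
```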

HTML semantics are only sufficient to describe the structure of a document. And even that they don't do very well right now (just ask anyone who takes accessibility seriously and knows the WHATWG and W3C HTML specs).


Yep, I was there. But XHTML did not die due to lack of semantics. One of the things it had was namespaces, and people did embed other XML specs in their documents (remember MathML, RDFa?). It was just too complicated for most people, and when AJAX became a hit people flocked to JSON. Remember, AJAX used to stand for Asynchronous JavaScript and XML(!); that part got lost in translation somewhere. Besides, with JSON all the semantics are out of band in some (hopefully well written) doc, rather than in a schema or DTD (which used to resolve in band, even). Comparing JSON to HTML in terms of semantics is silly: both have none. But the idea of using hypertext markup for both display and machine readability, that's something that I would still like to see resurrected.


> It was just too complicated for most people

Not just people. What's the point of being able to embed MathML in your XHTML if there's only a single browser that understands it? And what's the point in using XHTML when you need to pretend it's just HTML tagsoup for 90% of your visitors' browsers?

> rather than in a schema or DTD (which used to resolve in band, even)

Except that only happened with a very small number of tools. Browsers never cared about DOCTYPE declarations other than as a signal (or at best a bunch of ENTITY definitions). And schemas provide no use other than validation.

XML is more verbose than JSON and XML Schema is more established than most of the JSON equivalents but none of that is relevant when talking about HTML. The only thing you gain from embedding your metadata in your HTML is colocation.

Truth in DOM was a dream that web developers chased for more than a decade. The reason it doesn't work is simply that rendering information is a lossy process. You either end up duplicating the same information multiple times in your markup (once for presentation, once for display) or adding layers of indirection to render or reverse the render at runtime. It's a fool's errand.


I would disagree: real APIs have a lot of benefits over HTML. For example, a good API has documentation and wouldn't introduce random, breaking changes. APIs usually have similar structure and they have less overhead than HTML. There are also well established methods for authorization that are easy to use from code/cli and so on.


> a good API has documentation and wouldn't introduce random, breaking changes

The changes don't have to be breaking, if the HTML is well designed. Adding or removing an intermediate tag is OK as long as the data tags have identifiers to avoid having to write complex, fragile XPath queries.
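
A small sketch of that difference, assuming two hypothetical versions of the same page where the second wraps the data in extra layout elements. The `price` id is an invented identifier, and the queries use the XPath subset in Python's xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical page before and after a redesign that adds wrapper elements.
V1 = "<body><div><span id='price'>42</span></div></body>"
V2 = "<body><main><section><div><span id='price'>42</span></div></section></main></body>"


def by_position(doc: str):
    """Fragile: depends on the exact nesting of the layout."""
    node = ET.fromstring(doc).find("./div/span")
    return node.text if node is not None else None


def by_id(doc: str):
    """Robust: depends only on the identifier the data tag carries."""
    node = ET.fromstring(doc).find(".//*[@id='price']")
    return node.text if node is not None else None


if __name__ == "__main__":
    for name, doc in (("v1", V1), ("v2", V2)):
        print(name, "by position:", by_position(doc), "| by id:", by_id(doc))
```

The positional query stops matching once the wrappers appear, while the id-based query is unaffected.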

> Well established methods for authorization that is easy to use from code/cli and so on.

Basic Auth is very established and easy to use.
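
For reference, a minimal sketch of what Basic Auth looks like from code, using only Python's standard library. The URL is a placeholder and the credentials are the well-known example pair from the HTTP Basic Auth specification; nothing is sent over the network here. Since the credentials are only base64-encoded, not encrypted, this is normally paired with HTTPS.

```python
import base64
import urllib.request


def basic_auth_header(user: str, password: str) -> str:
    """Build the value of the Authorization header for HTTP Basic Auth."""
    raw = f"{user}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")


def build_request(url: str, user: str, password: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET request carrying Basic Auth credentials."""
    request = urllib.request.Request(url)
    request.add_header("Authorization", basic_auth_header(user, password))
    return request


if __name__ == "__main__":
    # example.com is a placeholder host, not a real endpoint for this thread.
    req = build_request("https://example.com/articles.html", "Aladdin", "open sesame")
    print(req.get_header("Authorization"))
```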


I agree. I manage a site using a hand-built set of 'XHTML' templates & documents, XSLT, and PHP scripts. Treating the HTML documents as the data source has proven to work very nicely in practice, and I would use the same approach in future, although it definitely needs finessing.

(I say 'XHTML' because it's just HTML that happens to be XML, rather than actually being served as XHTML. But having it in XML obviously makes things a lot easier.)


That was fine when (X)HTML was about semantics, but once it got used for presentation too, the contract was broken. I scrape a few sites (due to the lack of RSS feeds and the desire of UK organisations to treat Facebook pages as their 'News feed') and it's easy... until they do a redesign and the HTML changes.

For that reason alone, an API response format that stays constant no matter how the site looks appeals to me.


Well, technically that's possible with HTML, too. In addition to semantic tags like <summary>, <nav> or <article> we could use IDs and classes and define those as the API contract.

It's just that nobody uses HTML like this and hence IDs and classes in particular have turned into being almost exclusively used for visual / UI aspects of a website (using CSS).

There's nothing in principle though that keeps you from mandating that specific IDs and classes have a meaning beyond what's represented visually and that those signifiers therefore have to both stay the same and must not be used for entities other than those they were meant for.
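
One way such a mandate could be enforced is a contract check that runs against every new design. The sketch below is only an assumption about how that might look: the required ids are hypothetical, and the check reports ids that went missing or that are used for more than one element.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical contract: ids that consumers are promised will keep their meaning.
REQUIRED_IDS = {"article-title", "article-date"}


class IdCollector(HTMLParser):
    """Counts how often each id attribute appears in the markup."""

    def __init__(self):
        super().__init__()
        self.ids = Counter()

    def handle_starttag(self, tag, attrs):
        value = dict(attrs).get("id")
        if value:
            self.ids[value] += 1


def contract_violations(markup: str) -> dict:
    collector = IdCollector()
    collector.feed(markup)
    collector.close()
    return {
        "missing": sorted(REQUIRED_IDS - set(collector.ids)),
        "duplicated": sorted(i for i in REQUIRED_IDS if collector.ids[i] > 1),
    }


if __name__ == "__main__":
    redesign = '<h1 id="article-title" class="big red">Title</h1><time id="article-date">2020</time>'
    print(contract_violations(redesign))
```

Presentational classes can change freely under this scheme; only the ids named in the contract are checked.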


> but once it got used for presentation too

So everything after HTML 2.0 (RFC 1866)? The STYLE and FONT elements were added in HTML 3.2, but when tables were added to HTML 2 in RFC 1942 they already included presentational attributes. Heck, RFC 1866 already included the IMG element with an ALIGN attribute, as well as typographic elements like HR, BR, B, I and TT (for monospace text).

It sounds like you're being nostalgic about a time that never was, especially for XHTML (which the Semantic Web crowd loves to misremember as being 100% about semantics and not just a hamfisted attempt to make HTML compatible with the W3C's other XML formats).


Really people didn't like XHTML because they didn't want to close their elements. That's it. And now most web pages don't even parse in an XML parser. What elements are standard and which are not is completely arbitrary, what the browser does with them doesn't really matter either. What matters is that you can extract the data from it, if you know the structure (either by following some standard, or by having out-of-band documentation). In that respect JSON and X(HT)ML are similar, except now you can't scrape web pages with a single GET request from the canonical URI, and instead need to run a fully fledged browser that parses javascript and/or read some site specific JSON docs (if any are provided at all).


> because they didn't want to close their elements

That is an incredibly uninformed view of early 2000s web content authoring. The reason people didn't like XHTML was that it provided no tangible benefit.

In fact, my experience was quite the opposite: technical people loved XHTML because it made them feel more legitimate, they just had to sprinkle a few slashes over their markup. Validators were the hot new thing and being able to say your website was XHTML 1.0 Strict Compliant was the ultimate nerd badge of pride.

But these same people didn't use the XHTML MIME type because they wanted their website to work in as many browsers as possible.

> And now most web pages don't even parse in an XML parser.

Again with the nostalgia for a past that never was: most web browsers never supported XHTML, they supported tag soup HTML with an XHTML DOCTYPE but an HTML MIME type. Why? Because they supported tag soup HTML and only used the DOCTYPE as a signal for whether the page tried to be somewhat standards compliant.
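
The gap between the two parsing modes is easy to see with a sketch. The markup below is an invented piece of tag soup; Python's strict XML parser stands in for XHTML processing and its lenient html.parser stands in for tag soup handling (neither is a browser engine, so this is only an analogy).

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Invented examples: unclosed <br> and <p> versus the well-formed equivalent.
TAG_SOUP = "<html><body><p>first<br>second<p>third</body></html>"
WELL_FORMED = "<html><body><p>first<br/>second</p><p>third</p></body></html>"


def parses_as_xml(markup: str) -> bool:
    """True only if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TextCollector(HTMLParser):
    """Lenient parser that just gathers text, whatever the tag structure."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def text_from_html(markup: str) -> str:
    parser = TextCollector()
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks)


if __name__ == "__main__":
    print("tag soup as XML:", parses_as_xml(TAG_SOUP))
    print("well-formed as XML:", parses_as_xml(WELL_FORMED))
    print("tag soup as HTML:", text_from_html(TAG_SOUP))
```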


Did you reply to the correct comment? Because nowhere do I say I'm nostalgic or it was a panacea.

I don't remember HTML 3.2 being such a problem (I wasn't around for HTML 2.0) because either people didn't do such complex things with their sites, or they didn't redesign much.

I did enjoy how simple publishing documents online was back then, and I'm nostalgic for that. Though definitely not for dial-up internet!


The reason you're remembering HTML 3.2 being nice is that you're remembering a period when people didn't try to do anything complicated. Using tables for layout had nothing to do with HTML 4 and everything to do with the web becoming more mainstream.

The shift that made HTML the mess it is today isn't technology but target audience and sponsorship. The early web was mostly hobbyists and academia, people who didn't care much about layout and were fine with just having a way to make something bold. The equivalent of people writing their blog posts in markdown these days.

I'm saying you're being nostalgic because you claim that there was a point when XHTML was about semantics and not presentation. That time never was. It's true that today it's easier to use HTML for presentation than back in the day, mostly thanks to CSS and the DOM, but what held HTML back initially wasn't technical.

That said, there were numerous progressions and many of them overlapped. Flash and Java applets were infinitely worse than the interactive blobs of web technologies we have today. Table layouts were followed by a second semantic renaissance led by the CSS Zen Garden (which for the first time really popularised the idea of separating markup and presentation).


The problem with so-called "web APIs" is that usually they function as a way to get users to "sign up" to a website in order to use it, then to track them by some token, rate limit them at will or ask for fees. This is certainly not "accessibility".

The common sense of the user is that either the information on a website is free or it is not free. But "web APIs" have tried to blur this clear distinction. The information is free only if taken in small amounts. Consume too much, too fast and it is not free.

It is the Google model. Allow users to search public information on the web that Google collected for free from websites and has stored on its computers. But users may only query this public information in small amounts, and not "too fast". Otherwise users get blocked.
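
For readers unfamiliar with how "small amounts, not too fast" is typically enforced, a token bucket is one common technique. The sketch below is a generic illustration with made-up parameters, not a description of how Google or any particular site implements its limits. Time is passed in explicitly so the behaviour is deterministic.

```python
class TokenBucket:
    """Allows short bursts up to `capacity`, then a steady `refill_per_second` rate."""

    def __init__(self, capacity: int, refill_per_second: float, now: float = 0.0):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)  # start with a full bucket
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill according to the time elapsed since the previous request.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # the caller would answer with an error or a block


if __name__ == "__main__":
    bucket = TokenBucket(capacity=2, refill_per_second=1.0)
    for t in (0.0, 0.0, 0.0, 1.0):
        print(f"t={t}: allowed={bucket.allow(t)}")
```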

Now some websites might claim they need the ability to rate limit because if too many users started accessing their website at Googlebot speeds, the website's performance would degrade. But can Google make that claim? Is their infrastructure really that brittle? We are continually bombarded with PR that suggests Google is state of the art.

Is this really the era of "big data"? Users are restricted to very small data. The solution to any website's performance being degraded by "too many" requests is to provide bulk data. For example, pjrc.org, where users can buy Teensy microcontrollers, makes the entire website available as a tarball for users to download. The SEC traditionally provided bulk access to filings. There are countless other examples.

Is this really the age of Artificial Intelligence, Machine Learning, etc.? Teenagers and companies around the world build robots and the press gets excited. Yet websites block requests out of fear they are coming from "(ro)bots"? Is there something wrong with automation? (The majority of requests on the web are indeed from bots; using software to make requests, as Google and myriad other companies do, is far more efficient than manual typing, clicking, swiping and tapping.)

Summary: The "Web API" is nothing more than another senseless urging for users to "sign up" to receive free information. Not every "Web API" user is an app developer submitting to an app store, nor are they necessarily running a "competing" website. Every time a website collects unnecessary "sign up" credentials, it is one more unnecessary risk for the user that those credentials will be leaked.


Web APIs are a mechanism and therefore a medium for many higher-level implementations. There is nothing that inherently makes it so web APIs blur this distinction of free or paid content. There is a point at which you are asked to enter payment information. You can hide this pay wall regardless of web API or a series of web pages. It’s based on how you order content and data.

As for rate-limiting usage of a provider’s services, that’s what the terms are for. They’re providing the infrastructure and directly covering the costs. It is an agreement between you and them to abide by their rules for what fair access is. If you disagree, no one is forcing you to pay and/or use that service. You may use another service or even start your own if you feel you can provide a better service. What you can’t do, however, is expect that these services provide some minimum threshold of computation power, especially at their direct cost. Not unless it’s in the agreement you signed with them.



