HATEOAS: An Alternative Explanation (htmx.org)
143 points by recursivedoubts on Nov 23, 2021 | 105 comments



Hello HN.

In this article I try to explain HATEOAS in its original context, hypermedia/HTML, rather than using JSON.

JSON isn't a natural hypermedia, and it makes HATEOAS (and the entire uniform interface of REST) difficult to understand and of frustratingly little benefit to end users.

I don't expect to change the language around REST-ful APIs, where darn near anything over HTTP is called "REST-ful", but I hope that this article will help people who have struggled with understanding what HATEOAS is or why on earth it might be an interesting and useful technology.

In its natural environment, HTML, it is much easier to understand.


First let me say I love htmx. It's nice to be able to write a little web app with minimal javascript and fuss.

It's all well and good to have your API return HTML when your only client is a web browser. And for many applications (like many of mine) that is enough.

But HTTP APIs are most powerful when they are client-agnostic. JSON is terrible in so many ways but it is at least neutral ground: it can be consumed by any HTTP client (which need not be a web browser) and every programming language that matters. For my day job, I deal with a great many HTTP APIs that are never seen or used by a browser. Authentication systems, data storage, schedulers, etc.

I can write a Python script that turns a JSON API response into a native Python object with a single line of code. How can you do _that_ with a HATEOAS response that is basically guaranteed to contain arbitrary HTML?
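To make the comparison concrete, here's a rough Python sketch (both payloads are invented literals so it's self-contained; real code would fetch them over HTTP and probably use lxml or BeautifulSoup for the HTML side):

    import json
    from html.parser import HTMLParser  # stdlib; lxml/BeautifulSoup would be the usual choice

    # JSON: one line from response text to a native Python object
    account = json.loads('{"number": 12345, "balance": 100.0, "status": "open"}')

    # HTML: you have to know (or discover) the structure to pull the same value out
    html_doc = '<div class="account"><span class="balance">100.00 USD</span></div>'

    class BalanceParser(HTMLParser):
        """Collects the text of the first element marked class="balance"."""
        def __init__(self):
            super().__init__()
            self._grab = False
            self.balance = None
        def handle_starttag(self, tag, attrs):
            if dict(attrs).get("class") == "balance":
                self._grab = True
        def handle_data(self, data):
            if self._grab and self.balance is None:
                self.balance = data.strip()
                self._grab = False

    parser = BalanceParser()
    parser.feed(html_doc)
    print(account["balance"], parser.balance)  # 100.0 100.00 USD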


> I can write a Python script that turns a JSON API response into a native Python object with a single line of code. How can you do _that_ with a HATEOAS response that is basically guaranteed to contain arbitrary HTML?

HATEOAS can be (and often is) done with a JSON or JSON-based (e.g., JSON+JSONSchema) representation for “backbone” resources (the same resources might also be available in HTML; HATEOAS and HTTP are both designed to accommodate flexible representations of the same resource).

But, sure, you can also do a one-line parse of raw HTML into a Python object. Just as with raw JSON, you'll probably lose some intended semantics; with either, you'll need a format built on the base format but with more tightly defined semantics to write a one-line client that doesn't lose meaning. For JSON, JSON+JSONSchema may be enough for many applications. For HTML, Microformats2 may be enough for many applications.
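For the Microformats2 route, the one-liner looks roughly like this, assuming the third-party mf2py package (the h-card markup is just an illustration):

    # Assumes `pip install mf2py`; the markup below is invented.
    import mf2py

    html_doc = '<div class="h-card"><a class="p-name u-url" href="https://example.com">Jane Doe</a></div>'
    parsed = mf2py.parse(doc=html_doc)  # one line: microformats2 HTML -> plain dicts/lists
    print(parsed["items"][0]["properties"]["name"])  # ['Jane Doe'], per the mf2 JSON format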


> Microformats2

Not only that, that metadata is basically required by the big walled-garden internet players if you want any nice embedding, so you need it anyway.


RPC has its place in life, but hypermedia is extremely powerful for application development. I believe many systems would be best served by providing both: a hypermedia-based web application and a JSON-based data API. This essay on the topic may be of interest to you:

https://htmx.org/essays/splitting-your-apis/


If the response is XHTML, then de-serialization is as easy as with JSON.


I think XHTML has been largely a distraction since the 2000s. HTML5 standardized parsing and the mechanisms for including non-display data.

Since the decline of IE8 or so, almost all of the trouble I've had working with data in HTML has been due to differences in data models, which is just as much a problem with JSON or XML.


> I can write a Python script that turns a JSON API response into a native Python object with a single line of code. How can you do _that_ with a HATEOAS response that is basically guaranteed to contain arbitrary HTML?

These aren't equivalent scenarios. The JSON response is a programmatic endpoint so of course it only contains programmatically-relevant data. Typically the HTML is not from a programmatic endpoint, but it could be.

Strip out all styling and non-programmatic content from an endpoint that returns HTML and you basically have a pseudo-XML document that can be mapped to a native object just as easily as JSON, with the added benefit that you can actually identify links without needing a separate schema as you do with JSON. This isn't typically done, but there's no reason it can't be.

In fact, it would be considerably more powerful than JSON because you can actually transmit functions. For instance, a <form> element and its associated inputs can be translated into a native function that makes a remote call, with parameters whose names correspond to the input names. So theoretically, hypertext is JSON + first-class functions. This is why HATEOAS is more flexible.
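A rough Python sketch of that form-to-function idea; the markup, URL, and helper names are invented, and a real client would actually send the request and follow the response:

    import urllib.parse
    import urllib.request
    import xml.etree.ElementTree as ET

    form_html = """
    <form action="https://example.com/accounts/12345/deposits" method="post">
      <input name="amount" type="number"/>
    </form>
    """

    def form_to_function(markup):
        """Turn a (well-formed) <form> into a callable whose kwargs are the input names."""
        form = ET.fromstring(markup)
        action = form.get("action")
        method = form.get("method", "get").upper()
        params = {i.get("name") for i in form.iter("input")}

        def call(**kwargs):
            unknown = set(kwargs) - params
            if unknown:
                raise TypeError(f"unexpected parameters: {unknown}")
            body = urllib.parse.urlencode(kwargs).encode()
            # Build (but don't send) the request a browser would make on submit.
            return urllib.request.Request(action, data=body, method=method)

        return call

    deposit = form_to_function(form_html)
    req = deposit(amount=100)
    print(req.get_method(), req.full_url, req.data)  # POST https://example.com/... b'amount=100'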


With htmx the size and complexity of the web client should be radically reduced compared to a heavy JS client app consuming the JSON API. I reckon this could free one up to make an API that was specialised for native clients and possibly even the web server itself… maybe no JSON in sight, unless you were keen.


Frankly, I think that HTML can likely be consumed in more languages than JSON by sheer virtue of age.

And there are any number of ways you could standardize the format of your HTML response - such that people who know the formatting ahead of time get the exact same benefits as they would with JSON.


What if we simply move the JSON -> HTML translation layer from the client, to a layer just before egress from the server? Maybe we provide two output options for each endpoint: /foo/1/bar (HTML by default) and /foo/1/bar?format=json


...that's a mechanism already built into HTTP in the form of the Accept header. There's something called "content negotiation"[1], which has been there since the '90s. Not sure about other frameworks, but the Django REST Framework[2] at least implements this correctly and can serialize to XML or JSON transparently.

[1] https://developer.mozilla.org/en-US/docs/Web/HTTP/Content_ne...

[2] Of course, the framework has nothing to do with REST at all. I gave up telling people this many years ago - it just doesn't work... Mentioning it here since it's relevant to the topic.
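For what it's worth, the negotiation itself is simple enough to sketch without any framework (the renderers and data below are invented, and real code would also handle wildcards and q-values properly):

    import json

    def render_json(account):
        return "application/json", json.dumps(account)

    def render_html(account):
        return "text/html", f'<div class="account"><span class="balance">{account["balance"]}</span></div>'

    RENDERERS = {"application/json": render_json, "text/html": render_html}

    def respond(accept_header, account):
        # Naive negotiation: take the first offered media type we can produce.
        for offered in (part.split(";")[0].strip() for part in accept_header.split(",")):
            if offered in RENDERERS:
                return RENDERERS[offered](account)
        return render_html(account)  # default representation

    print(respond("text/html,application/xhtml+xml,*/*;q=0.8", {"balance": 100.0}))
    print(respond("application/json", {"balance": 100.0}))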


Note however that content negotiation has historically been complicated by things like caching or client oddities in setting headers. Many of the projects where I've used it have ended up adding a way to specify the format in the URL, although that has become less of a concern over time.


What about /bar.html and /bar.json ?


> How can you do _that_ with a HATEOAS response that is basically guaranteed to contain arbitrary HTML?

https://pypi.org/project/mechanize/


Have you looked into JSON-ld? Sure it is an extension of json but it has a formal specification which solves all the problems you have with your json examples, specifically the out-of-band issue you have with your json and how to link to other things. FWIW I maintain an API and frontend which use JSON-LD, with the backend passing the information I need to render forms based on backend permissions, like you are describing.

What your HTML isn't doing for me is giving proper type information. Should I look for the text "Status: x" in some div to find the status? And how is this better than having "status" as a key in some JSON object? I would not enjoy developing a consumer that had to interface with an API nesting data in divs.


> What your HTML isn't doing for me is giving proper type information. Should I look for the text "Status: x" in some div to find the status?

Oh wow, you seem to have come out of reading this blog post still extremely confused.

In HATEOAS, the client doesn't care at all about whether the page contains a "status". It displays the hypermedia the way it is given without knowing the semantics at all - that's what makes it de-coupled from the server. It only knows the semantics of the hypermedia, in this case HTML.

In cases where you want a single-purpose client that can automatically take action based on your domain's concepts (like account status), you simply can't use HATEOAS. Go with RPC, which is a much better choice there, or, as most people today do, stateless JSON over HTTP (which is not REST, despite what most developers today seem to believe).


> Oh wow, you seem to have come out of reading this blog post still extremely confused.

Yeah, guilty of not reading the article properly, sorry. I now get what you are saying about not using the status to drive which forms appear in the UI / which actions can be performed. Using JSON-LD you might use the https://schema.org/potentialAction property to describe which actions the resource allows, which provides information equivalent to the HTML form tag.
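For anyone curious, a resource using potentialAction might look roughly like this, sketched as a Python dict; the URLs and the "balance" field are illustrative rather than valid schema.org vocabulary:

    account = {
        "@context": "https://schema.org",
        "@id": "https://example.com/accounts/12345",
        "balance": 100.0,  # illustrative, not a real schema.org property
        "potentialAction": {
            "@type": "TransferAction",
            "name": "deposit",
            "target": {
                "@type": "EntryPoint",
                "urlTemplate": "https://example.com/accounts/12345/deposits",
                "httpMethod": "POST",
            },
        },
    }

    # The client discovers what it may do from the representation itself,
    # much as a browser discovers a <form>:
    target = account["potentialAction"]["target"]
    print(target["httpMethod"], target["urlTemplate"])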

After reading Roy Fielding's blog a bit I do find my understanding of REST to be lacking. My reading of the principles agrees with your statement: "It displays the hypermedia without knowing semantics at all", but then he seems to also endorse RDF and N3, which are more like JSON-LD from my understanding, and I'm not sure how they cater to display purposes.

> When representations are provided in hypertext form with typed relations (using microformats of HTML, RDF in N3 or XML, or even SVG), then automated agents can traverse these applications almost as well as any human. There are plenty of examples in the linked data communities. More important to me is that the same design reflects good human-Web design, and thus we can design the protocols to support both machine and human-driven applications by following the same architectural style.

https://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypert...


RDF, JSON-LD and other "structured" formats try to allow clients to gain a certain understanding of what the data means... it's easy to understand that when you look at HTML's semantic tags [1], which let the browser and other clients know, e.g. what's the "main" section and what's just an "aside"... and structured data provided by RDF, Microdata [2] and JSON-LD (which were all the rage in the never-realized Semantic Web), which you can embed in the HTML itself in order to provide lots of metadata to the client about the data in your web page.

Today, these are actually used by, for example, search engines [3] to display data in a structured manner on searches!

Unfortunately, I don't know of many other "creative" usages of this metadata, but I imagine there could be many... HATEOAS is IMHO still ahead of its time... I do expect that, at some point, it will be really beneficial to have clients that have a good understanding of the information they show to users, and one day it will suddenly become the next big thing (again).

[1] https://developer.mozilla.org/en-US/docs/Glossary/Semantics#...

[2] https://html.spec.whatwg.org/multipage/#toc-microdata

[3] https://developers.google.com/search/docs/advanced/structure...


> In HATEOAS, the client doesn't care at all about whether the page contains a "status". It displays the hypermedia the way it is given without knowing the semantics at all - that's what makes it de-coupled from the server. It only knows the semantics of the hypermedia, in this case HTML.

This is not exactly what the REST thesis advocates for. You can absolutely create a true REST client for something much more specialized and narrow purpose than HTML. The main point of REST is that your client and server should both understand the media type, and then be developed independently to have all the semantics of that media type.

HTML is not a special media type in this sense. It just happens that it's an extremely broad format, broad enough to encompass almost the entirety of the web. But not all REST APIs need to be this broad: you can create specialized REST APIs for specialized media types.

In principle, the definition of a true REST API would be: can you create a fully conforming client for the REST API by just reading and implementing the definition of the media type + using the HTTP verbs? If yes, then the API is truly RESTful. If you need any other out-of-band information, then it's not. Note that knowing URLs is already out-of-band information.


Completely agree... it's just that it's very rare to see truly hypermedia-driven APIs out there, other than the Web itself.

I've seen only one, and that's one I helped develop.


> It displays the hypermedia the way it is given without knowing the semantics at all - that's what makes it de-coupled from the server. It only knows the semantics of the hypermedia, in this case HTML.

I don't think this is true in the general case. Yes, you can make a hypermedia browser/"viewer" that doesn't understand any semantics apart from links and depends on HTML+CSS for presentation. But you can also make a hypermedia browser that knows the semantics such as microformats or RDFA.

JSON-LD is in a practical sweet spot because it has links and also structured data. Obviously, you don't view JSON-LD with a web browser directly but with a browser that has another presentation layer in place of / in addition to HTML+CSS. The downside is you have to agree on and implement a presentation layer, the upside is it can be much simpler than a web browser plus you get machine-readable data with semantics.


> Have you looked into JSON-ld? Sure it is an extension of json but it has a formal specification which solves all the problems you have with your json examples, specifically the out-of-band issue you have with your json and how to link to other things.

Which problems does it solve, exactly?

When I read the JSON-ld examples I see fields containing @-prefixed links to other JSON documents, but why not just include that JSON in the original document, instead of linking to it?


> When I read the JSON-ld examples I see fields containing @-prefixed links to other JSON documents, but why not just include that JSON in the original document, instead of linking to it?

A few reasons:

- It avoids bloating the response, especially if the referenced data contains other references, and so on.

- It allows cycles, e.g. person A's "spouse" can reference person B, and person B's "spouse" can reference person A (a quick sketch follows this list).

- We can reference things which we don't know, or haven't bothered to calculate, or aren't permitted to access, or which don't even exist yet. All we need is their URL.

- References aren't simply locations (URLs, which we can GET), they're also identifiers (URIs). URIs let us reference a particular thing, whereas an embedded value might be ambiguous. For example, the value `{"firstName": "Tom", "surname": "Cruise"}` might be talking about the famous actor, or some other person called Tom Cruise. In contrast, a reference like "https://dbpedia.org/page/Tom_Cruise" only refers to the actor.

- URIs are quick and easy to compare, whereas JSON values are computationally harder to compare (especially if we're inlining a whole bunch of related values), and ambiguous; e.g. in the Tom Cruise example above, is that JSON object the same (semantically) as one which also contains `"birthDate": "1962-07-03"`?
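A quick sketch of the cycle point, with invented URIs, since inlined values could only express it by nesting forever:

    person_a = {
        "@id": "https://example.com/people/a",
        "name": "Alice",
        "spouse": {"@id": "https://example.com/people/b"},
    }
    person_b = {
        "@id": "https://example.com/people/b",
        "name": "Bob",
        "spouse": {"@id": "https://example.com/people/a"},
    }

    # Resolve references through an index (or by dereferencing the URL):
    index = {p["@id"]: p for p in (person_a, person_b)}
    print(index[person_a["spouse"]["@id"]]["name"])  # Bob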


> why not just include that JSON in the original document, instead of linking to it?

For me, the reason I'm linking and not including is bandwidth (and load of (de-)serialization). The age-old "pointer vs value" optimization.

When you go overboard, the "state" part of the payload becomes a large part of what's being sent. I've worked with APIs where over 80% of the data was this "cruft" and only 20% was the actual data. It happens especially when you compose your API of many small objects, which in itself has tradeoffs but is often good practice.


IMO, the benefits of HATEOAS are very much overblown - in many circumstances you have full control over the version of both client and server, and all it is doing is adding an extra layer of indirection that allows extra complexity and ambiguity to sneak in. Instead of the client depending on a specific known API endpoint, it depends both on the endpoint you give it at runtime and the process that serves that endpoint to the client. And as a bonus, the client now "helpfully" obfuscates which endpoints it needs to render which pages - what you could figure out with a straightforward grep for something like "api/v1/books" now needs a deeper understanding of the application.

Like, the way in which the client accesses the "deposits" functionality is either fixed or can dynamically change. If it's fixed, the indirection serves no purpose - just have the client do the thing in the one way that you ever have the client do the thing. If it can dynamically change, the state space involved can and often will grow out of control in a hurry, and I strongly suggest pushing back on designs that require structural dynamism like this.


> IMO, the benefits of HATEOAS are very much overblown

Much overblown in the context of JSON APIs.

In this article I am (in part) arguing that HATEOAS (and, therefore, REST) don't make much sense when working with JSON, but make a ton of sense when working with HTML, and, therefore, when building hypermedia-based systems.


You may not have encountered another use case. This is not only about design time versus run time, but in large part about where to put business logic.

Take a simple use case where you have web/Android/iOS clients, which show an invoice after the payment is processed (and not cancelled, pending, halted, flagged-for-terrorist-double-check, or whatever banks come up with tomorrow).

You either have one service determining "there is an invoice" or "there is no invoice", or you have all your web/Android/iOS clients implement this logic and have to keep it in sync. Not only is this flaky, it introduces very tight coupling to fields and their values. I've seen abominations such as `status == "pending" || status == "PENDING" || status == "PendingValidation" && validation == false`, repeated in every client (and obviously slightly different in each).

Moving this to a single point of truth comes with a lot of benefits.

But when you don't ever have such business-logic, by all means, skip all the indirection and just render whatever comes back, based on a swagger doc you read for version x.y.


You don't need to go full HATEOAS to consolidate most business logic in server-side API. Like, this is just an example of an awkward API that exposes `status` and `validation` and makes clients do the business logic, rather than one that exposes `hasInvoice` and/or `invoiceId`. The HATEOAS bit is sending across a URI instead of a resource ID, which only helps when there's business logic for which endpoint to use for a specific resource ID that needs deduplication. And even that is only necessary when the client must make the request directly and can't proxy through a "figure out which endpoint to use and delegate it" endpoint.

To be specific, HATEOAS would help when Visa and Mastercard need to use different endpoints for the invoice, and the payment networks require the client to directly query their endpoint rather than going through a server API you control.


I've worked with APIs that used IDs to communicate state.

While that works, IDs are a poor abstraction to communicate possibilities. IDs are good for structuring relations and composition (hasMany, belongsTo etc) whereas links indicate actions.

An ID says: "there is a thing somewhere". A link says "you can do something there". My example of "getting an invoice" is a bad one for this, because a GET is where IDs and links overlap. "There is an invoice somewhere" is almost the same as "there is an invoice there".

It becomes more apparent when creating things. No ID can communicate that a client may add a payment. Or can delete a tag. Or can move an item. Links can do that. So, I find that links are better abstractions because they cover all these cases whereas IDs cover only a subset.


I am 100% on board. I think this article still undersells how vital a shift it is, whether you are using a programming-oriented representation (JSON) or the web's de facto and de jure representation of information (HTML); un-discussed but key to me are RDFa and Microdata semantic markup. This article is so good, draws a very clear picture, & yet it's still nowhere near enough in delineating just how key it is, how vital it is, that the actual web medium, HTML, be able to & be used for expressing information. It at least makes a solid go at trying to show how JSON is arbitrary, whereas HTML is definitive, declarative, expressive.

Back in DHTML days we talked about "data islands", which could be either JSON or XML/XHTML/HTML, embedded in the page and powering the JS & webapp. Allowing the data & page realms to remain separate is a huge mistake. It overlooks the reverence & respect we ought/need to have for the core source of truth, for the real medium, HTML. Programmers need to take a knee & respect the vital, core, discoverable, honest truth of the web: HTML >>> JS. JS is supposed to just be a tool. As an agnostic/near-atheist, it still feels non-contradictory to me to say we have fallen away from god in our march towards JS-driven applicationization. We have served illegitimate interests.


I like this piece because I think it does a good job explaining the purpose of HATEOAS. The author is right that JSON can obscure its usefulness and HTML is a more familiar way to demonstrate it.

However, the author neglects to mention or is unaware of link relations, which can document and prescribe actions to take on a link just as "action" and "method" prescribe how a browser should submit a form.

Link relations can fix the <form> problem of JSON as a hypermedia format by prescribing the actions a client must take in order to follow a link. This is what the "action" and "method" of the form element do and these could be defined as target attributes of a "deposit" link represented with JSON.
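To illustrate with a hand-wavy sketch (not any particular spec): a "deposit" link whose relation and target attributes carry the same information a form's action and method would:

    account = {
        "data": {"type": "accounts", "id": "12345", "attributes": {"balance": 100.0}},
        "links": {
            "deposit": {
                "href": "https://example.com/accounts/12345/deposits",
                "rel": "https://example.com/rels/deposit",  # link relation documenting the action
                "meta": {"method": "POST", "params": ["amount"]},  # target attributes
            }
        },
    }

    link = account["links"]["deposit"]
    print(link["meta"]["method"], link["href"])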

The author is correct about pure JSON, but JSON-based media types like application/hal+json, JSON-ld, and application/vnd.api+json are very workable. Using these allowed my team to implement a decoupled React application with a JSON API back end. We used link relations similar to the banking example and our team was able to add and remove deposit-like links and add new buttons for previously unplanned operations without updating any JavaScript.

Full disclosure: I'm an editor of the JSON:API spec (i.e. application/vnd.api+json)


I'm a big believer that HATEOAS serves no actual purpose. Adding a layer of indirection is pointless - messages get bloated by URLs the client virtually never needs. For the vast majority of APIs, URL structures don't change that often and, when they do, it often indicates changes to content being transmitted, which would require business logic updates anyway.

For making APIs forward/backward compatible, both GraphQL and Protobufs are better options.


As with many other comments in this thread, I agree with you, but only in terms of JSON APIs.

The article is an attempt to explain HATEOAS in terms of HTML (or hypermedia) where it has a purpose and, in fact, is simply the natural state of affairs.


I’m disappointed that the initial vision of the web where the user-agent is an editor and not a browser was abandoned.

Playing around with Nyxt has given me a glimpse of how that might have looked.


The html on this page also contains URLs that you will never click.

The real problem with HATEOAS for JSON APIs is that we have never built the proper client. With html, you can rewrite your entire website and the browser will happily render the new pages. When people start writing such a client for JSON, the JSON starts to look very similar to html.


That's interesting. That makes me think of all the times a frontend engineer has tried to get me to put display logic in backend APIs. Their arguments were:

- We can update the site without deploying code changes

- These configurations can be actual configurations in our administration site, vs. code in the frontend

- Dramatically simplifies the frontend

My counterarguments were:

- it doesn't make any sense to `GET /songs?limit=50&offset=1000` and also get ordering information or what have you, which changes the mental model from data store to application state store

- we're not lowering complexity, we're just shifting presentation information to the backend where it certainly doesn't belong (imagine having to list songs in two different places that are presented differently--is that a different endpoint, a `scene` parameter, blah blah blah)

- this information almost never changes

But taken to its (I guess) logical conclusion here, where there's a sufficiently smart generic HATEOAS frontend, it's pretty cool?

But I also guess it's not really different. Something has to know what order the buttons are in. If we're looking at the chain of a web stack, we have:

    [DB] -> [API] -> [Frontend]
Doesn't it just seem like presentation stuff goes where we do the presenting? I think people's issue with HATEOAS is that we actually wanted an RPC backend where frontends control every aspect of the presentation, and HATEOAS wasn't just extra bloat, it put a straightjacket on design and asked backend engineers to start sending HTML again.


You could probably do it and maintain clean separation of logic vs presentation, but what's the benefit? I don't really think producers or consumers want a generic frontend - its main utility seems to be for automated crawling, but there are plenty of other ways to handle that.


> [...] the JSON starts to look very similar to HTML.

So that means, in practical terms, that every system that complies with the REST principles ends up virtually isomorphic to the web?

Which kind of means that the REST principles are not general enough?


That would essentially be Postman or some other API testing tool. The thing is, very few people have any interest in browsing pure data.

HATEOAS for JSON is especially useless given the same information can be derived from an API spec like Swagger, without the bloat of embedding reference URLs in every response.


Lots of excellent points in the article, but the problem I have with it (and HATEOAS in general) is that, like NakedObjects before it, it expects the model of the resources and what you do with them to mirror precisely the model the user understands and the actions they want to take. I.e., it takes no account of UX and design principles.

As a result, you end up with rather CRUDdy noun-oriented systems that don't really flow, with resources randomly thrown around by developers to try and make it make logical sense. A kind of accidental hypermedia complexity.

I love the abstract concept of HATEOAS, but it needs some overarching architectural abstractions to help designers and developers implement just the resources and links needed. A great research opportunity there IMHO.


Yes, I agree entirely. The crux of the issue is that HTML has stalled as a hypermedia for a few decades now, but I am trying to fix that with htmx:

https://htmx.org

htmx is the result of my research :)


Quite often, using nouns actually improves your architecture (and your Ubiquitous Language[0]). This design constraint is often one that pushes you in the right direction, but you have to allow it to push you.

E.g. the classic "I want to archive this ticket". Three common approaches:

POST/PUT /tickets/1337/archive

PATCH /tickets/1337 { archived: true }

POST/PUT /tickets/1337/archivals

The designer now gives up. The second option is often technically the easiest (especially with CRUD frameworks like Laravel or Rails) but design-wise by far the worst. The first is RPC-style and the last suffers from what the parent describes:

> As a result, you end up with rather CRUDdy noun-oriented systems that don't really flow

What we see, however, is that our UL is lacking. What is it that we really do? What do we end up with?[1] We might "copy it to the archives" or "relocate it to the archives" etc., depending on your business case and domain details. We request a "relocation", which, when resolved, results in an "archived ticket".

POST /archives/tickets/ { ...the entire ticket }

or

POST /tickets/1337/relocations

It now adheres to REST, better describes the domain language, is consistent and flows quite all right. EDIT: the latter, for example, communicates that this may take a while (async), and results in the ticket being moved elsewhere, with the original one probably redirecting to the new location. The former communicates clearly that we are making a copy or snapshot of the ticket in the archives.

But there will always be cases where the verb makes more sense or adheres much better to the UL and domain. In such cases, one can still be resource-full and REST-full without introducing awkward nouns. E.g. in a CQRS setup, we interface with Commands. We can then request commands: `POST /commands/archive_ticket_command/` for example.

---

[0] https://www.martinfowler.com/bliki/UbiquitousLanguage.html

[1] Sidenote: I try to avoid words that are both a noun and a verb, like "archive". "To archive" vs. "in the archive": this causes ambiguity.


As someone who has worked as part of a team on an Angular 9.x frontend with Spring Boot Java backend mediating between many other backend API based services... well, it's a huge mess, a resource hog in terms of disk, RAM and CPU, and far slower than it could be.

Development is slow and cumbersome and the browser uses 48MBytes+ on the page for what is ultimately a slightly complicated CRUD app.

The JSON APIs are secured by tokens and other mechanisms, so you can never query them directly without the Angular app.

We have 800MB or more in node_modules with Node taking 100s of MB and the Java backend uses over 1GB of RAM in dev mode just to start up.

It's an internal application available over the VPN. Even with high speed internet some API calls take 2 seconds to return and populate parts of the form in edit mode. A single unified backend server pumping out html would likely be both faster and easier to manage.


Not sure how this is related to HATEOAS, but having one API backend to front several private services is a pretty common pattern. But as a data point, I can tell you that with us using a React frontend talking to a Node.js GraphQL backend that also fronts private services, our Node.js GraphQL backend is usually below 300MB of RAM. At any rate, it sounds like your Spring Boot app combined with your network topology is the constraint; good luck chipping away at it.


Perhaps you posted this comment on the wrong thread, because it doesn't seem to pertain to the article at all.


We have state in two places and have to talk to 3 servers (frontend Angular/Node, java middleware and databases behind that) to serve a page, with JS running continually on the browser to monitor changes in variables etc..

Isn't the idea that you have a server that serves static or nearly static html to the browser? And maintains the state on the server itself.

We can't easily do what the LISP guys call memoization, which would be a huge win for performance by keeping properly cached values on the server, for example.


> Isn't the idea that you have a server that serves static or nearly static html to the browser? And maintains the state on the server itself.

Which idea? The idea of REST? Stateless servers are one of the core elements of REST https://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arc...:

> We next add a constraint to the client-server interaction: communication must be stateless in nature, as in the client-stateless-server (CSS) style of Section 3.4.3 (Figure 5-3), such that each request from client to server must contain all of the information necessary to understand the request, and cannot take advantage of any stored context on the server. Session state is therefore kept entirely on the client.

> This constraint induces the properties of visibility, reliability, and scalability. Visibility is improved because a monitoring system does not have to look beyond a single request datum in order to determine the full nature of the request. Reliability is improved because it eases the task of recovering from partial failures [133]. Scalability is improved because not having to store state between requests allows the server component to quickly free resources, and further simplifies implementation because the server doesn't have to manage resource usage across requests.

> Like most architectural choices, the stateless constraint reflects a design trade-off. The disadvantage is that it may decrease network performance by increasing the repetitive data (per-interaction overhead) sent in a series of requests, since that data cannot be left on the server in a shared context. In addition, placing the application state on the client-side reduces the server's control over consistent application behavior, since the application becomes dependent on the correct implementation of semantics across multiple client versions.


> A single unified backend server pumping out html would likely be both faster and easier to manage.

That's probably an oversimplification, because where would this unified backend server get its data from? How much code and services would need to be rewritten and re-engineered to make this work?

I mean, you know this tech better than I do, but don't make the mistake of thinking to just build something on top or alongside an existing problem, because it just paints over said problem instead of fixing it.

I mean, anecdote time. I once worked at a big company. Some clever engineers (including me? but I had no say on the architecture) decided to build a microservice architecture on top of an infrastructure-as-code setup on AWS with a Demand for 99.99999% uptime from the top brass.

However, the real processing in the end was done by some Java application built by people who weren't very good at software engineering (think: the answer to a stack overflow question "how to connect to a database in Java", copy / pasted five levels deep in an if / else tree).

And that was just an abstraction layer on top of fuck knows what, all I ever heard was that it's a mainframe, something cobbled together from decades of takeovers and mergers of other companies in the same industry.

It was layers on top of layers to try and make things manageable and fast, but in effect it made things less manageable and more complex and didn't solve the root cause, because nobody dared.


> The JSON APIs are secured by tokens and other mechanisms, so you can never query them directly without the Angular app.

What, you can't acquire auth tokens via API?


The last two sentences are:

> While attempts have been made to impose more elaborate hypermedia controls on JSON APIs, broadly the industry has rejected this approach in favor of simpler RPC-style APIs that forego HATEOAS and other elements of the REST-ful architecture.

> This fact is strong evidence for the assertion that a natural hypermedia such as HTML is a practical necessity for building RESTful systems.

This all sounds about right to me and makes me ask what the problem actually is? Maybe an RPC-style API is a better choice for most systems?


The success of the web argues that the hypermedia approach is a good choice for a large set of systems.

The problem today is that hypermedia development has been stuck in the late-90's/early-00's which, when contrasted with JSON/RPC-based development, is really showing its age, particularly in terms of UX.

https://htmx.org is an attempt to rectify that situation and get hypermedia-style development back in the game as a viable, modern network architecture for more than just content-oriented web sites.


I was looking at that site a bit earlier today and I guess I still don't understand the motivation.

On the "Htmx in a Nutshell" page you say:

> This keeps you firmly within the original web programming model, using Hypertext As The Engine Of Application State without even needing to really understand that concept.

Why is the original web programming model worth consideration? Why choose a RESTful API over a more RPC-like API?


Some reasons you might want to use hypermedia over an RPC-like API:

- The simplicity of the model (e.g. The "active search" pattern can be implemented with a few hypermedia attributes in htmx: https://htmx.org/examples/active-search/)

- The flexibility of the uniform interface[1] (e.g. versioning your API is much less of a concern)

- The ease of deep linking


I looked at HATEOAS a few years ago and thought it would make sense in a latency-free universe. The reality is that waiting for one page to load to know what's next, and then waiting again, is a recipe for frustratingly slow interactions. Shouldn't we download the whole application "graph" schema upfront? How big could that be?


The browser has a built in bit of software called "the rendering engine" which turns HTML strings into visual widgets surprisingly quickly. A javascript application will still need to render to HTML. The few additional attributes necessary for an href are round off errors in the overall scheme of things.

I would note that HATEOAS was simply a description of the world wide web. So, in as much as the web worked and continues to work, latency-free universe or not, it appears to be practical for building software systems.


>The browser has a built in bit of software called "the rendering engine"...

FYI if you want to advocate for something, it helps not to condescend to the people you're advocating to.


> I looked at HATEOAS a few years ago and thought it would make sense in a latency-free universe.

It makes perfect sense in a latency-laden universe; see, e.g., the Web.

> Shouldn't we download the whole application "graph" schema upfront? How big could that be?

Arbitrarily large. And it can become stale arbitrarily quickly. But, anyway, HATEOAS is consistent with that if you need it (consider the case where the application data is in a single document and individual resources are accessed by fragment identifiers for one simple example.)


HATEOAS does not constrain you there. You can easily return composite pages and have links within the local context. There is a reason why HTML has page-internal links.

In fact, it's designed exactly so that you can return one page that already has other stuff inside, and then the client can decide which of it to render. The server can even tell it not to render certain things.


I think this is a spectrum, currently with two approaches. The SPA route means every client-side interaction is handled directly by the app, but it still waits for the network; the client needs to download the whole application "graph", though. The HTML way means it doesn't "respond" during the latency period. But with modern tools (like htmx/Hotwire/Unpoly), devs can start with HTML and preload certain interactions / use minimal client-side JS, instead of delegating all of it to an SPA.

E.g.: Client side search/filter https://twitter.com/htmx_org/status/1459549924014669825


If you calculate the size difference between JSON and the corresponding well-structured HTML/XML (i.e. without unnecessary nested divs), you will find that the extra network overhead is negligible. Modern SPAs still require network calls for most meaningful interactions. We can use Ajax calls to fetch partial HTML and replace target elements (like htmx does).


HTTP APIs have been a huge hit, but HATEOAS hasn't. Nobody does it right, nobody "gets it," everybody must have just misunderstood Roy Fielding's dissertation. If only we had a clearer, better explanation, like this one!

Unfortunately, this article demonstrates what the industry has known/shown for years: that HATEOAS is unsuitable as a technique for HTTP APIs.

Imagine writing a programmatic client against the bank account "API" documented in this article, where the way you'd find out that you can't make a withdrawal is not by checking a "status" in JSON and finding it's overdrawn, but by parsing the HTML and noticing that there's no action link (no `<form>` element!!) that happens to make a withdrawal.

Instead of a JSON parser, you'd now need an HTML parser (one of the most notoriously quirky formats in modern use), but not just that, you'd also need your client to read and understand that HTML.

For example, where's the error message explaining why you can't withdraw? Why, it's written in the page as user-visible text. Which text is user-visible? Uh oh, now you need to parse and execute the CSS on the page to figure that out, too.

You'd need a full-blown browser to do that--a "user agent" as the HTML specification likes to call it--and now, instead of using an API, you're screen-scraping the web site by automating a browser with Selenium or Puppeteer or what have you. What will the client do when the server decides to restructure the HTML? Which one of those <a> links or <form> elements represents withdrawing cash??

It is possible to develop useful tools that automate full-blown websites, but it's much, much easier to develop useful clients when the responses are highly limited in the structure they're allowed to return; that is, when the client is coupled to the server by committing, in advance, to how the API will work.

That's been Fielding's mistake from the very beginning: to think that hypertext provides useful techniques for developing programmatic interfaces. Hypertext is only useful because of its users. Users can read text, understand it, and decide what links to click on or what forms to submit. Programmatic API clients can't afford that luxury.

As the article demonstrates, if the client was intended to just be a program, we would never use HTML for any of these exchanges, and especially not use <a> links or <form> elements to convey to that program what is or is not possible.

The industry has rejected HATEOAS not because we were too stupid to understand how awesome it is; we've rejected it because we do understand how awesome it isn't.


I agree entirely with you and I don't regard it as unfortunate at all that this article shows that HATEOAS is unsuitable for JSON HTTP APIs.

It is extremely suitable for, and in fact simply descriptive of, an HTML-based API, however.

A lot of my thinking around this comes from my work on htmx, a tool that turns HTML into a richer hypermedia:

https://htmx.org

And I am actively encouraging people to try to split their thinking about JSON Data APIs (RPC) and REST-ful HTML-based systems.


> I agree entirely with you

Do you, though? The GP is pointing out that HATEOAS (terrible name btw -- why not just HAS, Hypertext Application State?) requires a full, stateful user agent for a client to consume the "API", whether human or otherwise. This is what we call "screen-scraping", and it's a pretty tough road to consume an API like that. In particular the GP points out that an interaction requires, in general, a (stateful) crawl of the remote. Note that screen-scraper issues boil down to state (which the GP covered) or "entropy" -- HTML is higher entropy than JSON, and programmers prefer to use the lowest-entropy thing, because it's much easier to work with, in exactly the same way text is easier to work with than a jpg of text.

What JSON driven SPAs are doing is reifying hypermedia using a static function that maps from JSON to HTML -- and that function IS coupled to the domain. And I would argue you can't get away from that as long as you want to decouple presentation from state - and it seems like HAS wants to explicitly couple them, which is bad news for programmatic consumption because it means changing the page layout will change your API, and that's called "brittle" in engineering circles.

Interestingly, you could rescue a low-entropy representation in HAS by specifying a (hypermedia-specific) function that does the reverse operation of the JSON SPA function, and extracts a low-entropy representation of the page. But this would impose constraints on page authoring, so it's not exactly without cost.


A HATEOAS API is one where the client is completely uncoupled from the server and from any particular domain... if you want a front-end that consumes an API, that's the opposite of that.

You and many others in this thread seem to think that an API must be separate from the front-end, which is understandable as that's the current standard way to do things, but HATEOAS is the polar opposite of that. The web itself is HATEOAS, so obviously it's useful... the fact that you can build a front-end app that consumes a JSON HTTP API on top of that shows just how incredibly powerful the concept is... but yeah, it requires a smart client, like a browser, to work.


It's not that I don't get it, it's that I call HAS screen scraping and I think screen scraping is a bad API design. It's also that I think HAS explicitly couples API and presentation, which is undesirable.

We can argue all day, but the proof is in the pudding: build something real with this approach, and show me it's better. I have doubts, but even one counter example would change my mind.


> This is what we call "screen-scraping", and it's a pretty tough road to consume an API like that.

That's hyperbole. If you're willing to accept clients which are hard-coded to some specific JSON structure, then we can accept clients which are hard-coded to some specific HTML structure.

The point is that a HATEOAS API will, by default and with no extra work from anyone, be browsable via any Web browser (including those with assistive tech like screen readers, etc.). In contrast, the best we have for JSON APIs is to pretty-print the response text, perhaps with syntax highlighting; embedded URLs might be made clickable, but we're probably restricted to only performing GETs.

If we use HATEOAS this might actually be enough for our purposes; e.g. if we just want to expose some basic functionality, like an admin UI. If not, we can write a client for that API; again, this doesn't require screen-scraping, or solving ambiguous parse issues, etc. since the client is not a general Web browser.

If we use JSON, we are pretty much forced to build a bespoke client for the API; due to how little we get 'by default'.


> [Comparing HAS consumption to screen-scraping] That's hyperbole.

How does it differ, then? Do you have any code examples?

Based on your next statement:

> If you're willing to accept clients which are hard-coded to some specific JSON structure, then we can accept clients which are hard-coded to some specific HTML structure

it does not differ, but you instead say that both approaches require hard-coded clients to consume programmatically, so they are equivalent.

If that were so, then another approach, like embedding results in a JPEG, would be equally valid, since you just have to hard-code your clients to extract information from the JPEG. After all, a human can parse this, so we don't need a bespoke client. From the perspective of theoretical movement of data from a to b, yes, of course this is true. But from the perspective of an engineer consuming it, no, they are very different. And the difference is that entropy is proportional to developer pain.

I hope you try to be more rational and less ideological about your ideas. I'll leave it at that. I really like thinking about alternative architectures, but this one's appeal seems to rely on ignoring some very real trade-offs.


> Do you have any code examples?

For extracting known elements from HTML that's under our control? That's a pretty standard task; here's a shell command to extract the 'deposits' URL from the article's example code:

    xidel -q -e '//a/@href' - | grep '/deposits'
How on Earth is that equivalent to OCRing a JPEG?

> I hope you try to be more rational and less ideological about your ideas.

What's "ideological" about claiming that it's easy to parse known elements from (X)HTML that we control?


> everybody must have just misunderstood Roy Fielding's dissertation. If only we had a clearer, better explanation, like this one!

> Unfortunately, this article demonstrates what the industry has known/shown for years: that HATEOAS is unsuitable as a technique for HTTP APIs.

Yes - that's all in Fielding's thesis - the REST architectural style is derived explicitly for hypertext applications, and contrasted with, among others, RPC and "mobile code" (i.e., the server sends JS to the client to execute).

So yeah, most people seem to have misunderstood Fielding's thesis.

Now, if one wants to argue against the benefits of multi-layer caching and so on, one might hold up the complexity and relative failure of WebDAV. I'm not convinced that's a failure of architecture as much as a failure of the design process surrounding WebDAV, however.


I have to disagree with you. In my opinion it really doesn't matter what the format is. There are tons of formats that could be used.

If you want to return some information that says why something is not available, you could do that in multiple different ways.

You could either simply return a piece of information that says 'Account below 0', or you could return the link and suggest the client do a HEAD against the endpoint.

It seems you have taken the example given here too literally.

Sometimes it is totally correct not to return something the client can't do. The client doesn't always need an explanation of why something is not available. Many clients shouldn't even know that something could be available.

If you always return the reason for everything, you are also leaking information.

> The industry has rejected HATEOAS not because we were too stupid to understand how awesome it is; we've rejected it because we do understand how awesome it isn't.

The reason, in my opinion, is that the extra effort required is often not worth it because most companies control both the client and the server and simply evolve them together. HATEOAS is most effective when you have multiple clients that all evolve separately.


I think you are overstating the difficulty/problem of parsing HTML. The hard part of HTML is actually displaying it.


> HTML parser (one of the most notoriously quirky formats in modern use)

Agree with shaunxcode; while HTML indeed has quirks (such as wrt inline script+CSS parsing and URLs), plus overly permissive error recovery, it's just straight SGML with formal tag inference and markup minimization techniques known since at least 1986 when SGML was published.

The problem IME with HATEOAS is that even highly capable devs see it as an optional, nice-to-have feature, when loose coupling and late discovery of links in hypertext is the entire point of Fielding's REST concept. It baffles me to no end that even gifted developers turn to exegetic and "best practice" approaches for justifying their JSON-over-HTTP backends and squeezing invocation parameters into URLs when all they really need is a JSON response that can be parsed by browsers, and oftentimes pass JSON in backend-to-backend responses in non-JavaScript environments that don't even have JSON as a built-in serialization format.


> just straight SGML with formal tag inference

What's more: the argument against HTML can be made about the soup that is used on the web-for-browsers, where webdevs put divs in spans in <a>s in buttons to work around some quirk in CSS or such.

This does not go for the HTML that you design especially for an API. Such HTML is simple, clean, lean and flat, while being semantic. Parsing such HTML is easy: every language has a DOM parser for such XML.
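For example, a clean fragment like the following (invented markup) parses with nothing more than the standard library:

    import xml.etree.ElementTree as ET

    fragment = """
    <div class="account">
      <span class="balance">100.00 USD</span>
      <a href="/accounts/12345/deposits" rel="deposit">deposit</a>
    </div>
    """

    root = ET.fromstring(fragment)
    balance = root.find(".//span[@class='balance']").text
    deposit_href = root.find(".//a[@rel='deposit']").get("href")
    print(balance, deposit_href)  # 100.00 USD /accounts/12345/deposits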


I think those are valid points for the current state of things.

But from my perspective, where things started breaking apart was with the advent of html5, when sloppy html really became the accepted norm.

Parsing XHTML wouldn't have been a problem, and I'm sure we'd eventually have found a better "skinning" language than XSLT to use over the server's response.

I have still yet to come across a real HATEOAS implementation in real life, and the chances are probably getting smaller by the day. It has this nice theoretical purity but I don't know yet if it's worth the complexity.


> But from my perspective, where things started breaking apart was with the advent of html5, when sloppy html really became the accepted norm.

Sloppy html was the accepted norm long before html5.


HTML5 standardized what you get from "sloppy" HTML, so it's more an improvement over the previous state of affairs, where each parser did its own thing with imperfect HTML.


> It has this nice theoretical purity but I don't know yet if it's worth the complexity.

It is not, which is why no one does it; solution in search of a problem.


This article seems to just be describing plain-old HTML. You make a request to a URL, get back HTML, render the HTML. The HTML has forms and links that you click and get back new HTML.

I don't see how this is an API at all. It's just (server-side rendered) HTML.


REST, including the principle of HATEOAS, is an architectural style derived as a generalization/rationalization of the early web design and explicitly underlying HTTP 1.1, so an explanation of HATEOAS looking like a normal use of HTML on the web is not at all surprising.

The Web itself is a distributed hypermedia API.


It is an API that you program against when you want to build a client for that API. That client is a browser, the most famous of which are Chrome, Firefox, Safari, Edge, Opera, etc.

The HATEOAS constraint is what allows the actual form of (for example) a webshop's checkout to start asking for additional details from one day to the next without me needing to update the client. The Hypermedia is indeed HTML, images, movies, javascript, style files, etc. HTML contains the embedded links to navigate around a webshop, the form controls tell the browser what it needs to POST to a url (and that it needs to POST or PATCH).

To do something like that with a data API, the form of the data must be comprehensible a priori and the client must be able to discover the form of the data dynamically. This means either a registry of data types (large, or small like the web's) that you are limited to, or some sort of link + schema that the client can understand. If you build such an API, and I build a client, then my client will also naturally work on anyone else's web application that follows the same constraint. That is, my client will be universal. It turns out that, outside of web browsers, nobody really wants or needs a universal client.

If you want to see an example of what that looks like in practice, check out JSON-LD at https://json-ld.org/spec/latest/json-ld/ or other specs like it.


A server returning HTML is an API. It is a hypermedia API, which is what the terms HATEOAS and REST originally came out of. Which is why those concepts have had such a bumpy ride when ported over to JSON APIs.


> I don't see how this is an API at all.

That's absolutely an API... perhaps what you think of as an API is too narrow in scope?


Shouldn't HTML in the examples be compared to something on the same abstraction level, like HAL?

HAL is to JSON what HTML is to SGML.

Nobody would argue that SGML isn't suited for the web because it is too general/low-level a language.


The second example in the article is HAL-like.

Leaving the relative poverty of HAL as a hypermedia aside, the crux of the issue in my opinion is that JSON is consumed by code, rather than humans, and thus the flexibility and discoverability of HATEOAS is largely wasted:

https://intercoolerjs.org/2016/05/08/hatoeas-is-for-humans.h...


Yes, that might be true, I can't imagine how it should work without a human making decisions on the fly, but that's not a problem of the format.


I would say it is not a problem of the format in theory, but it turns out to be one in practice. And that using HTML rather than JSON when explaining REST and HATEOAS would be a much more natural approach.


The second article you linked here makes a much stronger case than the first one.

If HATEOAS doesn't work without a human, it's essentially useless for API design.


I agree, the second article is deeper. The first is an attempt to explain HATEOAS in terms of HTML.


Two things (not exhaustive!) that a JSON (or any data interchange format) API can do to improve discoverability:

1. Provide OPTIONS responses for all resources, clearly and consistently documenting usage of that resource (see the sketch after this list).

2. Provide relevant OpenAPI documentation for those OPTIONS requests, and an interactive UI for GET requests accepting text/html (obvious caveats aside).
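A rough sketch of what point 1 could look like, framework-free and with invented field names:

    import json

    def options_for_account(account_id):
        """Return (status, headers, body) for an OPTIONS request on an account resource."""
        headers = {"Allow": "GET, POST, OPTIONS", "Content-Type": "application/json"}
        body = json.dumps({
            "resource": f"/accounts/{account_id}",
            "methods": {
                "GET": {"description": "Fetch the account representation"},
                "POST": {
                    "description": "Make a deposit",
                    "parameters": {"amount": {"type": "number"}},
                },
            },
            "documentation": f"/accounts/{account_id}/openapi.json",  # ties in with point 2
        })
        return 200, headers, body

    status, headers, body = options_for_account("12345")
    print(status, headers["Allow"])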


That's fine and well, but JSON isn't a natural hypermedia and it has struggled to achieve much by adopting hypermedia-like features.

I think this is because the power and flexibility of the hypermedia model requires that the consuming entity have agency, and humans consuming HTML (unlike the code consuming JSON) provide that agency:

https://intercoolerjs.org/2016/05/08/hatoeas-is-for-humans.h...

Providing more documentation for JSON is an admirable goal, but that isn't the uniform interface of REST, which requires no documentation, just an entry point from which all possible actions are surfaced via hypermedia.

Rather it is better documentation for what is a fundamentally RPC model (not bad! just different)


> JSON isn't a natural hypermedia

No format is a hypermedia, because hypermedia involves heterogenous formats bound with links.

But any format in which links can be conveniently embedded can participate well in hypermedia, and JSON is definitely a format in which links can be embedded.

> humans consuming HTML

Humans aren't any more the usual consumers of HTML than they are of JSON. In both cases, software consumes and may (or may not) interact with humans. But immediate agency in the consumer isn't necessary for hypermedia, in fact, the hypermedia model is very much intended for robust mechanical consumption by systems not designed with advance knowledge of the structure, because hypermedia is self-describing.

That's why web mashups, search engines, archives, and specialized crawlers are all able to work.


> JSON is definitely a format in which links can be embedded.

Not really? HTML has a way to include links (a/href); JSON just has strings.

One can of course add on top of json (like one can build on utf8 text to build json). But json is a pretty terrible format - quite simple, but quite weak.

You can embed links in json as much as you can embed a json document as a base64 encoded string inside another json document...


> with advance knowledge of the structure

Wouldn't advance knowledge of the structure do away with the benefit of self-describing messages?

I assert that, without strong AI or simply passing it along to humans (as with HTML) REST is largely wasted on code.


“bot designed with advance knowledge...” was an unfortunate typo for “not designed with advance knowledge...”

Yes, advance knowledge makes self-description superfluous.


Bots can crawl a link graph, but that seems to me to be a relatively primitive use of a uniform interface when compared with a human interacting with a hypermedia system.

Does that strike you as well?


> natural hypermedia

Like, observed in the wild?



REST APIs are designed to think in a platform-centric way. HATEOAS is designed to extend the same philosophy.

I see the HTML example in the article isn't quite that. I mean the last example, which renders content in HTML, assumes the client wants it for a POST call. What if I, as a client, want to use it for a GET call? Having said that, you as a developer are building a HATEOAS form of content in the user interface. Am I reading this right?


You as a client have no say in the matter, beyond asking for particular representations which might be presented to you.

A REST-ful (in the true sense of the word) system exposes its API to you through hypermedia: links, forms and so forth. You, the human, may interact with it. But there is nothing outside the API provided by hypermedia, no matter how poorly implemented that hypermedia API might be.


> But there is nothing outside the API provided by hypermedia, no matter how poorly implemented that hypermedia API might be.

I've heard this before about HATEOAS and true-REST but I'm always left wondering about odds and ends. For instance, in the deposit example the client is told that the form accepts "amount" which is of type "number". Does this imply that the unit of amount has to be negotiated between server implementation and client implementation out-of-band? Or, to stick to the same example, how should the client behave if the server responds with a 504? Presumably the client and server would need to agree on the protocol for dealing with failures out-of-band, whether retries are allowed, how and if commits work in the system?


Ideally the server simply responds to the erroneous state with hypermedia, using hypermedia as the engine of application state. ;)

Response codes have always been a bit of a mine field. In htmx, we allow users to hook in and define their own semantics for response codes:

https://htmx.org/docs/#modifying_swapping_behavior_with_even...


> Ideally the server simply responds to the erroneous state with hypermedia, using hypermedia as the engine of application state. ;)

I'm really not sure what this means. In the units case how does the server distinguish that the "2" a client sends it is not in millidollars or what not and respond back, without prior and ongoing coordination? Or if the server expects to do a two-phase commit for some POST how would that be communicated to the client without out-of-band communication?

Maybe I'm just misunderstanding what "nothing outside the API provided by hypermedia" is supposed to actually encompass.


In the case of HTML pages, there is ongoing coordination. The server expects the client to have accessed something before that said "please input amount in dollars" or maybe "please input amount in cents".

But if there is some new hypermedia format that supports currencies, it can be used instead of a string. What makes it proper hypermedia is that it isn't tightly-coupled with a specific API, it is as "universal" as HTML.

"Nothing outside the API provided by hypermedia" means there is no tightly-coupling between server and client other than the protocol and hypermedia format. No schema, no out-of-band-information, no iPhone app... It's just like the browser works: your computer has no special anything for accessing Hacker News, only a fully decoupled client that understands HTTP(S)/HTML (the browser).


> But there is nothing outside the API provided by hypermedia

Hypermedia APIs rely on out-of-band definitions of the semantics of media types and the semantics of operations on media types.



