Best practices for REST API design (2020) (stackoverflow.blog)
296 points by r_singh 14 days ago | 267 comments



This article doesn't mention linking at all, which is at the heart of REST. Without links, it isn't REST, and most of the suggestions don't have anything to do with REST per se. Most of it is just standard JSON-over-HTTP stuff that's implemented in a variety of frameworks and libraries.

EDIT: To be clear, I don't particularly care if an API is "RESTful" or not, as long as it's well-designed and documented, but I think there are interesting/useful ideas in REST that are lost when it's conflated with JSON-over-HTTP.


> This article doesn't mention linking at all, which is at the heart of REST.

Please, just don't do it. Yes, I read the dissertation. It's just not a good idea, it has never given me any practical use whatsoever and has always made dealing with the API more annoying.

> Most of it is just standard JSON-over-HTTP stuff that's implemented in a variety of frameworks and libraries.

Yes, and we call that REST or RESTful anyway. We'll keep doing it that way. You're going to have to accept it.


Interesting. I usually find it easier to generate URLs on the server and more convenient to consume APIs that include links to related resources.

The other option is munging URLs on the client side, which can be tedious.

I'm genuinely curious as to how including links in the response could make an API harder or more annoying to use.
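For context, what I have in mind is something like this (a rough TypeScript sketch; the field names are made up, not from any particular spec):

  // The server returns related-resource URLs so the client never builds them.
  interface Order {
    id: string;
    total: number;
    links: {
      self: string;      // e.g. "/orders/42"
      customer: string;  // e.g. "/customers/7"
      items: string;     // e.g. "/orders/42/items"
    };
  }

  // The client follows links instead of munging URL strings:
  async function getCustomerForOrder(order: Order) {
    const res = await fetch(order.links.customer);
    return res.json();
  }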


You have to know how to parse the response. This usually requires knowing what you will receive, which somewhat limits the usefulness of generic link responses.


Totally agree. HATEOAS is a PITA.

Never overcomplicate things. Never add metadata if you don't have to. Just K.I.S.S


The main problem with REST is that it never developed tools even on the level with what we have for GraphQL.

Somehow everyone rushed to implement GraphQL schemas, and no one rushed to implement JSON Schemas for example. And JSON Schemas with the right tools are a much more powerful instrument.

Same goes for linking, and the rest of HATEOAS.

At a company I worked at, we had an internal tool built for most of this stuff, which took a lot of the problems with HATEOAS away. Unfortunately, we never open-sourced it :(


> The main problem with REST is that it never developed tools even on the level with what we have for GraphQL.

REST isn't really designed for the problem of implementing a single, centrally controlled API, and is basically overkill for that. It's made for a decentralized, heterogeneous network of resources controlled by different parties, which can have separately evolving relationships. Like, say, the WWW.

If you have sufficient tooling around it to consume that kind of heterogeneous resource network, it's trivial to apply it to a centralized service, but we don't really have that kind of client tooling widely available yet for applications other than UIs, and everyone wanting walled silos instead of free remixing means that REST is probably an antifeature for lots of API providers even if the tooling were 100% there.


Very eloquently put! Thank you!

> Please, just don't do it. Yes, I read the dissertation. It's just not a good idea, it has never given me any practical use whatsoever and has always made dealing with the API more annoying.

If you believe REST is not a good idea then you've never had to deal with versioning, or struggled to keep clients and servers you don't own playing nice. Asserting that something like REST has no practical purpose is asserting that your experience in this domain is slim to none, and that you are not mindful of the most basic challenge of getting servers and clients that evolve independently to continue to interoperate with minimal development effort. REST is content discovery and allowing clients to transparently adapt to breaking changes in the server. How does anyone with any relevant experience miss the point of that?

Furthermore, your assertion makes no sense at all. The main property, and the whole point, of REST is HATEOAS. It makes absolutely no sense to claim an API is REST if it misses the single most important design element that's behind REST. This is not pedantry or nit-picking: it's the whole difference between plain old RPC and REST. If you want to design an API that provides fixed endpoints to be called, and isn't discoverable or navigable, then just call the spade a spade: RPC over HTTP. Otherwise, why do you feel the need to claim you designed an API around a design principle you don't use and even criticize?


>If you believe REST is not a good idea then you never had to deal with versioning and struggling to keep clients and servers you don't own to play nice.

Sincere question: how does HATEOAS help with this?

The only thing that HATEOAS helps with is when the URL pointing to an entity changes, which is the simplest change someone can make to an API.

But if an API ever changes the relationships between entities (new entities, 1-1 relationship changed to a 1-m relationship, ...) then HATEOAS won't help at all. You'll still need to change the logic of your client to take into account the modifications.

I've always felt like nobody bothered supporting HATEOAS because it's practically useless. The only use-case where it can be used is if you want your API to be crawled by a search engine. Then the search engine could crawl between entities thanks to the links you provide.


> Sincere question: how does HATEOAS helps with this?

HATEOAS basically involves two main things:

1. Resource representation specifications are communicated in the communication channel (in HTTP, this can be a combination of the MIME type in headers and a more specific specification in the document itself for responses, and the Accept header in requests.)

2. Related resources are identified by URL.

> The only thing that HATEOAS helps with, is when the URL pointing to an entity changes

False, though #2 helps with that.

> But if an API ever changes the relationships between entities (new entities, 1-1 relationship changed to a 1-m relationship, ...) then HATEOAS won't help at all.

No, that's what #1 helps with, as it provides the means where you can:

(1) identify the resource representations available,

(2) identify if you have a resource-consumption/creation-client available that satisfies the endpoint requirements, and

(3) select the correct client implementation.

> You'll still need to change the logic of your client to take into account the modifications.

Yes, someone needs to implement a new resource client if a new resource structure, or representation of the same structure, is developed.

And if you are doing a single, centralized API, where no one else will be hosting something using the same resources, REST doesn't provide much. Where REST shines (which is why it is abundantly used on the web) is where large numbers of different parties will be hosting services using a potentially shared set of resource representations, where each host may evolve the particular set in use on its own schedule, perhaps even using different network protocols. There you don't want bespoke clients for each service, but instead protocol clients for each network protocol and resource clients for each resource representation, plus in-band signalling not only of where to get each resource once you reach an entry point, but also of whether you have the network protocol client and the resource client you need to successfully carry out each transaction.
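A rough sketch of what I mean by #1, in TypeScript (the media type names are invented for illustration; this is not any particular library's API):

  function renderOrderV1(body: unknown) { console.log("v1", body); }
  function renderOrderV2(body: unknown) { console.log("v2", body); }

  // Dispatch on the media type advertised in-band, instead of hard-coding
  // which representation a given URL returns.
  const clients: Record<string, (body: unknown) => void> = {
    "application/vnd.example.order.v1+json": renderOrderV1,
    "application/vnd.example.order.v2+json": renderOrderV2,
  };

  async function consume(url: string) {
    const res = await fetch(url, {
      headers: { Accept: Object.keys(clients).join(", ") },
    });
    const mediaType =
      (res.headers.get("Content-Type") ?? "").split(";")[0].trim();
    const client = clients[mediaType];
    if (!client) throw new Error(`no client for ${mediaType}`);
    client(await res.json());
  }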


> Sincere question: how does HATEOAS helps with this?

The whole point of HATEOAS is to address this. This is pretty much why it was developed and presented. Why on earth is anyone discussing REST not picking up on its reason to exist?

> The only thing that HATEOAS helps with, is when the URL pointing to an entity changes, which is the most simplest change someone can make to its API.

It really isn't. At all. You're somehow missing the whole point of a resource-based architecture, let alone REST.

The whole point is that there are no fixed paths. At all. All there is is resources, whose representation might vary in shape and form, and which are made discoverable by providing semantic descriptions of said resources. With REST you provide resources and metadata that points to representations of those resources. With REST your clients do not care about endpoints other than a root one; they care about resources. The rest of the process consists of tracking the resource representations that you want, and you do not care where they are. At all.

This sort of profound misconception of what REST is supposed to be is the reason why somehow some people believe that it's reasonable to slap the REST label on APIs just because they churn out JSON. You can't. REST is based on design constraints that address a problem which these APIs fail to address at a fundamental level.


I'm totally fine with calling it RPC-over-HTTP, or RESTless, or unREST. I'm also fine with just calling it REST. I don't care.

I'm not fine with making an API use HATEOAS, especially not for the sake of being pedantically accurate about the word REST.

As for your defense of HATEOAS, you're not doing a good job. If it's such a good idea, why is pretty much nobody doing it as intended? How exactly does it help with versioning? Can you describe a real use case?

The way I see it, if you want to break your API, slap /v2 in front of it. Otherwise, I strongly suggest that you do not break your API. Do not move stuff around pointlessly. I guarantee you that even if you followed HATEOAS best practices to the letter, your clients are still going to break. You don't want that.


> I'm not fine with making an API use HATEOAS

And that's perfectly fine, but don't call what you're doing REST because it really isn't.


This is a semantic dispute. Our disagreement is on whether this is a dispute worth arguing over.

There are many other industry buzzwords like "object oriented" that don't match up with what the people who coined the term had in mind. It's a nuisance, but nothing to get worked up over.


We could use a name for "standard JSON-over-HTTP stuff that's implemented in a variety of frameworks and libraries".

So what shall we call REST without HATEOAS[1]? "RESTless"?

[1] https://en.wikipedia.org/wiki/HATEOAS


No, a truly RESTless API must do away with all the other pointless HTTPisms in REST. If you're using any other method than GET or POST, or any other codes than 400 and 200, you're not truly RESTless.


Most of the time I end up with endpoints with verbs, because that's often how businesses work. Objects and processes. Behind a REST resource there is a database table in most of the cases. But what if I want to send an email? POST /emails? That's RESTless as well!


Not very useful, and it adds a bunch of boilerplate + more bandwidth for each request.

With only links, you don't know the semantics and inputs. The benefits are next to zero, and the effects on maintenance are non-trivial.


I recently wrote about why linking is useful: https://www.daniellittle.dev/practical-hypermedia-controls. I've been using hypermedia in my APIs for a few years now and it's been so useful for making great APIs. One of the things I really love about it is that you can test/use the API directly much more easily, because the knowledge about how to use it is in the API itself, instead of half hardcoded into the client.

Most of these best practices forget the hard parts of JSON/REST APIs:

- How to handle date and time, incl. time zones (JSON has no data type for that)

- Handling of numbers (JSON only has double, which does not fit most cases)

- Defined and parseable error responses (RFC 7807 plus extra fields for details)

- Localization: do you send translated texts or just error codes?

- How to handle updates? Overwrite every field? How to update only some fields of a resource? Optimistic locking?

- How to handle 1+n problems for reading?

- ...


Send ISO date/time in UTC.

Send numbers as strings. You should be treating them as hostile in your backend anyway, and checking them.

Send both an error code and a string in simple English, or whatever your most common developer language is.

If you care about only updating certain fields, track changes and only send those fields to the backend.

These aren't really that hard.
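Roughly, something like this (a sketch; the field names are made up):

  const payload = {
    occurredAt: new Date().toISOString(), // "2021-02-22T20:34:53.686Z" (UTC)
    amountCents: "1999",                  // number-as-string
    error: { code: "USER_EXISTS", message: "User already exists" },
  };

  // Backend: treat the string as hostile and check it.
  function parseAmount(raw: string): bigint {
    if (!/^[0-9]+$/.test(raw)) throw new Error("invalid amount");
    return BigInt(raw);
  }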


Should we put the error code in the HTTP status code and status line, or in the response body? We're having a great debate at work, and it's been a blast mapping business logic errors to their closest HTTP status code spirit animal. The Dean of the REST Engineering Directorship has decreed a bit of both, but most of The Unwashed (front end developers) are revolting on this matter, insisting it all maps to a red-tinted "Sorry, reboot and try again" toast.

In other news, we suspect transactions are on the horizon. Half the backend team is in denial, the other half is busy trying to figure out the most pleasant place to install decidedly "unRESTy" URLs for them. Somebody already set themselves on fire over PUT/POST for that.


>Send ISO date/time in utc.

With an offset, OK. UTC alone doesn't work if you need to know the local time when something occurred, like medication administration, especially when things may occur across time changes like DST.


Do not just track the offset. If you have to deal with date/time locally, you must have the time zone. Treat this just like bytes/Unicode: accept local time at the edges of the app, convert immediately to UTC, store/use UTC internally, and convert back to local at the edges at display time. Yes, it can get more complicated than this when doing calendar math, but I'm not aware of anything gained by just tracking the offset.


That does not work in the op’s use case because the op wants to know the local time of the medication administration.

The endpoint of the viewer may not be in the same local time.

If you go this route you may not care about location beyond this one data point, but have to store another location field each time.


> Send numbers as string [...]

Why? Why not send a number as number (double) and treat it as hostile in the backend? I dislike sending numbers as string because I think the different data types exist for a reason.


One reason could be that it's easy to think that a "number" is sanely typed, while you almost never want a double - not for quantity, not for price, etc. So you're going to cast to some (big) integer anyway...


In languages with reasonable number types (i.e. not JS), the parsers are typically able to produce integers, or even BigDecimal, out of a JSON number, without going through double.


I guess I feel returning an "invalid integer" error for a valid JSON double is a bit weird - but I guess it's not technically any worse than "could not parse string as integer". It's more that strings are expected to hold "other" data, like timestamps, while one might expect a double to hold only (but also any) doubles.

Many of the numbers we use day to day are more like strings than actual integers, though. What sense does it make to divide by a zip code, or to add two zip codes together, for instance?


This. Just because something contains numbers doesn't mean that it's numeric. Times aren't numbers because 6pm * 7am doesn't yield a meaningful result.

Same thing with serial numbers, VIN numbers, building floor numbers, phone numbers, etc. All of those should always be strings, because performing "math" on them wouldn't yield meaningful results.


While I basically agree, there's a difference between "always send numbers as strings" and "send something that looks like a number but is actually a string as a string".

Why should I send a dimension as a string? How am I going to calculate the volume from 3 strings?


> Why should i send a dimension as a string?

You shouldn't because a dimension is a number. Something 4 feet long is twice as long as something 2 feet long.


I work in ecommerce and deal with this issue a TON. I see it in things like GTIN/UPC or tracking numbers.

Many != All

So prices are numbers (if supplied in cents, the smallest unit). There is length/depth/height/weight/... A lot of things are actual numbers. Why supply them as string?

I partially agree with you, but I think ZIP codes are a very bad example, because a zip itself may also contain letters.


A bit off-topic but I received an email from a government agency today that pretty printed my ZIP code as "12,345", I'm assuming because they store ZIP as a number. Made me wonder what would happen if I used a full 9 digit ZIP code with a minus sign.


Reminds me of a broadly used industry-specific database that packs multi-line addresses into a delimited string with an initial integer which indicates the number of lines. Some addresses were written to the database without the initial integer, and they were addresses in a city with East/West streets with number names. Addresses like 1425 East 34th St, commonly written as 1425 E34, were cast as exponents, so an address with 1.425E37 lines, which was a problem for the ETL-downstream database that attempted to make that many rows for address lines. It caused much confusion and delay for the downstream DBAs.


I don't care if you like it or not, if your numbers are actually that large, serialize it to a string and back out on your backend. You shouldn't be blindly dumping json to a structure anyways.

Either that or use a different serialization method to communicate.

No one else seems to be complaining about this issue. If JSON doesn't fit your needs, use something else.


> Send ISO date/time in utc

what's an ISO date/time?



I assume GP means an ISO 8601 date-time: https://en.wikipedia.org/wiki/ISO_8601. It's the default behavior of JavaScript Date JSON serialization:

  > JSON.stringify(new Date())
  '"2021-02-22T20:34:53.686Z"'


Then why add the "in UTC" part? ISO 8601 specifies how to designate the time zone.


ISO 8601 does not carry time zone information, only an offset. Time zone info needs to be specified separately, likely as IANA names. This is a common misconception about time zones, though, and in a way it demonstrates why it's best practice to always use UTC. When sending/receiving a datetime, no time zone info is available; if the time zone is needed, send it separately (and keep the time value in UTC) to save misunderstandings for everyone.
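For example (a sketch; the field names are invented):

  const administration = {
    administeredAt: "2021-03-14T06:30:00Z", // the instant, always UTC
    administeredAtZone: "America/Chicago",  // IANA zone, for local wall time
  };

  // Recover the local wall-clock time for display:
  const local = new Date(administration.administeredAt).toLocaleString(
    "en-US",
    { timeZone: administration.administeredAtZone },
  );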


I'm aware of the distinction between ISO 8601's time zone designators (which are simply UTC offsets) and other much more complex notions of time zones such as the IANA tz database (which identifies time zones with an "Area/Location" string and contains information about each zone such as daylight saving time rules and even the historical changes in such rules).

That said, "time zone designator" is the formal name of the UTC offset in ISO 8601, and it's reasonable to refer to it as such whenever the distinction is clear.



There's an ISO standard for dates as strings with the offset; that's usually fine.

Handling POST and PATCH differently can at least give a general guideline for "overwriting vs keeping fields on update."

The localisation part is trivial if you consider that APIs are to be consumed by machines - much harder, if you assume they're used for GUIs.

Error handling is easy to define right, but so tedious to implement....

1+n can be improved by "guessing" which nested resources are going to be fetched, and including them "by default."

But you're right that those are decisions that need to be made, and they can quickly be agonised over without adding much value...


As to updates (and other aspects of API design too), I highly recommend taking a look at the solutions proposed at https://aip.dev - e.g. in case of update, https://aip.dev/134 - note the use of FieldMask (though there's some slight not-yet-resolved inconsistency observed recently by one user: https://github.com/aip-dev/google.aip.dev/issues/673)


Somewhat important tangent: the JSON number type is not a double. It's just a number of arbitrary size and precision in integer/decimal/E format that can be parsed as whatever the parser finds fitting.

This distinction is important because you can't serialize infinities or NaN, and there's no guarantee the JSON number can be accurately represented as a double. JS likes to pretend that JSON number is interchangeable with Number and this can result in some fun situations when your Infinity becomes null

I guess the point is that JSON has about as much to do with JS as JavaScript has to do with Java.


Do not use a page argument for pagination. If you have another process/client concurrently adding/removing items, then some items will be returned twice, and others will never be returned. It is better to use, for example, the ID of the last returned item as a starting point for the next query.
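Something like this, as a sketch (the parameter and field names are hypothetical):

  //   GET /items?limit=50            -> first page
  //   GET /items?limit=50&after=123  -> items with id > 123
  interface Page<T> {
    items: T[];
    nextAfter: number | null; // id of the last item, or null when exhausted
  }

  async function* allItems(baseUrl: string) {
    let after: number | null = null;
    do {
      const url = after === null
        ? `${baseUrl}/items?limit=50`
        : `${baseUrl}/items?limit=50&after=${after}`;
      const page: Page<{ id: number }> = await (await fetch(url)).json();
      yield* page.items;
      after = page.nextAfter;
    } while (after !== null);
  }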


If items get returned twice while iterating, then I know something got added. Manga websites do this and it's a nice unintended feature.

Slicing by ID is how Github does commit history and it drives me nuts that I can't jump several pages, for example, to see when the first commit was, or to guess whereabouts some commit is given a known time range. IDs make it impossible to do anything but iterate step by step.

I much prefer an interface that exposes:

a) how many items exist in total

b) which offset it's starting from

c) how many items in the slice

Then if I want a slice from 300-8000th items, I can type exactly that in the URL. Yes, I understand this will render a huge page, just let me do it the one time so I won't be spamming your server with requests over the next hour trying to find something while fighting against bad UX.


> If items get returned twice while iterating, then I know something got added.

Fair point. I agree, that's a nice side effect. But what if you're looking at page 1, and an item is removed from that page? Then you'll never see the first item at page 2, because it's now the last on page 1.

> Then if I want a slice from 300-8000th items, I can type exactly that in the URL.

That's a nice feature. But it can put a lot of load on your backend if you paginate over tens of thousands of items.


> an item is removed

How common are concurrent removals in practice though? I can't think of a single instance off the top of my head where this is problematic and you can't just go back to the previous page if you're really paranoid or confused that something is "missing"

> it can put a lot of load on your backend if you paginate over tens of thousands of items

Look at it in aggregate. Fiddling w/ URL params to paginate (instead of clicking pagination links) is a power user move. If I want the date of first commit in a repo, I'll only look at the last page (vs paging through the entire history). For guessing, I can click a page, and if I went too far, binary search from there (again, vs linear search). Etc.

Even for the most degenerate use case (e.g. some jerk trying to crawl over the entire dataset), the load is smaller with a single request than the overhead of multiple requests. Paginating is not an appropriate mitigation strategy against this type of traffic, and you arguably can implement detection/caching/blocking mechanisms much more easily for naive huge queries than if you need to differentiate regular traffic from bots.


> I can't think of a single instance off the top of my head where this is problematic and you can't just go back to the previous page if you're really paranoid or confused that something is "missing"

The whole issue is that you're never going to know about it. Sure, you can write some convoluted automated process to double-check previous page(s) but most people aren't going to do that.


Sure, that's why I'm asking whether this is at all common. If the use case is programmatic, then just provide the ability to get an arbitrarily large set atomically (rather than forcing pagination size limits) and the problem goes away.

Another thing to consider when paginating via id cursors: if the deleted item is the one your cursor is sitting on, then you no longer have a frame of reference at all. For example, what happens to commit history pagination in github's implementation if I rewrite git history? Chibicc for example deliberately rewrites its history for didactic purposes.


> just provide the ability to get an arbitrarily large set atomically

It's difficult to do in most HTTP deployments where we try to limit HTTP response duration.

> if the deleted item is the one your cursor is sitting on, then you no longer have a frame of reference at all

Not a problem. If you order your results by timestamp and ID, then you provide the timestamp and ID of the last item returned, and take anything after. Doesn't matter if the item is still there or not.
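i.e. keyset pagination over (timestamp, id); the cursor row can be deleted and the predicate still works. A sketch (table and column names are invented; Postgres-style row-value comparison):

  const pageSql = `
    SELECT id, created_at, body
    FROM comments
    WHERE (created_at, id) > ($1, $2)  -- timestamp and id of the last row seen
    ORDER BY created_at, id
    LIMIT 50
  `;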


Not sure what you mean by timestamp, but I don't think that works for the example I posted. If I rewrite the first commit in history, every single entry will have different sha's and commit dates. With offsets, clicking "next" will at best feel like nothing happened, or at worst run into the mild off-by-one confusion. With ids, there's no way for the system to know where to go unless there's custom logic to somehow map old sha's to the new ones (assuming that is even possible).


> That's a nice feature. But it can put a lot of load on your backend if you paginate over tens of thousands of items.

To prevent that, a maximum page size is usually enforced on the server anyway, and the client is informed about the actual page size in the metadata of the reply.


As a user I expect that to happen. Cursors on the other hand are awful for getting to arbitrary pages, they are mostly useful for "More" links as on HN or Reddit.

It's a trade-off.


The notion of "page" really only applies as an arbitrary interface division to make performance predictable. The same book could be published in different form factors such that there's no real meaning to a specific page number. There's no "turn your bibles to page 112." That's even more clearly true when the data itself is changing over time, such that even the first thing on the first page is changing over time.

Thus using page numbers is probably a pretty poor proxy for what you're actually trying to do when you say "getting to arbitrary pages." Presumably you are wanting to skip to a specific place in the list, perhaps specified as a percentage ("take me halfway through the list") or as some predicate on the data ("take me to items from 2 weeks ago"). APIs should provide ways of expressing these specific places in a list, instead of requiring you to either guess page numbers or do extra work to calculate them.


Isn't that just pagination with a different granularity?

You'd still present the user with a set of results (a "page") and would let them seek forwards/backwards to the adjacent subsets of results.

AFAICT cursors can't support this.

And even if you go halfways into the list you can't display all subsequent items, so you'd still have to paginate in some way.

The page metaphor is there for a good reason.


There are two concepts here. One is simply the idea of a service returning only some portion of the items in a list for each request. That notion almost always exists simply for practical reasons: to bound the performance for retrieving, formatting, transferring, and consuming the list. I have no problem with that.

The other concept is using an actual page number to make requests, e.g. requesting {page: 1} and then subsequently requesting {page: 2}. This concept is the one I was claiming is less desirable than some alternatives.

As for cursors, I don't see any reason why you couldn't make requests like {listPosition: "50%"} or {createdBefore: "2020-02-15"} and then still use cursors in the response to request the previous or next page. (Those two examples probably aren't actually good API naming conventions, but it should demonstrate the idea.)


Cursors are also the only reliable option for supporting programmatic usage for automation use cases.


I’d say they’re right: cursors are way better for an API. Pages are way better for a person. This here is about an API.


Well good luck giving your users direct page access when your API only supports cursors.


Here I think it makes sense to split the API in two classes:

1. APIs consumed by other backends, let's call them API2B

2. APIs consumed by frontends, let's call them API2C

In this case cursors are better for API2B but not for API2C, as in the case of API2C most users expect to be able to jump directly to a specific page.

At least when I am designing an API I take different decisions based on this split. For example, in the case of API2C I always want to see the FE design even if I work on the backend.


How often do people really jump to an arbitrary page? I'd say filtering / searching capabilities are far more important for users than paging.


It depends? If I'm looking for a new desk on IKEA's online catalog then I care more about filtering/searching, but if I'm reading through a conversation on a forum and want to be able to link to certain sections then I want that paginated.


But even for the latter case, surely you're not specifically interested in a specific page called "page 50" but rather you're interested in items satisfying some predicate (like "posts from 2 weeks ago") or locations respective to the entire list (like "halfway through the list"), and it would be much better if the user interface supported those sorts of navigation.


You can also use SQL:2011 "system-versioned tables" (a.k.a. temporal databases), supported in most of the major RDBMSs now. You'd just need to keep track of a timestamp from when you started querying/paginating and include it as part of the SQL query, which'll give clients a consistent view even if the database is being concurrently modified.
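A rough sketch of the idea (MariaDB's flavour of the SQL:2011 syntax; table and column names are invented):

  // Record a snapshot timestamp on the first page request, then pin every
  // later page to that same point in time.
  const snapshot = "2021-02-22 20:34:53";
  const snapshotSql = `
    SELECT id, name
    FROM products FOR SYSTEM_TIME AS OF TIMESTAMP'${snapshot}'
    ORDER BY id
    LIMIT 50 OFFSET 100
  `;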


> temporal databases), supported in most of the major RDBMSs now.

Which ones?


I'm only aware of SQL Server (2016), Oracle and MariaDB supporting temporal tables, and... having dived into them, they can certainly add some complexity to your data and queries. Would be happy to learn of more - I had to do some digging on this in late 2019 for a PostgreSQL project, but there was nothing native or 'standard' for pg at that point in time.



Thanks!


You can return URLs that guarantee proper pagination with your response, if the output is more than the limit. This way you can incorporate your logic (ID of the last returned item as a starting point) in pagination, and make sure that as your API evolves, client applications' pagination logic doesn't have to change, as long as you provide proper pagination links.

For example, GET /customer/1/orders/

Response:

  {'orders': [order1, order2, order3],
   'navigation': {
     'firstpage': '/get/customer/1/orders/',
     'nextpage': '/get/customer/1/orders/?query=orderId^GT4',
     'lastpage': '/get/customer/1/orders/?query=orderId^GT990',
     'totalrows': 1000
   }
  }

(GT means greater than)


I'd suggest caching the result of a query that could return multiple pages, if consistency is truly important. Otherwise, I don't think it really matters which option you choose, as long as you document the behavior.

Edit: obviously the proper impl totally depends on your app. If the result set is enormous, caching doesn't make sense. If it's rapidly changing, pagination probably doesn't make sense. etc, etc.


If consistency is really important, don't paginate on the backend at all. The only way to win is not to play the game.

That said, using "page of results preceding <first id of following page>" and "page of results following <last id of preceding page>" reduces obvious pagination artifacts compared to <page number n>. Whether it's better to beat consumers over the head with the inconsistency due to concurrent changes, or to provide a nearer illusion of consistency, probably depends on the application domain and use case.


But then you need to provide some kind of query ID in the HTTP request? And if you have many concurrent clients, that can be expensive in terms of RAM?

If we don't care about added items (they can be deduped client-side), and only care about removed items, maybe the backend can maintain a tombstone timestamp on each deleted item, instead of deleting them, and then the client can provide a "snapshot" timestamp in the query that can be compared with the tombstone timestamp.


> But then you need to provide some kind of query ID in the HTTP request? And if you have many concurrent clients, that can be expensive in terms of RAM?

A query ID is super easy to implement, and yes, for any application you run you have to be aware of the resource requirements. Tune your eviction policy, and this is totally feasible.

Deduping and tombstones is messier to implement IMO, and hitting a cached result for the next page (e.g. a redis LIST) is probably less expensive than a "query" (whatever that means for your backend).


I prefer an offset; then a crawling client can pad their requests and remove anything with a duplicate ID.


That solves the problem of items added concurrently (by deduplicating them) but that doesn't solve the problem of removed items.


The only thing that completely solves for removed items is constantly polling, or setting up a webhook/callback of some kind.

Padding the offset would solve the problem mentioned, where deleting an item means some non-deleted items are not included in the paging results, because they got moved up a page after that page was requested but before the next page was requested. For example, if I request 100 items at a time but set my offset to be 90 more than what I have received so far, I can expect my response to have duplicates; if it does not, then I know my offset was not padded enough and I can request from a different offset. Of course, you would adjust the numbers based on knowledge of the data.

Edit: If you used the ID of the last item instead of an offset, then you could get errors if your last item is in fact the one that was deleted.


Your padding + deduplication solution is nice.

> Edit: If you used the ID of the last item instead of an offset, then you could get errors if your last item is in fact the one that was deleted.

If the items are sorted by ID, then we use the ID of the last item. But, for example, if they are sorted by timestamp, then we use the timestamp of the last item (and maybe the ID to break ties). Then it doesn't matter if the last item is still there or not. We are only considering values before or after, depending on the sort order.


What does Hackernews use in this case?


From what I can tell, every item in the Hacker News data set has a monotonically increasing integer ID, and comments/posts are not fundamentally treated differently.

For instance, your comment is 26227524

and the parent post is 26225373

Edit: Hacker News pagination is apparently not a high priority, since they just use the easiest way, with p= some number.


It's okay for a submission to show up twice/not show up at all on a single page view of HN, so I can see why it wouldn't be a priority.


For a forum-like UI it is better to have numbered pages, in my opinion.


I interrupted my reading at 'Accept and respond with JSON' to write this comment, before I skipped over that section and returned to reading the rest.

Folks that aren't aware of Webmachine should take a look:

https://github.com/webmachine/webmachine

The 'Accept' header should determine the response type, but content negotiation is something that few bother to implement. Webmachine does that for you, among other things like choosing the correct status code for your response.
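For simple cases you can hand-roll the negotiation; a rough sketch (not Webmachine's API, just an illustration):

  function negotiate(accept: string | undefined): "json" | "xml" | null {
    const a = accept ?? "*/*";
    if (a.includes("application/json") || a.includes("*/*")) return "json";
    if (a.includes("application/xml")) return "xml";
    return null; // caller should respond with 406 Not Acceptable
  }

A real implementation also has to handle q-values and wildcards properly, which is exactly the kind of thing Webmachine takes off your hands.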

Also, shameless plug for my OCaml port:

https://github.com/inhabitedtype/ocaml-webmachine


> Webmachine is an application layer that adds HTTP semantic awareness on top of the excellent bit-pushing and HTTP syntax-management provided by mochiweb, and provides a simple and clean way to connect that to your application's behavior.

Great buzzwords, I have no idea what this project actually does.


I think it's about adding semantic web stuff for mochiweb servers in their responses? After clicking through a couple of links I'm still not quite sure what it does.


[flagged]


> For you and other non-programmers

> Once you've learned these two, the sentence immediately becomes clear

> it can sound like mumbo-jumbo, but I'm sure it makes sense for them, since they are professionals

Did you just link a person to a two-kilometer long page about HTTP, assumed they're not a programmer and almost accused them of not being professional enough, because they haven't heard of some Erlang library?

> application layer that adds HTTP semantic awareness on top of the excellent bit-pushing and HTTP syntax-management

I'm a web developer and my reaction was exactly the same. What the heck does this thing actually do?


> Did you just link a person to a two-kilometer long page about HTTP, assumed they're not a programmer and almost accused them of not being professional enough, because they haven't heard of some Erlang library?

Nope, I linked to a specific section of an HTTP specification that goes through "Content Negotiation", since that was what the topic was about and I wanted to share a resource with spelunker so they could read more about it. If you're familiar with the topic, what seliopou wrote is not alien.


I'm a programmer, and I know HTTP quite well. I know what HTTP semantics are, and this sentence is still gibberish to me.

I'd wager the reason I don't is because I'm not that familiar with the erlang ecosystem, so have no idea what mochiweb is, and don't know what this does differently.

Your first sentence was quite condescending, FYI, not sure if it was meant to be.


> Your first sentence was quite condescending, FYI, not sure if it was meant to be.

No, it was not and thank you for flagging that. Is it condescending to assume one is not a web developer if they don't know content negotiation? I thought that was Web 101 and pretty much one of the first things you go through when learning about HTTP, but maybe things have changed as of late.

I'm sorry, spelunker, if my message came off as condescending. I have no real idea of your professional background and was only trying to provide a resource to learn more, not to push anyone away.


I think you can safely assume most people on here are some sort of programmer or at least tech-related.

We know about content negotiation; it's probably the `Webmachine`, `HTTP semantic awareness`, `bit-pushing`, `HTTP syntax-management` and `mochiweb` that not many people would know.

I'm not sure if you're purposefully trying to be sarcastic but even your current comment is kinda scathing.


webmachine, mochiweb etc. come from Erlang.

webmachine is one of the earliest libraries I'm aware of that tried to think carefully about HTTP semantics and structured its work with RESTful APIs (or web requests in general) according to an actual diagram, which you can see here: https://github.com/basho/webmachine/blob/develop/docs/http-h...

There's an expanded and re-worked version that builds on this here: https://github.com/for-GET/http-decision-diagram/tree/master... (follow the links to system, request etc. to see the reasoning and references for each step)


Thanks, I'm a programmer professionally, and I broadly understand HTTP and the language of the web. Talking down to me is really helpful though.

It's true, I don't know Erlang, or mochiweb, or the nitty-gritty of content negotiation, but OP presented this library as an alternative to "accept and respond with JSON", so I attempted to read the readme because I was curious. And then I quickly lost my curiosity when, again, I had no idea what this library actually did. If a readme is meant to be an inviting and basic overview of what a library does and why I should use it, this readme failed.


What is the usefulness of this feature?

The only advantage I see is that your clients can choose the format they want to work with, but since the serialization format is just a message format that has no impact on the code, and most formats have a bijective transformation between them, I don't see the point.

It feels like a lot of work (or the added complexity of a moving part, if using the tool you linked) for next to no impact.


Things like this are useful if you have a user that is integrating your API into some kind of legacy application that does not support JSON. Then in the future, when JSON goes out of style, you could easily add support for whatever the new format is.


If you don't care about others using your API, then don't do it. If you do, follow the standard which says you SHOULD have a Content-Type if there is a body to your response.


My comment was about whether it's worth supporting multiple values for the "Accept" header, not the "Content-Type" header.

I've read through the README and links and still have no idea what webmachine does.


Django REST Framework has this built in, and it is very easy to turn on. I always turn it on, even though my users almost exclusively use JSON. I had one user use XML once, because they didn't know how to use JSON in whatever it was they were using. No one has ever used YAML, but I leave it there just in case.

If you do it right, using the standard HTML content type returns a human-browsable representation of your API, with forms for POSTs and whatnot.


Omg I love the browsable API of Django. I am searching for something similar for .NET - any pointers, anyone?



That's not browsable. Swagger is just a prefilled Postman. I mean some UI where I can navigate and discover the resources directly.

> The 'Accept' header should determine the response type

Also, the HTTP decision diagram: https://github.com/for-GET/http-decision-diagram/tree/master...


I always have mixed feelings about using plurals for naming. Pluralization in English is extremely inconsistent.

cat -> cats,

dog -> dogs,

child -> children,

person -> people,

etc.

It makes my code feel inconsistent too. Sometimes I use more specific typing to resolve this, e.g. PersonList, but I'm not sure that's any better.


I've gone back and forth on this in respect to database tables and ultimately prefer singular. If the table is named "profile", I think it's obvious that it can store more than 1 profile, since... well it's a table and has rows.


I don't see a problem with writing "childs" or "persons".

In this scenario, consistency is more useful than grammatical correctness [1]. And the simpler pluralization logic makes it easier for non-English speakers to work with the codebase.

[1] Related example: https://en.wikipedia.org/wiki/HTTP_referer#Etymology


Agree. I've always preferred the Django vs the Rails way on this. Django uses pluralization only for display purposes.

Things are object_detail (singular) and object_list (plural).

It just assumes 'append s' for plural displays, unless you override. I like the _list suffix instead of trying to pluralize


I'd say REST APIs should also support selections, e.g. "I only care about these fields." It's the #1 reason why GraphQL is so popular. You only fetch what you want.

But I have mixed feelings about GraphQL. I wish they hadn't invented a new language and it was just JSON. I've encountered so many little bugs because GraphQL parsing is different between different servers.

The ideas of GraphQL are great; the implementation seems overcomplicated.

REST is much easier to grok.


> It’s the #1 reason why graphql is so popular. You only fetch what you want.

It's certainly one of the main sales points for graphql. On the flip side, I've never been frustrated by getting too many fields back from an API. I suppose if I was developing exclusively in extremely bandwidth limited contexts where getting back only 2 fields rather than 50 actually made a difference, I might care. It just seems like such a non-issue to solve/talk about in almost every other context.


It's likely more of an issue if you're wanting to fetch lots of nested data in one request, but only a few specific fields of what could be an extremely large object - like, say, just the names and profile picture URIs for a large set of users that have interacted with the object you're fetching. This is particularly relevant to social media, such as Facebook.


For the cases we've hit, it made sense to do a more specialized endpoint (or media type) than to create a universal way to let clients pick the fields they want.

This works way nicer with caches and APIs where the URI is a first class citizen.


> On the flip side, I've never been frustrated by getting too many fields back from an API

A pretty common use-case I have is needing to support these three things:

- A “big object” list screen (where retrieving the whole objects would make the query return megabytes of data)

- A “big object” details screen (where I need the full object)

- Programmatically getting many big objects

With GraphQL it involves writing one (or two) straightforward queries, while with REST it would require more thought or code.


> With GraphQL it involves writing one (or two) straightforward queries, while with REST it would require more thought or code.

Aren't you just pushing the work to the back end? The GraphQL resolver is a new layer of complexity, while REST is more straightforward.


Yes, but you don’t have that many network round trips.

Maybe, maybe not.

If there is some sequence that is used a lot and you've determined it to be a bottleneck, you can write a custom endpoint for it.


You could simply use HAL with embedded resources in JSON. Not sure how the client side feels about that though


It's simpler if it's normalized like with JSONAPI, but all of the apps I've worked on use an E(T)L process where the transformation step normalizes the data before saving it to our cache.


I was immensely frustrated due to that.

I could fetch the objects in a bulk request... except they didn't have the field I needed when fetched in bulk. So I had to do a bulk fetch, where each object was pretty big, and then fetch them again individually.

If that service supported GraphQL I could easily just grab what I need, without hammering the poor, already over-utilized server.

This kind of issue crops up constantly, especially if you use on-prem hosting.


I like having the option to select fields as an optimization, but the main reason I don't like GraphQL is that it forces me to select fields from the beginning. Most of the time I don't know what the fields are until I look at the data in context. In general, I don't like anything that forces me to rely on documentation.


This can really screw up your cache hit ratio.


We could support a "fields" query parameter in a REST API to get only the necessary attributes, e.g. ?fields=f1,f2,f3.
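As a sketch (the parameter name is up to the API; JSON:API calls these sparse fieldsets):

  //   GET /users/42?fields=id,name,avatarUrl
  function pickFields<T extends object>(resource: T, fieldsParam: string) {
    const wanted = new Set(fieldsParam.split(","));
    return Object.fromEntries(
      Object.entries(resource).filter(([key]) => wanted.has(key)),
    );
  }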


OData adds this kind of functionality to REST APIs, in a standardised way. It can do filtering, pagination, sorting etc (your backend data provider of course needs to support these operations).


You can do it with query params. It's not enforced or consistent, because not everything that serves data can parse the data.

You could throw static JSON blobs on S3 as a REST API, but selections would not be supported.


I wish S3 supported filters on predefined content - e.g. JSON. That would be a killer feature.


With Amazon S3 Select, you can use simple structured query language (SQL) statements to filter the contents of Amazon S3 objects and retrieve just the subset of data that you need. By using Amazon S3 Select to filter this data, you can reduce the amount of data that Amazon S3 transfers, which reduces the cost and latency to retrieve this data.

Amazon S3 Select works on objects stored in CSV, JSON, or Apache Parquet format. It also works with objects that are compressed with GZIP or BZIP2 (for CSV and JSON objects only), and server-side encrypted objects. You can specify the format of the results as either CSV or JSON, and you can determine how the records in the result are delimited.

You pass SQL expressions to Amazon S3 in the request. Amazon S3 Select supports a subset of SQL. For more information about the SQL elements that are supported by Amazon S3 Select, see SQL reference for Amazon S3 Select and S3 Glacier Select.

You can perform SQL queries using AWS SDKs, the SELECT Object Content REST API, the AWS Command Line Interface (AWS CLI), or the Amazon S3 console. The Amazon S3 console limits the amount of data returned to 40 MB. To retrieve more data, use the AWS CLI or the API.

https://aws.amazon.com/blogs/aws/s3-glacier-select/

https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-gla...


Did you ever check out OData? It seems that's exactly what you want.


> Set the Content-Type header in the response to application/json; charset=utf-8 without any changes.

The "; charset=utf-8" part is not necessary and is ignored. See https://tools.ietf.org/html/rfc8259 which defines the application/json media type: "No "charset" parameter is defined for this registration. Adding one really has no effect on compliant recipients."


I’ve also found the Microsoft API guidelines helpful (especially for error handling).

https://github.com/microsoft/api-guidelines/blob/vNext/Guide...


Nice reference, but like the OP article, it really goes light on security. I feel that's the hardest part of getting an API released.

Way better than this one; however, it only covers CRUD.


> Then if we try to submit the payload with the email value that already exists in users, we’ll get a 400 response status code with a 'User already exists' message to let users know that the user already exists. With that information, the user can correct the action by changing the email to something that doesn’t exist.

I was always under the impression that 409 would be the "correct" code here, while you'd use 400 if the API user didn't supply an email at all, for example.


Ironically, your comment is a perfect illustration of one of the problems with REST - its tendency to provoke discussion about things that don't actually matter in practice.

No user cares whether an error response came back with a 400 or a 409 status, or what those codes even mean. It's madness that as a profession we spend so much of our employers' time and money on trivial things that deliver no value whatsoever. Response codes, HTTP verbs and pretty hierarchical URLs are all meaningless. I've lost count of the number of meetings I've been in where there have been pointless, timewasting debates about these things. Should we expose separate /foos and /bars endpoints? Should it be /foos/{foo_id}/bars/{bar_id}? Or maybe both? Do we need a POST, a PUT or a PATCH? Etc, ad nauseam.

I'm generally cautious when it comes to embracing new technologies, but I've wholeheartedly embraced GraphQL simply because it has removed countless hours of unproductive bikeshedding from my professional life.


I have been bitten by this really bad idea of reusing HTTP codes for application problems. The server was returning the 404 itself, instead of the application.

It is a bad engineering idea to reuse HTTP codes that can be thrown by multiple intermediaries, confusing the client code.


Where I work everything internal is a 503. Put a parameter out of range, or supply an ID not in the database? That's a server side exception. It's very frustrating even with a stack trace because I don't work on that code.


REST doesn't have any monopoly on bikeshedding.

That said, some of the practices (like being overly clever with status codes) are pointless while others are actually helpful design patterns.

Using nouns over verbs helps you fully think through the process of mutating state in a RESTful/stateless way. The HTTP verbs all have a unique purpose you should understand.

Does it always matter what size of hammer you use? No. That doesn't mean it's smart to use a screwdriver as a chisel.


> No user cares whether an error response came back with a 400 or a 409 status

Unless your product is itself an API, your users shouldn't be exposed to its status codes. Sorry for dispensing trite advice, but you should really have a nice client between API and users, that can translate the difference between 400 and 409 statuses in a comprehensible and friendly way. :)


That's the purpose of an error message isn't it? I'm not sure what an HTTP response code is adding to that.

In my experience, in any situation where you produce an API where you want the client to pay close attention to the actual error that occurred, the list of official HTTP response codes is generally woefully inadequate for the task and you're going to want to work with something much more specific.

If my payment processor is telling me they're declining a transaction because the customer failed a fraud check, I'd rather they didn't communicate that to me with some random 4xx code one of their devs found on a Wikipedia article and tried to bend into shape. Rather, I'd hope the importance of the situation would lead them to compile a table of custom error codes that I can implement properly in my program's logic.


Let me get this straight: you don't think codes matter, but you want to manage a giant proprietary list of codes?

Won't you just end up in those huge discussions you waste so much time on?


To continue the payment processor example, if there are 10 different reasons for declining a transaction, how do you propose mapping those onto 4xx codes? Would you just choose 10 at random? Or would you just return, e.g., E_INSUFFICIENT_FUNDS?


I'm not sure anyone would propose you randomly map different 4xx codes to a bunch of different internal server errors. For the one example you provided I would probably use a 500 code, return the relevant message and call it a day. In that case you might also return an additional code if you have some extremely complex logic for handling that error at the client side, but in that case your client might be a little too complicated.

That doesn't mean I wouldn't also use the best-practice codes for authentication vs authorization, etc. I say this loosely, though, because best practices are really just that. It's more important to maintain a level of consistency within your team or organization so you're all speaking the same language. It's nice to have a starting point like REST APIs (or something that resembles them) so you're not writing a completely new rulebook on standard stuff that other people have already figured out, and you can focus more on the problems your team is trying to solve. Why not use a 409 when there's a conflict? Someone already solved that problem. You can still return whatever pretty error message you want.


FWIW, my policy is that 5xx codes are always bugs, or an infrastructure-level failure/refusal to serve.

4xx codes are processing errors, or application-level refusals.

I end up borrowing 422 from WebDAV for a generic "Unprocessable Entity", with an error code (for API interpretation/reference lookup) and description (for user display, when appropriate) in the response body.
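The body ends up looking something like this (my own convention, loosely in the spirit of RFC 7807; the names are illustrative):

  //   HTTP/1.1 422 Unprocessable Entity
  const errorBody = {
    code: "E_INSUFFICIENT_FUNDS", // stable code for client logic / docs lookup
    description: "The account balance is too low for this transaction.",
  };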


Yeah, that's understandable; when frameworks default everything to 500, that's often true. I just couldn't think of a more appropriate code for the specific example - apparently there actually is a code for fraudulent payments, so a 402 code would be good.

re: best practices. I inherited an API that always responds with 200s, even when there's an error condition. When a document can't be found it returns a 200 OK with a response body "NOT_FOUND".

By ignoring existing standards and re-inventing the wheel with custom errors the previous development team made the system harder to maintain & harder to onboard.


I've also encountered that. It's very frustrating when you get a 200 OK, then you continue on in your client, but there was actually an error expressed in the body of the 200 response. Ah gee.

It's all terribly simple: https://http.cat/


Interesting, I didn't know using GraphQL could help remove so many design debates (compared to REST). But yeah, it makes sense, since with REST you have to keep thinking about whether or not you'll extend /foos/{foo_id}/bars/{bar_id}/baz or, say, put it on its own resource /baz/{baz_id}, etc...

Maybe I'll investigate GraphQL a bit more; I've just heard it was really complicated to implement the resolvers for the queries, so people weren't using them yet.


But with GraphQL you will have the problem of naming the mutations instead. That is also a debate...


Isn’t 409 if you’re trying to update an existing record (PUT) and not create a new record (POST)?

The ambiguity is probably why most people just use 400 and call it a day.



Please tell me that's satire...


All these "best practices" are exactly why I embraced GraphQL quickly and flushed "REST" down the toilets for big projects (http+json is fine for small ones, since no lib overhead). GraphQL is unashamedly the new SOAP. I.e. we have a spec, not a series of "best practices" thus endless architectural debates where people are shamed for 'doing it the wrong way' online.

- Accept and respond with JSON: No, why? what if I want to work with XML? or format X or Y?

- Use nouns instead of verbs in endpoint paths: I don't care about that

- Name collections with plural nouns: I don't care about that

- Nesting resources for hierarchical objects: No, why?

- Handle errors gracefully and return standard error codes: nothing to do with REST specifically.

- Allow filtering, sorting, and pagination: yes, good luck coming up with the right query structure and sticking to it.

- Maintain Good Security Practices: nothing to do with REST specifically.

- Versioning our APIs: And here we are. How many HATEOAS people claimed it was blasphemy and it missed the point of REST?

None of these are an issue with GraphQL, just like they weren't an issue with SOAP. You get a schema both the client and the server must agree with, end of story.

As I said to a dev once, if your 'best practices' can't be automated with a CI tool, then you need to worry about creating that tool first...


Never understood the hype around GraphQL.

You want to know how to halve your performance and responses/s on your service? Add GraphQL.

All things that GraphQL claims to do can be implemented in RESTful services easily.

If you want specificity in your query fetching, just add query params or put them in the request body

If you want schema validations, there are many libraries that help you with that (a sketch follows below).

And if you want data from multiple resources from different endpoints, what exactly is stopping you from implementing that in REST?

Also, GraphQL has a steeper learning curve as opposed to REST.
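On the schema-validation point above, a minimal sketch with the jsonschema library; the schema itself is a made-up example:

```
import jsonschema

order_schema = {
    "type": "object",
    "properties": {
        "item_id": {"type": "integer"},
        "quantity": {"type": "integer", "minimum": 1},
    },
    "required": ["item_id", "quantity"],
}

def validate_order(payload: dict) -> None:
    # Raises jsonschema.ValidationError on bad input; map that to a 400/422 in your handler.
    jsonschema.validate(instance=payload, schema=order_schema)
```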

Never understood GraphQL. To me it's just an abstraction layer between the front end and back end, adding to an already complex stack.


I think people like it because the popular frontend library (Apollo) makes certain things easy in React, like caching.

From a backend perspective, it's about 100x more work than dumb RPC (which I prefer over "REST"). I have spent so much time becoming aware of very "interesting" decisions. For example, gqlgen for Go generates code that fetches every element of a slice in parallel. We used to create one database transaction per HTTP request, but you can't do this with GraphQL, because it fetches each row of the database in a separate goroutine, and a single transaction can't be shared across them. It's also exceedingly inefficient.

All in all, it's clear to me that GraphQL is built with the mindset that you are reaching out to an external service to fetch every piece of data. That makes a lot of sense in certain use cases, but makes less sense if you are just building a CRUD app that talks to a single database.

I am hoping that people get bored with this and make gRPC-Web not require a terabyte of JavaScript to be sent to the client. RPCs are so easy. Call a function with an argument. Receive a return value and status code. Easy. Boring. It's perfect.


I love this comment so much.

With GraphQL, you're just pushing the N+1 calls around and doing the mental contortions needed to support all the graphy-ness on top of non-graph structures. Most who say GraphQL is the bee's knees must be FE developers. There's nothing wrong with that, but GraphQL is just NOT all it's cracked up to be.


I'm not a GraphQL advocate and barely use it myself, but the N+1 problem has been solved in many server implementations.

Eg: Hasura.

https://hasura.io/blog/architecture-of-a-high-performance-gr...
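For anyone unfamiliar, the usual server-side fix is the dataloader pattern: collect the keys requested while resolving one level of the query, then issue a single batched query. A rough sketch of the idea (the `db` handle and table are hypothetical; Hasura itself goes further and compiles the whole query to SQL):

```
def batch_load_authors(author_ids, db):
    """Resolve many author lookups with one query instead of one query per comment."""
    unique_ids = list(set(author_ids))
    rows = db.execute(
        "SELECT id, name FROM authors WHERE id = ANY(%s)", (unique_ids,)
    ).fetchall()
    by_id = {row["id"]: row for row in rows}
    # Dataloaders expect results in the same order as the requested keys.
    return [by_id.get(author_id) for author_id in author_ids]
```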


> All things that GraphQL claims to do can be implemented in RESTful services easily.

How do you easily write an endpoint that can return either a comment, a comment with its children, or a comment with its corresponding story, with a strongly typed schema (i.e. no optional children or story fields, whatever the query is)?


Like I said, specify what you want in the query params or in the request body.


So what you are saying is: reimplement GraphQL.


No, what I'm saying is that GraphQL is introducing something they think is novel when it can already be done in REST.

It's not hard.

In my specific case, using Flask-Marshmallow I can do this:

```

    some_resource = Resource.find_by_id(resource_id)
    arg = request.args.get('type_of_resource')
    if arg == 'with_comments':
        return full_resource_schema.dump(some_resource), 200
    if arg == 'no_comments':
        return no_comments_schema.dump(some_resource), 200
    # fall back to the lighter representation if the param is missing or unrecognised
    return no_comments_schema.dump(some_resource), 200
```

There, I just implemented that in a handful of lines of code.


The whole point of GraphQL is to allow FE teams to decide what information they need in an organizationally decoupled fashion, and you haven't done that at all. Suppose the FE wants another join on that object; now they must wait for you to push more code enabling those queries.


Unfortunately for GraphQL, the claim that backend teams need to be constantly making endpoints for other decoupled teams is not as drastic and critical as you consider it to be.

Dare I even say that switching to GraphQL and onboarding developers to GraphQL is a much more resource-intensive process than adding another endpoint (or a couple).


> onboarding developers to GraphQL is a much more resource-intensive process

As a BE/remote dev I basically taught myself most of Elixir's GraphQL framework in a day (wait, that was yesterday), while under feature pressure (due this afternoon), mostly by poking around and doing a bit of TDD. There are parts of GraphQL that I really hate, but overall I consider it a win.


The problem with GraphQL is on the front end. Suddenly, the FE team becomes responsible for understanding the entire data model, which resources can be joined together, by what keys, and what is actually performant vs what isn't.

Instead of doing a simple GET /a/b/1/c and presenting that data structure, they now need to define a query, think about what resources to pull into that query etc. If the query ends up being slow, they have to understand why and work on more complex changes than simply asking the BE team for a smaller response with a few query params.


That's an interesting point! And GraphQL basically hides the mental model behind a broad abstraction, so it's impossible to know exactly what's going on without actually going through and reading the BE code itself. Thanks for the perspective.


I hit this problem when contemplating exposing the API of the application I work on to customers, to be used in their automation scripts.

We quickly realized that expecting them to learn our data model and how to use it efficiently would be much more complicated than exposing every plausible use-case explicitly. We could do this on the "API front-end" by building a set of high-level utilities that would embed the GraphQL queries, but that would essentially double much of the work being done in the front-end (and more than double if some customers want to use Python scripting while others want JS and others want TCL or Perl).

So, we decided that the best place to expose those high-level abstractions is exactly the REST API, where it is maximally re-usable.


Also, to add another point:

Most API teams will give you all the resources that you need.

It is most likely the case that the FE team will ask not for more, but rather for less.

This is the exact problem GraphQL was created to solve in the first place: Facebook's mobile platform was receiving way too much data that needed to be trimmed, because they didn't want to be sending something like 500 KB of JSON over mobile networks.

Competent API teams will give you all the resources you need to be productive as an FE engineer, and if your data does need some trimming, ask your decoupled backend team and keep working asynchronously while they get that done for you.

Not as fatal as you think it is.


Unless I’m missing something, the response schema will have an optional “comments” field whether the “with_comments” arg is passed or not.


It's just a hit to the DB.

GraphQL works the same way, if I'm not mistaken.

And if you don't want the comments at all, then write your SQL statements in the conditionals.

If you want comments:

    SELECT * FROM resource WHERE id = :resource_id

If you don't:

    SELECT <only the non-comment columns> FROM resource WHERE id = :resource_id

There's no need to get into specifics; the point is that GraphQL is not bringing anything novel. Everything it claims to do can be implemented in REST if you're willing to implement it.


I’m talking about the GraphQL or OAS schema.

With a REST API, the “resource” type will always have an optional “comments” field, whatever the value of arg is.

With a GraphQL API you’ll have the right type depending on your query.


Yep. The OpenAPI spec is not rich enough to represent relations like this, so it's rubbish for FE devs to use to generate models/API clients.

In general, the tooling/spec has thousands of open bugs, isn't going anywhere, moves very slowly, and is missing fundamental features.

https://github.com/OAI/OpenAPI-Specification/issues/1998 is the ticket where it will inevitably be ignored and die. But even if it were supported in the spec, the open source generators wouldn't support it anyway.

Said generators also use "logic-less" templates to generate something that very much needs logic. As an FE dev, I love nothing more than forking an archaic Java-based generator to get the client gen working... not.


If you truly care about that, the truly REST solution would be to define separate media types and use Accept headers to specify which media type is desired.

So if you wanted comments you would send a request with a header like "Accept: application/resource_with_comments", and if you didn't, a header like "Accept: application/resource_without_comments".

The thing is, few people want this level of control in practice. This would generally lead to a huge proliferation of types for very little practical benefit. Much simpler to have a set of well-defined types with optional fields than to define a new type for each combination of fields used today.
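A minimal sketch of that dispatch in Flask; the vendor media types are made up, as in the example above:

```
import json
from flask import Flask, Response, abort, request

app = Flask(__name__)

WITH_COMMENTS = "application/vnd.example.resource-with-comments+json"
WITHOUT_COMMENTS = "application/vnd.example.resource+json"

@app.route("/resources/<int:resource_id>")
def get_resource(resource_id):
    best = request.accept_mimetypes.best_match([WITH_COMMENTS, WITHOUT_COMMENTS])
    if best is None:
        abort(406)  # none of the representations we offer are acceptable
    body = {"id": resource_id, "title": "..."}
    if best == WITH_COMMENTS:
        body["comments"] = []  # load and attach the comments here
    return Response(json.dumps(body), mimetype=best)
```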


What would you do if someone wanted:

- some fields of the resource
- some fields of the comments
- some fields of the user that made the comments


It depends on performance requirements. Either ask them to retrieve the entire objects, or expose a new endpoint to retrieve the filtered list in an optimal way.

API consumers definitely aren't the ones who should be thinking about something like joining behavior, since GraphQL doesn't give nearly enough control to achieve performant joining.

Basic filtering can also be easily achieved with ad-hoc methods, such as query params. No need for a full-blown GraphQL language server in front of all your services.
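For the "some fields of" cases above, the common ad-hoc approach is a `fields` query parameter, loosely following JSON:API's sparse fieldsets. A sketch with made-up routes and data:

```
from flask import Flask, jsonify, request

app = Flask(__name__)

ARTICLES = {1: {"id": 1, "title": "Hello", "body": "...", "author_id": 42}}  # stand-in data

@app.route("/articles/<int:article_id>")
def get_article(article_id):
    article = ARTICLES[article_id]
    # e.g. GET /articles/1?fields=id,title returns only those keys
    fields = request.args.get("fields")
    if fields:
        wanted = set(fields.split(","))
        article = {k: v for k, v in article.items() if k in wanted}
    return jsonify(article)
```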


No need for it, from your perspective, sure. It totally solves a lot of shortcomings your API is forcing on its consumers though.


In general, my consumers don't want to learn my API. If they find shortcomings, they don't want to work around them by learning to query my bad models, they will call and complain that my model is bad and that I have to fix it in the next version or they'll look elsewhere.

At least, this has been our experience with B2B software sales.

Of course, YMMV depending on market etc.


One thing people haven't mentioned is the importance of mutations. Querying is only one side of the story.

Your API should match semantic actions on your client (i.e. the UI), and it needs to do this in such a way that each action happens in one database transaction.

With REST, a new UI requirement that needs to save two objects together across multiple entity types...? GG.


I also wanted to understand why GQL was getting so much attention in a REST vs GraphQL way so I implemented a GQL microservice that had the same features as a service that I had implemented previously as REST. Both services ran on node.js and connected to the same data stores, etc. I ran both services on the same load test lab and documented my findings at https://glennengstrand.info/software/architecture/microservi...

The TL;DR of that blog is this: there isn't much value that GQL can provide over REST + OpenAPI. GQL is still in its infancy, so APM is easier with REST. The biggest argument for GQL is returning the right amount of data, but GQL is slightly slower than REST because the server has to parse what is essentially a mini program with every request.


If you're a front-end developer the advantages are clear because of the much superior tooling. The reason for this is the agreed SDL spec format, which is part and parcel of GQL. REST is just a loose collection of ideas, and OpenAPI is not integral to REST, as there's no single overarching spec or organisation for REST.

In my experience, therefore, the tooling (code gen, doc gen, etc.) around OpenAPI is piss-poor in comparison to GQL. Just look at GraphiQL compared to Postman.


The above-referenced PoC service, based on Express, uses https://github.com/apigee-127/swagger-tools which surfaces a web app by which front-end devs navigate through the published API endpoints. They can use the web app to call the service via the endpoint, or copy and paste sample code that calls the service. Most OpenAPI integrations support this kind of functionality.


From my experience, front-end developers are the last people that want to learn how to query a complex data model. It's usually much preferable if the backend team exposes the required endpoints for any UI operation, since they have much more fine-grained control on the DB and a much clearer intuition on what is easy to do and what isn't.


For comprehensive APIs, GraphQL is often more performant. One request that takes 500ms is far better than five requests that take 200ms each.


You can create a single non-RESTful endpoint to make a single call.


Yes, and by doing so you lose the advantages of a predictable and well-organised API.

GraphQL has first class support for this scenario.


I disagree. You're shifting your "organization" to the frontend, which becomes an utter cesspool of chaos. If anything changes in the backend, you will need to change the frontend as well.

The whole point of an API is to decouple. GraphQL does the exact opposite.


Depends. If all requests are parallel they should hit different instances, which distributes load more efficiently instead of pinning everything to a single instance. You would actually get a response in 200ms, not to mention that your ability to properly size each node is increased. It also gives you a grey area where a response can be partial rather than just pass/fail. As usual, YMMV depending on the use case.


> GraphQL is unashamedly the new SOAP

...may not be the advertising you think it is.


A kinder way to put it might be "SOAP with the benefit of hindsight."


> - Allow filtering, sorting, and pagination: yes, good luck coming up with the right query structure and sticking to it.

I believe that treating RESTful APIs as a smart data store that can do advanced querying is a mistake, or rather, REST is not the proper solution for this use case, and you did well to move on from it. Personally, I've never failed to use REST successfully as long as I was willing to do proper caching, sorting, and filtering in the API consumer. REST is plenty if you need to present collections of things with key-set pagination and proper caching headers.

In the days of HTTP/2+, it's easy to fire 100 concurrent requests at a key-set RESTful API and load all the data that you could ever need. :)
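A sketch of that fan-out with httpx and asyncio; the URLs are placeholders, and HTTP/2 support needs the optional h2 dependency (`pip install httpx[http2]`):

```
import asyncio
import httpx

async def fetch_all(urls):
    # With http2=True the requests are multiplexed over a single connection.
    async with httpx.AsyncClient(http2=True) as client:
        responses = await asyncio.gather(*(client.get(u) for u in urls))
    return [r.json() for r in responses]

urls = [f"https://api.example.com/items/{i}" for i in range(100)]
items = asyncio.run(fetch_all(urls))
```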


What if you need that set of read/write operations to happen in one database transaction?


In my experience it is usually neither required nor desirable to have a distributed transaction where the transaction coordinator is your remote client. It is also habitual behaviour: you are used to ACID, so it must be that ACID is required. I'm currently having that type of discussion at work, where people want atomic consistency across distributed systems just because the mental model is easier. Whenever eventual consistency raises its ugly head, people get worried :)


One big disadvantage of GraphQL is that, in a microservices architecture, it requires a fat centralized GraphQL gateway that understands your whole application in order to dispatch queries to the other microservices. This is much fatter than a reverse proxy with a few URL-based matching rules, and harder to scale as well.


We need a Prettier for REST APIs.


Design is hard.

In the given example we work with an Article domain, where there are Comments.

Fine and totally fair.

But how would you design your object graph from the bottom up?

Articles and Comments is a fairly simple example.

Imagine a Transaction in bank domain context.

Would you have an endpoint like so?

transactions/:id/account/:id

Or perhaps

transactions/:id

account/:id/transactions

Whatever you pick, you must find a way for transactions and accounts to work together at the design level, which will impact how you potentially split your services and how you deploy them.

Is an account and a transaction part of the same domain and service?

It's hard to design proper REST apis because although it's easy to dish out cool looking routes, it might be a lot harder to actually design your systems based on those routes.


REST doesn’t really have the concept of endpoints. That’s a hierarchy some people artificially impose over the top of a REST system. REST doesn’t require it. All of your URLs could be randomised and a REST system would work exactly the same.

You don’t need hierarchy in your URLs. If you want to load a transaction, then /transactions/123 is fine. If you want to load an account, then /accounts/456 is fine. You don’t need to put one under the other.

In a REST API, the primary key for a resource is the URL. If you have the primary key, URL hierarchy is unnecessary. Just use whatever is clearest for a developer to identify at a glance for debugging purposes. It’s not required by the tech.


This is true, and I use top-level URLs for all of my resources in a REST API. You can, however, do a POST to articles/:id/comments and return a Location header pointing to /comments/:id easily!
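A sketch of that pattern in Flask; the routes are illustrative and `save_comment` is a hypothetical persistence helper:

```
from flask import Flask, jsonify, request, url_for

app = Flask(__name__)

@app.route("/comments/<int:comment_id>")
def get_comment(comment_id):
    return jsonify({"id": comment_id})

@app.route("/articles/<int:article_id>/comments", methods=["POST"])
def create_comment(article_id):
    payload = request.get_json()
    comment_id = save_comment(article_id, payload)  # hypothetical persistence helper
    # Created via the nested path, but the canonical location is the top-level resource.
    location = url_for("get_comment", comment_id=comment_id)
    return jsonify({"id": comment_id}), 201, {"Location": location}
```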


Sure, but this is unavoidable complexity for any API designer, whether it's REST or C or SQL. Of course you need to think about how to model the business domain and what entities you will expose, and the relations between them.


It would be /transactions for all transactions, and /accounts/:id/transactions for the transactions of a particular account.


I always found this document by Microsoft very very good

https://docs.microsoft.com/en-us/azure/architecture/best-pra...


If only some teams at Microsoft would read that. Man, some of their APIs are annoying.


Zalando has amazing API guidelines - https://opensource.zalando.com/restful-api-guidelines


I like their 429 guidance: https://opensource.zalando.com/restful-api-guidelines/#153

This has massive implications in real-world systems. Rate limiting adds an order of magnitude of complexity to a client system due to the need for delayed, conditional execution. So many APIs do rate limiting and don't bother to add information about why it is happening or for how long the limit applies. I've been working with a name-brand system that limits based both on the client application and the end user. So, effectively, there can be a 429 because the application as a whole is exhausted, or because a particular user is exhausted. There is no additional information, so the only way to know whether the limit has lapsed is to try again. It makes it incredibly cumbersome to create reliable consumer applications.
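When the server does follow that guidance, the client stays simple; a sketch with requests (the URL is a placeholder, and only the delta-seconds form of Retry-After is handled):

```
import time
import requests

def get_with_backoff(url, max_attempts=5):
    for attempt in range(max_attempts):
        resp = requests.get(url)
        if resp.status_code != 429:
            return resp
        # Honour Retry-After when present; otherwise fall back to exponential backoff.
        # (Retry-After may also be an HTTP-date, which this sketch doesn't parse.)
        delay = float(resp.headers.get("Retry-After", 2 ** attempt))
        time.sleep(delay)
    raise RuntimeError(f"still rate limited after {max_attempts} attempts: {url}")
```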


Just for a point of reference here is Google’s take on the exact same topic. It has a bit more of a gRPC background but still is very REST friendly by default.

https://cloud.google.com/apis/design/resources

Also, one of the most common pushbacks people have with gRPC is that everyone still generally expects JSON in 2021, especially in the browser. Not sure if this is super well known or not, but gRPC integrates well with things like Envoy, which allows you to write a single gRPC API and have it serve and respond to both native gRPC and standard REST/JSON calls.


I had really good success with https://github.com/grpc-ecosystem/grpc-gateway for returning JSON to the web from a gRPC system.


gRPC is also a 300lb gorilla of a framework for communication and has huge problems with versioning if you aren't "running at HEAD".


My original comment might have given the impression that I was some gRPC guru when in reality I’ve only played with it very briefly.

What are the versioning problems with it? I was under the impression that it was actually designed from the start to be fairly flexible with regards to changing API methods etc


Well, I wasn't referring to API versioning, I was referring to the actual lib versioning. All of Google runs basically at HEAD and nearly all of their libraries reflect this. In a distributed system, if you update one of the gRPC library versions, you need to update all of them, otherwise you can run into weird failures.


That might be true, but at least when I was at Google, people were not building and releasing their software to pick up HEAD of libraries they depended on. Some binaries running in production could be quite old and seemed to communicate with newer builds just fine. (I suppose if there was a problem someone would have built a new version, though.)


The real question IMHO is "does the library change impact the protocol itself"? Because last time I checked, gRPC the protocol didn't evolve that fast. Now, gRPC-REST bridges are a different thing...


That’s a skinny gorilla.


API Practices If You Hate Your Customers

https://cacm.acm.org/magazines/2019/12/241052-api-practices-...


Pretty good, though I disagree with their opinion on SOAP ;)

Many REST APIs have problems for the same reason SOAP did: a lack of deliberate API design and abuse of "automagic code generation".


Small typo-ish error in the article, emphasis mine:

> 401 Unauthorized – This means the user isn’t not authorized to access a resource.

Surely that should say that the user isn't not unauthorised?


No, just "isn't authorised" is clear. The triple negative is a bit much.


403 is unauthorized (forbidden due to authorization policy).

401 is unauthenticated (not logged in).

Very important difference for clients handling authentication state.


That is not false, I didn't not fail to notice that after fixating on the incorrect double negative! However, not to be unfair, 401 is unfortunately not named "unauthenticated", even though its conventional usage (as you point out) is not unintended to actually mean lack of authentication rather than of authorisation.


I can't wait to see the day when a guide for best practices for REST API design just contains a single sentence: "Don't write any new REST APIs. Use something like gRPC instead."

The only thing something like gRPC is missing that is very valuable from HTTP REST is having standard error codes corresponding to all the HTTP response codes (2xx, 4xx and 5xx codes)


Using HTTP methods as verbs is a terrible practice.

It works for simple CRUD APIs, but it becomes too limiting once there are more than one way of updating something.

It ties the API design too closely to the data model and usually means that the client has to implement more business logic instead of handling that on the server.

Good API design should decouple business logic from the HTTP spec.


You don’t need more than the HTTP methods as long as you reify requests as resources.

Instead of “doSomeAction on resource” it’s “POST/PUT /resource/actionRequest”.

The trick is to stop trying to make everything work as CRUD on your main entity resources. HTTP resources don't need to map to tables.
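A sketch of the reification idea (names are made up): rather than POST /accounts/123/withdraw, the withdrawal request itself becomes a resource you can create, inspect, and poll.

```
from flask import Flask, jsonify, request, url_for

app = Flask(__name__)

WITHDRAWALS = {}  # stand-in for a real store

@app.route("/accounts/<int:account_id>/withdrawal-requests", methods=["POST"])
def create_withdrawal_request(account_id):
    payload = request.get_json()
    request_id = len(WITHDRAWALS) + 1
    WITHDRAWALS[request_id] = {
        "id": request_id,
        "account_id": account_id,
        "amount": payload["amount"],
        "status": "pending",  # processed later; clients poll the resource for the outcome
    }
    location = url_for("get_withdrawal_request", request_id=request_id)
    return jsonify(WITHDRAWALS[request_id]), 201, {"Location": location}

@app.route("/withdrawal-requests/<int:request_id>")
def get_withdrawal_request(request_id):
    return jsonify(WITHDRAWALS[request_id])
```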


The basic HTTP methods work in the vast majority of cases. The key is to create the appropriate resources--facades that compose the data model and business logic as needed.


No, they don't work for the vast majority of cases. They work only for CRUD, and imposing the resource paradigm on everything looks alien.


By far the most important best practice is to keep the damn thing stable and online. I'd much rather integrate an API that doesn't change but maybe has a few pain points in developing the client than one where the client is a bit easier to develop but the API changes every year.


Looks like a ripoff of this post from June 2019: https://www.merixstudio.com/blog/best-practices-rest-api-dev...


It's a common design style it seems. O'Reilly's REST API Design Rulebook reads similarly...

https://www.oreilly.com/library/view/rest-api-design/9781449...


TFA: Posted 17 August 2015


I've written a comprehensive one on this topic a few years ago:

https://www.vinaysahni.com/best-practices-for-a-pragmatic-re...


Thank you for this. It is my go-to post every time I implement an API service, or even for web development.


I'd recommend against using `v1` in the path and prefer content negotiation:

    GET /resource/1
    Accept-Version: ~1.0.2
    ...
Implementation-wise this ends up with better code (and less code) on the back-end and a reduced number of HTTP routes (which should represent the entities and be maintained).

The pain of REST is that there's very little in terms of "correct" as we don't have a formal specification of the semantics. If you end up using /v1 it's not any more or less REST-ful.
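A sketch of dispatching on that header in Flask (Accept-Version is a convention rather than a standard header, the two handler shapes are made up, and real code would parse the semver range properly):

```
from flask import Flask, abort, jsonify, request

app = Flask(__name__)

def resource_v1(resource_id):
    return {"id": resource_id, "name": "legacy shape"}

def resource_v2(resource_id):
    return {"id": resource_id, "display_name": "new shape"}

HANDLERS = {"1": resource_v1, "2": resource_v2}

@app.route("/resource/<int:resource_id>")
def get_resource(resource_id):
    # Crude matching: strip range operators and keep the major version only.
    requested = request.headers.get("Accept-Version", "2")
    handler = HANDLERS.get(requested.lstrip("~^").split(".")[0])
    if handler is None:
        abort(406)
    return jsonify(handler(resource_id))
```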


I dunno. We started doing something like that at my company (we actually use the plain Accept/Content-Type headers, so "application/vnd.mycompany.service.v1+json") and it's been a giant pain. Routing based on headers sounds cool, but basically no backend OR frontend frameworks support it well. Even if they say they support it, there are bugs or weird traps to fall into. Path versioning is just so much easier for everyone.


I'd recommend the opposite, as with a versioned URL you can very easily route stuff to different machines by simple URL inspection and have a very clean separation between versions.


Unless you're talking about a human dispatching the request (URL bar, or via a tool like curl), there's nothing special about a URL path versus a content header. Either way it's layer-7 routing. What was the advantage again?


Load balancers/reverse proxies have first-class support for routing based on paths, but support for routing based on headers is trickier or not possible, depending on the software.


I think they're useful for different things.

Media-type versioning is useful for supporting different data types/representations of the same resource.

URL versioning is useful for versioning resources themselves and their behavior/business logic.


Great post. Nothing flashy, just walks through basics and principles. I do love the simplicity of REST, and unless you need some graph features I would recommend folks stick with it over GraphQL.


When I looked into implementing GraphQL, it didn’t seem to require the data I was serving be a graph. I found the big difference with REST was that GQL allows the consumer to define the properties returned rather than having backend engineers have to think through every use case. It wasn’t enough of a killer feature to risk the timeline on that project, so my experience is limited.


Backend engineers still need to think through every use case to decide what fields to expose.


If you want a pretty decent REST-ish API right out of the box, and you already have a DB, check out PostgREST[0]. More features than you can shake a stick at, and all you have to do is maintain your DB schema.

[0]: https://postgrest.org/en/v7.0.0/


I like it and it's very fast.

Good luck with scaling and automated testing, though.


From-scratch or from-fixture E2E tests to prevent regressions should still be doable though, just gotta bring along your language of choice, and keep a repo around to hold the code/data.


With the exception of URL naming (no rule, which is worse than any rule), error responses (a bit ambiguous), and general verbosity, I find that the jsonapi.org spec can serve as a relatively sane default.

Although I would love for browsers to start natively supporting some sort of schema-based, explicitly typed binary format.


This is a very limited guideline. Nothing really new here; all these topics have been covered elsewhere for a decade.

As usual only CRUD counts: no mention of operations, PATCH, links, aggregation of calls, ways to document the API, token duration, rate limits, etc.

Adequate for a todo list, maybe.


I've also found Google's AIPs (https://google.aip.dev/general) to be pretty useful when designing REST-based APIs.


A question I’ve always had about 404 and APIs...

How do you distinguish between an invalid path (no end point) and a valid path requesting a resource that doesn’t exist?


You can distinguish between them in your response body. A simple error response body for a JSON-oriented REST API is often something like a JSON object with a human-readable error message and a unique code for that type of error, e.g. {"message": "No such route", "code": "ad01fd04-8e9e-4326-9576-3d479fa20637"}


This is where error response messages should come into play. Returning a JSON object with a "message" attribute containing "No such route/path" vs "Object not found" depending on the context of the 404 makes it immediately clear where the issue is.
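A sketch of wiring that up in Flask so the two 404s carry different bodies; codes and messages are illustrative:

```
from flask import Flask, jsonify

app = Flask(__name__)

CUSTOMERS = {}  # stand-in for the real lookup

@app.errorhandler(404)
def no_such_route(error):
    # Hit when the path doesn't match any registered endpoint at all.
    return jsonify({"code": "NO_SUCH_ROUTE", "message": "No such route"}), 404

@app.route("/customers/<int:customer_id>")
def get_customer(customer_id):
    customer = CUSTOMERS.get(customer_id)
    if customer is None:
        # Valid route, but the resource itself doesn't exist.
        return jsonify({"code": "CUSTOMER_NOT_FOUND", "message": "Object not found"}), 404
    return jsonify(customer)
```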


I'd propose either 421 Misdirected Request or 400 Bad Request, both with a more detailed error message in the body, when the URI path is incorrect and doesn't map to anything, and 404 Not Found for valid requests for things that don't exist.


From a client application perspective, would you want to handle the cases differently?


Of course. Hitting a path without a handler is a bug; a resource that doesn't exist is a normal thing.


The situation you describe is not really REST. In REST, URLs are opaque; they have no structure for clients to understand. So, clients don't make up new URLs in order to determine whether resources exist. They only use URLs that they've already been given. If they haven't received it, they don't care whether it exists. This is called "Hypermedia as the Engine of Application State".


If "doesn't exist" means it has been removed: 410 Gone. If "doesn't exist" means it was never created: 404 Not Found.

I can't think of other types of not existing, or why you'd need to differentiate between them.


Say I have an API with a path of /customer/<id>.

I call /customer/235235 and customer 235235 doesn't exist. That's a 404, resource not found.

But say I make an uncaught error with the path and call /cutsomer/235235. That also is a 404, resource not found.

It really depends on what you want the word "resource" to mean.

It gets a bit more complex if you have a simplistic website with an API service both on the same server. For your users, you want a nice 404 page for mistyped or dead links. But you don't want your API also triggering that same page.

Beyond actually splitting your main web content and API into two servers, I could see returning a 400 Bad Request response for the API to avoid triggering a 404 handler. After all, if you give it a customer ID in the above case that doesn't exist, it is essentially a "bad input parameter", and the error message can indicate which parameter (the customer ID) and why it's a problem (there is no such customer).


I've seen this type of reply in previous posts on this thread, and I am baffled that people seem to let their downstream users be much closer to the API than I think is safe.

To answer the specific question: /cutsomer/123 should never be hit in a production application unless your user actually inputs it, so treating it differently is just overhead that gives no benefit.

Also, for the "both on the same server" scenario, you can make a pretty good guess as to whether a user agent is a browser or an API client, so you can present the right output to each. The Accept header is the simplest method I can think of.


> It really depends on what you want the word "resource" to mean.

I once worked with an old API, and when a pentest team ran a scanner against it they reported hundreds of false positives for backdoors and malicious code, because the scanner expected a 400 error rather than a 200 with a success=false response body.


valid path: maybe return 200 (the request is ok), but return an empty object.

invalid path: 404


You don't


I can just see some developer going nuts and pulling out their hair because they can't figure out why a call to query a customer keeps saying the customer doesn't exist when clearly it is in the database. Until they realize they gave the URL path as "/cutsomer" instead of "/customer" in the code.

