
Best Practices for Designing a Pragmatic RESTful API - ohjeez
http://www.vinaysahni.com/best-practices-for-a-pragmatic-restful-api
======
buro9
I think I've said this before, but the caching advice:

> HTTP provides a built-in caching framework!

Conflicts with the authentication advice:

> each request should come with some sort of authentication credentials

One of those gotchas in API design and implementation is how to provide good
performance against authenticated resources.

My personal take on this is to _not_ follow the standard HTTP caching patterns
as this encourages the cache to be external - browser, proxy - to your
authentication.

Instead I choose to cache behind my authentication layer, and this either
means:

1. I use a plugin to Nginx or Varnish to call the authentication check before
serving a cached resource.

or

2. I put my resources in a memory cache within the application.

The former has the advantage of being as HTTP pure as possible, but the
disadvantage of maintaining the server extension/plugin.

The latter has the advantage of code maintainability (just the stuff in your
app) but the disadvantage of not using existing tools for the caching.

Either way, if you are mixing authentication and caching the rule should
always be that you cache behind your authentication layer and everything that
passes through the authentication layer is explicitly marked as not-cacheable.
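To make the second option concrete, here's a minimal Python sketch of caching behind the authentication layer. `authenticate()` and `fetch_resource()` are stand-in stubs, not any real framework's API:

```python
import time

# A minimal sketch of option 2 above: an in-process cache that is only
# consulted *after* authentication succeeds. authenticate() and
# fetch_resource() are stand-in stubs, not any real framework's API.

def authenticate(credentials):
    # stub: accept a single hard-coded token
    return "alice" if credentials == "secret-token" else None

def fetch_resource(path):
    # stub for the expensive backend call
    return {"path": path, "data": "..."}

CACHE = {}          # path -> (expires_at, body)
CACHE_TTL = 60      # seconds

def handle_request(path, credentials):
    user = authenticate(credentials)
    if user is None:
        return 401, {}, None              # failed auth never touches the cache

    entry = CACHE.get(path)
    if entry and entry[0] > time.time():
        body = entry[1]                   # cache hit, behind the auth layer
    else:
        body = fetch_resource(path)
        CACHE[path] = (time.time() + CACHE_TTL, body)

    # everything that passed through auth is explicitly marked not-cacheable
    # so browsers/proxies in front never store it
    return 200, {"Cache-Control": "private, no-store"}, body
```

The point is the ordering: the cache lookup lives strictly after the auth check, and the response headers tell everything in front of you not to cache.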

------
johns
Previous discussions/submissions:
[https://www.hnsearch.com/search#request/submissions&q=vinays...](https://www.hnsearch.com/search#request/submissions&q=vinaysahni.com&start=0)

------
georgecalm
Awesome points; didn't know about the Link header (RFC 5988).

Regarding pagination: a few years ago I switched to using "skip" and "limit",
instead of "page" and "per_page" or a variation of the two. Instead of
thinking about pages (an abstract idea) it forces me to think about the
concrete list of objects being requested from the resource, how many of those
I want and how many to skip from the start; making it easier to reason about
pagination (at least for me).

~~~
veesahni
(original author here)

I thought about this one when writing the post. Although I think the
differences between the two are marginal, my conclusion was that it's
"simpler" to increment a page number by 1 in code than to increment by the
page size.

~~~
kbenson
Dealing with pages can get trickier if you allow the user to change the
number displayed per page when not on the first page. It's not particularly
hard to figure out, but what the right thing to do is when the boundaries
don't line up is a question that needs to be answered, as is dealing with
user confusion about what page they are then on. Dealing with number ranges
instead of pages solves these problems, at the cost of some readability.
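For what it's worth, the conversion between the two schemes (and the boundary problem above) fits in a few lines of Python; the function names here are just illustrative:

```python
# 1-indexed page/per_page on one side, skip/limit on the other;
# function names are just illustrative.

def page_to_skip(page, per_page):
    """1-indexed page -> (skip, limit)."""
    return (page - 1) * per_page, per_page

def skip_to_page(skip, per_page):
    """Offset -> 1-indexed page; only exact when skip % per_page == 0."""
    return skip // per_page + 1

skip, limit = page_to_skip(3, 25)   # viewing page 3 at 25 per page -> skip 50

# the user now switches to 40 per page: offset 50 falls mid-page, so no
# page number starts exactly where they were -- the boundary question above
page = skip_to_page(50, 40)         # page 2, which starts at offset 40
```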

------
h2s
This is a great resource. Feels like the next step after Matt Gemmell's post
"API Design" ([http://mattgemmell.com/2012/05/24/api-
design/](http://mattgemmell.com/2012/05/24/api-design/)) where he called APIs
"UX for developers".

------
klapinat0r
A topic I rarely see (at least in depth, or in discussion) in stories/blogs
about APIs is ID generation, i.e. best practices for creating access_tokens
and resource IDs, and especially handling collisions.

The Flickr blog post [1], which featured a simple implementation of a ticket
server, is so far one of the easiest and most secure to use.

Personally I don't feel comfortable using "hash of i++", "hash of ...", "i++",
as they all "fall apart" when you need IDs of different specifications (e.g.
an 8-char ID, a 24-char token ID).

Anyone have two cents on this? I have a difficult time imagining that larger
APIs just "live with" the chance of collision (even though it can be very low)
- they must mitigate it somehow, right? Combined probability (such as the
nonce check and tokens on OAuth 1 requests)?

Actual ticket servers (e.g. [2], [3]) either introduce added complexity (and
possibly latency) or need a secondary code base. Regardless, _the issue of
scaling and latency is rarely touched upon_, so if anyone has some input here,
I'd greatly appreciate it.

EDIT: I've found two interesting links on the problem, [4] [5] (snippeted
here [6]).

[1] [http://code.flickr.net/2010/02/08/ticket-servers-
distributed...](http://code.flickr.net/2010/02/08/ticket-servers-distributed-
unique-primary-keys-on-the-cheap/)

[2]
[https://github.com/twitter/snowflake](https://github.com/twitter/snowflake)

[3] [https://github.com/boundary/flake](https://github.com/boundary/flake)

[4] [http://boundary.com/blog/2012/01/12/flake-a-
decentralized-k-...](http://boundary.com/blog/2012/01/12/flake-a-
decentralized-k-ordered-unique-id-generator-in-erlang/)

[5] [https://blog.twitter.com/2010/announcing-
snowflake](https://blog.twitter.com/2010/announcing-snowflake)

[6]
[https://github.com/antirez/redis/pull/295#issuecomment-46734...](https://github.com/antirez/redis/pull/295#issuecomment-4673460)
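For a rough feel of the collision odds mentioned above, the birthday bound gives a quick back-of-the-envelope estimate:

```python
import math

# Birthday bound: probability of at least one collision when drawing
# n_ids uniformly at random from a space of id_space possibilities.
def collision_probability(n_ids, id_space):
    return 1 - math.exp(-n_ids ** 2 / (2 * id_space))

space = 16 ** 8                                   # 8 hex chars ~ 4.3 billion IDs
p_small = collision_probability(10_000, space)    # already ~1.2%
p_large = collision_probability(100_000, space)   # ~69%
```

Even a 4-billion-ID space has a roughly 1% collision chance after only 10,000 random draws, which is presumably why the ticket-server and flake-style schemes above coordinate ID allocation rather than relying on randomness alone.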

~~~
veesahni
I like how MongoDB does this:

[http://docs.mongodb.org/manual/reference/object-
id/](http://docs.mongodb.org/manual/reference/object-id/)

They start with a timestamp and are fully distributed. In the official Linux
client library, the "3 byte machine id" is the first 3 bytes of the md5 hash
of the hostname. As long as you can guarantee uniqueness there, there are no
collisions.

For a large API, one could generate their own machine IDs and provide a
strong guarantee of uniqueness.

The one criticism of MongoDB's approach is that the ID is 12 bytes and
doesn't fit very well in the column types offered by other DBs.
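Roughly, the layout can be sketched in Python like this. This is an illustrative approximation of the scheme described above, not the official driver code:

```python
import hashlib, itertools, os, socket, struct, time

# An illustrative approximation of the 12-byte layout described above
# (4-byte timestamp | 3-byte machine id | 2-byte pid | 3-byte counter);
# not the official driver code.

_machine = hashlib.md5(socket.gethostname().encode()).digest()[:3]
_counter = itertools.count(os.urandom(1)[0])   # randomly seeded counter

def object_id():
    ts = struct.pack(">I", int(time.time()))                 # seconds since epoch
    pid = struct.pack(">H", os.getpid() & 0xFFFF)
    cnt = struct.pack(">I", next(_counter) % 0x1000000)[1:]  # low 3 bytes
    return (ts + _machine + pid + cnt).hex()

oid = object_id()   # 24 hex chars, roughly sortable by creation time
```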

------
Keats
This is probably my favourite talk on the subject:
[http://www.stormpath.com/blog/designing-rest-json-
apis](http://www.stormpath.com/blog/designing-rest-json-apis)

------
SEJeff
This too is a must-read for REST API builders, from one of the founders of
the Django project: [http://jacobian.org/writing/rest-worst-
practices/](http://jacobian.org/writing/rest-worst-practices/)

------
kh_hk
My opinionated design principle for RESTful APIs is to keep in mind that REST
is just a style guide of good practices and patterns; as in, do not apply it
blindly and just use whatever makes sense.

From there, I have conflicting emotions with this:

    
    
        Although the web generally works on HATEOAS type principles (where we go 
        to a website's front page and follow links based on what we see on the 
        page), I don't think we're ready for HATEOAS on APIs just yet.
    

I do not think "being ready" is an argument to use or not use HATEOAS. Sure,
HATEOAS as defined by Roy Fielding contains a lot of clutter, but there's
nothing bad if it makes sense to assume a client that does not know how to
construct URLs. Generally I find it nice to have something similar to:

    
    
        GET /fooes

        {
            "fooes": [
                {
                    "bar": {
                        "some": "baz"
                    },
                    "name": "The Mighty Foo",
                    "id": "the-mighty-foo",
                    "href": "/fooes/the-mighty-foo"
                },
                ...
            ]
        }

~~~
h2s
There's a sliding scale of HATEOAS-ness. I've seen a much more extreme form of
it described elsewhere. Mark Seemann ([http://blog.ploeh.dk/2013/05/01/rest-
lesson-learned-avoid-ha...](http://blog.ploeh.dk/2013/05/01/rest-lesson-
learned-avoid-hackable-urls/)) describes an approach that encrypts all URLs so
that clients have _no choice_ but to follow links because generating URLs from
templates is impossible.

So instead of

    
    
        http://foo.ploeh.dk/customers/1234/orders
    

You'd have

    
    
        http://foo.ploeh.dk/DC884298C70C41798ABE9052DC69CAEE
    

And you'd have to get that URL out of the response to another request you'd
already made. I never liked this idea, and the linked article provides a
really strong counterargument to this approach to HATEOAS.

    
    
        When browsing a website, decisions on what links will be
        clicked are made at run time. However, with an API, decisions
        as to what requests will be sent are made when the API
        integration code is written, not at run time. Could the
        decisions be deferred to run time? Sure, however, there isn't
        much to gain going down that route as code would still not be
        able to handle significant API changes without breaking.

~~~
kh_hk
About the counterargument: yes, it does raise a good point (in fact, one that
I skimmed through). Especially about changes and breaking. For instance, in
the following document, what is there to change in the href attribute?

    
    
        GET /fooes
        ...
            "foo": {
                "id": "the-mighty-foo",
                "href": "/fooes/the-mighty-foo",
                ...
            },
        ...
    

Surely, not much. Either /fooes/ changes to /foos/, in which case the call to
/fooes is broken anyway, or the id has changed, which needs to be handled with
an HTTP 301 anyway. So no, there's not much to gain.

But then again, I was just saying that the href attribute might make sense in
some cases, with not much extra effort. Nobody is asking the clients to
consume the href attribute, it just makes understanding the whole API easier.

As for images, what would make more sense: to add an endpoint /images/10/view
or to directly use an href attribute pointing to
[http://s.foo.bar/1337.png](http://s.foo.bar/1337.png)?

My point is, in some cases it makes sense, so just use it when it does. Just
as one should not be encrypting URLs just because Mark Seemann feels it is
within the holy grail of the Level 3 API as defined by Leonard Richardson [1],
we should not avoid making our APIs navigable.

[1]: Don't get me wrong. What they say might make sense in non-trivial APIs;
as always, REST is not a specification.

~~~
smizell
I think we spend too much time trying to craft beautiful, descriptive URLs,
where we should be spending our time crafting beautiful, descriptive
responses. We try to make the URL describe to the developer what is being
returned, when in reality, this should be dictated by media types and link
relations.

Take for instance the example above. Whether the URL is /images/10/view or
/images/10.png, the developer has no idea what kind of resource that is
without media types or link relations (the file extension != the media
type). In other words, you could use both URLs to return the same exact
resource, and the client shouldn't care. You could even change back and forth
between those URLs and the client should never flinch.

This is actually a good example. Say you are serving a static PNG file at
/images/10.png, linked in your server response (i.e. in some href like above).
Say you want to change things up to do some type of logging on how many times
the image has been viewed, so you write some code, create a response at
/images/10/view, log views, and return the image with an image/png media type.
You just did a really big API change and the client didn't have to be changed.

Now if your client was dumber, and was crafting URLs as /images/{id}.png, you
would break all of your clients if you did the above changes. This of course
is really just the tip of the iceberg of what flexibility HATEOAS can bring.
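A toy sketch of that difference, with `fetch()` standing in for an HTTP client against a hypothetical stub "server":

```python
# fetch() is a stand-in for an HTTP GET against a stub "server" where the
# image moved from /images/10.png to /images/10/view, and the link in the
# parent resource was updated to match.

def fetch(url):
    resources = {
        "/foos/1": {"id": 1, "image": {"href": "/images/10/view"}},
        "/images/10/view": b"\x89PNG...",
    }
    return resources[url]   # KeyError stands in for a 404

# URL-building client: hard-codes a template, breaks after the move
try:
    fetch("/images/{id}.png".format(id=10))
except KeyError:
    pass   # 404: the template no longer matches reality

# link-following client: reads the href from the response, keeps working
foo = fetch("/foos/1")
image = fetch(foo["image"]["href"])
```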

------
gbates
This feels like a good place to get some opinions out there about an app I am
working on.

Some team members are advocating a very dogmatic approach to our eventually
public facing REST API design where each REST end point of /foo results in a
response of a list of some ids; each url can then be hit to fully populate the
list.

eg

/foo returns something like [1,2,3,4]

I then have to do 4 more http requests to /foo/1, /foo/2, /foo/3 and /foo/4.

Further, if a /foo/1 model contains, for example, a user, then /foo/1 would
additionally contain a userid of e.g. 101.

I would then have to do a /user/101 and ensure the foo model was populated.

This can mean a simple list page with 10 on-screen list items can easily
create 1 + 10 + 10 = 21 HTTP requests at a minimum. This is starting to add a
lot of code complexity to managing the lists for sorting/filtering/updating
purposes and managing the async requests in our AngularJS application.

The way I have worked before is to build a single end point of /foo with a
paginator offset, and then that would return the entire object in an HTTP
request, eg

[ {fooid: 1, user: {id: 101, name: 'bob'}, orders: [{id: A}, {id: B}]},
{fooid: 2, user: {id: 102, name: 'alison'}, orders: [{id: C}, {id: D}]} ]

This approach is less RESTful, but presents fewer issues when writing the API
consuming code and is much more responsive from a latency point of view.

What does HN think about these contrasting approaches?
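To put numbers on it, here is a toy count of the round trips under each style; `get()` is a counting stub, not a real HTTP client:

```python
# get() is a counting stub, not a real HTTP client; the point is only
# the number of round trips each style forces.

requests_made = 0

def get(url):
    global requests_made
    requests_made += 1
    return {}   # a real client would parse the JSON body here

# id-list style: 1 list request, then one per foo plus one per user
get("/foo")                        # -> [1, 2, ..., 10]
for fid in range(1, 11):
    get("/foo/%d" % fid)           # populate each foo
    get("/user/%d" % (100 + fid))  # populate its user
id_list_cost = requests_made       # 1 + 10 + 10 = 21

# embedded style: one request returns fully populated objects
requests_made = 0
get("/foo?offset=0")
embedded_cost = requests_made      # 1
```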

~~~
icebraining
I don't see why the latter would be any less RESTful than the former; as long
as the "foos" still have their canonical URL, I don't see any constraint being
bent or broken by having their representations sent in another
request/response.

In fact, I'd say the latter approach is actually more RESTful, since the
former requires the client to build URLs, which breaks if the server changes
them.

~~~
tasomaniac
+1

------
the1
* document your media type: [http://www.iana.org/assignments/media-types](http://www.iana.org/assignments/media-types)

* document link relations: [http://www.iana.org/assignments/link-relations/link-relation...](http://www.iana.org/assignments/link-relations/link-relations.xhtml)

the rest is usually using HTTP properly.

------
shirro
I stumbled upon this post recently and I love it. It summarises very nicely my
own feelings.

I went down the path of exploring HATEOAS and Collection+JSON and it all
started to feel like enterprise-architecture-style overcomplication for very
little benefit.

~~~
smizell
I believe it's actually the other way around: HATEOAS is the least
"enterprisey" of the options. When I think of enterprise or over-complication,
I think of a system where logic is so deeply embedded in the server and client
that if the client or server changes in any way, everything breaks. REST was
designed to better handle changes between a client and server. For instance,
if you built your API to use something like Collection+JSON, then it could be
instantly read and understood by any Collection+JSON client (much like in the
way the HAL browser works [1]). You could change things, move things, even
redo URLs, and the client will still work.

If, on the other hand, you don't conform and use HATEOAS, you have to go look
at the documentation on how to build the URL to make a call to the API. Your
client can never explore an API, but must be built to make certain calls to
certain URLs that are built a certain way.

True REST with HATEOAS was what HTTP 1.1 was built around. It may take some
extra work to implement, but it was designed to make communication over the
web simpler (and it's worth the trouble). Unfortunately, this article is not
really endorsing what REST is, but rather a very common RPC that has been
popularized by Rails and labeled as REST.

[1]
[http://haltalk.herokuapp.com/explorer/browser.html#/](http://haltalk.herokuapp.com/explorer/browser.html#/)

------
ajtaylor
Great ideas and tips. As I read through it, I found myself mentally saying
"yep, did that" for most of the sections. One thing I disagree with was what
to do when a new resource is created. I think the Location header is a great
idea (which I'm not doing) but I also think returning the new entity in the
body is a good thing in that it saves another request.

The new tips for me were the Link: header for pagination and using 422
Unprocessable Entity for validation errors. How did I overlook 422 before?
I've been using 400 for validation problems, which I wasn't very happy with
but didn't know what else to use.
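The create flow I'm describing can be sketched roughly like this; the `(status, headers, body)` handler shape is illustrative, not any particular framework's API:

```python
import json

# Create handler sketch: 201 with a Location header *and* the entity in
# the body (saving a follow-up GET), 422 for validation failures. The
# (status, headers, body) shape is illustrative, not any framework's API.

def create_widget(payload, next_id):
    errors = {}
    if not payload.get("name"):
        errors["name"] = "is required"
    if errors:
        return 422, {}, json.dumps({"errors": errors})

    widget = {"id": next_id, "name": payload["name"]}
    return 201, {"Location": "/widgets/%d" % next_id}, json.dumps(widget)

status, headers, body = create_widget({"name": "sprocket"}, 7)
# -> 201, Location: /widgets/7, body carries the full new entity
```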

------
jimfuller
Of course, 'don't use XML'; unless your data mainly looks like documents,
needs to handle mixed content, needs to represent richer types, and/or you
already have a robust, full-featured, stable XML toolchain.

~~~
e12e
XML gets a lot of flak, and I think part of it is that "enterprise" solutions
have misused and abused it so much (and part of it might be that people
started out with only DTDs for specifying schemas).

Today, I think it's obvious that XML shouldn't ever really be written by a
human; it should be used for machine-generated interchange of data, much as
with HTML -- markdown and rst are reasonable languages to write in, while HTML
"requires" a GUI/WYSIWYG editor -- just as it really isn't pleasant to work
with e.g. XML config files directly. Or Ant scripts. It just isn't. Yes, you
can hide the ugly with a smart editor, but it's still pretty ugly.

Now, html might be pleasant compared to writing in Postscript -- but it's not
really pleasant to write (hence things like sparkup).

~~~
ams6110
The strength of XML is XSL: the ability to declaratively transform the
content into another form. I'm not aware of any way to do this with JSON other
than by hand-rolling code.

~~~
e12e
I'm not entirely certain that is a plus... Is writing a compiler/translator in
xsl really more pleasant/easier to maintain than writing it in a "proper"
language?

That said, I really enjoyed: [http://www.amazon.com/Program-Generators-Java-
Craig-Cleavela...](http://www.amazon.com/Program-Generators-Java-Craig-
Cleaveland/dp/0130258784)

edit: for example: you _could_ transform an ant build.xml file -- but _should_
you?

------
cientifico
Following your examples, one way of searching is by issuing a GET to
/search?params. The problem with this is that it is easy to reach the maximum
size of the URL. For example, if you want the data of your Facebook friends
you would issue a GET /data?user[:fb_id]=123&user[:fb_id]=534...

There is a solution for this that is less ugly than using POST: send the data
in the body of the GET request. This is not forbidden by any RFC; it's just
not so common. Most client libraries support it.

~~~
icebraining
Why not:

    
    
      > POST /search
      > #query data
      < 201 CREATED
      < Location: /search/2324242
      > GET /search/2324242
      < 200 OK
      > GET /search/2324242?page=2
      < 200 OK
    

(hopefully the representation would be an implementation of a standard paged
format, like RFC 5005, but I'm not holding my breath)

~~~
cientifico
Because if you are not changing anything on the server, you should not use
POST requests.

------
pjwerneck
That's Best Practices for Designing an HTTP API. Of 22 points, the author
deals with HTTP implementation details in 14, while REST is supposed to be
protocol agnostic and leverage the standard. In 6 points he goes directly
against REST principles, and in 2 other points he directly contradicts them.

Sorry, but if you follow his guide, you won't have a RESTful API at all. I
don't understand how someone can sell that as a "best practices" manual.

------
ozh
Another interesting resource: a list of best practices at
[https://blog.apigee.com/taglist/restful?page=5](https://blog.apigee.com/taglist/restful?page=5),
with comparisons of big names such as Twitter and Facebook.

------
giergirey
One thing the (otherwise thorough) article doesn't mention is whether to
provide a machine-readable schema for the API or not. Some integrators do seem
fond of being able to generate stub client code automatically ...

------
baruch
Is there a good way to declare a REST API and automatically have a CLI
created from it? Having a simple GUI would also be great, but that seems much
less likely.

~~~
veesahni
We don't have enough standardization for that yet.

Old enterprisey APIs had things like WSDL [1], which made auto-generated
clients possible, but added their own overhead and complexity. REST APIs tend
to be much simpler and don't have anything like this at the moment.

[1] -
[http://en.wikipedia.org/wiki/Web_Services_Description_Langua...](http://en.wikipedia.org/wiki/Web_Services_Description_Language)

------
Keyframe
Honest question. Do people actually use PATCH and for what?

~~~
rglullis
In a standard CRUD application, PATCH _should_ be used for "U"pdates, so you
should be able to change attributes of a resource without having to send all
of its parameters. It gets a little confusing when you consider that most
people use POST to create resources and PUT to do updates (partial or not),
leaving PATCH without much practical use.

So, if you want HTTP purity, think of PATCH as "the verb to use for partial
updates without having to send all parameters in the form".

------
elwell
Umm... this has been in the top Google results for "api best practices" for
months. I use it all the time.

