Dropbox starts using POST, and why this is poor API design (evertpot.com)
154 points by treve on Mar 2, 2015 | 48 comments

So the solution for DropBox to overcome their size limitations with GET is to:

- Use a new REPORT verb that will confuse API developers who've never heard of it (and is likely unsupported by existing HTTP middleware/libraries/frameworks/etc)

- Increase latency by decoupling creating the query from executing it, which doubles the number of requests, forces storage of every request permutation, and means you can no longer tell what a query does by looking at the HTTP request in isolation.

Basically use "proper solutions" irrespective of worse Developer UX and/or end-user experience which is ok because "HTTP Commandments", and not accepting these compromises is poor API design!

Love the "1. Don't break the web" moniker that's thrown around in place of explaining exactly how it will break the web. SOAP and other protocols have been using POST exclusively for over a decade without breaking anything. I've seen a number of successful companies using POST for posting complex queries over the years, e.g. ESRI's ArcIMS requests, AWS for complex queries, and many RPC solutions using HTTP.

Obviously using GET for queries is preferred, but when the QueryString reaches 2kb-8kb it becomes less useful to cache (large key size/entropy) and less human readable, at which point using POST is a perfectly acceptable compromise.
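
A minimal sketch of that compromise, assuming a 2048-byte cutoff (the low end of the 2kb-8kb range above). The function name `choose_method` and the threshold constant are made up for illustration; nothing here is Dropbox's actual logic.

```python
from urllib.parse import urlencode

# Assumed cutoff for illustration only; pick whatever your proxies and servers tolerate.
MAX_QUERY_BYTES = 2048


def choose_method(base_url, params, max_query_bytes=MAX_QUERY_BYTES):
    """Return (method, url, body), preferring GET and falling back to POST."""
    query = urlencode(params)
    if len(query.encode("utf-8")) <= max_query_bytes:
        # Small query: keep it in the URL so it stays linkable and cacheable.
        return "GET", f"{base_url}?{query}", None
    # Large query: move the same encoded parameters into the request body.
    return "POST", base_url, query.encode("utf-8")
```

Small queries stay linkable and cacheable as GETs; only the oversized ones pay the POST cost.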

That's the thing: POST is "safe". POST is protected because it assumes it (1) will change data and (2) is not idempotent. Its results will not be cached, and it will not be re-issued. If you are designing your own logic on top of it (a la SOAP), and your application logic on top of that, POST is almost the perfect solution. While I really like my REST, I see why you'd use POST in this case.

> POST is protected because it assumes it (1) will change data

That is not correct. That's a community assumption that you're parroting. I have seen and implemented plenty of endpoints where this is not true. There is no technical implementation barring a POST from any particular technical action.

Author here. I definitely understand the thought process that went behind dropbox's decision, and which arguments got the most weight. It must have been pragmatic and valid design within the design considerations they valued the most.

The other suggestions I gave will for some people be largely academic. But for the people that do strive to build a more RESTful service and do see the point of doing so, or for the people that want something that takes better advantage of the standards, I wanted to present a few alternative ideas.

Whether those are right for you is highly dependent on the situation.

What I meant with "don't break the web" comes down to the fact that SOAP, RPC or whatever take advantage of some of the technologies that came with the web, but forgo the ability to, for example, link to the data that these services provide. SOAP uses HTTP as a transport mechanism, but I wouldn't really call it part of the web.

Your headline of "...this is poor API Design" is clearly not written as here's a "REST-ful alternative to using POST for large queries".

There's a strong case for taking advantage of every technology at your disposal to deliver the most compelling value and end-user experience possible, as is the trait/focus of successful companies (a la DropBox).

And how exactly is REPORT better for linkability? Why is being able to link an 8kb QueryString more important, and worth sacrificing all other end-user features/qualities for? And what are other examples of "breaking the web"?

REPORT definitely still has the linking problem. REPORT only solves the issue that POST is unsafe/non-idempotent. This is just a marginal improvement, and would not qualify as restful.

The 'creating a query resource' would be an option for those that do want a linkable resource.

Why linking is important? I don't think I can quickly sum that up in a comment, or properly do it justice. If you're genuinely interested I would be happy to look for and suggest a few resources that do a much better job than I ever could.

Regardless, I don't believe that HATEOAS/RESTful design is the be-all and end-all. I think certain web APIs are particularly well-suited for it, but it's a type of design that is not appropriate for every problem.

Oh man, I love ServiceStack and have built some incredible APIs on it. Just wanted to say thanks :)

This is exactly the kind of design that creates overly complex software. It's design for the sake of purity rather than pragmatism. Some guy like Roy Fielding proclaims the "right" way to do things and developers end up doing backflips to accept the dogma. Dropbox is correct in this case, and this would make a good interview question to see what kind of engineers you are about to hire and what the long term impact will be on your code base.

TBF if API developers get confused with a newish HTTP method, they're not very resilient. Said HTTP middleware can be adjusted easily, the methods are simple enough usually.

It's not poor. Using REPORT will confuse the API users a lot more than using POST. Not many know what REPORT indicates and it probably will confuse a lot of proxies, and clients.

REST sticklers need not take REST as a bible.

> Using REPORT will confuse the API users a lot more than using POST.

Yeah, this has been discussed a bit in the thread on the Dropbox explanation [0] for the decision, which this thread seems destined to rehash, but REPORT is a not-generally-supported-outside-of-WebDAV WebDAV-specific method whose specification includes WebDAV-specific behavior that other applications probably don't want; it's not a generic GET-with-semantically-meaningful-body method.

HTTP should have such a method and, in its absence in existing spec, it makes sense for one to be proposed via RFC as a general-purpose extension (much as happened for PATCH).

OTOH, REPORT, while perhaps the closest thing in any existing RFC to that method, is not that method.

EDIT: It might make sense, however, for the new method to be called "REPORT", with a spec that generalizes the existing REPORT method in a way that drops the WebDAV specific behavior so that the existing WebDAV version is a special case: rather than a representation of the resource, REPORT asks for a report about the resource, and the specifications of the report are in the body. There's a lot of good ideas in WebDAV and its extensions tied up behind specifications that are too context-specific for general use isolated from the rest of WebDAV.

> REST sticklers need not take REST as a bible.

Taking REST as a bible may be a problem, but it's not the problem being experienced when people are suggesting using REPORT for a generic GET-with-a-body.

[0] https://news.ycombinator.com/item?id=9133469

We use POST when we need to because it's an understood solution. I really wish we had a better option. There are so many tools we can't use without modifications. Some examples:

Varnish. POST responses do not have Cache-Control, ETag, or Last-Modified. Nothing's stopping you from adding them, but without guarantees no general-purpose tool is going to look at them. For our larger datasets, serving from a fast cache is going to make a world of difference.

Shadow. https://github.com/twilio/shadow Twilio built this out for their billing system as they migrated to a new platform. It only works with idempotent requests (basically everything but POST). It executes the request on both the new system and the old system and compares the output.

The proposed solution here is one I've pondered quite a bit. Making all large queries into two requests would help a lot. The first request is a PUT and returns a URI to a new resource. The second request is a GET.

For large queries with significant payloads, two queries that result in a 304 is well worth the cost. The point of a 304 is to reduce data transfer (as well as cost of generating a response). The new resource URI does not need to be stored indefinitely. It's a time sensitive URI.
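
A rough in-memory sketch of that two-request flow, with no real HTTP involved. Everything named here (`QueryStore`, `put_query`, `get_results`, the `/queries/` path, the `paths` field) is hypothetical, and expiry of the time-sensitive URI is left out to keep it short.

```python
import hashlib
import json


class QueryStore:
    """In-memory stand-in for the server side of the two-request pattern."""

    def __init__(self, dataset):
        self.dataset = dataset
        self.queries = {}  # query id -> stored query parameters

    def put_query(self, query):
        # Request 1 (PUT): store the query and hand back its URI.
        # The id is a hash of the canonical query, so repeating the PUT is idempotent.
        canonical = json.dumps(query, sort_keys=True)
        query_id = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]
        self.queries[query_id] = query
        return f"/queries/{query_id}"

    def get_results(self, uri, if_none_match=None):
        # Request 2 (GET): run the stored query; returns (status, etag, body).
        query = self.queries.get(uri.rsplit("/", 1)[-1])
        if query is None:
            return 404, None, None
        wanted = set(query["paths"])
        body = json.dumps([item for item in self.dataset if item["path"] in wanted])
        etag = '"' + hashlib.sha256(body.encode("utf-8")).hexdigest()[:16] + '"'
        if if_none_match == etag:
            # Nothing changed since the client's copy: no body needs to be sent.
            return 304, etag, None
        return 200, etag, body
```

The first GET returns 200 with an ETag; sending that ETag back on the next GET yields a 304 as long as the result is unchanged, which is where the data transfer saving comes from.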

All that said, I haven't gone that route yet. It's too new, too "unique snowflake". The old _method=get approach with a POST is going to be understood by all (even if it makes many people cringe).

I really like how the Google Translate API handles this issue[1]. The actual HTTP method can be POST, but the intended HTTP method must always be GET (using the "X-HTTP-Method-Override" header).

[1] https://cloud.google.com/translate/v2/using_rest#WorkingResu...
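
For anyone who hasn't seen the override trick, here is a small sketch using only the standard library. The helper names `build_override_request` and `effective_method` are mine, and the server-side half is an assumption about how a handler might honor the header, not a description of Google's implementation.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_override_request(url, params):
    """Client side: send the query in a POST body but declare the intent as GET."""
    return Request(
        url,
        data=urlencode(params).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "X-HTTP-Method-Override": "GET",
        },
    )


def effective_method(method, headers):
    """Server side (assumed behavior): treat an overridden POST as the declared method."""
    override = headers.get("X-HTTP-Method-Override")
    if method == "POST" and override:
        return override.upper()
    return method
```

Note that intermediaries still see a POST on the wire, so caches and tools like Shadow won't treat it as safe unless they are taught about the header too.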

> A GET query is always a URI. Anyone can link to it.

Except this isn't really useful outside of bookmarks in a browser. Who would bookmark an API endpoint returning JSON? In code, it's just as easy to make a request with query parameters as it is with a post body.

    #!/usr/bin/env python
    import requests
    requests.get('http://example.com/', params={'key': 'value'})
    requests.post('http://example.com/', data={'key': 'value'})

Which endpoint (or resource locator) you hit probably shouldn't dictate the representation type you get back. That's what the `Accept:` headers should be used for. If you hit it with a browser, you'd expect to get back some html version of the same resource.

If you use "Accept:application/json", then you should expect to get JSON back. Etc.

A fair point. I was referring to the api being separate from the frontend, not what the api is returning. But I could see someone designing their application the way you describe.

And when someone does design their application in that manner, the various benefits of doing things 'right' start to pay off.

If a resource or result is addressable it means that a 3rd party can build an API that integrates with my API and link straight to results of certain queries.

Granted that this would be hard in the context of the dropbox API, because they already 'break' a lot of other rules.

Yes, if reusable/sharable queries are something you want, then you don't really have a choice but to expose an API for creating and retrieving them. Not sure if you're trying to make some other point related to the `Accept:` header the parent comment was about.

> this isn't really useful outside of bookmarks in a browser.

A GET query's "always-URI" status is useful to caching proxies, so going against the protocol could mean you have additional work to do configuring your proxies.

Especially when a good majority of APIs require you to pass something like an OAuth token in the header, ruining the bookmarkability completely.

Also, the behavior of a GET can change based on the headers. (think: basic auth.) So no, it can't be purely linked to.

I think, if an API is well-designed, you should strive to not let the resource change depending on the authorization headers.

Ideally, they should only make a difference in the fact that access is granted, or not (401 for bad authentication, 403 if you're simply not allowed).

This is not possible everywhere, but it's definitely something to try to aim for. If you can't, you can still use facilities such as the Vary header to indicate that the authorization header alters the result.

I'm just pointing out that GET requests and linkable URLs are very different things. Even consider Accept headers—it's perfectly valid to respond with different content if the request wants it.

The alternative approach seems like a very complex way of solving the problem. Given the choice between that and POST, I'd choose POST every time.

I can't imagine the REPORT method is well supported and therefore will practically have the same limitations as POST or worse won't work at all.

I agree. PHPBB has been using that since the dawn of time, and I always get confused looking at the url - there's no context to my search

It's actually quite well supported, in the same way that PATCH didn't suddenly cause issues everywhere. An HTTP-compliant actor must let through any method and most vendors seem to follow that rule quite well.

I'm not convinced that every firewall or proxy is HTTP-compliant either by accident or on purpose. And then even fully HTTP-compliant actors are unlikely to treat REPORT any differently from POST.

My two cents:

- if your queries get that complex, introducing a subordinate query resource is a pretty reasonable measure; in a world where people commonly use URL shorteners, why is this controversial?

- "don't break the web" isn't just about bookmarks; it's also about interoperability, in general

- for example, supporting GET means responses are now cacheable, which may be useful in many situations

- the author is mostly just pointing out that DropBox hasn't followed the REST style, and he's correct; there are reasonable design alternatives that do

- therefore, if REST is good, then this design is bad; given the success of the Web and HTTP, whose fundamental design follows the REST style, it always mystifies me how many people find this objectionable

Agreed. Particularly "don't break the web". Points and kudos for a clean solution to complicated GET queries.

I think the main problem people have with this, is that REST is not necessarily easy. And easy sometimes wins over 'good'.

Strict REST can also result in a lot more requests than a denormalized API might, and that's mobile-unfriendly.

Strict REST doesn't require normalization; there is nothing in the definition of the REST architectural style that prohibits resources that are aggregates of other resources (in DB terms, equivalent to whole tables, rows from a complex view, or whole complex views), or that prohibits allowing those derived resources to be directly retrieved or manipulated (with the latter also affecting the "base" resources).

The kind of REST that simply maps resources to DB rows with collection resources for base tables and does nothing else is naïve REST, not particularly any stricter REST than more complex and application-suited resource models.

you're absolutely right. and i'd be fine with that if the objection was just that: this is too hard to implement, or we don't have time, and so on. it's the rationalizing that's annoying. and obviously DropBox has the resources to implement whatever design they choose.

Seriously? /delta doesn't seem to be a RESTful endpoint so why not use POST if the data can't fit in the query string.

From RFC 2616:

"- Providing a block of data, such as the result of submitting a form, to a data-handling process;"


His alternative approach is actually pretty clever, even if not the best for Dropbox's API. I can see a number of advantages.

If the query result is its own resource, you get async for free (the POST can return the URI it will be at before it's ready), the server can cache the result indefinitely if wanted, and the client can treat the API as an immutable dataset.

Isn't this exactly for what HTTP SEARCH was intended?


HTTP REPORT was always supposed to provide additional information _about_ a resource not the resource itself.

> Isn't this exactly for what HTTP SEARCH was intended?

Yes and no. This use case is approximately, on an abstract level, what RFC 5323 "Web Distributed Authoring and Versioning (WebDAV) SEARCH" is intended for -- sending a query as a request body and getting back a list of results matching the query.

Unfortunately, however, the actual specification of "Web Distributed Authoring and Versioning (WebDAV) SEARCH" is -- as its title suggests -- deeply tied to the rest of the WebDAV infrastructure, specifying that servers supporting it must support XML requests, must handle them in a particular way tied to WebDAV, and must respond with the WebDAV-specific 207 Multistatus response code with a response that uses the WebDAV-specific multistatus format for the response body.

While less specific on an abstract level to this use case, the WebDAV REPORT method is defined in a slightly more general way (but still has some WebDAV baggage that makes it not quite general.)

In a perfect RESTful world, we'd have one general purpose HTTP method (call it QUERY) that is safe, idempotent, takes a request body as well as a URI, and returns a representation of the result of the transformation defined by the request body applied to the resource specified by the URI, and searches and reports of the type addressed by the WebDAV SEARCH and REPORT methods would just be different types of request bodies that could be sent in a QUERY.

(I don't see WebDAV multistatus response code as really needed, a 200 with an appropriate content-type for the body serves the same purpose.)

> This means that if a POST request fails, an intermediate (such as a proxy) cannot just assume they can make the same request again.

If this is actually true for your API, then yes, there's poor API design involved, but it has nothing to do with using the POST method.

If there are really business rules that say some resource can't be created twice (for some definition of resource uniqueness), then the API should enforce that with a 403 that explains sensibly "duplicate entity" or some such error message. [1]

[1] http://tools.ietf.org/html/rfc7231#section-6.5.3

I think the response code you are looking for is HTTP 409 (Conflict) [1]

[1]: http://tools.ietf.org/html/rfc7231#section-6.5.8
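
A tiny sketch of what that looks like on the server, assuming a dict-backed store and a client-supplied key as the uniqueness rule (both are stand-ins; `create_resource` is a made-up name):

```python
def create_resource(store, key, value):
    """Create a resource once; a repeated create for the same key is a conflict."""
    if key in store:
        # A retried or duplicated POST lands here instead of creating a second copy.
        return 409, {"error": "duplicate entity", "key": key}
    store[key] = value
    return 201, {"key": key}
```

With this in place, a proxy that blindly replays a failed POST gets a 409 the second time rather than silently creating a duplicate.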

That's just a 'why don't they' argument. If you want to criticize dropbox for their API choices you'd do well to put yourself in their shoes first; once you do that it all makes perfect sense. What works well for a hypothetical solution may not work well - or even at all - when you're operating dropbox.

The original post from Dropbox says that they considered the pros and cons of a bunch of options, and they thought this was the best solution. I'm inclined to agree with them.

You can have a good API design without following REST to the letter. Usability always trumps standards.

Standards go a long way toward facilitating usability. Web developers should know this as well as anyone.

Of course they do, but there are times when it makes more sense to deviate from them. It always comes down to why. If it's because some designer thinks it looks prettier that way, or because some inexperienced developer doesn't know any better, then it's a bad idea. But that's not the case here.

Is any of this more than academic philosophizing arguing about the pedagogy of verbs used in http APIs? Does it actually matter?


> (Psst -- any library that doesn't let you use arbitrary methods/verbs is not RFC 2616 compliant.)

Neither now-obsolete RFC 2616 nor the current HTTP/1.1 RFC set (7230-7239) require clients or servers (and, hence, client or server libraries) to use or accept, respectively, all HTTP methods, including those not defined in the RFCs themselves.

It's true that a library which doesn't support arbitrary methods cannot be used to build all possible RFC-compliant applications, but that is a different issue than the library not being compliant with the RFCs itself.

The correct approach is as outlined in the "alternate approach" section. For complex queries you need to POST a query resource at a dedicated endpoint. You can then later GET a search result using the query resource.

PUT is idempotent, why not use PUT instead? Semantically wrong, I know, but better than POST.

I learned about this pattern a few months ago from a couple of helpful Germans:


Works great.
