Idempotent Web APIs: What benefit do I get? (fenniak.net)
60 points by mfenniak on May 6, 2013 | 51 comments



Some usability feedback: It seems like the most useful information is contained in the links between nodes, but at first it was not obvious to me that they were clickable, or that clicking the text links at the bottom of node descriptions would actually show the 'link' description, rather than just taking me to the next node.

I'd recommend you make it clearer that the linking lines between nodes are clickable. Even better, rather than using a modal dialogue to display node contents, you could use a popover with an arrow pointing to its owner. Clicking a link at the bottom of an item would simply move the popover, making the relationships between various items of content clearer.


Thanks for the feedback. There are usability issues, and as you point out, the importance of the links is not evident at a first glance. I like the popover idea; I'll have to give that a try.


I like the idea. Have you considered adding weights? I'd also consider linking to a more detailed source (Wikipedia?) when available.

On the topic of idempotence: we recently made our platform distributed across 4 datacenters using queues. None of us really knew what we were doing. Recognizing this very early on, we made all calls idempotent. Pretty sure that single decision is what made the entire system work.
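
To illustrate why idempotent calls help when work arrives over queues, here is a minimal sketch of a receiver that deduplicates on a message ID, so a redelivered message does not change state twice. The class name, the message ID scheme, and the `balance` field are all hypothetical; nothing here is taken from the platform described above.

```python
class IdempotentReceiver:
    """Applies each message at most once, keyed by a message ID (hypothetical scheme)."""

    def __init__(self):
        self.seen_ids = set()  # IDs of messages that were already applied
        self.balance = 0       # example state that messages mutate

    def handle(self, message_id, amount):
        # A redelivered message is recognised by its ID and skipped.
        if message_id in self.seen_ids:
            return False
        self.seen_ids.add(message_id)
        self.balance += amount
        return True


receiver = IdempotentReceiver()
# The queue delivers "m1" twice, for example after a timeout and a retry.
for message_id, amount in [("m1", 10), ("m1", 10), ("m2", 5)]:
    receiver.handle(message_id, amount)

print(receiver.balance)  # 15
```

Without the `seen_ids` check the duplicate delivery would leave the balance at 25, which is the kind of bug that is hard to spot across several datacenters.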


I have absolutely considered adding weights. But, it quickly leads to the idea that my priorities are not necessarily the same as everyone else's, and that leads to user-customizing, and that leads to... well, it got to some cool ideas, but I figured first I should shove this out the door in a simple form and see if people like the concept. If the concept is well-received, I'll continue to develop it and explore other API best practices in a similar manner.


Awesome idea.

I know they don't get too much love on HN, but it would be cool to see this done for a set of enterprise integration patterns.

See:

http://www.eaipatterns.com/toc.html

http://camel.apache.org/enterprise-integration-patterns.html


Thanks!

The enterprise integration patterns links are interesting to me; thanks for pointing those out. I can definitely see how they could use a connection to reality. I can look at a random one and see what it is, but understanding why you'd use it is a mile above my head.


Some of the EIPs suffer from the same 'head-in-the-sky' issues as the GoF design patterns, but there's a few that have come in handy (like the idempotent receiver!).

The link to Apache Camel is a bit more informative with actual examples with code snippets.


Reply to mattacular, whose comment was deaded:

Idempotence (literally "same power") in a REST API means a given request produces the same result whether you execute it once or multiple times. Given the URL:

/resource/1

Whether you execute the DELETE method on that resource once or ten times, the resource is deleted. Therefore, it is idempotent.


This is something I've wondered about. So you get a DELETE request against /resource/1 and you delete the resource. When you get the second DELETE request, shouldn't that give a 404, since the resource no longer exists?


You should indeed give a 404 (or a 410, if the fact that a resource used to exist there but doesn't anymore is meaningful information in and of itself, but that depends on your API). That doesn't affect whether or not the application is idempotent: the state of the server matters, not the responses.

Assume for the moment that DELETE on /resource/1 actually removes /resource/1 from the namespace, and doesn't do anything else (other than logging; logs don't count). If DELETE is idempotent in this case, then whether you call it one time or ten times, either way, /resource/1 is not there when you're finished, and nothing else has changed. The client might see an OK response to the first call and error responses to the others, but from the server's perspective, it's the same as if the call had only been made once. That's what makes it idempotent.
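
The distinction between server state and responses can be sketched with an in-memory store. This is only an illustration under simple assumptions: the `delete` function and the dictionary standing in for the server's namespace are hypothetical, and the 204/404 choice follows the comment above rather than any particular framework.

```python
resources = {"/resource/1": "hello", "/resource/2": "world"}

def delete(store, path):
    """Remove path from store and return an HTTP-style status code."""
    if path in store:
        del store[path]
        return 204  # No Content: the resource was deleted by this call
    return 404      # Not Found: there was nothing left to delete

# Call DELETE once on one copy of the server state...
once = dict(resources)
statuses_once = [delete(once, "/resource/1")]

# ...and ten times on another copy.
ten_times = dict(resources)
statuses_ten = [delete(ten_times, "/resource/1") for _ in range(10)]

print(statuses_once)      # [204]
print(statuses_ten[:3])   # [204, 404, 404]
print(once == ten_times)  # True
```

The responses differ between the two runs, but the resulting state is identical, which is the sense of idempotency this comment is arguing for.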


> The client might see an OK response to the first call and error responses to the others, but from the server's perspective, it's the same as if the call had only been made once. That's what makes it idempotent.

This seems like a rather hollow meaning of idempotency. HTTP is a communications protocol. Why bother specifying idempotency if it doesn't apply to the communications?

The client gets no benefit at all if the server doesn't faithfully serve idempotent responses, and in fact a great deal of the benefit of idempotency is lost. Specifically, the client cannot reasonably resume a dropped connection if the server changes its responses during a retry scenario (which RFC2616 says the client should do). Would it be idempotent if the first response to "PUT /foo" was 204 and the second response was 405?


You're mixing up safety with idempotency. What you call "idempotent communications" are only possible on a read-only server: if the state of a server can change, then the communications must change to reflect that. HTTP doesn't assume a read-only server: three of the common verbs (GET, PUT, and DELETE) are idempotent, but only one (GET) is safe.
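
A rough way to see the difference between safe and idempotent is to compare server state after zero, one, and two identical calls. The sketch below does that for toy versions of four verbs; the handler functions and the state layout are hypothetical, and comparing only up to two calls is a spot check, not a proof.

```python
import copy

INITIAL = {"items": {"/things/1": "old"}, "next_id": 1}

def do_get(state, body):
    return state["items"].get("/things/1")  # reads only

def do_put(state, body):
    state["items"]["/things/1"] = body      # overwrites with the same value

def do_delete(state, body):
    state["items"].pop("/things/1", None)   # removes if present

def do_post(state, body):
    state["next_id"] += 1                   # creates a new item on every call
    state["items"]["/things/%d" % state["next_id"]] = body

def state_after(method, calls):
    """Return the state after applying method `calls` times to a fresh copy."""
    state = copy.deepcopy(INITIAL)
    for _ in range(calls):
        method(state, "new")
    return state

def classify(method):
    zero, one, two = (state_after(method, n) for n in (0, 1, 2))
    # Safe: one call changes nothing. Idempotent: a second call changes nothing more.
    return {"safe": zero == one, "idempotent": one == two}

for name, method in [("GET", do_get), ("PUT", do_put),
                     ("DELETE", do_delete), ("POST", do_post)]:
    print(name, classify(method))
# GET {'safe': True, 'idempotent': True}
# PUT {'safe': False, 'idempotent': True}
# DELETE {'safe': False, 'idempotent': True}
# POST {'safe': False, 'idempotent': False}
```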


No, I'm not. HTTP is a communications protocol. It doesn't dictate what a server should be doing internally, only how it should behave in the context of an HTTP session. If idempotency isn't important to the communications, why put it into the standard? For that matter, if idempotency is only relevant on the server side, then the server could do anything in response to a client request and it seems that the spec would be satisfied.

    void processRequest(httpcontext c)
    {
        // who cares?; this looks "idempotent" to everyone
        doSomethingArbitrary();
        c.status = rand();
    }

RFC 2616 specifically discusses resuming in the face of dropped connections. This is only feasible if calls are idempotent from the client perspective (as well as the server perspective).
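
The retry scenario can be sketched as follows. This is an illustration under stated assumptions, not real client code: `send_with_retry` and `FlakyServer` are hypothetical names, the server applies the request and then drops the first response, and the 204/404 behaviour is the one under debate in this thread.

```python
# Methods RFC 2616 section 9.1.2 describes as idempotent.
IDEMPOTENT_METHODS = {"GET", "HEAD", "PUT", "DELETE", "OPTIONS", "TRACE"}

def send_with_retry(send, method, path, max_attempts=3):
    """Call send(method, path); retry dropped connections only for idempotent methods."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send(method, path)
        except ConnectionError:
            if method not in IDEMPOTENT_METHODS or attempt == max_attempts:
                raise

class FlakyServer:
    """Hypothetical transport: applies the DELETE, then drops the first response."""

    def __init__(self):
        self.store = {"/resource/1": "data"}
        self.dropped = False

    def send(self, method, path):
        if method != "DELETE":
            raise NotImplementedError(method)
        existed = path in self.store
        self.store.pop(path, None)
        if not self.dropped:
            self.dropped = True
            raise ConnectionError("connection dropped before the response arrived")
        return 204 if existed else 404

server = FlakyServer()
status = send_with_retry(server.send, "DELETE", "/resource/1")
print(status)        # 404
print(server.store)  # {}
```

The retry is harmless to the server's state, yet the client only ever sees a 404 for a delete that in fact succeeded. That is the gap between the two readings of idempotency being argued here.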


It makes sense to return a 404 for DELETE request on an already-deleted resource, but the response to a request is simply a representation of the resource. My point was to note that DELETE is idempotent - the resource itself is similarly deleted no matter how many DELETE requests you submit to its URL.

Edit: mfenniak suggests using HTTP 410 Gone instead, which makes more sense than 404 Not Found if you can configure your server to remember that a resource was deleted.


Why give out an error at all? The intention of the API call is fulfilled either way.

The spec regarding this: "A successful response SHOULD be 200 (OK) if the response includes an entity describing the status, 202 (Accepted) if the action has not yet been enacted, or 204 (No Content) if the action has been enacted but the response does not include an entity."

204 is what I would return for most DELETEs because there is no status (entity) left after a normal delete.


The problem I find with this solution is that it makes it difficult to tell if the client is operating properly. That is, I can't tell if my client is spuriously trying to delete the same resources from the status codes if there is no difference between a status code for successful deletion, and an attempt to delete an object that does not exist.

As an API consumer, I would prefer a 404 Not Found, so that I may easily and accurately propagate up the information that a resource I expected to be there, and to delete, wasn't there when I tried to delete it. Especially in complex applications where various clients may be running against the same data set across many distributed nodes, this can reveal much harder-to-find problems that would stay hidden when every action is a "success."

I also would prefer 200 - with a description of the entity that was deleted, again, for verification and system maintenance needs.


> That is, I can't tell if my client is spuriously trying to delete the same resources from the status codes if there is no difference between a status code for successful deletion, and an attempt to delete an object that does not exist.

Presumably, if you are logging status codes received by your client, you are also logging the requests that produce them; you should be able to determine if it is spuriously trying to delete the same resources from the request history. The only status code you need to make this determination is the 2xx received from the first successful request.


Experience has taught me that it is far easier to catch negative anomalies (an apparently bad thing happened), than it is to catch positive anomalies (things that look normal, but are actually bad).

Consider that to catch two nodes deleting the same resource, you'd need to correlate the messages from all nodes. (Or, at least, know that there may be a problem with nodes trying to delete the same resource, and plan by recording every resource deleted in some other node.) Whereas, with a 404, you can determine that there is a problem in one message - and have the resource information to search through your history and find related events. Given that some of the systems I've worked in have had thousands of nodes - I'd rather not go through the log history from each node on every normal event to verify that it isn't actually abnormal. At a certain scale, relying on correlation and positive anomaly detection approaches such a level of difficulty as to be nearly impossible.
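
As a sketch of what detecting the problem "in one message" could look like on the client side, the function below classifies the status of a single DELETE response and logs a warning when the resource was already gone. The function name, the outcome labels, and the logger name are hypothetical.

```python
import logging

logger = logging.getLogger("delete-client")  # hypothetical logger name

def interpret_delete(path, status):
    """Classify the outcome of one DELETE response without consulting other nodes."""
    if 200 <= status < 300:
        return "ok"
    if status in (404, 410):
        # The resource is gone, which was the goal, but this call did not remove it.
        logger.warning("DELETE %s returned %d: resource was already gone", path, status)
        return "already_gone"
    return "failed"

print(interpret_delete("/resource/1", 204))  # ok
print(interpret_delete("/resource/1", 404))  # already_gone
print(interpret_delete("/resource/1", 500))  # failed
```

If the server answered 204 in both of the first two cases, the client could not tell them apart from the response alone and would have to fall back on correlating request histories.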


> Consider that to catch two nodes deleting the same resource, you'd need to correlate the messages from all nodes.

Generally, two nodes trying to delete the same resource isn't an anomaly. (The result of those attempts being something other than the resource being deleted exactly as if one attempt had been made to delete it would be an anomaly, as would the same node attempting to delete the resource more than once after having received confirmation that it had been deleted on an earlier effort.)


>Why give out an error at all?

Because that is useful information. What benefit is there to hiding the information?


I draw the line at what counts as a real error, and 4xx doesn't seem to fit the bill here. If you want information then a 2xx code and/or a JSON body would be ideal (for me).

I think the crux of the debate is: where do you draw the line?


That doesn't make any sense. The line was already drawn, in the HTTP spec. Follow it. Returning 200 OK for everything and then putting an error in the body is completely useless. The whole point is that there are common, well-defined errors which HTTP specifies codes for. Clients can properly handle those errors regardless of what the backend is like. Nobody wants to have to write millions of different solutions to parse your JSON error responses and figure out what they mean and what to do with them.


I think the server should return a 2XX code for multiple deletes. Idempotency is not restricted to the server, and responding with a 4XX in response to delete #2 means that the action is not idempotent from the client perspective, because the repeated action produces a different result.

If delete is considered idempotent despite changing the status code for repeated calls, then we must consider the status codes to be irrelevant to idempotency, which seems completely wrong.

    response.status = rand(); // is this really idempotent?


The client perspective isn't relevant, because the server has no control over what the client does. That's why idempotency is restricted to the server. You're thinking of safety.


Of course it's relevant. Why else would a communications protocol bother to specify it?


Because the communications become useless if they do not accurately reflect the state of the server. Sending anything other than an error when the client does something it shouldn't (for example, calling DELETE on a resource that no longer exists) leaves the client in a state where it doesn't know what's going on.


The only meaningful measure of the server's state is in how it expresses that state via the communications protocol. As a client, what the server does internally is irrelevant, so long as it behaves consistently when I talk to it.

I find it farfetched that the spec authors would specify that an HTTP-compliant server should not do something completely unexpected in the face of a DELETE on a nonexistent resource. What's the alternative to "internal idempotency" here? To randomly restore the deleted item? To delete something else instead? Anything other than "note that the item is already gone and return" would be a bug. So why would anyone put this in the spec? They wouldn't, which is why the spec is clearly discussing idempotency with respect to the communications, and not with respect to the server's internal state (which again is not something the RFC mandates).

On the other hand, the RFC does specifically call out that clients should automatically retry idempotent operations in the event of a closed connection, which makes no sense if the communications will not be idempotent. The RFC also calls out that internal state may be changed in the face of an idempotent (and even a safe) operation, another indication that what's being specified is not internal server behavior/state (because if it's mutating state it's not internally idempotent).

> the client does something it shouldn't (for example, calling DELETE on a resource that no longer exists)

You're begging the question by assuming that a client should not call DELETE on a nonexistent resource. It's perfectly valid to interpret "DELETE" to mean "make sure nothing exists at this URL" (I believe the RFC does), in which case a delete of a nonexistent item is a successful call.


>The RFC also calls out that internal state may be changed in the face of an idempotent (and even a safe) operation, another indication that what's being specified is not internal server behavior/state (because if it's mutating state it's not internally idempotent).

Ah, now I see the crux of the problem: you've mixed up safety and idempotency. It is perfectly all right for an idempotent action to change state, as long as performing the same action twice is functionally no different from performing it once.

This is different from safety (which some call nullipotency), where performing the action once or twice is the same as performing it zero times: i.e. nothing changes. The HTTP spec notes that it has no way of actually preventing servers from changing state in response to safe actions, but it still warns server developers against doing so.

But the final proof that idempotency is not with respect to communications comes from the spec itself: RFC 2616, section 9.1.2, first sentence. The spec even explicitly states that error or expiration issues can cause different responses on subsequent requests, which is what makes it clear that idempotency is not with respect to communications.

"Not Found" is an error condition with respect to the server. That is a fact of the HTTP server specification: the client wanted to do something with a resource that the server couldn't find. It might be that this condition is precisely what the client wanted, but the server has no way to know that: it can only answer from its own perspective. It is up to the client to sort the matter out on its own, but fortunately, this is not hard. A client that expects resources not to be found from time to time need only listen for these cases, and deal with them however it wants to. The server need never know.

>You're begging the question by assuming that a client should not call DELETE on a nonexistent resource. It's perfectly valid to interpret "DELETE" to mean "make sure nothing exists at this URL" (I believe the RFC does), in which case a delete of a nonexistent item is a successful call.

Actually, the RFC doesn't say that. The relevant section (9.7) states that DELETE is for actual requests to delete something. Nothing else. To "make sure nothing exists at a URL", the proper method is GET. You can also use HEAD if you're worried about a large entity-body taking up too much bandwidth. OPTIONS would work too, and it has the advantage of only touching the metadata, but not all servers support it.


> Ah, now I see the crux of the problem: you've mixed up safety and idempotency.

That's not the miscommunication here. The reason I mentioned state mutation was only to clarify that the RFC does not dictate consistent internal server state. Even a "safe" call is only considered safe from the view of the caller. It also does not "warn developers" against changing state as a side effect of a "safe" call. It specifically states that this may be considered a feature, but that the user does not request any such side effects. Again, the concentration here is on the behavior of the communications, and not on the internal behavior of the server.

> But the final proof that idempotency is not with respect to communications comes from the spec itself: RFC 2616, section 9.1.2, first sentence. The spec even explicitly states that error or expiration issues can cause different responses on subsequent requests, which is what makes it clear that idempotency is not with respect to communications.

We're clearly reading this RFC differently. I see the error and expirations as being server-side. X expired? Sure, "GET X" will return a different result. Disk failed (aka error), certainly the response is going to change.

The RFC calls out changing the behavior in response to errors or expirations as an exception to idempotent behavior. I see this as confirmation that the server should behave consistently from the perspective of the caller except in cases where this is not possible.

> "Not Found" is an error condition with respect to the server. That is a fact of the HTTP server specification

I don't agree that "Not Found" is an error condition for delete. This is not in the spec. It's your interpretation of the spec. Nowhere in the RFC does it say "if a request comes in for X and X does not exist, a 400-level error should be returned". (If it did say that, PUT would be quite useless.)

> Actually, the RFC doesn't say that. The relevant section (9.7) states that DELETE is for actual requests to delete something. Nothing else. To "make sure nothing exists at a URL", the proper method is GET. You can also use HEAD if you're worried about a large entity-body taking up too much bandwidth. OPTIONS would work too, and it has the advantage of only touching the metadata, but not all servers support it.

You misunderstood what I was saying. "Make sure nothing exists" isn't an intention. It's a command. It's "Hey, server, make sure that when you return, nothing exists at location X." It's a delete, but a less "strict" version than yours, which is "Hey, go find out if something exists at X, remove it if so, and throw an error otherwise."

I don't think we're getting any closer to agreeing on this. I looked for a more definitive answer from one of the RFC authors, but couldn't find anything. It's just this same discussion playing out over and over with other people.


I'm not entirely sure about what error code to return, but that wouldn't affect idempotency, as long as it remains deleted.


"410 Gone" is probably preferable to 404, as it would indicate there used to be a resource at that URL.


410 Gone means that the URL is permanently gone. It would generally be inappropriate as a response to DELETE, because a client with permission to DELETE likely has permission to PUT, and could therefore cause the given URL to become valid again.
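
The 404 versus 410 question can be made concrete with a toy store that remembers deletions. This is a sketch under assumptions: the `Store` class and its tombstone set are hypothetical, and returning 410 after a delete is the behaviour being debated here, not something the spec requires.

```python
class Store:
    """Toy resource store that remembers which paths were deleted (tombstones)."""

    def __init__(self):
        self.items = {}
        self.tombstones = set()

    def put(self, path, body):
        created = path not in self.items
        self.items[path] = body
        self.tombstones.discard(path)  # the URL is valid again
        return 201 if created else 204

    def delete(self, path):
        if path in self.items:
            del self.items[path]
            self.tombstones.add(path)
            return 204
        return 410 if path in self.tombstones else 404

    def get(self, path):
        if path in self.items:
            return 200
        return 410 if path in self.tombstones else 404


store = Store()
print(store.get("/resource/1"))          # 404: never existed
print(store.put("/resource/1", "v1"))    # 201
print(store.delete("/resource/1"))       # 204
print(store.delete("/resource/1"))       # 410: remembered as deleted
print(store.put("/resource/1", "v2"))    # 201: the "gone" URL is back
print(store.get("/resource/1"))          # 200
```

The last two calls show the objection raised above: a client that can PUT can revive a URL that was just reported as gone, so "permanently" only holds if the server also refuses to recreate it.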


Very interesting, but I would say your tradeoffs aren't really accurate. Idempotency should decrease testing and maintenance costs because it is simpler.


Interesting. Seems like the Haskell benefits, but implemented in mortal technologies.


I don't get why there should be an intrinsic higher development cost since you're building a simpler, saner and better-defined system. If anything, it would decrease development costs.


It will decrease development costs for the end users, but most times when you say you want a simple api, you mean you want to take a really complicated system and wrap it up in a far simpler interface (hence the "abstract" part).

Making good abstractions for a complicated system is always harder than just a 1:1 API.



If you have to explain this to your boss, why is that person your boss?


If you think technical prowess is a requirement for being a boss, reality is going to be quite shocking for you.

And I don't even think it should be a requirement (depending on the role). If said boss hired you because you have the technical experience needed, trusts you, and is willing to learn, I wouldn't mind working for him.

http://management.fortune.cnn.com/2011/06/01/why-bosses-dont...


I should have clarified: I fully appreciate that most bosses do not actually know the technical minutiae of their underlings' work. That is how hierarchies work...

My question was really: given this shared understanding, why is this limited level of autonomy so common? How much longer can work that requires deep technical knowledge be directed and controlled by those who do not understand it?

In short, why are bosses not mentors? I don't mean in the general sense, I mean literally. My boss should be the person I respect, not who I answer to.


Then what term, if not "boss", do you use for someone who lacks technical skills but has the skills to decide how to manage the investments and balance profit against growth opportunities and for those reasons gives you direction and instructions?


Ehhh, I don't disagree with you on a theoretical level, but are the average software development manager's management skills sooooooo strong that they totally override basic technical proficiency? Definitely there are individuals that this is true for... but most development managers, naw...


Except in very rare cases where a company is staffed almost entirely by former engineers, it is almost always the case that when you go far enough up the management hierarchy you will encounter someone who is perfectly smart, but doesn't happen to know why idempotent calls are useful (or even what "idempotent" means). That is the person this chart is intended for. It may not be your immediate boss, but perhaps HER boss, or another step up the chain.


Because a good boss makes sure you've actually thought about what you're doing, not just blindly parroting "design patterns", "best practices" and "standards".

Don't get me wrong, all of those are generally good things, but if you can't draw up a diagram like that in 30 minutes for any design pattern, best practice or standard you follow, you should take that as a hint that you're doing something wrong.


Couldn't we find a nicer word for "idempotent"?


Indeed. It's a very clunky word, not least because it's unclear which syllable to stress.

I've variously heard people say both i-dem-PO-tent and i-DEM-po-tent. (In the latter, the "o" in "po" gets reduced to a schwa.)


Since when should a noun's pronunciation be obvious?


When I don't want to sound snooty, I say rerunnable.

Fixed point is another way to get at it, although it requires your listener to know what that is.


The term is very well established in mathematics (e.g. for matrices and transformations), it is precisely the correct term for this, and it is already widely used in CS. Inventing a new word would not be helpful in the long run.
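
For reference, the mathematical property is that applying a function twice gives the same result as applying it once, f(f(x)) = f(x), which also means every output of f is a fixed point of f. A small sketch that spot-checks this on sample inputs (the helper names are hypothetical, and checking samples is not a proof):

```python
def is_idempotent_on(f, samples):
    """Check f(f(x)) == f(x) for every sample input."""
    return all(f(f(x)) == f(x) for x in samples)

def project_to_x_axis(point):
    """Projection onto the x axis, a standard idempotent transformation."""
    x, _y = point
    return (x, 0)

numbers = [-3, -1, 0, 2, 7]
points = [(1, 2), (-4, 5), (0, 0)]

print(is_idempotent_on(abs, numbers))               # True
print(is_idempotent_on(project_to_x_axis, points))  # True
print(is_idempotent_on(lambda x: x + 1, numbers))   # False
```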


I find a world where we're locked into using words mathematicians invent a very bleak one indeed.


What do you mean by 'nicer' word? I really honestly don't get it.

I mean, if it were some consumer property, like say the size of a hard drive or such, then yeah, that makes some sense. But for a term used strictly in technical circles to describe a precise technical property, I don't understand what 'nicer' means.



