The Idempotency-Key HTTP header field (ietf.org)
332 points by detaro on July 4, 2021 | 132 comments



It's a bit funny to see an IETF draft for this!

Adding idempotency support to Stripe's API was one of the things I built while an intern in 2015. We've since rewritten it almost entirely, but many of the technical decisions made in 2015 still stand.

One of the things that the IETF draft doesn't specify ("an appropriate response") is that Stripe's idempotency API returns a response body containing the latest rendering of the resource. Here's what I mean:

  # Set tax exempt status
  curl /v1/customers/cus_AJ6m5vWl7scnn6 \
    -H "Idempotency-Key: aaaaaaaaaa" \
    -d tax_exempt=reverse

  → {id: 'cus_AJ6m5vWl7scnn6', object: 'customer', tax_exempt: 'reverse', ...}

  # An unrelated request sets the customer's email
  curl /v1/customers/cus_AJ6m5vWl7scnn6 \
    -H "Idempotency-Key: bbbbbbbbbb" \
    -d email='foo@bar.com'

  → {id: 'cus_AJ6m5vWl7scnn6', object: 'customer', tax_exempt: 'reverse', email: 'foo@bar.com', ...}

  # Set tax exempt status with the first idempotency key:
  curl /v1/customers/cus_AJ6m5vWl7scnn6 \
    -H "Idempotency-Key: aaaaaaaaaa" \
    -d tax_exempt=reverse

  # The customer is re-rendered on the latest version
  → {id: 'cus_AJ6m5vWl7scnn6', object: 'customer', tax_exempt: 'reverse', email: 'foo@bar.com', ...}


What I have always wondered about: the Stripe docs say "Stripe's idempotency works by saving the resulting status code and body of the first request made for any given idempotency key, regardless of whether it succeeded or failed. Subsequent requests with the same key return the same result, including 500 errors.", which indicates that the idempotency functionality is created in a kind of layer around the main application functionality, and thus the request from the idempotency layer to the main app is not itself idempotent. So the idempotency key protects against faults on the network between the client and stripe's idempotency layer, but not against stripe-internal faults between the idempotency layer and the application. Is that the case? Why is the idempotency not achieved by using an idempotent database transaction or atomic operation? What is the value of responding repeatedly with a 500 if the original 500 was caused by a transient error?

Adyen seems to similarly implement the idempotency in a separate data store (https://docs.adyen.com/development-resources/api-idempotency...).

Both Stripe's and Adyen's implementations seem not to handle transient faults within their own systems correctly.


Many developers conflate idempotency and purity. It appears Stripe is trying for purity: two requests always get the same response. This is likely misguided.

Idempotency would be like the API failed to generate the success response, but the backend did issue the money movement. On subsequent retries what matters is that it doesn’t do a duplicate money movement. The client ought to get a response indicating the money was successfully moved (since it was), perhaps with the earlier timestamp as a way of indicating it was already moved. Or it ought to get a special response that indicates the client can stop retrying.

Sending the same error even after the operation actually succeeded means the client has no way to know when to stop retrying. Sure, it makes the API pure, but what problem does it even solve?


For Stripe, there are two headers that can help clients with retrying or not: "Idempotent-Replayed: true"[1] and "Stripe-Should-Retry"[2].

Whether they're helpful in all server-side error situations I'm not sure.

1: https://stripe.com/docs/idempotency#sending-idempotency-keys

2: https://stripe.com/docs/error-handling#the-stripe-should-ret...


"Should-Retry" is useful, but "Idempotent-Replayed" while might be somewhat useful for debugging, makes the idempotent semantics worse: i.e. the side-effect idempotency stays the same, while result isn't idempotent.

I guess in HTTP you may say that only the body of the HTTP response is the result, not the headers. But I implemented a similar system for GraphQL, and there these boolean flags (isRetry, isReplayed, etc.) are part of the mutation inputs and/or payloads. The same goes for the "Idempotency-Key" itself, which is called clientMutationId in the Relay mutation spec. The added value of not using HTTP headers is that it also works over WebSockets.


> So the idempotency key protects against faults on the network between the client and stripe's idempotency layer, but not against stripe-internal faults between the idempotency layer and the application. Is that the case?

Yes, in practice this is the case — the idempotency key insertion could succeed and then subsequent queries fail. Practically though, it doesn't happen very often — if a request gets as far as the idempotency layer successfully, the rest of it tends to work too.

Where it doesn't, Stripe pays a lot of attention to 500s, especially where those pertain to charges, and a lot of time and energy is spent cleaning up state that might have resulted in an invalid transaction.

> Why is the idempotency not achieved by using an idempotent database transaction or atomic operation?

The most honest answer is that Stripe wasn't built on a data store where transactions are supported, so it wasn't even a possibility until quite recently, and by then the system was already well established in its current form.

Beyond that though, once your requests are making their own requests to modify state in foreign systems (which is happening at Stripe in the form of banks, partners, internal systems, etc.), a single transaction isn't enough to keep things entirely in order anymore because it can't roll back that remote state. It is still possible to build a very robust system that is transaction-based, but it becomes a much more complex problem than a simple `BEGIN`/`COMMIT`.


> Practically though, it doesn't happen very often

It probably happens much more often than you think if you say this as someone with an insider perspective. I have had to integrate with banking systems and payment systems that use this approach, and it is extremely frustrating and comes off as a way of offloading work to the client. If a payment capture succeeds but subsequently always returns 500, an API client has to first query the status and then execute if-then logic for something that could easily just have been a retry of the request (3x the logic). This is acceptable since there are a million other ways to mess up idempotency, so integrations kind of end up this way anyway. But the worst part of such an approach is debugging the issues as a client. A client cannot see the internal logs and therefore has to call support, which will ALWAYS answer «the transaction seems fine with us» and basically just close the case to protect their KPIs. I'm pretty sure this type of issue has a name but I cannot find it (the state the client sees doesn't match the real state). I don't mean to offend the work people put into these APIs, but I cannot see the good qualities of this approach other than saving development hours (and possibly saving one DB query, but as an effect you get a status request plus another capture request as a workaround from your clients).


Just to clarify: I was speaking specifically about the case where you have a series of DB calls (like: auth user, retrieve account record, insert idempotency key, do more stuff), and the first one succeeds and the next ones fail. It can happen where the DB suddenly drops out as the request is executing, but it's more likely that it's either available or it isn't, so either the request succeeds, or it can't start.

And this comment was just meant to talk about faults with your own database. Once you are reaching out to other systems you see all kinds of problems regularly, but those tend to be handled more robustly because you kind of have to.


I might have misunderstood the scope of the idempotency layer, I thought it was this thin layer close to the web server that just replays the last answer from the services that are lower down. So that means stripe actually goes down to some store or external system to check the transaction status for subsequent calls with the same key?

I know how complicated integrating with stuff like the bank, 3Dsecure, AML and fraud prevention is. One of the systems I integrated with had no refund functionality, they expected a manually initiated bank transfer from our customer bank to the end user! So I certainly understand some states are irrecoverable in a web request. It is important to do though, because it saves all clients work for each automated recovery that can be done.


Thank you for the insightful answer.

I see the problem with mutations in foreign systems if those foreign systems do not support idempotency themselves. IMHO, though, stripe should abstract away faults in banks, and figure out how to work around faults in bank's systems using e.g. automated refunds when a duplicate charge is detected, and not just bubble up a 500 to stripe's customers and leave it to them to figure out. If stripe cannot figure out in an automated way whether the request toward the bank succeeded, stripe's API customers certainly can't either, and stripe should risk double-charging the end customer, knowing that the end customer will complain and request a chargeback.

> The most honest answer is that Stripe wasn't built on a data store where transactions are supported

Transactions are not necessary if one can do an insert conditional on the key not yet existing, but then it is required to have the idempotency key from the client enter into the primary key.


> IMHO, though, stripe should abstract away faults in banks, and figure out how to work around faults in bank's systems using e.g. automated refunds when a duplicate charge is detected, and not just bubble up a 500 to stripe's customers and leave it to them to figure out.

Yeah, this is what Stripe tries to do. Most problems during calls to foreign systems are handled in ways that try hard not to send back an internal error. 500s are tracked carefully because they're painful to the user, but also because they leave behind potentially bad state that'll eventually cause problems internally and externally.

If they can be reconciled, a webhook will be fired to give the caller a more determinate answer (obviously less convenient for them to handle, but at least some sort of message makes its way back). More documentation on that here:

https://stripe.com/docs/error-handling#server-errors

> Transactions are not necessary if one can do an insert conditional on the key not yet existing, but then it is required to have the idempotency key from the client enter into the primary key.

This is how the implementation works more or less — an insert on a unique index that will error on a duplicate so you know it happened already. You'd probably implement it similarly in basically any major database whether Mongo, Postgres, MySQL, etc.
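
To make the unique-index idea concrete, here is a minimal sketch of that pattern in Python (the table and column names are made up, not Stripe's actual schema), using sqlite3 so it is self-contained:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE idempotency_keys (
            key     TEXT PRIMARY KEY,  -- the unique index: a second insert errors
            user_id TEXT NOT NULL
        )
    """)

    def claim_key(key, user_id):
        """Return True if this is the first time the key has been seen."""
        try:
            with conn:  # commits on success, rolls back on error
                conn.execute(
                    "INSERT INTO idempotency_keys (key, user_id) VALUES (?, ?)",
                    (key, user_id),
                )
            return True
        except sqlite3.IntegrityError:
            return False  # duplicate key: this request was already started

    assert claim_key("aaaaaaaaaa", "cus_123") is True
    assert claim_key("aaaaaaaaaa", "cus_123") is False  # replay detected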

This is only a very small part of what transactions get you though — if a transaction-based request makes it midway through its lifecycle and then fails, it can roll back to a fresh slate. In a transaction-less system, you need to come up with some other answer for what to do with the partial state that was left behind.


I don’t speak for stripe, but do have similar experience with idempotent APIs.

> the idempotency functionality is created in a kind of layer around the main application functionality …

Yes. Think of it as a kind of request/response handler cache that doesn’t need to be deeply tied to the operation logic. On the request path, insert an entry with a compound key for the “idempotency parameters” like client identity, operation, idempotency token, and request parameters. You can hash these if needed. On the response path, update your idempotency cache with the response value. If the underlying operation times out etc., you can synthesize and insert the exception response. On subsequent requests, read the idempotency key(s) from your cache and return the cached response if it exists.
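
A rough sketch of that handler cache in Python (the names, the in-memory dict, and the synthesized timeout response are illustrative assumptions, not any particular vendor's implementation):

    import hashlib
    import json

    _cache = {}  # in production: a shared store with a TTL, not a process-local dict

    def _cache_key(client_id, operation, idem_key, params):
        # Compound key over the "idempotency parameters"; hash the request params.
        fingerprint = hashlib.sha256(
            json.dumps(params, sort_keys=True).encode()
        ).hexdigest()
        return f"{client_id}:{operation}:{idem_key}:{fingerprint}"

    def handle(client_id, operation, idem_key, params, do_operation):
        key = _cache_key(client_id, operation, idem_key, params)
        if key in _cache:
            return _cache[key]               # replay: return the cached response
        try:
            response = do_operation(params)  # run the real operation once
        except TimeoutError:
            # Synthesize a response for the failure so retries see the same answer.
            response = {"status": 504, "body": "operation timed out"}
        _cache[key] = response               # response path: update the cache
        return response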

> So the idempotency key protects against faults on the network between the client and stripe's idempotency layer, but not against stripe-internal faults between the idempotency layer and the application.

Maybe. The external idempotency token is there for client retries. Internally there’s idempotency all the way down to somewhere you’re (hopefully) doing a compare and swap, conditional put, or similar. A subsequent request that conflicts should be detected based on that state.

> Why is the idempotency not achieved by using an idempotent database transaction or atomic operation?

Beyond small examples you probably can’t. You end up calling multiple other services or have distributed state of some sort for most interesting operations. And distributed transactions are expensive/difficult/painful. So plumbing idempotency and error handling workflows through is simpler.


these seem extreme for most apis but are likely very good at preventing double charging. a “normal” api call usually costs nothing. i bet stripe and adyen have apis with an average cost per call (to someone) being like $20-$30

the cost to undo a charge is high in people and reputation


How could their API possibly cost $20-$30 per call? How could that even be a business model? Clearly, I am missing something here.


I suspect the OP meant to say charge instead of cost.


ah that makes sense.


unrelated but fun fact: AWS CloudHSM v1 (deprecated now) had a $5,000 api call. that was the cost to create a cluster.


Don’t some bank system transfers, like in the U.K., cost ~£25 per transaction?


That's very useful! I can't decide if the architecture is more complicated or simpler because of that. It helps avoid stale data showing up in the UI, etc. It seems like you just need to store the idempotency key and the request data (in case the data changes, to catch bad requests using the same key...), but not the response.

I think Amazon actually stores the response in a table for some amount of time (24 hours?), iirc https://aws.amazon.com/builders-library/making-retries-safe-... but I could be wrong.


are you mixing command and query together in one request? i thought it was a bad practice.

mutating data is a command and should be a separate request, and fetching customer details is another request


On the contrary, for remote APIs like this having commands return the resulting state is the only sane path; otherwise you get mandatory inefficiency (doubling latency) and race conditions.


CQRS doesn’t obviate race conditions. Especially if multiple commands can be in flight at the same time (the response can still be received out of order), or if there are multiple clients mutating the same resource.


> CQRS doesn’t obviate race conditions.

I think you may be talking at cross purposes with me here. I’m saying that command–query separation¹ causes race conditions. Having the mutating API return state is the only way to obtain the value of the resource at the time of mutation.

¹ CQS; CQRS, where you use different representations for command and query, is not necessarily applicable here.


Yes you’re right. I meant to write that abstinence from CQ[R]S doesn’t prevent race conditions.


Command and Query is really just named that in CQRS, which is specifically a pattern that splits read/write into separate traffic paths. It's not the only way to do things, and I find it usually isn't that great in practice.


It’s not bad practice. CQRS is just a pattern with pros and cons.


Googler, opinions are my own.

I work in payment and idempotency is always a discussion point. This draft links to one of our docs (though the draft points to a 404 URL, the correct link being[0]).

As the draft calls out, Google Standard Payments uses the "requestId" field as our idempotency key, which is part of the request body, rather than in the HTTP header. We did this specifically to decouple the protocol from the behavior of the request. We could take the existing definitions and put that payload into any other protocol and it would work without having to use that protocol's sidechannel data (aka, the HTTP header).

Our design makes idempotency part of the application layer, rather than of any middlebox or load balancer.

[0] https://developers.google.com/standard-payments/guides/conne...


A nice feature of keeping the idempotency key separate from the payload is that a service like Stripe can build tools to help users with idempotency even if the user has no idea what an idempotency key is.

For example, take a look at stripe-go's implementation, which automatically tags a request with a key if the user didn't specify one:

https://github.com/stripe/stripe-go/blob/67034d2205c0240ade9...

This works for all mutating requests, and is useful because the built-in retry system will automatically reuse the same key that was generated. Users can get the benefits of idempotency without really having to understand very well what's going on under the hood.
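
As a rough illustration of that client-side pattern (generate a key once if the caller didn't supply one, then reuse it across retries), here is a Python sketch rather than the actual stripe-go code; it assumes the requests package is available and the endpoint accepts the Idempotency-Key header:

    import uuid
    import requests

    def post_with_retries(url, payload, idempotency_key=None, max_attempts=3):
        # Generated once, before the first attempt, and reused on every retry.
        key = idempotency_key or str(uuid.uuid4())
        headers = {"Idempotency-Key": key}
        for attempt in range(max_attempts):
            try:
                resp = requests.post(url, json=payload, headers=headers, timeout=10)
                if resp.status_code < 500:
                    return resp          # success or a client error: stop retrying
            except requests.RequestException:
                if attempt == max_attempts - 1:
                    raise                # out of attempts: surface the network error
        return resp                      # last attempt returned a 5xx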

I suppose you could still do that by munging each request body, but IMO it's a nice feature to make sure that requests are the same as what the user specified. Also note that in practice the implementations are probably not that wildly different under the hood — despite being in a header, Stripe's idempotency is still being handled by the same application stack which processes the payment (i.e. not a middle box or load balancer).


The downside with “automagically” trying to handle idempotency is that users may not be aware of it and retries may happen across different processes (maybe they are running their application on k8s with multiple pods), which doesn’t work with Stripe's default behaviour.

IMO the idempotency key should be required to be set and make the user aware that they need to handle retries properly.


You might find differing philosophies depending on where you look, but a recurring theme you'll find with Stripe's is that they try to make it as easy as possible to get an integration up and running. When you're building out a payment integration, you're already awash in non-trivial concepts that are probably new and novel to you, so things that can be abstracted away for the time being to make things easier generally are.

In the situation you describe, I think it would make more sense to just retry the call a couple times from the same pod so you don't have the sizable overhead of discarding it and creating a new one for every failure, in which case the automatic keys would work fine. And if there's a really good reason they're not, setting the keys manually is very easy. At some point if you're far enough off the beaten path, you have to expect to read some docs.


Strongly agree that idempotency should be understood as part of the application, and not as part of the transport.

I find it insightful to think about a transport layer that is much less reliable than the internet: traditional/snail mail. In a proper business communication, every invoice has an invoice number such that the recipient can know whether they processed that invoice previously or not.


An invoice number is not the same as an idempotency-key. The idempotency-key is created by the recipient, sent to the server, and an invoice number is returned. The invoice number is clearly part of the application layer since it's domain specific, but the idempotency-key is a generalized concept that prevents duplicate actions from occurring when there is no other key.


These are 2 different kinds of idempotency:

1. request (also command/mutation) idempotency (using Idempotency-Key in HTTP, or clientMutationId in GraphQL/Relay)

2. entity idempotency (using invoice_id)

(1) is the more generic one and can be applied to any mutation with a side-effect, for example sending an email. It can also work with server-side generated IDs. It can be implemented as a generic middleware via cache or DB with TTL. That's what Stripe is doing.

(2) is more application-specific, for mutations actually changing a state (usually a DB row/document, or an entity managed by another service). Here the entity id should have a uniqueness constraint in the DB, and you should use either a semantic key (e.g. customer name) or a client-side generated ID such as a UUID.

I use a combination of both, to reduce the load on the DB while still having the same guarantees as using the DB only.


Like a PO (purchase order) number?


I've worked in fintech as well. All of my full-time jobs have dealt with both idempotency and API error code issues, across trading fintech as well as consumer social apps. App-level error codes, which include idempotency, seem to be best handled in app request bodies, not the enveloping protocol layer.

In trading fintech there is the FIX API [0] which has been around for a while. It has a "Client Order Id" field which gets mapped to an (Exchange) "Order Id" field.

I've also worked on the Twitter apps. The API spec seems fairly clean [1], but at least when I was there some time ago, there were edge cases that both the HTTP error and app error codes had to be combined to find the true error in order to take the correct action and show an appropriate error message to the user.

Idempotency is a very app-specific error condition. So I'm definitely in the group of developers that would not want to see nor code for app errors at a transport like HTTP level. I feel that the header idea and spec has good intentions, but does not take into consideration the developer experience as an API grows more complex. It also forces changes at the app messaging level if one wants to support different transport types (UDP unicast or multicast, gRPC [2], etc.).

[0] https://en.wikipedia.org/wiki/Financial_Information_eXchange [1] https://developer.twitter.com/en/support/twitter-api/error-t... [2] https://grpc.github.io/grpc/core/md_doc_statuscodes.html


Very similar to the approach AWS takes with ‘clientToken’ being an API parameter, not a transport/protocol header, for operations that need it. If clients don't specify a value, the SDK will populate the idempotency token and handle retries etc. transparently. The approach is that API/SDK/client semantics should be similar across services, but very few users care about actual transport or “on the wire” serialization.

Disclaimer: principal at AWS, but just general observations on this topic.


That’s dead on what I’ve been doing on my projects for some time. I also track it with a correlation id once it reaches my network so that I can put all logging and context into ELK for my aggregated logging. I can then see everything that that request did in my entire system.


Interesting and I totally get your point. What protocols do you support besides HTTP?


There is the possibility of grpc, but there hasn't been a case for it yet (hence why none of our docs even mention it). Google also loves talking grpc/stubby internally, so any Google internal application to application calls would use that.


Pardon my ignorance but why does it matter that you're using RPC/graphQL or REST based service, isn't the underlying protocol still HTTP? So this spec would still be applicable, right?


AFAIK, in Google's case the end application doesn't really see the "GRPC HTTP headers" but instead they convert an incoming HTTP request to one of their frontends and route it to multiple backends (sending effectively the HTTP body serialized via GRPC), and then the frontends will simply re-create the response by unifying them.

Not a Googler, this is what I understood from reading similar things over multiple posts. Feel free to correct me.


The underlying protocol is not necessarily HTTP.


I think it is necessarily HTTP/2, but would be happy to learn I'm mistaken - it's not something I'm overly confident about.

https://developers.googleblog.com/2015/02/introducing-grpc-n...


gRPC uses HTTP/2, yes, so headers are a concept there, but imagine taking it one step further, for instance using WebSockets. There is no concept of headers within an individual WebSocket message, so it would need to be encapsulated in an application-specific way anyway.


Probably all kinds of RPC


Making idempotency a protocol-level thing would be problematic with the many connection-breaking systems, even if the browser-ingress channel is TLS encrypted. You'd have the ingress proxy, maybe some IDS/IPS, maybe Cloudflare, maybe some router or gateway, some service proxy, who knows. If any of them decide to drop any headers, rewrite headers, or mis-parse them you'd lose the idempotency completely.

It does however make a difference if not the body, but the protocol-level request is supposed to be the idempotent one, but I don't know in what kind of a scenario that would apply.


Thanks for the insightful comment; makes good sense.

Could you kindly share your view on the security implications of this approach - in particular with reference to the security context outlined in section 5 of the IETF draft:

> For idempotent request handling, the resources MAY make use of the value in the idempotency key to look up a cache or a persistent store for duplicate requests matching the key. If the resource does not validate the value of the idempotency key prior to performing such a lookup, it MAY lead to various forms of security attacks and compromise.


Yes, that's the case if you use it for unauthenticated users or public API calls.

For that reason I do not support idempotent retries for public API calls.

For authenticated users, you cache/store the idempotency keys scoped by the currently authenticated user, e.g.:

  {user_id, idempotency_key}
You can add more parts to the key for better security, e.g.:

  {user_id, api_call_name, idempotency_key}
or even

  {user_id, api_call_name, input_args_hash,  idempotency_key}

etc.
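
For illustration, a tiny Python helper that builds such a scoped key (the field order and the hashing of the input args are assumptions on my part, not a prescribed format):

    import hashlib
    import json

    def scoped_key(user_id, api_call_name, input_args, idempotency_key):
        # {user_id, api_call_name, input_args_hash, idempotency_key}
        input_args_hash = hashlib.sha256(
            json.dumps(input_args, sort_keys=True).encode()
        ).hexdigest()
        return f"{user_id}:{api_call_name}:{input_args_hash}:{idempotency_key}"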


It's more appropriate IMO to name it something like "Request-Identifier". All the client should indicate is that if you get 2 or more POSTs with that same Id, you are free to no-op on any after the 1st successfully handled one.

Idempotency is an emergent property of a system so it doesn't make sense that a client would dictate the key that's used to provide that behavior. So if the Idempotency-Key is quite likely not to be the actual "Idempotency-Key" in any sufficiently complex system, then don't name it that.


It matters that the client knows the server is offering idempotency, because then the client can safely do things like retry a request if it didn't get confirmation -- at least I think this makes that possible? It is surprising to me that the standard doesn't mention it though, so maybe I'm missing the boat?

If a client just knows it provided a "request-id", but not that the server provides idempotent semantics, the client cannot safely retry a request for which confirmation was not received.

Or, you know how when you click the back button and see that message "this was the result of a POST request, but we can't actually show you the response" -- because the response wasn't cacheable/cached, and the client can't safely re-do the request to get a new response, because the request was not idempotent.

The client knowing the request is idempotent avoids all sorts of inconvenient (and sometimes confusing) client/user-agent behavior. For both ordinary human-readable HTML, and APIs (although by being an HTTP header, I guess this standard is only about APIs as a use case?). Just knowing "we supplied a request-id, we don't know what the server plans to do with it" is not sufficient. This is why HTTP distinguishes between methods that are idempotent and not -- which just means methods that it's, by the standard, the server's obligation to implement as idempotent. This is a way of making any method opt-in idempotent.

I think?


That’ll be a problem regardless – if the client provides an idempotency-key but the server doesn’t use it, it’ll reprocess on retry. Under this spec, the client needs to check server docs to figure out if/when/how idempotency keys are used.

The weird part of calling them idempotency keys is that separate requests will often fall in the same idempotent group.

EG: If traffic lights were controlled over HTTP, a pedestrian pressing the crossing button 20 times in a row would create 20 distinct requests which would need 20 distinct idempotency keys. But pressing the button once is the same as pressing it 20 times.


What I meant is, I can imagine a client tool where, when an idempotency-key is given by the caller of the tool, the tool can treat the request as idempotent. Triggered by the presence of a supplied idempotency-key.

The caller still needs to correctly supply (or not) the idempotency key, and supply a correct idempotency key, along with all other attributes of the request: http method, url, other headers, body. all of this needs to be constructed according to the documented behavior of the server, true.

But just like if the caller says "make a GET request" the client tool knows it can be treated idempotently (automatically retried on ack failure for instance), if the caller says "make a POST with an idempotency-key", same. If the caller says "make a POST with request-id" where that is intentionally defined with no guarantee of idempotency, then not necessarily, I think?


> Under this spec, the client needs to check server docs to figure out if/when/how idempotency keys are used.

The client needs to check the docs for other aspects of the API as well (such as the right HTTP verb, other required headers, etc)


Yes, I have seen idempotency keys generated on the client that were composed of the date and amount of money used to make a payment. Don’t do this. Just use a randomly generated request ID when the request hits the API. The idempotency key is (usually) there to prevent things like an RPC or event (in the backend) being retried causing duplicate side effects. Presumably you still want users to be able to intentionally submit two payments for the same amount on the same date (CSRF tokens are more appropriate to prevent accidental duplicate submission).


I want both. If there are two network requests for the same action, I want to be able to trace them distinctly and also know that they are doing one thing.


Exactly this. Multiple requests with the same idempotency key are still different requests, and you want to be able to track them independently.

Also worth noting that a header named 'Request-Id' is already a widespread convention. You'll see one back from many popular APIs like AWS or Stripe, and you definitely want a name for the idempotency header that differentiates it from that.


They are 2 distinct concepts for sure, but if you also need to track each request independently then it would seem more robust to generate a separate Id for that from within the scope of your system. I don't think you would want to rely on a client to differentiate "physical" requests for you.

Maybe "Logical-Request-Identifier" and "Physical-Request-Identifier"?


On the surface this sounds about right, but how do we ensure that we only create one of these? Every layer will want to create a new id, since it can't trust that the id given to it is 'new'. Having the actual client generate both is the simplest.


Providing 'idempotency' in this case commonly means to be robust against transient network partitions and timing delays. This goal can be met efficiently with a timebound nonce. (Indeed that is a common approach).

A request identifier is a much stronger and user visible property to your API. Where an 'idempotency' key is used simply to prevent replays, a request identifier is a property that a backend needs to store, index and serve queries against for a much longer period of time. Furthermore this data must be client generated by definition which would require some namespacing mechanism to keep the data model logically correct. Alternatively, just give me a random number (as this rfc suggests).


I find the definition of idempotencyvalue in the grammar surprising: it looks like RFC 7230 quoted-string, but is actually just anything except for control characters, whitespace (!) and quotes, surrounded by quotes.

Then I remember that the ETag header does the same thing since RFC 7232, where it used to use quoted-string with its escaping behaviour in RFC 2616. (And so they recommend avoiding backslashes because some recipients may mangle it.)

Anyone have any idea why they’ve gone this way (ETag and Idempotency-Key), rather than using quoted-string?

I will also note that people are already using a header named Idempotency-Key without quotes, e.g. https://stripe.com/docs/api/idempotent_requests. ETag has the “W/” indicator which requires some sort of disambiguation, but at least at this time I don’t feel Idempotency-Key needs quoting.


Actually, for that matter, look through the examples listed of entities already using a header named Idempotency-Key, and all of them use unquoted values. Yandex Cloud and Worldpay say it must be a UUID (with their examples showing canonical textual representation), Adyen says it must not exceed 64 characters, the others don’t say anything about the value. All show UUIDs in canonical textual representation as their examples (well, Adyen actually prefixes it with UNIQUE-ID-, showing practically that it doesn’t have to be a UUID).


The “Concurrent Request” part concerns me. Just because the server thinks the original request is still going on doesn’t mean the client does.

Example:

Client issues a request with a 10 second timeout. Network conditions are such that the request doesn’t get fully sent until 9.9 seconds have passed (at which point network conditions have cleared and are fast). Client gets a timeout and immediately retries. From the client’s perspective, there’s no concurrent request.

What the server sees is a request followed 0.1 seconds by an identical request. It naturally says “I’m not done with the first, the second is a concurrent request” and sends the required error.

Net result is the server has applied the request but the client is left with an error. The client must now send a third identical request just to find out the actual results.

Ultimately, the problem boils down to the fact that the client and server do not necessarily have the same idea of what requests are currently in-flight. And yet this draft declares that the server MUST return an error if it thinks the request is still being processed.


You are right that this is a scenario. I imagine that a permissible solution could be to effectively join the requests together by waiting for the (original) request to be finished.


That seems like the obvious behavior, but this draft says the server MUST return a conflict error, which means this behavior is not permissible.


I'm not sure I understand why this needs to be standardized. Is there something that middleboxes and CDNs need to do with messages marked idempotent? Would a standard header enable browsers to do something new that they're all likely to actually do? Otherwise: HTTP already specifies a bag-of-attributes header system, and this is just an application of it. Why is it in the HTTP standards?


Seems like a tail wag. The header is in common use in APIs in the wild, so IETF wants to formalize it.

Not unreasonable, but doesn't seem to add much value either.

With standardization, load balancers or reverse proxies could add a limited idempotency guarantee to reduce the load/implementation cost on the application layer. I don't think anyone is asking for this though, nor do I think they should be...but maybe I'm wrong about that. "Free (quasi) idempotency for small environments" could be a feature.


A real-world example from my previous job. You have a chat in your mobile app. Messages are sent using HTTP requests, and received using HTTP long polling. Now imagine sending a message over a slow network while on a departing subway train (service is only at stations). TCP & SSL are established successfully, then the request itself is sent, but you leave the network coverage before you receive the response. So you have no idea whether the message was actually sent. When you reconnect at the next station, you receive a message over long poll, but — because the request failed as far as you're concerned — you have no idea whether it's the message you sent. To resolve this, you need to have the ability to ask the server for a response to a previous request, but without actually performing that request a second time. Or, alternatively in this particular case, you need the server to accept a unique client-generated ID in the request and return it to you over long poll.


I think you're misunderstanding my question. I understand why idempotency tokens are valuable. What's not clear to me is why HTTP would need an "official" idempotency token. The behavior of APIs that run over HTTP is not properly part of the HTTP specification.


Maybe to make it easier for people to understand the concept - and reinforce a good design pattern? I implemented this stuff my own way whenever I needed it (e.g., for a chatbot), and every time i needed to take the time to explain it to the client app dev teams. Sometimes it took longer than desired, and pointing them to a well written RFC would help.


This is a little like saying there should be an RFC for understanding and defending against SQL injection, isn't it? Like, there are a lot of concepts that would be well-served by an "official document", but that's not the purpose of IETF RFCs.


That's one valid use of an idempotency token. Also e.g. POST submission of an order or transaction.

But these are implementation-specific, and available (and in active use) today under a few common HTTP header names.

It's not clear why the header name should be standardized.

You could move idempotency implementation up to the load balancer or the proxy layer. I'm not convinced that would be useful, because to implement the hard parts of idempotency (esp across instances), you'd need to add a bunch of extra complexity there, which is already a committed cost at the application layer.

But if, say, haproxy or nginx release a new distributed idempotency caching feature, maybe that would change my mind.


All that layer really needs to do is support caching objects. When a response is returned, cache it based on the idempotency cache key, and on subsequent requests, look up that value before making downstream requests.

Also, you should extend the key’s value with an unguessable session id, so folks can’t guess the key and read someone else’s data.


Yep, but that idempotency caching layer would need to share keys across all instances, possibly even geographically.

This is a solved problem at the application layer of course, but adding complexity at the higher (LB or proxy) layer ... well, adds complexity there and the benefit is not clear to me.


Besides providing a standard set of behaviors so people don’t accidentally mishandle idempotency in their hand-rolled attempt, it also allows networking frameworks to build in support for this in a generic fashion (if there’s any particular behavior they want to do around idempotency). This isn’t for browsers doing browser things, it’s for API clients calling APIs.


This is potentially of interest to any application making use of POST. Its generality and simplicity justifies standardisation.


The concept is worth knowing about, but that doesn’t mean it belongs in the standard, especially since there’s more than one way to do it.

I’m wondering if someone knows a particular reason why this way needed to be standardized.


"Providing one solid way to do it to act as a co-ordination point" seems like a reason that could at least be argued, but without asking the authors I have no more insight than that, sorry.


I see an interesting application for the Validity, Expiry, Enforcement, Fingerprint sections.

Many web apps generate HTML forms but submit form values via an API endpoint. With browser web tools and standard options like "copy-as-curl", it's very easy to run a "replay attack", where an adversary submits the form manually once, does a copy-as-curl (or similar) on that POST, then replays the POST repeatedly, updating particular values in the generated curl output for each subsequent POST.

This gets around form protections like captcha and allows for rapid abuse via the API that backs the form. Enforcing that an idempotency key/token be generated on the server side or that it match the submitted content or be valid for a short timespan could make replay attacks less convenient.


This is not related to replay. This is just a rate limiting issue. Throwing arbitrary auth mechanisms between the server and the client like that just means they need to make more requests per fulfilled request.

The client is authorised to make the request – it would only be a replay attack if the client were not authorised to make the request and that limitation was bypassed by capturing and then replaying it.

If you can replay the same captcha challenge over and over, your captcha is broken.


With that scheme, I imagine the server would need to maintain a per-client timestamp of when a token was last generated for them, so as not to generate them too often.

But at that point, why not use that timestamp directly, to rate-limit each client's form submissions? What's the benefit of also issuing tokens? (Do you mean to thwart copy-as-curl script kiddies? Parsing a token out of a previous message to add it to a new one is a pretty low bar.)


> Do you mean to thwart copy-as-curl script kiddies?

Yes, that's what I had in mind.


Is this protection not already baked into the CSRF token?


No, a CSRF token only introduces information that is not part of the ambient authority of the HTTP client (i.e. an attacker cannot implicitly use it) – it is not uncommon to have deterministic or relatively constant CSRF tokens (e.g. the double-submit cookie method). A nonce would fulfil your criteria but is importantly not a valid CSRF token unless it is also correlated with or derived from something the attacker can’t know, like a session; otherwise the attacker can use their own nonce to submit a victim request.


That and also, most captchas I've used so far require server-side validation that is not replayable with curl (as the token has been used during the first submission).


Yes, but captcha is on the form. I'm thinking more about a single-page app that submits form data via an API. That API call can generally be replayed.


“It does not matter if the operation is called only once, or 10s of times over. The result SHOULD be the same.”

The result MUST be the same.


The context is a paragraph describing a mathematical concept, not specifying the protocol. Any standardese imperative (SHOULD/MUST/...) is out of place here. It should read "The result is the same."


I think that's too strong: the response may have a component that is non-deterministic. The important thing is that no matter how many times you call it, the server state is the same.


Agreed. The state may have been mutated by another request since the original success (or simply changed as a result of some other async process), and the response may simply be a rendering of the latest server state.

The response of the server absolutely should not be specified as being identical, since that's not how idempotent requests work. At that point, you're really just specifying a caching key for verbs that have side effects.


Yeah, it's more about there being no side effects after the first time than asserting anything about the actual response


If you distinguish the ‘response’ from the result, then I agree. Typically I think of the result as denoting the effect, and not the response message.


Request 1:

   Status: OK
   Modified: True

Request 2:

   Status: OK
   Modified: False

Request 3:

   Status: EAGAIN
   Modified: False

Request 4:

   Status: TOOFAST
   Modified: False

Request 5:

   Status: EACCESS
   Description: Your OAuth token has expired.
   Modified: False


> The result MUST be the same.

Transient errors happen, as well as loss of state errors, and a good protocol allows for that fact.


I think this is due to side effects which are not “undoable”, like sending an e-mail. In case the timeout happens when the response is sent, an email might already have been sent. If the request is sent again, another email is sent. I think that’s why the RFC says “result” and not “state”.


Then it is a fundamentally non-idempotent operation?

In the case of an email, you could use the idempotency key in the account's outbox so you can detect whether you’ve already sent the message and then no-op if that is the case, and you’re idempotent.

The response must reflect something useful, which is probably going to be different in those two cases.


Yes, writing status to log would be a better example; you might want to log retries.


I suspect it's meant to include behaviours like this: https://news.ycombinator.com/item?id=27730372


I'm glad for this much-needed change. I hope it also makes it to frameworks and libraries such as JBoss and ExpressJS.

Idempotency is simple to understand but tricky to implement in practice because of a bunch of edge cases: handling of business failures, the response (should it be the first successful response?), how to document the behaviour without confusing developers, and so on. So developers either don't consider it at all or, when they do, they forget to consider all these aspects.

FWIW we, as a team, spent quite some time debating and going back and forth; I'm happy with the final behaviour we settled on though the documentation could have been better[1]

[1] https://developer.uber.com/docs/payments/glossary


I don’t get it.

For a particular request, a server must implement its documented idempotency guarantee, and the client needs to understand the server’s guarantee to know whether it’s safe to retry.

How does this header help or change anything?

Its only real use (that I see, please correct my misunderstanding if there is one) is to paper over an incorrect API design, where you want idempotent behavior but can have identical payloads for distinct requests.

(e.g., imagine a chat app, where the message being sent is the POST request body, and there’s no other metadata about the message. In that case, there’s no way to distinguish a new “you up?” message from a retry. However, such an API should simply have a message id value sent with each message.)


It's interesting that this standardization effort hasn't happened earlier, because it seems such an obvious feature addition to POST requests. HTTP is decades old after all, and nobody thought about this prior to the payment processors? I guess nobody really cares if you have two posts in a forum with the same content, but if you do two payments it's way worse. After all, double spend prevention is what bitcoin does its hugely expensive proof of work for.


> If there is an attempt to reuse an idempotency key with a different request payload, the resource server MUST reply with a HTTP "422"

MUST seems too strong here to me. For some applications it's probably sufficient to treat this case the same as if the payload was the same, and it requires the server to persist either the original payload, or a hash thereof. And what if the payloads differ only by a timestamp, or a randomly generated id, etc.?
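
To be concrete about what that persistence entails, here is a rough Python sketch of the check using only a hash of the first payload (the names and return values are illustrative, not from the draft):

    import hashlib

    _seen = {}  # idempotency_key -> fingerprint of the first payload seen with it

    def classify(idempotency_key, payload):
        fingerprint = hashlib.sha256(payload).hexdigest()
        previous = _seen.get(idempotency_key)
        if previous is None:
            _seen[idempotency_key] = fingerprint
            return "new"     # first use: process the request
        if previous != fingerprint:
            return "422"     # same key, different payload: reject per the draft
        return "replay"      # same key, same payload: return the saved result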


This (partly) solves some cases of the two generals problem, where you didn't receive a response to e.g. a resource creation request and aren't sure whether that means you can safely retry it, or need to use other APIs to figure out the ID of the created resource, if it exists.

Some AWS APIs have it, eg https://docs.aws.amazon.com/medialive/latest/apireference/in... - I wish all their creation APIs did


This seems to be missing or at least conflating the meaning of idempotence.

The draft correctly identifies idempotence as a "property of certain operations", but what the draft seems to describe is locking and caching.

Locking and caching are important to building idempotent operations, but shouldn't be conflated with idempotence, which is a conceptual property of the operation.

I like what the draft describes, but it would be better done in the correct context.


> Let's say a client of an HTTP API wants to create (or update) a resource using a "POST" method. Since "POST" is NOT an idempotent method (…)

Use PUT instead? The deterministic key that would go in the header goes as part of the URI. I’m not seeing why we should standardize a roundabout way of doing the same thing. Am I missing something?


Yes, that was my thought too. The HTTP verb PUT is specifically made for idempotency; why would HTTP need a header field for that?


Because the use case isn't necessarily modifying a resource. Charging somebody for a product, for example, doesn't have natural idempotency. They could buy the same item twice.


Isn't that a modeling problem? Let's say you don't pay for an item but for a basket even if it has only one item. If each specific basket has its own unique id then you have a unique resource that can have a paid or not yet paid status, and use HTTP verbs appropriately against it (e.g., POST to add an item to the basket, or PUT to pay for the basket). Another basket, even with the same item(s) added to it, would have another id. Problem solved.


For the case of this header, you're interacting with third parties that don't speak your application domain.

That "unique id" you mention is exactly the use case for the idempotency key this RFC proposes. You can model your unique basket identifier in your application domain, and use that identifier as the idempotency key in interactions with third parties.

This is useful mostly because networks and applications are not necessarily reliable.


The semantics don’t quite work out. If you don’t know the ID of a transaction, for example, using PUT with a client-derived key wouldn’t make sense, given that a subsequent GET would be made against the transaction ID, not the idempotency key.


An alternative is for the client to request an ID; another is for the server to respond to the PUT with a 307 Temporary Redirect to GET the resource, so idempotency is transparent to the client.


Why don't they generate the key from the payload if it has to be unique and request specific?


Maybe because the client should define which payload it wants to treat as idempotent. A request with the same message might have the same payload hash but be sent on purpose twice.


That’s not it… this RFC requires that the server know how to distinguish two requests with different payloads but the same idempotency-key value (and return a 422 response in that case).

Not to mention, it makes little sense for a server to be able to implement idempotency without understanding the idempotency semantics of the payloads it accepts.


Your parent is correct. Think of a simplified example of posting a $1 charge to a customer.

    POST /v1/charges `{amount: 100, currency: usd, customer: 123}`
You might do that, and then later charge them another $1:

    POST /v1/charges `{amount: 100, currency: usd, customer: 123}`
The request payload would be identical, so the API can't use just that to distinguish requests. Adding an `Idempotency-Key` lets you tell the server that these are purposely two independent charges.

> this RFC requires that the server know how to distinguish two requests with different payloads but the same idempotency-key value (and return a 422 response in that case).

This is an error checking facility to make sure that the idempotency keys are being used correctly. Sending a different request with the same idempotency key is nonsensical, so the server tells its client about the problem.


Sure, but there’s no reason or value in a standard “idempotency-key”.

The API would simply have a requestId, or depositId or whatever, and place it in the header or body or whatever makes sense to the API.

> This is an error checking facility to make sure that the idempotency keys are being used correctly.

How circular. Idempotency keys are useful to make sure we’re using idempotency keys correctly.


You might have two identical requests, say for two identical credit card purchases, or you might have faulty retry on the same request.


Because only the caller knows if two payloads, despite being identical, are genuinely different requests or retries of the same request.

That said, in this age of micro services and large-scale distributed systems a request may get retried by a framework layer down a call-chain regardless of whether the request originator intended it or not.


Question: this says a duplicate request MUST respond with the previous result on retry. Doesn’t this run the risk of returning bad data to the client in the event that the API result includes the “current state” of some resource? And by that I mean if I call this API to make a change and get back the current state of the resource, and I retry my request after some other change has occurred, since this must return the previous result it’s going to return something other than the current resource state.

Having said that, it’s not actually even clear if this is what it mandates, or if it really just means it must return the same HTTP status code as the original request. It says

> The resource server MUST respond with the result of the previously completed operation, success or an error.

It’s that “success or an error” bit. That could be read to mean “if the original result was success, the new one is success, otherwise the new one is the same error as the original”. But if you ignore that bit then the obvious interpretation of the sentence is the entire result payload must be the same. Though storing the data necessary to return the original payload could be prohibitively expensive if the payload includes some large regularly-modified resource.

I also wonder why this mandates returning the same error if the original request failed. If it failed with a transient error, why not allow the retry to attempt the request again? It would make sense if the draft said it MAY do this, but requiring it to preserve and return transient errors does not seem helpful.

And now that I’m on a roll, I also wonder about scenarios where the retry should error, such as authentication issues, but the original succeeded. If I send a request that succeeds and then my auth token gets invalidated, when I retry that request should it succeed again because of idempotency, or should it fail because of an auth error? I can see arguments both ways (it should succeed so the client knows its previous attempt worked, or it should fail because otherwise the client thinks its auth is valid).

And finally, I feel like a retried request that had previously been fully processed by the server should include a new response header indicating whether this is a cached response. This would help the client to make more intelligent decisions about the response (i.e. whether to believe it represents current state, or whether it may be outdated). I’m tempted to say it should include the Date value that was returned in the original response in this new header, so the client can see not only that it was a cached response but also how old it is.


I think you have misunderstood slightly.

GET, for example, is an idempotent method [1], which means "the intended effect on the server of multiple identical requests with that method is the same as the effect for a single such request."

However, it is NOT true that sending the same GET request at different points in time would have the same effect - if a resource is deleted or moved, you might get a 404, 301 or 200 where previously you got something else.

When idempotence is talked about in these RFCs there's an implied caveat that the "multiple requests" they talk about happen at the same time, under the same network conditions. I think that's what they mean by "identical".

[1] https://datatracker.ietf.org/doc/html/rfc7231#section-4.2.2


GET is idempotent because it doesn’t modify data, not because multiple GET requests will return the same response. I could make an HTTP endpoint that returns random data on every request and issuing a GET to that endpoint is still considered idempotent. It’s the effect on the server that defines this property, not the effect on the client.

And no, the “multiple requests” do not happen at the same time; this draft even explicitly says that concurrent duplicates MUST produce a conflict error. The assumption is the multiple requests occur in a small window of time, but the size of that window is entirely up to the server. And even a window of time measured in seconds could still allow the data to have been changed before the request is retried.

For example, let’s consider a Twitter-like service. When you want to Like a post, that’s going to be a POST request. And it’s reasonable for the response to tell you the new Like count for the post. If you go to Like a popular post, and the request is retried 10 seconds later due to network timeout, that post could have gotten dozens of Likes in that time span. Surely the server should not be required to tell the client what the Like count was 10 seconds ago instead of the current count?


Please note that an idempotency key is not specific to just HTTP requests - we may have to implement this for async use cases as well, e.g. event streams & messages.


How are these RFCs written? I love the formatting; is there a toolchain around these?



Seems like kind of a wordy name, why not just "nonce"?


Not recommended to use "nonce" casually in the UK. Urban Dictionary will tell you why.


IMO that ship has sailed given the word has been extensively used in software contexts for some time.


It's still somewhat a specialised word, even in software contexts. Or at least, UK people prefer not to use it.


I think nonces are generally thought of as numbers, and as a client it might be simplest in some cases to be able to reuse or derive from an existing non-number key.



It probably makes some sense to reserve "nonce" for security/cryptographic use.


It's so funny to watch REST get "patched" with headers rather than just updating the protocol. Path of least resistance, I guess.


“ Idempotence is the property of certain operations in mathematics and computer science whereby they can be applied multiple times without changing the result beyond the initial application. It does not matter if the operation is called only once, or 10s of times over. The result SHOULD be the same.”

Does not the use of “SHOULD” instead of “MUST” make this dead on arrival?

Edit: I see a number of others have the same thoughts.



