
Idempotence: What is it and why should I care? - scottietom
http://cloudingmine.com/idempotence-what-is-it-and-why-should-i-care/
======
ageitgey
This is vital if you are designing APIs or clients that deal with charging a
user money. It should be literally impossible for a user to accidentally get
charged twice due to a flaky connection if you design correctly.

The trick is to have the client generate a random 'idempotency key' (a UUID)
to start each logical transaction and have the server use that ID to prevent
double charges of the same transaction. By always passing that key, the client
can request that the payment be processed 100 times with no fear of it being
processed more than once.
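
A minimal sketch of the server side of that idea, assuming a toy in-memory
store. `PaymentServer` and `charge` are hypothetical names, not Stripe's API,
and a real server would persist the keys durably and guard against concurrent
requests carrying the same key.

```python
import uuid

class PaymentServer:
    """Toy in-memory server (hypothetical); real systems persist keys durably."""

    def __init__(self):
        self._results = {}   # idempotency key -> stored result
        self.charges = []    # charges that were actually performed

    def charge(self, idempotency_key, user, amount_cents):
        # A replayed key returns the stored result instead of charging again.
        if idempotency_key in self._results:
            return self._results[idempotency_key]
        self.charges.append((user, amount_cents))
        result = {"status": "charged", "user": user, "amount_cents": amount_cents}
        self._results[idempotency_key] = result
        return result

server = PaymentServer()
key = str(uuid.uuid4())      # generated once per logical transaction
for _ in range(100):         # simulate 100 retries over a flaky connection
    server.charge(key, "alice", 500)
```

The key has to be created once per logical transaction and reused on every
retry; generating a fresh key per attempt would defeat the check.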

This Stripe blog post has as good a description as any:
[https://stripe.com/blog/idempotency](https://stripe.com/blog/idempotency)

~~~
rectangletangle
This is solid advice. Another common trick is to disable the submit event when
the submit button is first clicked, so a double click cannot fire two
requests, and to re-enable it if the request fails. Ideally this is done in
addition to server-side nonce validation, not as the only preventative
measure, because browser differences or network issues could still cause a
double request (more likely if the HTTP GET method is used instead of a proper
POST). Either way, server-side and client-side techniques should be used
together to create a better UX, much like validating email formats on both the
client and the server.

~~~
digitaLandscape
I don't like this because in practice sites usually fail to re-enable the
submit button if something goes wrong. Just let the user submit multiple
requests if they want to retry, don't take that away from them, just make it
harmless.

~~~
jen729w
Newbie programmer here, but even I am already implementing finite state
machines which handily fix this sort of issue.

Any sort of UI without them now feels archaic.

~~~
hood_syntax
Agree 100%. If you don't have explicit state transitions in a UI, you're
asking for trouble. I think some people don't realize how much harder they
make it on themselves by not setting clear boundaries; they see the up front
cost and balk, when it saves a non-trivial amount of headache in the future,
not to mention reducing mental overhead when you've got the structure hashed
out.
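
A rough sketch of what explicit state transitions for a submit button could
look like. The state names and the `step` function are made up for
illustration and not tied to any UI framework.

```python
from enum import Enum, auto

class State(Enum):
    IDLE = auto()
    SUBMITTING = auto()
    DONE = auto()
    FAILED = auto()

# Allowed transitions: (current state, event) -> next state
TRANSITIONS = {
    (State.IDLE, "submit"): State.SUBMITTING,
    (State.SUBMITTING, "success"): State.DONE,
    (State.SUBMITTING, "failure"): State.FAILED,
    (State.FAILED, "submit"): State.SUBMITTING,   # retry stays possible
}

def step(state, event):
    # Events that are not listed for the current state are ignored, so a
    # second click while SUBMITTING cannot start a second request.
    return TRANSITIONS.get((state, event), state)
```

Because FAILED has an explicit "submit" transition, the retry path cannot be
forgotten the way a manually re-enabled button can.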

------
not_kurt_godel
I find that the easiest way to reason about idempotency is thinking about what
happens when a client incorrectly retries a request (that is, resubmits it
even though it was successful the first time). You should always strive to
make sure that the duplicate request is handled gracefully and without
negative consequences as much as possible. Requests that mutate state should,
as a general rule, accept the desired state as an explicit parameter - avoid
toggles, increments, decrements, 'go to next step' types of calls. Idempotency
tokens are another valid strategy, though they can be clunky to implement.
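
To make the contrast concrete, here is a small sketch with a hypothetical
`Thermostat` class (the device and method names are invented for the
example). The relative calls misbehave when a request is delivered twice; the
absolute ones do not.

```python
class Thermostat:
    """Hypothetical device, used only to contrast the two call styles."""

    def __init__(self):
        self.target_c = 20
        self.fan_on = False

    # Relative calls: a duplicate delivery changes the state a second time.
    def bump_target(self):
        self.target_c += 1

    def toggle_fan(self):
        self.fan_on = not self.fan_on

    # Absolute calls: the desired state is the parameter, so duplicates converge.
    def set_target(self, value_c):
        self.target_c = value_c

    def set_fan(self, on):
        self.fan_on = on
```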

~~~
drawkbox
There was a story about toggles in APIs on HN a few months ago [1][2].
Basically, the garage door was driven by a toggle call, which is clearly not
idempotent, rather than by an explicit action like open/close, so a repeated
request was not harmless. The repeated toggle action led to the door opening
and closing over and over. Non-idempotent requests are especially bad when
tied to real physical machinery.

[1]
[https://twitter.com/rombulow/status/990684453734203392](https://twitter.com/rombulow/status/990684453734203392)

[2]
[https://news.ycombinator.com/item?id=16964907](https://news.ycombinator.com/item?id=16964907)

------
empath75
I once wrote a Jenkins job that was supposed to create a subdirectory if it
didn’t already exist. Somehow I managed to write it so that every run went
another layer deep, adding one more nested subdirectory each time, and the job
was set to reuse the same workspace. After a few months, all the jobs on the
server started failing because the worker node ran out of inodes. And that was
my introduction to idempotence.
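
The original job isn't shown, so the buggy function below is only a guess at
the kind of mistake that produces this behaviour. The idempotent version
relies on `os.makedirs(..., exist_ok=True)` with a fixed target path; the
directory name "out" is an assumption.

```python
import os
import tempfile

def make_out_dir_buggy(workspace):
    # Hypothetical bug: descend into whatever earlier runs left behind,
    # then create one more level. Depth grows by one per run.
    path = workspace
    while os.path.isdir(os.path.join(path, "out")):
        path = os.path.join(path, "out")
    os.mkdir(os.path.join(path, "out"))

def make_out_dir(workspace):
    # Idempotent: the target path is fixed, and exist_ok makes reruns a no-op.
    os.makedirs(os.path.join(workspace, "out"), exist_ok=True)

def depth(workspace):
    # Count how many nested "out" directories exist under workspace.
    n, path = 0, os.path.join(workspace, "out")
    while os.path.isdir(path):
        n, path = n + 1, os.path.join(path, "out")
    return n

with tempfile.TemporaryDirectory() as ws:
    for _ in range(3):           # three "builds" reusing the same workspace
        make_out_dir(ws)
    depth_after_reruns = depth(ws)
```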

------
gotodengo
Reminds me of the Spotify Unicode username issue [1], where a function assumed
to be idempotent (think tolower(username)) actually wasn't for certain Unicode
inputs, allowing account takeovers.

[1] [https://labs.spotify.com/2013/06/18/creative-usernames/](https://labs.spotify.com/2013/06/18/creative-usernames/)
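
For a feel of how a canonicalization function can fail to be idempotent, here
is a sketch in Python. It is not Spotify's actual code: `canonical` is a
hypothetical function that lowercases first and then applies NFKC
normalization, which happens to show the same kind of behaviour on modifier
capital letters.

```python
import unicodedata

def canonical(name):
    # Hypothetical canonicalizer: lowercase first, then NFKC-normalize.
    return unicodedata.normalize("NFKC", name.lower())

def canonical_fixed(name):
    # Re-apply until the output stops changing, so the result is a fixed point.
    prev, cur = name, canonical(name)
    while cur != prev:
        prev, cur = cur, canonical(cur)
    return cur

# Modifier capital letters B I G B I R D (U+1D2E, U+1D35, U+1D33, ...)
tricky = "\u1d2e\u1d35\u1d33\u1d2e\u1d35\u1d3f\u1d30"
once = canonical(tricky)    # lower() leaves these alone, NFKC yields capitals
twice = canonical(once)     # the second pass lowercases the capitals
```

If account creation checks `once` for uniqueness while a later code path
applies the function again, two different registered names can collapse onto
the same account.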

~~~
tzs
spotify> For example it is hard to see the difference between Ω and Ω even
though one is obviously a Greek letter and the other is a unit for electrical
resistance and in unicode they indeed have different code points

This surprised me, because the correct Ohm symbol is in fact the Greek letter,
so why does Unicode have a special code point for it?

Unicode also does this for Kelvin, where the correct symbol is a capital K but
Unicode has a separate code point for it, and for ångström where the correct
symbol is a capital A with a circle above it but Unicode gives it a separate
code point.

They do not do this for Newtons (capital N), Joules (capital J), Watts
(capital W), or anything else I can see where the standard symbol is an
ordinary letter or group of letters.

In all three of these cases the Unicode Consortium recommends NOT using the
separate code point.

So...what's special about Ohms, Kelvins, and ångström that (1) gives them
their own place in Unicode, and (2) what is the point since we are not,
according to the Unicode Consortium, supposed to use them?

~~~
akira2501
> So...what's special about Ohms, Kelvins, and ångström

Nothing other than misguided thinking in the early versions of the standard.

The other problem with these special symbols is that if you call tolower() on
them, you get the lowercase form of the "normal" character they're based on,
and toupper() then maps that back to the normal character. So
toupper(tolower(char)) != char.
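
A quick way to see this, sketched in Python with `str.lower()` and
`str.upper()` standing in for tolower()/toupper():

```python
import unicodedata

KELVIN, OHM, ANGSTROM = "\u212a", "\u2126", "\u212b"

def round_trip(ch):
    # Lowercase, then uppercase again.
    return ch.lower().upper()

for sign in (KELVIN, OHM, ANGSTROM):
    # Each sign comes back as the ordinary letter, not the original code point.
    print(f"U+{ord(sign):04X} -> U+{ord(round_trip(sign)):04X}")

# NFC normalization also replaces the sign with the ordinary letter.
normalized_kelvin = unicodedata.normalize("NFC", KELVIN)
```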

~~~
r_c_a_d
Does tolower() or toupper() even make sense with general unicode characters? I
wouldn't expect it to... but I've never really thought about it before :-)

~~~
tialaramex
Mostly, we're used to defining tolower() and toupper() to return either a
lower or upper case variant if one exists, otherwise you get back what you put
in. For most Unicode codepoints no such variants exist and so you just get
back whatever you fed in. Some alphabets have uppercase/lowercase, but
obviously most writing systems don't do this.

However, lower(upper(X)) is not defined to be the same as lower(X), and
there's no promise that meddling with a string transforming with lower() or
upper() does what you hoped because that isn't how language actually works
(e.g. in English the case sometimes marks proper nouns so "May" is the Prime
Minister of the UK, but "may" is just an auxiliary verb).

Where standards tell you something is case-insensitive, but it's also allowed
to be Unicode rather than ASCII, you can and probably should "case crush" it
with tolower() and then never worry about this problem. In a few places you
have to be careful because a standard says something in particular is case-
insensitive, but not everything that goes in that slot is case-insensitive.
For example MIME content type names like "text/plain", "TEXT/PLAIN" and
"Text/Plain" are case-insensitive, but

    multipart/mixed; boundary="ABCDEFGHIJKL"
    multipart/mixed; boundary="abcdefghijkl"
    multipart/mixed; boundary="AbcDefGhiJkl"

... declare three different boundary tokens, and none of them matches the
sequence abCdeFghIjkL.

------
idempotent
A function f is idempotent if: f(f(x)) = f(x)

The "absolute value" function is idempotent: abs(abs(-42)) = abs(-42)

The "squared" function is not: sq(sq(7)) != sq(7)
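
The same check as a small Python sketch. `is_idempotent` is a hypothetical
helper that only tests the given samples, so it gives evidence rather than
proof.

```python
def is_idempotent(f, samples):
    """Check f(f(x)) == f(x) on the given samples (evidence, not proof)."""
    return all(f(f(x)) == f(x) for x in samples)

def sq(x):
    return x * x

samples = [-42, -1, 0, 1, 7]
abs_ok = is_idempotent(abs, samples)   # abs(abs(x)) == abs(x) for every sample
sq_ok = is_idempotent(sq, samples)     # fails, e.g. sq(sq(7)) is 2401, sq(7) is 49
```

Note that sq does satisfy the equation at 0 and 1, which is why sampling a
single value can be misleading.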

~~~
kazinator
This is about idempotent _operations_, which are basically state changes that
have the same effect if executed multiple times as if executed once:

    
    
      $ chmod a+x file
      $ chmod a+x file
    

This is linked to mathematical idempotence in the sense that an operation is a
function which takes some inputs i and the state of the world S and produces a
new state S':

S' = f(S, i).

So if f(f(S, i), i) = f(S, i), then _f_ is idempotent mathematically, and
the operation is idempotent in the software sense.
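
A sketch of the S' = f(S, i) framing, modelling the file mode as the state
and `chmod a+x` as a pure function on it. The function names are
illustrative, and `append_line` is included only as a non-idempotent
counter-example.

```python
def chmod_a_plus_x(mode, _input=None):
    # S' = f(S, i): set the execute bit for user, group, and other.
    return mode | 0o111

def append_line(lines, line):
    # Counter-example: each application changes the state again.
    return lines + [line]

S = 0o644
S1 = chmod_a_plus_x(S)       # state after running the command once
S2 = chmod_a_plus_x(S1)      # state after running it again
```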

~~~
rhacker
Posting this to drive people crazy :) (only kidding)

How about state changes as a side effect, with a near-meaningless result?

LockAccount('user')

The account is now locked.

LockAccount('user')

Error: The account was already locked!

However, the internal state of the user after the second operation remains the
same. We also cannot treat the result as something meaningful, the way these
math equations do, because the API is a wall in front of an opaque data
structure, which is often not the case in math. Similar to chmod, there might
be a low-level file API that throws an internal error because the file is
already executable, but the chmod wrapper hides that possible error. (It
probably doesn't have an error internally, but for argument's sake.)

~~~
not_kurt_godel
A more idempotency-friendly API would be:

    
    
        setAccountLockStatus('user', LockStatus.LOCKED)

~~~
morokhovets
Or maybe ensureAccountLocked('user')?
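
A sketch of the difference between the two styles, using a plain dict as a
stand-in for the account store. All names here are hypothetical.

```python
class AlreadyLocked(Exception):
    pass

accounts = {"user": {"locked": False}}

def lock_account(name):
    # The end state is the same on a repeat, but the caller sees an error.
    if accounts[name]["locked"]:
        raise AlreadyLocked(name)
    accounts[name]["locked"] = True

def ensure_account_locked(name):
    # Same end state and same (empty) result however often it is called.
    accounts[name]["locked"] = True

ensure_account_locked("user")
ensure_account_locked("user")    # a retry is indistinguishable from the first call
```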

------
drej
I work in data engineering and fixating on idempotence has been one of the
best things I've ever done. Now whenever we build a new job or we review an
existing one, the first question (well, second, first being 'do we actually
need this?') is usually 'is this idempotent?' Saves SO much hassle. Processes
fail, nodes disconnect, OOM kills stuff, these things happen on a daily basis
in larger systems, be ready for that.

~~~
SmirkingRevenge
Amen. And if you combine idempotent operations with atomic operations wherever
possible, it gets even better!
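
One common way to get both properties for a job that produces a file, as a
rough sketch: write to a temporary file and rename it over the target.
`write_output` is a hypothetical helper, and the atomicity of `os.replace`
assumes the temp file and the target are on the same filesystem.

```python
import os
import tempfile

def write_output(path, rows):
    # Write to a temp file in the target's directory, then rename it over the
    # target. Readers see either the old file or the complete new one, and a
    # rerun simply replaces the file with the same content.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            for row in rows:
                f.write(row + "\n")
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)   # do not leave partial output behind
        raise
```

If the process dies partway through, the target is untouched and the job can
simply be run again.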

------
CaliforniaKarl
One other reason to care: If using TLS 1.3, having an operation be idempotent
increases the chances that the client could safely use 0-RTT for that specific
operation.

------
pmarreck
One of the practical applications of this concept I've found is that I try to
write idempotent database migrations (so rerunning the migration, which is a
common necessity while you're developing it, but is also useful if problems
occur, won't error).

So in essence both the "up" and the "down" migrations are idempotent and warn
if they are not (and why).
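
A sketch of what that can look like, using Python's sqlite3 module with an
in-memory database. The table and column names are made up, and the "down"
step drops the whole table only to keep the example short.

```python
import sqlite3

def column_exists(conn, table, column):
    # PRAGMA table_info returns one row per column; index 1 is the name.
    rows = conn.execute(f"PRAGMA table_info({table})")
    return any(row[1] == column for row in rows)

def migrate_up(conn):
    conn.execute("CREATE TABLE IF NOT EXISTS users (id INTEGER PRIMARY KEY, name TEXT)")
    if column_exists(conn, "users", "email"):
        print("warning: users.email already exists, skipping")
    else:
        conn.execute("ALTER TABLE users ADD COLUMN email TEXT")

def migrate_down(conn):
    conn.execute("DROP TABLE IF EXISTS users")

conn = sqlite3.connect(":memory:")
migrate_up(conn)
migrate_up(conn)   # rerun: warns instead of raising an error
```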

~~~
flukus
Database migrations are inherently stateful, so I'm not a fan of idempotence
here; it can leave the schema in some arbitrary state. I much prefer tools
like Flyway ([https://flywaydb.org/](https://flywaydb.org/)) that are more
deterministic: each migration will only be run once, so you're going from known
state to known state.

~~~
pmarreck
I've had situations where a migration that ran locally just fine, failed in
staging and/or production (config-level stuff running into hosted DB access
rights, etc.). Those situations are a mess to untangle without idempotent
migrations.

Also, minor niggle but Flyway isn't database-agnostic, you'd have to use the
SQL of whatever DB you happen to be using (although if you code in Java I
guess you could use ORM commands)

------
mbushey
Here's a great blog on the subject: What is Idempotence?
[https://www.sendthemtomir.com/blog/what-is-idempotence](https://www.sendthemtomir.com/blog/what-is-idempotence)

------
JauntyHatAngle
Interesting, never thought about applying idempotency to the more traditional
programming areas like web dev, although I would like to think I develop my
apps like this even without thinking about it in terms of idempotency.

For those who might not be in the know, this is a crucial concept for IaC
through tools like Puppet/Salt/Ansible etc, as it allows you to think/program
infrastructure configuration as a state rather than scripting everything and
having to take account of all the minute states that may exist on a legacy or
well entrenched system.

------
hnruss
I often combine idempotence with the command-query separation principle (CQS)
by making it so that queries are idempotent. This tends to encourage simple,
predictable code.

------
mlajszczak
Idempotence can be very helpful when one strives for resiliency.

Suppose you have an application that processes tasks that (among other things)
call an external API. Both the external API call and the task processing can
fail independently. Moreover, task processing can fail after the API call has
succeeded. If the external API is idempotent, you can simply retry on any task
processing failure, no matter when it happened. That can simplify error
handling a lot.
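
A minimal sketch of that retry pattern. `run_with_retry`, `set_status`, and
`flaky_finish` are hypothetical stand-ins, and the whole thing is only safe
under the assumption that the API call really is idempotent.

```python
import time

def run_with_retry(task, call_api, finish, attempts=3, delay_s=0.0):
    # Retries the whole task, API call included. That is only safe because
    # call_api is assumed to be idempotent.
    last_error = None
    for _ in range(attempts):
        try:
            return finish(task, call_api(task))
        except Exception as exc:
            last_error = exc
            time.sleep(delay_s)
    raise last_error

# Hypothetical stand-ins for the external API and the local processing step.
store = {}

def set_status(task):
    store[task] = "shipped"      # sets a value; calling twice changes nothing
    return store[task]

outcomes = iter([RuntimeError("crashed after the API call succeeded"), None])

def flaky_finish(task, result):
    error = next(outcomes)
    if error is not None:
        raise error
    return result

outcome = run_with_retry("order-1", set_status, flaky_finish)
```

Here the first attempt fails after the API call has already gone through, and
the blind retry still leaves the external state correct.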

~~~
NightMKoder
One thing to note about this: idempotency is a very simple concept, but the
implementation is actually quite hard. Case in point: the Kafka producer API
recently gained support for "idempotent/transactional capabilities":
[https://kafka.apache.org/documentation/#upgrade_11_exactly_o...](https://kafka.apache.org/documentation/#upgrade_11_exactly_once_semantics)
. That is to say, exactly-once persistence by using an idempotency key.

Everything looks bulletproof unless you take a step back. To connect with the
example you gave: just having the Kafka producer guarantee that one call to
send() is idempotent is not enough. Your application needs to be able to be
idempotent, e.g. if the same RPC/web request has to be retried on a different
server. You need to be able to pass an idempotency key TO the Kafka API,
which you currently cannot do. The API currently allows for either a
global(ish) lock or duplicated messages, so it's not quite idempotent.

Idempotency needs to be end-to-end, otherwise it doesn't work. Unfortunately
that's very rarely the case - almost nobody tries to idempotently make XHR
requests to their servers. In effect it's almost always easier to de-duplicate
idempotently on read rather than attempt to write idempotently. It's a really
hard simple problem with lots of corner cases.
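
De-duplicating on read can be as simple as the sketch below, assuming every
message carries a stable id assigned by the original sender. The message
shape is invented for the example and has nothing to do with Kafka's API.

```python
def dedupe(messages):
    # Keep the first message seen for each id; later duplicates are dropped.
    seen = set()
    for msg in messages:
        if msg["id"] in seen:
            continue
        seen.add(msg["id"])
        yield msg

log = [
    {"id": "a1", "body": "order placed"},
    {"id": "a1", "body": "order placed"},    # duplicate from a retried write
    {"id": "b2", "body": "order shipped"},
]
unique = list(dedupe(log))
```

The hard part this glosses over is where the id comes from: it has to survive
retries end to end, which is exactly the point above.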

------
js2
At 13:45 in this interview from 2003 with Sergey Brin and Larry Page, you can
listen to them try to explain idempotence on the air to Terry Gross and her
NPR audience:

[http://www.npr.org/2003/10/14/167643282/google-founders-larr...](http://www.npr.org/2003/10/14/167643282/google-founders-larry-page-and-sergey-brin-part-2)

~~~
rimliu
And in 2005 Google came up with Google Web Accelerator, which showed many
sloppy programmers what happens when you use an HTTP method that is supposed
to be idempotent (GET) for other purposes. What happened was that it crawled
all the links it found for prefetching, and some of them were "delete" links
in admin interfaces. I think that was the biggest push not to use GETs to
modify data :)

------
tw1010
I read the title as impedance, then I read the article and kept reading
impedance, and was about to write a comment mentioning that that's not at all
impedance he's talking about, it's idempotency, but then I read the title
again and now I can't understand why I read it as the former all that time.

~~~
yjftsjthsd-h
That's okay, I read "impotence" and couldn't believe that anyone would need it
explained as to why they should care.

------
wincy
A comment about the site. On an iPhone X and I’d imagine other phone screens,
the margins are huge. Like the margin takes up maybe 15% around the text, then
there’s padding on the div taking up maybe 10-15% more space. So there’s 3-4
words per line without turning on reading mode.

------
dmourati
I just think of the Wailers song: Do It Twice. "I'd like to say baby, you so
nice I'd like to do the same thing twice, yeah!"

------
ianamartin
Pure idempotency isn't usually desirable. In any important database table
where this can be an issue, you want two timestamps. 1 for when the row was
created and 1 for when it was last updated. The upsert should change the
ts_updated value, and the attempt should be logged.

If you want genuine pure idempotent interactions, you can't do that, and you
have to rely on unique constraints in the database and swallow that particular
error, so that nothing about the universe of the application state changes and
nothing gets logged.
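
The two variants side by side, as a sketch using sqlite3 with an in-memory
table. The schema and function names are assumptions, and timestamps are
passed in as strings to keep the example deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE txn (txn_key TEXT PRIMARY KEY, amount INTEGER,"
    " ts_created TEXT, ts_updated TEXT)"
)
attempt_log = []

def record_pure(key, amount, now):
    # "Pure" variant: a duplicate changes nothing and leaves no trace.
    try:
        conn.execute("INSERT INTO txn VALUES (?, ?, ?, ?)", (key, amount, now, now))
    except sqlite3.IntegrityError:
        pass

def record_logged(key, amount, now):
    # Pragmatic variant: a duplicate bumps ts_updated and is logged,
    # but the amount is never applied twice.
    try:
        conn.execute("INSERT INTO txn VALUES (?, ?, ?, ?)", (key, amount, now, now))
    except sqlite3.IntegrityError:
        conn.execute("UPDATE txn SET ts_updated = ? WHERE txn_key = ?", (now, key))
    attempt_log.append((key, now))
```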

But that is a pretty garbage way to do things. As with many things, some
moderation and flexibility are a good idea.

Instead of focusing on the exact meaning of the word, we should focus on
making sure that nothing bad happens if someone does something twice. That's
what the operational concept of idempotence is.

It's relatively easy to get something to happen at least one time, and it's
slightly less easy to get something to happen at most one time. Getting
something to happen exactly one time is really, really hard. That's why we
should care: most of the things we want to happen will happen more than once
in any nontrivial system.

Purity in concept isn't important. Safety in the sense that nothing bad
happens the second or third or nth time around is.

Side note about the Stripe blog linked in the current top post by ageitgey:
that's not a useful solution. You can't trust a uuid created outside the
context of the uniqueness that needs a guarantee. That's one of the
fundamental problems of distributed systems. And it's one I would think Stripe
should know better than to espouse since, after all, they did hire aphyr.

You need something closer to home, not something received from a relatively
untrusted source. Before you gripe at me and tell me that uuids are, in fact,
uuids, let me explain. There are all kinds of situations that can force a
client to regen a uuid.

A gas station pump resets because of a blink in power. It remembers all the
information about the transaction except for the uuid because the programmer
was smart and wanted that to be, well, unique. And he wanted the pump to be
smart and retry the failed attempt. Same txn; different uuid.

The user hits the refresh button in the middle of the txn, but the rest of the
form data is cached. Not the uuid.

The corner store owner who hates people who use credit/debit cards in general
in Queens gets pissed because it takes more than two seconds to process, and
pulls the power plug to reset it. Yeah, POS units should clear after that, but
they don't always.

User's phone switches from cell service to wifi. Forces a refresh in the app
in the middle of a txn. It's still listening for the same transaction response
and when it times out tries again with a different uuid.

These are real scenarios that we have to deal with. Trusting a client to only
ever retry with the same uuid is not safe. So when I say that you need a uuid
in the same context, I mean the uuid created in your database when it receives
an auth request, inside the SQL transaction used to create the entry for that
attempt.

Everything else about the transaction is used to fingerprint it with the local
uuid as the upsert key. Once that is in place, then you can start to have some
duplicate txn safety. That's the beginning of your hell, but it's a better one
than trusting client devices.
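
Roughly what I mean, as a sketch with a dict standing in for the database
table. The choice of fingerprint fields and the `window` time bucket are
assumptions made for the example, not a recommendation.

```python
import hashlib
import uuid

attempts = {}  # fingerprint -> server-generated id (stand-in for a DB table)

def fingerprint(card, merchant, amount_cents, window):
    # Hash the fields that identify the purchase; no client-supplied uuid here.
    raw = f"{card}|{merchant}|{amount_cents}|{window}".encode()
    return hashlib.sha256(raw).hexdigest()

def register_attempt(card, merchant, amount_cents, window):
    fp = fingerprint(card, merchant, amount_cents, window)
    # setdefault keeps the first id; a retry with the same details gets it back.
    return attempts.setdefault(fp, str(uuid.uuid4()))
```

Two genuinely separate purchases with identical details in the same window
would collide under this scheme, which is part of the hell mentioned above.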

I'm not trying to take a potshot at Stripe. I have tons of respect for them,
but that particular article is misplaced. You can't just pass uuids around and
think that you're safe because they are actually unique. I wish it were that
simple.

It most definitely is not, as the person said, "the trick."

~~~
lmm
> you want two timestamps. 1 for when the row was created and 1 for when it
> was last updated. The upsert should change the ts_updated value, and the
> attempt should be logged.

If timestamps are worth recording they're worth recording in the normal way
you'd record business data. Associate a timestamp with when the user made
their request, not on when your server happened to process it.

> A gas station pump resets because of a blink in power. It remembers all the
> information about the transaction except for the uuid because the programmer
> was smart and wanted that to be, well, unique. And he wanted the pump to be
> smart and retry the failed attempt. Same txn; different uuid.

Don't do that. If it's a transaction ID, generate the ID for the transaction
and keep it with the data for the transaction.

> The user hits the refresh button in the middle of the txn, but the rest of
> the form data is cached. Not the uuid.

Again, keep the ID with the data it goes with.

> Trusting a client to only ever retry with the same uuid is not safe. So when
> I say that you need a uuid in the same context, I mean the uuid created in
> your database when it receives an auth request, inside the SQL transaction
> used to create the entry for that attempt.

That's an unscalable approach. If you can afford to run everything on a single
database with monotonically increasing time then sure, knock yourself out, it
makes everything easier and is good enough for a lot of cases. But in these
days of client-side UI, like it or not you are working on a distributed system
where the client and server are separate nodes, and distributed system
techniques are your best hope of getting sensible behaviour.

------
drTriumph
I've seen this before but it never hurts to re-read

