It would be nice to have a transcript. Video description:
-----
When it comes to serverless applications one principle rules them all: idempotency. This ability to rerun a function, make a REST API call, or process a message any number of times and ensure the same result is extremely hard to do reliably. In this talk, I’d like to explain and explore the importance of idempotency and discuss how we can implement it in our own systems, including:
1. Idempotency’s relationship with good serverless architecture.
2. How safe retries are impossible without idempotency.
3. Examples of serverless frameworks and tools that help/encourage idempotency.
You’ll leave this talk with a deeper understanding and appreciation for this tongue-twisting principle and start combing through your code with an idempotent eye.
YouTube has transcripts now. On desktop, in the ellipsis menu under the video title on the right. On mobile, a "show transcript" button after expanding the video description.
They're not exactly formatted for nice reading, but on desktop it's possible to copy and paste the whole transcript wherever you like. I find the speech recognition astonishingly accurate. Of course you will still find mistakes, but overall it is almost always possible to get most of a video's content much faster just by reading the transcript.
I find text so much more accessible than video. Rarely have I seen a video that couldn't have been a bunch of slides or a blog post.
Unrelated to this video, I started to wonder if that's why a lot of the content that tends to be associated with conspiracy theories or regarded as disinformation is in the form of video and not text. Video is just so much harder to scrutinize and reason about because it's so difficult to reference specific arguments when they're talked about and not written down.
Idempotency is great to have and almost always nonexistent. It is just infuriating as a dev to keep coming across this mirage of an idea. I don’t get to reprocess the same bill twice without bad consequences. I don’t get to ship the same order twice.
Not everything can be idempotent, nor immutable, nor stateless. But if you adopt the core assumption that you will achieve those states in the places where they can be achieved (and attempt to make them easily achievable with patterns that support them), then when you do isolate situations that cannot be, like physically shipping a product, you can pay more attention to making them correct. Most of the innumerable in-memory, business logic, database operations, etc., behind the shipment can be made idempotent, which gives you more confidence that you didn't trigger a physical shipment because of an issue.
In other words, the decision to ship can be made idempotent. And that, in a modern software app, is the result of a pretty complex set of things that can individually go wrong.
You could argue that Amazon made themselves something of a success by bringing at least the principle if not the actuality of idempotency to the shipment itself. By making it easy to get a replacement shipped.
Hmm, not really, I was more giving the GP the benefit of the doubt that they have some, so maybe they'll chime in. I think they were focusing on the fact that idempotent systems often need to interact with other systems that are not idempotent, but then the act of dealing with those systems, the surface area, can often be made idempotent. To raise the stakes very high, I picture software operating a military robot, where it is quite important that it fire its gun only the specific number of times it is cleared to by external logic. Instead of attempting to guarantee that the robot is sent a 'fire gun' message only the requisite number of times, the system should focus on putting the robot into a 'fire gun' state, and the messages that send the robot into that state can be made idempotent...
I certainly support your toying, because I think a lot of the resistance to idempotency comes from the lack of easy patterns for achieving it.
In fact, multithreaded synchronization in general is non-idempotent. You don't want to send a synchronize signal twice, you need to send it exactly as many times as necessary. (Ex: mutex lock / unlock, which is probably the simplest version of a semaphore that you can get. Semaphore of size 1)
Agreed, though I think the GP was looking for a situation you can't refactor into becoming idempotent. Increments and decrements like traditional semaphores are definitely not idempotent, but couldn't you make an idempotent semaphore by changing the signature to be increment/decrement(source, from, to), such that it only applies the operation if the current state is equal to from, and the source is one particular subscriber? Then from the semaphore's perspective it's receiving a time series of messages some of which have the same identity, until all locks have been released and the owner gets notified.
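A minimal sketch of the increment/decrement(source, from, to) idea, assuming a single in-process counter. The class name `ConditionalCounter` and its `transition` method are hypothetical, and the sketch only covers the state update itself, not blocking waits:

```python
import threading


class ConditionalCounter:
    """Counter whose updates are compare-and-set transitions tagged with a source."""

    def __init__(self, value=0):
        self._value = value
        self._applied = set()          # (source, from, to) transitions already applied
        self._lock = threading.Lock()  # makes the check-and-update atomic

    def transition(self, source, from_value, to_value):
        """Apply from_value -> to_value at most once per source.

        Returns True if the transition is (or already was) applied,
        False if the request is stale because the state has moved on.
        """
        key = (source, from_value, to_value)
        with self._lock:
            if key in self._applied:
                return True            # duplicate delivery: nothing more to do
            if self._value != from_value:
                return False           # stale request: do not touch the state
            self._value = to_value
            self._applied.add(key)
            return True

    @property
    def value(self):
        with self._lock:
            return self._value
```

Sending the same (source, from, to) message twice leaves the counter where the first message put it. A real version would probably need a per-request identifier rather than the bare triple, since the same source may legitimately repeat a transition later.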
> but couldn't you make an idempotent semaphore by changing the signature to be increment/decrement(source, from, to)
Think about a mutex lock: if the semaphore (aka: mutex) is already locked, your signature is: sem_wait(mutex, 0, 0).
Which is not idempotent. Two "sem_waits(s, 0, 0)" mean that you need two (other) threads to unlock you. One sem_wait(s, 0, 0) means that only one other thread needs to unlock you.
------
Semaphores used in this manner are how you implement reader/writer locks, as well as thread barriers. (If 100 threads are in existence, you wait for 100 semaphore_posts from those 100 other threads).
None of the semaphores or mutexes are idempotent. And never can be.
The dunning chain is dark and full of terrors. It's more Kafka (append-only log of unique operations) than some Platonic idea of idempotent operations.
Working on ERP software atm. A lot. Often invoices HAVE to be immutable for accounting purposes, so changing the address has to keep an old copy around for the old invoice, while attaching it to any new ones/ones not printed yet. Along with that, some other data may be tied to that address that can't be changed. Like tracking a package in transit that has been sent to the old address.
Like Rust, you keep the truly non-idempotent actions minimal, isolated, and well understood. Then you wrap everything around those models and state changes with idempotent behavior.
As a trivial instance, you can guard a non-idempotent email send event with a database table and idempotency key. When you attempt to send the same message twice, you'll see that you have already done so.
With diligent engineering, this type of thinking can scale to non-trivial active-active concurrent writes and more.
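The email guard described above might look roughly like this. It is a sketch under assumptions: the table name `sent_emails`, the helper names, and the use of an in-memory SQLite database are all made up for illustration, and `send` stands in for whatever actually delivers the mail.

```python
import sqlite3


def make_store():
    """Create an in-memory database with one row per idempotency key."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sent_emails (idempotency_key TEXT PRIMARY KEY)")
    return conn


def send_once(conn, idempotency_key, send):
    """Call send() only if this key has not been recorded yet.

    Returns True if the email was sent now, False if the key was already seen.
    """
    try:
        with conn:  # commits on success, rolls back if anything below raises
            conn.execute(
                "INSERT INTO sent_emails (idempotency_key) VALUES (?)",
                (idempotency_key,),
            )
            send()  # if this raises, the key is rolled back and a retry may send
    except sqlite3.IntegrityError:
        return False  # primary key conflict: this message was already handled
    return True
```

The primary key does the deduplication, so two concurrent attempts with the same key cannot both insert. A crash after `send()` returns but before the commit can still produce a duplicate, so this narrows the window rather than closing it.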
The reasons for teams to choose serverless approaches (easy, “no devops”, frontend languages on the backend) seem culturally incompatible with the sort of truly hard up-front design work required to make operations idempotent. For example, it’s not trivial to avoid duplicates when handling HTTP POST operations in the presence of retries under load conditions even with a relational ACID compliant database supporting the effort.
So, while I find myself adamantly in agreement with the presenter that idempotency is great, I don’t have much hope for the broader adoption of these ideas in driving meaningful improvements towards correctness. Serverless systems seem fated to approach Excel workbook levels of accuracy: it’ll work most of the time.
With the exception of things that can be trivially idempotent like pure referentially transparent functions and ffmpeg/image processing, of course.
I was reading the Expanse books and one thing that drew my attention is how easily and seamlessly they do data syncing over hours of light delay, asynchronous data updates, inter-operability... No way it would work with our CRUDs and current SQL databases and programming languages
I don't think this is necessarily exclusive to Serverless, rather any event-driven architecture. I work with a mostly event-driven (via GCP Pub/Sub) microservices backend, and adopting statelessness and idempotency where possible massively simplifies application logic.
Without idempotency your transient error retry logic becomes really complicated. You need to handle which stage(s) have completed successfully and which failed, which then makes your service stateful.
All this complexity comes from a lack of idempotency, if you can get rid of that, you could just Nack the message and let Pub/Sub retry it automatically with exponential backoff.
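As a rough sketch of the "just Nack it" approach: the handler below is idempotent per message ID, so a transient failure can simply be nacked and redelivered. `FakeMessage` is a stand-in with `ack()` and `nack()` methods, not the real Pub/Sub client, and `TransientError`, `make_handler`, and the in-memory `processed_ids` set are assumptions for illustration.

```python
class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


class FakeMessage:
    """Stand-in for a broker message that can be acked or nacked."""

    def __init__(self, message_id, data):
        self.message_id = message_id
        self.data = data
        self.outcome = None

    def ack(self):
        self.outcome = "ack"

    def nack(self):
        self.outcome = "nack"


def make_handler(process, processed_ids):
    """Build a handler that runs process(data) at most once per message ID."""

    def handle(message):
        if message.message_id in processed_ids:
            message.ack()  # duplicate delivery: work is already done
            return
        try:
            process(message.data)
        except TransientError:
            message.nack()  # no bookkeeping needed: the broker redelivers later
            return
        processed_ids.add(message.message_id)
        message.ack()

    return handle
```

In a real service the set of processed IDs would have to live in durable storage, and `process` should itself tolerate a repeat, since a crash between doing the work and recording the ID leads to a redelivery.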
In my work in serverless, we see 10-40 (known) dropped requests per day. I highlight known because many requests fail with transient network errors that never make it to our logs, and others fail somewhere in the cloud provider black box before we get a log, but the caller gets a response from the cloud provider's load balancer.
While 10-40 is not a lot in the scale of hundreds of thousands of requests per day (we're not that large), when that request is a write event, that's very detrimental to customer perception/experience... It might even cost your business money (failing to turn off/delete something)... And the easy answer of "retry the request" fails with write events that are not idempotent... WHEN should you retry? A gateway error doesn't necessarily mean the back end didn't get the request.. just that you didn't get a response.
It's even worse when what you'd expect to be internal communication has this same failure rate.. your back end processes getting a random 504 gateway error in places it never happened before.. breaking longer running processes.
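To make the "gateway error after the write landed" case concrete, here is a small simulation. Everything in it is hypothetical (`FakeBackend`, `GatewayTimeout`, `write_with_retry`); the point is only that the client reuses one idempotency key across retries, so retrying blindly is safe.

```python
import uuid


class GatewayTimeout(Exception):
    """Hypothetical stand-in for a 504 returned by a gateway."""


class FakeBackend:
    """Applies each write once per idempotency key; can lose responses."""

    def __init__(self, drop_first_n_responses=0):
        self.results = {}  # idempotency key -> stored result
        self.writes = []   # side effects that actually happened
        self._drops = drop_first_n_responses

    def write(self, key, payload):
        if key not in self.results:
            self.writes.append(payload)  # the real side effect
            self.results[key] = {"status": "created", "payload": payload}
        if self._drops > 0:
            self._drops -= 1
            # The write was applied, but the caller never hears about it.
            raise GatewayTimeout("response lost after the write was applied")
        return self.results[key]


def write_with_retry(backend, payload, attempts=3):
    """Retry on gateway errors, reusing one key for the whole logical request."""
    key = str(uuid.uuid4())
    for attempt in range(attempts):
        try:
            return backend.write(key, payload)
        except GatewayTimeout:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
```

With `FakeBackend(drop_first_n_responses=2)`, the client sees two timeouts and then a success, while the backend records the write only once. Generating a fresh key per attempt would defeat the purpose.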
Welcome to the cloud. It has so many oddities and overall crappiness that's been excluded from all the marketing content. Places where you can engineer wrongly or shoot yourself in the foot.
Sorry low effort response, I'm on a phone. But my experience resonates with your comment a lot!
They go into it a little bit by referring to AWS' 7 principles of well architected applications, one of which is being event driven. From experience my gut feeling is a large number of services running on servers or PaaS are event driven, therefore it's important to build them to be idempotent because of the "at least once" property of many messaging systems.
> Idempotence is the property of certain operations in mathematics and computer science whereby they can be applied multiple times without changing the result beyond the initial application. The concept of idempotence arises in a number of places in abstract algebra (in particular, in the theory of projectors and closure operators) and functional programming (in which it is connected to the property of referential transparency).
> Idempotency: An operation is idempotent if the result of performing it once is exactly the same as the result of performing it repeatedly without any intervening actions
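The two definitions quoted above can be checked mechanically for simple functions. This is just an illustration; the helper name `is_idempotent_at` is made up.

```python
def is_idempotent_at(f, x):
    """Check f(f(x)) == f(x) for a single input x."""
    return f(f(x)) == f(x)


def increment(n):
    """Not idempotent: every application changes the result again."""
    return n + 1
```

Here `is_idempotent_at(abs, -3)` and `is_idempotent_at(str.upper, "retry")` hold, while `is_idempotent_at(increment, 1)` does not, which is the same difference as between "set the status to shipped" and "ship one more".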
Failure rate is higher in cloud serverless than on traditional machines or in VMs... Especially since many people do not do the extra steps necessary to skip going to the edge of the cloud provider and back in, so they often go through additional services that can break in ways that would never happen when the requests are on the same box.
We see something like .0001% failure rate with API gateway behind custom DNS, and support's response was "this is in SLA, try adding retry code" (which needs idempotency).
So we have to do extra engineering, including idempotency and bypassing API gateway in some cases, to mitigate the issue.