Hacker News new | past | comments | ask | show | jobs | submit login
You Cannot Have Exactly-Once Delivery (bravenewgeek.com)
113 points by tylertreat on Mar 25, 2015 | hide | past | web | favorite | 55 comments

Well, you have have exactly-once delivery unless there are network partitions, which is not a particularly surprising limitation. As the author admits, it takes considerable cleverness in the implementation details in order to achieve that abstraction, but plenty of tools have done it.

"People often bend the meaning of “delivery” in order to make their system fit the semantics of exactly-once, or in other cases, the term is overloaded to mean something entirely different. State-machine replication is a good example of this. Atomic broadcast protocols ensure messages are delivered reliably and in order. The truth is, we can’t deliver messages reliably and in order in the face of network partitions and crashes without a high degree of coordination. This coordination, of course, comes at a cost (latency and availability), while still relying on at-least-once semantics."

This reminds me of the problem of reliable data transfer over an unreliable network. It's theoretically impossible, but TCP is still a practical, useful abstraction.

>Well, you have have exactly-once delivery

Is this intentional?

Is it just me or did the following alliteration catch someone's eye?

DD: I myself shared many of these misconceptions, so I try not to demean or dismiss

EE: but rather educate and enlighten, hopefully while sounding less preachy than that just did.

FF: I continue to learn only by following in the footsteps of others.

Not just you, I had the same thought, but also included:

MMM: I myself shared many of these misconceptions

(to where I mentally included deMean and disMiss as Ms as well)

There's a hidden message in there somewhere. Just keep looking.

GG: two Generals..

Sure you can. What you can't have is, after a loss of communication, unambiguous knowledge about whether the other end got the last message.

When communication is reestablished, that issue can be sorted out. See "two-phase commit".

The point of the article is that "communication being reestablished" is not a given. If it never happens, then you never get clarity about whether the message was delivered.

If nobody ever hears from the server again, then who cares what it has stored on it's forever offline disk?

Consider that the same applies to crashes in any system that does not provide you with a means of reading back the status of your previous delivery attempt.

Such as, e.g. a remote mail server if the operation was to send mail to take an obvious one.

In other words: The server does not need to go permanently offline - it may be sufficient for the established session to terminate.

Ever see Dr. Strangelove? If the last message was 'drop the Bomb' whether or not the plane received the message is a bit important.

The problem isn't that "nobody" ever hears from the server again. The problem is that "not everybody" ever hears from the server again. If you aren't sure that the server received your write, but another system (that does have access to the server) depends on the result of that write.

The point is that temporary network partitions happen and you can't guarantee delivery without expensive coordination. Sure, exactly-once is possible in environments where network partitions don't occur. Unfortunately, we don't live in that world.

"You Cannot Have Exactly-Once Delivery" is not the same as "Exactly-Once Delivery is expensive".

Which is it?

You cannot have it if the network is unreliable and systems fail. The network is unreliable and systems fail.

The point is that "Exactly-Once Delivery is expensive" is really just saying "at-most-once delivery with an eventual guarantee of the network healing"

You do, evidently, if it is extremely important to you that you guarantee exactly-once delivery.


An easy way to think about solving the exactly-once problem, on top of an at-least-once message layer, is to use the at-least-once layer to write idempotently (eg, HTTP PUT) into a sequence of numbered messages.

There is no possibility that the receiver will execute message 18 before message 17. Each PUT retries until an acknowledgment is received, and the sender maintains a sliding window of sequence numbers it's trying to write.

All the Two Generals problem tells you, in this situation, is that the sender does not know whether the receiver got the last message. It's an important issue and I agree with the general sentiment of the OP, but perhaps it can be overstated.

If you send a message to a node, the operations it perform is not atomic, so it can either

* Perform the action, then acknowledge the message.


* Acknowledge the message and then perform the action.

(assuming delivery of messages is reliable)

In either case, if they crash after the first part, it will either be carried out twice, or not at all. When/if communication is established again, you need to ensure that the node is A. able to gurantee that it knows if an action was carried out or not, B. able to undo an action if the coordinator decided to send it elsewhere.

This is not always possible, and you cannot have a message system that gurantee exactly-once, without the recieving system supporting it.

Then how does Domino's get me a pizza within 30 minutes?

I love that response :-) Imagine that your Pizza joint is in a city where there are no signs on the street, at first when the Pizza delivery trucks pulls out, he shouts really loudly "Where the hell does fsk live?" and from one of the street corners someone says "Down that way." so he goes along, periodically shouting at intersections trying to find his way. Sometimes he has to cross bridges, except these are very special, at any time they can just collapse and drop all of the traffic into the river, too bad, so sad. Or people on the corners can keep saying "he lives this way..." until the driver realizes he is going in a loop and his pizza is very cold, and not worth delivering anyway. Or my personal favorite, the pizza guy is driving along and suddenly, BLAM! someone pulls out of their driveway at high speed demolishing the pizza guy's car! Now both drivers get out, go back to where they started, pick a random number between 1 and 10, count and then start out again.

That you get Pizza at all is a freakin' miracle!

Something about this immediately conjures up Snow Crash which I hadn't thought about in years.

I was thinking it was a deleted chapter written by Douglas Adams.

I wonder how many realize that you've described the life and many possible deaths of a packet. :-)

Across millions of deliveries, I'm sure there are some failures from time to time. I'm guessing Domino's has at-most-once semantics.

And then reconsider this at "webscale" (and yeah, I hate myself a little bit for using that term).

"Millions of deliveries" with a "five or six nines" 99.999% or 99.9999% success rate would almost certainly be considered spectacularly successful by Domino's. Wikipedia says $1.8billion in revenue, 220,000 employees, and 10,000 stores - some outrageous assumptions might be made. Lets say ~$500 per day per store is, what, a 15% of revenue franchise fee as a guess? So a store on average makes $3-4k/day? So maybe 3 to 4 hundred pizzas a day, right? Five nines would be one error per store per year. Six nines would be one error per store per _decade_. I bet their real-life error rate is _way_ worse than that.

But, across 10000 stores and 220,000 employees, a billion or so pizzas a year isn't really a very big number.

Now consider Google, or Facebook, or Amazon. How many emails/adviews/status-updates/likes/widget-orders do you suppose happen every hour? Would it be as low as 1 billion per hour? And would a spectacular seeming "six nines" of reliability, which means dropping a thousand mails/adviews/updates/orders _per hour_ be acceptable?

"failures from time to time" in some contexts, can mean "fundamentally and irrevocably broken" in other contexts...

Nah, it's at-least-once. For undelivered pizza you request a retransmit. I've seen people get duplicate packets if they requested a retransmit too early.

That reminds me of thing I heard many years ago: "And now look at Czech Post, it's obviously more reliable than internet. They can deliver letters with unbounded latency, letters could be delivered out of order, letters can get lost, but no mater what they do they are not able to deliver one letter twice"

On the other hand, after few years on various logistics-related projects, I would not be too surprised if there is some postal service or carrier that is actually able to somehow physically duplicate one shipment and deliver it twice :)

Oh, that happened to me few weeks ago. Ordered a pizza, received it and 10 minutes later another delivery person showed up with another pizza -- with the same order# on receipt!

More than once delivery in real-life :)

Most systems I know which require this just do at-least once on the sending side and dedupe on the receiving side. If you build this into your framework, to applications you have exactly-once barring unbounded partitions and the like.

This works only so long as the receiving side is able to retain all messages (or message ids if those are guaranteed to uniquely reference a given payload). It does not work for outages of unbounded (or otherwise impractical) data size or duration.

My understanding of the FLP result is that it only applies to algorithms without timeouts.

Because it restates that, without timeouts, in theory, you could be waiting forever for a message to arrive. It's formally proved and whatnot, but it's not a super-surprising result when you restate it in common terms.

For message delivery, it seems that FLP says that you need to give up at-least-once, because at-least-once ostensibly requires an infinite number of retries. I'm not sure how it would apply to at-most-once.

Exactly-once is, so far as I can tell, a bit of a strawman. But it's a strawman that we all fall for, the first time we break stuff into different systems.

As a disclaimer, I am still wrapping my head around this stuff. Don't rely on me. Not even if your name is Salvatore.

> My understanding of the FLP result is that it only applies to algorithms without timeouts.

No, I don't think timeouts help. What does timeout mean here? You've sent a message and didn't get a response. If you timeout, did they receive the message or not?

Aye. Timeouts cannot save you from the recursive ack problem.

I believe the parent is thinking of the fact that FLP applies only in an "asynchronous" system in which messages may take infinite time to arrive, and without regard to sending order.

Time-outs are a band-aid that works in practice but formally speaking are not an improvement at all since for every packet sent due to the time-out you'd need another time-out.

Thanks, looks like I'd misunderstood the theoretical finding.

Recursion is hard.

No you can't have exactly once delivery. However you can mitigate this if your datastore for your output from message processing is the same as your datastore for queue. With that and atomic mutations (de-queue and return result), it does allow practical solutions for almost all edge cases.

Another great read on this subject "Exactly-Once Delivery May Not Be What You Want" https://brooker.co.za/blog/2014/11/15/exactly-once.html

I hope the author isn't making an argument that network partitions are somehow the only consideration of whether a message is delivered. Even on a single user single process machine, delivery can't be totally guaranteed.'exactly once' delivery sounds like 'guaranteed perfect one time delivery under all circumstances', which is a dumb way to rephrase an idea whose literal translation is to deliver something one time and not more or less.

In other words, of course you can deliver something exactly one time. Just not every time.

Shameless plug:

I wrote about this issue with a lot of APIs some time ago. When a connection is broken unexpectedly, the protocol simply has no way to recover the state of play and there's a risk of duplicate transactions popping up.


But you can have it with probability 1...

Having spent 7 year of my life implementing an Exactly-Once-In-Order (EOIO) messaging system in SQL Server Service Broker[0] (SSB), I take somehow exception to the author claim.

Here is how SSB achieves EOIO:

- initiator establishes intent to communicate using BEGIN DIALOG[1] statement (SSB dialogs are the equivalent of a durable, long lived, TCP session). This creates the necessary state in the database by creating a row in sys.conversation_endpoints, with initial send_sequence_number 0.

- sender issues SEND[2] statement. The message is assigned the current send_sequence_number (0), the sys.conversation_endpoint send_sequence_number is incremented to 1, and the message is inserted in sys.transmission_queue

- after commit of SEND transaction a background transmitter reads the message from sys.transmission_queue, connects to destination, delivers the message over wire (TCP).

- target reconstructs message from wire, in a single transaction creates a row in it sys.conversation_endpoints (receive_sequence_number is 0) and delivers the message into the destination queue

- after commit of the message delivery, the target constructs an acknowledgement message and sends it back to initiator over the wire

- the sender gets the acknowledgement of message 0 and deletes the message from sys.transmission_queue

- sender may retry delivery periodically if it does not receive the ack

- target will send back an ack immediately on receipt of a duplicate (message sequence number is less than current receive_sequence_number)

What this protocol achieves is idempotency of delivery, hidden from the messaging application. Database WAL ensures stability in presence of crashes. Eg. if the target crashes in the middle of enqueueing the message into destination queue then the entire partial processing of the enqueue is rolled back on recovery and next retry from sender will succeed. If target crashes after processing is committed but before sending the ack then on recovery the enqueue is successful and the next retry from initiator will immediately send ack an ack, allowing initiator to delete the retried message and make progress (send the next message in sequence). Note that there is no two-phase-commit involved.

Retries of unacknowledged messages occur for the dialog lifetime, which can be days, months, even years. Databases are good at keeping state for so long. SSB uses logical names for destination (from 'foo' to 'bar') and routing can be reconfigured mid-flight (ie. the location where 'bar' is hosted can be changed transparently). Long lived state and routing allow for transparent reconfiguraiton of network topologies, handle downtime, manage disaster (target is lost and rebuild from backups). Most of the time this is transparent to the SSB application.

Furthermore, the guarantees can be extended to application semantics as well. Applications dequeue messages using RECEIVE[3] statement. In a single transaction the application would issue a RECEIVE to dequeue the next available message, lookup app state perteining the message, modify the state, send a response using SEND[2], commit. Again WAL guarantees consistency, after a crash everything is rolled back and the application would go again through exactly the same sequence (the response SEND cannot communicate anything on the wire until after commit, see above).

So EOIO is possible.

One has to understand the trade offs implied. Something like SSB will trade off latency for durability. Applications need not worry about retries, duplicates, missing messages, routing etc as long as they are capable of handling responses comming back hours (or maybe weeks) after the request was sent. And application processing of a message is idempotent (RECEIVE -> process -> crash -> rollback -> RECEIVE -> re-process) only as long as the processing is entirely database bound (update states in some app tables, not make REST calls). Yet such apps are not unusual: they use some database to store state and communicate with some other app that also uses a database to store state. Unlike most messaging systems, SSB stores the messages in the database thus achiving WAL consitency along with the app state. Many SSB applications are entirely contained in the database, the code itself is contained. They use SSB activation [4] to react to incoming message, without keeping any state in memory.

In SSB both the initiator and the sender are monolitic, SMP systems (not distributed). Together the two form a distributed system. It trades of availability over partitioning, but one has to understand how this trade off occurs. In case of parittioning (target is unreacheable) the application continues to be available locally (the SEND statement succeeds). If the netowrk paritioning is not resolved over the lifetime of the dialog, then the applicaiton will see an error. If the paritioning is resolved then message flows resumes and the applicaiton layer responses start showing up in the queue. Again, activation and durable state make this easy to handle, as long as a latency of potentially days makes sense in the business. Shorter lifetimes (hours, minutes) are certainly possible and in such cases, if network partitioning is not resolved in time, the timeout error will occur sooner.

  [0] https://msdn.microsoft.com/en-us/library/bb522893.aspx
  [1] https://msdn.microsoft.com/en-us/library/ms187377.aspx
  [2] https://msdn.microsoft.com/en-us/library/ms188407.aspx
  [3] https://msdn.microsoft.com/en-us/library/ms186963.aspx
  [4] https://technet.microsoft.com/en-us/library/ms171617.aspx

This topic seems to come up regularly, but I feel the discussions I see here are far too heavy to digest for the people who would benefit most from understanding the issues being presented (OP post is > 1,300 words).

I believe what it really comes down to is that people new to distributed processing think they want "exactly once delivery", but later they (hopefully) learn they really want "exactly once processing". For example:

> We send him a series of text messages with turn-by-turn directions, but one of the messages is delivered twice! Our friend isn’t too happy when he finds himself in the bad part of town.

This is easily resolved by amending each message with an ordinal indicator, e.g. "1. Head towards the McD", "2. Turn left at the traffic lights", etc. The receiver then dedupes and the instructions follow "exactly once processing". Processing messages exactly once is a "business rule" and the responsibility for doing so lies in the business logic of the application.

This example also brings up another typical point of confusion in building distributed systems: people actually want "ordered processing", not "ordered delivery". The physical receiving order does not matter: your friend will not attempt to follow instruction #2 without first following instruction #1. If instruction #2 is received first, your friend will wait for instruction #1.

It's also important to note that the desired processing order of the messages has nothing to do with the physical sending order: I could be receiving directions to two different places from two different people, and it doesn't matter what order they are sent or received, just that all the messages get to me! A great article covering these topics in more detail with other real world examples is "Nobody Needs Reliable Messaging" [1].

I think it is useful to try understand why new distributed system builders run into these difficulties. I suspect they try to apply local procedure call semantics to distributed processing (a fool's errand), and message queue middleware works well enough that a naive "fire and forget" approach is the first strategy they attempt. When they subsequently lose their first message (hopefully before going into production), it's natural to think in terms of patching (e.g. distributed transactions, confirmation protocols, etc.) rather than to consider if the overall design pattern is appropriate.

Oddly, there is at least one very well designed solution that addresses these challenges - the Data Distribution Service (DDS) [2] - but I almost never see or hear about it at any of my clients.

[1] http://www.infoq.com/articles/no-reliable-messaging [2] http://portals.omg.org/dds/

I don't want to challenge too many points of your post here, but I have a couple questions:

"This example also brings up another typical point of confusion in building distributed systems: people actually want "ordered processing", not "ordered delivery". The physical receiving order does not matter: your friend will not attempt to follow instruction #2 without first following instruction #1. If instruction #2 is received first, your friend will wait for instruction #1."

How do you handle that, practically speaking? In say, any system that operates at a scale where you might have thousands of messages per second. One concern I have with holding onto instruction 2 until instruction 1 arrives is, what if it never arrives? The system is blocked, isn't it?

The client is blocked but the producer is not. As soon as the client gets #1 it can process #1 and #2.

Here's a real world implementation discussion of this sort of stuff: https://github.com/couchbase/sync_gateway/issues/525

How big do you make the receiving queue on the client's side? How many instructions past #1 can it store while waiting for #1?

Yes, the consumer will be blocked for that particular transaction. The consumer can happily process other transactions in the meantime.

It's important to note is that no matter how you want to slice it, if instruction 1 never arrives, somehow you need a protocol to re-request it from upstream until you get it.

Whether that protocol lives in your middleware (e.g. an "at least once" message queue implementation that buffers messages and delivers them to your application in order) or in your consumer application logic (e.g. the application buffers messages and can send a request for a missing message over another channel), something is going to have to keep track of these things and buffer messages in the meantime.

This is why message queue middleware appears so convenient -- your application never has to see any of this complexity. I say appears because message queues can't do at least 3 things:

(a) It can't understand your application message processing order (which is NOT the same as your physical sending order): notice that how the message queue middleware and application determine if a message is missing are at fundamentally different levels of abstraction. The middleware can, at best, use e.g. sequence numbers or checkpoints to figure out a message is missing. An application has higher-order business knowledge about what the messages mean and so can determine if a message is missing much more intelligently. An application can also happily process other valid sequences of instructions without waiting for all preceding but possibly unrelated messages (e.g. it can accept an order for client A while still waiting for client B's full order to come through).

(b) Message queues DO drop messages so again without a higher-level protocol across the transaction sequence to manage this you will have a problem if you require all messages to be processed.

(c) You can't easily replay an event e.g. because your consumer stuffed up processing. It's true that many message queuing implementations can keep a duplicate journal and allow you to replay from there, but this is probably an opaque data store. You can probably inspect the message, but ultimately you will end up writing application-specific logic to figure out which message to replay, whether the copy of the message is retrieved from the message queue system or from upstream.

At that point, the end-to-end principle [1] applies and you start to wonder if all that intelligence in messages queues is a waste. I'm personally looking at moving over to using DDS for data distribution and RPC over DDS to do replay and reconciliation between publisher and consumer.

[1] http://en.wikipedia.org/wiki/End-to-end_principle

Life gets more pleasant when you internalize that consistency is not aligned with how the universe works and idempotence is your friend.

Could the blockchain be used as a send only once system?

I mean if you used the blockchain as a message queue it could be pretty reliable, no?

What makes you think it's reliable on a per-message basis now? It already has conflicts, and messages are regularly deemed-invalid and discarded.

Unless you really do mean "send only once", which, literally speaking, is pretty easy to do.

The blockchain isn't a good example when latency is an issue. Yes, it does guarantee that values cannot be double spent, but it does so at a very high time cost.

Would a system that prevents double-spending of a digital currency be equivalent with exactly-once delivery?

I think it would be equivalent to at-most-once delivery, unless it also guaranteed that you could always spend the currency once.

The best you can do is two-phase commit.

I would caution that a two-phase commit is not always the best you can do. A two-phase commit is blocking and in event of failure during processing it can cause a deadlock. A three-phase commit offers the same level of guarantee but without blocking.

Registration is open for Startup School 2019. Classes start July 22nd.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact