"People often bend the meaning of “delivery” in order to make their system fit the semantics of exactly-once, or in other cases, the term is overloaded to mean something entirely different. State-machine replication is a good example of this. Atomic broadcast protocols ensure messages are delivered reliably and in order. The truth is, we can’t deliver messages reliably and in order in the face of network partitions and crashes without a high degree of coordination. This coordination, of course, comes at a cost (latency and availability), while still relying on at-least-once semantics."
This reminds me of the problem of reliable data transfer over an unreliable network. It's theoretically impossible, but TCP is still a practical, useful abstraction.
Is this intentional?
DD: I myself shared many of these misconceptions, so I try not to demean or dismiss
EE: but rather educate and enlighten, hopefully while sounding less preachy than that just did.
FF: I continue to learn only by following in the footsteps of others.
MMM: I myself shared many of these misconceptions
(to where I mentally included deMean and disMiss as Ms as well)
When communication is reestablished, that issue can be sorted out. See "two-phase commit".
Such as, to take an obvious example, a remote mail server if the operation was to send mail.
In other words: The server does not need to go permanently offline - it may be sufficient for the established session to terminate.
Which is it?
An easy way to think about solving the exactly-once problem, on top of an at-least-once message layer, is to use the at-least-once layer to write idempotently (e.g., HTTP PUT) into a sequence of numbered messages.
There is no possibility that the receiver will execute message 18 before message 17. Each PUT retries until an acknowledgment is received, and the sender maintains a sliding window of sequence numbers it's trying to write.
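A minimal sketch of that idea, with hypothetical `Receiver`/`send_reliably` names and the lossy network simulated by random drops (the real thing would be HTTP PUTs over a real network):

```python
import random

class Receiver:
    """Stores messages keyed by sequence number; PUT is idempotent."""
    def __init__(self):
        self.slots = {}

    def put(self, seq, payload):
        # Writing the same (seq, payload) twice has no additional effect.
        self.slots[seq] = payload
        return "ack"

def send_reliably(receiver, seq, payload, loss_rate=0.5):
    """Retry the idempotent PUT until an acknowledgment comes back."""
    while True:
        if random.random() >= loss_rate:       # request got through
            ack = receiver.put(seq, payload)
            if random.random() >= loss_rate:   # ack got through
                return ack
        # Otherwise: timeout and retry. The receiver may or may not have
        # stored the message already, but a duplicate PUT is harmless.

random.seed(1)
r = Receiver()
for seq, msg in enumerate(["turn left", "turn right", "arrive"]):
    send_reliably(r, seq, msg)

assert r.slots == {0: "turn left", 1: "turn right", 2: "arrive"}
```

The retries give at-least-once delivery; the idempotent write turns that into effectively-once storage at the receiver.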
All the Two Generals problem tells you, in this situation, is that the sender does not know whether the receiver got the last message. It's an important issue and I agree with the general sentiment of the OP, but perhaps it can be overstated.
* Perform the action, then acknowledge the message.
* Acknowledge the message and then perform the action.
(assuming delivery of messages is reliable)
In either case, if the node crashes after the first part, the action will either be carried out twice or not at all. When/if communication is established again, you need to ensure that the node is A. able to guarantee that it knows whether the action was carried out or not, and B. able to undo the action if the coordinator decided to send it elsewhere.
This is not always possible, and you cannot have a messaging system that guarantees exactly-once without the receiving system supporting it.
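The two orderings can be sketched as a toy consumer with a simulated crash injected between the two steps (the `consume` function and "charge card" message are my own illustration, not any real broker's API):

```python
def consume(queue, ack_first, crash_between=False):
    """Process the first message in `queue`; optionally simulate a crash
    between the two steps. Returns the list of actions actually performed."""
    performed = []
    msg = queue[0]
    if ack_first:
        queue.pop(0)                 # step 1: acknowledge
        if crash_between:
            return performed         # crash: action lost -> at-most-once
        performed.append(msg)        # step 2: perform
    else:
        performed.append(msg)        # step 1: perform
        if crash_between:
            return performed         # crash: no ack, message redelivered -> at-least-once
        queue.pop(0)                 # step 2: acknowledge
    return performed

# Ack-then-perform with a crash in between: the action is never carried out.
q = ["charge card"]
assert consume(q, ack_first=True, crash_between=True) == [] and q == []

# Perform-then-ack with a crash in between: redelivery repeats the action.
q = ["charge card"]
done = consume(q, ack_first=False, crash_between=True)
done += consume(q, ack_first=False)          # redelivery after restart
assert done == ["charge card", "charge card"] and q == []
```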
That you get Pizza at all is a freakin' miracle!
"Millions of deliveries" with a "five or six nines" 99.999% or 99.9999% success rate would almost certainly be considered spectacularly successful by Domino's. Wikipedia says $1.8 billion in revenue, 220,000 employees, and 10,000 stores - some outrageous assumptions might be made. Let's say ~$500 per day per store is, what, a 15%-of-revenue franchise fee as a guess? So a store on average makes $3-4k/day? So maybe 3 to 4 hundred pizzas a day, right? Five nines would be one error per store per year. Six nines would be one error per store per _decade_. I bet their real-life error rate is _way_ worse than that.
But, across 10000 stores and 220,000 employees, a billion or so pizzas a year isn't really a very big number.
Now consider Google, or Facebook, or Amazon. How many emails/adviews/status-updates/likes/widget-orders do you suppose happen every hour? Would it be as low as 1 billion per hour? And would a spectacular-seeming "six nines" of reliability, which still means dropping a thousand mails/adviews/updates/orders _per hour_, be acceptable?
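The back-of-envelope numbers above work out as follows (all inputs are the rough guesses from the comment, not real figures):

```python
# Back-of-envelope error budgets; every input here is a rough guess.
pizzas_per_store_per_day = 350
five_nines_failure = 1e-5   # 99.999% success rate
six_nines_failure = 1e-6    # 99.9999% success rate

# Five nines: roughly one botched order per store per year.
failures_per_store_per_year = pizzas_per_store_per_day * 365 * five_nines_failure
print(round(failures_per_store_per_year, 2))

# At web scale, six nines still drops a lot of events.
events_per_hour = 1_000_000_000
drops_per_hour = events_per_hour * six_nines_failure
print(drops_per_hour)   # a thousand dropped events per hour
```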
"failures from time to time" in some contexts, can mean "fundamentally and irrevocably broken" in other contexts...
On the other hand, after a few years on various logistics-related projects, I would not be too surprised if there is some postal service or carrier that is actually able to somehow physically duplicate one shipment and deliver it twice :)
More than once delivery in real-life :)
Because it restates that, without timeouts, in theory, you could be waiting forever for a message to arrive. It's formally proved and whatnot, but it's not a super-surprising result when you restate it in common terms.
For message delivery, it seems that FLP says that you need to give up at-least-once, because at-least-once ostensibly requires an infinite number of retries. I'm not sure how it would apply to at-most-once.
Exactly-once is, so far as I can tell, a bit of a strawman. But it's a strawman that we all fall for, the first time we break stuff into different systems.
As a disclaimer, I am still wrapping my head around this stuff. Don't rely on me. Not even if your name is Salvatore.
No, I don't think timeouts help. What does timeout mean here? You've sent a message and didn't get a response. If you timeout, did they receive the message or not?
I believe the parent is thinking of the fact that FLP applies only in an "asynchronous" system in which messages may take infinite time to arrive, and without regard to sending order.
Recursion is hard.
In other words, of course you can deliver something exactly one time. Just not every time.
I wrote about this issue with a lot of APIs some time ago. When a connection is broken unexpectedly, the protocol simply has no way to recover the state of play and there's a risk of duplicate transactions popping up.
Here is how SSB achieves EOIO:
- initiator establishes intent to communicate using BEGIN DIALOG statement (SSB dialogs are the equivalent of a durable, long lived, TCP session). This creates the necessary state in the database by creating a row in sys.conversation_endpoints, with initial send_sequence_number 0.
- sender issues SEND statement. The message is assigned the current send_sequence_number (0), the sys.conversation_endpoints send_sequence_number is incremented to 1, and the message is inserted in sys.transmission_queue
- after commit of SEND transaction a background transmitter reads the message from sys.transmission_queue, connects to destination, delivers the message over wire (TCP).
- target reconstructs message from wire, in a single transaction creates a row in its sys.conversation_endpoints (receive_sequence_number is 0) and delivers the message into the destination queue
- after commit of the message delivery, the target constructs an acknowledgement message and sends it back to initiator over the wire
- the sender gets the acknowledgement of message 0 and deletes the message from sys.transmission_queue
- sender may retry delivery periodically if it does not receive the ack
- target will send back an ack immediately on receipt of a duplicate (message sequence number is less than current receive_sequence_number)
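The ack/retry/dedup loop above can be sketched as a toy two-party simulation (class and attribute names mirror the description but are my own, not SSB's actual implementation; the lossy wire is simulated with random drops):

```python
import random

class Target:
    def __init__(self):
        self.receive_seq = 0      # next sequence number expected
        self.queue = []           # the destination queue

    def on_wire(self, seq, payload):
        if seq < self.receive_seq:
            return ("ack", seq)   # duplicate: ack immediately, do not enqueue
        # In-order delivery: enqueue and advance, "in one transaction".
        self.queue.append(payload)
        self.receive_seq += 1
        return ("ack", seq)

class Sender:
    def __init__(self, target, loss_rate=0.4):
        self.target = target
        self.loss_rate = loss_rate
        self.send_seq = 0
        self.transmission_queue = []   # unacked (seq, payload) pairs

    def send(self, payload):
        self.transmission_queue.append((self.send_seq, payload))
        self.send_seq += 1

    def pump(self):
        """Background transmitter: retry the oldest unacked message until acked."""
        while self.transmission_queue:
            seq, payload = self.transmission_queue[0]
            if random.random() < self.loss_rate:
                continue                         # wire dropped the message; retry
            ack = self.target.on_wire(seq, payload)
            if random.random() < self.loss_rate:
                continue                         # ack lost; retry sends a duplicate
            assert ack == ("ack", seq)
            self.transmission_queue.pop(0)       # acked: delete from queue

random.seed(7)
t = Target()
s = Sender(t)
for m in ["m0", "m1", "m2"]:
    s.send(m)
s.pump()
assert t.queue == ["m0", "m1", "m2"]
assert s.transmission_queue == []
```

Despite drops of both messages and acks, the target's queue ends up with each message exactly once and in order, which is the idempotency-of-delivery property described above.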
What this protocol achieves is idempotency of delivery, hidden from the messaging application. Database WAL ensures stability in presence of crashes. E.g. if the target crashes in the middle of enqueueing the message into the destination queue then the entire partial processing of the enqueue is rolled back on recovery and the next retry from the sender will succeed. If the target crashes after processing is committed but before sending the ack then on recovery the enqueue is successful and the next retry from the initiator will immediately trigger an ack, allowing the initiator to delete the retried message and make progress (send the next message in sequence). Note that there is no two-phase-commit involved.
Retries of unacknowledged messages occur for the dialog lifetime, which can be days, months, even years. Databases are good at keeping state for that long. SSB uses logical names for destinations (from 'foo' to 'bar') and routing can be reconfigured mid-flight (i.e. the location where 'bar' is hosted can be changed transparently). Long-lived state and routing allow for transparent reconfiguration of network topologies, handle downtime, and manage disaster (target is lost and rebuilt from backups). Most of the time this is transparent to the SSB application.
Furthermore, the guarantees can be extended to application semantics as well. Applications dequeue messages using the RECEIVE statement. In a single transaction the application would issue a RECEIVE to dequeue the next available message, look up app state pertaining to the message, modify the state, send a response using SEND, and commit. Again WAL guarantees consistency; after a crash everything is rolled back and the application would go again through exactly the same sequence (the response SEND cannot communicate anything on the wire until after commit, see above).
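The RECEIVE -> process -> SEND -> commit pattern can be sketched with any transactional database; here is a toy version using Python's sqlite3 (the inbox/state/outbox tables and `process_one` are my own illustration, not SSB syntax):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE inbox  (seq INTEGER PRIMARY KEY, body TEXT);
    CREATE TABLE state  (key TEXT PRIMARY KEY, value INTEGER);
    CREATE TABLE outbox (seq INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT);
    INSERT INTO inbox VALUES (0, 'add 5');
    INSERT INTO state VALUES ('counter', 0);
""")

def process_one(db, crash=False):
    """Dequeue -> update app state -> enqueue response, all in one transaction."""
    with db:   # commits on success, rolls back on exception
        row = db.execute("SELECT seq, body FROM inbox LIMIT 1").fetchone()
        if row is None:
            return
        seq, body = row
        db.execute("DELETE FROM inbox WHERE seq = ?", (seq,))          # dequeue
        db.execute("UPDATE state SET value = value + 5 WHERE key='counter'")
        db.execute("INSERT INTO outbox (body) VALUES (?)", ("done",))  # response
        if crash:
            raise RuntimeError("simulated crash before commit")

# Crash mid-processing: everything rolls back, the message is still queued.
try:
    process_one(db, crash=True)
except RuntimeError:
    pass
assert db.execute("SELECT COUNT(*) FROM inbox").fetchone()[0] == 1
assert db.execute("SELECT value FROM state WHERE key='counter'").fetchone()[0] == 0

# The retry succeeds, and the message's effects happen exactly once.
process_one(db)
assert db.execute("SELECT COUNT(*) FROM inbox").fetchone()[0] == 0
assert db.execute("SELECT value FROM state WHERE key='counter'").fetchone()[0] == 5
```

Because the dequeue, the state change, and the response all commit or roll back together, a crash-and-retry re-runs exactly the same sequence, which is the consistency property the comment describes.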
So EOIO is possible.
One has to understand the trade-offs implied. Something like SSB will trade off latency for durability. Applications need not worry about retries, duplicates, missing messages, routing etc. as long as they are capable of handling responses coming back hours (or maybe weeks) after the request was sent. And application processing of a message is idempotent (RECEIVE -> process -> crash -> rollback -> RECEIVE -> re-process) only as long as the processing is entirely database bound (update state in some app tables, not make REST calls). Yet such apps are not unusual: they use some database to store state and communicate with some other app that also uses a database to store state. Unlike most messaging systems, SSB stores the messages in the database, thus achieving WAL consistency along with the app state. Many SSB applications are entirely contained in the database, code included. They use SSB activation to react to incoming messages, without keeping any state in memory.
In SSB both the initiator and the target are monolithic SMP systems (not distributed). Together the two form a distributed system. It trades off availability for partition tolerance, but one has to understand how this trade-off occurs. In case of partitioning (target is unreachable) the application continues to be available locally (the SEND statement succeeds). If the network partitioning is not resolved over the lifetime of the dialog, then the application will see an error. If the partitioning is resolved then message flow resumes and the application-layer responses start showing up in the queue. Again, activation and durable state make this easy to handle, as long as a latency of potentially days makes sense in the business. Shorter lifetimes (hours, minutes) are certainly possible and in such cases, if network partitioning is not resolved in time, the timeout error will occur sooner.
I believe what it really comes down to is that people new to distributed processing think they want "exactly once delivery", but later they (hopefully) learn they really want "exactly once processing". For example:
> We send him a series of text messages with turn-by-turn directions, but one of the messages is delivered twice! Our friend isn’t too happy when he finds himself in the bad part of town.
This is easily resolved by amending each message with an ordinal indicator, e.g. "1. Head towards the McD", "2. Turn left at the traffic lights", etc. The receiver then dedupes and the instructions follow "exactly once processing". Processing messages exactly once is a "business rule" and the responsibility for doing so lies in the business logic of the application.
This example also brings up another typical point of confusion in building distributed systems: people actually want "ordered processing", not "ordered delivery". The physical receiving order does not matter: your friend will not attempt to follow instruction #2 without first following instruction #1. If instruction #2 is received first, your friend will wait for instruction #1.
It's also important to note that the desired processing order of the messages has nothing to do with the physical sending order: I could be receiving directions to two different places from two different people, and it doesn't matter what order they are sent or received, just that all the messages get to me! A great article covering these topics in more detail with other real world examples is "Nobody Needs Reliable Messaging" .
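The dedupe-and-reorder behavior described above (your friend holding instruction #2 until #1 arrives) can be sketched as a small buffer keyed by ordinal (the `process_in_order` function is my own illustration):

```python
def process_in_order(incoming):
    """Dedupe and reorder numbered instructions; follow each exactly once.
    Out-of-order messages are buffered until their predecessors arrive."""
    buffer = {}
    next_ordinal = 1
    followed = []
    for ordinal, text in incoming:
        if ordinal < next_ordinal or ordinal in buffer:
            continue                     # duplicate: drop it
        buffer[ordinal] = text
        while next_ordinal in buffer:    # drain everything now in order
            followed.append(buffer.pop(next_ordinal))
            next_ordinal += 1
    return followed

# Duplicated, out-of-order delivery still yields exactly-once, in-order processing:
msgs = [(2, "turn left"), (1, "head towards the McD"),
        (2, "turn left"), (3, "arrive")]
assert process_in_order(msgs) == ["head towards the McD", "turn left", "arrive"]
```

Note that delivery here was neither exactly-once nor ordered; only the processing is.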
I think it is useful to try to understand why new distributed system builders run into these difficulties. I suspect they try to apply local procedure call semantics to distributed processing (a fool's errand), and message queue middleware works well enough that a naive "fire and forget" approach is the first strategy they attempt. When they subsequently lose their first message (hopefully before going into production), it's natural to think in terms of patching (e.g. distributed transactions, confirmation protocols, etc.) rather than to consider whether the overall design pattern is appropriate.
Oddly, there is at least one very well designed solution that addresses these challenges - the Data Distribution Service (DDS)  - but I almost never see or hear about it at any of my clients.
"This example also brings up another typical point of confusion in building distributed systems: people actually want "ordered processing", not "ordered delivery". The physical receiving order does not matter: your friend will not attempt to follow instruction #2 without first following instruction #1. If instruction #2 is received first, your friend will wait for instruction #1."
How do you handle that, practically speaking? In say, any system that operates at a scale where you might have thousands of messages per second. One concern I have with holding onto instruction 2 until instruction 1 arrives is, what if it never arrives? The system is blocked, isn't it?
Here's a real world implementation discussion of this sort of stuff: https://github.com/couchbase/sync_gateway/issues/525
It's important to note that no matter how you want to slice it, if instruction 1 never arrives, somehow you need a protocol to re-request it from upstream until you get it.
Whether that protocol lives in your middleware (e.g. an "at least once" message queue implementation that buffers messages and delivers them to your application in order) or in your consumer application logic (e.g. the application buffers messages and can send a request for a missing message over another channel), something is going to have to keep track of these things and buffer messages in the meantime.
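Consumer-side gap detection is the core of such a protocol; a minimal sketch (the `receive` function and its NAK-set return value are hypothetical, not any particular middleware's API):

```python
def receive(buffer, expected, msg_seq, payload):
    """Buffer an incoming message and return the set of sequence numbers
    to re-request: the gaps between what we expected and what just arrived."""
    buffer[msg_seq] = payload
    return {s for s in range(expected, msg_seq) if s not in buffer}

buffer = {}
assert receive(buffer, 0, 0, "a") == set()     # in order: no gap
assert receive(buffer, 1, 3, "d") == {1, 2}    # seq 1 and 2 missing: NAK them
assert receive(buffer, 1, 1, "b") == set()     # retransmission fills part of the gap
```

Whether this logic sits in the middleware or the application, something has to hold the buffer and drive the re-requests.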
This is why message queue middleware appears so convenient -- your application never has to see any of this complexity. I say appears because message queues can't do at least 3 things:
(a) They can't understand your application message processing order (which is NOT the same as your physical sending order): notice that the message queue middleware and the application determine whether a message is missing at fundamentally different levels of abstraction. The middleware can, at best, use e.g. sequence numbers or checkpoints to figure out a message is missing. An application has higher-order business knowledge about what the messages mean and so can determine if a message is missing much more intelligently. An application can also happily process other valid sequences of instructions without waiting for all preceding but possibly unrelated messages (e.g. it can accept an order for client A while still waiting for client B's full order to come through).
(b) Message queues DO drop messages, so again, without a higher-level protocol across the transaction sequence to manage this, you will have a problem if you require all messages to be processed.
(c) You can't easily replay an event e.g. because your consumer stuffed up processing. It's true that many message queuing implementations can keep a duplicate journal and allow you to replay from there, but this is probably an opaque data store. You can probably inspect the message, but ultimately you will end up writing application-specific logic to figure out which message to replay, whether the copy of the message is retrieved from the message queue system or from upstream.
At that point, the end-to-end principle applies and you start to wonder if all that intelligence in message queues is a waste. I'm personally looking at moving over to using DDS for data distribution and RPC over DDS to do replay and reconciliation between publisher and consumer.
I mean if you used the blockchain as a message queue it could be pretty reliable, no?
Unless you really do mean "send only once", which, literally speaking, is pretty easy to do.