
Related: you can get at most once delivery or at least once delivery; you cannot get exactly once delivery. If I had a dollar for every junior who thought that a lack of exactly once delivery guarantees was a bug...


If you can get at-least-once delivery, why can you not build exactly-once on top of that?

[UPDATE] Apparently I need to be more explicit about this. My question is: if I can get at-least-once delivery, why can I not build an abstraction layer on the receiving node that provides the illusion of exactly-once delivery? It seems like it should be a simple matter of keeping a log of received messages, and discarding duplicates.
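That receiving-side layer might be sketched like this (a minimal illustration; the class and names are hypothetical, and it assumes an at-least-once channel underneath that tags each message with an id):

```python
# Hypothetical receiver-side layer: log received message ids and
# discard duplicates, so the application callback sees each message
# at most one time even if the channel redelivers it.
class DedupReceiver:
    def __init__(self, deliver):
        self.deliver = deliver          # application callback
        self.seen = set()               # log of received message ids

    def on_message(self, msg_id, payload):
        if msg_id not in self.seen:
            self.seen.add(msg_id)
            self.deliver(payload)       # hand off to the app once
        # Always ack, even for duplicates, so the sender stops retrying.
        return "ACK"
```

Whether this counts as "exactly-once delivery" or merely "exactly-once processing" is precisely what the rest of the thread argues about.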


The principal difference between 'at most once' and 'at least once' is whether a sender re-tries when it is unsure if the recipient has received the message. If the recipient's ack never makes it back, then a sender cannot know whether they actually received the message or not (the two-generals problem).

So this hypothetical middleman will receive a packet, check that it's not a duplicate, and forward it to the recipient it's proxying for. How will it know that the recipient has actually received it? If the receiver doesn't ack the message in some way, which causes your abstraction to re-transmit the message, then it exhibits 'at least once' behavior. If the abstraction only ever forwards the message along once and doesn't care whether the recipient acknowledged it or not, then it exhibits 'at most once' behavior.

As a more concise answer - 'exactly once' delivery is impossible because you can't know if the recipient actually got the message. If you assume a perfect communication channel, then I agree the problem is trivial, but I challenge you to find such a channel! Even on the same machine, interprocess communication can fail in all sorts of fun ways.


> So this hypothetical middleman will receive a packet, check that it's not a duplicate, and forward it to the recipient it's proxying for. How will it know that the recipient has actually received it?

It seems like the answer is in the first part, the "check that it's not a duplicate".

Implement at-least-once but with a unique token to identify the request; the receiver sends back an acknowledgement with that token every time it receives the original message, but only hands it off for processing the first time. Stuff this behind a library/API so it's hidden from the user and the application code doesn't have to handle it, and... isn't that it?


Yes, but the handoff can fail in the same way (it can't know if the thing it's handing off to actually got it). But the application can also just be resilient to that with idempotent operations and have the handoff be at-least-once.


> the handoff can fail in the same way

Everything in a digital system can fail, but by convention (and because it's not so far from the actual truth) some parts are assumed to be 100% reliable when modeling them. If you don't make this assumption, you can't guarantee anything.


> So this hypothetical middleman will receive a packet, check that it's not a duplicate, and forward it to the recipient it's proxying for.

That's not how I would implement exactly-once on top of at-least-once. I would do it at the recipient, not at the intermediate nodes.

> 'exactly once' delivery is impossible because you can't know if the recipient actually got the message

But the recipient can know.


But the recipient is not one atomic thing - you're implicitly assuming perfect communication between the process/driver/hardware that receives the packets and does the duplicate detection, and the process which wants to receive the message exactly once.

There's still communication happening there, and it can still fail. Buffers fill, processes pause for arbitrary delays which exceed timeouts, etc. Your assumptions based on your model are correct, but your model doesn't include that communication.

But all models have some level of detail they care about, and assuming the computer always works is a perfectly valid model for plenty of cases. It's just not all of them. You'll be able to create real-world cases where this abstraction is faced with a choice of whether to retry or not, and at that moment it will be unable to deliver exactly once.


> we're assuming perfect communication between the process/driver/hardware receiving the packets and doing the duplicate detection and the process which wants to receive the message exactly once

My claim is not that you can provide exactly-once delivery unconditionally. My claim is that if you can provide at-least-once delivery then you can turn that into exactly-once delivery. The word "delivery" is not rigorously defined, but IMO any reasonable definition necessarily entails a certain level of reliability at the receiving node.


I agree with your claim, a recipient can cope with at-least-once delivery by being idempotent. You're right.

The meaningful distinction is that something on the recipient needs to be idempotent because the message might get received twice. The application can be oblivious to this, so long as you assume that channel to be perfect.

People on the Internet won't like you calling it 'exactly once delivery' because it's not exactly once - it's an idempotent at-least-once. Which is great! But the statement of the at-least/at-most problem comes down to a decision about whether to re-try. There's no middle ground: I either have to retry or not. People won't like a claim that 'exactly once' delivery is possible, because it isn't; it's just moving the at-least-once-ness somewhere else.


> People on the Internet won't like you calling it 'exactly once delivery' because it's not exactly once

That depends entirely on what you mean by "it".

Messages can get lost. So if you want guaranteed delivery, you sometimes have to re-transmit the same message, and so you might end up with the same message being delivered more than once. But it is trivial to put an abstraction layer on top of that to discard the duplicates and make it appear to the application as if every message is received exactly once.

The whole thing is a tempest in a teapot.


The problem is that sometimes your application intends to send messages[0] multiple times. If your "log-and-discard" system was implemented naively then each node could only ever send each possible message once and only once. Ever.

That would be like:

Alice: "Honey, where did you put the keys?"

Bob: "They're up on the counter."

(The next day...)

Alice: "Honey, where did you put the keys?"

Bob: (nothing, I already received this message, it could have echoed off the walls from yesterday)

What you need is for all sent messages to have unique IDs that will never repeat, and then log those. That's known as an idempotency token.

But even then, logging all those UUIDs forever is probably not a good idea for disk usage. At some point you'll have to trash old message logs and hope you don't have a rogue network router retransmitting six month old messages or something.

[0] Or the moral equivalent of messages, e.g. HTTP POST requests


> What you need is for all sent messages to have unique IDs that will never repeat, and then log those. That's known as an idempotency token.

So your problem is not really a problem because you yourself present the solution. The real problem is:

> But even then, logging all those UUIDs forever

But you don't need to log them all forever. Just make the UUIDs sequential, and all you need then is to keep track of the smallest id that has not yet been received. (You can be more efficient by storing more state, but it's not necessary. Remember, we're assuming at-least-once delivery here, so you can always force retransmission by not acknowledging receipt.)
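A minimal sketch of that sequential-id scheme (names hypothetical): the receiver tracks only the next id it expects, re-acks anything older, and withholds the ack on a gap so the at-least-once sender retransmits:

```python
# Sketch: duplicate suppression with bounded state. Instead of logging
# every id forever, keep just the next expected sequence number.
class SequentialDedup:
    def __init__(self, deliver):
        self.deliver = deliver       # application callback
        self.next_expected = 0       # smallest id not yet received

    def on_message(self, seq, payload):
        if seq == self.next_expected:
            self.deliver(payload)    # first receipt: hand off once
            self.next_expected += 1
            return "ACK"
        if seq < self.next_expected:
            return "ACK"             # duplicate: re-ack, don't deliver
        return None                  # gap: no ack, forcing a retransmit
```

This is essentially what TCP's sequence numbers do within a connection, with the usual caveat that real implementations also buffer out-of-order segments.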


You can get exactly once processing, but not exactly once delivery.

https://bravenewgeek.com/you-cannot-have-exactly-once-delive...


That seems like a distinction without a difference to me. Why should I care if the thing I get exactly one of is called "processing" or "delivery"?


Because you need to understand that your processing code is constrained by the fact that you can't get exactly-once delivery. You must write your processing code to handle it one way or another. There are some libraries that try to wrap the abstraction of processing exactly once around the code, but those libraries still impose constraints on the sort of code you can write. They can make it easier but they can't fully remove the need to think about how your processing code works. It isn't exactly the same.

This is why people like me insist it's important to understand that you can not have exactly-once delivery. There is no library that can make that just go away, they can only shuffle around exactly where the lumps under the carpet live, and if one programs with the mistaken idea that these libraries really do solve "exactly once" delivery, one will get into deep trouble. Possibly the "my architecture is fundamentally broken and can't be rescued" sort of trouble.


> Because you need to understand that your processing code is constrained by the fact that you can't get exactly-once delivery.

Why do I need to understand that? Why can I not put an abstraction layer that provides me with the illusion of exactly-once delivery?

> There is no library that can make that just go away

Well, this is the thing that I dispute. I believe that there is a library I can write to make it go away if I have at-least-once delivery. In fact, I claim that writing such a library is an elementary exercise. The TCP protocol is an existence proof. Where is the flaw in my reasoning?


TCP is at least once, not exactly once.

Here's another useful article: https://blog.bulloak.io/post/20200917-the-impossibility-of-e...

TCP does not solve the two generals problem. TCP gets around this limitation by requiring only one ACK.

https://en.wikipedia.org/wiki/Two_Generals%27_Problem#Engine...

https://www.scaler.com/topics/computer-network/two-generals-...


OK, but you have to do a little extrapolating here because the claim is not that you can do exactly-once under all circumstances. That is obviously false because you can't do exactly-once in a situation where all comms are down indefinitely. My claim is that if I have at-least-once then I can build exactly-once out of that.


This isn't about indefinite communication loss. Obviously no progress is possible in that case. The two generals' problem has nothing to do with a permanent failure.

I think there is a lot of literature out there if you're really interested in understanding and I'm happy to provide more links if you'd like.


What makes you think the 2GP is relevant here? The 2GP has to do with coordination and consensus, not exactly-once delivery.


The link I posted above explains the connection: https://blog.bulloak.io/post/20200917-the-impossibility-of-e...

Particularly:

> The sender cannot know if a message was delivered since transport is unreliable; thus, one or more acknowledgement messages are required. Moreover, the sender cannot distinguish between message delivery errors, acknowledgement delivery errors, or delays (either in processing or because of network unreliability).

> The recipient is forced to send the acknowledgement only after the message is either processed (or persisted for processing) because an acknowledgement before processing would not work: if the recipient exhibits a fault before processing that would cause the loss of the message.

> In the case that that acknowledgement is lost, the sender can’t know (due to lack of insight in the recipient’s state) whether the recipient failed before scheduling the message for processing (in essence losing the message) or if the recipient is just running a bit slow, or if the acknowledgement message was lost. Now, if the sender decides to re-deliver, then the recipient may end up receiving the message twice if the acknowledgement was dropped (for example). On the other hand, if the sender decides to not re-deliver, then the recipient may end up not processing the message at all if the issue was that the message wasn’t scheduled for processing.


This proof is flawed:

"Let’s assume that a protocol exists which guarantees that a recipient receives a message from the sender once and only once. Such a protocol could then solve the two generals problem! Representing the time of the attack as the message, the first general (the sender) would only need to adhere to the protocol for the second general (recipient) to have received the attack time exactly one time. However, since we know that this is not possible, we also know that exactly once is not possible."

The 2GP is not just the first general knowing that the second general received the message. The 2GP is the problem of achieving common knowledge between the two generals, i.e. it's not just that G1 needs to know that G2 got the message, it is that G2 needs to know that G1 knows that G2 got the message, and G1 needs to know that G2 knows that G1 knows that G2 got the message, and so on.

Exactly-once delivery is possible. The only thing that is not possible is for the sender to know when the message has been received so that no duplicates are sent. But exactly-once delivery is not only possible, it's trivial. All you need to do is discard duplicates at the receiver.


> All you need to do is discard duplicates at the receiver.

...yes, which would be exactly once _processing_ but not exactly once _delivery_.

Unless you're wanting to redefine "exactly once delivery" to mean "at least once delivery but I'm calling it exactly once because I have a strategy to cope with duplicate messages"


> exactly once _processing_ but not exactly once _delivery_

What exactly is the difference? What counts as "delivery"? How do you do "delivery" (on a computer) without doing at least some "processing"?


Delivery is communication between two actors. Processing is what one actor (the receiver) does with a message.

Communication is when two actors exchange a message. Communication is generally done over an unreliable medium because in practice there is no way to communicate without the potential of failure.

1. Exactly once communication between two actors over an unreliable medium is impossible. At the very least you have to account for the possibility of failure of the medium, so you might need to re-send messages.

2. At least once communication between two actors is possible -- just re-send a message until the receiver acknowledges it.

3. Because a message might be re-sent, the receiver must be able to cope with duplicate messages. This is what you're describing. This might be done by making message processing idempotent or tracking which messages you've seen before. In either case, you have achieved exactly once processing. That is, if a receiver is given the same message multiple times, only the first receipt of the message changes the state of the receiver.
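The two coping strategies in point 3 can be illustrated with a toy example (purely illustrative; `state`, `set_balance`, and `add_balance` are made-up names):

```python
state = {"balance": 100, "applied": set()}

# (a) A naturally idempotent operation: setting a value. Applying the
#     same message twice leaves the state unchanged.
def set_balance(amount):
    state["balance"] = amount

# (b) A non-idempotent operation (increment) made duplicate-safe by
#     tracking which message ids have already been applied.
def add_balance(msg_id, amount):
    if msg_id in state["applied"]:
        return                       # duplicate: ignore
    state["applied"].add(msg_id)
    state["balance"] += amount
```

In both cases the channel may deliver the message more than once, but only the first receipt changes the receiver's state.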

---

> But exactly-once delivery is not only possible, it's trivial.

Considering that many in the field consider this problem to be impossible (or, at best, extremely difficult, e.g. https://www.confluent.io/blog/exactly-once-semantics-are-pos...), this should be a huge red flag to yourself that you're missing something. Everyone has blind spots and that's okay, but hopefully you understand that there's a pretty big mismatch here.

Alternatively, it's possible that this problem _really_ is trivial and you have some unique insight which means there's a great opportunity for you to write a paper or blog post.


> hopefully you understand that there's a pretty big mismatch here

Yep. But on the other hand, 1) I have a Ph.D. in CS, and 2) I have yet to see anyone in this thread actually produce a reference to a reliable source to back up the assertion that exactly-once delivery is impossible. Indeed, the one reference you provided has a headline "Exactly-Once Semantics Are Possible" so you are actually supporting my position here.

Finally, I will point out something that should be obvious but apparently isn't: "exchanging a message" between computers over a network is a metaphor. There is not anything that is actually exchanged, no material transferred from one computer to another. There is only information sent in the form of electrical signals which results in state changes in the receiving system, so there is no clean boundary between "communication" and "what the receiver does with a message". Receiving a message in the context of a computer network is necessarily "doing something" with that message. There is no other way to "receive" a message.


TCP implementations are an abstraction that work 99.99% of the time, but are still vulnerable to the two generals problem when you look closely. TCP is implemented in the kernel with a buffer, and the kernel responds with ACKs before an application reads the data.

There is no guarantee that the application reads from that buffer (e.g. the process could crash), so the client on the other end believes that the application has received the message even though it hasn't.

The kernel is handling at-least-once delivery with the network boundary and turning it into at-most-once with the process boundary.
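That split can be shown with a toy simulation (a queue stands in for the kernel's TCP receive buffer; all names are hypothetical):

```python
import queue

# A queue plays the role of the kernel's TCP receive buffer.
kernel_buffer = queue.Queue()

def kernel_receive(data):
    kernel_buffer.put(data)   # bytes land in the kernel buffer...
    return "TCP ACK"          # ...and the kernel acks immediately

def app_read():
    # The application may never get here (crash, hang, etc.) even
    # though the sender already saw a TCP-level acknowledgement.
    return kernel_buffer.get_nowait()
```

The TCP ACK proves only that the bytes reached the kernel, not that the application ever read them.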


> still vulnerable to two generals

What makes you think the 2GP is relevant here? The 2GP has to do with coordination and consensus, not exactly-once delivery.

> TCP is implemented in the kernel with a buffer, the kernel responds with ACKs before an application reads the data.

True. Why do you think that matters?

> There is no guarantee that the application reads from that buffer

So? What does that have to do with exactly-once delivery? Even if the application does read the data, there's no guarantee that it does anything with it afterwards.

> The kernel is handling at-least-once delivery with the network boundary and turning it into at-most-once with the process boundary.

OK, if that's how you're defining your terms, I agree that you cannot have exactly-once delivery. But that makes it a vacuous observation. You can always get at-most-once delivery by disconnecting the network entirely. That provides a 100% guarantee that no message will be delivered more than once (because no message will ever be delivered at all). But that doesn't seem very useful.


> Why can I not put an abstraction layer that provides me with the illusion of exactly-once delivery?

You can do that. You can implement a video conferencing system ontop of TCP, and it will even work, technically. It will just have terrible performance characteristics that you'll never be able to fix. You might even call it fundamentally broken.


OK, I don't dispute that, but that is a very different claim than "you can't get exactly-once delivery." You can (if you have at-least-once delivery).


Read the two generals' problem. It is proven that exactly once _delivery_ is a physical and mathematical impossibility.

Like you said, you can simulate it with exactly once _processing_. But to do that, you have to know to do that.

Others are saying "you can't divide by zero" and you are saying "yeah, but if I detect a zero and then do something different, then it is the same thing." No, knowing you have to do something is the very point of acknowledging you can't divide by zero.

Because you can't have exactly once delivery, you have to deal with it. One trick is duplicate checks or idempotent writes. This gives exactly once processing. This also takes additional overhead, which is why audio and video stream processing typically doesn't do the additional checks.

I have fixed many bugs written by people who believe the network is reliable. I even hired one: during the interview we talked about this kind of issue, and she realized why she was getting duplicated writes reading from SQS or SNS at $existing_job. She was green but smart and became a real asset to the team.


What makes you think the 2GP is relevant here? The 2GP has to do with coordination and consensus, not exactly-once delivery.

> you can simulate it with exactly once _processing_

What exactly do you think is the difference between "simulating" exactly-once delivery and actually having exactly-once delivery? What do you think "delivery" means in the context of a computer network?


I don't know enough to answer this. I'm sure there is plenty of writing on this subject from people more qualified than me.


Because 'exactly once' delivery is arguably a misnomer, you usually really want 'at least once delivery with acks and idempotent processing on the other side'.

The difference is subtle but important in practice and specification.


> you usually really want 'at least once delivery with acks and idempotent processing on the other side'.

Why? I'm pretty sure I really want (the illusion of) exactly-once delivery, and it seems to me that I can implement that pretty easily given at-least-once delivery. Why would I not want that?

> The difference is subtle but important

Why?


You can absolutely abstract 99% of it out.

But not 100%. At some point, a counter move on the delivery has to be stored... -somewhere-.

And sure you -can- make it very very close to EOD and for some subsets you can totally do EOD, but you are, realistically, better off with ALOD+Ack once it makes its way into a system. There's always that 'moving the counter' problem.

The upshot is, things tend to get faster, easier to code review, and simpler to test.

Pragmatically speaking, I've found devs are better able to handle ALOD+ACK than "Exactly once, but because of reality you might get a message that's doubled because you couldn't persist the ack".

And I'll note I'm possibly extra pedantic about this because I've had a month and a half of dealing with the fallout of people trusting low-code salesmen alongside Gartner reports, leading to a 'you people thought this was exactly once and it was not' sort of problem.


> But not 100%.

Why not?

> At some point, a counter move on the delivery has to be stored... -somewhere-.

What is a "counter move on the delivery"?


Read the two generals' problem. It is a mathematical impossibility. The counter is the delivery counter. Delivery, delivery + 1


What makes you think the 2GP is relevant here? The 2GP has to do with coordination and consensus, not exactly-once delivery.


You are correct


> I'm pretty sure I really want (the illusion of) exactly-once delivery

Do you know what idempotency is? This is exactly what he described.

Idempotency is important to prevent unwanted behaviour for duplicate actions. If you have "exactly-once", and accidentally execute the action twice that could cause problems.


> Do you know what idempotency is? This is exactly what he described.

Is it though? It seems like a false equivalency, even if the outcome is approximately the same?


> Do you know what idempotency is?

Yes.

> This is exactly what he described.

So? Idempotency and an exactly-once delivery abstraction are not the same thing.


Agreed. Idempotent _processing_, not delivery.


Uh, what exactly do you think is "agreed" here? My claim is that idempotent processing can produce exactly-once delivery, and so the original claim that you "cannot have exactly-once delivery" is false.


Change your words from "delivery" to "processing" and you are in alignment with reality.

Idempotent processing is exactly once processing. This is true. Delivery is from the sender's point of view, not the recipient. How the recipient processes the information, idempotently or not, is not the concern of the sender.

"Cannot have exactly-once delivery" is true and is as true as not dividing by zero. Read about the two general's problem. It is, quite literally, impossible to have exactly once delivery. Like, your friend is in the other room, and you shout "someone is at the door" - how do you _know_ your friend heard you? If you shout again, you are attempting another delivery. What do you do if your friend doesn't respond? In exactly once delivery, you guarantee they heard you in the other room. In exactly once processing, you can shout a couple of times until you hear them acknowledge.

You may think that this is not material and at the end of the day, as long as one thing is processed, then who cares? Well, you have to understand how delivery can fail otherwise you will handle the failure incorrectly. Is it safe to try again? If I say "transfer money from me to you", and I don't hear back, is it safe to again say "transfer money from me to you" again? Will I be double charged?
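The money-transfer case might be handled like this on the sender's side (a sketch, not a real payments API; the `send` callback and the key scheme are assumptions):

```python
import uuid

def transfer_with_retries(send, amount, max_tries=5):
    # One idempotency key covers every retry of this logical transfer,
    # so the server can deduplicate even when our ack gets lost.
    key = str(uuid.uuid4())
    for _ in range(max_tries):
        reply = send(key, amount)    # request or ack may be lost
        if reply == "ACK":
            return key
    # No ack after max_tries: the outcome is genuinely unknown.
    raise TimeoutError("no ack; do NOT assume the transfer failed")
```

Even if the first ack is dropped and the client retries, a server that keys the transfer on the idempotency token moves the money at most once.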


> Change your words from "delivery" to "processing" and you in alignment with reality.

You are not the first to say this, but so far no one has been able to explain what the difference between "delivery" and "processing" is. How do you do "delivery" (on a computer) without also doing (at least some) "processing"?

> Delivery is from the sender's point of view, not the recipient.

I don't see what difference that makes. In fact, I don't see why a "point of view" should enter into it at all. Whether a message has been "delivered" or not (whatever that actually turns out to mean) is (it seems to me) a property of the system, independent of anyone's point of view.


I gave two concrete examples. What about each of those examples is not landing?

One of shouting at your friend. You want to make sure your friend knows someone is at the door. Two: you tell your computer to transfer money and you don't want to be doubled charged.

More in depth: you click send money on a computer. The computer connects to another computer and sends data to it over an unreliable network. Computer A sends data over the network to B, just like you shouting to your friend you think is in the other room. Data can be lost / your friend might not hear you. Usually, the other computer says "acknowledged, I got your message" - and that is how you know that B is moving money or your friend is getting the door. If A never hears back, should A try again? If B gets two requests to move $25, should that be deduplicated or were there two actual requests and $50 should be moved? To know how to solve that, you have to first admit that you might get 0, 1, or multiple messages delivered to B when A wants to send 1 message.

Read the two generals' problem. It is the gateway to distributed computing.


> I gave two concrete examples.

Neither of which was on point because they both ignored how exactly-once delivery can be done.

> Read the two general's problem.

The 2GP is not on point because it's about achieving consensus, not exactly-once delivery. Achieving consensus is indeed impossible with unreliable messaging, but that has nothing to do with exactly-once delivery.


You cannot while maintaining the half-duplex behavior of the current system.


Why not? (Please see the update on my OP before answering that.)


The mechanism you're describing already exists. TCP has sequence numbers. It can drop duplicate data.

The difference between "processing" and "delivery" relates to "network capacity." Process handling wastes capacity in favor of latency. Delivery handling increases latency in favor of capacity.

Systems which have "exactly once" delivery typically do so with "send/receive" and "release/delete" message pairs. You need additional round trips to actually accomplish this at the "delivery" layer.
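The "send/receive" plus "release/delete" pairing described above could be sketched as follows (visibility timeouts and persistence omitted; all names hypothetical):

```python
# Sketch of a two-phase queue: receiving a message does not remove it.
# The consumer must explicitly delete after durably processing, so a
# consumer crash leads to redelivery rather than message loss.
class TwoPhaseQueue:
    def __init__(self):
        self.pending = {}            # msg_id -> payload, not yet deleted

    def send(self, msg_id, payload):
        self.pending[msg_id] = payload

    def receive(self):
        # A receive does NOT remove the message: if the consumer dies
        # before deleting, the same message is handed out again later.
        return next(iter(self.pending.items()), None)

    def delete(self, msg_id):
        # Called only after the consumer has durably processed.
        self.pending.pop(msg_id, None)
```

This is the extra round trip the comment mentions: one exchange to hand the message over, and a second to confirm it can be discarded.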


> The mechanism you're describing already exists.

Yes, I know, which makes it all the more bizarre that people are claiming that this is impossible.

> Systems which have "exactly once" delivery typically do so...

Ah, so exactly-once delivery is possible after all?


If your goal is simply to look smart then absorbing the subtlety of what is said to you before you reply should be top of list.


What subtlety do you think I have failed to absorb?


So fun fact, you actually can get exactly once delivery out of your network, but your network has to be something other than Ethernet/IP/TCP to do it. Every single one of those layers is mis-designed for exactly-once delivery of messages (TCP doesn't even have a concept of messages).

Your network won't have "exactly once" message transfer happening on it (it will internally be "at least once" for certain packets, but only small ones) and administering it will be very different than administering an Ethernet network, but network protocols absolutely can be designed to give exactly-once delivery to your software.

The real reason most people outside of HPC don't do this is that exactly-once at the network layer is not that useful for most web stuff. You're going to have a higher layer that will drop stuff and retry anyway, so you might as well push the problem up the stack.


Yup, I try to explain it with shouting a message to someone in a crowded room. You can yell at your boss "I fixed the bug"; they can confirm it or ignore you, which is at-most-once delivery if you don't repeat the message. If you repeat the message until they confirm it, it is at-least-once delivery.

edit: Point is in confirming that message is received. If you don't receive the confirmation the message was delivered at most once.


This is a popular saying that is basically wrong.

You have very limited guarantees around an arbitrarily bad partition, but this is also a detectable condition. Lots of defective systems exist, but non-defective systems generally guarantee "exactly once delivery or detected failure"


That sounds like "at most once" to me


If you unplug the network you can't send messages, correct.

Other people upthread have already gone over how you can't separate delivery from message processing and how TCP's attempt to do so makes it defective (unless you layer a whole additional system on top of it rendering most of TCP's design irrelevant)

If you were trying to make a new, non-broken system on top of TCP or otherwise, allowing multiple delivery doesn't add any correctness/robustness benefits -- it just makes messages cheaper to send and receive. There is no "at most once or at least once" choice except in the Pickwickian sense that if you don't require delivery or delivery confirmation you can save the effort of even trying.


Let me get some popcorn before reading these comments.



