So everyone is talking about the same thing here: data can get delivered twice, and the application must handle it in an idempotent way. For TCP this means looking at a sequence counter and throwing away anything that's already been seen. With a single client/server connection, maintaining sequence numbers like TCP does solves the problem, but it's harder to use the same technique in the multi-client, multi-server, "stateless" requests that are common in the web world.
EDIT: To clarify, what I mean by TCP "does deliver data more than once" is that one side of the connection can send the same data twice. It's true that it's then discarded by the other end, but this is what people are talking about when they talk about the theoretical impossibility of never sending anything twice. The rest is basically the idempotency thing: ensuring that data received twice doesn't cause anything abnormal.
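A minimal sketch of the "look at a counter and throw away what you've seen" idea, for a single connection. This is not how a real TCP stack is written (there are no byte ranges, windows, or reordering buffers here); the class and method names are made up for illustration.

```python
class DedupReceiver:
    """Hands each sequence number to the application at most once."""

    def __init__(self):
        self.next_seq = 0     # lowest sequence number not yet delivered
        self.delivered = []   # payloads handed to the application, in order

    def on_segment(self, seq, payload):
        """Return True if the payload was delivered, False if discarded."""
        if seq != self.next_seq:
            # Already seen (seq < next_seq) or out of order (seq > next_seq):
            # drop it and rely on the sender to retransmit.
            return False
        self.delivered.append(payload)
        self.next_seq += 1
        return True


receiver = DedupReceiver()
# The sender retransmits segment 0 because it never saw the ACK.
arrivals = [(0, "a"), (0, "a"), (1, "b"), (0, "a")]
results = [receiver.on_segment(seq, payload) for seq, payload in arrivals]
```

The same data crosses the wire three times, but the application only ever sees "a" followed by "b". The catch is that `next_seq` is per-connection state, which is exactly what "stateless" multi-client, multi-server requests don't have.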
I've been living in this problem space for many years now and have seen the wheel reinvented many times. Whenever the plumbing does not guarantee EOIO (exactly-once, in-order delivery) but the business demands it, it gets pushed into the app layer, where TCP (retries and acks) is reimplemented with varying degrees of success.
Isn't this the same app layer stuff that has to get reimplemented? I can see how this is often pushed back to a human (oops, my money transfer didn't go through, I'll have to redo it) but it's still something that has to be dealt with somewhere.
Database programmers have the means to deal with it off the shelf: BEGIN TRANSACTION ... COMMIT. When your queues are in the database, this becomes trivial. Even without the system I'm talking about (Service Broker), which stores its queues in the database, most regular messaging systems support enlisting in a distributed transaction to achieve an atomic dequeue/process sequence. It's just that many apps/deployments don't bother to do it because of the ops overhead (XA coordinator), the reduced throughput, and/or simply not understanding the consequences.
The point is that durable, persisted, transacted 'sockets' behave very differently from a TCP socket. It's a whole lot harder to simply lose a message in the app layer when interacting with a database.
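To make the "queue in the database" point concrete, here is a rough sketch using Python's built-in sqlite3 rather than Service Broker or an XA coordinator. The `queue` and `ledger` tables and the `process_one` helper are assumptions invented for the example; the only thing it is meant to show is that the dequeue and the processing result commit or roll back together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE queue (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("CREATE TABLE ledger (msg_id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO queue (body) VALUES ('transfer 10')")
conn.commit()


def process_one(conn, fail=False):
    """Dequeue one message and record its effect in a single transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            row = conn.execute(
                "SELECT id, body FROM queue ORDER BY id LIMIT 1"
            ).fetchone()
            if row is None:
                return False  # nothing to do
            conn.execute("DELETE FROM queue WHERE id = ?", (row[0],))
            conn.execute("INSERT INTO ledger (msg_id, body) VALUES (?, ?)", row)
            if fail:
                raise RuntimeError("simulated crash before COMMIT")
        return True
    except RuntimeError:
        return False


crashed = process_one(conn, fail=True)   # rolled back: message is still queued
queued_after_crash = conn.execute("SELECT COUNT(*) FROM queue").fetchone()[0]
retried = process_one(conn)              # succeeds: message consumed exactly once
```

After the simulated crash the message is back in the queue and the ledger is untouched, so the retry is safe. Nothing here covers side effects that leave the database.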
The transaction boundary defined in your consumer covers the interactions of that consumer with other XA-aware nodes, which all participate in a "distributed transaction". So you can process this message N times without committing, and thus possibly tell other systems N times to perform side-effect M' of that message, but until you commit the world has not changed.
1. Receive from server, commit state = NEW, UNACK
2. Send ACK to server, get confirmation from server, commit state = NEW, ACK
3. Start processing, commit state = PROC
4. Got results, commit state = FIN, UNACK
5. Send FIN to server, commit state = FIN, ACK
Each commit is a database transaction where you write the results of that step along with the new state. If anything fails along the way the work-in-progress is discarded along with the state change. The server has an equivalent so if it gets a duplicate ACK for the same (or earlier) state it can ignore it.
In this example, if the client crashes between steps 1 and 2, never gets confirmation in step 2, or crashes trying to commit the "NEW, ACK" state, then it will retry. The server has already committed the fact that it sent the value to the client and is awaiting an ACK. If it saw the ACK and gets a duplicate, it ignores it. If it never saw the first ACK, then it will see the second(+) attempt and commit that it saw the ACK before sending confirmation to the client.
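The retry behaviour described above can be sketched roughly like this. Everything is in memory, so a "commit" is just a dictionary write, and the names (`Server`, `send_ack`, `RANK`) are invented for the example. One assumption not spelled out in the thread: the server re-sends its confirmation when it ignores a duplicate, so that a client whose confirmation was lost can still make progress.

```python
RANK = {"NEW": 1, "FIN": 2}  # order in which ACKs arrive for one message


class Server:
    def __init__(self):
        self.acked = {}    # msg_id -> latest state whose ACK was committed
        self.commits = 0   # number of state changes actually committed

    def on_ack(self, msg_id, state):
        """Commit a new ACK; ignore one for the same or an earlier state."""
        seen = RANK.get(self.acked.get(msg_id), 0)
        if RANK[state] > seen:
            self.acked[msg_id] = state   # stands in for a database commit
            self.commits += 1
        return True  # confirmation (assumed to be re-sent for duplicates)


def send_ack(server, msg_id, state, lost_confirmations):
    """Client side: retry the ACK until a confirmation actually arrives."""
    attempts = 0
    while True:
        attempts += 1
        confirmed = server.on_ack(msg_id, state)
        if lost_confirmations > 0:
            lost_confirmations -= 1  # simulate the confirmation being dropped
            continue
        if confirmed:
            return attempts


server = Server()
new_attempts = send_ack(server, 42, "NEW", lost_confirmations=2)
fin_attempts = send_ack(server, 42, "FIN", lost_confirmations=0)
server.on_ack(42, "NEW")  # a late duplicate of an earlier state is ignored
```

The client sends the NEW ACK three times, but the server commits it once; the stray NEW ACK arriving after FIN changes nothing.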
It's true that this doesn't work if your processing touches external systems or otherwise escapes the transaction context, but in those cases you do still get at-least-once delivery (or at-most-once, if you choose to commit the receipt before processing the message).
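A toy illustration of that trade-off, assuming a single message, one crash, and one redelivery. The `effects` list stands in for an external system outside the transaction, and `receipts` for the committed "I received this" record; both are made up for the sketch.

```python
def run(commit_first, crash_between_steps):
    """Return how many times the external side effect happened."""
    effects = []      # external system that cannot be rolled back
    receipts = set()  # committed receipts, used to skip redeliveries

    def attempt(msg_id, crash):
        if msg_id in receipts:
            return  # receipt already committed: redelivery is skipped
        steps = [lambda: receipts.add(msg_id), lambda: effects.append(msg_id)]
        if not commit_first:
            steps.reverse()  # process first, commit the receipt afterwards
        steps[0]()
        if crash:
            return  # consumer dies between the two steps
        steps[1]()

    attempt(1, crash_between_steps)
    attempt(1, False)  # the message is delivered again
    return len(effects)


at_most_once = run(commit_first=True, crash_between_steps=True)
at_least_once = run(commit_first=False, crash_between_steps=True)
no_crash = run(commit_first=False, crash_between_steps=False)
```

Committing the receipt first means a crash loses the side effect entirely (zero times); processing first means a crash repeats it (twice). Without a crash both orderings do it once.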
It really is a powerful technology, and when leveraged it can absolutely reduce the level of effort and cognitive burden of building correct asynchronous systems.
Interesting! You're saying the exact opposite of what Tyler Treat is saying:
> Even with smart middleware, problems still leak out and you have to handle them at the edge—you’re now being taxed twice. This is essentially the end-to-end argument. Push responsibility to the edges, smart endpoints, dumb pipes, etc. It’s the idea that if you need business-level guarantees, build them into the business layer because the infrastructure doesn’t care about them.
What do you think of his argument?