I believe what it really comes down to is that people new to distributed processing think they want "exactly once delivery", but later they (hopefully) learn they really want "exactly once processing". For example:
> We send him a series of text messages with turn-by-turn directions, but one of the messages is delivered twice! Our friend isn’t too happy when he finds himself in the bad part of town.
This is easily resolved by annotating each message with an ordinal indicator, e.g. "1. Head towards the McD", "2. Turn left at the traffic lights", etc. The receiver then dedupes, and the instructions are processed exactly once. Processing messages exactly once is a "business rule", and the responsibility for doing so lies in the business logic of the application.
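A minimal sketch of that idea, assuming messages arrive as (ordinal, text) pairs: the sender's at-least-once delivery may duplicate a message, but the receiver dedupes on the ordinal, so processing happens exactly once. The `make_processor` helper is hypothetical, not from any particular library.

```python
# Hypothetical sketch: exactly-once *processing* on top of
# at-least-once *delivery*, by deduplicating on an ordinal
# the sender attaches to each message.
def make_processor():
    seen = set()      # ordinals already processed
    followed = []     # instructions actually acted on

    def process(message):
        ordinal, text = message
        if ordinal in seen:
            return    # duplicate delivery: ignore it
        seen.add(ordinal)
        followed.append(text)

    return process, followed

process, followed = make_processor()
# The middleware delivers message 2 twice; it is still processed once.
for msg in [(1, "Head towards the McD"),
            (2, "Turn left at the traffic lights"),
            (2, "Turn left at the traffic lights")]:
    process(msg)
# followed == ["Head towards the McD", "Turn left at the traffic lights"]
```

Note the dedupe lives entirely in the consumer, which is the point: it's application logic, not middleware configuration.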
This example also brings up another typical point of confusion in building distributed systems: people actually want "ordered processing", not "ordered delivery". The physical receiving order does not matter: your friend will not attempt to follow instruction #2 without first following instruction #1. If instruction #2 is received first, your friend will wait for instruction #1.
It's also important to note that the desired processing order of the messages has nothing to do with the physical sending order: I could be receiving directions to two different places from two different people, and it doesn't matter what order they are sent or received, just that all the messages get to me! A great article covering these topics in more detail with other real-world examples is "Nobody Needs Reliable Messaging".
I think it is useful to try to understand why new distributed system builders run into these difficulties. I suspect they try to apply local procedure call semantics to distributed processing (a fool's errand), and message queue middleware works well enough that a naive "fire and forget" approach is the first strategy they attempt. When they subsequently lose their first message (hopefully before going into production), it's natural to think in terms of patching (e.g. distributed transactions, confirmation protocols, etc.) rather than to consider whether the overall design pattern is appropriate.
Oddly, there is at least one very well-designed solution that addresses these challenges - the Data Distribution Service (DDS) - but I almost never see or hear about it at any of my clients.
> This example also brings up another typical point of confusion in building distributed systems: people actually want "ordered processing", not "ordered delivery". The physical receiving order does not matter: your friend will not attempt to follow instruction #2 without first following instruction #1. If instruction #2 is received first, your friend will wait for instruction #1.
How do you handle that, practically speaking? In, say, any system that operates at a scale of thousands of messages per second. One concern I have with holding onto instruction 2 until instruction 1 arrives is: what if it never arrives? The system is blocked, isn't it?
Here's a real-world implementation discussion of this sort of stuff: https://github.com/couchbase/sync_gateway/issues/525
It's important to note that, no matter how you want to slice it, if instruction 1 never arrives, you somehow need a protocol to re-request it from upstream until you get it.
Whether that protocol lives in your middleware (e.g. an "at least once" message queue implementation that buffers messages and delivers them to your application in order) or in your consumer application logic (e.g. the application buffers messages and can send a request for a missing message over another channel), something is going to have to keep track of these things and buffer messages in the meantime.
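A rough sketch of the consumer-side half of that protocol, assuming ordinally numbered messages: buffer anything that arrives out of order, release messages only in order, and expose which ordinal is missing so some other channel can re-request it (the re-request transport itself is out of scope here). The `ReorderBuffer` class is hypothetical.

```python
# Hypothetical sketch: a consumer-side reorder buffer. Out-of-order
# messages are held back, delivered in order when the gap fills, and
# the missing ordinal is exposed so it can be re-requested upstream.
class ReorderBuffer:
    def __init__(self):
        self.expected = 1
        self.pending = {}  # ordinal -> message held back

    def receive(self, ordinal, message):
        """Return the messages now safe to process, in order."""
        self.pending[ordinal] = message
        ready = []
        while self.expected in self.pending:
            ready.append(self.pending.pop(self.expected))
            self.expected += 1
        return ready

    def missing(self):
        """Ordinal to re-request upstream, or None if nothing is held back."""
        return self.expected if self.pending else None

buf = ReorderBuffer()
assert buf.receive(2, "turn left") == []  # held back: 1 hasn't arrived
assert buf.missing() == 1                 # what to re-request upstream
assert buf.receive(1, "head north") == ["head north", "turn left"]
```

In practice you would also put a timeout or retry budget on `missing()` so the system doesn't block forever, which is exactly the concern raised above.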
This is why message queue middleware appears so convenient -- your application never has to see any of this complexity. I say appears because there are at least three things a message queue can't do:
(a) It can't understand your application's message processing order (which is NOT the same as your physical sending order): the middleware and the application determine whether a message is missing at fundamentally different levels of abstraction. The middleware can, at best, use e.g. sequence numbers or checkpoints to detect a gap. The application has higher-order business knowledge about what the messages mean, so it can determine whether a message is missing much more intelligently. The application can also happily process other valid sequences of instructions without waiting for all preceding but possibly unrelated messages (e.g. it can accept an order for client A while still waiting for client B's full order to come through).
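To make (a) concrete, here is a hypothetical sketch of ordering keyed on business knowledge: each order line carries the client it belongs to and how many lines the full order has, so client A's complete order is processed without waiting on client B's stragglers. The `OrderAssembler` class and its message shape are assumptions for illustration.

```python
# Hypothetical sketch: application-level ordering keyed on a business
# identifier (the client), so one client's complete order is processed
# without waiting for unrelated messages from other clients.
from collections import defaultdict

class OrderAssembler:
    def __init__(self):
        self.lines = defaultdict(dict)  # client -> {line_no: item}

    def receive(self, client, line_no, total_lines, item):
        """Return the full order if this message completes it, else None."""
        self.lines[client][line_no] = item
        if len(self.lines[client]) == total_lines:
            order = self.lines.pop(client)
            return [order[n] for n in sorted(order)]
        return None

a = OrderAssembler()
assert a.receive("B", 1, 2, "widgets") is None        # B still incomplete
assert a.receive("A", 1, 1, "gadgets") == ["gadgets"] # A processed anyway
assert a.receive("B", 2, 2, "sprockets") == ["widgets", "sprockets"]
```

A generic queue can't do this, because "which client does this line belong to" is exactly the business knowledge the middleware doesn't have.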
(b) Message queues DO drop messages, so again, without a higher-level protocol across the transaction sequence to manage this, you will have a problem if you require all messages to be processed.
(c) You can't easily replay an event, e.g. because your consumer stuffed up processing. It's true that many message queuing implementations can keep a duplicate journal and allow you to replay from there, but that journal is probably an opaque data store. You can probably inspect the messages, but ultimately you will end up writing application-specific logic to figure out which message to replay, whether the copy of the message is retrieved from the message queue system or from upstream.
At that point, the end-to-end principle applies and you start to wonder if all that intelligence in message queues is a waste. I'm personally looking at moving over to using DDS for data distribution and RPC over DDS to do replay and reconciliation between publisher and consumer.