
High-performance, exactly-once, failure-oblivious distributed programming (2018) - cmeiklejohn
http://christophermeiklejohn.com/pl/2018/12/15/ambrosia.html
======
farazbabar
Exactly once processing is not possible in distributed systems. Anyone that
tries to sell that snake oil is dishonest and anyone who buys it should not be
making purchasing decisions. The definition and requirement of idempotent
processing means systems must be able to handle messages delivered more than
once, which irrefutably proves there is no such thing as exactly once.

Even within centralized, monolithic systems with no outside interaction, laws
of physics and reality still apply - a pull on the cord, an earthquake, a
flood or a myriad of other things may interrupt message processing, resulting
in a message being processed zero or more times, even if the message were
being processed within the confines of an embedded micro-controller using
hand-crafted assembler code.

~~~
zzzcpan
There is no literal exactly once of course, because it's not physically
possible, but "exactly-once" semantics are possible in distributed systems.
Data can be resynchronized, processes can be restarted with the side effects
removed, etc.

~~~
lostcolony
"Data can be resynchronized" - yeah, that means you send it again. Not
exactly once.

"Exactly once semantics" is semantics. It's at least once with idempotency,
which may or may not be able to be guaranteed on the part of the system
depending on actual implementation details which the marketing fluff will
invariably leave out if they're saying "exactly once". And that's a major
problem when relying on such 'semantics'.

~~~
naasking
> "Data can be resynchronized" - yeah, that means you send it again. Not
> exactly once.

"Send it again" doesn't mean "processed again". Isn't exactly-once built on
at-least-once just binding the result to a future? Then any subsequent
accesses or attempts to update will simply return the bound result, which was
processed exactly once.
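
A minimal sketch of that idea, assuming an in-memory result table keyed by message id (the class name `OnceProcessor` and its methods are hypothetical, not taken from any system discussed here). Redelivery is allowed; the handler runs only for the first delivery and later deliveries get the stored result:

```python
from typing import Callable, Dict


class OnceProcessor:
    """Runs the handler once per message id and remembers the result.

    Hypothetical illustration: the in-memory dict stands in for whatever
    durable store a real system would need.
    """

    def __init__(self, handler: Callable[[str], str]) -> None:
        self._handler = handler
        self._results: Dict[str, str] = {}  # message id -> bound result
        self.calls = 0  # how many times the handler actually ran

    def deliver(self, msg_id: str, payload: str) -> str:
        # A redelivered message returns the already-bound result.
        if msg_id in self._results:
            return self._results[msg_id]
        self.calls += 1
        result = self._handler(payload)
        self._results[msg_id] = result
        return result


processor = OnceProcessor(str.upper)
first = processor.deliver("m1", "hello")
second = processor.deliver("m1", "hello")  # at-least-once redelivery
```

Note that this version keeps everything in memory, so a crash between running the handler and recording the result would lose the binding.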

~~~
lostcolony
Which is what you -do- to handle 'at least once' delivery. The system can try
and hide that complexity from you, but there are still tradeoffs in any
implementation. How long in between does that guarantee hold? Does it
guarantee that if you fire the same message a year from now it will still
recall that it's a dupe (i.e., persist all message identifiers for an infinite
amount of time)? Probably not. Does that matter to you? Maybe!

Even if it does persist, does it persist the identifier on receipt of the
message, or on sending it to you for processing? If the former you run the
risk of crash and never having handled it; you really have no guarantee of
delivery to your processor. If the latter, what happens if the receiver
crashes before it hands it off to be processed? If simultaneous, is
'processing' atomic? Probably not; what happens if you crash midway through
processing the thing? Etc.

That's my point; you need more details to make the system robust. You don't
get "exactly once delivery" out of the box; you get a system that attempts it
by deduplicating, but there be gremlins, and the fact you're not saying "it's
at least once delivery with (details)" means I'm not hearing a technical
pitch, but a marketing one.
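
To make the retention question concrete, here is a rough sketch of a deduplicator that only remembers message ids for a bounded window. The class `WindowedDeduplicator`, the 60-second retention, and the explicit `now` parameter are all assumptions chosen for illustration:

```python
from typing import Dict


class WindowedDeduplicator:
    """Remembers message ids for a limited time only (hypothetical sketch)."""

    def __init__(self, retention_seconds: float) -> None:
        self.retention = retention_seconds
        self._seen: Dict[str, float] = {}  # message id -> time first seen

    def is_duplicate(self, msg_id: str, now: float) -> bool:
        # Forget ids that have aged out of the retention window.
        self._seen = {
            m: t for m, t in self._seen.items() if now - t < self.retention
        }
        if msg_id in self._seen:
            return True
        self._seen[msg_id] = now
        return False


dedup = WindowedDeduplicator(retention_seconds=60.0)
results = [
    dedup.is_duplicate("m1", now=0.0),    # first sighting
    dedup.is_duplicate("m1", now=30.0),   # inside the window: caught
    dedup.is_duplicate("m1", now=120.0),  # window expired: treated as new
]
```

The third delivery is not recognized as a duplicate, which is the kind of detail a bare "exactly once" label leaves out.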

~~~
naasking
Futures have well-defined semantics as logic variables. The only question of
actual interest that you raise is the lifetime. This is dictated either by the
system or the dependent objects, although obviously "unbounded lifetime"
handles all possible cases. So lifetime is not "undefined" but "contextual".

------
cfontes
Kafka exactly-once semantics addresses the main issue of the article I think.

It's now relatively simple for a developer to implement a system with an
exactly-once guarantee as long as you take care of the world that is not
inside a Kafka transaction (integrations with third parties and such), which
is still not super easy sometimes, but less so than the distributed
transaction that will happen inside Kafka.

Kafka hides the complexity really well; in my use of it so far, it has been
very reliable with the "new" semantics.

~~~
pdpi
I'm getting mighty tired of the "exactly-once" thing.

Everybody and their uncles seem to have picked up on this trend of advertising
at-least-once systems as exactly-once, then burying somewhere in the docs that
you're expected to guarantee idempotency yourself to get the appearance of
exactly-once. That was the state of the art decades ago, it's the state
of the art now, and it's pretty damn dishonest to sell quality-of-life
improvements as a fundamental shift in the guarantees/properties of these
systems.

~~~
dualogy
> at-least-once systems with idempotency to get the appearance of exactly-once

What baffles me even more is why the above is apparently not generally
considered good-enough, elegant-enough --- and as a bonus, not violating the
laws of physics either? Both sides of the coin are quite tameable and
implementable. And together deliver what was wanted in the first place, and
effectively. Curious about any subtle edge-cases I might have missed here!

~~~
pdpi
The problem is that there is an audience for whom exactly-once sounds like it
makes their non-specialist lives much simpler compared to an at-least-once
system because they can offload the necessary distributed systems expertise to
somebody else.

People insist on this messaging precisely because that crowd is somewhat
vulnerable to this sort of shenanigans.

------
hosh
I've been recently reading the papers coming out of the Berkeley Disorderly Lab
-- Bloom(L) languages, lattices, composing eventually-consistent,
coordination-free systems. It's interesting to read this article with that
lens. There are some properties that are similar, but this one looks like it
is designed to let people continue programming the way they are at the cost of
increased coordination with other systems.

The idea of a replayable log seems to be able to convert a disordered sequence
of events into something that is ordered. Whereas, the Bloom(L) stuff
constructs algorithms that only require partial order. An event stream can be
disordered because the functions being used are monotonic, and the
compositions of the data structure use operators that are commutative,
associative, and idempotent. (Thus, there is no requirement for exactly-once
guarantee, or an ordered event stream).
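
A small sketch of why those three properties remove the need for ordering or exactly-once delivery, using set union as the merge operator (a grow-only set). This is plain Python for illustration, not Bloom code, and the event values are made up:

```python
from functools import reduce
from itertools import permutations


def merge(a: frozenset, b: frozenset) -> frozenset:
    # Set union is commutative, associative, and idempotent.
    return a | b


# The same update ("x") arrives twice, simulating at-least-once delivery.
events = [frozenset({"x"}), frozenset({"y"}), frozenset({"x"}), frozenset({"z"})]

# Apply the events in every possible arrival order and collect the end states.
outcomes = {reduce(merge, order, frozenset()) for order in permutations(events)}
```

Every arrival order, duplicates included, converges to the same final set, so `outcomes` holds a single state.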

------
kerblang
> Many cloud service designs today rely on durable queues, such as Event Hub
> or Kafka.

AFAIK nowhere in the Kafka documentation does it use the term "queue", and
unless you only have one consumer per consumer group it's impossible to
guarantee FIFO behavior. Maybe call me a nitpicker but I've seen this "queue"
language lead to completely wrong assumptions about Kafka.
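
As a rough simulation of the point, assuming records are routed to partitions by a hash of their key and each partition is an append-only log (the CRC32 hash, the partition count, and the sample records are stand-ins for illustration, not Kafka's actual partitioner):

```python
import zlib
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple

NUM_PARTITIONS = 3  # assumed partition count for the illustration


def partition_for(key: str) -> int:
    # Stand-in hash; Kafka's default partitioner uses a different hash function.
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


# Records in the order a producer sends them: (key, sequence number).
sent: List[Tuple[str, int]] = [
    ("a", 1), ("b", 1), ("a", 2), ("c", 1), ("b", 2), ("a", 3),
]

# Each partition is its own append-only log.
partitions: DefaultDict[int, List[Tuple[str, int]]] = defaultdict(list)
for key, seq in sent:
    partitions[partition_for(key)].append((key, seq))


def per_key_order(log: List[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Sequence numbers for each key, in the order they sit in the log.
    order: DefaultDict[str, List[int]] = defaultdict(list)
    for key, seq in log:
        order[key].append(seq)
    return dict(order)


orders = {p: per_key_order(log) for p, log in partitions.items()}
```

Within each partition the per-key order matches the send order, but nothing here fixes the relative order of records that land in different partitions, which is where single-queue FIFO assumptions go wrong.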

~~~
frankmcsherry
I googled "kafka documention", went to
[https://kafka.apache.org/documentation/](https://kafka.apache.org/documentation/),
searched for "queue" and found 29 matches.

