Hacker News new | past | comments | ask | show | jobs | submit login

"Exactly once" model of message is theoretically impossible to do in distributed environment with nonzero possibility of failure. If you haven't received acknowledgement from the other side of communication in the specified amount of time you can only do one of two things:

1) do nothing, risking message loss

2) retransmit, risking duplication

But of course that's only from messaging system point of view. Deduplication at receiver end can help reduce problem, but itself can fail (there is no foolproof way of implementing that pseudocode's "has_seen(message.id)" method)

I agree. I wish more messaging platforms would recognize this and stop trying to paper-over the failure mode (Kafka just came out with "Exactly Once," which I think is a variant of the 2-Phase-Commit protocol, which still does not solve the problem).

My go-to for explaining to people is the Two Generals Problem https://en.wikipedia.org/wiki/Two_Generals%27_Problem

After watching the video from the Kafka Summit talk https://www.confluent.io/kafka-summit-nyc17/introducing-exac...

it's an essentially-useless guarantee for any sort of Kafka consumer that interacts with an external system.

Exactly-Once is not something that can be provided-for or delegated-to an arbitrary, non-integrated system. For an "integrated" system, it requires idempotent behavior, in which case it's really At-Least-Once, so...

I think "exactly once" doesn't imply "non-reprocessing" (sorry for the double negative).

Meaning, you want "exactly once" and you don't want duplicates, yes. But you allow for reprocessing, provided that you have a way for deduplicating.

You want a guarantee that if the producer (at the top of your data processing pipeline) sends a message, then this message eventually corresponds to exactly 1 record in your final storage(s).

One easy-to-understand-yet-simplistic example is: send a message to kafka, use topic+partition+offset as primary key, store in a RDBMS. This is widely accepted as "exactly once", but clearly you may have multiple attempts to save the message into the db, which will fail due to the primary key integrity constrain.

So basically what's called "at least once with deduplicating". The parent comment addresses that.

> there is no foolproof way of implementing that pseudocode's "has_seen(message.id)" method

Wait why? Just because you'd have to store the list of seen messages theoretically indefinitely?

There's also a race condition in there when you receive the duplicate before publish_and_commit is done doing its thing - assuming they're not actually serializing all messages through a single thread like the pseudocode implies.

What they've done is shift the point of failure from something less reliable (client's network) to something more reliable (their rocksdb approach) - reducing duplicates but not guaranteeing exactly once processing.

its not so much that they are serializing all messages through a single thread, but that they are consistently routing messages (duplicates and all) into separate shards that are processed by a single thread.

sure, if you assume that everything can fail, then it doesnt help to store the list of messages you've seen

but if you can persist a monotonic sequence number, thats gets you pretty far. we use tcp all the time even though its has no magic answer to distributed consensus (and uses a super-weak checksum). 2pc doesnt guarantee progress and/or consistency either and its pretty effective.

> nonzero

If you send the message with a hash of the previous state on the server, (like with proof-of-work in Bitcoin), since it is so unlikely that it will hash will be the same with and without the message appended, it doesn't really matter if it is strictly nonzero, if it is just small enough.

The nonzero part is about the possibility of network / endpoint failure. No hash is going to prevent the cleaning lady from 'freeing up' the power socket for her vacuum.

You could resubmit the transaction until you get an ack. You wouldn't submit twice since the hash of the state with the added item is very likely to differ.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact