1) do nothing, risking message loss
2) retransmit, risking duplication
But of course that's only from the messaging system's point of view. Deduplication at the receiver end can help reduce the problem, but it can itself fail (there is no foolproof way of implementing that pseudocode's "has_seen(message.id)" method).
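To make one such failure concrete, here is a minimal sketch of a receiver whose seen-set is bounded, so it eventually forgets old ids. Only has_seen comes from the pseudocode mentioned above; BoundedSeenSet, mark_seen, deliver and the capacity parameter are hypothetical names invented for illustration, and a bounded window is just one of several ways deduplication can break down.

```python
from collections import OrderedDict


class BoundedSeenSet:
    """Remembers only the most recent `capacity` message ids (hypothetical helper)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._ids = OrderedDict()

    def has_seen(self, message_id):
        return message_id in self._ids

    def mark_seen(self, message_id):
        self._ids[message_id] = True
        self._ids.move_to_end(message_id)
        # Evict the oldest ids once the window is full.
        while len(self._ids) > self.capacity:
            self._ids.popitem(last=False)


def deliver(seen, message_id, sink):
    """Append message_id to sink unless it is still remembered as seen."""
    if seen.has_seen(message_id):
        return False
    sink.append(message_id)
    # A crash between the append above and the mark below would also
    # let a retransmission through as a duplicate.
    seen.mark_seen(message_id)
    return True
```

With a capacity of 2, delivering m1, m2, m3 and then a late retransmission of m1 processes m1 twice, because its id was evicted before the retry arrived.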
My go-to for explaining to people is the Two Generals Problem https://en.wikipedia.org/wiki/Two_Generals%27_Problem
it's an essentially-useless guarantee for any sort of Kafka consumer that interacts with an external system.
Exactly-Once is not something that can be provided for, or delegated to, an arbitrary, non-integrated system. For an "integrated" system, it requires idempotent behavior, in which case it's really At-Least-Once, so...
Meaning, you want "exactly once" and you don't want duplicates, yes. But you allow for reprocessing, provided that you have a way to deduplicate.
You want a guarantee that if the producer (at the top of your data processing pipeline) sends a message, then this message eventually corresponds to exactly 1 record in your final storage(s).
One easy-to-understand-yet-simplistic example is: send a message to Kafka, use topic+partition+offset as the primary key, and store it in an RDBMS. This is widely accepted as "exactly once", but clearly you may have multiple attempts to save the same message into the db, and every attempt after the first will fail due to the primary key integrity constraint.
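A minimal sketch of that idea, using Python's built-in sqlite3 as a stand-in for the RDBMS. The table name, column names, and the open_store / save_once helpers are assumptions made up for illustration; no real Kafka client is involved, the (topic, partition, offset) triple is simply passed in by the caller.

```python
import sqlite3


def open_store(path=":memory:"):
    """Create the (hypothetical) records table keyed by topic+partition+offset."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records ("
        " topic TEXT NOT NULL,"
        " kafka_partition INTEGER NOT NULL,"
        " kafka_offset INTEGER NOT NULL,"
        " payload TEXT NOT NULL,"
        " PRIMARY KEY (topic, kafka_partition, kafka_offset))"
    )
    return conn


def save_once(conn, topic, partition, offset, payload):
    """Return True if the row was stored, False if this was a redelivery."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO records VALUES (?, ?, ?, ?)",
                (topic, partition, offset, payload),
            )
        return True
    except sqlite3.IntegrityError:
        # The primary key already exists, so this message was saved before.
        return False
```

The consumer is still at-least-once: it may call save_once several times for the same offset. The constraint is what turns those repeated attempts into a single stored record.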
Wait, why? Just because you'd have to store the list of seen messages theoretically indefinitely?
What they've done is shift the point of failure from something less reliable (the client's network) to something more reliable (their RocksDB approach) - reducing duplicates but not guaranteeing exactly-once processing.
but if you can persist a monotonic sequence number, that gets you pretty far. we use tcp all the time even though it has no magic answer to distributed consensus (and uses a super-weak checksum). 2pc doesn't guarantee progress and/or consistency either and it's pretty effective.
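A rough sketch of the sequence-number idea, assuming each sender numbers its messages 1, 2, 3, ... and the receiver remembers the highest number it has applied per sender. SequencedReceiver and its fields are hypothetical names; the in-memory dict stands in for storage that would have to be persisted atomically with the effect of the message for this to hold across crashes.

```python
class SequencedReceiver:
    """Applies each sender's messages in order, at most once (illustrative only)."""

    def __init__(self):
        # sender id -> highest sequence number applied so far.
        # Assumption: in a real system this is durable, not a plain dict.
        self.last_seq = {}
        self.applied = []

    def receive(self, sender, seq, payload):
        last = self.last_seq.get(sender, 0)
        if seq <= last:
            return "duplicate"  # retransmission of something already applied
        if seq != last + 1:
            return "gap"        # an earlier message is missing; wait for it
        self.applied.append(payload)
        self.last_seq[sender] = seq
        return "applied"
```

Unlike a set of seen ids, the receiver only stores one integer per sender, which is roughly what tcp does with its sequence numbers.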
If you send the message with a hash of the previous state on the server (like with proof-of-work in Bitcoin), then since it is so unlikely that the hash will be the same with and without the message appended, it doesn't really matter that the probability of a collision is strictly nonzero, as long as it is small enough.
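One way to read that suggestion, as a sketch: the server keeps a running hash of everything it has accepted, and a message is only accepted if it names the current hash. HashChainedLog, head and append are hypothetical names, and SHA-256 is an arbitrary choice made here, not something the comment specifies.

```python
import hashlib


class HashChainedLog:
    """Accepts a message only if the sender names the current state hash."""

    def __init__(self):
        self.entries = []
        self.head = hashlib.sha256(b"").hexdigest()

    def append(self, prev_head, message):
        # A retransmission carries the old head, which no longer matches
        # once the first copy has been appended.
        if prev_head != self.head:
            return False
        self.entries.append(message)
        self.head = hashlib.sha256(self.head.encode() + message).hexdigest()
        return True
```

A blind retry of the same (prev_head, message) pair is rejected, while a sender that genuinely wants to send the same payload again has to fetch the new head first. The rejection only fails if the hash happens to be unchanged after appending, which is the small-but-nonzero probability mentioned above.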