There's also a race condition in there if a duplicate arrives before publish_and_commit has finished -
assuming they're not actually serializing all messages through a single thread, as the pseudocode implies.
What they've done is shift the point of failure from something less reliable (client's network) to something more reliable (their rocksdb approach) - reducing duplicates but not guaranteeing exactly once processing.
It's not so much that they're serializing all messages through a single thread; rather, they're consistently routing messages (duplicates included) into separate shards, each of which is processed by a single thread.
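To make that routing concrete, here's a minimal sketch of the idea: hash each message key to pick a shard, and drain each shard's queue with exactly one worker thread, so a duplicate can never race a concurrent worker on another shard. All names here are illustrative, and the per-shard `set` is a stand-in for their RocksDB dedup store, not their actual implementation.

```python
import hashlib
import queue
import threading

NUM_SHARDS = 4

# One queue per shard; each is drained by exactly one worker thread,
# so messages sharing a key are always processed serially.
shards = [queue.Queue() for _ in range(NUM_SHARDS)]
seen = [set() for _ in range(NUM_SHARDS)]  # per-shard dedup store (stand-in for RocksDB)
results = []
results_lock = threading.Lock()

def shard_for(key: str) -> int:
    # Consistent routing: the same key always maps to the same shard,
    # so a duplicate lands behind the original in the same queue.
    return int(hashlib.sha256(key.encode()).hexdigest(), 16) % NUM_SHARDS

def worker(i: int) -> None:
    while True:
        msg = shards[i].get()
        if msg is None:          # shutdown sentinel
            break
        key, payload = msg
        if key in seen[i]:       # duplicate: drop without reprocessing
            continue
        seen[i].add(key)         # mark processed (the crash window lives here)
        with results_lock:
            results.append(payload)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_SHARDS)]
for t in threads:
    t.start()

# "a" is sent twice; both copies route to the same shard and the second is dropped.
for key, payload in [("a", 1), ("b", 2), ("a", 1), ("c", 3)]:
    shards[shard_for(key)].put((key, payload))

for q in shards:
    q.put(None)
for t in threads:
    t.join()

print(sorted(results))  # → [1, 2, 3]
```

Note this still only narrows the window rather than closing it: if a worker crashes between marking the key seen and doing the real side effect (or vice versa), you get a loss or a duplicate, which is the failure-point-shifting the comments above describe.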