Hacker News new | past | comments | ask | show | jobs | submit login

Everybody knows "exactly once" means deduplication. This is not exactly a new problem.

That said it's still a difficult problem and I actually wish people would stop trying to roll their own schemes. For example, this scheme relies on examining a Kafka outbound topic to resolve in-doubt outbound messages. But what happens if the outbound message + commit is still "in flight" when the system recovers so the system retransmits the in-doubt outbound message and so rather than deduping it now is generating dupes? Yes, the chances of this are minimal which means it will happen.




Everyone might know that, but it's certainly the case that a lot of systems have _claimed_ it where they actually meant "disastrous characteristics under load and/or (partial) failure".


The protocol is in fact quite simple:

1. Sender sends message.

2. If no acknowledgement from recipient is returned to sender, resend message until an acknowledgement is received from recipient.

3. If recipient receives message, check store to see if it's been received before.

4. If it's not in the store, store it, acknowledge receipt to sender, and process it.

5. If it's already in the recipient's store, acknowledge receipt to sender, and discard message.


What if the "acknowledge receipt to sender" message gets lost?


Server will keep trying to send the message. Each time the client responds with receipt. Eventually communication succeeds and protocol is done. The major drawback is requirement of persistent storage on the client.


That's not a drawback - it's an advantage. It enables store-and-forward, i.e. on- and offline use. The client remains functional without a connection to the server.




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: