Exactly-Once Messaging in Kafka

mjb · on Jan 13, 2015

I'm the same 'mjb' from the linked lobste.rs thread. If anybody is interested I expanded that comment into a blog post here: http://brooker.co.za/blog/2014/11/15/exactly-once.html

The core problem here is making the act of writing down that you've done something atomic with doing the actual thing. If you can solve that problem, exactly-once processing is easy. If you can ignore that problem, for example because doing the thing is idempotent, exactly-once processing is easy. In the real world, however, it can be really difficult to solve these problems in general and very specific (and often 'incorrect') solutions are used. OP's post talks about a bunch of those, which is very useful.

Very often, though, real-world systems settle for 'at least once' or 'at most once' and find out-of-band ways to handle the missing or duplicate messages. Whether this is practical or not depends on the message rate, and the cost of getting it wrong.

bkirwi · on Jan 13, 2015

Hey, thanks for the excellent writeup.

I enjoyed and totally endorse the content of your blog post, but I did find the title / intro a bit misleading. Exactly-once is pretty much always what people want: it's the easiest model for most folks to reason about, especially when they're new to the whole distributed-systems thing. (If exactly-once delivery was easy, you certainly wouldn't find people going out of their way to build systems that dropped or duplicated messages.) Of course, you might not need exactly-once delivery to get what you want -- which is good, because like you say, that's a hard / impossible problem to solve in general.

It feels to me like people in the stream processing universe have taken that message a bit too much to heart -- things like the 'lambda architecture' treat stream processing as "too hard to get right" and relegate it to approximate, disposable calculations only. My post was partly an effort to push back on this; as a community, I don't think we should give up without a fight.

jayp · on Jan 14, 2015

Completely tangential: can you please invite me to lobste.rs?

fintler · on Jan 13, 2015

Although indirectly related, the comments in https://issues.apache.org/jira/browse/SAMZA-390 are also interesting reading (hi Ben!).

The Kafka/Samza ecosystem is coming alive with tons of neat ways to interacting with streaming data. For example, take a look at https://github.com/milinda/Freshet

bkirwi · on Jan 13, 2015

Oh hi!

The Samza ecosystem seems to be a magnet for this sort of thing these days -- the community's really committed to taking full advantage of Kafka, and not just papering over it so it looks just like any other queuing system. Really excited to see where all this will lead.

Terr_ · on Jan 14, 2015

Speaking as a bystander who mainly reads blogs, my perception of Kafka is that it's marketing (for lack of a better term) has successfully positioned it with "we can keep your arbitrarily large logs" and "they are pretty durable, consumers can revisit them if they fall behind".

This offers a nice distinction from MQ-ish systems, where the emphasis seems to be on a different set of benefits like "we can handle complex centralized distribution logic for you" and "we help manage your synchronous calls".

Does that seem accurate in terms of how Kafka is evolving its future niche? Personally, I'm interested in optimistic-locking when adding to a log.

bkirwi · on Jan 14, 2015

Yes, Kafka certainly is good at those things, and I suspect Kafka will only get better at them. But it's actually quite a simple bit of infrastructure at heart[0] and this means it's useful for a large variety of other things, many of which we're only figuring out now.

Re: optimistic locking... there are a couple proposals for this floating around, and I'm not sure which / if one will get in. It certainly seems consistent with the general mission, though.

[0] http://engineering.linkedin.com/distributed-systems/log-what...

brandtg · on Jan 14, 2015

Some caveats about Kafka replication are discussed here: https://aphyr.com/posts/293-call-me-maybe-kafka.

It is important to include these in the "exactly-once" discussion... (hard to be exactly-once if messages are lost broker-side).

Also, great discussion from Coda Hale about "CA" systems here: http://codahale.com/you-cant-sacrifice-partition-tolerance/