Does this just mean (in TL;DR form) that Kafka producers now generate an ID for each message they send, and the broker deduplicates instead of requiring deduplication by the end consumer?
That is part of it, but there is also a general-purpose transaction feature that lets you tie together updates, state journalling, and consumption in a single transaction. This is what enables correct stream processing on top of Kafka, and it is arguably the more technically sophisticated aspect.
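For anyone curious what that looks like in practice, here is a minimal sketch of a consume-transform-produce loop using the Java client's transactional API (broker address, topic names, ids, and the transform step are placeholders; exact method overloads vary a bit across client versions):

```java
import org.apache.kafka.clients.consumer.*;
import org.apache.kafka.clients.producer.*;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

import java.time.Duration;
import java.util.*;

public class TransformLoop {
    public static void main(String[] args) {
        Properties pp = new Properties();
        pp.put("bootstrap.servers", "localhost:9092");
        pp.put("transactional.id", "transform-loop-1"); // stable id so zombie producers get fenced
        pp.put("key.serializer", StringSerializer.class.getName());
        pp.put("value.serializer", StringSerializer.class.getName());
        KafkaProducer<String, String> producer = new KafkaProducer<>(pp);
        producer.initTransactions();

        Properties cp = new Properties();
        cp.put("bootstrap.servers", "localhost:9092");
        cp.put("group.id", "transform-loop");
        cp.put("enable.auto.commit", "false");       // offsets are committed inside the transaction
        cp.put("isolation.level", "read_committed"); // downstream readers never see aborted data
        cp.put("key.deserializer", StringDeserializer.class.getName());
        cp.put("value.deserializer", StringDeserializer.class.getName());
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cp);
        consumer.subscribe(Collections.singletonList("input"));

        while (true) {
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
            if (records.isEmpty()) continue;
            producer.beginTransaction();
            try {
                Map<TopicPartition, OffsetAndMetadata> offsets = new HashMap<>();
                for (ConsumerRecord<String, String> r : records) {
                    producer.send(new ProducerRecord<>("output", r.key(), r.value().toUpperCase()));
                    offsets.put(new TopicPartition(r.topic(), r.partition()),
                                new OffsetAndMetadata(r.offset() + 1));
                }
                // The consumed offsets ride in the same transaction as the outputs.
                producer.sendOffsetsToTransaction(offsets, consumer.groupMetadata());
                producer.commitTransaction();
            } catch (Exception e) {
                producer.abortTransaction(); // outputs and offset commits roll back together
                // a real app would also rewind the consumer to the last committed offsets here
            }
        }
    }
}
```

The point is that the output records and the offset commit either both become visible or neither does; a crash between poll() and commitTransaction() reprocesses the batch without duplicating output.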
This comment in the TechCrunch article is misleading, and you may want to clarify it. I agree that this is a nice feature for Kafka Streams or for applications with a consume-transform-produce loop. As a user/dev, I like Kafka because of the design choices that keep things simple and effective for users.
“It’s kind of insane. It’s been an open problem for so many years and Kafka has solved it — but how do we know it actually works?” she asked, echoing the doubts of the community.
" How does this feature work? Under the covers it works in a way similar to TCP; each batch of messages sent to Kafka will contain a sequence number which the broker will use to dedupe any duplicate send. "
This is basically it. I'm always amazed at how much reimplementation of TCP we see at a high level in distributed systems. Backpressure, message ordering, retries, etc. all work pretty well in TCP.
Yes, the idempotence part of this feature set is very similar to TCP (the transactional consumption and updates obviously aren't). But this isn't a reimplementation at all: TCP provides deduplication only within the context of a connection tied to a process. If that connection is lost or the process dies, duplicates may occur. The feature in Kafka is much stronger, as the "connection" is persistent and replicated with the log, so the "connection" effectively fails over if the server dies.
These extra layers of abstraction are probably needed for some use cases on specialized, higher-performance networks. Otherwise, simply using TCP would be enough.
But indeed, it is interesting to watch computing's cycle of reimplementation spin over and over again.
Re: TCP, it only maintains sequencing guarantees over the lifetime of a single connection. That is obviously too weak a guarantee for Kafka, since leaders can change in a cluster. We've built idempotence in a way that makes the sequence number and producer ID part of the Kafka log itself, so it can provide idempotence even when brokers fail and new connections are established between the producer and the broker.
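To make the failover point concrete, here is a toy model (my own sketch, not Kafka's actual broker code, which also enforces strict sequence ordering) of why storing the producer ID and sequence number in the replicated log lets deduplication survive a leader change:

```java
import java.util.*;

public class DedupLogSketch {
    // Each log entry carries the producer id and sequence, so the dedup
    // state travels with the replicated log rather than with a TCP socket.
    record Entry(long producerId, int sequence, String payload) {}

    final List<Entry> log = new ArrayList<>();          // stands in for the replicated log
    final Map<Long, Integer> lastSeq = new HashMap<>(); // in-memory dedup state per producer

    /** Append unless this sequence from this producer was already written. */
    boolean append(long producerId, int sequence, String payload) {
        Integer last = lastSeq.get(producerId);
        if (last != null && sequence <= last) return false; // duplicate retry: drop it
        log.add(new Entry(producerId, sequence, payload));
        lastSeq.put(producerId, sequence);
        return true;
    }

    /** A newly elected leader rebuilds dedup state by replaying the log. */
    static DedupLogSketch failover(List<Entry> replicatedLog) {
        DedupLogSketch next = new DedupLogSketch();
        for (Entry e : replicatedLog) next.append(e.producerId(), e.sequence(), e.payload());
        return next;
    }

    public static void main(String[] args) {
        DedupLogSketch leader = new DedupLogSketch();
        leader.append(42L, 0, "a");
        leader.append(42L, 1, "b");
        System.out.println(leader.append(42L, 1, "b"));    // false: duplicate dropped
        DedupLogSketch newLeader = failover(leader.log);   // the "connection" fails over
        System.out.println(newLeader.append(42L, 1, "b")); // still false on the new leader
    }
}
```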