
Twitter’s Kafka adoption story - sciurus
https://blog.twitter.com/engineering/en_us/topics/insights/2018/twitters-kafka-adoption-story.html
======
asdfasafasdf
Twitter loves migrations.

"We will continuously keep an eye on different messaging and streaming systems
in the ecosystem and ensure that our team is making the right decision for our
customers and Twitter, even if it’s a hard decision"

Rather than setting industry direction, they are inclined to change the
underlying platform every few years, which brings a lot of challenges. Maybe
it creates work for platform teams, but it's not useful to do this often.

~~~
capkutay
I'm curious if there's some conflicting, underlying motives from their senior
level technology folks.

They had a game-changing tool in Apache Storm... once Nathan Marz (creator of
Storm) left Twitter, another senior engineer took over that group and created
his OWN variant of Twitter Storm called Heron. After much engineering PR and
GitHub fanfare [0], a couple of years later he left to start his own company
to do 'enterprise Heron' [1].

So is the goal here to use something that works well for their end-users? Or
is it to create a shiny new tool that they can spin off into a new company?

0:
[https://news.ycombinator.com/item?id=11770758](https://news.ycombinator.com/item?id=11770758)

1:
[https://architecht.io/streamlio-founders-on-why-the-world-ne...](https://architecht.io/streamlio-founders-on-why-the-world-needs-a-new-streaming-data-platform-bdbb43c3f886?gi=ec98fbc31a6f)

~~~
fnbr
The joke that I've heard is "performance (review) driven development".

If the way to get promoted is to launch a shiny new product, then your most
senior people will be the best at finding shiny new products to launch, even
if that's not the right technical decision to make.

~~~
Benjammer
"Oh but our senior engineers would _never_ make such a selfish choice to make
themselves look better at the expense of the company, everyone here _believes_
in the mission."

------
majidazimi
I asked a question in the previous discussion [0] but didn't get an answer
from the author. Hopefully someone can answer it here:

How did you replace the producer fencing mechanism that is built into
BookKeeper when moving to Kafka? (Assume I need to ensure that only one client
can write to a topic.)

[0]
[https://news.ycombinator.com/item?id=18559284](https://news.ycombinator.com/item?id=18559284)

~~~
Pixelstime
Sorry, I didn't fully understand your question. I suggest asking it in the
Slack of the BookKeeper community:
[http://bookkeeper.apache.org/community/slack/](http://bookkeeper.apache.org/community/slack/)

~~~
majidazimi
Sorry for my bad English. Here is a better explanation:

In BookKeeper, one producer can fence the currently active producer off from
appending to the log. If a majority of BookKeeper nodes respond with OK to a
FENCE request, the current producer can no longer append to the log. This
makes implementing master-slave database replication quite trivial. The master
keeps appending to the log, and the slaves keep reading from it. If the master
fails, one of the slaves sends a FENCE request to the BookKeeper nodes, which
reliably blocks the malfunctioning master from appending to the DB log.
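
To make the fencing idea concrete, here is a toy Python model of it. This is
only a sketch of the behavior described above, not BookKeeper's actual API:
the `StorageNode`, `Log`, and `FencedError` names and the integer "epoch" per
writer are assumptions made up for illustration.

```python
from dataclasses import dataclass, field


class FencedError(Exception):
    """Raised when a writer cannot reach a majority of nodes."""


@dataclass
class StorageNode:
    # Highest writer epoch this node has accepted a FENCE for (hypothetical).
    fenced_epoch: int = 0
    entries: list = field(default_factory=list)
    up: bool = True

    def fence(self, epoch: int) -> bool:
        """Accept a FENCE request; older writers are rejected from now on."""
        if not self.up:
            return False
        self.fenced_epoch = max(self.fenced_epoch, epoch)
        return True

    def append(self, epoch: int, entry: str) -> bool:
        """Accept an append only from a writer that has not been fenced."""
        if not self.up or epoch < self.fenced_epoch:
            return False
        self.entries.append(entry)
        return True


class Log:
    def __init__(self, n_nodes: int = 3):
        self.nodes = [StorageNode() for _ in range(n_nodes)]
        self.quorum = n_nodes // 2 + 1  # simple majority

    def fence(self, new_epoch: int) -> bool:
        """The fence holds once a majority of nodes acknowledge it."""
        acks = sum(node.fence(new_epoch) for node in self.nodes)
        return acks >= self.quorum

    def append(self, epoch: int, entry: str) -> None:
        """An append counts only if a majority of nodes accept it."""
        acks = sum(node.append(epoch, entry) for node in self.nodes)
        if acks < self.quorum:
            raise FencedError(f"writer with epoch {epoch} lost its quorum")


log = Log(3)
log.append(1, "row-1")      # master (epoch 1) writes normally
log.nodes[2].up = False     # one node misses the FENCE request
fenced = log.fence(2)       # a slave takes over with epoch 2
log.nodes[2].up = True
try:
    log.append(1, "row-2")  # the old master can no longer reach a majority
    old_master_blocked = False
except FencedError:
    old_master_blocked = True
log.append(2, "row-3")      # the new master writes normally
```

Because any two majorities of the nodes overlap, the old master cannot collect
a write quorum once the fence has been acknowledged by a majority, even if a
node missed the FENCE request. A node that missed it may still store a stray
entry in this toy model; a real system would have to deal with that during
recovery.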

However in Kafka, this isn't a native operation. Any client can push to a
topic. There is no easy way to safely do this.

I was looking forward to seeing how the Twitter team actually handled this
specific use case.

------
stevewilhelm
Did they review other Pub/Sub solutions? I'm thinking of NATS in particular.

~~~
tveita
It would be very interesting to see a comparison of Pulsar or LogDevice on
Twitter scale workloads.

~~~
majidazimi
Both Pulsar and LogDevice solve the problem of far-away consumers by
segmenting the log and storing each segment on a separate machine, which
reduces disk seeks and page cache pollution (when c1 reads the head of the log
and c2 reads the tail).

So technically they are better than Kafka. But as the article mentions, they
just used SSDs to get around this issue in Kafka. Judging by the article, the
only valid reason to switch to Kafka seems to be KStream.
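
A rough Python sketch of the segmenting idea mentioned above. The segment
size, node names, and round-robin placement are assumptions chosen for
illustration; the real placement logic in Pulsar and LogDevice is more
involved.

```python
SEGMENT_SIZE = 1000                      # entries per segment (hypothetical)
NODES = ["node-a", "node-b", "node-c"]   # hypothetical storage machines


def segment_of(offset: int) -> int:
    """Return the index of the segment that holds this log offset."""
    return offset // SEGMENT_SIZE


def node_for(offset: int) -> str:
    """Place segments on nodes round-robin, so distant offsets tend to
    live on different machines."""
    return NODES[segment_of(offset) % len(NODES)]


# c1 reads near the head of the log, c2 reads near the tail.
c1_node = node_for(10)
c2_node = node_for(2500)
```

Here c1 and c2 hit different machines, so neither reader evicts the other's
data from the page cache or forces extra seeks on a shared disk.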

~~~
manigandham
Pulsar has a much better storage system that scales independently so latency
stays low regardless of consumer offset. It can also now tier to S3/cloud
storage natively and it has many other features like supporting millions of
topics and per-message acknowledgements.

------
manigandham
Why didn't they just run EventBus storage and serving layers on the same
machine if they wanted to get better resource utilization?

~~~
jbs40
There seems to be a fundamental technical misunderstanding of these systems in
the blog post. Unless you're running in the (generally quite dangerous)
scenario where replication factor = 1, both EventBus and Kafka have a network
hop (Kafka to the replica and EventBus to the storage layer) and JVM
traversal, so the argument that Kafka is more efficient there doesn't make
sense.

As you say, it would be trivial to run EventBus storage and serving on the
same nodes if the network hop were an issue.

The argument that a decoupled storage layer costs more than co-locating
storage and serving also seems dubious. The cluster is basically doing the
same amount of work when you decouple those layers, it's just separating them
into components that can be optimized for specific roles. So if you pick the
right hardware/instances for each layer and tune for that, you really
shouldn't need more hardware. Otherwise we'd all still be running full-stack
applications on mainframes to minimize the number of servers. :)

~~~
manigandham
Yes, agreed. It seemed odd because Kafka isn't split into layers so there's no
other option, and it confuses resource utilization for number of servers.

------
scarcely
Slightly off topic, but I feel that when naming one's technical creation it's
best to avoid using the last names of famous people especially when that
person has become so famous it's become standard to refer to him/her simply by
last name. Just google "Kafka" and check out the first page of search results
and you'll see what I mean. (No, googling "Franz Kafka" isn't the same)

~~~
will_pseudonym
I agree completely. Similarly, I have no idea why the creators of GIMP and
CockroachDB (Spencer Kimball and Peter Mattis, and Ben Darnell for the latter)
thought those names were the best possible options.

~~~
slededit
CockroachDB at least refers to their marketing line of "almost impossible to
kill/take down" and the way it scales to many servers. There are a few less
gross insects they could have used, though, like bees or ants.

~~~
andrewflnr
Bees and ants don't have anywhere near the same rep for invincibility. Only
one of those animals is a stereotypical nuclear apocalypse survivor.

~~~
slededit
They should have gone with Water Bears. They can survive even in space. Plus
they sound cute (but look scary).

~~~
mjibson
We have a conference room named Tardigrade.

~~~
jbs40
But any of those are infinitely better than naming something after a letter of
the alphabet. That would be a crazy idea, wouldn't it?

(Sorry to all the users of 'R' and 'C' and 'S' and ...) :(

