
Why I am not a fan of Apache Kafka - kornish
https://gist.github.com/markrendle/26e423b6597685757732
======
boredandroid
This article is pretty out of date, I think the central concerns have actually
been addressed.

It's true that when we were working at LinkedIn Kafka tended to have much
better Java support. Since founding Confluent (I'm one of the co-founders)
we've really focused on improving the situation outside Java.

A few specific corrections:

1. We added full support for consumers with no interaction with ZooKeeper in
the main Kafka protocol. There is no longer any direct interaction with
ZooKeeper from either the producer or consumer. We did this because we care a
lot about the non-Java clients.

2. Kafka has been extremely disciplined about backwards compatibility. The
protocol comes with versioning and changes are always implemented in a way
that supports both the old and new version and can be rolled out without
downtime. In the five year history of the project we did one backwards
incompatible release--the break from 0.5.x-0.7.x to 0.8.x. This was done
intentionally to allow us to refactor the apis. I think this is a pretty good
track record.

It's worth also addressing why Kafka clients directly access nodes in the
cluster rather than requiring a proxy layer. The reason we do this is to allow
very high throughput, partition aware processing. This is really required for
use cases like stream processing that need to process data efficiently,
especially in cases where you are reprocessing data. You can always build a
proxy layer on top of direct access but not vice versa.
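
The partition-aware routing described above can be sketched in a few lines. This is a toy simulation, not the real client code: `partition_for` and `DirectProducer` are illustrative names, and the real Java client hashes keys with murmur2 rather than a byte sum.

```python
# Minimal sketch (all names illustrative, not the real client API) of
# partition-aware routing: the producer hashes the message key to a
# partition and talks directly to that partition's leader broker, with
# no proxy hop in between.

def partition_for(key: bytes, num_partitions: int) -> int:
    # Real clients use murmur2; a byte sum is just for illustration.
    return sum(key) % num_partitions

class DirectProducer:
    def __init__(self, leaders):
        # leaders: partition index -> leader broker address,
        # learned from cluster metadata
        self.leaders = leaders

    def route(self, key: bytes):
        partition = partition_for(key, len(self.leaders))
        return partition, self.leaders[partition]

producer = DirectProducer({0: "broker-a:9092",
                           1: "broker-b:9092",
                           2: "broker-c:9092"})
partition, broker = producer.route(b"user-42")
```

Because the client picks the broker itself, every send goes over one network hop; a proxy layer would add a second hop to each of them, which is why it can be layered on top but not assumed underneath.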

Confluent (where I work) is doing two things that help the non-Java client
ecosystem:

1. We maintain an open source REST proxy that provides decoupled access
(albeit with a little overhead compared to the direct clients).

2. We have picked up work on clients. We offer and fully support a C/C++
client and a Python client, and have a Go client coming soon. All of these are
at feature parity with the Java clients. (More on the way.)

Both of these efforts are open source and apache licensed and included in the
open source Confluent Platform distribution of Kafka.

~~~
GauntletWizard
Breaking with Zookeeper is a critical misstep. A lockservice is the critical
core of any distributed system. The hardest problem in distributed systems is
serialization and consensus - A lockservice provides important tooling to
solve both, in a way that can be portable and consistent across diverse
systems. Master election? As simple as who owns the lock. Serialization? Do an
atomic update on the lockserver of the pointer-reference to the datastructure,
or simply grab a lock for the duration of the commit. Service discovery? Use
ephemeral nodes and prefix-scanning to discover who's online. These solutions
are tried and true, but frequently ignored, as each new tool re-invents the
wheel and builds its own infrastructure... including Kafka, which can't
decide if it's part of the Hadoop stack or its own complete solution, and is
therefore suitable for neither.
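
The "master election is as simple as who owns the lock" idea above can be shown with a toy lock service. This is an in-memory, single-process sketch; a real lock service (ZooKeeper, etcd, Consul) makes the acquire atomic across machines and expires the lock when the owner's session dies.

```python
# Toy in-memory lock service illustrating master election as
# "who owns the lock": the first node to acquire becomes leader;
# when its lock is released (e.g. its ephemeral node expires after
# a crash), another node can take over.

class LockService:
    def __init__(self):
        self._owner = None

    def try_acquire(self, node: str) -> bool:
        # Atomic in this single-threaded sketch; a real lock service
        # guarantees this atomicity across the whole cluster.
        if self._owner is None:
            self._owner = node
            return True
        return False

    def release(self, node: str):
        if self._owner == node:
            self._owner = None

    def leader(self):
        return self._owner

svc = LockService()
assert svc.try_acquire("node-1")      # node-1 wins the election
assert not svc.try_acquire("node-2")  # node-2 loses and watches the lock
svc.release("node-1")                 # leader "dies"
assert svc.try_acquire("node-2")      # node-2 takes over
```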

I'm actually in favor of breaking with Zookeeper (It's terribly designed, has
serious problems with concurrency and even consistency, and has the classic
java problem of using java's internal serialization methods, which are utterly
unfit for storing more than temporary data on a single host.) However,
absorbing all of the demands of a lockservice into every product is not the
solution to Zookeeper's failures.

~~~
yid
They're not breaking with Zookeeper, it sounds like they're refactoring to
make zookeeper use transparent to producers and consumers.

~~~
Sphax
It's already the case, FYI.

------
ah-
This needs a [2015]. Back then there wasn't a single usable client for .net.

Since then the non-Java clients have massively improved. In particular
[https://github.com/edenhill/librdkafka](https://github.com/edenhill/librdkafka)
is fantastic. There's also
[https://github.com/dpkp/kafka-python/](https://github.com/dpkp/kafka-python/),
with support for consumer groups and all the other modern features.

The basic criticism of requiring a complex client is valid, however you cannot
achieve the delivery guarantees that Kafka gives you without one. The
alternative would be to have a local agent process like consul, but that
wouldn't give you the throughput that Kafka gets.
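
The at-least-once contract mentioned above comes down to one rule: the producer resends until the broker acknowledges. A lost ack can therefore produce duplicates, but never a silent drop. Here is a toy simulation of that contract (`FlakyBroker` is invented for illustration and models a broker whose writes land but whose acks are lost):

```python
# Sketch of the at-least-once contract: the producer resends until the
# broker acknowledges, so a message may arrive more than once but never
# zero times.

class FlakyBroker:
    def __init__(self, fail_first_n: int):
        self.log = []
        self._failures_left = fail_first_n

    def append(self, msg) -> bool:
        self.log.append(msg)          # the write lands...
        if self._failures_left > 0:
            self._failures_left -= 1
            return False              # ...but the ack is lost
        return True

def produce_at_least_once(broker, msg, max_retries=5):
    for _ in range(max_retries):
        if broker.append(msg):
            return True
    return False

broker = FlakyBroker(fail_first_n=2)
ok = produce_at_least_once(broker, "payment-123")
```

The two lost acks leave duplicates in the log, which is exactly why consumers downstream of an at-least-once pipe should be idempotent.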

Disclaimer: I've built a C# client for Kafka based on librdkafka
([https://github.com/ah-/rdkafka-dotnet](https://github.com/ah-/rdkafka-dotnet)),
so I'm biased.

~~~
kornish
You're absolutely right; I've added [2015] to the title.

For context: the company where I work is putting Kafka into production soon
and we came across this post while trying to build an intuition for the
tradeoffs of Kafka as a system. I thought I'd post it on HN to try to generate
some discourse. It's good to hear that things have changed so much for the
positive, though since we're using Go, the Shopify client [0] has been usable
for a while.

[0]: [https://github.com/Shopify/sarama](https://github.com/Shopify/sarama)

~~~
sctb
We've removed the (2015) label now that there's an update at the top of the
post.

------
agentgt
I think in large part why people dislike Kafka is that they don't really need
Kafka (and the complexity that comes with it).

Don't get me wrong, Kafka is good tech if you really need that level of
throughput, but I honestly think most companies don't have that much data
and/or are just putting too much in the pipe. But they go with Kafka anyway, I
guess to "CYA" for future scaling, only to find out Kafka is complicated.

I mentioned this earlier (a couple days ago
[https://news.ycombinator.com/item?id=12520159](https://news.ycombinator.com/item?id=12520159))
for someone using Kafka for a logging aggregation system only to drop it for
ZMQ.

My other point is that if your endpoint isn't fast enough, it doesn't really
matter what your pipe is. Pick an easy-to-use pipe first (like RMQ) and worry
about scaling the endpoints.

~~~
evantahler
I agree! We've had success using Redis as our pipe... with fast consumers we
regularly push MB/s of data... no problem. And Redis is super easy to manage.

~~~
agentgt
Yes, Redis is a solid choice! And it, like RabbitMQ, gets the constant "single
point of failure" critique (we get this crap from tech partners when we tell
them how heavily we use RMQ)... so I should have chosen something even more
complicated so that we have multiple points of failure? (I'm sort of joking
and being sarcastic.)

Kafka and ZMQ though are solid choices for intensive stuff... just be prepared
to build extra stuff and don't complain when it gets hard :)

------
Xorlev
Author doesn't understand Kafka, doesn't have a good client for his language,
therefore doesn't like Kafka.

Jay Kreps responds to a few of his points -- the complexity of the client is
for scalability reasons.

> When you Produce a Message Set onto the bus, you don't directly get back a
> response telling you that the messages have successfully been persisted to
> one or more partitions.

At least in the Java client, this isn't true. True, if you used the async API
before 0.9 you weren't able to get an ACK, but the sync producer would block
until a message was published. In the new producer, you're handed futures plus
the ability to provide a callback[1].

[1]
[http://kafka.apache.org/082/javadoc/org/apache/kafka/clients...](http://kafka.apache.org/082/javadoc/org/apache/kafka/clients/producer/KafkaProducer.html)
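
The future-plus-callback send pattern described above can be sketched with the standard library. `MiniProducer` and its method names are invented for illustration; this is the shape of the API, not the real client:

```python
# Sketch of the future-plus-callback send pattern the newer producer
# API exposes; MiniProducer and its names are illustrative only.

from concurrent.futures import ThreadPoolExecutor

class MiniProducer:
    def __init__(self):
        self._pool = ThreadPoolExecutor(max_workers=1)
        self.log = []

    def _append(self, record):
        self.log.append(record)
        return {"offset": len(self.log) - 1}  # ack metadata from the "broker"

    def send(self, record, on_ack=None):
        # Returns immediately with a future; the optional callback
        # fires once the record is acknowledged.
        future = self._pool.submit(self._append, record)
        if on_ack is not None:
            future.add_done_callback(lambda f: on_ack(f.result()))
        return future

producer = MiniProducer()
acks = []
future = producer.send("hello", on_ack=acks.append)
metadata = future.result()          # blocking, like the old sync send
producer._pool.shutdown(wait=True)  # drain so the callback has fired
```

Calling `.result()` recovers the old blocking behaviour; ignoring the future and relying on the callback gives you the async fire-and-forget-with-ack style.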

~~~
pfraze
There's one specific thing that I'd point out to the author:

Zookeeper and load-balancers accomplish different things. If leader-election
is being used, then the cluster needs to have a single authority to enforce
strict consistency in the logs. A load-balancer can only distribute load
between separate nodes. If there were only a load-balancer, and no leader
election, then Kafka would not be able to create logs with total orders, and
you'd have a different system. (We can talk about alternative architectures,
but I'm doubtful there's an alternative to leader-election that maintains the
same properties.)
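
The total-order point above can be made concrete with a toy log. One leader handing out offsets gives every reader the same order; a load balancer spraying writes across independent nodes gives each node its own offset 0, so "which came first" has no single answer. All names here are illustrative:

```python
# Why leader election matters for ordering: only a single elected
# leader assigning offsets gives a partition one total order.

class Partition:
    def __init__(self):
        self.log = []

    def append(self, msg):
        self.log.append(msg)
        return len(self.log) - 1   # the offset *is* the total order

# With one leader, every reader agrees on the order a, b, c.
leader = Partition()
offsets = [leader.append(m) for m in ("a", "b", "c")]

# With a load balancer alternating across two independent nodes,
# both messages get offset 0: no single answer to "which came first".
node1, node2 = Partition(), Partition()
conflict = (node1.append("a"), node2.append("b"))
```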

He's complaining that Zookeeper takes 6 seconds to restore uptime in a
failure, which may be improvable, but it's not really bad. If you want strict
consistency, you either have a leader-election protocol, or you manually
reconfigure your cluster when it fails. So, it's 6 automated seconds vs N
manual minutes of labor, after your pager goes off.

------
slap_shot
I work with Kafka every day and don't really think the OP's concerns (it's too
complex and there isn't third party driver support to the degree of Redis) are
too serious. They both will be solved with time.

My bigger fear is Confluent - the private company founded by many Kafka core
committers and employing many Kafka committers.

Confluent offers open source extensions to Kafka's core in the form of
connectors (boilerplate code to connect to common sources and sinks like JDBC
databases, files, and Hadoop).

Confluent also offers (as of right now) one closed source product extension
(Control Center - a cluster monitoring system similar to the management UI of
RabbitMQ, etc) that requires enterprise subscription (several thousand dollars
per node per year) after a 30 day trial.

$30.9MM for a service/support-based company seems like a lot of money, and it
drives a very high valuation that needs to show a return. I personally am
skeptical of the venture-backed service/support model[0].

My fear is that Kafka will increasingly require "enterprise" support tools
with less and less support and features available to people who do not pay for
enterprise support. The amount of documentation of the 0.10 release
(particularly the Streams API) that resided on the Confluent page versus the
Kafka page is a HUGE red flag to me.

I have all the respect in the world for Jay, and the Kafka/Confluent team, but
I find myself avoiding Confluent's tools (Kafka Connect and Schema Registry)
because of fear that those will eventually be closed source or require an
enterprise subscription.

[0] I'm not an investor, but I haven't seen many of these models work out in
the long run. A recent podcast by A16Z touches on this subject very well, with
an A16Z partner saying he believes exactly one company has pulled this model
off well at venture scale: Red Hat.
[http://a16z.com/2016/08/19/pricing-freemium-premium-opensour...](http://a16z.com/2016/08/19/pricing-freemium-premium-opensource/)

~~~
djhworld
_My fear is that Kafka will increasingly require "enterprise" support tools
with less and less support and features available to people who do not pay for
enterprise support._

I thought Kafka was an Apache project, don't they have rules against this sort
of thing?

~~~
x0x0
Sure, the Apache distro of Kafka won't require them.

Users who buy from Confluent will download Confluent's build of Kafka. Similar
to how things work in the Hadoop ecosystem: you can get Apache Hadoop, but
most users get one of Cloudera, Hortonworks, or MapR.

~~~
ptrincr
As it stands though, Kafka is nowhere near as complicated as setting up and
managing a Hadoop/YARN/Spark (etc.) cluster.

Without using something like Hortonworks' Ambari, it's a right pain trying to
get everything installed, working together, and managed. Then you have to
worry about how you are going to upgrade it.

Currently you can use this Chef cookbook
([https://github.com/mthssdrbrg/kafka-cookbook](https://github.com/mthssdrbrg/kafka-cookbook)),
along with the Yahoo kafka-manager application, and you can get a cluster
running without much effort or complication.

The cookbook has a coordination feature which allows you to use a locking
mechanism (I use Consul, as we use it for DNS/auto-registration) to roll out
changes across the cluster, restarting each broker one at a time; it also
allows you to upgrade in a similar fashion.

------
admnor
Hi. Author of the article/gist here. It was actually written in Spring of last
year, I think. I've just updated it because, as people have said, many of the
issues have been addressed one way or the other:

* The KafkaREST proxy makes life a little easier

* librdkafka makes life a lot easier, especially when you can just download a thin wrapper around it for your chosen language

* I am no longer working at the place that chose Kafka 0.8 following no testing at all for their use case, and refused to back down through months of hell both writing code and trying to keep clusters available

* The people at Confluent have done a lot of good work since then, both on Kafka itself and on various auxiliary tools/products.

So yeah, I would probably look at Kafka again today if I needed that kind of
functionality. Screw ZooKeeper though.

------
reitanqild
To be honest, I think very many of us don't need Kafka. As with many other
things, as long as we aren't handling more than a few thousand messages/sec,
any decent ordinary message broker like ActiveMQ should do.

Caveat: do not install a production message bus in a VM.

I recently listened to this talk which compares and explains very nicely:
[https://vimeo.com/181925293](https://vimeo.com/181925293)

~~~
ah-
Kafka gives you vastly better delivery guarantees than the common ActiveMQ
brokers.

If you put a message into Kafka and have it configured correctly, it will come
out on the other end at least once. This is not the case for most of the
competitors.

~~~
vidarh
A lot of the time you don't need at-least once, so even picking a broker
without support for it is not necessarily an issue.

And at-least once can be reasonably simply layered on top (require a response
at a higher level layer) at a cost, so depending on your use-case, forgoing
at-least once support in the broker can very well be the right choice.

------
yummyfajitas
I find the complaints that LinkedIn hasn't open sourced their REST client to
be a little silly. I'm in a similar situation: PHP needs to send commands into
Kafka, but the PHP Kafka libs aren't great (or maybe there are good ones and
my PHP guys don't want to use them).

So I wrote a little Thrift endpoint (in Scala) which receives messages and
writes to kafka. It's under 100 lines of code. Probably another couple of
hundred lines for the PHP version of the thrift client.

Are we really complaining that LinkedIn hasn't open sourced their 200-LOC REST
client?

~~~
ah-
I'm not sure if it was the case when the article was written, but the REST
proxy is open sourced now:
[https://github.com/confluentinc/kafka-rest](https://github.com/confluentinc/kafka-rest)

------
KirinDave
"Really new" around November 2015. It was released in 2011 though. It's only 2
years younger than Redis, and is primarily an exercise in using Zookeeper.

It's very surprising to hear people suggest Redis pubsub is a valid substitute
for Kafka, when in fact it's not. It has a fundamentally different set of
operating characteristics, a different sweet spot. Kafka isn't great from a
consumer resumption point of view, but at least there ARE options.

It's also untrue that Kafka gives no feedback on a successful message put.
This is obviously a bug or design shortcoming in the post author's chosen
toolchain which is correctable, and was part of the core java toolchain as of
late 2015 AFAIK.

I do agree that the Kafka system has an architecture that maximizes difficulty
for new language bindings. Certainly C# has the tools to write an excellent
implementation, assuming someone understands zookeeper well enough.

~~~
Sphax
I don't understand what is difficult about the Kafka protocol. As a client you
only ever need to talk to the broker. This uses a binary protocol that is
really straightforward to implement. Any language that can work with binary
data and open a TCP socket is suitable for writing a client.
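
To give a feel for how plain the wire format is, here is a sketch that packs an old-style (v0/v1) request header: size-prefixed, big-endian fields, nothing more. The field layout (int16 api_key, int16 api_version, int32 correlation_id, int16-length-prefixed client_id, all framed by an int32 total size) matches my reading of the protocol guide, but treat the specifics, including api_key 3 for Metadata, as from memory:

```python
# The Kafka wire format is just size-prefixed, big-endian fields over
# TCP. This packs a v0/v1-style request header using only the stdlib.

import struct

def request_header(api_key: int, api_version: int,
                   correlation_id: int, client_id: str) -> bytes:
    cid = client_id.encode("utf-8")
    # int16 api_key, int16 api_version, int32 correlation_id,
    # then client_id as an int16-length-prefixed string
    body = struct.pack(">hhih", api_key, api_version,
                       correlation_id, len(cid)) + cid
    # the whole request is framed by an int32 size
    return struct.pack(">i", len(body)) + body

frame = request_header(api_key=3,       # Metadata request, if memory serves
                       api_version=0,
                       correlation_id=42,
                       client_id="demo")
```

Everything after this header is more packed fields in the same style, which is why clients exist in so many languages.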

~~~
KirinDave
What threw me off is some disaster recovery scenarios are a lot easier if you
can inspect and modify the broker data put into ZK.

~~~
Sphax
Having your client modify the ZK data is not a good idea; you're bound to
cause problems.

~~~
KirinDave
If you were in mid-block consumption, in some cases it may make sense to
advance your index to avoid replays.

You can say, "You should have made the message consumer idempotent" and I
reply, "I agree; this was a consulting gig and I was there to fix the
problems." ;)

~~~
Sphax
Oh, you're talking about consumer offsets! I've exclusively used Kafka to
store offsets since I started using Kafka with 0.8.0.2, so I can set my
offsets using the Kafka protocol only.
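
The offset handling described above reduces to a small amount of per-consumer state. Here is a toy sketch (`OffsetConsumer` and its methods are invented names mirroring the real poll/commit/seek trio, with no ZooKeeper involved):

```python
# Toy sketch of consumer offset management: the consumer tracks its own
# position in the log, commits it, and can seek it anywhere, all over
# the same protocol it reads messages with.

class OffsetConsumer:
    def __init__(self, log):
        self.log = log
        self.position = 0    # next offset to read
        self.committed = 0   # last committed offset

    def poll(self):
        if self.position >= len(self.log):
            return None
        msg = self.log[self.position]
        self.position += 1
        return msg

    def commit(self):
        self.committed = self.position

    def seek(self, offset):
        # e.g. skip a poison message, or rewind to replay after a bug fix
        self.position = offset

consumer = OffsetConsumer(["m0", "m1", "m2"])
consumer.poll(); consumer.poll()
consumer.commit()        # a restart would resume at offset 2
consumer.seek(0)         # or rewind and reprocess from the start
first_again = consumer.poll()
```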

------
notacoward
I liked "behind a load balancer like a normal server" the best. How does the
author think a load balancer works? By making an even less accurate guess
about the state of servers behind it than Kafka can do via Zookeeper. AFAICT
the author is just upset that Kafka isn't exactly like Redis, whereas most
sane people would be quite glad it's not.

------
hifier
Seems like mostly FUD. No mention of specific issues in any clients. And
please have a look at other high throughput systems (like Cassandra or VoltDB)
before claiming that a load balancer is the proper way to connect clients to a
distributed system.

~~~
tptacek
Try to be more careful about the term "FUD". I know it isn't written anywhere
that this is the case, but in reality, "FUD" implies a bad-faith effort to
convince people to avoid something; it means the person spreading the message
doesn't care if it's true or not.

There is a big difference between that kind of "FUD" and simply being
incorrect or under-informed about things, even if you think people should have
an obligation to be better informed before relating their opinions.

~~~
hifier
Interesting. Noted.

------
falcolas
Quick note - the author just did an update for an "As Of Sept 2016".

TL;DR: Still not a great solution for their original problem. The development
of good C/C++ libraries means he could now get around the lack of decent C#
libraries. Overall architecture still pretty f'ed.

------
fusiongyro
I don't have as much depth with it as the author, but I also felt like using
it was kind of a bait and switch, especially coming from having read (and
loved) Jay's book I Heart Logs. We're using it at my work but I'm not really
in love with it and will probably be trading it out for AMQP for an event
system we're planning.

I was working with a junior engineer on this project, and he kept on getting
confused about what features were ZooKeeper and which were Kafka. There are
two complex technologies here as a first hurdle to using it. This isn't ideal.

In the book he describes a scenario where your stream processors record to
their own storage where they are in the stream, but Kafka's stock consumers
now seem to keep track of that in ZooKeeper instead, which seems like an odd
place to make the decision.

We initially had a small Node.js server, just to experiment with, but I
discovered that it had no error handling at all. I could put fake hostnames in
and it would just hang out, as if eventually maybe they would appear and it
could connect. This is really the Kafka driver's fault; we switched back to
Java and the Java client worked, it's just a little overcomplex. But we also
periodically came in to find the server had crashed. I still don't know why.
(I'm open to it being our general ignorance and a misconfiguration or
something.)

In the book, Jay describes this beautiful computing model where you have these
log streams and you just process them, and it's high-level and very alluring.
The actual APIs that Kafka gives you are not beautiful or intuitive. Rewinding
to the beginning is something you can only do after you read, for instance. We
were thinking of using it like an external write-ahead-log (as described in
the book) but it just doesn't really support that use-case directly through
its API.
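
The write-ahead-log use case mentioned above is easy to state even if the API makes it awkward: current state is whatever you get by replaying the retained event log from offset 0, so recovery (or bootstrapping a new consumer) is just "rewind and re-apply". A minimal sketch, with invented event tuples standing in for log records:

```python
# Sketch of the write-ahead-log pattern: state is derived by replaying
# the event log from the beginning, so any consumer can rebuild it.

def apply(state, event):
    op, key, value = event
    if op == "set":
        state[key] = value
    elif op == "del":
        state.pop(key, None)
    return state

def replay(log, from_offset=0):
    state = {}
    for event in log[from_offset:]:
        apply(state, event)
    return state

log = [("set", "a", 1), ("set", "b", 2),
       ("del", "a", None), ("set", "b", 3)]
state = replay(log)   # rebuildable at any time while the log is retained
```

This is the model the book describes; the friction is that the stock consumer APIs are built around "subscribe and move forward", so the rewind step takes extra plumbing.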

It's kind of a shame, because AMQP doesn't support that use case all that
well. I believe you have to decide whether you want your queue to act like a
round-robin affair or as a persistent queue. Kafka sort of lets you have both;
streams (ostensibly) work like persistent broadcast queues. I don't think I'll
be able to use AMQP as a write-ahead-log by itself; probably I'll have to have
some kind of mediating service that's just recording events to persistence and
have a separate way of getting historical stuff.

I spent a year or so unable to work on Kafka but telling everyone to read I
Heart Logs, so getting in there a few months ago and seeing how wide the gap
is between the beautiful theory and the practice has been disillusioning.
Frankly, the actual system and the one in the book are pretty radically
divergent. I am still a big fan of the system described in the book. I hope
someday I get to use it.

~~~
gfodor
Kafka Streams [1] is the library being worked on by Confluent that I think
best approximates the vision for stream processing systems outlined in the
book.

[1]
[http://docs.confluent.io/3.0.0/streams/](http://docs.confluent.io/3.0.0/streams/)

~~~
fusiongyro
Thanks, I'll definitely check that out.

------
jknoepfler
The linked article now includes a giant disclaimer on top more or less
retracting the view expressed in the title. Please update the title to
accurately reflect the linked content. Also note that the author is mostly
griping because of issues which no longer exist. I've posted the author's
words below:

"Update, September 2016

OK, you can pretty much ignore what I wrote below this update, because it
doesn't really apply anymore.

I wrote this over a year ago, and at the time I had spent a couple of weeks
trying to get Kafka 0.8 working with .NET and then Node.js with much
frustration and very little success. I was rather angry. It keeps getting
linked, though, and just popped up on Hacker News, so here's sort of an
update, although I haven't used Kafka at all this year so I don't really have
any new information.

In the end, we managed to get things working with a Node.js client, although
we continued to have problems, both with our code and with managing a
Kafka/Zookeeper cluster generally. What made it worse was that I did not then,
and do not now, believe that Kafka was the correct solution for that
particular problem at that particular company. What they were trying to
achieve could have been done more simply with any number of other messaging
systems, with a subscriber reading messages off and writing them to some form
of persistent storage (like Elasticsearch). I'm sure there are issues of scale
or whatever where Kafka makes sense.

It is true, as many people have pointed out in the comments, that my primary
problem was the lack of a good Kafka client for .NET. If I'd been able to
install a Kafka Nuget package and it had just worked, this would never have
been written. But I couldn't. Today I could probably use a thin wrapper around
librdkafka, and if I ever have to work with Kafka from .NET again, that's
probably what I'll do. C/C++ libraries are great for stuff like that: C can
talk to anything, and everything can talk to C. Yay.

I do understand the performance-related reasons that drove the decision to
design a clever-client architecture, but it was, apparently, extremely
difficult to create a good client unless you were working with either Java, or
with a lower-level language such as C or Go which could work with the complex
protocols and implementation requirements.

So, anyway, like I said, you can ignore the stuff below which was written
about an old version of the software, while I was in a very bad mood. But I'm
going to leave it here, in the hopes that it may serve as a warning to future
developers of really complicated infrastructure components. It probably won't,
though."

------
thomaslee
> When you Produce a Message Set onto the bus, you don't directly get back a
> response telling you that the messages have successfully been persisted to
> one or more partitions. Instead, you must also Consume the bus, and you
> should eventually receive multiple messages acknowledging the persistence of
> each message in the set.

Maybe this has changed recently, but IIRC this isn't true if your
ProducerRequest has the ack bit set to 1 or 2 (i.e. leader or replica acking):

[https://github.com/confluentinc/kafka/blob/79aaf19f24bb48f90...](https://github.com/confluentinc/kafka/blob/79aaf19f24bb48f90404a3e3896d115107991f4c/core/src/main/scala/kafka/api/ProducerRequest.scala#L60)

The response/ack is sent directly over the socket sending the request.

> If a Node dies then a "leadership election" happens, ZooKeeper is updated
> with the new metadata, and your application must react to this and handle
> the changes. There's a six second delay while this happens

Not that I doubt it, but not sure where six seconds comes from here ...
perhaps waiting for partition leader elections? It's been long enough that I
can't quite remember exactly what happens during a failover.

> and who knows what happens if you try and send messages to a dead node
> during that time.

Depends how it died, which client API you're using and how the client is
configured. Some combination of:

* data loss if acking is disabled (hint: enable acking)

* backpressure and errors in the client until new partition leaders kick in

* client socket writes hanging "forever"

If the latter is surprising: no SO_SNDTIMEO in pure Java blocking socket I/O.
Think the new clients may address that, but not entirely sure.

As an aside: can't emphasize enough how important it is to get your
configuration right early. By the time you run into problems, it's often too
late. Pay heed to any tuning guides you can find. Talk to Confluent if you're
still unsure.

> AND HAVE THEY OPEN SOURCED THIS MAGICAL SERVER? NO, THEY BLOODY HAVEN'T.

[https://github.com/confluentinc/kafka-rest](https://github.com/confluentinc/kafka-rest)
this thing? FWIW, it's kind of a joke for high throughput anyway. Last time we
spoke to Confluent they sort of discouraged its use for exactly that reason.

Still, it's an easy bridge for folks who aren't too fussed about throughput.
Not sure why you'd be using Kafka if throughput's not your thing, but y'know.

> If you are using Java/Scala/Clojure/Kotlin/whatever and can use the Official
> Java Client then I'm sure Kafka is a perfectly reasonable choice for a
> message bus, although there are plenty of others that seem to me to be far
> less bloody-minded.

Despite all the gotchas, Kafka's capable of pretty incredible throughput in a
fault-tolerant HA configuration. I can empathize with some of the
frustrations, but past a certain scale the proposed alternatives just aren't,
IMHO.

------
agounaris
Why should someone be a fan of Kafka? It's not my home town's team; it's a
damn hammer.

