

Apache Kafka 0.8.0 released - mumrah
http://kafka.apache.org/downloads.html

======
nullymcnull
Have read a bit of the intro material, but I'm still not grokking what makes
Kafka fundamentally different from ActiveMQ / Apollo. Can anyone sum up where
and why one might need Kafka?

~~~
jbooth
It's an architecture thing. Most message queues are written the way you'd
initially think to manage a message queue: you keep a big queue of objects in
memory, and in order to get delivery guarantees, you have to hold on to them
until the consumer confirms receipt. This leads to pathological garbage
collection scenarios, the old "holding onto an object just long enough to make
it really expensive to GC".

With Kafka, on the other hand, when you write a message to the broker, the
broker writes it immediately to a disk queue rather than holding it in memory.
But isn't that slower? No, it's not, because it's in the page cache, which is
managed more efficiently than garbage-collected memory. Then, when consuming,
rather than keeping metrics for each individual message being received,
consumers simply have a log position -- they periodically commit, which tells
the broker that all of the messages up to that point have been consumed. If
they never commit, eventually another consumer will get those messages.

So basically, it scales a ton better because you're just doing scads of
sequential I/O with occasional commits, rather than tracking a bunch of
messages in memory individually (which in theory should be fast but causes GC
problems).
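
A minimal sketch of the log-position idea described above, in Python. Everything here is made up for illustration (`LogBroker`, `fetch`, `commit`, the "billing" group) and is not Kafka's API; an in-memory list stands in for the on-disk log so the example stays self-contained.

```python
from dataclasses import dataclass, field


@dataclass
class LogBroker:
    """Toy append-only log; a list stands in for the on-disk log file."""
    log: list = field(default_factory=list)
    committed: dict = field(default_factory=dict)  # consumer group -> next offset

    def append(self, message):
        """Append a message and return its offset (its position in the log)."""
        self.log.append(message)
        return len(self.log) - 1

    def fetch(self, group, max_messages=10):
        """Return (offset, message) pairs starting at the group's committed offset."""
        start = self.committed.get(group, 0)
        return list(enumerate(self.log[start:start + max_messages], start=start))

    def commit(self, group, offset):
        """Record that every message before `offset` has been consumed."""
        self.committed[group] = offset


broker = LogBroker()
for i in range(5):
    broker.append(f"event-{i}")

# A consumer reads three messages and commits only after processing them.
batch = broker.fetch("billing", max_messages=3)
broker.commit("billing", batch[-1][0] + 1)

# A second consumer reads two more but crashes before committing.
lost_batch = broker.fetch("billing", max_messages=2)

# Whoever takes over resumes from the last committed offset and sees them again.
redelivered = broker.fetch("billing", max_messages=2)
```

The only per-consumer state the broker keeps is one integer per group, which is the contrast with per-message acknowledgement bookkeeping drawn above.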

~~~
xorgar831
How does that compare to RabbitMQ's disk backed store?

~~~
jbooth
I'm not familiar at all with RabbitMQ, so can't really comment, but I'm pretty
sure they give guarantees like a producer can wait until a given message is
consumed. This means there's no fire-and-forget: even though the message is
logged to disk at one point, you need to do all that per-message bookkeeping.

~~~
whisk3rs
It is more accurate to say that RabbitMQ supports both fire-and-forget and
producer-waits. The exact behavior is specified by a combination of how you
configure exchanges and queues, and per-message settings, and how you write
your client code. For example, your application can decide that some of the
messages it injects into a queue are to be durable and others not. It is quite
flexible (though the docs are lacking when it comes to specific advice for
various use cases).

------
xal
Kafka is an integral part of Shopify's infrastructure. It's a brilliant but
underappreciated technology.

A company-wide, scalable event bus like this can totally revolutionize the way
you build services.

~~~
nemothekid
And thank you for the amazing Go client.

------
mumrah
This is the first release as an Apache top-level project and represents many
months of hard work.

A few of the major improvements (from
[https://archive.apache.org/dist/kafka/0.8.0/RELEASE_NOTES.ht...](https://archive.apache.org/dist/kafka/0.8.0/RELEASE_NOTES.html)):

* Intra-cluster replication support
* Support multiple data directories
* Many new internal metrics
* Time based log segment rollout

Plus many bug fixes and other improvements.

------
hans0l074
May I ask how Kafka compares with an AMQP solution such as RabbitMQ? Thank
you.

~~~
mumrah
Quora discussion:
[http://www.quora.com/RabbitMQ/RabbitMQ-vs-Kafka-which-one-fo...](http://www.quora.com/RabbitMQ/RabbitMQ-vs-Kafka-which-one-for-durable-messaging-with-good-query-features)

RabbitMQ developer on the kafka-users list:
[http://mail-archives.apache.org/mod_mbox/kafka-users/201306....](http://mail-archives.apache.org/mod_mbox/kafka-users/201306.mbox/%3CCACOPnvdd0_Xr3oOy8Dc8pu0z-_u813xnCOb80ZWpjVeUudhK9A@mail.gmail.com%3E)

SO discussion on several queuing systems:
[http://stackoverflow.com/questions/731233/activemq-or-rabbit...](http://stackoverflow.com/questions/731233/activemq-or-rabbitmq-or-zeromq-or)

~~~
morkbot
Quora link where you don't have to register/log-in:
[http://www.quora.com/RabbitMQ/RabbitMQ-vs-Kafka-which-one-fo...](http://www.quora.com/RabbitMQ/RabbitMQ-vs-Kafka-which-one-for-durable-messaging-with-good-query-features?share=1)

------
turingbook
I wrote about this release and Kafka in general in Chinese, and gathered some
information that may be useful for Chinese readers:
[http://geek.csdn.net/news/detail/3866](http://geek.csdn.net/news/detail/3866)

------
rmk2
This has nothing to do with the software itself, but it bothers me: Why do you
call a "high-throughput distributed messaging system" Kafka? Kafka's stories
essentially describe the polar opposite: crippled, ineffective, labyrinthine
message systems that are exceedingly hierarchical in nature. They are also
rather user-unfriendly, i.e. their users usually die horrible, lonely deaths.
Am I just missing the in-joke here, and it's called Kafka _because_ it is
exactly the opposite of this, or did somebody overdo the hipster naming
scheme?

~~~
cleaver
That was my first thought. Based on the novels I've read, I would interpret it
in a couple of possible ways:

- You receive a message, but the system can't tell you why you received it
nor what you should do. (The Trial)

- It's not a distributed messaging system with bugs. Actually, you are the
bug. (Metamorphosis)

As an aside, I went to a tech conference in Prague two months ago and visited
Café Slavia, a hangout not just of Kafka, but also author Milan Kundera and
president/poet Václav Havel. I had a glass of absinthe in their honour.

------
pspeter3
Congratulations to the Kafka team! Their work is always extremely impressive.

------
harichinnan
Was this developed in Scala?

~~~
mumrah
It is a mixture of Java and Scala, though mostly Scala. GitHub mirror of the
source: [https://github.com/apache/kafka](https://github.com/apache/kafka)

------
nine_k
I like the naming trend and expect releases of Apache Ionesco, Apache
Lovecraft, and Apache Poe real soon now. </obligatory-joke>

------
x3942
Finally!

------
shmerl
Why is it written in Java? Isn't that a performance hit by default? I'd expect
such frameworks to be written in high-performance languages.

~~~
nine_k
Java is pretty high performance, usually on par with C++, and usually faster
than e.g. Go.

Writing Java code so that there are no perceivable GC pauses is an art, but it
is not impossible to achieve.

The JVM might require more RAM upfront, but a well-written program is usually
reasonably memory-efficient, too, so the consumed memory grows reasonably
slowly with the problem size.

Writing things in pure C is often just too time-consuming.

~~~
shmerl
_> Java is pretty high performance, usually on par with C++_

I'm not convinced. Java I/O is far from perfect, and Kafka is probably very
heavy on the I/O side.

_> and usually faster than e.g. Go._

That's strange, since Go to some degree was intended as a replacement for Java
without having Java's downsides. Why would Go be less performant?

I'd be interested if someone would write such a framework in Rust though. C++
is of course a default expectation, but usage of Java somehow surprises me in
this case.

~~~
mbell
Kafka is written in Scala, not Java. The only Java in the project is for a
Java API and Hadoop connectors.

Comparing languages in absolute performance terms is a bad idea; it's an
extreme simplification of what really goes into creating performant
applications.

~~~
shmerl
_> Kafka is written in Scala, not Java._

Thanks for the correction.

