
Jocko – Kafka implemented in Golang - cube2222
https://github.com/travisjeffery/jocko
======
bsaul
What are the usual benefits of all those "x reimplemented in Go" projects? Do
people see performance improvements in practice? Or is it simpler to deploy
and monitor?

I know Go has all the hype at the moment, but I'm curious to know what the
benefits are in practice. Those real-world reimplementation projects make for a
great benchmark.

~~~
geodel
Though the author has not mentioned it, I think a Go program would use an order
of magnitude less memory than the Java alternative, besides the ease of
deployment you mentioned. Also, the standard built-in HTTP (1/2) client/server
and JSON package can help a lot in setting up communication between various
components.

~~~
mdasen
Go memory usage will definitely not be an order of magnitude less. This is
something that cannot be emphasized enough: the idea that a Go program of any
substantial size would use an order of magnitude less memory than a Java one
is completely wrong.

First, the "proof": [https://days2011.scala-
lang.org/sites/days2011/files/ws3-1-H...](https://days2011.scala-
lang.org/sites/days2011/files/ws3-1-Hundt.pdf). That's a paper from Google.
The memory footprint of the Go program was 501MB, the Java version 617MB, and
the Scala version 293MB. The fact that the Scala and Java versions were so far
apart in memory tells you that a lot of it depends on the specific constructs
used. Most importantly, it should dispel the myth that
Java uses more memory than Go. The JVM isn't perfect, but arguing that Java
uses an order of magnitude more memory is simply wrong.

Java has a bad reputation for memory usage among people who have never run
substantial programs because a Java hello-world (or other small programs) will
often allocate itself a decently sized heap. Java programs also take a while
to start up because of the JVM, but that doesn't make Java slow.

Go certainly has some good things. Structs can help with memory locality and
avoiding some unnecessary pointers. But that's not going to help one with
attaining an order of magnitude better memory usage.
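
To make the struct point concrete, here is a minimal Go sketch. `Point`, `inlineBytes`, `pointerBytes` and `stride` are hypothetical names made up for illustration, and the byte counts ignore allocator and GC overhead. It compares a slice that stores structs by value with one that stores pointers to separately allocated structs:

```go
package main

import (
	"fmt"
	"unsafe"
)

// Point is a hypothetical value type used only for this illustration.
type Point struct {
	X, Y int64
}

// inlineBytes is the size of the backing array of a []Point with n elements:
// the structs are stored by value, back to back.
func inlineBytes(n int) uintptr {
	return uintptr(n) * unsafe.Sizeof(Point{})
}

// pointerBytes is a lower bound for a []*Point with n elements: one pointer
// per element plus the struct it points to (allocator overhead is ignored).
func pointerBytes(n int) uintptr {
	var p *Point
	return uintptr(n) * (unsafe.Sizeof(p) + unsafe.Sizeof(Point{}))
}

// stride returns the distance in bytes between the first two elements of ps,
// showing that neighbouring structs sit next to each other in memory.
// ps must have at least two elements.
func stride(ps []Point) uintptr {
	return uintptr(unsafe.Pointer(&ps[1])) - uintptr(unsafe.Pointer(&ps[0]))
}

func main() {
	fmt.Println("by value, 1000 elements:  ", inlineBytes(1000), "bytes")
	fmt.Println("by pointer, 1000 elements:", pointerBytes(1000), "bytes (at least)")
	fmt.Println("stride between elements:  ", stride(make([]Point, 2)), "bytes")
}
```

The saving is one pointer per element plus better locality, which is a constant factor and nowhere near an order of magnitude.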

Go's garbage collector, though, isn't designed for memory efficiency, but
rather for short pause times. That can be great, but Go seems to have made the
choice that shorter pause times are preferable to throughput (actually
accomplishing work) or using less memory (might as well throw more RAM at the
problem).

Now let's talk about Kafka specifically. It's been a while since I've looked
at it, but Kafka actually doesn't keep the data on the JVM heap. It writes to
the filesystem using the kernel's built-in async flushing and uses sendfile to
write data directly from the OS page cache to sockets. In a lot of ways, Kafka
bypasses its own runtime and just leans on the OS for the heavy lifting.
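
For what the same trick could look like in Go, here is a rough sketch, not taken from Jocko or Kafka. `sendSegment` and `roundTrip` are hypothetical helpers. The assumption is that the destination is a `*net.TCPConn` and the source an `*os.File`: in that case `io.Copy` goes through the connection's `ReadFrom` method, which on Linux may hand the transfer to the kernel (sendfile or splice) instead of copying through a user-space buffer.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"os"
)

// sendSegment streams the file at path to conn. With a *net.TCPConn and an
// *os.File, io.Copy delegates to TCPConn.ReadFrom, which may use a zero-copy
// kernel path on Linux. On other platforms it falls back to a normal copy.
func sendSegment(conn net.Conn, path string) (int64, error) {
	f, err := os.Open(path)
	if err != nil {
		return 0, err
	}
	defer f.Close()
	return io.Copy(conn, f)
}

// roundTrip writes payload to a temporary file, serves it once over a
// loopback TCP connection, and returns the bytes the client received.
func roundTrip(payload []byte) ([]byte, error) {
	tmp, err := os.CreateTemp("", "segment-*.log")
	if err != nil {
		return nil, err
	}
	defer os.Remove(tmp.Name())
	if _, err := tmp.Write(payload); err != nil {
		tmp.Close()
		return nil, err
	}
	if err := tmp.Close(); err != nil {
		return nil, err
	}

	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return nil, err
	}
	defer ln.Close()

	errc := make(chan error, 1)
	go func() {
		conn, err := ln.Accept()
		if err != nil {
			errc <- err
			return
		}
		defer conn.Close() // closing signals EOF to the client
		_, err = sendSegment(conn, tmp.Name())
		errc <- err
	}()

	client, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		return nil, err
	}
	defer client.Close()

	got, err := io.ReadAll(client)
	if err != nil {
		return nil, err
	}
	if err := <-errc; err != nil {
		return nil, err
	}
	return got, nil
}

func main() {
	got, err := roundTrip([]byte("hello from the page cache\n"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("client received %d bytes\n", len(got))
}
```

Either way the data never has to live on the language runtime's heap, which is why heap size says little about a broker like this.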

I like Go, but it isn't magical. It certainly doesn't offer an order of
magnitude memory improvement over Java. Go has some great bits, but if you're
thinking that you'll get an order of magnitude memory improvement, it's not
there. It might even use more memory than Java.

~~~
Thaxll
And yet it does. Try creating a simple API in Java vs Go and look at the
memory of both programs; there is no need for a white paper to see that the JVM
at least will allocate 100MB+ where Go will start below 10MB.
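
A minimal sketch of how one might check the Go side of this. `sysMiB` and `pingStatus` are hypothetical helpers, and `runtime.MemStats.Sys` is what the Go runtime has obtained from the OS, which is not the same number as the RSS that `top` or `ps` reports. No particular figure is claimed here; run it and compare with the JVM yourself.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"runtime"
)

// sysMiB reports how much memory the Go runtime has obtained from the OS,
// in MiB. This is the runtime's own accounting, not the process RSS.
func sysMiB() float64 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return float64(m.Sys) / (1 << 20)
}

// pingStatus starts a throwaway HTTP server with a single handler on a
// loopback port, calls it once, and returns the response status code.
func pingStatus() (int, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "ok")
	}))
	defer srv.Close()

	resp, err := http.Get(srv.URL)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

func main() {
	fmt.Printf("before serving: %.1f MiB from OS\n", sysMiB())
	status, err := pingStatus()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("status %d, after serving: %.1f MiB from OS\n", status, sysMiB())
}
```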

~~~
caconym_
He addressed this:

> Java has a bad reputation for memory usage among people who have never run
> substantial programs because a Java hello-world (or other small programs)
> will often allocate itself a decently sized heap. Java programs also take a
> while to start up because of the JVM, but that doesn't make Java slow.

~~~
Thaxll
In a world where we break down monolithic applications into smaller services,
it's clear that the JVM's memory footprint at startup is way bigger than that
of any other language. I'm not sure how one can argue about that.

Since those "micro services" have just a few features, their memory won't grow
that much over time, so the startup memory is an important factor.

~~~
caconym_
I'm not a Java expert (or even a semi-frequent user of it) but I think it
would be pretty crazy if you could not override the default initial heap size
and, indeed, it seems that you can:
[http://stackoverflow.com/questions/1951347/xms-initial-
heap-...](http://stackoverflow.com/questions/1951347/xms-initial-heap-size-or-
minimum-heap-size)

------
manigandham
NATS (and NATS Streaming) is a good implementation of fast messaging with
streaming/persistence built in Go without the usual Kafka issues:
[http://nats.io/](http://nats.io/)

Also AMPS for the commercial software version:
[http://www.crankuptheamps.com/](http://www.crankuptheamps.com/)

~~~
devoply
Biggest problem that I have with NATS and why I can't use it is that
configuration is all done in config files. So say you want to add a new user
during operation, too bad. There is an issue about trying to add this sort of
support and they are resistant to it the last time I checked. They still want
to use configuration and maybe provide a hot-reload. But seriously, you need a
run-time API that you can use to modify access to make it useful to me.

------
lngnmn
I do remember that a straightforward C++ rewrite of Cassandra performed ~10
times better.

I imagine that a Go implementation would need orders of magnitude fewer lines
of code and resources, and be three to four times more performant.

~~~
valarauca1
Go doesn't use fewer resources than Java (Source: Google [1]).

Go is only _marginally_ (~10%) faster than Java [2]

[1] [https://days2011.scala-
lang.org/sites/days2011/files/ws3-1-H...](https://days2011.scala-
lang.org/sites/days2011/files/ws3-1-Hundt.pdf)

[2]
[https://benchmarksgame.alioth.debian.org/u64q/go.html](https://benchmarksgame.alioth.debian.org/u64q/go.html)

~~~
lngnmn
> Go doesn't use fewer resources than Java

It obviously does. Java's string and object representations, JNI coercions,
necessary copying of buffers, common aliasing bugs in code which affect GC,
bloated dependencies, etc.

Knowledge of some principles protects one from being distracted by noise.

~~~
magic_quotes
> common aliasing bugs in code which affect GC

Care to elaborate?

------
fooyc
If at the same time we could move some responsibility from the client to the
server, it would be awesome.

Kafka gives so much responsibility to the client that most implementations are
incomplete and don't cope well with state changes (e.g. adding or removing a
node, migrating partitions to another node, etc.).

So, unless you are using Java and the official client, you don't benefit from
all of Kafka's goodness, fault tolerance and scaling abilities.

------
jpgvm
I hope there will be a static "discovery" backend that doesn't use Serf,
preferably just config file or similar to set/seed the Raft peers and perhaps
commands to add/remove peers from Raft.

Good to see more users of hashicorp/raft though, it's definitely the
implementation that is looking to have the better long term future.

------
tcbawo
Since reading through the source of the Kafka server (and several client
implementations), I've been intrigued by the idea of reimplementation using
cooperative multitasking.

Kafka clients tend to be very resource intensive. It would be interesting to
support a smaller/scalable footprint and a lower latency mode of operation.

------
almost_usual
I'm guessing Zookeeper was dropped because Raft was implemented here. Have you
experimented with Jepsen to see how this handles network partitions?

------
bsg75
What are the downsides to Zookeeper, or the motivations for keeping it out of
the stack?

------
tonyedgecombe
Yet another open source project where the front page doesn't tell me what it
does.

------
neeleshs
Is it wire compatible with Kafka?

~~~
tcbawo
And, which version(s) of the Kafka protocol?

------
dpkp
Fantastic project -- I've been waiting for someone to tackle this!

