
From Kafka to ZeroMQ for real-time log aggregation - azaras
http://tomasz.janczuk.org/2015/09/from-kafka-to-zeromq-for-log-aggregation.html
======
dave_ops
I don't really understand the heavy emphasis on "real-time" here.

I mean it's log/event aggregation for ops insight. Unless the whole system is
some tightly coupled feedback loop into an unsupervised machine learning model
where the whole thing has actual hard-real-time requirements (something which
might well be impossible to build), then there's no possible way that having a
second or two delay, between a message being created and when you can actually
see it, can possibly matter.

I mean you don't have someone with instantaneous reflexes and resolution
ability sitting there 24/7 with their eyes peeled as a stream of thousands of
log messages flies by.

The whole premise seems spurious. Is it really necessary for every startup and
their uncle to delude themselves into thinking that their use case is "mission
critical" or "carrier grade" or whatever?

Auth0 provides basically "login as a service". It's not like they're managing
the access control to nuclear launch codes or something.

Unless some medical device manufacturer was stupid enough to make a critical
surgical assistance device require an internet connection, a WAN round-trip on
unreliable networks, and reliance on a 3rd party service in order to start
operating it... how can this service being down possibly be anything more than
an annoyance? What's the worst possible scenario? A session has to be rebuilt?
A user has to make an extra login attempt?

By their own admission the service has gone down already due to the old system
architecture. How many babies died?

~~~
woloski
The article is talking about webtask.io, the underlying engine for sandboxed
code execution used by Auth0 to allow customers to extend the platform
with arbitrary nodejs code. In such a system there is a need to access real-
time logs (think tail -f) while you are debugging your stuff.

Also, the emphasis is not just on the real-time aspect. The article mentions
the issues with Kafka for HA.

~~~
cma
Real time has a technical meaning in software and that's what he is referring
to: [https://en.m.wikipedia.org/wiki/Real-
time_computing](https://en.m.wikipedia.org/wiki/Real-time_computing)

Live updated logs meant for human readability shouldn't need anything hard or
soft real-time in the technical sense.

~~~
rusanu
Immediate (in the human "immediate" sense) visibility of a change via what is
basically an ETL pipeline absolutely meets the technical meaning of "near-
real-time data processing" as defined in that very link. The article meets
"real-time ETL" in a very technical sense.

------
haddr
Very interesting article! I have worked with both Kafka and ZeroMQ, and both of
them are really useful, but they are not interchangeable when it comes to
designing your solution. You'd better know which one to pick for the job.

Zookeeper is a troublesome piece of software, especially when running on
highly loaded clusters. I have experienced some pretty weird stuff with Kafka +
Zookeeper, where reconnection and rebalancing of topics could break
availability quite dramatically.

Another thing that was not mentioned in the article is that Kafka guarantees
the order of messages, while ZeroMQ doesn't. This is certainly a
showstopper for event sourcing applications.

On the other hand, ZeroMQ will happily drop messages when the pipeline
is not balanced or when there is a spike in the data load. Also, better set
a sane high water mark, or you will end up consuming all your memory and
probably getting killed by the OOM killer.
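To make the drop behavior concrete, here is a toy, stdlib-only model of a
bounded pipe with ZeroMQ-style high-water-mark semantics (the class name and
numbers are invented for illustration, this is not libzmq's actual internals):
once the buffer is full, new messages are silently dropped.

```python
from collections import deque

class BoundedPipe:
    """Toy model of a pipe with a high water mark: when the buffer is
    full, *new* messages are dropped, which is the publisher-protecting
    behavior a PUB socket exhibits at its HWM."""
    def __init__(self, hwm):
        self.hwm = hwm
        self.buf = deque()
        self.dropped = 0

    def send(self, msg):
        if len(self.buf) >= self.hwm:
            self.dropped += 1   # silently drop, as at a socket's HWM
            return False
        self.buf.append(msg)
        return True

    def recv(self):
        return self.buf.popleft() if self.buf else None

pipe = BoundedPipe(hwm=3)
for i in range(5):
    pipe.send(i)
print(pipe.dropped)    # 2 -- messages 3 and 4 never made it in
print(list(pipe.buf))  # [0, 1, 2] -- the oldest survive
```

If nobody drains the pipe fast enough, everything past the HWM vanishes,
which is exactly why picking a sane HWM matters.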

Anyway, while both are good solutions, be careful not to be misled by the
advertising tone of both projects. Neither of them is a silver bullet...

~~~
statusgraph
Note that Kafka only guarantees message ordering per-partition.
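That per-partition guarantee is usually enough, because producers route by
key: the partitioner hashes the message key to pick a partition, so all
messages with the same key land on the same partition and keep their relative
order. A sketch (the md5-based hash below is illustrative only; Kafka's
default partitioner actually uses murmur2):

```python
import hashlib

def partition_for(key: bytes, num_partitions: int) -> int:
    # Illustrative hash-based partitioner, not Kafka's real one.
    return int.from_bytes(hashlib.md5(key).digest()[:4], "big") % num_partitions

# Events keyed by user: same key -> same partition -> ordered per user.
events = [(b"user-1", "login"), (b"user-2", "login"), (b"user-1", "logout")]
partitions = {}
for key, event in events:
    partitions.setdefault(partition_for(key, 8), []).append(event)
```

Ordering across different keys (i.e. across partitions) remains undefined,
which is the caveat being pointed out.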

------
stephen
Perhaps obvious/naive, but Amazon Kinesis is a great "no sysadmin required"
Kafka clone that we've had a lot of success with.

Per Zookeeper, I'm sure it's better than each system trying to rebuild
consensus on their own, but I am loathe to ever be responsible for running a
cluster.

I'm surprised Amazon doesn't have Zookeeper-as-a-service, or perhaps even
better, a Zookeeper-as-a-service facade that actually uses Dynamo behind the
scenes.

~~~
erikcw
What's been your experience with regard to producer latency with Kinesis? My
limited experimentation with it so far seems to point to that as a potential
pain point for high-throughput, low-latency systems.

~~~
tylertreat
In my experience, Kinesis' performance makes it a nonstarter for low-latency
systems and use cases with many consumers. Shards are limited to 5 reads/sec,
so consumers are heavily throttled. Producers also have significant put
latency. Kafka's latency and throughput are OOM better but of course come with
the operational overhead. Depends on what your needs are, but Kafka is a
better choice for "real-time" systems.

~~~
stephen
Interesting; I didn't know that Kafka had order-of-magnitude-better latency.
Thanks for the data point.

I wonder if this is something inherent to Kinesis's design, or if Amazon will
magically make it faster at some point in the future. Do you have a
suspicion/indication either way?

~~~
tylertreat
Well, for one, there is no streaming socket API, just HTTP. I suspect the 5
reads/sec limit on shards is mainly due to multitenancy, but I could see this
increasing in the future. Amazon has already improved the latency of Kinesis a
fair amount, but it still has a ways to go before it matches Kafka.

We also have cases where we need many consumers reading a stream, so having
that effectively limited to 5 concurrent consumers is a pretty tough
limitation. Their solution to this is to just have consumers sleep on a failed
poll, but that doesn't really help if you want to scale out your ingestion.
You don't have that problem with Kafka.

------
philsnow
It looks like they kept the zookeepers on the same machines as the kafka
brokers. Why not split them up and have a 5-node zookeeper cluster and
decouple the number of kafka nodes from the number of zookeeper nodes
(especially if you're virtualizing these machines already)?

~~~
texthompson
That's a great idea, because the number of Kafka nodes scales with the size of
your data. Your Zookeeper nodes don't need to scale along with your data.

------
jacques_chester
Given that they consider durable logging to be a key non-functional
requirement, I'm not sure how ditching durability is a win.

Single ordered delivery is hard. Really hard. It's easier to allow multiple
delivery and/or unordered delivery -- if your log entries are given various
correlating UUIDs and signed, this makes the whole problem about fifty
hojillion times easier.

Loggregator[1] and Doppler[2] use UUIDs to identify request, app and agent,
which makes it easier to allow logs to arrive out of order from multiple
sources and then be reassembled.

One thing that helps a _lot_ is to separate metrics from logs. Logging
frameworks get overloaded into metrics systems. Logs are useful if you are
recording essentially unique events ("I started at ...", "there was a request
for /foo at..."). They are wasteful of high QOS resources if you're looking at
_statistical_ data in which _fine-grained event identity isn't relevant_
("57Mb is used by this process", "this request took 100ms").

For metrics it is better to use lossy sampling and derive a statistical view
of goings on, rather than trying to capture every single data point.

A log entry should be like a page in your personal diary on your birthday. A
metric is a phone survey asking your age. If you don't answer the survey, the
data is still useful even with standard error.
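The lossy-sampling idea can be sketched in a few lines (the function name and
the 10% rate are made up for illustration): keep each metric event with
probability p, then scale back up when reporting.

```python
import random

def sample(events, p=0.1, seed=7):
    """Keep each metric event with probability p; estimate the true
    total by dividing the kept count by p."""
    rng = random.Random(seed)
    kept = [e for e in events if rng.random() < p]
    return kept, len(kept) / p

# 10,000 "this request took X ms" data points -> keep roughly 1,000,
# yet the estimated total is still useful, with standard error, just
# like the phone survey in the analogy above.
kept, estimate = sample(range(10_000), p=0.1)
```

The diary entries (logs) still get full-fidelity delivery; only the survey
answers (metrics) are sampled.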

Edit: I forgot my usual disclaimer that I work for Pivotal Labs, a division of
Pivotal, which is the leading contributor of engineering effort on Cloud
Foundry.

[1]
[https://github.com/cloudfoundry/loggregator](https://github.com/cloudfoundry/loggregator)

[2]
[https://github.com/cloudfoundry/loggregator/tree/develop/src...](https://github.com/cloudfoundry/loggregator/tree/develop/src/doppler)

~~~
cwp
Read more closely. Durability was NOT a requirement.

    
    
        Since we've decided to scope out access to historical logs 
        from the problem we were trying to solve and focus only on real-time log 
        consolidation, that feature of Kafka became an unnecessary penalty without
        providing any benefits.

~~~
ZielenMiejska
And that puzzles me: how useful is a logging solution which can lose log
messages by design? In an environment where "babies will die"? Maybe there is
durable logging somewhere else? What am I missing here?

~~~
vidarh
Consider the situations where it's likely to lose log messages:

When the system is overloaded.

Exactly the type of situation where you _don't_ want to slow the system down
further by spending resources on trying to let non-essential services survive.

Presumably their tradeoff is that it's more important for the system to remain
available than for every log message to be delivered. Then _secondly_ you try
to deliver log messages with as high reliability as possible.

Often it is better to design for non-essential components to fail early, or
at least prevent their resource usage from escalating and dragging down other
parts of the system (in this case, fixed buffers in 0MQ let them isolate load
in one part of the system by simply locking the rate they drain the buffers at
below a suitable threshold that's normally fast enough).

~~~
acconsta
The problem is that overloaded 0MQ pub sockets drop exactly the data you don't
want to drop — the newest data. Maybe they handle that in the application
layer, but it's not clear.

~~~
PieterH
It's a problem in theory, not in practice. The notion of throwing away old
data (or coalescing new with old) is relevant to slow queues. It turns out to
be irrelevant with ZeroMQ.

First, message rates with ZeroMQ are often hundreds of thousands per second.
The architecture must be designed so that no buffers, anywhere, overflow. If
they do, you have a problem, usually a slow subscriber. Throwing out older
data doesn't cure the problem. What ZeroMQ does is punish the slow subscriber
by dropping so that the publisher doesn't crash. It's not recovery for the
subscriber, it's protection for the publisher (and thus for other
subscribers).

Second, trying to delete old messages is complex and sometimes impossible (if
they're already in system buffers). The design of ZeroMQ's internal pipes has
one writer and one reader, without locks. For the writer to mess with the
reader would slow down everything and introduce risk of bugs. Dropping new
incoming data is the only way anyone has ever found to keep things running at
full speed.

These design choices were often delicate and counter-intuitive, yet they have
turned out to be mostly accurate.
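For intuition, here is a minimal single-writer/single-reader ring buffer in
the spirit of the pipe design described above (pure Python standing in for
lock-free C++; the real libzmq pipes use atomics and batching). The writer
only advances `tail` and the reader only advances `head`, which is why
dropping *new* data when full is the only cheap option: evicting old data
would mean the writer mutating the reader's side.

```python
class SpscRing:
    """Illustrative single-producer/single-consumer ring buffer."""
    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.head = 0  # advanced only by the reader
        self.tail = 0  # advanced only by the writer

    def push(self, item):
        if self.tail - self.head == len(self.buf):
            return False  # full: drop the new item, never touch head
        self.buf[self.tail % len(self.buf)] = item
        self.tail += 1
        return True

    def pop(self):
        if self.head == self.tail:
            return None  # empty
        item = self.buf[self.head % len(self.buf)]
        self.head += 1
        return item
```

Because each index has exactly one mutator, no locks are needed, at the cost
of the drop-newest policy discussed above.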

~~~
acconsta
>trying to delete old messages is complex and sometimes impossible

I'm not familiar with ZeroMQ's data structures, so forgive my ignorance. At
the high water mark, why can't the consumer throw away old messages instead of
the producer throwing away new messages? There are no locks or bugs — that's
what the consumer does anyway.

I'm not saying that should be the default behavior, but perhaps an option.
It's cleaner than silently dropping new messages, then sending an entire
buffer of old messages if the subscriber recovers.
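Drop-oldest is indeed easy to express at the application layer even if it is
awkward inside ZeroMQ's pipes; in Python, a `deque` with `maxlen` gives
exactly that policy. (This is an application-level sketch, not a ZeroMQ
option, though if memory serves ZeroMQ's `ZMQ_CONFLATE` socket option is a
related keep-only-the-latest-message setting.)

```python
from collections import deque

# Application-level "drop the oldest" buffer: appending to a full
# deque with maxlen discards items from the opposite end.
drop_oldest = deque(maxlen=3)
for msg in range(5):
    drop_oldest.append(msg)

print(list(drop_oldest))  # [2, 3, 4] -- newest kept, oldest discarded
```

A subscriber can layer this over a raw socket to get the "fresh data wins"
behavior without any change to the publisher.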

------
Xorlev
Strange, we've had exactly the opposite experience with Kafka. It's never gone
down for us unless we've done something dumb (e.g. running it out of disk). We
punish Kafka and regularly publish and consume massive amounts of data to/from
it. It's the one thing in our infrastructure I haven't been able to kill. Even
when we ran Kafka out of disk, _it still kept running for quite a while_.

Then again, I've never tried running stateful services on Docker. Seems like a
bad time.

> Kafka/Zookeeper combo struggled to rejoin the cluster

We've had issues with the broker properly re-registering with Zookeeper, but
nothing that wasn't solved by stopping the broker and starting it again after
the Zookeeper session timeout elapsed.

> Large companies like Netflix may be able to spend some serious engineering
> resources to address the problem or at least get it under control.

We're a company of 70, of which maybe 15 qualify as backend engineers/ops.
Never had an issue managing Kafka. We've accidentally deployed Kafka with
insufficient heap space and it kept on ticking.

> As a result of this difference, despite Kafka being known to be super fast
> compared to other message brokers (e.g. RabbitMQ), it is necessarily slower
> than ZeroMQ given the need to go to disk and back.

Writes do, certainly. But you don't have to "go back" to disk with Kafka
during steady-state operation. Kafka writes segments to disk, and these end up
in the pagecache. Reads of a segment are served directly from RAM, which is
excellent for the fanout consumer case. A healthy Kafka cluster usually has
little disk read IO (unless a consumer is catching up or batch jobs are
reading older data), but lots of network IO out.

Not saying that Kafka was the right solution here, but it seems like
guaranteed delivery and archival of logs is vastly more important than "real-
time" logs. But what do I know? Their design of the Kafka-based system seems
sketchy: colocating brokers with workers is awfully strange.

If logging went down, do babies die? If so, ZMQ seems like the wrong solution.
If they just wanted to publish logs in realtime only, cool. I'm sure ZMQ will
serve them well as it's a better fit for realtime non-durable publishing of
data.

That said, I'd be interested in them publishing more details about their Kafka
outages. The fact that we've had such contrasting experiences points to an
interesting X factor in their setup that should be avoided by others.

~~~
Rapzid
"Why not both?" is what I was thinking. I agree with you, personally, about
attempting to durably transport and store logs. If Kafka were not real-time
enough for me, I would probably consider a system where I routed logs to Kafka
and then into a more real-time system.

------
gregwebs
Would be interesting to see if the situation improved with Consul or Etcd.
Here is a project that tries to make those look like ZooKeeper:
[https://github.com/glerchundi/parkeeper](https://github.com/glerchundi/parkeeper)

------
halayli
Good luck with ZMQ. I've had a pretty bad experience using ZMQ in production.

~~~
eikenberry
At least it isn't Zookeeper. Our Zookeeper cluster was the biggest PITA I've
ever dealt with.

~~~
erichmond
Could you explain what specifically goes wrong with ZK clusters? I keep
hearing how people have issues with it, but we've been running one for ~4
months now with absolutely 0 issues.

------
grosskur
Could this be a good use case for NSQ? It seems simpler to operate than Kafka,
and somewhat more durable than ZeroMQ.

[http://nsq.io/](http://nsq.io/)

------
sargun
ZMQ presents its own set of challenges, because it's typically implemented as
a linked-in native library, and this presents some interesting safety
problems. In a previous life, we tried out ZMQ as a replacement for RabbitMQ
for a use case that didn't actually need long-term persistence (real-time
notifications), but we found the Java and Python ZMQ bindings to provide
exciting, hard-to-debug memory leaks, as well as plain segfaults.

But for some time I've been thinking about this, because having a smart
client library does allow you to do really neat things efficiently (see Fast
Paxos: [http://msr-waypoint.com/pubs/64624/tr-2005-112.pdf](http://msr-waypoint.com/pubs/64624/tr-2005-112.pdf),
Chubby: [http://static.googleusercontent.com/media/research.google.com/en//archive/chubby-osdi06.pdf](http://static.googleusercontent.com/media/research.google.com/en//archive/chubby-osdi06.pdf)).
I think the ideal model is to split up responsibilities in a distributed
system between nodes that are smart clients and the rest of the system. In
such a split, linked-in drivers can continue to utilize simple protocols like
RPC + Protocol Buffers, and the actual system can use more complex,
higher-level semantics, like dual-dispatch to smooth over latency.

For whatever reason, people are more comfortable with running a totally
foreign binary library in their same process / memory space, as opposed to
running a binary and depending on it. I realize that operationally it's
somewhat more complex to do the latter, but that's a question about maturity
of methods. We've seen the Go community adopt this approach somewhat with
things like the Packer plugin architecture
([https://www.packer.io/docs/extend/plugins.html](https://www.packer.io/docs/extend/plugins.html)).

------
fidget
Wonder what the advantage over just plain syslog is

~~~
hardwaresofton
IIRC, syslog is slower than file I/O, so I imagine one of the main benefits is
speed (ZMQ is completely in memory)

~~~
lobster_johnson
Rsyslog has all sorts of input and output drivers, and can be set up without a
spool directory, so that it's all in-memory. Don't know about syslog-ng, I'm
sure it has similar support.

~~~
hardwaresofton
I stand corrected then! I have to say I don't have a huge amount of experience
with syslog, but after a little bit of research it looks like it does support
mostly in-memory (with some sync to disk) logging.

~~~
lobster_johnson
Syslog is just a network protocol. There are a whole bunch of implementations;
there's nothing in the protocol that says it has to end up on disk.

~~~
fidget
> Syslog is just a [family] of network protocols[, most informally specified.]

syslog-ng (my syslog of choice) has 3 'syslog' network protocols: `tcp`/`udp`,
`network`, and `syslog`. Now let's play 'match the syslog-ng name to the RFC
(or lack thereof)'!

~~~
lobster_johnson
Syslog didn't start out as a standard, and the RFCs basically just try to
formalize what was already implemented in OSes. It's a big mess. The protocol
itself is not... great.

------
hardwaresofton
Great write-up, thanks for explaining the reasoning behind the switch, and
including an honest comparison of Kafka and ZeroMQ (obviously they can't be
compared directly as they have such different guarantees and features/use
cases).

------
pbhowmic
Odd that nobody mentioned RabbitMQ as an alternative. Why is that?

~~~
jack9
It's orders of magnitude slower. Might as well just use AWS sqs.

If you want to do throughput of 10k/s per machine, do yourself a favor and use
ZeroMQ. Kafka is a nightmare when your topology needs to change. ZMQ is just
connecting pipes and splitting throughput.

~~~
CarlHoerberg
10k/s is no problem for RabbitMQ, even on aws t2 machines. Where did you get
that from? 50k msgs/s is no problem on an aws c3.xlarge instance (single
queue, auto-ack and transient messages).

~~~
jack9
> 10k/s is no problem for RabbitMQ, even on aws t2 machines

Sending them from a machine to itself, you can do even better than that.
Loopbacks aren't useful metrics for message passing.
[http://bravenewgeek.com/tag/rabbitmq/](http://bravenewgeek.com/tag/rabbitmq/)
is the most recent attempt to benchmark the various solutions (that I have
found), and my own testing results in slightly less than the results
shown... which I can only attribute to unfamiliarity with RMQ, Kafka, etc. In
the end, ease of maintenance only speaks to the weaknesses of solutions that
need tweaking for specific topologies.

------
siliconc0w
We use
[https://github.com/gliderlabs/logspout](https://github.com/gliderlabs/logspout)
streaming to Elasticsearch, but this does have a few seconds of latency.

Somewhat off topic, but aren't you worried about using Docker to sandbox user
scripting? Especially being a security company.

------
rryan
To the author -- you should give credit to the artist whose artwork you used
:).

[http://mikeangel1.deviantart.com/art/Metamorphosis-Franz-
Kaf...](http://mikeangel1.deviantart.com/art/Metamorphosis-Franz-
Kafka-333440905)

------
jkarneges
Nice architecture. The proxy behavior sounds very similar to Pushpin [1],
which supports listening from a ZeroMQ SUB socket and pushing out via HTTP
streaming.

[1] [http://pushpin.org](http://pushpin.org)

------
johnflan
It's strange to me that dropping log messages is acceptable, particularly when
failure occurs in the cluster. These log messages would be important for
auditing the system or customer actions.

------
harshulj
It would be helpful if Kafka came up with a pluggable system for storing
configuration instead of Zookeeper. In that case, problems such as the one you
faced could be solved.

------
polskibus
I was wondering: how does your offering differ from AWS Lambda? Have you
checked whether Lambda provides any logging facility?

------
04rob
Does the realtime requirement rule out something like logstash for this
situation?

~~~
brianwawok
I don't think they are using real-time in the same way as computer-science
real-time. I don't see how a second of delay from Logstash would matter.

~~~
illumen
It's a nice development feature to see console.log() output very close to when
it happens. But I'd probably be able to live with a 1.0 second delay. I've
worked on servers with a bigger delay. It's not as nice as a 0.1 second delay
though.

They're probably trying to optimise for making developers happy.

------
zemo
so the application has to be aware of the log publishing strategy, and the
application itself writes its log output directly into a zmq socket?
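Not necessarily: one common way to avoid making application code
transport-aware is to hide the socket behind a standard logging handler. A
hedged stdlib sketch (the handler class is invented for illustration, and a
plain list stands in for what would be the ZMQ PUB socket):

```python
import logging

class TransportHandler(logging.Handler):
    """Hypothetical handler: the app logs normally via the logging
    module; only this handler knows about the transport. Here `sink`
    is a list, but it could wrap a PUB socket's send()."""
    def __init__(self, sink):
        super().__init__()
        self.sink = sink

    def emit(self, record):
        self.sink.append(self.format(record))

sink = []
log = logging.getLogger("webtask-demo")
log.addHandler(TransportHandler(sink))
log.setLevel(logging.INFO)
log.info("request handled")  # app code never mentions any socket
```

With that indirection, swapping ZMQ for syslog or Kafka is a one-line change
in handler setup rather than a change to every call site.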

