
Distributed Systems with ZeroMQ - jpro
http://java.dzone.com/articles/distributed-systems-zeromq
======
jandrewrogers
To make a distinction that the article does not (but should), ZeroMQ is a
network transport abstraction layer. For that reason, and despite the title,
it is _not_ useful for many distributed systems. For non-trivial distributed
systems, behaviors that are correct for network transport use cases that are
built into ZeroMQ are pathological for distributed system use cases.

In the specific case of ZeroMQ, there is a deep underlying assumption that
logical queues are fundamentally independent things. As a corollary,
scheduling what work is done on which queues is of no consequence as long as
the contract of the individual queues is upheld.

For any non-trivial distributed system, the above assumption is not true.
Correct and scalable scheduling of operations is a function of the current
status of all logical queues visible to a process. The processing priority of
one queue is dependent on the current status of all other queues, which can
change from operation to operation. Distributed systems are cooperatively
scheduled and much of the self-balancing behavior of good distributed system
designs comes from this adaptive scheduling behavior. Unfortunately, systems
like ZeroMQ intentionally hide and encapsulate all of the properties of the
network transport that would be used to inform the scheduling of operations
over a set of logical queues. And if you assume logical queues are
fundamentally independent from a scheduling perspective then that is a good
design.

ZeroMQ is good for moving bits over a network but it is often not a correct
choice for distributed systems.

~~~
snprbob86
> For non-trivial distributed systems, behaviors that are correct for network
> transport use cases that are built into ZeroMQ are pathological for
> distributed system use cases.

I'm not sure what you mean. Please provide one or more concrete examples.

> ZeroMQ is good for moving bits over a network but it is often not a
> correct choice for distributed systems.

It seems to me that distributed systems are _defined_ by moving bits over a
network.

You're right that there is some confusion regarding what ZeroMQ actually is.
This is in no small part due to its unfortunate name: ZeroMQ isn't a queue.
It is implemented with queues, but that's because it sends and receives
discrete messages instead of bytes, unlike its underlying protocols. The
underlying transport protocols use "buffers" to store bytes on their way from
your application to their destination. When you have a buffer of _messages_ ,
that's called a queue!

ZeroMQ is a network protocol for messaging. If you need custom scheduling,
load balancing, etc, you can send and receive control messages on a secondary
channel. The ZeroMQ Guide [1] is extremely enlightening in this regard. It's
worth reading even if you never use ZeroMQ because all of the underlying
principles apply to any network transport.

[1] <http://zguide.zeromq.org/page:all>

~~~
jandrewrogers
You are not understanding the nature of the problem. I have not only used ZMQ
extensively in distributed systems but have modified the internals for some
purposes. The problem is intrinsic to its design.

Robust, high-throughput distributed systems are built around the concept of
Nash equilibria. To ensure that, there is an optimal ordering to the set of
possible message operations over those queues. It gives them a priority and
you attempt to schedule operations in approximately optimal order. The
ordering is not fixed; every message operation on a queue may have side
effects that alter the total ordering of message operations for any process
associated with that queue. Consequently, the schedule of operations over the
entire set of message queues must be adaptively and dynamically imposed in
order to approximate a good Nash behavior.

In real software, this is usually just a simple state machine that adaptively
prioritizes operations based on the aggregate state of those messages and
queues. Also, we usually approximate ideal scheduling for performance reasons
(and good approximations are usually good enough). Priority-modifying side
effects propagate throughout the distributed system; local modifications in
priority provide stabilizing negative feedback to the global behavior. (See
also: greedy routing optimization problems etc. This is a pretty complex area
of mathematics.)
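
As a rough illustration of the kind of state machine described above, here is a minimal single-threaded sketch. Nothing in it comes from ZMQ or the article: the `AdaptiveScheduler` class, its "share of all buffered work" priority policy, and the `fan_out` / `noop` handlers are hypothetical names chosen for the example. The only point is that the priority of each queue is recomputed from the state of all queues after every operation, because an operation can have side effects on other queues.

```python
from collections import deque


class AdaptiveScheduler:
    """Cooperative scheduler over a set of named queues (illustrative only)."""

    def __init__(self, names):
        self.queues = {name: deque() for name in names}

    def enqueue(self, name, handler):
        self.queues[name].append(handler)

    def priority(self, name):
        # Hypothetical policy: a queue's urgency is its share of all buffered
        # work, so it depends on every queue's state, not only its own.
        total = sum(len(q) for q in self.queues.values())
        return len(self.queues[name]) / total if total else 0.0

    def step(self):
        ready = [name for name, q in self.queues.items() if q]
        if not ready:
            return None
        # Priorities are re-evaluated on every step, never cached.
        name = max(ready, key=self.priority)
        handler = self.queues[name].popleft()
        handler(self)  # side effects may change the ordering for the next step
        return name

    def run_until_idle(self):
        order = []
        while (name := self.step()) is not None:
            order.append(name)
        return order


def noop(scheduler):
    pass


def fan_out(scheduler):
    # Side effect: one operation on "a" creates two operations on "b".
    scheduler.enqueue("b", noop)
    scheduler.enqueue("b", noop)


scheduler = AdaptiveScheduler(["a", "b"])
scheduler.enqueue("a", fan_out)
scheduler.enqueue("a", noop)
scheduler.enqueue("b", noop)
order = scheduler.run_until_idle()
```

In this run the first operation on `a` pushes work onto `b`, so the scheduler switches to `b` before finishing `a`. A policy that treats the queues as independent (for example, draining `a` first) would never make that switch.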

Consequently, there are some popular design choices for network server systems
that tend to work poorly for distributed systems, either because they prevent
the creation of an effective operation scheduler or because they hinder the
propagation of negative feedback that prevents pathological interactions.

Giving every socket its own OS thread or similar is right out. That lets the
operating system decide when operations are scheduled and the operating system
has no concept of prioritization in the sense that is important for
distributed systems. Trying to use thread interlocks to impose an order leads
to pathological context-switching storms. High-performance schedulers are
cooperative. (You can design a schedule with multiple threads that executes
with minimal interlocks, but it is not trivial.)

Unlimited buffering and hard-limited buffering are also poor abstractions
because they obscure the state of a queue. Robust schedulers often prioritize
queues based in part on how servicing them reduces total buffering. For some good designs,
you can prove the existence of a reasonable upper bound on the total buffers
required for a node or queue with no hard limits on any particular buffer. It
is an elegant side effect of scheduling interactions in a distributed system
but requires a scheduler that is aware of many aspects of buffer state.
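
To make the buffering point concrete, here is a small sketch under stated assumptions: the scheduler can see, for the operation at the head of each queue, how many bytes running it frees and how many bytes it adds to outbound buffers. The `PendingOp` type, the byte counts, and the greedy "largest net reduction first" rule are all made up for illustration. A real scheduler would re-evaluate after every operation rather than sort once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PendingOp:
    queue: str
    frees: int     # bytes released from this node's buffers when the op runs
    produces: int  # bytes the op adds to this node's outbound buffers

    @property
    def net_reduction(self):
        return self.frees - self.produces


def peak_buffering(ops, start):
    """Highest total buffer level seen while running ops in the given order."""
    level = peak = start
    for op in ops:
        level += op.produces - op.frees
        peak = max(peak, level)
    return peak


def buffer_aware_order(ops):
    # Greedy assumption: run the op that shrinks total buffering the most first.
    return sorted(ops, key=lambda op: op.net_reduction, reverse=True)


# One pending operation at the head of each of three queues (invented numbers).
heads = [
    PendingOp("a", frees=10, produces=40),
    PendingOp("b", frees=50, produces=5),
    PendingOp("c", frees=20, produces=20),
]
start = sum(op.frees for op in heads)  # bytes buffered before anything runs

arrival_peak = peak_buffering(heads, start)
aware_peak = peak_buffering(buffer_aware_order(heads), start)
```

With these numbers, running the operations in arrival order pushes total buffering above its starting level, while the buffer-aware order never exceeds it. No per-queue hard limit is involved; the bound falls out of the ordering, which is only possible if the buffer state is visible to the scheduler.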

In principle, it should be possible to design a message queue abstraction
designed for the requirements of distributed systems. It would not be as
simple as ZMQ though. ZMQ is not designed to support that use case, so in
cases where I am building a non-trivial distributed system protocol, we build
on top of epoll directly. It is not a knock against ZMQ; it was designed for
other purposes.

~~~
snprbob86
Very few distributed systems have the communication volume and patterns
necessary to justify explicit scheduling policy beyond round-robin and
fair-queuing. In fact, many distributed systems get by just fine on HTTP using
round-robin load balancing and even without keep alive!

I've worked with some pretty big systems that use a model that looks an awful
lot like 0mq and perform splendidly. I've also worked on several smaller
systems (games) that needed to use UDP and all kinds of custom semantics for
reliability to accomplish their task. I'd imagine that if such a thing happens
when you scale down, what you're saying is possible when you scale up.

To anyone reading this thread: If you don't understand all the complexities of
jandrewrogers' response, just assume YAGNI. In that case, if your problem
lends itself to a stream oriented protocol, then ZeroMQ is a great choice. You
can deal with the other complexities if the need arises.

~~~
jandrewrogers
All the systems you are talking about are so loosely coupled that most people
do not classify them as "distributed systems", certainly not in the computer
science sense. The complexity is not in the number of machines but in the
interactions and coupling of individual nodes required by basic operation of
the cluster.

Lots of things work very well with ZMQ, just not distributed systems. If your
application scales efficiently on ZMQ then it is a giant pile of computers and
not a "distributed system". I've seen systems that scale to thousands of nodes
and also seen systems that won't scale past a few machines on ZMQ. System
architects should be able to distinguish the two cases without standing up a
cluster to see if it fails. All it requires is one part of your system to not
fit the ZMQ case for things to start falling apart. That this is an expected
result is not exotic computer science.

~~~
snprbob86
> If your application scales efficiently on ZMQ then it is a giant pile of
> computers and not a "distributed system".

At this point, I'm invoking Poe's law.

------
KenCochrane
If you like this, you should check out ZeroRPC, it handles a lot of the
boilerplate code you would need to write by hand.

Links:
- <http://zerorpc.dotcloud.com>
- <https://github.com/dotcloud/zerorpc-python>

------
soravux
We are currently using ZeroMQ in our distributed task framework in Python,
SCOOP (<http://scoop.googlecode.com>).

ZeroMQ was chosen as the communication library because it simply works and
isn't bloated. No need to implement the state machines for common patterns in
our sockets; ZMQ does it, and fast. It doesn't replace a standard socket,
though; it only adds a layer of functionality over it.

While using it, we found some minor negative points, such as the delays needed
by the socket upon shutdown, which require sleeps between unit tests, or the
random port connector that is not random... But overall, ZMQ is a tool that
saved us much development time and should not be overlooked for distributed
systems.

------
nathancahill
Required reading for anyone learning Python

