
Easy cluster parallelization with ZeroMQ - mdup
http://mdup.fr/blog/easy-cluster-parallelization-with-zeromq
======
gh02t
Worth mentioning that the cluster tools in IPython implement basically the
same system as he ends up with, plus lots of other functionality and it is
using ZMQ as the backend as well. If you've got a small to medium sized task
(I'd say less than 500 nodes is a good number) to run on your cluster
ipcluster is pretty great. Even when I have a proper queuing system like SLURM
or PBS in place, I still often use ipcluster on top.

[http://ipython.org/ipython-doc/dev/parallel/](http://ipython.org/ipython-doc/dev/parallel/)

~~~
andreabedini
I'm curious how you use ipcluster on top of PBS.

~~~
gh02t
If you mean the practicalities of how to use it, they describe it a bit here:
[http://ipython.org/ipython-doc/dev/parallel/parallel_process.html#using-ipcluster-in-pbs-mode](http://ipython.org/ipython-doc/dev/parallel/parallel_process.html#using-ipcluster-in-pbs-mode)
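
For reference, PBS mode is mostly a matter of configuration. A sketch of what the relevant lines of `ipcluster_config.py` might look like, based on the launcher classes described in the docs linked above (the engine count is illustrative):

```python
# ipcluster_config.py (sketch) -- generated by `ipython profile create --parallel`
# Submit both the controller and the engines as PBS jobs.
c.IPClusterStart.controller_launcher_class = 'PBS'
c.IPClusterEngines.engine_launcher_class = 'PBS'
c.IPClusterEngines.n = 16   # number of engines to request
```

With that in place, `ipcluster start` submits the jobs through the batch system and you connect to the resulting cluster from a client as usual.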

If you mean how have I used it specifically, I've used it to do a variety of
things. Ranging from recreating what was described in the original post to
using it as an ad hoc MPI replacement. I find it's handy for managing jobs
that have some coupling (and so aren't easily suited to running via the
features of the batch system), but aren't tightly coupled enough to warrant
the suffering of having to write MPI-based code.

------
avargas
ZeroMQ is amazing. A few years ago I built a prototype project for a client
that was basically fail2ban, but scalable. It monitored nginx logs and
broadcasted some information, and workers did the rest. Most of the data was
kept in memory and passed around, with communication done via ZeroMQ. It was
built this way so that we could split the heavy-load components off the server
and into workers, and let the server simply do its job: tail nginx logs and
act on ban requests from workers. It was amazing. Sadly, I never completed it
or deployed it to a production environment, but in initial tests it
outperformed fail2ban by a lot.
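
That split maps naturally onto ZeroMQ socket pairs. A minimal sketch of the shape described, not the original code (the names, the inproc transport, and the "status 403 means abuse" heuristic are all illustrative): the server publishes log lines over PUB/SUB and collects ban requests over PUSH/PULL.

```python
import threading
import time

import zmq

ctx = zmq.Context.instance()

# Server side: broadcast log lines, collect ban requests from workers.
pub = ctx.socket(zmq.PUB)
pub.bind("inproc://logs")
pull = ctx.socket(zmq.PULL)
pull.bind("inproc://bans")

def worker():
    sub = ctx.socket(zmq.SUB)
    sub.connect("inproc://logs")
    sub.setsockopt(zmq.SUBSCRIBE, b"")       # receive every log line
    push = ctx.socket(zmq.PUSH)
    push.connect("inproc://bans")
    while True:
        line = sub.recv_string()
        if line == "STOP":
            break
        ip, status = line.split()
        if status == "403":                  # stand-in abuse heuristic
            push.send_string(ip)             # ask the server for a ban
    sub.close()
    push.close()

t = threading.Thread(target=worker)
t.start()
time.sleep(0.2)  # let the subscriber join; PUB drops messages otherwise

for line in ["1.2.3.4 200", "5.6.7.8 403", "9.9.9.9 403"]:
    pub.send_string(line)
pub.send_string("STOP")
t.join()

bans = []
while pull.poll(100):                        # drain pending ban requests
    bans.append(pull.recv_string())
pub.close()
pull.close()
print(bans)
```

Heavy analysis lives entirely in the workers; the server only tails and bans, which is what makes the design scale out.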

------
Merkur
I had fun reading - thx. I found the "requesting workers" solution
interesting. My first reflex to the round-robin argument was to reach for a
better load balancer, but the requesting worker makes that completely
unnecessary! I wonder what downsides such architectures may have?
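
For anyone skimming, the pattern under discussion can be sketched in a few lines of pyzmq (nothing here is from the post's code; inproc threads stand in for cluster nodes). Each worker explicitly asks for work over REQ/REP, so the dispatcher only ever hands a task to a worker that is actually free, instead of blindly round-robining:

```python
import threading

import zmq

NUM_WORKERS = 3
TASKS = list(range(10))

ctx = zmq.Context.instance()
rep = ctx.socket(zmq.REP)
rep.bind("inproc://tasks")        # bind before the workers connect

results = []                      # (worker_id, task); list.append is GIL-safe

def worker(wid):
    sock = ctx.socket(zmq.REQ)
    sock.connect("inproc://tasks")
    while True:
        sock.send(b"ready")       # explicitly request work when free
        task = sock.recv()
        if task == b"DONE":
            break
        results.append((wid, int(task)))
    sock.close()

threads = [threading.Thread(target=worker, args=(i,)) for i in range(NUM_WORKERS)]
for t in threads:
    t.start()

pending = list(TASKS)
done_sent = 0
while done_sent < NUM_WORKERS:
    rep.recv()                    # a free worker asked for a task
    if pending:
        rep.send(str(pending.pop(0)).encode())
    else:
        rep.send(b"DONE")         # nothing left: release this worker
        done_sent += 1

for t in threads:
    t.join()
rep.close()
print(sorted(task for _, task in results))   # every task done exactly once
```

One downside of the pattern: the dispatcher is a single point of failure and a potential bottleneck at very large worker counts, and each task costs a full request round-trip, which matters once tasks are short relative to network latency.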

------
nickpsecurity
Nice article. I hope to see a benchmark of it on a less parallel problem with
low-latency network hardware, maybe compared against vanilla MPI or another
parallel framework. We'd get to see what its real overhead is.

~~~
gh02t
It's a bit hard to compare to MPI, which is considerably lower level and is
oriented towards tightly coupled compute clusters. MPI is more about passing
around data structures and not so much about dispatch and execution of
independent tasks, which is more of a distributed computing type endeavor. You
can build something like this on top of MPI (and I did once as an experiment),
but it is tough to compare them directly since they are designed on different
assumptions.

~~~
nickpsecurity
Fair point. I've been out of cluster computing for years, so what do you think
would be comparable?

~~~
dalke
Somewhat tangential, but you may be interested in this essay on "HPC is dying
and MPI is killing it":
[http://www.dursi.ca/hpc-is-dying-and-mpi-is-killing-it/](http://www.dursi.ca/hpc-is-dying-and-mpi-is-killing-it/)
. There were many comments here on HN, at
[https://news.ycombinator.com/item?id=9335441](https://news.ycombinator.com/item?id=9335441)
. The essay also has a couple of followups. It really is an excellent essay.

The author suggests GASNet, Charm++, and several other things that would be
comparable to MPI.

~~~
nickpsecurity
That's an excellent essay. I agree with just about every point. I worked with
MPI over ten years ago. It seemed like an assembly language for cluster
computing: something for raw efficiency, or to hold us over until we got a
_real_ language. I was playing with Cilk and other parallel tools then.
_Much_ better, but with little adoption or optimization.

The examples given in Spark and Chapel show how much better it can be. Such
methods deserve much more investment. The bad thing is that the DOE labs are
clinging so hard to their existing tools when they're in the best position to
invest in new ones via the huge grants they have. They built supercomputing
before anyone heard of the cloud. Their wisdom, combined with today's
datacenter engineers, could result in anything from great leaps in efficiency
to something revolutionary. Yet they act elitist and value backward
compatibility over everything else. They're starting to look like those
mainframe programmers squeezing every ounce of productivity they can out of
COBOL.

I think the change will have to come from outside: a group that works both
sides, with funding to do new projects. They either (a) improve MPI usage for
new workloads or (b) get the alternatives up to MPI performance for HPC
workloads. Then they open the tool up to all kinds of new projects in both
HPC and cloud-centered research (esp. cloud compute clusters). Maybe the two
sides will accidentally start sharing with each other when contributing to the
same projects. Not much hope past that.

~~~
dalke
"They built supercomputing before anyone heard of the cloud."

Minor historical pontification on my side: The term 'cloud' is only the
current term for an old idea. Before cloud there was 'grid'. Before that was
the Amoeba OS and other distributed OSes. Long before that, a 1960s hope for
Multics was that it would be:

> ... a utility like a power company or water company. This view is
> independent of the existence or non-existence of remote consoles. The
> implications of such a view are several. A utility must be dependable, more
> dependable than existing hardware of commercial general-purpose computers. A
> utility, by its nature, must provide service on demand, without advance
> notice in most cases. A utility must provide small amounts of service to
> small users, large amounts to large users, within very wide limits. ...
> [http://www.multicians.org/fjcc3.html](http://www.multicians.org/fjcc3.html)

That's shortly after the CDC 6600, and before the DOE really got into the
supercomputing business.

In any case, I don't know the politics of supercomputing, and my experience in
that field is even older than yours. When I look at current cloud systems,
and try them out, I throw up my hands, because they don't handle the types of
architectures I use in my research. I do compute-heavy work on medium-sized
data, with algorithms that work best with stateful nodes in a boss/worker
system. Luckily for me, I don't need low latency, so building something on top
of ZeroMQ is good enough for what I want, and easy to do.

~~~
nickpsecurity
Exciting to see someone that remembers all those good, old systems! I've been
studying them in recent years to solve modern problems that... they already
solved. Yeah, some of the techniques go way back. To be honest, I've been
thinking of re-implementing a version of the Globe toolkit with modern
security engineering techniques to deal with the issues of our Internet apps.

What do you think of that? Worthwhile to leverage such an architecture?

~~~
gh02t
> Exciting to see someone that remembers all those good, old systems! I've
> been studying them in recent years to solve modern problems that... they
> already solved

I saw a presentation from one of the VPs of Cray at ORNL. He spoke on the
current wave of GPGPU and how similar it is to the old massively vectorized
architectures in supercomputers (in particular Cray ones, since he was from
Cray) from the late 70s through the early 90s. Apparently they've been able to
leverage a lot of their old code in the Cray compilers for their newer GPGPU
features.

~~~
nickpsecurity
Bam! Another example of people applying lessons of the past to make the
present so much easier. Glad for them. Unsurprising with Cray given they're
quite innovative and led the way in SIMD processing.

------
FullyFunctional
Great writeup. Perhaps slightly less clean, but you could cut the messages in
half by having the "available" message report the result as well (empty in the
initial case).

Is the worker idle between replying and receiving new work? In other words,
does ZMQ enable overlapping, say, always having two outstanding requests per
worker?
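
Both ideas can be combined in one hedged sketch (hypothetical names, inproc transport, not the post's code). The worker's "available" message carries the previous result, empty the first time; and because a REQ socket enforces strict send/recv alternation, the worker is switched to DEALER, which allows it to keep two requests in flight so new work is already queued when it finishes a task:

```python
import threading

import zmq

TASKS = [str(i).encode() for i in range(6)]
PIPELINE = 2                      # outstanding requests per worker

ctx = zmq.Context.instance()
rep = ctx.socket(zmq.REP)
rep.bind("inproc://work")

completed = []

def worker():
    sock = ctx.socket(zmq.DEALER)  # unlike REQ, permits several sends in a row
    sock.connect("inproc://work")
    # Prime the pipeline with empty results (the "initial case").
    for _ in range(PIPELINE):
        sock.send_multipart([b"", b"result:"])
    while True:
        _, task = sock.recv_multipart()   # empty delimiter frame, then payload
        if task == b"DONE":
            break
        completed.append(task)
        # Report the finished task and ask for more work in one message.
        sock.send_multipart([b"", b"result:" + task])
    sock.close()

t = threading.Thread(target=worker)
t.start()

pending = list(TASKS)
done_sent = 0
while done_sent < PIPELINE:       # one DONE per outstanding request
    rep.recv()                    # "result:..." (ignored in this sketch)
    if pending:
        rep.send(pending.pop(0))
    else:
        rep.send(b"DONE")
        done_sent += 1

t.join()
rep.close()
print(completed)
```

The trade-off is the usual one with pipelining: the dispatcher commits a task to a worker before knowing it is free, so a slow worker can sit on a queued task that an idle worker could have taken.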

