

Ask HN: Multiprocessing & concurrency: C, Python, Clojure (& Go :))? - fnl

I've never looked into Clojure so far (sorry :) but how does it compare to OpenMP(I)? Or even with Python's parallel processing capabilities? How large would the performance vs. implementation-ease difference be between a C MPI implementation and a Clojure parallel processing implementation? Or how would Clojure's MP libraries compare to Python's multiprocessing.JoinableQueue and Manager? Would it be worth considering learning Clojure? And, yes, I know I should learn a new language per year, but then I would at least like to know how it compares to Go - I am considering that because of the channels, and I now often hear Clojure named alongside Go for exactly those reasons.<p>So, just from the MPI perspective: Clojure, maybe Go, and Python or C(++) - what are you using and why?
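For reference, the multiprocessing.JoinableQueue pattern I have in mind looks roughly like this (a minimal sketch only; the squaring is a stand-in for real work, and the worker/sentinel names are my own):

```python
import multiprocessing as mp

def worker(tasks, results):
    # Pull tasks until the parent sends a None sentinel.
    while True:
        item = tasks.get()
        if item is None:
            tasks.task_done()
            break
        results.put(item * item)  # placeholder for real work
        tasks.task_done()

if __name__ == "__main__":
    tasks = mp.JoinableQueue()
    results = mp.Queue()
    procs = [mp.Process(target=worker, args=(tasks, results))
             for _ in range(4)]
    for p in procs:
        p.start()
    for i in range(100):
        tasks.put(i)
    for _ in procs:
        tasks.put(None)  # one sentinel per worker
    tasks.join()         # blocks until every task_done() has been called
    print(sum(results.get() for _ in range(100)))  # 328350
```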
======
phamilton
I'm a sysadmin for a university HPC facility. We basically have users split
into two groups, single-node and multi-node, and we even have separate
clusters for them: one with an InfiniBand interconnect and the other with
gigabit Ethernet.

The multi-node users love MPI because it screams over InfiniBand. These are
people running genome mapping simulations or fluid mechanics simulations, etc.
They are actually a minority of our users. Most of our users are perfectly
content with our dual-socket hex-core Westmere cluster. They use various
applications for their simulations, but most of them have difficulty scaling
past 12 cores anyway.

So, in my experience, MPI is great because the hardware becomes the limiting
factor, and the other implementations are a little more software bound. So if
you have a couple hundred cores available for a single job, you are stuck with
MPI. If you are sticking to a multithreaded implementation, the other
languages you mention might be a good solution.

~~~
fnl
Impressive admin work you have there :) Thanks, that is, I think, the most
concise answer summing it up. So, since you are even in the "right
environment" for this, one open question another commenter raised: got any
thoughts on Haskell in this regard?

~~~
phamilton
So the majority of our users are either running commercial software or
running something home-rolled. The home-rolled software is generally the
product of years of slapping band-aids on something someone wrote at such-and-
such university. Almost always written in Fortran 77.

It's becoming a bit of a problem, because we know these codes have terrible
algorithmic inefficiencies and are not very well optimized for our system.
But an ME professor is more interested in the splashback effects supersonic
turbines create than in how efficient his simulation is. If it takes an extra
100,000 CPU hours, that's only an extra week for a 500-processor job. As long
as it works well enough, they generally don't want to touch it.

So in that climate, there is very little research done using languages other
than Fortran and C. When it does happen, it usually comes about as a result of
an interesting library that does X and is written in Y.

We are a small shop, especially for a 10,000-core system. We've only got three
full-time employees and three student employees: a part-time web dev, a part-
time hardware tech, and me (also part time). I've been trying to spend my
time working with users and optimizing their code, but for the most part they
just aren't that interested. I think the prevalence of Fortran says a lot
about the overall mentality: just use what works, and get as much use out of
it as you can.

------
leif
These are two questions (see the recently posted parallel Clojure talk).

For concurrency (short version: multiple processes doing different things
while sharing some small amount of state), I'd say Clojure, Go, and Python
are all good choices. Really, anything with either immutable data structures
and good IPC support (Clojure has this; agents are fantastic for IPC), or
coroutines (Go, Python, Lua...) will give you great concurrency control.
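In Python, for instance, the coroutine style looks roughly like this (a toy running-average consumer, purely illustrative: state lives in the coroutine's locals, and the caller communicates only by sending values):

```python
def averager():
    # Generator-based coroutine: no shared mutable state, no locks.
    total, count = 0.0, 0
    while True:
        value = yield (total / count if count else None)
        total += value
        count += 1

avg = averager()
next(avg)            # prime the coroutine up to its first yield
print(avg.send(10))  # 10.0
print(avg.send(20))  # 15.0
print(avg.send(30))  # 20.0
```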

For multiprocessing/parallel processing (short version: multiple copies of
the same process working on smaller parts of a large data set), OpenMP is a
great way to go. I have gotten a lot out of Clojure and Hadoop (on a large
cluster), and the new pvmap/pvreduce look like they'll help a lot on the
single-node, multicore front. I would not trust Python to do these things
fast, and I can't tell you about Go. That said, if you can partition your
data successfully and avoid GIL problems, Python will do fine.

~~~
fnl
OK, I should have said that I am only interested in the "multiprocessing"
aspect. What you write about MP in Clojure sounds pretty promising, though.
Concerning Hadoop, I have my doubts about it - I never switched, and now I
even read that people are going back to SQL & OpenMP (or MPI) anyway.
Python's "old wound", the GIL story, has been "solved" (let's say "patched"
;) ) since 2.6 (if you can accept the large overhead of multiprocessing's
calls for coarse-grained threads), so I am fine with that so far. So, has
anybody got insight into how Clojure (or pvmap/pvreduce specifically)
performs against either OpenMP/MPI or Python?

EDIT: Maybe I should rather ask how much easier it is to use. After all, it
runs on a JVM, so it will be slower than C, obviously, but you also get rid
of the millions of possible bugs you can introduce in C (which is what makes
using Python so attractive to me). So, is there a good reason for somebody
who has used either Python or OpenMP for multiprocessing to switch to
Clojure?

~~~
leif
Python is almost always memory-heavy. C is not, if you do it right. Of course,
C, if you do it right, will be the fastest; it's the "do it right" that's
hard. Clojure, on the JVM, will be slower than C "done right", but faster than
most C, because the JVM's JIT is surprisingly smart. Any performance problems
you get with Clojure can usually be mitigated by adding type hints to force
your repetitive calculations to use primitive type operations, which the JVM
will compile to direct machine instructions, as fast as C.

You should switch to Clojure from Python because it's faster.

You should switch to Clojure from C because it's easier, and more
specifically, easier to get it to a certain level of speed (which is high).

Don't let the JVM fool you into thinking it's slow. It isn't.

~~~
fnl
Well, I do not quite agree with your claim that Python is "always memory-
hungry". As a matter of fact, writing "good" Python code is a seldom-observed
art, and most performance issues can be eliminated if you know enough about
the language and follow the style and design advice of true "Pythonistas", as
the "gurus" there call themselves (along with many more who do not deserve
that title ;)).

However, I programmed several years in Java on an earlier assignment, where
things got so clumsy that I wrote a six-page manual for my team outlining
"don't do's" in Java, because they would either cost tons of performance or
memory - mostly related to very basic stuff such as instantiation strategies
and everything that might be "good" for OOD but bad for performance. We ended
up writing most of the core of the system in Lisp and then generating the
"light-weight" Java stuff from a DSL one of our Lisp "masters" had conjured,
to avoid all the bloat you have to go through to write simple things such as
walkers, etc. in Java.

If I ever had real memory problems, I must say it was with the JVM, and it
took me months, for example, to come up with a solution to index a few TBs
with Lucene without the JVM "crashing" (well, hanging) because of GC
(although that was more than five years back now, so things will have gotten
better, I admit). Nay, I must say the Python/C combo is much to my liking. I
want something that fits into that picture, not so much an alternative. And
there I currently see Erlang as quite a promising addition to my two favorite
languages, after going through all the advice and pointers everybody here
gave me.

~~~
leif
> As a matter of fact, writing "good" Python code is a very seldom observed
> art and most performance issues can be gotten rid of if you know enough
> about the language and follow the style and design advices of true
> "Pythonistas", as the "gurus" there call themselves (and many more who do
> not deserve that title ;)).

First of all, should performant code in your language be "seldom observed"
and require the knowledge of "gurus"? I should hope not.

I have written plenty of optimized Python in my day, and it's always been a
decent constant factor behind C or Java, no matter how many of the "best
practices" I follow. PyPy may be a different story, but its toolchain was too
complex and slow last I looked.

Most of Python's best practices are language bugs, anyway. Generators are
great, but most of the rest is crap. I should cache my namespace lookups
outside the loop? Why can't you do that for me, Python? That's one of the
simplest compiler optimizations we have. All the fastest Python is written in
C extensions. Why do you think that is?
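To make the namespace-lookup complaint concrete, here's the idiom (a toy sketch; the payoff is only a modest constant factor, which is exactly the point - the compiler could do this itself):

```python
def build_slow(n):
    out = []
    for i in range(n):
        out.append(i * i)  # "out.append" is looked up on every iteration
    return out

def build_fast(n):
    out = []
    append = out.append    # cache the attribute lookup in a local, once
    for i in range(n):
        append(i * i)
    return out

assert build_fast(1000) == build_slow(1000)
print(build_fast(5))  # [0, 1, 4, 9, 16]
```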

~~~
fnl
As an update: Python 3k solves the namespace issue you mention. (And I hope
more and more devs will finally move on; I love it and can only say it is
100% worth the switch! I'll never code in pre-3k again if not forced to.)

And no, I am not saying that Python outperforms Java, obviously. But it can
provide very decent performance at a very low LOC count. And then you can
identify the bottlenecks and implement those parts in C. Overall, a much
faster approach than writing in Java (and, obviously, than writing in pure
C), with final performance easily beating a JVM.

EDIT: No, that is actually even more complicated than what I mostly need to
do: for the real bottlenecks I usually just need to find the C library that
solves the hard part (e.g., some number-crunching stuff I am doing) and
already has a Python wrapper, or you generate one (hello, SWIG), or, worst
case, you really have to write one (though I have yet to encounter that
case).
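For illustration, besides SWIG there is also ctypes in the standard library, which can call into an existing C library with no wrapper code at all (a minimal sketch using libm's sqrt; assumes a platform where find_library can locate the math library):

```python
import ctypes
import ctypes.util

# Load the C math library and call its sqrt() directly.
libm = ctypes.CDLL(ctypes.util.find_library("m"))
libm.sqrt.argtypes = [ctypes.c_double]  # declare the C signature
libm.sqrt.restype = ctypes.c_double

print(libm.sqrt(2.0))  # 1.4142135623730951
```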

------
chops
Seeing how I'm currently in Chicago for ErlangCamp (which is excellent so
far), it's relevant for me to recommend Erlang if you want to go big on the
multiprocessing and concurrency stuff.

I know that wasn't one of your 4 mentioned languages, but it's worth a
mention.

~~~
fnl
Yeah, that would be number five, and I was actually considering adding it;
however, I am not really sure I like it, because Erlang's speed isn't that
great, and on that point I always favor anything that can keep up with C
(and the Go guys at least promise they will; so I'd like to see how fast
Clojure can go on a JVM). Now, I know Python can't, but then you can always
write your prototype in it first, and later fix performance considerably
with some C.

~~~
mzl
If you have some computation-heavy parts that you could implement in a
lower-level language like C, then you might want to check out Erlang again.

Erlang is really good at making the coordination layer efficient (the message
passing and threads are fast) as well as usually rather clean. The original
use case for Erlang, telecommunication switches, was written in a mix of
Erlang and C: the low-level stuff (handling call data and similar things) is
handled by C, and the high-level structure of the program is in Erlang.

~~~
fnl
Followed the advice I got here from you and others and am now studying the
language. And, so far, I must say, Erlang rocks. Naturally it will be a few
weeks before I can write something real and make a more educated decision,
but it really looks very nice so far - so many solutions to problems I have.
I especially like the "selective message passing" bits and the built-in ways
you can operate on bitstreams.

------
sqrt17
I've been a happy user of Python's multiprocessing package, for the following
reasons:

  * It's really natural if you work on an OS that has fork() (i.e., not Windows): you just need to create a Pool after you've initialized everything, and can use Pool.imap (in much the same way as itertools.imap) after that.

  * Distributing data for initialization (which is ~100-200MB in my case) would add another level of complexity if I wanted to distribute tasks among several computers automatically.

At the granularity I'm working at (most single tasks take >10 ms, many
>100 ms), a simple worker pool works just as well as the ForkJoin model that
Clojure's parallelism is based on.

OpenMP and OpenMPI are for a very different use case, where you know (pretty
much) exactly how long the different parts of an operation will take, so you
can make different processors (or different nodes) take different,
determined-in-advance chunks of the problem. Usually, the "parallel map"
abstraction implemented by Python's multiprocessing, Java's ForkJoin, or
Hadoop/Disco is a much more useful quick fix for parallelizing something than
starting out with channels and trying to build something on top of those.

TL;DR: It depends on your workload which multiprocessing model is right for
you. For most bread-and-butter workloads where you can use a parallel map
function, multiprocessing/ForkJoin is just fine; OpenMP is great if you have
a very predictable split between processors and are willing to jump through
additional hoops to squeeze out the additional 5-10% of performance. Don't
use queues etc. unless you know you need them. Multi-node solutions (MPI,
map/reduce) usually add another layer of complexity, because you need to ship
the initialization data to each node (if you need, e.g., a statistical
tagging or parsing model).

~~~
_delirium
Can't you also use OpenMP for the quick-fix "parallel map" abstraction? The
typical C way of mapping an operation is to do it in a loop, and if the loop
iterations don't depend on each other (as in the mapping case), you can just
add a "#pragma omp parallel for" at the top of the loop to make it execute via
the worker pool.

~~~
sqrt17
If the execution time of the function varies considerably (i.e., some of the
sub-problems are faster to solve), then you want to be able to keep the other
processors busy with the next sub-problems while one of them keeps gnawing at
a particularly difficult one.

If the timing is very predictable, OpenMP's parallel-for construct fits the
bill very nicely.

~~~
_delirium
Hmm, I thought OpenMP's worker-pool management did do that? E.g., if you have
a 100-iteration loop and num_threads set to 10, then it'll start up the first
10 subproblems; the first thread to finish will get the 11th, etc. If one of
the first 10 is really slow, the other threads will still move on to churn
through the rest of the loop iterations. Or is that not how it works?

~~~
sqrt17
Actually, you're right. OpenMP allows you to choose between "static"
scheduling (where it would just cut the 100-iteration loop into 0..24, 25..49,
50..74, 75..99 if you have four processors) and "dynamic" scheduling with a
specifiable block size (where it would, for a block size of 10, first chop off
four iteration blocks of 0..9, 10..19, 20..29, 30..39, and allocate the other
six blocks depending on which processor wants more work first).

According to the OpenMP tutorial, the default for OMP_SCHEDULE is
implementation-dependent, and the gcc OpenMP implementation uses dynamic
scheduling with a chunk size of 1 by default.
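Incidentally, Python's multiprocessing has an analogous knob: the chunksize argument to Pool.map/imap_unordered plays roughly the role of OpenMP's schedule chunk size (a rough sketch; the squaring is just a placeholder for real work):

```python
import multiprocessing as mp

def work(x):
    return x * x

if __name__ == "__main__":
    with mp.Pool(4) as pool:
        # Large chunks ~ OpenMP "static": less dispatch overhead, but
        # worse load balance if task times vary.
        static_ish = pool.map(work, range(100), chunksize=25)
        # chunksize=1 ~ OpenMP "dynamic, 1": each free worker grabs the
        # next single iteration as soon as it finishes the last one.
        dynamic_ish = list(pool.imap_unordered(work, range(100), chunksize=1))
    print(sum(static_ish), sum(dynamic_ish))  # 328350 328350
```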

------
fauigerzigerk
Both Clojure and Go have very interesting features to support parallel
computing. Both are definitely worth learning. Go's implementation is rather
immature at this point in time; some of the libraries and the garbage
collector are really slow, but things are changing rather quickly in Go land
right now.

Go's design makes much more efficient use of a single machine's resources
than any JVM-based language ever could. In terms of memory it is comparable
to C or C++. Goroutines are very lightweight because they use a variable-size
stack, so Go supports much finer-grained parallelism. Whether you need that
or not depends on your application, of course. I found goroutines much
simpler and more intuitively accessible than Clojure's large set of parallel
primitives. But I think reasonable people can have different opinions on that
one, as STM (software transactional memory) is attractive too, and immutable
data structures do make some things simpler.

Of course, Clojure is a Lisp dialect, which has other benefits that have
nothing to do with parallelism. However, for many of my own tasks, memory
usage is critical, and Clojure hogs memory even more than regular Java, so I
would prefer Go once it matures a little. I think Go can replace C and C++
over time. Clojure will never do that in its current JVM-based
implementation.

In terms of raw performance, if memory is not the issue, you can expect
similar results from Clojure and Go, provided you make use of static type
hints in Clojure (which don't look pretty) and give Go's creators some more
time to optimize the stuff that's known to be slow.

------
chrisaycock
MPI applications are usually written in C or Fortran, mainly because the use
case is number crunching. That's very different from what Clojure and Python
are associated with. There aren't many scientific applications targeted at
virtual machines or dynamic languages, for this very reason. (I know about
Sage and NumPy, etc.; I am referring to LAPACK-style computations.)

------
fnl
To sum up what you have responded so far: for data distribution, OpenMP and
MPI still seem to be the best options if you expect a constant runtime for
each task. On the other hand, for task distribution (say, runtimes > 10 or
100 ms, and variable), any of the higher-level languages (e.g., Python,
Erlang, Clojure) are acceptable abstractions that ease development.

There seems to be no language, however, that would truly facilitate
distributed computations (multicore & multiprocessor) significantly (maybe
Haskell?). Is that really it - is there no language specifically apt for the
new multi-core/multi-processor machines popping up everywhere, one that will
make programming life easier on those architectures, especially when I want
to do some algorithmic speed-ups? I am a bit disappointed, in a way.

~~~
gtani
good backgrounders on the concurrency landscape (messaging/actors, STM,
datfalow

<http://blog.ezyang.com/2010/07/graphs-not-grids/>

[http://www.igvita.com/2010/08/18/multi-core-threads-
message-...](http://www.igvita.com/2010/08/18/multi-core-threads-message-
passing/)

[http://erlware.blogspot.com/2010/07/brief-overview-of-
concur...](http://erlware.blogspot.com/2010/07/brief-overview-of-
concurrency.html)

[http://www.sauria.com/blog/2009/10/05/the-cambrian-period-
of...](http://www.sauria.com/blog/2009/10/05/the-cambrian-period-of-
concurrency/)

[http://www.slideshare.net/twleung/a-survey-of-concurrency-
co...](http://www.slideshare.net/twleung/a-survey-of-concurrency-constructs)

------
hogu
You may want to consider ZeroMQ, depending on what you're trying to do. The
Python libraries are very good; they work hard to avoid copying excess data,
as well as releasing the GIL appropriately:
[http://stackoverflow.com/questions/35490/spread-vs-mpi-vs-
ze...](http://stackoverflow.com/questions/35490/spread-vs-mpi-vs-zeromq)

~~~
fnl
Nice pointer to an MPI library I wasn't aware of - thanks!

~~~
hogu
zmq isn't an MPI library, it's a generic messaging library

they refer to it as sockets on steroids

~~~
fnl
Ah, well, I should look more closely sometimes... Thanks for the hint ;)

------
Detrus
I don't use any of them, but from looking at the syntax and code examples,
the newcomers make pretty small incremental improvements. I think a DSL that
separated the control flow from the rest of the code would get things going
in the right direction. I'm hacking one together for JavaScript.

As far as performance vs. implementation ease, here is a handy chart:
[http://shootout.alioth.debian.org/u32/code-used-time-used-
sh...](http://shootout.alioth.debian.org/u32/code-used-time-used-shapes.php)
That's for generic language tasks, but concurrency probably doesn't stray far
from the overall performance.

~~~
fnl
Hmm, so you would not say that Clojure's or Go's approaches are worth it in
terms of programming ease? The plots you point to looks at least seems
worrying: Python is smacked on the concise 0-axis, C on speed. Clojure at
least makes no especially good grounds on those plot, as I already expected.
However, I was hoping people would say it is so much easier writing a parallel
implementation of the code I am using that I switched to Clojure/consider
using Go/whatever.

~~~
Detrus
The charts are for general language use, not specifically concurrency. Go and
Clojure are better than C's locks and mutexes, but there were similar ideas
in Scala and other languages along the way, and the newcomers' improvements
don't land far from those.

So Go, Clojure, F#, Scala, and Python could probably be lumped into the same
ease-of-implementation category as far as concurrency goes. They'll all make
a mess if you have complex control flow. This
<http://news.ycombinator.com/item?id=1824683> (which you've already seen) or
[http://en.wikibooks.org/wiki/F_Sharp_Programming/Async_Workf...](http://en.wikibooks.org/wiki/F_Sharp_Programming/Async_Workflows)
is what I call a mess, but it's as clean as it gets for now.

I think the real test is whether people start using concurrency as the norm
and not as a separate use case (since it's annoying to write). I don't think
these languages will have that effect.

------
n1k
I would add Erlang and node.js to the list. I need to build an HTTP server
for frequent polling for a project that I am working on (polling, or
WebSockets if available; lots of users), and I have narrowed the choice for
that component down to Erlang, node.js, or Python.

I have really taken a liking to the Erlang model. Processes share no memory
and communicate by passing messages; there is no shared-state locking or
mutexes.

Once I have more data/information (benchmarks, ease of
development/integration, etc.) I will publish my results.

------
vu3rdd
Like everything, it depends on the problem. There are problems which need
parallelism but cannot use many of the languages mentioned by the OP, e.g.,
realtime multi-channel video encoding. Most such systems use custom
processors. Unless massively multicore mesh architectures (like Tilera's
TILE64) become commonplace, we are stuck with such custom processors, and
that means using plain old C. Once mesh-based multicore becomes cheaper and a
commodity, we will see other languages used in those problem spaces.

------
krakensden
Go promises good concurrency, but you might want to do a bit of research
before dedicating significant resources to it. Last time I checked, their
scheduling of goroutines onto OS threads was a bit wonky.

Though it's entirely possible I screwed something up or missed an important
bit.

~~~
marketer
Go does concurrency well, but it's designed for concurrency on a single
machine, not so much parallel processing on many different machines.

~~~
fnl
Well, I strongly suspect they will add that kind of stuff later, from reading
blogs about the long-term goals of the language. I think Go might be a
serious newcomer in that respect, considering they want it to be level with C
in performance. However, I don't want to start a g8 discussion (and yes, I
don't like the syntax either... :). I'm especially interested in how the ease
of use vs. performance of these languages compares, as I still use Python and
OpenMP for that stuff.

