
Concurrency is not Parallelism. Rob Pike at Waza 2012 [video] - DanielRibeiro
https://blog.heroku.com/archives/2013/2/24/concurrency_is_not_parallelism/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+heroku+%28Heroku+News%29
======
danbruc
You have four tasks A, B, C and D. B and C both depend on the result of A, D
depends on the result of B and C. Now B and C are concurrent because there is
no dependency between them - you can execute these tasks in order ABCD or
ACBD. It is only important that A executes before and D after B and C. And
there is a third option - you can execute B and C at the same time making the
execution parallel. So concurrency is about dependencies between tasks or the
lack of them while parallelism is about actually executing independent tasks
at the same time.

~~~
jberryman
> So concurrency is about dependencies between tasks or the lack of them while
> parallelism is about actually executing independent tasks at the same time.

I understand what you're getting at, but I don't think that's a great summary
of concurrency vs. parallelism. My version would be: concurrency is about non-
determinism and multiple threads of control, while parallelism is about doing
multiple computations at once for performance reasons (possibly
deterministically).

EDIT: sorry I see this has already been posted almost verbatim a few other
places in the thread

~~~
danbruc
I personally would not draw a (strong) line between concurrency and
nondeterminism - while concurrent (real-world) systems usually show a fair
amount of nondeterminism, there is still no inherent requirement for it. The
dependencies are static, and one can always schedule task execution in a fully
deterministic way.

~~~
jberryman
Well, sure we're assuming ultimately some sort of deterministic result of our
computation, but we're discussing pl semantics and properties of the
computation itself, and non-deterministic semantics are one of the most
important characteristics of concurrent programming, no?

~~~
danbruc
We definitely want a deterministic result for every task execution even if the
execution of some subtasks happens nondeterministically. But after rethinking
what I wrote last time, I would now even strengthen my statement. Concurrency
is a structural property of a system; parallelism is a property of a (specific)
schedule of subtasks and therefore a property of the runtime behavior of a
task. Nondeterminism is also a property of the runtime behavior, and therefore
nondeterminism MAY be associated with parallelism but CANNOT be associated
with concurrency.

------
beambot
The irony was not lost on me... Heroku posts a video of Rob Pike discussing
intelligent routing (load balancing) using concurrent go a week after the
"random load balancing fiasco" [1]. I chuckled.

[1] <http://news.ycombinator.com/item?id=5215884>

~~~
grey-area
Yes although the example given by Rob Pike highlights all the information that
would have to be shared between the load balancer and the web processes in
real time in order to make intelligent routing work well.

Very hard to enforce when you're running a heterogeneous network trying to
handle routing, which may have any kind of app answering requests. If there
were an easy solution to this, evidently Heroku would have put it in already,
but I'd hope they're working on it.

Sounds like some of their customers have outgrown them though; given the
amount they are paying per month they could easily host themselves and hire
someone to look after their servers - at some point it becomes necessary to
control your entire stack if you're looking at performance.

------
slunk
The difference, as I understand it, is actually pretty simple... This suggests
to me that the confusion is due to the synonymy of the terms in common usage.
Without making any claims about the quality of this presentation specifically,
I wonder why anyone would insist on using them and inevitably have to lecture
perfectly smart people about why they don't mean the same thing...

~~~
virtuabhi
I listened to his presentation. IMHO the presenter laid too much stress on
parallelism and concurrency being different, but never explained the
difference clearly and succinctly. But from the presenter's side, I can see
that he needs to sell his product (golang). Therefore he assumed that these
definitions are often confused, that it is a big issue, and that go is 42.

~~~
seanmcdirmid
These terms have been confused since the early 90s at least. It is one of the
first things we systems people learn: concurrency is doing multiple things at
once because you must, parallelism is doing multiple things at once because
you can.

Infrastructure and techniques that support concurrency (e.g. actors) are
almost always totally unsuitable for parallelism (e.g. SIMD a la GPUs), and
the other way around. Likewise, skills don't transfer very well between the
two fields. So when someone talks about this technology being useful to your
use case, and they've confused the terms, it's very annoying (you should use
actors to get more parallelism, WTF????).

~~~
greatzebu
You're taking a pretty narrow view of parallel computing. For example, there
are plenty of systems that use essentially actor models for large-scale
parallel computation, e.g. Charm++ and HPX.

~~~
seanmcdirmid
Isn't Charm++ infrastructure comparable to MPI? In that case, it's not clear
what the goals are. I think it was during the MPI phase that we began to
realize that we had two very different problems on our hands. Fine-grained
message passing has been obsolete in the parallel field for a while now.

~~~
nivertech
Are you saying that MPI is obsolete? I thought most legacy code in HPC is
based on MPI?

~~~
seanmcdirmid
You know, it's rude to say something is obsolete in this field, so we just
say... there are other choices, and hope the hint is taken. Nothing ever dies
quickly; it just fades off into the sunset.

There is a lot of legacy code running with MPI for sure, but the big data and
almost big data trends are clear, and CUDA has been absolutely disruptive in
the scientific computing field (though some people are beginning to use
MapReduce here also).

~~~
nivertech
yeah, I always map CUDA blocks to racks and CUDA threads to blades ;)

GPUs, Xeon Phi, FPGAs and other accelerators still need somebody managing
them. Nobody said it should be MPI, but there are still many hybrid
architectures with MPI handling distribution and actual compute done using
OpenMP or accelerators.

In my system I use Erlang/OTP to handle distribution and concurrency and
OpenCL for data-parallel compute.

~~~
seanmcdirmid
> yeah, I always map CUDA blocks to racks and CUDA threads to blades ;)

Sounds like Jeff Dean.

> Nobody said it should be MPI, but there are still many hybrid architectures
> with MPI handling distribution and actual compute done using OpenMP or
> accelerators.

MPI or even RPC works fine as a control mechanism, just not as a critical
performance-sensitive abstraction, where we care about the width and speed of
the pipe, and MPI is nothing like a pipe!

> In my system I use Erlang/OTP to handle distribution and concurrency and
> OpenCL for data-parallel compute.

This is quite reasonable. Once one understands the difference between
concurrency and parallelism, they can pick appropriate tools to deal with
each. As long as they confuse the issues, they'll make bad choices.

------
willlll
I really love this talk, but man oh man the gophers are seriously creepy.

~~~
p9idf
They're not so creepy compared to some of the artist's other work:
<http://reneefrench.blogspot.com/>

~~~
willlll
Thanks for sharing, everything makes more sense now.

------
magic_haze
I'm trying to wrap my head around this, not being familiar with Go yet... is
this analogous to the difference between a Task and a Thread in the .net
world?

~~~
jlouis
A task is a future or a promise which can deliver a result later. A goroutine
in Go is more general than a task because it doesn't have to terminate when it
delivers a result. Goroutines subsume tasks, in a way.

Goroutines are multiplexed onto (kernel-level) threads in order to make it
possible to run more of them at once and thus provide parallel execution.
Otherwise, you have a single scheduler of goroutines in a single process: you
can still switch context between the goroutines, so the system is concurrent,
but it will not be parallel, since only one goroutine runs at a time.

For a system in which most goroutines wait on IO or the network, it may not be
a problem to run many goroutines on only a single core.

~~~
magic_haze
I still don't quite follow. What do you mean by "terminate" a task? From my
understanding, when a Task returned, the underlying Thread would be returned
to the common pool for use by other Tasks. Does the Go scheduler detect when a
task is blocked for IO, and reuse the thread for other goroutines?

------
martinced
tl;dr Parallelism implies concurrency but concurrency doesn't imply
parallelism.

~~~
jules
It's a bit more nuanced than that. Concurrency implies non-determinism. For
example, if you have multiple threads, then the result of the program is non-
deterministic: it may depend on the particular interleaving of the threads
over time. Parallelism is about executing multiple things at the same time to
improve performance.

You can have parallelism without concurrency. For example, a parallel map
operation is parallel, but not non-deterministic: it always gives the same
result. Now, under the hood, to implement parallel map there is probably some
concurrency involved, but that is not exposed in the interface of map.

You can have concurrency without multiple threads. For example, Node with
asynchronous callbacks is concurrent, because the order in which those
callbacks get called is non-deterministic, but it does not necessarily need to
run on multiple threads.

Usually we don't want the end result of our program to be non-deterministic,
so we use synchronization primitives like locks and Go's channels to limit the
non-determinism in such a way that it does not affect the end result.

What most people don't realize is that for concurrency to be useful, there
must be parallelism. But wait, didn't I just say that in Node you have
concurrency without multiple threads? Yes, but the parallelism does not need
to be in your CPU! If you do an asynchronous disk read, or an asynchronous
network request, the disk and other computers on your network are working in
parallel with your program. Concurrency allows us to exploit that parallelism,
so that the CPU can do useful work while the disk is busy reading our data.

Now, you can write your program with concurrency, but if there is no
parallelism anywhere in the system, that's not useful. So in practice it is
the reverse: concurrency implies parallelism (but not necessarily on the CPU),
but parallelism does not imply visible concurrency (though it might be hidden
under the hood of a parallel construct like parallel map).

~~~
mseepgood
> What most people don't realize is that for concurrency to be useful, there
> must be parallelism.

No, as far as I understand from this and other talks, Rob Pike argues that
concurrency is useful even without parallelism: it's a better way to structure
software. A nice example for this is shown in his talk about lexical scanning
in Go: <http://www.youtube.com/watch?v=HxaD_trXwRE>

~~~
jules
I'd say concurrency is a bit overkill for that problem (i.e. threads or
processes scheduled by the runtime system in a way that appears non-
deterministic to the programmer). Deterministic coroutines work fine. Even
Python generators are sufficient.

Though perhaps "useful" is a bit too broad: maybe it should be replaced by:

> What most people don't realize is that for concurrency to improve
> performance, there must be parallelism.

