
Is Parallel Programming Hard, and, If So, What Can You Do About It? [pdf] - poindontcare
https://www.kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook.2015.01.31a.pdf
======
aout
Since it's a really big, technical book here are some quick quotes I thought
were useful in general:

"This book is a handbook of widely applicable and heavily used design
techniques, rather than a collection of optimal algorithms with tiny areas of
applicability."

"As a rough rule of thumb, use the simplest tool that will get the job done.
If you can, simply program sequentially. If that is insufficient, try using a
shell script to mediate parallelism."

"[...] parallelism is but one potential optimization of many. A successful
design needs to focus on the most important optimization. Much though I might
wish to claim otherwise, that optimization might or might not be parallelism."

It's a very good handbook for advanced developers who use native languages. It
basically addresses any problem you might encounter when dealing with
synchronization, locks, etc.

Near page 300 the author writes about higher-level languages and clearly
states the shortcomings of these newer styles of programming. Not sure what to
think about it. Are we really discussing the use of high-level "scripts" to
handle heavy computations in the next 10 years? I'd love to hear from a math
or physics Ph.D. about this.

EDIT: I mean, Wolfram is kind of an "ultra high level scripted platform" and
seems to work pretty well. Isn't that the future?

~~~
gh02t
Well, scripting languages are actually pretty popular even now. I work in
uncertainty quantification for nuclear engineering. One common task is to run
the same simulation many times with slightly different parameters. Scripting
languages like Python are popular for this and they give you easy parallelism.

The current pattern in research is usually to hammer out a prototype in a
scripting language, profile it, then migrate hot code paths to Fortran or C++.
That's only if necessary, though; most of the time you can get remarkable
performance just using NumPy and similar.
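A minimal sketch of that parameter-sweep pattern (the `simulate` function and the parameter values are made up for illustration; a real run would call an actual model):

```python
# Run the same "simulation" for many parameter values in parallel.
# simulate() is a stand-in for a real model evaluation;
# multiprocessing.Pool spreads the calls across CPU cores with
# almost no extra code over the serial version.
from multiprocessing import Pool

def simulate(k):
    # Placeholder for an expensive model evaluation.
    return sum(i * k for i in range(1000))

if __name__ == "__main__":
    params = [0.5, 1.0, 1.5, 2.0]
    with Pool() as pool:
        results = pool.map(simulate, params)
    print(results)
```

The serial version would just be `[simulate(k) for k in params]`; swapping in `pool.map` is the whole change.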

I'm a long-time user/lover of Mathematica and now what I guess is the Wolfram
language, and definitely an expert on it. There's one thing I know about it:
it isn't the future, at least not in my industry. Open source is surprisingly
popular in the NE research community, and as long as WL stays closed, it won't
be significant.

~~~
acaloiar
Bioinformatics largely takes a similar approach. Because the field is rooted
in statistics, statistical theories are often prototyped in GNU R by Ph.D.s
without formal computer science backgrounds. While statistically sound, R
implementations often lack the performance properties necessary to crunch
large datasets. This is where those of us with backgrounds in computation
optimize and parallelize code using lower-level languages. There is no shame
in starting from a scripting language. I often prototype ideas using R, Perl,
or Python before finalizing them in C, Java, OpenCL, etc.

------
chengiz
There is also a 1-column pdf
([http://kernel.org/pub/linux/kernel/people/paulmck/perfbook/p...](http://kernel.org/pub/linux/kernel/people/paulmck/perfbook/perfbook-1c.2015.01.31a.pdf))
for those annoyed by the 2-column format. The margins are enormous, but at
least you don't have to scroll down and then back up.

~~~
smosher_
Thanks, my pdf reader can trim margins and make two columns out of one, but it
can't make one column out of two.

~~~
chengiz
Which reader is this?

~~~
smosher_
Okular

------
JoeAltmaier
This is about large-calculation parallelism. It covers bus architectures and
language primitives.

There's another whole parallel (multi-threaded) conversation that's not about
raw compute speed, but about latency. Does your app keep running when somebody
holds a mouse button down? Does the API stay live when data is being crunched?
Does your service handle hundreds or thousands of client requests serially or
in parallel?

Those problems have completely different solutions from those addressed in the
OP. The language-primitive approach doesn't begin to touch the issues
involved.
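As a toy illustration of the serial-vs-parallel request handling question (the handler, request count, and timings here are invented; real requests would block on sockets or disks rather than `sleep`):

```python
# Compare serving blocking "requests" one at a time vs. concurrently.
# Each handler spends its time waiting (simulated with sleep), so a
# thread pool cuts total latency dramatically even though no extra
# compute speed is gained -- the latency problem, not the speedup one.
import time
from concurrent.futures import ThreadPoolExecutor

def handle_request(req_id):
    time.sleep(0.05)          # simulated blocking I/O
    return f"response-{req_id}"

requests = range(10)

start = time.time()
serial = [handle_request(r) for r in requests]           # ~0.5 s
serial_time = time.time() - start

start = time.time()
with ThreadPoolExecutor(max_workers=10) as pool:
    parallel = list(pool.map(handle_request, requests))  # ~0.05 s
parallel_time = time.time() - start

print(f"serial {serial_time:.2f}s, parallel {parallel_time:.2f}s")
```

The results are identical either way; only the wall-clock latency differs, which is exactly the distinction between this conversation and the raw-compute one.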

~~~
randcraw
That's not what I see. The book delves into the coordination of shared
resources in a shared space (like RAM). This could take the form of task
parallelism for speedup, or more likely in this case, task parallelism for
concurrency and resource sharing. Since there's so little discussion in the
book of issues related to speedup (task decomposition or resource allocation
or load balancing), I'm inclined to think it's of most use to someone building
a shared memory RDBMS or the like, where semaphores and locks are critical to
avoidance of bugs and data corruption.

~~~
JoeAltmaier
Those things are of course at the bottom of any multi-task discussion. But
they are so far below the actionable layer for service designers as to be
moot. For that we need to talk about transactional data, about restartable
processes, about context and state and where it's kept, from a language and
data-structure point of view.

------
ColinWright
There should be a warning of some sort that this is a 7.5 MB book, not just an
article. It just wiped out my mobile roaming.

There's a lot to be said for hiding technical details from the non-technical
general public, but there's something to be said for having occasional
warnings.

~~~
malkia
What's next? Stop signs?

~~~
mietek
So that’s how we arrived at “Caution: Slippery When Wet”…

------
ZenoArrow
The hard part of nearly any parallel activity that aims for order over chaos
is synchronisation. It strikes me that part of the problem with parallel
programming is that the margin for error with regards to synchronisation can
be very slim.

Some would say that we need to improve our code to fit into this margin for
error, and I can agree with that, but what if we look at the issue from the
other direction? What methods could we employ for increasing our margin of
error? What might these methods look like?

If we treat the goal as improving adjustment to synchronisation mistakes,
we'll first need to know when adjustments should be made. One way to do that
is to predict how long each process is likely to take, then measure the
success of this model against real world performance. In this way, a system
could get better at synchronisation the longer it runs for. It doesn't matter
if the original guess was way off, as long as you correct it with feedback.
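One very rough sketch of that feedback loop, using an exponential moving average to refine a duration estimate (the numbers and the update rule are just one possible choice, not anything from the book):

```python
# Keep a running prediction of how long a task takes, and correct it
# with feedback from each observed run. A badly wrong initial guess
# converges toward the true duration as measurements accumulate.
def update_estimate(estimate, observed, alpha=0.3):
    """Exponential moving average: blend the old estimate with the
    newly observed duration."""
    return (1 - alpha) * estimate + alpha * observed

estimate = 100.0          # wildly wrong initial guess (ms)
true_duration = 10.0      # what the task actually takes (ms)

for _ in range(30):
    estimate = update_estimate(estimate, true_duration)

print(f"converged estimate: {estimate:.2f} ms")  # close to 10.0
```

The point is only that the original guess being way off doesn't matter; the feedback term dominates after enough runs.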

~~~
skj
> What methods could we employ for increasing our margin of error? What might
> these methods look like?

These methods and their applications are the field of distributed computing :)

------
faragon
The book is good. BTW, the title is a bit misleading, in my opinion: handling
parallelism at a low level is hard, but parallel programming is hard only
because the languages we use are not transparent enough, and because many
problems are hard to split into parallel tasks.

------
fullwedgewhale
I've got the book in my queue to read, but parallel programming is hard. I
think it pretty much still stands that there is no way to show that parallel
code works, other than to do a formal proof of each section of the code.

~~~
ArkyBeagle
Might be worth looking at "Doing Hard Time" by Bruce Powell Douglass. It's
drenched in "executable UML" flavor but the toolchains that existed before
Rose RT included less of that.

~~~
fullwedgewhale
Thanks, I'll take a look at that.

------
terapinterapin
You could have done the recent Heterogeneous Parallel Programming MOOC (
[https://class.coursera.org/hetero-004](https://class.coursera.org/hetero-004)
).

------
PopsiclePete
Don't use a language that doesn't provide you with the tools to handle it?

I don't have many issues with concurrent code in Clojure, or C#, or Go.

But in pre-1.8 Java or Python or Ruby or C++, I'm guessing you're going to
have a really, really bad time...

~~~
ArkyBeagle
Ignoring issues like potential non-reentrance of code and other library
issues, C++ is perfectly capable of being used to make programs that use
pthreads. It's not a bad time at all...

Python apparently has threading, and I've seen "from multiprocessing import.."
in Python before.

Tcl also has threads, or you can just write things to be event driven. I have
never used Tcl to implement a multiprocessor solution, but others have.
Because sockets are a first-class object in Tcl, _that_ approach is quite
fruitful. I use something you could roughly call "coroutines over a network"
daily in Tcl.

------
dschiptsov
Learn you some Joe Armstrong's books..)

