

An introduction to parallel programming using Python's multiprocessing module - rasbt
http://sebastianraschka.com/Articles/2014_multiprocessing_intro.html

======
platz
I feel that a discussion of
[http://en.wikipedia.org/wiki/Amdahl's_law](http://en.wikipedia.org/wiki/Amdahl's_law)
should be mandatory when introducing parallel programming

Small sequential portions of an otherwise parallel algorithm can have huge
effects on the overall running time when trying to scale up.
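The law itself is a one-liner: with a parallelizable fraction p of the work and n processors, the maximum speedup is 1 / ((1 - p) + p / n). A quick sketch of how fast the returns diminish:

```python
def amdahl_speedup(p, n):
    """Maximum speedup on n processors when a fraction p of the
    work is parallelizable (Amdahl's law)."""
    return 1.0 / ((1.0 - p) + p / n)

# Even with 95% of the work parallelized, 8 cores give well
# under an 8x speedup, and the curve flattens quickly:
print(round(amdahl_speedup(0.95, 8), 2))     # 5.93
print(round(amdahl_speedup(0.95, 1024), 2))  # 19.64 -- capped near 1/(1-p) = 20
```

Note that the ceiling 1/(1-p) is reached no matter how many cores you add, which is the point about small sequential portions.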

"parconc" explains this while discussing a parallel version of k-means: it
talks about how things like the granularity of the data need to be fine-tuned
for parallel algorithms, and provides some nice visualizations of what the
CPUs are actually doing on a timeline:
[http://chimera.labs.oreilly.com/books/1230000000929/ch03.htm...](http://chimera.labs.oreilly.com/books/1230000000929/ch03.html#sec_par-kmeans-perf)

Overall I think multicore is a good tool to have in your toolbox, but it seems
like there needs to be a lot of tuning and effort to get good rewards for the
time invested.

~~~
flatline
This is certainly true for some applications, but when Amdahl's law was
originally formulated, people made estimates based on e.g. 95% of an
application being parallelizable, and thus fairly rapidly reached a point of
diminishing returns. In practice, however, there's often a simple
reformulation of the problem that can result in a much higher percentage, and
Amdahl's law becomes less of an obstacle than it would first appear. There are
still many problems that scale more or less linearly with the number of
processing units and are "trivially parallelizable".

------
d0vs
[https://docs.python.org/dev/library/concurrent.futures.html](https://docs.python.org/dev/library/concurrent.futures.html)

------
abemassry
I wrote
[https://github.com/abemassry/crazip](https://github.com/abemassry/crazip) in
Python, and it was challenging to figure out how to do multiprocessing
effectively; I'm not sure how this would run if implemented in other languages.

------
yohanatan
This article contains a perfect example of why I don't like to write or read
comments in code. Comments are not compiled and thus allow for sloppy verbiage
such as the following:

    
    
        # Exit the completed processes
        for p in processes:
            p.join()
    

The comment should read something more like: "Wait for all the subprocesses to
exit" but is that really any more helpful than just reading the code and
seeing that join is called on each subprocess and connecting the dots from
there?

~~~
rdtsc
> but is that really any more helpful than just reading the code and seeing
> that join is called on each subprocess and connecting the dots from there?

It isn't, if you have been programming with threads before; "join" is obvious
then.

But if you haven't, and maybe you are a physicist trying to get something done
in Python, "p.join()" could mean anything -- "Join to what?", "Why is there no
argument to join()?", "We are joining the data together there, like a list...",
"It looks like we should be stopping the processes, but the method is not
called 'stop()', so that's not it"...

That is the problem with this stuff being taught by someone who has been
programming for a while: this kind of thing gets internalized and becomes
obvious, but it is not obvious to a beginner.

~~~
roadie
Will use this as ammo next time someone complains about my minimalist approach
to commenting. I think comments need to be targeted at a specific audience. If
you know that only professionals are going to work on your code, then
high-level comments alone are probably OK.

------
hueving
Instead of calling the 'else' condition of the for loop a 'completion-else',
just call it the 'nobreak' condition. Unlike 'completion-else', 'nobreak'
immediately describes when it will be executed.
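For readers who haven't met the construct: the else clause runs only when the for loop finishes without hitting a break, which is exactly what "nobreak" suggests. A small sketch:

```python
def find_first_even(numbers):
    result = None
    for n in numbers:
        if n % 2 == 0:
            result = n
            break
    else:
        # Reached only when the loop ran to completion without a
        # break -- i.e. "nobreak": no even number was found.
        result = "none found"
    return result

print(find_first_even([1, 3, 4]))  # 4
print(find_first_even([1, 3, 5]))  # none found
```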

------
webmaven
Better starting point: [https://medium.com/@thechriskiehl/parallelism-in-one-line-40...](https://medium.com/@thechriskiehl/parallelism-in-one-line-40e9b2b36148)

------
hueving
> x_i = (point_x - row[:,np.newaxis]) / (h)

TypeError: list indices must be integers, not tuple

~~~
andreasvc
This assumes that row is a NumPy array, not a plain Python list.
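A quick illustration, assuming NumPy is installed: tuple-style indexing such as [:, np.newaxis] works on arrays but raises exactly that TypeError on plain lists:

```python
import numpy as np

row = [1.0, 2.0, 3.0]

try:
    row[:, np.newaxis]   # plain list: tuple index is not allowed
except TypeError:
    print("lists do not support tuple indexing")

col = np.array(row)[:, np.newaxis]  # ndarray: adds an axis
print(col.shape)  # (3, 1) -- a column vector
```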

------
gomesnayagam
Multiprocessing still feels like it's in an experimental phase for various use
cases :(

------
thikonom
pretty poor language choice to teach parallel programming concepts

------
zobzu
The author seems to be unaware of threading and event-based models.

multiprocessing adds memory isolation through the CPU's protected memory.

~~~
freyrs3
If you're working on CPU-bound tasks with NumPy/SciPy using threads then you
have to think very hard to make sure most of the critical sections are hitting
the NumPy calls in C which release the GIL. It's not a great reliable way to
program. The way the author describes is basically the only pure-Python way of
achieving parallelism for this kind of problem.

~~~
zobzu
It doesn't make it any less true. Besides, the fact that CPython's GIL sucks
doesn't mean threading or event models suck.

If in doubt, ask nginx how they feel about that. Then ask Apache's mpm-prefork
how that feels.

~~~
freyrs3
If you're holding a global mutex every 100 instructions while context
switching between CPU-bound tasks, then yes, the GIL does suck. There is a
class of IO-bound problems where threading/evented models in Python can be
used effectively, but that's not the class of problems the author is talking
about here.
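That split is easy to demonstrate: blocking IO (faked here with sleep) releases the GIL, so threads overlap those waits even though they cannot overlap pure-Python computation. A small sketch:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_io(i):
    # Sleeping releases the GIL, so IO-bound tasks overlap
    # even under CPython's global lock.
    time.sleep(0.05)
    return i

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=10) as ex:
    results = list(ex.map(fake_io, range(10)))
elapsed = time.perf_counter() - start

# Ten 50 ms sleeps finish in roughly one sleep's worth of wall
# time, not the 0.5 s a serial loop would take, because the
# waits run concurrently.
print(results, round(elapsed, 2))
```

Replace the sleep with a CPU-bound pure-Python loop and the threads serialize on the GIL, which is why the article reaches for processes instead.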

