
The multiprocessing module in Python - chachram
http://linuxramb.blogspot.com/2016/10/the-multiprocessing-module-in-python.html
======
gshulegaard
My only gripe is:

> Since a new instance of Python VM is running the code, there is no GIL and
> you get parallelism running on multiple cores.

Has the potential to be misinterpreted by new Python devs. As I understand it,
each _process_ gets its own interpreter with its own GIL. So it's not that
there is no GIL, just that there are now three GILs, each managing a separate
execution space.
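
To illustrate (a minimal sketch: each Pool worker is a separate process with
its own interpreter and hence its own GIL):

    import multiprocessing as mp

    def cpu_bound(n):
        # Pure-Python loop: inside any one process it is serialized by
        # that process's own GIL.
        return sum(i * i for i in range(n))

    if __name__ == "__main__":
        # Three worker processes, three interpreters, three GILs,
        # so the three tasks can run on three cores at once.
        with mp.Pool(3) as pool:
            print(pool.map(cpu_bound, [10**7] * 3))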

Also, at a higher level of discussion, this is just a rehash of
multi-_processing_ vs. multi-_threading_, so perhaps it would be more
illuminating to start from there and move down to the library examples.

The casual reference to multi-_threading_ in the quote from the POSIX fork
definition also adds to potential reader confusion.

~~~
brianwawok
The flip side is that a lot of people think "oh, the GIL, I can't even run
Python code on a server with more than 1 CPU."

No, you can: just run lots of processes and make lots of GILs. It still ends
up using less memory than Java in many cases. The GIL will be the least of
your worries.

~~~
gshulegaard
That's a fair point: you _can_ take advantage of multiple CPUs with Python,
but you have to eat the overhead of multi-processing vs multi-threading.

Multi-processing breaks down when you have to communicate _with high
frequency_ between various logical runners.

I think there are vastly more problems that can be solved with
multi-processing than problems that _require_ multi-threading, but they do
exist.

There is also another layer of simulated parallelism via coroutines...but I
was just trying to point out that there was potential for confusion in this
piece.

~~~
dom0
You can of course use multithreading to get parallelism, e.g. if the expensive
calculations are done not in Python itself but in some other library, be it
cryptography, compression, numpy, serialization/deserialization, and so on.
I/O works in parallel as well.
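
A minimal sketch of that effect, using hashing as the workload (CPython's
hashlib drops the GIL while digesting large buffers, so these threads can
genuinely overlap on multiple cores):

    from concurrent.futures import ThreadPoolExecutor
    import hashlib, os

    # Eight 16 MB buffers to hash; the heavy work happens in OpenSSL,
    # not in Python bytecode, so the GIL is released during each digest.
    blobs = [os.urandom(16 * 1024 * 1024) for _ in range(8)]

    with ThreadPoolExecutor(max_workers=4) as ex:
        digests = list(ex.map(lambda b: hashlib.sha256(b).hexdigest(), blobs))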

~~~
ddorian43
It's still 1 core for multithreading (on the Python side).

~~~
gshulegaard
Just to add more context for others...

C extensions called from Python _are_ still subject to the GIL of the calling
interpreter[1].

In fact, this guaranteed thread safety has led to the creation of many
non-thread-safe Python C extensions, which is now a critical blocker to
efforts to remove the GIL[2], although there are a number of other issues as
well.

As noted in the Stack Overflow answer, you _can_ release the GIL, but there
are some tricks to doing so safely[3].

But threads of a C extension library are not _necessarily_ pinned to a single
core by the GIL like regular Python bytecode.

That said, many performance-oriented libraries like numpy[4] forgo
multithreading by default in many cases and rely on parallelization by the
user (programmer)[5].
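
A hedged sketch of that user-driven parallelization, splitting a numpy array
across worker processes rather than relying on numpy to thread internally:

    import numpy as np
    from multiprocessing import Pool

    def chunk_mean(chunk):
        # np.mean runs single-threaded here; the parallelism comes from
        # the worker processes, not from numpy itself.
        return chunk.mean()

    if __name__ == "__main__":
        data = np.random.rand(8, 10**6)
        with Pool(4) as pool:
            means = pool.map(chunk_mean, data)  # one row per task
        print(np.mean(means))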

[1] [http://stackoverflow.com/questions/651048/concurrency-are-python-extensions-written-in-c-c-affected-by-the-global-inter](http://stackoverflow.com/questions/651048/concurrency-are-python-extensions-written-in-c-c-affected-by-the-global-inter)

[2] [https://youtu.be/P3AyI_u66Bw](https://youtu.be/P3AyI_u66Bw)

[3] [https://docs.python.org/3/c-api/init.html#thread-state-and-the-global-interpreter-lock](https://docs.python.org/3/c-api/init.html#thread-state-and-the-global-interpreter-lock)

[4] [http://www.numpy.org/](http://www.numpy.org/)

[5] [http://stackoverflow.com/questions/16617973/why-isnt-numpy-mean-multithreaded](http://stackoverflow.com/questions/16617973/why-isnt-numpy-mean-multithreaded)

------
rdtsc
There is a nice hack for using the multiprocessing module with IO-bound code
without having to fork processes:

    
    
       # multiprocessing.dummy exposes the multiprocessing.Pool API backed
       # by threads instead of processes -- handy for IO-bound work.
       from multiprocessing.dummy import Pool as ThreadPool
       pool = ThreadPool(16)
       res = pool.map(one_arg_fun, [arg1, arg2, ...])
    

A three-line IO parallelism speedup trick. I used it to fetch stuff from
multiple servers recently.

~~~
ipunchghosts
Can you elaborate?

~~~
jessedhillon
It's the equivalent of (edit: actually an improvement over):

    
    
        from threading import Thread

        def blocking_function(arg):
            ...

        pool = []
        for arg in args[:16]:
            # Pass arg explicitly; a bare `lambda: blocking_function(arg)`
            # late-binds and could see a later value of `arg`.
            thread = Thread(target=blocking_function, args=(arg,))
            thread.start()
            pool.append(thread)
    
    

Edit: see below; this doesn't include job queueing but rather hard-limits
your input to a maximum of 16 args. The point is, the code is a nice, easy
way to start a number of threads working on a list of arguments.

~~~
rdtsc
Almost, but as quinnftw mentioned it does a bit more; specifically, it limits
the number of threads it runs to 16. I can pass 1000 URLs or arguments into it
and it won't spawn 1000 threads but will keep it at 16 max. Also, I think it
does a bit better job with exception propagation.
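
For comparison (a sketch; one_arg_fun and args stand in for your own callable
and inputs), the stdlib's concurrent.futures gives the same bounded-pool
behavior:

    from concurrent.futures import ThreadPoolExecutor

    # At most 16 threads regardless of how many args you pass, and a
    # worker's exception re-raises when its result is consumed.
    with ThreadPoolExecutor(max_workers=16) as pool:
        res = list(pool.map(one_arg_fun, args))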

------
cossatot
Some good alternatives, for when you just want quick and easy parallelization:

Joblib[0]: 'Embarrassingly parallel for loops'. Basically just write a
generator and get multicore processing on it. Pretty straightforward.
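
Roughly, the joblib pattern looks like this (a minimal sketch):

    from math import sqrt
    from joblib import Parallel, delayed

    # Each delayed(sqrt) call becomes a task dispatched to one of four
    # worker processes.
    results = Parallel(n_jobs=4)(delayed(sqrt)(i ** 2) for i in range(10))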

Multiprocess[1]: This is a fork of the multiprocessing module. The biggest
benefit to me (the one time I used it) is that it's easier to create shared
data structures for multiprocessing (which I couldn't do with
multiprocessing.Pool). Last week I had to do a really long graph calculation
that only needed to return a result for a very, very small fraction of the
arguments. Joblib stored all of the null results as 'None', which tore through
my RAM after billions of relatively fast calculations before crashing. But
using a shared dictionary with multiprocess.Pool and imap_unordered, I was
able to use multicore processing, adding items to the dict only when the
right conditions were met and discarding the 'None's. RAM use was very
minimal.
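
A rough sketch of that shared-dict pattern (expensive_graph_calc is a
hypothetical stand-in for the real calculation):

    from multiprocess import Pool, Manager  # pip install multiprocess

    def expensive_graph_calc(key):
        # Hypothetical stand-in: returns a result for very few keys.
        return key if key % 10**4 == 0 else None

    def check(job):
        key, shared = job
        result = expensive_graph_calc(key)
        if result is not None:        # keep the hits, discard the Nones
            shared[key] = result

    if __name__ == "__main__":
        shared = Manager().dict()     # proxy dict shared across workers
        jobs = ((k, shared) for k in range(10**6))
        with Pool(4) as pool:
            for _ in pool.imap_unordered(check, jobs):
                pass
        print(len(shared))            # only the non-None results remain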

[0]:
[https://pythonhosted.org/joblib/index.html](https://pythonhosted.org/joblib/index.html)
[1]:
[https://pypi.python.org/pypi/multiprocess](https://pypi.python.org/pypi/multiprocess)

~~~
d33
Also consider using Celery[0], which can distribute the computation across
different hosts as well.
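
A minimal Celery sketch (the Redis broker URL is an assumption; any supported
broker works):

    # tasks.py
    from celery import Celery

    app = Celery('tasks', broker='redis://localhost:6379/0')  # assumed broker

    @app.task
    def add(x, y):
        return x + y

    # From any host that can reach the broker:
    #   add.delay(2, 3)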

[0] [http://www.celeryproject.org/](http://www.celeryproject.org/)

------
sp332
I used this module for a fractal rendering program years ago. The APIs are
(were?) a little annoying - last I checked, the relevant code had to be
picklable. But it used all my CPUs and scaled just fine.
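
The picklability constraint in a nutshell (a minimal sketch): module-level
functions work with Pool.map, but lambdas and closures don't.

    from multiprocessing import Pool

    def square(x):      # module-level function: picklable
        return x * x

    if __name__ == "__main__":
        with Pool() as pool:
            print(pool.map(square, range(5)))
            # pool.map(lambda x: x * x, range(5))  # raises PicklingError:
            #                                      # lambdas can't be pickled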

------
lowglow
I had a really rough time finding practical/real-world examples of this
stuff. After digging around and piecing together a bunch of
tips/guides/pointers, I managed to get what I wanted working.

If you're looking for some real-world examples of using multiprocessing (v3),
check here:
[https://github.com/dpgailey/asteria/blob/master/asteria-v3/client.py](https://github.com/dpgailey/asteria/blob/master/asteria-v3/client.py)

~~~
lowglow
Also, if anyone has a ton of experience with task
management/scheduling/multiprocessing, I'd love to buy you a coffee and pick
your brain -- reach out! :)

------
mchristen
Nice job hijacking my browser back button...

~~~
Sohcahtoa82
Works fine for me.

~~~
mchristen
Latest Chrome on MacOS. If you hit back after going to that page you have to
click through about 4-5 other articles before you ultimately get back to this
comment thread.

Even happens if you open in a new tab.

~~~
gshulegaard
Works fine for me as well. Also Chrome on MacOS...although I don't know if I
am on the "latest" version.

~~~
mchristen
Happens w/ Chromium and also Firefox on Debian.

In fact, I can't find a single browser/OS combo it doesn't happen on.

~~~
gshulegaard
MacOS Sierra 10.12.3/Chrome 56.0.2924.87 (64-bit)

Don't have my Ubuntu machine to test it there though.

------
brianolson
If you need multi-core for performance, first you should rewrite in a compiled
language instead of Python. Java/C/C++/Go/FORTRAN/whatever will run 10 to 20
times faster before even worrying about running multi-core.

~~~
kirkdouglas
Python is a compiled language.

~~~
corysama
That is technically correct, but not useful for the conversation...

~~~
jMyles
No, it's not. That's like saying (and I know this has been said again and
again) that "War and Peace" is a hardcover book. It's a non-sequitur.

