
Parallelising Python with Threading and Multiprocessing - shogunmike
http://www.quantstart.com/articles/Parallelising-Python-with-Threading-and-Multiprocessing
======
bquinlan
I'd like to point out that the Python standard library offers an abstraction
over threads and processes that simplifies the kind of concurrent work
described in the article:
[https://docs.python.org/dev/library/concurrent.futures.html](https://docs.python.org/dev/library/concurrent.futures.html)

You can write the threaded example as:

    
    
      import concurrent.futures
      import itertools
      import random
    
      def generate_random(count):
        return [random.random() for _ in range(count)]
    
      if __name__ == "__main__":
        with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
          executor.submit(generate_random, 10000000)
          executor.submit(generate_random, 10000000)
        # I guess we don't care about the results...
    

Changing this to use multiple processes instead of multiple threads is just a
matter of s/ThreadPoolExecutor/ProcessPoolExecutor.
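For instance (a sketch with smaller counts; note that with processes the submitted callable must be picklable, i.e. defined at module top level):

```python
import concurrent.futures
import random

def generate_random(count):
    # Defined at module top level so ProcessPoolExecutor can pickle it
    # and ship it to the worker processes.
    return [random.random() for _ in range(count)]

def main():
    with concurrent.futures.ProcessPoolExecutor(max_workers=2) as executor:
        f1 = executor.submit(generate_random, 100_000)
        f2 = executor.submit(generate_random, 100_000)
        return f1.result(), f2.result()

if __name__ == "__main__":
    r1, r2 = main()
    print(len(r1), len(r2))
```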

You can also write this more idiomatically (and collect the combined results)
as:

    
    
      if __name__ == "__main__":
        with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
          out_list = list(
          executor.map(lambda _: random.random(), range(20000000)))
    

In this example, this will actually be quite a bit slower, because the work
item (generating a single random number) is trivial compared to the overhead
of maintaining a work queue of 20,000,000 items. But in the more typical case,
where each work item takes more than a millisecond or so, it is better to let
the executor manage the division of labour.
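To illustrate that point (a sketch with smaller counts): handing each worker one coarse batch, rather than one trivial item per call, amortises that queue overhead.

```python
import concurrent.futures
import random

def generate_batch(count):
    # One "work item" now produces a whole batch of numbers.
    return [random.random() for _ in range(count)]

with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    # Two coarse submissions instead of hundreds of thousands of tiny ones.
    batches = executor.map(generate_batch, [100_000, 100_000])
    out_list = [x for batch in batches for x in batch]
```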

~~~
canjobear
To take advantage of process level parallelism you still have to have a
pickle-able function, i.e. defined at the top level in a module.

    
    
      In [1]: import concurrent.futures
    
      In [2]: with concurrent.futures.ProcessPoolExecutor(max_workers=8) as executor:
         ...:     out_list = list(executor.map(lambda _: random.random(), range(1000000)))
         ...:
      Traceback (most recent call last):
        File "/usr/local/Cellar/python/2.7.6_1/Frameworks/Python.framework/Versions/2.7/lib/python2.7/multiprocessing/queues.py", line 266, in _feed
          send(obj)
      PicklingError: Can't pickle <type 'function'>: attribute lookup __builtin__.function failed

~~~
e12e
Good point. Just a couple of notes on futures: 1) they're backported to
Python 2 [1], and 2) as you say, to make the example work you need a
picklable function. For example, if you have IPython running in a virtualenv:

    
    
        import pip
        pip.main(["install","futures"])
    
        import random
    
        def l(_):
            return random.random()
    
        with f.ProcessPoolExecutor(max_workers=4) as ex:
            out_list = list(ex.map(l, range(1000)))
    
        len(out_list)
        #> 1000
    

[1]
[https://pypi.python.org/pypi/futures](https://pypi.python.org/pypi/futures)

~~~
e12e
Whoops, forgot to add a line to import futures:

    
    
        import futures as f #Include after pip.main(...

------
halayli
This example is not very realistic; it narrows things down to the case where
a job can be divided into isolated tasks with no shared data/state.

Often, threads need to update a shared dict/list, etc. With multiprocessing
this cannot be done directly. You can use a Queue for this, but it's horribly
inefficient.
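The Queue pattern being alluded to looks roughly like this (a sketch): every update to the "shared" dict becomes a message across a process boundary, which is where the inefficiency comes from.

```python
import multiprocessing

def worker(q, items):
    # No shared dict to mutate: each result has to be shipped back
    # through the queue, one message per update.
    for item in items:
        q.put((item, item * item))

if __name__ == "__main__":
    q = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=worker, args=(q, list(range(i, 10, 2))))
             for i in range(2)]
    for p in procs:
        p.start()
    # Rebuild the "shared" dict in the parent from the queued messages.
    shared = dict(q.get() for _ in range(10))
    for p in procs:
        p.join()
    print(len(shared))
```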

Generally speaking if you need performance and Python is not meeting the
requirements then you are better off using another language.

~~~
shogunmike
I agree, the scope of the article is somewhat specific to the "toy" example
presented.

Generally I would use C++ or (gasp!) Fortran with either MPI or CUDA for these
sorts of tasks if performance was the most critical factor.

I'm excited by the Julia language though!

~~~
js2
It's py3, but there is a backport of it for py2 that works wonderfully. I've
recently begun using it and will never look back to multiprocessing (on which
it is built).

~~~
andreasvc
You probably meant to reply to the thread about concurrent.futures.

------
tiger10guy
For the everyday case, when I want to make embarrassingly parallel operations
in Python go fast, I find joblib to be a pretty good solution. It doesn't work
for everything, but it's quick and simple where it does.

[https://pythonhosted.org/joblib/](https://pythonhosted.org/joblib/)

~~~
shogunmike
I was going to discuss Parallel Python
([http://www.parallelpython.com/](http://www.parallelpython.com/)) in the next
article - have you used that? How does it compare to joblib?

~~~
tiger10guy
I haven't used that, but it looks interesting. After a brief look it seems
like they both submit jobs to Python interpreters started up in other
processes.

Parallel Python (PP) seems to have a clunkier API, but also more
functionality. I think the biggest advantage is that it can distribute jobs
over a cluster instead of just different cores on the same machine. I might
look into PP if I need to do things on a cluster, but I think I'll still stick
with joblib when I'm on one machine.

That's just my first impression. I'd be interested to read your blog post.

------
zo1
I've had good success using Celery to parallelize tasks/jobs in Python.

www.celeryproject.org

Also, it has a very nice concept called canvas that allows you to
chain/combine the data/results of different tasks together.

It also allows you to swap out different implementations of the messaging
infrastructure that Celery uses to communicate and dish out tasks.

------
dekhn
For python developers who dislike the continued existence of the GIL in a
multicore world, and who feel that multiprocessing is a poor response given
the existence proofs of IronPython and Jython as non-GIL interpreter
implementations, please consider moving to Julia.

Julia addresses nearly all the problems I've found with Python over the years:
poor performance, poor threading support on multicore machines, integration
with C libraries, etc. I was a big adherent of Python, but as machines got
more capable, given the ongoing resistance to solving the GIL problem (which
IronPython demonstrated can be done with reasonable impact on serial
performance), I could not continue using the language except for legacy
applications.

~~~
andreasvc
I don't know what you are talking about. The GIL has never bothered me. I have
been using Python together with multiprocessing and threads with
concurrent.futures. For integration with C libraries I use Cython; generally
interfacing with C is one of Python's strong points, don't know where you got
that from. Have you actually looked into why Python has a GIL? It's a pretty
clear trade-off, I think. It seems intuitive to me that requiring lots of
small locks to avoid a global lock might not be beneficial, and attempts to
get rid of it, such as what PyPy is doing with software transactional memory,
involve big changes, so it's not like you can decide overnight "let's get rid
of the GIL".

Julia looks nice but comes with its own set of problems: no inheritance,
1-based indexing, fewer libraries, less maturity.

~~~
dekhn
Yes, I have looked into why Python has a GIL. I've even written C interface
code which released the GIL and then reacquired it when necessary (I know a
ton about this, having spent too many of the last 20 years integrating C and
Python). Yeah, I actually know what the tradeoffs are and can evaluate them
(I used to work with the author of IronPython).

You have several choices for C integration in Python: SWIG, which is now
generally considered a huge mess; hand-wrapping, which is a tedious pain; and
dlopen/dlsym methods that talk to the C API directly (which requires
something like GCCXML to handle type recognition for complicated APIs).

I don't think PyPy's approach to transactional memory is the right direction
either.

In short: multithreading on multicore machines is how you write performant
software in industry. The hardware is designed for it, the compilers are
designed for it, and if you don't take advantage of it, you're just wasting
machines.

Now people could argue that multiprocessing addresses it, but it's just
message passing between different process spaces, which while a wonderful and
powerful tool, is ultimately just more cumbersome (hey, I used to write big
MPI/OpenMP apps that did both models at the same time).

Anyway, the ultimate existence proof is that IronPython was both faster
serially _and_ in parallel, without the GIL, than CPython. So basically we
_know_ it's possible. The Python developers have no will, inclination, or
ability to make it so.

~~~
andreasvc
It would be interesting if you could give some arguments for your positions.
Why is STM not the way? Why, if IronPython is as good as you say it is,
doesn't it see greater adoption or why don't other implementations use its
strategies for removing the GIL? Wikipedia says that IronPython scores worse
on PyStone benchmarks compared to CPython, and it's likely that this is a
consequence of IronPython's fine-grained locking which is required in the
absence of a GIL.

As for interfacing with C, like I said, Cython really makes this a lot easier
than the approaches you mention. You mention IronPython as not having a GIL,
but then IronPython doesn't allow easy interfacing with C code, e.g., it's not
compatible with numpy ...

------
wbsun
Python threads, aren't they just single threaded execution?!

~~~
sitkack
Yes, they are concurrent but not parallel.

~~~
deathanatos
See other replies to the post you've replied to: threads in Python can be
"parallel", if one of the threads releases the GIL. This can happen during
calls to I/O, or more generally, any C call that decides to release the GIL.
Most of the time, you're doing I/O anyways, so it suffices. If you're not
(you're truly doing computation), then there is multiprocessing.
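A quick way to see this (a sketch using time.sleep, which releases the GIL just as blocking I/O does): four threads that each block for 0.2s finish together, in roughly 0.2s rather than 0.8s.

```python
import threading
import time

def blocking_call():
    # sleep() releases the GIL, just like a blocking read() or recv() would,
    # so the other threads keep running in the meantime.
    time.sleep(0.2)

start = time.time()
threads = [threading.Thread(target=blocking_call) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.time() - start
print(round(elapsed, 2))
```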

------
matrixise
Have you seen that there is an error in the code for the threading part?

The right way to create a thread is:

thread = threading.Thread(target=CALLABLE, args=ARGS)

and not:

thread = threading.Thread(target=CALLABLE(ARGS))
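A sketch of the difference: in the broken form, CALLABLE(ARGS) runs immediately in the main thread, and Thread receives its return value as target.

```python
import threading

results = []

def record(value):
    results.append(value)

# Wrong: record(42) would run immediately in the main thread, and
# Thread would receive its return value (None) as the target.
# threading.Thread(target=record(42)).start()

# Right: the callable and its arguments are passed separately,
# so record(42) runs in the new thread.
t = threading.Thread(target=record, args=(42,))
t.start()
t.join()
```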

------
CraigJPerry
For the example task we could use the multiprocessing Pool and (the
undocumented) ThreadPool.

This implements the worker pool logic already so we don't have to.
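For instance (a sketch; multiprocessing.pool.ThreadPool is the undocumented thread-backed twin of Pool, with the same interface):

```python
import random
from multiprocessing.pool import ThreadPool

def generate_random(count):
    return [random.random() for _ in range(count)]

# The pool owns the worker threads and the work-queue logic for us.
with ThreadPool(processes=2) as pool:
    batches = pool.map(generate_random, [100_000, 100_000])
out_list = [x for batch in batches for x in batch]

# For processes, swap ThreadPool for multiprocessing.Pool; the callable
# must then be picklable (defined at module top level, as here).
```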

~~~
shogunmike
I was considering adding this but I wasn't fully sure that it would be good
content for a "first intro to parallel programming" article. Perhaps a good
candidate for the next one?

Thanks for mentioning it though.

------
thikonom
For network-bound operations, Twisted's cooperate / coiterate come in handy.

------
eudox
Or just use a better language. One that is actually compiled and fast?

~~~
zo1
Speed is just one of the many things that people consider when evaluating what
language/stack to use for a particular job/task.

