
Parallel tasks in Python: concurrent.futures - gibuloto
https://vinta.ws/code/parallel-tasks-in-python-concurrent-futures.html
======
pletnes
One common misconception (or should I say, overgeneralization) is repeated in
the article: that threads are always unsuited to CPU-intensive work.

For instance, most numpy operations release the GIL, meaning that you can
perform heavy computation on multiple threads simultaneously. Certain other C
extensions do the same, including some bits of the standard library. The usual
caveats apply about threading bugs, of course.

Another detail is that numpy linked against e.g. Intel MKL will multithread
some operations by default. Nesting your own threads on top of that is likely
to cause a slowdown.
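As a concrete sketch of the first point, here's numpy summing independent
blocks from a thread pool (the helper name and block sizes are made up for
illustration):

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

def column_sums(block):
    # np.sum drops into C and releases the GIL for the bulk of the work,
    # so several of these calls can run on different cores at once.
    return block.sum(axis=0)

blocks = [np.ones((1000, 4)) for _ in range(8)]

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(column_sums, blocks))

total = np.vstack(results).sum(axis=0)  # -> array([8000., 8000., 8000., 8000.])
```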

~~~
jzwinck
Only np.dot() has intrinsic multithreading. No other functions do. Bizarrely,
np.dot() is in some cases the fastest way to do things other than a dot
product (like copy or multiply).

~~~
vladf
Well, np.linalg routines that call LAPACK may also be multithreaded.

~~~
jzwinck
That's misleading--you say linalg routines "may be" multithreaded, but the
vast majority of them never have been. matmul and einsum, despite being
candidates for intrinsic multithreading, are not multithreaded. You can read
discussion about that here: [https://jackkamm.github.io/blog/a-parallel-
einsum/](https://jackkamm.github.io/blog/a-parallel-einsum/)

~~~
vladf
I'm sorry, isn't "objects belonging to class C may have property P" a fair way
to say that some members have it and some do not? I don't see how that's
misleading. I was correcting your statement that np.dot is the only parallel
function.

~~~
jzwinck
Can you name one function other than dot() and tensordot() which has intrinsic
multithreading?

~~~
vladf
eig, inv, svd, det, at least, as of 2010.
[https://dpinte.wordpress.com/2010/01/15/numpy-performance-
im...](https://dpinte.wordpress.com/2010/01/15/numpy-performance-improvement-
with-the-mkl/)

------
orf
> posted in Jan. 2017

Now that we have asyncio and awesome libraries like aiohttp [1], you can get
much, much higher throughput than you'd ever achieve with threads, with less
code.

1\. [http://aiohttp.readthedocs.io/](http://aiohttp.readthedocs.io/)
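As a rough illustration of why (this sketch uses asyncio.sleep as a stand-in
for an aiohttp request, so it runs without a network):

```python
import asyncio
import time

async def fetch(i):
    # Stand-in for an aiohttp request: an awaited sleep yields the event
    # loop exactly the way awaiting a network response does.
    await asyncio.sleep(0.1)
    return i

async def main():
    # 100 "requests" run concurrently on one thread; total time is close
    # to a single 0.1 s round trip, not 100 of them.
    return await asyncio.gather(*(fetch(i) for i in range(100)))

start = time.monotonic()
results = asyncio.run(main())
elapsed = time.monotonic() - start
```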

~~~
miracle2k
I found this article extremely persuasive, and it matches my own experiences
with asyncio: The performance gain might be there if most of what you do is
waiting for a network response, but even a small amount of data processing
will make your program CPU bound pretty quickly.

[http://techspot.zzzeek.org/2015/02/15/asynchronous-python-
an...](http://techspot.zzzeek.org/2015/02/15/asynchronous-python-and-
databases/)

~~~
zackelan
Note that it's not either/or - you can dispatch work from an event loop to a
thread pool (or a process pool) with loop.run_in_executor [0], while
loop.call_soon_threadsafe [1] can be used by worker threads to add callbacks
to the event loop.

This means that the "frontend" of a service can be asyncio, allowing it to
support features like WebSockets that are non-trivial to support without
aiohttp or a similar asyncio-native HTTP server [2], while the "backend" of
the service can be multi-threaded or multi-process for CPU-bound work.

0: [https://docs.python.org/3/library/asyncio-
eventloop.html#exe...](https://docs.python.org/3/library/asyncio-
eventloop.html#executor)

1: [https://docs.python.org/3/library/asyncio-
eventloop.html#asy...](https://docs.python.org/3/library/asyncio-
eventloop.html#asyncio.AbstractEventLoop.call_soon_threadsafe)

2: Flask-SocketIO, for example, requires that you use eventlet or gevent,
which are the "legacy" ways of doing asynchronous IO: [https://flask-
socketio.readthedocs.io/en/latest/](https://flask-
socketio.readthedocs.io/en/latest/)

------
rcthompson
Will this finally let me write a parallel Python script that doesn't explode
when I press control+C?

~~~
metalliqaz
Depends. What do you want to happen when you press ctrl+C?

~~~
rcthompson
I want it to not hang forever, not spam the console with pages of exception
tracebacks, and not require a ton of boilerplate process-management code to
accomplish the above. Ideally it would also allow me to handle other
exceptions (i.e. besides KeyboardInterrupt) that occur both in the main
process and child processes. I've never figured out how to do this with
multiprocessing, despite lots of attempts.

~~~
mixmastamyk
Don't know what you did wrong, but you should be able to catch exceptions at
their appropriate level and shut down gracefully.

~~~
icegreentea2
multiprocessing was designed to mirror the threading library... so the
exceptions are specifically set up not to cross thread/process boundaries. But
the same strategies for handling multithreaded exceptions and
KeyboardInterrupt should apply.

If you want the child processes to shut down when the main process goes down,
you should be able to just set .daemon = True on the process objects before
you start them.

If you want exceptions in the child processes to propagate up to the main
process and be handled there, it looks like you'd just need to send the
exception across a queue or something in multiprocessing. In the new futures
library, your future (wrapping the task) has a result() method you can call.
If the child process ran into an exception, that exception is re-raised in the
main process when you call result().

------
simonw
I used the Python 2 backport of concurrent.futures for a project recently
(parallelizing calls to an external API) and it worked fantastically well.
It's a really nice model for doing concurrent outbound I/O in a bunch of
threads.
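The pattern looks roughly like this (call_api here is a stand-in for the real
HTTP call, which would spend most of its time blocked on the network — exactly
when the GIL is released):

```python
from concurrent.futures import ThreadPoolExecutor

def call_api(endpoint):
    # Stand-in for an outbound HTTP request, e.g. requests.get(base + endpoint).
    return "response from " + endpoint

endpoints = ["/items/%d" % i for i in range(20)]

# map() preserves input order even though the calls overlap in time.
with ThreadPoolExecutor(max_workers=8) as pool:
    responses = list(pool.map(call_api, endpoints))
```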

------
ggm
I tried using threads for reading multiple pipes, to centralise a logfile
sorting problem (each discrete logfile is a gz which is itself only partially
in order, and then a merge-sort has to be performed between files). It was
enjoyable to work on, but ultimately I found the solution only marginally
better than explicit processes feeding a single reader doing round-robin. I
think the lesson I learned is that if the _problem_ integrates back into a
single context, there isn't much you can do to avoid that bottleneck once all
the other parallelism opportunities have been exploited.

------
rflrob
What's the advantage of using a ProcessPoolExecutor over just using
multiprocessing? Is it that there's a single interface that you can use for
both threads and processes?

~~~
icegreentea2
I don't really think there's an advantage. Just like multiprocessing tries to
mirror the threading interface, ProcessPoolExecutor just mirrors the threaded
implementation of the new futures-based concurrency interface.

I think futures are nicer for certain kinds of interactions. For example,
futures 'return' actual values, so it's nice for dispatching a task whose
result you want back. Futures also raise exceptions (when you try to inspect
their results, if an exception occurred in the task). This can make for
cleaner error-handling code.
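For example, a rough sketch of that error-handling pattern (work() and the
failing input are made up):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def work(n):
    if n == 3:
        raise RuntimeError("task 3 failed")
    return n * n

results, errors = {}, {}
with ThreadPoolExecutor() as pool:
    futures = {pool.submit(work, n): n for n in range(5)}
    for fut in as_completed(futures):
        n = futures[fut]
        try:
            results[n] = fut.result()  # re-raises here if the task raised
        except RuntimeError as exc:
            errors[n] = str(exc)
```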

------
tyu100
I'm using concurrent.futures in production and its use of the multiprocessing
module caused the Python grpc library to break in a really strange and hard-
to-debug way:

[https://github.com/grpc/grpc/issues/13873](https://github.com/grpc/grpc/issues/13873)

I suspect it's not the only Python library that will see issues if you are
running it in the Future context.

------
mixmastamyk
Cool, just wrote my first code with this module a week ago. A client needed to
run background tasks under Flask without the ops complexity or dev time needed
to set up a job queue.
[https://stackoverflow.com/a/39008301/450917](https://stackoverflow.com/a/39008301/450917)

------
Rotareti
I wrote a similar interface to run asyncio compatible ProcessPool/ThreadPool
executors, a couple of days ago:

[https://github.com/feluxe/aioexec](https://github.com/feluxe/aioexec)

------
amelius
Sad to see that Python still suffers from the Global Interpreter Lock (GIL),
and that the only way out is still to use multiple processes (which causes
problems of its own, e.g. sharing of large data structures becomes expensive).

~~~
dullgiulio
Only for computationally expensive operations done in interpreted Python.

C extensions, IO operations etc. typically release the lock. In practice the
GIL is a problem only when profiling shows it to be a problem.

Python is used a lot in the data analysis world and nobody cares about the
lock, because only a fraction of the CPU time is spent holding it.
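A quick way to see this (time.sleep releases the GIL the same way a blocking
socket read does):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def blocking_io():
    # Releases the GIL while "waiting", like a socket read or a
    # C-extension call would.
    time.sleep(0.2)

start = time.monotonic()
with ThreadPoolExecutor(max_workers=5) as pool:
    for _ in range(5):
        pool.submit(blocking_io)
# Exiting the with-block waits for all five tasks; because they sleep
# concurrently, the total is close to 0.2 s, not 1.0 s.
elapsed = time.monotonic() - start
```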

~~~
amelius
> Only for computationally expensive operations done in interpreted Python. C
> extensions, IO operations etc. always release the lock.

So you are saying it is a problem in Python but not in other languages? Which
is exactly my point :)

------
Dowwie
I found concurrent.futures.ThreadPoolExecutor useful for database seeding,
where I invoke a whole lot of SQLAlchemy Core inserts.

------
Jsharm
Does anyone have a recommendation for what to use for a cache shared between
processes? Would hdf5 work?

~~~
cranklin
If you want a file-based cache, yes.

------
solotronics
So how does this compare to something like Deco? [https://github.com/alex-
sherman/deco](https://github.com/alex-sherman/deco). I guess since this uses a
single GIL it's good for IO-limited things?

~~~
icegreentea2
As written, the code in the blogpost is good for IO limited things. But as it
notes, if you replace 'ThreadPoolExecutor' with 'ProcessPoolExecutor' then you
get actual multiprocessing, and you may be able to get speedup on compute
bound tasks.

The linked repo looks like some nice wrappers/decorators around the 'old'
multiprocessing library to make it really easy to parallelize a bunch of
function calls within a blocking function.

------
delaaxe
The last for loop of the last code example doesn't need to be under the with
statement.

------
dokument
Would this allow separate GC'ng for each task?

~~~
loganekz
Only with ProcessPoolExecutor [1]. If you use a thread pool or asyncio, it
will be a single Python process/GIL.

[1]: [https://docs.python.org/3/library/concurrent.futures.html#pr...](https://docs.python.org/3/library/concurrent.futures.html#processpoolexecutor)

