
Developing a computational pipeline using the asyncio module in Python 3 - jaxondu
http://www.pythonsandbarracudas.com/blog/2015/11/22/developing-a-computational-pipeline-using-the-asyncio-module-in-python-3
======
hyperion2010
Using a threadpool with asyncio doesn't buy you any parallelism; you have to
use a processpool, which is fine for a data processing pipeline. Asyncio does
not let you bypass the GIL, but it does vastly reduce the cost of spinning up
absurd numbers of sockets and the like.

~~~
superbatfish
I saw this article, enjoyed it, and then said to myself "Now let me go read
the inevitable well-meaning but misguided comments about the GIL." And sure
enough, the very top comment is a well-meaning but misguided comment about the
GIL.

The author of this article is talking about CPU-intensive tasks that involve
C extensions. Such extensions typically _release_ the GIL during their
execution, which means that you can effectively parallelize such workloads in
a thread pool. (He explicitly mentions this about NumPy/SciPy, but most other
CPU-intensive extension libraries behave the same way.) Heavy users of NumPy
rely on thread-level parallelism frequently.
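
As a dependency-free illustration of the point, here `hashlib` stands in for NumPy: its C implementation also releases the GIL while hashing large buffers, so the threads below can genuinely run in parallel. (The buffer sizes and worker count are arbitrary.)

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# Four large buffers; CPython's hashlib releases the GIL while
# hashing big inputs, so these calls are not serialized.
chunks = [bytes([i]) * 1_000_000 for i in range(4)]

def digest(data):
    return hashlib.sha256(data).hexdigest()

with ThreadPoolExecutor(max_workers=4) as pool:
    # map preserves input order, so digests[i] matches chunks[i].
    digests = list(pool.map(digest, chunks))

print(digests)
```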

~~~
hyperion2010
Yep. I thought I was missing something, since he was talking about CPU-bound
tasks and threads. If you are using C extensions, it is wasteful to spin up
additional interpreters, and you can also hit fun bugs like NumPy reusing the
same random seed in child processes!

------
semi-extrinsic
It would be nice if someone who groks the async "with" statement could add
proper opcode support for it in pycdc. Better to do it when you have spare
time than when you've clobbered an important .py file and want to recover it
from the .pyc file ASAP. (pycdc can be a life saver.)

[https://github.com/zrax/pycdc/issues/70](https://github.com/zrax/pycdc/issues/70)

------
baldeagle
Isn't this example just like map/reduce in python?

I guess it is more like parallel execution than async, since I associate async
with small sleep costs and cheap task switching. How does the GIL play into
this?

~~~
rspeer
In asyncio and in most other asynchronous frameworks for Python, threads
aren't usually involved.

Your Python interpreter is doing one thing at a time. However, a function can
get to a statement that needs to wait for something _outside_ of the
interpreter -- usually I/O, but it could also be a timer.

What happens at that point isn't thread-switching. The function gets suspended
and control goes elsewhere in the program, just like if you yielded from a
generator. (It's the same mechanism.) The function can be woken up again by
feeding it the data it was waiting for -- the asyncio main loop is responsible
for this.
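
That suspend-and-resume mechanism can be seen in a few lines (the coroutine names and delays below are just illustrative). Each `await` hands control back to the event loop, the way `yield` hands control back from a generator, and the loop resumes each coroutine when its timer fires:

```python
import asyncio

async def waiter(name, delay):
    # "await" suspends this coroutine and returns control to the
    # event loop, just as "yield" suspends a generator.
    await asyncio.sleep(delay)
    return name

async def main():
    # Both coroutines wait concurrently on a single thread; gather
    # returns their results in argument order, not completion order.
    return await asyncio.gather(waiter("slow", 0.02), waiter("fast", 0.01))

print(asyncio.run(main()))
```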

If you actually want your processor to do more things at the same time,
though, asyncio's model of asynchronous computing isn't going to do it. Python
programmers are afraid of threads (I mean, they have a lot of disadvantages
and not much benefit in Python, due to the GIL), so they'll tend to use
multiple processes for that, instead of threads.

EDIT: But at this point I realize you're asking about a much more specific
thing in the article. This article is asking you to not be afraid of combining
threads and asyncio. It's suggesting that a useful asynchronous thing you can
do is to spawn a thread, perform a computation in it, and wait for the result.

At this point you do need to worry about the GIL. C extensions can release the
GIL, so the article suggests you use one of those (NumPy).
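
A minimal sketch of that combination (the `checksum` helper is hypothetical, with `hashlib` standing in for a GIL-releasing NumPy routine): the coroutine suspends at `await`, the computation runs in asyncio's default thread pool, and the coroutine is resumed with the result.

```python
import asyncio
import hashlib

def checksum(data):
    # A C-level routine; CPython's hashlib releases the GIL for
    # large inputs, so this thread runs alongside the event loop.
    return hashlib.sha256(data).hexdigest()

async def main():
    loop = asyncio.get_running_loop()
    # None selects asyncio's default ThreadPoolExecutor; the coroutine
    # suspends here and resumes when the worker thread finishes.
    return await loop.run_in_executor(None, checksum, b"x" * 1_000_000)

print(asyncio.run(main()))
```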

~~~
Nrpf
Or you can use Numba to write GIL-free imperative code as well.

------
1st1
Looks like everybody is in love with async/await in Python :)

~~~
pekk
I'm not sure that's true. Some people weren't happy about the PEP. Few people
are using it in any case, so most are either still unaware of it or haven't
used it yet themselves.

~~~
superbatfish
I'm very excited to start using it. It will be the thing that finally pushes
me to Python 3.

>Few people are using it in any case

Probably true, but it's only been out for 2 months now, and most people who
needed this feature are either using an ugly workaround (guilty) or have moved
on to other languages that already had this feature. This time, I'm glad
Python joined the crowd.

