
Python 3.5 and Multitasking - sonicrocketman
http://brianschrader.com/archive/python-35-and-multitasking/
======
ikken
I've been using the new async/await syntax to write a beautiful asynchronous
websockets server, and I fell in love with it. It handles hundreds of thousands
of concurrent connections (on top of aiohttp), and the code is so much cleaner
than it would be with, e.g., NodeJS with Express and Promises. It reads like
serial code.

I think benchmarking asyncio with any type of CPU-bound task misses the
point. Previously we were relying on hacks like monkeypatching with gevent,
but now we've been given a clean, explicit, and beautiful way to write
massively parallel servers in Python.
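
To give a minimal flavour of how this "reads like serial code", here is a
sketch using only the stdlib rather than the aiohttp stack described above
(the names and the line-based echo protocol are purely illustrative):

```python
import asyncio

async def handle_echo(reader, writer):
    # Server side: read one line from the client and echo it straight back.
    data = await reader.readline()
    writer.write(data)
    await writer.drain()
    writer.close()

async def main():
    # Start a server on an ephemeral port, then talk to it as a client.
    server = await asyncio.start_server(handle_echo, '127.0.0.1', 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection('127.0.0.1', port)
    writer.write(b'hello\n')
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    server.close()
    await server.wait_closed()
    return reply

loop = asyncio.new_event_loop()
reply = loop.run_until_complete(main())
loop.close()
```

Each await is an explicit suspension point, which is exactly what makes this
easier to follow than a chain of Promises.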

~~~
tomchristie
Anything you can share? Some good examples of networking code with async/await
would be incredibly helpful. The documentation covers the primitives, but I
have a hard time putting it all together.

~~~
ikken
Unfortunately the code is not open source - I'll try to open parts of it in
the future.

But please do check this simple gist I found some time ago that helped me
understand how powerful asyncio is:

[https://gist.github.com/gdamjan/d7333a4d9069af96fa4d](https://gist.github.com/gdamjan/d7333a4d9069af96fa4d)

~~~
oliwarner
I'm actually tearing up here. That is... Beautiful.

------
btreecat
>I did this for two reasons, the first being that I cannot, for the life of
me, figure out how to use asyncio to do local file IO and not a network
request, but maybe I'm just an idiot.

I don't think you're an idiot; I just think you didn't search well enough.
There is a reason there is no local file IO in asyncio.

Check these links for more info:

* [https://stackoverflow.com/questions/87892/what-is-the-status...](https://stackoverflow.com/questions/87892/what-is-the-status-of-posix-asynchronous-i-o-aio)

* [http://blog.libtorrent.org/2012/10/asynchronous-disk-io/](http://blog.libtorrent.org/2012/10/asynchronous-disk-io/)

From what I understand the way libuv (what node.js uses) gets around OS limits
is with a thread pool.

Also, this documentation might be helpful:

* [https://docs.python.org/3.5/library/asyncio-dev.html#handle-...](https://docs.python.org/3.5/library/asyncio-dev.html#handle-blocking-functions-correctly)

My experiences don't seem to mirror your frustration. I found asyncio quite
useful for tinkering with a simple web scraper that hit multiple sites at once
(each with a different response time) and munged all the data together into
one data set.
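
The "handle blocking functions correctly" advice linked above boils down to
pushing the blocking call onto a thread pool with run_in_executor. A minimal
sketch (the helper names and file contents here are made up):

```python
import asyncio
import os
import tempfile

def blocking_read(path):
    # Ordinary blocking file IO -- calling this directly in a coroutine
    # would stall the entire event loop while the disk catches up.
    with open(path) as f:
        return f.read()

async def read_file(path):
    loop = asyncio.get_event_loop()
    # None selects the loop's default thread pool executor -- essentially
    # the same trick libuv uses for file IO under node.js.
    return await loop.run_in_executor(None, blocking_read, path)

# Demo: write a temp file, then read it back through the executor.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, 'w') as f:
    f.write('hello from disk')

loop = asyncio.new_event_loop()
content = loop.run_until_complete(read_file(path))
loop.close()
os.unlink(path)
```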

Thanks for the write-up!

~~~
sandGorgon
have you used a database in a non-blocking way with asyncio? I'm thinking of
the way psycogreen or node-postgres works.

~~~
ikken
Please check aiopg [1]. There's also an interesting list of packages for
asyncio [2].

[1] [https://aiopg.readthedocs.org/en/stable/](https://aiopg.readthedocs.org/en/stable/)

[2] [http://asyncio.org/](http://asyncio.org/)

------
fijal
Maybe it's worth noting, maybe not, but this extremely trivial example (e.g.
serial.py) gets executed 20x faster just by using PyPy, which speeds up serial
execution. That's more than you would get from any sort of
multiprocessing/threading shenanigans, _just_ by using an optimizing VM. (And
granted, this example is very simple, but maybe addressing basic performance
problems should come first.)

~~~
sandGorgon
pypy has not been updated in a while (its compatibility is with Python 3.2.5).

what's even more worrying is that it had almost run out of funds in July.
Currently, it has $5000 in the fund earmarked for Py3 support
([http://pypy.org/py3donate.html](http://pypy.org/py3donate.html))

somehow I thought that Google was supporting PyPy.

~~~
fijal
PyPy is being updated very regularly, with 3 releases a year on average. Most
of the work, however, focuses on things our users really want - stability,
performance, C extension support, warmup speed, memory consumption - you know,
the mundane stuff. If someone is willing to put effort into supporting more of
Python 3, that's great! We would welcome the contributions.

PS. We're closing in on a 3.3 release soon.

PPS. Google donated a bit of money to PyPy when Guido was there; "Google
supporting PyPy" is definitely a bit of a stretch. It was years ago, though.

~~~
sandGorgon
Google not supporting it in a big way is very strange. Is there a political
conflict between CPython and PyPy?

Because it seems to me that PyPy is the future of Python, and it makes total
sense for someone like Google or Dropbox to support it with a lot of money.

~~~
fijal
It makes sense != it makes money. Google does not support cpython either in
any meaningful way (that is, someone hired full time to work on cpython for
example).

~~~
sandGorgon
huh ?

[https://www.python.org/~guido/](https://www.python.org/~guido/) _In January
2013 I joined Dropbox. I work on various Dropbox products and have 50% for my
Python work, no strings attached._

[http://www.linuxjournal.com/magazine/interview-guido-van-rossum](http://www.linuxjournal.com/magazine/interview-guido-van-rossum)

_I don't have a 20% project per se, but I have Google's agreement that I can
spend 50% of my time on Python, with no strings attached, so I call this my
“50% project”._

~~~
dalke
Dropbox isn't Google.

I don't know if Google does or does not provide meaningful support to CPython
development. Certainly it did, as you point out. But fijal's comment was about
the present, not the past.

------
keypusher
concurrency in python is kind of a disaster, in my opinion. there are a lot
of different options, but they all seem to have significant drawbacks, and not
just limited to ease of use. i know concurrency is a hard problem, but i wish
there were one really good, straightforward solution instead of 3 or 4
different half-baked, convoluted solutions (threading, multiprocessing,
asyncio, subprocess in stdlib, plus twisted, gevent, pulsar etc. as third-
party).

~~~
ikken
We see people hyping NodeJS all the time, mainly because it can handle
millions of concurrently open connections. Now we get a much cleaner (IMHO)
way to achieve the same thing in Python (of course Python is slower) - e.g.
see aiohttp as a websockets server. I would not call it a disaster.

~~~
nomel
It's great that it works for you for this one particular type of application,
but the whole world isn't IO-bound. Some of us just want regular, plain old
concurrency. As this shows, multiprocessing is still the only, insane,
solution right now. I just want to be able to make a thread and do stuff in
it in parallel to other threads, like I can in most other languages.

"Go write a C extension" you tell me, "use something besides CPython" he says,
"just use multiprocessing" I hear. Sure... but ffs, we've had multicore
processors for almost TWO DECADES now.

One of my biggest, and apparently unchanging, problems with Python is the
desire to keep things simple in the interpreter, to the disadvantage of the
_language_. Sure, "implementation for interpreters may vary", blah blah, but
you have to target the bottom end in performance and the most widely installed
implementation, which is _definitely_ CPython on both counts.

~~~
IgorPartola
I think these are two distinct use cases. NodeJS actually does not (AFAIK)
handle one of them. Basically, you have IO bound tasks and CPU bound tasks
(and a mix of both which is really nasty business). Python has had CPU-bound
task concurrency via multiprocessing and it's been OK. My preference would be
to get rid of GIL and improve how threading is actually done, but technically
you can serve CPU-bound tasks today with Python 2 and 3. This is (AFAIK) not
something that Node does out of the box.

The IO bound tasks in Python are a problem and I wish there was a clean
solution. Python does not have a global event loop, so there is not an easy
place to hook in coroutines, callbacks, etc. So for a while we were stuck with
one of the following:

1\. Use threading or multiprocessing. This sucks for more than concurrency of
like 2-8.

2\. Use eventlet, gevent, or another event loop. The problem here is that you
have to buy into it whole hog. No component of yours can be blocking, and
that's hard to tell.

3\. Write your own event loop. I've done this and find it to be the most
understandable and easy to debug approach. This sucks because of the amount of
effort it takes for something so fundamental (because networking is tricky).

Some people would be happy if Python got better at solving IO-only bound
tasks. I guess that's where this feature comes in. I haven't played with 3.5
yet because I am mostly stuck on 2.7 for reasons. However, looking at it, I
feel like there should have been more of a separation between blocking and
non-blocking code here. Something along the lines of an async function not
being able to call a blocking function.

Re: CPU and IO bound tasks: I know of no great framework for this besides
threading (not the kind in Python + GIL, but real threading). I usually just
side-step this problem by separating tasks that are both IO and CPU bound into
smaller tasks that are only CPU or only IO bound. Thankfully, that's generally
pretty easy to do.
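
That separation can be sketched with the stdlib's process pool -- a minimal,
illustrative example, not anyone's production code (the __main__ guard matters
because the spawn start method re-imports the module in each worker):

```python
from concurrent.futures import ProcessPoolExecutor

def cpu_task(n):
    # Pure CPU-bound work. Module-level functions pickle cleanly,
    # which is what multiprocessing needs to ship them to workers.
    return sum(i * i for i in range(n))

def run_jobs(sizes):
    # Each task runs in its own process, side-stepping the GIL entirely.
    with ProcessPoolExecutor() as pool:
        return list(pool.map(cpu_task, sizes))

if __name__ == '__main__':
    print(run_jobs([10, 100, 1000]))  # [285, 328350, 332833500]
```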

~~~
kashif
asyncio provides an event loop in the standard lib now.
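
For instance, two coroutines multiplexed on that stdlib loop with
asyncio.gather (the fetch stand-in just sleeps; the names are illustrative):

```python
import asyncio

async def fetch(name, delay):
    # Stand-in for an IO-bound call; asyncio.sleep yields to the loop
    # the same way a real socket read would.
    await asyncio.sleep(delay)
    return name

async def main():
    # Both "requests" are in flight at the same time; gather preserves
    # argument order in its result list regardless of finish order.
    return await asyncio.gather(fetch('a', 0.05), fetch('b', 0.01))

loop = asyncio.new_event_loop()
results = loop.run_until_complete(main())
loop.close()
```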

------
csytan
While I haven't had a chance to use 3.5's async/await syntax, I have used
AsyncIO pretty heavily to deal with multiple sensor inputs/outputs on a
Raspberry Pi.

The author is right. If you write a coroutine, _any code that uses it must
also be a coroutine_. This is pretty annoying when you're trying to test
something manually. It bubbles up this way until you eventually hit the event
loop.

If you're trying to debug a coroutine in the interactive shell you've got to
do something like this:

    
    
      import asyncio
      
      async def hello_world():
          print('Hello World')
      
      loop = asyncio.get_event_loop()
      # Blocking call which returns when the hello_world() coroutine is done
      loop.run_until_complete(hello_world())
      loop.close()
    

That's my main beef with it. Debugging can also be painful because when you
hit an exception, your stack trace will also involve the asyncio library.
Aside from those complaints, I'm a fan. It works fine and reads better than
callback-style code.

~~~
mangecoeur
You have a similar problem in nodejs - if the API you want to use is async, it
has to take callbacks, so if you have to do something with that result you
have to nest another callback, etc. It gets ridiculous. But it's pretty much a
fundamental issue with async code - to keep things async, you have to set up
the whole interdependent network of functions to work async, since you don't
know when anything is going to return its value.

It takes some getting used to - I think Python people are likely to have more
trouble with this precisely because Python is a very clear, explicit, and
mostly imperative language - you can read Python code as a sequence of
instructions, and that will be pretty much the way it gets executed, which
makes understanding programs very easy. Async is simply a less intuitive way
to program, so to adopt it you have to be sure it's worth the hassle of giving
up easily understood code.

------
ak217
This mirrors my experience precisely. I was very excited about async/await,
hoping that this would integrate coroutines into regular Python scripts,
without the need to manage some complex dispatch engine. I was equally
disappointed to learn that it's business as usual, with painful and inadequate
semantics out of the box.

At least we have pypy. The community should really be rallying behind that
project.

~~~
1st1
> This mirrors my experience precisely. I was very excited about async/await,
> hoping that this would integrate coroutines into regular Python scripts,
> without the need to manage some complex dispatch engine. I was equally
> disappointed to learn that it's business as usual, with painful and
> inadequate semantics out of the box.

You're not supposed to use async/await without a framework like asyncio or
tornado.

One way to provide a better UX is to merge asyncio into Python on a deeper
level, but this is something that many people won't like.

------
1st1
First of all, see this pic:
[https://pbs.twimg.com/media/COLLg0TUAAA4j79.jpg:large](https://pbs.twimg.com/media/COLLg0TUAAA4j79.jpg:large)

asyncio doesn't provide any non-blocking abstractions for files because (a)
it's not really needed, and (b) there is no easy way to implement it.

(a) basically you shouldn't expect your code to block on disk io. But even if
it does block for a very short amount of time it's probably fine.

(b) one way to implement non-blocking file IO is to use a threadpool. Maybe
we'll add this in later versions of asyncio, but it will require writing some
pretty low-level code in C (and reimplementing big chunks of asyncio in C
too). Another option is to use modern APIs like aio on Linux, but as far as I
know almost nobody uses it for real.

Bottom line -- you don't need coroutines or asyncio to do file IO. What you
need asyncio (and frameworks like aiohttp) for is doing network programming in
Python efficiently.

~~~
beagle3
> basically you shouldn't expect your code to block on disk io. But even if it
> does block for a very short amount of time it's probably fine.

No, it's very often not fine. Magnetic disks, still the norm for many, and
definitely with large storage, often go as low as 5KB/s for random access (or
even sequential access to very fragmented files). Reading a 1MB file can
easily take 5-10 seconds in some setups - which is not acceptable for any
interactive service. It's not fine for a web server to not service any
requests for 5 seconds.

> Another option is to use modern APIs like aio in Linux, but as far as I know
> almost nobody uses it for real.

Anyone I know who tried came back screaming. There is no way to do an async
file open, for example - which means that if you rely on aio, you can block
for 10 minutes waiting for an NFS or SMB mounted file to open.

The only sane, portable thing to do for Unix/Posix is use a threadpool for
async file io - or just use something like libuv which already abstracted
async operations this way.

------
justinlardinois
Tangentially related: of the people and projects that are using Python 3, why?
I've found that aside from syntax and a few features here and there, Python 2
and 3 are more or less the same technology-wise, especially since 2.7 has a
lot of features backported from 3. Thus to me it seems better to stick with 2
because there are so many existing libraries, and CPython is the only
implementation with complete Python 3 support.

If Python 3 had good concurrency and optimization (something neither version
has right now), I'd consider using it, but is there an already existing reason
that I'm just not seeing?

~~~
mangecoeur
Simple answer - It's the current (and currently updated) version. Why would
you not use it when starting a new project?

More fancy answer - There are plenty of nice improvements to the language
(better handling of iterators, tidied-up std lib, modern objects, etc.),
though it's probably worth it for Unicode support alone - trying to convince
Python 2 to properly handle international text is just a huge pain (also - no,
you can't just replace all accented characters with non-accented ones if you
want to preserve the correct meaning of the text and not piss off all your
customers by misspelling their names). Yes, you can eventually solve all
problems with Python 2. But why bother when you can just use the latest
version? There's basically no cost for new projects.

~~~
vegabook
There is a very clear cost for many new projects: infinite... for the simple
reason that some key libraries _still_ do not do Python 3 at all. Let's be
clear. Cassandra is the best large ingest nosql database out there. Python 2
only on CQLSH. Bloomberg. The default terminal used by the 300 000 most
important financial people on earth. API Python 2 only. Theano. 3.x via 2to3
only. Anaconda, 2 still the default download with 3 an afterthought (in other
words, even for new downloads, 2 still the majority!). This is more than about
stuff not being ported yet. It's about people _preferring_ 2.x.

This idea that Python 3 is cost free needs to be expunged. For large classes
of users, Python 3 is _not possible even if they want to_ (which they don't).
Stop this erroneous propaganda. Unicode is nice if you're a web guy, and if
you're a web guy, why are you using Python already? Unicode is completely
irrelevant for everybody else and is most definitely not a core reason to move
to Python 3. You're living in cloud cuckoo land on your 3.x magic mushroom
trip.

Earth to web jockeys. Python's hardcore is numerical computing, and that
hardcore is not on Python 3, and is not moving anytime soon, and certainly not
for the dubious benefit of unicode. Moreover the web stack is much less
important to Python than is numerical and scientific computing, for the simple
reason that while the former has multiple better competitors in the form of
golang, JS et al, the latter does not.

[sidebar: since when does ascii not cater for accents?? I am bilingual french
/ english and I have never had a problem typing french accents in ascii?
You're creating misleading propaganda again. Once again, EatHeart, I quote
you: "I suspect you don't really know what you are talking about...", or
worse, you have an agenda to mislead.]

~~~
baq
>Unicode is completely irrelevant for everybody else and is most definitely
not a core reason to move to Python 3. You're living in cloud cuckoo land on
your 3.x magic mushroom trip.

you're from the US, right? there's about 6 billion people for whom ascii isn't
enough. some of them program in python.

------
aidenn0
I don't use python all that much, but out of curiosity, what's the problem
with multiprocessing? In the languages I _do_ develop in, I find it much
easier to reason about than multithreading.

~~~
andor
Data exchanged between the processes is serialized with pickle. Pickling is
slow, adds latency, and doesn't work on all objects.

[https://docs.python.org/2/library/multiprocessing.html#pipes-and-queues](https://docs.python.org/2/library/multiprocessing.html#pipes-and-queues)
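
Both halves of that point in one illustrative sketch -- data crosses the
process boundary as a serialized copy, while stateful objects can't cross at
all (the sample values are made up):

```python
import pickle
import threading

# Plain data round-trips through pickle, so it can cross a multiprocessing
# Pipe or Queue -- but the other side gets a copy, never a shared object.
data = {'rows': [1, 2, 3], 'tag': 'demo'}
copy = pickle.loads(pickle.dumps(data))

# Objects tied to interpreter or OS state, like a lock, cannot be
# pickled at all, so they can never be sent to another process.
try:
    pickle.dumps(threading.Lock())
    lock_pickles = True
except TypeError:
    lock_pickles = False
```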

~~~
aidenn0
That sucks; there's no shared-memory message-queue implementation?

~~~
beagle3
any message queue implementation would require serialization (which the Python
standard library provides with "pickling"). If there were a reasonable way to
share live objects without the GIL, Python threads would use it.

------
n0us
When he brings up requests in comparison to urllib, he seems not to know about
aiohttp:
[https://github.com/KeepSafe/aiohttp](https://github.com/KeepSafe/aiohttp)

------
Lofkin
Better multitasking and concurrency syntax, with task scheduling and
out-of-core support, here: [http://dask.pydata.org/](http://dask.pydata.org/)

------
vegabook
In other words, a lost opportunity. Asyncio and this new syntax are both hard
for beginners and experienced Python coders alike, _and still don't do
multicore!_ We're clocking up 3.x version numbers as if this will magically
provide the illusion of progress, but the killer feature is still not there.
Instead we get type annotations. In a dynamic language. Which doesn't compile
and therefore doesn't need them. With no performance advantage. If I have to
type-declare everything, I want 10x performance. Okay?

If Python were a listed company, the CEO would have been replaced long ago.
I'm tired of watching my favourite language flail around like this. Will
Continuum Analytics or Enthought _please_ fork 2.7?

~~~
sonicrocketman
I think the limitations of the GIL are not all in all a bad thing, but I do
think that the Python community needs to seriously look at making Asynchronous
execution a priority in future versions. Asyncio is really complicated and
provides very little in terms of performance.

Multitasking is the most important issue that Python faces today (and maybe
PyPy is the answer).

~~~
vegabook
So we're in 3.5 already, and one of its central features, async, is flawed,
you say. So not only do we not have a killer feature, but even one of the
nice-to-haves is a dog. Timeout. This project needs new leadership. The
competition (Golang) is walking all over Python on the async issue.

~~~
beagle3
I read your comments on this thread, and I frankly wonder where you are coming
from. It seems you care a lot about Python, but have a crisis because your pet
peeve (threading) is not being addressed, and as a result expect the team who
has been guiding Python to the success that it is over the last 25 years to
step down. (And it has been a huge success - it has none of the big corporate
money backing that Java, C# and Go have - but mindshare and notability on the
same scale, dominates significant niches like non-HPC scientific computation,
and has very significant presence in almost every field).

Could you tell me what (and how long) your experience is?

My answers about your complaints are basically:

1\. Solving threading is (relatively) easy, if you _just_ give up backwards
compatibility; Not the 2.x -> 3.x compatibility which is comparatively trivial
- but in a major way, breaking every single extension library, and the vast
majority of Python code (or slowing it down unbearably). You might be happy,
but the rest of the Python users (basically, the reason you actually use
Python) won't.

2\. Assuming they agree with you about their failure (I don't, FWIW), someone
has to step up and offer an alternative. Who is that, what are they doing
these days, and why do you think they will succeed where Guido et al "failed"?

3\. Perl at 28 years is not much older than Python at 24 years (relatively
speaking), but has been sliding into obscurity for a long time now, whereas
Python is flourishing. The Python 3 transition is actually happening as
planned (IIRC, there was an expected 5 year period just for feature and speed
parity!). Perl 6 is esoteric, Perl 5 is aging and dying. I think that for a
living, popular language, the Python team is doing a commendable job, even if
they do not address a specific issue (that many people care about, but would
actually not make that much of a difference in practice if we are to learn
from other languages)

~~~
Lofkin
The GIL issue has been solved without breaking backwards compat:
[http://pyparallel.org/](http://pyparallel.org/)

It just needs further dev work and acceptance into py3.

~~~
beagle3
Did you actually look into it? The threads you run can't make any changes to
existing objects, among various other restrictions. It does break
compatibility, and needs patched versions of NumPy, ODBC (and, I would guess,
most other packages).

Definitely not "solved without breaking backwards compat".

------
gcb0
so is it a little syntax sugar on top of the multiprocessing module to please
iOS developers?

there's nothing new that i can see under the hood

~~~
sonicrocketman
I mentioned iOS because Apple's Grand Central Dispatch API is something I wish
Python had. Python's multitasking is so complicated while Apple's GCD is so
elegant. It's sad really.

~~~
Lofkin
Your example is trivially parallelized with some decorators from the dask
library.

Example here:
[http://dask.pydata.org/en/latest/imperative.html](http://dask.pydata.org/en/latest/imperative.html)

~~~
sonicrocketman
I hadn't seen Dask before this. Thanks!

~~~
Lofkin
Sure!

If you like it, can you please blog about it (and post on hn)? Needs exposure
outside the python data community.

