
The magic of asyncio explained - apoorvgarg
https://hackernoon.com/a-simple-introduction-to-pythons-asyncio-595d9c9ecf8c
======
anilakar
Yet another asyncio tutorial that shows you how to run a few sleep tasks
concurrently. Can we finally get one that shows how to do _real_ stuff, such
as socket programming, wrapping non-async-compatible libraries, and
separating CPU-intensive blocking tasks into awaitable threads?

~~~
sametmax
> such as socket programming

That's one of my biggest pet peeves (and if you see my other comments, you'll
notice I have quite a few).

To do socket programming in asyncio, you can either use:

\- protocols, with a nice reusable API and an interface that clearly tells you
where to do what. But you can't use "await". You are back to creating futures
and attaching callbacks like it's 10 years ago.

\- streams, where you can use async / await, but you get to write the entire
life cycle yourself all over again.

I get that protocols are faster and match Twisted's model, and I get that
streams are pure and functional, but none of this is easy. I use Python to
make my life easier. If I wanted extreme performance I'd use C. If I wanted
extreme purity I'd use Haskell.

> wrapping non-async-compatible libraries and separating cpu-intensive
> blocking tasks to awaitable threads

That's one of the things asyncio did right. Executors are incredibly
simple to use, robust and well integrated.

The problem is: they are badly documented and the API is awkward.

I won't write a tutorial on HN, but as a starting point, you can use:

    
    
        loop = asyncio.get_event_loop()
        future = loop.run_in_executor(executor, callback, arg1, arg2, ...)
        await future
    

If you pass "None" as an executor, it will get the default one, which will run
your callback in a thread pool. Very useful for stuff like database calls.

But if you have a CPU-intensive task, you need to create an instance of
ProcessPoolExecutor and pass it to run_in_executor().

I say it's one of the things asyncio did right because the pools not only
distribute the callbacks automatically among the workers of the pool (whose
number you can control), but you also get back a future which you can
await transparently.
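
A minimal sketch of the pattern (names like `blocking_call` are made up for illustration):

```python
import asyncio
import time

def blocking_call(x):
    """Hypothetical stand-in for a blocking library call (e.g. a DB driver)."""
    time.sleep(0.1)
    return x * 2

async def main():
    loop = asyncio.get_event_loop()  # inside a coroutine: the running loop
    # None selects the default ThreadPoolExecutor: fine for blocking IO.
    # For CPU-bound work, pass a concurrent.futures.ProcessPoolExecutor
    # instance instead, so the workers aren't serialized by the GIL.
    future = loop.run_in_executor(None, blocking_call, 21)
    return await future

loop = asyncio.new_event_loop()
result = loop.run_until_complete(main())
loop.close()
```

The event loop keeps serving other awaitables while the thread pool runs the blocking function.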

~~~
windexh8er
I'm curious about your take on Trio or Curio. Does either address the peeves
you've outlined?

[https://github.com/python-trio/trio](https://github.com/python-trio/trio)
[https://github.com/dabeaz/curio](https://github.com/dabeaz/curio)

~~~
sametmax
Trio is a better Curio, so you don't really need Curio anymore. Curio is what
started everything and deserves credit for that, though.

As for Trio, it's what asyncio should have been from the beginning, at least
for the high level part (although not for my pet peeve about socket
programming: it's too low level for Python IMO).

The problem with Trio is that it's incompatible with asyncio (minus some
verbose, quirky bridges), so you get yet another island, yet another ecosystem.
So what, now we get Twisted, Tornado, gevent, Qt, asyncio... and Trio?

The madness must stop.

And that's why I think there is a better way: creating a higher level API for
asyncio, one which enforces best practices and makes the common things easy
and the hard things decent.

A complete rewrite like Trio would be better (e.g. it famously handles Ctrl +
C way better and has a smaller API). But that ship has sailed. We have asyncio
now.

asyncio is pretty good honestly. But it needs some love.

So, considering asyncio is what we have to work with, and that in my
experience it's quite good if you know what you are doing, I advise people to
actually write a wrapper around it.

If you don't feel like writing a wrapper, I'll plug the one I'm working on
in case people are curious:
[https://github.com/Tygs/ayo](https://github.com/Tygs/ayo)

It:

\- is based on asyncio. Not a new framework. It integrates transparently with
normal asyncio.

\- implements some lessons learned from Trio (e.g. nurseries, cancellation, etc.)

\- exposes a sweet API (running blocking code is run.aside(callback, *args,
**kwargs)), and automates what it can (@ayo.run_as_main sets up the event loop
and runs the function in one go)

\- makes hard things decent: timeouts and concurrency limits are just a param
away

It does need some serious docs, including a rich tutorial that features a pure
asyncio part. It also needs some mix between streams and protocols. I'm not
going to skip that part, I think it's mandatory, but I'll need many months to
finish the whole thing.

Now, I am not Nathaniel or Yury, so my work is not nearly as bulletproof as
theirs. I would not advise installing ayo in prod now, but I think it's a
great proof of concept of how good asyncio can be.

And we most certainly can do even better.

------
greyman
Python is my language of first choice, but I must say that I am not that
thrilled with how this multithreading ended up. There are many tutorials on
the topic promising to explain how it works, usually in the form of a "simple
introduction". But when one tries to implement something production-ready,
with correct error handling etc., things start to get complicated pretty
quickly; at least that was my experience. I don't want to accuse anyone
specifically, but most of the tutorials I saw seem to portray it as easier
than it actually is.

Ultimately, my company decided that instead of fighting with asyncio, certain
projects will switch to Go.

~~~
sametmax
That's because most of those tutorials have not been written by somebody
actually putting something in production.

I've been using asyncio for a while now, and you can't get away with a short
introduction since:

\- it's very low level

\- it's full of design flaws and already has accumulated technical debt

\- it requires very specific best practices to be usable

I'm not going to write a tutorial here, it would take me a few days to make a
proper one, but here are a few pointers nobody gives you:

\- asyncio solves one problem, and one problem only: when the bottleneck of
your program is network IO. It's a very small domain. Most programs don't need
asyncio at all. Actually many programs with a lot of network IO don't have
performance problems, and hence don't need asyncio. Don't use asyncio if you
don't need it: it adds complexity that is worth it only if it solves your
problem.

\- asyncio is mostly very low level. Unless you code your own lib or framework
with it, you probably don't want to use it directly. E.g. if you want to make
http requests, use aiohttp.

\- use loop.run_until_complete(), not loop.run_forever(). The former will
crash on any exception, making debugging easy. The latter will just display
the stack trace in the console.
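
A tiny illustration of why the former is nicer for debugging: the exception from the coroutine is re-raised right where you called it (`boom` is a made-up example):

```python
import asyncio

async def boom():
    raise ValueError("propagated, not swallowed")

loop = asyncio.new_event_loop()
try:
    # run_until_complete re-raises the coroutine's exception here,
    # so a failure crashes your program where you can see it
    loop.run_until_complete(boom())
except ValueError as e:
    caught = str(e)
finally:
    loop.close()
```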

\- talking about easy debugging, activate the various debug features when not
in prod ([https://docs.python.org/3/library/asyncio-dev.html#debug-
mod...](https://docs.python.org/3/library/asyncio-dev.html#debug-mode-of-
asyncio)). Too many people code with asyncio in the dark, and don't know there
are plenty of debug info available.

\- await is just a way to inline a callback. When you do "await", you say "do
the stuff", and any lines of code after the "await" are called when the
"await" is done. You can run asynchronous things without "await". "await" is
just useful if you want 2 asynchronous things to happen one __after__ the
other. Hence, don't use it if you want 2 asynchronous things to progress in
parallel.

\- if you want to run one asynchronous thing, but not "await" it, call
"asyncio.ensure_future()".

\- errors in "await" can just be caught with try/except. If you used
ensure_future() and no "await", you'll have to attach a callback with
"add_done_callback()" and check manually whether the future has an exception.
Yes, it sucks.
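
A sketch of that manual harvesting (the `might_fail` coroutine is hypothetical):

```python
import asyncio

async def might_fail():
    """Hypothetical task we fire and forget."""
    raise RuntimeError("oops")

errors = []

def harvest(task):
    # without an await, nobody re-raises for you: check by hand
    exc = task.exception()
    if exc is not None:
        errors.append(exc)

async def main():
    task = asyncio.ensure_future(might_fail())
    task.add_done_callback(harvest)
    await asyncio.sleep(0.05)  # let the task finish; we never await it

loop = asyncio.new_event_loop()
loop.run_until_complete(main())
loop.close()
```

Forget the callback and the error silently sits on the task until asyncio complains at teardown.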

\- if you want to run one blocking thing, call "loop.run_in_executor()".
Careful, the signature is weird.

\- CPU-intensive code blocks the event loop. loop.run_in_executor() uses
threads by default, hence it doesn't protect you from that. If you have CPU-
intensive code, like zipping a lot of files or calculating your own precious
fibonacci, create a "ProcessPoolExecutor" and use run_in_executor() with it.

\- don't use asyncio before Python 3.5.3. There is an incredibly major bug in
"asyncio.get_event_loop()" that makes it unusable for anything that involves
mixing threads and loops. Yep. Not a joke.

\- but really, use 3.6. TCP_NODELAY is on by default and you have f-strings
anyway.

\- don't pass the loop around. Use asyncio.get_event_loop(). This way your
code will be independent of the loop creation process.

\- you do pretty much nothing yourself in asyncio. Any async magic is deep,
deep down the lib. What you do is define coroutines calling the magic things
with ensure_future() and await. Pretty much nothing in your own code is doing
IO, it's just asking the asyncio code to do IO in a certain order.

\- you see people in tutorials simulate IO by doing "asyncio.sleep()". That's
because it's the easiest way to make the event loop switch context without
using the network. It doesn't mean anything, it just pauses and switches, but
if you see that in a tutorial, you can mentally replace it with, say, an http
call, to get a more realistic picture.

\- asyncio comes with a lot of concepts, so let's take a moment to define them:

    
    
        * Future: an object with a thing to execute, and potentially some callbacks to be called after it's executed.
    
        * Task: a subclass of Future. The thing to execute is a coroutine, and the coroutine is immediately scheduled in the event loop when the task is instantiated. When you do ensure_future(coroutine), it returns a Task.
    
        * coroutine: a generator with some syntactic sugar. Honestly, that's pretty much it. They don't do much by themselves, except that you can use await in them, which is handy. You get one by calling a coroutine function.
    
        * coroutine function: a function declared with "async def". When you call it, it doesn't run the code of the function. Instead, it returns a coroutine.
    
        * awaitable: any object with an __await__ method. This method is what the event loop uses to execute the code asynchronously. Coroutines, tasks and futures are awaitables. Now the dirty secret is this: you can write an __await__ method, but in it, you will mostly call the __await__ of some magical object from deep inside asyncio. Unless you write a framework, don't think too much about it: awaitable = stuff you can pass to ensure_future() to tell the event loop to run it. Also, you can "await" any awaitable.
    
        * event loop: the magic "while True" loop that takes awaitables and executes them. When the code hits "await", the event loop switches from one awaitable to another, and then goes back to it later.
    
        * executor: an object that takes code, executes it in a __different__ context, and returns a future you can await in your __current__ context. You will use them to run stuff in threads or separate processes, but magically await the result in your current code like it's regular asyncio. It's very handy for integrating blocking code naturally into your workflow.
    
        * event loop policy: the stuff that creates the loop. You can override it if you are writing a framework and want to get fancy with the loop. Don't do it. I've done it. Don't.
    
        * task factory: the stuff that creates the tasks. You can override it if you are writing a framework and want to get fancy with the tasks. Don't do that either.
    
        * protocols: an abstract class you can implement to tell asyncio __what__ to do when it establishes/loses a connection or sends/receives a packet. asyncio instantiates one protocol for each connection. Problem is: you can't use "await" in protocols, only old-fashioned callbacks.
    
        * transports: an abstract class you can implement to tell asyncio __how__ to establish/lose a connection or send/receive a packet.
    

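To make the first few of those definitions concrete, a small sketch:

```python
import asyncio
import inspect

async def add_one(x):          # a coroutine function ("async def")
    return x + 1

coro = add_one(41)             # calling it runs nothing; you get a coroutine
assert inspect.iscoroutine(coro)

loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)
task = asyncio.ensure_future(coro)     # wrapped in a Task, scheduled on the loop
assert isinstance(task, asyncio.Task)  # and a Task is a subclass of Future
assert isinstance(task, asyncio.Future)
result = loop.run_until_complete(task)
loop.close()
```
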
Now, I'm putting the last point separately because if there is one thing you
need to remember, it's this. It's the most underrated secret rule of asyncio.
The stuff that is literally written nowhere, ever: not in the doc, not in any
tutorial, etc.

asyncio.gather() is the most important function in asyncio
===========================================================

You see, every time you do asyncio.ensure_future() or loop.run_in_executor(),
you actually do the equivalent of a GO TO. (see:
[https://vorpus.org/blog/notes-on-structured-concurrency-
or-g...](https://vorpus.org/blog/notes-on-structured-concurrency-or-go-
statement-considered-harmful/))

You have no freaking idea of when the code will start or end execution.

To stay sane, you should never, ever have a dangling awaitable anywhere.
Always keep a reference to all your awaitables. Decide where in the code you
think their life should end.

And at this very point, call asyncio.gather(). It will block until all
awaitables are done.

E.g., don't:

    
    
        asyncio.ensure_future(bar())
        asyncio.get_event_loop().run_in_executor(None, barz)
        await asyncio.sleep(10)
        

E.g., do:

    
    
        foo = asyncio.ensure_future(bar())
        fooz = asyncio.get_event_loop().run_in_executor(None, barz)
        await asyncio.sleep(10)
        await asyncio.gather(foo, fooz)  # this is The Only True Way
       

Your code should be a meticulous tree of hierarchical calls to
asyncio.gather() that delimits where things are supposed to stop. And if
you think that's annoying, try debugging something whose life cycle you
don't control.

Of course it's getting old pretty fast, so you may want to write some
abstraction layer such as
[https://github.com/Tygs/ayo](https://github.com/Tygs/ayo). But I wouldn't use
this one in production just yet.

~~~
mtrovo
Wow, really nice list. I wish I had known it before I started working with asyncio.

> To stay sane, you should never, ever have a dangling awaitable anywhere.
> Always keep a reference to all your awaitables. Decide where in the code you
> think their life should end.

This is the most difficult part for me. It's not trivial to know whether a
function you're calling is async or not without looking at its source,
especially when you're using external libraries. Also, by default there are no
logs about this kind of situation, so it's an easy way to shoot yourself in
the foot and waste 10 minutes debugging to find a dangling awaitable on a
function call you didn't realize was async.
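
For what it's worth, you can at least check at runtime with the standard library's `inspect.iscoroutinefunction()`; a quick sketch (the two functions are made up):

```python
import inspect

async def fetch():
    """Pretend library coroutine function."""
    return "data"

def compute():
    """Plain synchronous function."""
    return "data"

# True only for "async def" functions, so you can tell before calling
assert inspect.iscoroutinefunction(fetch)
assert not inspect.iscoroutinefunction(compute)
```

Debug mode (PYTHONASYNCIODEBUG=1) also makes the "coroutine was never awaited" warnings point at where the dangler was created, which helps.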

~~~
wruza
And still people vote for async-await because “true light threads are hard to
implement at low level”. This generator-based madness has to end, but few seem
to understand what hassle it brings to their coding and what an alternative
could be. I don’t get it.

------
tlrobinson
This is essentially how modern JavaScript works, in particular with the
addition of async/await syntax [1] (which was originally from C#, I think),
but it's been possible with libraries like task.js, co, and Bluebird [2] since
generator functions were available (either natively or via transpiling).

The main difference is in JavaScript the event loop is automatic and hidden,
and asynchronous IO is the default, so it's a bit harder to shoot yourself in
the foot.

1\. [https://developer.mozilla.org/en-
US/docs/Web/JavaScript/Refe...](https://developer.mozilla.org/en-
US/docs/Web/JavaScript/Reference/Statements/async_function)

2\. [https://github.com/mozilla/task.js](https://github.com/mozilla/task.js)
[https://github.com/tj/co](https://github.com/tj/co)
[http://bluebirdjs.com/docs/api/promise.coroutine.html](http://bluebirdjs.com/docs/api/promise.coroutine.html)

------
sadgit
I’m still bummed that Python took this direction. Maybe introducing new
keywords into the language for event loop concurrency was Python’s way of
satisfying “explicit is better than implicit”, but I can’t shake the feeling
that callback passing and generator coroutines are a fad that is complex
enough to occupy the imagination of a generation of programmers while offering
little benefit compared to green threads.

~~~
manaskarekar
This might be interesting reading: comments on Rust's approach to async,
coming from green threads.

[https://aturon.github.io/blog/2016/08/11/futures/](https://aturon.github.io/blog/2016/08/11/futures/)

~~~
nemothekid
One of the key reasons Rust took this approach is that Rust's
implementation is zero cost - it doesn't require a runtime. It is
potentially very efficient, and thanks to Rust's other guarantees it's very
safe to use. It's as close to bare metal as you can get for an async
framework, and as a result it's incredibly efficient.

However, to me, the implementation is a lot more complex than green threads.
The futures crate has had a lot of churn, and when I first dabbled in it a
while back, it was one of the first times I struggled to understand what the
compiler errors even meant, the types were so deep. Compared to golang's
'go', futures are harder to understand, plus you have to rewrite all your
networking/blocking code to be compatible (Python would still likely have to
do the same, but I think it could be done in such a way that if you were using
the system-provided networking libraries, you could get compatibility for
"free").

Python doesn't benefit from the Rust advantages explained in that article.
Python already has a garbage collector and a runtime. Python is single
threaded.

I don't follow the Python language closely enough to have a well informed
opinion on why they went with futures, but I doubt it's close to the reasoning
behind Rust's choice.

~~~
smittywerben
> Python is single threaded.

That's not true; there are multiple threads in Python, e.g. in zlib from the
standard library.

People want unsafe code to communicate concurrently through the Python thread
state. I'll let someone else tackle that one.

------
shoo
quoting the article:

> Concurrency is like having two threads running on a single core CPU.

> Parallelism is like having two threads running simultaneously on different
> cores

> It is important to note that parallelism implies concurrency but not the
> other way round.

Aurgh! I don't think this attempted definition-by-simile is helpful, or even
somewhat correct.

I much prefer yosefk's way of framing things:

> > concurrent (noun): Archaic. a rival or competitor.

> > Two lines that do not intersect are called parallel lines.

...

> Computation vs event handling

> With event handling systems such as vending machines, telephony, web servers
> and banks, concurrency is inherent to the problem – you must resolve
> inevitable conflicts between unpredictable requests. Parallelism is a part
> of the solution - it speeds things up, but the root of the problem is
> concurrency.

> With computational systems such as gift boxes, graphics, computer vision and
> scientific computing, concurrency is not a part of the problem – you compute
> an output from inputs known in advance, without any external events.
> Parallelism is where the problems start – it speeds things up, but it can
> introduce bugs.

...

> concurrency is dealing with inevitable timing-related conflicts, parallelism
> is avoiding unnecessary conflicts

yosefk's whole essay about this is great:
[https://yosefk.com/blog/parallelism-and-concurrency-need-
dif...](https://yosefk.com/blog/parallelism-and-concurrency-need-different-
tools.html)

~~~
foxes
I also initially thought the same thing. Page two of "Parallel and Concurrent
programming in Haskell" maybe says it in a nicer way:

>A parallel program is one that uses a multiplicity of computational hardware
....

>concurrency is a program-structuring technique in which there are multiple
threads of control...

(a pdf can readily be found with your favorite search engine for the full
extract :) ).

I would much prefer to see a precise, rigorous definition and then examples
(or an example and then the definition is also acceptable), instead of just a
list of examples. Examples help you understand a rigorous statement. But if
you only give a hand-wavy explanation of something, I think it just creates
more confusion in the end, as you never know exactly what is correct. It
leaves things open to ambiguity.

------
meken
I really liked this article. It's by far the most concise explanation of
asyncio in python that I've come across. Also, great use of little "quoted"
statements throughout that encourage you to stop and really understand what
was said before moving on (these probably have a special name).

Bravo!

------
eliasson
A few years ago I wrote a BitTorrent client in Python 3.5 to get to know
asyncio better.

Maybe those blog posts are still of use to somebody:

\- [http://markuseliasson.se/article/introduction-to-
asyncio/](http://markuseliasson.se/article/introduction-to-asyncio/)

\- [http://markuseliasson.se/article/bittorrent-in-
python/](http://markuseliasson.se/article/bittorrent-in-python/)

------
ioquatix
Here is a comparison of `asyncio` (Python), `async` (Ruby) and Go:
[https://github.com/socketry/async-
await/tree/master/examples...](https://github.com/socketry/async-
await/tree/master/examples/port_scanner)

I wrote a similar article but for Ruby:
[https://www.codeotaku.com/journal/2018-06/asynchronous-
ruby/...](https://www.codeotaku.com/journal/2018-06/asynchronous-ruby/index)

Yes, it's a good model for many use cases.

One thing I wondered about with Ruby: is it really necessary to have the
`await` keyword?

~~~
kasbah
`await` is useful because, by deferring it, you can schedule multiple tasks
concurrently. In JS:

    
    
        var task1 = someAsyncTask1()
        var task2 = someAsyncTask2()
     
        await Promise.all([task1, task2])
    

If `await` was implicit then task2 would wait for task1 to finish.
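
For comparison, a sketch of the same pattern in Python's asyncio (the task names are made up):

```python
import asyncio

async def some_async_task_1():
    await asyncio.sleep(0.01)
    return 1

async def some_async_task_2():
    await asyncio.sleep(0.01)
    return 2

async def main():
    # schedule both first (no await yet), then gather: they overlap
    t1 = asyncio.ensure_future(some_async_task_1())
    t2 = asyncio.ensure_future(some_async_task_2())
    return await asyncio.gather(t1, t2)

loop = asyncio.new_event_loop()
results = loop.run_until_complete(main())
loop.close()
```

Awaiting each coroutine in turn instead would serialize them, exactly as an implicit `await` would.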

------
jrs95
It seems odd that Python ended up with asyncio when they had a clear and
successful model they could’ve adopted from Stackless. It would have been more
difficult to implement, but it would have allowed the same benefits without
requiring an asynchronous programming model, which would have reduced the
total effort involved in getting good concurrency in Python.

------
_Codemonkeyism
Wasn't there a language where every call was async? Instead of async ...
A/returning Future[A], it did/would return A from method calls.

If it didn't exist, one can imagine one.

A.x = 3 would be wrapped in A.map(_.x = 3), etc. So you write code that would
be executed when you finally await a value. No more red/blue world. It would
probably need coroutines instead of threads for execution.

~~~
marcosdumay
Just to post it as an answer instead of a question. That's Haskell's IO.

It is just one of the lots of concurrency behaviors available in libraries.
Also, parallelism is "free" on pure code.

~~~
_Codemonkeyism
From my understanding, this is not Haskell's IO - though my time with Haskell
is limited.

1\. Haskell uses special 'do' notation to handle access to IO-wrapped values,
e.g. (contrived example; one would not use do for such simple cases)
    
    
          y = do x <- xWithIO
                 return (x + 1)
    

instead of

    
    
          y = x + 1    
          

2\. Haskell method signatures do include IO, e.g.

    
    
        doSomething   :: Int -> IO Int
    

instead of

    
    
        def doSomething(i:Int):Int
    

3\. Because IO is usually not the only effect-managing monad, as I've said
in another comment, the type signature usually uses a type alias for a monad
transformer stack like

    
    
        type Result a = ReaderT Env ( ErrorT  String ( StateT  Integer Identity))
    

or concurrency mixed in

4\. This is the same as in my Scala code, where I have cats FutureT monad
transformers with Scalactic OrT Every error stacks showing up all over my
APIs as a type alias 'WithErrors'.

5\. 'Also, parallelism is "free" on pure code.' Not sure what that's got to do
with it, but yes, if you have no concurrency problems (concurrent writes to
shared data) you don't need to think about concurrency, and parallelism is
free.

But if my understanding is wrong, I'm happy to learn something about
concurrency in Haskell without it showing in code and type signatures.

~~~
marcosdumay
Well, it does not have the exact same syntax as your example, all the more
because your example was pure. Haskell does that automatically for pure code
too (`y = x + 1` would do exactly what you described), but it's not really
relevant.

IO code always returns a promise, and the next statement in a `do` block may
await the previous promise and yield execution to whatever other piece of
code can run, based on rules in the compiler, grounded in large part in data
dependency. If I'm reading your comment correctly, that is what you are asking
for.

~~~
_Codemonkeyism
Sorry for being so confusing

"Even more because your example was pure."

No it wasn't which is the whole point.

    
    
        y = x + 1
    

means

    
    
        y = x.map(_ + 1)
    

by default with async effects.

"same syntax"

Which is also the point: that there is syntax for the effects. When everything
is async, there should be no special syntax, since you'd have that syntax all
over your code and it would be redundant.

------
jbarham
Someone send this article to Armin Ronacher (creator of Flask, Jinja, etc.) so
he can understand asyncio since he wrote a much longer and more detailed
article explaining why he doesn't understand asyncio (previous discussion at
[https://news.ycombinator.com/item?id=12829759](https://news.ycombinator.com/item?id=12829759)).

------
kyberias
Is this any different than in C#?

------
mikec3010
My first endeavor with asyncio felt worse than a beating with a wet rubber
hose. But it was a character-building experience to really get up close and
personal with the asynchronous model, and I can definitely see its advantages
over the imperative one. It's always good to have another tool in the box.

------
anc84
Thinly veiled advertisement for an "Intelligent Infrastructure Analytics - a
Machine Learning driven approach for DevOps & SREs of modern age" company.
Seems like Hackernoon just published a native advertisement? Not surprising
that it's shady, though, considering the "buy crypto with credit card" link in
the top bar.

