
A Hitchhiker’s Guide to Asynchronous Programming - crazyguitar
https://github.com/crazyguitar/pysheeet/blob/master/docs/appendix/python-concurrent.rst
======
BiteCode_dev
I get that this guide tries to ease you into low-level concurrency
concepts.

However, if you try to just get work done in Python, this is not what you
want.

Don't do threads/processes yourself, use pools:

    
    
        import random
        import time
        from concurrent.futures import ProcessPoolExecutor, as_completed
    
        def hello():
            seconds = random.randint(0, 5)
            print(f"Start blocking for {seconds}s")
            time.sleep(seconds)
            print(f"Stopped blocking after {seconds}s")
            return seconds
    
        if __name__ == "__main__":
    
            with ProcessPoolExecutor(max_workers=2) as exec:
    
                a = exec.submit(hello)
                b = exec.submit(hello)
    
                for future in as_completed((a, b)):
                    print(future.result())
    

And don't manage the loop yourself. Use Python 3.7+, and replace:

    
    
        loop = asyncio.get_event_loop()
        loop.run_until_complete(loop.create_task(foo()))
        loop.close()
    

With: asyncio.run(foo())

The code is not just shorter, it is way, wayyyyyyyyyyyy, more correct.
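
E.g. a complete, minimal version (foo being any coroutine; the body here is just a placeholder):

    
    
        import asyncio
    
        async def foo():
            await asyncio.sleep(1)  # stands in for real async work
            return "done"
    
        # creates the loop, runs foo to completion, and closes the loop correctly
        print(asyncio.run(foo()))
    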

Also don't program asyncio by hand. Use a lib. E.g., if you wanna do HTTP, use
aiohttp.
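
A minimal aiohttp fetch looks something like this (a sketch; the URL is a placeholder and aiohttp must be installed):

    
    
        import asyncio
        import aiohttp
    
        async def fetch(url):
            # one session per app; it pools connections for you
            async with aiohttp.ClientSession() as session:
                async with session.get(url) as resp:
                    return await resp.text()
    
        print(asyncio.run(fetch("https://example.com"))[:80])
    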

This is Python, don't suffer more than you need to.

~~~
crazyguitar
I agree with you. We should use a reliable library, as you said. The primary
purpose of this article is to help people understand what a coroutine and an
event loop are, so that programmers can use the asyncio API fluently without
misusing it.

However, I don't think using threads/processes yourself is a bad idea. A pool
constrains how you utilize threads/processes, but sometimes we may want to
adjust the number of threads/processes based on system load. Under this
circumstance, using a pool is not the best choice.

~~~
heavenlyblue
You’re sending the poor newbie on a journey of self-discovery of picklable
vs. non-picklable objects, passing arguments back and forth, working with
Queues (which have well-known yet undocumented race conditions), missing
exception stack traces due to dead processes, and all sorts of useless garbage
they don't need to know about.

Also what exactly would that newbie be building that starts and stops threads
depending on the system load? What kind of a contraption is that? What are you
doing?

Finalizing all of the above: under the circumstance you mentioned, you should
check whether you have just seriously over-architected the solution.

~~~
crazyguitar
I understand you are worried about newbies misusing APIs. You remind me that I
should add a warning that the sample code in this article should not be used
in real programs. Thanks.

Also, I did not advocate that a newbie should start and stop threads by
themselves. What I want to say is that I agree we should use high-level APIs
in most cases, but in some cases we may need low-level APIs to accomplish our
mission. I am unwilling to limit what kinds of APIs should be used. In my
opinion, as you said, "you have just seriously over-architected the solution,"
we should be careful when using APIs. Even though high-level APIs are safer,
programmers may still misuse them.

------
rhizome31
Also, if you do a lot of concurrent programming, you should consider platforms
with lightweight processes (as provided by Erlang/Elixir, among others). I
find code based on this paradigm much easier to write and debug than async
code and it comes with additional benefits such as error isolation and the
ability to parallelize CPU-bound tasks.

------
leetrout
> Obviously, A coroutine is just a term to represent a task that is scheduled
> by an event-loop in a program instead of operating systems.

This is full of less-than-ideal technical writing, like this example.

~~~
crazyguitar
I understand. I will keep reviewing my content. BTW, if you are
available, could you give me some writing tips? Thank you so much.

~~~
leetrout
[https://styleguide.mailchimp.com/voice-and-tone/](https://styleguide.mailchimp.com/voice-and-tone/) (previously voiceandtone.com)

[https://mkaz.blog/misc/notes-on-technical-writing/](https://mkaz.blog/misc/notes-on-technical-writing/)

[https://spin.atomicobject.com/2014/09/09/never-use-the-word-obviously/](https://spin.atomicobject.com/2014/09/09/never-use-the-word-obviously/)

[https://developers.google.com/tech-writing/overview](https://developers.google.com/tech-writing/overview)

~~~
crazyguitar
Awesome! Thank you

------
sk0g
Maybe "for Python" appended to the title might be handy.

~~~
crazyguitar
I agree.

------
Myrmornis
Here are two good reads on asynchronous programming

[http://krondo.com/an-introduction-to-asynchronous-programming-and-twisted/](http://krondo.com/an-introduction-to-asynchronous-programming-and-twisted/)

[https://nullprogram.com/blog/2019/03/10/](https://nullprogram.com/blog/2019/03/10/)

~~~
crazyguitar
Nice. Thanks

------
dang
Please don't put "Show HN" on reading material. It's against the rules
([https://news.ycombinator.com/showhn.html](https://news.ycombinator.com/showhn.html))
because if it were allowed, everyone would put Show HN on everything.

~~~
crazyguitar
Oh! sorry! thx

------
hnews_account_1
This is only marginally related to the article in general, but python's
implementation of concurrency and multi threading is fantastic in my
experience. It took me literally a full 3 hours to get the basic hang of it,
and I went from that to writing embarrassingly parallel code to do very large
data operations in a matter of weeks.

Not to sound ignorant, but I had zero idea about semaphores and locking even a
month into using their implementation, and my code worked perfectly. Big fan
of that library, since my work involves both querying REST APIs for data and
doing computationally intensive operations on it. My cloud system is very low
grade, but even with the GIL, what now takes 12 minutes to complete on a good
day would've taken literal hours to finish if I'd written it serially.

~~~
RossBencina
I thought that Python's GIL (Global Interpreter Lock) precluded implementing
parallel code in Python. Has something changed recently?

~~~
BiteCode_dev
Python has always been able to use multiprocessing to do parallel processing
leveraging several CPUs.

However, it became especially easy with Python 3.2 (10 years ago), which
introduced the ProcessPoolExecutor
([https://docs.python.org/dev/library/concurrent.futures.html#concurrent.futures.ProcessPoolExecutor](https://docs.python.org/dev/library/concurrent.futures.html#concurrent.futures.ProcessPoolExecutor)):

    
    
        import random
        import time
        from concurrent.futures import ProcessPoolExecutor, as_completed
    
    
        def hello():
            seconds = random.randint(0, 5)
            print(f"Start blocking for {seconds}s")
            time.sleep(seconds)
            print(f"Stopped blocking after {seconds}s")
            return seconds
    
        if __name__ == "__main__":
    
            with ProcessPoolExecutor(max_workers=2) as exec:
    
                a = exec.submit(hello)
                b = exec.submit(hello)
    
                for future in as_completed((a, b)):
                    print(future.result())
    

The exact same API exists for threads, BTW. I don't think a tutorial should
introduce you to concurrency using manual process/thread management any more.
It makes no sense to me.
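
E.g. a sketch of the thread variant, reusing the hello() defined above; only the executor class changes:

    
    
        from concurrent.futures import ThreadPoolExecutor, as_completed
    
        # identical submit/as_completed API, backed by threads instead of processes
        with ThreadPoolExecutor(max_workers=2) as pool:
            futures = [pool.submit(hello) for _ in range(2)]
            for future in as_completed(futures):
                print(future.result())
    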

You may note that multiprocessing still eats more RAM than typical Go/Rust
code, since max_workers=n means n+1 Python VMs spawning, but on modern
servers you don't really feel it. That's what most WSGI setups do anyway.

Now, before 2019, there was one more use case that wasn't served well: how do
you share some computation between isolated processes that need to communicate
to do their work? The typical use case was people using numpy or pandas to
crunch numbers that depended on each other. Indeed, communicating between
processes using pipes is expensive, given the cost of message-passing
serialization.

However, the previous Python release (3.8) introduced a mechanism to share
memory almost for free
([https://docs.python.org/3/library/multiprocessing.shared_memory.html](https://docs.python.org/3/library/multiprocessing.shared_memory.html)):

    
    
        from multiprocessing.managers import SharedMemoryManager
    
        with SharedMemoryManager() as smm:
            # the manager allocates the backing block and frees it on exit
            sl = smm.ShareableList(range(2000))
            print(sl.shm.name)  # block name that other processes can attach to
    

The sl object can then contain int, float, bool, str, bytes and None, and its
reference can be shared among processes. Each item can be replaced in place
(the list is fixed-length). You can also get a hold of sl from another process
by using its shared memory block name (sl.shm.name) if you don't have the
reference at hand.

There is a raw SharedMemory object for stuff like numpy/pandas array
buffers if that is your main concern.
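
For instance, attaching from another process might look like this (a sketch; "psm_12345" stands in for whatever sl.shm.name actually reports):

    
    
        from multiprocessing import shared_memory
    
        # attach to the existing list by its block name
        sl2 = shared_memory.ShareableList(name="psm_12345")
        sl2[0] = 42       # replace an item in place
        sl2.shm.close()   # detach without destroying the block
    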

~~~
hnews_account_1
Thank you for this. I did not know of this change in Python 3.8.

Are you a core dev, BTW? I have a complaint about single threads and
processes: they have absolutely no way to return values back to the main
thread except through some shared memory object. Am I too ignorant to
understand how big of a challenge it is to do this?

    
    
      from threading import Thread
    
      def func1(*args):
         # something
    
      def main():
         # wished-for API (doesn't exist): get the return value handle up front
         new_thread, return_val = Thread(target=func1, args=(1, 2))
         new_thread.start()
         new_thread.join()
         print(return_val)
    

Instead, to implement this, I keep having to use single-worker pools that
already have return mechanisms built in. All I need is a way of starting a
thread (or process), joining it once my main thread/process is done, and
retrieving its return value for my use (assuming there is one).

~~~
BiteCode_dev
Not a core dev.

The clean way to return a value is to pass it to a Queue
([https://docs.python.org/fr/3/library/queue.html](https://docs.python.org/fr/3/library/queue.html)),
and this is what the executor does, but for your use case, it's overkill.
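
A minimal sketch of that approach, reusing your func1 name (the body is hypothetical):

    
    
        from queue import Queue
        from threading import Thread
    
        def func1(q, *args):
            q.put(sum(args))      # "return" the result by putting it on the queue
    
        q = Queue()
        t = Thread(target=func1, args=(q, 1, 2))
        t.start()
        t.join()
        print(q.get())            # 3
    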

For very simple use cases, you can inherit from Thread and force join() to
return the value:

    
    
        class SimpleReturningThread(Thread):
            result = None
    
            def run(self):
                # same as Thread.run(), but capture the target's return value
                try:
                    if self._target:
                        self.result = self._target(*self._args, **self._kwargs)
                finally:
                    # mirror Thread.run(): drop references to avoid cycles
                    del self._target, self._args, self._kwargs
    
            def join(self, *args, **kwargs):
                super().join(*args, **kwargs)
                return self.result
    
    

Personally, I'd stick to using a pool with only one worker in it. It's not
worth the trouble of doing all this work. Wrap it in a few functions if you do
the same thing repeatedly and call it a day; it's unlikely to be what most of
your code is about anyway.

------
kabacha
I've been digging through asyncio for a few weeks now and I actually really
didn't like this article.

The thing that made me finally click with asyncio was a simple explanation
that coroutines are "pausable functions", plus a few hello world/sleep
examples, while this article goes into servers, threading, and all sorts of
overly complex and long explanations.
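
The kind of hello world/sleep example I mean, roughly:

    
    
        import asyncio
    
        async def hello():
            print("hello")
            await asyncio.sleep(1)  # pauses here; the loop can run other coroutines
            print("world")
    
        asyncio.run(hello())
    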

For some people this might be more approachable, but I don't see anything
"hitchhiker's guide" about this article in particular.

------
clarry
All these intros to asynchronous programming fail to address the most
interesting (and arguably most important, yet also most difficult) case, which
is asynchronicity on a modern multi-core server. Instead, threads and event
loops are presented as mutually exclusive strategies: naively using one
thread for every connection doesn't scale, and naively using event loops means
you're stuck running it all on one core, which doesn't scale either.

~~~
RossBencina
I guess that's because they are "intros," and often multi-core event loops are
implemented by the language runtime or some threadpool library. This is a good
watch:

Dmitry Vyukov — Go scheduler: Implementing language with lightweight
concurrency (Oct 14, 2019)

[https://www.youtube.com/watch?v=-K11rY57K7k](https://www.youtube.com/watch?v=-K11rY57K7k)

------
jojo14
I for one have always thought that keeping things synchronous, based on
select(), is a better intellectual discipline and less error-prone. Only in a
few cases are you forced to use asynchronous programming. However, that's not
why I comment here.

I just want to point out that "yield from" and "@coroutine" are now
deprecated, so the article needs a bit of an update:

Note: Support for generator-based coroutines is deprecated and is scheduled
for removal in Python 3.10.
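
Roughly, the migration looks like this (a sketch):

    
    
        import asyncio
    
        # deprecated generator-based style:
        @asyncio.coroutine
        def old_style():
            yield from asyncio.sleep(1)
    
        # native coroutine style to migrate to:
        async def new_style():
            await asyncio.sleep(1)
    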

References:

- [https://docs.python.org/3/library/asyncio-task.html#generator-based-coroutines](https://docs.python.org/3/library/asyncio-task.html#generator-based-coroutines)
- [https://docs.python.org/3/whatsnew/3.7.html](https://docs.python.org/3/whatsnew/3.7.html)
- [https://bugs.python.org/issue36921](https://bugs.python.org/issue36921)

~~~
nurettin
what if we yield from an async function without the @coroutine decorator? That
is also an AsyncGenerator. Or is the scope of deprecation solely concerned
with the @coroutine decorator?

~~~
crazyguitar
I think the syntax `yield from` and the `@coroutine` decorator are two
different things. Note that `yield from` is not allowed inside an `async def`
function; a plain `yield` there is what declares an asynchronous generator
function.

Using `@coroutine` + `yield from`, on the other hand, means we turn a plain
generator into a generator-based coroutine. Because a generator is a form of
coroutine, in Python 3.4, `@coroutine` marks a generator function as a
coroutine (if the decorated function is already a generator function,
`@coroutine` essentially does not change its behavior). Based on the
documentation, Python recommends declaring coroutines with `async def` instead
of `@coroutine`, because `@coroutine` will be removed in Python 3.10.
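
A quick sketch of the async-generator side (note the plain `yield`; `yield from` here would be a SyntaxError):

    
    
        import asyncio
    
        async def agen():
            for i in range(3):
                yield i  # plain yield in async def -> async generator
    
        async def main():
            async for item in agen():
                print(item)
    
        asyncio.run(main())
    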

~~~
nurettin
We can transform async generators into coroutines (async functions) by
creating a new async function that simply iterates over the async generator,
so every name is overloaded, which makes communication kinda hard. I will just
assume the happy case of @coroutine getting a downgrade. Don't use it anyway.

------
vips7L
I wish more languages took the Go/Zig approach for async/await and didn't
introduce colored functions [0] that pollute your whole scope.

[0] [http://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/](http://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/)

------
32gbsd
> Python introduced a concept, async/await, to help developers write
> understandable code with high performance

Is it actually better, or is it just a new kind of threading/loop? Being
understandable is second fiddle.

~~~
gigatexal
I think C# had it long before Python did...

------
signa11
fwiw, dave beazley's writings on similar topics (and everything else as well)
are excellent.

