Python3 Asyncio for middling developers (whatisjasongoldstein.com)
35 points by coffeefirst on Sept 3, 2017 | 14 comments



> That seems wrong. I should be able to run normal Python code as async, right?

No, that is not the case. To understand why, look at an async library such as https://github.com/aio-libs/aiohttp and follow what it actually calls all the way down under the hood.

If it were as simple as adding `asyncio.sleep(0)`, that library would have been much easier to write. :P

Just look at the code you posted at the end: it actually runs faster synchronously, without `asyncio.sleep(0)`. The sleep is the only thing that happens asynchronously, not the print statements, so all you're doing is introducing delay.

Similarly, the Django ORM DB calls you make in the other examples are all still happening synchronously. You're just adding a delay that causes them to get picked off in an inconsistent order.
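
A minimal sketch of that point, assuming Python 3.7+ and using `time.sleep` as a hypothetical stand-in for any synchronous call (such as a sync ORM query). The function names are made up for illustration. Three "blocking" coroutines run one after another, while three that await `asyncio.sleep` overlap:

```python
import asyncio
import time


async def blocking_task(delay):
    # time.sleep stands in for a synchronous call (assumption for this sketch);
    # it never hands control back to the event loop.
    time.sleep(delay)


async def cooperative_task(delay):
    # asyncio.sleep yields to the event loop while it waits.
    await asyncio.sleep(delay)


async def timed(task_factory, count=3, delay=0.1):
    # Run `count` copies of the task through gather and time the whole batch.
    start = time.perf_counter()
    await asyncio.gather(*(task_factory(delay) for _ in range(count)))
    return time.perf_counter() - start


async def main():
    blocking = await timed(blocking_task)
    cooperative = await timed(cooperative_task)
    return blocking, cooperative


if __name__ == "__main__":
    blocking, cooperative = asyncio.run(main())
    print("blocking:    {:.2f}s".format(blocking))     # roughly count * delay
    print("cooperative: {:.2f}s".format(cooperative))  # roughly one delay
```

Wrapping a blocking call in `async def` does not make it concurrent; only the awaited parts can overlap.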


So what you're saying is it needs an 'async' version of the Django ORM for the example to work.

In other words, what is really needed is an ORM that would allow you to write:

  source = await Source.objects.get(id=source_id)
  await source.update()
(?)


Yes. In a cooperative async model, anything that blocks stops the progress of every coroutine, so you would need an async version of the ORM library for this to work. When you await something, you're telling the event loop to stop running this coroutine and wait for something (usually IO) to happen before resuming execution. Because of this, if you block on IO without using await, the coroutine never yields to the event loop; it simply sits there, and execution of every other coroutine is blocked with it.
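
When no async version of a library exists, one common workaround (not something the article does) is to hand the blocking call to a thread pool with `loop.run_in_executor`, which returns something you can await. A rough sketch, assuming Python 3.7+; `blocking_lookup` is a hypothetical stand-in for a synchronous query:

```python
import asyncio
import time


def blocking_lookup(source_id):
    # Hypothetical synchronous call; the sleep imitates waiting on a database.
    time.sleep(0.1)
    return {"id": source_id}


async def lookup(source_id):
    loop = asyncio.get_running_loop()
    # Passing None selects the loop's default ThreadPoolExecutor, so the
    # blocking call runs in a worker thread while the event loop stays free.
    return await loop.run_in_executor(None, blocking_lookup, source_id)


async def main():
    start = time.perf_counter()
    # gather returns results in the same order as the awaitables passed in.
    results = await asyncio.gather(*(lookup(i) for i in range(3)))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    results, elapsed = asyncio.run(main())
    print(results, "{:.2f}s".format(elapsed))
```

This keeps the event loop responsive, but it is really just threading behind an awaitable interface, so the called code still has to be thread-safe.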

This leads to a sort of infectious need to make everything async, as even a single non-cooperating coroutine can bring the whole show to a halt. It's essentially Python's version of the red vs blue function problem. However, there is actually a nice alternative: gevent. gevent will monkey patch all functions in the standard library which would block, e.g. reading from a socket, attaching an implicit await to them. If the author had used gevent, the example Django code would actually work as expected, since the code would execute until the database connection was written to/read from and then immediately await.

Either way, async IO is still a somewhat tricky thing to understand and get right. In my case, it took writing a non-blocking event loop with epoll to really grok what was going on under the hood of something like asyncio.
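
For a feel of what such a loop looks like, here is a toy sketch using the standard `selectors` module (which picks epoll on Linux). It is not how asyncio is implemented, only an illustration of the idea: register sockets, wait until one is readable, then handle it. The function name and the use of `socket.socketpair()` are assumptions made to keep it self-contained:

```python
import selectors
import socket


def run_event_loop(messages):
    """Send each payload over its own socket pair and collect what arrives."""
    selector = selectors.DefaultSelector()  # epoll on Linux, kqueue on BSD/macOS
    received = {}
    writers = []

    for name, payload in messages.items():
        reader, writer = socket.socketpair()
        reader.setblocking(False)
        # Attach the name as data so we know whose socket became readable.
        selector.register(reader, selectors.EVENT_READ, data=name)
        writer.sendall(payload)
        writers.append(writer)

    pending = len(messages)
    while pending:
        # The only place the loop waits: block until some socket is readable.
        for key, _events in selector.select():
            received[key.data] = key.fileobj.recv(1024)
            selector.unregister(key.fileobj)
            key.fileobj.close()
            pending -= 1

    for writer in writers:
        writer.close()
    selector.close()
    return received


if __name__ == "__main__":
    print(run_event_loop({"a": b"hello", "b": b"world"}))
```

The key property is that nothing ever blocks on one particular socket; the loop only blocks in `select()`, waiting for whichever socket is ready first.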


For OP's case, I wouldn't have jumped to async, but instead either to multithreading or multiprocessing. Pool().map makes this really trivial. Taking their example, and tweaking it slightly:

  import requests
  from multiprocessing import Pool
  
  
  def fetch_things():
      pool = Pool()  # defaults to number of CPUs
      urls = ['https://example.com/api/object/1',
              'https://example.com/api/object/2',
              'https://example.com/api/object/3']
      return pool.map(requests.get, urls)
  
  
  print(fetch_things())
Output (because those URLs are nonsense...):

  [<Response [404]>, <Response [404]>, <Response [404]>]
It's just as easy to do with threads: just replace "from multiprocessing import Pool" with "from multiprocessing.dummy import Pool".
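
A network-free sketch of the threaded variant, in case you want to try it without hitting real URLs. `fake_fetch` is a hypothetical stand-in for `requests.get` that just sleeps, and the URLs are the same nonsense ones as above:

```python
import time
from multiprocessing.dummy import Pool  # thread-backed Pool with the same API


def fake_fetch(url):
    # Hypothetical stand-in for requests.get: wait as if on the network.
    time.sleep(0.2)
    return "fetched " + url


def fetch_all(urls):
    # One thread per URL; the context manager tears the pool down
    # once map() has returned all results.
    with Pool(len(urls)) as pool:
        return pool.map(fake_fetch, urls)


if __name__ == "__main__":
    urls = ['https://example.com/api/object/1',
            'https://example.com/api/object/2',
            'https://example.com/api/object/3']
    start = time.perf_counter()
    print(fetch_all(urls))
    print("elapsed: {:.2f}s".format(time.perf_counter() - start))
```

Because the threads spend their time waiting rather than computing, the three waits overlap and the whole batch should take about as long as one fetch.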


Have you timed it? Starting threads in Python is slow...


Considering everything that is involved in making a request over the internet, multithreading would have to be spectacularly slow to even come close to making the serial approach quicker:

  $ python quicktest.py 
  ['http://www.google.com', 'http://news.bbc.co.uk', 'http://news.ycombinator.com', 'http://www.cnn.com', 'http://www.foxnews.com', 'http://www.msnbc.com']
  [<Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>]
  Serial: 1.23853206635
  [<Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>]
  Multiprocess: 0.912357807159
  [<Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>, <Response [200]>]
  Multithreaded: 0.708998918533
edit: Here's the code:

  import requests
  import time
  from multiprocessing import Pool
  from multiprocessing.dummy import Pool as ThreadPool
  
  
  session = requests.Session()
  
  urllist = ['http://www.google.com',
             'http://news.bbc.co.uk',
             'http://news.ycombinator.com',
             'http://www.cnn.com',
             'http://www.foxnews.com',
             'http://www.msnbc.com']
  # Warm up?
  responses = []
  for url in urllist:
      responses.append(session.get(url))
  
  print urllist
  
  start = time.time()
  responses = []
  
  for url in urllist:
      responses.append(session.get(url))
  
  print responses
  print "Serial: {}".format(time.time()-start)
  
  start = time.time()
  
  pool = Pool()
  responses = pool.map(requests.get, urllist)
  
  print responses
  print "Multiprocess: {}".format(time.time()-start)
  
  start = time.time()
  pool = ThreadPool()
  responses = pool.map(requests.get, urllist)
  
  print responses
  print "Multithreaded: {}".format(time.time()-start)


Have _you_ timed it? Not in general, but for this specific case? Thread creation is expensive relative to some operations, but the speed may be entirely irrelevant to the task at hand. In this case, the author of the article is auto-curating some articles from a list of people he finds interesting. If this were done once per day as a cron job, it could almost certainly be done entirely _serially_ with zero concurrency and full blocking and still finish fine. Adding in concurrency is nice, but certainly any method will do at this volume.

This is certainly one of the cases where you should just do whatever is simplest (to _you_ the programmer). The first step is always to optimize for cognitive overhead, i.e. make the code easy to reason about. Only after that (and relatively rarely) is it necessary to optimize for specific bottlenecks in your code.


I went back and timed it. The overhead is at _most_ 100ms in my use case (there's some ambiguity because of other problems with the async implementation; I suspect it's actually lower than this). Given that many of the requests are 1s long and this is a background task, that's totally fine.


I'm sure someone will correct me here, but in my experience, Python's async model is a cluster-fuck.

It feels very much like it was just kind of thrown in to keep up with the trends, without any thought as to whether it made sense or whether it was the most "Pythonic" way of implementing it.


A lot of thought was put into it, including input from designs ten years older. Whether it is a good design I'll leave to others as I haven't used it much.


Async IO libraries tend to make very simple things very complicated. I spent half a day trying to use Boost.Asio to receive a UDP frame before giving up and using QUdpSocket (which took less than 5 minutes).

Agreed with the author's sentiment of feeling stupid.


coroutines are tacos, and monads are burritos. Repeat after me.

PS: for the author, my only theory about the 0s sleep is that coroutines aren't preempted like threads. They use cooperative concurrency, so unless a coroutine actually says "ok, I agree to pause now and let others do something", the interpreter will evaluate all of its instructions until completion. My 2 cents.
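
That theory is easy to check with a small sketch (Python 3.7+; `worker` and `run` are names made up for this example). Two coroutines append to a shared log, with and without `await asyncio.sleep(0)`:

```python
import asyncio


async def worker(name, log, yield_control):
    for step in range(3):
        log.append("{}{}".format(name, step))
        if yield_control:
            # Explicitly hand control back so other ready coroutines can run.
            await asyncio.sleep(0)


async def run(yield_control):
    log = []
    await asyncio.gather(worker("a", log, yield_control),
                         worker("b", log, yield_control))
    return log


if __name__ == "__main__":
    print(asyncio.run(run(False)))  # ['a0', 'a1', 'a2', 'b0', 'b1', 'b2']
    print(asyncio.run(run(True)))   # ['a0', 'b0', 'a1', 'b1', 'a2', 'b2']
```

Without the sleep, the first coroutine runs to completion before the second one starts. With it, they take turns, but nothing gets faster; the work is only reordered.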


This is actually titled "I’m too stupid for AsyncIO"

and it's about NOT understanding AsyncIO.


I was kidding with other devs yesterday that I had more fun trying to design multithreaded code than learning asyncio.



