
How to Retrieve 100k Objects with Python: Why We Prefer Threading to Asyncio - mrosett
https://medium.com/@rbpengineering/how-to-retrieve-100k-objects-with-python-why-we-prefer-threading-to-asyncio-d6b2cbfd7d90
======
mrosett
OP here. There are some good guides [0] about using Python's asyncio module
(and libraries built on top of it), but I hadn't seen a comparison with
threading, which is what I prefer for client-side code. The key difference is
that asyncio requires modifying every function in the call stack, while
threading can be cleanly wrapped around existing code. So although threading
has a larger memory footprint and feels less Pythonic, it's a much better
option when working with third-party libraries.

[0]: https://pawelmhm.github.io/asyncio/python/aiohttp/2016/04/22/asyncio-aiohttp.html
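The "wrap existing code" point can be sketched with the stdlib's `concurrent.futures`. Here `fetch_object` is a hypothetical stand-in for an unmodified blocking call (e.g. a boto3 S3 get); note that nothing has to become `async`:

```python
# Minimal sketch: a thread pool wraps an existing blocking function
# without rewriting it. `fetch_object` is a hypothetical stand-in for
# a blocking third-party call such as s3.get_object.
from concurrent.futures import ThreadPoolExecutor

def fetch_object(key):
    # Imagine this blocks on network I/O; no async/await needed anywhere.
    return f"data for {key}"

keys = [f"object-{i}" for i in range(100)]
with ThreadPoolExecutor(max_workers=32) as pool:
    results = list(pool.map(fetch_object, keys))  # preserves input order
```

By contrast, moving the same call stack to asyncio would mean converting `fetch_object` and every caller above it into coroutines.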

~~~
mattbillenstein
> The key difference is that using the async module requires modifying every
> function in the call stack

This is why I still prefer gevent - it's easy to do async I/O using familiar
patterns (gevent Pool works like multiprocessing Pool) while mostly writing
blocking code.
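A rough sketch of that parallel (assuming gevent is installed): gevent's `Pool` exposes the same `map` call shape as `multiprocessing.Pool`, and monkey-patching lets otherwise-blocking code yield to the event loop:

```python
# Sketch of the gevent pattern described above; `fetch` is a hypothetical
# stand-in for a blocking network call.
from gevent import monkey
monkey.patch_all()  # patch blocking stdlib calls to cooperate with greenlets

import time
from gevent.pool import Pool

def fetch(i):
    time.sleep(0.01)  # stand-in for blocking I/O; yields after patching
    return i * 2

pool = Pool(10)  # cap concurrency at 10 greenlets
results = pool.map(fetch, range(10))  # same call shape as multiprocessing.Pool.map
```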

------
badrequest
Just wanted to say I appreciate how y'all made a bucket available to test the
code on. More of this from technical blog posts, please!

~~~
mrosett
I’m glad you appreciated that! I want results to be reproducible.

------
dekhn
I use very simple code: multiprocessing wrapped around s3.download_file, and
see gigabit+ throughput with a single core.

