A really good way to learn about concurrency is to write a SOCKS5 proxy. Start with a simple one using Python threads, then Python asyncio, and if you're brave, C sockets with poll/select. The C parts are covered in Beej's Guide to Network Programming. The SOCKS5 spec is very simple and has an RFC (RFC 1928).
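To give a flavor of how simple the spec is, here is a sketch of the first handshake step (method selection) from RFC 1928. `negotiate_method` is a hypothetical helper name; a full proxy would follow this with the CONNECT request and a byte-relay loop, one client per thread.

```python
import socket

def negotiate_method(conn: socket.socket) -> bool:
    """Handle the SOCKS5 method-selection handshake (RFC 1928, section 3).

    The client sends VER (0x05), NMETHODS, then NMETHODS method bytes.
    We reply 05 00 ("no authentication required") if that method was
    offered, else 05 FF ("no acceptable methods").
    """
    header = conn.recv(2)
    if len(header) < 2 or header[0] != 0x05:
        return False
    methods = conn.recv(header[1])
    if 0x00 in methods:              # 0x00 = no authentication
        conn.sendall(b"\x05\x00")
        return True
    conn.sendall(b"\x05\xff")
    return False
```

A threaded proxy would call this once per accepted connection before parsing the client's request.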
The GIL is not there to prevent data corruption on shared objects - it only protects the interpreter's internal state. The fact that you can sometimes get away with it is an accident of the GIL's implementation, not a feature anyone should rely on. It also means you cannot rely on that behavior not changing in successive versions of CPython.
The only safe ways to share state between threads in CPython are explicit synchronization: locks, mutexes, message passing, etc. Even something as simple as incrementing an integer is absolutely not made thread-safe by the GIL.
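A minimal sketch of the point above: `counter += 1` compiles to a read, an add, and a write, and the GIL can switch threads between those steps, so concurrent updates can be lost. The function names here are illustrative.

```python
import threading

counter = 0
lock = threading.Lock()

def unsafe_increment(n: int) -> None:
    # A read, an add, and a write per iteration; the GIL may release
    # between them, so concurrent updates can be silently lost.
    global counter
    for _ in range(n):
        counter += 1

def safe_increment(n: int) -> None:
    global counter
    for _ in range(n):
        with lock:          # explicit synchronization, not the GIL
            counter += 1

threads = [threading.Thread(target=safe_increment, args=(100_000,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 400000 every time with the lock; with unsafe_increment, often less
```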
To say concurrency is broken in Python because of the GIL seems to be a reiterated adage without actual wisdom behind it.[1]
Are you writing an IO-bound task? Then asyncio[2] is a great way to achieve performant concurrency (even if viewed just as an interface: e.g., you can make it even more performant by swapping in uvloop[3]). As Raymond Hettinger likes to point out, even if you throw away the GIL, you still have to implement locks on shared resources (unless you're writing embarrassingly parallel code, in which case you can invoke the pieces separately, use the multiprocessing module to great effect, or see how I do it easily below). By the time you get all those error-prone locks implemented in your GIL-less world, you've given up the performance you thought you'd gain by throwing away the GIL.
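A minimal sketch of the IO-bound case: `fetch` is a hypothetical stand-in for a real network call, with `asyncio.sleep` playing the role of waiting on a socket. (uvloop, mentioned above, can be swapped in via `uvloop.install()` without changing this code.)

```python
import asyncio
import time

async def fetch(name: str, delay: float) -> str:
    # Stand-in for a real network call; awaiting asyncio.sleep yields
    # to the event loop the same way awaiting a socket read would.
    await asyncio.sleep(delay)
    return name

async def main():
    # Three 0.2 s "requests" run concurrently, so total wall time is
    # about 0.2 s rather than 0.6 s.
    return await asyncio.gather(fetch("a", 0.2), fetch("b", 0.2), fetch("c", 0.2))

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")
```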
Are you writing a CPU-bound task? I've only encountered this in scientific computing, as I'm a scientist, and for this (as another commenter mentions) I either use numpy[4] (strictly within a numpy call, the GIL is not held) or numba[5] (by default, the entirety of a numba-compiled function does not hold the GIL). I can therefore use Python threads, asyncio, or Numba's built-in parallelism[6,7] to run these numerical routines in parallel (in summary: some numpy routines for working with arrays of values are auto-parallelized when called within a numba-compiled function, and you can explicitly parallelize your own loops via `for i in numba.prange(N):`; in either case, decorate the function with `@numba.njit(parallel=True)`). I easily get 100% of my CPU's threads doing work this way.
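A sketch of the prange pattern described above; the function name is illustrative, and a pure-Python fallback is included so the example still runs if numba isn't installed (with numba present, the loop is compiled and the iterations are split across threads).

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Pure-Python fallback so the sketch runs without numba; with numba
    # installed, the decorated loop is compiled and run in parallel.
    def njit(**kwargs):
        def decorate(func):
            return func
        return decorate
    prange = range

@njit(parallel=True)
def sum_of_squares(x):
    # numba recognizes `total += ...` inside a prange loop as a
    # reduction and combines the per-thread partial sums safely.
    total = 0.0
    for i in prange(x.shape[0]):
        total += x[i] * x[i]
    return total

x = np.arange(100_000, dtype=np.float64)
print(sum_of_squares(x))
```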
I have parallelized numba-compiled functions both via numba.prange and via Python threads (yes, threads) in the past few weeks to speed up code that ran for 20-60 minutes. Even though I've done this before, I was still very pleasantly surprised at how easy it was, and I got very nearly a 16x speedup on my 16-thread laptop. That made my code practical for interactive data exploration and for building up simulations to help me understand some things I was observing, which is where Python really shines for me.
Also note that writing a routine in numba provides a straightforward path to putting your numerical code on a GPU[8,9], too, if that model makes sense for your algorithm (though I've primarily used PyCUDA for this, which worked well).
Are you writing something that needs distributed (e.g., cluster-based) resources? Then dask (dask.distributed) is a great way to go, with an interface similar to the Python standard library's `concurrent.futures`. It runs all the same code and you get the same forms of parallelism, but distributed across computers.
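To show the futures interface being referred to, here is a sketch using the stdlib executor; with dask this would be `client = dask.distributed.Client(...)` and `client.submit(...)` against the same submit/result pattern. `simulate` is a hypothetical stand-in for an expensive numerical routine.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def simulate(seed: int) -> int:
    # Stand-in for an expensive numerical routine.
    return seed * seed

# With dask.distributed, a Client plays the role of this executor and
# the submit/result calls below look essentially the same.
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(simulate, s) for s in range(8)]
    results = sorted(f.result() for f in as_completed(futures))

print(results)
```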
In any case, do you get to FORTRAN or C speeds? Probably not, but close enough, and with the benefits (and costs) of using Python code.
This has been discussed plenty of times and it basically boils down to "just get f*cked and work around it". In my case, I just use different programming languages for performance sensitive tasks.
But it's amazing how people can come up with all sorts of justifications for a broken system that they happen to love. And then people wonder how there are people who can support Trump :D.
To make this less personal and more specific for someone not enlightened on the topic: please point out specifically how I am incorrect, or at least provide a source or point to a discussion that shows how I am incorrect.
I am not a professional programmer, but I get a lot of work done very productively in Python (or at least I thought I was being productive; given how you speak to me, it sounds like I'm actually not, and that I'm instead living in some world of fake insinuations that what I'm doing is productive...).
* Threads are useful not just for speedups, but also for concurrent execution (storage/network I/O, graphics, ...)
* If you really care about speed and would like to parallelize CPU-bound code, you'll probably do better using another language like C/C++ (or Java, Go, D etc.) for this particular task
* In general, threads are used instead of processes because of the overhead involved in creating the latter
* Python's multiprocessing module isn't the most efficient anyway, for example when passing large objects between processes, because of the way they're serialized (pickled)
* It also has some other quirks, so I would definitely not advise using it for "everything".
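A quick sketch of the serialization cost mentioned in the last two bullets: multiprocessing pickles arguments and results, so a large object pays a full serialize/deserialize round trip on every process crossing, while threads share it for free. The numbers here are illustrative, not a benchmark.

```python
import pickle
import time

# A "large" object: a million integers in a list.
big = list(range(1_000_000))

start = time.perf_counter()
blob = pickle.dumps(big, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(blob)
elapsed = time.perf_counter() - start

# This round trip is roughly what multiprocessing pays each time the
# object crosses a process boundary; threads just share the reference.
print(f"{len(blob) / 1e6:.1f} MB pickled+unpickled in {elapsed:.3f} s")
```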