In response to the multiple comments here complaining that multithreading is impossible in Python without using multiple processes, because of the GIL (global interpreter lock):
This is just not true, because C extension modules (i.e. libraries written to be used from Python but whose implementations are written in C) can release the global interpreter lock while inside a function call. Examples of these include numpy, scipy, pandas and tensorflow, and there are many others. Most Python processes that are doing CPU-intensive computation spend relatively little time actually executing Python, and are really just coordinating the C libraries (e.g. "multiply these two matrices together").
The GIL is also released during IO operations like writing to a file or waiting for a subprocess to finish or send data down its pipe. So in most practical situations where you have a performance-critical application written in Python (or more precisely, the top layer is written in Python), multithreading works fine.
If you are doing CPU intensive work in pure Python and you find things are unacceptably slow, then the simplest way to boost performance (and probably simplify your code) is to rewrite chunks of your code in terms of these C extension modules. If you can't do this for some reason then you will have to throw in the Python towel and re-write some or all of your code in a natively compiled language (if it's just a small fraction of your code then Cython is a good option). But this is the best course of action regardless of the threads situation, because pure Python code runs orders of magnitude slower than native code.
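As a minimal sketch of the point about C extensions (assuming a BLAS-backed numpy build, which releases the GIL inside matrix multiplication):

```python
import threading

import numpy as np

# Two large matrices; the matrix product dispatches to C/BLAS code,
# which releases the GIL, so these threads can genuinely run in
# parallel on separate cores.
rng = np.random.default_rng(0)
a = rng.random((500, 500))
b = rng.random((500, 500))
results = [None, None]

def work(i):
    # The GIL is released inside the C code while '@' runs
    results[i] = a @ b

threads = [threading.Thread(target=work, args=(i,)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The Python-level code here only coordinates; nearly all the CPU time is spent inside the C library with the lock released.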
> complaining that multithreading is impossible in Python without using multiple processes, because of the GIL ... this is not true
I think some people's opinion is that if you're writing in C then you're not really writing a Python program, so they think it is impossible in Python. Which seems a reasonable point to make to me.
Your argument is that Python is fine for multithreading... as long as you actually write C instead of Python.
If a, b and c are numpy arrays then this function releases the GIL and so will run in multiple threads with no further work and with little overhead (if a, b and c are large). I would describe this as a function "written in Python", even though numpy uses C under the hood. It seems you describe this snippet as being "written in C instead of Python"; I find that odd, but OK.
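The snippet being discussed didn't survive the quoting here; judging from the surrounding comments (numpy arrays a, b and c, and a later question about a return statement using '@'), it was presumably something along these lines (the function name and exact body are a reconstruction, not the original):

```python
import numpy as np

def f(a, b, c):
    # '@' is Python's matrix-multiplication operator (PEP 465).
    # For large numpy arrays, both '@' and '+' run in C code that
    # releases the GIL for the duration of the call.
    return a @ b + c
```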
But, if I understand you right, you are also suggesting that the other commenters here who talk about the GIL would also describe this as "written in C". They realise that this releases the GIL and will run on multiple threads, but the point of their comments is that a proper pure-Python function wouldn't. I disagree. I think that most others would describe this function as "written in Python", and when they say that functions written in Python can't be parallelised they do so because they don't realise that functions like this can be.
This only gives you parallelism in one specific situation though - where operations like '+' and '@' take a long time. If they were fine-grained operations, then this doesn't help you.
If instead of operating on a numerical matrix you were operating on something like a graph of Python objects, a graph traversal would be hard to parallelise, as you could not keep the GIL released long enough to get anything done.
I agree that not quite all situations will be covered by coarse locks like this. But many will, and my original comment was meant to draw attention to those situations. Your previous comment seemed to be saying that everyone already knew about those, but I still believe that some people commenting or reading here weren't aware you could release the GIL with just a numpy call.
I did also concede that if you do have to write your algorithm completely from scratch, with no scope for using existing C extensions (be they general-purpose like numpy or more specialist ones that implement the whole algorithm) then yes you'll be caught by the GIL, so I agree with you on that. But I also made the point that you'll be caught even more (orders of magnitude more!) by the slowness of Python, so any discussion about parallelism or the GIL is a red herring. It's like worrying that your car's windscreen will start to melt if you travel at 500mph; even if that's technically true, it's not the problem you should be focusing on.
It's interesting you mention graphs because the most popular liberally licensed graph library is NetworkX, which is indeed pure Python and so presumably isn't particularly fast. There are graph libraries written as C extension modules but I believe they are less popular and less liberally licensed (GPL-style rather than BSD-style). So I definitely agree that this is a big weakness of the Python ecosystem.
This is a good point. Fine-grained parallelism works well with C extensions, but coarse-grained does not. Which is why data science and MATLAB-like tasks are so often done in Python without worrying too much about the Python penalty. But if you have a lot of small dense matrix operations, even VBA in Excel will be faster by more than 10x, because you keep popping back into Python.
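A rough sketch of that "popping back into Python" cost, using numpy's batched matmul semantics: many tiny operations pay Python-level dispatch overhead on every iteration, while one batched call keeps the whole loop in C.

```python
import numpy as np

rng = np.random.default_rng(0)
# 10,000 tiny 4x4 matrix products: each '@' is cheap in C, but pays
# Python-level dispatch overhead on every iteration of the loop.
a = rng.random((10_000, 4, 4))
b = rng.random((10_000, 4, 4))
looped = np.array([x @ y for x, y in zip(a, b)])

# One batched call: same result, without re-entering the interpreter
# 10,000 times.
batched = a @ b
assert np.allclose(looped, batched)
```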
Could you explain the return statement in your example? I only know the '@' as a decorator in Python. This looks like invalid syntax to me what am I missing?
A whole lot depends on what exactly it is that someone wants to get out of using threading.
The GIL means that a single Python interpreter process can execute at most one Python thread at a time, regardless of the number of CPUs or CPU cores available on the host machine. The GIL also introduces overhead which affects the performance of code using Python threads; how much you're affected by it will vary depending on what your code is doing. I/O-bound code tends to be much less affected, while CPU-bound code is much more affected.
All of this dates back to design decisions made in the 1990s which presumably seemed reasonable for the time: most people using Python were running it on machines with one CPU which had one core, so being able to take advantage of multiple CPUs/cores to schedule multiple threads to execute simultaneously was not necessarily a high priority. And most people who wanted threading wanted it for use in things like network daemons, which are primarily I/O-bound. Hence, the GIL and the set of tradeoffs it makes. Now, of course, we carry multi-core computers in our pockets and people routinely use Python for CPU-bound data science tasks. Hindsight is great at spotting that, but hindsight doesn't give us a time machine to go back and change the decisions.
Anyway. This is not the same thing as "multithreading is impossible". This is the same thing as "multithreading has some limitations, and for some cases the easiest way to work around them will be to use Python's C extension API". Which is what the parent comment seemed to be saying.
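For example, a pure-Python CPU-bound loop run on two threads still executes one bytecode stream at a time; the results come out correct, there's just no speedup:

```python
import threading

def count(n, out, i):
    # Pure-Python loop: the thread holds the GIL while executing
    # each bytecode instruction.
    total = 0
    for _ in range(n):
        total += 1
    out[i] = total

N = 1_000_000
out = [0, 0]
threads = [threading.Thread(target=count, args=(N, out, i)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Both threads produce correct results, but wall-clock time is about
# the same as running the two loops back to back.
assert out == [N, N]
```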
> All of this dates back to design decisions made in the 1990s which presumably seemed reasonable for the time ... Hence, the GIL and the set of tradeoffs it makes. Now, of course, we carry multi-core computers ... Hindsight is great at spotting that, but hindsight doesn't give us a time machine to go back and change the decisions.
Sadly I don't think this is _quite_ true. I believe GILs are used in a number of interpreters and fall prey to the common problem where either coarsening locks or making them finer ruins somebody's day. I believe Guido van Rossum hung the GILectomy on two main issues: the interpreter must remain relatively simple, and C extensions cannot be slowed down.
I'm not disagreeing with the decision (necessarily) but it isn't simply a holdover from a bygone era. It was a decision that has been reaffirmed and upheld numerous times.
I'm familiar with the various attempts to remove the GIL over the years.
The thing is, in the 90s the choices that produced the GIL as it exists were not bad ones; that's why I went to the trouble of explaining how it affects threaded code, and why those effects can be considered reasonable tradeoffs, given what was known at the time, for implementing threading (without completely breaking the ecosystem of Python + Python extensions, which was already significant even back then).
Of course, knowing what's known today about the directions computing and the use of Python went in, different decisions might end up being made, but at this point it's very difficult (more difficult than people typically expect) to undo them or make different choices.
I've done it once, converting about 15 lines of python to rust. It was completely painless and resulted in a large speedup (changed a hotspot that was taking approximately 90% of execution time in a scientific simulation to approximately 0%).
The type system and expressive macros seem like a big win over C to me.
Care to share a bit more detail on how you did this? Was there some interfacing library that you used analogous to Cython/SWIG/etc.? Presumably you didn't code directly against the C API (in python.h)?
The Rust library interfacing with Python is https://github.com/dgrunwald/rust-cpython . This library understands things like Python arrays, objects, etc. and provides nice Rust interfaces to them. Basically I just have to write a macro that specifies what functions I'm exposing to Python, and other than that I'm writing normal Rust. On the Python side I'm importing them and calling them like any other Python library.
I really wish he had shown his numpy code. He said at 13:46 "Numpy actually doesn't help you at all because the calculation is still getting done at the Python level". But his function could be vectorised with numpy using functions like numpy.maximum or numpy.where, in which case the main loop will be in C not Python. I can't figure out from what he said whether his numpy code did that or not.
But either way, it's interesting that in this case the numpy version is arguably harder to write than the Cython version: rather than just adding a few bits of metadata (the types), you have to permute the whole control flow. If there's only a small amount of code you want to convert, I would still say it's better to use numpy though (if it actually is fast enough), because getting the build tools onto your computer for Cython can be a pain. And for some matrix computations there are speed improvements beyond the fact that it's implemented in C, e.g. matrix multiplication is faster than the naive O(n^3) version.
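His actual function isn't shown, but as a hypothetical illustration of what "permuting the control flow" means, here is a clamping loop rewritten with numpy.where (the example itself is mine, not from the talk):

```python
import numpy as np

def clip_loop(xs, lo, hi):
    # Pure-Python version: the loop and branches all execute at the
    # Python level, one element at a time.
    out = []
    for x in xs:
        if x < lo:
            out.append(lo)
        elif x > hi:
            out.append(hi)
        else:
            out.append(x)
    return out

def clip_vectorised(xs, lo, hi):
    # Control flow "permuted" into whole-array operations: the loop
    # now runs in C inside numpy. An equivalent formulation is
    # np.minimum(np.maximum(xs, lo), hi).
    return np.where(xs < lo, lo, np.where(xs > hi, hi, xs))

xs = np.array([-2.0, 0.5, 3.0])
assert np.allclose(clip_vectorised(xs, 0.0, 1.0), clip_loop(xs, 0.0, 1.0))
```

The branch structure of the loop has to be re-expressed as nested whole-array selections, which is the sense in which the numpy rewrite is more invasive than just adding Cython type annotations.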
Because we want first class python multithreading, like many other languages have. If we have to drop down into C, might as well use another language with first class multi-threading like java, kotlin, golang or swift and avoid all the other issues that come with slow GIL languages.
I think you're thinking of the multiprocessing module, which uses separate processes to bypass the GIL. That's why the arguments and results need to be pickleable: pickle is a serialisation protocol, so it allows you to communicate the contents of objects between different processes. If you use threads within a single process, you don't need to pickle the objects; you just pass the object directly.
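A minimal sketch of the distinction (the lambda here is just a stand-in for any unpicklable object):

```python
import pickle
import threading

# With threads, objects are shared directly within one process: no
# serialisation needed, even for things pickle can't handle.
unpicklable = lambda x: x * x  # lambdas cannot be pickled
results = []

t = threading.Thread(target=lambda: results.append(unpicklable(3)))
t.start()
t.join()
print(results)  # the thread mutated the shared list in place

# With multiprocessing, arguments and results cross a process
# boundary, so they must survive a pickle round-trip:
try:
    pickle.dumps(unpicklable)
except (pickle.PicklingError, AttributeError, TypeError):
    print("lambdas can't be pickled, so they can't cross processes")
```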