In response to the multiple comments here complaining that multithreading is impossible in Python without using multiple processes, because of the GIL (global interpreter lock):
This is just not true, because C extension modules (i.e. libraries written to be used from Python but whose implementations are written in C) can release the global interpreter lock while inside a function call. Examples of these include numpy, scipy, pandas and tensorflow, and there are many others. Most Python processes that are doing CPU-intensive computation spend relatively little time actually executing Python, and are really just coordinating the C libraries (e.g. "multiply these two matrices together").
The GIL is also released during IO operations like writing to a file or waiting for a subprocess to finish or send data down its pipe. So in most practical situations where you have a performance-critical application written in Python (or more precisely, the top layer is written in Python), multithreading works fine.
If you are doing CPU intensive work in pure Python and you find things are unacceptably slow, then the simplest way to boost performance (and probably simplify your code) is to rewrite chunks of your code in terms of these C extension modules. If you can't do this for some reason then you will have to throw in the Python towel and re-write some or all of your code in a natively compiled language (if it's just a small fraction of your code then Cython is a good option). But this is the best course of action regardless of the threads situation, because pure Python code runs orders of magnitude slower than native code.
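As a minimal sketch of the point about C extensions (assuming a BLAS-backed numpy build, which releases the GIL inside matrix multiplication):

```python
import threading

import numpy as np

# Two large matrices; the matrix product dispatches to C/BLAS code,
# which releases the GIL, so these threads can genuinely run in
# parallel on separate cores.
rng = np.random.default_rng(0)
a = rng.random((500, 500))
b = rng.random((500, 500))
results = [None, None]

def work(i):
    # The GIL is released inside the C code while '@' runs
    results[i] = a @ b

threads = [threading.Thread(target=work, args=(i,)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The Python-level code here only coordinates; nearly all the CPU time is spent inside the C library with the lock released.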
> complaining that multithreading is impossible in Python without using multiple processes, because of the GIL ... this is not true
I think some people's opinion is that if you're writing in C then you're not really writing a Python program, so they think it is impossible in Python. Which seems a reasonable point to make to me.
Your argument is that Python is fine for multithreading... as long as you actually write C instead of Python.
If a, b and c are numpy arrays then this function releases the GIL and so will run in multiple threads with no further work and with little overhead (if a, b and c are large). I would describe this as a function "written in Python", even though numpy uses C under the hood. It seems you describe this snippet as being "written in C instead of Python"; I find that odd, but OK.
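The snippet being discussed didn't survive the quoting here; judging from the surrounding comments (numpy arrays a, b and c, and a later question about a return statement using '@'), it was presumably something along these lines (the function name and exact body are a reconstruction, not the original):

```python
import numpy as np

def f(a, b, c):
    # '@' is Python's matrix-multiplication operator (PEP 465).
    # For large numpy arrays, both '@' and '+' run in C code that
    # releases the GIL for the duration of the call.
    return a @ b + c
```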
But, if I understand you right, you are also suggesting that the other commenters here who talk about the GIL would also describe this as "written in C". They realise that this releases the GIL and will run on multiple threads, but the point of their comments is that a proper pure-Python function wouldn't. I disagree. I think that most others would describe this function as "written in Python", and when they say that functions written in Python can't be parallelised they do so because they don't realise that functions like this can be.
This only gives you parallelism in one specific situation though - where operations like '+' and '@' take a long time. If they were fine-grained operations, then this doesn't help you.
If instead of operating on a numerical matrix you were operating on something like a graph of Python objects, a graph traversal would be hard to parallelise, as you could not keep the GIL released long enough to get anything done.
I agree that not quite all situations will be covered by coarse locks like this. But many will, and my original comment was meant to draw attention to those situations. Your previous comment seemed to be saying that everyone already knew about those, but I still believe that some people commenting or reading here weren't aware you could release the GIL with just a numpy call.
I did also concede that if you do have to write your algorithm completely from scratch, with no scope for using existing C extensions (be they general-purpose like numpy or more specialist ones that implement the whole algorithm) then yes you'll be caught by the GIL, so I agree with you on that. But I also made the point that you'll be caught even more (orders of magnitude more!) by the slowness of Python, so any discussion about parallelism or the GIL is a red herring. It's like worrying that your car's windscreen will start to melt if you travel at 500mph; even if that's technically true, it's not the problem you should be focusing on.
It's interesting you mention graphs because the most popular liberally licensed graph library is NetworkX, which is indeed pure Python and so presumably isn't particularly fast. There are graph libraries written as C extension modules but I believe they are less popular and less liberally licensed (GPL-style rather than BSD-style). So I definitely agree that this is a big weakness of the Python ecosystem.
This is a good point. Fine-grained parallelism works well with C extensions, but coarse-grained does not. Which is why data science and MATLAB-like tasks are so often done in Python without worrying too much about the Python penalty. But if you have a lot of small dense matrix operations, even VBA in Excel will be faster by more than 10x, because you keep popping back into Python.
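A rough sketch of that "popping back into Python" cost, using numpy's batched matmul semantics: many tiny operations pay Python-level dispatch overhead on every iteration, while one batched call keeps the whole loop in C.

```python
import numpy as np

rng = np.random.default_rng(0)
# 10,000 tiny 4x4 matrix products: each '@' is cheap in C, but pays
# Python-level dispatch overhead on every iteration of the loop.
a = rng.random((10_000, 4, 4))
b = rng.random((10_000, 4, 4))
looped = np.array([x @ y for x, y in zip(a, b)])

# One batched call: same result, without re-entering the interpreter
# 10,000 times.
batched = a @ b
assert np.allclose(looped, batched)
```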
Could you explain the return statement in your example? I only know the '@' as a decorator in Python. This looks like invalid syntax to me what am I missing?
A whole lot depends on what exactly it is that someone wants to get out of using threading.
The GIL means that a single Python interpreter process can execute at most one Python thread at a time, regardless of the number of CPUs or CPU cores available on the host machine. The GIL also introduces overhead which affects the performance of code using Python threads; how much you're affected by it will vary depending on what your code is doing. I/O-bound code tends to be much less affected, while CPU-bound code is much more affected.
All of this dates back to design decisions made in the 1990s which presumably seemed reasonable for the time: most people using Python were running it on machines with one CPU which had one core, so being able to take advantage of multiple CPUs/cores to schedule multiple threads to execute simultaneously was not necessarily a high priority. And most people who wanted threading wanted it for use in things like network daemons, which are primarily I/O-bound. Hence, the GIL and the set of tradeoffs it makes. Now, of course, we carry multi-core computers in our pockets and people routinely use Python for CPU-bound data science tasks. Hindsight is great at spotting that, but hindsight doesn't give us a time machine to go back and change the decisions.
Anyway. This is not the same thing as "multithreading is impossible". This is the same thing as "multithreading has some limitations, and for some cases the easiest way to work around them will be to use Python's C extension API". Which is what the parent comment seemed to be saying.
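For example, a pure-Python CPU-bound loop run on two threads still executes one bytecode stream at a time; the results come out correct, there's just no speedup:

```python
import threading

def count(n, out, i):
    # Pure-Python loop: the thread holds the GIL while executing
    # each bytecode instruction.
    total = 0
    for _ in range(n):
        total += 1
    out[i] = total

N = 1_000_000
out = [0, 0]
threads = [threading.Thread(target=count, args=(N, out, i)) for i in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Both threads produce correct results, but wall-clock time is about
# the same as running the two loops back to back.
assert out == [N, N]
```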
> All of this dates back to design decisions made in the 1990s which presumably seemed reasonable for the time ... Hence, the GIL and the set of tradeoffs it makes. Now, of course, we carry multi-core computers ... Hindsight is great at spotting that, but hindsight doesn't give us a time machine to go back and change the decisions.
Sadly I don't think this is _quite_ true. I believe GILs are used in a number of interpreters and fall prey to the common problem where either coarsening locks or making them finer ruins somebody's day. I believe Guido van Rossum hung the GILectomy on two main issues: the interpreter must remain relatively simple, and C extensions cannot be slowed down.
I'm not disagreeing with the decision (necessarily) but it isn't simply a holdover from a bygone era. It was a decision that has been reaffirmed and upheld numerous times.
I'm familiar with the various attempts to remove the GIL over the years.
The thing is, in the 90s the choices that produced the GIL as it exists were not bad ones; that's why I went to the trouble of explaining how it affects threaded code, and why those effects can be considered reasonable tradeoffs, given what was known at the time, for implementing threading (without completely breaking the ecosystem of Python + Python extensions, which was already significant even back then).
Of course, knowing what's known today about the directions computing and the use of Python went in, different decisions might end up being made, but at this point it's very difficult (more difficult than people typically expect) to undo them or make different choices.
I've done it once, converting about 15 lines of python to rust. It was completely painless and resulted in a large speedup (changed a hotspot that was taking approximately 90% of execution time in a scientific simulation to approximately 0%).
The type system and expressive macros seem like a big win over C to me.
Care to share a bit more detail on how you did this? Was there some interfacing library that you used analogous to Cython/SWIG/etc.? Presumably you didn't code directly against the C API (in python.h)?
The Rust library interfacing with Python is https://github.com/dgrunwald/rust-cpython . This library understands things like Python arrays, objects, etc. and provides nice Rust interfaces to them. Basically I just have to write a macro that specifies what functions I'm exposing to Python, and other than that I'm writing normal Rust. On the Python side I'm importing them and calling them like any other Python library.
I really wish he had shown his numpy code. He said at 13:46 "Numpy actually doesn't help you at all because the calculation is still getting done at the Python level". But his function could be vectorised with numpy using functions like numpy.maximum or numpy.where, in which case the main loop will be in C not Python. I can't figure out from what he said whether his numpy code did that or not.
But either way, it's interesting that in this case the numpy version is arguably harder to write than the Cython version: rather than just adding a few bits of metadata (the types), you have to permute the whole control flow. If there's only a small amount of code you want to convert, I would still say it's better to use numpy though (if it actually is fast enough), because getting the build tools onto your computer for Cython can be a pain. And for some matrix computations there are speed improvements beyond the fact that it's implemented in C, e.g. matrix multiplication is faster than the naive O(n^3) version.
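His actual function isn't shown, but as a hypothetical illustration of what "permuting the control flow" means, here is a clamping loop rewritten with numpy.where (the example itself is mine, not from the talk):

```python
import numpy as np

def clip_loop(xs, lo, hi):
    # Pure-Python version: the loop and branches all execute at the
    # Python level, one element at a time.
    out = []
    for x in xs:
        if x < lo:
            out.append(lo)
        elif x > hi:
            out.append(hi)
        else:
            out.append(x)
    return out

def clip_vectorised(xs, lo, hi):
    # Control flow "permuted" into whole-array operations: the loop
    # now runs in C inside numpy. An equivalent formulation is
    # np.minimum(np.maximum(xs, lo), hi).
    return np.where(xs < lo, lo, np.where(xs > hi, hi, xs))

xs = np.array([-2.0, 0.5, 3.0])
assert np.allclose(clip_vectorised(xs, 0.0, 1.0), clip_loop(xs, 0.0, 1.0))
```

The branch structure of the loop has to be re-expressed as nested whole-array selections, which is the sense in which the numpy rewrite is more invasive than just adding Cython type annotations.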
Because we want first class python multithreading, like many other languages have. If we have to drop down into C, might as well use another language with first class multi-threading like java, kotlin, golang or swift and avoid all the other issues that come with slow GIL languages.
I think you're thinking of the multiprocessing module, which uses separate processes to bypass the GIL. That's why the arguments and results need to be pickleable: pickle is a serialisation protocol, so it allows you to communicate the contents of objects between different processes. If you use threads within a single process, you don't need to pickle the objects; you just pass the object directly.
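A minimal sketch of the distinction (the lambda here is just a stand-in for any unpicklable object):

```python
import pickle
import threading

# With threads, objects are shared directly within one process: no
# serialisation needed, even for things pickle can't handle.
unpicklable = lambda x: x * x  # lambdas cannot be pickled
results = []

t = threading.Thread(target=lambda: results.append(unpicklable(3)))
t.start()
t.join()
print(results)  # the thread mutated the shared list in place

# With multiprocessing, arguments and results cross a process
# boundary, so they must survive a pickle round-trip:
try:
    pickle.dumps(unpicklable)
except (pickle.PicklingError, AttributeError, TypeError):
    print("lambdas can't be pickled, so they can't cross processes")
```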