Jython and IronPython do not have a GIL. Multiprocessing avoids the GIL. Blocking on I/O gives up the GIL. There are all kinds of techniques used instead of throwing threads naively at every problem. And, conveniently, none of this is mentioned in the article. Either the author was not aware of these basic facts, or suppressed them.
It is blatantly false that "no single issue has caused more frustration or curiosity for Python novices and experts alike than the Global Interpreter Lock." The author may consider it important, but this does not mean the author is speaking for everyone else.
Novices would have good reason to avoid shared-everything threading, which introduces piles of race conditions and difficulty controlling runaway threads, and should try simpler tools first and see whether they can get good results instead of prematurely optimizing with techniques they don't know how to use.
Experts will know that the GIL is often not a primary concern, and where it actually is a concern they'll be conversant with other tools like multiprocessing and task queues.
The people with the most to say about the GIL are mediocre programmers who want to show off that they are so good Python is limiting them, and people not very familiar with Python (possibly with background in languages which try to make threads the answer to everything) who have an axe to grind.
Instead of asking how to do what they want to do, they just assume that the problem is the GIL and there is no solution, then expect to be praised for their technical acumen. People with technical acumen just solve the problem in any of the available ways instead of bitching in public about how it's the tool's fault they can't solve the problem, when they have defined the problem incorrectly and insist on some arbitrary way of doing it.
I'm sorry you didn't find the article useful, but perhaps you're not in the article's target audience. I simply wanted to give Python novices some information and background about the GIL.
That said, the fact that other implementations do not have a GIL isn't relevant to the article; it specifically refers to the CPython implementation. And your observation that multiprocessing avoids the GIL is explicitly mentioned in the article. To say "blocking on I/O gives up the GIL" is true in a very narrow sense but not very interesting. Import any third party package using C extensions and you now need to worry about how well the author manages the GIL.
But unfortunately I have to agree with the gp as well; part of it is FUD. The FUD is not in what was said (most of what was said about the GIL is true); the FUD is in what was not said, which is that threads are there for a reason, and they do provide good speedup in a large number of applications, namely those that are I/O bound.
I don't think that fact is too hard for novices to grasp, so it should be mentioned. The perception otherwise is that the creators simply went insane and decided to add threading to the language even though you should never use it because it doesn't work (so why didn't they just remove it, then?). Well, it does work in a large class of problems.
Think about Node.js. Like it or not, it has become popular recently. It doesn't by default support or take advantage of multiple cores, yet it is often found to be performant enough to handle a decent number of concurrent clients. Granted, it doesn't pretend to have threads in the first place, but it is an example of a modern, useful ecosystem that does not take advantage of all the cores.
> To say "blocking on I/O gives up the GIL" is true in a very narrow sense
No, it is not. It is true in a very general and wide sense. Try it out! Spawn 50 threads and try to download 50 different pages; you'll notice a speedup relative to doing it all sequentially in a loop.
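To make it concrete, here's a minimal sketch (the URLs are placeholders); each thread releases the GIL while it's blocked on the socket, so the downloads overlap:

```python
# Minimal sketch; the URLs are placeholders. Each thread releases the
# GIL while blocked on network I/O, so the fetches overlap instead of
# running one after another.
import threading
import urllib.request

urls = ["http://example.com/page%d" % i for i in range(50)]
results = {}

def fetch(url):
    # urlopen blocks on socket I/O; the GIL is released while waiting
    results[url] = urllib.request.urlopen(url).read()

threads = [threading.Thread(target=fetch, args=(u,)) for u in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()
```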
> Import any third party package using C extensions and you now need to worry about how well the author manages the GIL.
This also misrepresents the facts a bit. The GIL is actually supposed to make it easier to write C extensions. If the C extension doesn't mess with the GIL, it is straightforward to write, precisely because the GIL is there. If the GIL weren't there, calling into C extensions and calling back into Python from C would be a pretty complicated affair.
Now, you can play with the GIL in C and release it to achieve parallel speedup, and I have done that, but only in a few cases over the years.
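You can actually see this effect without writing an extension at all: ctypes releases the GIL around calls into functions loaded with CDLL, so a blocking C call runs in parallel with other threads. A rough sketch, assuming a POSIX libc:

```python
# Rough sketch, assuming a POSIX libc. ctypes releases the GIL around
# calls into CDLL functions, so two blocking C-level sleeps overlap.
import ctypes
import ctypes.util
import threading
import time

libc = ctypes.CDLL(ctypes.util.find_library("c"))

def sleep_in_c():
    libc.usleep(500000)  # half a second inside C; the GIL is released

start = time.time()
threads = [threading.Thread(target=sleep_in_c) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(time.time() - start)  # ~0.5s, not ~1s, because the GIL was released
```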
I was once one of the very newbies that your article describes. In fact, I even made the same Stack Overflow post asking why threading made my program slow to a crawl. I did then (thanks to a friendly answer) learn about multiprocessing rather than threading, but I never actually got around to learning -- or even thinking about! -- what exactly the GIL is and why it makes threading terrible for CPU bound tasks.
Point being, I enjoyed it.
David Beazley does a good job on explaining the GIL. There are multiple videos of his GIL talks on YouTube. This one, for instance: http://www.youtube.com/watch?v=Obt-vMVdM8s
He has a page about this at http://www.dabeaz.com/GIL/
Thank you for writing it.
2) There are in fact cases where forking isn't a reasonable solution. For one of my previous jobs I developed a multi-threaded server with Python (before I learned about the GIL). The problem with forking was that one particular variable needed to be shared between threads for processing. It happened to contain about 5GB of data, memory which I didn't want duplicated and which would have been a pain in the butt to keep synced between processes.
For most of these shared data structures I could do multiple processes and just use Redis as a shared data store. However, this piece of data was accessed far too frequently and that would have just been a major performance bottleneck.
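For what it's worth, multiprocessing does offer shared-memory types for cases like this, though they only fit flat, ctypes-style buffers, not an arbitrary 5GB Python structure. A minimal sketch:

```python
# Minimal sketch: multiprocessing.Array allocates a flat buffer in
# shared memory, so workers read it without a per-process copy. This
# only helps when the data fits a ctypes layout.
import multiprocessing

def worker(shared, i):
    print(shared[i])  # reads the one shared buffer directly

if __name__ == '__main__':
    shared = multiprocessing.Array('d', 1000, lock=False)  # 1000 doubles, one copy
    shared[0] = 3.14
    p = multiprocessing.Process(target=worker, args=(shared, 0))
    p.start()
    p.join()
```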
To say the least, I was terribly frustrated when I learned about the GIL. I wish they would put a big warning up on their home page or at least in the docs relating to threading.
The engineer routes around the problem, the scientist tries to understand the problem better. An engineer with a deadline will not be held up in any significant way by the GIL. The engineer will use one of the many ways you've already listed to solve the problem and move on.
The scientist, not bound as strictly by deadlines or by "just being done", will explore the issue more leisurely, focusing on the whys and what the problem means. The scientist will explore the implications.
I think only you are attempting to "blame" Python for this problem - the author's tone in no way suggests that the problem is in Python itself or even the CPython implementation. There's a distinct, "here's the lay of the land on this issue" about this article, rather than a "Python is at fault for these reasons..." that you seem to be interpreting.
While I agree superficially in many ways with your characterization of engineers on deadlines versus scientists who will explore certain issues more closely, I think there are a number of things wrong with your comment:
1) As a scientist I frequently had to deal with the GIL. A major part of my time was spent taking C libraries, making SWIG wrappers for them, and using the Python SWIG libraries to do productive science. Once SWIG-wrapped, a good C library makes prototyping very quick, but it requires close attention and a good understanding of CPython API threading details to ensure your SWIG libraries are safe for other scientists.
2) Nor are "engineers on deadlines" going to accept the GIL in CPython. I'm an engineer on deadlines, and after 18 years of programming in Python, I've switched to Go because, frankly, I can't see Python having much of a future anymore.
3) Ultimately, engineers and scientists are the same thing. 24% of the time of every scientist is spent coming up with some engineering solution in a hurry so you can get your experiment to work, (the other 75% of the time is spent writing grant proposals begging for money for the engineering and writing papers so that your grant proposals get approved, and 1% actually feeling true scientific inspiration). The whole "leisurely" word you use doesn't apply to how scientists anywhere in the real world work (except perhaps somebody with awesome funding nearing retirement, or perhaps some wealthy citizen-scientist with a home lab).
I would think of it like going to the moon. Curiosity took us there, but a lack of a financial motive to continue going there made future missions more difficult, to the point where we stopped going.
People have effectively gotten around Python's GIL for most engineering purposes. Those who remain aren't doing so because they have a problem to solve; they remain because the GIL is the problem to solve. An end, rather than a means. This makes it... well, you're right, not leisurely, but certainly a more pensive activity than trying to solve another problem and having to "overcome" the GIL. I doubt anyone these days would get paid to fully remove the GIL from CPython.
And yes I get it, you're old as fuck and know more than everyone younger than you. Neat.
In Scala/Java, I might build a single immutable object (taking up e.g. 1kb) and transmit it to 10 actors. They use it as needed and let the GC deal with it when finished. In Python, I need to serialize it, transmit it 10 times and use 10kb of memory to store the copies.
The GIL is a flaw in the language. We should accept that. There are workarounds and hacks, but the GIL is still a flaw.
(Incidentally, my background is in python. My usage of Scala/Java is far more recent.)
I don't necessarily agree that it's a flaw, but that's another discussion entirely.
"You" are not required to do the serialization. Multiprocessing does it automatically behind the scenes. The only time it becomes relevant is when something can't be serialised.
It is correct, though, that serialization consumes CPU and time while it is happening, something that doesn't happen when all actors are local to the process. However, the moment you do serialization you can also do it across machines, or across nodes within a machine, which gives far greater scope for parallelism, assuming the ratio of processing work to size of serialized data is large.
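A quick way to see that hidden serialization at work: objects handed to a worker are pickled, so mutating them in the child never touches the parent's version:

```python
# The dict passed to the worker is pickled and copied, so the child's
# mutation never reaches the parent process.
import multiprocessing

def mutate(d):
    d['touched'] = True  # modifies the child's copy only

if __name__ == '__main__':
    data = {'touched': False}
    p = multiprocessing.Process(target=mutate, args=(data,))
    p.start()
    p.join()
    print(data)  # {'touched': False} -- the parent's dict is unchanged
```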
You can also distribute across machines that way with Akka, for example.
It's true that you can't avoid serialization when you need to work across multiple boxes. That doesn't mean serialization and IPC should be forced upon you the minute you want to parallelize. There are a LOT of jobs that can be handled by 2-8 cores, provided your language/libraries give support for it.
The GIL was not a problem.
Stackless still has the GIL; it facilitates concurrent, not parallel, programming. Stackless Python programs run on a single core, with cooperative task switching between microthreads.
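A minimal tasklet sketch (this runs only under the Stackless interpreter, not stock CPython): two microthreads interleave cooperatively on a single OS thread:

```python
# Runs only under Stackless Python, not stock CPython. Two tasklets
# interleave cooperatively on one OS thread: no GIL contention, but
# also no parallelism.
import stackless

def worker(name):
    for i in range(3):
        print(name, i)
        stackless.schedule()  # cooperatively yield to other tasklets

stackless.tasklet(worker)("a")
stackless.tasklet(worker)("b")
stackless.run()  # run the scheduler until all tasklets finish
```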
Why would you want this? From the website:
* Improved program structure.
* More readable code.
* Increased programmer productivity.
"The real world is 'concurrent'. It is made up of a bunch of things (or actors) that interact with each other in a loosely coupled way with limited knowledge of each other." -- Grant Olson, "Why Stackless", 2006
"Concurrent computing is a form of computing in which programs are designed as collections of interacting computational processes that may be executed in parallel." -- http://en.wikipedia.org/wiki/Concurrent_computing
"However, if you've been around the Python community long enough, you might also know that the GIL was already removed once before--specifically, by Greg Stein who created a patch against Python 1.4 in 1996." (Also mentioned in the OP)
More info can be seen at http://dabeaz.blogspot.nl/2011/08/inside-look-at-gil-removal...
I guess back in 1996 a system like that may have been considered overkill because multithreading was still pretty exotic.
But if even this overhead is too much, you can compile Python without threading support.
This isn't completely true. If you are doing anything that is not CPU bound, using threads is trivial, as the GIL will allow you to perform I/O in parallel.
(Not to mention that Python has many event-driven I/O options available which are generally more efficient than threading)
Nope. The writer sounds misinformed and is spreading FUD.
I have successfully used Python's threads to perform concurrent database fetches, http page getters, file uploads in parallel. Yes, there was almost linear speedup.
If you listen to this story, it sounds like Guido and most other talented and smart Python contributors added threads to Python just to fuck with people's heads -- "threads don't work, but let's add them anyway! just to mess with them!" Nope: they added them because there are many cases where they work.
The answer is: if you handle concurrent I/O, Python's threads will give you good speedup. Threads are real OS threads and come with nasty side-effects if you use shared data structures, but make no mistake, you will get the speedup.
Your mileage may vary, and everyone is probably biased and has a different perspective, but where I'm coming from, in the last 10+ years I have written mostly I/O bound concurrent code. There were very few cases where I hoped to use extra CPU concurrency.
Now, I did have to do that a couple of times, and if you have that issue, most likely you'd want to drop down to C anyway, which is what I did. Once in C you can release the lock, so that Python keeps processing in parallel while your C extension/driver does its work in parallel too.
Now, wouldn't it be nice if Python had CPU-level concurrency built in? Yes, it would be great. But I don't think that is the #1 issue currently. We still don't have 16 cores on most machines.
I remember scouring the web for years for a Twisted version of an already existing library, because I had made the mistake of picking Twisted as the I/O concurrency framework. The regular library module is available, but of course it needs to return a Deferred in order for me to use it.
>I have successfully used Python's threads to perform concurrent database fetches, http page getters, file uploads in parallel. Yes, there was almost linear speedup.
It's not quite that simple, especially for the novice. Not every I/O oriented library is thread safe (urllib2), and not every C extension remembers to give up the GIL, so no, threads do not always automatically get you increased I/O throughput.
Worse, other languages do set you up to expect to be able to interleave I/O and CPU bound code with threads. This frequently doesn't work well in Python. And finally: yes, I have written CPU bound code, and while I think the techniques (processes, numerical libraries, Cython, alternative runtimes, etc.) are sufficient to meet my performance needs, it is a little annoying to feel that I'm giving up 3/4 of the horsepower in my 4-core box by default...
Yes, for that, the Python threading model works fine.
The problem is if you're doing a lot of processing on the threads and/or passing data between them.
For CPU concurrency, go with multiprocessing; it works like a charm (maybe using ZeroMQ between the processes).
As for why it hasn't been solved yet... the API for threads and processes is pretty much identical. Since you're just as well off using a process in the majority of cases, that's what we go with.
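To illustrate how interchangeable the two APIs are, here's a sketch where swapping Thread for Process is the only change:

```python
# Sketch of the near-identical APIs: Thread and Process are constructed,
# started, and joined the same way.
import threading
import multiprocessing

def work(n):
    print(n * n)

if __name__ == '__main__':
    t = threading.Thread(target=work, args=(7,))         # shares the GIL
    p = multiprocessing.Process(target=work, args=(7,))  # own interpreter, own GIL
    t.start(); p.start()
    t.join(); p.join()
```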
Actually, the reason it hasn't been "solved" yet has much more to do with the CPython implementation than with the fact that we can just use multiprocessing. There is a ton of globally shared data in the CPython implementation. Retrofitting a locking scheme granular enough to obviate the need for the GIL while at the same time not negatively impacting single-threaded performance is decidedly non-trivial.
The PyPy guys are making decent progress by attacking the problem from another angle: using software transactional memory to automatically resolve data conflicts that arise from multiple threads mutating data simultaneously.
If your problem can map to distributed memory techniques, then you have multiple advantages over shared memory programming. Most importantly you can parallelize over multiple machines. Other advantages include decoupling of each parallel task from each other (fewer race conditions and other hard to debug problems).
There are several ways to achieve distributed memory parallelism in Python: multiprocessing, zeromq, raw tcp/ip sockets and mpi4py. Which approach makes sense to use will depend on your problem.
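As a rough illustration, here's a toy pyzmq push/pull sketch (the port number is arbitrary); the same socket pair works between processes on one box or between machines, with only the address changing:

```python
# Toy pyzmq sketch; the port number is arbitrary. PUSH/PULL works the
# same across processes on one machine or across machines.
import multiprocessing
import zmq

def worker():
    ctx = zmq.Context()
    pull = ctx.socket(zmq.PULL)
    pull.connect("tcp://127.0.0.1:5557")
    print(pull.recv_pyobj())  # receives the unpickled task

if __name__ == '__main__':
    p = multiprocessing.Process(target=worker)
    p.start()
    ctx = zmq.Context()
    push = ctx.socket(zmq.PUSH)
    push.bind("tcp://127.0.0.1:5557")
    push.send_pyobj({'task': 42})  # pickled and shipped over TCP
    p.join()
```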
Concurrency is the 'tricky bit' of the 'algorithms' pillar of the four arbitrary pillars of computing (algorithms, languages, systems, and data structures).
Threading creates new OS-level threads, but whenever your code is being run by the bytecode interpreter, Python holds the global interpreter lock. This is released during I/O operations and a lot of the built-in functions, and you can release it in any C or Cython extension code you write. If you're running into Python speed bottlenecks, you can usually get significant speedups with very little effort by moving the bottlenecky code to Cython and maybe adding a few type declarations.
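The flip side is just as easy to demonstrate: pure-Python CPU work on two threads takes about as long as running it sequentially, because each thread must hold the GIL to execute bytecode. A minimal sketch:

```python
# Minimal sketch: CPU-bound threads do not overlap under the GIL.
import threading
import time

def spin(n):
    while n:
        n -= 1

start = time.time()
threads = [threading.Thread(target=spin, args=(10**7,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Takes roughly as long as spin(2 * 10**7) run once, despite two threads.
print(time.time() - start)
```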
Multiprocessing spawns a pool of worker processes and doles out tasks to them by serializing the data and using local sockets for IPC. This naturally has a lot of overhead, and there's some subtlety with the data serialization. So, be aware of that. The nice part, though, is that you don't have the GIL, which can sometimes speed things up.
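One of those subtleties in a nutshell: tasks are pickled on their way to the workers, so a lambda fails where a module-level function works. A small sketch:

```python
# Small sketch of a pickling subtlety: Pool sends tasks to workers by
# pickling them, and lambdas are not picklable.
import multiprocessing

def square(x):  # picklable: defined at module top level
    return x * x

if __name__ == '__main__':
    pool = multiprocessing.Pool(4)
    print(pool.map(square, range(10)))      # works fine
    # pool.map(lambda x: x * x, range(10))  # raises PicklingError
    pool.close()
    pool.join()
```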
You can skip all conversation about the GIL and threads neatly by simply preferring a different concurrency model. There are plenty of ways to do this (see above re: bikesheds and their colors), but being permanently tied to CPython and the threading module is increasingly uncommon for professional Python, and it isn't as unavoidable as things like networking or even which language you're going to use.
Edit: I see that the author's in this thread. Nicely written article, but a tad hyperbolic.
What issues do you see with python2 to 3 transition?
However, from the latter list, there are some really damning packages. For one, we're going to have to transition off Twisted, which is much more than just an event loop. Py3 might not have enough advantage(s) to compel us to make that transition.
So, just wait to migrate until twisted is ready?
To the point of the comment you first responded to, there are a lot of issues not being "solved" with respect to network libs, their upgrading to Py3, and Py3 looking to step into that arena. In other words, there is no horizon for the networking libs updating, and they may feel even less motivated if the Py3 crowd is saying, "don't worry about it, because we are going to standardize our way." Does the Py3 transition clock only start then?
I'm not actually seeing an issue with python 2 -> 3 transition which is what my original comment was asking about. I see a lot of negativity surrounding python 3 and talks of fracture, but no actual evidence of it.
Most major projects have not actually switched to Python 3; they just support it using 2to3.
I'm still not convinced the whole Python 2 to 3 conversion is worth it (aside from the fact that not porting means your project is seen as not being actively maintained). It has led to added complexity in code so that it works on both versions, increased difficulty in packaging/distribution, having to test/debug on both versions, vast amounts of code being made obsolete that will never be ported, confusion among new users, untold man-hours being spent, ...
I don't mean to be sound so down on Python 3, but it has caused me nothing but additional work and frustration. The supposed benefits of Python 3 are still years away.
As for the GIL, the OP is right that it's a CPython implementation problem, not a structural one that has to exist in Python. There is some exciting work that has been done in the direction of fixing this at the implementation level, and we are excited about seeing how far we can take it. Stay tuned...
Also, I'm not quite clear on your issue with packaging. Can you be more specific? Compared to C, I find importing existing code to be nearly heavenly.
Regarding backwards compatibility, maybe it's an issue, maybe it's not. On the one hand it's frustrating and some people will make it an issue. On the other hand, I can't help but think of cases like Apple, where they were the first to abandon modems, serial ports, diskette drives, and CD drives—but probably made the right decision.
What do you use to package? distutils, setuptools, distribute, distutils2, pkgutils...? How to choose?
Not just network libraries; how many CLI modules are there now?