
Python has a GIL, and lots of complainers - gthank
http://blog.labix.org/2010/07/09/python-has-a-gil-and-lots-of-complainers
======
nkurz
Wow, that's quite a self-referential post! I don't know what Gustavo's role is
within Python development, but I really don't think he understands the
motivation of the core developers like Brett.

Here's from Brett's original:

    
    
      I have a finite amount of time to volunteer in helping 
      make Python what it is. Having people push upon me that   
      they either think I am failing because I can't find a 
      solution to a problem (the "complainers") or that the 
      solution that I help create is not good even though it 
      has not been implemented and proven to be a failure (the 
      "what about this?" folks) just sucks away that much more 
      time, preventing me from helping make Python better for 
      YOU. It gets to the point that I sometimes wonder why I 
      put in so much time and effort when so many people gripe 
      about the volunteer work that I put in to produce this 
      thing that is given to the world for free. 
    

And then Gustavo:

    
    
      I apologize, but I have a very hard time reading this and 
      not complaining. 
      ...
      I know this is just yet another complaint, though. I 
      honestly cannot fix the problem either, and rather just 
      talk about it in the hope that someone who’s able to do 
      it will take care of it.
    

Disgusting. Could you at least make an effort to convince the developer that
they personally would benefit from the massive amount of work you are
demanding of them, instead of just trying to bully them into it?

More from Gustavo:

    
    
      I can’t provide ideas or solutions.. I can’t fix the 
      problem.. they don’t even care about the problem. Why am 
      I using this thing at all?
    

Go a little farther: how does the developer benefit from your use of this
software? If you can figure out how to answer this question in the positive,
you've probably figured out a much better strategy for convincing others to
solve your problems for you. If you can't, perhaps there is some other
software that you could buy? If the developers don't care about the 'problem',
maybe it's not their problem after all?

~~~
nailer
I think it's OK to have an opinion on anything you like. You shouldn't expect
anyone to fix it for free though.

I've never written a VM, and I don't think I'd be very good at it. I've
written useful modules other people use, spoken at a Python conference, and
have other OSS code in various places. I'm not a succubus, at least I hope I'm
not. I don't even mind multithreading in Python, since I use Threads and
Queues and they work just fine for my purposes. But I'm sure other, cooler
things would happen if CPython had the same kind of multithreaded coolness
that Jython and IronPython have. I don't think it makes me a bad person if I
say that.

~~~
nkurz
I agree: it's good to have and to express opinions. But as you point out there
is a real difference between "wouldn't it be cool if..." and "you really
should be working on this problem of mine" instead of doing whatever it is you
currently think is more important.

I don't even really know who is right about a Python GIL. When working with C,
I have a strong preference for separate processes and mmaps. When working with
Perl (which I use a lot more than Python), I've never really felt much urge to
use threading. Aesthetically, although I haven't really used it, I think
Erlang's message passing approach is much better than native threads. But
maybe Gustavo's right --- better threading would indeed allow lots of cool
things to be done.

That aside, what saddens me about the exchange between Gustavo and Brett is
the disconnect. It strikes me as obvious that the key for Gustavo here is to
show that it's in Brett's self-interest to improve threading in Python. I
didn't see any attempt at this. Instead, he seems to be taking exactly the
tack Brett just finished saying was ruinous to his morale.

~~~
jnoller
Except Gustavo is wrong in that he has to convince Brett that free threading
is a Good Thing. Part of Brett's post was _we already know that_ - but we're
not willing to sacrifice backwards compatibility for it, and we do not
have/have not thought of any good solutions to fix it.

So having people say "this sucks you guys should fix it" 100x a day, with
increasing levels of "you owe me" or "you suck" tone in their voices is
depressing and frustrating.

~~~
j_baker
I don't disagree with you, but that's just part of working on a popular
project. There are plenty of armchair language designers who could _of course_
do everything better than you.

~~~
jnoller
Oh, I agree with you! But I think we both know that while we all end up
growing thick skins (or quitting) it can still get under your skin and make
you sad sometimes.

Especially when you've been at it awhile. Guido must have skin like a bloody
rhinoceros.

------
swannodette
It's interesting that people clamor for a language feature that doesn't work
well if you don't have:

    
    
      1) Cheap immutable data structures
    
      2) Excellent language support for managing mutable state
    
      3) A sophisticated VM designed around concurrency (parallel GC)
    

Python has none of these. Getting rid of the GIL won't make parallel
programming in Python any easier. It'll just show you how hard it really is.

The other option is to adopt a language that _does_ provide these things. I
can think of a new Lisp that celebrates pragmatism over purism that fits the
bill pretty well...

~~~
alnayyir
People don't want to learn, work, strive, experiment, or otherwise do anything
that doesn't involve having the world handed to them on a platter.

A lot of people use Python because it was batteries included, because
everything you ever wanted was already done and made for you. I work in Python
on a daily basis, but I cringe regularly at the #python channel on Freenode
and the kinds of people the language attracts.

Python, quite simply, attracts people who don't want to code or learn
anything.

Those aren't the kinds of people who are going to learn Clojure, sadly.

Rubyists, god bless them, are a little more adventurous, for better or worse.

~~~
nostrademons
That's not really fair. A lot of Pythonistas are _also_ quite strong Haskell,
Scheme, or even Clojure programmers. On their own time. But the nice thing
about Python is that they can also get paid to write it.

The part about the GIL that I hate is that it directly affects my ability to
use Python _for work_. There's a good chance that a Python-based system that I
love will be replaced with _Java_ because the GIL simply isn't workable in a
multithreaded environment, and there're a bunch of other engineering reasons
why the system really should be multithreaded. I don't suffer _any_ GIL-
related problems in my personal programming, because my hobby projects simply
don't get to the scale where it matters. But most of my hobby hacking is in
Haskell anyway.

~~~
jnoller
What's the use case at work that's making it unworkable - just curious.

------
andraz
Brett in the comments points out that they would accept a patch doing away
with GIL if it keeps C module compatibility.

However, the GIL cannot be done away with without breaking compatibility with
existing C modules while keeping performance up. That's an impossible
requirement. Doing away with the GIL means doing away with refcounting, or the
performance penalty is unavoidable. And doing away with refcounting means
major changes to the C API that third-party modules use.

The issue people have is that Python 3.0 is introducing a bunch of stuff that
makes code incompatible in different ways - mostly for aesthetic reasons -
but does not deliver the things that people are calling for again and again:
performance and dropping the GIL.

Oh, and the people saying "use multiple processes" have never really had a
problem where the process has multiple gigabytes of state that it needs to do
its work. That state can easily be shared in multithreaded mode, but can
only be shared with a great deal of manual work in a multiprocess situation.

Yes, you then do distribution over multiple machines too, but that doesn't
solve the problem that you cannot afford to run a separate Python process on
each of 12 cores, with each process taking 10 GB of memory. Because you only
have 32 GB.

However, complaining won't help. Maybe Unladen Swallow or PyPy will
eventually resolve the issue, but it doesn't look like it's on their priority
list either. Well, tough luck for us lazy programmers who are getting all
this stuff for free.

So thank you Brett, Guido and others working relentlessly on Python. It's
great. You know what bothers us, but even if you don't ever fix it, I'll still
be grateful for all the hard work you did!

~~~
aaronblohowiak
"Oh, and for people saying "use multiple processes" have never really had a
problem where the process has multiple gigabytes of state that it needs to do
its work. The state that can be easily shared in multithreaded mode, but can
be only shared with a great deal of manual work in multiprocess situation."

Doesn't python have fork with copy-on-write semantics?

~~~
thwarted
The copy-on-write semantics are a function of fork, an OS call (which is why
it's available in python as os.fork), not a function of python. So if your OS
supports COW for forked processes, python will have it.

Check the man pages for fork(2) and vfork(2) for fork related COW information.
There's some interesting stuff in there.
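To make the copy-on-write behavior concrete, here's a minimal sketch
(POSIX-only, since it calls os.fork directly; the list size is just
illustrative):

```python
import os

# Build some state before forking; on a COW-capable OS the child
# initially shares these memory pages with the parent.
data = list(range(100_000))

pid = os.fork()  # POSIX only; os.fork doesn't exist on Windows
if pid == 0:
    # Child: reads are cheap (shared pages), but a write triggers
    # copy-on-write, so this mutation stays local to the child.
    data[0] = -1
    os._exit(0)
else:
    os.waitpid(pid, 0)
    # The parent still sees its original, untouched value.
    print(data[0])  # prints 0
```

One caveat worth knowing: CPython's reference counting writes to every object
it merely touches, which dirties COW pages, so forked CPython workers tend to
un-share memory over time even if they never mutate the data.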

As a side note, there's an interesting write-up on the portability of fork
between UNIX and win32 in the perlfork man page (forgive me for mentioning
perl in a python related thread), and the hoops that needed to be jumped
through to emulate fork on win32 and keep it compatible with real fork from
the visibility of the perl script itself.

------
reitzensteinm
"If you’re tired of hearing the same arguments again and again for 10 years,
from completely different people, there’s a pretty good chance that there’s an
actual issue with your project, and your users are trying in their way to
contribute and interact with you in the hope that it might get fixed."

I completely disagree. The design of complex software involves tradeoffs, and
even if you make very good decisions, there will always be disadvantages to
your approach that will get pointed out over and over. You can never make
everyone happy.

~~~
jnoller
Thank you; that's one of the nicest ways I've heard it put.

------
jrockway
In theory, the maximum speedup you could achieve by making Python perfectly
multi-threaded on a 4 core machine is 4x. In comparison, rewriting the
critical section in Haskell (a very Python-like language), you'll get a 50x
speedup. On one core.

The "dynamic language" family is not for high-performance number-crunching.
It's for gluing together extremely complex applications, at which it excels. A
2x speedup just doesn't matter much.

(You can use cooperative multitasking to scale out anyway; I can write a Perl
application that easily handles 30,000 open TCP connections each with its own
thread of execution; a stack, C stack, etc. And 30,000 is an OS limit, not a
Perl limit... if I had more sockets, I could serve many more connections.
Remember, most things that are hard to program are complex systems. Writing
the performance-intensive-but-simple parts in a different language is easy and
effective. Why waste the Python core developers' time making it good at
something it's going to be bad at, when they could be spending the time making
it better at something it's good at?)

~~~
nostrademons
Actually, no, it's worse than that. Threaded code in Python often runs
_slower_ on multiple CPUs than on a single CPU. David Beazley ran some numbers
on a simple benchmark that indicated that running the same program with two
threads on two CPUs was _twice as slow_ as running it on a single thread on
one CPU. It's not just a matter of not being able to use extra cores: the GIL
actively slows down the other threads through context-switching overhead:

<http://blip.tv/file/2232410>
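The effect is easy to reproduce with a Beazley-style toy benchmark - a
pure-Python countdown run sequentially and then on two threads. (This is a
sketch in the spirit of Beazley's test, not his exact code; timings vary by
machine and interpreter version, so no specific numbers are claimed here.)

```python
import time
from threading import Thread

def countdown(n):
    # Pure-Python CPU-bound loop: it holds the GIL except at the
    # interpreter's periodic check intervals.
    while n > 0:
        n -= 1

N = 2_000_000

# Sequential: two countdowns, one after the other.
start = time.perf_counter()
countdown(N)
countdown(N)
sequential = time.perf_counter() - start

# "Parallel": two threads, but the GIL serializes them, and on multiple
# cores the threads also fight over the lock on every wakeup.
t1 = Thread(target=countdown, args=(N,))
t2 = Thread(target=countdown, args=(N,))
start = time.perf_counter()
t1.start(); t2.start()
t1.join(); t2.join()
threaded = time.perf_counter() - start

print(f"sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```

On the pre-3.2 GIL that Beazley measured, the threaded version could come out
around twice as slow on a multi-core box; the point is that it is never
faster, only some degree of slower.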

Also, the serialization introduced by the GIL hurts worst when you're gluing
together complex applications. Say that you have a finely-tuned C++ app that
spends about 20% of its time in an embedded Python interpreter (number chosen
to make the math easy). Now you make the C++ multithreaded and run it on four
cores. The 80% of the CPU time spent in C++ gets parallelized effectively, so
now it's only 20% of the original runtime. But the Python access is all
serialized by the GIL, so it's still 20% of the original time, or _50%_ of the
new runtime. At this point, you're highly likely to get thread contention
issues between different threads attempting to acquire the GIL (as per the
video), so your Python performance goes down even more - which makes thread
contention even more likely, and so on.

By contrast, if you'd rewritten the Python in Haskell, the most you could get
was a 25% speedup, because most of the execution time was spent in C++ anyway.

~~~
jrockway
Yeah; I've read that many implementations of GIL-less Python make it
significantly slower overall. Slow on one core, slow on four cores. Great.

With respect to your second example, I would argue that if you're gluing, you
put the glue on the outside of the core -- hence, your single-threaded Python
wrapper would be running your fast, multi-threaded Haskell or C++. In that
case, the only pain point is supplying data and interpreting the results,
which is usually well inside the realm of Perl/Python/Ruby's capabilities.

~~~
nostrademons
Most large-size mixed Python/C++ systems have multiple layers:

<http://c2.com/cgi/wiki?AlternateHardAndSoftLayers>

"Inside" or "outside" of the core doesn't have much meaning in this context.
Typically, you have a C++ driver that provides your application's main(),
which invokes the Python interpreter, which can then call back into C
libraries. Or you might wrap that itself with a Python script. That's the
approach that the Tornado web server takes: a Python driver program, which
enters an epoll select loop, which calls back into the Python to handle
requests, which may themselves call functions written in C. The point is that
it's not just Python scripts calling super-fast C libraries, it's also often
large C++ apps invoking Python scripts for lesser-used features.

------
dfox
I think that the GIL is pretty tangential to multi-core. If you want something
to execute in parallel on an SMP system, it is always better to have it in
separate address spaces. This is even more true in a dynamic language runtime,
because almost anything you do (such as accessing the global namespace)
requires some kind of synchronization, and for CPU-intensive code the slowdown
from this synchronization tends to be larger than from sequential execution
with the GIL for almost any significant number of concurrent threads (such as
3 threads on a Core 2 Quad on Linux 2.6.2something).

~~~
runT1ME
> requires some kind of synchronization, and for CPU-intensive code the
> slowdown from this synchronization tends to be larger than from sequential
> execution with the GIL for almost any significant number of concurrent
> threads (such as 3 threads on a Core 2 Quad on Linux 2.6.2something)

what? You're saying the _absolute worst case_ synchronization scenario is on
par with the GIL, so you might as well not have it?

------
CrLf
I'm not a very heavy user of Python, but I have used it for a few projects.
I've never hit the need to have multiple threads in a Python project, but I
wonder:

Is the GIL that much of a bottleneck? Isn't the multiprocessing module and its
locking primitives enough for most work where CPU-bound processing needs to be
distributed over multiple cores? When that fails can't that part of the
software be done in C with Python as a wrapper language?

I see people here complaining about how the GIL limits their use of Python for
serving requests and whatnot. Stuff that looks like the canonical example of
something that can be solved by a multi-process module.

But I also see people complaining about stuff that is really CPU-bound,
without them stating that they have already overcome the main problem with
this stuff, which is making it (theoretically) parallel in the first place.

I'm from a Unix background, where threading is as heavy (or as lightweight)
as multi-processing. Maybe many of the complainers come from Windows, where
the equivalent of fork is very heavy compared to threading. Maybe the problem
is beneath Python, and not really the GIL.

I'm not saying that the GIL is not a problem. But if it hasn't been removed
yet, maybe the problems that come from removing it are even worse.

And Python isn't the only tool around. You can combine languages in a single
project...

~~~
jnoller
As the maintainer of multiprocessing, I can say I've never really been
bothered by the GIL. I know about it - I know that if I have CPU-bound tasks,
I'm going to be better off using processes, ergo, multiprocessing. I spin up
some processes, pop in a queue and I'm off to the races.

That said - the majority of my code uses plain old Python threads. Web load
testing tools, subprocess execution, etc. - anything with I/O works fine for
me contained in threads, and since that's where I spend _most_ of my time
(in I/O) they serve me well.

In my last "multiprocessing heavy" chunk of code, I wasn't using it for local
processes and work-sharing. I was using it to spread work over a network of
hosts using managers and the other tools within it.

The one gotcha with jumping between the two is serialization. When you deal
with multiprocessing, the objects which are passed between the
pools/queues/processes must be pickle-able - this means that for tasks which
involve lots of unpickle-able, shared state, multiprocessing _is not a good
answer_.

Essentially, having free threading in CPython would mean that you could have
your cake (concurrency on multiple cores) and eat it too (without incurring
serialization of mutable shared state).

------
aaronbrethorst
I don't know anything about Python, and had no idea what the GIL was until I
scanned through the article, and found out a little more about it 5 paragraphs
from the end. Please follow the inverted pyramid structure!
<http://en.wikipedia.org/wiki/Inverted_pyramid>

Here's how the blog post could have started:

"CPython, the standard Python implementation, cannot use coroutines,
lightweight processes, fork/join frameworks, and other non-sequential
programming techniques due to its Global Interpreter Lock, or GIL. Brett
Cannon, a Python core developer, unfairly dismisses this fundamental flaw:
<quote from brett>"

------
mfukar
Let me start by saying I don't like the GIL. It's limiting and buggy (last
week we discovered a bug involving GIL and on-demand imports for constructing
Unicode objects...it's hiding pretty well). However, if I were to judge it as
a design decision and not as something I can "fix", I would seriously say I'm
neutral about it.

Why? It's pretty simple: since the language doesn't have any acceptable way to
provide concurrency with threads, I'm going OS on its ass. And so should you,
"complainers" or not. The solution is probably not ideal, and it's cetrainly
_A Bad Thing_ to many of you, but Unix programming showed the way. That's
right. We should be doing more of it (in this context). A lot more of this
(again, in this specific context). I’m talking about fork(2), execve(2),
pipe(2), socketpair(2), select(2), kill(2), sigaction(2), and so on and so
forth. These are our friends. They want so badly just to help us.

Sure, it's not always nice. Typically, if you want to share a large amount of
state, you'd have to do it manually, which would result in code not as pretty
as what Python lovers usually write. That may be a disadvantage to them. But
the point is moot, because I remember my professors screaming at us to avoid
shared state as much as possible when doing concurrent programming - if you
need to do it, you should rethink your solution before you rethink your tools.

I've agreed with Brett on multiple occasions in the past about this; people
seem to think that changing the way Python is implemented to encompass more
and more of certain features _the way they like them_ would somehow make the
language more powerful and popular and all around awesome. That's not true.
Most of the "complainers" are people that chose Python because of its power,
not wanting to actually learn anything other than an API. Well, folks, you
can't have a single tool for every job.

------
cageface
I'm skeptical that allowing threads in the interpreter is really the right
way to achieve good concurrent performance. Xavier Leroy makes the case
better than I can:

[http://caml.inria.fr/pub/ml-archives/caml-
list/2002/11/64c14...](http://caml.inria.fr/pub/ml-archives/caml-
list/2002/11/64c14acb90cb14bedb2cacb73338fb15.en.html)

~~~
danieldk
Did you read this quote near the end?

"Shared-memory multiprocessors have never really 'taken off', at least in the
general public."

The assumptions that held in 2002 seem to be gone these days. There are many
8-16 core users now, especially among OCaml's main audience (academia).

~~~
cageface
True. In that respect the situation has changed quite a bit, and you might
imagine that the potential gains are significantly higher.

The extra complexity still sounds pretty daunting though. You're going to make
development of Python core for the common case (single-threaded) a _lot_
harder for the benefit of the exceptional case. I think it's fair to say that
most of the people that really need full-bore SMP performance are going to
want a faster, lower-level language than Python to do their heavy lifting in
anyway. Game developers are probably ahead of most of the rest of us in this
area.

------
ErrantX
All I will say on this matter (to any post about the GIL) is that Python is a
language I love very much.

But my main work use is concurrency and threading; writing that native
threading in C is fine and dandy, but it does seriously offset the benefits of
doing the rest in Python.

(for the record; I do understand and agree with the rationale behind needing
the GIL - but it is frustrating to come up against it when trying to push
python to your boss as a "really simple language we could hack this up with")

~~~
jnoller
I never had this problem; we use python threads _heavily_ at work - they're
all I/O bound and they work just fine. We switch to multiprocessing when it's
CPU bound. Works pretty well.
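That split is cheap to demonstrate. Here's a sketch using simulated I/O -
time.sleep releases the GIL just as blocking socket reads do, so plain
threads overlap the waits (the task count and delay are arbitrary):

```python
import queue
import threading
import time

def fake_io(task_id, results):
    # Simulated blocking I/O: sleep releases the GIL, so these waits
    # overlap instead of queueing up behind one another.
    time.sleep(0.2)
    results.put(task_id)

results = queue.Queue()
threads = [threading.Thread(target=fake_io, args=(i, results))
           for i in range(8)]

start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# Eight 0.2 s waits finish in roughly 0.2 s of wall time, not 1.6 s.
print(f"{results.qsize()} tasks in {elapsed:.2f}s")
```

Replace the sleep with a CPU-bound loop and the overlap disappears - which
is exactly the point where multiprocessing takes over.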

------
JabavuAdams
Unfortunately, this makes CPython fairly useless for high-performance games
and tools.

Obviously, that's not its primary market, but this GIL thing is a non-obvious
land-mine that people need to be aware of when considering Python for a
project.

The problem with the "just re-write the slow parts in C or C++" argument is
that in my experience, this isn't actually faster than just carefully
designing C++ from the start, using modern libraries. It's really only faster
for people who are really awful at memory management.

------
keytweetlouie
Thanks for pushing to get Python into a more competitive position. Even if
you don't like threads, it's a bad idea to not let anyone use them. With
multi-core machines growing more common, developers want all the options
available to them. I don't want my language to tell me what I can't do.
Running Python on other VMs is a fine idea, but most of the effort goes into
CPython, NOT Jython or the others. I wouldn't want great features on a VM
that gets little attention from the Python core developers. Ironically, I'm
complaining and not helping. Sorry. I would love to tackle these issues if I
had the time.

------
xenophanes
_Python programs execute in sequence. No Fork/Join frameworks, no coroutines,
no lightweight processes, nothing. Your Python code will execute in sequence
if it lives in the same process space.

The answer from Brett and Guido to concurrency? Develop your code in C, or
write your code to execute in multiple processes._

Could someone explain this to me? It says no fork/join to get concurrency, but
then it says you can use multiple processes. Fork makes another process. I'm
confused. What's a "process space"?

~~~
fjh
It says "no Fork/Join frameworks", not "no Fork/Join". You can definitely fork
processes in Python: <http://docs.python.org/library/os.html#os.fork>

Using different processes allows you to execute code in parallel, but
different processes can't access each other's memory, while threads within a
process share their memory space and can therefore operate on the same data.

~~~
xenophanes
Is there a different kind of fork that would make "fork/join frameworks" work?

------
afhof
I am sure it has been said before, but it might not be a bad idea to try what
Linux is doing: the BKL (Big Kernel Lock) is being used less and less, and
more granular locks are starting to take its place.

~~~
pmjordan
It's not quite so easy. The kernel is in a special position in that it can
distinguish between and defend against two types of race condition:

- races caused by different _CPUs_ accessing the same resource. Unlike
userspace, the kernel knows what each CPU is doing (roughly) at any given
time.

- races caused by preemption. Basically, this means the running
thread/process is paused and the CPU is scheduled to run a different
thread/process which then does something that interferes with whatever the
original thread/process was in the middle of doing.

First, the kernel can prevent the latter altogether by marking sections of
code non-preemptible. What this means for races is that you can stop the CPU
from being forced to context switch while accessing a contended resource,
thus guaranteeing speedy progress out of the critical section.

Secondly, there are spinlocks. This means that if a thread of execution tries
to gain exclusive control over a contended resource, instead of relinquishing
control over the CPU (thus rescheduling/context switching), it just sits there
in a live loop waiting for the resource to be freed. On the surface, this
seems like a bad idea: the CPU can't do _anything_ while it's waiting for
another CPU to do its stuff with the contended resource, and it's just wasting
precious cycles. However, because the kernel is a known entity, it's possible
to guarantee that all resource accesses will be extremely short because the
critical section running on another CPU is non-preemptible. Generally, this
technique is used when the critical sections are (much) shorter than the cost
of a context switch.

In userspace, the two types of races are indistinguishable, and you don't have
separate weapons for fighting them. Spinlocks _are_ a really bad idea in
userspace unless you know your software is the only (major) user of CPU time
on the system, and only runs in as many threads as there are CPUs.

tl;dr: The kernel has (a) more information and (b) more control, so you can't
apply those techniques to Python.

------
metachris
The comments are a great read!

------
j_baker
Am I the only heavy python user who didn't know who Brett Cannon is?

~~~
jackdied
Sadly no, but now I know for sure you aren't Jim Baker (the Jython committer).

Most python users have never heard of anyone other than Guido. That's fine
because people who contribute to open source projects don't do it for the
fame, and that's reasonable because you can't be expected to memorize the
names of all the people who wrote the tools you use. That said, it can be
frustrating not being Guido (Brett gets heaps of credit inside the community
but zero percent of Guido's name recognition).

1st story: my first PyCon I went out to the smokers deck and introduced myself
(this was '02 or '03 - I'm not sure we even had name badges). The other guys
turned out to be Tim Peters, Christian Tismer, and Alex Martelli. Python
jackpot.

2nd story: I told that first story to a random lunch table of people at PyCon
last year (as a rule I eat lunch with no one I know at cons) and the
collective reaction was: who?

So yeah, don't feel bad about not knowing who Brett is (but do buy him a beer
if you run into him). And definitely don't get involved in open source for the
fame. [Bonus trivia: Gustavo also has a commit bit, as do a couple other
people in this thread]

