
Global Interpreter Lock, or how to kill it - kingkilr
http://morepypy.blogspot.com/2011/06/global-interpreter-lock-or-how-to-kill.html
======
ChuckMcM
I'm confused, Software Transactional Memory may be new but optimistically
transacting is not. In 1993 I was using it at Sun in network protocol design
and I was doing that because people suggested they had used that technique
before and it worked well. (so it was known earlier than my use of it)

Basically if you have a system where contention is possible but unexpected,
you 'tag in' before you start to do things that would be wrong if you did them
in contention, and you 'tag out' when you're done. The system keeps track of a
mutable state value which gets updated when state is mutated and the 'tag id'
of the person who mutated it. When you tag out, if you're the only tag id that
has been mutating the system, you're done; otherwise the system resets your
changes and you re-do. This 'wins' if most of the time you won't be contended.
It's 'safe' because you always detect when it was contended and restart from a
safe starting point. You don't 'roll back' generally, because if you lost the
tag race it's because someone else "won" it and the new state is again
consistent.

Surely there is some seminal paper on this somewhere.
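
The "tag in / tag out" scheme described above can be sketched as optimistic
concurrency with a version counter: do the work against a snapshot, then
commit only if nobody mutated the state in the meantime. This is a
hypothetical sketch with invented names, not code from any of the systems
mentioned:

```python
import threading

class VersionedState:
    """State with a version counter, as in the 'tag' scheme described above."""

    def __init__(self, value):
        self.value = value
        self.version = 0               # bumped on every successful mutation
        self._lock = threading.Lock()  # guards only the commit, not the work

    def read(self):
        # "Tag in": take a snapshot of the value and its version.
        with self._lock:
            return self.value, self.version

    def try_commit(self, expected_version, new_value):
        # "Tag out": commit succeeds only if nobody else mutated in between.
        with self._lock:
            if self.version != expected_version:
                return False           # lost the race; caller redoes its work
            self.value = new_value
            self.version += 1
            return True

def optimistic_update(state, fn):
    # Retry loop: cheap when contention is possible but unexpected.
    while True:
        value, version = state.read()
        if state.try_commit(version, fn(value)):
            return

state = VersionedState(0)
optimistic_update(state, lambda v: v + 1)
print(state.value)  # 1
```

The lock here protects only the snapshot and the commit check, not the
computation itself, which is what makes the approach pay off when conflicts
are rare.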

~~~
pnathan
That sounds an awful lot like some bus contention schemes from the late
70s/80s. It's been a while since my grad class that covered that though, so I
can't provide any citations.

I notice that we're still stuck in that time period insofar as real software
innovations, sigh.

~~~
mattgreenrocks
It's also used in lock-free algorithms nowadays.

I remember being delighted by it when I first read about it. Rather than
assuming contention, it deals with contention only when it comes up. A
refreshing change of mindset.

------
jerf
This seems to be conflating two levels of concurrency. One is at the user
level, where the user may want to write a function that atomically removes an
element from one list and adds it to another, and one is at the interpreter
level, where it needs to be able to complete all operations involved in, say,
adding the element to the list, without another thread coming in and stomping
on the first thread, with all the associated hazards.
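
The user-level case described above can be sketched with an explicit lock
guarding both lists; the names here are illustrative, not from the article:

```python
import threading

_lock = threading.Lock()  # one lock guards both lists together

def atomic_move(src, dst, item):
    """Remove item from src and append it to dst as one atomic step."""
    with _lock:
        # While the lock is held, no other thread can observe the
        # intermediate state where the item is in neither list.
        src.remove(item)
        dst.append(item)

a, b = [1, 2, 3], []
atomic_move(a, b, 2)
print(a, b)  # [1, 3] [2]
```

The interpreter-level problem is one layer down: even `dst.append(item)` on
its own must complete without another thread corrupting the list's internals.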

At the interpreter level, as the article mentions, it suffices to lock the
structures down as Jython does. STM would just seem to add overhead that isn't
necessary.

At the user level, STM + imperative(/uncontrolled effects) is basically a
known failure. A lot of effort has been spent on it, with people with similar
levels of control over their VM (like the C# attempt), and it just doesn't
really work.

If you've got the control necessary to automatically STM things, you might as
well just equally-automatically copy the Jython-style locking. STM is "nifty"
but I don't see that it actually adds anything useful. Either that or you have
to rigidly control effects in user code, and that's not Pythonic in any sense
whatsoever (philosophical _or_ practical).

~~~
Locke1689
_At the user level, STM + imperative(/uncontrolled effects) is basically a
known failure. A lot of effort has been spent on it, with people with similar
levels of control over their VM (like the C# attempt)_

Agreed. The only project that I know of that's even close to a "working"
implementation of STM is Haskell and that's because Haskell doesn't have
uncontrolled stateful code.

~~~
swannodette
Clojure is designed for the user level and offers some unique advantages over
Haskell's STM [1]:

      * MVCC snapshots avoid transaction restarts on read invalidation.
      * Ensuring references on read-writes provides a kind of manual
        control over resource acquisition order.
      * Explicit commute reduces retries on commutative writes.

[1] <http://stackoverflow.com/questions/4560605/how-does-clojure-stm-differ-from-haskell-stm>

EDIT: I didn't read the OP closely enough. Yes, grafting fine-grained STM onto
an imperative language hasn't borne much fruit.

~~~
william42
Isn't Clojure still mostly functional and mostly based on persistent data
structures?

~~~
calebmpeterson
Yes very much so...and then some.

------
swannodette
Thoughts on how this can be made efficient at all? For example, Clojure has
refs to limit the number of things which need to be tracked by STM, as well as
fast persistent immutable data structures to avoid the overhead of copying
data.

------
abhijitr
"All of PyPy, CPython and IronPython have a GIL"

IronPython actually does NOT have a GIL:
<http://wiki.python.org/moin/IronPython>

------
viraptor
I had this on my mind for some time, and this seems a good time to ask: why
couldn't python have an "unsafe" mode? If the internals were threadsafe, why
couldn't we have a version of python that will explode on threading mistakes,
where it's up to the user to make sure the proper places are locked? Basically,
the same level of guarantee that C provides. I'd be happy to use that version
in places where it's needed. A similar style of threading was possible in Perl
(fine if what you're doing is threadsafe; expect coredumps otherwise).

~~~
ezyang
That's the problem: the internals of Python are not thread safe. Essentially
every library backing Python is written without thread safety in mind, and
this is precisely _why_ the GIL is necessary.
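
And even with the GIL, only individual bytecodes are atomic; a compound
operation like `+=` in user code can still race, which is exactly the kind of
mistake an "unsafe" mode would surface far more violently. A minimal sketch
(how often updates are actually lost depends on the interpreter and its
thread-switch interval):

```python
import threading

counter = 0

def bump(n):
    global counter
    for _ in range(n):
        counter += 1  # read, add, store: three steps, not one atomic action

threads = [threading.Thread(target=bump, args=(100_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# May print less than 400000 when increments from different threads interleave.
print(counter)
```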

------
sharkbot
I'm not clear on the desired result of this project. Is it to make currently-
unsafe Python code automatically correct? Or is it to keep the GIL semantics,
but make it faster?

If it is the former, then I worry. Software transactional memory is hard to
get right in languages without explicit and trustworthy annotations for side-
effecting code (i.e., types) [1].

1) <http://www.bluebytesoftware.com/blog/2010/01/03/ABriefRetrospectiveOnTransactionalMemory.aspx>
(currently down, but Google has a cached copy)

~~~
andrewcooke
python doesn't scale well with multiple threads/cores[1]. the high level
motivation is to fix that. since the GIL is the main source of the problems,
that has to go.

[1] the best current solution is to use the multiprocessing package which runs
a completely separate python instance on each core, but obviously that doesn't
support simple shared memory access (you can do it, but it's not "natural").
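
for what it's worth, the stdlib multiprocessing package does provide
shared-memory primitives such as Value and Array, but as the footnote says the
plumbing is explicit rather than "natural" object sharing. a minimal sketch:

```python
from multiprocessing import Process, Value

def work(shared):
    # Sharing requires an explicit lock and a special shared-memory object;
    # ordinary Python objects are not visible across processes.
    with shared.get_lock():
        shared.value += 1

def run():
    counter = Value("i", 0)  # a single C int placed in shared memory
    procs = [Process(target=work, args=(counter,)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return counter.value

if __name__ == "__main__":
    print(run())  # 4
```

contrast with threads, where any object is implicitly shared: here only the
`Value` crosses the process boundary, everything else is copied.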

------
andrewcooke
is the speed hit (factor 2-5) really that bad? do haskell programmers have
experience that confirms that?

and would it be possible to somehow switch this on and off dynamically, so
that a single "pypy" can adapt automatically if multiple threads start?

and how will this affect a stable, well supported [edit: full library],
"final" release of pypy (especially, p3)? is it going to remove
effort/resources from a GIL version? i get the impression it's getting close
to stable/easy to use and it would be a pity to lose that.

[edit: ps, otherwise, this sounds most excellent]

~~~
gwern
> is the speed hit (factor 2-5) really that bad? do haskell programmers have
> experience that confirms that?

The penalty was pretty bad in the earliest STMs, but the Simons have done so
much work on multi-core stuff and the STM libraries that it's hard to say.

(I will note that lack of purity might be, as the comments point out, a real
problem. That was what sank the .Net/C# attempt to add STM.)

------
cturner
I think processes could replace threads in most cases, but common OSs are
hampering us.

Fork is slow on Windows. On unix, having lots of processes crowds ps (which
creates a disincentive), and if you want to effectively manage a tree of
processes you have to do fiddly work managing a process group and (if you want
to be fast) wrap your head and software around shared-memory IPC.

I think that if support for multiple processes in mainstream OSs was more
effective than it is, we'd both spend less time worrying about threads and
write more stable software.

~~~
fijal
Two answers really:

* sharing memory - sometimes you have lots of immutable data, like modules, graphs, whatnot. Yes, there is copy-on-write and no, it doesn't work well on any python implementation out there. Also sometimes it's mostly-immutable data, but not quite.

* serializing lots of data is a mess and even if feasible is usually a big performance hit if you want to exchange actual objects.

~~~
supersillyus
With Plan 9's rfork() (and linux's less nice clone()), you can create new
processes that share the memory of the old process, which addresses those two
answers. Though, I'm not sure if this reinforces OP's point or just indicates
that the distinction between threads and processes can be fuzzy.

~~~
mattgreenrocks
What are the uses of a process that shares address space with its parent?

~~~
Someone
I wondered, too, so I had to look this up.
<http://cm.bell-labs.com/magic/man2html/2/fork> shows that it is not the whole
address space:

RFMEM If set, the child and the parent will share data and bss segments.
Otherwise, the child inherits a copy of those segments. Other segment types,
in particular stack segments, will be unaffected. May be set only with RFPROC.

So, it basically is a way to start a process that shares all its globals
(including static variables, I think) with another process, but not other
memory. That is more secure than having threads, but also more restricted, as
one cannot share heap-allocated structures between such processes. I guess
this feature gets used most in Fortran code where nothing gets allocated
dynamically.

It also makes it easier to selectively kill a thread of execution from the
command line, but I do not see when that might be useful.

------
tomp
Here is the paper mentioned (but not linked to) in the article:

A comprehensive strategy for contention management in software transactional
memory

<http://portal.acm.org/citation.cfm?doid=1594835.1504199>

------
udoprog
It seems like the real solution would be to introduce low-level guarantees on
memory access (similar to Java), plus the option to disable the GIL and C API
extensions in CPython (until a new API is introduced?).

Also, giving developers access to some real synchronization primitives would
be sweet.

I'm not a CPython developer, but the last points on the desired list[0] seem
very unfeasible to me. Not even STM solves the "Speed" requirement, but PyPy
does away with native extensions, so it's halfway there!

[0] <http://wiki.python.org/moin/GlobalInterpreterLock>

------
afhof
If you are not familiar with the GIL, this video was really informative:
[video] <http://blip.tv/file/2232410>

------
SoftwareMaven
Wouldn't this negatively impact power consumption (compared to fine-grained,
Jython-style locks) by having a processor repeat tasks instead of blocking and
waiting for a task? It seems like this is intended to allow Python to scale to
multiple cores better, but that is most often useful in a data center
environment, where wasting power seems like a bad idea.

~~~
ch0wn
As the author wrote at the end of the article, instead of removing the GIL
once and for all, they would rather add an option to use STM instead of the
GIL, as a compile-time option or maybe even at runtime, since it would not
only affect power consumption but obviously speed as well.

That way the user can decide whether he needs multi-core scalability or simply
speed.

------
brianjherman
Why can't we use stackless python?

~~~
wladimir
Stackless python has a GIL as well. What we need is lockless python :-)

