
The Gilectomy – How's It Going [video] - varunramesh
https://www.youtube.com/watch?v=pLqv11ScGsQ
======
comex
If you're the type that prefers to read text, here's LWN's writeup of the
linked talk:

[https://lwn.net/SubscriberLink/723514/f674d4a807264ba1/](https://lwn.net/SubscriberLink/723514/f674d4a807264ba1/)

~~~
scribu
I've enjoyed LWN articles in the past, but found this particular writeup very
tedious to follow, compared to watching the video.

A direct transcript, perhaps with some light editing, would have been more
useful, IMO.

------
thomaslee
If any Python devs are out there reading: my understanding is that removing
the GIL itself isn't the hard part so much as removing the GIL _while
satisfying certain constraints_ deemed necessary by GvR and/or the rest of the
community. I know some of those constraints relate to compatibility with
existing C extensions -- but there must be others too?

The reason I ask is that Larry's attempt at buffered ref counting surely has
implications for single-threaded code that maybe relies on the existing
semantics -- e.g. a program like this may no longer reliably print
"Deallocated!":

    
    
      Python 2.7.13 (default, Mar  5 2017, 00:33:10) 
      [GCC 6.3.0 20170205] on linux2
      Type "help", "copyright", "credits" or "license" for more information.
      >>> class Foo(object):
      ...     def __del__(self):
      ...             print 'Deallocated!'
      ... 
      >>> foo = Foo()
      >>> foo = None
      Deallocated!
      >>> 
    

A bad example in some ways since in this particular case we could wait for all
ref counting operations to be processed before letting the interpreter exit,
but hopefully my point is still clear.

Similarly, what about multi-threaded Python code that isn't written to operate
in a GIL-free environment -- absent locks, atomic reads/writes, etc.? At best,
you might expect some bad results. At worst, segfaults.

Are these all bridges that need to be crossed once a realistic solution to the
core GIL removal issue is proposed? As glad as I am that folks are still
thinking hard about this problem, I'm personally sort of pessimistic that the
GIL can be killed off without a policy change wrt backward compatibility.
Still, I do sort of wonder if some rules of engagement wrt departures from
existing semantics might help drive a solution.

~~~
jholman
If I'm understanding you, some or all of these questions are explicitly
addressed in the Q&A. My apologies if you got that far and I simply didn't
understand you.

For example, your first question seems to be asking about whether there's a
semantic change coming from a lack of immediacy in when __del__ will run. And
the answer is explicitly "yes, and the docs already told you not to count on
that".
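
A minimal sketch of the alternative, assuming the cleanup can be made explicit: instead of relying on when `__del__` fires, the object exposes `close()` and is used as a context manager. The `Resource` class and the `events` list are hypothetical names for illustration only; nothing here is taken from the talk.

```python
class Resource(object):
    """Hypothetical resource whose cleanup must happen at a known point."""

    def __init__(self, log):
        self.log = log
        self.closed = False

    def close(self):
        # Explicit, idempotent cleanup that does not depend on refcounts.
        if not self.closed:
            self.closed = True
            self.log.append("closed")

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
        return False  # do not swallow exceptions


events = []
with Resource(events) as res:
    events.append("working")

# close() has already run here, whether refcount updates are
# applied immediately or buffered.
print(events)
```

The cleanup point is fixed by the `with` block, so the program behaves the same no matter when the interpreter gets around to freeing the object.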

As for multi-threaded Python code... and perhaps also multi-threaded C code in
extensions... I think the clear answer is "yes, our whole goal is to remove
some guarantees that were previously provided, so if you counted on those
guarantees you're in trouble". Again, cf. the Q&A in case that helps.

From the talk, it doesn't look to me like Larry Hastings has a plan for the
policy change in question; so maybe "bridges that need to be crossed once [the
technical issues are smaller]" is correct?

------
WaxProlix
It's funny, I've written a lot of python in quite a few domains and haven't
really struggled directly because of the GIL before. Is this more of a 'data
scientist' problem? I feel like if I had a huge pile of data to crunch, python
wouldn't be my first choice really.

~~~
prewett
Servers are where the problem is. The GIL makes python functionally single-
threaded, which is a bummer for your server at any kind of scale. So you end
up having to have n cores' worth of server processes behind a load balancer,
even if you only need one server machine, which is a bummer if you have a
stateful server (such as a game server), as you now have to manage
communicating state between processes by storing it in another process
(frequently Redis).

But python is easy and fun to write code in, and "developer time is expensive,
servers are cheap," so there are a lot of python servers which could benefit
from a lack of GIL. Never mind that it's fast to write, but difficult to
maintain since it is a dynamically typed language and one typo creates runtime
errors that any statically typed language will catch at compile time. Or that
it is slow. Or that a python process never really releases memory back to the
system, just within itself, so the process slowly grows over the course of a
few weeks. Or that the Twisted framework you're using for cooperative
multitasking because of the GIL is really easy to block on a database query by
accident, leading to uncooperative multitasking (= large lags), resulting in
forced server restarts, and loss of players (= loss of revenue). So yeah,
"developer time is cheap" but it's sort of an expensive cheap. I came to the
conclusion that python is unsuitable for servers, but until Go came out, there
wasn't a realistic alternative, since C++ and Java are too heavyweight, and
Ruby suffers from similar problems (don't know about a GIL).

~~~
thomaslee
> Servers are where the problem is. The GIL makes python functionally single-
> threaded, which is a bummer for your server at any kind of scale.

Right, agreed. I can imagine some of the frustration you might experience
using CPython for high throughput systems: kind of like NodeJS without the
benefits of a standard library written with async/non-blocking I/O in mind.

A bit curious about a few things you mention here, though:

> Or that a python process never really releases memory back to the system,
> just within itself, so the process slowly grows over the course of a few
> weeks.

I'm not sure this is true in general, is it? Can you elaborate? It's been a
while since I've dug around in Python innards, but if Py_DECREF(x) leads to a
refcount of zero IIRC free(x) is ultimately called -- albeit in an indirect
manner via a layer or six of tp_dealloc calls and tp_free. :) I suppose
calling free(x) may only return the memory associated with x to (g)libc's free
list and not necessarily back to the OS [0]. No different to C/C++ in that
regard, I guess.

> I came to the conclusion that python is unsuitable for servers, but until Go
> came out, there wasn't a realistic alternative, since C++ and Java are too
> heavyweight, and Ruby suffers from similar problems (don't know about a
> GIL).

"Too heavyweight" in that they're relatively difficult to write in comparison?
Maybe true of Java-the-language, but the JVM itself is an absolute workhorse
when it comes to high performance. Plenty of languages to choose from there,
typically without a GIL. Jython, for example, has no GIL [1].

And yep, Ruby/MRI has a GIL (but JRuby does not).

[0] [https://www.gnu.org/software/libc/manual/html_node/Freeing-a...](https://www.gnu.org/software/libc/manual/html_node/Freeing-after-Malloc.html)
[1] [https://stackoverflow.com/questions/1120354/does-jython-have...](https://stackoverflow.com/questions/1120354/does-jython-have-the-gil/1147548#1147548)

~~~
fiddlerwoaroof
Common Lisp implementations generally do multithread really well and give you
lovely syntactic abstraction capabilities while also running significantly
faster than comparably high-level languages.

------
ars
Gilectomy project: the removal of Python's Global Interpreter Lock, or "GIL".

------
chairmanwow
Is he being serious when he says he only has one test case? That really
doesn't seem like a reasonable thing to do. Furthermore, would a recursive
implementation of Fibonacci even benefit from multithreading?

~~~
wulfjack
The goal, and it's kind of what Larry Hastings is looking for, is that
_any_ program should run 8 times faster on an 8-core CPU compared to a 1-core.
And as said above Python can basically only use one core b/c of GIL. Actually
Python 2.7 multithreading runs _much_ slower on a multicore CPU than on a
single core due to locking congestion on the GIL.

~~~
marvy
What? No! Multithreaded programs should run faster on 8 cores than on one
core. That's not very realistic for single-threaded programs, in any language.

I could be wrong, but I think Py2.7 is about the same speed on multicore vs 1
core. Where did you get that idea?

~~~
diek
Python 2.7 has terrible thrashing in the way the GIL is acquired that is
exacerbated as more threads are used. Dave Beazley has given great talks with
the technical details:
[http://www.dabeaz.com/GIL/](http://www.dabeaz.com/GIL/)
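
For anyone who wants to measure this themselves, here is a rough sketch, written for Python 3, that runs the same CPU-bound recursive Fibonacci first sequentially and then on several threads. It is not the benchmark from the talk; `N`, `JOBS` and the helper names are arbitrary choices for illustration. Python 3.2 and later use a reworked GIL, so this will not necessarily reproduce the 2.7 behaviour described above.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fib(n):
    # Deliberately naive recursion: pure CPU work, no I/O.
    return n if n < 2 else fib(n - 1) + fib(n - 2)


def run_sequential(n, jobs):
    return [fib(n) for _ in range(jobs)]


def run_threaded(n, jobs):
    # One worker thread per job; with a GIL only one of them
    # executes Python bytecode at any moment.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(fib, [n] * jobs))


def timed(func, *args):
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


N, JOBS = 20, 4  # small values so the sketch finishes quickly
seq_result, seq_seconds = timed(run_sequential, N, JOBS)
thr_result, thr_seconds = timed(run_threaded, N, JOBS)

print("sequential: %.4fs  threaded: %.4fs" % (seq_seconds, thr_seconds))
```

Raise `N` to get timings that are large enough to compare. The numbers depend heavily on the interpreter version and the machine.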

------
mrfusion
Can Python copy how other languages like golang or java operate without a GIL?
Why or why not?

~~~
chrisseaton
Python has different semantics to those languages. I don't think it's formally
specified, but people program in Python expecting the semantics that reference
counting provides, and that unsynchronised concurrent access to data structures
will not cause errors. Despite ongoing research, it appears to be hard to
continue to provide these semantics without a GIL.

Golang (informally I think) and Java (more formally) are not specified to
provide reference counting semantics and not specified to guarantee that
unsynchronised concurrent access to data structures will not cause errors.

So the languages have different semantics - that's why you can't copy-and-
paste the solution from one to another.

Some alternative implementations of Python don't follow the above semantics,
like Jython, but then some people aren't happy with that. It may not be
acceptable to the community to drop those semantics, even if they were never
formally given.

~~~
munin
> and that unsynchronised concurrent access to data structures will not cause
> errors

Python doesn't ensure that unsynchronized concurrent access to data structures
won't cause errors. As I understand and experience Python multithreading, all
the GIL ensures is that of the "load - inc - store" stages running amongst N
threads, each separate stage will be locked, but not the overall sequence. So
you'll still have data races, even with the GIL, and you still need to use
mutexes etc in your Python program, which is why they are there.
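
A small sketch of what that looks like in practice, assuming a shared counter bumped from several threads. `counter += 1` is a load, an add and a store, so the whole sequence is wrapped in a `threading.Lock`. The names and the thread and iteration counts are made up for illustration.

```python
import threading

counter = 0
counter_lock = threading.Lock()


def add_many(n):
    global counter
    for _ in range(n):
        # Hold the lock across the whole load/add/store sequence,
        # not just each individual step.
        with counter_lock:
            counter += 1


threads = [threading.Thread(target=add_many, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)
```

With the lock the final count is always 4 * 10000. Without it, updates can be lost whenever a thread switch lands between the load and the store.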

~~~
chrisseaton
I mean it won't cause errors within the basic data structure access
operations. I'm not talking about composition. In Java a hash table write can
fail with an exception if there is a concurrent write that conflicts. A Python
dict write is atomic, because it happens within a single instruction as you
say and so will not be interrupted and will never fail. That's what you aren't
getting in Java. That expectation is very hard to provide without a GIL.
Jython does it with blunt fine grained locking, but that's slow.
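
To make the distinction concrete, here is a sketch of the non-composed case: several threads storing into one shared dict with no lock at all. Each store is a single dict operation, so in CPython none of them fails and none is lost. This is CPython behaviour rather than a documented language guarantee, and the names are hypothetical.

```python
import threading

shared = {}


def writer(thread_id, n):
    for i in range(n):
        # A single dict store with a distinct key per write; no lock taken.
        shared[(thread_id, i)] = i


threads = [threading.Thread(target=writer, args=(tid, 5000)) for tid in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(shared))
```

Anything that composes several operations, such as check-then-insert or read-modify-write on a value, still needs its own locking.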

------
amelius
Can't they use the GC techniques used in other languages? I've heard that
Golang has a very efficient concurrent garbage collector.

~~~
peterhunt
Implementing a tracing gc was covered in the video.

~~~
amelius
Well, I didn't see the video yet, but I noticed they are referencing "The
Garbage Collector Handbook", which is from 2011. The people from Golang have
had some more recent successes with their concurrent garbage collector, which,
as I've heard, is really efficient.

~~~
poooogles
> The people from Golang have had some more recent successes with their
> concurrent garbage collector, which, as I've heard, is really efficient.

Haven't they just sacrificed throughput for pause time though? Not a GC expert
at all, but that's what I've got the gist of from speaking to people.

