
Let's Remove the Global Interpreter Lock - MikusR
https://morepypy.blogspot.com/2017/08/lets-remove-global-interpreter-lock.html
======
eslaught
The comments here are missing a massive use case: shared memory. Shared memory
isn't just about programmer convenience. It's about using a machine's memory
resources more effectively.

Yes, shared memory is available in multi-processing, but it doesn't
necessarily interact well with existing codes.

I've been working on adding Python support to Legion [1], a task-based runtime
system for HPC. Legion wants to manage shared memory so that multiple cores
don't necessarily need multiple copies of the data, when the executing tasks
don't conflict (all are read-only, or access disjoint data). Legion is C++, so
this mostly "just works". Some additional work is required to support GPUs,
but it's still not so difficult. But with Python, if we go with
multiprocessing, we have to switch to a different mechanism. Worse, Python is
an optional dependency for Legion, so we can't depend on Python's
multiprocessing support either.

If you have a large existing project, and a use case that can take advantage
of shared memory, being forced into Python's multiprocessing scheme for
parallelism is a pain.

We've been investigating a dlmopen approach as well, based on this proof of
concept [2]. It turns out that dlmopen in every available version of libc has
a critical bug that prevents it from being practically useful if you have any
desire to make use of native modules. You can build a custom libc with this
patch [3], but rolling a custom libc is also a massive pain.

In all likelihood we'll end up rolling our own multiprocessing to make this
work. If the GIL were truly gone though, we could potentially avoid many of
these issues.

[1]: [http://legion.stanford.edu/](http://legion.stanford.edu/)

[2]:
[https://news.ycombinator.com/item?id=11844268](https://news.ycombinator.com/item?id=11844268)

[3]:
[https://patchwork.ozlabs.org/patch/496559/](https://patchwork.ozlabs.org/patch/496559/)

~~~
detroitcoder
^This. It is a very common use case for applications I work with to create a
very large in-memory read-only pandas dataframe and then put a Flask interface
to operations on that dataframe using gunicorn and expose it as an API. If I
use async workers, the dataframe operations are bound by GIL constraints. If I
use sync workers, each process needs its own copy of the dataframe, which the
server cannot handle (I have never seen pre-fork shared memory work for this
problem). I don't want to introduce another technology to solve this problem.

~~~
ogrisel
The pickling implementation of joblib has support for memory mapping numpy
arrays nested in arbitrary data structures such as pandas dataframes.

Save the dataframe in a folder that can be accessed by the gunicorn worker:

    
    
        import joblib
        joblib.dump(df, '/folder/shared_data.pkl')
    

Then in the code run by the flask / gunicorn workers themselves:

    
    
        import joblib
        shared_df = joblib.load('/folder/shared_data.pkl', mmap_mode='r')
        # use the shared_df as usual (inplace modifications are not
        # authorized)
    

Some pandas functions can have issues with read-only buffers, though:
[https://github.com/pandas-dev/pandas/issues/17192](https://github.com/pandas-dev/pandas/issues/17192)
(caused by a currently unsolved bug / limitation of Cython), but it can work
for your use case.

~~~
detroitcoder
This looks very interesting. I am reading the docs
[https://pythonhosted.org/joblib/parallel.html#manual-management-of-memmaped-input-data](https://pythonhosted.org/joblib/parallel.html#manual-management-of-memmaped-input-data)
and it looks like it would help a lot (possibly solve the issue). Do you have
any experience using this in production?

~~~
detroitcoder
DAMN. I just did a basic test and it kinda just worked?!? I created a test
dataframe of 100M rows x 10 cols, which took up ~2.3G, and then used
joblib.dump within the on_starting hook, which runs when the gunicorn master
starts up. Then I loaded that df with joblib.load within the worker, and total
memory consumption was practically flat. Then I bumped the number of workers
up to 20 and it was still flat. That is actually amazing. Coolest thing I have
seen in months for how easy it is. Now I have to test whether the analytics
actually work, and do a deep dive into the mechanics of mem-mapping.

~~~
ogrisel
Thanks for your feedback. I am glad I could help you.

------
crb002
Having ported Ruby to IBM's Blue Gene/L, my advice is to forget about the GIL.
Run one Python process per core. Use something like MPI-2 for message-passing
communication. Ruthlessly eliminate bloat code from production binaries and
statically link all the things.
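
A minimal sketch of that pattern, using mpi4py (my choice of MPI binding; the
advice above only says "something like MPI2"), launched as "mpiexec -n 8
python work.py":

    
    
        # One Python process per core; no GIL contention because nothing is shared.
        from mpi4py import MPI
        
        comm = MPI.COMM_WORLD
        rank = comm.Get_rank()
        size = comm.Get_size()
        
        # Rank 0 splits the work into one chunk per process.
        chunks = [list(range(i, i + 1000)) for i in range(0, 1000 * size, 1000)] if rank == 0 else None
        chunk = comm.scatter(chunks, root=0)
        
        partial = sum(x * x for x in chunk)    # local number crunching
        totals = comm.gather(partial, root=0)  # message passing, not shared memory
        
        if rank == 0:
            print(sum(totals))
    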

~~~
metalliqaz
I agree wholeheartedly. Almost every time I hear from someone who is upset
about the GIL, I find that they would be much better suited to using
multiprocessing instead of multithreading.

For 80% of the developers out there, this approach basically assures better,
more stable code.

~~~
Animats
Python's "multiprocessing" means launching another Python interpreter in a
subprocess. Each process has a full copy of the Python environment. They may
share the base interpreter, but there's a separate copy of every package
loaded and all data. Memory consumption is bloated and the CPU caches thrash.
Launching a subprocess is expensive; it means a full interpreter launch and a
recompile/reload.

"Multiprocessing" is useful when you have a lot of work to do concurrently and
not too much data to pass between processes. I've used Python subprocesses
that way. Parallelizing your number crunching is probably not going to work
very well.
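
For what it's worth, the shape of workload where subprocesses do pay off looks
roughly like this (a minimal sketch; the file names are made up):

    
    
        import multiprocessing as mp
        
        def crunch(path):
            # Each worker handles one file end-to-end: only the path (small)
            # goes in, and only the count (small) comes back over the pipe.
            with open(path) as f:
                return sum(len(line) for line in f)
        
        if __name__ == '__main__':
            with mp.Pool() as pool:  # one worker process per core by default
                totals = pool.map(crunch, ['a.log', 'b.log', 'c.log'])
            print(totals)
    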

~~~
peterkelly
If CPU load is an issue, why would you be using an interpreter in the first
place?

~~~
khedoros1
A lot of the heavy lifting is done through calls to C libraries anyhow, with
Python just being a convenient way to pass the data around.

~~~
falcolas
Indeed, and in that case the GIL is effectively a non-issue (there's no
requirement for the GIL to be held by non-python code).

~~~
Animats
No, no, if you're manipulating Python objects from C code, you have to hold
the lock. You can release it only when not doing anything with objects in
Python's memory space. Otherwise you get race conditions and intermittent
crashes.

------
gshulegaard
I feel like the GIL is, at this point, Python's most infamous attribute. For a
long time I thought it was also the biggest flaw with Python...but over time I
care less and less about it.

I think the first thing to realize is that single-threaded performance is
often significantly better with the GIL than without it. I think Larry
Hastings' first Gilectomy talk was extremely insightful (about the GIL in
general and about performance when removing the GIL):

[https://youtu.be/P3AyI_u66Bw?t=23m52s](https://youtu.be/P3AyI_u66Bw?t=23m52s)

I am not sure I would, personally, trade single-threaded performance for
enabling multi-threaded applications. I view Python as a high-level rapid
prototyping language that is well suited for business logic and glue code. And
for that type of workload I would value single-threaded performance over
support for multi-threading.

Even now, a year later, the Gilectomy project is still slightly off
performance-wise (although it looks really really close :) ):

[https://youtu.be/pLqv11ScGsQ?t=27m32s](https://youtu.be/pLqv11ScGsQ?t=27m32s)

As noted elsewhere, multi-processing offers adequate parallelization for this
type of logic. Also, coroutines and async libraries such as gevent and asyncio
offer easily approachable event loops for maximizing single-threaded resource
utilization.

It's true that multi-processing is not a replacement for multi-threading.
There definitely are tasks and workloads where multi-processing and its
inherent overhead make it unsuitable as a solution. But for those tasks, I
question whether or not Python itself (as an interpreted, dynamically typed
language) is suitable.

But that's just my $0.02. If there is a way to remove the GIL without
negatively impacting single-threaded performance or sacrificing reference
counting for a more robust (and heavy) GC, then I am all for it. But if there
is not...I would just as soon keep the GIL.

~~~
VectorLock
The GIL has been a much bigger problem for perception than it ever has been
for performance. Python has lost more mindshare over it than anything else.
The few machine cycles that were ever saved by moving away from it were far
outweighed by the waste of human cycles.

~~~
bsder
The few machine cycles that were ever saved by _NOT_ moving away from it
(which is the _ONLY_ justification for keeping it) were far outweighed by the
waste of human cycles.

If Python would simply suck it up and eat the 20% performance hit, we could
_stop talking about the GIL_ and start optimizing code to get the 20% back.

------
dec0dedab0de
Could someone who really wants to get rid of the GIL explain the appeal? As
far as I understand, the only time it would be useful is when you have an
application that is

    
    
      1. Big enough to need concurrency.
    
      2. Not big enough to require multiple boxes.
    
      3. Running in a situation that cannot spare the resources for multiprocessing.
    
      4. You want to share memory instead of designing your workflow to handle messages or work off a queue.
    
    

#4 does sound appealing, but is it really worth the effort?

~~~
make3
There are many cases where the objects are too big to be passed around.
Python is used a huge amount in machine learning and data science, where being
able to do parallel work on data already in memory would be great.

~~~
smaddox
Can't this already be handled by calling out to a C/C++ or FORTRAN procedure
that processes the data in multiple threads? For number crunching, Python is
almost exclusively used as glue.

~~~
foobarchu
You CAN handle it, but why should you have to? If it's possible to remove that
barrier, then it absolutely should be removed. If the only answer to a problem
is "use another language", then the language in question has a limitation that
needs to be addressed.

~~~
Drdrdrq
It is not a limitation at all in this case. Python is just a front end to
TensorFlow and similar libraries/frameworks, so the GIL doesn't matter there.

------
wiremine
> We estimate a total cost of $50k...

Just looking at it from a financial perspective, having a great Python
interpreter that doesn't have a GIL seems like a no brainer for $50,000, and
it creates another reason why people should take a look at PyPy.

Side note: if you haven't looked at PyPy, check it out, along with RPython

[https://rpython.readthedocs.io/en/latest/](https://rpython.readthedocs.io/en/latest/)

~~~
andruby
How can they estimate this? What about all the libraries that might not be
compatible with the solution PyPy comes up with?

This feels like a number that might in the end blow up to 10x the original
estimate.

~~~
rguillebert
It's not the PyPy developers' job to make every Python library threadsafe,
people writing libraries will have to make their code threadsafe, like in
every other language.

~~~
taeric
There is a clear difference here, though. Making a change that could lead to
poorly written libraries now being broken is clearly the fault of the change.
Userspace for these libraries is defined by how it is, not how it was
intended.

(And really, was it intended to be dangerous in this way?)

~~~
masklinn
> There is a clear difference here, though. Making a change that could lead to
> poorly written libraries now being broken is clearly the fault of the
> change.

No, these libraries are already semantically broken, in the same way that
e.g. libraries which didn't properly close their files and assumed the CPython
refcounting GC would wipe their asses were broken.

They're already broken under two non-GIL'd implementations.

------
vladf
There seem to be a lot of naysayers in the comments about removing the GIL.
Multiprocess parallelism isn't always appropriate, so I find this to be a very
promising change that will definitely make me want to switch to PyPy. Here are
the use cases where I've found multiprocessing to be inappropriate:

* High-contention parallel operations. Doing synchronization through a Manager (a separate IPC-based synchronizing broker process) is of course less preferable than, say, a futex.

* Embarrassingly parallel small tasks. This is a big one. If the operation being parallelized is short, then message-passing overhead takes up more runtime than the operation itself, like a bad Amdahl's Law scenario. Shared address space multithreading solves this problem.

* Related: parallelization without the pickling headaches! Many objects can be synchronized but not easily pickled or copied. True multithreading would enable a large number of use cases (map a lambda instead of a named function, anyone?), since the same Python interpreter can just pass a pointer to a single shared object; see the sketch after this list.

* Related: lots of libraries (Keras, TensorFlow, for instance) make heavy use of module level globals, and aren't meant to be run on multiple cores on the same machine (TF, for instance, hogs all GPU memory). Multithreading in these deep learning environments (assuming PyPy support from those packages) is useful for parallelizing the input ingestion pipeline. But this point isn't TF/Keras dependent; I can't recall other modules but don't doubt the heavy use of module-globals that's unfriendly with fork()-ing, especially if kernel-related state is involved.
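
To make the pickling point concrete, a minimal sketch (stdlib only; the lambda
is the whole point):

    
    
        from concurrent.futures import ThreadPoolExecutor
        from multiprocessing import Pool
        
        if __name__ == '__main__':
            data = list(range(10))
        
            # Threads share one interpreter: an anonymous function is fine.
            with ThreadPoolExecutor() as ex:
                print(list(ex.map(lambda x: x * x, data)))
        
            # Processes must pickle the callable to ship it over IPC;
            # lambdas aren't picklable, so this raises PicklingError.
            with Pool() as pool:
                print(pool.map(lambda x: x * x, data))
    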

~~~
njharman
> Multiprocess parallelism isn't always appropriate

Using Python isn't always appropriate.

~~~
vladf
Are you saying that because a language is missing something, when considering
a fix for that thing, the existence of other languages/solutions is an
argument against that fix?

~~~
njharman
I'm saying that hammers and saws exist because more than one tool is needed
to solve problems.

------
cjbillington
This seems like a good place to spruik something I made, a Python package for
profiling how much the GIL is held:

[https://github.com/chrisjbillington/gil_load](https://github.com/chrisjbillington/gil_load)

In my experience, the GIL is not held for nearly as high a proportion of the
time as people think it is, because properly written C extensions and blocking
IO release the GIL. So long as the proportion of time the GIL is held is not
approaching 100%, you can still get gains from threading. This is almost
always the case in numerically heavy code that uses numpy or scipy, since
those extensions release the GIL. Threads work almost as well at speeding up
this code as they would in a GIL-free interpreter.
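
For example, something like this already scales across threads on stock
CPython, because numpy releases the GIL inside the BLAS call (the sizes here
are arbitrary):

    
    
        import threading
        import numpy as np
        
        a = np.random.rand(2000, 2000)
        
        def work():
            a @ a  # numpy drops the GIL for the duration of the multiply
        
        threads = [threading.Thread(target=work) for _ in range(4)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    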

And usually, long before you consider multithreaded code, you'll want to move
the bottlenecks of your code over into Cython or something, since that can
give speedup factors much larger than multithreading. In which case all you
need is a "with nogil:" around the meaty bit of your Cython code, and then it
too will be able to get speedups from multithreading.
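
A minimal sketch of what that looks like (a hypothetical Cython module; the
"with nogil:" block is the point):

    
    
        # fastloop.pyx, compiled with Cython
        def busy_sum(long n):
            cdef long i, acc = 0
            with nogil:            # other Python threads run freely here
                for i in range(n):
                    acc += i
            return acc             # the GIL is re-acquired after the block
    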

------
seunosewa
The ideal solution is for someone to design a new programming language that is
as similar to Python as possible without requiring a global lock. Rarely used
features that make it hard to parallelize Python would be dropped. STM might
be built into the language instead of being hacked into one implementation,
etc.

~~~
rburhum
So basically recreate an entire language and library ecosystem because there
is one feature that is less than ideal? I hope you realize why a better
approach may be to reengineer that one component...

~~~
dTal
Python has many less-than-ideal features. Do you think we finally got it
right, that we will use Python forever, and that the library work of the past
decade or so is irreplaceable?

"Is it possible that software is not like anything else, that it is meant to
be discarded: that the whole point is to see it as a soap bubble?" \-- Alan
Perlis

~~~
ehsankia
Just going from py2 to py3, which was a MUCH smaller change than a whole new
language, has taken a decade, and it's still far from over. I don't see how a
whole new language would be any better. And it's not like there's a lack of
new languages popping up left and right. There's a reason most of them just
die out. It's insanely hard to gain critical mass unless you have a huge
backer, like a whole organization or company using the language.

------
twoodfin
"It mostly works for simple programs, but probably segfaults on anything
complicated" is not a promising beginning. Starting with race condition chaos
and trying to patch your way out of it with "strategic" locking

a) Inspires much less confidence than starting with a known-correct locking
model (the degenerate case being a GIL) and preserving it while improving
available concurrency.

and

b) Seems at least 50/50 to end up without much in the way of tangible
scalability gains once enough locking has been added to reduce the rate of
crashes and data corruption to an acceptable (?!) degree. At least that was my
takeaway from all the challenges Larry Hastings has documented while working
on the gilectomy. Sure, they don't have to worry about locking around
reference counting, but it's not like writing a (concurrent?) GC operating
against concurrently executing threads isn't a significant design challenge
itself with many tradeoffs to make.

~~~
mcherm
> "It mostly works for simple programs, but probably segfaults on anything
> complicated" is not a promising beginning.

Perhaps they would have done better to say "it works correctly for all
programs that do not assume the built-in data structures are threadsafe". That
is an accurate description; what you quoted is a reasonable approximation.

------
devwastaken
This would be great if it means we can run the C portions of Python in threads
without performance hits. I recently started a little project that is a cross-
platform GUI for batch bzip2 compression, and Python did it quite well with
its built-in bzip2 module. But once I tried to do it in parallel, the
performance impacts of the GIL were obvious. Yes, you can work around that
with multi-process, but I'd rather not be spamming the running processes list
and have to actually handle separate processes that should be threads.

In the end I settled for C++ and Qt, using the native bzip2 library with a
few modifications.

~~~
monkmartinez
> I'd rather not be spamming the running processes list and have to actually
> handle separate processes that should be threads.

I may be a bit naive asking this... but why would you care that much?

Looking at activity monitor on my Mac, I count 14 Google Chrome Helper Process
instances each spawning upwards of 13 threads. Adobe does something similar,
as do several other programs/applications on my machine. Yet, my machine is
mostly idle.

I can only speak for myself here. If I want something done on my computer... I
don't care if it spams my process list if that is what it takes to complete
the task. Don't crash my machine, but do what you have to do to get it done
quickly.

~~~
devwastaken
This is a parallel compression application that uses all cores of a system by
default. On some systems it may max out the HDD, on others nearly 100% of the
CPU. It's meant to take up as many resources as it can unless its core usage
is lowered. But, as with any program that has a high workload, the potential
exists that the program's UI will not respond, or perhaps your desktop won't
even allow you to get to the UI to stop the process. This is where the task
manager saves the day.

Along with that, I like it to be a single process so it's easily wrappable in
whatever monitoring or process-throttling application you want. I will admit
I'm completely assuming that multiple processes are harder than a single
process to do that with.

Also, when you get up to a 16-thread count, seeing that many processes pop up
at the top of your process list is both annoying and doesn't easily show how
much the application is using overall. It could also be scary to some users
who have never seen that before and think it's trying to run a whole bunch of
programs.

Yes, some of those are clearly nitpicks and not good technical reasons, but
this is a problem that is fixed with a good framework anyways.

------
andreasgonewild
I just can't stop thinking that somewhere along the line one of the Guidos
should have reacted to handing out global locks left and right. I mean, it's
fine as long as it's only you and your friends using it. But once it starts
spreading, these are the kind of issues that need to be kicked out of the way
asap. Lock granularity affects the entire design of client code; reducing it
basically means rewriting everything.

Ah well, at least it serves as a warning sign for budding language composers
as myself. Snabel did full threading before it walked or talked:

[https://github.com/andreas-gone-wild/snackis/blob/master/snabel.md](https://github.com/andreas-gone-wild/snackis/blob/master/snabel.md)

And to any Pythoneers with sore toes out there: pick a better language or
learn to live with it, down-voting me will do nothing to solve your problems.
It's a tool, we're supposed to pick the best one for the job; not decide on
one for life and defend it to death. Imagine what could happen if language
communities started working together rather than competing. There is no price
to be won, we're all being taken for a ride.

------
Pxtl
Ick, I'd forgotten how monkeypatchable the core of Python was.

If not for that, I'd focus on supporting some kind of pseudo-process where
multiple instances of the Python interpreter could be loaded, but they would
only share pure-functional libs which, I assume, could be used in a threadsafe
fashion... but then you run into the mutability of those libs. Well, the
mutability of _everything_ in Python. Plus, what happens if those libs expose
anything that you could hold a reference to - what happens to refcounting in a
multithreaded Python?

Honestly, I feel like the world has passed Python by. At this point the cost
of its performance limitations don't seem to be worth its payoff. Not that
it's a bad language - I _like_ Python. I just don't really feel the need to
use it for anything anymore.

------
benhoyt
Excellent! Where's the Donate button or call to action for businesses who want
to support this? There's a small link in the sidebar to "Donation page", but
that doesn't seem to have a place to donate for the remove-the-GIL effort.

~~~
fijal
As mentioned in the blog post the individual donation buttons are not a
resounding success. I'm happy to sign contracts with corporate donors (or even
individuals) that we'll deliver. My mail should be public, if not #pypy on
freenode or fijal at baroquesoftware.com

~~~
btown
Is the issue that individual donations are unpredictable (and therefore
difficult to use as justification for such a large scope increase)? Would you
consider setting up something akin to a Patreon to allow individuals to commit
to recurring monthly support for the project?

~~~
fijal
The main issue is that the effort it takes to set up and maintain it greatly
outweighs the amount of money we get (typically). There is also complexity
with taxation, jurisdictions, and all kinds of mess that is usually very much
not worth a couple of dollars (e.g. $7/week on Gratipay).

------
memracom
Personally, I don't think the GIL matters. First of all, most of us run apps
on Linux, which has reduced the overhead of processes so much that threads
have lost much of their advantage. Secondly, people understand that locks are
generally a bad thing to use unless you really are a threading/locking rocket
scientist. Most mere mortal developers are better off using message queues.
Even the Java world has mostly given up locks in favor of java.util.concurrent,
which was implemented by serious experts to handle all of the corner cases
that you would not think of. Third, using an external message queuing system
like RabbitMQ gives you other benefits. And fourth, writing distributed apps
glued together by message queues helps you avoid the dreaded Big Ball of Mud.

At this stage in Python's evolution, I view the GIL removal as a computer
science project that some people will implement again, and again, just to
learn or to exercise their chops. Great idea! Just don't demand that the
entire community of Python developers goes down your road.

If CPython never gets rid of the GIL that suits me just fine. GIL free
programming can be done on other implementations of Python like Jython and
IronPython. As far as PyPy is concerned, as long as it does not disrupt the
use of PyPy as a means of speeding up a CPython app from time to time, then
have fun.

~~~
nurettin
Coming from mobile and desktop programming, most use cases I've seen for
threads revolve around doing something in the background to keep user
interfaces responsive. That use case already has a threadsafe queue: the UI
queue.

When your thread finishes or is ready to signal progress, you queue the event
to the UI thread and forget about it.
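
Roughly this pattern, as a minimal stdlib sketch (no particular GUI toolkit
assumed):

    
    
        import threading, queue
        
        ui_events = queue.Queue()  # the thread-safe "UI queue"
        
        def background_job():
            result = sum(range(10 ** 7))     # slow work, off the UI thread
            ui_events.put(('done', result))  # hand the result to the UI thread
        
        threading.Thread(target=background_job, daemon=True).start()
        
        # The UI thread drains this from its event loop (e.g. a timer callback).
        event, payload = ui_events.get()
        print(event, payload)
    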

Now, I've been following this pattern for a long time and have no experience
dealing with the GIL. How is this removal of the GIL going to affect this use
case, if at all?

------
jtchang
Just reading this post makes me think that it could do with a bit more
"marketing" speak. I love Python. I use it day to day and realize there is a
GIL.

But give me some business reasons as to why removing the GIL is critical. Will
it save me a ton of money? Will my stack magically just run faster?

I wonder if Google has already done so, since they would benefit quite a bit
from a GIL-less Python.

------
wyldfire
> We have some money left in the donation pot for STM which we are not using;
> according to the rules, we could declare the STM attempt failed and channel
> that money towards the present GIL removal proposal.

I didn't donate to that pot but that does seem like a judicious and reasonable
step to take given the assessment of STM.

------
alfanerd
I just made a quick test: CPython-3.6.1 vs. Jython-2.7.0 (May 2015)

I ran Larry Hastings' Gilectomy test program x.py: fib(40) on 8 threads.
Hardware: MacBook Pro, 2015, 8 (4+4) cores, 1Gb RAM.

Jython ran the program 8 times faster, utilising all 8 cores at >95%. Python
ran on 1-2 cores at less than 60% utilisation. (Pretty sure Jython will run 16
times faster on 16 cores.)

It's 2017, why this is acceptable to GvR and the Python community is beyond
me.

    
    
      Jython:
      real	1m4.959s
      user	7m38.521s
      sys	0m2.396s
      
      Python:
      real	8m19.035s
      user	8m16.508s
      sys	0m11.424s

~~~
zepolen
Could you post the code for that fib program? I couldn't find it anywhere.

~~~
alfanerd
It is in the Gilectomy branch of Larry Hastings' GitHub project.
([https://github.com/larryhastings/gilectomy](https://github.com/larryhastings/gilectomy))

I also pasted the fib test to pastebin:
[https://pastebin.com/Ryyb2K7V](https://pastebin.com/Ryyb2K7V)
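
The gist of it is just this (a from-memory reconstruction, not the exact
file):

    
    
        from threading import Thread
        
        def fib(n):
            if n < 2:
                return n
            return fib(n - 1) + fib(n - 2)
        
        # Eight CPU-bound threads with no shared data: embarrassingly
        # parallel, yet CPython serialises them on the GIL.
        threads = [Thread(target=fib, args=(40,)) for _ in range(8)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    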

~~~
zepolen
Ah, so just the naive recursive Fibonacci on 8 threads with no data sharing
between them.

Interestingly, doing the same on CPython using the multiprocessing module was
~2x slower than jython/threads. More interestingly, PyPy with multiprocessing
was ~5x faster than jython/threads.

    
    
      $ time jython fib.py 40
      real	1m11.247s
      user	6m14.130s
      sys	0m3.012s
      
      $ time python fib.py 40
      real	2m4.067s
      user	11m46.103s
      sys	0m2.352s
      
      $ time pypy fib.py 40
      real	0m21.040s
      user	1m51.461s
      sys	0m1.892s

------
zzzeek
Super glad they're going to try using mutexes and not that STM approach,
which was looking to be immensely complicated. I was not looking forward to
the kinds of interpreter bugs that approach was going to produce.

------
ComputerGuru
I can't believe it's 2017 and the official pypy updates come from blogspot; I
thought this was a plea from a community member.

Anyway, really good on them to finally move on killing the GIL. It's been a
long-time issue - the type that only gets worse the longer you ignore it. That
said, I think today Python and the GIL are synonymous, and the entire Python
ecosystem has almost evolved around the GIL. While I'm sure there are
applications that would benefit from its removal, I think on the whole the
ecosystem will not change much because of this.

------
ericfrederich
Perhaps a little unrelated, I used the rpyc package to get Jython and CPython
working together. In the end I was able to use Java libraries from CPython
pretty much seamlessly.

~~~
vram22
You mean you used RPyC at both ends, on the CPython side and on the Jython
side. Cool idea. I knew about RPyC but had not thought of using it in this
way. And getting access to Java libraries by doing this, can be very useful, I
can see.

------
andy_ppp
Great, and I want to be a billionaire with washboard abs. The main problem is
that none of the Python code that currently exists is thread safe, so you
might as well start again from scratch. Python is a needlessly complicated
language with two important things: Numpy and TensorFlow. These use Python as
a scripting language for C. Just move to Go, Scala, or Elixir/Erlang if you
want to avoid the GIL (or write anything parallel). You can thank me later!

------
macrael
Do people here use pypy in production? What are the benefits?

~~~
sillysaurus3
Sure. Free 2-5x speedup. pypy + pypy's pip generally works as a transparent
drop-in replacement for python + python's pip, so it's free speed.

It doesn't (or didn't) work when you need to rely on an extension that uses
Python's C API. I haven't followed the scene in a while, so maybe that's
changed. pypy's pip has so many libraries that I hardly notice, so maybe they
solved that.

Unfortunately, Python is fundamentally slower than Lua or JS, possibly due to
the object model. Python traps all method calls, and even integer addition,
comparisons, and so on are treated as metamethods. That's the case for Lua
too, but e.g. it's absurdly easy to make a Python object have a custom length,
whereas Lua didn't have a __len__ metamethod until after 5.1. I'm not sure it
even works on LuaJIT either. Probably in the newer versions.
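
For instance, all of these hooks are plain Python, which is what makes them so
easy to write and so hard to compile away:

    
    
        class Box:
            def __init__(self, items):
                self.items = items
        
            def __len__(self):            # traps len(box)
                return len(self.items)
        
            def __add__(self, other):     # traps box + box
                return Box(self.items + other.items)
        
            def __getattr__(self, name):  # traps any missing attribute
                return 'no attribute %r' % name
        
        b = Box([1, 2]) + Box([3])
        print(len(b), b.whatever)         # 3 no attribute 'whatever'
    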

~~~
JulianWasTaken
I can't tell what you mean by the last paragraph there, but oftentimes PyPy's
speedups come exactly from inlining the kind of thing you refer to --
Python's not fundamentally slower; it's exactly those kinds of things that can
be sped up.

(And yeah the CPython API is still a pain point if you've got a library that
uses it, although some stuff will still work using PyPy's emulation layer.
It'd be great if people stopped using it though.)

~~~
sillysaurus3
For example, Python makes it fairly easy to trap a call to a missing method,
both via __getattr__ and __missing__. In JS the only way you can do that is
via Proxy objects, and even those have limits.

You can't always inline the arithmetic ops effectively. You can recompile the
method each time it's called with different types, but that's why the warmup
time is an issue. This wouldn't be a problem if Python didn't make it so
trivial to overload arithmetic. JS doesn't.

~~~
JulianWasTaken
Ah! Yes, agreed, Python does certainly make it too easy to do things that
cannot reasonably be sped up.

~~~
sillysaurus3
Twist: Lua makes it trivial to overload arithmetic using metatables, but
LuaJIT seems to have solved that. If there is any warmup time, it's hard to
tell. Mike Pall is a JIT god, and I wish we had more insight into everything
that went into producing one of the best JIT's of all time.

I'd love a comment/post that highlights the differences between JS and Lua
_as the reason why_ LuaJIT was able to be so effective. There must be
differences that make Lua possible to speed up so much. There are easy ones to
think of, but the details matter a lot.

EDIT: I found some discussion at
[https://news.ycombinator.com/item?id=1188246](https://news.ycombinator.com/item?id=1188246)
but it left me wanting more.

Related:

[https://stackoverflow.com/questions/4911762/why-is-luajit-so-good](https://stackoverflow.com/questions/4911762/why-is-luajit-so-good)

[http://article.gmane.org/gmane.comp.lang.lua.general/58908](http://article.gmane.org/gmane.comp.lang.lua.general/58908)

[http://lua-users.org/lists/lua-l/2010-03/msg00305.html](http://lua-users.org/lists/lua-l/2010-03/msg00305.html)

~~~
sillysaurus3
More:
[https://www.reddit.com/r/programming/comments/1r2s82/lua_fun...](https://www.reddit.com/r/programming/comments/1r2s82/lua_fun_is_a_highperformance_functional/)

------
cool-RR
Python's GIL issue is like the Israeli-Palestinian conflict.

1\. People like to talk about it a lot, complain about it and say their
opinion of what should be done with it.

2\. It's not likely to be resolved for years to come.

3\. In the end, the problem has very little effect on people's lives, much
much less than the amount of hype around the issue.

------
fijal
Hi, blog post author here. Let me put an offer here:

If you want to ask a question that warrants a response (as opposed to
promoting your own effort, which is valid but does not warrant a response),
please mail me; the mail is public, and I'll post the responses publicly on
either my blog or the pypy blog.

------
sobkas
>fully working PyPy interpreter with no GIL as a release, possibly separate
from the default PyPy release

I have concerns that if such functionality is not in the main release and
enabled by default (and consequently doesn't get as much testing), it will
just bitrot and, in the end, be removed.

~~~
_wmd
They're asking for funding to spend on a risk-free attempt at GIL removal
(risk-free since it won't bone PyPy mainline), if the attempt meaningfully
succeeded I'd imagine their next step would be making it the default.

A fully functional PyPy that could do heavy math in multiple threads would be
an amazing tool in the box, but there are plenty of risks to that (penalizing
single threaded performance, for example). So this strategy makes plenty of
sense to me.

They can't just do it on mainline from the outset because there are huge
obstacles to overcome... for example, that ancient foe, CPython extension
interface compatibility, which assumes a single global lock covering all
mutable extension data. I don't think there will ever be a way around
maintaining the GIL for that, even if pure Python code can freewheel it
otherwise.

------
issaria
It's not going to happen, you not only have to fix all the legacy code, but
also fix the developers.

------
stuaxo
I liked "STM" as a big-idea approach, but can see how going the traditional
way may bear fruit more quickly.

Could the experience gained this way (and by other projects such as the
gilectomy) help with a future STM attempt?

I wonder if we need better hardware for STM to work well too.

------
vasilakisfil
Just curious: if they solve it in Python, would it be possible to solve it in
Ruby too?

~~~
taf2
If I understand correctly the issue in Ruby is the existing C extensions that
have been written to assume the lock exists...

~~~
myusernameisok
It's the same issue with Python. AFAIK there are a number of Python libraries
that are not thread-safe, and the GIL prevents them from being an issue.

~~~
aidenn0
I thought the GIL was not held during execution of foreign code in Python (at
least, that was one point given for why the GIL wasn't a big deal in
practice).

~~~
dom0
No, it must be explicitly released. The GIL must be held to call into almost
all of the Python runtime (main exceptions: acquiring the GIL, the low-level
allocator).

------
unkown-unknowns
> If we can get a $100k contract, we will deliver a fully working PyPy
> interpreter with no GIL as a release, possibly separate from the default
> PyPy release.

If done as a separate release, will that version be maintained in the future?

------
Beltiras
Think about a recursive function whose implementation is changed while it is
running. The replacement might have an entirely different algorithm. Which
version finishes the call that is already on the stack?

~~~
chrisseaton
The version that was originally activated. I think that's the case in every
single parallel implementation of a programming language ever. I can't imagine
it working any other way.

When you redefine a method in any language I'm aware of you just change which
method the name points to. You don't modify the original method.

~~~
Beltiras
So a function:

    
    
        def fun(*args):
            if not args:
                return 0
            return fun(*(args[1:]))
    

would be call-by-address after the first invocation? It could be
lookup-by-name by way of the code object.

~~~
chrisseaton
The naive implementation, and the semantic model, is always lookup-by-name on
every invocation.

In practice we apply speculative optimisations including inline caching and
guard removal with remote dynamic deoptimisation via safe points to make it a
direct call instead.
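
A tiny single-threaded demonstration of those lookup-by-name semantics (the
rebinding is contrived, but the mechanics are the same):

    
    
        def fun(n):
            if n == 0:
                return 'old'
            globals()['fun'] = lambda n: 'new'  # rebind the name mid-recursion
            return fun(n - 1)                   # this lookup sees the rebinding
        
        print(fun(3))  # prints 'new': the recursive call found the new definition
    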

------
tyingq
Would this work with cpython extensions that were ported to PyPy?

~~~
fijal
They would run under GIL (I can't see CPython C API being thread-friendly
unless gilectomy succeeds)

~~~
tyingq
Ah, ok. So this approach doesn't completely remove the GIL, but removes it as
a barrier for pure python code running in PyPy?

Or does it break the current support for porting cpython extensions?

~~~
fijal
It removes it for pure Python code. The C extensions run under the lock
(which it would be unfair to call an _interpreter_ lock any more).

------
est
Sub-interpreters look like an interesting idea. I wouldn't mind being limited
to a few primitive immutable objects shared between threads, as long as
something can be shared.

------
faragon
Does it work also in ARM architecture ("weak" memory model), or just in
x86/x86-64-like ("strong" memory model)?

------
denfromufa
Will this work with CFFI?

------
sandGorgon
This is a PERFECT use case for Kickstarter. It makes me sad that this is a
blog post that made it to number 1 on HN, with a vast readership with open
purse strings, yet there is no campaign fundraising link.

Use Kickstarter or Plasso to sell a pypy pro license - it's so much easier
for companies to pay invoices than to donate.

If nothing else, I would pay for an official conda pypy package which works
seamlessly with pandas and blas.

~~~
dmix
> yet there is not a campaign fundraising link.

Did you read the article? They said in the article they aren't asking for
individual donations at the moment:

>> we would like to judge the interest of the community and the commercial
partners to make it happen (we are not looking for individual donations at
this point)

Plus I'm sure they will consider using Kickstarter when the time comes.

~~~
dweekly
Cash is one objective way to discern interest.

------
smegel
Removing it is the easy part.

