
Subinterpreters for Python - lukastyrychtr
https://lwn.net/SubscriberLink/820424/172e6da006687167/
======
gtrubetskoy
Subinterpreters existed from the very early days in the C API and were key to
the implementation of mod_python (which I wrote). So if you used mod_python,
you used subinterpreters without realizing it.

[http://modpython.org/live/current/doc-html/pythonapi.html#mu...](http://modpython.org/live/current/doc-html/pythonapi.html#multiple-interpreters)

EDIT: And it looks like I had subinterpreters in the first released version in
May 2000, so the initial git (formerly SVN) commit already had them
[https://github.com/grisha/mod_python/blob/9b211b7e8a65f1af4b...](https://github.com/grisha/mod_python/blob/9b211b7e8a65f1af4b5facd29797703f8e76d97b/src/mod_python.c#L1874)

EDIT2: Just noticed this comment:

    
    
      * Nov 1998 - support for multiple interpreters introduced.

~~~
BiteCode_dev
How did you deal with C extensions, since apparently most don't support
subinterpreters at all (which is a shame; it seems we messed up culturally here).

~~~
gtrubetskoy
I didn't :)

------
geofft
From the PEP:
[https://www.python.org/dev/peps/pep-0554/](https://www.python.org/dev/peps/pep-0554/)

> _A common misconception is that this PEP also includes a promise that
> subinterpreters will no longer share the GIL. When that is clarified, the
> next question is "what is the point?". This is already answered at length in
> this PEP. Just to be clear, the value lies in:_
    
    
        * increase exposure of the existing feature, which helps improve
          the code health of the entire CPython runtime
        * expose the (mostly) isolated execution of subinterpreters
        * preparation for per-interpreter GIL
        * encourage experimentation
    

I think I'll ask the followup question - what is the point _of those_? Why
should we increase exposure of an existing feature we know is not fully baked
and we know will cause problems with NumPy/SciPy? How will the exposure
improve the code health of CPython and who will do the improvement? What is
the advantage in exposing isolated execution of subinterpreters? In what way
does exposing this feature help prepare for a per-interpreter GIL? What
experiments are being encouraged specifically?

~~~
mehrdadn
I think the Rationale section tries to explain the benefits?
[https://www.python.org/dev/peps/pep-0554/#rationale](https://www.python.org/dev/peps/pep-0554/#rationale)

What I don't get is how they expect C extensions to adapt. I don't expect
widespread support for this for like... a decade?

~~~
geofft
The Nick Coghlan quote seems a bit out of context - sure, there are all these
things it does that multiprocessing doesn't do, but multiprocessing _allows
concurrency at all_ and this doesn't. The comparison makes sense if it's
talking about no-shared-state concurrency, but this PEP is quite clear that
it's only proposing no-shared-state ... single-threaded operation.

The rest of the section is sparse on what you actually do with it. It says
that it "has the potential to be a powerful tool" but not what you do with the
power. It says it's about "enabling the fundamental capability of multiple
isolated interpreters" but it's not clear what the capability brings you. The
only thing with some details is the first sentence - "Running code in multiple
interpreters provides a useful level of isolation within the same process."
But what is the level? It sounds like it gives you isolation for most things
but not all, which is about as useful as a face mask with a breathing hole cut
out of it.

It'd help to see a concrete answer of something you can build with this that
you can't build without it (or perhaps not as
easily/performantly/reliably/etc.). The Ceph PR, where they do something
similar themselves
([https://github.com/ceph/ceph/pull/14971](https://github.com/ceph/ceph/pull/14971)),
gives a good answer: "Notably, with this change, it's possible to have more
than one mgr module use cherrypy." (But it's still not clear to me why the mgr
can't just use multiple Python interpreters... ceph-mgr is kind of an amalgam
of various useful services for your Ceph cluster, and it's never been clear to
me why it has to be "the" manager with some modules instead of various
independent services you can turn off/on as you need.)

~~~
pdonis
_> multiprocessing allows concurrency at all and this doesn't_

I do not think multiprocessing will meet the concurrency needs of many
potential applications.

For one thing, I've looked at the multiprocessing source code, and to me it
looks like a hairball of bugs just waiting to happen. To run Python code in a
separate process with some set of initial data, it basically pickles it in the
parent and then unpickles it in the child. That seems awfully kludgey to me.
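The pickle round trip that hand-off relies on can be seen without multiprocessing at all. A minimal sketch (the payload dict is made up for illustration):

```python
import pickle

# multiprocessing moves work across the process boundary exactly this way:
# pickle the payload in the parent, unpickle it in the child.
payload = {"task": "resize", "args": (800, 600)}
wire_bytes = pickle.dumps(payload)   # what the parent would send
received = pickle.loads(wire_bytes)  # what the child would reconstruct
assert received == payload

# Anything unpicklable (a lambda, an open file, a lock) can't make the trip,
# which is one source of surprising runtime failures:
try:
    pickle.dumps(lambda x: x)
    crossed = True
except Exception:
    crossed = False
assert crossed is False
```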

For another, if you have multiple concurrent tasks that have to share data,
doing it within a single process between multiple threads is a lot easier than
doing it between multiple processes. Every other major language allows the
former; only Python has a GIL that makes it basically useless.

~~~
loeg
> To run Python code in a separate process with some set of initial data, it
> basically pickles it in the parent and then unpickles it in the child.

FWIW, the subinterpreter threading scheme will also use a similar message
passing construct to pass values between interpreters. As the sibling comment
mentions, it's just a message passing scheme.

That said,

> I've looked at the multiprocessing source code, and to me it looks like a
> hairball of bugs just waiting to happen.

I have as well, and completely agree.

~~~
stuaxo
Came here just to agree.

------
VWWHFSfQ
I feel like Lua would have gotten more traction in the early days, and maybe
eclipsed Python, if it wasn't so hyper-focused on being a tiny embeddable
language/runtime. It was so well-designed from the very beginning.

Every new release of Python seems like it gets bigger and bigger and more and
more incomprehensible. The Python 2 -> 3 transition was a (necessary?)
disaster. And now we're trying, in earnest, to figure out how to get rid of
the GIL. The async/await syntax is a whole other fiasco. Now we have colored
functions all over the place. Python code doesn't even work with Python code.
It's just an absolute mess.

~~~
luhn
What's wrong with the async/await syntax?

~~~
VWWHFSfQ
It should have been implemented as functions or attributes instead of language
syntax sugar.

Like Lua did. Like Rust did.

Imagine this:

    
    
       def foo():
           return "bar"
    
       async def foo():
           return "bar"
    

How do you know how to call that function? It's called the same thing. It has
the same signature. It returns the same thing. One is sync, the other is
async. This is "colored functions".

You have to call the 2nd one like:

    
    
        await foo()
    
    

why do you have to care about that? It's either a coroutine or it's not. All
functions in Python should just be callable like normal functions. But the
Computer Scientists went and messed that up and made colored functions.

It's absurd.
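A minimal demonstration of the mismatch, using nothing beyond the stdlib (the second function is renamed only so both can coexist in one file):

```python
import asyncio

def foo():
    return "bar"

async def foo_async():
    return "bar"

result = foo()           # "bar"
pending = foo_async()    # a coroutine object, NOT "bar"
assert result == "bar"
assert result != pending

# the async one only produces its value via await, inside an event loop
assert asyncio.run(foo_async()) == "bar"
pending.close()          # suppress the "coroutine was never awaited" warning
```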

~~~
pdonis
I don't mind the "await foo()" so much; as others have noted, it makes
explicit that your code is yielding control at that point.

What I mind is having to do "async def", "async for", "async with" all over
the place for no good reason. Python didn't do that with generators; you
defined an ordinary function, and if it had a "yield" somewhere in its body,
it returned a generator. I don't have to do "gen def", "gen with", "gen for",
etc. to remind the interpreter that it's a generator.

The obvious way to handle async coroutines is the same: you define an ordinary
function, and if it has "await" somewhere in its body, it's a coroutine. I
shouldn't have to be reminding the interpreter everywhere that it's a
coroutine.

(The reason "await" is a special case is that, with a generator, it's just an
ordinary iterable, so it's not yielding control if I just use it like any
other iterable, say in a for statement. If I want to do something with the
generator that might yield control (as in the "using generators as coroutines"
paradigm that came before async/await), I have to call its "send()" method, or
otherwise do something special that will stand out. So there's no need to add
a keyword for it. But with a coroutine, ordinary call sites that wouldn't
yield control with any other kind of function can yield control, so it makes
sense to add a keyword to make that explicit.)
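The generator behavior described above, as a quick sketch:

```python
# No "gen def" needed: the yield in the body is what makes this a generator.
def countdown(n):
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)
assert list(gen) == [3, 2, 1]

# Calling it runs none of the body; control only transfers when you iterate,
# or explicitly via send() -- the standout call site mentioned above.
gen2 = countdown(5)
assert gen2.send(None) == 5
```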

~~~
BiteCode_dev
I have the opposite opinion. I think generators should have had an explicit
keyword for defining them, like:

gen def stuff()

Because it doesn't, people are very confused about generators, and think they
are functions, which they are not. They are completely different objects.

Plus there is no way to tell if a definition is a generator without reading
all of it.

Explicit is better than implicit and all that.

------
trentnelson
> NumPy core developer Sebastian Berg chimed in as well. He suggested that it
> could take up to a solid year of work to support subinterpreters in NumPy.

Whoa, that's a shame. I actually found it really easy to patch NumPy to make
it PyParallel-compatible -- simply had to tweak the memory allocator stuff:
[https://github.com/pyparallel/numpy/commit/046311ac1d66cec78...](https://github.com/pyparallel/numpy/commit/046311ac1d66cec789fa8fd79b1b582a3dea26a8)

Example that loaded a 12GB NumPy array and serviced requests in parallel:
[https://github.com/pyparallel/pyparallel/blob/branches/3.3-p...](https://github.com/pyparallel/pyparallel/blob/branches/3.3-px/examples/wiki/wiki.py).

I really wish the PyParallel approach gained more traction. Having the
solution Windows-only reaaaaally didn't help with having other core developers
experiment with the approach.

[*]: [https://pyparallel.org](https://pyparallel.org)

~~~
anentropic
This seems great!!

Obviously, being Windows-only is a big drawback - is there any hope it could
be implemented for other platforms some day? (I mean in terms of technical
feasibility, rather than effort, money etc)

I know literally nothing about the OS side of things relevant to this topic,
but googling for "async io linux" turns up io_uring which seems to date from
2019ish and is maybe addressing some of what's lacking there?

~~~
trentnelson
If you're looking for a bit more info on the Windows-only aspect, there is a
lot of detail here: [https://speakerdeck.com/trent/pyparallel-how-we-removed-the-...](https://speakerdeck.com/trent/pyparallel-how-we-removed-the-gil-and-exploited-all-cores).

I think the TL;DR though is that PyParallel was a successful proof-of-concept
(but seems to have failed at moving the bar anywhere) _because_ the threadpool
and async I/O primitives on Windows are so much more sophisticated than what's
available on any other platform.

On Linux/Mac, I'd have to write _so_ much scaffolding to get the same baseline
functionality offered by Windows, and many parts wouldn't even be possible to
replicate. It'd be a huge engineering effort that would take a team of people.
(Just like all the Vista+ threadpool stuff took a team of kernel engineers
working for years at Microsoft.)

That being said, I've been looking at stuff like the Chromium cross-platform
threadpool stuff recently and that could potentially be used as a substitute
(I believe it maps 1:1 with native threadpool APIs on Windows, and mimics the
best it can on Linux/Mac). But that's an unwieldy 3rd party package for Python
to suddenly depend on.

I also disagree with the sentiment that the GIL, parallel computing and async
I/O are all separate, orthogonal pieces. The reason PyParallel was so
performant was the fact that I treated all three as very intertwined concepts
that had to be addressed all at once.

------
aasasd
As usual, when someone makes a move towards the Coveted Feature, people
promptly insist that gradual progress is impossible and the only way is with
every problem being solved at once in the hundred million modules—each somehow
taking a year to undo the use of global variables. I.e. same as it was for the
past twenty years when gradual progress could've been made. Deference is again
made towards Numpy, the de-facto implementation of Python. No trace of choice
is offered to users who would perhaps use a handful of modules from those that
happen to be converted at a point in time, or use Python in ways that aren't
practical now.

The actual problem, of course, is that multi-threaded Python would still be
slow, and those who see it as just a shell to run C modules do have a point.

------
jonathanpoulter
This seems like such a significant problem for the language and has been
around since the beginning of time. Considering the amount of value that
Python adds to companies and individuals around the world, is there a reason
that institutions or someone with the means hasn't funded a project to "solve"
the GIL problem?

~~~
acdha
It's far more common to have Python programs be I/O bound, and when they are
CPU bound it's often not due to GIL contention (remember that C extensions can
drop the GIL before doing something lengthy). It would be nice if the GIL was
gone but a fair fraction of Python developers would not notice much
difference.
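For example, CPython's own hashlib documents that it releases the GIL while hashing buffers larger than about 2 KiB, so CPU-heavy work in that C code can already overlap across threads today. A rough sketch (not a benchmark):

```python
import hashlib
import threading

data = b"x" * (16 * 1024 * 1024)  # 16 MiB: large enough that the GIL is dropped

digests = []
lock = threading.Lock()

def work():
    d = hashlib.sha256(data).hexdigest()  # GIL released during the hashing
    with lock:                            # plain Python code still needs locks
        digests.append(d)

threads = [threading.Thread(target=work) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert len(digests) == 4 and len(set(digests)) == 1
```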

~~~
pdonis
_> It's far more common to have Python programs be I/O bound_

But part of the reason for that is that Python programmers know there's no
point in trying to run multiple CPU-bound tasks in the same Python process, so
they don't try.

 _> C extensions can drop the GIL before doing something lengthy_

Yes, but they're still limited in what they can do--as soon as they call back
into a Python bytecode they're GIL-bound again. And if you depend on C
extensions any time you need CPU bound concurrent tasks, you're giving up a
lot of the advantages of using Python in the first place.

~~~
acdha
It’s part of the reason, but not all of it or, I suspect, even most: the
number of things people need to compute with multithreading but without much
I/O is not an especially large fraction of what pure Python is used for. If
you need raw CPU speed for arbitrary code, it’s not the language most people
would pick.

The exceptions also tend to have existing high-quality extensions (crypto,
compression, image processing, etc.) so while it’s technically true that
you’re giving up Python most people aren’t doing that personally - they’re
just calling Pillow or numpy - or they’re using it for a tiny fraction of the
total program.

Frequently this ends up being the same speed or even faster than using other
languages because most people are either using the same C libraries or
learning just how many optimizations their simple implementation lacked.

Again, it’s not hard to come up with things where the GIL is inarguably a
bottleneck but it comes up a lot more in debates than real-life in my
experience.

~~~
andreareina
I'm one of those with embarrassingly-parallel cpu-bound workloads. The
multiprocessing module works, but the extra bookkeeping and plumbing over an
actually-parallel-multithreading implementation is a pain in the butt. That
said, getting the speedup that way is still both faster and easier than
porting to another language.
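For comparison, the multiprocessing version of an embarrassingly-parallel job is short but drags in real plumbing: worker functions must be importable top-level names, arguments are pickled across the boundary, and a start-method guard is required. A minimal sketch:

```python
import multiprocessing as mp

def cpu_bound(n):                    # must be a top-level, picklable function
    return sum(i * i for i in range(n))

if __name__ == "__main__":           # guard required for spawn start methods
    with mp.Pool() as pool:          # one OS process per worker
        results = pool.map(cpu_bound, [50_000] * 8)
    assert len(results) == 8 and len(set(results)) == 1
```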

~~~
acdha
Definitely - I used to support a computational lab and have worked on enough
other CPU-bound problems to have plenty of things which I wouldn’t recommend
Python for. I just think that as a field we’re predisposed to focus on that as
the kind of work Real Programmers™️ do when most groups are limited by their
ability to implement and maintain business logic long before they hit the wall
on what Python can do.

------
hhas01
I recall digging into Python subinterpreters a decade ago. Abandoned it
because the real killer wasn’t shared GIL, it was shared modules. If one
subinterpreter imports a module and modifies that module’s state, then every
other subinterpreter that uses that module is impacted too.

Article says nothing about that, which makes me very cautious. The GIL only
impacts performance; module sharing wrecks robustness.

Even if each subinterpreter does now keep its own module cache, there’s still
the challenge of working safely with common C-level resources such as file
handles. (Other than telling users “don’t do that”—and GLWT.)
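Even within a single interpreter the effect is easy to demonstrate: every importer gets the same module object, so one consumer's mutation is every consumer's problem. A sketch (shared_cfg is a made-up module name, fabricated so the example is self-contained):

```python
import sys
import types

# fabricate a module in-process rather than on disk
cfg = types.ModuleType("shared_cfg")
cfg.settings = {"debug": False}
sys.modules["shared_cfg"] = cfg

import shared_cfg as used_by_plugin_a  # both imports return the SAME object
import shared_cfg as used_by_plugin_b

used_by_plugin_a.settings["debug"] = True
assert used_by_plugin_b.settings["debug"] is True  # plugin B is impacted too
```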

I’ve used Python now for 17 years and it’s been a very productive tool for me.
But it definitely has its baked-in limitations and fighting those is an
exercise in rapidly diminishing returns.

Something as fundamental as parallelism can’t just be slopped on top of a
language as an afterthought; it needs to be designed in from the start. So
rather than trying to retrofit bad parallelism onto Python, perhaps it’d make
more sense to bring the positive parts of Python over to something that
already does parallelism right, such as Erlang, and build from there.

------
shanemhansen
I don't see the benefit of a subinterpreter compared to a subprocess. I wish
python didn't have a GIL, but I don't see what problems this would solve for
me that multiprocessing doesn't.

~~~
joelthelion
Performance, I suppose? Spawning a process is expensive.

~~~
xioxox
I understood that on Linux using a process was similar in speed to a thread,
though they're much slower on Windows. Has that changed?

For my python multi core code I like using fork with sending objects over a
socket with pickle. It gives more control than multiprocessing.

It works pretty well. One downside of fork with Python, however, is that the
reference counters are scattered about, leading to big COW memory churn. Big
numpy arrays should stay shared, though.
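A stripped-down sketch of that fork-plus-pickle-over-a-socket pattern (Linux/macOS only, since it relies on os.fork; real code needs message framing and error handling for payloads bigger than one recv):

```python
import os
import pickle
import socket

parent_sock, child_sock = socket.socketpair()

pid = os.fork()
if pid == 0:                                     # child process
    parent_sock.close()
    task = pickle.loads(child_sock.recv(65536))  # unpickle the work item
    child_sock.sendall(pickle.dumps(sum(task)))  # pickle the result back
    child_sock.close()
    os._exit(0)
else:                                            # parent process
    child_sock.close()
    parent_sock.sendall(pickle.dumps([1, 2, 3, 4]))
    result = pickle.loads(parent_sock.recv(65536))
    parent_sock.close()
    os.waitpid(pid, 0)
    assert result == 10
```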

~~~
BiteCode_dev
Spawning the process is not the only problem; communicating between processes
is also expensive.

------
davidw
Tcl has had those for...20 years or so? They're pretty handy for some things.

~~~
bch
Absolute sandboxing, with the option to engage a “safe” interp that also
(tuneably) segregates it from OS facilities like the network and filesystem.

/s Will be looking forward to Pythonistas to be extolling the virtues of
multiple interps as much as hearing node folk go on about event-oriented
programming ;)

Nb: Much love to both node and python. Looking forward to seeing python
exercise this.

------
pinopinopino
This is pretty common for languages right? TCL and the JVM come to mind. Nice
for code running in a sandboxed environment, especially if you have full
control. I think python is getting more interesting lately, always found it a
bit of a boring language.

------
coldtea
> _In particular, giving each subinterpreter its own global interpreter lock
> (GIL) is not (yet) on the table._

What's the use then?

~~~
NegativeLatency
[https://news.ycombinator.com/item?id=23172884](https://news.ycombinator.com/item?id=23172884)

~~~
coldtea
So, basically, no real reason besides "preparation for per-interpreter GIL"...

I, for one, wouldn't be encouraged to use it in this state, where it doesn't
really bring any benefits yet!

------
dilandau
I suspect at this point it will require a new interpreter to fully address the
deeply-embedded assumptions about memory access and bytecode execution in
CPython. This has been tried many times, though, with only PyPy really seeing
much traction. CPython being the de-facto interpreter, and the huge ecosystem
of widely-used C extensions, makes any change likely to be slow... and the
upside would need to be very compelling.

I think what we see instead is people moving to different runtimes altogether
-- golang, jvm, v8, whatever.

I would suggest that Python has gone off the rails trying to mimic other
languages too much recently. At some point the identity crisis hopefully ends
and Python will return to its strengths -- which are best expressed by the
Zen of Python. Until then, get ready for more features.

~~~
BiteCode_dev
See HPy for another solution to that: [https://speakerdeck.com/antocuni/hpy-a-future-proof-way-of-e...](https://speakerdeck.com/antocuni/hpy-a-future-proof-way-of-extending-python?slide=1)

------
anentropic
My first thought was, like many others, "if they still share a GIL then why
bother?"

But if this is a first step along the road to that, then it's all good and why
not.

Another thought is: could this be a "thread-safe alternative to gevent" in
some use cases?

I'm thinking particularly of web app hosting situation where you have
something like uWSGI+gevent+your WSGI app

In that case, apart from the required gevent monkey-patching (which presumably
would not be needed with subinterpreters), your web app code does not
explicitly do anything with gevent. But it is running in a gevent thread, so
your web app code now has a non-obvious requirement to be thread-safe. This
has bitten me with subtle bugs in the past.

I would be interested to explore the characteristics of a
uWSGI+subinterpreter+WSGI app set up. It should avoid the thread-safety issue.
Like gevent threads, the subinterpreters would share a GIL. I guess it might
use more memory?

------
jamestimmins
I really enjoy this style of article. Looks like the author based most of this
off of mailing-list interactions. There's clearly fascinating information
within those threads, despite how difficult they are to follow retroactively.

~~~
roelschroeven
Absolutely. LWN has quite a lot of those; it's one of the main reasons I have
a subscription there.

------
boublepop
My 2 cents would be to make the necessary changes in the standard library such
that an external library (on PyPI) can enable the feature. That lets us start
working with them, but puts no pressure on C extension developers to support
it, because subinterpreters “aren’t standard” -- just yet another PyPI library
which might fail in combination with others. Then in a couple of years, if the
ecosystem adopts and supports them, they can be moved to the standard library.

------
eatonphil
Is this like V8 Isolates for Python?

------
bsder
Oh how I wish Python had just sucked it up at 3.0 and eaten the performance
hit for removing the GIL.

By now, everybody would have optimized it back to normal (or better!).

~~~
loeg
There was no possible way to do that given the things python lets you get away
with "atomically". Requiring explicit synchronization to avoid data races
would have been an even bigger language break than the other breaking changes
in 3. Python is not a language for people who want to do fine-grained explicit
concurrency.

And doing it implicitly would require stupidly fine-grained locks on all
objects, destroying any performance gains; there's no way to "optimize" that
to GIL performance.
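An example of the implicit atomicity at stake: list.append is a single C-level operation in CPython today, so this runs race-free with zero explicit locking, and is exactly the kind of code a naive GIL removal would break:

```python
import threading

items = []

def worker():
    for _ in range(10_000):
        items.append(1)       # atomic under the GIL; no lock needed

threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

assert len(items) == 80_000   # no lost updates despite zero locking
```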

~~~
Doxin
There's still plenty of programs where you'd want to spawn a thread that
basically doesn't touch any of the data of the parent thread. It'd be nice if
there was a way around the GIL for those cases.

As an example take the multiprocessing.pool.ThreadPool.map function. _Most_ of
the use-cases of that function only read from the parent threads memory once
to pass the function arguments. After that the thread may very well spend a
lot of time only reading memory it reserved itself. It's rather wasteful to
have that thread wait on the GIL.
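The shape of that workload, sketched with ThreadPool.map: each worker reads its argument once, then computes only on data it built itself -- and under the current GIL those workers still serialize on every bytecode:

```python
from multiprocessing.pool import ThreadPool

def crunch(seed):
    local = [seed * i for i in range(1000)]  # thread-private data
    return sum(local)                        # pure CPU work on it

with ThreadPool(4) as pool:
    results = pool.map(crunch, [1, 2, 3, 4])

assert results == [499500 * s for s in (1, 2, 3, 4)]
```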

Of course I haven't got an obvious solution either but it seems to me that for
at least a subset of the uses of threading you could work around the GIL
without breaking python too badly.

Perhaps split the GIL into a per-thread Thread-interpreter-lock. That way each
object can simply be annotated with which lock it belongs to. That way you
still get proper atomicity like you're used to in python, but a thread might
also _actually_ run concurrently a lot of the time if it only touches objects
it created itself.

~~~
loeg
> There's still plenty of programs where you'd want to spawn a thread that
> basically doesn't touch any of the data of the parent thread. It'd be nice
> if there was a way around the GIL for those cases.

Subinterpreters gets you that (eventually; not in v1 apparently).

> Perhaps split the GIL into a per-thread Thread-interpreter-lock. That way
> each object can simply be annotated with which lock it belongs to. That way
> you still get proper atomicity like you're used to in python, but a thread
> might also actually run concurrently a lot of the time if it only touches
> objects it created itself.

I don't think this scheme would work especially well (or not better than
subinterpreters). Message passing still requires copying, or some way to
quiesce other threads and recursively change lock ownership of an object
graph. To avoid deadlocks, at least one thread will need to drop its own lock
to acquire the thread lock of objects owned by a 2nd thread. I'm not sure you
could do that and preserve the legacy global atomicity behavior, and it would
by definition be slower and bloatier on any individual thread than the
existing GIL behavior.

------
smabie
Why does Python move at such a glacially slow pace? The language itself is
unfortunate, the implementation even worse. Why can none of these problems be
fixed?

Though, maybe it doesn't matter. Julia is better in every way and supports
easy Python interop. It's already a questionable decision to start a new
Python project today and it probably will only become more questionable in the
future.

The language no longer serves any niche nor has any purpose. Let's just all
move on.

~~~
BiteCode_dev
Read the other comments: people say it moves too fast.

People are never happy.

~~~
nurettin
And beazley wouldn't care, right?

