I know nothing about Python internals, but my understanding from the article is that this "overhead" is about creating a new sub-interpreter (loading modules is particularly slow in Python), not the performance of executing code after it's created.
The article also makes it clear that each sub-interpreter still has its own GIL, but two sub-interpreters can run at the same time without having to care about each other's GILs.
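Here's a minimal sketch of the model, assuming the PEP 734 API that ships as concurrent.interpreters in Python 3.14 (earlier versions only expose a private _interpreters module, so treat the exact names as version-dependent):

    # Sketch of the per-interpreter-GIL model (PEP 734 / Python 3.14).
    from concurrent import interpreters

    # Creating the sub-interpreter is where the quoted "overhead" lives:
    # a fresh module table, allocator caches, GC state, and its own GIL.
    interp = interpreters.create()

    # Code executed here contends only with this interpreter's GIL, so two
    # such interpreters can burn CPU simultaneously on different cores.
    interp.exec("total = sum(i * i for i in range(1_000_000))")
    interp.close()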
Message passing via serialization / copy is inherently more expensive than just passing a reference between threads. So any benefit vs threaded Python depends on the ratio of IPC to actual Python bytecodes interpreted.
IPC-heavy programs with low concurrency may fare worse under this model than under threads with the traditional single GIL. As the thread count approaches infinity, though, anything that scales beats the single-GIL model.
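A back-of-the-envelope way to see the copy tax, using plain pickle on one thread (a real cross-interpreter channel will differ in detail, but the shape of the tradeoff is the same):

    # Rough sketch: serialize-and-copy per message vs handing over a reference.
    import pickle
    import timeit

    payload = list(range(100_000))

    copy = timeit.timeit(lambda: pickle.loads(pickle.dumps(payload)), number=100)
    ref = timeit.timeit(lambda: payload, number=100)  # thread hand-off: just the pointer

    print(f"copy per message: {copy / 100 * 1e3:.2f} ms")
    print(f"reference per message: {ref / 100 * 1e6:.3f} us")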
Other overheads include spinning up a full interpreter state (including object and malloc caches, GC, etc.) per sub-interpreter. And there are some modules with process-global semantics, such as signal handling; it's unclear how that will be coordinated between sub-interpreters, if at all.
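The signal case is already visible with plain threads today: CPython delivers signals only to the main thread of the main interpreter, and refuses to install handlers anywhere else.

    # Demonstrates the process-global semantics: signal handlers can only
    # be installed from the main thread of the main interpreter.
    import signal
    import threading

    def install() -> None:
        try:
            signal.signal(signal.SIGINT, lambda signum, frame: print("got SIGINT"))
        except ValueError as exc:
            # "signal only works in main thread of the main interpreter"
            print(f"worker thread: {exc}")

    signal.signal(signal.SIGINT, lambda signum, frame: print("got SIGINT"))  # fine here
    t = threading.Thread(target=install)
    t.start()
    t.join()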
Exactly. This seems more like a way to advertise that the GIL is gone while not actually offering shared-memory parallelism via threads.
It reminds me of how Python marketing tricks people into thinking Python has real parallelism by talking about asyncio and multiprocessing; people then actually try those out only to discover the sad state of affairs.
So ... No.
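For the record, the asyncio letdown is easy to reproduce: the event loop is a single thread, so CPU-bound coroutines that never await simply run back to back.

    # Sketch of the trap: asyncio gives concurrency, not parallelism.
    import asyncio
    import time

    async def spin(n: int) -> int:
        return sum(i * i for i in range(n))  # never awaits: hogs the event loop

    async def main() -> None:
        start = time.perf_counter()
        await asyncio.gather(spin(3_000_000), spin(3_000_000))
        # Takes ~2x the single-task time: the tasks ran sequentially.
        print(f"two tasks: {time.perf_counter() - start:.2f}s")

    asyncio.run(main())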