(I authored the linked post) While the "maybe you shouldn't use Python" comment ...

mhneu · on May 2, 2018

I would characterize Python's weaknesses differently.

Startup time is a problem for Python. But concurrency is much more complex than you state: threading is not the only or best concurrency model for many applications. And removing the GIL will certainly not simply enable Python "to achieve linear speedups on CPU-bound code". Distributed computing is real. One of Python's problems for a long time was not the GIL; it was the sorry state of multi-process concurrency.

The speed issues that JITs solve for other languages may not be solvable in Python due to language design.

indygreg2 · on May 2, 2018

I'm totally OK with Python's threading choice of saying only 1 Python thread may execute Python code at any time. This is a totally reasonable choice and avoids a lot of complexity with multithreaded programming. If that's how they want to design the language, fine by me.

But the GIL is more than that: the GIL also spans interpreters (that's why it's called the "global interpreter lock").

It is possible to run multiple Python interpreters in a single process (when using the embedding/C API). However, the GIL must be acquired for each interpreter to run Python code. This means that I can only effectively use a single CPU core from a single process with the GIL held (ignoring C extensions that release the GIL). This effectively forces the concurrency model to be multi-process, which makes IPC (usually serialization/deserialization) the bottleneck for many workloads.

If the GIL didn't exist, it would be possible to run multiple, independent Python interpreters in the same process. Processes would be able to fan out to multiple CPU cores. I imagine some enterprising people would then devise a way to transfer objects between interpreters (probably under very well-defined scenarios). This would allow a Python application to spawn a new Python interpreter from within Python, task it with running some CPU-expensive code, and return a result. This is how Python would likely achieve highly concurrent execution within processes. But the GIL stands in its way.

The GIL is an implementation detail, not poor language design.
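To make the IPC cost mentioned above concrete, here is a rough sketch of what crossing a process boundary implies. The multiprocessing module moves arguments and results between processes by pickling them, so every hand-off pays a serialize/deserialize round trip. The `ipc_round_trip` helper and the sample payload are hypothetical, and the pickling is done in-process purely to isolate that cost.

```python
import pickle
import time

def ipc_round_trip(obj):
    """Mimic the copy an object undergoes when sent to another process.

    Hypothetical helper: multiprocessing pickles objects to move them
    between processes, so we pickle and unpickle in-process to isolate
    that cost without actually spawning a worker.
    """
    start = time.perf_counter()
    wire_bytes = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    copy = pickle.loads(wire_bytes)
    elapsed = time.perf_counter() - start
    return copy, len(wire_bytes), elapsed

# Illustrative payload: 100k small records, the kind of thing a worker might receive.
payload = [{"id": i, "name": f"item-{i}"} for i in range(100_000)]
copy, size, seconds = ipc_round_trip(payload)
print(f"{size} bytes on the wire, {seconds:.4f}s spent on (de)serialization")
```

The receiving side always gets an independent copy, never the original object, which is why large or frequently exchanged data tends to dominate the runtime of multi-process designs.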

sitkack · on May 2, 2018

It is a tractable amount of work (~40-80 hrs) to convert CPython from a sea of globals to a context-based system where one could then have distinct Python interpreters in the same address space. As it is now, you get one. Lua got this right from the beginning: Lua state doesn't leak across subsystems. There is zero chance I would do this work and then see if it would stick. I would be wasting 2 weeks of full-time work only to have the CPython folks say, yeah, no, because reasons.

Startup time should be fixed; Python does way too much when it boots. Using blank files:

    $ time lua t.lua 

    real	0m0.006s
    user	0m0.002s
    sys	0m0.002s

    $ time python t.py 

    real	0m0.052s
    user	0m0.036s
    sys	0m0.008s
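For anyone who wants to repeat this kind of measurement from inside Python, a minimal sketch follows. The `startup_seconds` helper is hypothetical; it launches the current interpreter with `-c pass` and keeps the best of several runs. The `-S` flag (skip importing the `site` module) is used as one example of trimming boot work. Note that the numbers include the cost of spawning the child process, so they are only useful for comparing configurations on the same machine.

```python
import subprocess
import sys
import time

def startup_seconds(extra_args=(), runs=5):
    """Return the best wall-clock time to start the interpreter and exit.

    Hypothetical helper: runs `python <extra_args> -c pass` several times
    and keeps the minimum to reduce noise from the OS.
    """
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, *extra_args, "-c", "pass"], check=True)
        best = min(best, time.perf_counter() - start)
    return best

default = startup_seconds()
no_site = startup_seconds(["-S"])  # -S skips importing the site module
print(f"default: {default * 1000:.1f} ms, with -S: {no_site * 1000:.1f} ms")
```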

armitron · on May 2, 2018

Lua supports the scenario you describe effortlessly, not to mention that it's actually designed for embedding.

Python can't even be re-initialized in the same process without introducing memory leaks and other non-deterministic gotchas! [1]

[1] https://docs.python.org/3.6/capi/init.html#c.Py_FinalizeEx

std_throwaway · on May 2, 2018

> The GIL is an implementation detail, not poor language design.

As I understand it, the GIL simplifies data structures by removing the need to account for concurrent access.

If you remove the GIL you must move your synchronization (mutexes) into the data structures and immediately get a big performance penalty.

If you want to avoid this overhead, you end up in swampland where the programmer must take care of concurrent access patterns and everything else. Also, many CPython modules would stop working because they assume the GIL.

It can be done, but the last time I read about the Gilectomy there was no clear way forward.
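A small sketch of what "moving the synchronization into the data structures" looks like, written at the Python level rather than inside CPython's C code. The `LockedCounter` class and `hammer` function are hypothetical names for illustration only.

```python
import threading

class LockedCounter:
    """Hypothetical counter that carries its own mutex.

    Every mutation pays for a lock acquire/release, which is the
    per-object overhead described above.
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def increment(self):
        with self._lock:
            self._value += 1

    @property
    def value(self):
        with self._lock:
            return self._value

def hammer(counter, times):
    """Increment the shared counter `times` times from one thread."""
    for _ in range(times):
        counter.increment()

counter = LockedCounter()
threads = [threading.Thread(target=hammer, args=(counter, 10_000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.value)  # 40000
```

The result is correct regardless of how the threads interleave, but each increment now does lock bookkeeping on top of the actual work. Multiply that by every list, dict, and reference count in the interpreter and the penalty becomes clear.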

dragonwriter · on May 3, 2018

Yeah, I think this kind of issue is why Ruby, which also has a GIL, seems to be heading for a new concurrency and parallelism model that introduces a new level (Guilds) between threads and processes where the big lock would be held, and where Guilds communicate only by sharing read access to immutable data, and transferring ownership or copies of mutable data.

TimJYoung · on May 3, 2018

I agree that this is an implementation detail. If they were to simply use the JS model of "every thread gets its own environment and message passing is how you interact", then you could still use threads safely and achieve some pretty impressive performance improvements in some cases.

Knowing literally nothing about Python other than what I read, I'm kind of confused as to how the current implementation came to be, because it is much easier to design an interpreter that uses the JS model than one that uses a shared environment among multiple threads. I created an Object Pascal interpreter, and it has this design: it can spin up an interpreter instance in any thread pretty quickly because it's greenfield all the way with a new stack, new heap, etc.

quotemstr · on May 2, 2018

Python's slowness can help improve performance by teaching you to use techniques that end up being faster no matter the language.

Python is so slow that it forces you to be fast.

Consider data analysis: on modern machines, you're almost always better off with a columnar approach: if you have a struct foo { int a, b, c; }, you want to store int foo_a[], foo_b[], foo_c[], not struct foo data[]. It's better for the cache, better for IO, and better for SIMD.

numpy makes it much easier to use the former (columnar arrays) than the latter (an array of structs), whereas in C, you might be tempted by the latter and not even realize how much performance you were leaving on the table. Likewise for GPU compute offloading, reliance on various tuned libraries for computationally intensive tasks, and the use of structured storage.
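A minimal sketch of the two layouts, using the standard-library `array` module as a dependency-free stand-in for numpy arrays. The field names mirror the `struct foo` example above and the data is made up.

```python
from array import array

# Row-oriented ("array of structs"): one tuple per record.
rows = [(i, i * 2, i * 3) for i in range(1000)]

# Column-oriented ("struct of arrays"): one contiguous typed array per field.
foo_a = array("i", (r[0] for r in rows))
foo_b = array("i", (r[1] for r in rows))
foo_c = array("i", (r[2] for r in rows))

# Summing one field touches only that column's contiguous memory ...
total_b_columnar = sum(foo_b)
# ... whereas the row layout walks every record to pick out one field.
total_b_rows = sum(r[1] for r in rows)

print(total_b_columnar, total_b_rows)
```

Both layouts hold the same data and give the same answer. The difference is in how much memory a single-field scan has to touch, which is where the cache, IO, and SIMD benefits come from.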

rossdavidh · on May 2, 2018

Sorry, I didn't mean it to be trolling, I just meant it more or less literally. If Rust (for example) gets used for things like Mercurial and Mozilla, is that bad? I'm not saying Python shouldn't care, if it could improve the startup time without sacrificing other things. But presumably the transition from py2 to py3 was not intended to make things slower; it was intended to solve other problems. There are almost always tradeoffs. Even the Mercurial folks quoted in the article said that the things py3 solved were not what they needed. That's a good indicator that Python is not the right language (anymore) for what they're doing.

I am primarily a Python programmer, but if Rust, Go, etc. take over as the language of choice in certain cases, I don't think that's a bad thing. Which doesn't mean one shouldn't write an article to highlight this cost of not having short startup time, just in case this cost wasn't understood by Guido, et al. But my guess (and it's only a guess), is that it was.

bartread · on May 3, 2018

> While the "maybe you shouldn't use Python" comment could be construed as trolling to some, there is definite truth to your line of reasoning and I agree with comment.

I wouldn't say I construed it as trolling. More like, "You might be right, but where does that get us?" Not trolling, but also not that constructive, because it's extremely easy to write something like "maybe you shouldn't use Python" but likely hard and time-consuming to make it so.

There are a lot of questions when considering such a move. For example:

- What's the opportunity cost of migrating $lots_of Python to Rust, or some other language?

- Is that really where you can add (or want to add) the most value?

- And what does having to do that do to your roadmap? Maybe it enables it, but surely it's also stealing time from other valuable work you could be doing?

- Longer term, are we sacrificing maintainability for performance? (In your case it sounds like the opposite?)

- How easily can we hire and onboard people using $new_tech? (Again, it sounds like you might reduce complexity.)

Basically I suppose what I'm saying is I find it a little trite when people say, "well, maybe you should do X," without having weighed the costs and benefits of doing so. And in a professional environment, if that's allowed to become a pattern of behaviour, it can contribute to the demotivation of teams. Hence, I found myself a bit irritated by the grandparent post.

pas · on May 3, 2018

Python was always slow to start. Not as slow as the JVM, but by around the 300th test case for hg, or around the 100th Python script invocation in any build system, people should start to wonder how to get all of that under one Python process.

It's not like Python is so ugly it'd be messy to do. (It was possible with the JVM after all. It even works by simply forking the JVM, with all its GC threads and so on: https://github.com/spray/sbt-revolver )

Make-style DAGs are nice, but eventually the sheer number of syscalls for process setup (and module import and dynamic linking) is going to be a waste of time.
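One possible shape of "getting all of that under one Python process", sketched with the standard-library `runpy` module. The `step.py` script and its contents are made up, and a real build system would need to deal with shared state between steps (`sys.modules`, `os.environ`, the working directory), which this sketch ignores.

```python
import os
import runpy
import tempfile

# Stand-in for a build step that would normally be launched as `python step.py`.
SCRIPT = "result = sum(range(10))\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "step.py")
    with open(path, "w") as f:
        f.write(SCRIPT)

    # Run the same script 100 times inside this interpreter instead of
    # paying process setup and interpreter startup for each invocation.
    results = [runpy.run_path(path)["result"] for _ in range(100)]

print(results[0], len(results))
```

`runpy.run_path` executes the file and returns its resulting globals, so the interpreter boots once and only the script body runs on each call.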

qaq · on May 2, 2018

If one needs Rust- or C/C++-level performance, I doubt there is much Python can do, and one can wonder whether Python was ever the right tool for such a project.

mixmastamyk · on May 3, 2018

It’s a great tool for prototyping.

rrcaptain · on May 3, 2018

If you expect to need the performance of a statically typed, compiled language, I don't see why you'd prototype in a dynamically typed, interpreted language.

pas · on May 3, 2018

That's why build systems still look like black magic infused with even darker sh, and a bit of perl sprinkled all over, presumably because the previous maintainers were all out of goat blood.

mixmastamyk · on May 5, 2018

Most layers of a large project that need to be designed and figured out care nothing of those concerns.

RantyDave · on May 2, 2018

I feel bad for even thinking it but ... I bet Go's startup times are great.

metalliqaz · on May 2, 2018

I think your characterization of the GIL is not accurate. Show me ANY real world program that can achieve linear speedups on multicore or multi-processor systems. Humans have not sufficiently mastered multithreading to be able to make such a claim. I am not aware of any "CPU-bound" use cases that would actually use Python like this instead of, say, C or Fortran. And anyway, I submit that it would benefit (both from a design and an execution standpoint) from being multi-process (in other words, using explicitly coded communication).

m_mueller · on May 3, 2018

Regarding the GIL, I've always wondered about Jython but never gotten around to trying it. What are the drawbacks of running it on a JVM to get true multithreading? Having to properly sync the threads like in other environments without global locks?

pas · on May 3, 2018

Nothing, it's just not maintained. People realized that, yeah, Python is nice, but why spend years reimplementing it on the JVM when there's Kotlin? (And Java itself is quite a breeze to program in nowadays. And of course Scala, if you dare go beyond the Pythonic simplicity.)

yorwba · on May 3, 2018

Jython doesn't look completely unmaintained: https://hg.python.org/jython

It's also not completely obsoleted by Kotlin, e.g. for the use case of calling a Python library from Java. However, the Python semantics are not a great fit for the JVM, so you should expect it to be slower than plain CPython: https://pybenchmarks.org/u64q/jython.php