The "Faster Python" team are doing a fantastic job, it's incredible to see the progress they are making.
However, there is also a bit of a struggle going on between them and the project to remove the GIL (global interpreter lock) from CPython. There is going to be a performance impact on single-threaded code if the "no GIL" project is merged, something in the region of 10%. It seems that the Faster Python devs are pushing back against that, as it impacts their own goals. Their argument is that the "sub-interpreters" they are adding (each with its own GIL) will fulfil the same use cases as multithreaded code without a GIL, but sub-interpreters still have the overhead of encoding and passing data between them, in the same way you have to with subprocesses.
There is also the argument that it could "divide the community", as some C extensions may not be ported to the new ABI that the no-GIL project would result in. Again, though, I'm unconvinced by that: the Python community has been through worse (Python 3), and even asyncio completely divides the community now.
It's somewhat unfortunate that this internal battle is happening, both projects are incredible and will push the language forward.
Once the GIL has been removed, it opens up all sorts of interesting opportunities for new concurrency APIs that could make concurrent code much easier to write.
My observation is that the Faster Python team are better placed politically, as they have GvR on the team, whereas no-GIL is being proposed by an "outsider". It just smells a little of NIH syndrome.
The GIL will never be removed from the main Python implementation. Historically, the main value of GIL removal proposals and implementations has been to spur the core team to speed up single-core code.
I think it's too late to consider removing the GIL from the main implementation. Like Guido said in the PEP thread, the Python core team burned the community for 10 years with the 2-3 switch, and a GIL change would likely be as impactful; we'd have 10 years of people complaining that their stuff didn't work. Frankly, I wish Guido would just come out and tell Sam: "No, we can't put this in CPython. You did great work, but compatibility issues trump performance."
Kind of a shame, because Hugunin implemented a Python on top of the CLR some 20 years ago and showed some extremely impressive performance results. Like Jython, PyPy, and other implementations, it never caught on, because compatibility with CPython is one of the most important criteria for people dealing with lots of Python code.
> The GIL will never be removed from the main python implementation.
I don't see why. It's a much easier transition than 2 to 3.
Make each package declare right at the top whether it is non-GIL compatible. Have both modes available in the interpreter. If every piece of imported code carries the declaration, the program runs in non-GIL mode; otherwise it runs in classic GIL mode.
At first most code would still run in GIL mode, but over time most packages would be converted, especially if people stopped using packages that weren't compatible.
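A minimal sketch of what that per-package declaration could look like. The `__nogil_compatible__` attribute and the loader-side check are both invented here for illustration; nothing like this exists in CPython today:

```python
import types

def all_imports_nogil_ready(modules):
    """Hypothetical loader check: non-GIL mode is only allowed if every
    loaded module opts in via a (made-up) __nogil_compatible__ marker."""
    return all(getattr(m, "__nogil_compatible__", False) for m in modules)

# Simulate two third-party modules, one annotated and one not.
ready = types.ModuleType("ready_pkg")
ready.__nogil_compatible__ = True        # declared "right at the top"
legacy = types.ModuleType("legacy_pkg")  # no declaration: forces GIL mode

assert all_imports_nogil_ready([ready]) is True
assert all_imports_nogil_ready([ready, legacy]) is False
```

In this model the interpreter would run the check as imports accumulate and fall back to classic GIL mode the moment any unannotated module appears.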
Python would be too dynamic a language to have non-GIL compatibility declared simply at the top. Code can be imported/evaled/generated at runtime, at any point in a script, which means Python would need to be able to switch from non-GIL to GIL mode at any point during execution.
It's true that there are dynamic imports, but presumably it would be on the library maintainer to know about that; you could also throw a catchable error on GIL-only imports, or something like that.
All I'm saying is that it's solvable, and more solvable than 2 to 3.
As long as all the data structures stay the same, all you really need to do is flush out all the non-GIL bytecode and load in (or generate) GIL bytecode.
Sure, there might be a stutter when this happens. You will also want a way to either start in GIL mode, or force it to stay in non-GIL mode, throwing errors. But it's a very solvable problem.
Any programming-language problem that mixes concurrency and C-level APIs across the amount of code Python represents is insurmountable. This is like the Linux "we don't break userspace" motto: there is no conceivable way of removing the GIL without breaking userspace. Programs will be buggy; some will stop working. Worse, until someone works around the halting problem, we can't just crawl Python code and decide whether it relies on the GIL. People will have to be paid to inspect one function at a time to determine where the GIL is needed.
Removing the GIL is easy. It's already been done with Jython and IronPython. No changes to Python code are needed.
You do still need locks for concurrency; it's just that instead of a coarse-grained global lock, you replace it with a bunch of fine-grained locks. Which is actually one of the sticking points with the proposal to remove the GIL: all those fine-grained locks impact performance for single-threaded Python programs.
The impossible problem is removing the GIL without breaking compatibility with the CPython API. There is a bunch of non-Python code (mostly C/C++) that interfaces with Python through the CPython API, and it's that code which breaks when you remove the GIL.
The proposals floating around don't actually fix that issue; it's more of a "let's break API compatibility anyway", allowing libraries to be updated to the new API incrementally.
There are no issues with the halting problem here. It's really easy to detect at runtime if python code is interacting with c/c++ code via the old cpython API. Remember, the halting problem only applies to static analysis, not dynamic runtime checks.
This is extremely naive, and you answer your own question:
> [Par 2] ["Removing GIL creates performance costs that people don't want"]
> [Par 3] ["Removing GIL breaks CPython API which tons of codebases rely on and people will have to be paid to fix all that code"]
In short, it's not possible to remove the GIL without requiring code changes.
> There are no issues with the halting problem here. It's really easy to detect at runtime if python code is interacting with c/c++ code via the old cpython API. Remember, the halting problem only applies to static analysis, not dynamic runtime checks.
This is not true, and you're thinking about this wrong. When presented with a "will my code work after GIL removal" type of problem, the question becomes "will my code call into the C/C++ API". Whether that's detected at runtime is irrelevant, because the question you want to answer is "does this piece of software require any change at all to work without the GIL". To answer that, you need a tool that statically analyzes whether known-bad functions (those that trivially/deterministically call C/C++ code) are reachable from the rest of the codebase.
Imagine you work for a company that has a package "xyz" written in C/C++ and called from Python. Now that the GIL is removed in an imaginary Python 4, your boss comes to you and asks: "How much code needs to change for us to port to Python 4?" You can apply heuristics such as "how many modules import xyz". But since the correctness implications are contagious, any module that doesn't import xyz but imports a module that imports xyz is affected as well. You can be more granular and verify whether individual functions use objects from xyz. Which brings me back to my original point that there are two cases: either this is a static-analysis problem (for which we lack tools), or it's work where someone has to crawl the entire codebase, inspect function by function, run unit test by unit test, to determine whether GIL removal breaks anything. I know, because people did exactly this in the 2 -> 3 change, and it's an extraordinarily expensive ($$$) transition. Your view is naive beyond comprehension, and it's hard to see GIL removal as anything easier than the 2 -> 3 transition, which was a shitshow of a magnitude the software industry hadn't seen before.
It's super easy to check whether any C/C++ code has been imported. It's not too hard to get a reasonably good result with static analysis either: just find all import statements and see whether each one maps to a Python file or a dll/so file, recursing as needed. You don't need to look at any other code except the import statements.
For an even better result, the main Python interpreter could be modified with a flag to just list all imported dll/so files. Run your application with that flag and see what it imported; that will catch weird edge cases (like Python code that messes with import paths).
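The runtime variant doesn't even need an interpreter flag to prototype, since sys.modules already records what was loaded. A sketch (a real interpreter flag could additionally catch modules loaded outside the normal import system):

```python
import importlib.machinery
import sys

def loaded_extension_modules():
    """List already-imported modules whose backing file is a dll/so."""
    suffixes = tuple(importlib.machinery.EXTENSION_SUFFIXES)
    return sorted(
        name
        for name, mod in list(sys.modules.items())
        if getattr(mod, "__file__", None) and mod.__file__.endswith(suffixes)
    )

import json  # pure Python, so it will not appear in the list
assert "json" not in loaded_extension_modules()
```

Calling this at application exit (e.g. via atexit) would approximate the "run with the flag and see what it imported" workflow.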
There is no need to check the entire codebase. It's only the C/C++ code that directly uses the cpython api that needs to be checked. Any python code can be ignored, along with C/C++ code that doesn't interact with the cpython API.
This isn't like the 2 -> 3 transition where you needed to touch almost every line of python code. The amount of touched code for most companies should be tiny.
Also, the proposal isn't for a Python 4 where everyone is required to upgrade to this new paradigm. It's for an optional mode that allows running without the GIL. The only people who actually need to worry about it are people who want multithreading.
Your theoretical company can do a quick check to see what c/c++ libraries are imported and estimate how much code would need to be checked. If it's too daunting, they can just keep operating with the GIL enabled.
> Your theoretical company can do a quick check to see what c/c++ libraries are imported and estimate how much code would need to be checked. If it's too daunting, they can just keep operating with the GIL enabled.
Once again, this doesn't make sense, and you're simply naively glossing over things. This won't work, because likely one of your dependencies will use no-GIL mode (numpy or something like that). This essentially forks the ecosystem in two: if you have 5 million LoC of GIL Python code, you have no way of importing a no-GIL dependency. Nobody will sign up for that kind of mess; Python package management is already a huge mess. What happens if a library goes from GIL to no-GIL? Runtime exception. Yeah, thanks but no thanks.
Beyond that, it's unclear how best to respond to your comment that static analysis is easy and that you don't need to inspect Python code line by line. It needs to be understood that there are companies with millions of lines of code where Python is mixed with C/C++ FFI calls all over the place. I have a codebase like that, where almost everything is FFI/Python mixed, and I know many companies do. In our case we do need to go line by line to see what's affected. Even if we didn't, our boss would ask us to, because if anything fails it'll be our responsibility, so we at least need to do the busy work of confirming nothing is affected. You seem to carve out a small subset of reality, then determine that your proposed solution is practical there, when in real life it would cause tons of codebases to be almost rebuilt. Knowing Python folks, I know no one has any interest in such a thing.
I'm confused. Why do people think my suggested solution is difficult enough to be comparable to the halting problem or "a smart enough compiler"?
We are talking about a scenario where cpython has already been modified to support both GIL and non-GIL operating modes, and a set of libraries that support the new non-GIL mode have been manually annotated by programmers.
All I'm suggesting is that on top of this, we modify the "load library" code to detect at runtime when we are loading a library that doesn't support the new non-GIL mode and switch modes.
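As a toy model of that "detect on load and fall back" idea. The allow-list and mode flag are invented here purely for illustration; a real implementation would live inside the interpreter's import machinery, not in Python code:

```python
import importlib

NOGIL_SAFE = {"json", "math"}   # hypothetical registry of annotated packages
mode = {"gil": False}           # start optimistically in non-GIL mode

def guarded_import(name):
    """Import a module; drop back to classic GIL mode if it isn't annotated."""
    module = importlib.import_module(name)
    if name.split(".")[0] not in NOGIL_SAFE:
        mode["gil"] = True      # the runtime switch the comment describes
    return module

guarded_import("json")
assert mode["gil"] is False     # annotated: still running lock-free
guarded_import("ast")
assert mode["gil"] is True      # unannotated import forced the switch
```

The hard part this glosses over is the switch itself, i.e. safely moving already-running threads from fine-grained locking back to a global lock, which is the "stutter" mentioned upthread.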
I mean that seems horribly tricky, but also totally doable.
Having read about JavaScript/Java VM optimisations in JITs and GC, I would be surprised if a global state change like this were not manageable: think of deoptimising the JS when you enter the debugger in the dev tools in your browser.
It is wildly misunderstood just how insanely good the current JVM is. It scares me how much it has improved in the last ten years. Since it has been open source for a long time, many university researchers have run experiments and proposed HotSpot improvements. Plus, Sun/Oracle employs a small army of PhD researchers to work on it "night and day". If Project Valhalla ever sets sail (pun intended), I could imagine other languages beginning to target the JVM instead of their own (worse) VMs. (Raku talked about it a few years ago.) When I need to move from Java to Python, I am always painfully reminded how terrible the VM performance is in a tight loop that does not use list comprehensions.
I'm no fan of Python after the 2-3 debacle, but to be fair, you're comparing apples and oranges here. Why not instead compare it to another C-like language? (Java was originally intended to be the replacement for C++ back in 1995, and Python dates back to 1989.)
Go or Rust, or even Erlang or Haskell, would be great examples that are more in line with the design goals (then and now) of Java.
That is exactly my point. People start a small project in Python. Velocity is very high. Then the project grows into a monstrosity. Suddenly, the awful Python "VM" (slow as hell) and the lack of true multi-threading are major project liabilities. They should have started with something much less sexy: DotNet (C#) or Java. Repetitive? Hell yes, but easy to maintain and cheap to hire devs for.
FWIW, the JVM is one of the supported backends of the Raku Programming Language. Has been for quite some time actually. It was the first additional backend supported after Parrot, and provided a template for supporting the MoarVM and Javascript backends since then.
I don't see why you wouldn't have holdouts using GIL for valid single-threaded performance reasons for years/decades. And that's ignoring legacy code - even 2.7 is still alive and kicking in some corners.
I'm sure you would, but anyone who cared about it would work around it, either by not using that library or finding a different one or even forking the one that isn't updated. Just like most people use Python 3.x now, but there are some 2.7 holdouts. But those holdouts aren't holding back the entire ecosystem at this point.
The difference is that no-GIL is a minority use case. Most people are using Python in single-threaded workflows (numpy/scipy, machine learning, build scripts, etc.) and don't care about the GIL. The GIL mostly affects webdevs.
Probably the library support will be much worse than 2-3, but if only a minority of users are impacted by libraries that don’t support it then it’s not a big issue. GIL concerned users can either carefully select their dependencies to be no-GIL supporting, or they can decide to go with the GIL.
2-3 was bad because every single library, app, and tiny script had to be migrated, if it used a string or a print statement.
Even skilled programmers often make the mistake of saying "why don't you just..." or "it's easy, just..." when completely ignoring large important factors, such as following process, ensuring backward compatibility, stakeholder alignment (ugh), and addressing long tail problems.
"There are two types of programmers: new ones who don't know how complicated things are, and experienced ones who have forgotten it" -from an article on HN the other day about setting up python envs.
There are usually two kinds of coders giving advice: a fresh one who has no idea how complex things really are yet, or an experienced one who has forgotten.
I really think the GIL is saving a bunch of poorly written multi-threaded C++ wrappers/libraries out there. If they remove it, a bunch of bugs will appear in other libraries that might not be Python's fault.
They're not "poorly written", the fact that you don't need to do any locking in C/C++ code is part of the existing Python API. Right now when Python code calls into C/C++ code the entire call is treated as if it's a single atomic bytecode instruction. Adding extra locking would just make the code slower and would accomplish absolutely nothing, which is why people don't do it.
In order for the call into C to appear atomic to a multithreaded interpreter, all threads in the interpreter would need to be blocked during the call. That's possible to do, but you've just re-introduced the GIL whenever any thread is within a C extension.
In the unlocked case, one could use the low-overhead tricks used for GC safepoints in some interpreters. One such technique is a dedicated memory page from which a single byte is read at the beginning of every opcode dispatch; you mark that page non-readable when you need to freeze all the threads executing in the interpreter. You'd then have the SIGSEGV handler block each faulting thread until the one thread returns from C. That's fairly heavy when it's used, but pretty lightweight when it's not.
Nevertheless, this is still a concern for the wider ecosystem: if Python libraries suddenly start to break due to underlying issues, I don't think that can be neglected.
> I think it's too late to consider removing the gil from the main implementation.
I think it'll happen one day. Is Python going anywhere?
Give it 20 years. The 2 -> 3 switch will be like the Y2K bug, only remembered by the oldest programmers. The memories of pain will fade, leaving only entertaining war stories. The GIL will still be there, and still be annoying.
Then, when everyone has forgotten, the community will be ready. For an incredibly long and grinding transition to Python 4.
Wat! I'm experiencing 2-3 pain right now! Google has given me till January next year to port all my apps that have been running happily for 10 years. It's no small job either.
I'm already in a state where I can't upgrade to the latest gcloud tools because 2.7 is removed from them. I'm stuck on a version from late last year until I've finished my ports or abandon my projects.
I was a very happy user of GAE standard for many many years, but they burned me in the end and I've learnt my lesson about building on proprietary tools.
Standard frameworks and standard databases for me from now on!
In 20 years, you wouldn't have Python 4. You will have something like ChatGPT that you interact with and it writes code for you, down to the machine level instructions that are all hyperoptimized. Coding will be half typing, half verbal.
Imagine debugging hyperoptimized machine code. - Or would you just blame yourself for not stating your natural language instructions clearly enough and start over? I guess all of these complex problems would somehow be solved for everyone within the next 21 years and 364 days.
You wouldn't debug it directly. The interaction will be something like telling the compiler to run the program for a certain input, and seeing wrong output, and then saying, "Hey, it should produce this output instead".
The algorithm will be smart enough to simply regenerate the code minimally to produce the correct output.
What you wrote is naive. I don't think AI will ever be able to guess what the correct output for arbitrary inputs should look like just based on examples. You need to specify an algorithm that produces the correct output for all inputs. AI could be able to optimize that, but not fill in gaps in the definition of the goal.
While titillating to think about, what you're describing is more than 20 years away for AI. The best it can do today is handle generalized problems; actually writing new, original, complex code is not within the bounds of current AI capabilities.
Not sure why this was flagged dead. If you look at many stackoverflow answers around threading, many explicitly rely on the GIL, quoting source/documentation “proving” that it's safe, avoiding the use of threading.Lock() and the like.
As an early Python programmer, I copy-pasted these types of answers. My old threaded code absolutely has these bugs. I've even seen code in production that has these bugs, because they're sometimes dumb performance enhancements rather than bugs, unless you happen to use a Python interpreter without a GIL, especially one that doesn't exist yet.
Great care would have to be taken to make sure the GIL was not disabled by default for anything an existing thread touches (or some super dynamic-aware smarts would be needed to know whether it can be disabled).
That's why gilectomy carries an unreasonable single-threaded performance cost: many operations now need to take a lock where before they relied on the GIL.
Why do you consider code that relies on the GIL to be buggy? Isn't the GIL a documented, stable part of Python? (Hence why it will probably never be removed).
> I’ve even seen code in production that has these bugs, because they’re sometimes dumb performance enhancements, rather than bugs, unless you happen to use a Python interpreter without a GIL, especially one that doesn’t exist yet.
“Interpreter resources” are just Python primitives, from the user's perspective. And from that perspective, you can sometimes get away without user-managed locks by relying on the GIL in your objects. For example, you can trivially use a list as a multi-threaded queue, using `.append()` and `.pop()`.
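That pattern looks like this. It only holds on CPython, where `list.append` and `list.pop` each execute as a single operation under the GIL:

```python
import threading

work_queue = []  # shared, with no explicit lock

def producer(n):
    for i in range(n):
        work_queue.append(i)  # atomic on CPython thanks to the GIL

threads = [threading.Thread(target=producer, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# No items lost despite four unsynchronized writers.
assert len(work_queue) == 40_000
```

Code written in this style is exactly the kind that would need auditing on an interpreter that no longer guarantees those semantics.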
> Like Jython, PyPy and other implementations, it never caught on because compatibility with CPython is one of the most important criteria for people dealing with lots of Python code.
I think this is more an issue of popular packages being developed and tested against CPython, with not enough effort available to port/test them against anything else. There's no special magic in CPython that Python programmers love; they just want their code to run. If they've got a numpy dependency (IIRC it doesn't support PyPy, but I'm not going to look it up, so I may be corrected on that point; this is a long parenthetical) they can't use an interpreter that doesn't support it. Even if it worked but had bugs it didn't have in CPython, they're still going to use CPython. Most people aren't writing Python for its super duper fast performance, so they're fine leaving a little performance on the table by using the interpreter that their dependencies support. Whatever that is.
Sure, I didn't claim it did. My point is that Python programmers don't tend to have a particular fondness for an interpreter. They tend to care only about the ecosystem. If you came up with a cxpython interpreter that was faster than cpython and supported all the modules the same way (including C interop) Python programmers would jump over to it. If your cxpython was faster than cpython but didn't support everything they'd ignore it.
Case in point: Python 2.7. While 3.x offered a lot of improvements it took years for some popular modules to support it. No one bothered to look twice at Python 3 until their dependencies supported it.
Python programmers don't tend to care much about the interpreter so much as the code they wrote or use running correctly.
> the python core team burned the community for 10 years with the 2-3 switch, and a GIL change would be likely as impactful
A core team led by him, which also had the opportunity to make much more impactful changes during 3 (including removing the GIL, since they were going to mess up compatibility anyway), but didn't.
All that mess (and resulting split and multi-year slowdown in Py3 adoption) just to put in the utf-8 change and some trivial changes. It's only after 3.5 or so that 3 became interesting.
I believe it's a lot easier to state this long after the migration took place. Removing the GIL, or any host of other changes, could have split the community to a place where the language would have been abandoned except amongst a small minority.
At this point, removing the GIL is such a large change that it would probably be better to wait until practical single-threaded performance gains are exhausted, and then remove the GIL. At that point you would have community consensus behind the change, and the holdouts wouldn't have a stranglehold keeping the community from moving forward.
I'm not sure if I would call the move from 2 -> 3 a trivial change, but maybe you're more well versed in Python than I am (and I'm fully willing to admit that may be the case).
Edit: To be clear, it specifically made it difficult to support 2 and 3 at the same time, which was a problem for libraries. And without the libraries supporting 3, app code wouldn't be able to migrate either.
What people wanted was features to help with the migration.
Yes, of course having those features would have helped.
But doing that required experience the Python developers didn't have when they were doing 3.0!
The Python developers thought people could do a one-off syntactic code translation (2to3), perhaps even at install time, rather than what most people did - write to the common subset of 2 and 3, with helpers like the 'six' package.
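The "common subset" style that won out looks roughly like this: code that runs unchanged on 2.7 and 3.3+, often with `six` for the gnarlier cases (not used in this sketch):

```python
# Valid in both Python 2.7 and Python 3.3+ (u'' literals came back in
# 3.3, via PEP 414, precisely to make this subset practical to write).
from __future__ import print_function, unicode_literals

text = u"caf\xe9"             # text type in both versions
data = text.encode("utf-8")   # explicit bytes at the boundary

print(len(data))  # prints 5: two bytes for the e-acute
```

On Python 3 the `__future__` imports are no-ops, which is what made a single codebase possible once enough of the 2.x spellings were accepted again.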
What are the "more substantial changes" you propose? The walrus operator? Matching? Other things that Python 3 eventually gained, and which took years in some cases to develop?
Or are you proposing something that would have made it more difficult to write to the common subset?
That subset compatibility necessity extends to the Python/C API. Get rid of global state and you'll need to replace API calls that implicitly depend on it
with something that passes in the current execution state. Make that too hard, and you inhibit migration of existing extension modules, which further inhibits the migration.
>The Python developers thought people could do a one-off syntactic code translation (2to3), perhaps even at install time
Perhaps they overestimated how much of the change 2to3 would handle, but they surely didn't think that, because they already knew they had put in all kinds of changes not automatable via "syntactic code translation", changes where context is needed (e.g. the str changes).
>What are the "more substantial changes" you propose? The walrus operator? Matching? Other things that Python 3 eventually gained, and which took years in some cases to develop?
No GIL, more optimization (a wasted opportunity for a big speed bump), dropping legacy stuff, typing support, an improved C extension model, JIT, specialization, green threads, and so on. Things like the walrus operator would be at the very bottom...
They could have gone for "compatibility" to ease the migration (or make it a non-issue, just run backwards compatibly) sure.
But since they decided to go ahead and break things, and bump the version name, a change the community has been discussing since the mid-90s as some grand vision of big changes, they could have done more substantial stuff at least.
As it stands, they broke compatibility and cost the community 5-6 years for nothing much. Everything big came later in 3.x. And not because it couldn't just as well have been added piecemeal to some 2.8 and onwards...
The recommended development model for a project that needs to
support Python 2.6 and 3.0 simultaneously is as follows:
0. You should have excellent unit tests with close to full coverage.
1. Port your project to Python 2.6.
2. Turn on the Py3k warnings mode.
3. Test and edit until no warnings remain.
4. Use the 2to3 tool to convert this source code to 3.0 syntax.
Do not manually edit the output!
5. Test the converted source code under 3.0.
6. If problems are found, make corrections to the 2.6 version of
the source code and go back to step 3.
When it’s time to release, release separate 2.6 and 3.0 tarballs (or
whatever archive form you use for releases).
They really did not consider having a usable subset that could work on Python 2 and Python 3. The same PEP says:
There is no requirement that Python 2.6 code will run unmodified on
Python 3.0. Not even a subset. (Of course there will be a tiny subset,
but it will be missing major functionality.)
Remember, it wasn't even until Python 3.3 that they restored the u'unicode' syntax "provided solely to reduce the number of purely mechanical changes in migrating to Python 3, making it easier for developers to focus on the more significant semantic changes (such as the stricter default separation of binary and text data)." https://docs.python.org/3.3/whatsnew/3.3.html
And it wasn't until Python 3.5 that they supported old-style %-formatting, like (b"%d" % n), both because it's useful in wire protocols, and "to help ease migration from, and/or have a single code base with, Python 2" https://peps.python.org/pep-0461/ .
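The PEP 461 feature in question, available since 3.5:

```python
# bytes %-interpolation, re-added in Python 3.5 (PEP 461) for wire
# protocols and for keeping a single 2/3-compatible code base.
n = 1024
header = b"Content-Length: %d\r\n" % n
assert header == b"Content-Length: 1024\r\n"

# %s accepts bytes (and other buffer-protocol objects) directly:
assert b"%s %s" % (b"GET", b"/index.html") == b"GET /index.html"
```

Before 3.5, every such formatting site in a wire-protocol library had to be rewritten around `.join()` or decode/encode round trips, which is exactly the busywork that slowed migration.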
> No GIL ... and so on
The features you listed required years of development, and some, like JIT support, are still in progress, while the no-GIL proposal seems beyond what the steering committee is willing to accept.
It sounds like you want them to have delivered Python 3.12, with all these features (many of them iterated on over several releases), instead of 3.0, and without any migration path from 2.x Python or Python/C extensions beyond wholesale rewriting.
There is no way they would have managed to deliver a stable release in a timely manner following your suggestion. As it was, it took about a decade to deliver a version that had an effective migration pathway. Python 3.5 is the first version I supported, in no small part because it supported bytes % interpolation.
> And not because it couldn't just as well be added piecemeal to some 2.8 and onwards...
Perhaps it could have been done in an abstract and technical sense. But the Python core developers decided they didn't have the resources (money, people, etc) for that path.
I don't see how some of the changes, like exception chaining, could have been implemented without breaking backwards compatibility. Even things as simple as evaluating (a<b) or "Hello".encode("base64") changed at a pretty deep level.
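Exception chaining is a good example of a change that needed interpreter support. A quick illustration of the Python 3 `raise ... from ...` form:

```python
def parse_port(value):
    try:
        return int(value)
    except ValueError as exc:
        # Python 3 records the original exception as __cause__;
        # there was no way to express this in Python 2 syntax.
        raise RuntimeError("invalid port: %r" % value) from exc

try:
    parse_port("https")
except RuntimeError as err:
    assert isinstance(err.__cause__, ValueError)
```

Even without an explicit `from`, Python 3 sets `__context__` on exceptions raised inside handlers, a change in runtime behavior no 2.x-compatible spelling could replicate.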
Remember, Python 3 was always meant as "a relatively mild improvement on Python 2, [because] we can gain a lot by not attempting to reimplement the language from scratch." https://peps.python.org/pep-0461/ That was how the effort was justified.
The Perl6/Raku experience was definitely part of the zeitgeist in the discussion against larger changes like you think were needed.
What are your counter-examples that might soothe the concerns should a cusp like this appear in the future? I can't think of any good ones.
> and cost the community 5-6 years for nothing much. Everything big came later in 3.x.
The core developers argued that the technical debt in the code base was high, and while it would take years, those changes would enable the later big improvements you now see.
You seem to look at the big improvements and somehow believe they could have been done 15 years ago, on the old code base, with the available knowledge and resources of the people then.
Raku failed to deliver a stable release in a timely manner. And it didn't stick to sensible new features, but tried to make an uber-language with everything plus the kitchen sink, and even a multi-language VM.
Agreed. It would have pushed the 2->3 migration from "very painful for the ecosystem" to a full on perl5->6 break between the two versions. Not sure it would have survived that.
It's not like Python adoption and use was significantly boosted by Python3: it was and would have been just fine without it.
The problem with languages written by language designers/implementers is that they often don't know when to stop. Wirth is a shining example here: Pascal -> Modula(n) -> Oberon. You need to move on, and leave the old alone.
I wouldn't say that GIL will never be removed, but I believe the GIL cannot be removed without breaking a lot of existing code.
That means there could be another drama with migrations if that would be done.
I think the most likely way they can effectively eliminate the GIL would be to provide a compile option that basically says "this enables parallelization, but your code is no longer allowed to do A, B, C" (there would probably be a lot more restrictions).
People who want to get it would then adapt their code, and there could be pressure for other packages to make them work in that mode.
> Historically, the main value of GIL removal proposals and implementations has been to spur the core team to speed up single-core code.
While I'm all for increasing single-threaded performance [side rant], is this really a good argument? My understanding is that Moore's Law only currently holds because of the existence of more cores. Clock frequency has stagnated since 2004[0]. I mean, feature size (Moore's Law) still shrinks[1], but IPS/core looks significantly flatter than total IPS[2]. Aren't most of our performance gains from multicore? And as we push further in this direction (e.g. GPUs taking over compute), doesn't this make things even worse for Python? (Yes, I understand that you can make C/CUDA calls, but doesn't the GIL cause problems here as well, and prevent a native Python solution?)
[side rant] Still frustrated by things like the difference in speeds between math.sqrt(2.) (~0.002s/10k), np.sqrt(2.) (~0.010/10k), and torch.sqrt(torch.tensor([2.])) (~0.039/10k). I know there's more going on in terms of capabilities, but the large differences are of course frustrating.
That difference (at least in the math->numpy case) is entirely down to conversions (or at least it was when I profiled it ~7 years ago). If you're careful with types (i.e. avoiding converting from float to numpy scalar values), then the difference disappears.
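The conversion overhead is easy to see with `timeit`. A quick sketch (assuming NumPy is installed; the exact numbers will vary by machine, but the gap comes from ufunc dispatch plus the float-to-numpy-scalar conversion on every call):

```python
import math
import timeit

import numpy as np

x = 2.0

# math.sqrt operates directly on a Python float
t_math = timeit.timeit(lambda: math.sqrt(x), number=10_000)

# np.sqrt on a plain float pays for ufunc dispatch and a
# float -> numpy scalar conversion on every single call
t_np = timeit.timeit(lambda: np.sqrt(x), number=10_000)

print(f"math.sqrt: {t_math:.4f}s   np.sqrt: {t_np:.4f}s")
```

On array inputs the comparison flips, of course: `np.sqrt` over a large array amortizes the dispatch cost across every element.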
> if any c extension is loaded which is incompatible with running lockless?
It'd be slightly magical to do. You'd probably have a giant RWLock, which is used for checking whether you're in the "lock free" mode. But at least almost all code could hit it only for read, and it could go away one day.
GIL is one of the things that make Python an annoyance to work with. In saner languages, you can handle multiple requests at the same time, or easily spin something off in a thread to work in the background. In Python you can't do this. You need to duplicate your process, then pay the price of memory usage and the other costs multiple processes bring: communication between processes is harder, and pre-computed values are no longer shared, so you need something external again. To deploy your app, you end up with 10 different deploys because each of them has to have a different entry point and a separate task to fulfill.
> GIL is one of the things that make Python an annoyance to work with
For your particular usecase, yes. Personally I've been using Python for like 20 years for various tasks and so far never once got really bothered by it. Worst case was having to wait somewhat longer for things to complete. For my case: still worth it compared to making things multithreaded. And async fixed the rest. And the things which I actually need to be fast aren't usually in Python anyway. I'm not saying the GIL should stay, it's just that it doesn't seem as much of a problem in the general land of Python. Or in other words: how many Python users out there even know what the GIL means and does?
The use case they are describing is a standard web server or web application. That's a pretty important and widely applicable use case to dismiss out of hand as "your particular usecase".
If that’s too limiting, preforking and other forms of process-based parallelism are a tried and true approach that has been used for years to run python, ruby, PHP, and once upon a time Perl web applications at enormous scale. The difference between threads and processes on Linux is relatively minor.
Saying that python doesn’t work for web application use cases because of the GIL is frankly sort of bizarre given the large number of python web applications in the wild chugging along delivering value.
> The GIL is not held during IO, which is what most web applications and web servers should be spending the vast majority of their time doing.
While this has been oft-repeated for years, more or less language-independently, I have become convinced it no longer accurately describes ruby on rails apps. People still say it about ruby/rails too though. But my rails web apps are spending 50-80% of their wall time in cpu, rather than blocking on IO. Depending on app and action. And whenever I ask around for people who have actual numbers, they are similar -- as long as they are projects with enough developer experience to avoid things like n+1 ORM problems.
I don't have experience with python, so I can't speak to python. but python and ruby are actually pretty similar languages, including with performance characteristics, and the GIL. Python projects tend to use more stuff that's in C, which would make more efficient use of CPU, so that could be a difference. (Also not unrelated to what we're talking about!)
But I have become very cautious of accepting "most web applications are spending the vast majority of their time on io blocking rather than CPU" as conventional wisdom without actually having any kind of numbers. vast majority? I would doubt it, but we need empirical numbers.
There’s a difference between spending most of your time in CPU, and being CPU limited. If I’m serving 10 requests/second then even if I’m 90% CPU and 10% IO, it doesn’t matter because I’m 99% idle.
My mental model for a very high-interpreter overhead language like Ruby or Python for webdev, is that it is appropriate for sites that don’t see that much absolute interactive traffic. e.g. it’s good for a blog where you can put a cache in front of it, or an intranet service where you control the number of users. I would never use Python for a fully interactive high traffic service sitting on the open internet. There are languages that are much better at that.
> If I’m serving 10 requests/second then even if I’m 90% CPU and 10% IO, it doesn’t matter because I’m 99% idle.
I get your point I think, but that depends on the capacity of the host you are on, and how much the total amount of CPU is, not just the proportion vs IO.
Rails may not be a good comparison to Python, perhaps it is especially a CPU hog, but I have definitely seen Rails apps for which 90% CPU and 10 rps would saturate the host's CPU, yeah.
I mean, the math is: if you have a 111ms response time, 90% of which was spent on CPU (90% of 111 is ~100ms), and you have 10 requests per second (100ms * 10 == 1 second), you are now saturating a single-core CPU, right? Those are not crazy numbers.
But of course Rails has a GIL too -- so you can be "cpu limited" with spare CPU available on other cores, depending on how you've set things up -- that's the original conversation topic here, right? How the GIL may or may not complicate attempts to make efficient use of resources?
For what we're talking about, the issue I guess is whether they would be CPU limited with the GIL but not without the GIL. Which seems plausible.
(And of course Rails is very very commonly used "for a fully interactive high traffic service sitting on the open internet" -- so I don't see why python, a language with pretty similar relevant characteristics, couldn't or shouldn't be? Despite having little python experience, the issues with GIL are something very very similar in Ruby/Rails, is why I'm in the conversation)
It's easy to start getting confused about what we're talking about here, or be talking about different things at once.
The dismissiveness really goes the other way. Pythons like IronPython and Jython don't have a GIL. CPython does because it's primarily a glue language for extensions that might not be thread-safe. Web apps were given huge accommodation with async, so you can't say their needs are being dismissed. Why must we break the C in CPython for a use-case that could use one of the GIL-free Pythons?
That's somewhat out of context. With the bit you quoted I meant "sure, working around the GIL by implementing a web server in that particular way is annoying". I'm not saying that "web server" as a whole is not important or not widely applicable, merely that amongst all other usecases and applications of Python out there, web servers are just one of many. And the particular implementation stated, like "10 different deploys", is an even smaller subset of that one, and as explained by fellow comments, probably not the most appropriate one.
It is, until the GIL bites you in the ass. As it is, you get different behavior if your call goes out to external code vs. being pure Python. Note that you often don't know whether a random function call is pure Python or wraps an external library, so you effectively get random behavior.
The time it got me was a thread whose only job was to time out another call. Tests worked great, but the timeout didn't work in production because the call it wrapped went out to C code, so nothing could run until that call returned. We even still got the timeout error in the logs, and it looked like it was working (it even threw away the valid results it had waited for), but not at the time of the timeout: only after the call finally returned, a few hours later.
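The failure mode described above is easy to reproduce in miniature. A hedged sketch, using `time.sleep` as a stand-in for the long-running wrapped call: the `concurrent.futures` timeout returns control to the caller, but nothing actually interrupts the underlying call, which keeps running to completion (a real GIL-holding C call is worse still, since it blocks every other thread too).

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

def blocking_call():
    # Stand-in for a long call into C code. time.sleep at least
    # releases the GIL; a C call that holds it would also freeze
    # every other Python thread while it runs.
    time.sleep(0.5)
    return "late result"

pool = ThreadPoolExecutor(max_workers=1)
future = pool.submit(blocking_call)
try:
    future.result(timeout=0.05)
    timed_out = False
except FutureTimeout:
    timed_out = True  # the timeout "fires", but the worker keeps going

# shutdown waits for the worker: the "timed out" call still completes
pool.shutdown(wait=True)
print(timed_out, future.result())
```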
> you could handle multiple requests at the same time
To be fair to Python and the GIL, it's totally capable of parallelizing requests when most of the work is network-bound, which is probably the common case. And when the work is CPU-bound, but the CPU-intensive part is written in C, it's also possible for C to release the GIL. So it's really only "heavy computational work directly in Python" programs that are affected by this. (On the other hand, Python applications do naturally expand to look like this over time...)
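To illustrate the network-bound case, here is a minimal sketch using `time.sleep` as a stand-in for a blocking socket read (blocking calls in the standard library release the GIL, so the threads genuinely overlap):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_request(_):
    # A blocking call like socket.recv releases the GIL while it waits;
    # time.sleep behaves the same way for demonstration purposes.
    time.sleep(0.2)
    return "ok"

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fake_request, range(4)))
elapsed = time.perf_counter() - start

# The four 0.2s "requests" overlap: wall time is ~0.2s, not ~0.8s
print(f"{elapsed:.2f}s", results)
```

Swap `time.sleep` for a CPU-bound pure-Python loop and the speedup disappears, which is exactly the distinction the comment is drawing.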
> You need to duplicate your process, then pay the price of memory usage ...
I believe the implementation of the Python multiprocessing package uses fork() on *nix systems which means memory should be copy-on-write.
Processes do have the advantage of being self-contained, meaning if one crashes, it will not take down the entire system. Also, the message passing model can theoretically scale to processes running on separate hardware. Threads cannot do that.
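A minimal sketch of the process-based approach. Note that the `fork` start method used here is POSIX-only and is chosen just to illustrate the copy-on-write point; it is not a recommendation over the platform default:

```python
import multiprocessing as mp

def square(n):
    return n * n

# On *nix, "fork" copies the parent's memory lazily (copy-on-write),
# so read-only data is effectively shared until a process writes to it.
ctx = mp.get_context("fork")
with ctx.Pool(processes=4) as pool:
    results = pool.map(square, range(8))
print(results)
```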
This is your opinion, but many disagree. I personally find processes infinitely easier to deal with than threads; it's not even a discussion in my mind. For me, threads are banned until it's absolutely necessary to use them. When I write code in C++ there are cases where a problem cannot be solved unless threads are used. It's fine to use threads there. For everything else I appeal to either multiprocessing or `async` concurrency.
The other languages are not saner. You are basically saying "Python's GIL is annoying because I can't write parallel, performant code in Python". Python has never been and is not a performant language. It's designed for rapid and easy development.
The multiprocessing+asyncio combination in Python fulfills the goal of utilizing all the resources, albeit at a higher memory cost, but memory is dirt cheap these days. You have a master process and then worker processes. For all the things you would write in Python, where in >90% of cases you are network-latency limited, the paradigm of a master process and worker processes with IPC over unix sockets works extremely well. Set up a web app with FastAPI, a gunicorn master, and uvicorn workers, and it will be plenty fast enough for anything you do.
If you want to reach peak performance single-threaded app with no locks is the way to go, and work being sharded (not shared) among multiple single-threaded apps.
Multi-threaded apps with shared state introduce more complexity, than the performance when compared to multiple single-threaded apps running asyncio event loop.
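The single-threaded-with-an-event-loop pattern looks like this in miniature (using `asyncio.sleep` as a stand-in for a network round-trip):

```python
import asyncio

async def handle(item):
    await asyncio.sleep(0.1)  # stand-in for network I/O
    return item * 2

async def main():
    # Many in-flight requests, one thread, no locks: tasks only
    # interleave at explicit await points, so there's no shared-state
    # race to guard against.
    return await asyncio.gather(*(handle(i) for i in range(5)))

results = asyncio.run(main())
print(results)
```

Sharding work across several such processes, as the comment suggests, then gives you multi-core utilization without any shared mutable state.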
It's relatively simple to make the GIL go away: compiling to some VM that has a good concurrent garbage collector would be one approach. Yes, this will break some assumptions here and there, but that's not too difficult to overcome, especially if you bump the version number to Python 4.
However, that leaves a lot of C code that you can't talk to anymore because the C code requires the old Python FFI. I think this is where the main problem lies.
>It's relatively simple to make the GIL go away: compiling to some VM that has a good concurrent garbage collector would be one approach. Yes, this will break some assumptions here and there, but that's not too difficult to overcome, especially if you bump the version number to Python 4.
"It's easy to lower the air-conditioning costs of Las Vegas: just move the town to New England".
The problem isn't "how to remove the GIL" in the abstract. It's how to remove the GIL while not impacting extensions at all (or as little as possible), keeping single-threaded performance, and having zero impact on user programs.
> It's relatively simple to make the GIL go away: just compile to some VM that has a good concurrent garbage collector would be one approach
Sure, if you don't mind a 50-90% hit to single-threaded performance, or completely abandoning C-API compatibility and having C extensions start from scratch, then there are simple approaches.
If you look at any past attempt to remove the GIL, you will see that satisfying these two requirements of not having terrible single-threaded performance and not needing an almost completely new C-API is actually very complex and takes a lot of expertise to implement.
This might be a dumb question, but why would removing the GIL break FFI? Is it just that existing no-GIL implementations/proposals have discarded/ignored it, or is there a fundamental requirement, e.g. C programs unavoidably interact directly with the GIL? (In which case, couldn't a "legacy FFI" wrapper be created?) I know that the C-API is only stable between minor releases [0] compiled in the same manner [1], so it's not like the ecosystem is dependent upon it never changing.
I cannot seem to find much discussion about this. I have found a no-GIL interpreter that works with numpy, scikit, etc. [2][3] so it doesn't seem to be a hard limit. (That said, it was not stated if that particular no-GIL implementation requires specially built versions of C-API libs or if it's a drop-in replacement.)
> C programs unavoidably interact directly with the GIL?
Bingo. They don't have to, but often the point of C extensions is performance, which usually means turning on parallelism. E.g. Numpy will release the GIL in order to use machine threads on compute-heavy tasks. I'm not worried about the big 5 (numpy, scipy, pandas, pytorch, and sklearn), they have enough support that they can react to a GILectomy. It's everyone else that touches the GIL but may not have the capacity or ability to update in a timely manner.
I don't think this is something which can be shimmed or ABI-versioned either. It's deeeep and touches huge swaths of the CPython codebase.
> or is there a fundamental requirement, e.g. C programs unavoidably interact directly with the GIL?
C programs can both use the GIL for thread safety and make assumptions about the safety of interacting with a Python object.
Some of those assumptions are not real guarantees from the GIL, but in practice are good enough; they would no longer be good enough in a no-GIL world.
> I know that the C-API is only stable between minor releases [0] compiled in the same manner [1], so it's not like the ecosystem is dependent upon it never changing.
There is a limited API tagged as abi3[1] which is unchanging and doesn't require recompiling, and every attempt to remove the GIL so far would break it.
> so it's not like the ecosystem is dependent upon it never changing
But the wider C-API does not change much between major versions, it's not like the way you interact with the garbage collector completely changes causing you to rethink how you have to write concurrency. This allows the many projects which use Python's C-API to relatively quickly update to new major versions of Python.
> I have found a no-GIL interpreter that works with numpy, scikit, etc. [2][3] so it doesn't seem to be a hard limit.
The version of nogil Python you are linking to is the product of years of work by an expert funded to work full time on this by Meta; the work draws on many previous attempts to remove the GIL, including the "gilectomy". Also, you are linking to the old version based on Python 3.9; there is a newer version based on Python 3.12[2].
This strays away from the points I was making, but if this specific attempt to remove the GIL is adopted, it is unlikely to be switched over in a "big bang", e.g. Python 3.13 followed by Python 4.0 with no backwards compatibility for C extensions. The Python community does not want to repeat the mistakes of the Python 2 to 3 transition.
So far more likely is to try and find a way to have a bridge version that supports both styles of extensions. There is a lot of complexity in this though, including how to mark these in packaging, how to resolve dependencies between packages which do or do not support nogil, etc.
And even this attempt to remove the GIL is likely to make things slower in some applications. In terms of real-world performance, some benchmarks such as MyPy show a nearly 50% slowdown, and there may be even worse edge cases not yet discovered. In terms of lost development, the Faster CPython project will be unlikely to land a JIT in 3.13 or 3.14 as they currently plan.
>> However, that leaves a lot of C code that you can't talk to anymore because the C code requires the old Python FFI. I think this is where the main problem lies.
This is exactly the problem, but people have a hard time grasping this because most people interacting with Python have no understanding of how C code interacts with Python, or don't understand the C module ecosystem. I'm not sure if the Python community has a good accounting of this either because I don't recall seeing much quantitative analysis of how many modules would need to be updated etc.
This would help compare with the Python 2 to 3 conversion efforts. Even then, the site listing (shaming?) popular modules' compatibility made a mid-to-late appearance in the process of killing Python 2. Quantification of module updates is an obvious thing to have from the get-go for anyone looking to follow through on removing the GIL, but it's not a fun task.
This needs more thinking but how about a hybrid approach, where you have Thread objects, and GILFreeThread objects?
The Thread objects work with old code, but run more slowly.
The GILFreeThread objects are fast.
If an object is passed from a Thread to a GILFreeThread or the other way around, then special safety code is attached to the object so that manipulating the object from the other side doesn't cause issues.
The advantage is that now the module implementers have time to migrate from the old system to the new system. And users can work with both the old modules and "converted" modules in the same system, with minor changes.
That sounds like a maintenance and stability nightmare, if it's even possible. You are effectively red/blue splitting the entire codebase. PyObject and the GIL touch everything in the codebase.
> But the red/blue splitting happens behind the scenes, so it's different.
Respectfully, I don't believe you have spent any appreciable time looking at the CPython source code. If you had, you would understand how unreasonable this expectation is. I don't say this to tear you down, I say this to convey the magnitude of what you are describing. It would involve touching tens of thousands of LoC. You are talking about a multi-million dollar project that would result in a ton of near-duplication of code.
The red/blue is inescapable because you have to redefine PyObject to have two flavors, PyObject with GIL and GilFreePyObject. You now have to check which one you are dealing with constantly.
> You now have to check which one you are dealing with constantly.
No, because if you're running inside a Thread you will know that you will see only PyObjects, whereas if you're running inside a GilFreeThread you will know that you will only see GilFreePyObjects.
If you're manipulating the PyObject (necessarily from a Thread) then there will be behind-the-scenes translation code that will manipulate the corresponding GilFreePyObject for you. But you don't have to know about it.
What exactly does "running inside a Thread/GilFreeThread" in the context of the cpython runtime mean? You pretty much need an entire copy of the virtual machine code.
These are C structs we are talking about here, not some Rust trait you can readily parameterize over abstractly. That either means lots of manual code duplication, or some gnarly preprocessor action. Both are a maintenance nightmare.
Yes, the assumption is that writing a "double-headed Python" runtime is far less work than converting the entire ecosystem to a new Python runtime.
I think this is the correct view, because at this moment people are writing various approaches in an attempt at getting rid of the GIL. It's the ecosystem of modules that's the real problem, where you want to basically put in as little effort as possible per module, at least initially.
This sounds a bit like COM and its apartment-threaded vs. free-threaded objects. The "special safety code" in that case is a proxy object that sends messages to the thread that owns the actual object when its methods are invoked.
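A toy sketch of that proxy idea in Python (the class names are hypothetical, and real COM marshalling is far more involved): method calls are serialized onto the single thread that owns the actual object via a queue, so the object itself never sees concurrent access.

```python
import queue
import threading

class Counter:
    """The 'apartment-threaded' object: only its owner thread touches it."""
    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1
        return self.value

class ThreadProxy:
    """Forward method calls to the one thread that owns the target."""
    def __init__(self, target):
        self._target = target
        self._calls = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def _run(self):
        # Owner thread: drain the call queue, invoking methods serially
        while True:
            name, args, reply = self._calls.get()
            if name is None:
                return
            reply.put(getattr(self._target, name)(*args))

    def call(self, name, *args):
        reply = queue.Queue()
        self._calls.put((name, args, reply))
        return reply.get()  # block until the owner thread answers

    def close(self):
        self._calls.put((None, (), None))

proxy = ThreadProxy(Counter())
results = [proxy.call("increment") for _ in range(3)]
proxy.close()
print(results)
```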
~40 of the programs I am responsible for are single threaded. They were relatively quick to develop and were made by Electrical Engineers rather than career/degreed programmers.
2 programs use multithreading; I had to do that. The learning curve was not a huge deal, but the development time adds at least hours. In my case, days (due to testing).
I imagine it's too hard to have an optional flag at the start of each program that lets the user decide?
The problem is that Python types are not thread safe, so you have to jump through more hoops to have safe parallelization in Python. It seems these changes would make writing multithreaded code much easier.
It's not like CPython is depriving anyone of GIL-free Python. IronPython and Jython are GIL-free. You can have it yesterday.
CPython maintains Python's original purpose as a glue language. You know how much worse CPython would be for that use-case without the GIL.
GVR has said he'll support removing the GIL as long as there's no performance hit. Otherwise, it's simply asking for a fundamental change of direction and too much sacrifice from everyone who needs CPython specifically, just to improve the productivity of people who could use a different Python.
Not for Python, I feel. In sheer volume, the vast majority of my Python programs are single-threaded. I want my programs to be very quick when they run.
Those that are multi-threaded are seeing minor to medium load.
If expecting extreme load (like Twitter scale), then Python is usually not the answer (rather go to a statically typed language like Java, Go, Rust etc).
Well, yes, that's pretty much obvious: the majority of Python programmers do not care; those that cared have moved on, as that is easier than changing the language.
And I'll be the first to admit that improving single-threaded performance has a higher priority than removing the GIL.
> Not for Python, I feel. In sheer volume, the vast majority of my Python programs are single-threaded.
Yes, they are single threaded, because using multiple threads brings very little benefit in most cases...
> If expecting extreme load (like Twitter scale), then Python is usually not the answer (rather go to a statically typed language like Java, Go, Rust etc).
So that means we shouldn't get any performance improvements, because there are faster languages out there?
Yes, with ML powered compilers recognizing what you are trying to do and generating the actual multithreaded code for you.
And it won't be multithreaded code like you know it, in the sense of os specific threading code with context switching and what not. It will be compiled compute graphs targeted at specific ML hardware, likely with static addressing.
My point of view is that anyone who wants to write multithreaded code, shouldn't be trusted to. Making it easier for people to justify this kind of footgun is a problem.
Also, no matter how much you wish it otherwise, retrofitting concurrency on an existing project guarantees that you'll wind up with subtle concurrency bugs. You might not encounter them often, and they're hard to spot, but they'll be there.
Furthermore existing libraries that expect to be single-threaded are now a potential source of concurrency bugs. And there is no particular reason to expect the authors of said libraries to have either the skills or interest to track those bugs down and fix them. Nor do I expect that multi-threaded enthusiasts who use those libraries in unwise ways will recognize the potential problems. Particularly not in a dynamic language like Python that doesn't have great tool support for tracking down such bugs in an automated way.
As a result if "no GIL" ever gets merged, I expect that the whole Python ecosystem will get much worse as well. But that's no skin off of my back - I've learned plenty of languages. I can simply learn one that hasn't (yet) made this category of mistake.
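For what it's worth, the kind of bug being described is easy to manufacture. A contrived sketch, where the `time.sleep` deliberately widens the read-modify-write window so the lost update shows up reliably instead of once in a blue moon:

```python
import threading
import time

counter = 0

def unsafe_increment(times):
    global counter
    for _ in range(times):
        tmp = counter        # read
        time.sleep(0.01)     # the other thread runs here (sleep yields the GIL)
        counter = tmp + 1    # write back a now-stale value

threads = [threading.Thread(target=unsafe_increment, args=(5,))
           for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Two threads x 5 increments "should" give 10; lost updates give less
print(counter)
```

In real code the window is a few nanoseconds rather than 10ms, which is exactly why these bugs survive testing and surface in production.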
My deep interest is multithreaded code. For a software engineer working on business software, I'm not sure if they should be spending too much time debugging multithreaded bugs because they are operating at the wrong level of abstraction from my perspective for business operations.
I'm looking for an approach to writing concurrent code with parallelism that is elegant and easy to understand and hard to introduce bugs. This requires alternative programming approaches and in my perspective, alternative notations.
One such design uses monotonic state machines which can only move in one direction. I've designed a syntax and written a parser and very toy runtime for the notation.
And your approach can be built into a system that does multi-threading away from Python, thereby achieving parallelism without requiring that Python supports it as well.
That's basically what all machine learning code written in Python does. It calls out to libraries that can themselves parallelize, use the GPU, etc. And then gets the answer back. You get parallelism without any special Python support.
Just to add a bit of my opinion after reading your comment in the context of this thread and not to the merit of your idea. You are precisely the type of person I'd keep very very far away from multithreading in any business software project and also why I advocate the GIL to stay. If you want to do that, go solo in your own time, or try apply for a research position in some giant tech Co.
>My point of view is that anyone who wants to write multithreaded code, shouldn't be trusted to. Making it easier for people to justify this kind of footgun is a problem.
Out of curiosity, have you done any Rust programming and used Rayon?
It's hard to convey how easy and impactful multi-threading can be if properly enclosed in a safe abstraction.
Python is dynamic AF and Rust's whole shtick is compile-time safety. Python was built from the ground up to be dynamic and "easy", Rust was meticulously designed to be strict and use types to enforce constraints.
It's hard to convey how difficult it would be to retrofit python to be able to truly "enclose multithreading in a safe abstraction".
I have only read about and played a tiny bit with Rust. But as I noted at https://news.ycombinator.com/item?id=36342081, I see it as fundamentally different than the way people want to add multithreading to Python. People want to lock code in Python. But Rust locks data with its compile-time checked ownership model.
>My point of view is that anyone who wants to write multithreaded code, shouldn't be trusted to. Making it easier for people to justify this kind of footgun is a problem.
Did you know that Coverity actively REMOVED checks for concurrency bugs?
It turns out that when the programmer doesn't understand what the tool says, managers believe the programmer and throw out the tool. Coverity was finding itself in situations where they were finding real bugs, and being punished for it by losing the sale. So they removed the checks for those bugs.
I'll revisit my opinion of multithreaded code when things like that stop happening. In the meantime there are models of how to run code on multiple threads that work well enough with different primitives. See Erlang, Go, and Rust for three of them. Also, if you squint sideways, microservices. (Though most people set up microservices in a way that makes debugging problematic. Topic for another day.)
> As an example, for many years we gave up on checkers that flagged concurrency errors; while finding such errors was not too difficult, explaining them to many users was.
Actually it just meant that this is a tired old argument from when C/C++ programmers were new to multithreaded code.
Now we have languages and language facilities (consider Rust, Haskell, and others) to make it much safer. Same with green threads and what Go and now Java does.
And yet, the actual proposals for how to do multithreading in Python are essentially the same as the old C/C++ approaches. And since that hasn't changed, there is no reason to expect better results.
This just posted on the Python forum is a brilliant rundown of the conflicting "Faster Python" and "No GIL" projects, and a proposal (plus call for funding) for a route forward.
I think everyone would agree that trying to combine both would be ideal!
It is from the person most involved in the faster-with-GIL effort, and its recommendation is to prioritize that effort in any case, and if the resources are available for that and no-gil, do both.
Not that I disagree with the recommendation, but one of the sides saying “as long as resources make us choose, choose my side” is...not really surprising.
Not surprising, but I'm very happy they are trying to find a route forward for both. That I commend.
I think from memory "Faster Python" is Microsoft funded and "No GIL" is funded by Facebook. If they can find a way to fund a combined effort that would be good.
I suspect the conflicting funding also adds to the general political difficulty around this.
> There is also the argument that it could "divide the community" as some C extensions may not be ported to the new ABI that the no GIL project will result in. However again I'm unconvinced by that, the Python community has been through worse (Python 3) and even asyncIO completely divides the community now.
I think the fact that you can name two other recent things which have divided the community is a solid argument for being a least a little gunshy about making big, breaking changes. There's the cost of the changes themselves, but there's also a cost to the language as a whole to add yet-another-upheaval.
Performance is important, but not breaking things is also important. I can understand the appeal of doing something suboptimal (but better than current) in favor of not introducing a bunch of harder to predict side effects, both in code and the community.
Do sub interpreters actually work with c extensions? I get the extension API has long supported it. However, I wonder if in practice extensions rely on process global state to stash information.
If so, sub interpreters invite all kinds of nasty bugs. Keep in mind that porting the most popular extensions is an easy exercise so the more interesting question is how this hidden majority of extensions fares.
The other thing I don't get is that the whole sub interpreters thing seems to totally break extension modules as well: https://github.com/PyO3/pyo3/issues/2274. In theory parts of sub-interpreters have been around for a while and it just happens that every extension module out there is incompatible with it because no one used it. But if it's going to become the recommended way to do parallelism going forward then they'll have to become compatible with it.
The serialization thing is also a huge issue. Half of the time I want to use multiprocessing I end up finding that the serialization of data is the bottleneck and have to somehow re-architect my code to minimize it.
I would much prefer a world in which asyncio is 2x faster and can benefit from real parallelism across threads. Libraries like anyio already make it super easy to work with async + threads. It would make Python a viable option for workloads where it currently just isn't.
In particular, note how everything is declared static, i.e. only one instance of that data item will exist. If there are multiple interpreters then there needs to be one instance per sub-interpreter. That means no more static, and initialisation has to be changed to attach these to the module object, which is then attached to a specific interpreter instance. It also means every location that needed to access a previously static item (which often happens) has to change from a direct reference to new APIs chasing back from objects to get their owning module and then get the reference. That is the code churn the PyO3 issue is having to address. One bonus, however, is that you can then cleanly unload modules.
This may still not be sufficient. For example I wrap SQLite and it has some global items like a logging callback. If my module was loaded into two different sub interpreters and both registered the logging callback, only one would win. These kind of gnarly issues are hard to discover and diagnose.
Removing the GIL also won't magically help. I already release it at every possible opportunity. If it did go away, I would have to reintroduce a lock anyway to prevent concurrency in certain places. And there would have to be more locking around various Python data structures. For example if I am processing items in a list, I'd need a lock to prevent the list changing while processing. Currently the GIL handles that and ensures fewer bugs.
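A minimal sketch of that kind of explicit locking (the data and work are made up, purely for illustration):

```python
import threading

items = list(range(1000))
total = 0
lock = threading.Lock()

def consume(chunk):
    global total
    for x in chunk:
        # The read-modify-write on shared state spans several bytecodes,
        # so it needs a lock today anyway - and certainly without a GIL.
        with lock:
            total += x

# Split the list into 4 disjoint stripes, one per thread.
threads = [threading.Thread(target=consume, args=(items[i::4],)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With the lock, the final `total` is always `sum(range(1000))` regardless of how the threads interleave.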
I've also experienced the serialization overhead with multiprocessing. I made a client's code so much faster that any form of Python concurrency was slower because of all the overhead. I had to rearchitect the code to work on batches of data items instead of the far more natural one at a time. That finally allowed a performance improvement with multiprocessing.
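That batching idea can be sketched like this (the per-item work, worker count, and batch size are all made up):

```python
from multiprocessing import Pool

def chunked(seq, size):
    # One task per batch: pickling/IPC cost is paid per batch, not per item.
    return [seq[i:i + size] for i in range(0, len(seq), size)]

def process_batch(batch):
    return [x * x for x in batch]   # stand-in for the real per-item work

def run(items, workers=4, batch_size=250):
    with Pool(workers) as pool:
        out = []
        for partial in pool.map(process_batch, chunked(items, batch_size)):
            out.extend(partial)
        return out

if __name__ == "__main__":
    squares = run(list(range(1000)))
```

The batch size is the tuning knob: too small and serialization dominates again, too large and you lose parallelism.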
The GIL has been a blocker for many years. It's nice that the team is making progress, of course. IMHO it's one of those band-aids they need to rip off.
I was listening to the interview Lex Fridman did with Chris Lattner a week or so ago. Very interesting discussion of his project Mojo, which intends to build a new language that is backwards compatible and a drop-in replacement for Python, with opt-in strict typing, better support for native/primitive types where that makes sense, easier integration with hardware optimizations, and of course no GIL. The idea is that the migration path for existing code is that it should just work, and then you optimize it and provide the compiler with type hints and other information so it can do a better job. Very ambitious roadmap and I'm curious to see if they'll be able to deliver.
The main goal seems to be to enable programmers to do the things you currently can't do in Python because it's too slow, without running into a brick wall in terms of performance.
I mostly work with JVM languages and a few other things but I occasionally do a bit with Python as well. I've always liked it as a language but I'm by no means an expert in it. I recently spent a day building a simple geocoder and since I know about the GIL, I went straight for the multiprocessing library and did not bother with threads. IMHO there's absolutely no point in attempting to use threads in Python with the GIL in place. I needed to geocode a few hundred thousand things in a reasonable time frame, so all I wanted to do was use a few different processes concurrently so I could cut the runtime down to something reasonable.
Python is ok for single threaded stuff but you run into a brick wall doing anything with multiple processes or threads and juggling state. In the end I just gave up and wrote a bunch of logic that splits the input into files, processes the files with separate processes, waits for that to finish, and then combines the output files. Just a lot of silly boilerplate and abusing the file system for sharing state. It does what it needs to but it feels a bit primitive and backwards and I'm not proud of the solution.
Removing the GIL, adding some structured concurrency, and maybe some other features, would make Python a lot more capable for data processing. And since a lot of people already use Python for that sort of thing, I don't think that would be such a bad thing. Data science and data processing are the core use case for Python. I don't think people actually care a lot about raw Python performance. It's never been that great to begin with. If it's performance critical, it's mostly being done via native libraries already.
> Data science and data processing are the core use case for python
indeed. one would almost hope that all the different aspects of "performance" and "concurrency", their memory, disk or network profile etc. got their own dedicated labels. The conflation of these distinct dimensions is a major source of confusion (and thus a waste of bandwidth).
Mostly IO bound. In this case the actual limit was the API rate limiting of the geocoder I was using. A couple of thousand calls per minute. Quite a bit more than what you can do with a single thread but not quite what a decent laptop would be able to do.
Python has blocking IO. So a network call blocks the process. So if you have 250ms response times, you are doing 4 requests per second. Without the GIL, threads would be a good way to scale that. With 10 threads you should be able to do 2400 requests per minute. But with the GIL forget about that. With co-routines and non blocking IO, it could all be single threaded. There are some ways to do that with python of course but then you are going to have to use some specialized frameworks and step away from the standard library a bit.
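The coroutine version of that rate-limited fan-out looks roughly like this; `asyncio.sleep` stands in for the network call (a real client would use something like aiohttp), and the semaphore caps in-flight requests the way a thread pool size would:

```python
import asyncio

async def fetch(item, sem):
    async with sem:
        await asyncio.sleep(0.01)        # pretend network round trip
        return f"geocoded:{item}"        # stand-in for a real geocoder response

async def run_all(items, limit=10):
    # At most `limit` requests are in flight at once.
    sem = asyncio.Semaphore(limit)
    return await asyncio.gather(*(fetch(i, sem) for i in items))

results = asyncio.run(run_all(range(20)))
```

With 10 in-flight requests at 250 ms each you'd get the ~2400 requests/minute mentioned above, all on a single thread.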
Using the shell vs. the multiprocessing module is the same difference. I've done both with Python. I have a slight preference for the multiprocessing module so I don't have to deal with bash weirdness on top of all the Python boilerplate. The first time I did stuff like that with Python was probably 13 years ago or so, around 2010. Not a lot has changed or progressed on this front in Python since then. The GIL made scaling this unnecessarily hard then and it still does. Threads are a no-go because of it, the standard library mostly offers blocking IO, and when you go with multiprocessing things like shared memory are very limited, so you end up using files or databases for state.
Things like node.js, Go, or Kotlin would handle this type of workload with a lot less fuss. With Kotlin, I'd be using coroutines and some multi-threaded scope to launch them in. Or I could build on some of the Java internals. Or a mix of both. I'd be able to write similar code and choose between blocking or non-blocking IO. I'd be forking and joining things. Maybe use channels and flows to pass data around and rely on back pressure to keep things progressing at a more or less optimal rate. Not an option for this project as my client is just more Python focused, and that's fine. But I'm just signaling that Python is a bit out of its depth here. I'm singling out Kotlin only because I've spent a lot of time with it; if you have a hammer, everything looks like a nail.
I think python could be so much better but that starts with modernizing its internals. Mojo seems like a huge step in the right direction.
Single threaded performance is still more important than multi-threaded. Most applications are single threaded, and single threaded programs are much easier to write and debug. Removing the GIL from python will not change that.
If no-GIL has a 10% single thread performance hit, that means that essentially all my existing python code would be that much worse.
Maybe that's just my bubble, but I see much more python in data science projects than in web servers. And in (python) data science even your file reading/writing code quickly gets CPU bound.
That's because you are doing it wrong. You'll need to split every step of your data science pipeline into a microservice, then put it in the cloud for resilience. Then the application will be so fast that it is no longer CPU bound but I/O bound.
That's going to be CPU-bound in numpy's C extensions rather than Python itself, one would hope. The worst of all worlds is that we get a 10% perf cut to python execution and numpy breaks because the C API is ripped up.
>If no-GIL has a 10% single thread performance hit, that means that essentially all my existing python code would be that much worse.
So? Especially since the "Faster Python" team already made Python 3.11 "10-60% faster than 3.10", and 3.12 is faster still, whereas their overall plan is to get it 2-5 times faster compared to 3.9.
So at the worst case, with a 10% hit, you'd balance out the 3.11 speed, and your code would be as fast as 3.10.
>But your software would still run 10% slower than it needs to
There's no absolute objective "needs to" or even any static baseline. Python can have, and often has had, a performance regression that drops your code's speed by 10% at any time. It's no big deal in itself.
Also consider a further speedup of e.g. 50% in upcoming versions (they have promised more).
If you're OK with the X speed of today's Python, you should be ok with X + 40% - even if it's not the X + 50% it could have been due to the 10% GIL's removal toll.
Because it's a global solution to a local problem.
With threads, I can encapsulate the use of threads in a class, whose clients never even notice that threads are in use. Sure, threads are a global resource too, but much of the time you can get away with pretending that they're not and create them on demand. Not so with multiprocess. If you use that, then the whole program has to be onboard with it.
Threads work great in Python. Well not for maximising multicore performance, of course, but for other things, for structuring programs they're great. Just shuttle work items and results back and forth using queue.Queue, and you're golden - Python threads are super reliable. And if the threads are doing mostly GIL-releasing stuff, then even multicore performance can be good.
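The queue.Queue pattern being described, roughly (worker count, the doubling "work", and sentinel shutdown are all illustrative):

```python
import queue
import threading

NUM_WORKERS = 4

def worker(tasks, results):
    while True:
        item = tasks.get()
        if item is None:            # sentinel: time to shut down
            tasks.task_done()
            break
        results.put(item * 2)       # stand-in for real work
        tasks.task_done()

tasks = queue.Queue()
results = queue.Queue()
workers = [threading.Thread(target=worker, args=(tasks, results))
           for _ in range(NUM_WORKERS)]
for w in workers:
    w.start()
for i in range(10):
    tasks.put(i)
tasks.join()                        # block until every item is processed
for _ in workers:
    tasks.put(None)                 # one sentinel per worker
for w in workers:
    w.join()
out = sorted(results.get() for _ in range(10))
```

Both queues are thread-safe, so none of this code needs its own locks.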
>Not so with multiprocess. If you use that, then the whole program has to be onboard with it
Huh? In Python you just need a function to call, and multiprocess will run it wrapped in a process from the pool, while api-wise it would look as it would if it was a threadpool (but with no sharing in the process case, obviously).
So what would the rest of the program be onboard with?
And all this could also be hidden inside some subpackage within your package, the rest of the program doesn't need to know anything, except to collect the results.
multiprocessing needs to run copies of your program that are sufficiently initialised that they can execute the function, yet no initialisation code must be run that should not be run multiple times.
That means you either use fork - which is a major can of worms for a reusable library to use.
Or you write something like this in your entry point module:
if __name__ == '__main__':
    multiprocessing.freeze_support()
    once_only_application_code()
Suppose I don't realise that your library is using multiprocessing, and I carelessly call it from this two-line script:
>multiprocessing needs to run copies of your program that are sufficiently initialised that they can execute the function, yet no initialisation code must be run that should not be run multiple times.
Huh? As far as I remember multiprocessing just sends pickled versions of the function to run and any of its dependencies (other functions, closures, etc). As long as the function doesn't use global state that's not available when pickled, it's fine. But it doesn't re-initialize your whole program for each process in the pool.
>That means you either use fork - which is a major can of worms for a reusable library to use.
How did we get into a reusable library authoring?
Yes, multiprocessing is not just turnkey to use inside a reusable library you make.
But the context were programs here, or not?
>Or you write something like this in your entry point module:
Hmmm? This is to have it support freezing the script (that is, using a tool to make it a distributable, like PyInstaller). That's not necessarily a use case most have.
> How did we get into a reusable library authoring?
That was always my premise. Maybe I didn't make it clear enough, because I tend to just take it for granted that that's how you write code, in a style that's suited for reuse.
> But the context were programs here, or not?
That's the "global" I was talking about: Code that's using multiprocessing needs to know the context that it's embedded in. Any moment I might grab that piece of code and transfer it to a library of reusable components, because that's how I work - code that starts out as part of a standalone program doesn't necessarily stay that way. Multiprocessing gets in the way of that.
>That was always my premise. Maybe I didn't make it clear enough, because I tend to just take it for granted that that's how you write code, in a style that's suited for reuse.
That's somewhat condescending. You can write code "in a style that's suitable for reuse" without being a library author - well, without publishing public packages anyway. Re-use is not only about some totally generic package that can run under any arbitrary context willy nilly.
And of course there are tons of programs where the parts don't make sense as libraries, because they're tied to the specific functionality and overall design (whether because of the domain logic required or due to optimization or other constraints). You write them to be modular and clean, but not with "arbitrary people running my code in whatever context" in mind.
Not to mention the mountains of purpose-specific throwaway scripts, e.g. in the scientific community especially, where Python is big, there's little regard for reuse (even less so for library building), and it's not because multiprocess is stopping them :)
So, yeah, I'd say, even if not 100% suitable for generic reusable library-style code, it doesn't mean multiprocess can't be applied in a huge number of specific people's problems and codebases.
>Code that's using multiprocessing needs to know the context that it's embedded in.
If you want to speed up your Python program and there's something that can run in parallel with no shared state, you can use multiprocess to run it.
If having it as a "reusable component" that hides away the fact multiprocess is used, and that can be called in any arbitrary context, is your concern, it's a valid one, but then perhaps a specific Python program and its performance is not your main priority. Library writing is, instead :)
Else, it's enough that the user calling multiprocessing knows the function that is to be passed and its dependencies (or lack thereof). Other than that, they don't have to change their top level program's architecture.
I didn't mean to say that no one should ever use multiprocessing. I was laying out the reasons why I don't.
I'm really looking forward to subinterpreters. I think they have great potential for supporting a style of multiprocessing that is both faster and better isolated.
CPython is so hopelessly slow, I wouldn't care about 10%. For most of the stuff written in Python, users don't really care about speed.
The impact won't be on users / Python programmers who don't develop native extensions. It will suck for people who already had a painful workaround for Python's crappy parallelism, but who will now have to have two workarounds for different kinds of brokenness. It still pays off to make these native extensions; however, their authors will create a lot of problems for the clueless majority of Python users, which will likely end up in some magical "workarounds" for problems, introduced by this change, that very few people understand. This will result in more cargo cult in a community that's already on a witch hunt.
Right, and single thread performance won’t matter as much if it becomes easier to implement multithreading. This hurts legacy code, but I imagine it would be worth it in the long run.
It would remain as hard as it has always been. Also threads are very heavy, locking kills performance, and if you don't have GIL, you'll need to manage explicit locks, that will be just as slow but also cause an incredible amount of subtle bugs.
> if you don't have GIL, you'll need to manage explicit locks
You need to do that with multithreaded Python code with the GIL. The GIL only guarantees that operations that take a single bytecode are thread-safe. But many common operations (including built-in operators, functions, and methods on built-in types) take more than one bytecode.
> locking kills performance, and if you don't have GIL, you'll need to manage explicit locks
I was under the impression that the Python thread scheduler is dependent on the host OS (rather than being intelligent enough to magically schedule away race conditions, deadlocks, etc.), so you still need to manage locks, semaphores, etc. if you write multi-threaded Python code. I don't see how removing the GIL would make this any worse. (Maybe make it slightly harder to debug, but at that point it would be in-line with debugging multi-threaded Java/C/etc. code.)
Or would this affect single-threaded code somehow?
In python you always have a lock, the GIL. If you take it away you end up actually having to do synchronization for real. Which is hard and error prone.
> If your program needs an improvement in speed, you can multithread it. The opposite isnt true.
What do you mean by "the opposite"? "If your program doesn't need an improvement in speed, you can't multithread it"? "If you can multithread your program, then it doesn't need an improvement in speed"? Well, yeah, obviously both of those statements are false but they're also quite useless, so who cares?
You can improve performance by moving to a single thread. Pinning work to a single core will improve cache performance, avoid overhead of flushing TLBs and other process specific kernel structures, and more.
> Their argument is that the "sub interpreters" they are adding (each with its own GIL) will fulfil the same use cases of multithreaded code without a GIL, but they still have the overhead of encoding and passing data in the same way you have to with sub processors.
This is smart, though, because (even if it's not great) there's a lot of evidence that it works in practice. Specifically, this is almost exactly what JavaScript does with workers. It's not a great API and it's cumbersome to write code for, but it got implemented successfully and people use it successfully (and it didn't slow down the whole web).
If we could just get efficient passing of object graphs from one subinterpreter to another, which is not in the current plan, I think that would solve a lot of use cases. That would allow producer/consumer-style processing with multiple cores without the serialization overhead of the multiprocessing module.
Removing the GIL seems like it could make things more complicated in many ways by making lots of currently thread-safe code subtly unsafe, but I might be wrong about this. (...in which case it would just make things very slow because everything is synchronized?)
As someone who observed Python core development for many years, a major change to the interpreter REQUIRES core-dev buy in. There have been at least 5 big projects which proposed large changes, they have all been declined.
It is a NIH syndrome, if a big project doesn't originate in the dev team, it will not be accepted.
I have often wondered what the solution to the serialisation of objects between subinterpreters is.
If it's garbage collection that's the problem, I think you could transfer ownership between threads, so the subinterpreter takes ownership of the object and all references to it in the source interpreter are voided.
Alternatively you could do something like Java, where all objects live in a global object allocator; passing things between threads wouldn't require serialisation, just a reference.
This feels like it's playing out as I expected. I followed Python, and the Python community, really closely from 2008-2016 when there were tons of relatively small scale experiments happening. This all happened organically to a large extent and there was no one coordinating a grand vision. It seems like we have a continuation of this giving rise to the concern that there is some battle.
I suspect there will be some butting of heads for a while before they work things out after seeing how the community reacts. All of this could be handled better with some thoughtful proactive engagement, but that's not really how things operate and there is no one to really enforce it.
Where is the progress? They claim "10-60%" improvements over Python 3.9 (I believe) and I don't notice much, partly because Python has always been sped up by C-extensions.
The price is an added complexity of the code base.
I truly don't understand why Python always gets a pass and people applaud every announcement, no matter how trivial or elusive it is.
The GIL effort is another matter. I'd rather have a simple interpreter with no-GIL than this mess of relatively small speedups.
But like other GvR efforts where he presided over a small group of people who did the work, his efforts will of course go in. Like asyncio, the suboptimal "match" statement, the PEG parser (which adds new workloads for other implementations because the de-facto BDFL cannot be bothered with publishing an LALR grammar).
Python is a glue language, people can go elsewhere for speed.
I do hope the dialogue stays cordial, constructive and open rather than becoming distinct entrenched camps - the Python community has a strong and mature community spirit so this seems plausible and not too much wishful thinking.
Much as No GIL would be an adventure, I'm leaning towards the more gradual and stable changes from the FasterPython team and I can see that throwing No GIL into the mix adds complexity at an inopportune moment.
It’s overstating the case to call this a “struggle” between factions; it’s an important discussion with a lot of ramifications (and while it is unresolved a lot of work is stalled).
Oh, boy. Will any of that impact backward compatibility?
I don't develop anything in Python, but it is used by several applications of importance to me. The lack of compatibility between versions is a thing that bites me hard, and I tend to curse Python because of it.
>There is also the argument that it could "divide the community" as some C extensions may not be ported to the new ABI that the no GIL project will result in
I think the arguments are a red herring. It's more rationalizations for not wanting to do it.
I'm going to admit that what I really want to see is a strong push to standardize and fully incorporate package management and distribution into python's core. Despite the work done on it, it's still a mess as far as I can see, and there is no single source of truth (that I know of) on how to do it.
For that matter, pip can't even search for packages any more, and instead directs you to browse the PyPI website in a browser. Whatever the technical reasons for that, it's a user interface fail. Conda can do it!!!!! (as well as just about any package management system I've ever used)
> I'm going to admit that what I really want to see is a strong push to standardize and fully incorporate package management and distribution into python's core. Despite the work done on it, it's still a mess as far as I can see, and there is no single source of truth (that I know of) on how to do it.
Package management is standardized in a series of PEPs[1]. Some of those PEPs are living documents that have versions maintained under the PyPA packaging specifications[2].
The Python Packaging User Guide[3] is, for most things, the canonical reference for how to do package distribution in Python. It's also maintained by the PyPA.
(I happen to agree, even with all of this, that Python packaging is a bit of a mess. But it's a much better defined mess than it was even 5 years ago, and initiatives to bring packaging into the core need to address ~20 years of packaging debt.)
It depends (unfortunately) on what you mean by a Python project:
* If you mean a thing that's ultimately meant to be `pip` installable, then you should use `pyproject.toml` with PEP 518 standard metadata. That includes the dependencies for your project; the PyPUG linked above should have an example of that.
* If you mean a thing that's meant to be deployed with a bunch of Python dependencies, then `requirements.txt` is probably still your best bet.
I meant the second. requirements.txt is a really bad solution for that, and that is the frustration many of us have that have used languages with much better solutions.
Requirements feels like a dirty hack but it does work fine. It has ==, ~=, and >= for version numbers, as well as allowing you to flag dependencies for different target os, etc. And then you can add setup.py if you need custom steps. But yes, it feels dirty to maintain requirements.txt, requirements-dev.txt, etc.
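e.g. a typical pinned file looks something like this (package names and versions are just examples):

```
# requirements.txt
requests==2.31.0                          # exact pin
pyyaml~=6.0                               # compatible release: any 6.x
colorama>=0.4; sys_platform == "win32"    # os-specific environment marker
```

The `; sys_platform == ...` suffix is a standard PEP 508 environment marker, which is how requirements files flag dependencies for different target OSes.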
Poetry is the most common solution that I've seen in the wild. You spec everything using pyproject.toml and then "poetry install" and it will manage a venv of its own. But you still need to tell people to "pip install poetry" as step 0 which is annoying.
If you don't care about deploying python files, and rather just the final product, I'd recommend either nuitka or pyinstaller. These are for bundling your project into an executable without a python runtime needed (--onefile type of options for single file output). Neither supports cross compilation though.
What flow do you use with requirements.txt that gives you reproducible builds across a team and environments? Using ==, ~=, and >= will not give you reproducible builds.
- use pip-compile (from the pip-tools package) to create a lockfile,
- commit the lockfile into git,
- whenever we want to update the dependencies, do it through pip-compile again (if you give it an existing lockfile as output, it will keep what's in there and change only what's required).
Since all our requirements are cross-platform and on PyPI, we can install the same env everywhere.
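Concretely, the flow is something like this (flags from pip-tools; exact invocation may vary by version):

```
$ pip-compile --generate-hashes requirements.in   # writes requirements.txt (the lockfile)
$ git add requirements.in requirements.txt
$ pip-compile -P requests requirements.in         # bump just one package, keep the rest
```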
Hash-pinning with requirements.txt will get you the closest to this, but it's not possible in the general case to have a cross-environment reproducible build with Python. The closest you can hope for is a build that reproduces in the same environment.
This problem is shared by the majority of language packaging ecosystems; the only one I'm aware of that might avoid it is Java.
Rust and Go both have proper lock files... Both of which are good enough to satisfy Nix's requirements for reproducibility, such that it re-uses the lock file hashes. This "it's hard, no one does it right" feels like a cope.
We get (very) close to cross-environment reproducible builds for Python with https://github.com/pantsbuild/pex (via Pants). For instance, we build Linux x86-64 artifacts that run on AWS Lambda, and can build them natively on ARM macOS. (As pointed out elsewhere, wheels are an important part of this.)
This is not raw requirements.txt, but isn’t too far off: Pants/PEX can consume one to produce a hash-pinned lock file.
How do you get a reproducible build in python for the same os/arch? As in, what concrete steps do you take?
This is very easy in nearly every other language that is popular. No one ever answers this clearly in threads like this short of saying “use poetry” which makes my point. I’ve asked many times.
I explicitly said that you can't. Python's packaging ecosystem wasn't designed with reproducibility in mind, and has never claimed to prioritize reproducibility. The best you can do is get close, and hash-pinning gets you pretty close.
I'm not aware of any other major language or language packaging ecosystem that makes reproducibility straightforward. Certainly not Ruby or NPM, and not even brand new ones like Rust's Crates. Java appears to be the closest[1], but is operating with significant advantages (distributing reproducible bytecode to all users, minimizing system dependencies, etc.).
Edit: In addition to hash-pinning, you can instruct `pip` to only install built distributions, i.e. wheels. If you do both hash-pinning and built distributions only, your package installation step _should_ reproduce exactly on machines of the same OS, architecture, and Python version. But again, this is guaranteed nowhere.
You are splitting pedantic hairs in order to avoid talking about the obvious. Python's dependency management is much worse than ruby's and nearly every other popular language.
In ruby you add dependencies to a Gemfile then ...
$ bundle install
$ git add Gemfile Gemfile.lock
and other members of your team can have the same build as you.
I think you're confusing lockfiles with reproducibility. Lockfiles are good, but they don't guarantee reproducibility: a locked (or pinned, hashed, etc.) dependency might always be the exact same source artifact, but it can install in different ways (e.g. due to local toolchain differences, different versions of dependencies, sensitivity to timestamps, sensitivity to user-controlled environment variables, etc.).
Reproducibility is a much harder problem than dependency locking, and (again) I'm not aware of any language level packaging ecosystem that really supports it out of the box.
Python doesn't have reproducible builds, but it does have lockfiles (via hashed and pinned requirements). They're not particularly good (for all the reasons mentioned upthread), but they do indeed exist. If you use them as I've said, then your environment will be approximately as repeatable as with any other language packaging ecosystem (and arguably more so in some cases, since wheel installs are reproducible where gem installs aren't).
> Binary wheels are just archives, I don't see differences in the way they install between different systems.
The subtlety here is in which binary wheel is selected: a particular (host, arch, libc) tuple may cause `pip` to select a more specific wheel for the same version of the package, or even an entirely different wheel. This makes wheels themselves reproducible between systems, but it also means that which wheel gets installed isn't guaranteed.
Thanks, is there any kind of tutorial on how you should use this in a python project?
Edit: I think this is one of the biggest problems someone coming to python has. Python advocates say some version of, "you can roughly do that" but there isn't a clear explanation of how to do it.
Edit 2: I see that the official docs have a Pipenv flow outlined. Is Pipenv the way people do this in python these days?
Those docs say to use Pipenv, or am I looking at the wrong docs? Really not sure why python people can’t articulate a clear flow to follow. It’s all riddles.
> * If you mean a thing that's meant to be deployed with a bunch of Python dependencies, then `requirements.txt` is probably still your best bet.
This is exactly how we got in this mess. Using ``setup.cfg`` or ``pyproject.toml`` for all projects makes this easy as now your deployable project can be installed via pip like every other one.
The terminology here is confusing: the first is the flow that produces a "distribution" (i.e., an sdist or bdist), while the second is the flow that produces an "environment" (i.e., a specific set of packages installed in some prefix).
It's beyond "mess", well into "fiasco", and frankly I'm astounded people think there's a more important issue facing the language right now. Look, for an example from a high-prestige project, at Spleeter, which spends multiple pages of its wiki describing how to install it with Conda and then summarizes "Note: as of 2021 we no longer recommend using Conda to install Spleeter" and nothing else.
What are you smoking? The readme for spleeter clearly shows the two simple commands needed to install -- one being a conda install for system level dependencies and one being pip for the spleeter python package itself.
Python’s dependency management, or lack there of, its import system, and the lack of strong typing really make me hate it. It’s the first language I really felt adept with, but once I learned Go, I never looked back. Every time I have to use python it’s like coding with crayons.
Python isn't perfect, and mostly unsuitable for any system where performance is in consideration.
That leaves everything else, including personal utility scripts and packages I use each day to automate random stuff. And I hugely appreciate how fast and simple it is to develop in Python, unlike certain languages that literally depend on IDEs due to their verbosity and unnecessary cognitive load.
> And I hugely appreciate how fast and simple it is to develop in python
Indeed it’s crazy. A few pip installs and I had a multiprocessing pandas (dask) with a web gui, and a workflow system (also with a web gui), and a pipeline to convert csv to parquet in like 20 lines of code
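For a sense of scale, the core of such a pipeline fits in a handful of lines even with only the standard library. A rough sketch (dask and the web GUIs omitted; `row_total` is a made-up stand-in for the real per-row work, and the "fork" start method assumed here is POSIX-only):

```python
import csv
import io
from multiprocessing import get_context

def row_total(row):
    # Stand-in for the real per-row transformation.
    return sum(int(v) for v in row)

raw = "a,b,c\n1,2,3\n4,5,6\n"
rows = list(csv.reader(io.StringIO(raw)))[1:]  # skip the header row

# Fan rows out across worker processes, dask-style but by hand.
with get_context("fork").Pool(2) as pool:
    totals = pool.map(row_total, rows)
```

dask essentially wraps this pattern (plus chunking, scheduling, and the dashboard) behind a pandas-like API.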
I gave a talk about this at the Packaging Summit during Pycon which was well received, so the team is definitely aware of the problem.
However, the sense I got was that it was going to be a lot of work to “fix Python packaging” which wasn't feasible with an all-volunteer group.
At work, we're migrating away from pip as a distribution mechanism for this reason; I don't expect to see meaningful improvements to the developer experience anytime soon.
This is especially true because pip today is roughly where npm was in 2015, so there's a lot of fundamental infrastructure work (including security) that still needs to happen.
An example of this is that PyPI just got the ability to namespace packages.
That article is about the packaging summit talk on introducing namespaces, not about organizations. In fact, when talking about organizations, it explicitly says:
> But support for namespaces is not part of the new feature.
Not the parent but pipenv is decent, poetry is even better:
- clear separation of dev and production dependencies
- lock file with the current version of all dependencies for reproducible builds (this is slightly different from the dependency specification)
- no accidental global installs because you forgot to activate a virtual environment
- allows installing libraries directly from a git repo, which is very useful if you have internal libraries (pip supports this too, via `git+https://...` URLs)
- easier updates
There it is. The obligatory comment on every Python thread on HN. It's the most popular programming language in the world. Other people can figure it out, apparently.
I think we can be more charitable than this: it's possible to be both immensely popular and to have a sub-par packaging experience that users put up with. That's where Python is.
The trouble is people compare it to greenfield languages of the past few years with nowhere near the scope, userbase or legacy of Python. Long time Python users like me don't have any of the problems that the non-Python users that always post these comments have. It would be nice to have improvements to packaging, sure, but it's always just completely non-constructive stuff like "it's not as easy as <brand new language with no legacy>".
As someone who dealt with Java and Python 20yrs back, I don't think Java is a valid comparison.
Java had a terrible or non existent OS integration story - it didn't even try to have OS native stuff. It was its own separate island that worked best when you stayed on the island. On Linux, Python was included in the OS so you had the two worlds of distro packaging and application development/deployment dependencies already in conflict. Macs also shipped their own Python that you had to avoid messing up. And on Windows Python was also trying to support the standard download-a-setup.exe method for library distribution. Java only ever had the developer dependency usecase to think about.
Before Maven most Java apps just manually vendored all their dependencies into their codebase, or you manually wrangled assembling stuff in place using application specific classpaths and additions to path env vars etc.
Yeah, but the comment you were refuting was pointing out why that is: a later greenfield system in a smaller problem space could learn from earlier systems, while Python had a lot of extra use cases pulling in different directions, plus complications and legacy to overcome. The competing third-party Python packaging projects early on learned a lot of lessons the hard way, which both left behind legacy to clean up and paved the way for other languages to skip ahead, usually with a single blessed solution.
Of course a well resourced language that didn't have to worry about native OS/distro integration and only started solving dependency distribution and management later after learning from others is going to have a better system. It would be a total surprise if it didn't.
I agree with all of this! Ironically, grievances around Python packaging are a function of Python’s overwhelming success; non-constructive complaints about the packaging experience reflect otherwise reasonable assumptions about how painless it should be, given how nice everything else is.
(This doesn’t make them constructive, but it does make them understandable.)
Packaging is a big topic right now, and a lot is happening - that includes a lot of good tool improvements. I think that's one reason for these comments, because it's close to top of mind
I love python. It is my go-to language for just about everything. But that also means that I feel the pain points pretty acutely. And you know what, I'm not alone.
Every single HN thread on python performance, ever:
Person with limited to zero experience with CPython internals> I hate the GIL, why don't they just remove it?
That "just" is doing an incredible amount of heavy lifting. It'd be like saying, "Why doesn't the USA just switch entirely to the metric system?" It's a huge ask, and after being burned by the 2/3 transition, the python team is loath to rock the boat too much again.
The GIL is deep, arguably one of the deepest abstractions in the codebase, up there with PyObject itself. Think about having to redefine the String implementation of a codebase in your language of choice.
Whatever your Monday-morning-quarterback idea on how to pull out the GIL is, I can almost guarantee someone has thought about it, probably implemented it, and determined it will have one or more side effects:
- Reduced single-thread performance
- Breaking ABI/FFI compatibility
- Not breaking ABI immediately, but massively increasing the risk of hard-to-track concurrency bugs that didn't exist before, because GIL
- Creating a maintenance nightmare by adding tons of complexity, switches, conditionals, etc to the codebase
The team has already decided those tradeoffs are not really justifiable.
The GILectomy project has probably come the closest, but it impacts single-thread performance by introducing a mutex (there are some reference type tricks [1] to mitigate the hit, but it still hurts run time 10-50%), and necessitates that any extension libraries update their code in a strongly API-breaking way.
It's possible that over time, there are improvements which simultaneously improve performance or maintainability, while also lessening the pain of a future GILectomy, e.g. anything which reduces the reliance on checking reference counts.
[1] PEP 683 (immortal objects) is probably a prerequisite for any future GILectomy attempts; it looks like it has been accepted, which is great https://peps.python.org/pep-0683/
Removing the GIL will also make some currently-correct threaded code into incorrect code, since the GIL gives some convenient default safety that I keep seeing people accidentally rely on without knowing it exists.
Unavoidable performance costs and reduced safety equals massive friction no matter how it's approached. It's not a question of "are we smart enough to make this better in every way", since the answer is "no, and nobody can be, because it can't be done". It's only "will we make this divisive change".
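One concrete instance of that accidental reliance: in today's CPython, `list.append` is effectively atomic because the GIL serializes access to the list's internals, so the snippet below is correct despite taking no lock of its own. (A free-threaded build needs per-object locking or similar to keep this guarantee; and note that compound operations like `counter += 1` are not atomic even with the GIL.)

```python
import threading

items = []

def worker(n):
    # No explicit lock: each append is effectively atomic because the
    # GIL serializes access to the list's internals in CPython.
    for i in range(n):
        items.append(i)

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# The list ends up with all 40,000 items, never corrupted.
```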
Come to think of it, perhaps the US should have started metrication back in 1980 when the EU enforced it for its members. It takes time, but it is doable - it took Ireland 25 years to fully switch to metric.
The cost of switching increases over time and will increase further and further, making it less and less likely. Would have been way cheaper back then.
There were already patches to "just" remove the GIL for Python 2.0 or thereabouts, over 20 years ago. Strictly technically speaking it's not even that hard, but as you mentioned it comes with all sorts of trade-offs.
Today Python is one of the world's most popular programming languages, if not the most popular. The GIL is limiting and a trade-off in itself of course – designing any programming language is an exercise in trade-offs. But clearly Python has gotten quite far with the set of trade-offs it has chosen: you need to be careful about "just" radically changing a set of trade-offs that have proven themselves quite successful.
You should check out the new nogil project by Sam Gross, which is what's being talked about these days -- he actually successfully removed the gil, but yes with the tradeoffs that you mention. The other projects were, by comparison, "attempts" to remove the gil, and didn't address core issues such as ownership races (which are far harder than making refcount operations atomic).
I indirectly referenced it. IIRC the main problems with his approach are the ABI problem and the mutex overhead. He tries to "solve" the performance hit by offsetting with performance gains elsewhere, but that raises the question "why not take the gains by themselves".
> That "just" is doing an incredible amount of heavy lifting. It'd be like saying, "Why doesn't the USA just switch entirely to the metric system?" It's a huge ask, and after being burned by the 2/3 transition, the python team is loath to rock the boat too much again.
> The GIL is deep, arguably one of the deepest abstractions in the codebase, up there with PyObject itself. Think about having to redefine the String implementation of a codebase in your language of choice.
I am somewhat familiar with the cpython code base, and I talk to some folks involved in some of the many new Python runtimes.
The problem is not that deep, and cpython is not that different from other projects: the GIL is an implementation detail that leaked through the API, and now we have a bunch of people relying on it. The question is what we do about it. The trade-offs in such situations are well known; it's just a question of picking the right one.
> The team has already decided those tradeoffs are not really justifiable.
That's, from my view, the crux of the issue. The TLDR is that GIL-less Python is not a priority for the Python team, so they view any trade-off/compromise as "not justifiable", especially when it comes to added complexity in the code base or whatnot...
> Person with limited to zero experience with CPython internals> I hate the GIL, why don't they just remove it?
People have this reaction because having a single-threaded interpreter in 2023 is just ... embarrassing, whatever the reasons behind it.
> It only takes a couple of minutes to educate yourself on this topic.
You can disagree without assuming that I am not familiar with the subject.
The link you provided is from 2007, so I am not sure how fair it is to quote Guido from then. But I am also not sure the link demonstrates the point you are trying to make; from Guido himself:
> While it is my personal opinion, based upon the above considerations, that there isn't enough value in removing the GIL to warrant the effort.
> From a steering council perspective we effectively view a 703 threading enabled interpreter at a high level as a fork of the CPython VM.
"Fork" and "effort" are different problems. It's reductive to say that effort alone is the tldr.
If the 2007 article and the linked current FAQ seem too old to be relevant, you're not really understanding what the steering committee is saying today.
> "Fork" and "effort" are different problems. It's reductive to say that effort alone is the tldr.
It's very unclear what you mean here, what does that have to do with the problem.
Pep 703 is not a fork of CPython.
Sam had a (very impressive) prototype of a GIL-less cpython. It clearly demonstrates the feasibility and viability of nogil. It can be a drop-in replacement, with the enable/disable-GIL option.
Pep 703 is about integrating it into the main cpython codebase...
> If the 2007 article and the linked current FAQ seem too old to be relevant
I never said that the age of the article makes it irrelevant. What I said is that a lot of things change in 16 years, and it might not represent the current thinking of the same person today. So I was trying to be charitable to Guido's perspective. But in the article (from 2007) that you linked, it's clear that from his perspective the time/effort investment is not worth the payout.
> you're not really understanding what the steering committee is saying today.
> It only takes a couple of minutes to educate yourself on this topic
You keep insinuating that my current view of the situation is rooted in some kind of ignorance or misunderstanding. But you still haven't really made a clear and cogent argument for your point of view...
GVR called it a fork two weeks ago in his justification for not accepting it [1]. If that surprises you, the backstory is summarized pretty well in the article [2] and FAQ [3].
I'm really not trying to be secretive. We can debate whatever you want, but we have to at least acknowledge CPython's stated position.
> but we have to at least acknowledge CPython's stated position.
We know the council and cpython stated position. None of the arguments in [1],[2] or [3] are new...
The point is we disagree with it...
You seem to imply that because GVR and/or the council call it a fork, then it is a fork... Again, we understand the position, we just think it is not a valid point...
> We can debate whatever you want
You haven't debated anything yet... you just repeat/link what members of the council say and somehow call that an argument... This is very confusing
It was never my intention to dispute the council decision (choice really). One can disagree with a decision without needing to dispute it.
Concerning the justifications, here again I am a bit confused, because all the references you are giving point to the amount of effort as being one of (if not the biggest) concerns... which is what I am saying.
But to take a step back, my main point was about the OP, who described the situation as some cancer-cure level of complexity, comparing it to moving 300 million people from the imperial system to the metric system. What I am saying is that the situation here is not that complicated: all the technical problems can be addressed "at the same time", but at the price of a very large dev effort and an increase in the complexity of the code base. The council doesn't think GIL-less is worth that amount of trouble.
703 is probably the right path towards true gilless python, but to quote Greg Smith (gpshead) on the PR,
> We're not ready to simply pronounce on 703 as it has a HUGE blast radius.
703 isn't even out of draft PEP status. I think they need to come to consensus on 703, or modify it until it is acceptable, and then figure out how to merge the code and release it.
I don't know for sure, but it looks like even with the nogil compile flag off, there are enough low-level API changes that extensions would break. If they can do it without breaking extensions when the flag is off, I can see this happening over the next few years. But it's no small feat.
I don't think anybody is saying to accept PEP 703 tomorrow (and I don't want to make this conversation just about PEP 703; there were many other promising GIL-less proposals).
The main complaint of the PEP's author (and other people as well) is the lack of clarity on the criteria for inclusion, as well as the constant goalpost moving.
> it looks like even with the compile nogil flag off, there are enough low-level API changes that extensions would break.
From my understanding, after testing from the author (and a couple of other people on the discord thread), no such cases have been found yet. Greg's point is that the blast radius is so large that there is a high probability this might happen. We won't know until we test in the wild... And we won't test in the wild until the council arrives at a conclusion.
The burden of proof here is so absurdly high, and has to be paid by Sam alone, that this will essentially kill the proposal.
> the lack of clarity on the criteria for inclusion, as well as the constant goal post moving
This is where I think Sam is being unfair. He's asking for a breaking change. For 25 years, CPython has consistently and clearly said that they fully support no-gil in other implementations, and have a minimum criteria of no performance regressions for CPython. He made his performance demonstration before getting real buy-in. I understand he made that choice because of how his funding works, and now he's desperate to get a result. They acknowledged his demo and said they're considering it. Lashing out and accusing them of unclear criteria and moving goalposts misrepresents the situation.
It's not his decision to risk burning the community on a breaking change. He knows they haven't decided on it. It's reasonable to expect someone to do that work. He hasn't done it, so they will have to do it.
The steering committee also has to consider how this change will impact pursuing other goals. It doesn't exist in a vacuum all by itself.
They reiterated that they do in fact want to remove the gil, even after noting how much work it will be to do due diligence. They've also said they want to take in big projects like creating a JIT. Seems to me like effort itself isn't an issue. By calling it a fork, coming from the steering committee, they're saying either the transition plan is non-existent, or it's the wrong design for their goals, or it's coming at a bad time. Maybe all of them.
His task was to convince the committee to remove the gil. He interpreted that as simply doing a technical demonstration. Not sure why, especially given the history.
Technically yes, the data structure can be abstracted over and removed, in that sense, it isn't "deep". But the reason I say it's deep is because of the deep implications and assumptions of the GIL being there. For decades, the codebase (and extensions) has grown with the expectation that the GIL enforces a certain behavior.
> That's, from my view, the crux of the issue. The TLDR is that GIL-less Python is not a priority for the Python team, so they view any trade-off/compromise as "not justifiable", especially when it comes to added complexity in the code base or whatnot...
Precisely. Because it's such a hard problem. If someone had a proposal to actually remove the GIL, limit the single core performance hit to say <10%, and not break extensions, I have no doubt the steering team would go for it. It would be huge PR for python, simply because of the Gil's bad rap.
Turns out that multi-interpreters give you most of the benefit of multithreading with nowhere near as much complexity.
> People have this reaction because having a single-threaded interpreter in 2023 is just ... embarrassing, whatever the reasons behind it.
What other dynamic languages invented over three decades ago have natively multithreaded interpreters? Oh, javascript, the (fundamentally simpler on a technical level, and using different GC techniques) language which has gotten millions of dollars in development from the biggest tech companies because it runs on nearly every browser and thus has huge incentives to optimize? If python ran in every browser, the GIL would be gone by now.
Java? Again, huge amount of backing, 15 years older than python.
Perl, ruby, and lua all use refcounting and have single-threaded interpreters. Objective-C might fit the bill for a multithreaded refcount GC (I don't know, I'm asking gpt). Nim does, but it's rather modern (2008). Swift is statically typed, which helps, and much newer (2014).
I guess that leaves Erlang, which again uses different gc techniques.
I guess the takeaway is that refcounting is probably not the best design choice for a modern dynamic language if you value multithreading.
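For what it's worth, the reference counts in question are observable from Python itself, which hints at how thoroughly they leak out of the implementation. (The absolute numbers are CPython implementation details; only the +1 from binding a second name is relied on here.)

```python
import sys

x = []
before = sys.getrefcount(x)  # includes a temporary reference for the call itself
y = x                        # bind a second name to the same list
after = sys.getrefcount(x)   # exactly one higher than before
```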
> But the reason I say it's deep is because of the deep implications and assumptions of the GIL being there. For decades, the codebase (and extensions) has grown with the expectation that the GIL enforces a certain behavior.
As I mentioned... The idea of keeping backward compat for an implementation detail which leaked into the API is kinda common. Some of the current proposals are about making the GIL optional, keeping the compat for those cases...
> Precisely. Because it's such a hard problem. If someone had a proposal to actually remove the GIL, limit the single core performance hit to say <10%, and not break extensions, I have no doubt the steering team would go for it. It would be huge PR for python, simply because of the Gil's bad rap.
I think this might be where we disagree the most. One doesn't sit around, put a collection of constraints into the ether, and expect a solution to magically appear... Obviously everyone wants a perfect solution to technical problems, but those don't really exist.
The true measure of the council's willingness to accept GIL removal is the compromises they are willing to make to accept external solutions, or the effort put into developing their own.
Even if the council just put some effort into developing a suite of test cases (both performance and correctness), or tried to pull some statistical information on how many C extensions rely on the GIL for correct behavior: how hard would a transition be for those? Can we leverage runtime tooling like valgrind and/or LLVM's ThreadSanitizer/MemorySanitizer to detect some of these cases and ease the transition?
But i haven't seen any real effort, just the same list of constraints we had the last 10 years.
> Turns out that multi-interpreters give you most of the benefit of multithreading with nowhere near as much complexity.
I am only tangentially familiar with the sub-interpreter proposal, but from what I understand, I disagree. This is the classic shared-memory vs message-passing trade-off...
Also, not to forget that this might still break C extensions which sometimes rely on static variables (as the comment from the author of the mysql extension mentioned).
But I don't want to make this about this or that specific proposal. If the council prefers the sub-interpreter approach to nogil, they should say so and not have people waste their time.
> What other dynamic languages invented over three decades ago have natively multithreaded interpreters? Oh, javascript, the (fundamentally simpler on a technical level, and using different GC techniques) language which has gotten millions of dollars in development from the biggest tech companies because it runs on nearly every browser and thus has huge incentives to optimize? If python ran in every browser, the GIL would be gone by now.
> Java? Again, huge amount of backing, 15 years older than python.
> Perl, ruby, and lua all use refcounting and have single-threaded interpreters. Objective-C might fit the bill for a multithreaded refcount GC (I don't know, I'm asking gpt). Nim does, but it's rather modern (2008). Swift is statically typed, which helps, and much newer (2014).
I don't want to sound too harsh, but you go to war with the army you have, not the army you want. The question is not what the optimal solution would look like if we had optimal conditions (deeper pockets, GC vs refcounting, etc... etc...).
Python is what Python is, and the Python organization has the resources that it has. No doubt we could design a better solution with more money or with a language with better semantics, but we have neither, and that shouldn't prevent the council from acting on the best solution we have right now...
Can someone give a good argument of why subinterpreters are an interesting or useful solution to concurrency in Python? It seems like all the drawbacks of multiprocessing (high memory use, slow serialized object copying) with little benefit and higher complexity for the user.
The nogil effort seems like such a better solution, that even if it breaks the C interface, subinterpreters aren't worth considering.
> - they can move objects much faster between subinterpreters because they don’t need to serialize through a format like JSON
Why do you think that you would need to serialize to JSON? Pipes and sockets can deal with binary data just fine. With shared memory, there wouldn't be any difference at all.
> - since they are all in the same process you can implement things like atomics or channels easily.
This is also possible with shared memory.
AFAICT the advantage of subinterpreters over subprocesses are:
- lower memory overhead
- faster creation/destruction time
- ability to share global data (with subprocesses the data would either need to be duplicated or live in shared memory)
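For what it's worth, the raw shared-memory mechanism discussed here already exists in the standard library (`multiprocessing.shared_memory`, Python 3.8+); what's missing is a stable object format on top of it. A minimal byte-level sketch:

```python
from multiprocessing import shared_memory

# "Producer" side: create a named segment and write raw bytes into it.
shm = shared_memory.SharedMemory(create=True, size=16)
shm.buf[:5] = b"hello"

# "Consumer" side: attach to the same segment by name (this could just as
# well happen in another process or interpreter) and read the bytes back.
view = shared_memory.SharedMemory(name=shm.name)
data = bytes(view.buf[:5])

view.close()
shm.close()
shm.unlink()  # free the segment once everyone is done with it
```

The hard, unsolved part is layering a serialization-free object representation on top of segments like this, not the segments themselves.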
Sure, you could do something like that. But a shared-memory segment for Python with a stable binary object format doesn't exist (and isn't even being worked on). Comparing the proposed PEP 554 solution to a non-existent theoretical solution isn't very useful.
But you do bring up some good points for ways you could achieve similar goals without the need to make the interpreters thread safe.
I don't know if it uses shared memory, or rather sockets or pipes, but this is just an implementation detail.
My point is that there is no fundamental difference between isolated interpreters and processes when it comes to data sharing. Either way, you need a (binary) serialization format and some thread/process-safe queue.
I would have naively assumed that you could repurpose multiprocessing.Queue for passing data between multiple interpreters; you would just need to replace the underlying communication mechanism (sockets, pipes, shared memory, whatever it is) with a queue + mutex. But then again, I'm not familiar with the actual code base. If there are any complications that I didn't take into account, I would be curious to hear about them.
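The queue-plus-mutex building block already exists in-process: `queue.Queue` is essentially a deque guarded by a lock and condition variables. A sketch with threads standing in for interpreters:

```python
import queue
import threading

q = queue.Queue()  # internally: collections.deque + mutex + condition variables

def producer():
    for i in range(3):
        q.put(i)
    q.put(None)  # sentinel: no more items

t = threading.Thread(target=producer)
t.start()

received = []
while (item := q.get()) is not None:
    received.append(item)
t.join()
```

The open question for sub-interpreters is what may safely cross such a queue, since arbitrary Python objects can't be shared between interpreters.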
Interestingly, the PEP authors currently don't propose an API for exchanging data and instead suggest using raw pipes:
> they can move objects much faster between subinterpreters because they don’t need to serialize through a format like JSON
That would be a huge advantage, but it's not there yet. According to PEP 554 [1] the only mechanism for sharing data is sending bytes through OS pipes, which is exactly the same as for multiprocessing and requires the same sort of serialization.
In a message passing system there’s always going to need to be some form of serialization. I’ll wager that pickle is fast and flexible enough for most cases and for those that aren’t, using something like flatbuffers or capn proto in shared memory wouldn’t be too much of a lift to integrate.
Although all of that has long been possible in a multiple-process architecture, so I’m also curious to know if there are any real advantages to subinterpreters. From this message [1] linked to from the PEP it sounds like the author once thought that object sharing was a possibility, but if it’s not there seem to be no real benefits over multiprocessing and one big downside (the GIL).
Contrast with ruby’s Ractor system [2], which is similar to the subinterpreter concept but allows true parallelism within a single process by giving each ractor its own interpreter lock, along with a system for marking an object as immutable so it can be shared among ractors.
In my experience, yes, pickle is painfully slow and can't be used for anything real. It's fantastic for prototyping (especially https://mpi4py.readthedocs.io/en/stable/reference/mpi4py.MPI...) but it will be your bottleneck. But I work in more computational science so I understand my constraints are different than most folks.
Hmm. Apparently version 5 of the pickle protocol (which is faster) isn't yet used by the multiprocessing module [1] even though it was mentioned as part of the rationale in PEP 574 [2].
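The protocol 5 feature in question is out-of-band buffers (PEP 574): large payloads can bypass the pickle stream entirely and travel as separate buffers, enabling zero-copy transports. A small sketch:

```python
import pickle

payload = bytearray(b"x" * 1024)  # some large binary blob
obj = {"name": "blob", "data": pickle.PickleBuffer(payload)}

out_of_band = []
# buffer_callback collects the raw buffers instead of embedding them.
stream = pickle.dumps(obj, protocol=5, buffer_callback=out_of_band.append)

# The payload bytes are NOT inside the pickle stream itself...
assert b"x" * 1024 not in stream
# ...the receiver reattaches them on load (here we just pass them back in,
# but they could travel via shared memory or any other transport).
restored = pickle.loads(stream, buffers=out_of_band)
```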
> - they don’t negatively impact single threaded (basically all existing) python code performance
I think this deserves an extra callout because even your multi-threaded Python programs are effectively single threaded and benefiting from the performance gain.
- They absolutely do have to serialize, usually via pickle. I'm pretty sure objects are not sharable between subinterpreters and there is not a plan for that. The main reason people think subinterpreters are good ("you can just share the memory!") is not actually true.
- They don't require any changes to the C interface because those changes were already made, and a fair amount of cost was paid by C library maintainers. So it's true, subinterpreters are at an advantage in this regard, but that's more of a political question than a technical one
Eric Snow mentioned in his Pycon talk that memory sharing would be used, especially big data blobs, arrays etc. Sure, not directly sending python objects, but passing pointers can be done.
Because they don't share memory. At the C level the interpreters share memory, so they could have data races, but the python code can't. Just like how the C interpreter can have memory unsafety but python can't (or shouldn't).
Your 'application' that uses them can absolutely have data races (you also have to fight the GC on every interpreter not just one). I think what they mean is that a local object within an interpreter without a shared (replicated) state does not have data races. Once you start message passing of course this is no longer true.
Which isn't really different than a similar pattern of isolation in threading or multiprocessing.
Easy interop with C is one of the core selling points of Python. It's not just about performance - it's about Python being able to be a glue language that can interface with any legacy or system library written in C or being accessible via FFI. A LOT of systems exist that take advantage of this, and changes to the C interface impact all of these. Unless the porting story is thought out extraordinary well, it would be another disaster like Python 2->3. Since this affects C code, it won't be anything but tricky.
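That glue role is visible even without writing any C: the stdlib `ctypes` module can call into C libraries directly. A tiny sketch (the `CDLL(None)` trick for loading symbols from the current process is POSIX-specific):

```python
import ctypes

# Load symbols already linked into the running process (libc among them).
libc = ctypes.CDLL(None)

# Declare the C signature so ctypes converts arguments and results correctly.
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int

result = libc.abs(-42)
```

It's exactly this kind of low-friction FFI that makes changes to the C interface so consequential.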
From reading the nogil repo and related PEP, C ABI breaking does not seem to be the worst problem. An updated Cython, Swig, etc seems like it would be enough in most cases to get something running. More extreme cases might only have ~dozen LoC changes to replace the GIL Ensure/Release patterns.
The hidden really hard problem is that extension modules may have been written relying on GIL behavior for thread safety... these may be undocumented and unmaintained.
Even so I hope the community decides it is worth it. A glue language with actual MT support would be much more useful.
Indeed, thanks for clarifying that. I actually had the extensions in mind when I wrote my comment, but got somehow distracted by the C interop aspect. Extensions indeed have a more intimate relation with the interpreter and (directly or indirectly) rely on the GIL and its associated semantics.
Sub-interpreters are still faster than processes. With them you could do CSP-style message passing, which other modern languages, such as Go, use for multi-threading.
Also, for cases such as web apps or data science training, you don't need to share memory between threads, and a sub interpreter uses a lot less resources than a full python process.
> Even if it breaks the c interface
Then most of your Python packages wouldn't work. A Python that isn't backwards compatible? Yeah, that has been tried once before, and it was a disaster.
If you want a non-backwards compatible gil-less python, it already exists. You can find versions of nogil online.
I suspect sub-interpreters are a punt and a feint.
My guess is that there will likely be exactly 2 sub-interpreters in most Python code. One which talks to an old C API with a GIL and one which talks to a new C API without a GIL.
It's going to be a lot easier to manage handing objects between two Python sub-interpreters than to manage handing objects between two incompatible ABIs.
I noticed that java with a connected debugger can be very fast even with breakpoints, but stepping over can be very slow. Which is a bit weird, since "step over" is basically just putting the breakpoint on the next line.
Well, yes, but it quickly gets as painful as debugging raw assembly with almost no symbol information. You might get lucky from time to time and get some debug info available for some variables, but in hot loops and highly inlined code, you're back to raw assembly. Doable for sure, but 100x slower to figure out anything compared to non-optimized code.
Building a whole new interpreter and accompanying compiler tiers is a lot of work, still in 2023. Many different projects have tried to make this easier: to provide reusable components, to offer toolkits, or another VM to build on top of. But it seems that none of these really apply; we're still in "every language implementation is a special snowflake" territory. That's partly the case in Python just because the codebase has accumulated, or rather nucleated, around a core C interpreter from 30 years ago.
IMHO the Python community has intentionally torpedoed all competing implementations of Python besides CPython to its own detriment. They seem to constantly make the task of making a compatible VM harder and harder, on purpose. The task they face now is basically building a new, sophisticated VM inside a very much non-sophisticated VM with a ton of legacy baggage. A massive task.
>> In this case, multiple interpreter support provide a novel concurrency model focused on isolated threads of execution. Furthermore, they provide an opportunity for changes in CPython that will allow simultaneous use of multiple CPU cores (currently prevented by the GIL–see PEP 684).
This whole thing seems more like something developers want to do (hey, it's novel!) than something users want. To the extent they do want it, removing the GIL is probably preferable IMHO. That a global lock is a sacred cow in 2023 seems strange to me.
Maybe I'm misunderstanding, but I don't want an API to manage interpreters any more than I want one for managing my CPU cores. Just get me unhindered threads ;-)
Most typical application developers don't want it, but that doesn't mean it doesn't have specific, important use cases. For example, an application might embed Python in memory and need isolated contexts. Or an HTTP server might need multiple threads of concurrent work with minimal overhead.
The beauty of the GIL was that it made writing good python libraries easy, from the utility perspective.
Python is such an amazing multi-tool because of its rich library.
Personally I'd like to see Python continue as the amazing multitool that it has always been.
Go is the natural concurrency sibling for Python. High performance parts of Python applications should be linked to Go binaries, or simply be written entirely in Go.
Ideally Go linkage would be highly integrated into Python. There is a huge place in the world for GIL constrained Python. Enabling Python to push further into high performance environments is just going to make a problem worse.
Speaking of Python, as a beginner I have tried to grasp Classes and Objects and watched countless YouTube videos and Reddit/StackOverflow comments to understand them better but I'm just banging my head against the wall and it's not sticking. The whole __init__ method, the concept of self and why it's used, instance and class variables, arguments, all of it. When learning something, when you simply cannot grasp a concept however hard you try, what's the course of action? I have tried taking breaks but that's not helping.
Others have said something similar, but I will still chime in.
It sounds like you may want to feel like you have a complete understanding of classes, objects, and the way they work before you can begin working with them. This can almost never work. I haven't come across a topic where just reading about the topic is sufficient to fully understand it or even become proficient in it.
In my experience this is true for cooking, lab techniques in chemistry, electronics, and programming. Even when I have read something and felt that I understood it completely, as soon as I begin the activity I immediately realize that I had a fundamental misunderstanding of what I had read. That my brain had made some oversimplification or skipped past some details so that I felt like I grasped the concept.
So if I had to describe the way I learn a new concept it would look like this:
1. Read many different descriptions/watch many different videos of the topic to get used to the terms and concepts
2. Try to apply those concepts in real life (write software, build a circuit, cook a meal)
3. Figure out where my understanding fell short, and go back to step 1
You want to make this loop as tight as possible. When you become expert at something you can do this activity super fast. When you first start out you may have to do quite a bit of reading to even have the baseline needed to attempt a concrete task. However, it is important to get started towards a real goal as soon as possible, or you will waste your time feeling like you are learning and moving towards your goal when you are not.
Either keep looking for explanations until it clicks, or try to break it down into more fundamental elements.
Classes and blueprints have a simple analogy that I'm sure you are familiar with.
Blueprints are instructions for how to build something, like a house. Once built, the house is a physical thing you can interact with. It has attributes, such as a height or color. It has things it can do or that can be done to it, like open a door or turn on some lights.
Classes are blueprints - they tell a computer how to build something. An object is what is built by the class - it has attributes and methods.
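The blueprint analogy maps directly onto Python syntax. Here is a small sketch using the house example: `__init__` is the "build from the blueprint" step, `self` is the particular house being built or used, and the example shows the difference between an instance variable (one per house) and a class variable (shared by all houses). The names here are made up for illustration.

```python
class House:
    """The blueprint: every House built from it gets these attributes and methods."""

    houses_built = 0  # class variable: one copy, shared by all House objects

    def __init__(self, color, height_m):
        # __init__ runs once when a House is built; `self` is that new house.
        self.color = color        # instance variable: belongs to this house only
        self.height_m = height_m
        self.lights_on = False
        House.houses_built += 1   # update the shared counter on the class

    def turn_on_lights(self):     # something the house can do
        self.lights_on = True

# Build two different houses from the same blueprint.
blue = House("blue", 8)
red = House("red", 10)
blue.turn_on_lights()

print(blue.color, blue.lights_on)  # blue True
print(red.color, red.lights_on)    # red False
print(House.houses_built)          # 2
```

Note that turning on the lights in `blue` didn't affect `red` (instance variables are per-object), while `houses_built` counts both (class variables are shared).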
Classes and objects exist in all object oriented program languages. Perhaps stepping back from Python and trying some more generic material might help?
I think working with it in practice is the only way forward. Look up info, build something, work some more, and let it become a feedback loop. People work with systems for a long time before they "get" them.
Programming is not a "school" activity for me. It's a craft, you do stuff. That has eventually led to a depth of knowledge in various topics inside programming.
(Digression: With that said - we should think of a lot more "school" stuff as skills and crafts - stuff that you don't read to understand but you get better at with practice. A lot of maths class is skills, not knowledge.)
> When learning something, when you simply cannot grasp a concept however hard you try, what's the course of action?
Learn by doing, don't learn by studying.
Pick a (hopefully real) problem, solve it using classes. Repeat. Continue repeating. You will either learn what OO is good for, what it is not good for and how you can use it to write better code. Or, learn that programming is not something you can grok.
If you don't have a "real" problem, then write a program to play tic-tac-toe: manage and display the board, take player inputs, detect when the game is done (winner or draw).
Then expand it to an 8x8 board, then to 3 players. Those changes should be easy with a good OO design, and probably harder (lots of rewrites and code changes) with a bad OO or no OO design.
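As a sketch of what "easy to expand" means here: if the board size, win length, and player marks are parameters of a class rather than hard-coded, growing the game touches only the calling code. This is one possible design, not the only one, and the win check below is simplified (it only scans rows, columns, and the two main diagonals, which is complete for the standard 3x3 case).

```python
class Board:
    """An N x N tic-tac-toe board; size and win length are parameters."""

    def __init__(self, size=3, win_length=3):
        self.size = size
        self.win_length = win_length
        self.cells = [[None] * size for _ in range(size)]

    def place(self, player, row, col):
        if self.cells[row][col] is not None:
            raise ValueError("square already taken")
        self.cells[row][col] = player

    def lines(self):
        n = self.size
        for r in range(n):
            yield [self.cells[r][c] for c in range(n)]    # rows
        for c in range(n):
            yield [self.cells[r][c] for r in range(n)]    # columns
        yield [self.cells[i][i] for i in range(n)]        # main diagonal
        yield [self.cells[i][n - 1 - i] for i in range(n)]  # anti-diagonal

    def winner(self):
        k = self.win_length
        for line in self.lines():
            for i in range(len(line) - k + 1):
                window = line[i:i + k]
                if window[0] is not None and all(x == window[0] for x in window):
                    return window[0]
        return None

b = Board()                 # a standard 3x3 game...
b.place("X", 0, 0)
b.place("X", 1, 1)
b.place("X", 2, 2)
print(b.winner())           # X
big = Board(size=8, win_length=4)  # ...and an 8x8 variant: same code
```

Switching to 8x8 or adding a third player's mark requires no changes inside `Board`; that locality is the payoff the exercise is meant to demonstrate.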
btw, this was the second interview question my team and I used for years.
I was the same way when I was a beginner, I didn't really "get" python classes until years later and using other programming languages. I recommend SICP if you want a more first principles understanding: https://web.mit.edu/6.001/6.037/sicp.pdf
I disagree on the school part. There are so many good resources available especially for Python programming. There's no way you need to pay for lessons. I don't think the kind of people you get teaching Python will be the most knowledgeable anyway.
But I would maybe recommend a good book if you're struggling.
Community college is still a thing, not expensive, and won’t make the mistake of teaching specifically Python. It also forces a schedule which is helpful in its own right.
What you just proposed is exactly what isn’t working for this person.
OK. It might help to keep in mind a couple things:
- Python started as a simple language, but it's grown over the decades, and professional programmers learned the new complications along the way. They don't naturally appreciate what it's like if you get hit with them all together, as sometimes happens -- it all seems simple to them. So to understand OO I would try to screen out the fancier bits like the class variables you mentioned -- you can come back to them later.
- The original central idea of objects (from Smalltalk) is, an object is a thing that can receive different commands ('methods' in Python), and update its own variables and call on other objects. The way Python gives you to define objects (by defining a class and then creating an instance of the class) is not the most direct possible way it could've been designed to do this -- if it feels more complicated than necessary, it kind of is. But it's not too bad, you can get used to how it works for the most central stuff as I mentioned, and learn more from there.
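One concrete way to see the "object receives commands" idea, and to demystify `self` in particular: a method call is just a function call where Python passes the object in as the first argument. The toy class below is made up for illustration.

```python
class Counter:
    def __init__(self):
        self.count = 0     # the object's own variable

    def increment(self):   # the "increment" command this object can receive
        self.count += 1

c = Counter()
c.increment()              # send the command to the object...
Counter.increment(c)       # ...which is shorthand for this call: self is c
print(c.count)  # 2
```

Seen this way, `self` stops being magic: it's simply the name the method uses for "the object this command was sent to."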
They must've explored this, but couldn't one make a GIL emulator/wrapper for legacy extensions so that they can still work in GIL-less Python while they are (hopefully) updated to work with the new synchronization primitives?
Just one question: do you really want Python to be held back by the GIL 10 years from now? If not, when do you want to start the change? If so, what do you think a CPU will look like in 10 years, and how would a GIL Python facilitate getting value from it?
How would this (or any of the recent changes) affect popular web frameworks like Django / Flask / FastAPI? Would this increase performance in serialization or speed in general?
I'm actually rooting for RustPython to reach a level of maturity that we'd just be able to ship apis and stuff with it.... https://github.com/RustPython/RustPython
Question: if RustPython did indeed reach maturity, would we need to choose which Python interpreter to install, or would we switch over to the new interpreter at some Python version?
CPython developers are still the leaders and those controlling the important Python infrastructure. You have to ask them. It's quite likely RustPython has no influence whatsoever on the question, and also quite likely that CPython is not going to ever throw out its existing work in favour of a different implementation.
FWIW, parallel implementations already exist, and maybe RustPython can join them.
Many projects start out in Python b/c often new libs are python-first. Many of those run into performance issues and eventually determine that Python will never be fast b/c of the GIL.
I think it's very magnanimous of the python team, by not removing the GIL, to give Go, Java and C++ a chance.