I think it's really cool that Haoran Xu and Fredrik Kjolstad's copy-and-patch technique[0] is catching on. I remember discovering it through Xu's blog posts about his LuaJIT remake project[1][2], where he intends to apply these techniques to Lua (and I probably found those through a post here). I was just blown away by how they "recycled" all these battle-tested techniques and technologies and used them to synthesize something novel. I'm not a compiler writer, but it felt really clever to me.
I highly recommend the blog posts if you're into learning how languages are implemented, by the way. They're incredible deep dives, but he uses the details element to keep the metaphorical descents into the Mariana Trench optional, so it doesn't get too overwhelming.
I even had the privilege of congratulating him on the 1000th star of the GH repo[3], where he reassured me and others that he's still working on it despite the long pause after the last blog post, and that this mainly has to do with behind-the-scenes rewrites that wouldn't make sense to publish piecemeal.
Copy and patch is a variant of QEMU's original "dyngen" backend by Fabrice Bellard[1][2], with more help from the compiler to avoid the maintainability issues that ultimately led QEMU to use a custom code generator.
They were called "template-based" JITs, and the copy-and-patch approach is not new in this regard. The novel idea of the copy-and-patch JIT was automatic code generation via relocatable objects. (By the way, QEMU is indeed cited as an inspiration for the copy-and-patch JIT.)
While I'm sure some people theorize that Fabrice Bellard is actually a pseudonym of a collective of 10x programmers, he is as far as I know just one person.
Which itself is how I understand the compilation step of quaject code in the Synthesis kernel to have worked. I.e., do constant propagation and dead code elimination on the input format, and use a simple template-driven backend to dump out native machine code.
In fact I wouldn't be surprised if the earliest compilers were template-based, as that's about the only kind of implementation of a compiler pass that would fit in RAM.
M. Anton Ertl and David Gregg. 2004. Retargeting JIT Compilers by using C-Compiler Generated Executable Code. In Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques (PACT '04). IEEE Computer Society, USA, 41–50.
While it bears a significant resemblance, Ertl and Gregg's approach is not automatic, and every additional architecture requires a significant understanding of the target architecture, including the ability to ensure that fully relocatable code can be generated and extracted. In comparison, the copy-and-patch approach can be thought of as a simple dynamic linker, and objects generated by unmodified C compilers are far more predictable and need much less architecture-specific information for linking.
Does Ertl and Gregg's approach have any "upsides" over copy-and-patch? Or is it a case of just missing those one or two insights (or technologies) that make the whole thing a lot simpler to implement?
I think so, but I can't say this with any more confidence until I get an actual copy of their paper (I used other review papers to get the main idea instead).
Copy-and-patch also assumes the compiler will generate patchable code. For example, on some architectures, a zero operand might have a smaller or different opcode compared to a more general operand. The same issue applies to relative jumps or offset ranges. It seems the main difference is that the patch approach also patches jumps to absolute addresses instead of requiring instruction-counter-relative code.
Context: I've been on a concatenative language binge recently, and his work on Forth is awesome. In my defense he doesn't seem to list this paper among his publications[0]. Will give this paper a read, thanks for linking it! :)
If they missed the boat on getting credit for their contributions then at least the approach finally starts to catch on I guess?
(I wonder if he got the idea from his work on optimizing Forth somehow?)
Thanks a lot!! I'm something of a beginner language developer and I've been collecting papers, articles, blog posts, anything that provides accessible, high level description of these optimization techniques.
Reminds me of David K, who is local to me in Florida, or was, last I spoke to him. He has been a Finite State Machine advocate for ages, and it's a well-known concept, but you'd be surprised how useful they can be. He pushes it for front-end work a lot, and even implemented a Tic-Tac-Toe sample using it.
Regardless of the work being done in PyPy, Jython, GraalPy and IronPython, having a JIT in CPython seems to be the only way to get beyond the "C/C++/Fortran libs are Python" mindset.
Looking forward to its evolution, from 3.13 onwards.
The only way to achieve C/C++/Fortran efficiency is a statically compiled, strongly typed language. Witness the effort put into Java JITC and the rest of the modern Java (and Graal) runtime. Still well short of the promised “C equivalence”.
To me, Mojo looks like the best approach to fusing that with the Python ecosystem! (I have no doubt about it being open sourced at some point.)
JIT and static typing are not mutually exclusive or necessarily opposed. For example, MKL added JIT for small matrix multiplications quite a while ago.
I think people just reach for JIT more often in dynamic languages because they carry more information around and have more of a performance deficit that they want to mitigate.
I dunno. I mostly program in Fortran but JIT seems way cool. Fundamentally I don’t see why a JITer couldn’t beat my code in a very dynamic language, it would just need to find a big loop that calls a kernel a bunch of times, where the exact computation in the kernel is determined at run-time, and then jit the kernel and the loop together.
I meant to reply a while back, I hope you run across this.
You should definitely look into Julia. It’s a beautiful language, and squarely aimed at the Fortran space. It also relies heavily on JITC and GC. That’s fine for purely scientific computing, but not so good as a general purpose language.
That’s where C, C++, Rust, and Mojo are the current major contenders, IMO.
I love Python and use it for everything other than web development.
One reason is performance. So if Python has a faster future ahead of it: Hurray!
The other reason is that the Python ecosystem moved away from stateless requests like CGI or mod_php use and now is completely set on long running processes.
Does this still mean you have to restart your local web application after any change you made to it? I heard that some developers automate that, so that everytime they save a file, the web application is restarted. That seems pretty expensive in terms of resource consumption. And complex as you would have to run some kind of watcher process which handles watching your files and restarting the application?
The restart isn't expensive in absolute terms, on a human level it's practically instant. You would only do this during development, hopefully your local machine isn't the production environment.
It's also very easy, often just adding a CLI flag to your local run command.
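For example, with the usual dev servers (module and app names here are placeholders):

    python manage.py runserver        # Django dev server, reloads on save by default
    flask --app app run --debug       # Flask
    uvicorn app:app --reload          # ASGI apps (FastAPI, Starlette, ...)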
edit: Regarding performance, Python today can easily handle at least 1k requests per second. The vast vast vast majority of web applications today don't need anywhere near that kind of performance.
The thing is, I don't run my applications locally with a "local run command".
I prefer to have a local system set up just like the production server, but in a container.
Maybe using WSGI with MaxConnectionsPerChild=1 could be a solution? But that would start a new (for example) Django instance for every request. Not sure how fast Django starts.
Another option might be to send a HUP signal to Apache:
apachectl -k restart
That will only kill the worker threads. And when there are none (because another file save triggered it already), this operation might be almost free in terms of resource usage. This also would require WSGI or similar. Not sure if that is the standard approach for Django+Apache.
Is the problem you're having that you feel the need to expose a WSGI/ASGI interface instead of just a reverse proxy? Take a look at gunicorn, and for serving static files you can use whitenoise.
With those two you can just stand up a Python program in a container that serves HTML, and put it behind whatever reverse proxy you want.
I would still recommend running it properly locally, but whatever. Pseudo-devcontainer it is. I assume the code is properly volume mounted.
In production, you would want to run your app through gunicorn/uvicorn/whatever on an internal-only port, and reverse-proxy to it with a public-facing apache or similar.
Set up Apache to reverse proxy like you would on prod, and run gunicorn/uvicorn/whatever like you would on prod, except you also add the autoreload flag. E.g.
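Something along these lines, where myproject.wsgi is a placeholder for whatever your own WSGI module is called:

    gunicorn myproject.wsgi --bind 127.0.0.1:8000 --reload

(uvicorn has the same --reload flag for ASGI apps.)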
If production uses containers, you should keep the python image slim and simple, including only gunicorn/uvicorn and have the reverse proxy in another container. Etc.
I hate this argument that “most web apps don’t need that kind of performance.” For one thing, with responsive apps that are the norm it wouldn’t be surprising for a session to begin with multiple requests or to even have multiple requests per second. At that point all it takes is a few hundred active users to hit that 1k limit.
But even leaving that aside, you never know when your application will be linked somewhere or go semi-viral and not being able to serve 1000 users is all it takes for your app to go down and your one shot at a successful company to die a sad death.
I didn't say python can handle <=1K, I was saying >=1K. I feel confident that I am orders of magnitude off the real limit you'd meet.
The specifics of that aside, any unprepared application is going to buckle at a sudden mega-surge of users. The solution remains largely the same, regardless of technology: Make sure everything that can be cached is cached, scale the hardware vertically until it stops helping, optimize your code, scale horizontally until you run out of money. I imagine the DB will be the actual bottleneck, most of the time.
There are other reasons not to choose Python for a greenfield application, but performance should rarely be one, IMO.
At one shop, our load tests targeted 300,000 POST requests per second with a relatively small number of load balanced servers.
You wouldn’t want to have a long, complex call path through that code, but just parsing the body and adding it to a Celery queue before returning a 201 Created was perfectly manageable.
I personally like Quart, which is like Flask, but with asyncio. Django is also incredibly popular and has been around forever, so it is very battle-tested.
I've been using FastAPI, which is like Flask but taken seriously this time, using asyncio and making room for multithreading. It's almost a drop-in replacement for Flask.
If you run the debug web server command (e.g. Django's `manage.py runserver`), yes, it has a watcher that will automatically restart the web server process if there are code changes.
Once you deploy it to production, you usually run it using a WSGI/ASGI server such as Gunicorn or Uvicorn and let whatever deployment process you use handle the lifecycle. You usually don't use a watcher in production.
> Does this still mean you have to restart your local web application after any change you made to it? I heard that some developers automate that, so that everytime they save a file, the web application is restarted. That seems pretty expensive in terms of resource consumption.
All of the popular frameworks automatically reload. It’s not instantaneous but with e.g. Django it was less than the time I needed to switch windows a decade ago and it hadn’t gotten worse. If you’re used to things like NextJS it will likely be noticeably faster.
With `reload(module)` you don't even have to restart the server if you structure it properly.
Think server.py and server_handlers.py, where server.py contains logic to detect a modification of server_handlers.py (like via inotify) and the base handlers which then call the "modifiable" handlers in server_handlers.py.
This is not limited to servers (anything that loops or reacts to events) and can be nested multiple levels deep and is among the top 3 reasons of why i use Python.
Reloading is instantaneous and can gracefully handle errors in the file (just print an error message or stack trace and keep running the old code).
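A minimal sketch of that pattern, assuming a server_handlers.py next to it that defines handle(event); it polls the file's mtime instead of using inotify so it stays portable:

    import importlib
    import os
    import time
    import traceback

    import server_handlers

    def wait_for_next_event():
        time.sleep(0.5)          # stand-in for whatever the real loop blocks on
        return "tick"

    def run():
        last_mtime = os.path.getmtime(server_handlers.__file__)
        while True:
            event = wait_for_next_event()
            mtime = os.path.getmtime(server_handlers.__file__)
            if mtime != last_mtime:
                last_mtime = mtime
                try:
                    importlib.reload(server_handlers)   # pick up the edited handlers
                except Exception:
                    traceback.print_exc()               # keep running the old code
            server_handlers.handle(event)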
> The other reason is that the Python ecosystem moved away from stateless requests like CGI or mod_php use and now is completely set on long running processes.
The long-running process is a WSGI/ASGI process that handles spawning the actual code, similar to CGI. The benefit is that it can handle how it spawns the request workers via multiple runtimes, process/threads, etc. It's similar to CGI but instead of nginx handling it, it's a special program that specializes in the different options for python specifically.
> Does this still mean you have to restart your local web application after any change you made to it? I heard that some developers automate that, so that everytime they save a file, the web application is restarted. That seems pretty expensive in terms of resource consumption. And complex as you would have to run some kind of watcher process which handles watching your files and restarting the application?
Only for development!
To update your code in production, you first deploy the new code onto the machine, and then you tell the WSGI/ASGI server, such as Gunicorn, to reload. This will cause it to use the new code for new requests, without killing current requests.
It's a graceful reload, with no file watching needed. Just a "systemctl reload gunicorn"
> That seems pretty expensive in terms of resource consumption. And complex as you would have to run some kind of watcher process which handles watching your files and restarting the application?
What? No, in reality it’s just running your app in debug mode (just a cli flag), and when you save the files the next refresh of the browser has the live version of the app. It’s neither expensive nor complex.
It's interesting to see these 2-9% improvements from version to version. They are always talked about with disappointment, as if they are too small, but they also keep coming, with each version being faster than the previous one. I prefer a steady 10% per version over breaking things because you are hoping for bigger numbers. Those percentages add up!
I spent about one week implementing PyPy's storage strategies in my language's collection types. When I finished the vector type modifications, I benchmarked it and saw the ~10% speed up claimed in the paper¹. The catch is performance increased only for unusually large vectors, like thousands of elements. Small vectors were actually slowed down by about the same amount. For some reason I decided to press on and implement it on my hash table type too which is used everywhere. That slowed the entire interpreter down by nearly 20%. The branch is still sitting there, unmerged.
I can't imagine how difficult it must have been for these guys to write a compiler and succeed at speeding up the Python interpreter.
This is happening mostly because Guido left, right? The take that CPython should be a reference implementation and thus slow always aggravated me (because, see, no other implementation can compete, since every package depends on CPython quirks, in such a way that we're now removing the GIL from CPython rather than migrating to PyPy, for example).
Partly, yes, but do note he is still very much involved with the faster-cpython project via Microsoft. Google faster cpython and van rossum to find some interviews and talks. You can also check out the faster-cpython project on github to read more.
It's fascinating to me that this process seems to rhyme with that of the path PHP took, with HHVM being built as a second implementation, proving that PHP could be much faster -- and the main project eventually adopting similar approaches. I wonder if that's always likely to happen when talking about languages as big as these are? Can a new implementation of it ever really compete?
Probably. Without a second implementation proving it out the bureaucracy can write it off as not possible and the demand may be less just because users don’t know what they’re missing
In my experience, pypy works with basically everything these days. I remember having some struggles with a weird fortran based extension module a few years ago, but it might work now too.
Most c extension modules should work in pypy, there's just a performance hit depending on how they're built (cffi is the most compatible).
The point of using C extensions is to have better performance. Python is already slow in general; code that uses such extensions typically depends on this performance to not be unbearable (such as data science).
People wanting to use Pypy usually do so because they want better performance. Having a performance hit while using pypy is disconcerting.
I was speculating that in the future, C extensions in pypy would be faster, but I now see that the GIL is actually unrelated to this performance hit. Anyway it's really a pity.
I get what you are saying, but normally this wouldn't matter too much right? You will have a small number of calls into the C-extension that together do a lot of work. So as a percentage the ffi-overhead is small.
It's fine with C packages these days, it's increasingly rare to find libraries that it won't work with.
That said, it has happened often enough I'm cautious about where I use it. It would suck to be dependent on pypy's excellent performance, and then find I can't do something due to library incompatibility.
This is way better than before when no C packages worked.
Now, a lot of C packages work - and where they don't it's worth raising bugs: with PyPy, but also in the downstream program - occasionally they can use something else if it looks like the fix will take a while.
Seems a bit silly to think that: Guido is still involved with Python... and in fact is the one heading the Faster CPython project at Microsoft, which is responsible for many of these improvements.
As a compiler, python is an optimized (C/kernel) implementation of parse generation.
JIT (PyPy) is a method that parses a 'trace' variation of grammar instead of syntax,
where the optimizer compiles source objects that are not limited to python code.
Its goal is to create as many 'return' instructions as it can decide to.
GIL is merely a CPython problem but synchronization can also be a compilation problem.
Because it took 10 years for Python 3 to become as fast as Python 2 while being more strict. 2-9% means it will be another 10 years before Python 3 is significantly faster.
5.5% compounded over 5 years is a bit over 30%: not a huge amount but an easily noticeable speed-up. What were you thinking of when you typed “significantly faster”?
Compounding a decrease works differently than compounding an increase. If something gets 10% faster twice, it actually got 19% faster. In other words, the runtime is 90% of 90%, i.e. 81%.
In a certain sense, decreases are reciprocals of increases. You can calculate the reciprocal of each “faster”, or you can work with the figures as given and what you want is the reciprocal of the final result. This follows from the elementary fact that division is the same as multiplication by the reciprocal and we are only treating multiplication and division.
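For concreteness, treating "10% faster" as "10% less runtime":

    runtime = 0.9 * 0.9    # two successive 10% runtime reductions
    print(1 - runtime)     # 0.19   -> 19% less time overall
    print(1 / runtime)     # ~1.235 -> ~23.5% more work per unit time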
Not for a programming language, because it's extremely rare for the computation rate to increase rather than for the work being done to compute something to decrease.
If you've rewritten something to better use cache lines, stopped saturating memory bandwidth, etc., then sure, you've increased the computation rate. But that's rarely how these language-specific optimizations work.
It's easy to see there must be something wrong with this since if you get 10% faster 8 times, we can be sure that this doesn't mean you're 114% faster (1.1^8 = 2.14). You can't get more than 100% faster!
When you say something is 10% faster, what you mean is it took 10% less time to finish. So 19% is correct.
100 percent faster means doubling the speed, just like a 100 percent salary increase means doubling the salary, or a 100 percent car speed increase means doubling its speed.
Unless one uses a somewhat more esoteric definition of speed, in which a 50 percent car speed increase makes it go from 100 km/h to 200 km/h, such that it arrives in half the time.
The reference really doesn't say what you posted. It just says that python 3.6 was up to 45% faster than 2.7 on some benchmarks and up to 54% slower on some others (which the author of the suite considered largely unrealistic).
Where is the evidence that Python 3 was significantly slower than 2.7 before 3.6?
That link is pretty clear that it's a non-realistic benchmark that results in a Python 3 slowdown. (And there are many benchmarks that are faster in Python 3.)
The link seems fairly clear to me - One explanation given is that python3 represents all integers in a "long" type, whereas python2 defaulted to small ints. This gave (gives?) python2 an advantage on tasks involving manipulating lots of small integers. Most real-world python code isn't like this, though.
Interestingly they singled out pyaes as one of the worst offenders. I've also written a pure-python AES implementation, one that deliberately takes advantage of the "long" integer representation, and it beats pyaes by about 2000%.
I think it's more an issue of framing sometimes causing confusion, i.e. the ultimate trajectory is probably "crushingly slow → slow" rather than fastness coming in to it.
I'd rather they add up. Minus -5% runtime there, another -5% there... Soon enough, python will be so fast my scripts terminate before I even run them, allowing me to send messages to my past self.
log(2)÷log(1.1) ~= 7.27, so in principle sustained 10% improvements could double performance every 7 releases. But at some point we're bound to face diminishing returns.
Wasn't CPython supposed to remain very simple in its codebase, with the heavy optimization left for other implementations to tackle? I seem to remember hearing as much a few years back.
The problem is that:
* CPython is slow, making extension modules written in C(++) very attractive
* The CPython extension API exposes many implementation details
* Making use of those implementation details helps those extension modules be even faster
This resulted in a situation where the ecosystem is locked-in to those implementation details: CPython can't change many aspects of its own implementation without breaking the ecosystem; and other implementations are forced to introduce complex and slow emulation layers if they want to be compatible with existing CPython extension modules.
The end result is that alternative implementations are not viable in practice, as most existing libraries don't work without their CPython extension modules -- users of alternative implementations are essentially stuck in their own tiny ecosystem and cannot make use of the large existing (C)Python ecosystem.
CPython at least is in a position where they can push a breaking change to the extension API and most libraries will be forced to adapt. But there's very little incentive for library authors to add separate code paths for other Python implementations, so I don't think other implementations can become viable until CPython cleans up their API.
That was the original idea, when Python started attracting interest from big corporations. It has however become clear that maintaining alternative implementations is very difficult and resource-intensive; and if you have to maintain compatibility with the wider ecosystem anyway (because that's what users want), you might as well work with upstream to find solutions that work for everyone.
To date, no Python implementation has managed to hit all three:
1. Stay compatible with any recent, modern CPython version
2. Maintain performance for general-purpose usage (it's fast enough without a warmup, and doesn't need to be heavily parallelized to see a performance benefit)
3. Stay alive
Which, frankly, is kind of a shame. But the truth of the matter is that it was a high bar to hit in the first place, and even PyPy (which arguably had the biggest advantages: interest, mindshare, compatibility, meaningful wins) managed to barely crack a fraction of a percent of Python market share.
If you bet on other implementations being the source of performance wins, you're betting on something which essentially doesn't exist at this point.
Isn't PyPy up to 3.10 by now? At least that's what Homebrew reports to me.
PyPy seems pretty alive, all things considered, and for my code bases I've seen pretty dramatic speedups on the order of 2-5x. That's basically a no brainer unless I'm doing something with incompatible C extensions, which I think is the real Achilles heel of all of these alternative implementations.
PyPy has definitely had the most success of all other implementations, but it still has a painful warmup period for many workloads. I can't imagine it's an effective option for anyone to install as the default Python implementation on their laptop, for instance. And for many, many years, it had almost no modern Python support (but I'm of course very glad to see it's slowly catching up).
It is encouraging for PyPy to see some influx of money in recent years. But I will continue to patiently wait for it to hit enough of a sweet spot of performance vs usability vs compatibility to see real adoption.
Does Python even have a language specification? I've been told that CPython IS the specification. I don't know if this is still true. In the Java world there is a specification and a set of tests to test for conformation so it's easier to have alternative implementations of the JVM. If what I said is correct, then I can see how the optimized alternative implementation idea is less likely to happen.
Well, for Python the language reference in the docs[0] is the specification, and many things there are described as CPython implementation details. Like: "CPython implementation detail: For CPython, id(x) is the memory address where x is stored." And as another example, dicts remembering insertion order was CPython's implementation detail in 3.6, but from 3.7 it's part of the language.
There is a pretty detailed reference that distinguishes between cpython implementation details and language features at least. There was a jvm python implementation even. The problem is more that a lot of the libraries that everyone wants to use are very dependent on cpython's ffi which bleeds a lot of internals.
I wish the money could be spent on PyPy but pypy has its problems - you don't get a big boost on small programs that run often because the warmup time isn't that fabulous.
For larger programs you sometimes hit some incredibly complicated incompatibility problem. For me bitbake was one of those - it could REALLY benefit from PyPy but didn't work properly and I couldn't fix it.
If this works more reliably or has a faster warmup then... well, it could help to fill in some gaps.
Isn’t that similar to JS? My understanding was modern JS runtimes had something like this:
interpreted -> basic JIT -> fancy JIT
The interpreter gets you going fast. The basic JIT is extremely fast to compile but not the most performant. If the code can be JITed it quickly will be.
From there the engine can find hotspots or functions that get run a lot and use the fancy JIT on them in the background. That means the slow compile doesn’t block things but when the result can be swapped in performance can take a big jump.
At any point the engine can drop down to the interpreter if an assumption is violated (someone passes a string where they had always used numbers before) or a function is redefined.
It wouldn’t surprise me if something like that appeared as an option in Python over time to get the best of both worlds.
The article presents a copy-and-patch JIT as something new, but I remember DOS's QuickBasic doing the same thing. It generated very bad assembly code in memory by patching together template assembly blocks with filled-in values, with a lot of INT instructions into the QuickBasic runtime, but it did compile, not interpret.
Template JITs in general aren't a new technique, but Copy-and-Patch is a specific method of implementing it (leveraging a build time step to generate the templates from C code + ELF relocations).
QBasic was a slightly cut down version of Quickbasic that didn't include the compiler, so your assumption was correct in that case. QBasic was bundled with DOS but you had to buy Quickbasic.
I always assumed, because they were so similar, that they both were compilers generating executable code in memory, and that QBasic just had the option to write a standalone .exe removed. I only ever saw the QuickBasic compiler output, though, so no idea if this guess is true. Now I wonder if a compiled executable was faster than an in-memory application.
This was a fantastic, very clear, write-up on the subject. Thanks for sharing!
If the further optimizations that this change allows, as explained at the end of this post, are covered as well as this one, it promises to be a very interesting series of blog posts.
The last two-ish years have been insane for Python performance. Something clicked with the core team and they obviously made this a serious goal of theirs and the last few years have been incredible to see.
It’s because the total dollars of capitalized software deployed in the world using Python has absolutely exploded from AI stuff. Just like how the total dollars of business conducted on the web was a big driver of JS performance earlier.
AI heavy lifting isn't just model training. There's about a million data pipelines and processes before the training data gets loaded into a PyTorch tensor.
Ehhh... if you're lucky. I've seen (and maybe even written) plenty of we-didn't-have-time-to-write-this-properly-with-dataframes Python data munging code, banged out once and then deployed to production. I'll take performance gains there.
When Node.js started really taking off, I was actually a bit worried that Python would become No Longer Best Practice, but now it definitely seems to be still going strong.
Right now, if you develop something in Python, not many people complain at you, although some might say you shoulda used JS and some of them might be right....
Maybe not absolute best practice, but it's not like PHP or Ruby or Perl, where there are rarely any new projects that everyone blogs about.
All the big web frameworks are still maintained, there are still new coders learning it, it's not a language that will make people be like "Oh ew, I'm not learning that language just to work on that", etc.
Which is strange considering how bad the tooling to use Python on Windows is. There's a few workflows where people have gone down the beaten path before (Conda, etc.), but outside of that you have to just pretend you're on Linux and use the cygwin toolchains and even that doesn't always work so well. Better support on Linux was a top 5 reason for me making the switch to using it full time when I went off to college, and it hasn't changed substantially in the 8 years since then.
Not sure what you mean. I use Python for Windows all the time at work, and on Linux at home. I've not noticed any meaningful difference in my workflow.
> There were no noticeable performance improvements in the course of the last two years.
In fairness, Python did get faster. Python 3.9 took 82 seconds for sudoku solving and 62 seconds for interval query. Python 3.11 took 53 and 43 seconds, respectively [1]. v3.12 may be better. That said, whether the speedup is noticeable can be subjective. 10x vs 15x slower than v8 may not make much difference mentally.
It was a different time. Microsoft had a different strategy towards languages not developed by Microsoft. Similar to how there also used to be JScript, but now Node.js is basically a Microsoft pet project.
There are actually plenty of popular Microsoft projects that took even more than two tries. Azure is like their third attempt at cloud services, iirc. Credit where credit is due, they learn from mistakes... unfortunately, that only makes them more insidious.
What is it really JIT-ing? Given it says that it's only relevant for those building CPython. So it's not JIT-ing my Python code, right? And the interpreter is in C. So what is it JIT-ing? Or am I misunderstanding something?
> A copy-and-patch JIT only requires the LLVM JIT tools be installed on the machine where CPython is compiled from source, and for most people that means the machines of the CI that builds and packages CPython
Code fragments that implement each opcode in the core interpreter loop are additionally compiled in the way that each fragment is compiled into a relocatable binary. Once processed in that way, the runtime code generator can join required fragments by patching relocations, essentially doing the job of dynamic linkers. So it is compiling your Python code, but the compiled result is composed of pre-baked fragments with patches.
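A toy sketch of that idea in Python, using made-up byte templates instead of real machine code, just to show the copy-then-patch step (the real implementation patches relocations in compiler-generated stencils):

    import struct

    HOLE = b"\x00\x00\x00\x00"  # 4-byte placeholder that a relocation points at

    # opcode -> (pre-built template bytes, offset of the hole to patch)
    TEMPLATES = {
        "LOAD_CONST": (b"\xaa" + HOLE, 1),  # fake opcode byte followed by an operand hole
        "ADD_IMM":    (b"\xbb" + HOLE, 1),
    }

    def copy_and_patch(trace):
        """trace: list of (opcode, operand) pairs -> one patched-together blob."""
        out = bytearray()
        for op, operand in trace:
            template, hole_at = TEMPLATES[op]
            start = len(out)
            out += template                            # copy the stencil
            out[start + hole_at:start + hole_at + 4] = struct.pack("<I", operand)  # patch the hole
        return bytes(out)

    print(copy_and_patch([("LOAD_CONST", 40), ("ADD_IMM", 2)]).hex())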
IMO the biggest issue with Python right now is the greater culture of code quality. This is probably biased from my recent exposure to ML code and ML packages. But there is a disgusting level of disregard for basic code practices in even fairly well funded libraries. Dead code, commented-out code, unused variables, unused imports, inconsistent quote types, unreadable variable names, no attempt to use type hints, etc.
Granted, the code I see from more web and ops focused teams is miles better. But I worry the collective is not where it should be.
Despite the existence of WebAssembly, which some have suggested could let non-JS languages run in the browser, I just have a hard time seeing anything non-JS get popular for the web. The in-browser debug/development environment just works too well.
Short of Google implementing something once Chrome hits 85-90% of all usage, in an attempt to dump JS, it just doesn't seem like something that would happen. I doubt any browser team would want to implement multiple languages. I doubt Google would want to switch.
For Python to work in the browser via WebAssembly you'd need to have python itself compiled and running to execute python. It would consume a lot of energy and be much slower, I see no upsides. WebAssembly is meant to have lower level languages and programs run in the browser, not so Python devs do not have to learn a second language.
Plus Python is not suited for event driven systems.
I like python but I would never choose it for anything more than trivial on the backend. I want to know what types are being passed around from one middleware function to the next. Yes python has annotations but that’s not enough.
Type annotations are pretty good. I'd like to see a strict mode though, something you import that makes missing annotations anywhere into a fatal error
Just wait until you see what enterprise Java developers pass around as type Object and encoded XML blobs. Type checking is really useful, but it can be defeated in any language if you don't have a healthy technical culture.
Is that why I see so much object serialization/deserialization in Java?
They're trying to pass data between layers of middleware, but Java has very strict typing, and the middleware doesn't know what kind of object it will get, so it has to do tons of type introspection and reflection to do anything with the data?
There are multiple causes but a lot of the nastiest code I’ve seen did suggest “I don’t have time to update those other layers”, along with the very common misperception that you’re working on a huge shared service for the ages and need everything to be as generic and customizable as possible.
The article describes that the new JIT is a "copy-and-patch JIT" (I've previously heard this called a "splat JIT"). This is a relatively simple JIT architecture where you have essentially pre-compiled blobs of machine code for each interpreter instruction that you patch immediate arguments into by copying over them.
I take some issue with this statement, made later in the article, about the pros/cons vs a "full" JIT:
> The big downside with a “full” JIT is that the process of compiling once into IL and then again into machine code is slow. Not only is it slow, but it is memory intensive.
I used to think this was true also, because my main exposure to JITs was the JVM, which is indeed memory-intensive and slow.
But then in 2013, a miraculous thing happened. LuaJIT 2.0 was released, and it was incredibly fast to JIT compile.
LuaJIT is undoubtedly a "full" JIT compiler. It uses SSA form and performs many optimizations (https://github.com/tarantool/tarantool/wiki/LuaJIT-Optimizat...). And yet feels no more heavyweight than an interpreter when you run it. It does not have any noticeable warm up time, unlike the JVM.
Ever since then, I've rejected the idea that JIT compilers have to be slow and heavyweight.
I think Mike Pall has done enough work on LuaJIT for several lifetimes. If nobody else wants to merge pull requests and make sure everything still works then maybe LuaJIT isn't important enough to the world.
Honestly I don't understand the pessimistic view here. I think every release since Microsoft started funding Python work has brought high-single-digit best-case performance improvements.
Rather than focusing on the raw number, compare to Python 3.5 or so. It's still getting significantly faster.
If they keep doing this steady pace they are slowly saving the planet!
I think the pessimism really comes from a dislike for Python
While very, very popular, Python is, I think, a much-disliked language. It doesn't have, and isn't built around, the programming language features that programmers currently like: it's not functional or immutable by default, it's not fast, the tooling is complex, and it uses indentation for code blocks (this feature was cool in the 90s, but dreaded since at least 2010).
So I guess if Python becomes faster, this will ensure its continued dominance, and all those hoping that one day it will be replaced by a nicer, faster language are disappointed.
This pessimism is the aching voice of the developers who were hoping for a big Python replacement.
> (this feature was cool in the 90s, but dreaded since at least 2010)
LOL this is a dead giveaway you haven't been around long. There have been people kvetching about the whitespace since the beginning. Haskell went on to be the next big thing for reddit/HN/etc for years and it also uses whitespace.
To each his own, but the things you list are largely subjective/inaccurate, and there are many, many, many developers who use Python because they enjoy it and like it a lot.
Python is a very widely used language, and like any popular thing, yes, many, many, many like it, and many, many, many dislike it. It is that big: Python can be disliked by a million developers and still be a lot more liked than disliked.
But I also think it's true that Python is not, and has not been for a while, considered a modern or technically advanced language.
The hype currently is for typed or gradually typed languages, functional languages, immutable data, systems languages, type-safe languages, languages with advanced parallelism and concurrency support, etc.
Python is old, boring OOP. If you like it, then like millions of developers you are not picky about programming languages; you use what works, what pays.
But for devs passionate about programming languages, python is a relic they hope vanish.
Python is designed to be "boring" (in other words, straightforward and easy to understand). It is admittedly less so, now that it has gained many features since the 2.x days, but it is still part of its pedigree that it is supposed to be teachable as a beginner language.
It is still the only beginner language that is also an industrial-strength production language. You can learn Python as your first language and also make an entire career out of it. That can't really be said about the currently "hyped" languages, even though those are very fun and cool and interesting!
> but for devs passionate about programming languages, python is a relic they hope vanish
Such devs are increasingly rare and, in some domains, almost nonexistent. For example, Kotlin borrowed a lot of nice features from Scala and Groovy, yet 99% of Kotlin code I've seen professionally never touched those features. Kotlin on Android seems to be overwhelmingly written by barely (or not at all) re-trained Java devs; moreover, those who never learned anything more recent than 1.8 (at least they know what lambdas are.)
In short, it's not about the language; it's about the people who use that language. Wishing a language to vanish is misguided - it wouldn't change anything. The people would just switch to the next language and would still program in the same style. You can write Fortran in every language - this is as true today as it was back in the 70s, but the percentage of people who can't be bothered to stop writing Fortran (metaphorically, in more literal meaning their first language, whatever it was) even after changing to another language got much higher. IMO due to the changes in how programming as a trade is perceived in society... but that's perhaps a rant for another time :)
I really like this opinion. A large percentage of developers write procedural code in any language; libraries and market demand are their main decision criteria, not language features.
> devs passionate about programming languages, python is a relic they hope vanish
Statements like this are obviously untrue for large numbers of people, so I'm not sure of the point you're trying to make.
But certainly it's true that there are both objective and subjective reasons for using a particular tool, so I hope you are in a position to use the tools that you prefer the most. Have a great day!
Python disliked? That doesn't resonate with my experience or repeated Stack Overflow surveys where Python is often near the top in admired and desired languages:
At the same time, if 1% of python programmers dislike it, then there are more dissatisfied python programmers than there are programmers in total for most other languages.
Because it only increases by high single digits each release. If they keep up the 10% improvement for the next 10 releases, we will reach a speedup of around 2.5 times. That's very small, considering that Python is like 10-20 times slower than JS (not even talking about C- or Java-like speeds).
Also, such a shame that it takes sooo long for crucial open source to be funded properly. Kudos to Microsoft for doing it, shame on everyone else for not pitching in sooner.
FYI Python was launched 32 years ago, Python 2 was released 24 years ago and Python 3 was released 16 years ago.
To be clear Microsoft isn't directly funding Python, excluding any PyCon sponsorship.
Microsoft hired Guido in late 2020 giving him freedom to choose what project he wanted. Guido decided to go back to core Python development and with approval of Microsoft created a "faster-cpython" project, at this point that project has hired several developers including some core CPython developers. This is all at the discretion of Microsoft, and is not some arms length funding arrangement.
Meta has a somewhat similar situation, they hired Sam Gross (not the cartoonist) to work on a Python non-gil project, and contribute it directly to CPython if they accept it (which they have), and they have publicly committed to support it, which if I remember right was something like funding two engineering years of an experienced CPython internals developer.
Julia is my source of pessimism. Julia is super fast once it's warmed up, but before it gets there, it's painfully slow. They seem to be making progress on this, but it's been gradual. I understand that Java had similar growing pains, but it's better now. Combined with the boondoggle of py3, I'm worried for the future of my beloved language as it enters another phase of transformation.
I'm not that up to date on the language, it's been a few years since I did anything nontrivial with it because the experience was so poor. And while that might not seem fair to Julia, it's my honest experience: my concern isn't a pissing match between Julia and the world, it's that bad JIT experience is a huge turnoff and I'm worried about Python's future as it goes down this road.
There has been so much progress in Julia’s startup performance in the past “few years” that someone’s qualitative impressions from several major releases before the current one are of limited relevance.
You're making this about Julia despite my repeated statements to the contrary. Please reread what I've written, you aren't responding to the actual point I've made twice now. A reminder: I'm talking specifically about my outlook on the future of Python, vis a vis my historical experience with how other JIT languages have developed.
If you wanted to rebut this, you'd need to argue that Julia has always been awesome and that my experience with a slow warmup was atypical. But that would be a lie, right?
And, subtext: when I wrote my first comment in this thread, its highest sibling led with
> I think the pessimism really comes from a dislike for Python
So I weighed in as a Python lover who is pessimistic for reasons other than a bias against the language.
> I'm talking specifically about my outlook on the future of Python, vis a vis my historical experience with how other JIT languages have developed.
But your assessment of the other language you mentioned is several years out of date and made largely irrelevant by the fast pace of progress. Therefore your conclusions about the probable future of Python, which may be correct, nevertheless do not follow.
I was sharing feelings and opinions, when you refer to my "conclusions" you're speaking to elements of the empty set. I get that you're a big Julia evangelist, but if you hope to reach people, you must learn to listen.
How long did it take Julia to solve its warmup issue? The language is about 12, and I last tried in earnest two years ago. So, a decade, give or take? You speak from the top of a mountain, and you say the view is nice. Sitting at the base of a similar mountain, it's the journey that I dread, because Python's recent long-term journeys have been pretty rough. And I'm just not convinced that the destination is so great.
Amdahl's Law is about expected speedup/decrease in latency. That actually isn't strongly correlated to "saving the planet" afaik (where I interpret that as reducing direct energy usage, as well as embodied energy usage by reducing the need to upgrade hardware).
If anything, increasing speed and/or decreasing latency of the whole system often involves adding some form of parallelism, which brings extra overhead and requires extra hardware. Note that prefetching/speculative execution kind of counts here as well, since that is essentially doing potentially wasted work in parallel. In the past boosting the clock rate the CPU was also a thing until thermodynamics said no.
OTOH, letting your CPU go to sleep faster should save energy, so repeated single-digit perf improvements via wasting fewer instructions do matter.
But then again, that could lead to Jevons Paradox (the situation where increased efficiency encourages so much more use that it outweighs what the efficiency gain saves - Wirth's Law but generalized and older, basically).
So I'd say there's too many interconnected dynamics at play to really simply state "optimization good" or "optimization useless". I'm erring on the side of "faster Python probably good".
Basically a JIT (Just In Time), is also known as a dynamic compiler.
It is an approach that traces back to the original Lisp and BASIC systems, among other lesser-known ones.
The compiler is part of the language runtime, and code gets dynamically compiled into native code.
Why is this a good approach?
It allows for experiences that are much harder to implement in languages that traditionally compile straight to native code like C (note there are C interpreters).
So you can have an interpreter-like experience, and code gets compiled to native code before execution on the REPL, either straight away or after execution gets beyond a specific threshold.
Additionally, since dynamic languages by definition can change all the time, a JIT can profit from code instrumentation and generate machine code that takes into account the types actually being used, something an AOT approach for a dynamic language cannot predict, so such optimizations are hardly an option for it in most cases.
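A toy illustration of that last point in plain Python (no machine code involved, and all names here are made up): record the operand types actually seen, and once a call site is hot with a single type, dispatch to a specialized path protected by a guard:

    observed = {}

    def generic_add(a, b):
        return a + b                        # works for ints, floats, strings, ...

    def specialized_int_add(a, b):
        if type(a) is int and type(b) is int:
            return a + b                    # stands in for tight, type-specific machine code
        raise TypeError("guard failed")     # assumption violated: deoptimize

    def add(a, b):
        key = (type(a), type(b))
        observed[key] = observed.get(key, 0) + 1
        if observed.get((int, int), 0) > 100:   # hot with ints? try the fast path
            try:
                return specialized_int_add(a, b)
            except TypeError:
                pass                            # fall back to the generic path
        return generic_add(a, b)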
The article will be a confusing read to someone who does not know what a JIT is.
Look at the part after the heading "What is a JIT?"
The first paragraph moves towards an answer - "compilation design that implies that compilation happens on demand when the code is run the first time" But then it backtracks on this and says that it could mean many things, and gets wishy-washy, and says that python is already a JIT.
The second paragraph says, "What people tend to mean when they say a JIT compiler, is a compiler that emits machine code." What point is the author trying to make here? An ahead-of-time compiler emits machine code too, and indeed the article then goes on to say exactly that. So what is a JIT?
The third paragraph starts talking about mechanism, which is a distraction from the question it posed above - what is a JIT?
The article talks around points instead of making points.
Yeah as a junior without a CS degree I was reading this article thinking "this is very interesting" but found it very hard to really grasp the difference between Ahead of Time and JIT from their explanations. Just that it was different from the previous python interpreter method, which seems woefully inefficient. I do know that Java has a JIT and I've read about this, but I guess it became quickly clear that I didn't really understand it since I couldn't follow this article. I think I will need to read more about this elsewhere and come back to fully grasp the impact.
> very hard to really grasp the difference between Ahead of Time and JIT
JIT = "just in time" = bytecode is converted to native code while the program is running, either at the startup of the program or just before a particular function is called. Sometimes even after the function is called (since the JIT process itself takes time, it may be optimal to only run it once a function has been called N times or taken M microseconds total run time)
AOT = "ahead of time" = bytecode is converted to native code before the program starts. i.e. by the developer during their distribution or deployment process. AOT compilation knows nothing about the specific run time conditions.
>> JIT, or “Just in Time” is a compilation design that implies that compilation happens on demand when the code is run the first time.
>> What people tend to mean when they say a JIT compiler, is a compiler that emits machine code.
A JIT compiler is a compiler that emits machine code the first time that code is run, vs an AOT compiler which emits machine code when the code is built.
Did you already know what a JIT was before reading the article, though? Confirming what you already know is a different thing than grokking it the first time. Plus, in my brief but intense experience teaching programming to artistic types who are scared of maths, it's more useful to evaluate explanations by how likely they are to be misunderstood or to overwhelm than by whether they can be correctly interpreted.
Indeed, this is exactly the kind of workload where PyPy historically outperformed CPython by a long shot.
I think there have been incremental optimizations in the bytecode interpreter for things like this in recent versions, but a JIT is going to be a real game-changer.
It will be interesting to see what happens to the PyPy project after this, as well as the HPy C API effort.
> The initial benchmarks show something of a 2-9% performance improvement. You might be disappointed by this number, especially since this blog post has been talking about assembly and machine code and nothing is faster than that right? Well, remember that CPython is already written in C and that was already compiled to machine-code by the C compiler.
WTF has this to do with JITing the code written in Python?
Why has it taken so much longer for CPython to get a JIT than, say, PyPy? I would imagine the latter has far less engineering effort and funding put into it.
For the longest time, CPython was deliberately optimized for simplicity. That's a perfectly reasonable choice: it's easier to reason about, easier for new maintainers to learn it, easier to alter, easier to fix when it breaks, etc. Also, CPUs are pretty good at running simple code very quickly.
It's only fairly recently that there's been critical mass of people who thought that performance trumps simplicity, and even then, it's only to a point.
> It's only fairly recently that there's been critical mass of people who thought that performance trumps simplicity
This definitely wasn't true from the user perspective. And I'm not even convinced it's some "critical mass" of developers. These changes aren't coming from some mass of developers; they're coming from a few experts who had a clear plan, backed by the sane recognition of a huge disconnect: languages are actually meant for the users of the language, not the developers of the language.
Context: I use python for data processing and webdev. When doing data processing, Python is merely glue for libraries in compiled languages. When doing webdev, I mostly use python itself.
First, any numbers regarding benchmarks need to be treated with contempt. JITs are unbenchmarkable. No matter what you do, someone says you do it wrong. Warmed up the JIT? You did it wrong. Didn't warm up the JIT? You did it wrong. Warmed up and didn't warm up the JIT? Wrong.
You lose predictable performance characteristics due to the above. It's difficult to describe the importance of this to the people who look at Python as glue for their compiled code.
Next, on the face of it, it doesn't look like it will compose well with subinterpreters. If each subinterpreter does its own tracing, it's going to be harder to hit the 10k watermark for JITing hot code.
This uses LLVM's JIT which is particularly slow and heavy (16MB added to the binary size) last time I tried to use it. So this limits Python attractiveness in being an embedded language.
While this remains experimental - and hence strictly optional, packaging this in distributions that use gcc would now apparently need llvm tools installed to build python. Expect feedback from distribution packagers.
Idea for improvement: when I do data processing, I'm using python to glue bits of C, Rust, and other compiled languages so this is not very useful. When I'm doing webdev, it could be useful - and I am deploying using Docker. So why not make this AOT so I can add it to a docker build step and get all the benefit without the complications of tracing jits.
IMO, Python's performance can be a huge problem even when using it for purposes it's supposedly good for, like data processing. If you're doing NLP, you can use NLTK's built-in tokenisers and get pretty good performance. But if your situation calls for a slightly different tokeniser? You're gonna either write that in Python and absolutely tank your performance, or have a miserable time writing your own tokeniser as a C extension.
Or you can use PyPy of course, but that's a compatibility nightmare exactly when you're gluing together a bunch of C libraries with Python interfaces.
And in principle, JIT doesn't necessarily mean non-embeddable; LuaJIT is pretty good from an embedding perspective. Besides, I would assume they make JIT a build-time option so that people whose use case makes it problematic can use only the bytecode interpreter instead.
That said, your concerns about performance variability are warranted, and there are probably specific criticisms to be made about the choice of LLVM. LLVM is a gigantic dependency and I hope the performance wins of choosing LLVM rather than, say, crankshaft are worth it.
More glaring, regarding the claimed performance gain, is that they didn't mention what code they ran (as far as I could see). Was it a selected small piece of code or was it a big Django monolith? As I didn't see any mention of what code it was, I'm going to assume the former, and that the latter will actually get worse performance. Because that is what typically happens when someone builds a new JIT compiler. It takes a couple of iterations of optimizations for them to become useful on nontrivial code.
> JITs are unbenchmarkable. No matter what you do, someone says you do it wrong.
Why does that matter?
Presumably if you care about Python performance, you have a real world scenario that you can benchmark. Just make a benchmark that is as representative as possible and check if the JIT helps. If it doesn't help, it surely can be disabled via an option.
Because you need to have an idea of how many machines you need. And you need an idea of whether 'it will take as long as it takes' or if there's an issue in your code.
Sometimes Java decides to JIT using AES instructions. Sometimes it doesn't. Even if you run the same benchmark suite twice in a row, it just does its own thing and gives wildly different results.
My learning was that one shouldn't depend on JIT performance and prefer using JNI libraries if you want consistency. Hence my comment being negative on JIT.
I don't know a lot about the Java JIT. But my guess is that the scenario you observed is the result of profile-guided optimisations done at runtime.
Hopefully this is a feature that can be disabled when you want deterministic behaviour. There's no reason to make it mandatory in a well-engineered VM.
I'm confused, what kind of time measurement system is this? Are those 31 seconds versus 13 seconds? 0.31s vs 0.13s? If it's the first case, something is wrong with that machine, Hello World should not take that long. If it's the second, are we talking about 0.18s? See below.
> it's 3x slower to start up… it means that any bash script with a loop calling a java command will run 3x slower.
I don't know how you write your scripts, but in practice most scripts hang waiting for IO (especially network) or waiting for a specific command to process something...
> 3x slower isn't my definition of minuscule.
Nope, but based on what you've presented so far, unless I'm misunderstanding, it's my definition of "premature optimization".
double click selection issue. 0m0,031s vs 0m0,013s.
> I don't know how you write your scripts,
You know how… I put print hello world in them… that was the test you asked for?
> premature optimization
Not using something that has 3x startup time, for a short lived command is just "common sense".
I'm starting to think that the issue here is that you know java but don't know python or C, and you are unconsciously trying to get water to your own watermill.
Printing hello world does not show the cost / benefit of a tracing jit that needs to be warmed up over thousands of iterations. And counter example: it takes maven 0.788s to realize I don't have a pom.xml in my PWD and fail a build. It takes meson 0.163s.
Maven versus Meson is purely about architecture, just look at Maven's design.
There is a reason I didn't say anything about that aspect of JITs that you mention, it's because it's not relevant to commands which are launched from shell scripts and then exit.
> At the moment, the JIT is only used if the function contains the JUMP_BACKWARD opcode which is used in the while statement but that will change in the future.
Isn't this the main reason why it's only a 2-9% improvement? Not much Python code uses the while statement in my experience.
I believe a JIT using this technique could eliminate dead code at the Python bytecode level, but not at the machine code level. That seems pretty reasonable to me.
Not sure, these optimizations multiply in power when used together. Propagate constants and fold constants, after that you can remove things like "if 0 > 0", both the conditional check and the whole block below it, and so on.
That's entirely possible with the approach I described. Such optimizations can easily be done on the bytecode prior to the compilation to machine code.
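To make that concrete, here is a minimal sketch of the fold-then-prune idea. It works at the AST level purely for readability; the JIT tier would do the equivalent on bytecode/uops, and every name here is illustrative rather than CPython's actual machinery:

    import ast, operator

    OPS = {ast.Add: operator.add, ast.Mult: operator.mul, ast.Gt: operator.gt}

    class FoldAndPrune(ast.NodeTransformer):
        """Toy constant folding plus dead-branch elimination."""
        def visit_BinOp(self, node):
            self.generic_visit(node)  # fold children first
            if (isinstance(node.left, ast.Constant) and isinstance(node.right, ast.Constant)
                    and type(node.op) in OPS):
                return ast.Constant(OPS[type(node.op)](node.left.value, node.right.value))
            return node

        def visit_Compare(self, node):
            self.generic_visit(node)
            if (isinstance(node.left, ast.Constant) and len(node.ops) == 1
                    and isinstance(node.comparators[0], ast.Constant)
                    and type(node.ops[0]) in OPS):
                return ast.Constant(OPS[type(node.ops[0])](node.left.value,
                                                           node.comparators[0].value))
            return node

        def visit_If(self, node):
            self.generic_visit(node)  # by now the test may have folded to a constant
            if isinstance(node.test, ast.Constant) and not node.test.value:
                return node.orelse    # drop the whole dead block
            return node

    src = "if 0 > 0:\n    expensive()\nx = 2 * 3 + 4\n"
    tree = ast.fix_missing_locations(FoldAndPrune().visit(ast.parse(src)))
    print(ast.unparse(tree))  # -> x = 10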
If you're interested in learning more about the challenges and tradeoffs, both Jython (https://www.jython.org/) and IronPython (https://ironpython.net/) have been around for a long time and there's a lot of reading material on that subject.
I've found the startup time for Graal Python to be terrible compared with other Graal languages like JS. When I did some profiling, it seemed that the vast majority of the time was spent loading the standard library. If implemented lazily, that should have a negligible performance impact.
Python is a convenient friendly syntax for calling code implemented in C. While you can easily re-implement the syntax, you then have to decide how much of that C to re-implement. A few of the builtin types are easy (eg strings and lists), but it soon becomes a mountain of code and interoperability, especially if you want to get the semantics exactly right. And that is just the beginning - a lot of the value of Python is in the extensions, and many popular ones (eg numpy, sqlite3) are implemented in C and need to interoperate with your re-implementation. Trying to bridge from Java or .NET to those extensions will overwhelm any performance advantages you got.
This JIT approach is improving the performance of bits of the interpreter while maintaining 100% compatibility with the rest of the C code base, its object model, and all the extensions.
> The big downside with a “full” JIT is that the process of compiling once into IL and then again into machine code is slow. Not only is it slow, but it is memory intensive.
Given how many Microsoft employees today steer the Python decision making process, I'm sure in not so distant future, we might see a new CLR-based Python implementation.
Maybe Microsoft don't know yet how to sell this thing, or maybe they are just boiling the frog. Time will tell. But I'm pretty sure your question will be repeated as soon as people get used to the idea of Python on a JIT.
“Python code runs 15% faster and 20% cheaper on Azure than AWS, thanks to our optimized azurePython runtime. Use it for Azure functions and ML training”
How much of the Python in the world is run on Azure? It's my language of preference and my main professional one, along with JS/TS and Go. But if they pulled something like this I'm pretty sure I'd reluctantly pack things up and move to Go.
Microsoft developed both JScript and Node.js. They could've continued with JScript, but obviously decided against it because JScript didn't earn the reputation they might have hoped for. Even if they invested efforts into rectifying the flaws of JScript, it would've been just too hard to undo the reputation damage.
Microsoft made multiple attempts to "befriend" Python. IronPython was one of the failures. They also tried to provide editing tools (eg. IntelliSense in MSVS), but kind of gave up on that too (though they succeeded to a large degree with VSCode).
Microsoft's whole long-term strategy is to capture developers and put them on a leash. They won't rest while there's a popular language they don't control.
So they compile the C implementation of every opcode into templates and then patch in the actual values from the functions being compiled. That's genius, massive inspiration for me. It's automatically ABI compatible with the rest of CPython too.
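As a toy illustration of the stencil-with-holes idea (not how CPython's JIT is actually organized, and assuming a typical x86-64 Linux system where an anonymous read/write/execute mapping is permitted), here is a hand-written stencil for "return <constant>" whose 64-bit hole gets patched at JIT time:

    import ctypes, mmap, struct

    # x86-64 stencil for "return <imm64>": movabs rax, imm64; ret
    STENCIL = bytes([0x48, 0xB8]) + b"\x00" * 8 + bytes([0xC3])
    HOLE = 2  # byte offset of the 64-bit immediate "hole"

    def jit_return_constant(value):
        code = bytearray(STENCIL)
        code[HOLE:HOLE + 8] = struct.pack("<q", value)        # patch the hole
        buf = mmap.mmap(-1, len(code),
                        prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC)
        buf.write(bytes(code))
        addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))
        fn = ctypes.CFUNCTYPE(ctypes.c_int64)(addr)           # call the patched stencil
        return fn, buf                                        # keep buf alive while fn is used

    fn, _keepalive = jit_return_constant(42)
    print(fn())   # -> 42

In the real thing the stencils and their hole offsets fall out of the C compiler's object files rather than being written by hand, which is what lets the approach cover every opcode.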
Is there a similarly accessible article about the specializing adaptive interpreter? It's mentioned in this article but not much detail is given, only that the JIT builds upon it.
I wonder if I can skip the bytecode compilation phase.
For the lazy who just want to know if this makes Python faster yet, this is foundational work to enable later improvements:
> The initial benchmarks show something of a 2-9% performance improvement.
> I think that whilst the first version of this JIT isn’t going to seriously dent any benchmarks (yet), it opens the door to some huge optimizations and not just ones that benefit the toy benchmark programs in the standard benchmark suite.
You're right, and in this case "foundational work" even undersells how minimal this work really is compared to the results it already gets.
I recommend that people watch Brandt Bucher's "A JIT Compiler for CPython" from last year's CPython Core Developer Sprint[0]. It gives a good impression of the current implementation and its limitations, and some hints at what may or may not work out. It also indirectly gives a glimpse into the process of getting this into Python through the exchanges during the Q&A discussion.
One thing to especially highlight is that this copy-and-patch has a much, much lower implementation complexity for the maintainers, as a lot of the heavy lifting is offloaded to LLVM.
Case in point: as of the talk this was all just Brandt Bucher's work. The implementation at the time was ~700 lines of "complex" Python, ~100 lines of "complex" C, plus of course the LLVM dependency. This produces ~3000 lines of "simple" generated C, requires an additional ~300 lines of "simple" hand-written C to come together, and has no further dependencies (so no LLVM is necessary to run the JIT; also, the "complex" and "simple" qualifiers are Bucher's terms, not mine).
Another thing to note is that these initial performance improvements are just from getting this first version of the copy-and-patch JIT to work at all, without really doing any further fine-tuning or optimization.
This may have changed a bit in the months since, but the situation is probably still comparable.
So if one person can get this up and running in a few klocs, most of which are generated, I think it's reasonable to have good hopes for its future.
An important piece of context here is that the same code is reused for the interpreter and JIT implementations (that's a main selling point of copy-and-patch JIT). In other words, this 2-9% improvement mostly represents the core interpreter overhead that the JIT should significantly reduce. It was even possible that the JIT would have no performance impact at all, so this result is actually very encouraging; any future opcode specialization and refinement should directly translate to a measurable improvement.
Copy-and-patch seems not much worse than compiling pure Python with Cython, which roughly corresponds to "just call whatever CPython API functions the bytecode interpreter would call for this bunch of Python", so that's roughly a baseline for how much overhead you get from the interpreter bit.
There would be no reason to use a copy-and-patch JIT if that were the case, because the good old threaded interpreter would have been fine. There is other optimization work in parallel with this JIT effort, including finer-grained micro-operations (uops) that can replace the usual opcodes at higher tiers. Uops themselves can be used without the JIT, but the interpreter overhead is proportional to the number of (u)ops executed and would be too large for uops. The hope is that the copy-and-patch JIT combined with uops will be much faster than threaded code.
From the write-up, I honestly don't understand how this paves the way. I don't see an architectural path from a cut-and-paste JIT to something optimizing. Not optimizing is the whole point of a cut-and-paste JIT.
> I don't see an architectural path from a cut-and-paste JIT to something optimizing.
One approach used in V8 is to have a dumb-but-very-fast JIT (ie. this), and keep counters of how often each block of code runs (perhaps actual counters, perhaps using CPU sampling features), and then any block of code running more than a few thousand times run through a far more complex yet slower optimizing jit.
That has the benefit that the 0.2% of your code which uses 95% of the runtime is the only part that has to undergo the expensive optimization passes.
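The shape of that counter-driven tier-up is easy to sketch. The threshold and the "optimized tier" below are stand-ins (a memoizing wrapper), not anything V8 or CPython actually does; only the mechanism's structure is the point:

    import functools

    HOT_THRESHOLD = 1000   # hypothetical tier-up point

    def tiered(func):
        """Count calls; once the function is 'hot', swap in a second tier.
        The second tier here is just a memoized wrapper standing in for what a
        real optimizing JIT would produce."""
        count = 0
        fast = None

        @functools.wraps(func)
        def wrapper(*args):
            nonlocal count, fast
            if fast is not None:
                return fast(*args)         # already tiered up
            count += 1
            if count >= HOT_THRESHOLD:
                fast = functools.lru_cache(maxsize=None)(func)
            return func(*args)             # "tier 1" path
        return wrapper

    @tiered
    def fib(n):
        return n if n < 2 else fib(n - 1) + fib(n - 2)

    print(fib(25))   # hot enough to trip the tier-up partway through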
Note that V8 didn't have a dumb-but-very-fast JIT (Sparkplug) until 2021; the interpreter (Ignition) did that block counting and sent it straight to the optimizing JIT (TurboFan).
V8 pre-2021 (i.e., only Ignition+TurboFan) was significantly faster than current CPython is, and the full current four-tier bundle (Ignition+Sparkplug+Maglev+TurboFan) only scores roughly twice as well on Speedometer as pure Ignition does. (Ignition+Sparkplug is about 40% faster than Ignition alone; compare that “dumbness” with CPython's 2–9%.) The relevant lesson should be that things like a very carefully designed value representation and IR are a much more important piece of the puzzle than having as many tiers of compilation as possible.
In case anyone is interested, V8 pre-ignition/TurboFan had different tiers [1]: full-codegen (dumb and fast) and crankshaft (optimizing). It's interesting to see how these things change over time.
> keep counters of how often each block of code runs ... and then any block of code running more than a few thousand times run through a far more complex yet slower optimizing jit.
That's just all JITs. Sometimes it's counters for going from interpreter -> JIT rather than between levels of JITs, but this idea is as old as JITs.
There's a lot of effort going on to improve CPython performance, with optimization tiers, etc. It seems the JIT is how at least part of that effort will materialize: https://github.com/python/cpython/issues/113710
> We're getting a JIT. Now it's time to optimize the traces to pass them to the JIT.
Isn't it the case that Python has allowed type specifiers (type hints) since 3.5, albeit the CPython interpreter ignores them? The JIT might take advantage of them, which ought to improve performance significantly for some code.
What makes Python flexible is what makes it slow. Restricting the flexibility where possible offers opportunities to improve performance (and allows tools and humans to spot errors more easily).
AFAIK good JITs like V8 can do runtime introspection and recompile on the fly if types change. Maybe using the type hints will be helpful but I don't think they are necessary for significant improvement.
Well, GraalPython is a Python JIT compiler which can exploit dynamically determined types, and it advertises 4.3x faster, so it's possible to do drastically better than a few percent. I think that's state of the art but might be wrong.
Note that this is with a relatively small investment as these things go, the GraalPython team is about ~3 people I guess, looking at the GH repo. It's an independent implementation so most of the work went into being compatible with Python including native extensions (the hard part).
But this speedup depends a lot on what you're doing. Some types of code can go much faster. Others will be slower even than CPython, for example if you want to sandbox the native code extensions.
PyPy is a different JIT that gives anything from slower/same to a 100x speedup depending on the benchmark. They give a geometric mean of 4.8x speedup across their suite of benchmarks.
https://speed.pypy.org/
To the contrary. In CL some flexibility was given up (compared to other LISP dialects) in favor of enabling optimizing compilers, e.g. the standard symbols cannot be reassigned (also preserving the sanity of human readers). CL also offers what some now call 'gradual typing', i.e. optional type declarations. And remaining flexibility, e.g. around the OO support, limits how well the compiler can optimize the code.
Surely this is the job for a linter or code generator (or perhaps even a hypothetical ‘checked’ mode in the interpreter itself)? Ain’t nobody got time to add manual type checks to every single function.
Of course, this is not a good example of good, high-performance code, only an answer to the specific question... the questioner certainly also knows MyPy.
I actually don't know anything about MyPy, only that it exists. Does it run that example correctly, that is, does it print "nopenope"? Because I think that's the correct behaviour: type hints should not actually affect evaluation (well, beyond the fact that they must be names that are visible in the scopes they're used in, obviously), although I could be wrong.
Besides, my point was that one of the reasons languages with (sound-ish) static types manage to have better performance is that they can omit all of those run-time type checks (and the supporting machinery) because they'd never fail. And if you have to put those explicit checks in, then the type hints are actually entirely redundant: e.g. Erlang's JIT ignores type specs; it instead looks at the type guards in the code to generate specialized code for the function bodies.
Of course dynamism limits performance (and, as said, the standard-symbols and class restriction is also an unhygienic-macro thing), but I meant that you can have both high performance and high dynamism in a programming language; dynamism itself is no excuse to not even try.
I doubt it with a copy-and-patch JIT, not the way they work now. I'm a serious mypy/python-static-types user and as is they currently wouldn't allow you to do much optimization wise.
- All integers are still big integers
- Use of the typing opt-out 'Any' is very common
- All functions/methods can still be overwritten at runtime
- Fields can still be added and removed from objects at runtime
The combination basically makes it mandatory to not use native arithmetic, allocate everything on the heap, and need multiple levels of indirection for looking up any variable/field/function. CPU perf nightmare. You need a real optimizing JIT to track when integers are in a narrow range and things aren't getting redefined at runtime.
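A small illustration of why those bullets bite; every line below is legal at runtime, annotations notwithstanding (the class and names are made up):

    class Point:
        x: int
        y: int

    def norm1(p: Point) -> int:
        return abs(p.x) + abs(p.y)

    p = Point()
    p.x, p.y = 3, -4
    print(norm1(p))            # 7 -- so far so good

    # All of the following are perfectly legal, despite the annotations:
    p.x = 10 ** 100            # "int" silently becomes a big integer
    p.z = "surprise"           # fields added to objects at runtime
    Point.norm1 = norm1        # methods attached after the fact
    norm1 = lambda p: 0        # the module-level function rebound
    print(norm1(p))            # 0 -- callers now see the new binding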
It should be fairly easy to add instruction fusing, where they recognize often-used instruction pairs, combine their C code, and then let the compiler optimize the combined code. Combining LOAD_CONST with the instruction following it if that instruction pops the const from the stack seems an easy win, for example.
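For what it's worth, the "which pairs are worth fusing" question is easy to explore from Python itself with the dis module. This is just a profiling sketch, not how CPython actually chooses its fused instructions:

    import dis, collections

    def sample(xs):
        total = 0
        for x in xs:
            total += x * 2 + 1
        return total

    # Count adjacent opcode pairs -- the kind of profile you'd use to
    # shortlist candidates for a fused "superinstruction".
    ops = [ins.opname for ins in dis.get_instructions(sample)]
    pairs = collections.Counter(zip(ops, ops[1:]))
    print(pairs.most_common(5))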
In the interpreter, I don't think it would reduce overhead much, if at all. You'd still have to recognize the two byte codes, and your interpreter would spend additional time deciding, for most byte code pairs, that it doesn't know how to combine them.
With a compiler, that part is done once and, potentially, run zillions of times.
If fusing a certain pair would significantly improve the performance of most code, you'd just add that fused instruction to your bytecode and let the C compiler optimize the combined code in the interpreter. I have to assume CPython has already done that for all the low-hanging fruit.
In fact, for such a fused instruction to be optimized that way on a copy-and-patch JIT it'd need to exist as a new bytecode in interpreter. A JIT that fuses instructions is no longer a copy-and-patch JIT.
A copy-and-patch JIT reduces interpretation overhead by making sure the branches in the executed machine code are the branches in the code to be interpreted, not branches in the interpreter.
This makes a huge difference in more naive interpreters, not so much in a heavily optimized threaded-code interpreter.
The 10% is great, and nothing to sneeze at for a first commit. But I'd actually like some realistic analysis of next steps for improvement, because I'm skeptical instruction fusing and other things being hand waved are it. Certainly not on a copy-and-patch JIT.
For context: I spent significant effort trying to add such instruction fusing to a simple WASM AOT compiler and got nowhere (the equivalent of constant loading was precisely one of the pairs). Only moving to a much smarter JIT (capable of looking at whole basic blocks of instructions) started making a difference.
Support for generating machine code at all seems like a necessary building block to me and probably is quite a bit of effort to work on top of a portable interpreter code base.
I wouldn't be so enthusiastic. Look at other languages that have JIT now: Ruby and PHP. After years of efforts, they are still an order of magnitude slower than V8 and even PyPy [1]. It seems to me that you need to design a JIT implementation from ground up to get good performance – V8, Dart and LuaJIT are like this; if you start with a pure interpreter, it may be difficult to speed it up later.
PyPy is designed from the ground up and is still slower than V8 AFAIK. Don’t forget that v8 has enormous amounts of investment from professionally paid developers whereas PyPy is funded by government grants. Not sure about Ruby & PHP and it’s entirely possible that the other JIT implementations are choosing simplicity of maintenance over eking out every single bit of performance.
Python also has structural challenges like native extensions (don’t exist in JavaScript) where the API forces slow code or massive hacks like avoiding the C API at all costs (if I recall correctly I read that’s being worked on) and the GIL.
One advantage Python had is the ability to use multiple cores way before JS but the JS ecosystem remained single threaded longer & decided to use message passing instead to build WebWorkers which let the JIT remain fast.
PyPy is only about twice as slow as V8 and is about an order of magnitude faster than CPython. It is quite an achievement. I would be very happy if CPython could get this performance, but I doubt it.
Anyone know if there will be any better tools for cross-compiling python projects?
The package management and build tools for python have been so atrociously bad (environments add far too much complexity to the ecosystem) that it turns many developers away from the language altogether.
A system like Rust's package management, build tools, and cross-compilation capability is an enormous draw, even without the memory safety. The fact that it actually works (because of the package management and build tools) is the main reason to use the language, really. Python used to do that ~10 years ago. Now absolutely nothing works. It takes weeks to get simple packages working, and even then only under extremely brittle conditions that undermine the project you're trying to use that other package for, etc.
If Python could ever get its act together and make better package management, and allow for cross-compiling, it could make a big difference.
(I am aware of the very basic fact that it's interpreted rather than compiled, yada yada - there are still ways to make executables, they are just awful.) Since Python is data-science centric, it would be good to have decent data management capabilities too, but perhaps that could come after the fundamental problems are dealt with.
I tried looking at mojo, but it's not open source, so I'm quite certain that kills any hope of it ever being useful at all to anyone. The fact that I couldn't even install it without making an account made me run away as fast as possible.
I can't answer your initial question, but I do like to pile onto the package management points.
Package consumption sucks so bad, since the only sensible way of using packages is virtual envs where you copy all dependencies. Then freezing a venv or dumping package versions so you can port your project to a different system doesn't consider only the packages actually used/imported in the code; it just dumps everything in the venv. The fact that you need external tools for this is frustrating.
Then there is package creation. Legacy vs modern approach, cryptic __init__ files, multiple packaging backends, endless sections in pyproject.toml, manually specifying dependencies and dev-dependencies, convoluted ways of getting package metadata actually in code without having it in two places (such as CLI programs with --version).
Cross compilation really would be a nice feature, to simply distribute a single-file executable. I haven't tested it, but a Linux system with Wine should in theory be capable of "cross" compiling between Linux and Windows.
Still, like you, as a beginning I would prefer a sensible package management and package creation process.
Can you expand on what you mean by that? I have trouble imagining a Python packaging problem that takes weeks to resolve - I'd expect them to either be resolvable in relatively short order or for them to prove effectively impossible such that people give up.
- Trying to figure out what versions the scripts used and specifying them in a new poetry project
- Realizing some OS-dependent software is needed so making a docker file/docker-compose.yml
- Getting some of it working in the container with a poetry environment
- Realizing that other parts of the code work with other versions, so making a different poetry environment for those parts
- Trying to tie this package/container as a dependency of another project
- Oh actually, this is a dependency of a dependency
- How do you call a function from a package running in a container with multiple poetry environments in a package?
- What was I doing again?
- 2 weeks have passed trying to get this to work, perhaps I'll just do something else
I removed the SDKs of some big (big for the wrong reasons) open source projects which generate a lot of code using python3 scripts.
In those custom SDKs, I generate all the code at the start of the build, which takes a significant amount of time for code generation that is mostly no longer pertinent or inappropriately done. I will really feel the python3 speed improvement for those builds.
Honestly, 2-9% already seems like a very significant improvement, especially since, as they mention, "remember that CPython is already written in C". Whilst it's great to look at the potential for even greater gains by building upon this work, I feel we shouldn't undersell what's been accomplished.
What is this supposed to say? Most scripting language interpreters are written in low level languages (or assembly), but that alone doesn't say anything about the performance of the language itself.
I think they mean that a lot of runtime of any benchmark is going to be spent in the C bits of the standard library, and therefore not subject to the JIT. Only the glue code and the bookkeeping or whatnot that the benchmark introduces would be improved by the JIT. This reduces the impact that the JIT can make.
Isn't the point that if pure Python was faster they wouldn't need to be written in other [compiled] languages? Having dealt with Cython it's not bad, but if I could write more of my code in native Python my development experience would be a lot simpler.
Granted we're still very far from that and probably won't ever reach it, but there definitely seems to be a lot of progress.
Since Nim compiles to C, a middle step worth being aware of is Nim + nimporter which isn't anywhere near "just python" but is (maybe?) closer than "compile a C binary and call it from python".
Or maybe it's just syntactic sugar around that. But sugar can be nice.
Also recall that a 50% speed improvement in SQLite was achieved through 50-100 different optimisations that each eked out 0.5-1% speedups. On my phone now, don't have the ref, but it all adds up.
Many small improvements is the way to go in most situations. It's not great clickbait, but we should remember that we got from a single cell at some time to humans through many small changes. The world would be a lot better if people just embraced the grind of many small improvements...
Trains are never going to beat jets in pure speed. But in certain scenarios, trains make a lot more sense to use than jets, and in those scenarios, it is usually preferable having a 150 mph train to a 75 mph train.
Looking at the world of railways, high-speed rail has attracted a lot more paying customers than legacy railways, even though it doesn't even try to achieve flight-like speeds.
Two decades ago, you could (as e.g. Paul Graham did at the time) argue that dynamically typed languages can get your ideas to market faster so you become viable and figure out optimization later.
It's been a long time since that argument held. Almost every dynamic programming language still under active development is adding some form of gradual typing because the maintainability benefits alone are clearly recognized, though such languages still struggle to optimize well. Now there are several statically typed languages to choose from that get those maintainability benefits up-front and optimize very well.
Different languages can still be a better fit for different projects, e.g. Rust, Go, and Swift are all statically typed compiled languages better fit for different purposes, but in your analogy they're all jets designed for different tactical roles, none of them are "trains" of any speed.
Analogies about how different programming languages are like different vehicles or power tools or etc go way back and have their place, but they have to recognize that sometimes one design approach largely supersedes another for practical purposes. Maybe the analogy would be clearer comparing jets and trains which each have their place, to horse-drawn carriages which still exist but are virtually never chosen for their functional benefits.
I cut my teeth on C/C++, and I still develop the same stuff faster in Python, with which I have less overall experience by almost 18 years. Python is also much easier to learn than, say, Rust, or the current standard of C++ which is a veritable and intimidating behemoth.
In many domains, it doesn't really matter if the resulting program runs in 0.01 seconds or 0.1 seconds, because the dominant time cost will be in user input, DB connection etc. anyway. But it matters if you can crank out your basic model in a week vs. two.
> Python is also much easier to learn than, say, Rust
I don't doubt it, but learning is only the first step to using a technology for a series of projects over years or even decades, and that step doesn't last that long.
People report being able to pick up Rust in a few weeks and being very productive. I was one of them, if you already got over the hill that was C++ then it sounds like you would be too. The point is that you and your team stay that productive as the project gets larger, because you can all enforce invariants for yourselves rather than have to carry their cognitive load and make up the extra slack with more testing that would be redundant with types.
Outside of maybe a 3-month internship, when is it worthwhile to penalize years of software maintenance to save a few weeks of once-off up-front learning? And it's not like you save it completely: writing correct Python still takes some learning too, e.g. beginners easily get confused about when mutable data structures are silently being shared and thus modified when they don't expect it. People who are already very comfortable with Python forget this part of their own learning curve, just like people very comfortable with Rust forget their first borrow-check head scratcher.
I never made a performance argument in this thread so I'm not sure why 0.01 or 0.1 seconds matters here. Even the software that got you into a commercial market has to be maintained once you get there. Ask Meta how they feel about the PHP they're stuck with, for example.
I tried searching for that article because I vaguely recall it, but can't find it either. But yeah, a lot of small improvements add up. Reminds me of this talk: https://www.youtube.com/watch?v=NZ5Lwzrdoe8
Unfortunate to see a couple of comments here drive-by pulling out the “x% faster” stat whilst minimising the context. This is a big deal and it’s effectively a given that this’ll pave the way for further enhancements.
A JIT compiler is a big deal for performance improvements, especially where it matters (in large repetitive loops).
Anyone cynical about the potential a python JIT offers should take a look at pypy which has a 5x speed up over regular python, mainly though JIT operations: https://www.pypy.org/
It is a very big deal, as it will finally shift the mentality regarding:
- "C/C++/Fortran libs are Python"
- "Python is too dynamic", while disregarding Smalltalk, Common Lisp, Dylan, SELF, NewtonScript JIT capabilities, all dynamic languages where anything can change at any given moment
What do you mean by "it will shift the mentality"? There is no magical JIT that will ever make e.g. the data science Python & C++ amalgamations slower than pure Python. Likely never happening, either.
Also no mentality shift is expected on the "Python is too dynamic" -- which is a strange thing to say anyway -- because Python is not getting any more static due to these JIT news.
I'm fairly certain that this is false, and am working on proving it. In the cases that Numba is optimised for it's already faster than plausible C++ implementations of the same kernels.
Sure, for UNIX scripting; for everything else it is painfully slow.
I have known Python since version 1.6, and it is my scripting language in UNIX-like environments. During my time at CERN, I was one of the CMT build infrastructure engineers on the ATLAS team.
It has never been the language I would reach for when not doing OS scripting, and usually when a GNU/Linux GUI application happens to be slow as molasses, it has been written in Python.
shrug. If we're talking personal experience, I've been using Python since 1.4. It's been my primary development language since the late 1990s, with of course speed critical portions in C or C++ when needed - and I know a lot of people who also primarily develop in Python.
And there's a bunch of Python development at CERN for tasks other than OS scripting. ("The ease of use and a very low learning curve makes Python a perfect programming language for many physicists and other people without the computer science background. CERN does not only produce large amounts of data. The interesting bits of data have to be stored, analyzed, shared and published. Work of many scientists across various research facilities around the world has to be synchronized. This is the area where Python flourishes" - https://cds.cern.ch/record/2274794)
I simply don't see how a Python JIT is going to make that much of a difference. We already have PyPy for those needing pure Python performance, and Numba for certain types of numeric needs.
PyPy's experience shows we'll not be expecting a 5x boost any time soon from this new JIT framework, while C/C++/Fortran/Rust are significantly faster.
> And there's a bunch of Python development at CERN for tasks other than OS scripting
Of course there is, CMT was a build tool, not OS scripting.
No need to give me CERN links to show me Python bindings to ROOT, or Jupyter notebooks.
> PyPy's experience shows we'll not be expecting a 5x boost any time soon from this new JIT framework, while C/C++/Fortran/Rust are significantly faster.
I really don't get the attitude that if it doesn't 100% fix all the world problems, then it isn't worth it.
The link wasn't for you - the link was for other HN users who might look at your mention of your use at CERN and mistakenly assume it was a more widespread viewpoint there.
> I really don't get the attitude that if it doesn't 100% fix all the world problems, then it isn't worth it.
Then it's a good thing I'm not making that argument, but rather that "Having a Python with JIT, in many cases it will be fast enough for most cases." has very little information content, because Python without a JIT already meets the consequent.
A Python web service my team maintains, running at a higher request rate and with lower CPU and RAM requirements than most of the Java services I see around us, would like a word with you.
~5k requests/second for the Python service, we tend to go for small instances for redundancy so that's across a few dozen nodes. The workload comparison is unfair to the Java service, if I'm honest :). But we're running Python on single vCPU containers with 2G RAM, and the Java service instances are a lot larger than that.
Flask, gunicorn, low single digit millisecond latency. Definitely optimised for latency over throughput, but not so much that we've replatformed it onto something that's actually designed for low latency :P. Callers all cache heavily with a fairly high hit ratio for interactive callers and a relatively low hit ratio for batch callers.
I really wouldn't mind Python being faster than it is, and I really didn't mind at all getting a practically free ~30% performance increase just by updating to 3.11. There are tons of applications which just passively benefit from these optimizations. Sure, you might argue "but you shouldn't have written that parser or that UI handling a couple thousand items in Python", but lots of people do and did just that.
By default any code loaded into something like SBCL gets AOT compiled.
In Common Lisp not everything can change at any moment. Especially not in implementations where one uses AOT compilation like SBCL, ECL, LispWorks, Allegro CL, and so on. They have optimizing compilers which can gradually remove dynamic runtime behavior, up to supporting almost no dynamic runtime behavior.
Stuff which is supported: type specific code, inlining, block compilation, removal of development tools, ...
JIT implementations are rare in the Common Lisp world. They are mostly only used in implementations which use a byte-code virtual machine (CLISP, ABCL, ...). Common Lisp implementations mostly compile either directly to native code or via C compilers. The effect is that native AOT compiled code is much faster.
I wrote tons of perl in my life. I would rather keep writing perl than touching python. Every time I see a nice utility and see that it's written in python - tab closed.
Ah I'd say the exact opposite, python in general is pretty good but jupyter sucks because the syntax isn't compatible with regular python and I avoid it like the plague.
Take the code you find in an average notebook, copy it to a .py text file, run it with python. Does it run? In my experience the answer is usually 'no' because of some extra-ass syntax sugar jupyter has that doesn't exist in python.
is it any different from or comparable to numba or pyjion? Not following python closely in recent years, but I recall those two projects having huge potential
I don’t know Pyjion, but I have used Numba for real work. It’s a great package and can lead to massive speed-ups.
However, last time I used it, it (1) didn't work with many third-party libraries (e.g. SciPy was important for me), and (2) didn't work with object-oriented code (all your @njit code had to be wrapped in functions without classes). Those two limitations have restricted which projects I could adopt Numba for in practice, despite loving it in the cases where it worked.
I don’t know what limitations the built-in Python JIT has, but hopefully it might be a more general JIT that works for all Python code.
maybe, maybe not. time will tell. ahead-of-time compilation is even better known for improving performance and yet perl's compile-to-c backend turned out to fail to do that
> ahead-of-time compilation is even better known for improving performance
Not necessarily, not for dynamic languages.
With very dynamic languages you can make only very limited assumptions about e.g. function argument types, which leads you to compiled functions that have to handle any possible case.
A JIT compiler can notice that the given function is almost always (or always) used to operate on a pair of integers, and do a vastly superior specialized compilation, with guards to fallback on the generic one. With extensive inlining, you can also deduplicate a lot of the guards.
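The guard-plus-fast-path shape can be sketched in pure Python; only the structure is the point here, since a real JIT emits specialized machine code rather than another interpreted branch:

    def specialized_add(generic_add):
        """Fast path for the common int+int case, guarded, with a generic fallback."""
        def call(a, b):
            if type(a) is int and type(b) is int:   # the guard the JIT would emit
                return a + b                        # specialized path
            return generic_add(a, b)                # fall back ("deoptimize") to generic code
        return call

    def generic_add(a, b):
        # Stand-in for the fully general implementation (strings, floats, user types, ...).
        return a + b

    add = specialized_add(generic_add)
    print(add(2, 3), add("foo", "bar"))   # 5 foobar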
yes, that is true. but aot compilers never make things slower than interpretation, and they can afford more expensive optimizations
also, even mature jit compilers often only make limited improvements; jython has been stuck at near-parity with cpython's terrible performance for decades, for example, and while v8 was an enormous improvement over old spidermonkey and squirrelfish, after 15 years it's still stuck almost an order of magnitude slower than c https://benchmarksgame-team.pages.debian.net/benchmarksgame/... which is (handwaving) like maybe a factor of 2 or 3 slower than self
typically when i can get something to work using numpy it's only about a factor of 5 slower than optimized c, purely interpretively, which is competitive with v8 in many cases. luajit, by contrast, is goddam alien technology from the future
with respect to your int×int example, if an int×int specialization is actually vastly superior, for example because the operation you're applying is something like + or *, an aot compiler can also insert the guard and inline the single-instruction implementation, and it can also do extensive inlining and even specialization (though that's rare in aots and common in jits). it can insert the guards because if your monomorphic sends of + are always sending + to a rational instance or something, the performance gain from eliminating megamorphic dispatch is comparatively slight, and the performance loss from inserting a static hardcoded guess of integer math before the megamorphic dispatch is also comparatively slight, though nonzero
this can fall down, of course, when your arithmetic operations are polymorphic over integer and floating-point, or over different types of integers; but it often works far better than it has any right to. in most code, most arithmetic and ordered comparison is integers, most array indexing is arrays, most conditionals are on booleans (and smalltalk actually hardcodes that in its bytecode compiler). this depends somewhat on your language design, of course; python using the same operator for indexing dicts, lists, and even strings hurts it here
meanwhile, back in the stop-hitting-yourself-why-are-you-hitting-yourself department, fucking cpython is allocating its integers on the heap and motherfucking reference-counting them
There is already an AOT compiler for Python: Nuitka[0]. But I don't think it's much faster.
And then there is mypyc[1] which uses mypy's static type annotations but is only slightly faster.
And various other compilers like Numba and Cython that work with specialized dialects of Python to achieve better results, but then it's not quite Python anymore.
> fucking cpython is allocating its integers on the heap and motherfucking reference-counting them
And here I thought it was shocking to learn recently that v8 allocates doubles on the heap. (I mean, I'm not a compiler writer, I have no idea how hard it would be to avoid this, but it feels like mandatory boxed floats would hurt performance a lot)
nanboxing as used in spidermonkey (https://piotrduperas.com/posts/nan-boxing) is a possible alternative, but i think v8 works pretty hard to not use floats, and i don't think local-variable or temporary floats end up on the heap in v8 the way they do in cpython. i'm not that familiar with v8 tho (but i'm pretty sure it doesn't refcount things)
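for the curious, the nan-boxing trick itself is easy to poke at from python with struct; the tag layout below is made up, not spidermonkey's actual encoding:

    import struct

    QNAN    = 0x7FF8_0000_0000_0000   # exponent all ones + quiet bit: a NaN
    TAG_INT = 0x0001_0000_0000_0000   # made-up tag marking "payload is an int"

    def box_int(v):
        assert 0 <= v < 2**32
        return struct.unpack("<d", struct.pack("<Q", QNAN | TAG_INT | v))[0]

    def unbox(d):
        bits = struct.unpack("<Q", struct.pack("<d", d))[0]
        if bits & QNAN == QNAN and bits & TAG_INT:
            return bits & 0xFFFF_FFFF   # a boxed int hiding in the NaN payload
        return d                        # an ordinary double is stored as itself

    print(unbox(box_int(1234)), unbox(3.14))   # 1234 3.14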
Correct, to the point where at work a colleague and I actually have looked into how to force using floats even if we initiate objects with a small-integer number (the idea being that ensuring our objects having the correct hidden class the first time might help the JIT, and avoids wasting time on integer-to-float promotion in tight loops). Via trial and error in Node we figured that using -0 as a number literal works, but (say) 1.0 does not.
> i don't think local-variable or temporary floats end up on the heap in v8 the way they do in cpython
This would also make sense - v8 already uses pools to re-use common temporary object shapes in general IIRC, I see no reason why it wouldn't do at least that with heap-allocated doubles too.
so then the remaining performance-critical case is where you have a big array of floats you're looping over. in firefox that works fine (one allocation per lowest-level array, not one allocation and unprefetchable pointer dereference per float), but maybe in chrome you'd want to use a typedarray?
As I understand it, V8 keeps track of an ElementsKind for each array (or, more precisely, for the elements of every object; arrays are not special in this sense). If an array only contains floats, then they will all be stored unboxed and inline. See here: https://source.chromium.org/chromium/chromium/src/+/main:v8/...
I assume that integers are coerced to floats in this mode, and that there's a performance cliff if you store a non-number in such an array, but in both cases I'm just guessing.
In SpiderMonkey, as you say, we store all our values as doubles, and disguise the non-float values as NaNs.
Maybe, at that point it is basically similar to the struct-of-arrays vs array-of-structs trade-off, except with significantly worse ergonomics and less pay-off.
I so much agree with your comment on memory allocation. Everybody is focusing on JIT, but allocating everything on the heap, with no possibility to pack multiple values contiguously in a struct or array, will still be a problem for performance.
Ahead-of-time compilation is a bad solution for dynamic languages, so that is an expected outcome for Perl.
The base line should be how heavily dynamic languages like my favourite set, Smalltalk, Common Lisp, Dylan, SELF, NewtonScript, ended up gaining from JIT, versus the original interpreters, while being in the genesis of many relevant papers for JIT research.
JIT compilation is rare in Common Lisp. I wouldn't think that Dylan implementations used JIT compilation.
Apple's Dylan IDE and compiler was implemented in Macintosh Common Lisp (MCL). MCL then was not a part of the Dylan runtime.
I would think that Open Dylan (the Dylan implementation originally from Harlequin) can also generate LLVM bitcode, but I don't know if that one can be JIT executed. Possibly...
thank you! i didn't know clisp had a jit compiler, and i certainly should have thought of abcl, and also you mentioned them in your other comment in https://news.ycombinator.com/item?id=38933091
how much of a performance boost does abcl get from the hotspot jit compared to, say, interpreted clisp
when i wrote ur-scheme one of the surprising things i learned from it was that ahead-of-time compilation worked amazingly well for scheme. scheme is ruthlessly monomorphic but i was still doing a type check on every primitive argument
this is great, thanks! but it sounds like it was an aot compiler, not a jit compiler; for example, it explains that a drawback of compiling functions to native code is that they use more memory, and that the compiler still produces bytecode for the functions it compiles natively, unless you suppress the bytecode compilation in project settings
Yeah, I guess if one wants to get more technical, I see it as the first step of a JIT that didn't have the opportunity to evolve due to market decisions.
i guess if they had, we would know whether a jit made newtonscript faster or slower, but they didn't, so we don't. what we do know is that an aot compiler sometimes made newtonscript faster (though maybe only if you added enough manifest static typing annotations to your source code)
that seems closer to the opposite of what you were saying in the point on which we were in disagreement?
I guess my recollection regarding NewtonScript wasn't correct, if you prefer that I put it like that; however, I am quite certain in regards to the other languages on my list.
i agree that the other languages gained a lot for sure
maybe i should have said that up front!
except maybe common lisp; all the implementations i know are interpreted or aot-compiled (sometimes an expression at a time, like sbcl), but maybe there's a jit-compiled one, and i bet it's great
probably with enough work python could gain a similar amount. it's possible that work might get done. but it seems likely that it'll have to give up things like reference-counting, as smalltalk did (which most of the other languages never had)
Note that interpreter in the Lisp world by default has a different meaning.
A "Lisp interpreter" runs Lisp source in the form of s-expressions. That's what the first Lisp did.
A "Lisp compiler" compiles Lisp source code to native code, either directly or with the help of a C compiler or an assembler. A Lisp compiler could also compile source code to byte code. In some implementations this byte code can be JIT compiled (ABCL, CLISP, ...).
The first Lisp provided a Lisp to assembly compiler, which compiled Lisp code to assembly code, which then gets compiled to machine code. That machine code could be loaded into Lisp and functions then could be native machine code.
The Newton Toolkit could compile type declared functions to machine code. That's something most Common Lisp compilers do, sometimes by default (SBCL, CCL, ... by default directly compile source code to machine code).
I've entered a function and it gets ahead of time compiled to non-generic machine code.
Calling the function ADD with the wrong numeric arguments is an error, which will be detected both at compile time and at runtime.
* (add 3.0 2.0)
debugger invoked on a TYPE-ERROR @7006E17898 in thread
#<THREAD "main thread" RUNNING {70088224A3}>:
The value
3.0
is not of type
FIXNUM
when binding A
Redefinition of + will do nothing to the code. The addition is inlined machine code.
Not pursuing JIT or efficient compilation in general was a deliberate decision way back when Python made some kind of sense. It was the simplicity of implementation valued over performance gains that motivated this decision.
The mantra Python programmers liked to repeat was that "the performance is good enough, and if you want to go fast, write in C and make a native module".
And if you didn't like that, there was always Java.
Today, Python is getting closer and closer to being "the crappy Java with worse syntax". Except we already have that: it's called Groovy.
What are you talking about? From what I can read here there is no syntax change, just a framework for faster execution. Plus, Python's use case has HEAVILY evolved over the last few years since it's now the de facto language for machine learning. It's great that the core devs are keeping up with the times.
The language is definitely getting more complex syntactically, and I'm not a huge fan of some of those changes, but it's nowhere near Java or C++ or anything else. You can still write simple Python with all of these changes.
Read it again. It seems you were reading too fast. I'm talking about the future, not the change being discussed right now.
> It's great that the core devs are keeping up with the time.
You mistake the influence of Microsoft and their desire to sell features for progress. Python is actually regressing as a system. It's becoming worse, not better. But it's hard to see the gestalt of it if all you are looking for is the new features.
> it's no where near Java
That is true. Java is a much more simple and regular (not in the automata theory sense) language. Today, if you want a simpler language, you need to choose Java over Python (although neither is very simple, so, preferably, you need a third option).
> You can still write simple Python
I can also write simple C++ if I limit what I use from the language to a very small subset. This says nothing about the simplicity of the language...
I always wondered how Python can be one of the world's most popular languages without anyone (any company) stepping up and making the runtime as fast as modern JavaScript runtimes.
A big part of what made Python so successful was how easy it was to extend with C modules. It turns out to be very hard to JIT Python without breaking these, and most people don’t want a Python that doesn’t support C extension modules.
The JavaScript VMs often break their extensions APIs for speed, but their users are more used to this.
Which is why I'm shocked that Python's big "we're breaking backwards compatibility" release (Python 3) was mostly just for Unicode strings. It seems like the C API and the various __builtins__ introspection API thingies should've been the real focus on breaking backwards compatibility so that Python would have a better future for improvements like this.
On the other hand, rewriting the C modules and adapting them to a different C API is very straightforward after you've done 1 or 2 of such modules. Perhaps it's even something that could be done by training an LLM like Copilot.
That's breakage you'd have to tread carefully on; and given the 2to3 experience, there would have to be immediate reward to entice people to undertake the conversion. No one's interested in even minor code breakage for minor short-term gain.
> anyone (company) stepping up and making the runtime as fast as modern JavaScript runtimes.
There are a lot of faster python runtimes out there. Both Google and Instagram/Meta have done a lot of work on this, mostly to solve internal problems they've been having with python performance. Microsoft has also done work on parallel python. There's PyPy and Pythran and no doubt several others. However none of these attempts have managed to be 100% compatible with the current CPython (and more importantly the CPython C API), so they haven't been considered as replacements.
JavaScript had the huge advantage that there was very little mission-critical legacy JavaScript code around they had to take into consideration, and no C libraries that they had to stay compatible with. Meaning that modern JavaScript runtime teams could more or less start from scratch. Also, the JavaScript world at the time was a lot more OK with different JavaScript runtimes not being 100% compatible with each other. If you 'just' want a faster Python runtime that supports most of Python and many existing libraries, but are OK with having to rewrite some of your existing Python code or third-party libraries to make it work on that runtime, then there are several to choose from.
JS also had the major advantage of being sandboxed by design, so they could work from there. Most of the technical legacy centered around syntax backwards compatibility, but it's all isolated - so much easier to optimize.
Python with its C API basically gives you the keys to the kingdom at the machine code level. Modifying something that has an API to connect to essentially anything is not an easy proposition. Of course, it has the advantage that you can make Python faster by performance analysis and moving the expensive parts to optimized C code, if you have the resources.
Google/Instagram have done bits, but the company that's done the most serious work on Python performance is actually Oracle. GraalPython is a meaningfully faster JIT (430% faster vs 7% for this JITC!) and most importantly, it can utilize at least some CPython modules.
They test it against the top 500 modules on PyPI and it's currently compatible with about half:
Node.js and Python 3 came out at around the same time. Python had their chance to tell all the "mission critical legacy code" that it was time to make hard changes.
As much as I would have loved to see some more 'extreme' improvements to python, given how the python community reacted to the relatively minor changes that python 3 brought, anything more extreme would very likely have caused a Perl 6 style situation and quite possibly have killed the language.
Part of the issue with 3 is that the changes were so minor that they were just annoying. Like 2/3 now equals 0.66 instead of 0, thanks for the hard-to-find bugs. `print "foo"` no longer works, cause they felt like it. Improvements like str being unicode made more sense but were quite disruptive and could've been avoided too, just add a new type.
What I would've preferred is they leave all that stuff alone, add nice features like async/await that don't break existing things, and make important changes to the runtime and package manager. Python's packaging is so broken that it's almost mandatory to have a Dockerfile nowadays, while in JS that's not an issue
Python is already fast where it matters: often, it is just used to integrate existing C/C++ libraries like numpy or pytorch. It is more an integration language than one where you write your heavy algorithms in.
For JS, during the time that it received its JITs, there was no cross-platform native-code equivalent like wasm yet. JS had to compete with plugins written in C/C++, however. There was also competition between browser vendors, which gave the period the name "browser wars". Nowadays, at least, the speed improvements for the end user thanks to the JIT aren't that great either; Apple provides a mode to turn off the JIT entirely for security.
Having recently implemented parallel image rendering in corrscope (https://github.com/corrscope/corrscope/pull/450), I can say that friends don't let friends write performance-critical code in Python. Depending on prebuilt C++ libraries hampers flexibility (eg. you can't customize the memory management or rasterization pipeline of matplotlib). Python's GIL inhibits parallelism within a process, and the workaround of multiprocessing and shared memory is awkward, has inconsistencies between platforms, and loses performance (you can't get matplotlib to render directly to an inter-process shared memory buffer, and the alternative of copying data from matplotlib's framebuffer to shared memory wastes CPU time).
Additionally, a lot of the libraries/ecosystem around shared memory (https://docs.python.org/3/library/multiprocessing.shared_mem...) seems poorly conceived. If you pre-open shared memory in a ProcessPoolExecutor's initializer function, you can't close it when the worker process exits (which might be fine, nobody knows!), but if you instead open and close a shared memory segment on every executor job, it measurably reduces performance, presumably from memory-mapping overhead or TLB/page-table thrashing.
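For readers who haven't fought with this, the pre-opened pattern being described looks roughly like the sketch below (names and sizes are made up, and the real code is more involved): each worker attaches to one pre-created segment in its initializer and writes into a slot of it, so only the slot index crosses the process boundary.

    from concurrent.futures import ProcessPoolExecutor
    from multiprocessing import shared_memory

    FRAME_BYTES = 1920 * 1080 * 4    # hypothetical RGBA frame
    N_SLOTS = 4

    _shm = None   # per-worker handle, filled in by the initializer

    def _init_worker(shm_name):
        global _shm
        # Attach (don't create); note there is no clean hook to close this when
        # the worker exits, which is exactly the awkwardness described above.
        _shm = shared_memory.SharedMemory(name=shm_name)

    def render_frame(slot):
        # Write directly into the shared buffer; nothing large gets pickled back.
        view = _shm.buf[slot * FRAME_BYTES:(slot + 1) * FRAME_BYTES]
        view[:] = bytes([slot % 256]) * FRAME_BYTES   # stand-in for real rendering
        return slot

    if __name__ == "__main__":
        shm = shared_memory.SharedMemory(create=True, size=N_SLOTS * FRAME_BYTES)
        try:
            with ProcessPoolExecutor(max_workers=2, initializer=_init_worker,
                                     initargs=(shm.name,)) as pool:
                print(list(pool.map(render_frame, range(N_SLOTS))))
        finally:
            shm.close()
            # Depending on the Python version, the resource tracker may warn about
            # "leaked" segments here -- more of the awkwardness described above.
            shm.unlink()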
> Python's GIL inhibits parallelism within a process, and the workaround of multiprocessing and shared memory is awkward, has inconsistencies between platforms, and loses performance
Well, imho the biggest problem with this approach to parallelism is that you're stepping out of the Python world with gc'ed objects etc. and into a world of ctypes and serialization. It's like you're not even programming Python anymore, but something closer to C with the speed of an interpreted language.
ProcessPoolExecutor doesn't let you supply a callback to run on worker process exit, only startup. Perhaps I could've looked for and tried something like atexit (https://docs.python.org/3/library/atexit.html)? In any case I don't want to touch my code at the moment until I regain interest or hear of resource exhaustion, since "it works".
Fair enough. If you ever do decide to touch it to address that, I suggest (in preference order):
- Subclassing ProcessPoolExecutor such that it spawns multiprocessing.Process objects whose runloop function wraps the stdlib "_process_worker" function in a try/finally which runs your at-shutdown logic. That'll be as reliable as any try/finally (e.g. SIGKILL and certain interpreter faults can bypass it).
- Writing custom destructors of objects in your call arguments which are aware of and can do appropriate cleanup actions for associated SharedMemory objects. This is less preferred than subclassing because of the usual issues with custom destructors: no real exception handling, and objects sneaking out into long-lived/global caches can cause destructors to run late (after the interpreter has torn down things your cleanup logic needs) or not at all.
- Atexit, as you suggest. This is least-preferred because the execution context of atexit code is... weird, to say the least. Much like a signal handler or pthread_atfork callback, it's not a place that I'd put code that does complicated I/O or depends on the rest of the interpreter being in ordinary conditions.
JavaScript has to be fast because its users were traditionally captive on the platform (it was the only language in the browser).
Python's users can always swap out performance critical components to another language. So Python development delivered more when it focussed on improving strengths rather than mitigating weaknesses.
In a way, Python being slow is just a sign of a healthy platform ecosystem allowing comparative advantages to shine.
I think the thing with python is that it's always been "fast enough", and if not you can always reach for natively implemented modules. On the flipside, javascript was the main language embedded in web browsers.
There has been a lot of competition to make browsers fast.
Nowadays there are 3 main JS engines, V8 backed by google, JavaScriptCore backed by apple, and spidermonkey backed by mozilla.
If python had been the language embedded into web browsers, then maybe we would see 3 competing python engines with crazy performance.
The alternative interpreters for python have always been a bit more niche than CPython, but now that Guido works at Microsoft there has been a bit more of a push to make it faster.
Because it's already fast enough for most of us? Anecdote, but I've had my share of slow things in Javascript that are not slow in Python. Try to generate a SHA256 checksum for a big file in the browser...
Have you tried to generate a SHA256 checksum for a file in the browser, no matter what crypto lib or API is available to you?
Have you tried to generate it using the Python standard lib?
I did, and doing it in the browser was so bad that it was unusable. I suspect that it's not the crypto that's slow but the file reading. But anyway...
> SHA256 in pure Python would be unusably slow
No one would do that, because:
> Python's SHA256 is written in C
Hence comparing "pure python" to "pure javascript" is mostly irrelevant for most day-to-day tasks, like most benchmarks.
> Javascript is fast. Browsers are fast.
Well, no they were not for my use case. Browsers are really slow at generating file checksums.
I thought that perhaps the difference could be due to the JavaScript version having to first read the entire file before getting started on hashing it, whereas the Python version does it incrementally (which the browser API doesn't support [0]). But changing the Python version to work like the JavaScript version doesn't make a big difference: 30 vs 35 ms (with a ~50 MB file) on my machine.
The slowest part in the JavaScript version seems to be reading the file, accounting for 70–80% of the runtime in both Firefox and Chromium.
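For reference, the incremental Python version being compared is just a few lines of stdlib code (the file name is a placeholder):

```python
import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    h = hashlib.sha256()          # the hashing itself runs in C (OpenSSL)
    with open(path, "rb") as f:
        # Feed the file in chunks so it never has to sit in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

print(sha256_of_file("big_file.bin"))
```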
Maybe 8 years is not much in a career? Maybe we had to support one of those browsers that did not support it? Maybe your snarky comment is out of place? And even to this day it's still significantly slower than Python's stdlib according to the tester. So much for "why python not as fast as js, python is slow, blah blah blah".
The Python standard lib calls out to hand-optimized assembly language versions of the crypto algos. It is of no relevance to a JIT-vs-interpreted debate.
It absolutely is relevant to the "python is slow reee" nonsense tho, which is the subject. Python-the-language being slow is not relevant for a lot of the users, because even if they don't know they use Python mostly as a convenient interface to huge piles of native code which does the actual work.
And as noted upthread that's a significant part of the uptake of Python in scientific fields, and why pypy despite the heroic work that's gone into it is often a non-entity.
This is a major problem in scientific fields. Currently there are sort of "two tiers" of scientific programmers: ones who write the fast binary libraries and ones that use these from Python (until they encounter e.g. having to loop and they are SOL).
This is known as the two language problem. It arises from Python being slow to run and compiled languages being bad to write. Julia tries to solve this (but fails due to implementation details). Numba etc try to hack around it.
PyPy is sadly vaporware. The failure from the beginning was not supporting the most popular (scientific) Python libraries. It nowadays kind of does, but it's brittle and often hard to set up. And anyway PyPy is not very fast compared to e.g. V8 or SpiderMonkey.
The major problem in scientific fields is not this, but the amount of incompetence and the race-to-the-bottom environment which enables it. Grant organizations don't demand rigor and efficiency, they demand shiny papers. And that's what we get. With god awful code and very questionable scientific value.
There are such issues, but I don't think they are a very direct cause of the two language problem.
And even these issues are part of the greater problem of late stage capitalism that in general produces god awful stuff with questionable value. E.g. vast majority of industry code is such.
fyi: the author of that post is a current Julia user and intended the post as counterpoint to their normally enthusiastic endorsements. so while it is a good intro to some of the shortfalls of the language, I'm not sure the author would agree that Julia has "failed" due to these details
Yes, but it's a good list of the major problems, and laudable for a self-professed "stan" to be upfront about them.
It's my assessment that the problems listed there are a reason why Julia will not take off and we're largely stuck with Python for the foreseeable future.
It is worth noting that the first of the reasons presented is significantly improved in Julia 1.9 and 1.10 (released ~8 months and ~1 month ago). The time for `using BioSequences, FASTX` on 1.10 is down to 0.14 seconds on my computer (from 0.62 seconds on 1.8 when the blog post was published).
There is pleeeenty of mission critical stuff written in Python, for which interpreter speed is a primary concern. This has been true for decades. Maybe not in your industry, but there are other Python users.
The point of Python is quickly integrating a very wide range of fast libraries written in other languages though, you can't ignore that performance just because it's not written in Python.
In lots of applications, all the computations already happen inside native libraries, e.g. Numpy, PyTorch, TensorFlow, JAX etc.
And if you have a complicated computation graph, there are already JITs at this level, based on Python code, e.g. see torch.compile, or TF XLA (done by default via tf.function), JAX, etc.
It's also important to do JIT on this level, to really be able to fuse CUDA ops, etc. A generic Python JIT probably cannot really do this, as this is CUDA specific, or TPU specific, etc.
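For a concrete picture, here's roughly what JIT at that level looks like with torch.compile (PyTorch 2.x); the toy function and shapes are just for illustration:

```python
import torch

def fused_op(x, y):
    # A chain of elementwise ops that a graph-level compiler can fuse
    # into fewer kernels instead of launching one per operation.
    return torch.relu(x * y + 1.0).sum()

compiled = torch.compile(fused_op)   # traces and compiles on first call

x = torch.randn(1024, 1024)
y = torch.randn(1024, 1024)
print(compiled(x, y))                # later calls reuse the compiled graph
```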
Because the reason why Python is one of the world's most popular languages (a large set of scientific computing C extensions) is bound to the implementation details of the interpreter itself.
There have been several attempts. For example, Google tried to introduce a JIT with a project named Unladen Swallow (started around 2009), but that ended up getting abandoned.
Unladen Swallow was massively over-hyped. It was talked about as though Google had a large team writing “V8 for Python”, but IIRC it was really just an internship project.
You might want to check out Mojo, which is not a runtime but a different language, designed to be a superset of Python. Beware though that it's not yet open source; that's slated for this Q1.
1. Javascript is a less dynamic language than Python and numbers are all float64 which makes it a lot easier to make fast.
2. If you want to run fast code on the web you only have one option: make Javascript faster. (Ok we have WASM now but that didn't exist at the time of the Javascript Speed wars.) If you want to run fast code on your desktop you have a MUCH easier option: don't use Python.
You could probably optimistically optimise some code, assuming it doesn't use any of the dynamic features of Python. You're going to get crazy performance cliffs though.
New runtimes like NodeJS have expanded JS beyond web, and JS's syntax has improved the past several years. But before that happened, Python on its own was way easier for non-web scripts, web servers, and math/science/ML/etc. Optimized native libs and ecosystems for those things got built a lot earlier around Python, in some cases before NodeJS even existed.
Python's syntax is still nicer for mathy stuff, to the point where I'd go into job coding interviews using Python despite having used more JS lately. And I'm comparing to JS because it's the closest thing, while others like Java are/were far more cumbersome for these uses.
I think python is very well suited to people who do computation in Excel spreadsheets. For actual CS students, I'd rather see something like scheme be a first language (but maybe I'm just an old person)
They do both Python and Scheme in the same Berkeley intro to CS class. But I think the point of Scheme is more to expand students' thinking with a very different language. The CS fundamentals are still covered more in the Python part of the course.
Always interested in replies to this kind of comment, which basically boil down to "Python is so slow that we have to write any important code in C. And this is somehow a good thing."
I mean, it's great that you can write some of your code in C. But wouldn't it be great if you could just write your libraries in Python and have them still be really fast?
When I was a scientist, speed was getting the code written during my break, and if it took all afternoon to run that's fine because I was in the lab anyway.
Even as I moved more in the software-engineering direction and started profiling code more, most of the bottlenecks came from things like "creating objects on every invocation rather than pooling them", "blocking IO", "using a bad algorithm" or "using the wrong data structure for the task" -- problems that exist in every language. Though "bad algorithm" or "wrong data structure" might matter less in a faster language, you're still leaving performance on the table.
> "Python is so slow that we have to write any important code in C. And this is somehow a good thing."
The good thing is that python has a very vibrant ecosystem filled with great libraries, so we don't have to write it in C, because somebody else has. We can just benefit from that when the situation calls for it
>I mean, it's great that you can write some of your code in C. But wouldn't it be great if you could just write your libraries in Python and have them still be really fast?
That really depends.
To make the issue clear, let's think about a similar situation:
bash is nice because you can plug together inputs and outputs of different sub-executables (like grep, sed and so on) and have a big "functional" pipeline deliver the final result.
Your idea would be "wouldn't it be great if you could just write your libraries in bash and have them still be really fast?". Not if it means turning bash into C and tanking productivity. And definitely not if that new bash can't run the old grep anymore (which is what the proposal usually implies in the case of Python).
Also, I'm fine with not writing my search engines, databases and matrix multiplication algorithm implementations in bash, really. So are most other people, I suspect.
Also, many proposals would weaken Python-the-language so it's not as expressive anymore. But I want it to stay as dynamic as it is. It's nice as a scripting language about 30 levels above bash.
As always, there are tradeoffs. Also with this proposal there will be tradeoffs. Are the tradeoffs worth it or not?
For the record, rewriting BLAS in Python (or anything else), even if the result was faster (!), would be a phenomenally bad idea. It would just introduce bugs, waste everyone's time, essentially be a fork of BLAS. There's no upside I can see that justifies it.
> But wouldn't it be great if you could just write your libraries in Python
Everybody obviously wants that. The question is are you willing to lose what you have in order to hopefully, eventually, get there. If Python 3 development stopped and Python 4 came out tomorrow and was 5x faster than python 3 and a promise of being 50-100x faster in the future, but you have to rewrite all the libraries that use the C API, it would probably be DOA and kill python. People who want a faster 'almost python' already have several options to choose from, none of which are popular. Or they use Julia.
The reason this approach is so much slower than some of the other 'fast' pythons out there that have come before is that they are making sure you don't have to rewrite a bunch of existing libraries.
That is the problem with all the fast python implementations that have come before. Yes, they're faster than 'normal' python in many benchmarks, but they don't support the entire current ecosystem. For example Instagram's python implementation is blazing fast for doing exactly what Instagram is using python for, but is probably completely useless for what I'm using python for.
Yes, but not so good when the JITed Python can no longer reference the fast C code others have written. Every Python JIT project so far has suffered from incompatibility with some C-based Python extension, and users just go back to the slow interpreter in those cases.
> basically boil down to "Python is so slow that we have to write any important code in C. And this is somehow a good thing."
I think that's a pretty ignorant interpretation. Python has been built to have a giant ecosystem of useful, feature-complete, stable, well built code that has been used for decades and for which there is no need to reinvent the wheel. If that already describes the universe of libraries that you /need/ to be extremely fast and the rest of your code is IO limited and not CPU limited, why reinvent the wheel?
That makes your comment even more inaccurate because you likely don't need to write any "important" (which you are stretching to mean "fast") code in C -- you utilize existing off the shelf fast libraries that are written in Fortran, CUDA, C, Rust or any other language a pre-existing ecosystem was built in.
Try to think of a language that has mature capabilities for domains as far apart as what Django solves for, what pandas solves for, what pytorch solves for, and still has fantastic tooling like jupyter and streamlit. I can't think of any other language that has the combined off-the-shelf breadth and depth of Python. I don't want to have to write fast code in any language unless forced to, because the vast majority of the time I can customize a great off-the-shelf package and only write the remaining 1% of glue. I can't see why a professional engineer would, 99% of the time, need to take a remotely different approach.
languages don’t need to all be good at the same thing. Python currently excels as a glue language you use to write drivers for modules written in lower-level languages, which is a niche that (afaik) nobody else seems to fill right now.
While I’m all for making Python itself faster, it would be a shame to lose the glue language par excellence.
At the end of the day, the number of optimizations that even a JIT can do on Python is limited because all variables are boxed (each time the variable is accessed the type of the variable needs to be checked because it could change) and then function dispatches must be chosen based on the type of the variable. Without some mechanism to strictly type variables, the number of optimizations will always be limited.
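A toy illustration of the point (nothing CPython-specific, just showing that the same bytecode can see different types at runtime, so every operation needs a type check or a guard):

```python
def join_all(items, start):
    total = start
    for item in items:
        # The '+' here dispatches on the runtime types of total and item;
        # nothing stops them from differing between calls (or iterations).
        total = total + item
    return total

print(join_all([1, 2, 3], 0))       # int arithmetic
print(join_all([1.5, 2.5], 0.0))    # float arithmetic, same bytecode
print(join_all(["a", "b"], ""))     # string concatenation, same bytecode
```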
Couldn’t you say the same for e.g. JavaScript? The variables aren’t typed there either and prototypes are mutable. I could definitely see things being harder with Python which has a lot of tricky metaprogramming available that other interpreted languages don’t but I don’t think it’s as simple as a lack of explicit types.
Per the spec all JS values are boxed too (aside from values in TypedArrays). The implementations managed to work their way around that too for the most part.
javascript is insanely more optimized but has the same limitations as Python, so there is likely a lot more you can do despite the flexibility, like figuring out in hot code which dynamic features are not used and optimizing around that.
Don't worry. Python already has syntactical constructs with mandatory type annotations. I will not be surprised if, a few years from now, those type annotations become mandatory in other contexts as well.
> The initial benchmarks show something of a 2-9% performance improvement. You might be disappointed by this number, especially since this blog post has been talking about assembly and machine code and nothing is faster than that right?
Indeed, reading the blog post built up much higher expectations.
Just running machine code itself does not make a program magically faster. It's all about the amount of work the machine code is doing.
For example, if the JIT compiler realizes the program is adding two integers it could potentially replace the code with two MOVs and a single ADD. However, what about the error handling in the case of an overflow? Python switches to its internal BigInt representation in this case and cannot rely on architecture specific instructions alone once the result gets too large to fit into a register.
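To see why the fast path can't just be a bare ADD, here's the behaviour the JIT has to preserve (illustrative snippet):

```python
import sys

a = sys.maxsize          # largest value that fits a signed 64-bit word
b = a + 1                # silently becomes an arbitrary-precision int
print(b)                 # 9223372036854775808
print(b.bit_length())    # 64 -- it no longer fits a signed 64-bit register
```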
Modern programming languages are all about trading performance for convenience and that is what makes them slow — not because they are running an interpreter and not compiling to machine code.
> At the moment, the JIT is only used if the function contains the JUMP_BACKWARD opcode which is used in the while statement but that will change in the future.
It's a bit less underwhelming if you consider that only function objects with loops are being JITed. nb: for loops in Python also use the JUMP_BACKWARD op.
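Easy to check with dis (CPython 3.11+ emits JUMP_BACKWARD for both while and for loops; older versions used JUMP_ABSOLUTE, and opcode details can change between releases):

```python
import dis

def count_to(n):
    i = 0
    while i < n:
        i += 1
    return i

# True on CPython 3.11+: the loop's back-edge is a JUMP_BACKWARD.
print(any(ins.opname == "JUMP_BACKWARD"
          for ins in dis.get_instructions(count_to)))
```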
PyPy was never able to get fast enough to replace CPython despite not being constrained by a compatible C API. CPython is trying to move fast without breaking the C API, and a 2-9% improvement is in fact very encouraging for that and other reasons (see my other comment).
'Twas the night before Christmas, when all through the code
Not a core dev was merging, not even Guido;
The CI was spun on the PRs with care
In hopes that green check-markings soon would be there;
...
...
...
--enable-experimental-jit, then made it,
And away the JIT flew as their "+1"s okay'ed it.
But they heard it exclaim, as it traced out of sight,
"Happy JIT-mas to all, and to all a good night!"
As great as such improvements are, isn't it concerning for the future of Python's freedom that the president of the PSF is a Microsoft employee and a lot of important core devs are nowadays MS employees? Doesn't this give MS a great lever in the future to steer Python wherever they want it to go?
Could you please stop posting flamewar comments, in particular about Python? You've done it repeatedly in this thread, as well as on many other occasions (e.g. https://news.ycombinator.com/item?id=38010109), and we've already had to ask you more than once to stop:
This is not about Python, it's about not having tedious squabbles (or worse) on HN. We'd make the same request of any user on any topic. Sometimes a commenter is fixated on a particular topic, can't let go, and posts way too many low-quality and/or hostile comments about it. That's definitely not an ok use of HN, and we eventually have to ban such accounts.
Well, as to the site guidelines... I'm sorry to say this, but they are either very easy to interpret in many different ways, or very frequently violated. I'm not complaining though. Any rule-based system to govern societies big or small will either be too rigid, or too prone to individual (mis-)interpretation.
As for tedious squabbles, you seem to be the only one commenting on this post. You also don't seem to be interested in the subject. So, I don't see much potential for the squabble.
----
Anyhow. Since you welcomed a meta-response... here's another story. This time not a folk tale.
Before Adobe bought Macromedia, Macromedia was in the process of revamping its tooling and workflows around Flash. They created an open-source framework that included a compiler -- something that allowed a much wider audience to start writing in ActionScript. They also significantly upgraded the language. The transition from AS2 to AS3 was in some ways more difficult than the one from Python 2.X to Python 3.X. Similar to how the Python transition created a lot of zealotry, everyone in the world of Flash who wanted to post their opinion on the Web was cursing the old and praising the new.
At the time I was a moderator on a user board dedicated to Flash and ActionScript. Just like everyone around me, I was hyped up about the advent of the new version of the language. Well, not everyone, though. There was one member of the forum who found himself in opposition to everyone around him. He believed that the transition to AS3 was without merit, for show, a waste of time.
Every time he'd appear in any forum thread, the discussion would inevitably shift towards AS2 vs AS3. At the time I thought he was ridiculously wrong... but the actual problem wasn't him being wrong. It was the quality of the counter-arguments. Whenever someone disagreed with him, the counter-argument was laughably bad. So the guy, let's call him "the AS2 guy", would spend time deconstructing the counter-argument, showing how it's just a bad argument... but it was too much to read, and by the time there was a reply, there'd be a queue of more counter-arguments of similarly laughably low quality.
Not surprisingly, the AS2 guy would get upset, threads would grow very long, and eventually mods would start deleting posts and... temporarily ban the AS2 guy. That is, until we banned him permanently. I kind of felt bad for the guy, but thought it was for the greater good...
As you can imagine, I believe I was wrong to ban him (well, the decision wasn't entirely mine, but that's going into too much detail). But this isn't the only thing I came to regret while following the development of Web forums over the last few decades. Below are some unfortunate revelations I've had at different times in connection to public discourse:
* Opinions of more knowledgeable people are more polarized.
* Most people in the programming trade, compared to other knowledge workers, have very little knowledge of their own trade.
* Programmers have many naive beliefs about other areas of knowledge, believing ourselves to be experts in those fields for no good reason. This is particularly relevant when talking about programmers designing rules for online communities.
As a result, programmers are very prone to creating self-reinforcing information bubbles: designing and manipulating the rules of the system to secure a win for the opinion they subscribe to, instead of refining their opinion to account for counter-arguments. The systems thus built are made less and less tolerant of differences of opinion. Every superficially inclusive initiative, such as the currently popular "codes of conduct", becomes an instrument for fighting dissent.
So, to tie the story back to the argument I'm trying to make: yes, of course I think Python is an awful technology. Yes, I never pretended that the people behind it are doing a good job. It's awful and it only gets worse. I also wrote plenty about the "why" side of things. I've never come across anything close to a convincing argument to the contrary, not here nor in any other public forum that discusses Python. But there's often a lot of laughably bad arguments.
If you choose to see my opinion as a violation of the forum rules -- that's up to you of course. But banning me for this will not make Python better, nor will it have any impact on my opinion, nor on my willingness to share it. It will, however, make some people feel better about themselves for a while. There's of course a possibility of remorse, but it doesn't come to everyone, and definitely not in a timely way.