Faster CPython (2021) [pdf] (github.com/faster-cpython)
247 points by Radim 3 months ago | 228 comments



Shameless plug: Python 3.11 comes out this October and is slated to be 10-20% faster, but Pyston is already available and is 30% faster. We do the things that are listed in this pdf, but also additional things that will probably never make it into mainline Python (such as adding a JIT)

https://github.com/pyston/pyston


I just want to toss out that Pyston actually gives a solid 2x speedup on my Python project. You've done amazing work. Thank you!


Why won't JIT compilation make it to Python? Are the reasons technical or political?


The current CPython maintainers believe that a JIT would be far too much complexity to maintain, and would drive away new contributors. I don't agree with their analysis; I would argue the bigger thing driving new contributors away is the maintainers actively driving them away. People show up all the time with basic patches for things other language runtimes have been doing for 20+ years and get roasted alive for daring to suggest something as arcane as method dispatch caches. The horror!

This cements the CPython team as sort of a known "don't waste your time there" in PL research communities, distancing themselves from the people most likely and willing to help maintain their runtime.


But a JIT is already planned for a future release; it's just that there's a ton of other improvements they can make to speed things up first. See: https://github.com/markshannon/faster-cpython/blob/master/pl... where the plan is detailed.

This is by a core Python dev who is currently employed by Microsoft to enact this plan.


It's possible things are changing. I just remember things like this, which was extremely frustrating at the time: https://lwn.net/Articles/754163 . Guido got visibly upset over this talk.

In the last roughly 10 years, Google, Instagram, Dropbox, Facebook, and reddit all invested a lot of time into trying to make Python fast and were pushed away with harsh no's. There are dozens of forks and branches with basic optimizations that were rejected.

I wish everyone the best of luck. But there's a side of me that's disappointed since this somewhat shows that it's only when these ideas come from inside the core maintainer team that they get acceptance. That's going to make it difficult to grow the team in the future.


GvR also got visibly upset at the Nuitka author:

https://news.ycombinator.com/item?id=8772124

It is amazing that someone who has been unfriendly (sometimes in a nasty political manner) to so many people still has a cult following him and is ostensibly a CoC proponent, now that the CoC is used to protect the inner circle.

If I were an expert assigned to this project at Microsoft, I'd try to get out of it as soon as possible. CPython is too political.


> In the last roughly 10 years, Google, Instagram, Dropbox, Facebook, and reddit all invested a lot of time into trying to make Python fast and was pushed away with harsh no's. There are dozens of forks and branches with basic optimizations that were rejected.

Is that true?

The talk you link was about a very secretive fork of Python that Instagram was very adamant about not open sourcing until very recently (in fact it's not clear that what's open sourced is their full project).

Other than that until recently all I've ever seen is either breaking the C-API or no clear commitment from the maintainers that this was a sustained effort to support and not just a drive-by complicated patch.

In more recent times we have cinder, pyjion, pyston, and others. And CPython Devs are up-streaming either their work or their general approach.


Yes, it is true. Many years ago someone showed up on python-dev with a very ambitious patch for improved string handling and especially slicing. He had some impressive benchmark numbers to back up his claims. The patch was rejected because it was large and would have made the string.c file double in size.

What goes in CPython depends on whether Guido and his seconds like it or not. And if it is their ideas and their implementations, it is treated differently than if it is from outsiders. Then sometimes they suddenly change their minds and something that previously was never going to happen gets implemented. JITting, which apparently is now on their road map, is one such example. Works for them and Python continues to be a great language, but frustrating for contributors.


Do you have the link? I would be fascinated to read. The way you describe it sounds like it's part of the "drive-by complicated patch" category though.

I think we've seen with recent languages that strings are basically one of the hardest things to "get right". Just look at Swift's and Rust's multiple attempts to implement strings, even though the authors all had many years of experience working with string implementations in other mature languages.

Python doesn't have the luxury of messing up strings; it was the number one reason that migrating from Python 2 to 3 took a decade. Unlike lower-level compiled languages, it can't stick in multiple string attempts and tell the users to pick one. So the core devs are on the hook for maintaining any implementation for the lifetime of Python.

But maybe this example really was a concerted effort by someone willing to maintain Python strings for as long as needed.


Not to revive the Unicode wars again, but a big portion of Python 3's disaster transition to Unicode wasn't caused by "strings being surprisingly hard", but rather by the Python 3 team having chosen an unworkable text model which multiple people (including me, way back in 2005) all said wouldn't work in practice. Our hard-earned experience was ignored, and it took a long time for future Python versions to eat crow and roll back their mistakes.

A large part of why Python 3.6 is a better port target than 3.0 is because they slowly and silently stepped away from their very strong opinions about "what is text", and "what is bytes", and what each of those things meant. Even back when the model was debated, it was clear that the operations the core maintainers cared about (manipulation of individual codepoints [0]) were a dead end, as the Unicode Standards committee was just about realizing that codepoints weren't an ideal unit, and grapheme clusters were added to the spec. (my dates might be a bit off here, it's all a blur I've mostly forgotten)

Python 3.0 was released as basically a completely nonfunctional piece of software. You basically couldn't write anything that touched a binary file in it. The built-in zipfile module, I remember, crashed if you tried to store or load a binary file inside a .zip; the only tests at the time stored .txt files, and the bug wasn't fixed for a long time. I remember others having trouble with the email and mime modules, though I didn't have any code that worked with those, personally.

I had left the community by that point, but I have heard tales of the mountains Armin Ronacher had to move in order to add the u'' prefix back to Python 3; the core team was against it, instead believing that if 2to3 wasn't working for people, it should be fixed. Alice Bevan–McGregor spent years making a version of the WSGI specification that worked with Python 3's text model. We honestly could have shaved 5-6 years off of that disaster if they had just listened to the community; pretty much every opinion we had shared eventually came true, and the Python 3 text model these days is in a very different, and much healthier, place.

[0] One of their major strongholds was that they really needed O(1) indexing of codepoints (it wasn't clear why this was a priority, and I remember arguing that wanting this is a sign of poorly written code). I think in Python 3.8 they finally caved on this, with a PEP and an implementation that can use UTF-16 or UTF-8 internally.


I'd super appreciate if you have sources (mail archive links or whatever), I'd love to read through Python history stuff like this and the PEP you were referring to? (read through the release notes and couldn't find anything, maybe it relates to PEP 538 in Python 3.7?)

It's pretty clear that there were a lot of mistakes in the Python 2 to 3 transition, I'm certainly not trying to defend every choices of the developers.


The PEP I was thinking of was actually PEP-393 [0], which was done for Python 3.3 (well, I did say I mostly left the community, and that my dates were probably wrong). It seems it didn't include UTF-8 as a representation; only Latin-1, UCS-2, and UCS-4, and it seems like it only picks a representation if it can maintain the O(1) indexing property. So I was wrong on that one, I guess that's the one thing they haven't given up on :/

As for the rest, I'll see if I can dig up some logs, but a lot of the discussion I was involved in took place on IRC, and my logs were lost a few server wipes ago. You might find some old posts of mine on python-ideas.

I've been planning on making a fuller blog post on why I think Python's text model is the wrong one, but Manish has a very good post [1] which covers most of the reason. Basically, Unicode code points don't net you anything over bytes other than O(1) indexing, which is completely useless, and in fact is in many ways a worse representation. There is no operation where having a list of Unicode code points helps you more than having a list of bytes; well, none that isn't completely broken in practice.

More drastically, the team wanting "Unicode everywhere" meant forcing things that clearly weren't Unicode into Unicode. File paths are not guaranteed to be valid Unicode on any popular system. Windows uses its "UTF-16" which is really more UCS-2 and allows unpaired surrogates, and POSIX says node names are bytes but cannot use ASCII NUL or /. These limitations were hit days into the prototyping process on real-world use cases, which is why there's an algorithm to shove arbitrary binary data into unpaired surrogate characters (see PEP-383 [2]), to be used on file paths. This should have blown a huge hole in the whole scheme, but they pushed onwards. The utopian future of "Unicode everywhere" was more important than dealing with the practical reality of existing systems.
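
A minimal sketch of what that surrogate-escape mechanism looks like in practice (assuming a UTF-8 locale):

    import os

    raw = b"caf\xe9.txt"              # not valid UTF-8, but a perfectly legal POSIX filename
    name = os.fsdecode(raw)           # 'caf\udce9.txt' -- the bad byte becomes a lone surrogate
    assert os.fsencode(name) == raw   # surrogateescape turns it back into the original bytes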

Similar things happened with console output, where it's now assumed that LANG will be set correctly. But no, when I ssh into a server in Japan that's set to use EUC-JP, Python will output EUC-JP-encoded bytes, which get shovelled over ssh as bytes, which the terminal emulator on my laptop misinterprets as UTF-8. Yes, I know ssh is supposed to tunnel certain LANG envvars as well. No, I don't remember why it didn't in this case. But a lot of people were hitting this, so in Python 3.7 they finally added UTF-8 mode (PEP-540 [3]). There's still a lot of people for whom this general "Unicode console" idea breaks [4] [5].

Pretty much every part of the standard library was riddled with bugs when they rolled this out (as mentioned, I hit bugs in zipfile, others found bugs in email/mime). For a long time, the only way to know whether a file-like object you'd been handed was going to yield bytes or not was isinstance(file.read(0), bytes), which is generally ugly, so a lot of modules didn't support bytes even when they should have.

Practically everything after has been walking back this text model in favor of something where bytes and str are not even that different anymore. Python 3.5 brought with it PEP-461 [6], which added % formatting back to bytes (after a very long and very tiring discussion with the core maintainers [7]).
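
For reference, here's the sort of thing PEP-461 restored:

    header = b"Content-Length: %d\r\n" % 42   # bytes %-formatting: removed in 3.0, back in 3.5
    assert header == b"Content-Length: 42\r\n"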

I could go on and find more PEPs, but I'm done for now.

[0] https://www.python.org/dev/peps/pep-0393/

[1] https://manishearth.github.io/blog/2017/01/14/stop-ascribing...

[2] https://www.python.org/dev/peps/pep-0383/

[3] https://www.python.org/dev/peps/pep-0540/

[4] On POSIX systems, https://stackoverflow.com/questions/11741574/how-to-print-ut...

[5] On Windows, https://stackoverflow.com/questions/17918746/print-unicode-s...

[6] https://www.python.org/dev/peps/pep-0461/

[7] https://bugs.python.org/issue3982


> the operations the core maintainers cared about (manipulation of individual codepoints [0]) was a dead-end, as the Unicode Standards committee was just about realizing that codepoints weren't an ideal unit, and grapheme clusters were added to the spec.

Can you please expand on this? I thought the problems with Python's new text model were due to backwards incompatibiltiy.


The Python developers' original plan for the new text model of Python 3 was an expansion of the text model introduced partway through Python 2's life. Python 2 has two types: str and unicode. str is a sequence of bytes, and unicode is a sequence of Unicode code points. An unfortunate model, but an understandable one. The difficult part was that since unicode was introduced into Python 2 some time after 2.0, they needed backwards compatibility with existing code, and thus decided that any time an operation wanted unicode and received str, the runtime would automatically decode from str to unicode, and vice versa. This implicit decode/encode became a large hassle, so the aim of Python 3 was to remove it.
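
A tiny Python 2 illustration of why that implicit coercion was a hassle:

    # Python 2: mixing str and unicode silently decodes the str using the ASCII codec
    u"caf\xe9" + "s"      # works: 's' is implicitly decoded
    u"caf\xe9" + "\xe9"   # UnicodeDecodeError: 'ascii' codec can't decode byte 0xe9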

Unfortunately, while the developers removed the implicit encode/decode in Python 3, they also made the gulf between str and unicode (now called bytes and str) much larger, removing large swaths of useful functionality from bytes, and doubling down on the Unicode code points nature of the new unicode type, despite it being very apparent by then that this was not a good text model to base a language on anymore.

Backwards compatibility was a large issue indeed, but IMO they broke backwards compatibility all to introduce a subpar, stricter text model that delivered on far fewer promises than they were hoping, and in my opinion is worse than Python 2's text model. A more reasonable approach can be found in other languages, which allow iterating over byte strings by decoding on the fly. Swift, Rust and Go all get this more correct.


> as the Unicode Standards committee was just about realizing that codepoints weren't an ideal unit

Where 'just about' means 1996 at the latest, the release of Unicode 2.0 (cf chapter 5 [1], p. 21, Character Boundaries).

[1] http://www.unicode.org/versions/Unicode2.0.0/ch05.pdf


Exactly this. I still remember all of these pain points. It's one of the main reasons why my team moved to Go.


> Rusts multiple attempts to implement strings

What are you referring to? Rust's str type is the only one I know about [1], and it's really good as a general purpose string representation. Were there other string implementations from before the 1.0 release?

[1] There's also String, which is just the heap-allocated version of str, and OsStr/OsString which represents the operating system's string type for file system paths.


I can believe it. The PSF and its hangers-on have historically fulminated against improving CPython performance at the cost of complexity in the implementation. They've caused flamewars on HN in the past, though Guido's recent about-face seems to have settled the field.


Honestly this is why I don't use Python anymore. Someone comes along with real improvements and GvR gets upset about the tone. The core devs just seem entirely full of themselves and even the most standard interpreter improvements are just too wild for them.

I'd love to see the core team develop some humility, realize their engineering is barely even mediocre, and accept patches from well-meaning people who know what they're doing.


Perhaps it took Microsoft funding for them to finally cave in and do it, and on their own they would never have gone for it (judging from their stance all those previous years).


So glad somebody finally said this. I've noticed this dynamic for years and it really is sad to watch. I remember reading the community review of a large patchset someone had contributed a couple of months ago, and the amount of negativity was remarkable.


I had one conversation with Guido at Dropbox where I wanted to get a dependency updated so that I could write quickcheck tests.

It was like pulling teeth - he didn't know what property testing was ("some haskell thing?" - his words) and so he wouldn't update it ("are you implying that how Dropbox writes tests today isn't good enough?" - his words), so I couldn't write generated tests.

Sort of ironically, property testing was used elsewhere in the company to great success, and in fact we were all told the scary story of the one hour where empty passwords let anyone log into any account - a bug that a property test is very well suited to catch.

I walked away pretty embarrassed for him and decided I wasn't interested in further interaction.


Isn't it really a "haskell thing", though? Generated testing of properties seems a bit overkill, but I might be mistaken about how that actually works.


quickcheck was initially a Haskell project, but it's a little absurd to reduce property testing down like that.

I also wouldn't call it overkill. It's trivial to do generated tests.


Would you mind telling me exactly which patchset you are referring to?


Not OP but probably this one: https://news.ycombinator.com/item?id=28896367.


The winds are probably changing with the CPython maintainers. I expect in the next few versions the efforts of Pyjion and Faster CPython will add a JIT API so you can plug your own JIT into CPython.


Python has always been adversarial to the PL communities, as seen in Guido's decision to limit the usefulness of the lambda function. He just didn't like the concept of long anonymous functions. I guess it is an inevitable characteristic of the typical "non-academic" languages, because I've also encountered a similar atmosphere in PHP and JavaScript.


> He just didn't like the concept of long anonymous functions.

And rightfully so. If a function is long enough to need multiple lines, it's also long enough to be named.
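
(For context, Python's lambda is limited to a single expression, so anything longer has to become a def:)

    double = lambda x: x * 2    # fine: a single expression
    # multiple statements force a named def:
    def double_and_log(x):
        result = x * 2
        print(result)
        return result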

Complaints about this seem to come from the functional language circles, where chaining 10 .filter()'s, .map()'s, etc. together in a single line is considered not only normal, but desirable and idiomatic. Why bother naming the intermediate results, right?

The ability to stuff a million operations on a single line that, for whatever reason, seems so coveted by many is quite clearly one of the least important aspects of language design. Vertical space is cheap and plentiful. Human brain capacity is tiny. Therefore, optimize for readability not clever one-liners.

This is the big benefit of a BDFL: having a single person with good taste who is able to veto bad ideas. Guido, for the most part, has good taste, so Python's ended up quite well designed.


I don’t think it’s a strong argument coming from a language that will happily allow some gnarly nested list comprehensions
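
For example, something like this is accepted without complaint (toy data):

    matrix = [[1, 2], [3, 4]]
    pairs = [(x, y) for row in matrix for x in row for y in row if x != y]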


> Complaints about this seem to come from the functional language circles, where chaining 10 .filter()'s, .map()'s, etc. together in a single line is considered not only normal, but desirable and idiomatic.

To go full circle, pandas, an extremely popular python library, forces you to do exactly that, because the straightforward imperative loop is slow as a glacier.
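
A rough sketch of the difference, with hypothetical toy data:

    import pandas as pd

    df = pd.DataFrame({"x": range(1_000_000)})
    slow = sum(v * 2 for v in df["x"])   # Python-level loop, interpreted per element
    fast = int((df["x"] * 2).sum())      # vectorized: the loop runs in C
    assert slow == fast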


> This is the big benefit of a BDFL, having single a person with good taste be able to veto bad ideas. Guido, for the most part, has good taste so Python's ended up quite well designed.

Good taste would be having proper lambdas. Python is a very inconsistent language and among the last things I am reminded of when I think "good taste".


> If a function is long enough to need multiple lines, it's also long enough to be named.

As a user of other languages, but only occasionally of Python, this type of reasoning (arbitrary limits because we think if you exceed them you’re probably structuring your code wrong) is totally alien to me. Why should the language have opinions about how I should or shouldn’t structure my code?

Imagine if Rust or C++ had an arbitrary rule that you couldn’t have more than 10 functions in a class/impl; maybe it’d usually be correct style but people would still think the language making it a hard restriction was ridiculous.


"Why should the language have opinions about how I should or shouldn’t structure my code?"

Isn't that what attracts so many people to Python in the first place - that it does have strong opinions, and often enforces them on how people should structure their code?

Python, in many places, optimizes for readability, rather than flexibility, and there is a community of users that really appreciate that. Not to say the feeling is universal (clearly isn't) - but it's at least an answer to the "why?"


> Isn't that what attracts so many people to Python in the first place - that it does have strong opinions, and often enforces them on how people should structure their code.

I seriously doubt it. What attracts people to Python is that:

1. It has a big standard library and until the last ~10 years installing 3rd party packages was beyond the capabilities of your average CS student.

2. It is easy to write relative to C++. If you're in data science, those are your two options, so you choose Python.

Python is an extremely loose language - it's not restrictive at all in ways that matter. Is "you can't use two lines (unless you use a \)" restrictive when your language lets you arbitrarily replace functions at runtime?
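
For instance, nothing stops you from swapping out a stdlib function at runtime:

    import math

    original = math.sqrt
    math.sqrt = lambda x: 42     # every subsequent caller of math.sqrt now gets 42
    print(math.sqrt(9))          # -> 42
    math.sqrt = original         # put it back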

It's not at all a readable language imo. It is very much write-optimized. It has tons of code golfing and one liners.


> It's not at all a readable language imo.

Compared to what? Python's closest peers are arguably the members of the P*-family of programming languages (Perl, Python, PHP, with Ruby as honorary member), and among that group, Python traditionally skews towards the readability end of the spectrum.


Compared to languages without inheritance and with types, primarily. Dynamic typing + inheritance + a lot of one-liners leads to really hard-to-read code.


Eh. Complex Ruby can be much more readable than complex modern Python. PHP and Perl, sure, usually.


> It's not at all a readable language imo. It is very much write-optimized. It has tons of code golfing and one liners.

Agreed. This might have applied back in the "Zen of Python" days of 1999, when the competition was Perl, but Python has slowly morphed into a more and more Perl-like language.


> it does have strong opinions, and often enforces them on how people should structure their code

Like what? Where else does Python arbitrarily limit how many tokens may be part of a particular construct, or do anything else remotely similar?


What attracts people to Python is the fact people have already solved most problems and they can just import the stuff they need.


> If a function is long enough to need multiple lines, it's also long enough to be named.

I once believed that, but I now think it’s a terrible mistake. Python ends up encouraging classes instead but classes are a much worse abstraction than an anonymous function.


Language design is a different field and specialty than implementation. Whether lambdas can be multi-line has no bearing on why the compiler and VM runtime specialists I know tell others to stay away. Maybe the language designers feel similarly about the direction of Python, I don't know, but that's a separate community I have much less insight into.


High-speed JITs are complicated and the CPython code base is meant to be kept simple. If you want JITted Python use PyPy (or JAX which is a JIT for tensor computation).


Guido seems quite anti-JIT.


Okay, so what's the catch? I see no Windows support and no ABI compatibility, but other than that it's just... a drop in replacement and I get a 30% speedup? I'm aware that "too good to be true" is only a heuristic, but it generally holds; am I missing some great caveat?


No ABI compat is a big one. But if you're writing pure Python, go for it.


Honestly, even that seems trivial? By my reading of https://github.com/pyston/pyston#installing-packages , the only impact is that when you install (compiled) libraries they need to be recompiled, just like if you use Alpine (which is also ABI-incompatible because it uses musl libc), which is a little bit of pain at build/packaging time but doesn't actually break anything (i.e. there are no libraries that you can't use, just libraries with an extra compile step) and doesn't affect runtime behavior at all.

EDIT: I should caveat "doesn't actually break anything" with "although it does require you to install build dependencies because you have to compile the package from source instead of using a prebuilt wheel". I don't think this changes the substance of my comment but best to be precise.


Do you know when you might have Windows support? One of the main strengths of Python is its cross-platform support, but for Pyston I only seem to see Linux support.


I respect your work here but CPython already has a roadmap and people paid to work on getting the experimental JIT into python in the next few releases.


While I think it is always good to see progress in the performance of interpreters, ultimately it is a mistake if you have something that needs to be fast and you implemented it entirely in an interpreted language (beyond just prototyping).

You have to be prepared to factor out the key parts of the code into faster languages like C++ with bindings, if necessary. Or you can virtually do this, e.g. learn more about the standard library, make sure you are choosing things that actually are implemented natively underneath instead of in pure Python, etc.
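
A small sketch of the difference that makes, using numpy:

    import numpy as np

    xs = np.arange(10_000_000, dtype=np.int64)
    fast = int(xs.sum())              # the loop runs in optimized native code
    slow = sum(int(x) for x in xs)    # same result, but every iteration goes through the interpreter
    assert fast == slow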

And sometimes, you find that the cause of a slowdown is far more fundamental (e.g. the entire algorithm is not good, and you see huge gains by redoing it, even if everything is still interpreted code).


>... ultimately it is a mistake if you have something that needs to be fast and you implemented it entirely in an interpreted language (beyond just prototyping).

Yes. However, this is exactly where Python (imho) shines. You get a convenient interpreted language which seamlessly interacts with highly optimized native libraries like numpy, scipy, xarray,... and of course ML frameworks (where the impact of base language speed is negligible compared to the time it takes to train a model).


I’ve been using Python for 15 years, and while sometimes it’s a good fit for “write the fast parts in C!” type workloads, these seem to be very rare. Most of the time you spend more time marshaling back and forth between Python data structures and C data structures and you end up losing more performance (and maintainability and build system complexity and so on). If you have an application that might ever have a hot path, I heartily do not recommend python.


It probably depends on the type of apps one is writing. I am not advocating writing your own libs (though that is completely feasible too) but just using the existing ones. When one needs to convert from/to Python structures within the hot path, this of course doesn't perform well... However in most cases I worked on this was not needed - convert input to numpy / xarray /..., perform all calculations, get the result. So with about the same amount of experience, I heartily recommend python. :)


The problem is that you can't predict from the outset that all of your performance problems will be a good fit for Numpy or whatever, and if they aren't the options you have are very expensive in terms of engineering time, maintainability, etc. And it's not just performance--dependency management is still a significant problem, as is deployment in many cases.

Honestly, I switched to Go. It's like a fast, statically typed, statically compiled Python. And by "fast", I mean 100-1000x faster than CPython. The package manager resolves dependencies in an instant (compared with pipenv that would take 30 minutes just to update a lockfile for a small project). Everything compiles to a static binary so you can build internal tools without requiring end users to set up a virtualenv/etc.

I've rewritten real-world Python programs that would be distributed as a 250MB zip file or a multi-GB Docker image and converted them into Go programs which distribute as a 2.5MB binary or 2.5MB Docker image. Docker image builds went from 45 minutes (after weeks of painstakingly optimizing Dockerfiles) to ~2 minutes with a straightforward Dockerfile. The ramifications of a fast iteration loop are also hard to overstate--this was a major source of pain that virtually disappeared when the application was ported to Go. On AWS ECS/Fargate and GKE, container cold start times went from 30-50 minutes to ~30s (not sure why pulling from ECR/GCR and then starting the container took so long on these platforms--I certainly wouldn't expect a few GB to take half an hour to pull over a datacenter network). The performance also improved tremendously--in one case we traded away a lot of maintainability to get a Python/Pandas system to complete requests within 60s, and a naive Go rewrite would complete the same requests in 2 seconds (a CPU-intense, parallel-friendly workload). We also looked at Dask, Polars, Spark, etc. With Python, we ran into significant Docker for Mac performance problems (the Docker VM would use all available CPU to marshal filesystem events to/from the VM, grinding the app to a halt and chewing through your battery); with Go there's no source code filesystem to mount in the first place--you just rebuild on change (either the whole image, or just rebuild the binary and `docker cp` it onto the running container).

Of course, I'm sure there are other language environments that also have benefits over Python. I've just found Go to be the best I've tried by a pretty wide margin. Frankly, Python is just littered with pitfalls--you just sort of find yourself painted into these corners unexpectedly and you waste a bunch of time trying out all of these tools, frameworks, and libraries that purport to solve your problem but which introduce their own novel failure modes for nontrivial applications (e.g., Pipenv promises to solve dependency management but every lockfile operation takes half an hour, Cython promises to improve performance but it adds considerable complexity to your build tooling, Mypy promises to fix typing but it can never find the type annotations for dependencies, Pypy promises to improve performance but a ton of important ecosystem packages are unsupported, etc). Python development just feels like wandering from one tarpit to another, while in Go I just get things done.


I like Go and the benefits are real (though a GB Docker image has nothing to do with Python), it is however very verbose. I use both (for different use cases) and coming back from Go to Python is such a relief... As with everything, this is a tradeoff too. I don't feel the pain of python packages though, for me pipenv solves this nicely. To each their own I guess... :)


> ultimately it is a mistake if you have something that needs to be fast and you implemented it entirely in an interpreted language (beyond just prototyping).

Agreed. Equally the faster the interpreted language, the less you need to offload, because baseline speed is good enough.

Faster is rarely a bad thing.


> You have to be prepared to factor out the key parts of the code into faster languages like C++ with bindings, if necessary

Stripe had the innovative idea of compiling parts of the code ahead of time (taking advantage of type annotations). I think this is a very interesting approach that may make the use of a faster language unnecessary in a certain number of cases.


Dropbox has some projects around this. I think the idea is totally silly tbh and it flies in the face of Python's type annotations being just that - annotations.

The generated code is pretty hilarious since you end up with HashMaps and dynamic dispatch everywhere.

Much better gains can be made by simply not using Python.


I learned that lesson with Tcl back in my first startup experience.

Sure, the language is great and it is very easy to write extensions in C; however, as the performance pressure keeps increasing, in the end you have a C application where Tcl has been reduced to a configuration/orchestration language.

Since then, if it doesn't have a JIT or AOT compiler in the box I am not interested, unless I am obliged to use it by higher-ups.


I've found that depending on that "key part", you can actually offload the task to another process (written in C/C++/Rust) via RPC. The end result was a significant performance improvement.


Using something like PyO3 for Rust/Python integration really helps with this, and you don't need the conceptual overhead of RPC.


Interesting, I find that the conceptual overhead of PyO3 (or ctypes) is often higher than making a local RPC.


I work on a system like that and it just means you have to deal with a load of complicated FFI stuff and you end up with a C++ program that happens to have some slow parts awkwardly implemented in buggy Python.


I find that subprocesses are primarily valuable for stability and security, e.g. dealing with something that might crash (killing only the subprocess) or something that requires different privileges than the main process. And yet, the communication to (and possibly from) the child incurs a cost that cannot be ignored, more than what a simple wrapper to another language would have.


In my experience, the important part is sometimes that _faster_ means _cheaper_. In many cases, 20% faster code means you need almost 20% fewer CPUs, which really adds up at scale.


When you've done all that, you're still left with the performance wins that could come from the interpreter itself.

I've worked on a system like that where there was just nothing left to optimize (anymore). It was just lots of code, and there was no hot path to optimize.


Or you could just use a good compiled language for everything.


I read another commenter here mention that Python is 50x slower than JavaScript. Now, browser JavaScript VMs underwent enormous speedup efforts over the last 12-15 years (Google Chrome starting the race). Is there anything inherent to Python's design as a language (or any other technical reason) that prevents such a speedy VM being created for it as well?


Yes, definitely. Python is much, much more dynamic than JavaScript -- the jump in dynamism from JavaScript to Python is about equivalent to the jump from Java to JavaScript, and similarly requires a different set of techniques.

The single example I typically give is that in Python you can inspect stack frames after they've exited! This has concrete implications on your VM design.
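
A small demonstration of that (CPython behaviour):

    import inspect

    def f():
        x = "still here"
        return inspect.currentframe()   # hand our own frame object back to the caller

    frame = f()                         # f() has already returned...
    print(frame.f_locals["x"])          # ...but its locals remain inspectable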

Another example is that the multiple dispatch protocol for binary operations (such as addition) is sufficiently complicated that you generally cannot encode it into your compiler. We have been able to have a simplified version of that by building a tracing JIT for C, which is able to understand the dispatch protocol, but it is not always applicable.
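
A quick sketch of why that dispatch is hard to bake into a compiler:

    class A:
        def __add__(self, other):
            return "A.__add__"

    class B(A):
        def __radd__(self, other):
            return "B.__radd__"

    # Because type(right) subclasses type(left) and overrides the reflected
    # method, Python tries B.__radd__ before A.__add__:
    print(A() + B())   # -> "B.__radd__"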

Source: I've been working on Python optimization for several years as part of the Pyston project.


>The single example I typically give is that in Python you can inspect stack frames after they've exited! This has concrete implications on your VM design.

How about an implementation of Python that doesn't let you do fancy things like that, which aren't needed in production?


The inspecting of stack frames is used by the logging library, and by most unit testing libraries. There are dozens of these small features which are challenging, and if you remove all of the challenging ones, you end up with something where pretty much no existing Python code can run.

People have tried; PyPy started with that goal before it became clear it wasn't practical.


I think logging inspects frames while they're still on the stack, not post-fact..?


Do you happen to know anything about the GraalPython implementation? Based on other comments here, the often-used C FFI can also hinder the work of Python implementations, and the Graal project solves it by JIT-compiling C code as well (from LLVM IR).

From what I've seen the last time I heard about GraalPython, it is seldom faster than the PyPy approach (AST-based vs tracing JIT), even though in the case of TruffleRuby and TruffleJS the same approach is insanely fast relative to the maturity of those projects.

Does your example of stack frame inspection make Python map much worse to Java primitives (which is what happens behind the scenes in the Truffle languages)?


>> in Python you can inspect stack frames after they've exited!

How does that work and what is it used for?


Is there any reason that Python's dynamicness (dynamicity?) is so much more of a hindrance than CL's? Is it just that CL is compiled?


The end of this post explains why Common Lisp can run much faster than Python.

https://markmail.org/message/dp56i26zhpf4fehb#query:+page:1+...


Not more dynamic than Smalltalk or SELF, though.


Please share the well understood criteria for "dynamic" in this context, that would let us order different programming languages as more dynamic or less dynamic.


Search the web for Ruby or Smalltalk or Objective-C “object model” or “message dispatch”


Please share your criteria for "dynamic", that would let us order different programming languages as more dynamic or less dynamic.


Or Dylan or Julia.


Unlike JavaScript however, Python is strongly typed.


Yeah but in practice basically all JS code in the wild is strongly typed since the community has long since decided that relying on implicit type coercion is a massive footgun.

Also I think Python’s truthy/falsey semantics make it dangerously close to being weakly typed.
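
For example, a whole family of unrelated values all collapse to False in a boolean context:

    for value in (None, 0, 0.0, "", b"", [], {}, set()):
        assert not value    # `if value:` can't tell absence, zero and emptiness apart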


All that means is that you get more exceptions raised when you use ==. It doesn't help performance at all, it doesn't even really confer anything about types, only that __eq__ is implemented between fewer types.


Python exceptions are low-cost. It has to be that way because every for loop in Python throws an exception.
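
Namely StopIteration, which is what the for statement catches to terminate:

    it = iter([1, 2])
    next(it)    # 1
    next(it)    # 2
    next(it)    # raises StopIteration -- exactly what a for loop catches to stop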


Python's type system is not more strongly typed than JavaScript's in any way that makes it easier to optimise.


I thought this presentation by Armin Ronacher (the creator of Flask and Jinja) was very enlightening:

How Python was Shaped by Leaky Internals (2016):

https://www.youtube.com/watch?v=qCGofLIzX6g

See also this HN post (Attempts to make Python fast):

https://news.ycombinator.com/item?id=24848318


From a language design perspective I don't think there's much in the way of making up this ground. The hard part is not breaking any existing modules written in C/C++/Rust/etc that depend on the CPython ABI (memory layout and calling conventions of the runtime's implementation). There are JIT compilers for Python like PyPy which are very fast (more than the 5x they're attempting to gain here), but they often break things like numpy that have a lot of native modules.


Sounds pretty straightforward to get great speedups where no modules are used, and just make module calls slow (ie. you reconstruct the data structures the module will need on any entry to the module).

Then release a new 'fast' module API, and all the performance critical modules will soon move over.


It might sound easy but that would be a critical component of such a rewrite. Python's standard library is implemented using the same API so much of that would need to be rewritten. You then have the risk of introducing bugs, etc. This post is doing something similar. They're rewriting the internal implementation through progressive refactoring and will then look into more long term incompatible changes that can be wrapped in a translation layer. These projects are difficult as a lot of python code exists in the world and a minor change in behavior can have large unintended consequences due to Hyrum's law.


The big difference is that lots of python libraries use C under the hood, and a lot of internals of the language are leaked via the C api. It's a lot harder to do fancy things with a JIT when there are more people observing and depending on the internals.


Since everyone knows Python is slow, no care is taken to make things faster in frameworks etc., since one assumes the users are fine with the speed. If "they wanted it to be fast, they would use something else, right"?

So I stipulate that if the runtime becomes faster, those gains won't be visible to large swathes of code because python libraries do lots of weird things under the hood.


Yes. At the end of the following Usenet post, the developer of a Python implementation compares the language features of Common Lisp to those of Python in order to explain why Lisp code can be compiled to run much much faster than Python. The same kind of comparison can be done for JavaScript. The problem is a collection of Python features, each of which saps performance.

https://markmail.org/message/dp56i26zhpf4fehb#query:+page:1+...


Python is just too dynamic, and users do make use of it. There are JavaScript constructs that JITs give up upon, such as “with” - and this has steered JS programmers to avoid those constructs.

Unfortunately, there is no Python subset people can stick to as such. But you can use e.g. TransCrypt to compile Python to JavaScript; the docs tell you about an efficient subset. It should be possible to build an efficient Python JIT for such a subset (and you sort of have one already through TransCrypt and JS)


Smalltalk can remap all instances of a into b across the complete image, no matter where they are being used or which classes they actually are instances of:

    a become: b
It is a myth that only Python enjoys such dynamic capabilities; that happens to be a convenient excuse.


It's a lot more than an excuse; the simplest "a.b" has to check a lot of places before it gives an answer, and each of these needs its own inline cache. Indeed, there have been experimental compilers for Python that gave much improved speeds for subsets that drop the ultra-dynamic constructs (and usually a lot more of the dynamism and features) to achieve that speed.
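
A rough sketch of how many places even a trivial attribute lookup can involve:

    class C:
        b = "class attribute"
        def __getattr__(self, name):
            return "fallback"

    c = C()
    print(c.b)                               # -> "class attribute"
    c.__dict__["b"] = "instance attribute"   # shadow it at runtime
    print(c.b)                               # -> "instance attribute"
    # The lookup has to consider data descriptors on type(c)'s MRO, the
    # instance __dict__, class attributes, then __getattr__ -- and any of
    # them can change at any moment.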

And the fact that there are languages as dynamic as Python is not as important as how it is implemented through the language; Lua is about as dynamic as Python, but with effects that are much more local in general. IIRC Mike Pall (of LuaJIT fame) said that Lua was almost optimal for JITting, JS is sort-of-ok if you ignore some things, but Python is beyond hope -- and I tend to think he knows what he's talking about, given that his hobby, one-man-show LuaJIT, handily beat all the well-funded, tens-of-employees JS engines for a very long time (maybe still does).

I wonder how good the Smalltalk JITs really are by today's standards, in the presence of e.g. "become:". The fact that it's there, and supported, doesn't mean it's in common use, or supported efficiently -- just like JavaScript's "with".


In a sense it is, because in those communities the approach has been "it is hard but we will solve it sooner or later", whereas in the Python world PyPy seems to be the only project that has embraced this culture; everyone else takes the line "it is hard, Python cannot be made fast because it is dynamic".


That's not been my impression. On the contrary - there have been very many unsuccessful attempts (Unladen Swallow, Pyston, Nuitka, Cython-for-plain-Python, a few more I can't recall right now); they all tried to maintain compatibility (complete language and binary API), and all delivered very meager speedups.

Those that focused on a subset (numba, shedskin, rpython) were actually able to deliver significant speedups. Even PyPy dropped binary compatibility.

The odd one out is Psyco, the precursor to PyPy. Perfect compatibility with a non-trivial speedup - but it was considered a dead end by its authors, and that's why they started PyPy.


I wouldn't have thought `become:` was an example that distinguished a dynamic programming language from others.

Where have you seen it used as an example in that way?


It doesn't distinguish per se; it is rather a good example of a single call a dynamic language can make that invalidates all the assumptions a JIT might have gathered up to that point.

As for become: usefulness, Gilad Bracha explains it better than I ever will: https://gbracha.blogspot.com/2009/07/miracle-of-become.html


He does not address JIT efficiency here. A likely implementation is just "let's drop all assumptions related to the type of A". It would work, but frequent use would kill performance.

Python has all the issues that “become” has, with hundreds of “assumptions might have changed” points along common execution paths.

LuaJIT, for example, hoists everything it can outside of loops rather quickly, because it knows the cases where "become" or the Python equivalent can happen, and that they are rare and uncommon in idiomatic Lua code.


Introspection is a gigantic issue.


But there are other dynamic languages with crazy introspection that are among the fastest dynamic languages there are. Common Lisp is one example.


Could you elaborate on that?


There are a few different classes of introspection. The first is commonly used builtins like “isinstance”, “callable”, “getattr/hasattr” etc.

The second is the inspect module that builds on that, allowing you to get the stack frames, inspect function signatures etc.

The third is tracebacks, which need the stack and are themselves introspectable.

The fourth is metaclasses (making dynamic classes at runtime).

These are heavily used in Python, and are optimisation barriers. For example PyPy (Python with a JIT) disables optimisations when “inspect” is imported or used anywhere in the call stack.
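
A small sketch of the second class, walking the live call stack with the inspect module:

    import inspect

    def inner():
        return [f.function for f in inspect.stack()]   # names of every frame on the stack

    def outer():
        return inner()

    print(outer())   # ['inner', 'outer', '<module>', ...]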


How many of these "levels" (not to confuse them with actual classes) exist in JavaScript? Only the first one?

(that said, I guess the fourth one (metaclasses) isn't a clear yes/no, since OO is very different in js)


Hard to compare exactly with that classification, but JavaScript only has the first one, and arguably parts of the fourth one.

Metaclasses aren't really a major optimization hazard; they effectively boil down to hijacking the "class" syntax to instead run some custom code. They're used by things like dataclasses and the Django ORM to use class syntax to make a little DSL.
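
A minimal sketch of that "custom code at class-definition time" idea (with a hypothetical Registry metaclass):

    class Registry(type):
        classes = {}
        def __new__(mcls, name, bases, namespace):
            cls = super().__new__(mcls, name, bases, namespace)
            Registry.classes[name] = cls   # arbitrary code runs when the class is defined
            return cls

    class Model(metaclass=Registry):
        pass

    print(Registry.classes)   # {'Model': <class 'Model'>}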


dataclasses do not use metaclasses (I’m the dataclasses author).


Apologies, I must have been confusing it with some other Python syntax tricks I've seen. I don't write too much Python anymore..


Smalltalk JITs do just fine with introspection, and certainly Python does not have some magic feature that Smalltalk lacks.


Despite being a totally different language based around message passing with no control structures?

None of those are insurmountable; they merely make writing a JIT harder if you want to support them: it's hard to inline a function call if the inlining can have a visible effect.


That is the whole point, it doesn't get more dynamic than that.


I’d say the lack of control flow structures and message passing would make it far, far easier to JIT.


If that was the case, there would be no need for StrongTalk.


Can you really not think of any other benefits?


I certainly can, and I can also think of several examples of dynamic languages that succeed where Python fails.

All of them just as dynamic, if not more.


If you can think of other benefits to Strongtalk other than “it makes jit compiling better” then why say that there is no reason for it to exist if Smalltalk’s design makes it easy to JIT?


Python is not even able to do the basic Smalltalk part, so it doesn't matter what StrongTalk brings into the picture.


Ok now we are on the same page about there being a need for Strongtalk other than “it may or may not JIT better” we will loop back a few comments and try again:

I’d say the lack of control flow structures and message passing would make it far, far easier to JIT.


CPython's specific implementation is a "feature" that many things rely on, but even without that, Python's semantics are very dynamic and it's hard to unbox anything in a way that is guaranteed not to be observable.



The GIL just prevents thread-level parallelism - which JavaScript also doesn’t have.


It depends on the runtime environment, no? For example, Node supports worker threads

https://nodejs.org/api/worker_threads.html


Worker threads are share-nothing, closer to processes than actual threads.


Worker threads aren't share-nothing, you can share data through SharedArrayBuffer which lets you share arbitrary binary data.

If they were share nothing, we probably wouldn't have shipped them.

Source: I'm Node.js core and one of the people who worked on them


You can also share arbitrary binary data between processes by using shared memory (e.g. Python has support for this in multiprocessing.shared_memory module).
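
A short sketch of that module (Python 3.8+):

    from multiprocessing import shared_memory

    shm = shared_memory.SharedMemory(create=True, size=8)
    shm.buf[:5] = b"hello"
    # another process (or the same one) can attach to the block by name:
    other = shared_memory.SharedMemory(name=shm.name)
    print(bytes(other.buf[:5]))   # b'hello'
    other.close()
    shm.close()
    shm.unlink()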


Sorry, I realized after posting that I should have said "share-nothing by default". There are ways to share memory with a worker, but it's explicit and opt-in via a dedicated API. It's not like in Python where you can arbitrarily change global state from any thread.



I personally think Python is nice to work with and I really like Django as monolithic web framework with no dependencies- a rare thing these days. But when thinking about long term projects I’m unable to reconcile things I like with the fact that Python is among the slowest modern languages. Why would I start with an unnecessary handicap?


Can't speak for Python/Django but I'm in the Rails world and the fact is on most Rails apps (even huge ones) the Ruby code isn't the bottleneck. IO and database operations are. You don't need a fast language for the things Python (and Ruby, PHP, Perl, etc..) does.


This is a widely-held belief, but it isn't true. Maybe once upon a time in the land of spinning rust that was true, but it certainly isn't now. Databases are fast enough that a scripting language frequently is the performance bottleneck.

https://techspot.zzzeek.org/2015/02/15/asynchronous-python-a...


> scripting language frequently is the performance bottleneck

I'm not sure I'd use the word "frequently" here, but I could just have a different experience. Most projects are boring CRUD stuff, and something else have always been the bottleneck before the actual language itself, most of the times different types of I/O but also just poor choice of algorithms/data structures. I don't think I've ever actually came across a situation where the language itself was the bottleneck of the performance, but I don't normally work super close to performance-sensitive projects as I said, mostly common web tasks.


Choosing a language is a significant factor towards the performance of your services.

A service built in Go can handle requests up to 30x faster than one built in Python.

https://www.techempower.com/benchmarks/#section=data-r20&hw=...

I know benchmarking isn't always the greatest, especially when the previous maintainer of Actix-Web (Rust) kind of cheated to get to the top of the leaderboard, but it still provides a decent general overview of how many responses/s a service can handle when using the same box.


Even if a scripting language is the slowest, Ruby programmers say you need to include dev time in the equation, which Ruby and Python optimize for.


Well, you would say that wouldn't you :-)


They get to say that because they are already done with their implementation and hanging out on HN for the rest of the afternoon.


But why are they working Sunday? ;-)


They spent the rest of the week waiting for the test suite to run.


True, but depending on the service I personally would like efficient languages, not only dev time wise, but also runtime wise.

Especially with regards to the environment.

You may laugh all you want, the ever expanding internet should become way more resource friendly.


> True, but depending on the service I personally would like efficient languages, not only dev time wise, but also runtime wise.

I don't think this really exists today partly because of the comment you replied to.

Rails, Django and Laravel are massively optimized for developer productivity to build web applications, and use languages that have been around long enough that the community has created widely successful and popular libraries implementing most common features that aren't included by default. Libraries that go beyond one developer maintaining them in their free time.

If you can build a web app and host it for $40 a month on a single VPS that can sustain tens of thousands of users, and you can make a million dollars a year as a solo developer, what incentive is there to switch to a faster language? Your secret weapon here is being able to build things quickly and not having to single-handedly develop a bunch of common lower-level libraries just to build the app features you want.

For when you need to scale out we have Kubernetes which generally works the same to scale out most types of web apps. It seriously (honestly) doesn't matter if your hosting bill is $10,000 / month instead of $3,000 / month when you have a team of 8 developers ($240,000 / month) on payroll and your web app is happily returning p99 responses in under 150ms while your business profits tens of millions of dollars a year. What does matter is how fast you can build new features while maintaining your app and finding qualified developers.


> Especially with regards to the environment.

> while your business profits tens of millions of dollars a year

These two points from both of your comments don't really reconcile (today).

Sure, if you're optimizing for "profits tens of millions of dollars a year" then go ahead and spend/receive as much money as you can and disregard all else, like the environment, employees health and so on.

But, if we want to have sustainable companies, in terms of profit and impact on the earth together with it's inhabitants, then we need to go further than just considering how we can add that next million to our ARR.


I really don't think the world is going to be saved by building your crud apps in rust instead of python. Consider instead what could be done if the company that is able to grow twice as fast has a charity matching program?

Or, i don't know, imagine we tried to fix something that actually would cause a dent, like not encouraging everyone to buy new smartphones and laptops every year, not allowing international shipping to externalize the costs of burning "lowest imaginable quality" diesel, etc. This comment feels like saying "Turn off your LED lights when you go to bed!" while your neighbors are literally burning tires in their backyards.


> I really don't think the world is going to be saved by building your crud apps in rust instead of python.

If that is all we are doing to "save the environment", then surely not.

On the other hand, if we are heading towards a wall at 200 km/h and only brake lightly (not pushing the brake really hard), then we will still crash, maybe at 100 km/h, which is still deadly.

What I want to say: No, it (being more resource-friendly) is not the one solution, but it's part of a greater solution.


> These two points from both of your comments don't really reconcile (today).

I didn't write both comments btw, the parent comment who I replied to is someone else who mentioned the environment.

> Sure, if you're optimizing for "profits tens of millions of dollars a year" then go ahead and spend/receive as much money as you can and disregard all else, like the environment, employees health and so on.

It's possible to be environmentally friendly, have reasonable server costs for an app built with Python / Ruby / PHP, keep employees happy and make major progress towards improving Earth. The place I work at is focused on clean energy management and the environment. It's a privately owned company so it's not my place to share any numbers but I can say we don't throw money at problems as a first line of action.

For context, the company has been around for over 10+ years and uses PHP (public info based on HTTP headers). I joined a few months ago as a full time employee after spending ~20 years being a solo developer doing contract work. I hope that sets the stage for how much I like the folks working there and how they run things. Also for context, I don't actively write PHP there. My role is focused on deploying their app(s) although I have developed some Python based services for them (I did contract work with them for a while before joining full time).


Developer time needed has an environmental cost too. It's not at all clear that optimising server usage will always be worth it if it requires more developer time even from an environmental perspective.


> I don't think this really exists today partly because of the comment you replied to.

Java and JVM languages are reasonably good dev-time wise as well as performance/energy-efficiency wise. The latter is possible by not running the GC all the time, so it comes at the expense of slightly higher memory consumption, but JVM GCs are beasts; they can manage heap sizes up to terabytes of RAM.


Every time I have to wait 10 minutes for pylint to slowly finish its job, I wonder if the IO-is-the-bottleneck crowd ever run their python code.


While a request itself can finish reasonably fast in a slow language, the amount of resources a single request consumes is quite absurd, since Python can't do real multithreading. The Python codebases I've worked on need an order of magnitude more servers to handle similar amounts of load, since each request is basically handled by a separate process.


It's fairly standard practice to hack in some "green thread" behaviour using gevent or similar. It's far from perfect but gets you away from the single-process-per-request bottleneck. If it even is much of a bottleneck in the first place - isolating each request in its own process worked well enough back in the CGI days. I suspect the real reason is that autoscaling doesn't generally play well with having thousands of mostly idle processes.
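
A minimal sketch of that pattern with plain gevent (the URL and numbers are illustrative, not a recommendation):

  # Monkey-patch the stdlib so blocking I/O yields to other greenlets
  # instead of blocking the whole process.
  from gevent import monkey
  monkey.patch_all()  # must run before anything else touches socket/ssl

  import gevent
  import urllib.request

  def fetch(url):
      # Looks like ordinary blocking code, but the patched socket module
      # cooperatively yields while waiting on the network.
      with urllib.request.urlopen(url) as resp:
          return resp.status

  jobs = [gevent.spawn(fetch, "https://example.com") for _ in range(20)]
  gevent.joinall(jobs, timeout=10)
  print([job.value for job in jobs])

In a WSGI deployment the same idea usually comes down to picking a worker class, e.g. `gunicorn -k gevent -w 4 app:app`, so thousands of mostly idle connections can share a handful of processes.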


Also worth bearing in mind that Shopify is a "monolithic" Rails app. That clearly shows it can scale enormously.


Though, they ended up writing a whole new Ruby JIT to deal with Ruby's performance, so clearly at some point the trade-off changes from "no systems programming, a highly dynamic language instead" to "hardcore systems programming, to make the highly dynamic language fast enough"


Shopify got to huge scale first and then started sponsoring TruffleRuby and YJIT as research projects.


I think the takeaway here is that it was more sensible to reimplement Ruby than to use something else. Think about that for a moment.


Instagram was also initially built in monolithic Django


I mean, if you can get performance for free, why not? Doesn't mean Ruby was too slow...


>Why would I start with an unnecessary handicap?

Because "developer hapiness" and "speed of development" are also tradeoffs with other languages...

In fact, if you're late to the market because you used some speedier language, it's often as good as not launching at all...


For bigger codebases, static typing contributes to speed of development. Add dependency management to that and you have a clear win for something like Java over Python.


Speed of code and speed of development are not mutually exclusive.


They force different tradeoffs to be made in languages but also in development, so they kind of are.

Obviously, if speed of development and speed of the program were unproblematically compatible, everybody would always go for both (why lose speed if you can have the same development experience?). But speed requires several things: more careful algorithms and program design, custom solutions instead of one-size-fits-all libraries and frameworks, more care regarding memory usage, time spent on optimization, use of a statically typed language as opposed to a higher-level scripting one, slower build/JIT times (as opposed to an interpreter), and so on...


Statically typed languages can be high level. That was my main point. You can get a 100× speedup just by using Nim, Crystal, Go, … instead of Python.


Garbage collection is an example (probably the most critical current example) where there is a concrete trade-off between the two.


Garbage collectors are not bottlenecks at all; hell, for highly dynamic allocation patterns they can beat malloc implementations hands down on performance.

Of course one should not spam allocations needlessly, but that's why value classes are important.


"Comparing runtime, space consumption, and virtual memory footprints over a range of benchmarks, we show that the runtime performance of the best-performing garbage collector is competitive with explicit memory management when given enough memory. In particular, when garbage collection has five times as much memory as required, its runtime performance matches or slightly exceeds that of explicit memory management. However, garbage collection’s performance degrades substantially when it must use smaller heaps. With three times as much memory, it runs 17% slower on average, and with twice as much memory, it runs 70% slower. Garbage collection also is more susceptible to paging when physical memory is scarce. In such conditions, all of the garbage collectors we examine here suffer order-of-magnitude performance penalties relative to explicit memory management."

https://www.cs.tufts.edu/comp/150FP/archive/emery-berger/gc-...


Still, a good GCed language will be much faster than Python, especially when not under such memory constraints.


>slowest modern languages

Horizontal scaling is going to be an issue eventually, no matter the language. You can serve tens of thousands of people off a single django server. Unless you're going to hit a growth limit at specifically 3-5x what a single vertically scaled django server can do, or you know for a fact that webserver compute is going to be the bottleneck, development speed rules all else.


I’d agree, but I wonder how much faster writing a backend API in Django is compared to using node.

I’ve written a lot of python outside of the web space and a lot of JavaScript in the web space and I can’t think of a time I wish I was writing in python.

Modern JavaScript has a really expressive syntax without the runtime performance hit.


For every comment like this, there is someone who would say the opposite. Why would I write in a language with the semantics of javascript when I could use python?

I'd also argue that syntax-wise they are virtually the same language and not worth really comparing. It's not like either is ML.


Every month or so I look around to see if there's a monolithic web framework in the languages I'm interested in that is effectively an equivalent of Django, Rails, Laravel, etc. I've done a lot of non-webapp stuff with Go, for example, and would like to build full-fledged webapps with a Django-like framework in Go. But nothing like that truly exists. I've found projects like Buffalo [0] that promise this monolithic experience, but they're still very much works in progress. Even Node lacks such a true monolithic web framework from what I can tell.

[0]: https://github.com/gobuffalo/buffalo


Adonis for nodejs

https://adonisjs.com/


This is a few months old... Have there been any updates since then? It seems like some perf improvements are being added in 3.11 that are, for some cases, 2x faster than 3.10.

https://twitter.com/danielmilde/status/1484162983584575491


It's unfortunate how those results are presented. The worst result on the pybenchmark link is comparing a numpy based implementation to one that doesn't use numpy.

If the latter can't use numpy or would be slowed down by it, that's a fair comparison, but it looks like they are just at different stages of optimization.


This slide deck matches up with this interview from Talk Python with Guido and Mark Shannon.

https://www.youtube.com/watch?v=_r6bFhl6wR8


What happened to the recent work that apparently succeeded in eliminating the Global Interpreter Lock?

My work would become an order of magnitude easier if I did not have to use multiprocessing to get parallelism across shared memory.


HPy is an interesting possibility. Make the C API opaque and let extension modules migrate to it, which then allows competing runtimes to have all the libraries out of the box.


Yeah, that was my first thought. I think Sam Gross' work is being mainlined already so hopefully this is yet more improvements on top? Unclear.


Ugh yeah, I hope this eventually happens.

If some fork of Python that has no GIL becomes popular, another Python 3 situation could emerge, with the community splitting.

This "reference implementation should be simple and easy to read" stance is the dumbest part of Python, unnecessarily holding it back imo.


There are several projects in the works, any of which could land if it proves successful.


I believe that is a continuing WIP.


The speed of Python code creates its use cases. If it were to get faster, higher-volume applications could benefit, but for my low-volume applications it works plenty fast.


Shouldn't it be: "The speed of Python code constrains its use cases"?


Speed is one reason why Google is using JAX. With decorators and vmap/pmap, it JIT-compiles your Python/NumPy code for fast, vectorized execution on GPU, TPU and CPU.
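
A small illustration of what that workflow looks like (standard JAX usage, not taken from the slides):

  import jax
  import jax.numpy as jnp

  def predict(w, x):
      # Plain NumPy-style Python: a dot product plus a nonlinearity.
      return jnp.tanh(jnp.dot(w, x))

  # vmap vectorizes over the batch dimension; jit traces the result once
  # and hands the graph to XLA, which compiles it for CPU/GPU/TPU.
  batched_predict = jax.jit(jax.vmap(predict, in_axes=(None, 0)))

  w = jnp.ones(3)
  xs = jnp.arange(12.0).reshape(4, 3)
  print(batched_predict(w, xs))  # first call compiles, later calls reuse it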


And Instagram have their own CPython fork called Cinder - https://github.com/facebookincubator/cinder


these two things have nothing to do with each other. jax doesn't compile numpy, it reimplements the api using `ufunc`. in general, every single numerical kernel is always mapped to some kind of compiled code.


It does for the user who is familiar with Python and Numpy. With some effort, your Python and Numpy code becomes orders of magnitude faster. Saying that those two things have nothing to do with each other is missing the point.


>missing the point

facts

1. this is a thread about cpython. jax is as relevant to users of cpython as CUDA or OpenCL or whatever. jax cannot do absolutely anything with e.g. django.

2. for all intents and purposes all numerical code always runs in a lower-level implementation (C++, CUDA, XLA, whatever). so from that perspective, jax is just a convenient way to get from numerical python (i.e., loops and muls and adds) to kernels.


I didn't claim Jax can accelerate Django, it all depends. A lot of our Python code is/was running partly in cpython and partly in extension modules such as Numpy.

There are many ways to achieve faster Python execution. One is a faster cpython implementation, another is moving cpu intensive parts of the code to extension modules (such as Numpy). Yet another is to jit compile Python (and Numpy) code to run on accelerators.


>jit compile Python

given who you are (googling your name) i'm surprised that you would say this. jax does not jit compile python in any sense of the word `Python`. jax is a tracing mechanism for a very particular set of "programs" specified using python; i put programs in quotes because it's not like you could even use it to trace through `if __name__ == "__main__"` since it doesn't know (and doesn't care) anything about python namespaces. it's right there in the first sentence of the description:

>JAX is Autograd and XLA

autograd for tracing and building the tape (wengert list) and xla for the backend (i.e., actual kernels). there is no sense in which jax will ever play a role in something like faster hash tables or more efficient loads/stores or virtual function calls.

in fact it doesn't even jit in the conventional understanding of jit, since there is no machine code that gets generated anew based on code paths (it simply picks different kernels and such that have already been compiled). not that i fault you for this substitution since everyone in ML does this (pytorch claims to jit as well).


I agree with you that making CPython faster, or rewriting CPython entirely into Cinder, are more general-purpose ways to make Python faster, while Jax is much more specific and limited and requires you to transform your Python code, often manually.

You miss my point that all of those efforts are making slow Python code run faster. So claiming that 'these two things have nothing to do with each other' is wrong, because they share 'making Python code run faster'.

Some of that involves making cpython faster, some of it means moving execution into C (numpy is mentioned in that PDF), and some involves jit and moving execution onto GPU or TPU (for example using XLA). The common part is 'making Python code run faster'. Some of that is automatic, some requires manual effort.

Jax can jit some Python functions, but it cannot efficiently jit everything. That is what I meant by decoration and 'some effort'. For example replacing IF conditions by np.where etc. See also https://jax.readthedocs.io/en/latest/jax-101/02-jitting.html
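
A minimal sketch of what that "some effort" looks like (a toy example, not taken from the linked tutorial):

  import jax
  import jax.numpy as jnp

  @jax.jit
  def relu_bad(x):
      # A Python `if` on a traced value fails at trace time, because
      # tracing needs one static control-flow path through the function.
      if x > 0:
          return x
      return 0.0

  @jax.jit
  def relu_good(x):
      # jnp.where keeps the branch inside the traced graph, so it jit
      # compiles fine and works elementwise over whole arrays.
      return jnp.where(x > 0, x, 0.0)

  print(relu_good(jnp.array([-1.0, 2.0, 3.0])))
  # relu_good works; calling relu_bad(jnp.array([...])) raises an error.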

My background is in physics simulation, and I advise the Brax team, basically accelerating a physics engine written in Python run on accelerators, see https://github.com/google/brax The entire physics step, including collision detection and physics solver, is jit compiled.


>You miss my point

no you miss my point

>making slow Python code run faster

there is not a single org anywhere in the world that uses pure python to do numerics. kids do that during their first linear algebra or ml class. that's it.

>For example replacing IF conditions by np.where etc

i've already addressed this - this is not jit compilation.


>> uses pure python to do numerics

Many orgs use Python+Numpy, and that can be made faster using Jax

>> this is not jit compilation.

I disagree. Jax jit uses XLA, and XLA is a JIT compiler. An XLA graph is created during the runtime of the host program, and JIT-compiled to native code for the CPU, GPU or TPU.


Currently it's 50x slower than JavaScript ([https://benchmarksgame-team.pages.debian.net/benchmarksgame/...], look at n-body which is just straightforward math), and apparently they are looking to achieve 2x speedup and maybe 5x later, so not very useful.


You can't cherrypick the worst couple of results on the page and say it's "50x slower" when for most of the other benchmarks it's only 3-5x slower.

Yes, tight numerical loops are going to benefit from JIT compilation, this isn't surprising. It's also not a very typical workload for either Python or Javascript.


It is 50 times slower. That matches all my experience. If you look at the "fast" benchmarks they're all just calling into C libraries.


2x or 5x is very useful if it speeds up common web server applications. A company with 4 servers can run 2 servers with a 2x speedup.

Math-heavy benchmarks are not really representative of what Python is used for. They benefit from things like SIMD and aggressive inlining and fine-grained control over memory layout, etc. If you need that with Python, you can use something like NumPy or implement the hot path with a native language.
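
For the math-heavy case, a rough sketch of what "use something like NumPy" means in practice (the workload is made up, just to show the shape of the change):

  import numpy as np

  def pairwise_py(points):
      # Pure Python: every add/mul/sqrt is a bytecode-dispatched object op.
      total, n = 0.0, len(points)
      for i in range(n):
          for j in range(i + 1, n):
              dx = points[i][0] - points[j][0]
              dy = points[i][1] - points[j][1]
              total += (dx * dx + dy * dy) ** 0.5
      return total

  def pairwise_np(points):
      # Same computation pushed into NumPy's C loops.
      p = np.asarray(points)
      diff = p[:, None, :] - p[None, :, :]
      dist = np.sqrt((diff ** 2).sum(axis=-1))
      return np.triu(dist, k=1).sum()

  pts = np.random.rand(400, 2).tolist()
  assert np.isclose(pairwise_py(pts), pairwise_np(pts))

The vectorized version trades memory (an n-by-n intermediate) for speed; past that point is usually where people reach for C, Cython, or a JIT like Numba.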


Getting the interpreter to within 5x of a JIT is definitely better than "not very useful".


Yeah, I doubt it's ever going to catch on.


One-liner jokes are frowned upon here, but this was pretty good.


Agreed. When newer languages outperform it out of the box, this feels like throwing resources at nostalgia and the personally sacrosanct identity of one retired computer scientist.

At the end of the day code is optimized portable machine state for a given task. The syntax is arbitrary.

More cynically, Python and Ruby have hurt as many projects as they've helped. Their slowness has wasted countless cycles. Now we can quantify it rather than stay lost in the hype we felt learning this "magic" to begin with.

With respect, Guido: this feels like a vanity project for a bored retiree, soaking up human effort and real agency. Perhaps you should have simply stayed retired and learned a new hobby.


If Python gets some kind of JIT, that kind of loop will fly. Tight loops and unboxed maths are things JITs do really well.


I'd be very interested to see what JAX can do on this problem.



So tackling the GIL is not on the table, it seems.


There's already progress on that from a different angle: https://news.ycombinator.com/item?id=29005573


I used to be really bothered by Python's speed and was not enthused by Guido's insistence that it's a connector language. But now everything is a library call, written in C/C++. I don't ask questions about performance anymore.


Just tested 3.11a4 with python-speed [1]

About 14% faster than 3.10.

Still only at ~80% of Python 2.7's performance.

[1] https://github.com/vprelovac/python-speed


So this explains why he went from Dropbox to retirement to Microsoft.


Kudos to Microsoft for hiring Guido. I wish Google had kept him.


He left Google for Dropbox, where he was for several years before "retiring".


Can't wait for this to be rather successful in hitting its goals, but then do nothing and languish like everything else.

It seems absurd not to mention the API upstreamed by the Pyjion attempt, especially in the context of optimizing bytecode / reaching into machine code.

I dunno, maybe I'm just perturbed by the amount of duplicated effort on this singular topic.


This is from the latest 3.11 alpha 4 release just this week:

The Faster CPython Project is already yielding some exciting results: this version of CPython 3.11 is ~ 19% faster on the geometric mean of the PyPerformance benchmarks, compared to 3.10.0.

https://pythoninsider.blogspot.com/2022/01/python-3102-3910-...

So this is definitely not languishing as they are getting these changes directly into main for Python 3.11


This is great, but at the same time 19% is still pretty far from 50%, and there are only ~3 months left until the feature freeze. It will be interesting to see how far they get in this release.


From following their issue tracker: some ideas didn't pan out (yet), some ideas turned out to break backwards compatibility, several pyperformance tests broke because of the changes made so they had to fix things upstream (e.g. gevent and Cython), and a lot of the work seems to be setting up future optimizations. They're also getting their changes merged directly into CPython main, so I reckon there's more red tape than if they had done the work on a fork. It does seem to inspire fellow core devs to pitch in and try some of their own ideas, so I can see the effort pick up steam as they make more headway.


Interesting, where is their issue tracker?



Question for JIT experts: JS and Python are extremely hard to optimize because they both allow redefining anything at any time, yet V8 crushes Python by an order of magnitude in many benchmarks[1]:

         All times in seconds (lower is better)
  
  benchmark          Node.js     Python 3    Py takes x times longer
  ==================================================================
  regex-redux          5.06         1.34     ** Py3 is faster (PCRE C)
  pidigits             1.14         1.16     1.02
  reverse-complement   2.59         6.62     2.56
  k-nucleotide        15.84        46.31     2.92
  binary-trees         7.13        44.70     6.27
  fasta                1.91        36.90     19.3
  fannkuch-redux      11.31       341.45     30.19 (wut?)
  mandelbrot           4.04       177.35     43.9  (srsly?)
  n-body               8.42       541.34     64.3  (no numpy fortran cheat?)
  spectral-norm        1.67       112.97     67.65 (Python for Science[TM])

(If Python is allowed to call fast C code (PCRE) for regex-redux, I don't see why Python shouldn't be allowed to call fast Fortran BLAS/etc for n-body, but rules are rules, I guess. V8 doesn't cheat at spectral-norm, it's 100% legit JS.)

Both ecosystems have billions invested by corporations worth trillions; bottomless money exists to make Python faster. So why isn't Python faster?

V8's tactics include dynamically watching which loops/calls get run more than (say) 10,000 times and then speculatively generating[2] native machine instructions on the assumption the types don't change ("yep, foo(a,b) is only called with a and b both float64, generate greased x86-64 fastpath"), but gracefully falling back if the types later do change ("our greased x86-64 'foo(float64,float64)' routine will be passed a string! Fall back to slowpath! Fall back!"). Why doesn't Python do this? Is it because Google recruited the only genius unobtainium experts who could write such a thing? Google is a massive Python user, too.

[1] https://benchmarksgame-team.pages.debian.net/benchmarksgame/...

[2] https://ponyfoo.com/articles/an-introduction-to-speculative-...
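
A toy, pure-Python illustration of that speculate/guard/deoptimize control flow (purely conceptual; a real JIT emits machine code and tracks far more than argument types, and doing this in Python obviously buys no speed):

  HOT_THRESHOLD = 10_000

  def add_generic(a, b):
      return a + b          # fully dynamic slow path

  def add_float_fast(a, b):
      return a + b          # stand-in for an emitted float64 fast path

  class SpeculatingAdd:
      def __init__(self):
          self.calls = 0
          self.guard = None                     # observed (type, type)

      def __call__(self, a, b):
          self.calls += 1
          if self.guard is not None:
              if (type(a), type(b)) == self.guard:
                  return add_float_fast(a, b)   # guard holds: greased path
              self.guard = None                 # guard failed: deoptimize
          if self.calls >= HOT_THRESHOLD and type(a) is float and type(b) is float:
              self.guard = (float, float)       # hot and monomorphic: speculate
          return add_generic(a, b)              # generic slow path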

EDIT: HN commenter Jasper_ perhaps has the answer in another post[3]: "The current CPython maintainers believe that a JIT would be far too much complexity to maintain, and would drive away new contributors. I don't agree with their analysis; I would argue the bigger thing driving new contributors away is them actively doing so. People show up all the time with basic patches for things other language runtimes have been doing for 20+ years and get roasted alive for daring to suggest something as arcane as method dispatch caches. The horror!"

[3] https://news.ycombinator.com/item?id=30047289#30050248


> If Python is allowed to call fast C code (PCRE) for regex-redux, I don't see why Python shouldn't be allowed to call fast Fortran BLAS/etc for n-body…

The regex and arbitrary precision number libraries were allowed for regex-redux and pidigits because some language implementations wrapped those libraries rather than implement their own thing.


How long until some smartasses propose typing to be mandatory, because "speed"?


99.9% of all software engineering is making it correct. The remaining 0.1% is about making it "fast", i.e. either reducing latency or increasing throughput. This is why Python wins. Once it is correct it is trivial to also make it fast.


> 99.9% of all software engineering is making it correct. […] Once it is correct it is trivial to also make it fast.

That sounds amazing but doesn't square with my experience as a technical PM/PO. We might have different definitions of "fast", but generally speed-optimized code is very different from prototype code (enough so that prototype "correctness" is non-transferable), harder to create than the prototype, and often rewritten in a different language (Rust, C, assembly). For example, you're never going to write anything other than a toy video encoder in Python.


Actually, my experience from 20+ years of software engineering is that 99.99% of it is about correctness and only 0.01% is about performance. I added one order of magnitude to be on the safe side. Any non-toy video encoder takes advantage of hardware-specific features, so if you can't write it in Python you can't write it in Rust or C either.


The thing is that other languages are as correct but leaps and bounds faster.


The guy gets his name in the headline again. There has been no shortage of plans for how to speed up CPython in the past 20 years. How about delivering a working product when it is done?


His name lends a lot of credibility to anything he does with Python, and that's why this effort is notable. He is the founder and longtime BDFL of the language, and presumably knows what he is talking about when he says he can get big performance improvements.


I don't know if overseeing one of the slowest mainstream languages there is for decades gives me much hope that he knows what he's talking about when it comes to performance improvements.


Ten years ago he didn't think Python was slow: https://lwn.net/Articles/486908/


"InfoWorld: You answered critics who said that Python is too slow. You said every time you try to write something in Python, it is fast enough. Why is there criticism that it is too slow?"

"At some point, you end up with one little piece of your system, as a whole, where you end up spending all your time. If you write that just as a sort of simple-minded Python loop, at some point you will see that that is the bottleneck in your system. It is usually much more effective to take that one piece and replace that one function or module with a little bit of code you wrote in C or C++ rather than rewriting your entire system in a faster language, because for most of what you're doing, the speed of the language is irrelevant."



