PyPy is a fantastic achievement and deserves far more support than it gets. Microsoft’s “Faster CPython” team tried to make Python 5x faster but only achieved ~1.5x in four years - meanwhile PyPy has been running at over 5x faster for well over a decade.
On the other hand, I always got the impression that the main goal of PyPy is to be a research project (on meta-tracing, STM etc) rather than a replacement for CPython in production.
Maybe that, plus the core Python team’s indifference towards non-CPython implementations, is why it doesn’t get the recognition it deserves.
Third-party libraries like SciPy, scikit-learn, pandas, tensorflow and pytorch have been critical to Python’s success. Since CPython is written in C and exposes a nice C API, those libraries can leverage it to quickly move from (slow) Python to (fast) C/C++, hitting an optimum between speed of development and speed of runtime.
PyPy’s alternative, CFFI, was not attractive enough for the big players to adopt. And HPy, another alternative that would have played better with Cython and friends, came too late in the game; by that time PyPy development had lost momentum.
Yes. The C API those libraries use is a good fit for CPython, a bad fit for PyPy. Hence CFFI and HPy. Actually, many of the lessons from HPy are making their way into CPython, since its JIT and speedups face the same problems as PyPy. See https://github.com/py-ni
Sorry, can you explain the connection between PyPy and CFFI a bit more (CFFI generates compiled extension modules to wrap an existing C library)? I have never used PyPy, but I use CFFI all the time (to wrap C libraries unrelated to Python so that I can use them from Python).
CFFI is fast on PyPy. The JIT still cannot peer into the compiled C/C++ code, but it can generate efficient interface code since there is a dedicated _cffi_backend module built into PyPy. Originally that was the motivation for the PyPy developers to create CFFI.
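Concretely, a minimal ABI-mode sketch (ffi.dlopen(None) means "the C library itself" and is POSIX-only; API mode compiles a real extension module instead):

```python
from cffi import FFI

ffi = FFI()
ffi.cdef("size_t strlen(const char *s);")  # declare just the signature we need
libc = ffi.dlopen(None)                    # POSIX-only: load the C library itself

print(libc.strlen(b"hello"))               # -> 5; on PyPy this call is JIT-friendly
```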
Thank you for the background info, and sorry for me explaining CFFI (I just wanted to be sure we were talking about the same thing). Being ignorant about PyPy, I honestly had no idea until now that there was a personnel or purpose overlap between CFFI and PyPy. I am very grateful for CFFI (though I only use it in API mode).
The Faster CPython project would’ve got further if Microsoft hadn’t let the entire team go when they made large numbers of their programming language teams redundant last year. All in the name of “AI”. Microsoft basically gave up on core computer science to go chase the hype wave.
You’re right, of course: even Guido seems to have been moved off working on CPython and onto some tangentially-related AI technology.
However, Faster CPython was supposed to be a 4-year project, delivering a 1.5x speedup each year. AFAIK they had the full 4 years at Microsoft, and only achieved what they originally planned to do in 1 year.
To be fair, they suffered a bit from scope creep: mid-project, a second major effort was started to remove the GIL, so the codebase was undergoing two major surgeries at the same time. Hard to believe they could stick to the original schedule under those conditions. Also, GIL removal decreases sequential-execution performance; I imagine some gains from Faster CPython were/will be spent compensating for this hit on GIL-less single-thread performance.
> PyPy is a fantastic achievement and deserves far more support than it gets
PyPy is a toy for getting great numbers in benchmarks and demos, is incompatible in a zillion critical ways, and is basically useless for large-scale development for anything that has to interoperate with "real" Python.
Literally everyone who's ever tried it has the experience that you mock up a trial for your performance code, drop your jaw in amazement, and then run your whole app and it fails. Until there's a serious attempt at real 100% compatibility, none of this is going to change.
Also, none of the deltas are well-documented. My personal journey with PyPy hit a wall when I realized that its GC is lazy instead of greedy. So a loop that relies on the interpreter to free stuff up (e.g. file descriptors needing to be closed) rapidly runs into resource exhaustion in PyPy. This is huge, easy to trip over, extremely hard to audit, and... it's like it's hidden lore or something. No one tells you this, when it needs to be at the top of their front page before you start the port.
"Ask HN: Is anyone using PyPy for real work?" from 2023 contradicts you about PyPy being a toy. The replies are noticeably biased towards batch jobs (data analysis, ETL, CI), where GC and any other issues affecting long-running processes are less likely to bite, but a few replies talk about sped-up servers as well.
Timely management of external resources is what the `with` statement has been for since 2006, when it was added in Python 2.5. To debug these problems Python has ResourceWarning.
Additionally, CPython's GC is also only eager in a best-effort kind of way. If cycles are involved it can take a long time to release memory. This will become even more the case in future versions of CPython, in the free-threaded variants.
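To illustrate both points (the function and its inputs are invented; ResourceWarning and the `with` behavior are real):

```python
import warnings
warnings.simplefilter("error", ResourceWarning)  # surface leaked files during tests

def total_bytes(paths):
    total = 0
    for p in paths:
        # Relying on `open(p).read()` leaves the close() to some future GC run
        # (PyPy, or CPython when cycles are involved). `with` closes it right here.
        with open(p, "rb") as f:
            total += len(f.read())
    return total
```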
Sorry, the `with` statement answer is non-responsive. The question isn't whether you "can" write PyPy-friendly code. Obviously you can.
The question isn't even whether or not you "should" write PyPy-friendly code, it's whether YOU DID, or your predecessors did. And the answer is "No, they didn't". I mean, duh, as it were.
PyPy isn't compatible. In this way and a thousand tiny others. It's not really "Python" in a measurable and important way. And projects that are making new decisions for what to pick as an implementation language for the evolution of their Python code have, let's be blunt, much better options than PyPy anyway.
Strongly disagree. If you're relying on Python garbage collection to free file descriptors in a loop, you have a subtle bug that will rear its head in unexpected and painful ways (and by some unwritten law of software, most notably either at 3 AM or when you have an important demo scheduled). This is true whether you're running in CPython or PyPy. It's not hard to avoid - use `with` or `try...finally`. It's not some newfangled language feature. It's not a surprise - it's well known that you can't write RAII-style code in Python. It's a sign of someone with a poor grasp of the language they're using. If you find things like this, you should fix them, even if you never intend to use PyPy.
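For resources that don't come wrapped in a context manager, `try...finally` gives the same guarantee (a sketch; the filename is made up):

```python
import os

fd = os.open("data.bin", os.O_RDONLY)  # a raw descriptor, no context manager
try:
    header = os.read(fd, 16)
finally:
    os.close(fd)  # runs deterministically on CPython and PyPy alike
```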
> If you're relying on Python garbage collection to free file descriptors in a loop
Again, that's a prescription for how to write Python code for future execution. It's emphatically not a statement about the behavior expected by Python code already in production, which tends to rely on this behavior (along with many other such warts and subtleties) implicitly.
And the fact that PyPy doesn't feel the need to clone it (and all the others) explains why PyPy basically doesn't work for existing Python code.
I mean, me being an idiot python developer in your eyes does nothing to make the ancient code I received run. It just makes you feel smarter. That's a bad trade.
PyPy needs to be compatible before anyone is going to use it. And it isn't. And so people didn't. And so now it's basically dying as no one wants to work on a project no one uses.
Fundamentally, CPUs use 0-based addresses. That's unavoidable.
We can't choose to switch to 1-based indexing - either we use 0-based everywhere, or a mixture of 0-based and 1-based. Given the prevalence of off-by-one errors, I think the most important thing is to be consistent.
The reason many languages prefer `length` to `count`, I think, is that the former is clearly a noun and the latter could be a verb. `length` feels like a simple property of a container whereas `count` could be an algorithm.
`countof` removes the verb possibility - but that means that a preference for `countof` over `lengthof` isn't necessarily a preference for `count` over `length`.
I tend to use numFoos (short for “number of foos”), and only use fooCount when the variable is used for actual counting (like an errorCount variable that is incremented for each error).
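For example (try_once is a hypothetical helper, just to show the two roles):

```python
num_retries = 3          # a fixed quantity: "number of retries"
error_count = 0          # a running counter, incremented as we go
for attempt in range(num_retries):
    if not try_once():   # hypothetical helper
        error_count += 1
```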
Countof is strange, because one doesn’t talk about the “count of something” in English, other than uses like “on the count of three” (or the “count of Monte Cristo” ;)).
Yeah, you could argue that choosing C is just choosing a particular subset of C++.
The main difference from choosing a different subset, e.g. “Google C++” (i.e. writing C++ according to the Google style guide), is that the compiler enforces that you stick to the subset.
When I developed D, a major priority was string handling. I was inspired by Basic, which had very straightforward, natural strings. The goal was to be as good as Basic strings.
And it wasn't hard to achieve. The idea was to use length-delimited strings rather than 0-terminated ones. This meant that slices of strings are themselves strings - a superpower. No more did one have to constantly allocate memory for a slice, and then keep track of that memory.
Length-delimited strings also greatly sped up string manipulation. One no longer had to scan a string to find its length. This is a big deal for memory caching.
Static strings are length delimited too, but also have a 0 at the end, which makes it easy to pass string literals to C functions like printf. And, of course, you can append a 0 to a string anytime.
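A rough Python analogy of that zero-copy slicing, for readers who don't know D (memoryview is the closest stand-in; this is a sketch of the idea, not D's implementation):

```python
data = bytearray(b"The quick brown fox")
view = memoryview(data)

word = view[4:9]                 # a (pointer, length) pair into the same buffer
assert bytes(word) == b"quick"   # taking the slice allocated no new buffer
data[4:9] = b"QUICK"             # writes to the buffer show through the slice
assert bytes(word) == b"QUICK"
```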
I agree on the former two (std::string and smart pointers) because they can't be nicely implemented without some help from the language itself.
The latter two (hash maps and vectors), though, are just compound data types that can be built on top of standard C. All it would need is to agree on a new common library, more modern than the one designed in the 70s.
I think a vec is important for the same reason a string is… because you get a proper length, and standardized ways to push/pop that don’t require manual bounds checking and calls to realloc.
Hash maps are mostly only important because everyone ought to standardize on a way of hashing keys.
But I suppose they can both be “bring your own”… to me it’s more that these types are so fundamental and so “table stakes” that having one base implementation of them guaranteed by the language’s standard lib is important.
You can surely create a std::string-like type in C, call it "newstring", and write functions that accept and return newstrings, and re-implement the whole standard library to work with newstrings, from printf() onwards. But you'll never have the comfort of newstring literals. The nice syntax with quotes is tied to zero-terminated strings. Of course you can litter your code with preprocessor macros, but it's inelegant and brittle.
Because C wants to run on bare metal, an allocating type like C++ std::string (or Rust's String) isn't affordable for what you mean here.
I think you want the string slice reference type, what C++ calls std::string_view and Rust calls &str. This type is just two facts about some text: where it is in memory and how long it is (or equivalently where it ends; storing the length is in practice often slightly faster on real machines, so if you're making a new one, do that).
In C++ this is maybe non-obvious because it took until 2017 for C++ to get this type - WG21 are crazy - but this is the type you actually want as a fundamental, not an allocating type like std::string.
Alternatively, if you're not yet ready to accept that all text should use UTF-8 encoding - and maybe C isn't ready for that yet - you don't want this type, you just want byte slice references: Rust's &[u8] or C++'s std::span<char>.
Automatic memory accounting: construct/copy/destruct. You can't abstract these in C. You always have to call i_copied_the_string(&string) after copying the string, and you always have to call the_string_is_out_of_scope_now(&string) just before it goes out of scope.
For many string operations such as appending, inserting, overwriting, etc., the memory management can be made automatic in C as well, and I think this is the main advantage. Just automatic free at scope end does not work (without extensions).
You can make strings (or bignums or matrices) more convenient than the C default but you can never make them as convenient as ints, while in C++ you can.
Yes, but I do not think this is a good thing. A programming language has to fulfill many requirements, and convenience for the programmer is not the most important one.
The C++ std::string is both very complicated mechanically and underspecified, which is why Raymond Chen's article about std::string has to explain three different types (one for each of the three popular C++ stdlib implementations) and still got some details wrong, resulting in a cycle of corrections.
So that wouldn't really fit C very well and I'd suggest that Rust's String, which is essentially just Vec<u8> plus a promise that this is a UTF-8 encoded string, is closer.
It is when compared with C89; also, ISO C++ requires inclusion of the ISO C standard library.
The differences are the usual ones that occur with guest languages - in this case the origin being UNIX and C at Bell Labs. Eventually each platform goes its own merry way, and compatibility slowly falls apart with newer versions.
With regard to C89, the main differences are struct and union naming rules, () meaning void instead of anything goes, ?: precedence rules, and reduced implicit-cast scenarios (e.g. from void pointers).
Is Lily intended to be (or could it be used as) a statically-typed alternative to Lua?
Personally I'm happy with dynamic typing for scripting - but I suspect many people would welcome a statically-typed option, and there don't seem to be many available.
The Luau author is always on the official Lua mailing list, and Luau has twice as many GitHub stars, so it seems likely to win the long-term popularity contest.
Note that some of those can't run on a regular Lua runtime.
Luau is a separate implementation of a Lua dialect. However, it's backed by Roblox and is increasingly being used in high-budget games such as Alan Wake 2, and in tools like Rive.
And Terra is more of a low-level language embedded in regular Lua for metaprogramming than a statically-typed Lua.
In this vein there's also Pallene, which integrates better with regular Lua on a slightly-patched Lua runtime.
Also it looks like[1] Luau is the official Roblox Studio scripting language, and is based on Lua 5.1 (possibly LuaJIT?), which means it's behind mainstream Lua.
Not sure which Lua versions the others are based on.
That’s enough for INDENT, but for DEDENT you also need a stack of previous indentation levels. That’s how, when the amount of indentation decreases, you know how many DEDENTs to emit.
The requirement for a stack means that Python’s lexical grammar is not regular.
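A toy sketch of that stack discipline (simplified; a real tokenizer also handles tabs, blank lines, comments, and line continuations):

```python
def indent_tokens(lines):
    stack = [0]  # indentation widths of the enclosing blocks
    for line in lines:
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            stack.append(width)
            yield "INDENT"
        while width < stack[-1]:  # emit one DEDENT per level popped
            stack.pop()
            yield "DEDENT"
        if width != stack[-1]:
            raise IndentationError("unindent does not match any outer level")
        yield ("LINE", line.strip())
```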