- A Python runtime has to support threads and shared memory, while a JS one doesn't. JS programs are single-threaded (with workers, which communicate by message passing). So in this sense writing a fast Python interpreter is harder.
- The Python/C API heavily constrains what a Python interpreter can do. There are several orders of magnitude more programs using it than using v8's C++ API.
For example, reference counts are exposed with Py_INCREF/DECREF. That means it's much harder to use a different reclamation scheme like tracing garbage collection. There are thousands of methods in the API that expose all sorts of implementation details about CPython.
Of course PyPy doesn't support all of the API, but that's a major reason why it isn't as widely adopted as CPython.
- Python has multiple inheritance; JS doesn't
- In Python you can inherit from builtin types like list and dict (as of Python 2.2). In JS you couldn't until ES2015's class/extends syntax.
- Python's dynamic type system is richer. typeof(x) in JS gives you a string. type(x) in Python gives you a type object which you can do more with. And common programs/frameworks make use of this introspection.
- Python has generators, Python 2 coroutines (send, yield from), and Python 3 coroutines (async/await).
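A few of these points can be seen in one short, illustrative snippet (the refcount value is CPython-specific, and the class and function names here are invented for the example):

```python
import sys

# CPython's reference counts are observable from Python itself; the
# reported value is implementation-specific (the argument to
# getrefcount adds one temporary reference).
x = []
print(sys.getrefcount(x))

# type(x) returns a real type object, not just a string tag, and the
# type object supports further introspection.
t = type(x)
print(t)                 # <class 'list'>
print(t.__mro__)         # full method resolution order
print(isinstance(x, t))  # True

# Builtin types can be subclassed (since Python 2.2).
class DefaultingDict(dict):
    def __missing__(self, key):  # hook invoked for missing keys
        return 0

d = DefaultingDict()
print(d["absent"])       # 0

# Generators support two-way communication via send().
def counter():
    total = 0
    while True:
        inc = yield total
        total += inc

g = counter()
next(g)           # prime the generator; yields 0
print(g.send(5))  # 5
print(g.send(3))  # 8
```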
In summary, it's a significantly bigger language with a bigger API surface area, and that makes it hard to implement and hard to optimize. As I learn more about CPython internals, I realize what an amazing project PyPy is. They are really fighting an uphill battle.
All these specific examples are true statements, yet isn't Common Lisp even more dynamic, and doesn't it often have even better optimizing compilers?
Lisp has multiple inheritance and also multiple dispatch, and SBCL beats CPython by a country mile in every performance comparison I’ve seen.
I always find it ironic that the CMUCL Lisp compiler (upon which SBCL was based) was called 'the Python compiler', had machine-code generation in 1992, and that CMUCL sports native multithreading that is largely lock-free.
This. Lisp is at least 10x faster in the worst case, and it can be made to run even faster...
If it's the former, I would say that optimizing a small core is easier than optimizing a big language. Python's core is 200-400K lines of C and there are a lot of nontrivial corners to get right.
I was surprised when looking at Racket's implementation that it's written much like CPython. IIRC it was more than 200K lines of C code. Some of that was libraries, but it's still quite big IMO. I would have thought that Racket, as a Scheme dialect, would have a smaller core.
AFAIK Racket is not significantly faster than Python; it's probably slower in many areas. Maybe it's just that SBCL put a focus on performance from the beginning?
(I looked at Racket since I heard they are moving to Chez Scheme, which also has a focus on performance.)
Note that the old C runtime of Racket has been rewritten to use Chez Scheme. The work is not completely done, but it is getting close.
Talk by Matthew Flatt on the rewrite (almost a year old):
Yes, that's the Meta Object Protocol. It isn't in the standard, yet most Lisp implementations have it, and now you can use it in a portable way as well.
>If it's the former, I would say that optimizing a small core is easier than optimizing a big language. Python's core is 200-400K lines of C
Common Lisp's "core" (that means, not including "batteries") is considerably more involved and complex than Python's. Creating a new CL implementation is a big deal.
To wit, Chez Scheme was several generations into improving its native-code compilation abilities before Python even existed:
history of chez scheme (2006):
The user was comparing some Ackermann computations using Python and GNU Common Lisp (GCL), finding the performance about the same.
But he wasn't compiling the Lisp! So this compared GCL's raw AST interpreter for Lisp against Python's bytecode interpreter.
Like, focus on further optimizing the RPython runtime rather than starting from scratch.
RPython isn't something that's exposed to PyPy users. It's meant for writing interpreters that are then "meta-traced". It's not for writing applications.
It's also not a very well-defined language AFAIK. It used to change a lot and only existed within PyPy.
I'm pretty sure the PyPy developers said that RPython is a fairly unpleasant language to write programs in. It's meant to be meta-traceable and fast, not convenient. It's verbose, like writing C with Python syntax.
Why not use a faster language to write the interpreter, like C?
This is not meant to be a hostile question, I am just confused as to why PyPy exists
RPython is both a language (a very ill-defined subset of Python... pretty much defined as "the subset of Python accepted by the RPython compiler"), and a tool chain for building interpreters. One benefit of writing an interpreter in RPython is that, with a few hints about the interpreter loop, it can automatically generate a JIT.
The whole point of the PyPy project is to write a more "abstract" Python interpreter in Python.
VMs written in C force you to commit to a lot of implementation details, while PyPy is more abstract and flexible. There's another layer of indirection between the interpreter source and the actual interpreter/JIT compiler you run.
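The "few hints about the interpreter loop" can be sketched roughly like this. This is a hedged illustration modeled on PyPy's tutorial-style interpreters, not real PyPy source; the toy accumulator machine is invented, and the JitDriver stub makes the sketch runnable on plain CPython when the RPython toolchain isn't installed:

```python
# RPython-style interpreter loop with JIT hints. Under the RPython
# toolchain, JitDriver tells the meta-tracer where the interpreted
# program's loop is; on plain CPython we fall back to a no-op stub.
try:
    from rpython.rlib.jit import JitDriver
except ImportError:
    class JitDriver(object):  # stub so the sketch runs without RPython
        def __init__(self, greens=None, reds=None):
            pass
        def jit_merge_point(self, **kwargs):
            pass

# greens: values identifying a position in the interpreted program;
# reds: everything else live in the loop.
driver = JitDriver(greens=['pc', 'program'], reds=['acc'])

def interpret(program):
    """Tiny accumulator machine: 'i' increments, 'd' decrements."""
    pc = 0
    acc = 0
    while pc < len(program):
        driver.jit_merge_point(pc=pc, program=program, acc=acc)
        op = program[pc]
        if op == 'i':
            acc += 1
        elif op == 'd':
            acc -= 1
        pc += 1
    return acc

print(interpret('iiid'))  # 2
```

Given those hints, the RPython toolchain can generate a tracing JIT for the interpreted language automatically; decisions like GC strategy live in the toolchain, not in this source.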
See the paper "PyPy's approach to virtual machine construction".
This sentence explains it best:
"Building implementations of general programming languages, in particular highly dynamic ones, using a classic direct coding approach, is typically a long-winded effort and produces a result that is tailored to a specific platform and where architectural decisions (e.g. about GC) are spread across the code in a pervasive and invasive way."
Normal Python and PyPy users should probably pretend that RPython doesn't exist. It's an implementation detail of PyPy. (It has been used by other experimental VMs, but it's not super popular.)
I'm assuming 99% of normal python users are not using anything outside of the RPython subset, correct?
Changed type of object:
Changed superclass of type:
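The two headings above presumably introduced snippets along these lines (a hedged reconstruction; the original code isn't shown, and these class names are invented):

```python
# Changing the type of an existing object by assigning to __class__.
class Foo:
    def who(self):
        return "Foo"

class Bar:
    def who(self):
        return "Bar"

obj = Foo()
print(obj.who())     # Foo
obj.__class__ = Bar  # change the type of the object in place
print(obj.who())     # Bar

# Changing the superclass of a type by assigning to __bases__.
class Base:
    greeting = "hello from Base"

class Other:
    greeting = "hello from Other"

class Child(Base):
    pass

print(Child.greeting)      # hello from Base
Child.__bases__ = (Other,)  # swap the superclass at runtime
print(Child.greeting)      # hello from Other
```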
The resolution order is calculated when the class is defined. It's recalculated whenever you assign to __bases__ (or a superclass's __bases__), but not every time it's used, which means there's no significant performance penalty for multiple inheritance unless you change a class's bases very often.
Metaclasses can override the MRO calculation, which we can abuse to track when it's recalculated: https://pastebin.com/NdiA12Ce
! Computing MRO
Changing Baz's bases
! Computing MRO
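The output above can be reproduced with a small metaclass that overrides mro(). This is a sketch of what the linked pastebin presumably does; the class names are invented:

```python
# A metaclass whose mro() lets us observe when CPython recomputes a
# class's method resolution order.
MRO_CALLS = []

class TracingMeta(type):
    def mro(cls):
        MRO_CALLS.append(cls.__name__)
        print("! Computing MRO")
        return super().mro()

class Foo(metaclass=TracingMeta):  # computed once at class creation
    pass

class Baz(Foo):                    # computed once at class creation
    pass

class Bar(metaclass=TracingMeta):
    pass

print("Changing Baz's bases")
Baz.__bases__ = (Bar,)             # triggers exactly one recomputation

# Merely *using* the MRO (attribute lookup, method calls) does not
# recompute it, which is why lookups stay cheap.
_ = Baz().__class__
```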
I do agree that basically everything is dynamic in Python. But some things are more dynamic than others.
Though I think the general point that Python is a very large language does have a lot to do with its speed and optimizability. v8 is a really huge codebase relative to the size of the language it implements, and doing the same for Python would be a correspondingly larger amount of effort.
I don't know the details but v8 looks like it has several interpreters and compilers within it, and is around 1M lines of non-test code by my count!
v8 was written in the ES3 era, and I knew ES3 pretty well and the contemporary Python 2.5-2.7 very well. I'd guess Python back then was at minimum a 2x bigger language, possibly 4x or more.
IIUC this is also true for PHP, but HHVM has (or had) some interesting techniques to deal with it, like pairing up and cancelling out reference-count operations, and bulk-updating reference counts before taking a side exit or calling a C function.