
PyPy: A Faster Python Implementation - funspectre
https://www.pypy.org/index.html
======
pansa2
I’ve heard PyPy’s JIT described as “meta-tracing - instead of tracing your
code, it traces itself as it interprets your code”.

Is that accurate? What are the pros & cons of this approach compared to a
normal tracing JIT, e.g. LuaJIT?

~~~
andrewl-hn
About 15 years ago there was a series of papers by Andreas Gal (then future
and now former CTO of Mozilla) about precisely this aspect of JIT-compiler
design.

If you have a program and represent it as a series of VM instructions, and you
try to JIT them individually, the JIT compiler has very limited ability to
optimize. Plus, after each such instruction the execution comes back to
the piece of the interpreter that selects the next instruction to execute:
essentially a giant switch statement that makes branch prediction on the CPU
very difficult and inefficient.

The alternative is to not only JIT the instructions but also embed pieces of
the interpreter between them, so that the compiler can see how those
instructions are connected and generate code for the whole sequence of
them. This way the compiler can make better assumptions about the code and
optimize it a lot more.
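To make the contrast concrete, here is a toy stack-machine interpreter with the classic dispatch loop described above, next to the straight-line code a trace compiler effectively produces for one instruction sequence. The opcode names and the tiny program are invented for illustration, not taken from any real VM:

```python
def interpret(code, env):
    """Dispatch loop: one branch per instruction, then back to the top."""
    stack = []
    for op, arg in code:
        if op == "LOAD":        # push a variable's value
            stack.append(env[arg])
        elif op == "CONST":     # push a constant
            stack.append(arg)
        elif op == "ADD":       # pop two values, push their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":       # pop two values, push their product
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return stack.pop()

# What tracing effectively yields for the sequence
# LOAD x; CONST 2; MUL; CONST 1; ADD -- the same work with all the
# dispatch branches removed, leaving straight-line code:
def traced(env):
    return env["x"] * 2 + 1

program = [("LOAD", "x"), ("CONST", 2), ("MUL", None),
           ("CONST", 1), ("ADD", None)]
assert interpret(program, {"x": 10}) == traced({"x": 10}) == 21
```

With the dispatch branches gone, the compiler sees one basic block and can constant-fold, reorder, and keep values in registers across what used to be instruction boundaries.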

This work eventually made it into all sorts of virtual machines. Adobe used it
for Flash, Mozilla for their SpiderMonkey JavaScript engine, and Google used
this design for early versions of Android's Dalvik VM.

PyPy uses it, too. That's why they call it a meta-tracing compiler: it
compiles pieces of your code and pieces of its own interpreter together to
produce a more optimized binary.

~~~
jashmatthews
If you're interested in trace-based compilation, the origins go much
further back than Andreas Gal's work, into 1970s research on compilers for
VLIW architectures [1].

It's not the dispatch to the next instruction that's expensive in most virtual
machines. It's the sheer complexity of each instruction as it maps to the
underlying assembly instructions [2].

Just getting rid of the dispatch loop doesn't help much and in many cases the
increased pressure on the instruction cache makes performance worse.

I'm trying to help with this problem at the moment for the CRuby JIT compiler.

SpiderMonkey, LuaJIT and Dalvik record the trace directly in the bytecode
interpreter. There's no separate baseline JITed code generated with additional
trace-recording instrumentation.

What PyPy does is quite different. PyPy is a Python virtual machine written in
a language called RPython, and the toolchain provides an interpreter and trace
compiler for RPython.

It adds a whole layer of indirection compared to traditional trace compilers,
which makes some optimizations harder but makes some of the more basic parts
of trace recording and compilation easier to implement and, crucially, makes
it somewhat reusable for different languages.
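The way an RPython interpreter opts into meta-tracing can be sketched as follows. In real RPython the `JitDriver` comes from `rpython.rlib.jit`; the stub class below only mimics its interface so the example runs under plain Python, and the two-opcode language is invented for illustration:

```python
class JitDriver:
    """Plain-Python stand-in for rpython.rlib.jit.JitDriver."""
    def __init__(self, greens, reds):
        self.greens, self.reds = greens, reds
    def jit_merge_point(self, **live_vars):
        pass  # in RPython, marks where the meta-tracer may start a trace

# greens = values identifying a position in the *user* program (the pc);
# reds = everything else that is live. PyPy's tracer traces this
# interpreter loop specialised to one green value, i.e. one user-level loop.
driver = JitDriver(greens=["pc"], reds=["acc", "code"])

def run(code):
    """Interpret a tiny language: 'i' increments, 'd' decrements."""
    pc, acc = 0, 0
    while pc < len(code):
        driver.jit_merge_point(pc=pc, acc=acc, code=code)
        if code[pc] == "i":
            acc += 1
        elif code[pc] == "d":
            acc -= 1
        pc += 1
    return acc

assert run("iid") == 1
```

The interpreter author only supplies these hints; the tracing, guard insertion, and machine-code generation all come from the shared RPython machinery, which is what makes it reusable across languages.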

[1]
[https://archive.org/details/optimizationofho00fish/mode/2up](https://archive.org/details/optimizationofho00fish/mode/2up)
[2]
[https://www.sciencedirect.com/science/article/pii/S157106610...](https://www.sciencedirect.com/science/article/pii/S1571066109004617)

------
HoppyHaus
PyPy is fantastic. It rescued a project of mine where a massive amount of JSON
needed to be parsed within an absurdly short heartbeat interval. Not even some
of the fast JSON libraries could do it, but with PyPy as a drop-in
replacement, it worked, and still does, wonderfully

------
pjmlp
I love PyPy, I just wish it got more community love.

------
jgwil2
What is stopping PyPy from getting more widespread adoption in the Python
community? Surely it isn't that everyone is using the latest version of
CPython; it's been years and many are still using 2.7.

~~~
traverseda
The startup times are a fair bit worse, and the memory requirements are a fair
bit higher. Despite everything else, Python does tend to be fast enough, and
the worse startup/memory usage isn't necessarily worth it.

Also, it doesn't speed up things that rely on external C libraries, so code
using numpy/scipy/tensorflow/etc. doesn't generally run appreciably faster.
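The C-library caveat comes down to where the hot loop lives. A minimal sketch (the dot-product function is invented for illustration):

```python
def dot_pure(xs, ys):
    # A hot Python-level loop: PyPy's JIT can trace and compile this,
    # so it often runs many times faster than under CPython.
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total

# With numpy, the same work happens inside precompiled C, which CPython
# calls cheaply but which PyPy must bridge via its C-API emulation layer
# (cpyext), so there is little or nothing for the JIT to speed up:
#     import numpy as np
#     np.dot(np.array(xs), np.array(ys))

assert dot_pure([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]) == 32.0
```

So the codebases that benefit most are ones doing heavy work in Python itself, not ones that are thin drivers around C extensions.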

------
Rochus
_" The geometric average of all benchmarks is 0.23 or 4.3 times faster than
cpython"_ (see [https://speed.pypy.org/](https://speed.pypy.org/))

If we put this figure in context with the CLBG (see [https://benchmarksgame-
team.pages.debian.net/benchmarksgame/...](https://benchmarksgame-
team.pages.debian.net/benchmarksgame/which-programs-are-fastest.html)) we're
somewhere between Racket and Dart.
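For readers unfamiliar with how that 0.23 figure relates to "4.3 times faster": it is a geometric mean of per-benchmark time ratios, and the speedup is its reciprocal. A small sketch with made-up ratios:

```python
from math import prod

def geometric_mean(ratios):
    # Geometric mean of per-benchmark time ratios (pypy_time / cpython_time).
    return prod(ratios) ** (1.0 / len(ratios))

# Hypothetical ratios for three benchmarks: a mean of 0.25 would mean PyPy
# needs 25% of CPython's time, i.e. a 1/0.25 = 4x overall speedup.
ratios = [0.125, 0.25, 0.5]
mean = geometric_mean(ratios)
speedup = 1.0 / mean
assert abs(mean - 0.25) < 1e-9 and abs(speedup - 4.0) < 1e-9
```

The geometric mean is the standard choice for averaging ratios because it is symmetric: a 2x speedup on one benchmark and a 2x slowdown on another cancel out exactly.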

Is a further speedup to be expected? Why is it still slower than V8 after so
much development effort?

~~~
bjoli
I suspect that the development effort that V8 has seen has been several orders
of magnitude greater.

~~~
Rochus
I have no detailed information, but the development of PyPy has been going on
for 20 years and was partly sponsored by the EU. If you compare that with
LuaJIT, which was developed by a single person in a shorter timeframe and
reached performance even faster than V8, I would assume that either the
conceptual approach taken by RPython/PyPy is less suited, or Python is just
that much harder to speed up. I would guess the former, because the
performance of other RPython-based implementations is also not insanely
impressive.

~~~
jashmatthews
V8 became faster than LuaJIT quite some time ago now but LuaJIT is incredibly
simple compared to V8.

~~~
Rochus
Are you sure? I did a cross-comparison recently and found LuaJIT still to be a
factor of 1.3 to 1.5 faster than V8 in geometric mean (I used the Node.js
implementation on the CLBG).

~~~
jashmatthews
V8 8.0 is consistently faster than LuaJIT 2.1 on my machine even on heavily
numeric benchmarks like Fannkuch or n-body.

Anything more realistic involving actual object access, not even allocation,
and V8 wins by large margins.

time luajit-2.1.0-beta3 nbody.lua 50000000
real 0m8.437s

time node nbody.js 50000000
real 0m5.065s

~~~
Rochus
I assume you would get a different result running all CLBG algorithms and
comparing the geometric means. I also ran benchmarks including (hashed)
table/field access, and it was virtually as fast as the same program compiled
natively, see [https://github.com/rochus-
keller/Oberon/blob/master/testcase...](https://github.com/rochus-
keller/Oberon/blob/master/testcases/Hennessy_Results). Of course there are
parts where LuaJIT 2.0 is slower than native code, but there are also some
opposite cases.

------
RocketSyntax
So if I want to start using this, do I just make a change at the environment
level in virtualenv? And then all of my Python code will run faster?

~~~
cpburns2009
More or less. It really depends on what your code does and how frequently in
calls into C modules such as Numpy. The Python side is JIT-able while the C
side is not. PyPy3 currently implements Python 3.6.9 so you'll be missing out
on any 3.7+ features (e.g., built-in dataclasses but a backport is available
from PyPI).
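A quick way to confirm which interpreter an environment actually runs (the virtualenv invocation in the comment is one common way to set this up, assuming a `pypy3` binary is on your PATH):

```python
import platform
import sys

# Prints "CPython" under the stock interpreter and "PyPy" inside a
# PyPy-based virtualenv (created with e.g. `virtualenv -p pypy3 env`).
print(platform.python_implementation(), sys.version.split()[0])
```

Since PyPy aims to be a drop-in replacement, the rest of your code should need no changes; this check is mainly useful in CI or deployment scripts to catch an environment accidentally built on the wrong interpreter.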

------
shuoli84
pypy is great. what makes me sad is that they always stay 1 or 2 versions
behind cpython..

~~~
jononor
Staying 1-2 minor versions behind is pretty great. Most Linux distros that are
deployed are in the same position!

~~~
alexeiz
PyPy claims to be compatible with CPython 3.6.9
([https://www.pypy.org/compat.html](https://www.pypy.org/compat.html)).
CPython 3.7 is currently the most common Python version. The project I have
worked on since last year depends on features introduced in CPython 3.7. The
migration to CPython 3.8 is already underway on many Linux distros (Fedora is
already on it). So PyPy is unfortunately too far behind.

