
PyPy v5.9 Released - mdf
https://morepypy.blogspot.com/2017/10/pypy-v59-released-now-supports-pandas.html
======
syllogism
This is amazing -- we have a small mountain of Cython code, and almost
everything is working out of the box!

Tests failing:

* [https://github.com/explosion/spaCy](https://github.com/explosion/spaCy)

Confirmed working:

* [https://github.com/explosion/thinc](https://github.com/explosion/thinc)

* [https://github.com/explosion/preshed](https://github.com/explosion/preshed)

* [https://github.com/explosion/cymem](https://github.com/explosion/cymem)

* [https://github.com/explosion/murmurhash](https://github.com/explosion/murmurhash)

I doubt spaCy will ever be faster on PyPy (the neural network library Thinc is
currently 50% slower). It'd still be really great to get it running, so people
who benefit from PyPy for other parts of their stack don't have to manage two
Python environments.

~~~
DonbunEf7
Just keep rewriting C as Python. I still remember the day I switched from
Numpy and CPython to array.array on PyPy for a 60x boost in benchmarks. (Only
a 20x speedup on actual running code; this was for geometry generation in a
networked video game server.)
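A sketch of the kind of rewrite described (hypothetical geometry workload; `translate_points` is made up for illustration): replacing a NumPy array with the stdlib `array.array` lets PyPy's JIT compile the plain Python loop instead of crossing into the NumPy C API.

```python
from array import array

def translate_points(points, dx, dy):
    """Shift interleaved (x, y) coordinates in place.

    points is an array.array('d') laid out as [x0, y0, x1, y1, ...].
    On PyPy this plain loop gets JIT-compiled; there are no
    Python-to-C-extension crossings to block the tracer.
    """
    for i in range(0, len(points), 2):
        points[i] += dx
        points[i + 1] += dy

pts = array('d', [0.0, 0.0, 1.0, 2.0])
translate_points(pts, 10.0, -1.0)
# pts is now array('d', [10.0, -1.0, 11.0, 1.0])
```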

------
theli0nheart
This is great. Can't wait until Python 3.5 support is out of beta.

Just out of curiosity, I'd love to hear from others who've used PyPy for their
web apps. Are there any issues to look out for? I remember that a few years
ago, packages like psycopg2 were not compatible, which made the migration
somewhat difficult. Would love to hear real-world experiences here.

~~~
jstarfish
I use PyPy almost exclusively. I like the availability of their precompiled
binaries, and other peoples' benchmarking criticisms aside it's orders of
magnitude faster than standard Python in all of my actual applications.

psycopg2cffi fixed the psycopg2 issue some time ago.

I've run into compatibility issues with packages that reference math-centric
libs (matplotlib?) but aside from that I've been quite happy with it.

~~~
chubot
What kind of applications are you using it for?

Coincidentally I tried PyPy today for my shell, which is around 14K lines of
completely unoptimized Python code [1]. I have never used PyPy before, despite
being a long-time Python programmer.

I didn't expect PyPy to speed it up, just based on my impressions of the kind
of workloads PyPy excels at.

In my first test case (parsing a 976-line shell script), PyPy took 2.0 seconds
and CPython took 1.0. That 2x-slower number held up across a couple of other
tests.

I will probably try running the benchmark in a loop to see if PyPy's JIT warms
up (does it do that?). But I wasn't really expecting to use PyPy -- I just
wanted to see how it does, because there aren't many ways to speed up my code
without rewriting a good portion of it.

My impression is that JITs don't work well in general for certain workloads,
not just PyPy. IIRC LuaJIT is actually slower than Lua for string-processing
workloads. It makes a lot of sense in say machine learning applications which
are all floating point calculations. But string processing is probably
dominated by allocations and locality, and the JIT doesn't do very much there,
whether it's LuaJIT or PyPy.

[1] [http://www.oilshell.org/](http://www.oilshell.org/)
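A minimal warmup check for a case like this might look as follows (the `parse` function here is a hypothetical stand-in for the real parser): time the same workload repeatedly in one process, and if PyPy's JIT is helping, the later iterations should come out faster than the first.

```python
import time

def parse(text):
    # stand-in for a real parser: pure-Python, allocation-heavy work
    return [line.split() for line in text.splitlines()]

script = "echo hello $USER | grep hello\n" * 1000

# On PyPy the first iterations pay the JIT warmup cost; later
# iterations run the compiled trace, so they should be faster.
timings = []
for _ in range(5):
    start = time.perf_counter()
    parse(script)
    timings.append(time.perf_counter() - start)

print(timings)
```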

~~~
omaranto
I'd love to see a benchmark on which LuaJIT is significantly slower than Lua,
if you have a link.

~~~
chubot
I probably overstated... I don't think JITs including LuaJIT do much for
string manipulation, but that would mean it's on par with Lua.

Off the top of my head I think of Lua/Python as 10-100x slower than C, but
LuaJIT might get you within 2x for some numerical workloads. But it will not
get you within 2x for string workloads -- it's still in the 10x or worse
category.

Someone reported a slower benchmark here, but the maintainer argued that it's
invalid. Still, he says all the time is spent inside C functions, which is
what I was getting at.

[https://www.freelists.org/post/luajit/Luajit-string-
concaten...](https://www.freelists.org/post/luajit/Luajit-string-
concatenation-performance,10)

A couple people mention string performance here, without many details:

[https://news.ycombinator.com/item?id=12573981](https://news.ycombinator.com/item?id=12573981)

I think the bottom line is that the semantics of Lua strings are completely
different from how you manage strings in C, so there's a limit to how much you
can optimize the program, even if you have a lot of time to (which the JIT
does not).
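The same point can be illustrated in Python (a sketch, not PyPy-specific): immutable-string semantics force allocations that no JIT is free to remove, so string-heavy loops stay allocation-bound.

```python
def concat_naive(parts):
    # Each += may build a brand-new immutable string, copying everything
    # so far. The language's string semantics require this behavior, so
    # a JIT can compile the loop but can't eliminate the allocations.
    s = ""
    for p in parts:
        s += p
    return s

def concat_join(parts):
    # One pass and one final allocation for the result.
    return "".join(parts)

parts = ["chunk"] * 1000
assert concat_naive(parts) == concat_join(parts)
```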

------
mattip
The original title emphasises that NumPy and Pandas are now functional on
PyPy.

The PyPy JIT cannot look inside C code, and crossing the Python-C interface is
slow, but give it a chance and you may be pleasantly surprised how fast your
pure-Python code can run.
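One way to picture the boundary cost (a sketch; `math.hypot` here just stands in for any C-implemented function): each call into C is opaque to the JIT, whereas an equivalent pure-Python expression is something PyPy can trace and compile.

```python
import math

def hypot_c_calls(xs, ys):
    # One math.hypot call (C code) per element: the JIT cannot trace
    # into the C function, so every call crosses the Python-C boundary.
    return [math.hypot(x, y) for x, y in zip(xs, ys)]

def hypot_pure(xs, ys):
    # Plain Python arithmetic: PyPy's JIT can compile this whole loop
    # down to machine code, with nothing opaque in the middle.
    return [(x * x + y * y) ** 0.5 for x, y in zip(xs, ys)]

xs, ys = [3.0, 5.0], [4.0, 12.0]
assert hypot_c_calls(xs, ys) == hypot_pure(xs, ys) == [5.0, 13.0]
```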

~~~
bischofs
So do you think it is worth switching if you are using pandas and numpy
heavily?

~~~
fijal
It depends: we are heavily invested in making both numpy and pandas faster, so
the next releases will improve that. That said, you are quite likely to find
your program slower (though it really depends on what you do). It's a good
time to experiment, but definitely not to fully switch just yet.

------
SEJeff
I wonder if this will run apistar:
[https://github.com/encode/apistar](https://github.com/encode/apistar) which
is currently the fastest (python 3.6 generally) python web framework out
there.

~~~
wyldfire
In my experience, the answer is nearly always "yes" for pure-Python code. Now
that the last few PyPy releases support the C API via emulation, the answer is
probably still yes for any Python package.

EDIT: apparently it does work [1] though not officially supported until/unless
it can be tested on Travis CI.

[1]
[https://github.com/encode/apistar/issues/130#issuecomment-29...](https://github.com/encode/apistar/issues/130#issuecomment-298233451)

------
haikuginger
As I do with every PyPy release, I would like to point out that the PyPy
official benchmarks for comparison against CPython[0] continue to misleadingly
compare their latest and greatest with CPython 2.7.2 (released in 2011), as
opposed to the modern CPython 2.7.13 or 3.5.3 versions for which they target
API compatibility.

[0][http://speed.pypy.org](http://speed.pypy.org)

~~~
fijal
Hi

We should indeed compare PyPy3.5 vs CPython 3.5.3, but having a benchmark
suite that works on both continues to be a problem.

Regarding 2.7.13 - you might find it surprising, but it's actually SLOWER than
2.7.2; there have been no speed improvements and quite a few speed decreases,
so we decided to keep the faster one.

EDIT: part of the problem is that comparing PyPy 2 vs CPython 3 is apples vs
oranges, but PyPy3 is not ready yet (unicode improvements I'm working on right
now are missing)

~~~
haikuginger
> Regarding 2.7.13 - you might find it surprising, but it's actually SLOWER
> than 2.7.2, there has been no speed improvements and quite a few speed
> decreases, so we decided to keep the faster one.

I don't find it that surprising, but do find it disappointing that you would
run benchmarks against the current version, but not post them online for
perusal, nor provide any sort of explanation for the use of the older version
in head-to-head comparisons. For me, at least, it produces the impression that
PyPy has something to hide, and I doubt I'm the only one.

~~~
fijal
Yes, I'm really sorry.

This is the usual case of budget - if I had budget to have anyone improve the
website, improve the buildbot, improve the benchmark comparison, trust me I
would do it. Right now there are no volunteers and the benchmark side is sort
of lingering on.

------
babataiyoh
Is there a list of well-known C extensions for which PyPy is known to work, or
for which there are well-maintained cpyext ports?

Update: found it at
[https://bitbucket.org/pypy/compatibility/wiki/Home](https://bitbucket.org/pypy/compatibility/wiki/Home)

------
hyperbovine
Was surprised to see Cython support on this list. Can somebody elaborate on
the relationship between the two? I had always viewed them as alternatives.

~~~
geofft
Cython is a way of writing compiled code that links against Python in a way
that's easier than using the Python.h API directly.

There are two things you can do with this. The first is you can write your
performance-sensitive code directly in Cython, in which case, yes, that's a
direct competitor to PyPy. (So is writing your performance-sensitive code
directly in C and using Python.h to expose it to Python as a native-code
module.)

The second is that you can write bindings to existing C (or C-ABI-compatible,
really) code in Cython, instead of using C and Python.h to write those
bindings. In that case, it's not quite that you care about the performance of
your C code, but that it already exists, and you just need to call into it
somehow. Having PyPy be able to use these existing codebases is valuable.
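A sketch of that second use (the C function `clamp` and header `mylib.h` are hypothetical): the binding is a few lines of Cython instead of a hand-written Python.h extension module.

```cython
# bindings.pyx -- hypothetical wrapper around an existing C function.
# Built with cython + a C compiler; no Python.h boilerplate needed.
cdef extern from "mylib.h":
    double clamp(double value, double lo, double hi)

def py_clamp(double value, double lo, double hi):
    """Expose the existing C clamp() to Python callers."""
    return clamp(value, lo, hi)
```

The performance of the wrapper barely matters here; the value is that the C library already exists and Python code can now call it.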

------
gkya
Is PyPy _ideally_ plug'n'play, i.e. is it supposed to be able to seamlessly
replace the CPython interpreter ( _ideally_ in that it may not factually be
completely compatible, but is it aimed to be completely compatible)?

~~~
DonbunEf7
No. There's a documented list of things that are different. [0] That said, you
likely don't have many of these issues in your code, if any. The ideal is for
folks who write portable pure Python without relying on implementation-
specific details to be able to run on PyPy and CPython without any hacks.

[0]
[http://pypy.readthedocs.io/en/latest/cpython_differences.htm...](http://pypy.readthedocs.io/en/latest/cpython_differences.html)
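The classic entry on that list is finalization timing: CPython's reference counting closes a dropped file handle immediately, while PyPy's GC may not run until much later. A sketch of the portable vs non-portable pattern:

```python
def risky(path):
    # Relies on CPython refcounting to flush and close f as soon as
    # the function returns; on PyPy the GC may leave the data
    # unflushed for an arbitrarily long time.
    f = open(path, "w")
    f.write("data")

def portable(path):
    # Explicit scoping behaves identically on CPython and PyPy.
    with open(path, "w") as f:
        f.write("data")
```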

~~~
gkya
Thanks for the link!

------
avyfain
Thanks @fijal and team for all the effort! This is awesome.

The last update on the Pypy+Pandas wiki[0] is from this August, and it
mentions that there are still 15 outstanding failing tests. Does this release
mean that 5.9 is now at 100% parity? What does the same metric look like for
Pypy+Numpy, and where can that one be tracked if not 100% yet?

I am looking forward to migrating some pipelines over to 5.9 soon.

[0]
[https://bitbucket.org/pypy/pypy/wiki/cpyext_2_-_cython_and_p...](https://bitbucket.org/pypy/pypy/wiki/cpyext_2_-_cython_and_pandas)

~~~
mattip
As you point out, there are still a few failing tests. The number of failing
tests for both is very low but not zero. The PyPy team has had good
collaboration with Pandas and NumPy, but there are some deeper issues with
these packages depending on refcount semantics in some edge cases, which are
IMO rarely seen in real-world scenarios. For NumPy this means using an out
keyword argument can be tricky, and for Pandas it means some false positives
in determining when a dataframe is being held by another reference, making it
read-only.

Of course if your workflow depends on those features, they are critical. We
are working on full compatibility and also on increasing speed.

------
howfun
I am glad to hear it!

