

A brief experiment with PyPy - jnoller
http://lwn.net/SubscriberLink/442268/22f66371348bd7c5/

======
treo
That 3x speed up is about the same that I have seen with my code. I'm
currently writing a database cache simulator to try different algorithms with
it, and if I want to have anywhere near realistic results I have to use
realistic access traces.

Tried it today with a TPC-C trace that has about 500 million accesses. The
result: CPython would have run for about 90 minutes (I stopped it after 30
minutes and began to look for a speedier option); PyPy took only 22
minutes.

~~~
true_religion
I've gotten about a 10x speed up on numerics code where there's so much
branching involved in the calculations that I can't afford to use NumPy.

As for me, the main reason I haven't moved to PyPy yet is the lack of database
and messaging support.

~~~
kingkilr
Which databases? At the moment we have SQLite, Oracle (haven't tested it
myself), and PostgreSQL, plus whatever you can find a pure-Python driver for.
Also, what do you mean by messaging?

~~~
true_religion
Oh wow, I didn't realize that PostgreSQL was working on PyPy. I heard that
Django was only tested with SQLite, so I made my assumptions based on that.

By messaging, I mean something like RabbitMQ, so that I can have batch
scheduling at a somewhat more sophisticated grain than "run a cronjob".

~~~
kingkilr
psycopg2 is implemented in a fork of mine:
<http://bitbucket.org/alex_gaynor/pypy-postgresql/>. It requires compiling it
yourself, but works nicely (someone told me it brought their script's time
from 2 minutes down to 8 seconds). As of the last test it passes all the
Django tests. What's the current standard RabbitMQ lib? I didn't realize it
was a C extension (hell, I've used it myself and never noticed).

~~~
true_religion
Well, the most used one is Celery. It depends on multiprocessing, which blew
up on me the last time I tried it in PyPy.

But... I just tried "import multiprocessing" in PyPy 1.5 and it worked! Is
this all part of the C-API compatibility layer? Does that mean Cython code may
soon work in PyPy too (that's my pony feature)?
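A quick smoke test along those lines, as a hedged sketch: the `square` and `parallel_squares` helpers are made-up examples (not from Celery or RabbitMQ); running the same script under `pypy` and under `python` would show whether the module works at all.

```python
from multiprocessing import Pool

def square(x):
    return x * x

def parallel_squares(n, workers=2):
    # Fan the work out across `workers` processes; map() preserves order.
    with Pool(processes=workers) as pool:
        return pool.map(square, range(n))

if __name__ == "__main__":
    print(parallel_squares(5))  # [0, 1, 4, 9, 16]
```

(Note that using `Pool` as a context manager needs Python 3.3+; on the 2.x interpreters discussed here you'd call `pool.close()` and `pool.join()` by hand.)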

RabbitMQ should work under PyPy now, then; all of its dependencies purport to
be pure Python.

Another RabbitMQ lib is Rabbitmq-c, which is a direct wrapper around
librabbitmq-c. It ekes out extra performance versus pure-Python clients, but
mostly that isn't needed.

~~~
kingkilr
Nope, multiprocessing was added to the Python standard library in 2.6. Our
previous releases implemented Python 2.5; 1.5 implements 2.7, so it now
includes multiprocessing.

~~~
asksol
That's good news! Guess it's time to remove the mechanism in Celery that
disables the multiprocessing pool when running under PyPy.

------
TheBoff
This is really, really great to see: PyPy is such an interesting project, and
it's really encouraging to see it gain so much ground.

It's interesting that Guido deliberately didn't go for a full rewrite for
Python 3, yet this project, which is a full rewrite in a whole different
language, has produced a faster implementation with fewer developers!

~~~
sho_hn
To be fair, it also needed eight years to get here, PyPy inherits much of the
standard library code, and the py3k effort wasn't limited to the interpreter
but also involved a lot of standard library development :-). So, apples and
oranges. Still, amazing work by the PyPy folks any way you look at it.

------
levesque
Can't wait to have a version of PyPy that supports numpy!

Since the benefits have been proven, I can't help but wonder why there aren't
more people working on this...

~~~
fijall
The main reasons more people are not working on this are that:

* volunteers are interested in other things

* nobody is willing to put money into making NumPy on PyPy happen

~~~
true_religion
I think part of the issue is that even PyPy + NumPy wouldn't give you
compelling gains versus NumPy + Cython in CPython.

The big win of PyPy over Cython is that you get a 10x or so speed improvement
without having to specify types, but if you're using NumPy already, then it's
pretty standard to specify types anyway so NumPy can optimize. At that point,
you really have to ask yourself whether it isn't worth it to just build a
Cython module and get next-to-C speed for that hotspot.

The great thing about Cython is that it lets you call between C and Python
with little overhead _without_ having to write any code against the Python
C-API, as you would if you were using ctypes or even Boost::Python.
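For contrast, a rough sketch of the ctypes route mentioned above: you load a shared library and declare each function's signature by hand. libm's `sqrt` is just a stand-in example here, and the fallback soname is an assumption about a common Linux setup.

```python
import ctypes
import ctypes.util

# Load the C math library; find_library may return None on some systems,
# so fall back to a soname that is common on Linux.
libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")

# With ctypes you spell out argtypes/restype yourself for every function;
# Cython instead takes a cdef declaration and compiles the call away.
libm.sqrt.argtypes = [ctypes.c_double]
libm.sqrt.restype = ctypes.c_double

print(libm.sqrt(2.0))  # 1.4142135623730951
```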

------
sho_hn
Can anyone comment on startup latency and performance early in a run?

One of my use cases for Python is relatively small, short-running scripts, and
JIT engines often take a fair amount of runtime before all the optimizations
kick in. So I wonder how PyPy does on startup time and whether it can leverage
its execution speed over such brief runtimes: is it still a net performance
win in the end, or at least no worse than CPython?

~~~
kingkilr
Last I checked, we start up faster than CPython. As for early-run performance:
it really depends on the total amount of code you have. If you've got a few
hundred lines of code, the JIT is often warmed up and fast in under half a
second; on the other hand, if you have a few hundred thousand lines it might
take a minute to warm up.
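A crude way to watch warm-up yourself, as a sketch: `hot` is an arbitrary toy loop of my own, and the effect is only visible when the script runs under a JIT like PyPy.

```python
import time

def hot(n):
    # A tight arithmetic loop: exactly the kind of code a tracing JIT
    # will notice, trace, and compile after enough iterations.
    total = 0
    for i in range(n):
        total += i * i
    return total

for run in range(3):
    start = time.perf_counter()
    hot(1_000_000)
    print("run %d: %.4fs" % (run, time.perf_counter() - start))
```

Under PyPy the later runs are typically much faster than the first; under CPython the three timings stay roughly flat.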

~~~
sho_hn
Faster startup sounds nifty :-).

How's performance while the JIT is not yet warmed up, compared to CPython?

~~~
nickik
PyPy starts with a lightweight interpreter (which is probably about the same
speed as CPython, maybe a bit faster) and then compiles only when it sees a
recurring pattern.

