Hacker News
Ten years of PyPy (morepypy.blogspot.de)
200 points by DasIch on Feb 28, 2013 | 32 comments

I'm personally very excited about the prospects for NumPy in PyPy, as this is my main use case for Python. Relatedly, on the PyPy homepage you can give money to help grease the wheels on development of specific PyPy features: http://pypy.org/. There are also quite detailed proposals that describe where the money will go, how it will help each feature's development, why each feature is useful/cool, etc. To me this seems like a really awesome way to organize donations for an open source project, and it gives me greater confidence that my donation will have an impact.

You might also check out numba (http://numba.pydata.org/), which still feels slightly beta, but can speed up numpy code quite a bit with very little boilerplate. I've started experimenting with it as an alternative to my usual approach which is to write or wrap code with Cython. I'm curious to see where PyPy goes with numpy and cffi though.
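To give a feel for the "very little boilerplate" point: a rough sketch of the numba workflow, using a made-up `pairwise_sum` function on plain data. The `try/except` fallback (an assumption for illustration, not part of numba's documented usage) just lets the same code run unaccelerated where numba isn't installed.

```python
# Hypothetical sketch: a plain Python reduction loop accelerated with
# numba's @jit decorator. Function name and data are illustrative only.
try:
    from numba import jit
except ImportError:
    # Fallback so the sketch still runs without numba (no speedup).
    def jit(func):
        return func

@jit
def pairwise_sum(xs):
    # An explicit O(n) loop -- exactly the kind of code numba (or a
    # tracing JIT like PyPy's) can compile down to fast machine code.
    total = 0.0
    for x in xs:
        total += x
    return total

print(pairwise_sum([0.5, 1.5, 2.0]))  # -> 4.0
```

The appeal over Cython here is that no separate compilation step or type-annotated `.pyx` file is needed; the decorator is the whole interface.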

I am excited too, but I also wonder how much faster it's really going to be. Synthetic benchmarks are great, but I didn't see much of a speedup when I switched to PyPy for some of my real-world code.

You should complain. There is a myriad of reasons why PyPy won't go very fast, for example if you're using C extensions. Usually what happens is that you have code where parts go insanely fast and parts go insanely slow. Once you identify the slow part, sometimes it is just a matter of replacing a library (like psycopg2 -> psycopg2cffi or so) and stuff flies. But it all depends a lot on your workload and your stack. R&D is typically needed to get good speedups. For most real-world examples, 2-10x is to be expected.
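The "identify it" step above is ordinary profiling. A minimal sketch (the `workload` function is a stand-in for real application code) using the stdlib's `cProfile`, which works the same way under CPython and PyPy:

```python
# Minimal profiling sketch: before swapping libraries, measure where
# the time actually goes. All names below are illustrative.
import cProfile
import io
import pstats

def workload():
    # Stand-in for the real application hot path.
    return sum(i * i for i in range(100000))

profiler = cProfile.Profile()
profiler.enable()
result = workload()
profiler.disable()

stats_out = io.StringIO()
pstats.Stats(profiler, stream=stats_out).sort_stats("cumulative").print_stats(5)
print("result:", result)
# stats_out.getvalue() now holds the 5 most expensive calls by
# cumulative time; a C-extension call dominating here is the usual
# sign that a cffi-based replacement is worth trying under PyPy.
```

If the profile shows most time inside a C extension, PyPy's JIT can't help that part, which is exactly the library-replacement case described above.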

Shameless plug - if your stuff is not open source and you can't be bothered to do the profiling yourself, there is typically an option to hire someone to do it for you. Get in touch.

Could you explain what you meant when you wrote:

> Another option we tried was using RPython to write CPython C extensions. Again, it turned out RPython is a bad language and instead we made a fast JIT, so you don't have to write C extensions.


On the surface it would seem that using RPython would be a boon to authors of C extensions, so this surprises me. It is also kind of shocking to hear the authors of such a great project basically condemn the language that it was written in (and really, the original raison d'etre of the project itself) as "bad".

Whether RPython is a good or a bad language is a subjective judgment, and it shouldn't depend on whether I wrote it, right? It's good for what it does (writing VMs), or at least better than writing a VM, JIT included, by hand in C++, but it's much worse than Python. Hence using it for a general-purpose project is a bad, bad idea, and you should use Python instead.

Same here--I spent a couple of days porting a compiler frontend from Python 3 to 2 just to try out PyPy, and got rewarded with a 2-3x slowdown. I messed around with some tracing parameters, but didn't get much benefit, IIRC.

Of course, this was a couple years ago, and I don't have access to that codebase anymore. Things might've changed since then...

When I first tried PyPy I got pretty bad slowdown as well. Apparently PyPy wasn't really good with long strings.

I'm using PyPy for real-world code every day, because it runs fast as^H^H^H^H, uhm, very fast! I regularly see 6-15X speedups compared to CPython.

That being said, if your code is mostly IO-bound or calls into foreign-language libraries, PyPy's speedup of the Python code of course won't help much with your wall time.

Note that for for-loop-style number crunching code, the speedup you are aiming for over CPython is on the order of 50x-750x depending on cache locality.
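As a concrete (and invented) example of the "for-loop-style number crunching" being discussed: a naive matrix-vector multiply in pure Python. Under CPython every bytecode of the inner loop is interpreted; a tracing JIT (PyPy) or a compiler (Cython, numba) can turn it into machine code, which is where speedups of this magnitude come from.

```python
# Toy number-crunching kernel, pure Python. Names and sizes are
# illustrative only -- real code would use NumPy or similar.
def matvec(matrix, vector):
    n = len(matrix)
    m = len(vector)
    out = [0.0] * n
    for i in range(n):
        acc = 0.0
        row = matrix[i]
        # This inner loop is what the JIT/compiler turns into a tight
        # native loop; cache locality of `row` and `vector` then
        # dominates, hence the wide 50x-750x range quoted above.
        for j in range(m):
            acc += row[j] * vector[j]
        out[i] = acc
    return out

print(matvec([[1.0, 2.0], [3.0, 4.0]], [1.0, 1.0]))  # -> [3.0, 7.0]
```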

I'm definitely not saying that PyPy won't do better than 15x speedup on number crunching code, just pointing out that in the context of NumPyPy a 15x speedup is not very relevant if you want PyPy to be a viable alternative to, say, Julia.

(Disclaimer: I'm a Cython dev)

Yeah. It depends on the hardware and the problem. A 50x speedup from optimised routines (memory-optimised, branch-reduced, using SIMD, etc.) multiplied by 8 cores gives you a total 400x speedup. This is what I've seen in real-life code I've written. Also, if you offload to some other processor (a GPU or dedicated hardware), then of course you can get even faster (again depending on the hardware and the problem).

So PyPy speedups aren't very good compared to the best you can achieve using other techniques... but you can mostly use the same tricks in PyPy as in CPython to get those results there too :)

PyPy speedups over CPython on numeric code are in the 50-200x ballpark; it really depends on what you do. We can certainly do better, but it's within 2x of optimized C for most of it (unless there is very good vectorization going on).

For the stuff I tried, PyPy was universally faster than Cython on non-type-annotated code, and mostly faster on the type-annotated code from the Cython benchmarks.

It would be great to see a repo with this set of examples with code for both what is run under PyPy and the type-annotated cython code, and the exact settings that were used.

I ran only the stuff I found in the Cython repo (good or bad), in benchmarks/ or so. It was also ages ago, so treat it with a grain of salt. My point is that there is no fundamental reason why Cython should be faster than PyPy (even with type annotations), because the equivalent of type annotations is essentially inferred during JITting. In fact, PyPy should be faster because of other things, like a faster GC or object layout.

I actually find it very impressive that they have a quite well-working tracing JIT. For Firefox we had TraceMonkey, but the code was hard to maintain and hacked into the infrastructure. And it turned out that most stuff doesn't trace very well in the real world (not SunSpider, argh!). I haven't looked very recently, but I guess the PyPy frontend for JavaScript is still not actively maintained? I think I remember seeing a thesis about that recently. Best of luck to them in the coming years!

To make it more likely to be accepted, the proposal for the EU project contained basically every feature under the sun a language could have.

How common is this in EU projects?

Very, at least in education funding. There are courses (financed by the EU, of course) that teach you how to write proposals for EU projects, to make them seem more in line with the expectations of the "program" you want to finance your project. They basically say: "You have to put 'developing intellectual capital among unemployed people in the countryside' in there; that is more likely to get you the money. You may do a few free courses for 50 people on whatever in that building after it's finished, and it will be OK. And while you're at it, buy a projector and a laptop for those courses too, and put them in the costs." And the building is intended as a library or something, but to get the money you add whatever you can.

So much EU education money is wasted. On the other hand, the infrastructure funds are handled more or less OK.

Guy working on several EU research projects here.

That's barely true. Although there is a certain amount of buzzwords in every proposal (that's kind of bread and butter for tech people anyway), they are far from what the OP says. Sometimes there is some overpromising, but those things are usually secondary aspects rather than the core idea. Most of the results end up as components used by the participating companies, some of the work is dropped, a lot of work is open sourced, and a few projects are really successful. But that's research anyway. Not all work leads to success here.

Perhaps it depends on the type of project?

I see two very different views here: one thing is to have technical jargon that looks like buzzwords to the uninitiated, another is to have to include, in the proposal, features that you won't implement so that the project is accepted, as mentioned in the original post.

I don't think that's it, so much as offering to implement, and actually implementing (at least in a half-assed way) features that you don't actually want, to make the proposal more attractive.

I can't speak about the EU projects in general, but chances are we overdid it and it's not 100% clear if it increased or decreased our chances.

Awesome. Is there somewhere they've documented the wisdom they've gained in more detail? I'd love to hear more about the issues encountered with JavaScript, ctypes, and LLVM, and lessons learned in general.

#pypy on freenode. LLVM, JS and ctypes are scattered around mailing list posts as well (and blog posts).

This reminded me to make another donation towards py3k support =). Keep on rocking on, pypy.

I recommend using it. It's a great platform. Well done guys

From the article: "the first usable PyPy was released in 2010." For most practical purposes PyPy is only two years old, not ten.

No. Before it was known as PyPy, PyPy was known as Psyco. I remember playing with it. In Pressyo's internal documentation there were many mentions of using Psyco (we never actually ended up using it).

I think Psyco was a separate code base, and not an actual progenitor of PyPy.

Ahhh... I see I was mistaken. Both are projects by Armin. I thought Psyco became PyPy

A working PyPy was released before that. The first release with a JIT that made stuff faster came in 2010, so a lot of the codebase is much older than that.

Still hoping one day CPython gets replaced by PyPy as default Python implementation.

Good work guys.

According to the PyPy site, it can run Django.

Has anyone tested it or used it in production to run Django 1.4? Django 1.5?
