PyPy 2.5.0 released (morepypy.blogspot.com)
297 points by cyber1 on Feb 3, 2015 | 68 comments

"We would like to thank our donors for the continued support of the PyPy project, and for those who donate to our three sub-projects, as well as our volunteers and contributors (10 new commiters joined PyPy since the last release). We’ve shown quite a bit of progress, but we’re slowly running out of funds. Please consider donating more, or even better convince your employer to donate, so we can finish those projects! The three sub-projects are: "

PyPy are doing such incredible work and they seem to ask for very little in the way of funding to make it happen (and they're very explicit about where they're going to spend the money).

Why is it that there aren't more donations from the industry? Is it just a marketing issue? Do they need to do a snazzy kickstarter video to build the hype?

PyPy dev here. I would be interested in exploring ideas for what can be done to improve our funding situation. We seem to have moderate success with those fundraisers, but nothing REALLY huge.

Unfortunately, it's not something I have answers to. I guess I just look at the success of something like the Micro Python campaign [0] (also a cool project) and wonder if something similar could be achieved.

There are plenty of companies that could benefit from PyPy doing well, though. How many Python servers do YouTube / Reddit etc. run? With some investment in PyPy they could reduce their ongoing costs. YouTube could easily dump $100k into the PyPy project, so what's stopping them? (I suspect the answer in that case is that Google just aren't interested.)

I'm in a startup that doesn't have money to invest in PyPy, but if we had more cash we'd definitely look at putting some into PyPy (though our use case is the hairy NumPy side of things).

I guess in the case of PyPy it's maybe less interesting to individuals but maybe there's some way of reframing it to make it interesting.

In a way, I'm sort of the target market here, because I use python all the time and I haven't donated to PyPy. So why haven't I donated (other than being quite broke because I'm starting a business)?

I think part of it is the urgency. There are 3 specific campaigns you're running [1] to get funding, along with funding for the general project itself. And the campaigns themselves are great: you have good overviews of the issues that you're solving, a realistic approach and a proven track record of being able to execute. But I don't know when those campaigns started or what the cut-off date is. They sort of feel ongoing, and it feels like there's no loss in not donating right now. Compare that with something like Kickstarter, where there is a date by which, if there hasn't been enough funding, you may end up losing out on something really cool.

Sorry for the rather rambling brain-dump, but maybe there's some useful perspective here.

[0] https://www.kickstarter.com/projects/214379695/micro-python-...

[1] http://pypy.org/tmdonate2.html http://pypy.org/py3donate.html http://pypy.org/numpydonate.html

I think the number one thing is you have to give people a reason to pay. Being useful is of course the first step, but people also want to feel they're getting something "extra", they want to feel like it's a transaction, however bad.

One of those things can be to provide a very clear road-map tied to donations and then follow up on it. That's what's so powerful about these crowdfunding campaigns. People feel like they are making a difference, even though at times they are essentially buying something.

I don't think you necessarily have to "compromise your values" either. You're probably already limited by money, so add some planning and just be honest about it. "With this amount, this will happen" etc. As always of course "under-promise and over-deliver".

That's mainly for individuals though. Companies are somewhat the same, but they want things that fit into their budget.

I'm pretty surprised that you can't get some funding from Canonical, Red Hat, Microsoft, SUSE, the Linux Foundation, et al. Is there anyone on the team who could engage those organizations?

Several Linux distros have many of their customization features written in Python.

Microsoft seems to be trying VERY hard to change their course and attract more developers back to their platform(s). They're clearly willing to make strategic investments in open source projects.

I've found that PyPy works wonders for the vast majority of Python code I've thrown at it. To close the remaining gaps, funding is key.

> Microsoft seems to be trying VERY hard to change their course and attract more developers back to their platform(s). They're clearly willing to make strategic investments in open source projects.

RPython/pypy is kind-of a competitor to the CLR (especially with DLR) so unless it's for a CLR/DLR backend I doubt Microsoft would show that much interest.

IMHO, while I have huge respect for the team from a tech perspective, the pypy core team seems quite inefficient when it comes to communication and networking. Fundraising is not treated as a real project there.

I may be silly, but why don't you contact a professional lobbyist? Those people know how to get money out of big corps and could maybe be used for this good cause.

The marketing angle could also be interesting. The Big G would pay a little to appear benevolent to the community. Heck, the new Microsoft (which uses Python a lot) would pay to keep building its new image.

And ask the guy from LuaJIT, if he can do it...

Off topic, but I just wanted to say thank you. You guys are doing killer work; pypy is an incredible project. Thank you.

I get the impression that many of the donations are from individuals. That's crazy considering the great work they're doing. They really need a sugar daddy, like the big G, to fund them.

Compare this with much less ambitious projects, where developers get funded when some company or other commits to them. Perhaps the Euro-centric development holds them back from getting some of the well-earned Silicon Valley cash?

I wish I had the slightest idea how to help them, apart from pitching in some beer money. They do truly great work.

It's mostly companies (in terms of money donated, not the number of donors). We mentioned them in the blog post if they wanted us to (a lot want to stay anonymous).

Congrats to the PyPy team on what sounds like a pretty big release!

Something in the release notes caught my eye:

> The past months have seen pypy mature and grow, as rpython becomes the goto solution for writing fast dynamic language interpreters.

I asked this question[1] on the Perl 6 thread from a few days ago but didn't get an answer. Does anyone know why on earth the Perl 6 folks created yet another dynamic language VM+JIT with MoarVM instead of taking advantage of all the great work done with PyPy? Does anyone know whether PyPy was even considered as a target before writing MoarVM?

[1] https://news.ycombinator.com/item?id=8982229

I tried talking to them; at the time they were sold on using the JVM (no one got fired for using the JVM for your VM). Too much hype, too little cold thought.

I remember one of the times you dropped into their IRC channel to ask (not sure if there were more times). I wasn't really thrilled with the way that was handled. AIUI, they have reasons why PyPy isn't considered the best choice for Rakudo, they just did an exceedingly poor job of communicating them.

I remember that conversation. The lead developer of the project had C# and Java experience and had already written his prototypes there.

Perl6 started long before PyPy (2000 vs ~2007).

And yet, work began on MoarVM only in 2012.


I think MOAR was designed specifically to run NQP efficiently.

NQP is a subset of Perl 6 which the Perl 6 compiler is written in. That compiler takes a program written in Perl 6 and converts it into NQP. That NQP program is then compiled to different targets such as JVM, Parrot etc. This is my understanding, of course.

I don't want to hijack a pypy thread with perl stuff, but this needs to be corrected:

> I think MOAR was designed specifically to run NQP efficiently.


> NQP is a subset of Perl 6 which the Perl 6 compiler is written in.


> That compiler takes a program written in Perl 6 and converts it into NQP.

No. moar is a vm backend for nqp, with a traditional bytecode interpreter, a jit, gc, ffi, and binary format.

>That NQP program is then compiled to different targets such as JVM, Parrot etc. This is my understanding, of course.

No. Of the current three perl6 backends (moar, parrot and jvm), moar is the fastest, but it has problems with its traditional threading model. It does not scale linearly with the number of physical cores, and it needs locks on all data structures, but this is still better than with perl5 or pypy. parrot does scale linearly and has locks only in the scheduler, but it needs compiler support to create writer functions: only the owner can write, and other threads access data through automatically created proxies. The jvm threading model is also faster and well known, but has this huge startup overhead.

perl5 has to clone all active data on thread init and creates writer hooks to copy updates back to the owner. this is a huge startup overhead, similar to the jvm.

Overall, would you expect any c/perl compiler dev to switch over to rpython to back your perl6 vm? Writing a jit for a good vm was the matter of a summer, and writing the compiler optimizations needs to be done in the high-level language of choice to be more efficient and maintainable.

You should rather look at rperl to compare it with rpython. It is a similar approach: a fast, restricted, optimizable language subset, which compiles to typed C++, with fallbacks to the slow perl5 backend for unoptimizable types. No need for a jit with its compilation and memory overhead. A jit only makes sense for dynamic profile-guided optimizations, as in pypy or v8.

Even so, the simple static jits in p2 or luajit, without many compiler optimizations, still run circles around those huge battleships: the jvm, clr, v8, pypy or moar. Optimized data structures, not optimizing compilers, are still leading the game.

rurban, I just want to point out that I was not calling for an rpython target for nqp nor was I claiming nqp is the rpython of the perl world. Not really sure what you are correcting.

>No. moar is a vm backend for nqp, with a traditional bytecode interpreter, a jit, gc, ffi, and binary format.

But surely NQP is a "language" and Moar is a vm?

Sounds like someone needs to write an NQP JIT in rpython.

They would have their work cut out for them, that I can see. They could reuse the .moarvm bytecode and file format, the NQP compiler and Moar backend [0]. I think many of the test cases are even common between the three backends... So all that's left to re-implement is a bytecode interpreter, IO facilities and FFI. That is, if my understanding of MoarVM is accurate.

[0] (https://github.com/perl6/nqp/tree/master/src/vm)

For the question, the start date of MoarVM is more interesting than the start date of Perl 6.

It's also not quite fair to start Perl 6 at announcement and PyPy at their first release (they were organized enough in 2004 to be applying for EU funds).

No, not really, as it ignores all the specifics of the situation which matter. The current leading implementation of Perl 6 is Rakudo, which is implemented in NQP (itself a subset of Perl 6 that's meant to be easier to implement and optimize), which is comparable in some ways to rpython. At the point you already have an intermediate language, a lot of the benefits of using PyPy and being able to write your language in a higher level language aren't as much of a gain. Additionally, using PyPy would lock the implementation into whatever current limitations PyPy has (threads?). MoarVM is not the back-end, it's one back-end, and one meant to optimize specifically for NQP's needs.

I imagine at some point a PyPy targeted Perl 6 will appear. It may or may not be based on the Rakudo implementation as a target for NQP (jnthn was able to get initial JVM support for NQP completed in a summer), or another implementation that targets PyPy through rperl directly. It would be really interesting to see what Perl 6's automatic opportunistic auto-threading of list operations and PyPy's STM work could do together.

That's a whole different (perfectly reasonable) line of reasoning that focuses on the needs of Perl 6 as an explanation.

The comment I replied to frames it such that PyPy wasn't an option for MoarVM because it didn't exist when things got started.

I don't know, but just from your description it sounds like a case of NIH[1].

[1] http://en.wikipedia.org/wiki/Not_invented_here

PyPy is crippled by the GIL, why would you base your next hot new language that aims to be mainstream on that?

The GIL is in PyPy, not in RPython, nothing precludes writing a GIL-less RPython-based VM.

However, RPython is not thread-safe and the GC it provides is not thread-aware, so you'd have to provide your own thread-aware GC (or a big GC lock).

The stm-gc can be run without the STM in a way that's thread safe and does not have the STM overhead. It's almost as good as the PyPy GC (it's missing a few optimizations and it's not incremental, but it can be made to work).

PyPy is actively working on an STM-based alternative. Does MoarVM avoid a GIL? If so, what approach are they taking?

Since Perl 5 doesn't have a GIL, I doubt any of the Perl 6s will. Threaded Perl has been a thing for 15 years or so. It's not something to "solve", either your interpreter is reentrant, or it isn't.

But spawning threads in your Perl code comes with the same set of problems as anywhere else, it's easy to create races or deadlocks, and Perl happily lets you do these things if you ask it to.

I think Perl 5's threading model is pretty broken. The official documentation explicitly discourages the use of threads [1]. Perl's threading model consists of running a separate interpreter on each thread without any data sharing, which is practically the same as using multiple processes, except without the advantages.

[1] http://perldoc.perl.org/threads.html

It's not quite as bad as that, but it's close. There are some use cases where it's beneficial. The one I use it for is where I'm pre-creating workers and using the shared memory mechanisms and signalling of threads (which work quite well in Perl where it auto releases the lock based on scope).

You mean, like V8 being crippled by the GIL?

Why would you need a GIL if you don't have threads in the first place?

I'm really looking forward to the numpy-specific announcements. Numpy is THE basic building block for every single scientific library. If pypy can get a high-performance numpy, that will go a long way towards allowing scientific users to use pypy (there is still the detail of libraries that use the C API to wrap C libraries, but cffi is pretty neat).

Personally, I'm not too excited about PyPy, because it has little promise of speeding up the two typical use cases for Python in the kind of scientific code I care about:

1. Python is used to set up/glue together code that is written in a compiled language.

2. Someone writes a code by starting out writing everything in Python, then profiling and identifying the slow parts, and then writing these in a compiled language.

Edit: example of 1): HOOMD-blue, example of 2): PyClaw.

If you're using pypy, the slow parts might not be slow anymore. It might even run faster than C, because JITs are better at some things than static compilers. http://morepypy.blogspot.com/2011/02/pypy-faster-than-c-on-c...

If most of your code is numpy stuff, will you actually see a speedup from PyPy? (I mean hypothetically, once NumPyPy is properly optimized.)

Yes, ignoring that most people need a loop now and then, even pure NumPy-using code has a lot of potential.

Consider "a + b + c + d". For large arrays, the problem is that it creates many temporary results of same size as original arrays that must be streamed over the memory bus. And since FLOPs are free and your computation is limited by memory bandwidth, you pay a large penalty for using NumPy (that gets worse as expression gets more complex).
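To make the temporaries concrete, here is a pure-Python analogy, assuming plain lists as stand-ins for NumPy arrays (so it shows the shape of the problem, not its speed, and is not how NumPy or PyPy implement anything). Evaluating `a + b + c + d` one operator at a time materializes two full-size intermediates before the result, while a fused loop makes one pass and allocates only the output.

```python
from typing import List

Vector = List[float]


def add(x: Vector, y: Vector) -> Vector:
    """Elementwise add that allocates a new full-size result, like NumPy's `+`."""
    return [xi + yi for xi, yi in zip(x, y)]


def pairwise_sum(a: Vector, b: Vector, c: Vector, d: Vector) -> Vector:
    """Evaluate ((a + b) + c) + d one binary operation at a time."""
    t1 = add(a, b)      # temporary 1: written out in full ...
    t2 = add(t1, c)     # ... read back, producing temporary 2
    return add(t2, d)   # final result


def fused_sum(a: Vector, b: Vector, c: Vector, d: Vector) -> Vector:
    """Single pass: each element of every input is touched exactly once."""
    return [ai + bi + ci + di for ai, bi, ci, di in zip(a, b, c, d)]
```

Both give the same values; the difference is that every temporary in the pairwise version has to be written to memory and read back, which is the bandwidth cost tools like numexpr try to avoid.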

Or "a + a.T"... here you can get lots of speedup using basic tiling techniques, to fully use cache lines rather than read a cache line only to get one number and discard it.
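A minimal sketch of the tiling idea for `a + a.T`, written in pure Python over a row-major list-of-lists matrix. The `block` parameter is an illustrative assumption; real code would size it to the cache and run compiled, since in pure Python the speedup won't show up. The point is the access pattern: each tile of the output and the matching tile of the transpose are processed together, so the column-wise reads stay within a small region of memory.

```python
from typing import List

Matrix = List[List[float]]


def add_transpose_naive(a: Matrix) -> Matrix:
    n = len(a)
    # a[j][i] walks down a column: with row-major storage, each access
    # lands in a different row, i.e. usually a different cache line.
    return [[a[i][j] + a[j][i] for j in range(n)] for i in range(n)]


def add_transpose_tiled(a: Matrix, block: int = 2) -> Matrix:
    n = len(a)
    out = [[0.0] * n for _ in range(n)]
    # Visit the output in block x block tiles; the matching tile of the
    # transpose is read at the same time, so both stay cache-resident.
    for bi in range(0, n, block):
        for bj in range(0, n, block):
            for i in range(bi, min(bi + block, n)):
                for j in range(bj, min(bj + block, n)):
                    out[i][j] = a[i][j] + a[j][i]
    return out
```

The `min(...)` bounds handle matrices whose size is not a multiple of the block size.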

And so on. For numeric computation, there are large gains (like 2-10x) from properly using the memory bus, that NumPy can't take advantage of. So you have projects like numexpr and Theano that mainly focus on speeding up non-loop NumPy-using code.

In my experience, you'll have some parts of your code in which you have to do a for loop or something else that cannot be (cleanly) expressed using numpy and that's where performance starts to suffer. The current solution is to use something like cython, but if pypy simplifies that, I think that's great.
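As an example of the kind of loop meant here (a hypothetical one, not from the thread): an exponential moving average makes each output depend on the previous one, so it does not map cleanly onto a single elementwise NumPy expression. On CPython this is where people typically reach for Cython; on PyPy, a plain loop like this is the kind of code the JIT is aimed at.

```python
from typing import List, Optional, Sequence


def exponential_moving_average(xs: Sequence[float], alpha: float) -> List[float]:
    """y[0] = x[0]; y[t] = alpha * x[t] + (1 - alpha) * y[t-1]."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    out: List[float] = []
    prev: Optional[float] = None
    for x in xs:
        # Sequential dependency on `prev`: the iterations cannot be split
        # into independent elementwise operations.
        prev = x if prev is None else alpha * x + (1.0 - alpha) * prev
        out.append(prev)
    return out
```

For instance, `exponential_moving_average([1.0, 3.0, 5.0], 0.5)` returns `[1.0, 2.0, 3.5]`.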

It highly depends. If it's just large linear algebra operations, then it won't matter, since all the work will be done by BLAS.

If it's lots of small operations, I think pypy can inline them and you might see a significant speed up.

Assuming trunk and 2.5.0 are roughly the same thing, it seems like a decent performance increase: http://speed.pypy.org/

I just tried it on a heavy Python workload I run regularly and it looks like a substantial improvement for my purposes. I'm just measuring by eye but for this particular code I'd say it's around a factor of two faster. This code is really dict-heavy so I'd guess I'm benefitting a lot from those improvements. I love this project and the continual improvements it puts out.

PyPy dev here. Could you give us some info on how to help you more, for instance, a small benchmark of where we are still slow? We love to hear how PyPy is being used "in the real world"

I was too optimistic in my eyeballed assessment, but a real measurement shows there's still a good improvement. I gave my program a typical workload and tested with both PyPy 2.4.0 and 2.5.0, and the result was 13 minutes 31 seconds on 2.4.0, and 11:22 on 2.5.0. That's great!

What would be the best way to profile this thing under PyPy to see where it's spending time? I'm barely familiar with Python profiling in general, and totally unfamiliar with PyPy specifically.
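For reference, a common starting point is the standard-library `cProfile` and `pstats` modules, which PyPy also ships. Keep in mind that profiler overhead can distort timings, especially under a JIT, so treat the numbers as a rough guide. `score_schedule` and `profile_call` below are hypothetical names, the former standing in for the real workload (Python 3 syntax).

```python
import cProfile
import io
import pstats


def score_schedule(n: int) -> int:
    # Hypothetical stand-in for a dict-heavy scoring routine.
    counts = {}
    for i in range(n):
        counts[i % 97] = counts.get(i % 97, 0) + i
    return sum(counts.values())


def profile_call(func, *args, top: int = 10) -> str:
    """Run func(*args) under cProfile; return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func(*args)
    finally:
        profiler.disable()
    buf = io.StringIO()
    pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(top)
    return buf.getvalue()


if __name__ == "__main__":
    print(profile_call(score_schedule, 200000))
```

Without changing any code, `pypy -m cProfile -s cumulative yourscript.py` gives a similar report for a whole script.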

As for how I'm using it, I belong to a glider club. For every weekend day we operate (which is every weekend from around the end of February through mid-December) we need to assign four club members to carry out various tasks for the day. (Specifically, we need a designated tow pilot, an instructor, a duty officer, and an assistant.) I'm the scheduling guy, and I wrote a Python program to generate the schedules for me. I put in various constraints (when people are unavailable, what days they prefer, etc.) and then the program uses a hill-climbing algorithm along with a bunch of scoring routines to find a good schedule. The actual workload operates on Schedule objects, which just contain a list of dates and the members assigned to each position on each date. Then I make a ton of duplicates, each with one adjustment, score them all, pick out the best, repeat. I can also optionally go for two adjustments, which takes much longer but gives better results, and that's what ends up taking 10+ minutes as above.
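The search described above is a classic hill climb. Here is a minimal sketch of its shape, where everything beyond the four roles is hypothetical: the member names, availability data and the scoring function are made-up stand-ins for the real constraints and scoring routines.

```python
import random
from typing import Dict, Iterator, List, Set, Tuple

ROLES = ["tow_pilot", "instructor", "duty_officer", "assistant"]
Schedule = Dict[str, Dict[str, str]]  # date -> role -> member


def score(schedule: Schedule, unavailable: Dict[str, Set[str]]) -> int:
    """Hypothetical scoring routine; higher is better."""
    penalty = 0
    load: Dict[str, int] = {}
    for date, roles in schedule.items():
        today = list(roles.values())
        # Same person in two roles on one day.
        penalty += 100 * (len(today) - len(set(today)))
        for member in today:
            if date in unavailable.get(member, set()):
                penalty += 100
            load[member] = load.get(member, 0) + 1
    # Quadratic load term nudges duties toward an even spread.
    penalty += sum(n * n for n in load.values())
    return -penalty


def random_start(dates: List[str], members: List[str], rng: random.Random) -> Schedule:
    return {d: {role: rng.choice(members) for role in ROLES} for d in dates}


def neighbours(schedule: Schedule, members: List[str]) -> Iterator[Schedule]:
    """Every schedule that differs from `schedule` in exactly one assignment."""
    for date, roles in schedule.items():
        for role, current in roles.items():
            for member in members:
                if member != current:
                    candidate = {d: dict(r) for d, r in schedule.items()}
                    candidate[date][role] = member
                    yield candidate


def hill_climb(start: Schedule, members: List[str],
               unavailable: Dict[str, Set[str]]) -> Tuple[Schedule, int]:
    best, best_score = start, score(start, unavailable)
    while True:
        # Score all one-change neighbours and keep the best one.
        candidate = max(neighbours(best, members),
                        key=lambda s: score(s, unavailable))
        candidate_score = score(candidate, unavailable)
        if candidate_score <= best_score:
            return best, best_score  # local optimum: no single change helps
        best, best_score = candidate, candidate_score
```

The two-adjustment variant would enumerate pairs of changes instead, so the neighbourhood grows roughly quadratically, which fits it taking much longer. All the small dict copies and score calls are the kind of workload where PyPy's dict improvements would show up.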

Hey, if your code is Open Source, then I would be willing to look at your code and help you optimize it. Hit me on #pypy on IRC or just send me a mail at fijall at gmail.

FYI, you'll still need to compile a specific gevent branch if you want to use it with this. lxml built fine, uWSGI seems OK too (except for the lack of gevent workers in my build).

Things seem adequately speedy, haven't investigated the network throughput tweaks yet.

which branch is that?

[update] https://travis-ci.org/gevent/gevent passes w/ pypy fwiw.

Just tried the current pip package and it doesn't build for me on OSX or Ubuntu. I have a local snapshot of the version I used with 2.4.0, but can't remember where I got it from now.

Does anyone have any experience with numpypy? Is it useful for real work yet?

In my experience, almost everything works. Also, workarounds are usually easy.


This is very exciting. I will have to look into using PyPy more regularly.

Now if only someone could fix swig to be compatible with PyPy...

What packages are you using that use Swig? Can cffi or cppyy work for you?

I'm thinking of the Mapscript bindings, which allow you to interact with Mapserver. They generate bindings for several languages through Swig.

Some projects use Swig precisely because of that characteristic: projects with "low bandwidth", for whom maintaining a different binding generator for every language they support would be painful.

In my case, I could probably hack a minimal cffi binding. But solving the problem at the root would be a better solution for everyone.

Swig apparently already uses some form of cffi for Common Lisp, so maybe python-cffi support could be derived from that. I don't know for sure, just thinking out loud.

The problem with SWIG is that current SWIG bindings would not be reusable. Any major project using SWIG for Python already has a bit of glue and this glue tends to be CPython C API. SWIG has other modes, but it has not been the case in the past.

Sure, but still, having a way within SWIG to generate both standard python binding and more cffi-friendly python could help transitioning.

That way, new projects can still benefit from the strength of SWIG that allows them to propose bindings for multiple languages with a single tool. Because, let's not kid ourselves, using specific binding libs for every single language you want to support has a cost. And in that case backward compatibility does not matter much.

All the while giving a transition path for projects still using what I would call the legacy way. Or those who want/need to poke in the CPython API.

I haven't used SWIG in 15 years, but having it emit a cffi shim would be a great way to get existing projects ported over to a more portable interface without requiring them to change anything.

I wish you could compile scons with pypi.

You mean "I wish you could run scons with PyPy". It looks to me like scons does

  contents = open(file).read()

where they should be doing

  with open(file) as fid:
      contents = fid.read()

What is the difference between the two?

The latter uses a Python context manager to open the file, the former doesn't. By using the open() context manager like this, you don't need to worry about closing the file yourself, since the context manager takes care of it. The code that runs inside the 'with' block gets yielded inside the context manager. The former version of the code does not store a reference to the file handle, thus cannot close the file handle. Of course you can close file handles without using a context manager too:

  f = open('foo.txt')
  contents = f.read()
  f.close()
But I'd prefer this way:

  with open('foo.txt') as f:
    contents = f.read()
Context managers can of course be used for all kinds of things. For more information, see https://docs.python.org/2/library/contextlib.html
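To illustrate the point about the `with` body running where the generator yields, here is a minimal sketch using `contextlib.contextmanager`. The `opened` helper and its `log` list are made up for illustration; `open()` is already a context manager and doesn't need this.

```python
from contextlib import contextmanager
from typing import Iterator, List, TextIO

log: List[str] = []


@contextmanager
def opened(path: str) -> Iterator[TextIO]:
    f = open(path)
    log.append("opened")
    try:
        yield f          # the body of the `with` block runs at this point
    finally:
        f.close()        # runs on normal exit and also if the body raises
        log.append("closed")
```

Used as `with opened('foo.txt') as f: contents = f.read()`, the file is closed as soon as the block exits, even when an exception propagates out of the body.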

The equivalent code without the context manager is actually this:

  f = open('foo.txt')
  try: contents = f.read()
  finally: f.close()
So yes, extra reason to prefer the context manager.

It's documented in PyPy's differences page, but generally open('x').read() does not guarantee that the file is closed. In CPython it's not a problem due to refcounting, but in pypy the file might only be closed at some later stage. On top of that, there is a rather low limit on the number of open file descriptors, which caps how many files you can have open at once.
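A small sketch of both halves of that advice, assuming a Unix-like system for the limit check (the `resource` module is not available on Windows): look up the per-process descriptor limit, and read files with `with` so each descriptor is released immediately instead of whenever the GC gets to it. `fd_limits` and `read_all` are hypothetical helper names.

```python
from typing import Dict, Iterable, Optional, Tuple

try:
    import resource  # Unix-only
except ImportError:  # e.g. Windows
    resource = None


def fd_limits() -> Optional[Tuple[int, int]]:
    """(soft, hard) limit on open file descriptors, or None if unavailable."""
    if resource is None:
        return None
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def read_all(paths: Iterable[str]) -> Dict[str, str]:
    contents = {}
    for path in paths:
        # Closed at the end of each iteration on CPython and PyPy alike,
        # so the number of open descriptors never grows with len(paths).
        with open(path) as f:
            contents[path] = f.read()
    return contents
```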

> In CPython it's not a problem due to refcounting, but in pypy, the file might be closed at some later stage.

Technically the guarantee in CPython is "soft" so it can be a problem, just rarely is: if the file gets referenced from an object accessible from a cycle the file won't be released (and closed) until the cycle-breaking GC is triggered.
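The cycle caveat can be seen with a stand-in object whose `__del__` records when it is reclaimed. The `Resource` and `Holder` classes are hypothetical; a real file object referenced from a cycle is kept alive the same way. The comments describe CPython behaviour; on PyPy neither object is guaranteed to be finalized before the explicit `gc.collect()`.

```python
import gc
from typing import List

finalized: List[str] = []


class Resource:
    """Stand-in for a file: records its name when it is reclaimed."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __del__(self) -> None:
        finalized.append(self.name)


class Holder:
    def __init__(self, res: Resource) -> None:
        self.res = res
        self.me = self  # reference cycle: Holder -> Holder


def demo() -> List[List[str]]:
    snapshots = []
    gc.disable()  # stop the cycle collector from running at a random moment
    try:
        plain = Resource("plain")
        del plain                       # CPython: refcount hits 0, finalized now
        snapshots.append(list(finalized))

        cyclic = Holder(Resource("cyclic"))
        del cyclic                      # the cycle keeps the Resource alive
        snapshots.append(list(finalized))

        gc.collect()                    # cycle collector finally frees it
        snapshots.append(list(finalized))
    finally:
        gc.enable()
    return snapshots
```

On CPython, `demo()` returns `[['plain'], ['plain'], ['plain', 'cyclic']]`.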

A nitpick I only bring up because the similarity sometimes confuses me: you mean pypy, which is python-on-python done fast. Pypi is the Python Package Index, a separate thing.

Yes, sorry, iPhone autocorrect. I'm talking about pypy.

What OS are you using scons on? I found that scons is way slower on Windows than it is on Linux.
