Unfortunately, the only PyPy PPA I can find on launchpad.net hasn't been updated for a year. Please set up a new PPA, or update the one you have - a lot of curious Python tinkerers would love to try out PyPy on their pet projects!
I don't know what it was doing, but there was a Python process using every bit of CPU available and a ton of swap.
This same machine can build ruby 1.9.2 in under 10 minutes. It took less time to download and refresh all of my installed macports (including postgresql 9.0) than it took to attempt to build pypy.
It does however build much, much faster on PyPy.
PyPy has been packaged in Fedora since Fedora 15, and is available via:
yum install pypy
* Is the Jit in PyPy standalone, i.e. can it be used by other projects independently of Python? Is it documented?
* How does PyPy compare with highly optimised C?
* Assuming that the benchmarks that used to be at the Computer Language Benchmarks Game for PyPy are somewhat representative of overall performance, why is there still such an enormous difference with optimised C?
* There used to be this thing called restricted python. If one limits oneself to using that only, are benchmarks much better?
* I read somewhere that some projects have merged or combined forces. There used to be this thing called Unladen Swallow. There was also Psyco. Have either of these merged with/been absorbed by PyPy? Are any other such projects still going?
* Cython is another fast project. If it became part of mainline Python, would PyPy become irrelevant?
* Highly optimized C? Not spectacularly well. Normal C? Not bad; on numerical code we often hover around gcc -O1.
* They weren't representative; most of that code was heavily optimized for CPython, which has very different performance characteristics.
* RPython is the language the PyPy interpreter is written in. It's not really meant for general-purpose code; it's designed mostly for writing VMs, but it does run at basically C-like speed.
* Unladen Swallow was Google's fork of CPython; it's basically dead at this point (retrospective here: http://qinsb.blogspot.com/2011/03/unladen-swallow-retrospect...). Psyco was an extension module for CPython that added a JIT; it's no longer maintained (but still works), and its creator is the lead developer of PyPy.
* Cython isn't Python (this could be a meme or something), it does not implement the entire language and thus isn't directly comparable (although we still often compare, because it is a competitor in the space of "making Python-like stuff faster").
When I write Python it is rarely the case that I assign values of different types to the same identifier, or add attributes to a class outside of its definition, or monkey-patch a module. In other words, most of the time I am not really making use of the dynamic properties of the language. In this scenario Python is a language that saves me "finger typing", and I do not need its dynamic typing as much. I am more than willing, in these scenarios, to trade some of that unused dynamism for better performance. I think there might be others who are similarly inclined.
The one thing that I miss in RPython is generators. Is there a particular difficulty because of which they were left out? I assume so, because all the statically typed variants of Python leave them out as well, but I do not have enough background to know what that difficulty might be. The other is NumPy :) but I believe you guys are working on that.
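To illustrate what losing generators costs in practice, here is a sketch (illustrative only, not taken from the RPython codebase) of the kind of rewrite a statically typed Python subset without generators forces on you: the generator's suspended frame has to become explicit state in an iterator class.

```python
# Illustration: a simple generator vs. the explicit iterator class you
# would have to write in a Python subset that lacks generators.

def countdown_gen(n):
    # Generator version: the loop state lives in the suspended frame.
    while n > 0:
        yield n
        n -= 1

class Countdown(object):
    # Explicit-state version: the suspended frame becomes a field.
    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return self

    def next(self):  # spelled __next__ in Python 3
        if self.n <= 0:
            raise StopIteration
        value = self.n
        self.n -= 1
        return value

    __next__ = next

print(list(countdown_gen(3)), list(Countdown(3)))
```

Both produce the same sequence; the class version is just more finger typing, which is exactly the trade being discussed.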
@wbhart A lot of what you ask is answered in some detail in the PyPy blog http://morepypy.blogspot.com/ I wouldn't call PyPy less well known, though.
I have no idea why they haven't given the compiler some love so we can use it for our own projects. It sounds like it would become ubiquitous very, very fast.
I fully agree re the highlighting and standardizing RPython.
* What is the Jit itself written in?
* You mention that for numeric code PyPy hovers around gcc -O1. What are the other classes of code that you see PyPy as having to deal with? And how would it perform compared to normal C on those?
* Very interesting comments re the benchmark game. Did anyone ever optimise some of these tasks specifically for PyPy?
* Just a clarification re Cython. Yes, I believe it was forked from Pyrex, which was mainly for wrapping C code, I think. Though since that time my impression is it has become its own language, similar in parts to C/C++ and similar in parts to Python. They appear to have put a fair bit of work into making it fast. It is compiled though (to C), not interpreted. There was some talk about them trying to get it into CPython, but I'm not sure what was meant by that.
One final followup:
* Do you know if there are any plans to develop a new language built on top of the PyPy VM similar to Python but extended in some way? For example would it be feasible to implement a statically typed language with or without type inference on top of the same technology?
By the way, if you google virtual machine you get pages and pages of links to LLVM, JVM, Parrot and Dalvik (go figure) and the occasional reference to a few others. I'm really surprised that the PyPy VM is not better known!
* Comparing code with C is pretty difficult for all but the easiest 1-1 ports (i.e. numeric stuff), so I don't have good comparison numbers for it.
* Yes, I tried to improve the submissions to the benchmarks game; this led down a long path which resulted in the game keeping only a single implementation per language (e.g. TraceMonkey is no longer on there, only V8).
* We don't plan on writing a new language, we kind of like Python, but someone absolutely could.
* PyPy doesn't have a VM, we have a toolkit for writing VMs and adding JITs to them. Perhaps these two blog posts will make it more clear: http://morepypy.blogspot.com/2011/04/tutorial-writing-interp... http://morepypy.blogspot.com/2011/04/tutorial-part-2-adding-...
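Those tutorials walk through writing an interpreter in RPython and handing it to the translation toolchain. As a hedged sketch of the convention they describe (names like `target` and the driver argument are the toolchain's hooks; details here are simplified and the real setup passes this file to translate.py):

```python
# Minimal sketch of an RPython translation target. Untranslated, this
# is ordinary Python; the PyPy toolchain can translate entry_point
# into a standalone native executable.

def entry_point(argv):
    # RPython body: simple, statically inferable types only.
    total = 0
    i = 1
    while i < len(argv):
        total += len(argv[i])
        i += 1
    return total % 256  # used as the process exit code

def target(driver, args):
    # The hook the translator looks for: return the program's entry point.
    return entry_point, None
```

The point of the toolkit is that you write only the interpreter loop like this; the GC and the JIT are supplied by translation.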
Am I right in thinking that the drawback of this method is very long compile times for producing the interpreter?
I'm not sure if they're going to use Python 3's annotations to make this cleaner, that might be worth exploring...
* http://geetduggal.wordpress.com/2010/11/25/speed-up-your-pyt... lists an example where PyPy is in fact faster than C++ with STL. Not sure how it would compare to well-optimized C, but remember that JIT warmup time will hurt you for short lived jobs.
* Can't answer this one, sorry.
* RPython is still around---PyPy is written in it! Certainly it would be more friendly toward the JIT, which does best when the types of variables are consistent (which is required by RPython). Plus, RPython can be translated directly to native code and you can get rid of the JIT entirely.
* Unladen Swallow tried to do a lot of the same stuff that PyPy is doing, but on top of LLVM. It never did much better than CPython (within an order of magnitude), but if it did they were planning to merge it into CPython (of which it was a branch). PyPy is a followup to Psyco (which only ran on 32 bit x86) by the same people, but Psyco ran under CPython.
* Cython does not interpret Python---it compiles from a language that looks a lot like Python but with C-like semantics (for example, optional strong typing, and the ability to directly call C functions), and turns it into C or C++ code that can be compiled into a Python module. As such, it targets a slightly different use case.
But note a little further down the page. In the hands of a C++ coder (or indeed Cython coder) things look very different.
2) How does it compare to C? It's pretty competitive on highly algorithmic code, CPU-intensive and numerical tasks. For anything else, it depends. Overall, it is on average up to 4 times faster than regular Python.
3) Why is there still a difference? PyPy is a work in progress. You should compare it to other projects such as V8 or TraceMonkey.
4) Restricted Python: it's the static subset of Python used to implement PyPy. It takes the place of C in CPython. Yes, it's much faster, but also more limited and less flexible.
It's nicer than C, but not as cool as full Python.
You can also look at ShedSkin and Cython. ShedSkin is a true static subset of Python that compiles to C++. Cython is a Python-like language that adds type declarations to make it closer to C.
Unladen Swallow was a separate project and it's dead now.
Psyco is the predecessor of PyPy. It is no longer maintained.
For folks that want to test their projects against various version of CPython, PyPy, Jython, etc., check out tox: http://tox.testrun.org/
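tox is driven by a tox.ini in the project root; a minimal sketch (environment names here are illustrative, and the PyPy environment assumes a `pypy` interpreter on your PATH) might look like:

```ini
[tox]
envlist = py27, pypy

[testenv]
deps = pytest
commands = pytest
```

Running `tox` then builds a virtualenv per interpreter and runs the test suite in each, which is exactly the "does my project work on PyPy?" question.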
* You have to use a special patched version of virtualenv.
* NumPy is unavailable for PyPy. For various reasons the PyPy folks are just going to have to re-implement it.
The lack of NumPy turned out to be a deal-breaker for me in the end. I look forward to using PyPy again once their version of NumPy is ready for consumption.
Psyco's author, Armin Rigo, then got disgusted with Psyco and went on to work on PyPy (possibly for more sanity and better funding).
So, yes, I'd be quite happy to use PyPy, even by default, if it was as easy to install as CPython (or ghc, for that matter, which is a bit larger than CPython but also quite easy to install) and worked out of the box. I definitely was when it came to Psyco.
I suspect the author is trying to say that you cannot write a compiler to machine code for Python. This is wrong.
Compiling dynamic languages to machine code has been done dozens of times in languages with equivalent or greater dynamic properties (Common Lisp, Scheme, and Smalltalk, for example). I guess an example of proof of concept here is that Python was implemented in Common Lisp as a DSL (macros), and it works on the machine code lisp implementations.
The truth is that no one can be bothered to do it, because there is little to be gained from a faster python implementation. All of the slow code is in little parts that can be rewritten in C (or whatever faster language... so almost any language).
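The "rewrite the slow little parts in C" approach above is usually wired up through an extension module or an FFI. A hedged sketch using ctypes to call a C library routine (this assumes a Unix-like host where the math library can be located):

```python
# Sketch: calling a C function from Python via ctypes, one common way
# hot spots get moved out of pure Python.
import ctypes
import ctypes.util

# Locate and load the C math library (assumes a Unix-like system).
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature: double sqrt(double).
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

print(libm.sqrt(2.0))
```

The per-call overhead of crossing this boundary is the same issue raised further down the thread about zillions of FFI calls.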
The mentioned Python compiler projects are all 'research,' as far as I can tell. Doing something that is actually known to work and is difficult would be of no use to someone who is interested in tenure.
Do you have sources for a static compiler for Smalltalk? Considering how dynamic that language is I have some trouble imagining such a thing.
Please note that Alex is not talking about JIT compilers here (for good reason: PyPy has a JIT compiler), but about static compilers.
> The truth is that no one can be bothered to do it, because there is little to be gained from a faster python implementation.
That's a joke, right? A significant part of PyPy's effort is a faster Python implementation.
> The mentioned Python compiler projects are all 'research,' as far as I can tell.
The only "Python compiler projects" mentioned are ShedSkin and Cython (and both actually compile Python-like languages; neither pretends to compile Python), and neither is a research project; both have purely practical goals (although ShedSkin is completely experimental at this point).
> Considering how dynamic that language is I have some trouble imagining such a thing.
Smalltalk always uses a virtual machine, it does not always use a JIT.
I said a static compiler doesn't make sense for a dynamic language (saying you can't do it is tautological, it is like trying to get dry water).
I am talking about dynamic compilation to machine code (Not JIT). From that, you can alter how much code in-lining and optimization happen in nested calls. It is a much used technique and I do not need to prove its validity.
Everyone in here seems blind to the possibility, which puzzles me.
Not at all; dynamically typed languages have varying amounts of effective dynamicity (and staticity), and some are static enough to infer most types statically. Erlang, for instance, is not overly dynamic.
> I am talking about dynamic compilation to machine code (Not JIT). From that, you can alter how much code in-lining and optimization happen in nested calls. It is a much used technique and I do not need to prove its validity.
You're describing JITs here, why are you saying "not JIT"?
> Everyone in here seems blind to the possibility, which puzzles me.
Everyone "seems blind" because you're describing JITs and saying you're not talking about JITs, you're about as clear as tar during a moonless night here.
I am not describing JITs, I am describing VM based languages, which have the ability to incrementally statically compile functional objects. Does that help?
Ohyes is talking about per-function static compilation performed on the fly to machine code. Not bytecode.
It seems about halfway between static compilers and JITs really: functions are compiled to actual machine code statically, but the VM can recompile functions or compile new functions and replace old ones (of the same name) on the fly, e.g. during a REPL session.
That's not what Python does, Python code is compiled to VM bytecode and the VM does not compile it any further.
Under ohyes's scheme, the VM would compile that bytecode further down to machine code (or just skip the bytecode). It's closer to what HiPE does than what Python does.
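The point that CPython stops at bytecode is easy to see for yourself with the standard-library dis module:

```python
# CPython compiles a function to VM bytecode and stops there; the
# interpreter loop executes these instructions without compiling any
# further to machine code.
import dis

def add(a, b):
    return a + b

# Prints instructions such as LOAD_FAST / RETURN_VALUE (the exact
# opcode names vary between CPython versions).
dis.dis(add)
```

What ohyes and HiPE-style systems do is take the step CPython never takes: lower those instructions to native code per function.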
At any rate, if I am reading the earlier post correctly, it was tried and not found to be effective. This surprises me greatly. LLVM is of the highest quality and very fast. I'd love to know why people considered it to have gone "wrong" when it came to Unladen Swallow.
The idea with LLVM is that you target the LLVM IR or LLVM bytecode, and LLVM provides the platform for your retargetable compiler. It has both a JIT and a native compiler component. You can run the AOT compiler either incrementally or by sucking in a bunch of source and doing C-style static compilation.
I am by no means an expert obviously, but when I evaluated it for a project it seemed to be geared towards generating fast code for C/++ like languages... for which you tend to know the machine types of things, and be operating in terms of machine floats/doubles/integers/etc. Which doesn't seem to be much of a problem for Python, honestly.
The 'virtual machine' is more of a bytecode model; as the name implies, it is low level. You would have to build your own virtual machine (PythonVM or what have you) on top of it. This would need to be a complete VM with the ability to generate LLVM bytecode. Then you could take advantage of the SSA transforms, constant reduction, and other nice parts of LLVM (peephole optimization, for example).
But I guess the point is, LLVM takes care of one hard part for you, while a bunch of other difficult parts would still need to be handled, particularly the garbage collector. I'm sure Unladen Swallow-generated code is bleeding fast because of its use of LLVM.
All of this said, I'm pretty sure that the project died with Python 3. Maybe this whole discussion is missing the point entirely? How do you write a fast compiler for a language which has no standard? It is bound to change unpredictably and be an incredibly frustrating task.
I guess the point is that you cannot write an _efficient_ static compiler, because there is too little information available at compile time. I don't think your examples give a counter-proof here: most Smalltalk compilers are JIT compilers, so it's really an argument in favour of the author's original thesis, and Common Lisp introduces a lot of extra type annotations to achieve good compilation results. Racket (PLT Scheme), the most popular Scheme implementation, uses a JIT compiler as well.
Compiled Common Lisp (without any annotations) is much faster than interpreted python, simply because all of the dynamic dispatch is handled in machine code.
To achieve speeds closer to Java and C++ in Common Lisp, you certainly need type annotations (And a working knowledge of the given compiler).
But we aren't talking about those speeds; we are talking about maybe a 4x improvement, which could be done. Racket is popular because it is easy to use and has a nice library. There are a number of Schemes which are faster than it is.
Cython follows the "optional static typing" approach, and implements a rather large (but not complete) subset of Python. It's a very good intermediate solution for those 20% of the code that take up 80% of the running time.
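For the curious, a hedged sketch of what that optional static typing looks like (illustrative only; this needs the Cython compiler to build, and the function name is made up):

```cython
# The untyped version of this is valid Python; the cdef declarations
# let Cython generate a tight C loop instead of generic object code.
def csum(int n):
    cdef int i
    cdef int total = 0
    for i in range(n):
        total += i
    return total
```

You annotate just the hot function, compile it to a C extension, and leave the other 80% of the program as ordinary Python.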
Most people still don't get it. CPython 3.x is good enough for its purpose and its goals. It's evolving according to its philosophy (http://www.python.org/dev/peps/pep-0020/) and it is really, really good (if you understand some general principles like the Middle Way, Being Good Enough, Divide et Impera, and so on).
Making things too complicated is as bad as making them naively oversimplified. ^_^ And porting everything to JVM is just some kind of sport.
Being simple, extendable, and at the same time close enough to the hardware, using optimizations provided by the OS, is much better.
But there is a bigger view: if most of the modules are mere wrappers around plain C libraries, it is a very inefficient approach to use some complicated VM while you must do zillions of FFI calls. That is of no use.
So, in my opinion, for a scripting language fast and efficient module calls are more important than any modern JIT stuff, while your modules are mostly plain .so files.
btw, this is yet another point where Java sucks. If you are re-implementing everything in Java, that is probably OK (if you don't care about performance; NIO2 is still just a spec), but if you wish to call any code outside the JVM, it sucks. The approach itself is deeply flawed. Look what a mess JDBC is.
It is so obvious that I am really disappointed by the level of discussion on HN.
As the Java folks found out, more speed is an important goal that can mean the difference between having your language used in certain fields or not,
and C extensions can have more than a few deficiencies compared to code written in the same language (or compiling to the same bytecode): does it work on both Unix-like and Windows systems? Does it work with alternate implementations? etc.
Yeah, how about no?
The point of Python, as far as I'm concerned, is to avoid writing C, not go to it when something is hard. As such, a better-performing Python reduces the need for C extensions and thus makes it better at actually doing things I care about.
Think about why the Unladen Swallow project went nowhere. ^_^