Hacker News
So you want to write a fast Python? (alexgaynor.net)
262 points by ericflo on July 11, 2011 | 67 comments

Dear PyPy developers: while I'm happy to download binaries and run them out of my home directory for something I use all the time (like Firefox), for less-frequently used tools I'd much rather set up an Ubuntu PPA to install things system-wide and have them kept up-to-date without my having to think too hard about it.

Unfortunately, the only PyPy PPA I can find on launchpad.net[1] hasn't been updated for a year. Please set up a new PPA, or update the one you have - a lot of curious Python tinkerers would love to try out PyPy on their pet projects!

[1] https://launchpad.net/~pypy/+archive/ppa

This is the single, bar none, thing that prevents me from using PyPy in my day-to-day programming. It's a huge oversight, in my opinion, and it greatly hinders its adoption.

Or get into the Debian repos.

I discovered that PyPy used to be in Debian, back in the days of PyPy 1.0, but it was removed because it took so long to build and because it wasn't yet generally useful. The latest releases of PyPy seem to be a lot more useful, and hopefully the build-time is being addressed...

I tried building PyPy on my MacBook Pro with 4GB of RAM and a 2.4 GHz i5. After over an hour I had to give up and cancel it. It took another fifteen minutes for the system to recover enough that I could start using it again.

I don't know what it was doing but there was a python process that was using every bit of CPU available and a ton of swap.

This same machine can build ruby 1.9.2 in under 10 minutes. It took less time to download and refresh all of my installed macports (including postgresql 9.0) than it took to attempt to build pypy.

Even though the documentation[1] says you need at least 4GB of RAM to build 64-bit PyPy, I've never succeeded in building PyPy on a 4GB machine. It took about 20 minutes to build successfully on a machine with 8GB of RAM, though.

[1]: http://pypy.org/download.html

AFAIK, the build time is unlikely to change any time soon, since it's a complex process doing whole-program analysis on an interpreter, and translating it to a lower level binary.

It does however build much, much faster on PyPy.

A yum repository for Fedora 15 is welcome as well.


  PyPy has been packaged within Fedora from Fedora 15
  onwards, and is available via:

  yum install pypy

I have some questions:

* Is the Jit in PyPy standalone, i.e. can it be used by other projects independently of Python? Is it documented?

* How does PyPy compare with highly optimised C?

* Assuming that the benchmarks that used to be at the Computer Benchmarks Game for PyPy are somewhat representative of overall performance, why is there still such an enormous difference with optimised C?

* There used to be this thing called restricted python. If one limits oneself to using that only, are benchmarks much better?

* I read somewhere that some projects have merged or combined forces. There used to be this thing called Unladen Swallow. There was also Psyco. Have either of these merged with/been absorbed by PyPy? Are any other such projects still going?

* Cython is another fast project. If it became part of mainline Python, would PyPy become irrelevant?

I have some answers!

* Yes, the JIT is part of the RPython translation toolchain (more on this in a few) and can be used in other interpreters. For example, we have Scheme, Javascript, Prolog, and Haskell at varying levels of completeness (Prolog being the most complete).

* Highly optimized C? Not spectacularly. Normal C? Not bad: on numerical code we often hover around gcc -O1.

* They weren't representative, most of that code was heavily optimized for CPython, which has very different performance characteristics.

* RPython is the language the PyPy interpreter is written in, it's not really meant for general purpose code, it's designed mostly for VMs, but it does run at basically C-like speed.

* Unladen Swallow was Google's fork of CPython. It's basically dead at this point; retrospective here (http://qinsb.blogspot.com/2011/03/unladen-swallow-retrospect...). Psyco was an extension module for CPython that added a JIT. It's no longer maintained (but still works), and its creator is the lead developer of PyPy.

* Cython isn't Python (this could be a meme or something), it does not implement the entire language and thus isn't directly comparable (although we still often compare, because it is a competitor in the space of "making Python-like stuff faster").

Hi Alex, mine might be a minority view/request, but I think RPython is an interesting and useful language in its own right and should be standardized and highlighted. Possibly separated out with a website of its own to emphasize its existence.

When I write Python it is rarely the case that I assign values of different types to the same identifier, or add attributes to a class outside of its definition, or monkey patch a module. In other words, most of the time I am not really making use of the dynamic properties of the language. In this scenario Python is a language that saves me "finger typing", and I do not need its dynamic typing as much. I am more than willing, in these scenarios, to trade some of that unused dynamism for better performance. I think there might be others who are similarly inclined.

The one thing that I miss in RPython is generators. Is there a particular difficulty because of which they were left out? I assume so, because all the statically typed variants of Python leave them out as well, but I do not have enough background to know what that difficulty might be. The other is Numpy :) but I believe you guys are working on that.

@wbhart A lot of what you ask is answered in some detail in the PyPy blog http://morepypy.blogspot.com/ I wouldn't call PyPy less well known though.

http://stackoverflow.com/questions/6277174/are-generators-su... explains why we don't have generators in RPython

You should put that on the pypy site somewhere.

I'd like to second this request, I would be more than happy to write code in a slightly more restricted way if it meant that I could generate C-fast modules from Python code. In my opinion, RPython is an even bigger deal than PyPy, as it's a very easy way to get C speed with Python syntax.

I have no idea why they haven't given the compiler some love so we can use it for our own projects. It sounds like it would become ubiquitous very, very fast.

Generators probably need to be efficiently inlined to be fast? Maybe this depends on the capabilities of the VM. I think they can also be implemented with continuations which few languages implement due to being a headscrew. I'm just guessing though. I'd be interested in hearing an authoritative answer.
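For readers wondering what writing code without generators looks like in practice, here is a minimal sketch in plain Python. It has not been checked against the RPython toolchain, and the names `countdown` and `Countdown` are made up for illustration. The generator keeps its state in a suspended frame, while the class keeps the same state in an explicit field.

```python
def countdown(n):
    """Generator version: the suspended frame holds the state."""
    while n > 0:
        yield n
        n -= 1


class Countdown(object):
    """Same sequence with the state stored explicitly on the object."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return self

    def __next__(self):
        if self.n <= 0:
            raise StopIteration
        value = self.n
        self.n -= 1
        return value

    next = __next__  # Python 2 spelling of the same method


print(list(countdown(3)))
print(list(Countdown(3)))
```

Both calls produce the same sequence; the class version is more typing, but it contains nothing that needs a frame to be paused and resumed.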

I fully agree re the highlighting and standardizing RPython.

@srean, what do you use RPython for?

Thanks for the concise and extremely helpful answers. I have some followups:

* What is the Jit itself written in?

* You mention that for numeric code PyPy hovers around gcc -O1. What are the other classes of code that you see PyPy as having to deal with? And how would it perform compared to normal C on those?

* Very interesting comments re the benchmark game. Did anyone ever optimise some of these tasks specifically for PyPy?

* Just a clarification re Cython. Yes, I believe it was forked from Pyrex which was mainly for nailing up C code I think. Though since that time my impression is it has become its own language, similar in parts to C/C++ and similar in parts to Python. They appear to have put a fair bit of work into making it fast. It is compiled though (to C), not interpreted. There was some talk about them trying to get it into CPython. But I'm not sure what was meant by that.

One final followup:

* Do you know if there are any plans to develop a new language built on top of the PyPy VM similar to Python but extended in some way? For example would it be feasible to implement a statically typed language with or without type inference on top of the same technology?

By the way, if you google virtual machine you get pages and pages of links to LLVM, JVM, Parrot and Dalvik (go figure) and the occasional reference to a few others. I'm really surprised that the PyPy VM is not better known!

* The JIT is written in RPython.

* Comparing code with C is pretty difficult for all but the easiest 1-1 ports (i.e. numeric stuff), so I don't have good comparison numbers for it.

* Yes, I tried to improve the submissions to the benchmark game, this led down a long path which resulted in the benchmark game only having a single implementation for each language (e.g. tracemonkey is no longer on there, only v8)

* We don't plan on writing a new language, we kind of like Python, but someone absolutely could.

* PyPy doesn't have a VM, we have a toolkit for writing VMs and adding JITs to them. Perhaps these two blog posts will make it more clear: http://morepypy.blogspot.com/2011/04/tutorial-writing-interp... http://morepypy.blogspot.com/2011/04/tutorial-part-2-adding-...
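To make the "toolkit for writing VMs" idea a bit more concrete, here is a minimal sketch of the kind of interpreter loop one would start from. It is ordinary Python and runs on any Python. The opcodes (`PUSH`, `ADD`, `MUL`) and the function name `run` are invented for this illustration, and the JIT hints that the linked tutorials add are deliberately left out.

```python
PUSH, ADD, MUL = range(3)  # hypothetical opcodes, made up for this sketch


def run(program):
    """Interpret a list of (opcode, argument) pairs on a value stack."""
    stack = []
    pc = 0
    while pc < len(program):
        opcode, arg = program[pc]
        if opcode == PUSH:
            stack.append(arg)
        elif opcode == ADD:
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opcode == MUL:
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        else:
            raise ValueError("unknown opcode %r" % (opcode,))
        pc += 1
    return stack.pop()


# (2 + 3) * 4; the argument is ignored by ADD and MUL
program = [(PUSH, 2), (PUSH, 3), (ADD, 0), (PUSH, 4), (MUL, 0)]
print(run(program))
```

The dispatch loop over `pc` is the part a tracing JIT cares about; everything else is plain interpreter bookkeeping.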

I think I understand now. The PyPy implementation of Python is written using the RPython -> Jit interpreter framework that the PyPy project has. Clever.

Am I right in thinking that the drawback of this method is very long compile times for producing the interpreter?

Yes, compile times are rather long, about 40 minutes for the full interpreter on a recent, fairly good, laptop. On the other hand you don't need to do a compile all that frequently, because all the code is testable by executing directly atop another Python.

So you develop the interpreter by interpreting it on itself before compiling it to itself by itself.


You might have some fun reading about the Futamura projections :)

Oh God, that stirs some memories. I'll read it again, thank you.

Cython just has a syntax for annotating your code with types so it can work its magic and make it go fast, although you can add the annotations to a separate file and keep your code in pure Python.

I'm not sure if they're going to use Python 3's annotations to make this cleaner, that might be worth exploring...

* Yes...I have already seen it for some "build your own language interpreter" examples. Not sure about the documentation quality.

* http://geetduggal.wordpress.com/2010/11/25/speed-up-your-pyt... lists an example where PyPy is in fact faster than C++ with STL. Not sure how it would compare to well-optimized C, but remember that JIT warmup time will hurt you for short lived jobs.

* Can't answer this one, sorry.

* RPython is still around---PyPy is written in it! Certainly it would be more friendly toward the JIT, which does best when the types of variables are consistent (which is required by RPython). Plus, RPython can be translated directly to native code and you can get rid of the JIT entirely.

* Unladen Swallow tried to do a lot of the same stuff that PyPy is doing, but on top of LLVM. It never did much better than CPython (within an order of magnitude), but if it did they were planning to merge it into CPython (of which it was a branch). PyPy is a followup to Psyco (which only ran on 32 bit x86) by the same people, but Psyco ran under CPython.

* Cython does not interpret Python---it compiles from a language that looks a lot like Python but with C-like semantics (for example, optional strong typing, and the ability to directly call C functions), and turns it into C or C++ code that can be compiled into a Python module. As such, it targets a slightly different use case.
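The remark above about the JIT preferring variables whose types stay consistent can be pictured with two loops that compute the same sum. This is only a sketch: the function names are hypothetical, and no timing claim is made here about how much slower the second form is under any particular JIT.

```python
def sum_stable(values):
    """`total` is an int on every iteration."""
    total = 0
    for v in values:
        total += v
    return total


def sum_unstable(values):
    """`total` flips between int and str depending on the element."""
    total = 0
    for v in values:
        if isinstance(total, str):
            total = int(total)
        total += v
        if v % 2:
            total = str(total)  # the type of `total` changes here
    return int(total)


print(sum_stable(range(10)), sum_unstable(range(10)))
```

Both functions return the same number; only the first keeps a single type for `total`, which is also what a restricted, statically analyzable subset would require.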

" http://geetduggal.wordpress.com/2010/11/25/speed-up-your-pyt.... lists an example where PyPy is in fact faster than C++ with STL. Not sure how it would compare to well-optimized C, but remember that JIT warmup time will hurt you for short lived jobs."

But note a little further down the page. In the hands of a C++ coder (or indeed Cython coder) things look very different.
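The warmup caveat in the quoted comment is easy to check for yourself. The sketch below times the same toy workload several times in a row; `work` and `time_batches` are made-up names, the batch sizes are arbitrary, and `time.perf_counter` assumes Python 3. On an interpreter with a JIT one would expect the first batch to be the slowest, but that is something to measure, not something this snippet proves.

```python
import time


def work(n):
    """CPU-bound toy loop; stands in for a real workload."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def time_batches(batches=5, n=200000):
    """Return the wall-clock seconds taken by each successive batch."""
    timings = []
    for _ in range(batches):
        start = time.perf_counter()
        work(n)
        timings.append(time.perf_counter() - start)
    return timings


for index, seconds in enumerate(time_batches()):
    print("batch %d: %.4f s" % (index, seconds))
```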

1) Is the JIT standalone? No easy answer. PyPy is a complex project, not just a Python implementation. PyPy is a complex toolchain, a framework, for implementing dynamic languages. With PyPy, you could write your own implementation of Ruby, Scheme, PHP, whatever... And once your interpreter is complete, PyPy will generate the JIT automatically for you (you won't need to implement it yourself, you just have to write the interpreter, and PyPy will do the rest). PyPy is written in RPython (restricted Python) but you can implement any VM with it, for any language you may want.

2) How does it compare to C? It's pretty competitive in highly algorithmic code, CPU-intensive and numerical tasks. For anything else, it depends. Overall, it is on average up to 4 times faster than regular Python.

3) Why is there still a difference? PyPy is a work in progress. You should compare it to other projects such as v8 or tracemonkey.

4) Restricted python: It's the static subset of python used to implement pypy. It takes the place of c in cpython. Yes, it's much faster, but also more limited and less flexible. It's nicer than c, but not as cool as full python. You can also find Shedskin and Cython. Shedskin is a true static subset of python that compiles to c++. Cython is a python-like language that adds type declarations to the language to make it closer to c.

Unladen Swallow was a separate project and it's dead now. Psyco is the predecessor of pypy. It is no longer maintained.

Maybe I'm confused, but isn't Cython just a tool for writing C types and C functions directly in Python? Whereas PyPy, Jython, etc, are full runtime environments.

Yeah, that's it. Time to start using PyPy as the primary implementation for some toy projects and see how things go.

For folks that want to test their projects against various versions of CPython, PyPy, Jython, etc., check out tox: http://tox.testrun.org/

I tried using it a month or so back and had a great time.


* You have to use a special patched version of virtualenv.

* Numpy is unavailable for Pypy. For various reasons the pypy folks are just going to have to re-implement it.

The lack of numpy turned out to be a deal breaker for me in the end. I look forward to using it again once their version of numpy is ready for consumption.

no longer true, virtualenv 1.6.1 supports pypy nicely!

This reminds me that most of what PyPy delivers today - 10x speedup, full range of the Python language, was available within CPython in a module called Psyco that you could easily install. Like PyPy for a long time, Psyco was only available on x86.

Psyco's author, Armin Rigo, then got disgusted with Psyco and went on to work on PyPy (possibly for more sanity and better funding).

So, yes, I'd be quite happy to use PyPy, even by default, if it was as easy to install as CPython (or ghc, for that matter, which is a bit larger than CPython but also quite easy to install) and worked out of the box. I definitely was when it came to Psyco.

If anything, PyPy's blogs and updates are really interesting to read. (not to mention the awesome work those guys are doing!)

What is PyPy's plan of action for Py3k?

You can not write a static compiler for python, because python does not have static typing, and it is a dynamic language. So that part is a tautology.

I suspect the author is trying to say that you can not write a compiler to machine code for python. This is wrong.

Compiling dynamic languages to machine code has been done dozens of times in languages with equivalent or greater dynamic properties (Common Lisp, Scheme, and Smalltalk, for example). I guess an example of proof of concept here is that Python was implemented in Common Lisp as a DSL (macros), and it works on the machine code lisp implementations. (http://common-lisp.net/project/clpython/manual.html#compilin...)

The truth is that no one can be bothered to do it, because there is little to be gained from a faster python implementation. All of the slow code is in little parts that can be rewritten in C (or whatever faster language... so almost any language).

The mentioned Python compiler projects are all 'research,' as far as I can tell. Doing something that is actually known to work and is difficult would be of no use to someone who is interested in tenure.

> Compiling dynamic languages to machine code has been done dozens of times in languages with equivalent or greater dynamic properties ([...] and Smalltalk, for example)

Do you have sources for a static compiler for Smalltalk? Considering how dynamic that language is I have some trouble imagining such a thing.

Please note that Alex is not talking about JIT compilers here (for good reason: PyPy has a JIT compiler), but about static compilers.

> The truth is that no one can be bothered to do it, because there is little to be gained from a faster python implementation.

That's a joke right? A significant part of Pypy's effort is a faster Python implementation.

> The mentioned Python compiler projects are all 'research,' as far as I can tell.

The only "Python compiler projects" mentioned are ShedSkin and Cython (and both actually compile Python-like languages, neither pretends to compile Python), and neither is a research project: both have purely practical goals (although ShedSkin is completely experimental at this point).

> Do you have sources for a static compiler for Smalltalk

> Considering how dynamic that language is I have some trouble imagining such a thing.

Smalltalk always uses a virtual machine, it does not always use a JIT.

I said a static compiler doesn't make sense for a dynamic language (saying you can't do it is tautological, it is like trying to get dry water).

I am talking about dynamic compilation to machine code (Not JIT). From that, you can alter how much code in-lining and optimization happen in nested calls. It is a much used technique and I do not need to prove its validity.

Everyone in here seems blind to the possibility, which puzzles me.

> I said a static compiler doesn't make sense for a dynamic language (saying you can't do it is tautological, it is like trying to get dry water).

Not at all, dynamically typed languages have varying amounts of effective dynamicity (and staticity), some should be static enough to infer most types statically. Erlang for instance is not overly dynamic.

> I am talking about dynamic compilation to machine code (Not JIT). From that, you can alter how much code in-lining and optimization happen in nested calls. It is a much used technique and I do not need to prove its validity.

You're describing JITs here, why are you saying "not JIT"?

> Everyone in here seems blind to the possibility, which puzzles me.

Everyone "seems blind" because you're describing JITs and saying you're not talking about JITs, you're about as clear as tar during a moonless night here.

No, JIT is a specific type of dynamic compilation. It is not every type of dynamic compilation. Maybe I mean 'incremental compilation.'

I am not describing JITs, I am describing VM based languages, which have the ability to incrementally statically compile functional objects. Does that help?

Then the confusion probably comes from the fact that Python's main implementation is VM based. So suggesting what they are already doing as an improvement over what they are already doing is confusing to say the least. Perhaps they need a better VM, but that is the technique they use. To see Python's byte code open up a .pyc file.
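Rather than opening a .pyc file by hand, the standard library's `dis` module will print the bytecode for a function. A minimal sketch follows; `add_one` is just a throwaway example function, and the exact opcode names printed differ between CPython versions.

```python
import dis


def add_one(x):
    return x + 1


# Human-readable listing of the bytecode CPython compiled for add_one.
dis.dis(add_one)

# The same instructions as data (dis.get_instructions needs Python 3.4+).
opnames = [instruction.opname for instruction in dis.get_instructions(add_one)]
print(opnames)
```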

> Perhaps they need a better VM, but that is the technique they use. To see Python's byte code open up a .pyc file.

Ohyes is talking about per-function static compilation performed on the fly to machine code. Not bytecode.

It seems about halfway between static compilers and JITs really: functions are compiled to actual machine code statically, but the VM can recompile functions or compile new functions and replace old ones (of the same name) on the fly, e.g. during a REPL session.

That's not what Python does, Python code is compiled to VM bytecode and the VM does not compile it any further.

Under ohyes's scheme, the VM would compile that bytecode further down to machine code (or just skip the bytecode). It's closer to what HiPE does than what Python does.
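The "replace a function of the same name on the fly" half of this scheme can be sketched in plain Python with the built-in `compile` and `exec`. This is only an illustration of the workflow: CPython compiles to bytecode here, not to machine code, and the helper `define`, the `namespace` dict and the `area` function are all made up for the example.

```python
namespace = {}


def define(source):
    """Compile one function definition and install it by name."""
    code = compile(source, "<session>", "exec")
    exec(code, namespace)


define("def area(r):\n    return 3 * r * r\n")
first = namespace["area"](2)

# Later in the same session: a new definition replaces the old one.
define("def area(r):\n    return 3.14159 * r * r\n")
second = namespace["area"](2)

print(first, second)
```

In the scheme described above, the unit being swapped in would be a freshly compiled piece of machine code rather than a bytecode object.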

Yes, this is correct.


Do you mean AOT compilation? From an earlier post I gleaned that the Unladen Swallow project used the LLVM backend, which is variously described as a JIT, ahead-of-time compiler, incremental compiler and various other things. It's clear there is some confusion in language, but I got what you are talking about.

At any rate, if I am reading the earlier post, it was tried and not found to be effective. This surprises me greatly. LLVM is of the highest quality and very fast. I'd love to know why people considered it to have gone "wrong" when it came to Unladen Swallow Python.

Good call, I think you are right. My intended point was that the OP was dismissive of the idea of Incremental/AOT compilation as a possibility. I was not terribly clear and may have misread him.

The idea for LLVM is that you can target the LLVM IR or LLVM Byte-code, and LLVM will provide the platform for your regrettable compiler. It has both a JIT and a native compiler component. You can run the AOT compiler either as incremental or sucking in a bunch of source and doing C style static compilation.

I am by no means an expert obviously, but when I evaluated it for a project it seemed to be geared towards generating fast code for C/++ like languages... for which you tend to know the machine types of things, and be operating in terms of machine floats/doubles/integers/etc. Which doesn't seem to be much of a problem for Python, honestly.

The 'virtual machine' is more of a bytecode model (as the name implies, it is low level). You would have to build your own virtual machine (PythonVM or what have you) on top of it. This would need to be a complete VM with the ability to generate LLVM bytecode. Then you could take advantage of the SSA transforms, constant reduction and other nice parts of LLVM (peephole optimization, for example).

But I guess the point is, LLVM takes care of one hard part for you, but there are a bunch of other difficult parts which would still need to be handled. Particularly the garbage collector. I'm sure unladen swallow generated code is bleeding fast because of its use of LLVM.

All of this said, I'm pretty sure that the project died with Python 3. Maybe this whole discussion is missing the point entirely? How do you write a fast compiler for a language which has no standard? It is bound to change unpredictably and be an incredibly frustrating task.

JIT is dynamic compilation to machine code. Feel free to explain why your technique is not JIT, though.

This is not a JIT.


A static compiler has nothing to do with static typing, it is just a traditional compiler that passes through the source code and generates the final result of the compilation in one shot, as opposed to a byte-code JIT compiler.

I guess the point is that you can not write an _efficient_ static compiler, because there is too little information available at compile-time. I don't think your examples give a counter-proof here: most Smalltalk compilers are JIT compilers, so it's really an argument in favour of the author's original thesis, and Common Lisp introduces a lot of extra type annotations to achieve good compilation results. Racket (PLT Scheme), the most popular Scheme implementation, uses a JIT compiler as well.

My point is that you do not have to go full static compilation to achieve the benefits of using machine code and static compilation techniques.

Compiled Common Lisp (without any annotations) is much faster than interpreted python, simply because all of the dynamic dispatch is handled in machine code.

To achieve speeds closer to Java and C++ in Common Lisp, you certainly need type annotations (And a working knowledge of the given compiler).

But we aren't talking about those speeds, we are talking about maybe a 4x improvement, which could be done. Racket is popular because it is easy to use and has a nice library. There are a number of schemes which are faster than it is.

Cython, in my eyes, would qualify as a "production"-level compiler project - the lxml library uses it, and you occasionally run into it in other projects.

Cython follows the "optional static typing" approach, and implements a rather large (but not complete) subset of Python. It's a very good intermediate solution for those 20% of the code that take up 80% of the running time.

I mostly agree with you, but there are cases where the problem isn't that "thing X is too slow", but "after a long period of running the garbage collector gets bogged down with long-lived cycles and dies". So I think garbage collector improvements would be really helpful. Core interpreter improvements, not so much.
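For context on why cycles are the pain point: in CPython, reference counting alone never frees objects that point at each other, so they sit around until the cycle detector in the `gc` module runs. A minimal sketch, specific to CPython, with `Node` and `make_cycle` as made-up names:

```python
import gc
import weakref


class Node(object):
    """Toy object that can point at another Node."""

    def __init__(self):
        self.other = None


def make_cycle():
    """Build two nodes that reference each other; return a weak reference."""
    a = Node()
    b = Node()
    a.other = b
    b.other = a
    return weakref.ref(a)


gc.disable()            # keep the automatic collector out of the way
probe = make_cycle()    # the cycle is now unreachable from the program
alive_before = probe() is not None
gc.collect()            # explicit pass of the cycle detector
alive_after = probe() is not None
gc.enable()

print(alive_before, alive_after)
```

The weak reference stays alive until `gc.collect()` runs, which is the work that piles up when a long-running process keeps producing cycles.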

Absolutely, improving garbage collection is definitely more interesting/important than code that runs 'faster' for arbitrary benchmarks.

I started messing around with PyPy and love the idea of it. Is there a good Postgres lib for PyPy yet? I've been hooked on Psycopg2 since I got into Python and would love to take advantage of the speed boost from PyPy for my surrounding code.

https://bitbucket.org/alex_gaynor/pypy-postgresql/overview unfortunately a fork (a well maintained one at least)

I was going to ask "why 'unfortunately'?" and then I realized that you mean a fork of pypy, not of psycopg2. Unfortunately indeed...

So I really like python and would love to contribute to a project like PyPy, but I don't have much (aka any) experience writing interpreters, compilers, etc. What's the best way to start learning about these things?

Does anyone know what the state of PyPy with regards to Python 3 is? I had a quick google around but could only find an april fool's joke from 2008!

Perfect is the enemy of good enough. The huge goal of CPython (especially 3.x - those who are still stuck on 2.x are, well, just stuck on 2.x) is that it is good enough, was designed to be good enough and simple. Need speed? Write an extension in C. It is simple and it was designed to be so.

Most people still don't get it. CPython 3.x is good enough for its purpose and its goals. It's evolving according to its philosophy (http://www.python.org/dev/peps/pep-0020/) and it is really really good (if you understand some general principles like The Middle Way, Being Good Enough, Divide Et Impera and so on).

Making things too complicated is as bad as making them naively oversimplified. ^_^ And porting everything to JVM is just some kind of sport.

Being simple, extensible, and at the same time close enough to the hardware and using optimizations provided by the OS is much better.

What are you talking about? And why are you even mentioning the JVM?

Yeah, I was influenced by my previous comment. Original article didn't mention JVM.

But there is a bigger view - if most of the modules are mere wrappers around plain C libraries, it is a very ineffective approach to use some complicated VM when you must do zillions of FFI calls. That is of no use.

So, in my opinion, for a scripting language fast and efficient module calls are more important than any modern JIT stuff, while your modules are mostly mere plain .so files.

btw, this is yet another point where Java sucks. If you are re-implementing everything in Java, that is probably OK (if you don't care about performance. NIO2 is still just a spec), but if you wish to call any code outside JVM - it sucks. The approach itself is deeply flawed. Look what mess JDBC is.

It is so obvious that I am really disappointed by the level of discussion on HN.

If the scripting language itself is fast, then the game changes completely. That is obvious to the rest of us, and motivates this work.

The good is the enemy of the better, especially if you keep thinking it's good enough.

As the Java folk found out, more speed is an important goal that can mean the difference between having your language used in certain fields or not.

And C extensions can have more than a few deficiencies compared to code written in the same language (or compiling to the same bytecode). Does it work on both Unix-like and Windows systems? Does it work with alternate implementations? Etc.

"Write your most important code in a different language that lends itself more to shooting yourself in the foot."

Yeah, how about no?

Ideas (algorithms and data structures) are what matters. Language is just a language, you can translate ideas (not code) between them if you're familiar with both languages and the field. ^_^

Yes, it's very computer-sciency to hold that opinion. Say that next time you're stuck tracking down a buffer overflow.

The point of Python, as far as I'm concerned, is to avoid writing C, not go to it when something is hard. As such, a better-performing Python reduces the need for C extensions and thus makes it better at actually doing things I care about.

Do not forget what your target CPU is, and what it can and cannot do.

Think about why the Unladen Swallow project went nowhere. ^_^
