

So you want to write a fast Python? - ericflo
http://alexgaynor.net/2011/jul/10/so-you-want-write-fast-python/

======
wbhart
I have some questions:

* Is the JIT in PyPy standalone, i.e. can it be used by other projects independently of Python? Is it documented?

* How does PyPy compare with highly optimised C?

* Assuming that the benchmarks that used to be at the Computer Language Benchmarks Game for PyPy are somewhat representative of overall performance, why is there still such an enormous difference with optimised C?

* There used to be this thing called restricted python. If one limits oneself to using that only, are benchmarks much better?

* I read somewhere that some projects have merged or combined forces. There used to be this thing called Unladen Swallow. There was also Psyco. Have either of these merged with/been absorbed by PyPy? Are any other such projects still going?

* Cython is another fast project. If it became part of mainline Python, would PyPy become irrelevant?

~~~
kingkilr
I have some answers!

* Yes, the JIT is part of the RPython translation toolchain (more on this in a few) and can be used in other interpreters; for example, we have Scheme, JavaScript, Prolog, and Haskell interpreters at varying levels of completeness (Prolog being the most complete).

* Against highly optimized C? Not spectacularly. Against normal C, not bad: on numerical code we often hover around gcc -O1.

* They weren't representative: most of that code was heavily optimized for CPython, which has very different performance characteristics.

* RPython is the language the PyPy interpreter is written in. It's not really meant for general-purpose code; it's designed mostly for VMs, but it does run at basically C-like speed.

* Unladen Swallow was Google's fork of CPython; it's basically dead at this point (retrospective here: [http://qinsb.blogspot.com/2011/03/unladen-swallow-retrospect...](http://qinsb.blogspot.com/2011/03/unladen-swallow-retrospective.html)). Psyco was an extension module for CPython that added a JIT; it's no longer maintained (but still works), and its creator is the lead developer of PyPy.

* Cython isn't Python (this could be a meme or something), it does not implement the entire language and thus isn't directly comparable (although we still often compare, because it is a competitor in the space of "making Python-like stuff faster").
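If you want to sanity-check a claim like "around gcc -O1 on numerical code" against your own workload, one rough approach is to time the same pure-Python kernel under both CPython and PyPy. Below is a minimal sketch of that; `dot` and `benchmark` are hypothetical names, the sizes are arbitrary, and it does not include the C version you would also need for an actual gcc comparison.

```python
import platform
import timeit


def dot(xs, ys):
    """Pure-Python dot product: a tight numeric loop of the kind a JIT can specialize."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total


def benchmark(n=100_000, repeat=5):
    xs = [float(i) for i in range(n)]
    ys = [2.0] * n
    # Take the best of several runs to reduce noise (and, under PyPy, to let the JIT warm up).
    best = min(timeit.repeat(lambda: dot(xs, ys), number=10, repeat=repeat))
    return platform.python_implementation(), best


if __name__ == "__main__":
    impl, seconds = benchmark()
    print(f"{impl}: best of 5 x 10 calls = {seconds:.4f}s")
```

Run it once with `python` and once with `pypy`; absolute numbers depend entirely on the machine, so only the ratio between the two runs is meaningful.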

~~~
srean
Hi Alex, mine might be a minority view/request, but I think RPython is an
interesting and useful language in its own right and should be standardized
and highlighted. Possibly separated out with a website of its own to emphasize
its existence.

When I write Python it is rarely the case that I assign values of different
types to the same identifier, or add attributes to a class outside of its
definition, or monkey patch a module. In other words most of the time I am
_not_ really making use of the dynamic properties of the language. In this
scenario Python is a language that saves me "finger typing", and I do not
need its dynamic typing as much. I am more than willing, in these scenarios,
to trade some of that unused dynamism for better performance. I think there
might be others who are similarly inclined.
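Plain Python already lets you opt into a little of this discipline. As an illustration (ordinary Python, not RPython; the class and function names are hypothetical), `__slots__` fixes the set of attributes instances may have, and keeping each local name bound to a single type is the style described above:

```python
class Point:
    # __slots__ fixes the attribute set at class-definition time:
    # assigning any other attribute on an instance raises AttributeError.
    __slots__ = ("x", "y")

    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y


def norm_squared(p: Point) -> float:
    # `total` starts as a float and is only ever rebound to floats:
    # one type per name.
    total = 0.0
    for component in (p.x, p.y):
        total += component * component
    return total


def try_add_attribute(p: Point) -> bool:
    """Return True if a new attribute could be attached to p from outside the class."""
    try:
        p.label = "origin"  # the kind of ad hoc attribute the comment says it rarely needs
    except AttributeError:
        return False
    return True


p = Point(3.0, 4.0)
print(norm_squared(p))       # 25.0
print(try_add_attribute(p))  # False
```

This only enforces the attribute restriction at runtime; it is not a type checker, and whether it helps performance depends on the interpreter.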

The one thing that I miss in RPython is generators. Is there a particular
difficulty because of which it was left out? I assume so, because all the
statically typed variants of Python leave it out as well, but I do not have
enough background to know what that difficulty might be. The other is Numpy :)
but I believe you guys are working on that.

@wbhart A lot of what you ask is answered in some detail in the PyPy blog
<http://morepypy.blogspot.com/>. I wouldn't call PyPy less well known, though.

~~~
kingkilr
[http://stackoverflow.com/questions/6277174/are-generators-su...](http://stackoverflow.com/questions/6277174/are-generators-supported-in-rpython/6278734#6278734)
explains why we don't have generators in RPython.
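Independent of the reasons given in that answer, anything written as a generator can be rewritten by hand as an iterator class that keeps its state in explicit attributes, which is one way to express the same logic without generators. A sketch in ordinary Python 3 (names hypothetical; RPython of this era was a subset of Python 2, where the method would be `next` rather than `__next__`):

```python
from typing import Iterator


def countdown_gen(n: int) -> Iterator[int]:
    """Generator version: the suspended frame keeps track of `n` between calls."""
    while n > 0:
        yield n
        n -= 1


class Countdown:
    """Hand-written equivalent: the state the generator frame held implicitly
    is now an explicit attribute, and __next__ resumes from it."""

    def __init__(self, n: int) -> None:
        self.n = n

    def __iter__(self) -> "Countdown":
        return self

    def __next__(self) -> int:
        if self.n <= 0:
            raise StopIteration
        value = self.n
        self.n -= 1
        return value


print(list(countdown_gen(3)))  # [3, 2, 1]
print(list(Countdown(3)))      # [3, 2, 1]
```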

~~~
dochtman
You should put that on the pypy site somewhere.

------
thristian
Dear PyPy developers: while I'm happy to download binaries and run them out of
my home directory for something I use all the time (like Firefox), for less-
frequently used tools I'd much rather set up an Ubuntu PPA to install things
system-wide and have them kept up-to-date without my having to think too hard
about it.

Unfortunately, the only PyPy PPA I can find on launchpad.net[1] hasn't been
updated for a year. Please set up a new PPA, or update the one you have - a
lot of curious Python tinkerers would love to try out PyPy on their pet
projects!

[1] <https://launchpad.net/~pypy/+archive/ppa>

~~~
rat
Or get into the Debian repos.

~~~
thristian
I discovered that PyPy used to be in Debian, back in the days of PyPy 1.0, but
it was removed because it took so long to build and because it wasn't yet
generally useful. The latest releases of PyPy seem to be a lot more useful,
and hopefully the build-time is being addressed...

~~~
jcoby
I tried building PyPy on my MacBook Pro with 4 GB of RAM and a 2.4 GHz i5. After
over an hour I had to give up and cancel it. It took another fifteen minutes
for the system to recover enough to where I could start using it again.

I don't know what it was doing but there was a python process that was using
every bit of CPU available and a ton of swap.

This same machine can build ruby 1.9.2 in under 10 minutes. It took less time
to download and refresh all of my installed macports (including postgresql
9.0) than it took to attempt to build pypy.

~~~
sirn
Even though the documentation[1] says you need at least 4GB of RAM to build
64-bit PyPy, I've never successfully built PyPy on a 4GB machine. It took
about 20 minutes to successfully build on a machine with 8GB RAM, though.

[1]: <http://pypy.org/download.html>

------
callahad
Yeah, that's it. Time to start using PyPy as the primary implementation for
some toy projects and see how things go.

For folks that want to test their projects against various versions of CPython,
PyPy, Jython, etc., check out tox: <http://tox.testrun.org/>
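tox runs the same test suite under each interpreter you list in its configuration. Inside the suite it can help to know which interpreter you are on and to skip tests whose dependencies are missing there (NumPy on PyPy, for instance, as the reply below notes). A minimal sketch using only the standard library; `PortableTests` and the flag names are hypothetical:

```python
import platform
import unittest

# platform.python_implementation() returns e.g. "CPython", "PyPy", "Jython" or "IronPython".
IMPLEMENTATION = platform.python_implementation()
IS_PYPY = IMPLEMENTATION == "PyPy"

try:
    import numpy  # may be unavailable on some interpreters
    HAVE_NUMPY = True
except ImportError:
    HAVE_NUMPY = False


class PortableTests(unittest.TestCase):
    def test_pure_python(self):
        # Runs under every interpreter tox invokes.
        self.assertEqual(sum(range(10)), 45)

    @unittest.skipUnless(HAVE_NUMPY, "numpy not installed for this interpreter")
    def test_numpy(self):
        self.assertEqual(int(numpy.arange(10).sum()), 45)


if __name__ == "__main__":
    print(f"Running under {IMPLEMENTATION}")
    unittest.main()
```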

~~~
timtadh
I tried using it a month or so back and had a great time.

Caveats:

* You have to use a special patched version of virtualenv.

* NumPy is unavailable for PyPy. For various reasons the PyPy folks are just going to have to re-implement it.

The lack of numpy turned out to be a deal breaker for me in the end. I look
forward to using it again once their version of NumPy is ready for
consumption.

~~~
kingkilr
No longer true: virtualenv 1.6.1 supports PyPy nicely!

------
sqrt17
This reminds me that most of what PyPy delivers today (a 10x speedup and the
full range of the Python language) was available within CPython in a module
called Psyco that you could easily install. Like PyPy for a long time, Psyco
was only available on x86.

Psyco's author, Armin Rigo, then got disgusted with Psyco and went on to work
on PyPy (possibly for more sanity and better funding).

So, yes, I'd be quite happy to use PyPy, even by default, if it was as easy to
install as CPython (or ghc, for that matter, which is a bit larger than
CPython but also quite easy to install) and worked out of the box. I
definitely was when it came to Psyco.

------
DrJ
If anything, PyPy's blogs and updates are really interesting to read. (not to
mention the awesome work those guys are doing!)

------
scorpion032
What is PyPy's plan of action for Py3k?

------
ohyes
You cannot write a static compiler for Python, because Python does not have
static typing; it is a dynamic language. So that part is a tautology.

I suspect the author is trying to say that you cannot write a compiler to
machine code for Python. This is wrong.

Compiling dynamic languages to machine code has been done dozens of times in
languages with equivalent or greater dynamic properties (Common Lisp, Scheme,
and Smalltalk, for example). One proof of concept: Python has been implemented
in Common Lisp as a DSL (via macros), and it runs on the Lisp implementations
that compile to machine code.
([http://common-lisp.net/project/clpython/manual.html#compilin...](http://common-lisp.net/project/clpython/manual.html#compiling-before-running))

The truth is that no one can be bothered to do it, because there is little to
be gained from a faster python implementation. All of the slow code is in
little parts that can be rewritten in C (or whatever faster language... so
almost any language).
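Whatever one thinks of the claim itself, the first step in that workflow is finding which "little parts" actually dominate the runtime. A minimal sketch with the standard library's `cProfile` and `pstats`; `slow_inner`, `workload`, and `hottest_function` are hypothetical names, and rewriting the hot function in C is not shown:

```python
import cProfile
import pstats


def slow_inner(n: int) -> int:
    # A deliberately hot pure-Python loop: the kind of "little part" one might move to C.
    total = 0
    for i in range(n):
        total += i * i
    return total


def workload() -> int:
    # Calls the hot function many times so it dominates the profile.
    return sum(slow_inner(20_000) for _ in range(50))


def hottest_function(func) -> str:
    """Profile func() and return the name of the function with the largest self time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    stats = pstats.Stats(profiler)
    # stats.stats maps (filename, lineno, funcname) -> (cc, nc, tottime, cumtime, callers);
    # index 2 of the value is tottime, i.e. time spent in the function itself.
    key, _ = max(stats.stats.items(), key=lambda item: item[1][2])
    _filename, _lineno, funcname = key
    return funcname


if __name__ == "__main__":
    print(hottest_function(workload))  # slow_inner
```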

The mentioned Python compiler projects are all 'research,' as far as I can
tell. Doing something that is actually known to work and is difficult would be
of no use to someone who is interested in tenure.

~~~
masklinn
> Compiling dynamic languages to machine code has been done dozens of times in
> languages with equivalent or greater dynamic properties ([...] and
> Smalltalk, for example)

Do you have sources for a static compiler for Smalltalk? Considering how
dynamic that language is I have some trouble imagining such a thing.

Please note that Alex is not talking about JIT compilers here (for good
reason: PyPy has a JIT compiler), but about static compilers.

> The truth is that no one can be bothered to do it, because there is little
> to be gained from a faster python implementation.

That's a joke, right? A significant part of PyPy's effort _is_ a faster Python
implementation.

> The mentioned Python compiler projects are all 'research,' as far as I can
> tell.

The only "Python compiler projects" mentioned are ShedSkin and Cython (and
both actually compile Python-like languages; neither pretends to compile
Python), and neither is a research project: both have purely practical goals
(although ShedSkin is completely experimental at this point).

~~~
ohyes
> Do you have sources for a static compiler for Smalltalk? Considering how
> dynamic that language is I have some trouble imagining such a thing.

Smalltalk always uses a virtual machine, it does not always use a JIT.

I said a static compiler doesn't make sense for a dynamic language (saying you
can't do it is tautological, it is like trying to get dry water).

I am talking about dynamic compilation to machine code (not JIT). From that,
you can alter how much code inlining and optimization happen in nested calls.
It is a much-used technique and I do not need to prove its validity.

Everyone in here seems blind to the possibility, which puzzles me.

~~~
MostAwesomeDude
JIT _is_ dynamic compilation to machine code. Feel free to explain why your
technique is not JIT, though.

~~~
ohyes
This is not a JIT.

<http://paste.lisp.org/+2N29>

------
lysol
I started messing around with PyPy and love the idea of it. Is there a good
Postgres lib for PyPy yet? I've been hooked on Psycopg2 since I got into
Python and would love to take advantage of the speed boost from PyPy for my
surrounding code.

~~~
fijal
<https://bitbucket.org/alex_gaynor/pypy-postgresql/overview> unfortunately a
fork (a well maintained one at least)

~~~
StavrosK
I was going to ask "why 'unfortunately'?" and then I realized that you mean a
fork of pypy, not of psycopg2. Unfortunately indeed...

------
mattlong
So I really like python and would love to contribute to a project like PyPy,
but I don't have much (aka any) experience writing interpreters, compilers,
etc. What's the best way to start learning about these things?

------
glenjamin
Does anyone know what the state of PyPy with regards to Python 3 is? I had a
quick google around but could only find an april fool's joke from 2008!

------
schiptsov
_Perfect is the enemy of good enough._ The big point about CPython (especially
3.x; those who are still stuck on 2.x are, well, just stuck on 2.x) is that it is
good enough, and was designed to be good enough and simple. Need speed? Write
an extension in C. It is simple, and it was designed to be so.

Most people still don't get it. CPython 3.x is good enough for its purpose
and its goals. It's evolving according to its philosophy
(<http://www.python.org/dev/peps/pep-0020/>) and it is really really good (If
you understand some general principles like The Middle Way, Being Good Enough,
Divide Et Impera and so on).

Making things too complicated is as bad as making them naively oversimplified.
^_^ And porting everything to JVM is just some kind of sport.

Being simple and extensible, while staying close enough to the hardware and
using the optimizations provided by the OS, is much better.

~~~
masklinn
What are you talking about? And why are you even mentioning the JVM?

~~~
schiptsov
Yeah, I was influenced by my previous comment. Original article didn't mention
JVM.

But there is a bigger view - if most of the modules are mere wrappers around
plain C libraries, it is a very ineffective approach to try to use some
complicated VM while you must do zillions of FFI calls. That is of no use.

So, in my opinion, _for a scripting language_ fast and efficient module calls
are more important than any modern JIT stuff, when your modules are mostly
plain .so files.
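For reference, this is roughly what one such FFI call looks like from Python using the standard `ctypes` module (which PyPy also provides). A minimal sketch assuming a POSIX system where `ctypes.util.find_library("c")` locates the C library; `c_strlen` is a hypothetical wrapper name, and the timing lines only show how you might measure per-call overhead, not what the numbers will be:

```python
import ctypes
import ctypes.util
import timeit

# Assumption: a POSIX system where find_library("c") locates the C library.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declaring the signature lets ctypes convert the argument and return value correctly.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Call C's strlen through the FFI: each call crosses the Python/C boundary."""
    return libc.strlen(data)


if __name__ == "__main__":
    payload = b"hello, world"
    print(c_strlen(payload), len(payload))  # 12 12
    # Per-call cost of ctypes vs. the built-in len(); results vary by machine and interpreter.
    print("ctypes strlen:", timeit.timeit(lambda: c_strlen(payload), number=100_000))
    print("builtin len:  ", timeit.timeit(lambda: len(payload), number=100_000))
```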

btw, this is yet another point where Java sucks. If you are re-implementing
everything in Java, that is probably OK (if you don't care about performance.
NIO2 is still just a spec), but if you wish to call any code outside JVM - it
sucks. The approach itself is deeply flawed. Look what mess JDBC is.

It is so obvious that I am really disappointed by the level of discussion on
HN.

~~~
scott_s
If the scripting language itself is fast, then the game changes completely.
_That_ is obvious to the rest of us, and motivates this work.

