

NumPy in PyPy - status and roadmap  - kingkilr
http://morepypy.blogspot.com/2011/05/numpy-in-pypy-status-and-roadmap.html

======
andrewcooke
it's a pity that they need to implement this in RPython. if it really is for
only a tiny set of functionality (like SSE functions) couldn't they just
implement that part? perhaps in a general sse library that other people could
use? or as PyPy extensions?

~~~
wtallis
I'm not sure I understand what you're saying here. The stuff that has to be
rewritten is everything in Numpy that is performance-critical and currently
written in C. Given the nature of Numpy, that's almost everything.

The only alternative would be to make PyPy compatible enough with CPython to
be able to load Numpy, but that would prevent you from taking advantage of
many of PyPy's best features.

The only tiny set of functionality referenced in the article is the portion of
Numpy which has already been rewritten as a proof of concept. Much, much more
still has to be done before there is a generally useful version of Numpy for
PyPy.

~~~
andrewcooke
the idea behind pypy is that _python_ code should be efficient. so you would
expect them to rewrite numpy in python.

they are not doing that. they are using rpython instead. rpython is lower
level. it is what is used to write the core of pypy.

which raises the question - when should people use rpython rather than python?
and that is answered, at the link, by someone saying that people should almost
always use python, not rpython. the need for rpython here is based on low
level considerations like access to sse instructions.

[edit: where i wrote "couldn't they just implement that part?" i meant
"couldn't they just implement that part _in rpython and the rest in python_?".
hopefully that and the above clarifies things.]

~~~
DasIch
What about hints to the JIT and other things you can do at that level? This is
a decision being made by people who have worked on PyPy for years; if anyone
is able to make a good decision about when to use rpython instead of python,
they are.

~~~
andrewcooke
if hints to the jit are important here, why aren't they going to be important
in other places too? is numpy really so unusual? or is every performance-
critical library going to end up in rpython?

and sorry if thinking for myself rather than blindly trusting others offends
you.

~~~
baltcode
The point is that most of regular numpy (for CPython) is not written in
Python anyway; it is actually written in C. For PyPy, they will be writing
major portions of it in RPython instead of C, and RPython is much higher
level than C.

~~~
andrewcooke
i know numpy is largely written in c. i am not sure whether i am incompetent
at explaining myself, or if people simply are not reading what i post.

my point, again, is that pypy is supposed to give speed comparable to c
(particularly in tight loops where types don't change, which is what you would
imagine numpy to consist of) without needing to write in either c or rpython -
you can just use python.
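For concreteness, the kind of code in question is a type-stable tight loop like the one below: every iteration does the same float arithmetic, so a tracing JIT such as PyPy's can in principle compile it to fast machine code. This is a minimal sketch; the function name `dot` is hypothetical and not part of numpy or PyPy.

```python
def dot(xs, ys):
    """Dot product of two equal-length lists of floats, as a plain Python loop."""
    if len(xs) != len(ys):
        raise ValueError("vectors must have the same length")
    total = 0.0
    # Every iteration performs the same float multiply-add, so the types
    # seen inside the loop never change.
    for i in range(len(xs)):
        total += xs[i] * ys[i]
    return total


print(dot([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))  # 32.0
```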

and this is confirmed by the poster on that thread, who said that numpy was
exceptional, and that other people should not need to use rpython... my
original comment, then, was that perhaps whatever was unusual could itself be
abstracted out and made available at the python level.

it's really not that hard to understand, is it? :o(

~~~
wtallis
The "unusual code" portion of numpy is the entirety of numpy. It's BLAS for
Python. Once it's well-optimized, basically any other numerical code or
libraries for python (things like scipy and probably large portions of PIL,
and applications built on them) should perform reasonably well when written
in pure python and run on PyPy. The JIT in PyPy should be able to offer good
performance for non-numerical, dynamic, branchy code, so you will very rarely
have to drop down to a lower-level language (C or RPython) except for FFI.

JITs can help with tight loops that can't be identified and optimized at
compile time, but that's not what Numpy is composed of, and code using numpy
shouldn't have to go through several iterations before the Numpy routines get
identified as hotspots and optimized. Numpy implements algorithms that are
designed to be easy to statically compile into machine language code that
makes efficient use of memory, FPU, and cache resources.

~~~
andrewcooke
i still don't understand why the jit can't _also_ do a decent job on "easy"
code. i can see that sse and, as you mention, the need for warm up would be an
issue. but if that is the case why not include ways to support sse and flag
that assumptions about types are reliable in _python_? (or, if that's just too
hard, make it _easy_ to mix rpython and python - perhaps i am mistaken here
but last time i looked that seemed to be completely undocumented and
apparently hard).

it's often the case in numerical code that there's some algorithm that you
can't convert into standard matrix routines, but you still want it to run
fast. when you're using BLAS + fortran (or c) that's not a problem. but here
it will be.
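As an illustration of the kind of algorithm meant here, a recurrence such as exponential smoothing makes each output depend on the previous one, so it does not map directly onto a standard matrix routine; with BLAS + fortran (or c) you would simply write the loop. The sketch below writes that loop in plain python, which is the form one would hope PyPy's JIT could make fast. The function name `exponential_smoothing` and the choice of algorithm are illustrative assumptions, not taken from the linked post.

```python
def exponential_smoothing(xs, alpha):
    """Return the exponentially smoothed series of xs.

    y[0] = xs[0]; y[i] = alpha * xs[i] + (1 - alpha) * y[i - 1]
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not xs:
        return []
    ys = [xs[0]]
    for x in xs[1:]:
        # Loop-carried dependency: each output needs the previous output,
        # so the loop cannot be replaced by one element-wise array operation.
        ys.append(alpha * x + (1.0 - alpha) * ys[-1])
    return ys


print(exponential_smoothing([4.0, 8.0, 0.0], 0.5))  # [4.0, 6.0, 3.0]
```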

sure, that was also an issue with numpy - but why can't pypy be _better_ than
old numpy?

[i am not trying to ask for the technically impossible here - i'm saying that
numpy is not so amazingly different that rpython isn't going to be necessary
for other projects. either make rpython accessible, or do numpy in some other
way - what bugs me is the "we can use rpython because we're special, but you
can't because you don't need the power" attitude]

