For those who don't know, Travis Oliphant is the creator of NumPy and(?) SciPy.
It's a good read and it puts some of the issues with a port into perspective.
While I generally disagree strongly with Travis about how far PyPy can go, I wonder what kind of perspective you're talking about?
"NumPy is just the beginning (SciPy, matplotlib, scikits, and 100s of other packages and legacy C/C++ and Fortran code are all very important)"
I'm not that familiar with matplotlib and not familiar at all with scikits. But the point is that there is a lot of other C/Fortran code that users of NumPy rely on. How much do you gain by porting NumPy to PyPy? (Not a rhetorical question... I'm genuinely curious why the PyPy folks have chosen this as a goal.)
PyPy team, if you're out there, please don't take my question as criticism -- it's not. I'm just genuinely curious. Congrats on getting the funding and keep doing what you love!
NumPy that's faster is already very interesting for many people, because you don't have to go to great lengths to shift code to C or Cython just to experiment. Besides, it integrates seamlessly with your current stack, which might be in Python.
Regarding the low-level API: calling C/Fortran from PyPy's numpy should be dead easy via, say, ctypes. You should be able to call whatever C libraries you wish.
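To illustrate the kind of thing being described, here is a minimal sketch of calling into a shared C library via ctypes (using the standard C math library as a stand-in; the library name varies by platform):

```python
# Minimal sketch: calling a C function from Python via ctypes.
# On PyPy this path avoids the CPython C API entirely.
import ctypes
import ctypes.util
import math

# Locate the C math library (libm on Linux; on other platforms the
# math functions may live in the main C runtime, hence the fallback).
libm_path = ctypes.util.find_library('m') or ctypes.util.find_library('c')
libm = ctypes.CDLL(libm_path)

# Declare the signature so ctypes converts arguments correctly.
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

print(libm.cos(0.0))                          # 1.0
print(abs(libm.cos(math.pi) + 1.0) < 1e-12)   # True
```

The same pattern extends to Fortran libraries compiled with a C-compatible ABI; the only extra work is declaring `argtypes`/`restype` for each function you call.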
Matplotlib, SciPy and scikits should be relatively easy to get working to some extent using hacks like this - http://morepypy.blogspot.com/2011/12/plotting-using-matplotl...
As for other stuff - well, if it depends too much on the CPython C API, port it. It's not that hard, and once you have a respectable Python runtime you can do it; it has been done.
Just because we won't support all possible users from day one does not mean we should not try. There are very valid use cases where people shy away from Python because as soon as you write a loop in Python, everything slows to such a crawl that you can't even run experiments. I personally believe Cython is not the answer here and that you actually need full Python for most of it, especially for inexperienced users, so we're primarily targeting the niche that can't possibly be attacked by any solution based on CPython.
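A hypothetical example of the kind of plain-Python loop being described: trivially expressible, but slow enough under CPython's interpreter that people reach for C, while a tracing JIT can compile the hot loop close to native speed:

```python
# A plain-Python inner loop: sum of squared differences between two lists.
# On CPython every iteration pays interpreter dispatch and float-boxing
# overhead; a JIT like PyPy's can compile this loop down to machine code.
def sum_sq_diff(xs, ys):
    total = 0.0
    for x, y in zip(xs, ys):
        d = x - y
        total += d * d
    return total

xs = [float(i) for i in range(1000)]
ys = [float(i + 1) for i in range(1000)]
print(sum_sq_diff(xs, ys))  # 1000.0 (each pair differs by exactly 1)
```

The numpy idiom avoids the loop by vectorizing, but the argument above is that users shouldn't have to restructure their code that way just to get acceptable speed.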
As for other stuff - numpy, even if you vectorize everything, is nowhere near the speed of C. We're trying to attack that as well, and eventually even surpass C.
This is pretty much it, feel free to ask more questions.
Yes, there are generic operations in NumPy that you can speed up with specific code in C (or any other compiled language). In addition, there is much low-hanging fruit to optimize in NumPy as well (which we at Continuum are working on as I write this).
Fijal, I know you are enthusiastic about PyPy, and you should be --- it's a cool system. But please don't spread misinformation. There are a lot of people who don't understand enough about the details of what you are talking about, and you are just going to alienate them once they realize that you don't have all your facts about NumPy clear.
For people who only make occasional use of NumPy, PyPy and its version of numpy will likely be fine. But those people should be well aware that they are intentionally remaining outside the larger Python/NumPy ecosystem (Matplotlib, SciPy, scikits, etc.), and it will be a long haul to build the features in PyPy to enable that ecosystem to migrate (and that assumes the individual projects decide it's even worthwhile to do so).
You consistently spread the rumor that we intend to reimplement all of scipy/matplotlib/scikits etc. in RPython, and this is plainly false. I think those projects are completely reusable using one hack or another -- for example, the blog post I posted, where within a day I was able to draw basic stuff using matplotlib on PyPy. We seriously want to reuse as much code as possible from the entire ecosystem, but part of the project is also to provide people with a really fast Python that can perform numeric computations.
Also, which facts about numpy didn't I get clear?
Isn't numexpr a good solution for that problem? numexpr is much, much simpler than PyPy, so if it can reduce the performance overhead of complex vector operations with loop fusion, that seems like a huge win.
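A sketch of what loop fusion means here, written in plain Python for illustration: a vector expression like `2*a + 3*b` normally materializes an intermediate array per operation, while a fused evaluation (the numexpr approach) makes a single pass with no temporaries:

```python
# Unfused: each operation makes a full pass and allocates a temporary
# array, which is roughly what chained numpy expressions do.
def unfused(a, b):
    t1 = [2 * x for x in a]                    # temporary array
    t2 = [3 * y for y in b]                    # another temporary
    return [x + y for x, y in zip(t1, t2)]     # third pass

# Fused: one pass, no intermediates -- the kind of evaluation numexpr
# performs when handed the expression "2*a + 3*b" as a string.
def fused(a, b):
    return [2 * x + 3 * y for x, y in zip(a, b)]

a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]
print(unfused(a, b) == fused(a, b))  # True: same result, fewer passes
```

On large arrays the unfused version also pays for extra memory traffic, which is where most of numexpr's win comes from.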
One bit of context that is missing from this discussion (although Travis did allude to it earlier) is that we are actively working on building robust deferred computation support into Numpy, and to make these run much faster than what hand-tuned C can provide, via a variety of mechanisms.
(Disclaimer: I also work with Travis at http://continuum.io, along with the author of Numexpr and PyTables. :-)
Here is a real measure of simplicity: how long does it take to explain to a NumPy user how to wrap an array expression in a string, versus explaining how a JITting compiler works, how to interface its runtime to their existing Python installation, how to build it, and what the limitations of RPython are?
Heck, I'm an actual developer (not a scientific programmer) and it took me a little while to understand what PyPy does.
I'm a scientist, not a scientific programmer or a developer, and this is all I really care about: PyPy is currently -- in its partially implemented state -- much, much faster than CPython on the vast majority of things it can do. If I am able to use PyPy's NumPy and it's faster than traditional NumPy, I will do so as long as the opportunity cost doesn't outweigh the speed increases (NumPy is pretty useless to me -- maybe not to some -- without SciPy and matplotlib).
I don't care that PyPy is written in RPython any more than I care that it has a JIT, or that CPython is written in C. I also don't care how that JIT works, or how CPython compiles to bytecode, or how Jython does magic to make my Python code run on the JVM. I do care that it "works," as a scientist. I do care that they are "correct" implementations, as a scientist. As an individual, I am interested in the inner workings of PyPy and CPython, CoffeeScript, Go, and Brainfuck, but when I'm working on research the only thing that actually matters as far as the language implementation is concerned is that it just works. The interpreter is just a brand of the particular tool that I'm using.
I would certainly prefer it if PyPy were 100% compatible with the CPython C API, even if it ran at 80% (maybe even 60%) of CPython's speed, because then I wouldn't even have to think. I'd be using PyPy because it's faster overall and I could do the analyses I want faster.
Anyway, I think if you're explaining all of what you mentioned to a NumPy user or a PyPy NumPy user, you'd be doing it wrong. Or maybe the PyPy folks would be doing it wrong. Because this is how that conversation would go with my peers:
Sad Panda: "Ugh my code is running slowly I think I have to jump into C"
Me: "Have you tried PyPy's NumPy yet?"
Sad Panda: "What's that?"
Me: "It's faster Python and NumPy. Go here [link] and download it see if it runs your code faster"
Sad Panda: "Okay I'll do that"
..a while later..
Sad Panda: "It was a little faster, but I ended up getting one of the CS guys to help me run it on a tesla cluster with OpenCL. But I think I can use it on the spiny lumpsucker data I'm collecting."
While part of me agrees with this, if PyPy starts sacrificing performance for CPython compatibility then pretty soon it'll degenerate into CPython.
I know very little about the development side of PyPy or Numpy, but I know that at this moment in time Numpy/Scipy/Cython have revolutionized how I do research on a day-to-day basis. It's unfortunate that there seems to be such animosity surrounding this issue.
I don't think I'm highly dismissive of the numpy/scipy/cython community. I wouldn't be implementing all this stuff if I didn't think those APIs are good and the future of scientific computing; they just lack a reasonable replacement for C. If Python is to surpass Fortran in the scientific field, it really does need a way to express fast algorithms in Python, and I don't think CPython can provide that.
Personally, I don't like Cython as a way to speed up Python, because it sacrifices the beauty of the language in favor of performance. I think investing time in the Python VM is time much better spent, but this is a very personal opinion and I won't blame people who think otherwise. I think Cython is a better way to call C than all the other options that exist right now (like using the CPython C API or ctypes), but that is entirely different from using it for speedups.
The proposed way so far has been "everything must be 100% backwards compatible, otherwise it won't work". This is all well and good, but I'm not aware of a way to make things both 100% compatible and fast, so we decided to break some compatibility, like the CPython C API, or reusing most of what's implemented in C in numpy. I have argued many times for those decisions, but in short -- there has to be a bit of breakage before we can make a leap. This is not based on dismissal of other people's opinions, especially those who have spent tons of time with the scientific community; it's just that I don't see a way forward that does not introduce some breakage.
Actually, Fijal, I have a question - and this is related to our previous Skype discussion as well. I know that a lot of the work on PyPy has been on the JIT, but have you guys really ever pursued the idea of just building PyPy as a front end for LLVM? All your type inferencing logic would probably be a lot more code than what a traditional LLVM front-end normally consists of, but you'd get to leverage all of the massive community efforts on LLVM's optimized code generation and other backend optimizers.
Just a thought..
Has that changed?
Unfortunately I don't have the regex at hand; I just remember it was something with groups and backtracking.
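A hypothetical stand-in (not the commenter's actual regex, which is unknown) for the kind of pattern with groups and backtracking that can behave very differently across regex engines:

```python
import re

# Nested quantifiers like (a+)+ create exponentially many ways to
# partition the input into groups, which a backtracking engine
# explores exhaustively when the overall match fails.
pattern = re.compile(r'(a+)+b')

# Matching input: cheap, the first attempt succeeds.
print(bool(pattern.match('aaaab')))   # True

# Near-miss input (no trailing 'b'): the engine backtracks through
# roughly 2^n group partitions before giving up; keep n small here.
print(pattern.match('a' * 18))        # None
```

Differences in how each implementation handles this backtracking are exactly the sort of thing that makes the same pattern fast on one runtime and pathological on another.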
http://pypy.org/compat.html mentions compatibility in terms of the standard library. This is fine for (web) servers and command-line applications.
But what about desktop applications? Can I (someday) take PyQt or PyGTK code and run it on PyPy without modifications?
cpyext is a hack to let PyPy run Python/C API modules. It works for some, but not all, and it's slow. I'm sure it'll keep getting better, but not sure if it'll ever be good enough to run PyGTK or PyQt.
PyPy has good support for ctypes, so ctypes bindings are a good option. There are projects out there like pygir-ctypes and ctypes-gtk. One of them just needs to become complete enough to be a good choice for GTK programming. Compatibility with new PyGObject is more likely than compatibility with legacy PyGTK, though.
PyQt is harder because it's C++.
I expect to see quite a lot of improvement in that area as a side effect of the work on numpy.
Also, when I looked at PyPy, the re implementation was in C and looked so similar to CPython's that I assumed it was the same or closely related. But I didn't pay much attention, so I may be mistaken.
Roughly. It actually runs 6x faster on recent PyPy than CPython.