

A beginner's guide to using Python for performance computing - milvakili
http://www.scipy.org/PerformancePython
How fast can we get with Python? By the way, I'm a Perl geek :)
======
hugh3
I once tried taking one of my C++ simulation codes (dealing with
diagonalization of big memory-hogging matrices) and rewriting it using scipy.
It was a _lot_ slower.

I don't have a good idea of exactly what situations you'll find scipy being
significantly slower than C++. Maybe it was the fact I was playing with
thousands-by-thousands arrays. But I do worry that there are folks out there
burning millions of CPU hours running scipy without realizing that they really
ought to be burning mere 10^5s of CPU hours.

~~~
stephth
I often wonder why there is so little interest in a high level language that
would compile to pure C++, with a syntax that makes code more succinct and
friendlier, just like CoffeeScript compiles to JavaScript. We have a clear
problem: higher level languages make programmers more productive, but they
have performance issues. Because of those, developers that need performance
fall back to C++ (look at videogame development, C++ is there to stay).

So why not make a language that is high level, but compiles to pure, cross
platform, C++? Think of the productivity win: a higher level language that
rarely needs optimizations. Obviously it would have to come with some lower
level semantics, like memory management (although it could maybe be solved
somehow like Apple did with Automatic Reference Counting, which is basically a
preprocessor) or static typing (although the compiler can sometimes be
instructed to guess the type, see the := operator in ooc [1]), but even if
some lower level semantics are unavoidable, it still seems like a huge win.
Yet every time I've seen the idea mentioned online, it's been either mostly ignored
or treated as a stupid idea, and the very few projects that attempted to go in
that direction never took off. Maybe I'm missing something? If so I'd love to
know what it is.

No matter how fancy and awesome higher level languages are, we keep going back
to C++ for performance, and that's probably never going to change (at least
until processors are stupidly fast). So why not make it easier? The
CoffeeScript approach has largely been proven to work.

[1] <http://docs.ooc-lang.org/language/syntax.html#declarations> (note that
ooc is a good step in this direction: it compiles to C99, but then takes a step
back by depending on a garbage collector)

~~~
SwellJoe
The weave bits, and Pyrex bits, found in this article are actually snippets of
C/C++ written inline in the Python code (or very simplified Python), and able
to access variables and such defined in the Python earlier (with caveats and
when used with care), and later Python code is able to use the resulting
compiled functions.

So, it's high level Python code, except when you need to write low level,
super fast code.

The reality of making a very high level language compile down to high
performance C/C++ is a much harder problem than "compiling" CoffeeScript to
JavaScript. Both are quite high level, and CoffeeScript is merely very concise
syntactic sugar for a number of common patterns in JavaScript, and maybe a few
higher level constructs tacked on for good measure. But, JavaScript is
effectively Scheme with C-like syntax; it was designed to be built up in this
way, just like Lisp and Scheme (and it's only the lack of control over the
parser and syntax that forces it to be a compiler at all; all the capabilities
are there in JavaScript: closures, first class functions, code as data, etc.).
Building DSLs is what this kind of language is _for_. C and C++ are not for
building DSLs, and you don't magically get C performance by converting high
level code to low level code.

One example is B::C, a Perl to C compiler:
<http://www-rohan.sdsu.edu/doc/perldoc-html/B/C.html>

In short, the CoffeeScript approach is not at all comparable to a high-level
to low-level compiler. It is comparable to a, say, Lua to Perl compiler,
perhaps. From one very high level language to another very high level
language.

It's not lack of interest. Weave and Pyrex, which are discussed in this
article, are proof positive that people want the speed of low level and the
convenience of high level languages. It's just not possible to get it via the
means you've described. JITs seem to be the currently fashionable way to get
closer to that goal, though it's still miles away.

Oh, and there _are_ Lisp compilers that compile to lower level languages like
C or compile very fast Lisp. Lisp is quite high level. So, when it goes fast,
it is impressive. I'm not knowledgeable enough to know whether they could be
used in the same contexts of SciPy, Weave, Pyrex, etc. I imagine there are big
math libraries for Lisp, though...and they're probably pretty fast.

As for ooc and garbage collection: How would you write a compiler for a high
level garbage collected language without including a garbage collector? And,
do you believe that garbage collection is the primary reason dynamic languages
are slower than compiled non-GC languages? (Hint: It is not. GC is very far
down the list of resource users in every garbage collected language I'm aware
of.)

~~~
stephth
_It's just not possible to get it via the means you've described._

Why not? It might be harder to implement than CoffeeScript, but ooc is proof that
it's possible. Remove GC and dependent libs, and you still have a
substantially higher level language than C (or C++).

 _As for ooc and garbage collection: How would you write a compiler for a high
level garbage collected language without including a garbage collector?_

I didn't say it should be garbage collected: _Obviously it would have to come
with some lower level semantics, like memory management_. The CoffeeScript
approach is that it shouldn't have dependencies. By high level, I meant higher
level than C++. It will always be lower level than a language like Python:
without dependencies we can't escape all the semantics underlying the
target language. That said, as I mentioned before, memory management could
possibly be handled by something similar to Apple's ARC (and to answer your
last question, I don't think it's that black and white: look at how Apple is
investing in ARC instead of GC; also, GC's unpredictability makes it a no-go for
low level routines in videogames).

~~~
SwellJoe
"Obviously it would have to come with some lower level semantics, like memory
management."

Why? Garbage collection is nowhere near the top reason high level languages
are slow, and there are GC languages that are fast.

"By high level, I meant higher level than c++. it will always be lower level
than a language like python"

So, Java or C#, then? JIT languages that can be as fast as C/C++ in some
circumstances, and faster in others (and slower in still others). And you
don't even have to manage your own memory, so it's seemingly higher level than
you're asking for.

I think GC is fine for high performance computing. Might not be fine for real
time systems or games, but that's not what this article is about. It's about
high performance scientific computing in Python. Having a GC sweep every now
and then, even at unpredictable times, isn't going to cause things to come to
a crashing halt or ruin the performance of a batch run.

PyPy might be something you'd enjoy reading about. It is an alternative
implementation of Python (mostly written in Python, and using a JIT), with the
goal of being fast like C. This may be the closest thing to a
CoffeeScript-like tool where the goal is to make a very high level language
faster through translation or "compilation" or whatever.

~~~
stephth
_"Obviously it would have to come with some lower level semantics, like memory
management."

Why?_

Once again: this approach means zero dependencies. And in a way, you answered
it yourself:

 _Might not be fine for real time systems or games_

And there are more examples where it might not be fine, here's a major one:
building a C++ library. It's true, this article is not about real time systems
(I do apologize for diverging...), but this language could be one solution to
problems approached in this article (it would replace the C++ example).

I am interested in projects like shedskin or pypy, but they have dependencies.
The bottom line to me is: a higher level language that compiles to C++ code
with zero dependencies. Think about it: this language could be used for any
application, deployed in virtually any codebase and any system.

------
pjin
> Finally, for comparison we implemented this in simple C++ (nothing fancy)
> without any Python. One would expect that the C++ code would be faster but
> surprisingly, not by much! Given the fact that it's so easy to develop with
> Python, this speed reduction is not very significant.

This is the problem I have with Python+SciPy v.
MATLAB/C++/YourFavoritePlatform comparisons. I agree that Python code can be
clean and clear, but once you start to worry about vectorizing, inlining,
manual loop optimizations, or a C or Fortran FFI, you end up in the
same situation that, say, MATLAB users face when they write mex functions
in C.

Granted, there are differences between Language A and Language B: one may be
more concise, or its functions may be _even higher-order_ than the other's.
The point is, once you go down the road of "let's optimize this dynamic language
script," your choices are not that different. You lose the "ease" of
developing with Python, and if you really need performance then it's ultimately
inferior to just writing it in C++ or even Fortran.
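
To make the "vectorizing" trade-off concrete, here is a minimal sketch, assuming NumPy is installed. The kernel (averaging each interior cell's four neighbours) and the function names `smooth_loop` and `smooth_vectorized` are chosen purely for illustration and are not taken from the article. Both functions compute the same result; the second swaps the readable double loop for slice arithmetic.

```python
import numpy as np


def smooth_loop(u):
    """Readable version: explicit Python loops over the interior cells."""
    out = u.copy()
    for i in range(1, u.shape[0] - 1):
        for j in range(1, u.shape[1] - 1):
            out[i, j] = 0.25 * (u[i - 1, j] + u[i + 1, j]
                                + u[i, j - 1] + u[i, j + 1])
    return out


def smooth_vectorized(u):
    """Same arithmetic expressed as whole-array slices, so the loops run inside NumPy."""
    out = u.copy()
    out[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1]
                              + u[1:-1, :-2] + u[1:-1, 2:])
    return out


if __name__ == "__main__":
    grid = np.random.default_rng(0).random((50, 50))
    # Both versions should agree up to floating-point rounding.
    print(np.allclose(smooth_loop(grid), smooth_vectorized(grid)))
```

The sliced form is typically much faster on large grids, but it is arguably harder to read and to get right, which is the cost being described here.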

~~~
andreasvc
I think you're forgetting the scenario where you have a lot of Python code and
only a small part is performance critical. You can speed that part up with
something like Cython, and directly integrate it with the rest of the code. If
you had to rewrite it all in C++, it would be a tedious and bug-prone
process; the same goes for ad-hoc interfacing through text files.
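
A minimal sketch of the first step in that scenario: finding the small performance-critical part before reaching for something like Cython. It uses only the standard-library `cProfile` and `pstats` modules. The functions `parse_records`, `hot_inner_loop`, `run` and `profile_run` are hypothetical stand-ins, not code from the article.

```python
import cProfile
import pstats


def parse_records(n):
    # Cheap bookkeeping: stands in for the bulk of ordinary Python code.
    return [(i, i % 7) for i in range(n)]


def hot_inner_loop(records):
    # Deliberately heavy numeric loop: stands in for the performance-critical part.
    total = 0.0
    for i, k in records:
        for j in range(200):
            total += (i * j + k) % 13
    return total


def run(n=2000):
    return hot_inner_loop(parse_records(n))


def profile_run():
    """Profile run() and return (result, {function name: cumulative seconds})."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = run()
    profiler.disable()
    stats = pstats.Stats(profiler)
    # stats.stats maps (file, line, name) -> (calls, primitive calls,
    # total time, cumulative time, callers); index 3 is cumulative time.
    cumulative = {func[2]: row[3] for func, row in stats.stats.items()}
    return result, cumulative


if __name__ == "__main__":
    _, times = profile_run()
    for name in ("parse_records", "hot_inner_loop"):
        print(f"{name}: {times[name]:.4f} s")
```

In a profile like this, only the function that dominates the cumulative time would be a candidate for rewriting; the rest can stay as plain Python.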

------
spatters
Cython is pretty excellent for this sort of thing. The article contains a link
to a Cython solution written by Travis Oliphant showing how concise (and fast)
it is.

I just used Cython to speed up a non-vectorisable bit of code in a Gibbs
sampling algorithm and I was impressed by the performance increase and ease of
use.

------
kingkilr
I wrote an implementation of this using Python's array module, and doing
manual index calculations for PyPy:
<http://www.reddit.com/r/programming/comments/hh8uj/a_beginners_guide_to_using_python_for_performance/c1vgqdo>.
We'll soon have NumPy multi-dimensional arrays and thus can just run that
code, but for now... we're doing OK I think.
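
For readers unfamiliar with the technique, here is a rough sketch of what "the array module with manual index calculations" can look like. This is not the linked implementation. The helper names `make_grid` and `relax`, and the neighbour-averaging kernel, are assumptions made for illustration only.

```python
from array import array


def make_grid(rows, cols, fill=0.0):
    # One flat, typed buffer of C doubles instead of a list of lists.
    return array("d", [fill]) * (rows * cols)


def relax(grid, rows, cols):
    """One in-place sweep: each interior cell becomes the mean of its four neighbours."""
    for i in range(1, rows - 1):
        base = i * cols  # manual index calculation: element (i, j) lives at i * cols + j
        for j in range(1, cols - 1):
            k = base + j
            grid[k] = 0.25 * (grid[k - cols] + grid[k + cols]
                              + grid[k - 1] + grid[k + 1])
    return grid


if __name__ == "__main__":
    rows, cols = 3, 3
    g = make_grid(rows, cols)
    g[1] = 4.0  # set the cell directly above the centre
    relax(g, rows, cols)
    print(g[1 * cols + 1])  # centre cell
```

Keeping the data in a single flat buffer of doubles gives a JIT such as PyPy a simple, uniformly typed loop to optimize, at the cost of doing the 2D-to-1D index arithmetic by hand.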

------
kyt
One of the advantages of implementing these types of problems in C++ is that you
can use compile-time optimizations, often resulting in a 2-3x speed-up.

The results may be a lot different when compiling with just -O2 -ffast-math.
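
Some context on why `-ffast-math` can change a benchmark: among other things, it allows the compiler to regroup floating-point operations, which strict IEEE semantics forbid because floating-point addition is not associative. The short Python sketch below only illustrates that non-associativity; it does not invoke a C++ compiler, and the variable names are arbitrary.

```python
a, b, c = 0.1, 0.2, 0.3

left_to_right = (a + b) + c  # the order the source code spells out
regrouped = a + (b + c)      # a regrouping that a reassociating compiler might pick

print(left_to_right == regrouped)  # prints False
print(left_to_right, regrouped)
```

The two sums differ only in the last bit, but that freedom to reorder is part of what lets the compiler vectorize reductions, so timings with and without the flag are not directly comparable.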

