

Expensive lessons in Python performance tuning - iskander
http://blog.explainmydata.com/2012/07/expensive-lessons-in-python-performance.html

======
zobzu
When you need to call into C code, you use ctypes. It seems to be a common
idea to use "whatever is the most fancy latest funny lib" because it's
_necessarily_ better (and the same goes for some older well-known libs which
are useful but may also be rather slow).

Obviously, it's often not better, as the author mentions.

In my experience with python, a lot of functions are directly mapped to C code
and those are nearly as fast as well-written C code. Some others call Python
code in between, and those are obviously much slower.

That's the _main_ thing to know when you want speed. So if you start using
fancy python classes and other crap on top, it's going to be slower and
slower, and if it's code that's called a lot, it's going to hurt hard.
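The gap between C-backed builtins and interpreted loops is easy to measure; a minimal sketch (the names and sizes here are just illustrative):

```python
import timeit

data = list(range(100_000))

def py_sum(xs):
    # interpreted loop: bytecode dispatch and boxed ints on every iteration
    total = 0
    for x in xs:
        total += x
    return total

t_builtin = timeit.timeit(lambda: sum(data), number=50)  # sum() runs in C
t_loop = timeit.timeit(lambda: py_sum(data), number=50)  # pure-Python equivalent
print(f"builtin: {t_builtin:.3f}s  loop: {t_loop:.3f}s")
```

On a typical CPython build the C-implemented `sum` comes out several times faster than the hand-written loop, even though both do the same work.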

When some code needs to be fast and there are no functions that call C code
directly, ctypes works just fine.
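For that case, the ctypes route can look like this; a minimal sketch, assuming a standard C math library can be located on the system:

```python
import ctypes
import ctypes.util

# locate and load the C math library (resolution is platform-dependent)
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# declare the C signature: double cos(double)
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

print(libm.cos(0.0))  # the call goes straight into compiled C
```

Declaring `argtypes`/`restype` is the part people tend to skip; without it ctypes guesses the types and can silently corrupt arguments.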

Now there are _some_ libraries which are well made (performance-wise), but
that's not the norm. It's actually pretty damn rare in my experience.

~~~
16s
Boost.Python is fabulous too if you want to call C++ routines. We do a lot of
heavy lifting with that and use plain Python for higher-level tasks.

------
DanWaterworth
I've used numpy a little and I enjoyed it and it performed ok, but is python
really a good choice for writing this kind of code? I'm really genuinely
interested.

~~~
iskander
I've been doing data analysis in Python for the past 2ish years and I think
it's a great choice. Before Python I worked in Matlab, which simplifies matrix
operations at the expense of making everything else terrible.

The main other contenders here are R and Mathematica, both of which will fail
you when you need to do something that isn't strictly statistical/mathematical.
Python gives you predictable decent performance and the NumPy ecosystem is
awesome for numerical libraries. I've never come across a machine learning
library nearly as well designed as scikit-learn and pandas dataframes are a
lot snappier than R's equivalents. My only gripe is the paucity of good
plotting libraries (matplotlib is impoverished and ugly compared with R's sexy
plotting routines).

Now, I haven't said a word about the faster statically compiled languages: C,
C++, Java, C#, F#, OCaml, Haskell, etc...

The trouble with static languages is that they either lack essential libraries
or don't allow for rapid prototyping (or in some cases, both).

Now, if you're implementing the heart of a numerically intensive algorithm and
your code can't be decomposed into a few already implemented primitives, it
makes sense to write it in C. The first thing to do, though, is to wrap that
native code with a Python interface and test it from python.
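That testing step can be as simple as checking the wrapped routine against a slow but obviously-correct pure-Python reference; a sketch, using `np.dot` as a stand-in for your own wrapped C routine:

```python
import numpy as np

def reference_dot(a, b):
    # deliberately slow pure-Python version, easy to audit by eye
    return sum(float(x) * float(y) for x, y in zip(a, b))

rng = np.random.default_rng(0)
a = rng.random(1000)
b = rng.random(1000)

# np.dot stands in here for the native code you wrapped;
# compare it to the reference within floating-point tolerance
assert np.allclose(np.dot(a, b), reference_dot(a, b))
```

The pattern generalizes: keep the naive version around as an oracle and fuzz the fast one against it on random inputs.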

~~~
carterschonwald
There will be very nice tools rolling out for Haskell early this fall. :-).

(I can't elaborate much because I'm busy writing them presently, but stay
tuned and I think y'all will like what you'll see when the public release
lands)

(one sexy hint though: the value-add of these works in progress is enough that
I'll be able to hire folks full time to work on it with me starting
mid-September or October.)

~~~
carterschonwald
Folks who are intrigued (whether as hypothetical users/customers, or as future
colleagues/collaborators), shoot me an email!

~~~
carterschonwald
Loving the emails :-)

------
ivan_ah
> Yes, weave is unmaintained, ugly, and hard to debug

What is this bad-mouthing of scipy.weave? I am a big fan of this approach:
numpy for everything + a simple weave inline for the innermost loop.

Maybe it is not maintained simply because it works?

~~~
srean
Though I love weave myself, it does not get much love in the numpy community.
There you would be discouraged from using weave and strongly nudged towards
Cython.

The weave source hasn't seen development in ages, whereas Blitz++, the C++
array library it is based on, has moved on quite a bit. Blitz++ has added SIMD
support, or rather restructured its code so that compilers find it easy to
vectorize. The new version of Blitz++ holds its own against Intel Fortran in
terms of vectorization. These are some of the advantages you could have
enjoyed had weave been kept up to date. I don't blame the numpy community for
this: though Blitz++ sees continuous development, there has not been a formal
release in many years, so it does become difficult to incorporate such a
library. But I don't think that is the main reason why weave has languished.

I am sure Cython is great, but what I like about weave is the syntactic sugar
it brings: I do not have to write raw loops or do pointer arithmetic. If you
want this kind of syntactic sugar in Cython, you call back into the numpy API.
If the default API does not give you the speed you want, you have to expose
the raw pointers of the arrays and do the messy pointer arithmetic and
operations yourself. Nothing wrong with that, just that it can be error-prone.

Cython, however, has other good things going for it: for instance, it
coordinates easily with OpenMP, so you can parallelize array updates without
incurring the overhead of multiple processes.
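A sketch of that OpenMP pattern in Cython, assuming the module is compiled with OpenMP enabled (e.g. `-fopenmp` with gcc); the function name and signature are just illustrative:

```cython
# cython: boundscheck=False, wraparound=False
from cython.parallel import prange

def scale(double[:] x, double alpha):
    cdef Py_ssize_t i
    # prange releases the GIL and splits iterations across OpenMP threads,
    # so the in-place update runs in parallel within a single process
    for i in prange(x.shape[0], nogil=True):
        x[i] = alpha * x[i]
```

Because the loop body holds no Python objects, the GIL can be dropped and the threads genuinely run concurrently, which is exactly what a pure-Python thread pool cannot give you here.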

