

Numba vs. Cython: Take 2 - Bootvis
http://jakevdp.github.io/blog/2013/06/15/numba-vs-cython-take-2/

======
StefanKarpinski
Interesting comparison. A couple of points:

1. Fortran code being 2x slower than anything for array operations means
you're not using Fortran right. So this result says more about f2py generating
bad Fortran code than anything else. Labeling that as "Fortran" is
disingenuous – it should be labeled "f2py".

2. I'm surprised that SciPy and Scikit aren't faster than this. You can use
clever techniques to compute pairwise distances asymptotically faster than the
naïve loop approach. For example, see the Julia Distance.jl package [1], which
gets a 125x speedup [2] over a naïve loop implementation for computing
pairwise Euclidean distances. I haven't done a direct comparison, but for
simple scalar loops like this, Julia has C-speed, which means its naïve
version would be about the same as Numba and Cython and the Distance.jl
version would be correspondingly faster. Of course, the amount of speedup
depends on the shape of the matrices, so 125x speedup might not happen in this
particular case.

[1]
[https://github.com/lindahua/Distance.jl](https://github.com/lindahua/Distance.jl)

[2]
[https://github.com/lindahua/Distance.jl#pairwise-benchmark](https://github.com/lindahua/Distance.jl#pairwise-benchmark)

~~~
synparb
I think Jake was pretty straightforward in saying in the post that he isn't a
Fortran expert and was looking for someone to offer up a better version if he
was doing something non-optimal.

Not that Julia ever misrepresented/misused a language in a benchmark on its
front page, say, like calling pure Python code NumPy...
([http://web.archive.org/web/20120215054907/http://julialang.o...](http://web.archive.org/web/20120215054907/http://julialang.org/))

Also, thanks for sharing the Julia distance links. The explanation of how the
speed-up is achieved is really useful:

[https://groups.google.com/d/msg/julia-dev/hd1beLPrsVk/6n88H_...](https://groups.google.com/d/msg/julia-dev/hd1beLPrsVk/6n88H_Iy_y4J)

~~~
mitmatt
That computation is probably something you were already familiar with in
disguise: it's just the law of cosines [1][2] along with the fact that a
matrix of inner products can be computed with a matrix-matrix multiplication
[3].

The first claim is just that for vectors x and y in an inner-product space,
||x-y||^2 = <x-y,x-y> = ||x||^2 + ||y||^2 - 2<x,y>, and the second claim is
just that if you collect a bunch of vectors into the columns of X and another
bunch into the columns of Y, then the (i,j) element of X'Y is <x_i,y_j>, i.e.
the inner product of the i'th vector in X with the j'th vector in Y. (Bonus
fact: you can use that relationship to translate the statement "this set of
pairwise distances can be embedded in a Euclidean space" to an equivalent
condition that some simple matrix is negative semidefinite on a subspace
orthogonal to the all-ones vector. Euclidean metric embedding is very useful!)

[1]
[http://en.wikipedia.org/wiki/Law_of_cosines#Vector_formulati...](http://en.wikipedia.org/wiki/Law_of_cosines#Vector_formulation)

[2]
[http://en.wikipedia.org/wiki/Polarization_identity](http://en.wikipedia.org/wiki/Polarization_identity)

[3]
[http://en.wikipedia.org/wiki/Gramian_matrix](http://en.wikipedia.org/wiki/Gramian_matrix)
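
As a rough illustration of that identity (my own sketch, not code from the
post; the function name `pairwise_sq_dists` is made up), the whole pairwise
computation collapses into a couple of row-wise norms plus one matrix-matrix
multiplication in NumPy:

    # ||x - y||^2 = ||x||^2 + ||y||^2 - 2<x, y>, applied to every pair at once
    import numpy as np

    def pairwise_sq_dists(X, Y):
        # Rows of X and Y are the vectors being compared.
        x_norms = np.sum(X * X, axis=1)         # ||x_i||^2 for every row of X
        y_norms = np.sum(Y * Y, axis=1)         # ||y_j||^2 for every row of Y
        gram = X @ Y.T                          # Gram matrix of inner products <x_i, y_j>
        d2 = x_norms[:, None] + y_norms[None, :] - 2.0 * gram
        return np.maximum(d2, 0.0)              # clip tiny negatives from round-off

    X = np.random.rand(1000, 3)
    Y = np.random.rand(500, 3)
    D = np.sqrt(pairwise_sq_dists(X, Y))        # pairwise Euclidean distances

The single matrix product is the part a BLAS-backed library (or the
Distance.jl benchmark linked above) can make very fast; the clipping at the
end is needed because the expansion can come out slightly negative for nearly
identical vectors due to floating-point round-off.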

------
kenster07
Because Fortran stores arrays in column-major order, the Fortran version's
performance could potentially be improved drastically by transposing the array
(swapping the indices) so that the inner loop over k steps through consecutive
memory locations down a column rather than striding across a row.

from: r = r + (X(i,k) - X(j,k)) * (X(i,k) - X(j,k))

to: r = r + (X(k,i) - X(k,j)) * (X(k,i) - X(k,j))
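
The same consideration applies, mirrored, on the NumPy/Numba side: NumPy
arrays are C-ordered (row-major) by default, so there the summed index k wants
to be the *last* index. A rough sketch of what I mean (illustrative only, not
the code from the post):

    import math
    import numpy as np
    from numba import jit

    @jit(nopython=True)
    def pairwise_dist(X, D):
        n, dim = X.shape
        for i in range(n):
            for j in range(n):
                r = 0.0
                for k in range(dim):        # k runs over the contiguous last axis
                    d = X[i, k] - X[j, k]
                    r += d * d
                D[i, j] = math.sqrt(r)

    X = np.random.rand(500, 3)              # C-ordered by default
    D = np.empty((500, 500))
    pairwise_dist(X, D)

In Fortran the fix is to make k the first index; in a default NumPy array the
equivalent is to keep it as the last one, so the innermost loop touches
consecutive memory either way.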

------
Bootvis
Can someone share experience about using Numba in practice? I'm interested in
use cases and the performance gain achieved.

I guess that once you have compiled the code, the application will be rock
solid. Is that correct?

~~~
dagw
I've tried using it twice on real code and in both cases the numba code ended
up significantly slower than straight python/numpy. Admittedly I didn't bother
digging into what was going on under the covers and I'm sure someone who
really understood numba could probably get it to work better. But as some
simple magic to make your python code faster without rewriting anything, numba
hasn't worked for me on real code.

~~~
synparb
This has been my experience as well for "real world" tests, although I'm still
excited about the project. I think once they get a graphical annotation mode
in place, akin to `cython -a`, it will help quite a bit in figuring out what
is going on.

~~~
onalark
Annotations are in an early alpha state:
[https://github.com/numba/numba/tree/annotations](https://github.com/numba/numba/tree/annotations).
Feedback/pull requests more than welcome!

