Accelerating Python Libraries with Numba (Part 2) (continuum.io)
70 points by corinna on May 23, 2013 | 23 comments



I work on http://wakari.io, where the notebook is hosted. You can run the notebook in your own (free) Wakari account.

Here is the direct link to the notebook: https://www.wakari.io/sharing/bundle/aron/Accelerating_Pytho...


The blog has been updated with some PyPy benchmarks, added just as a gist.

From the blog:

https://gist.github.com/ahmadia/5638980

Update:

At the request of several commenters, here are a test script and benchmarks that we ran on PyPy and Anaconda Python (with Numba). The results are not tuned (I am not a PyPy expert!), so we did not post them in the blog. We'd be happy to look deeper into this with the PyPy developers. While PyPy is not currently installed on Wakari, we are looking at a number of ways we can install it and support the PyPy community.


Numba and Cython are faster than C? In my experience, C is the fastest and serves as the baseline. Was the C compiled with the right optimization flags?

It'd be interesting to see a comparison to Matlab's JIT. Is Numba competitive?


Numba and Cython are faster (for very small kernels) than Python interfacing into a C kernel via ctypes. This is not the same thing as a C kernel as part of a C application. I hope I didn't falsely give that impression! You can see that for medium-size and above kernels, the call overhead makes very little impact on the code execution speed.
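
For reference, a minimal sketch of what such a ctypes call usually looks like (my own illustration, not the benchmark code from the post; the library name and kernel signature are assumptions). The per-call argument wrapping is where the overhead on very small kernels comes from:

    import ctypes
    import numpy as np

    # Hypothetical shared library exposing: double sum_kernel(const double *x, long n);
    lib = ctypes.CDLL("./sum_kernel.so")
    lib.sum_kernel.restype = ctypes.c_double
    lib.sum_kernel.argtypes = [ctypes.POINTER(ctypes.c_double), ctypes.c_long]

    def c_sum(y):
        # Each call pays for pointer extraction and ctypes argument wrapping
        # in Python; for a very small kernel this dominates the runtime.
        ptr = y.ctypes.data_as(ctypes.POINTER(ctypes.c_double))
        return lib.sum_kernel(ptr, y.size)

    # c_sum(np.arange(10.0))  # requires sum_kernel.so to be built first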

The C was compiled with the same flags used to compile Python. In this case: -O2 -g.

I don't have access to a MATLAB license to compare, but I would love to see this comparison done. Let me know if you need any help putting it together.


I would have liked to see a PyPy test for the sum function in the comparison.

Update:

Well this is interesting ;D

    $ time python test.py

    real    0m11.169s
    user    0m11.141s
    sys     0m0.022s

    $ time pypy test.py

    real    0m0.259s
    user    0m0.239s
    sys     0m0.018s

    $ cat test.py
    def python_sum(y):
        N = len(y)
        x = y[0]
        for i in xrange(1, N):
            x += y[i]
        return x

    python_sum(xrange(100000000))


Hi. I was reluctant to publish any results with PyPy because I am not an expert in working with their code base. If you'd like to look at the GrowCut benchmark being run on my Mac (Intel Core i7, 2.7 GHz) on PyPy vs. Anaconda Python (or even better, download and run it yourself), the gist is [here](https://gist.github.com/ahmadia/5638980).


Yes, it is well known that allocating 100000000 PyObjects is slow.


To be clear, you are comparing pure pypy vs pure python?


yes


I'm curious why anyone would downmod my answer.


Possibly because the down and up vote buttons are so close together that it's easy to hit the wrong one if you're using a touchscreen device.


Why didn't they use the Python sum and reduce built-ins for the sum benchmark??


Because it wouldn't be fair to test python's built-in sum on a numpy array (it would be very slow compared to numpy.sum). Numba is mostly about optimizing operations on numpy arrays.
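
To make that concrete, here is a minimal sketch (my own example, assuming a working Numba install, not code from the post) of the kind of typed loop over a NumPy array that Numba compiles:

    import numpy as np
    from numba import jit

    @jit(nopython=True)
    def numba_sum(y):
        # A plain element-wise loop; Numba compiles it to machine code
        # specialized for the array's dtype.
        x = y[0]
        for i in range(1, y.shape[0]):
            x += y[i]
        return x

    y = np.arange(1000000.0)
    print(numba_sum(y))   # compiled on first call, then runs at near-C speed
    # print(sum(y))       # built-in sum loops over boxed scalars and is much slower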


Why would it be slow?


One possible reason: Numpy arrays are generally strongly typed, which allows Numpy's custom C implementation of sum to work while only dispatching types once instead of for each element.


Gold star for you :) The NumPy implementation is basically a for loop. The only cost is the very brief inspection of the dtype to dispatch on.
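
A rough way to see the difference yourself (my own illustration; array size and repeat count are arbitrary):

    import timeit
    import numpy as np

    a = np.arange(1000000, dtype=np.float64)

    # numpy.sum: inspects the dtype once, then runs a C loop over the buffer.
    print(timeit.timeit(lambda: a.sum(), number=10))

    # Built-in sum: re-dispatches the addition for every single element.
    print(timeit.timeit(lambda: sum(a), number=10))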


Why can't Python's normal sum do that?


Because CPython has no JIT :)


Because Python lists can contain different types, unlike NumPy arrays.


Exactly. To further elaborate, try the following in python, if you have the memory:

    >>> x = list(range(100000000)) + ["break"]
After some time, you get a list with a hundred million integers with a string at the end.

    >>> len(x)
    100000001
Now take the sum:

    >>> sum(x)
On my machine, I get a traceback, but after about half a second of hesitation. Python's been iterating through each of those elements, testing to see if it's an integer and building a running sum, only to throw an exception because it didn't expect the string at the end. That means that in order to avoid memory corruption (or other terrible fates worse than an exception), Python must double-check each element of the list as it goes; it cannot take shortcuts.


Can't Python notice that a list always has the same types and do something special?


That could have been implemented, but it is not. I don't think it will be, because it is not that useful and would be a lot of work/code for a small benefit. If you want faster Python, use PyPy; if you want homogeneous lists, you could probably use NumPy arrays.


At least in CPython, sum() goes through PyNumber_Add over heap-allocated Python objects. With numpy.sum, the inner loop of the ufunc sums over a contiguous block of machine types.
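
A small illustration of that layout difference (my own example, not from the parent comment):

    import numpy as np

    a = np.arange(5, dtype=np.int64)
    print(a.dtype)                   # int64: one machine type for every element
    print(a.flags['C_CONTIGUOUS'])   # True: one contiguous block of raw values
    print(a.itemsize * a.size)       # 40 bytes of data, no per-element objects

    lst = list(range(5))
    print([type(x) for x in lst])    # each element is a separate heap-allocated Python int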



