
Numba: High-Performance Python with CUDA Acceleration - kumaranvpl
https://devblogs.nvidia.com/parallelforall/numba-python-cuda-acceleration/
======
pavanky
I'd also like to shamelessly point out something I work on:
[https://github.com/arrayfire/arrayfire-python](https://github.com/arrayfire/arrayfire-python)

It is a Python wrapper around
[https://github.com/arrayfire/arrayfire](https://github.com/arrayfire/arrayfire)
and lets the code you write run on CUDA, OpenCL, or plain x86 CPUs.
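If you haven't seen it before, using it looks roughly like this (a minimal sketch assuming a recent arrayfire-python install; backend names and exact calls may differ slightly between versions):

    import arrayfire as af

    # Pick the backend at runtime; the same code then runs on CUDA, OpenCL, or the CPU.
    af.set_backend('cuda')          # or 'opencl' or 'cpu'

    a = af.randu(1000, 1000)        # uniform random matrix allocated on the device
    b = af.matmul(a, a)             # matrix multiply executed on the selected backend
    print(af.sum(b))                # reduction, result brought back to Python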

~~~
snthpy
Sounds great! However, without having looked at the API, my first reaction is
that it's yet another interface to learn. What would be awesome is if I could
just take my existing NumPy or Theano code and drop in an ArrayFire object. Is
that possible, or how similar is the API?

~~~
pavanky
There's an effort to have a drop-in replacement for NumPy using ArrayFire:
[https://github.com/FilipeMaia/afnumpy](https://github.com/FilipeMaia/afnumpy)

It is still a work in progress and requires some upstream changes in ArrayFire
to support the NumPy API better.
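The idea, once it matures, is that only the import changes (illustrative only, not guaranteed to work today given the missing pieces mentioned above):

    # Illustrative only: afnumpy aims to mirror the NumPy API, so ideally just the import changes.
    import afnumpy as np            # instead of: import numpy as np

    x = np.zeros((1000, 1000))      # arrays live on the GPU via ArrayFire
    y = np.ones((1000, 1000))
    print((x + y).sum())            # arithmetic and reductions run on the device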

~~~
snthpy
Thanks

------
lhenault
How does this compare to CuPy
([https://cupy.chainer.org/](https://cupy.chainer.org/))? It is now
independent of Chainer, is highly compatible with NumPy, and supports both
CUDA and cuDNN.
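For reference, basic CuPy usage looks roughly like this (a sketch assuming a working CUDA install):

    import numpy as np
    import cupy as cp

    x_cpu = np.random.rand(1000, 1000)
    x_gpu = cp.asarray(x_cpu)        # copy the array to the GPU

    y_gpu = cp.dot(x_gpu, x_gpu.T)   # same call shape as numpy.dot, runs on the GPU
    y_cpu = cp.asnumpy(y_gpu)        # copy the result back to host memory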

~~~
Loic
For the CUDA part I cannot tell, but Numba also compiles your Python code on
the fly into machine code using LLVM. That is where it shines. For example,
instead of pushing your code into Cython or a Fortran library, you can keep
writing simple Python and in some cases get it to run nearly as fast as
Fortran. This is my use case. I haven't used the CUDA features yet.
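For people who haven't tried it, the usual pattern looks roughly like this (a minimal sketch; the function and array size are just illustrative):

    import numpy as np
    from numba import jit

    @jit(nopython=True)              # compiled to machine code via LLVM on first call
    def sum_of_squares(x):
        total = 0.0
        for i in range(x.shape[0]):  # explicit loops are fine; Numba compiles them
            total += x[i] * x[i]
        return total

    x = np.random.rand(10_000_000)
    print(sum_of_squares(x))         # close to C/Fortran speed after the first (compiling) call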

~~~
fnl
But LLVM doesn't support vectorization, like AVX or SSE4, right? So I don't
think that would be nearly as fast as fully (Intel-)CPU-optimized code...

EDIT: Let me hedge that a bit: I mean _advanced_ AVX instructions, as LLVM can
vectorize simple loops and such.

~~~
lliiffee
I believe that LLVM 6 has finally introduced this, e.g. see
[http://llvm.org/docs/Vectorizers.html#vectorization-of-function-calls](http://llvm.org/docs/Vectorizers.html#vectorization-of-function-calls)

~~~
Joky
Uh, what you're pointing at was introduced in 2012 in LLVM.

~~~
fnl
Only in part, not for all instructions, and some of the functionality it did
have was buggy. LLVM 4 and 5 are much more advanced/competitive on SIMD
issues, it seems.

Edit: Oh, sorry you meant that other guy's link to LLVM's vectorization
tutorial. Ignore my reply ...

------
TheAlchemist
This looks really good; however, I struggle to find real applications for it.

For almost all practical applications, I use pandas or Keras / TensorFlow. I'm
probably biased, as I mostly work with simple data that doesn't require
complicated calculations.

Would somebody have some benchmarks against pandas for some standard
operations?

~~~
wesm
> Would somebody have some benchmarks against pandas for some standard
> operations?

pandas creator here. Numba is a complementary technology to pandas, so you can
and should use them together. It is designed for use with NumPy arrays and so
does not deal with missing data and other things that pandas does. It does not
help as much with non-numeric data types.
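A common pattern (just a sketch with made-up column names) is to pull the underlying NumPy array out of the DataFrame, run the hot loop through Numba, and assign the result back:

    import numpy as np
    import pandas as pd
    from numba import njit

    @njit
    def step_ratios(values):
        # loop-heavy computation that would be slow in pure Python
        out = np.empty(values.shape[0])
        out[0] = 1.0
        for i in range(1, values.shape[0]):
            out[i] = values[i] / values[i - 1]
        return out

    df = pd.DataFrame({"price": [10.0, 10.5, 10.2, 11.0]})
    df["ratio"] = step_ratios(df["price"].values)   # hand Numba a plain NumPy array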

~~~
j88439h84
Are you saying that if you have missing data you can't use Numba, or that if
you have missing data and use Numba together with pandas, pandas will handle
the missing data where Numba alone could not?

~~~
grej
Heavy Numba user here. What Wes is saying is that while pandas handles some of
those missing values in an automated way, Numba works on NumPy arrays, so you
may have to handle some of those things yourself. I have at times used a
separate NumPy array to indicate whether values are missing or not. You could
also use a value far outside the bounds of anything you might ever see in your
real data, then test for it while you're looping over the values (e.g. fill
missing values with -3.4E38 if you have a float32), as in the sketch below.
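A minimal sketch of that sentinel approach (the function and values here are purely illustrative):

    import numpy as np
    from numba import njit

    MISSING = np.float32(-3.4e38)    # sentinel: far outside anything in the real data

    @njit
    def mean_ignoring_missing(values):
        total = 0.0
        count = 0
        for v in values:
            if v != MISSING:         # skip sentinel entries while looping
                total += v
                count += 1
        return total / count if count > 0 else np.nan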

Depending on what you're doing, you might be able to use numpy.nan as a value.
It does work inside of numpy arrays. But some methods that operate on those
objects might not work as you expect.

For instance, if you run numpy.mean on a numpy array of [nan, 4, 5], it will
return nan. If you run the same thing on a pandas dataframe of the same
values, you'll get 4.5.
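In code, the difference looks roughly like this (numpy.nanmean is the usual NaN-aware workaround on the NumPy side):

    import numpy as np
    import pandas as pd

    data = np.array([np.nan, 4.0, 5.0])

    print(np.mean(data))             # nan -- NaN propagates through the NumPy reduction
    print(pd.Series(data).mean())    # 4.5 -- pandas skips missing values by default
    print(np.nanmean(data))          # 4.5 -- the NaN-aware NumPy equivalent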

------
sandGorgon
Oh wow! Does anyone know how this compares to Julia CUDA performance?

~~~
wallnuss
Given that CUDAnative.jl beats CUDA C in some benchmarks and is a bit slower
in others, I would suspect that Numba is similar in performance.

Where the Julia CUDA support really shines is that it supports arbitrary Julia
structs, not just a blessed few datatypes like Float32.

------
m3kw9
So what happens if it runs in something like a Jupyter notebook mixed with
runtime code?

~~~
marmaduke
It works fine.

------
anc84
(2013)

~~~
grej
See the note at the top of the article:

Note, this post was originally published September 19, 2013. It was updated on
September 19, 2017.

~~~
p1esk
So, what exactly was updated?

~~~
bsprings
When I originally wrote the post in 2013, the GPU compilation part of Numba
was a product (from Anaconda Inc., née Continuum Analytics) called NumbaPro.
It was part of a commercial package called Anaconda Accelerate that also
included wrappers for CUDA libraries like cuBLAS, as well as MKL acceleration
on the CPU.

Continuum gradually open sourced all of it (and changed their name to
Anaconda). The compiler functionality is all open source within Numba. Most
recently they released the CUDA library wrappers in a new open source package
called pyculib.

Some other minor things changed, such as what you need to import. Also, the
autojit and cudajit functionality is a bit better at type inference, so you
don't have to annotate all the types to get it to compile.
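For example (a rough sketch; both forms compile, the explicit signature is simply optional now):

    from numba import jit

    @jit                                  # types inferred from the arguments at call time
    def add(a, b):
        return a + b

    @jit("float64(float64, float64)")     # explicit signature, as older examples required
    def add_typed(a, b):
        return a + b

    print(add(1.5, 2.0), add_typed(1.5, 2.0))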

We thought it was a good idea to update the post in light of all the changes.

------
moon_of_moon
Real nice NVidia. Now how about some support for Linux on Optimus/Hybrid GPU
laptops?

~~~
jhasse
Stop buying them ;)

