
CuPy – NumPy-compatible matrix library accelerated by CUDA - rahimnathwani
https://cupy.chainer.org/
======
fourier_mode
Not sure what the motivation behind this library is. There are already
several GPU-accelerated array libraries -- PyTorch, TensorFlow, ArrayFire;
it even looks like PyCUDA has a small array class.

~~~
Smerity
Chainer, and hence CuPy (which was extracted from Chainer to be
independent), were around before PyTorch and served as inspiration for it. I
feel like that's a good motivation for diversity in packages and ecosystems,
regardless of your feelings otherwise.

Along with a colleague I used CuPy, first in Chainer and then in PyTorch, to
implement the Quasi-Recurrent Neural Network (QRNN) [1], which at the time
was far faster than even NVIDIA's optimized cuDNN LSTM whilst getting the
same (or better) results on many tasks.

CuPy at the time was both the easiest and most Pythonic of the potential
solutions to that problem - even if it did involve writing CUDA in Python
strings =]

n.b. Our use case was literally pushing the state of the art in research -
CuPy is even more Pythonic if you're hitting more standard use cases.

[1]: https://github.com/salesforce/pytorch-qrnn
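
For a flavor of what "writing CUDA in Python strings" looks like, here's a
minimal sketch using CuPy's RawKernel (the kernel and sizes are
illustrative, not from the QRNN work):

    import cupy as cp

    # CUDA C source lives in a Python string; CuPy compiles it on first use.
    squared_diff = cp.RawKernel(r'''
    extern "C" __global__
    void squared_diff(const float* x, const float* y, float* out, int n) {
        int i = blockDim.x * blockIdx.x + threadIdx.x;
        if (i < n) {
            float d = x[i] - y[i];
            out[i] = d * d;
        }
    }
    ''', 'squared_diff')

    n = 1 << 20
    x = cp.random.rand(n, dtype=cp.float32)
    y = cp.random.rand(n, dtype=cp.float32)
    out = cp.empty_like(x)

    threads = 256
    blocks = (n + threads - 1) // threads
    squared_diff((blocks,), (threads,), (x, y, out, cp.int32(n)))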

~~~
DanielleMolloy
PyTorch is almost (or even literally?) a fork of Chainer, which can be seen
when comparing example code. The latter was much more stable than the former
for quite some time after PyTorch gained big popularity through Facebook. We
had been using Chainer for a lot of published NN research projects and only
recently moved to PyTorch because students complained that they couldn't put
the more popular framework on their CVs.

I continue to have more sympathy for Chainer.

~~~
_0ffh
> PyTorch is almost (or even literally?) a fork of Chainer

That's funny, I would have assumed PyTorch to be, like, the Python version
of Torch?

~~~
colesbury
The PyTorch tensor library was originally basically the Python version of
Torch 7. It's now moving closer towards NumPy's API (and farther from Torch
7).

The autograd library was inspired by Chainer's design and took a lot of
concepts (but not code) directly from Chainer. The neural network API is a bit
of a hybrid. It's built on top of the autograd library but the layer names,
implementations, and some conventions were inherited from Torch 7's NN and
cuNN libraries.

(EDIT: and the name "autograd" originates from the HIPS autograd library,
which I think predates Chainer)

------
smoussa
I’m working with Numba’s CUDA API and it works well as a drop-in replacement
for embarrassingly parallel functions.
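
For anyone curious, an embarrassingly parallel kernel in Numba's CUDA API is
only a few lines. A minimal sketch (the saxpy kernel and sizes here are
illustrative):

    import numpy as np
    from numba import cuda

    @cuda.jit
    def saxpy(a, x, y, out):
        i = cuda.grid(1)              # global thread index
        if i < out.size:
            out[i] = a * x[i] + y[i]  # each thread handles one element

    n = 1 << 20
    x = np.random.rand(n).astype(np.float32)
    y = np.random.rand(n).astype(np.float32)
    out = np.empty_like(x)

    threads = 256
    blocks = (n + threads - 1) // threads
    saxpy[blocks, threads](np.float32(2.0), x, y, out)  # Numba copies host arrays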

~~~
dotdi
I've done a fair bit of C++11 for CUDA and I was so happy to throw everything
out and switch to Numba. It has some rough edges (like incomprehensible error
messages when the type inference goes wrong) but it's been a pleasure overall
to work with.

~~~
jjoonathan
I've done a fair bit of Numba CUDA and I was so happy to throw everything out
and switch to C++.

NumbaCUDA gave me lots of small problems and a few big ones. The poor
support for debug/perf tools and poor integration with other high-level
Python CUDA code (FFTs in particular) sent me packing, but the number of
small problems was excessive in comparison to the size of my code. I had 5
reduced bugs at the bottom of my notebook and two paragraphs of "baggage" at
the top to support a tiny little 50 LoC kernel: one paragraph for the
environment variables and one for patching NumbaCUDA itself for a trivial
API incompatibility that hadn't been fixed for the better part of a year.
All of this for a tool that provided a diminutive subset of the
functionality at the intersection of Python and C. I've felt more
computational freedom writing BASIC on my TI-83.

CuPy could well have changed that equation!

> incomprehensible error messages when the type inference goes wrong

NumbaCUDA is truly the galaxy-brain of type checking: first it complains
loudly so as to force you to provide type information, then it opts not to
complain about a mismatch, and _then_ it silently reinterpret_casts a
double* to a float* behind your back.
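
If I'm reconstructing that failure mode correctly, it looked roughly like
this (a hypothetical sketch; recent Numba releases may reject the call
outright instead):

    import numpy as np
    from numba import cuda

    # Eagerly compiled with an explicit float32 signature...
    @cuda.jit('void(float32[:])')
    def halve(x):
        i = cuda.grid(1)
        if i < x.size:
            x[i] *= 0.5

    data = np.arange(8, dtype=np.float64)  # ...then called with float64 data
    halve[1, 8](data)  # no complaint: the double bits get read as floats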

I know it's free software and I have no right to complain, but I sure sunk a
lot of time into this dead end and regret it.

Spiffy icon though.

~~~
elcritch
What’s the difference between Numba CUDA and PyTorch or similar?

If you’re doing custom kernels you should take a look at the Julia library
CuArrays [1] and generic kernels [2]. I really like that I don’t have to dig
into C++ and deal with all of the memory and kernel management.

1: https://github.com/JuliaGPU/CuArrays.jl
2: http://mikeinnes.github.io/2017/08/24/cudanative.html

~~~
jjoonathan
My impression was that PyTorch focused on linear algebra / deep learning.
The reason I was playing with NumbaCUDA in the first place was that part of
my problem did not fit nicely into a (dense) linear algebra framework, so
NumbaCUDA's custom kernel support seemed attractive. Does PyTorch have a
good low-level kernel library? Or a sparse linear algebra library?

I love Julia, but I haven't managed to convert anyone else on my team and I
already spent my informal exploration budget for the GPU project on
NumbaCUDA, so JuliaGPU will have to wait for another time. I'll be sure to
keep it in mind, though!

How is the CUDA debug/perf story with Julia? Does it play nice with the
NVIDIA tooling?

~~~
elcritch
Ah, that makes sense. I've only dabbled a little with DNNs recently, but
PyTorch/TensorFlow seemed very targeted toward deep learning. Generic tools
seem more useful to me. What are you doing with FFTs?

I haven't dug deep enough into CUDAnative / CuArrays to understand the state
of Julia perf debugging. Though here's one post on the topic:

https://discourse.julialang.org/t/cudanative-is-awesome/17861

In general it's been very pleasant experimenting with GPU programming in
Julia. I couldn't quite grok TensorFlow code, and it's cool to just declare
a Julia array and send it to the GPU.

------
slaymaker1907
The problem I see with trying to emulate NumPy with a GPU-accelerated
version is that the communication overhead to the GPU is so high that you
lose a lot of performance compared to something like TensorFlow.

It takes a couple of microseconds just to launch a kernel, to say nothing of
the time it takes to transfer data back and forth.
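
A rough way to see the effect (a sketch; numbers vary by hardware, and the
first GPU call also pays a compilation cost):

    import time
    import numpy as np
    import cupy as cp

    a_cpu = np.random.rand(128, 128).astype(np.float32)
    a_gpu = cp.asarray(a_cpu)

    a_gpu @ a_gpu; cp.cuda.Stream.null.synchronize()   # warm-up

    t0 = time.perf_counter(); a_cpu @ a_cpu
    print('numpy:', time.perf_counter() - t0)

    t0 = time.perf_counter(); a_gpu @ a_gpu
    cp.cuda.Stream.null.synchronize()                  # launches are asynchronous
    print('cupy :', time.perf_counter() - t0)

For a matrix this small, launch overhead usually makes the GPU slower.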

~~~
llukas
You don't transfer back and forth if you use managed memory.
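
For reference, CuPy can be switched over to CUDA managed (unified) memory
through its allocator hook; a minimal sketch:

    import cupy as cp

    # Back all CuPy allocations with cudaMallocManaged so pages migrate
    # between host and device on demand instead of being copied explicitly.
    pool = cp.cuda.MemoryPool(cp.cuda.malloc_managed)
    cp.cuda.set_allocator(pool.malloc)

    x = cp.arange(10)  # now backed by managed memory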

------
lp251
Is it possible to pass cupy data to C? Possibly by accessing the array
pointer.

I've used PyCUDA for a very long time. You can use Cython/ctypes/cffi to pass
PyCUDA arrays to standard C/CUDA code.
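
The analogous route appears to work for CuPy too, since an array exposes its
raw device pointer as arr.data.ptr. A ctypes sketch (the shared library and
function names are hypothetical):

    import ctypes
    import cupy as cp

    lib = ctypes.CDLL('./libmykernels.so')  # hypothetical CUDA shared library
    lib.scale_inplace.argtypes = [ctypes.c_void_p, ctypes.c_int, ctypes.c_float]

    x = cp.arange(1024, dtype=cp.float32)
    lib.scale_inplace(ctypes.c_void_p(x.data.ptr),  # raw device pointer
                      ctypes.c_int(x.size),
                      ctypes.c_float(2.0))
    cp.cuda.Stream.null.synchronize()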

------
paddy_m
I use pandas mostly, rarely dropping down to NumPy. Are there non-ML/neural
network use cases where this library is meaningfully faster than NumPy?

~~~
p1esk
Anything that involves computing dot products on large matrices will be
dramatically faster with CuPy than NumPy (depending on your graphics card,
of course).
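
For example (a sketch; the speedup depends on the GPU and matrix size):

    import numpy as np
    import cupy as cp

    a = np.random.rand(4096, 4096).astype(np.float32)
    b = np.random.rand(4096, 4096).astype(np.float32)

    c_cpu = a @ b                    # NumPy: CPU BLAS

    a_gpu, b_gpu = cp.asarray(a), cp.asarray(b)
    c_gpu = a_gpu @ b_gpu            # CuPy: cuBLAS, identical syntax
    cp.cuda.Stream.null.synchronize()

    # float32 summation order differs between CPU and GPU, hence the loose tolerance
    np.testing.assert_allclose(c_cpu, cp.asnumpy(c_gpu), rtol=1e-2)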

------
tanilama
How does this compare against Google's JAX?

------
brian_herman__
Cool! I wish that other libraries like TensorFlow could get cuDNN installed
automatically like this library does!

~~~
rahimnathwani
PyTorch does the same:

https://news.ycombinator.com/item?id=15622763

~~~
oarfish
At least up until 2 months ago, you had to do it manually; I don't know if
that changed.

~~~
rahimnathwani
When I installed PyTorch, I had already installed CUDA and cuDNN (as I
needed them for TensorFlow), so I have not verified what smhx said.

------
ngcc_hk
In another thread I asked about Pyodide ... is this an answer for using CUDA
in the browser under Python?

~~~
zamadatix
No, browsers only expose specific Web APIs, and CUDA is not one of them.
WebGL doesn't even provide access to compute shaders.

------
mruts
Can you use Pandas with it? I'm guessing... no?

~~~
jonathanpoulter
Maybe not immediately, but isn't this what Pandas' new array interface[0]
should facilitate?

[0] https://tomaugspurger.github.io/pandas-extension-arrays.html

