> user of NumPy/CuPy to perform the float32 computation
This is just getting tiring.
NumPy and CuPy are perfectly capable of doing float32 computation - their only "fault" is that they coerce data to float64 in this one fairly unimportant function (which you can reimplement to your liking in 3-4 LOC). Hell, PFN's entire deep learning system, Chainer (which, mind you, both predated and inspired PyTorch, and is still quite competitive with it), is built entirely on top of CuPy!
Benchmarks make sense only when the outputs are the same - and in this case, they certainly are not. It's the responsibility of the author to either make sure that the outputs are the same (a real benchmark), or to argue that NumPy/CuPy are wasting time by using float64 in xp.cov (an issue of implementation). He does neither.
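For concreteness, here is a minimal sketch of what such a 3-4 line float32 covariance might look like, written against the shared NumPy/CuPy array API (this is only an illustration, not the article's code; `xp` stands for either numpy or cupy, rows are variables and columns are observations as in np.cov's default layout, and only the unbiased ddof=1 case is handled):

    import numpy as np   # cupy exposes the same array API

    def cov_f32(x, xp=np):
        """Covariance that stays in float32; pass xp=cupy to run it on the GPU."""
        # Rows are variables, columns are observations (np.cov's default layout).
        x = xp.asarray(x, dtype=xp.float32)
        x = x - x.mean(axis=1, keepdims=True)    # centre each variable
        return (x @ x.T) / (x.shape[1] - 1)      # float32 array / Python int stays float32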
> NumPy and CuPy are perfectly capable of doing float32 computation
You're arguing that they're perfectly capable of an implementation that uses float32 - sure, if you change their source code you can make them faster, but that much is obvious. The question is what kind of performance you can expect as a user consuming them as a library, and this benchmark is revealing of exactly that. What is wrong with it?
It's like saying you could change the implementation of Python and make Python code run faster - sure you could, but you don't benchmark hypothetical future versions. The current version of CuPy uses float64 for this function, and as a result it barely runs faster than the CPU version. Case in point. I don't know what else you're trying to disagree about here.
> their only "fault" is that they coerce data to float64 in this one fairly unimportant function
I don't know that; all I know is that when we picked one function and benchmarked it, we found a flaw that makes the performance gains from CPU to GPU extremely underwhelming. How many more would we find if we benchmarked the rest?
> if you change their source code you can make them faster, but that much is obvious. The question is what kind of performance you can expect as a user consuming them as a library, and this benchmark is revealing of exactly that. What is wrong with it?
You don't need to change the NumPy or CuPy source code - just write your own version of the function in a few lines of code using the other primitives that NumPy/CuPy provide (see the sketch below).
Note that this is exactly what is done for the Neanderthal version.
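A quick sanity check of the dtype behaviour under discussion (again an illustrative snippet, not the article's benchmark): np.cov documents that it computes in float64 even for float32 input, while a hand-rolled version built from the same primitives stays in float32 and matches it up to float32 round-off:

    import numpy as np

    rng = np.random.default_rng(0)
    a = rng.standard_normal((8, 10_000)).astype(np.float32)

    c = a - a.mean(axis=1, keepdims=True)     # centre each variable (row)
    cov32 = (c @ c.T) / (a.shape[1] - 1)      # stays float32 throughout

    print(np.cov(a).dtype, cov32.dtype)                 # float64 float32
    print(np.allclose(np.cov(a), cov32, atol=1e-3))     # True: same result up to float32 round-off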