
Fast Convolutional Nets with Fbfft: A GPU Performance Evaluation - fitzwatermellow
https://research.facebook.com/publications/695244360582147/fast-convolutional-nets-with-fbfft-a-gpu-performance-evaluation/
======
varelse
Reports like this are great for embarrassing NVIDIA. But the problem is that
this paper is already out of date, because embarrassing NVIDIA is the best way
to get them to address deficiencies like this. This is mostly already fixed:

https://github.com/soumith/convnet-benchmarks

Check out the huge speedup between cuDNN(R2) and cuDNN(R3). Also, while there
is some fantastic ML and DNN expertise at Facebook, there are only a few GPU
experts there.

In contrast, NVIDIA is swarming with them. I would place my bets going forward
on cuDNN. NVIDIA has bet the farm on deep learning at the expense of other
problem domains. This is great for ML experts who want to add GPU support to
their algorithms, and not so great for general HPC* (but then HPC just isn't
as sexy as deep learning, now is it?).

*For example, cuBLAS perf has serious issues and cliffs on Maxwell GPUs. Sigh...

~~~
smhx
I run convnet-benchmarks, and also work at Facebook.

Contrary to public perception, NVIDIA, Facebook, and other companies
collaborate very, very closely. NVIDIA's latest R3 release has FFT-based
convolutions inspired by Facebook's work, and Facebook integrated some of the
optimization suggestions for fbfft that NVIDIA gave. Further optimizations of
the FFT approach (like tiled FFTs) have been suggested and prototyped at FB
and carried over by NVIDIA. The relationship is a deeply collaborative
effort.
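
(For the curious: the FFT approach rests on the convolution theorem, where
spatial convolution becomes elementwise multiplication in the frequency
domain. A minimal NumPy sketch of the idea, not fbfft's or cuDNN's actual
implementation, and ignoring the batching/tiling that makes it fast on GPUs:)

```python
import numpy as np

def fft_conv2d(x, k):
    """Valid-mode 2D convolution via the convolution theorem:
    FFT both inputs, multiply elementwise, inverse FFT, crop."""
    H, W = x.shape
    kh, kw = k.shape
    # Zero-pad both inputs to the full linear-convolution size,
    # so the FFT's circular convolution equals linear convolution.
    fh, fw = H + kh - 1, W + kw - 1
    X = np.fft.rfft2(x, s=(fh, fw))
    K = np.fft.rfft2(k, s=(fh, fw))
    full = np.fft.irfft2(X * K, s=(fh, fw))
    # Crop the "valid" region (what a convnet layer typically computes).
    return full[kh - 1:H, kw - 1:W]
```

Note this computes true convolution (kernel flipped); most DNN frameworks
actually compute cross-correlation, which just skips the flip. The win is
asymptotic: direct convolution costs O(H*W*kh*kw) per output plane, the FFT
route costs O(H*W*log(H*W)) regardless of kernel size, which is why it pays
off at larger kernels.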

At the end of the day, we want cuDNN to win, as it is a baseline system for
all software.

However, if you want to do research, you want open-source code, because you
often want to make funky changes to the base algorithm. NVIDIA falls short
on this, and we at Facebook pledged to always keep our code open source
(having been bitten by the fact that we had to start with closed-source
black boxes for our initial FFT work).

~~~
varelse
Sure, but most of the DNN people are just running glorified AlexNets and
GoogLeNets and not really looking deeply under the hood (I speak from
first-hand experience).

And Caffe _is_ open source. And cuDNN makes it fast. And if it doesn't, just
jump up and down like the above report did, and it will be fast in the next
release.

IMO if you're doing stuff for which cuDNN isn't good enough, you're probably
capable of writing your own kernels, and that's awesome. Welcome to the Deep
Learning equivalent of the 1%. Meanwhile, incorporate cuDNN into your
production workflow to maximize comparative advantage.

Or don't, whatever...

------
dharma1
Where are large kernel sizes used/useful?

