
Which GPUs to Get for Deep Learning - jonbaer
http://timdettmers.wordpress.com/2014/08/14/which-gpu-for-deep-learning/
======
benanne
I disagree with his position on memory (he mentions in the post that anything
above 1.5GB should be fine). In my experience, anything below 3GB can be
pretty uncomfortable these days, if you want to work on serious problems.

It's not just about fitting the parameters into GPU memory, but also all the
operations you perform on them, which can require a lot of intermediate
storage. The example he gives only has fully-connected layers, but
convolutional neural networks tend to require more space, especially some more
recent implementations (e.g. FFT-based convolutions or the GEMM approach used
by Caffe).
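
To put rough numbers on that (my own illustrative layer sizes, not anything from the article), here is a quick Python sketch comparing parameter storage to activation storage for a single conv layer:

    # Rough estimate of parameter vs. activation memory for one conv layer.
    # The dimensions below are illustrative only, not taken from the article.
    def conv_layer_memory(batch, in_ch, out_ch, h, w, k=3, bytes_per_float=4):
        # parameters: one k x k filter per (in_ch, out_ch) pair, plus biases
        params = (in_ch * out_ch * k * k + out_ch) * bytes_per_float
        # activations: one h x w feature map per output channel per example
        # (assuming 'same' padding, so the spatial size is unchanged)
        activations = batch * out_ch * h * w * bytes_per_float
        return params, activations

    params, acts = conv_layer_memory(batch=128, in_ch=96, out_ch=256, h=27, w=27)
    print("parameters:  %.1f MB" % (params / 1e6))  # ~0.9 MB
    print("activations: %.1f MB" % (acts / 1e6))    # ~95 MB, for a single layer

Multiply that by a stack of layers, keep the activations around for the backward pass, and add whatever workspace an FFT- or GEMM-based convolution allocates, and the gap to 3GB closes quickly.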

He mentions that his (fully connected) network has 52M parameters and compares
it to Krizhevsky's 2012 ImageNet network (convolutional), which had 60M. But
Krizhevsky explicitly mentions in his paper that memory was an issue:

"A single GTX 580 GPU has only 3GB of memory, which limits the maximum size of
the networks that can be trained on it. It turns out that 1.2 million training
examples are enough to train networks which are too big to fit on one GPU.
Therefore we spread the net across two GPUs." (from
[http://papers.nips.cc/paper/4824-imagenet-classification-wit...](http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks))

~~~
PostOnce
Serious is a spectrum; someone working on more "serious" problems than you
might laugh at the idea of doing any work at all on a single consumer GPU.

~~~
benanne
Well, I did say "in my experience" :) I certainly didn't mean to imply that
problems requiring less than 3GB of GPU RAM are laughable, or anything like
that. I should have said something like "problems that people are currently
writing papers about", maybe.

------
agibsonccc
For those looking to do it on the JVM, I have a prepackaged scientific
computing framework that might be interesting:

[http://nd4j.org/](http://nd4j.org/)

This is a generic wrapper with ndarrays for CUDA and normal BLAS operations.
Deeplearning4j (my deep learning project) also has support for GPUs (it's
built on nd4j).

Stable version coming soon =D

For those of you in Python land, I would look into Theano.
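
Something like this is all it takes to get a matrix multiply onto the GPU with Theano (a minimal sketch; it assumes Theano is installed and run with THEANO_FLAGS=device=gpu,floatX=float32, otherwise it simply falls back to the CPU):

    import numpy as np
    import theano
    import theano.tensor as T

    # symbolic matrix input and a shared weight matrix living on the device
    x = T.matrix('x')
    w = theano.shared(np.random.randn(1024, 1024).astype('float32'), name='w')

    # compile the graph; with device=gpu the dot product runs on the GPU
    f = theano.function([x], T.dot(x, w))

    out = f(np.random.randn(256, 1024).astype('float32'))
    print(out.shape)  # (256, 1024)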

~~~
e_modad
Hey Adam, I just want to say I loved the recent talks you've given. The one at
Hadoop Summit with Josh Patterson was so cool as a general overview. Keep it
up!

~~~
agibsonccc
Thanks! Things are coming along.

------
driverdan
The best value right now is used AMD GPUs. GPU cryptocurrency mining has
become unprofitable, so the used market is saturated with 290s, 280Xs, and
other AMD GPUs. If you have a limited budget, this is probably the way to go.

~~~
scottlocklin
What DL packages target AMD GPUs?

------
simplyinfinity
Is there any reason why there aren't any AMD GPUs mentioned? What about Mantle?

~~~
TTPrograms
My guess is that while AMD GPUs would likely be better in raw performance/$,
their version of CUDA is sorely lacking compared to nVidia's offering.

~~~
agapos
AMD GPUs have no CUDA support; if someone wants to do some computing on them,
OpenCL is usually the way to go.

~~~
TTPrograms
Sorry, when I said "version of CUDA" I meant "whatever the hell they have on
AMD" :)

Last I heard, CUDA on nVidia will generally outperform OpenCL on AMD at the
same price point, as a result of CUDA being in-house and closer to the
hardware. So if you just care about compute performance, you would go nVidia.
If AMD offered their own Mantle-based compute stack, it would probably shift
the other way.

~~~
wmobit
There isn't really anything fundamental that would make CUDA faster than
OpenCL. There aren't any huge semantic differences between them.

~~~
liuliu
The computing model, no, nothing fundamentally different. It comes down to
tooling and profiling under Linux. Also, nVidia has slightly beefier cores but
fewer of them, whereas AMD has more cores (as I've heard). Thus, for me, CUDA
is the more complete toolchain, with a proper compiler (nvcc), profilers
(nvprof, nvvp), and libraries (cuBLAS, cuDNN, cuFFT).

~~~
wmobit
There is an OpenCL profiler for AMD, and library equivalents for those in
clBLAS / clFFT.

------
obrienmd
The new Maxwell 970/980 kit is very interesting from a compute perspective.

970: $329 for 3494 SP / 109 DP GFLOPS @ 145W TDP

980: $549 for 4612 SP / 144 DP GFLOPS @ 165W TDP
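
Just doing the arithmetic on those numbers (nothing here beyond the specs listed above):

    # Single-precision GFLOPS per dollar and per watt, from the specs above.
    cards = {
        "GTX 970": {"price": 329, "sp_gflops": 3494, "tdp": 145},
        "GTX 980": {"price": 549, "sp_gflops": 4612, "tdp": 165},
    }
    for name, c in sorted(cards.items()):
        print("%s: %.1f GFLOPS/$, %.1f GFLOPS/W" % (
            name, c["sp_gflops"] / c["price"], c["sp_gflops"] / c["tdp"]))
    # GTX 970: 10.6 GFLOPS/$, 24.1 GFLOPS/W
    # GTX 980: 8.4 GFLOPS/$, 28.0 GFLOPS/W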

~~~
miahi
If the article is correct that memory bandwidth is very important, then a
780 Ti is still interesting (336 GB/s [1] vs. 224 GB/s [2]). They increased the
memory clock but decreased the memory interface width from 384-bit to 256-bit,
for some reason.

[1] [http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-780...](http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-780-ti/specifications)

[2] [http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-980...](http://www.geforce.com/hardware/desktop-gpus/geforce-gtx-980/specifications)

~~~
happycube
nVidia released the first-level cut-down chip (GM204) as the "80" part, when it
really should be the 960 at most - the full/big Maxwell core hasn't been
released yet.

Chances are the first fabbed version simply didn't work and they're waiting
for the GM210.

------
antimora
Does anyone know if there are options to use GPUs on AWS to do calculations
using Python for Deep Learning?

~~~
valarauca1
Amazon has GPU-powered instances [1].

Python has PyCUDA [2].

Python has PyOpenCL [3], amusingly maintained by the same person as PyCUDA.

[1] [http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using_clu...](http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using_cluster_computing.html)

[2] [http://documen.tician.de/pycuda/](http://documen.tician.de/pycuda/)

[3] [http://documen.tician.de/pyopencl/](http://documen.tician.de/pyopencl/)
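
As a minimal PyCUDA sketch (assumes a CUDA-capable GPU and the CUDA toolkit are available, e.g. on one of those GPU instances):

    import numpy as np
    import pycuda.autoinit            # creates a CUDA context on the default device
    import pycuda.gpuarray as gpuarray

    a = np.random.randn(1024).astype(np.float32)
    a_gpu = gpuarray.to_gpu(a)        # host -> device copy
    b_gpu = a_gpu * a_gpu             # element-wise multiply runs on the GPU
    print(np.allclose(b_gpu.get(), a * a))  # True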

