
A Full Hardware Guide to Deep Learning - hunglee2
http://timdettmers.com/2018/12/16/deep-learning-hardware-guide/
======
minimaxir
It should be noted that dedicated hardware for deep learning is only worth it
if you're training with high uptime (or are using the hardware for other
things, e.g. gaming).

If you're a hobbyist training a model on a small dataset, the current prices
of using cloud computing and running a GPU-backed VM for a couple hours on
demand will still be cheaper in the long run.
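A minimal break-even sketch of that trade-off (the dollar figures below are made-up assumptions for illustration, not current quotes):

```python
# Hypothetical break-even calculation: buy a GPU vs. rent one in the cloud.
# Both prices are assumptions for illustration only.
def breakeven_hours(hardware_cost, cloud_rate_per_hour):
    """Hours of GPU time at which buying beats renting."""
    return hardware_cost / cloud_rate_per_hour

# Assumed: ~$1,200 for the GPU portion of a build, ~$0.90/hr for an
# on-demand cloud GPU instance.
hours = breakeven_hours(1200, 0.90)
print(round(hours))  # 1333 hours of training before the card pays for itself
```

If you train a few hours a week, that break-even point is years away; if the box runs near 24/7, it arrives in a couple of months.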

~~~
sgillen
Also, if you're a student you might have access to a cluster free of charge.
I'm shocked how many of my cohort have no idea they could be training on 4x
V100s for free.

~~~
grandmczeb
As a non-ML grad student trying to do ML, where is this available from?

~~~
PinkMilkshake
I can speak for Australia only. Some options include:

* University run HPC facilities.

* Your university might have a deal with government funded facilities like the NCI (National Computational Infrastructure). Their computer Raijin has GPUs.

* Or, they might be a participating University in AARNET (Australian Academic and Research Network), which next year will have web-based Jupyter Notebooks for smaller stuff.

* I 100% guarantee there are multiple unused snowflake GPU clusters sitting under the desks of academics (with more money and influence than sense), purchased under the auspices of being "absolutely essential for their work" or "it's the end of the year and we HAVE to spend this money". Because they ignored all advice from IT, the hardware never works and is unsupported. Find them, and be the support guy (if you have time) on the condition that you get access.

------
liuliu
For RTX cards, water cooling is probably a must nowadays. I had a lot of
cooling issues with my 3 2080 Ti FEs. Although I managed to get them to
stabilize at 85C, it is not ideal.

Also, getting as many PCIe lanes as possible is not a bad idea. The RTX cards
only support 2-way NVLink, so communication between some cards has to go
through PCIe lanes.

I am thinking about the next build with a motherboard like
[https://www.asrockrack.com/general/productdetail.asp?Model=E...](https://www.asrockrack.com/general/productdetail.asp?Model=EPYCD8-2T#Specifications)
and an Epyc 7401P. You get a full 16 lanes per GPU and some room to spare for
M.2 SSDs. The 7401P is also reasonably priced between the TR 2950X and 2990WX;
not sure why more people don't have this setup for their workstations.
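As a rough sanity check on the lane budget (the 128-lane total is the published figure for single-socket EPYC; the per-device allocations below are my own assumptions):

```python
# Rough PCIe lane budget for a 4-GPU EPYC build like the one described.
# 128 lanes is the single-socket EPYC total; device allocations are
# illustrative assumptions, not a parts list.
total_lanes = 128
gpus = 4 * 16          # four GPUs, each at full x16
m2_ssds = 2 * 4        # two NVMe drives at x4 each
nic_and_misc = 8       # headroom for a NIC, HBA, etc.

spare = total_lanes - (gpus + m2_ssds + nic_and_misc)
print(spare)  # 48 lanes left over
```

Compare that with a consumer CPU's 16-24 lanes, where four GPUs inevitably get cut down to x8 or x4 links.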

~~~
ericd
Thanks for sharing. I've been planning on building a deep learning box with
4x 2080s, but I've been concerned about the loss of the blower-style coolers
in such a dense configuration. Have you seen any good 2080 Tis with blowers?
Alternatively, I might just go for 2 Titans. The larger memory via 2 way
NVLink and faster tensor cores might outweigh the loss of FP32 compute for my
purposes.

As for the Epyc, I've seen more people going for the TR2. Seems like the clock
on the 7401P is lower, maybe that's why? Or maybe people are just less
familiar with dealing with server hardware and are worried that they don't
know what they don't know.

~~~
sseveran
I have a box with 4 2080 Tis that are cooled with blowers and it has not been
a problem. Temps max out at 87C under full load.

~~~
ericd
Ah cool. What brand/model?

Also, I heard somewhere that they start throttling at 80C, have you seen that
at all?

~~~
sseveran
I believe they are EVGAs. I have not seen throttling.

------
hodgesrm
Just curious--anyone have recommendations for building a home box to develop
apps before pushing to Amazon? I like the quick turnaround of local
development but don't need to train/run models at full power on my local
servers.

p.s. great article, many thanks to @hunglee2 for posting. Wish I could upvote
more than once.

~~~
WrtCdEvrydy
/r/homelab on Reddit has some good stuff, but the 12th-gen Dell servers can
run dual 300W cards (I use one with a GTX 1060).

------
bogomipz
The author states:

>"... the choice of your GPU is probably the most critical choice for your
deep learning system."

Could someone say what it is about the deep learning workload that makes a GPU
such a good fit?

Might anyone have some links or literature on this subject?

~~~
timdettmers
Please see my detailed answer to this question on Quora:
[https://www.quora.com/Why-are-GPUs-well-suited-to-deep-learn...](https://www.quora.com/Why-are-GPUs-well-suited-to-deep-learning)
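The gist, paraphrased with arbitrary example shapes: a network's forward pass is dominated by large dense matrix multiplies whose output elements are all independent, which is exactly the shape of work thousands of GPU cores are built for. A NumPy sketch:

```python
import numpy as np

# Illustration (example shapes are arbitrary): one dense layer's forward
# pass is a single large matrix multiply. Each of the 64*1024 = 65,536
# output elements is an independent dot product of length 1024, so the
# work maps naturally onto thousands of parallel GPU cores.
batch, d_in, d_out = 64, 1024, 1024
x = np.random.randn(batch, d_in).astype(np.float32)   # activations
w = np.random.randn(d_in, d_out).astype(np.float32)   # layer weights

y = x @ w
print(y.shape)                    # (64, 1024)
print(2 * batch * d_in * d_out)   # ~134M floating-point ops in this one call
```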

------
WrtCdEvrydy
I'm very surprised that the RTX cards are finally being recommended for
price/performance. For the longest time, the best performance per dollar for
machine learning came from the GTX 1070 / 1060.

------
rshm
Is $86 a normal price these days for a 1080 Ti?

    https://www.ebay.com/itm/143062782506

Edit: probably a listing on a compromised account.

~~~
bionsystem
No, it's a scam. Please do not relink it.

~~~
WrtCdEvrydy
What's the scam?

Edit: it doesn't look like the standard 'picture of gpu' or 'box only' scam,
so I'm not sure what the scam is.

Edit2: Found it ([https://community.ebay.com/t5/Archive-Bidding-Buying/WARNING...](https://community.ebay.com/t5/Archive-Bidding-Buying/WARNING-Graphics-Card-Scam-on-E-Bay-GTX-1080-1080-Ti-and-others/td-p/28042753)). Wow, that's fucked up.

~~~
setr
The price ain’t right, so something’s up, and since the seller isn’t talking,
you should probably not be buying.

The scam could be anything (fake card, breaks under stress, etc.), but selling
at that price and saying it's perfectly fine is a very suspicious mismatch. It
really doesn't matter what the scam in particular is; it's suspicious
regardless.

~~~
WrtCdEvrydy
I linked the thread on eBay's site. Apparently, they report the listing as
false and have it deleted, and then you can't get your money back because the
listing doesn't exist.

Software engineering at its best here :)

