A Full Hardware Guide to Deep Learning (timdettmers.com)
134 points by hunglee2 on Dec 23, 2018 | hide | past | favorite | 28 comments



It should be noted that dedicated deep learning hardware is only worth it if you're training with high uptime (or are using the hardware for other things, e.g. gaming).

If you're a hobbyist training a model on a small dataset, then at current prices, spinning up a GPU-backed cloud VM for a couple of hours on demand will still be cheaper in the long run.
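A back-of-the-envelope version of that trade-off (all prices here are hypothetical placeholders, not quotes):

```python
# Rough break-even sketch: how many hours of on-demand cloud GPU time
# could you buy for the up-front cost of a local build? If your expected
# training hours are well below this, cloud wins.

def break_even_hours(build_cost_usd: float, cloud_rate_usd_per_hr: float) -> float:
    """Hours of cloud GPU time equivalent to the price of a local box."""
    return build_cost_usd / cloud_rate_usd_per_hr

# e.g. a hypothetical $2,000 single-GPU build vs. a ~$0.90/hr cloud instance:
hours = break_even_hours(2000, 0.90)
print(f"Break-even after ~{hours:.0f} GPU-hours")
```

(This ignores electricity, resale value, and spot/preemptible discounts, all of which shift the line further, but the shape of the comparison holds.)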


Also, if you're a student, you might have access to a cluster free of charge. I'm shocked how many of my cohort have no idea they could be training on 4x V100s for free.


As a non-ML grad student trying to do ML, where is this available from?


I can speak for Australia only. Some options include:

* University run HPC facilities.

* Your university might have a deal with government funded facilities like the NCI (National Computational Infrastructure). Their supercomputer Raijin has GPUs.

* Or, it might be a participating university in AARNet (the Australian Academic and Research Network), which next year will offer web-based Jupyter Notebooks for smaller jobs.

* I 100% guarantee there are multiple unused snowflake GPU clusters sitting under the desks of academics (with more money and influence than sense), purchased under the auspices of being "absolutely essential for their work" or because "it's the end of the year and we HAVE to spend this money". Because they ignored all advice from IT, the hardware never works and is unsupported. Find those clusters and be the support guy (if you have time) on the condition that you get access.


Just try googling around to see if your university has an HPC center. At my school the cluster isn't even geared toward ML, but they happen to have super beefy nodes that are great for it.


There's also a free Jupyter notebook provided by Google if you just want to dick around.


Funny, I watched your textgenrnn YouTube video a few minutes ago and now I see your comment here :)


Since you brought that video up:

Google Colaboratory provides a free K80 GPU; not as strong as the GTX 1080/2080 mentioned in this article, but good enough to prototype things, and the equivalent GPU normally costs ~$0.57/hr on GCP. (How textgenrnn uses the free Colaboratory notebook: https://minimaxir.com/2018/05/text-neural-networks/)

In the case of textgenrnn, I use the Colaboratory notebook exclusively nowadays. :P


For RTX cards, water cooling is probably a must nowadays. I had a lot of cooling issues with my 3x 2080 Ti FE cards. Although I managed to get them to stabilize at 85C, that is not ideal.

Also, getting as many PCIe lanes as possible is not a bad idea. The RTX cards only support 2-way NVLink, so communication between some of the cards has to go through PCIe.

I am thinking about a next build with a motherboard like https://www.asrockrack.com/general/productdetail.asp?Model=E... and an Epyc 7401P. You get a full 16 lanes per GPU and some room to spare for M.2 SSDs. The 7401P is also reasonably priced, sitting between the TR 2950X and the 2990WX; I'm not sure why more people don't use this setup for their workstations.


Thanks for sharing. I've been planning on building a deep learning box with 4x 2080s, but I've been concerned about the loss of blower-style coolers in such a dense configuration. Have you seen any good 2080 Tis with blowers? Alternatively, I might just go for 2 Titans. The larger memory via 2-way NVLink and faster tensor cores might outweigh the loss of FP32 compute for my purposes.

As for the Epyc, I've seen more people going for the TR2. It seems like the clock on the 7401P is lower; maybe that's why? Or maybe people are just less familiar with server hardware and are worried that they don't know what they don't know.


I have a box with 4 2080 Tis that are cooled with blowers, and it has not been a problem. Temps max out at 87C under full load.


Ah cool. What brand/model?

Also, I heard somewhere that they start throttling at 80C, have you seen that at all?
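One way to check for yourself: `nvidia-smi --query-gpu=temperature.gpu,clocks.sm --format=csv,noheader,nounits` reports temperature and SM clock per card. A quick sketch of how you might interpret that output (the 83C boost-target threshold and the 90%-of-boost-clock cutoff are my assumptions, not NVIDIA specs; check your card's rated clocks):

```python
# Parse nvidia-smi CSV output and flag cards that look like they are
# thermally throttling: hot AND running well below their boost clock.

def parse_gpu_readings(csv_text):
    """Return a list of (temperature_C, sm_clock_MHz) tuples,
    one per GPU line of nvidia-smi CSV output."""
    readings = []
    for line in csv_text.strip().splitlines():
        temp, clock = (int(x.strip()) for x in line.split(","))
        readings.append((temp, clock))
    return readings

def may_be_throttling(temp_c, clock_mhz, boost_clock_mhz, temp_target_c=83):
    """Heuristic: near/above the thermal target and >10% below boost clock."""
    return temp_c >= temp_target_c and clock_mhz < boost_clock_mhz * 0.9

# Example with canned output from a hypothetical 2-GPU box
# (boost clock of 1635 MHz assumed for illustration):
sample = "87, 1350\n62, 1815"
for temp, clock in parse_gpu_readings(sample):
    print(temp, clock, may_be_throttling(temp, clock, boost_clock_mhz=1635))
```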


I believe they are EVGAs. I have not seen throttling.


Yeah. The frequency is higher, but I'm not sure if you can overclock it. The biggest potential problem I see with these Epyc chips is that the PCIe lanes are spread across 4 dies, so the performance of data going through the host would be interesting to observe. This setup has some risks, so I'd like to hear more from the crowd about its feasibility.


Ah yeah, the higher die-count TRs had some interesting performance characteristics, according to AnandTech. One surprising result was that the inter-die communication silicon sometimes used more power than the cores themselves, IIRC. I'd check out that article, since I'm guessing some of it applies to the similar core-count Epycs.


Do you think water cooling is necessary for a dual RTX 2070 setup?


No. Two slots away is plenty.


Just curious--anyone have recommendations for building a home box to develop apps before pushing to Amazon? I like the quick turnaround of local development but don't need to train/run models at full power on my local servers.

p.s. great article, many thanks to @hunglee2 for posting. Wish I could upvote more than once.


/r/homelab on reddit has some good stuff. The 12th-gen Dell servers can run dual 300W cards (I use one with a GTX 1060).


The author states:

>"... the choice of your GPU is probably the most critical choice for your deep learning system."

Could someone explain what it is about the deep learning workload that makes a GPU such a good fit?

Does anyone have some links or literature on this subject?


Please see my detailed answer to this question on Quora: https://www.quora.com/Why-are-GPUs-well-suited-to-deep-learn...


"Basic" math operations like matrix multiplication need to be done in massive quantities, and they parallelize well.
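A sketch of the arithmetic behind that (the layer sizes below are made up for illustration):

```python
# Why deep learning maps so well to GPUs: a dense layer's forward pass
# is one big matrix multiply, every output element is computed
# independently of the others, and a GPU can run thousands of those
# independent computations at once.

def matmul_flops(batch: int, in_dim: int, out_dim: int) -> int:
    """FLOPs for a (batch x in_dim) @ (in_dim x out_dim) multiply:
    each output element needs in_dim multiplies and ~in_dim adds."""
    return 2 * batch * in_dim * out_dim

# A single hypothetical 1024 -> 1024 layer on a batch of 64 inputs:
print(matmul_flops(64, 1024, 1024))  # 134217728, i.e. ~134M FLOPs
```

Multiply that by dozens of layers, forward and backward passes, and millions of training steps, and the appeal of hardware with thousands of parallel arithmetic units is clear.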


I'm very surprised that the RTX cards are finally being recommended for price/performance. For the longest time, the best performance per dollar for machine learning was the GTX 1070 / 1060.


Is $86 a normal price these days for a 1080 Ti?

    https://www.ebay.com/itm/143062782506
Edit: probably a listing on a compromised account.


No, it's a scam. Please do not relink it.


What's the scam?

Edit: it doesn't look like the standard 'picture of GPU' or 'box only' scam, so I'm not sure what the scam is.

Edit2: Found it (https://community.ebay.com/t5/Archive-Bidding-Buying/WARNING...). Wow, that's fucked up.


The price ain’t right, so something’s up, and since the seller isn’t talking, you should probably not be buying.

The scam could be anything (fake card, breaks under stress, etc.), but selling at that price while saying it's perfectly fine is a very suspicious mismatch. It really doesn't matter what the scam in particular is; it's suspicious regardless.


I linked the thread on eBay's site. Apparently, the scammers report the listing as fraudulent and have it deleted, and then you can't get your money back because the listing no longer exists.

Software engineering at its best here :)



