
Building a $5k ML Workstation with Tiitan RTX and Ryzen ThreadRipper [video] - jeffheaton
https://www.youtube.com/watch?v=iahQOQZVdog
======
gameswithgo
If you go with air cooling on a Threadripper, I suggest a Noctua cooler
instead of the Dark Rock. Dark Rock extended the size of the heat plate to
match the TR CPU size, but they didn't cover it with heat pipes; Noctua did.
Cooling performance really suffers on the 3990X because there are chiplets at
the edge of the CPU. On the 32 and 24 core models it may not matter so much.

See: [https://www.kitguru.net/components/cooling/luke-hill/threadripper-3990x-cpu-cooling-comparison-how-to-tame-the-beast/11/](https://www.kitguru.net/components/cooling/luke-hill/threadripper-3990x-cpu-cooling-comparison-how-to-tame-the-beast/11/)

On non-Threadripper CPUs I actually like the Dark Rock better. Cooling is on
par with Noctua, but it looks cooler and was quieter for me.

~~~
bicknyers
Also, if you go air cooling and are confident in your abilities, consider
delidding to drop temps further (5 to 20°C). If you go air cooling I would
assume it is on the basis of long-term stability, so don't use liquid metal
either. Also invest in a nice PSU (Gold minimum), with your peak load pulling
only 75% of the rated max wattage.

Edit: Like most things, plan against components' real-world test figures (in
this case, measured wattage) rather than TDP.
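The 75% headroom rule above is a quick calculation once you have real-world
peak draw figures. A sketch with illustrative, made-up component wattages (not
measured values for any particular build):

```python
# Size the PSU so measured peak system draw sits at <= 75% of its rating.
def required_psu_watts(peak_draw_watts, headroom=0.75):
    return peak_draw_watts / headroom

# Illustrative real-world peak figures (measured wattage, not TDP):
peak = 280 + 320 + 60 + 25  # CPU + GPU + board/RAM + drives/fans = 685 W
print(required_psu_watts(peak))  # ~913 W, so a 1000 W unit fits the rule
```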

~~~
eightysixfour
Ryzen CPUs are soldered, so delidding has minimal impact.

~~~
formerly_proven
To be honest I can't recall any AMD CPUs that weren't soldered... apart from
those that didn't have an IHS in the first place ;)

Intel was just like "we can save $4 of BOM cost on a $500 part" with their
4th-9th generations of CPUs. No thermal headroom? No problem.

~~~
formercoder
0.8%? Could be worth billions.

------
sabalaba
Good choice on the 24 GB Titan RTX (so you can do at least batch size = 1 for
BERT-Large). Not sure if that's the reason it was chosen, though, to be
honest. If you want to do convnets only, you would do better with NVLink'd
2080 Tis.

Secondly, I would suggest that you not use Windows but instead Ubuntu 18.04 or
20.04 LTS, and just install Lambda Stack
([https://lambdalabs.com/lambda-stack-deep-learning-software](https://lambdalabs.com/lambda-stack-deep-learning-software)). It's a
Debian PPA that we maintain at Lambda to keep all of your deep learning
drivers, CUDA, cuDNN, TensorFlow, and PyTorch up to date with just apt. It's
free!

~~~
mastazi
Interesting! Is Lambda Stack going to work on 20.04? The link mentions only
16.04 and 18.04

~~~
hughdbrown
I've asked Lambda Labs 2-3 times if they are going to do an Ubuntu 20.04
stack, but I have not had a reply yet.

~~~
sabalaba
It works for 20.04 LTS and we've updated the website to reflect this, thanks
for bringing it to my attention!

------
neilv
If you only have $1K or less to spend, and you don't already have a sufficient
PC that you can upgrade with a big GPU...

A non-Threadripper Ryzen, a big GPU, and a big PSU in a big case will go most
of the way for most people, and leave you with an easy incremental upgrade
path for bigger GPUs (or maybe add a second GPU).

Slightly dated info for my current ML server, which is nicely quiet in my
living room, thanks to Noctua: [https://www.neilvandyke.org/machine-learning/](https://www.neilvandyke.org/machine-learning/)

(Side note that's not on that page: I like to use older ThinkPads with
transplanted vintage keyboards for my workstations, so I needed to make a
separate box for the GPU. But life would be easier, with a lot less juggling
complexity, if I simply had the big GPU in the laptop instead.)

~~~
disgruntledphd2
I recently bought a P73 thinkpad specced out like this, and it's great.

However, putting a GPU and lots of RAM into a laptop makes it very, very
heavy, so it's worth thinking about whether that's acceptable for you.

~~~
0xfaded
I just bought a specced-out 1950X on an X399 board with 64 GB RAM (plus the
case, power, etc.) for about $800, which I think is a fair price for
3-year-old hardware. It needs a GPU, but for my use case it's perfect.

I'm also in Europe, so prices are higher.

~~~
disgruntledphd2
Yeah, me too. I spent a lot of money on this machine, but amortised over about
five years (which is how long my last one lasted), it's acceptable (that's
what I keep telling myself anyway).

------
m0zg
Here's my recommendation (I've built several such machines for my own use):

1\. Go with a 1600W PSU from EVGA or Corsair. Other brands are hit or miss if
you ever need very high current on the rails; this will manifest as your
machine suddenly powering off when all 4 GPUs are hit with data at once (as is
typical at the start of an epoch).

2\. Use a mobo with evenly spaced GPU slots, such as ASRock TRX40 Creator.
That way you can install 4 GPUs eventually and use that 1600W PSU. You also
get 10GbE for distributed training, which is nice.

3\. Don't waste money on a Titan RTX; get 2x 2080 Tis instead, then after a
while get two more. Buy blower cards, which blow hot air _out_ of the case.

4\. Use an extension cable to install the SSD, and do not install it under a
GPU: it'll die eventually due to overheating.

5\. Air cooling is fine

6\. If you have more than 2 GPUs learn how to adjust fan speeds on GPUs. Crank
them to 85-100% while training to prevent throttling.
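On Linux, point 6 can be scripted with `nvidia-settings` (this needs X running
with the Coolbits option enabled, and the attribute names here match recent
proprietary drivers but may vary by version). A sketch that builds the per-GPU
commands:

```python
def fan_commands(gpu_index, percent):
    # Build an nvidia-settings invocation pinning one GPU's fan to a fixed
    # speed; requires X + Coolbits, attribute names may vary by driver.
    return [
        "nvidia-settings",
        "-a", f"[gpu:{gpu_index}]/GPUFanControlState=1",
        "-a", f"[fan:{gpu_index}]/GPUTargetFanSpeed={percent}",
    ]

cmds = [fan_commands(i, 90) for i in range(4)]  # all four cards to 90%
# Run each with subprocess on a real rig; printed here for inspection.
for cmd in cmds:
    print(" ".join(cmd))
```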

------
brian_herman__
Here is their list:

PCPartPicker Part List:
[https://pcpartpicker.com/list/Jhyzcq](https://pcpartpicker.com/list/Jhyzcq)

CPU: AMD Threadripper 3960X 3.8 GHz 24-Core Processor ($1348.00 @ Amazon)

CPU Cooler: be quiet! Dark Rock Pro TR4 59.5 CFM CPU Cooler ($89.90 @ Amazon)

Motherboard: MSI TRX40 PRO WIFI ATX sTRX4 Motherboard ($389.99 @ B&H)

Memory: Corsair Vengeance RGB Pro 64 GB (4 x 16 GB) DDR4-3200 CL16 Memory
($329.99 @ Amazon)

Storage: Sabrent Rocket 4.0 2 TB M.2-2280 NVME Solid State Drive ($399.98 @
Amazon)

Video Card: NVIDIA TITAN RTX 24 GB Video Card ($2499.99 @ Newegg)

Case: Corsair Crystal 570X RGB ATX Mid Tower Case ($179.99 @ B&H)

Power Supply: Corsair RMx 1000 W 80+ Gold Certified Fully Modular ATX Power
Supply ($204.99 @ Best Buy)

Case Fan: Corsair LL120RGB LED 43.25 CFM 120 mm Fans 3-Pack ($120.99 @ Best
Buy)

Total: $5563.82

Prices include shipping, taxes, and discounts when available

Generated by PCPartPicker 2020-07-15 11:13 EDT-0400
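For anyone adapting this list, the line items do sum to the stated total; a
quick check:

```python
# Re-adding the listed prices to confirm the quoted total.
prices = {
    "CPU (3960X)": 1348.00, "CPU cooler": 89.90, "Motherboard": 389.99,
    "Memory": 329.99, "Storage": 399.98, "Video card": 2499.99,
    "Case": 179.99, "Power supply": 204.99, "Case fans": 120.99,
}
total = round(sum(prices.values()), 2)
print(total)  # 5563.82, matching the PCPartPicker total
```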

~~~
p1esk
You could spend half as much on every single one of the listed components with
zero impact on your ML productivity. $330 for 64 GB of RAM, really?

~~~
akiselev
That is high-end RAM binned at 3200 MHz. Consumer RAM is mostly 2133/2400,
with servers often using 2666. RAM at 3200 (PC4-25600) gives you about 33%
more peak bandwidth than RAM at 2400 (PC4-19200) and about 20% more than 2666
(PC4-21333).
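For reference, DDR4 peak per-module bandwidth is just the transfer rate times
the 8-byte bus width, so the deltas are easy to compute:

```python
def peak_gbs(mt_per_s):
    # DDR4 peak per-module bandwidth: transfers/s x 8-byte bus width
    return mt_per_s * 8 / 1000  # GB/s

for speed in (2400, 2666, 3200):
    print(speed, peak_gbs(speed))  # 19.2, ~21.3, 25.6 GB/s

print(peak_gbs(3200) / peak_gbs(2400) - 1)  # ~0.33: 3200 vs 2400
print(peak_gbs(3200) / peak_gbs(2666) - 1)  # ~0.20: 3200 vs 2666
```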

~~~
junar
Not true, 3200 CL16 can be nearly as cheap as slower RAM nowadays, with
multiple brands at sub-$60 per 16GB stick[1]. OP is paying extra for RGB, as
another commenter points out.

[1]
[https://pcpartpicker.com/products/memory/#sort=price&U=4&Z=1...](https://pcpartpicker.com/products/memory/#sort=price&U=4&Z=16384001&S=3200,5000&L=30,160)

------
paol
It's worth noting that if your ML work is entirely CUDA-based (as often
happens), you likely won't benefit from a Threadripper CPU. Downgrading to a
Ryzen 9 or even 7 will reduce costs by a good bit. The savings can be pocketed
or put toward a second Titan RTX + NVLink (48 GB usable VRAM).

~~~
colincooke
Should note (from someone who has a few of these systems in my lab):
unfortunately the consumer RTX cards don't do memory pooling. This means that
although NVLink is good for inter-GPU comms, it doesn't actually allow you to
run giant models that need the entire 48 GB of memory for a backwards pass
(i.e., treat the combined cards as "one card"). Not typically a problem for
most people, but worth mentioning.

~~~
paol
From [https://www.nvidia.com/en-us/deep-learning-ai/products/titan-rtx/](https://www.nvidia.com/en-us/deep-learning-ai/products/titan-rtx/):

"NVIDIA TITAN RTX NVLink Bridge

The TITAN RTX NVLink™ bridge connects two TITAN RTX cards together over a 100
GB/s interface. The result is an effective doubling of memory capacity to 48
GB, so that you can train neural networks faster, process even larger
datasets, and work with some of the biggest rendering models."

~~~
colincooke
Yeah, you're not wrong, but it's a bit misleading. This allows you to run
faster, but it does so by letting you use a larger batch size (arguably not
best practice, but your mileage may vary). Memory pooling is a bit different
in that you can treat the combined cards as a single card from TF/PyTorch.

~~~
ivalm
But batch size is probably the least of your problems, since you can do data
parallelism (send half the batch to each GPU, combine on the CPU).

I think a model bigger than GPU memory is the only case where you really wish
for NVLink on V100s.
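The data-parallel recipe described here (shard the batch across GPUs, average
the gradients on the CPU) in a framework-free toy sketch, where `fake_grad`
stands in for a real per-device forward/backward pass:

```python
def fake_grad(batch):
    # Stand-in for one device's forward/backward pass over its shard
    return sum(batch) / len(batch)

def data_parallel_step(batch, n_devices=2):
    # Shard the batch across "devices", then average gradients on the "CPU"
    shard = len(batch) // n_devices
    shards = [batch[i * shard:(i + 1) * shard] for i in range(n_devices)]
    grads = [fake_grad(s) for s in shards]
    return sum(grads) / len(grads)

# Averaging per-shard grads reproduces the full-batch grad (equal shards):
print(data_parallel_step([1.0, 2.0, 3.0, 4.0]))  # 2.5
```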

------
odomojuli
I'm a bit concerned the build uses a Gold certified power supply unit.

Even for cheaper, non-ML workstation builds I would still only use Platinum
and nothing less. I've been told Titanium is excessive, but I leave these
things on for a while and power is expensive.

For the DIY enthusiast or the WFH researcher, the heat involved can also be a
considerable cooling or utility cost, which varies substantially by floor of a
building. Air-cooling this many GPUs, as I've done in the past, is probably
not good but not that bad; it definitely means I pay a lot for A/C in the
summer and almost nothing in the winter.

I think Smerity even said he heated his small bedroom through the San
Francisco winter off of one GPU while researching YOLO.

Point: these things get hot and require a lot of electricity. You should be
concerned about a good PSU even for smaller builds. My energy cost for a 6-GPU
rig ran me about 1/3 of my total rent for a small apartment. That's
electricity BEFORE adding my A/C bill, which was separate and also
substantial. My landlord hates me because I initially talked him into
including it in my rent.

All in all, it still makes sense to keep investing in local workstations and
on-premises builds. No security concerns about the cloud, no futzing around
with integrated notebooks, you own it and you control it, and the up-front
price point is extremely attractive compared to base rates for cloud
computing, even on specialized hardware like a TPU.

The numbers I come up with for batch jobs still show a gap of several thousand
USD most of the time, and then there's how much time it takes and how likely
their service is to break.

So kudos to the person who put in the effort to put this together and share
it. Any and all efforts towards making ML/DS affordable and DIY raise the tide
for all boats.

Question to the audience: Does anyone build GPU rigs like this for
cryptocurrency anymore? I was only able to build a workstation once the price
for GPU cards crashed.

~~~
gameswithgo
The efficiency delta between Gold and Titanium is really small. Optimizing
that for heat reasons would be optimizing less than 1% of total heat output,
and most cases keep the power supply thermally separate from the rest of the
components anyway.

This build has a very oversized Gold power supply; the efficiency would be
~92% with Gold vs. 94% with Platinum. Maybe a smaller Titanium one would be a
better overall choice, I guess.
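To put numbers on it, here's the wasted heat at each efficiency tier, assuming
a hypothetical 900 W sustained DC load:

```python
def wall_draw(dc_load_w, efficiency):
    # 80 Plus efficiency = DC out / AC in, so AC draw = load / efficiency
    return dc_load_w / efficiency

load = 900  # hypothetical sustained DC load in watts
gold, platinum = wall_draw(load, 0.92), wall_draw(load, 0.94)
print(round(gold - load, 1), round(platinum - load, 1))  # 78.3 vs 57.4 W heat
print(round(gold - platinum, 1))  # only ~20.8 W between the two tiers
```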

~~~
odomojuli
Generally I think going jumbo is good for cooling, and honestly the weight of
this build probably doesn't matter, as I imagine it's not getting transported
often.

Conversely, this is not a build for overclocking per se. However, I think it's
a safe assumption we are running at over 90% capacity for multiple days or
even weeks. If batches run over a month, it's probably time to get a server
rack instead?

It is worth noting you're not going to save any money in efficiency on
computation you don't use.

~~~
tedunangst
There's no way this build draws 900W.

------
svnpenn
What do people use ML for these days? I do computer programming, and I have
done some work with video encoding, but this just seems like a huge investment
money-wise. So I am curious what use it is.

For my needs the most intensive things I do are compiling some large programs
or encoding some large video, and you can get a computer for that for like
$800.

~~~
proverbialbunny
ML is typically used to find correlations in data. If something happens over
and over again, there is a high chance it will happen again. Having such an
algorithm that has identified this correlation allows it to identify when it
will happen again. This allows for what is called predictive analytics.

This can be as simple as identifying when a customer will end their service
with a business: there may be a pattern in how previous customers behaved
before leaving, so you can predict when new customers are about to leave and
give them a coupon or similar right before they would otherwise go. This
problem is called customer churn.

It can be as complex as identifying ahead of time when hardware will fail, or
even "bio-ware". For example, I did a project that predicted, with a high
accuracy rate, when people were falling into depression before they could tell
themselves. I also predicted other future medical issues ahead of time, like
the probability that an elderly person will fall over within the next handful
of days.

On the business side there are a lot of use cases for ML, but it falls more
into analytics than engineering, as it's about predictive insight.

------
darknoon
Now is a particularly bad time to build a rig, since new NVIDIA cards are
launching in a couple months. The value of a used 2080Ti (Turing) will tank,
because Ampere cards will be available with similar performance for half the
price.

~~~
CarbyAu
Agreed. I need to update my gaming rig. Waiting for:

\- Ryzen 3

\- the next round of GPUs from both vendors (although ML folks will likely
stay nVidia, of course)

\- with luck, a better PCIe 4 SSD being out by then too

I really wouldn't build one now unless I had to.

------
a2h
Interesting video, thanks for sharing. Just curious if you have one with tests
or benchmarks for the completed build and/or temps at high loads? Would be
cool to see :)

~~~
jeffheaton
Those will be coming!

------
dodo6502
I think that tape-like piece you removed from the SSD compartment is actually
the thermal pad that makes contact between the SSD and the MSI heat-sink
cover, so you may actually want that!

------
highfrequency
Thanks for the video! Could you comment on the differences between the Titan
RTX and the V100? I am a bit confused because the V100 is significantly more
expensive ($7k on Amazon even for the 16GB version) and has a slower clock
speed, yet it is the standard in ML research papers. I see that it has ~10%
more CUDA cores, but it doesn't seem like this would warrant a 3x price
increase.

~~~
Uehreka
The other commenter mentioned the “pro-level” tradeoffs, but there’s something
else too: Nvidia’s licensing won’t let you use GeForce cards in the cloud. If
you’re building a datacenter, you have to use the Teslas.

------
p1esk
2x 2080ti would be faster than titan rtx, provide the same amount of memory,
and would be cheaper.

~~~
colincooke
Unfortunately multi-GPU training doesn't scale linearly yet [1], so it's often
a better call to get one larger card than two smaller ones, at least for the
single-model case.

[1] [https://github.com/keras-team/keras/issues/9204](https://github.com/keras-team/keras/issues/9204)

~~~
BadInformatics
(non-TF) Keras has notoriously bad multi-GPU support, though, and was
generally not well optimized (case in point, the latest version just
re-exports/forwards to tf.keras).

Looking at something like
[https://lambdalabs.com/deep-learning/gpu-benchmarks](https://lambdalabs.com/deep-learning/gpu-benchmarks) or
[https://github.com/tensorpack/benchmarks/tree/master/other-wrappers](https://github.com/tensorpack/benchmarks/tree/master/other-wrappers),
multi-GPU scaling on 2080 Tis seems pretty darn close to linear. Plus, there
are benefits to having more than one accelerator handy on a local workstation.
For one, it's much easier to have multiple experiments running simultaneously
or to run parallel training (e.g. hyperparameter search or RL episodes). Given
that only the uber-expensive enterprise cards have proper virtualization/time
sharing, trying this workflow on a Titan RTX will most likely be suboptimal
unless you _always_ run models that can make use of most of the memory and
compute (no RNNs, no Neural ODEs, etc.).
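For the multiple-experiments workflow, a common approach is to pin each job to
one card via CUDA_VISIBLE_DEVICES. A sketch where `train.py` and its flags are
placeholders, not a real script:

```python
import os

def per_gpu_commands(script, gpu_ids, extra_args=()):
    """Build one (command, env) pair per GPU so each experiment sees
    exactly one device via CUDA_VISIBLE_DEVICES."""
    jobs = []
    for gpu in gpu_ids:
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
        jobs.append((["python", script, *extra_args], env))
    return jobs

# e.g. one hyperparameter trial per 2080 Ti:
jobs = per_gpu_commands("train.py", gpu_ids=[0, 1], extra_args=["--lr", "0.1"])
# Launch each with subprocess.Popen(cmd, env=env) on a real machine.
```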

------
zmmmmm
I am curious about the opposite end of the spectrum. What is the smallest and
cheapest self-contained setup that can be a serviceable development box for
someone doing ML/AI type work? It does not need to run the production load,
but has to be capable enough to allow local development activity that is still
representative.

So far the best I have identified is an Intel NUC8 plus an nVidia GPU via
Thunderbolt. But it is still $1000 at least by the time you have it all
together.

NB: I know lots of people will say, just do it with cloud, but I work in a
setting where much of my data cannot be put in the cloud, and also where the
funding's cost structure allows for fixed capital expenditure but not variable
cloud costs.

~~~
plasticchris
Just buy a case, motherboard, cpu, GPU, ram, psu, and build it. At the extreme
low end you can buy a refurb Dell tower and drop in a new GPU.

------
Jestar342
With the size of air coolers these days, and how they all have integrated heat
pipes, I'm beginning to wonder if the distinction from liquid coolers has
blurred.

Holy moly, is that a big heatsink.

~~~
jrockway
Air cooling is wonderful. I didn't watch this video but I use a giant Noctua
heatsink on my system (Intel 6950X, not quite as demanding as Threadripper but
hardly a cold-running CPU) and I have zero regrets. I have used all-in-one
watercoolers, and ... they just die after a few years. I have owned three and
all three of them died in less than 2 years each. Each time, I very sadly woke
up in the morning with the realization that I'm not going to be using my
computer today. AIOs have not, in my experience, been quieter or yielded
better temps than air cooling. So I am not sure it's worth it.

Obviously a homebuilt watercooling rig is going to be way better than AIOs,
but it requires extensive maintenance. It's water. Stuff will grow in there.
It evaporates. Joints loosen over time and could end up shooting water all
over your $6000 workstation.

To me, it's not worth it. Threadripper requires a big heatsink. So be it.

~~~
smabie
Linus Tech Tips did a video in which they were unable to beat a high-end
air-cooled Noctua setup with any sort of water cooling solution. And by unable
to beat, I mean both temp-wise and sound-wise. Maybe, just maybe, you might be
able to beat air cooling on one dimension with a very expensive custom loop,
but it's totally not worth it.

Water-cooled PCs do look pretty cool though.

~~~
mywittyname
Those air-cooled units with heat pipes are effectively heat pumps, and heat
pumps are crazy efficient at cooling. They rely on evaporative cooling, which
soaks up substantially more energy, so much so that it's possible to get
temperatures noticeably below ambient using the method.

Converting water to steam takes about (from the top of my head) 5x more energy
than it takes to increase it by one degree. I imagine the coolant used in
these heat sinks is at least this efficient, if not more, and is something
that evaporates well below 100°C.

If heat sinks were like using cars and roads to transport people: a
water-cooled setup would be like expanding the roads to increase the number of
cars that can transport people, while heat pipe coolers are like replacing
cars with trains, so you can move a lot more in the same amount of space.

~~~
nkurz
> Converting water to steam takes about (from the top of my head) 5x more
> energy than it takes to increase it by one degree.

I think this greatly underestimates the potential of evaporative cooling. The
"specific heat of water" is 1 calorie/(gram·°C), that is, one calorie can heat
one gram of water by one degree Celsius if no phase change is involved. The
"heat of vaporization of water" is more than 500 calories/gram at 100°C. That
is, the energy necessary to convert a given quantity of liquid water from just
below boiling into steam is not 5x the energy necessary to raise that amount
of water by 1°C, but 500x! You are probably remembering that it is 5x the
energy necessary to take liquid water all the way from 0°C (almost frozen) to
100°C (almost boiling).
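The arithmetic in one quick check, using the textbook value of about 540 cal/g
for the heat of vaporization:

```python
SPECIFIC_HEAT = 1.0           # cal/(g*degC) for liquid water
HEAT_OF_VAPORIZATION = 540.0  # cal/g at 100 degC (about 539.6)

# Boiling off 1 g vs. warming 1 g by one degree:
print(HEAT_OF_VAPORIZATION / SPECIFIC_HEAT)          # 540x, not 5x
# Boiling off 1 g vs. warming 1 g from 0 to 100 degC:
print(HEAT_OF_VAPORIZATION / (SPECIFIC_HEAT * 100))  # 5.4x, the remembered 5x
```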

------
andrewon
When he said training took one day on Google Colab and 20 minutes on his
computer, was he comparing against the Colab CPU runtime? The difference seems
too large.

------
colordrops
Being unfamiliar with ML work, when does it make sense to build one of these
vs spinning up some instances on AWS or gcloud?

~~~
bob1029
I think it really depends on how much you care about ML and how performant you
actually need it to be. If you are a hobbyist or prototyping something
speculatively for work, perhaps a cloud instance is prudent. If ML is your
life's work, I'd probably consider throwing down for a proper rig so you don't
get killed on cloud hosting fees.

------
mikece
Does anyone do a measure of how long it would take such a workstation to pay
for itself (including some nominal amount of operational cost for electricity)
compared to simply doing ML on AWS/Azure/GCP? Seems like such a metric could
be a useful measure for comparing such machines.
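One way to frame such a metric, with every price below a placeholder
assumption (build cost, cloud hourly rate, power draw, and electricity price
all vary) rather than a quoted rate:

```python
def payback_hours(build_cost, cloud_rate_per_hr, power_kw=0.8, kwh_price=0.15):
    # Hours of cloud GPU time at which the workstation pays for itself,
    # net of the electricity it burns while running.
    net_saving_per_hr = cloud_rate_per_hr - power_kw * kwh_price
    return build_cost / net_saving_per_hr

hours = payback_hours(build_cost=5500, cloud_rate_per_hr=3.00)
print(round(hours), round(hours / 24))  # ~1910 hours, ~80 days of training
```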

~~~
CoolGuySteve
A comparable workstation costs about a month of on-demand EC2 time or 3 months
of spot instance time.

AWS GPU instances are really expensive.

The most cost-effective approach imo is to build a workstation for development
and then deploy to AWS spot if you need a cluster.

If you can't use a workstation for whatever reason, then use the new AWS
feature to "stop" spot instances and use the spot instance as your workstation
while being conscious of the high hourly cost and shutting it down when you're
not working.

~~~
FridgeSeal
Azure ML/GPU instances are also really expensive.

I did the maths recently and figured out I could put together a machine with a
couple of 2080 Ti’s and have it pay for itself in a couple of months.

I’m very seriously considering doing it, especially as I’m the only data
scientist. If I had a team I’d be more in favour of going to the effort of
setting up cloud-based training jobs etc.

~~~
CoolGuySteve
That's what my partners and I did. But we bought refurbished 1080 cards for
about $300 each and Ryzen 9 hardware.

We're waiting for the 3000 series to come out which should be a large
performance/dollar improvement over the current gen cards due to the smaller
transistor size.

------
mpfundstein
I have a Threadripper 1920X with 2x 2080 Ti.

When running cpuburn I get around 65°C Tdie, and with gpuburn the upper card
gets to around 86°C and the lower one to 81°C.

Right now I have a water cooler for the CPU, 3 intake fans (bottom and back)
and 2 outlet fans through the water cooler on top. I was wondering what
temperatures I should aim for and what an optimal fan configuration is. I have
a couple of fans lying around.

The case is a Lian Li O11 Air and the mobo is a Taichi X399.

Anyone have any tips?

Also, I would want to use SLI, but then I would have to remove the fans on the
GPU. Do I then need to water cool the GPU, or what is the solution?

If any of the modding pros here can help, that would be awesome :-)

------
potiuper
Please fix title Tiitan typo.

------
dzink
A bigger box helps reduce cooling and power expenses. I built a ThreadRipper
tower with a 2080 Ti last year and used a BeQuiet 900 for it, with very nice
results.

------
peterpost2
Did not expect a video that wholesome.

