
DeepLearning11: 10x Nvidia GTX 1080 Ti Single Root Deep Learning Server - tim_sw
https://www.servethehome.com/deeplearning11-10x-nvidia-gtx-1080-ti-single-root-deep-learning-server-part-1/
======
scottlegrand2
It's nice to see Nvidia's behavior here is more widely known now. To this day,
I do not understand the justification for asserting that one cannot install
GeForce GPUs wherever one wants to install them. I really wish that Nvidia
felt secure enough that killer Tesla features like NVLink and the Tensor Cores
were sufficient product differentiation to make purchasing Tesla GPUs worth
the price.

And that's because, to the best of my knowledge, most of the CUDA ecosystem
out there was developed on GeForce GPUs.

There is currently no GeForce equivalent of Volta at a time when the
underlying programming model has undergone some traumatic changes that really
alter the way to write efficient code going forward. If the only way to access
Volta GPUs turns out to be AWS instances at $25/hour or $150,000 DGX-1V
servers plus hosting costs, I suspect a lot of existing CUDA code will bitrot.

Imagine a near future where AMD Vega GPUs are faster than the GTX 1080 Ti at
FP16 training and inference for deep learning. Without some sort of successor
to that GPU, I really think that could happen, because Nvidia went out of its
way to cripple FP16 performance on GeForce.

~~~
dharma1
The 1080 Ti doesn't have double-rate FP16; Vega 64 should be more than twice
as fast for half-precision training/inference. The problem is that AMD has
been lacking in framework support and fast optimised kernels. This seemed
promising -
[https://news.ycombinator.com/item?id=15516166](https://news.ycombinator.com/item?id=15516166)

~~~
scottlegrand2
It's worse than that: the FP16 MAD on the GTX 1080 Ti is significantly slower
than converting 2 FP16 numbers to FP32, performing the math in FP32, and
accumulating the results in FP32. Had it been the same speed, I don't think I
would be anywhere near as annoyed as I am with this crippling.
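
For concreteness, a minimal sketch of the two paths being compared (kernel
names hypothetical; on GP102, native FP16 math runs at 1/64 of the FP32 rate,
so the conversion path wins despite the extra instructions):

    #include <cuda_fp16.h>

    // Path 1: native FP16 multiply-add (the slow path on GP102 / GTX 1080 Ti)
    __global__ void fp16_native(const __half2 *a, const __half2 *b,
                                __half2 *c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = __hfma2(a[i], b[i], c[i]);
    }

    // Path 2: convert each FP16 operand to FP32, then multiply-add and
    // accumulate in FP32 (much faster on GP102 despite the conversions)
    __global__ void fp16_via_fp32(const __half *a, const __half *b,
                                  float *acc, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            acc[i] = fmaf(__half2float(a[i]), __half2float(b[i]), acc[i]);
    }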

That said, at Vega's 26 or so FP16 TFLOPS, it wouldn't be hard for NVIDIA to
release a GeForce Volta with 30-40 tensor core TFLOPS, one that both stomps on
Vega and remains significantly inferior to the V100. Given how hard it is to
program Volta optimally, I'm surprised they haven't done so already, if only
as a Titan XV Edition that can only be purchased from their website.
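
The difficulty: tensor cores are only reachable through CUDA 9's warp-level
WMMA API, where a full warp cooperates on each matrix tile. A minimal sketch
of a single 16x16x16 tile (FP16 inputs, FP32 accumulation, sm_70 only):

    #include <mma.h>
    using namespace nvcuda;

    // One warp cooperatively computes D = A*B + C on a 16x16x16 tile
    __global__ void wmma_tile(const half *a, const half *b, float *d) {
        wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
        wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
        wmma::fragment<wmma::accumulator, 16, 16, 16, float> acc;

        wmma::fill_fragment(acc, 0.0f);            // C = 0
        wmma::load_matrix_sync(a_frag, a, 16);     // leading dimension 16
        wmma::load_matrix_sync(b_frag, b, 16);
        wmma::mma_sync(acc, a_frag, b_frag, acc);  // the tensor core op
        wmma::store_matrix_sync(d, acc, 16, wmma::mem_row_major);
    }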

~~~
dharma1
Yeah, it will be interesting to see if they can bring the price of GPUs with
tensor cores down; from what I've read, they're too expensive to make for the
consumer market right now.

BTW - do you happen to know which DL frameworks currently support
mixed-precision training with Volta tensor cores? Curious to see if AWS V100
instances can really do 120 TFLOPS as advertised. I think the latest versions
of CUDA/cuDNN support Volta now?
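
For reference, the library-level hook I mean is cuBLAS's math-mode switch plus
the mixed-precision GemmEx entry point - a minimal sketch (assuming CUDA 9's
cuBLAS; the wrapper name is hypothetical):

    #include <cublas_v2.h>
    #include <cuda_fp16.h>

    // Hypothetical wrapper: FP16 inputs, FP32 accumulation/output,
    // tensor cores opted into via the math mode (CUDA 9+, sm_70)
    void gemm_tensor_op(cublasHandle_t h, int m, int n, int k,
                        const __half *A, const __half *B, float *C) {
        const float alpha = 1.0f, beta = 0.0f;
        cublasSetMathMode(h, CUBLAS_TENSOR_OP_MATH);
        cublasGemmEx(h, CUBLAS_OP_N, CUBLAS_OP_N, m, n, k,
                     &alpha, A, CUDA_R_16F, m,
                             B, CUDA_R_16F, k,
                     &beta,  C, CUDA_R_32F, m,
                     CUDA_R_32F, CUBLAS_GEMM_DFALT_TENSOR_OP);
    }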

------
jamesblonde
Commodity GeForce Nvidia cards versus enterprise Tesla Nvidia cards is history
repeating itself, with enterprise disks vs commodity disks. We all know who
won in the end (commodity disks, of course).

The 1080 Ti is about 60% of the performance of the P100, but costs $700
instead of $5k. Of course people will try and build these boxes; they have a
much higher ROI than Nvidia's DGX-1. So what do Nvidia do? Try and stop
vendors from selling them!
[https://www.pcgamesn.com/nvidia-geforce-server](https://www.pcgamesn.com/nvidia-geforce-server)

~~~
microcolonel
There are some other knobs that NVIDIA will surely try (has surely tried?) to
tweak as well, such as making the consumer devices deliberately less
power-efficient, fusing off the chip features that make compute workloads more
efficient, or sabotaging the drivers (either in general or on a
per-application basis).

With disks there was always just as much pressure on the consumer side to keep
energy down, cost down, and capacity up, which meant there was no natural
segmentation, and no straightforward unnatural segmentation.

~~~
llukas
I think you're vastly underestimating the amount of work that goes into those
things. What if, instead of doing what you suggest, they simply don't invest
as hard in the gaming parts? Design, QA, QC and support cost money, after all.

------
pmorici
How do they keep those cards cool when they are packed so close together? I
built a cryptocurrency mining rig once, and I had to keep a reasonable amount
of separation between the cards so the side-mounted fans had somewhere to pull
air from.

------
sundvor
Very interesting. I have just a single one of these (a Gigabyte Aorus 1080
Ti), and have recovered a third of the purchase price (or a bit more, with the
latest bitcoin spike) over 4 months of mining.

I'd like to, ah, learn about machine learning, so here's hoping Nvidia doesn't
nerf its deep learning capabilities at the driver level.

------
jamesblonde
I gave a talk at the Spark Summit Europe last week, where I went into detail
on this server and how we can scale out deep learning training on TensorFlow
with it and AllReduce (by Uber):
[https://www.slideshare.net/secret/A7b9rAsLaipg6](https://www.slideshare.net/secret/A7b9rAsLaipg6)

TL;DR: You can scale out distributed TensorFlow training to tens/hundreds of
GPUs with AllReduce on machines like this one, not just on the DGX-1.
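
For anyone curious what that looks like below the framework layer: AllReduce
sums each GPU's gradient buffer and leaves the result on every GPU. A minimal
single-process NCCL sketch (function name hypothetical, error handling
omitted):

    #include <nccl.h>
    #include <cuda_runtime.h>

    // Sum-reduce a gradient buffer of `count` floats across nGPUs GPUs;
    // after this call every GPU holds the summed gradients in place.
    void allreduce_gradients(float **grads, size_t count, int nGPUs) {
        ncclComm_t *comms = new ncclComm_t[nGPUs];
        int *devs = new int[nGPUs];
        for (int i = 0; i < nGPUs; ++i) devs[i] = i;
        ncclCommInitAll(comms, nGPUs, devs);   // one communicator per GPU

        ncclGroupStart();
        for (int i = 0; i < nGPUs; ++i) {
            cudaSetDevice(i);
            ncclAllReduce(grads[i], grads[i], count, ncclFloat, ncclSum,
                          comms[i], 0 /* default stream */);
        }
        ncclGroupEnd();

        for (int i = 0; i < nGPUs; ++i) ncclCommDestroy(comms[i]);
        delete[] comms;
        delete[] devs;
    }

Uber's library layers scheduling and the final divide-by-N gradient averaging
on top of a primitive like this.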

~~~
asparagui
can you set your slideshare to public? thanks!

~~~
jamesblonde
Fixed.

------
pantalaimon
Why use a dual Xeon E5-2650 V4 instead of a single Epyc 7401P?

Are there no appropriate mainboards yet?

