
2080 RTX performance on Tensorflow with CUDA 10 - olavgg
https://www.pugetsystems.com/labs/hpc/NVIDIA-RTX-2080-Ti-vs-2080-vs-1080-Ti-vs-Titan-V-TensorFlow-Performance-with-CUDA-10-0-1247/
======
sabalaba
[https://lambdalabs.com/blog/2080-ti-deep-learning-benchmarks...](https://lambdalabs.com/blog/2080-ti-deep-learning-benchmarks/)

These numbers match up with the performance we measured in our own tests
posted last week. The Titan V is simply too expensive for Deep Learning. The
2080 Ti is, by far, the best GPU from a price/performance perspective.

As mentioned in the article, the only real reason you might want a Titan V is
if you care about FP64 performance: i.e., nobody training neural networks.

------
infocollector
The author recommends Titan V - without justifying its $3k price. The 1080
series is less than half that price with comparable benchmarks. Am I missing
something?

~~~
swerner
The benchmarks where the 1080 doesn't even compete - FP16/Tensor Cores.

~~~
bitL
Many state-of-the-art models won't train well in FP16, but for inference it's
extraordinarily good. 2x 1080 Ti is the sweet spot for FP32 training on a
"budget" at the moment.

~~~
jongomez
Got any sources? Was thinking about buying one just for the tensor cores, but
if this is the case I probably won't.

~~~
bitL
You can see it in the author's comments on the original article:

"When I first looked at fp16 Inception3 was the largest model I could train.
Inception4 blew up until I went back to fp32. Mixed precision needs extra
care, scaling of gradients and such. Still I think it is a good thing. What I
really want to test is model size reduction for inference with TensorRT
targeted to tensorcores. I think that is probably the best use case. Non-
linear optimization is just too susceptible to precision loss."

There was also an NVIDIA video presentation recommending mixed FP32/FP16
training instead of pure FP16.

~~~
option
Mixed-precision training can give you tensor core speedups. Paper:
[https://arxiv.org/abs/1710.03740](https://arxiv.org/abs/1710.03740) A toolkit
that implements it on top of TensorFlow:
[https://github.com/NVIDIA/OpenSeq2Seq](https://github.com/NVIDIA/OpenSeq2Seq)
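
A minimal sketch of the static loss-scaling idea from that paper, in TF 1.x
graph style (the toy model, the constant scale of 128 and the variable names
are purely illustrative; OpenSeq2Seq adds dynamic loss scaling and handles the
fp32 master weights for you):

    import tensorflow as tf

    # Toy mixed-precision setup: fp32 "master" weights, fp16 compute, and a
    # constant loss scale so small fp16 gradients don't underflow.
    x = tf.placeholder(tf.float16, [None, 4])
    y = tf.placeholder(tf.float16, [None, 1])

    w = tf.get_variable("w", [4, 1], dtype=tf.float32)  # master weights in fp32
    b = tf.get_variable("b", [1], dtype=tf.float32)

    pred = tf.matmul(x, tf.cast(w, tf.float16)) + tf.cast(b, tf.float16)
    loss = tf.reduce_mean(tf.square(tf.cast(pred - y, tf.float32)))

    loss_scale = 128.0  # static scale; real toolkits pick/adjust this dynamically
    grads = tf.gradients(loss * loss_scale, [w, b])  # gradients stay representable
    grads = [g / loss_scale for g in grads]          # unscale in fp32 before update

    opt = tf.train.GradientDescentOptimizer(0.01)
    train_op = opt.apply_gradients(zip(grads, [w, b]))

The whole trick is that gradients are computed on a scaled-up loss so they
don't vanish in fp16, then unscaled in fp32 before the weight update.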

------
lhlmgr
"I do like the RTX 2080Ti but I just love the Titan V! The Titan V is a great
card and even though it seems expensive from a "consumer" point of view. I
consider it an incredible bargain." .. quite hard to read as a phd student..

~~~
_Wintermute
Compared to other equipment in research environments it's incredibly cheap.

~~~
ekianjo
Usually you don't buy such equipment yourself anyway.

------
shaklee3
Another thing that's not clear from the benchmarks: the Titan V not only has
more tensor cores, it also has much higher memory bandwidth thanks to HBM2.
I'd be curious to see how much that affected the results compared to the
number of cores.

Also, the 2080 Ti can do lower-precision math (INT8/INT4) in the tensor cores,
while the Titan V cannot.

------
visionscaper
Although the RTX 2080 Ti performs significantly better than the 1080 Ti, I'm
still drawn towards the 1080 Ti: I can buy two second-hand 1080 Ti's for the
price of one new 2080 Ti, giving me _double_ the amount of memory, and the
FP32 compute of two 1080 Ti's is much better than that of one 2080 Ti.

I'm using my GPUs to train large sequence-to-sequence models (with long
sequences) that need FP32 for training and can use FP16 for inference
(mixed-precision training), so I can't even use the FP16 performance of the
Tensor Cores for training.

The only disadvantage is that the energy costs are higher with two 1080 Ti's
than with one 2080 Ti.

------
mychael
Does anyone know why they are using Xeon processors instead of AMD
Threadrippers? Is it the support for ECC memory? If so, why is that so
important?

Example:
[https://www.pugetsystems.com/nav/peak/tower_single/customize...](https://www.pugetsystems.com/nav/peak/tower_single/customize.php)

~~~
celrod
The Xeon W-2175 has avx-512. Threadrippers can't compete on number crunching
per dollar when the code is well optimized.

sgemm on 5000x5000 matrices takes about 600 ms on a Threadripper 1950X, but
only around 150 ms on the comparably priced i9-7900X. Vector libraries for
special functions, e.g. Intel VML or SLEEF, show a similar performance
advantage.
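
If you want to sanity-check that on your own machine, here's a rough numpy
version of that sgemm timing (this assumes numpy is linked against an
optimized BLAS such as MKL or OpenBLAS, which is where the vector kernels come
from; absolute numbers will vary):

    import time
    import numpy as np

    # Rough sgemm check: 5000x5000 single-precision matrix multiply. The
    # absolute time depends on which BLAS numpy is linked against and
    # whether it has AVX-512 kernels for your CPU.
    n = 5000
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)

    a @ b                                  # warm-up (thread pool, page faults)
    t0 = time.perf_counter()
    c = a @ b
    print(f"sgemm {n}x{n}: {time.perf_counter() - t0:.3f} s")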

If you're mostly crunching numbers, and either compiling the code you run
with avx512 enabled (e.g. -mprefer-vector-width=512 on gcc, otherwise it's
disabled) or using explicitly vectorized libraries, you will see dramatically
better performance from avx512, regardless of any thermal throttling. Number
crunching is what it's made for.

Granted, you should be offloading most of those computations to the GPU, which
will be many times faster. But if you're in the business of ML or statistics,
I'd still weigh that more heavily than the difference in how long it takes to
compile code.

~~~
AnthonyMouse
> Granted, you should be offloading most of those computations to the GPU,
> which will be many times faster. But if you're in the business of ML or
> statistics, I'd still weigh that more heavily than the difference in how
> long it takes to compile code.

I don't follow the logic. It sounds like you're saying that if you care about
that specific type of highly vectorized computation being fast what you really
want is a GPU rather than any particular CPU. So how should that have a major
influence on which CPU you choose? Particularly when the CPU which is slower
at that is faster at many other things that _aren't_ suitable for a GPU.

~~~
celrod
I'm saying there is a reason to favor a CPU with avx512. The reason may not
apply to you / your work flow.

If your number crunching is just neural networks on your GPU, then the CPU
doesn't matter.

But there's probably a lot of overlap between the folks who train neural
networks, and those who may do linear algebra, MCMC, or traditional stats that
are much better suited to the CPU. That is, conditioning on person A being
someone who trains NNs, there is a higher probability that they're someone who
would be interested in CPU intensive tasks that benefit from vectorization. If
that isn't you, don't factor it into your decision.

I do most of my number crunching on the CPU, so my choice is clear. The
reviews of avx512 are generally poor ("disable it so you don't get thermal
throttling!"), while the Threadrippers receive a lot of praise. But within its
own niche (linear algebra, many iterative algorithms), the widest vectors are
king.

~~~
AnthonyMouse
Isn't linear algebra one of the other things GPUs are good at?

I think you're also looking at the release prices for the CPUs rather than
the current ones. Using today's prices from Newegg, the Threadripper 1950X is
$699 and the (newer/faster) 2950X is $859, while the i9-7900X is $1275, _up_
from its $989 release price, presumably due to Intel's current manufacturing
issues. And the AMD processors have 60% more cores/threads with generally
equivalent per-thread performance, avx notwithstanding.

I expect you're right that there are niche workloads where avx512 is a real
advantage, but it's starting from a pretty deep hole on the price/performance
front in general.

------
bigmit37
Are there benefits to using FP32 vs FP16? I've been dabbling with deep
learning but I'm not really sure how much effect the higher precision is
having. Though more precision is better, I suppose.

~~~
bitL
Traditionally, deep learning frameworks all used FP32.

With FP16 you can theoretically get 2x the speed and 2x larger models with the
same VRAM capacity. For inference with INT8/INT4 it can be even better (good
for embedded stuff). The downside is that sometimes more complex/deep models
don't converge (or converge less reliably than with FP32), and sometimes there
are framework issues with more advanced FP16 features.
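
You can see the precision problem directly in, say, numpy: a typical small
weight update just rounds away in FP16, which is part of why mixed-precision
recipes keep FP32 master weights and scale the loss:

    import numpy as np

    w32 = np.float32(1.0)
    w16 = np.float16(1.0)
    step = 1e-4                      # a typical (learning rate * gradient) update

    print(w32 - np.float32(step))    # 0.9999 -> the update is applied
    print(w16 - np.float16(step))    # 1.0    -> the update rounds away entirely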

------
zrav
Granted, the benchmark only covers training, but for a chip that spends
significant die space on dedicated AI circuitry the performance gain over the
previous generation is disappointing.

