
RTX3080 TensorFlow and NAMD Performance on Linux - optimalsolver
https://www.pugetsystems.com/labs/hpc/RTX3080-TensorFlow-and-NAMD-Performance-on-Linux-Preliminary-1885/
======
SloopJon
Interesting to see that an RTX 3080 is faster than dual RTX 2080s, and more
than twice as fast as a single RTX 2080, which is consistent with one of NVIDIA's
claims.

~~~
formerly_proven
RTX 2080 - 10 TFLOPS, 448 GB/s

RTX 3080 - ~30 TFLOPS, 760 GB/s

So for most compute workloads you'd expect anywhere from a 70% to 200%
performance increase. Note the significant increase in operational intensity
in this generation, to about 160 FLOPs per float load/store (up from ~90). The
fact that we're not seeing the 200% increase more widely can point at things like
this being a problem for existing applications (so the RTX 3080 is _even more_
memory constrained than previous cards [1]), and perhaps also at some applications
struggling to feed enough work items / scheduling issues in general.

[1] Alternative view: Even more operations you can do for free when you have
to process a given buffer anyway!
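
A minimal sketch of that operational-intensity arithmetic, assuming the spec
figures above and 4-byte floats:

    # FLOPs available per float moved = peak FLOP/s / (bytes per second / 4)
    def flops_per_float(peak_tflops, bandwidth_gbs, bytes_per_float=4):
        floats_per_second = bandwidth_gbs * 1e9 / bytes_per_float
        return peak_tflops * 1e12 / floats_per_second

    print(flops_per_float(10, 448))  # RTX 2080: ~89
    print(flops_per_float(30, 760))  # RTX 3080: ~158

Any kernel doing fewer FLOPs than that per float it touches is bandwidth-bound
rather than compute-bound, which is the sense in which the 3080 is relatively
more memory constrained.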

~~~
nothis
> ~30 TFLOPS

Jesus.

~~~
Erwin
The top supercomputer on the November 2001 TOP500 list had 7.2 TFLOPS:
[https://www.top500.org/lists/top500/2001/11/](https://www.top500.org/lists/top500/2001/11/)

It's fun to compare it with the historical top500 list:
[https://www.top500.org/statistics/perfdevel/](https://www.top500.org/statistics/perfdevel/)

~~~
eslaught
I wonder how much that machine cost...

------
stared
For a comprehensive benchmark of various deep learning operations on different
GPU cards, see [http://ai-benchmark.com/ranking_deeplearning.html](http://ai-benchmark.com/ranking_deeplearning.html).
You can also run it on your own computer! (I did, just to check that all drivers
were installed properly and that my results match those of other users.)
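
If I remember right, that ranking is built around the ai-benchmark package on
PyPI, so a minimal local run looks roughly like this (the package name and API
here are assumed from its documentation, not taken from the linked page):

    # pip install tensorflow ai-benchmark
    from ai_benchmark import AIBenchmark

    benchmark = AIBenchmark()  # uses whichever device TensorFlow detects
    results = benchmark.run()  # runs the inference and training tests, prints a score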

------
ilovefood
I know these are just preliminary results, but I am really excited to get my
hands on a few of those cards. This is absolutely amazing! Thanks for sharing
the benchmark.

~~~
mhh__
The fact that a card that isn't even top of the line (the 3080) is knocking on
the door of 8K gaming is amazing to me.

It's still not cheap, but who says Moore's law is dead (I know it's apples and
oranges, but a 60% gen-to-gen performance gain is great). The first games I played
were already 3D and looked OK, but the idea that we'll probably be playing
movie-quality raytracing within a decade is really something to look forward
to.

~~~
jiggawatts
What's _really_ freaky is that the RTX 30xx series of cards aren't even being
manufactured on the current-gen process!

NVIDIA is using Samsung's 8nm process, which has about 60 million transistors
per square millimetre (MTr/mm^2).

That's not cutting edge! The crown is currently held by TSMC's 5nm process, at
173 MTr/mm^2.

Some time next year, TSMC is starting "risk production" of their 3 nm process,
which is expected to hit about 300 MTr/mm^2. That's a solid FIVE TIMES higher
density than the process used for the RTX 30xx series.

Unlike general-purpose CPUs, where transistor density does not linearly
translate to real-world performance, GPUs are designed for embarrassingly
parallel problems and have nearly linear scaling. More transistors equals more
"CUDA cores" equals more performance.

The only thing holding back GPU performance is memory bandwidth. Current-gen
_consumer_ cards are just shy of 1 TB/s of memory bandwidth, but to match a 5x
increase in compute they would need roughly 5 TB/s of memory throughput. That's...
difficult. Even with HBM2E, you'd need to stack quite a few of them to get near
that.
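
A quick back-of-envelope on that (the per-stack HBM2E bandwidth below is my own
assumption based on the 3.6 Gbps/pin spec, not a number from the article):

    # Keep FLOPs per byte constant while scaling compute 5x.
    current_bw_tbs = 0.94                 # ~1 TB/s on current consumer cards
    needed_bw_tbs = 5 * current_bw_tbs    # ~4.7 TB/s

    hbm2e_stack_gbs = 460                 # assumed peak bandwidth of one HBM2E stack
    stacks_needed = needed_bw_tbs * 1000 / hbm2e_stack_gbs
    print(needed_bw_tbs, stacks_needed)   # roughly 10 stacks to get there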

But yeah. 8K gaming is crazy. Real time raytracing was an utter fantasy just a
few years ago, and I just played through Control at 60fps and it was a visual
feast.

I grew up in an era where wire frame 3D graphics took seconds to redraw the
screen. I used keyboard macros to control a CAD program because it had no hope
of keeping up with mouse movements.

My unborn son is going to grow up to play in a world of 8K raytracing as
standard, with visuals better than Pixar movies of just a few years ago. That
blows my mind.

~~~
ekianjo
> Real time raytracing was an utter fantasy just a few years ago

We still don't have full real-time ray-tracing. Even the demos that use only
ray-tracing cast the bare minimum of rays and then apply a series of complex
filters (using machine learning) to remove the artifacts and the noise.

~~~
rrss
By this logic, we basically still don't even have offline ray-tracing.

Both RenderMan and Hyperion have denoisers that Pixar and Disney use on
feature films.

> And because we could not afford to render images to convergence, we needed
> to develop a robust denoising solution

> The denoiser is used on most production shots, and is run automatically
> unless disabled

[https://www.yiningkarlli.com/projects/hyperiondesign/hyperio...](https://www.yiningkarlli.com/projects/hyperiondesign/hyperiondesign.pdf)

(The fact that Hyperion uses a denoiser does not mean that it isn't rendering
via path tracing. Similarly, the fact that real-time rendering uses a denoiser
does not mean it isn't rendering via path tracing. Dealing with noise is the
name of the game, and machine learning isn't somehow "out of bounds".)

------
fluffything
Do the ResNet results use the new sparsity feature? I'd be interested to know
what impact that has.

~~~
whatever1
What is this feature, and can it work on generic matrices? It could be a game
changer in physics and operations research, where the matrices are sparse.

~~~
the_svd_doctor
I don’t think “sparsity” in ML, where something like 10-50% of the matrix is
sparse, is the same as sparsity in physics, where 99.999% (add more 9s for
larger problems) of your matrix is sparse.
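
For context, my understanding is that the Ampere feature is 2:4 fine-grained
structured sparsity (at most 2 non-zeros in every group of 4 weights), which is
a very different regime from a scientific sparse matrix. A toy sketch of the
two regimes, with illustrative sizes only:

    import numpy as np
    from scipy.sparse import random as sparse_random

    # Ampere-style 2:4 structured sparsity: zero the 2 smallest of every 4 weights,
    # leaving exactly 50% non-zeros in a pattern the sparse tensor cores can exploit.
    w = np.random.randn(8, 8).astype(np.float32)
    groups = w.reshape(-1, 4)                            # view into w
    smallest = np.argsort(np.abs(groups), axis=1)[:, :2]
    np.put_along_axis(groups, smallest, 0.0, axis=1)
    print(1 - np.count_nonzero(w) / w.size)              # 0.5

    # Physics/OR-style sparsity: a 10,000 x 10,000 system with ~1,000 non-zeros total.
    a = sparse_random(10_000, 10_000, density=1e-5, format="csr")
    print(1 - a.nnz / (a.shape[0] * a.shape[1]))         # ~0.99999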

------
lilSebastian
If anyone has access to a 3080, a full benchmark of hashcat would be fantastic:

    hashcat --benchmark-all

~~~
_kbh_
No full benchmark yet, but this looks promising.
[https://twitter.com/hashcat/status/1306937641653465090](https://twitter.com/hashcat/status/1306937641653465090)

~~~
lilSebastian
Thank you

~~~
_kbh_
This appears to be a full list, but I cannot verify whether it's real.

[https://gist.github.com/Chick3nman/bb22b28ec4ddec0cb5f59df97...](https://gist.github.com/Chick3nman/bb22b28ec4ddec0cb5f59df97c994db4)

Here's a list for the 2080 Ti for comparison.

[https://gist.github.com/binary1985/c8153c8ec44595fdabbf03157...](https://gist.github.com/binary1985/c8153c8ec44595fdabbf03157562763e)

------
tasubotadas
It is quite interesting that on FP16 the RTX Titan kicks the RTX 3080's ass,
while on FP32 it's the opposite.

~~~
liuliu
The RTX Titan's FP16 number is actually a bit foreign to me (3.5x faster than
FP32). Not sure what's going on (maybe they tweaked the batch size to fit more
in GPU memory?). Need to wait for more comprehensive benchmarks.

(The older numbers look more in line with what I remember, and an alternative
benchmark for the RTX Titan showed similar results:
[https://lambdalabs.com/blog/titan-rtx-tensorflow-benchmarks/](https://lambdalabs.com/blog/titan-rtx-tensorflow-benchmarks/))

~~~
dr_zoidberg
You're using half as much memory per value for something that is vaguely O(n^2),
so there's your roughly 4x speedup. But since things aren't perfect, it ends up
being a bit lower.

------
xiphias2
They look great, but I would be much more interested in time to reach a specific
accuracy on the validation set. Images/second doesn't really matter when the
computation may not be exactly the same (FP16 vs FP32, sparsity, as mentioned
in another comment).

------
justicezyx
Are there still license terms meddling with the use of gaming cards for deep
learning?

~~~
SloopJon
From the GeForce software license:

"The SOFTWARE is not licensed for datacenter deployment, except that
blockchain processing in a datacenter is permitted."

[https://www.nvidia.com/en-us/drivers/geforce-license/](https://www.nvidia.com/en-us/drivers/geforce-license/)

This doesn't preclude use in a workstation, but they don't want you building a
DGX A100 competitor using RTX 3080 or 3090 cards.

------
geogra4
So is the single-slot graphics card more or less dead?

