Interesting to see that an RTX 3080 is faster than a dual RTX 2080, and more than twice as fast as a single RTX 2080, which is consistent with one of NVIDIA's claims.
So for most compute workloads you'd expect anywhere from a 70% to 200% performance increase. Note the significant jump in operational intensity this generation, to about 160 FLOPs per float load/store, up from ~90 (rough math below). The fact that we're not seeing a 200% increase more widely may point to this being a problem for existing applications (i.e. the RTX 3080 is _even more_ memory constrained than previous cards [1]), and perhaps also to some applications struggling to feed enough work items / scheduling issues in general.
[1] Alternative view: Even more operations you can do for free when you have to process a given buffer anyway!
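To make the operational-intensity numbers concrete, here's the back-of-the-envelope math (the peak-FLOPS and bandwidth figures are approximate published specs, so treat them as assumptions):

```python
# Rough roofline-style arithmetic intensity: peak FP32 FLOPS divided by
# how many floats per second the memory bus can deliver.
BYTES_PER_FLOAT = 4

cards = {
    # name: (approx. peak FP32 TFLOPS, approx. memory bandwidth in GB/s)
    "RTX 2080": (10.1, 448),
    "RTX 3080": (29.8, 760),
}

for name, (tflops, bw_gbs) in cards.items():
    floats_per_sec = bw_gbs * 1e9 / BYTES_PER_FLOAT
    intensity = tflops * 1e12 / floats_per_sec
    print(f"{name}: ~{intensity:.0f} FLOPs per float loaded/stored")

# RTX 2080: ~90, RTX 3080: ~157 -- i.e. you need far more math per byte
# moved to keep the 3080's ALUs busy.
```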
For a comprehensive benchmark of various deep learning operations on different GPU cards, see http://ai-benchmark.com/ranking_deeplearning.html. You can also run it on your own computer! (I did, just to see whether all my drivers were installed properly and my results matched other users'.)
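If you want to try it locally, the ranking comes from the `ai-benchmark` pip package; as I recall, usage is roughly this (check the package docs, the API may have changed):

```python
# pip install tensorflow ai-benchmark
from ai_benchmark import AIBenchmark

benchmark = AIBenchmark()
results = benchmark.run()  # runs the inference + training tests and prints a device score
```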
I know these are just preliminary but I am really excited to get my hands on a few of those cards. This is absolutely amazing! Thanks for sharing the benchmark.
The fact that a card that isn't even top of the line (the 3080) is knocking on the door of 8K gaming is amazing to me.
It's still not cheap, but who says Moore's law is dead? (I know it's apples and oranges, but 60% gen-to-gen performance is great.) The first games I played were already 3D and looked OK, but the idea that we'll probably be playing movie-quality raytracing within a decade is really something to look forward to.
What's really freaky is that the RTX 30xx series of cards aren't even being manufactured on the current-gen process!
NVIDIA is using Samsung's 8nm process, which has about 60 million transistors per square millimetre (MTr/mm^2).
That's not cutting edge! The crown is currently held by TSMC's 5nm process, at 173 MTr/mm^2.
Some time next year, TSMC is starting "risk production" of their 3 nm process, which is expected to hit about 300 MTr/mm^2. That's a solid FIVE TIMES higher density than the process used for the RTX 30xx series.
Unlike general-purpose CPUs, where transistor density does not linearly translate to real-world performance, GPUs are designed for embarrassingly parallel problems and have nearly linear scaling. More transistors equals more "CUDA cores" equals more performance.
The only thing holding back GPU performance is memory bandwidth. Current-gen consumer cards are just shy of 1 TB/s of memory bandwidth, but to get 5x performance, they would need 5 TB/s memory throughput to match. That's... difficult. Even with HBM2E, you'd need to stack a bunch of them to get near that.
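Rough math on that (the per-stack figure is my own assumption, roughly the advertised upper bound for HBM2E):

```python
import math

# How many HBM2E stacks would ~5 TB/s take, assuming ~460 GB/s per stack?
target_gbs = 5000
per_stack_gbs = 460
print(math.ceil(target_gbs / per_stack_gbs))  # ~11 stacks, versus the handful on today's HPC GPUs
```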
But yeah. 8K gaming is crazy. Real time raytracing was an utter fantasy just a few years ago, and I just played through Control at 60fps and it was a visual feast.
I grew up in an era where wire frame 3D graphics took seconds to redraw the screen. I used keyboard macros to control a CAD program because it had no hope of keeping up with mouse movements.
My unborn son is going to grow up to play in a world of 8K raytracing as standard, with visuals better than Pixar movies of just a few years ago. That blows my mind.
> Real time raytracing was an utter fantasy just a few years ago
We still don't have real time ray-tracing. Even the demos that only use ray-tracing are throwing the strict minimum of rays and they apply a series of complex filters (using machine learning) to remove the artifacts and the noise.
(The fact that Hyperion uses a denoiser does not mean that it isn't rendering via path tracing. Similarly, the fact that real-time rendering uses a denoiser does not mean it isn't rendering via path tracing. Dealing with noise is the name of the game, and machine learning isn't somehow "out of bounds".)
If anything, that's more impressive in a way. The NVIDIA presentation for the 30xx series was the first time it really "clicked" for me, despite having a 2080 in my PC for a while now.
The RTX cards dedicate something like 30% of the silicon and power "budget" to neural-nets.
This isn't something designed to appease only the ML crowd; it's used for gaming. The graphics you see are 30% ML noise reduction and upscaling.
I used to think of "AI accelerators" as some sort of gimmick, one-trick ponies like Apple's face recognition. Useful for a handful of apps, a few seconds at a time.
But no, in the RTX series of cards, the "AI stuff" is drawing nearly 100 watts and doing real work, making ray tracing viable and making 1080p look better than 4K.
Well, we are still talking about a few million rays per second. Without filtering you will end up with too much noise and blobs. Raytracing without noise filters is impractical and too expensive even for prerendered scenes.
The challenge is heat (and cost) for the latest gen. GPUs are already prone to heat problems; I can see the latest process requiring liquid cooling with their current design(s).
Is it a problem, though? Enthusiasts have been using liquid cooling for years. Sure, a hot computer, especially in a warm climate, is not the best thing to have around, but I guess that's not the end of the world. Air conditioners do exist, after all.
> What's really freaky is that the RTX 30xx series of cards aren't even being manufactured on the current-gen process!
Exactly. This is why I'm not too worried about Intel not being on the smallest current-gen process, even though AMD fans are jumping up and down about it.
Something noteworthy: a larger node isn't necessarily slower than a smaller one (mature nodes can clock quite well), so Samsung's 8nm isn't going to be meaningfully slower than TSMC's 7nm.
The reason a smaller node is better, beyond cost savings, is heat output and power consumption. If heat becomes a problem, the chip has to throttle itself, as we're seeing with laptops. Nvidia's new cooling solution seems to fit the bill just fine, so 8nm is no problem.
I suspect the next generation is going to be on 7nm, is going to be a bit faster and will consume a fair bit less power, which will be nice, especially if you plan on training neural nets all day.
> The only thing holding back GPU performance is memory bandwidth.
Cooling and getting enough power to the chips is going to be very challenging. The 3080 already needed a redesign of its cooling solution and power delivery mechanism. At 5nm or 3nm, things are going to be a lot more difficult.
Moore's law is about transistors. It's not dead, it's alive.
The problem is that you can't use that law to increase the speed of CPUs. You don't need more transistors, you need faster transistors, and Moore's law does not help with that. We had 4 GHz 15 years ago, and most CPUs still run under 4 GHz today. But with more transistors you can implement some common operations in hardware (like crypto or vector operations). You can also just increase the core count, or put energy-efficient cores alongside energy-hungry ones. And those things are happening with CPUs. Unfortunately, many workloads are still single-thread capped.
GPUs, on the other hand, are inherently parallel. You need 8K? You just need roughly 4x the transistors compared to 4K. So GPUs will keep evolving, and there's no limit, at least until we hit the transistor wall.
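(The 4x is just pixel count; a quick sanity check:)

```python
# 8K has exactly 4x the pixels of 4K and 9x the pixels of 1440p.
resolutions = {"1440p": (2560, 1440), "4K": (3840, 2160), "8K": (7680, 4320)}
pixels = {name: w * h for name, (w, h) in resolutions.items()}
print(pixels["8K"] / pixels["4K"])     # 4.0
print(pixels["8K"] / pixels["1440p"])  # 9.0
```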
You have to compare performance per watt, though, to see the real gains. They can't keep pumping more and more watts through these things indefinitely; surely we are already at the limit.
This is one thing that impressed me about the 3080: it looks like two of these will be around the performance (for some compute tasks) of four 2080 Tis. The four 2080 Tis would max out at ~1 kW, the two 3080s at ~700 W.
The 3090 has a TDP of 350 W, which is a lot, but it's not unprecedented either. There have definitely been cards in that region going back to the Fermi days (the 590?).
Keep in mind that video games are optimizations on top of hacks on top of optimizations on top of hacks on... This is necessary, because we simply don't have the number-crunching capability to simulate the video game world better. If developers could have that horsepower, then they'd probably use it. In other words: as people get faster hardware video games will start using more computational power. This allows developers to scale back some optimizations and usually allows some different styles of games.
Another thing to consider is that the number of pixels pushed to your screen is one thing. Another thing is that you will need higher quality assets with more detail too.
It's funny to me that some people complain about Nvidia price gouging etc., but if the 3080 sales are any indication, then by basic supply and demand they are, if anything, massively underpriced. Lowering the prices would seem to just mean more people disappointed at not actually being able to get one, because the supply is so limited. The little I know about chip manufacturing also suggests that the fabs have likely been producing GA102s at full capacity, and it's not like you can get another fab to produce more of them in the blink of an eye, so availability is probably genuinely constrained by production.
I think this is a marketing problem, where selling the 3080 at $1200 would hurt future sales once supply meets demand, even if the price is lowered later. Maybe the optimizing solution would be something (awful) like a car dealership where the price is negotiable based on volatile local factors.
As a complete layperson when it comes to economics, I’m curious if someone has extensively analyzed ‘market value’ vs MSRP over the lifecycle of a product rollout and come up with a workable formula for calculating the optimal price.
I was actually thinking about this after failing to acquire a 3080. Maybe for limited launches like this it would make sense to auction off the supply? It would arguably be a fairer system for consumers than selling out in seconds, and potentially both more profitable and less stressful for Nvidia. Of course the auction system needs to be carefully designed, but maybe something like a uniform-price sealed-bid auction with last-accepted-bid pricing (and a reserve) would work here? Sealed auctions seem especially well suited because they feel less timing critical.
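A minimal sketch of what I have in mind (purely hypothetical, just to illustrate uniform-price / last-accepted-bid pricing with a reserve; ties and anti-bot measures left out):

```python
def run_auction(bids, supply, reserve):
    """bids: list of (bidder, amount) sealed bids for identical items.
    Returns (clearing_price, winning_bidders)."""
    # Keep bids at or above the reserve, highest first.
    eligible = sorted((b for b in bids if b[1] >= reserve), key=lambda b: b[1], reverse=True)
    winners = eligible[:supply]
    if not winners:
        return None, []
    clearing_price = winners[-1][1]  # last accepted bid: every winner pays this
    return clearing_price, [bidder for bidder, _ in winners]

# Example: 3 cards, $700 reserve.
price, winners = run_auction(
    [("a", 1200), ("b", 900), ("c", 850), ("d", 800), ("e", 650)],
    supply=3, reserve=700,
)
print(price, winners)  # 850 ['a', 'b', 'c'] -- everyone pays $850
```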
The Endgame has to be VR at this point - 4k is more than enough for "non-gameplay" games (i.e. I play CSGO at low settings in a stretched resolution but I play Assassin's Creed as hard as she'll go cap'n).
Until the angular resolution of a pixel is good enough an inch from your eye, HMDs will keep pushing to ever-higher resolutions.
I think it's more that if you can render 8K at 60fps, you can probably render 4K at 180fps or more which gives you a lot of wiggle room on a 60hz or even 120hz display.
Even the 3080 is only just able to render most modern games at 144 fps at full HD (!), and only barely manages half of them at 1440p. There is no such thing as practical 4K or 8K rendering, at least not at pleasant refresh rates and without party tricks (DLSS).
The 3080 can push 65 fps on average in Assassin's Creed Odyssey at maxed graphics settings in 4K (https://www.eurogamer.net/articles/digitalfoundry-2020-nvidi...). That's perfectly playable, and Odyssey isn't particularly well optimized (they ran it maxed out, but it has lots of settings you can turn down that make almost zero difference to the detail your eyes actually notice).
The thing is... just play it without raytracing? I love Control, it's a great game, but the raytracing is such a huge gimmick (in this game) that I don't really understand why it dominates every benchmark other than as a curiosity. I have an RTX card, and raytracing on/off in Control is noticeable... for the first two minutes. Then it just blends in with the rest of the look and I don't even notice those super expensive effects anymore. I switched them all off, kept the rest on ultra, and can easily game at 1440p at 100+ fps. The game is still incredibly pretty and has plenty of fantastic visual effects anyway.
The real benefit from RTX will be on the game development side. A lot of games currently don't look that different with RTX off, because a lot of artist time goes into making sure that non-RTX (which are the majority of the player base by far) lighting/effects still look realistic.
I love using a 32" 8k monitor for working with text/code. To me, as a software engineer who stares at code all day, higher quality text rendering is sufficient justification for 8k.
I tried playing a few games at 8k (just for fun) and found that it simply wasn't worth the frame rate hit, or really even that noticeable an improvement with the assets in the games I tried.
One nice property of 8k resolutions is that they have an integer scaling factor from both 4k (2x) and 1440p (3x), so if you have an 8k monitor you can play games at either of those resolutions with high quality scaling.
Yes, the difference between 4k and 8k text is easily noticeable for a 32" monitor, and I'm not particularly eagle-eyed. Text is sharper and finer details are better rendered. For videos and images I don't notice the difference in practice.
I had a 4k 32" monitor at work and found that it simply didn't give sharp, high resolution text, driven by either a Macbook or by a Linux box. And you wouldn't really expect that, either: a 4k 32" monitor is only ~140dpi, which is only marginally higher resolution than the ~100dpi screens we had for many years.
I think the best point of comparison for the Dell UP3218k monitor which I have is a Retina Macbook Pro screen: subjectively, it's a similar experience in terms of text sharpness and legibility (>200 DPI, glossy), just in a 32" form factor.
I suspect 8k at 32" is actually a bit higher resolution than necessary (~280dpi), but there's nothing else on the >30" high resolution monitor market other than Apple's 6k display, which is significantly more expensive.
Be aware that there's no Mac OS support for 8k displays, but Linux and Windows on a desktop with a reasonably modern NVidia GPU work great.
I have a 32" 4K monitor and the text quality is noticeably worse than on my Macbook Pro w/ Retina display and my iPad Pro. The dpi on the UP3218K is actually higher than my iPad (280 vs 264 iirc). That will likely be my next monitor unless another manufacturer introduces a 30-32" 8K monitor in the next few months.
/r/gaming is fuming about scalpers/resellers, and someone (allegedly) made a bot to bid on eBay listings just so they can't sell. Some were going for over $90,000 as of today.
I just think the problem of bots is hugely exaggerated: just at my office, in my small circle of coworkers, I know at least five people who were trying to buy one yesterday. The actual organic demand must have been huge, bots or no bots.
You're not operating on the same definition of paper launch as everyone else. It's generally used also for limited-quantity releases. Lots of chips can be produced in small batches that can't be mass-produced yet.
>The only atrocious thing is the FE edition is 100$ less
Some people are calling this rather scummy behavior on Nvidia's part.
"Nvidia has the cheapest card that this performance level ever!" is what the headlines will say. But if only a few hundred/thousand of these cards show up at this price over the next 6 months or so, and the AIB models go for $100+ more is really more of a way of manipulating headlines with no desire to follow up on the savings.
I don't think "sparsity" in ML, where something like 10-50% of the matrix is zero, is the same as sparsity in physics, where 99.999% (add more 9s for larger problems) of your matrix is zero.
Half the entries in your matrices need to be zero; the hardware will then compress them and execute the matrix-matrix multiplication twice as fast. This happens at the tensor-instruction level.
> Ampere's benefit is that it can deal with dense and sparse matrices differently. Its cores are twice as fast as Turing's for dense matrix and four times as quick for sparse matrix that have all the needless weights removed. The upshot, per SM, is dense processing at the same speed - it has half the cores, remember - and twice the overall throughput for sparse processing.
Yeah, but ResNet does not have sparse weight matrices, so how could it use them? Post-ReLU activations may be sparse, but I don't think that helps when used with a non-sparse Conv2d.
I don't know if there are any white papers with hard details yet (if anyone knows of one, please share!), but nVidia's marketing material[0] for the Ampere architecture claims the following:
"Sparsity is possible in deep learning because the importance of individual weights evolves during the learning process, and by the end of network training, only a subset of weights have acquired a meaningful purpose in determining the learned output. The remaining weights are no longer needed.
Fine grained structured sparsity imposes a constraint on the allowed sparsity pattern, making it more efficient for hardware to do the necessary alignment of input operands. Because deep learning networks are able to adapt weights during the training process based on training feedback, NVIDIA engineers have found in general that the structure constraint does not impact the accuracy of the trained network for inferencing. This enables inferencing acceleration with sparsity."
So the idea seems to be that at the end of training, there's fine-tuning you can do to figure out which weights can be zeroed out without significantly impacting prediction accuracy, and then you can accelerate inference with sparse matrix multiplication. They consider training acceleration with sparse matrices an "active research area."
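A tiny sketch of what that fine-grained structured pruning looks like, assuming the 2:4 pattern (at most two nonzeros per group of four weights); zeroing the smallest-magnitude weights is my own simplification, not necessarily NVIDIA's exact procedure:

```python
import numpy as np

def prune_2_of_4(weights):
    """Zero the two smallest-magnitude entries in every group of four,
    yielding the 50% structured sparsity pattern the sparse tensor cores expect."""
    w = weights.reshape(-1, 4).copy()
    drop = np.argsort(np.abs(w), axis=1)[:, :2]  # indices of the two smallest |w| per group
    np.put_along_axis(w, drop, 0.0, axis=1)
    return w.reshape(weights.shape)

w = np.random.randn(2, 8).astype(np.float32)
print(prune_2_of_4(w))  # each consecutive group of 4 now has exactly 2 zeros
```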
I could see it being nice for running large language models on consumer hardware, or really cool for the few edge-computing applications that can actually demand and power conventional GPUs (e.g. self-driving cars). It's probably not a great boon to the researcher who wants to reduce their iteration time, though.
The fp16 number for the RTX Titan is actually a bit surprising to me (3.5x faster than fp32). Not sure what's going on (maybe they tweaked the batch size to fit more in GPU memory?). Need to wait for more comprehensive benchmarks.
You're using half as much memory for something that is vaguely O(n^2), so there's your roughly 4x speedup. But since things aren't perfect, it ends up being a bit lower.
They look great, but I would be much more interested in time to a specific accuracy on the validation set. Images/second doesn't really matter when the computation may not be exactly the same (FP16 vs FP32, sparsity, as mentioned in another comment).
As far as I understand it, you cannot re-sell (rent out) processing time on a gaming card from Nvidia for machine learning tasks. This is why you don't see Google and Amazon offering virtual machines with gaming cards. However, if you're the end user, you can do whatever you want with the gaming card you bought (not that Nvidia can enforce this).