Interesting to see that an RTX 3080 is faster than a dual RTX 2080, and more than twice as fast as a single RTX 2080, which is consistent with one of NVIDIA's claims.
So for most compute workloads you'd expect anywhere from a 70% to 200% performance increase. Note the significant jump in operational intensity this generation, to about 160 FLOPs per float load/store, up from ~90 (rough math below). The fact that we're not seeing a 200% increase more widely may point to this being a problem for existing applications (i.e. the RTX 3080 is _even more_ memory constrained than previous cards [1]), and perhaps also to some applications struggling to feed enough work items / scheduling issues in general.
[1] Alternative view: Even more operations you can do for free when you have to process a given buffer anyway!
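To make the operational-intensity numbers concrete, here's the back-of-the-envelope math (the peak-FLOPS and bandwidth figures are approximate published specs, so treat them as assumptions):

```python
# Rough roofline-style arithmetic intensity: peak FP32 FLOPS divided by
# how many floats per second the memory bus can deliver.
BYTES_PER_FLOAT = 4

cards = {
    # name: (approx. peak FP32 TFLOPS, approx. memory bandwidth in GB/s)
    "RTX 2080": (10.1, 448),
    "RTX 3080": (29.8, 760),
}

for name, (tflops, bw_gbs) in cards.items():
    floats_per_sec = bw_gbs * 1e9 / BYTES_PER_FLOAT
    intensity = tflops * 1e12 / floats_per_sec
    print(f"{name}: ~{intensity:.0f} FLOPs per float loaded/stored")

# RTX 2080: ~90, RTX 3080: ~157 -- i.e. you need far more math per byte
# moved to keep the 3080's ALUs busy.
```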
For a comprehensive benchmark of various deep learning operations on different GPU cards, see http://ai-benchmark.com/ranking_deeplearning.html. You can also run it on your own computer! (I did, just to see whether all my drivers were installed properly and my results matched other users'.)
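If you want to try it locally, the ranking comes from the `ai-benchmark` pip package; as I recall, usage is roughly this (check the package docs, the API may have changed):

```python
# pip install tensorflow ai-benchmark
from ai_benchmark import AIBenchmark

benchmark = AIBenchmark()
results = benchmark.run()  # runs the inference + training tests and prints a device score
```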
I know these are just preliminary but I am really excited to get my hands on a few of those cards. This is absolutely amazing! Thanks for sharing the benchmark.
The fact that a card that isn't even top of the line (the 3080) is knocking on the door of 8K gaming is amazing to me.
It's still not cheap, but who says Moore's law is dead? (I know it's apples and oranges, but 60% gen-to-gen performance is great.) The first games I played were already 3D and looked OK, but the idea that we'll probably be playing movie-quality raytracing within a decade is really something to look forward to.
What's really freaky is that the RTX 30xx series of cards aren't even being manufactured on the current-gen process!
NVIDIA is using Samsung's 8nm process, which has about 60 million transistors per square millimetre (MTr/mm^2).
That's not cutting edge! The crown is currently held by TSMC's 5nm process, at 173 MTr/mm^2.
Some time next year, TSMC is starting "risk production" of their 3 nm process, which is expected to hit about 300 MTr/mm^2. That's a solid FIVE TIMES higher density than the process used for the RTX 30xx series.
Unlike general-purpose CPUs, where transistor density does not linearly translate to real-world performance, GPUs are designed for embarrassingly parallel problems and have nearly linear scaling. More transistors equals more "CUDA cores" equals more performance.
The only thing holding back GPU performance is memory bandwidth. Current-gen consumer cards are just shy of 1 TB/s of memory bandwidth, but to get 5x performance, they would need 5 TB/s memory throughput to match. That's... difficult. Even with HBM2E, you'd need to stack a bunch of them to get near that.
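Rough math on that (the per-stack figure is my own assumption, roughly the advertised upper bound for HBM2E):

```python
import math

# How many HBM2E stacks would ~5 TB/s take, assuming ~460 GB/s per stack?
target_gbs = 5000
per_stack_gbs = 460
print(math.ceil(target_gbs / per_stack_gbs))  # ~11 stacks, versus the handful on today's HPC GPUs
```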
But yeah. 8K gaming is crazy. Real time raytracing was an utter fantasy just a few years ago, and I just played through Control at 60fps and it was a visual feast.
I grew up in an era where wire frame 3D graphics took seconds to redraw the screen. I used keyboard macros to control a CAD program because it had no hope of keeping up with mouse movements.
My unborn son is going to grow up to play in a world of 8K raytracing as standard, with visuals better than Pixar movies of just a few years ago. That blows my mind.
> Real time raytracing was an utter fantasy just a few years ago
We still don't have real time ray-tracing. Even the demos that only use ray-tracing are throwing the strict minimum of rays and they apply a series of complex filters (using machine learning) to remove the artifacts and the noise.
(The fact that Hyperion uses a denoiser does not mean that it isn't rendering via path tracing. Similarly, the fact that real-time rendering uses a denoiser does not mean it isn't rendering via path tracing. Dealing with noise is the name of the game, and machine learning isn't somehow "out of bounds".)
If anything, that's more impressive in a way. The NVIDIA presentation for the 30xx series was the first time it really "clicked" for me, despite having a 2080 in my PC for a while now.
The RTX cards dedicate something like 30% of the silicon and power "budget" to neural-nets.
This isn't something designed to appease only the ML crowd; it's used for gaming. The graphics you see are 30% ML noise reduction and upscaling.
I used to think of "AI accelerators" as some sort of gimmick, one-trick ponies like Apple's face recognition. Useful for a handful of apps, a few seconds at a time.
But no, in the RTX series of cards, the "AI stuff" is drawing nearly 100 watts and doing real work, making ray tracing viable and making 1080p look better than 4K.
Well, we are still talking about a few million rays per second. Without filtering you will end up with too much noise and blobs. Raytracing without noise filters is impractical and too expensive even for prerendered scenes.
The challenge is heat (and cost) for the latest gen. GPUs are already prone to heat problems; I can see the latest process requiring liquid cooling with their current design(s).
Is it a problem, though? Enthusiasts have been using liquid cooling for years. Sure, a hot computer, especially in a warm climate, is not the best thing to have around, but I guess that's not the end of the world. Air conditioners do exist, after all.
> What's really freaky is that the RTX 30xx series of cards aren't even being manufactured on the current-gen process!
Exactly. This is why I'm not too worried about Intel not being on the smallest current-gen process, even though AMD fans are jumping up and down about it.
Something noteworthy: a larger node isn't necessarily slower than a smaller one (mature nodes can clock quite well), so Samsung's 8nm isn't going to be meaningfully slower than TSMC's 7nm.
The reason a smaller node is better, beyond cost savings, is heat output and power consumption. If heat becomes a problem, the chip has to throttle itself, as we're seeing with laptops. Nvidia's new cooling solution seems to fit the bill just fine, so 8nm is no problem.
I suspect the next generation is going to be on 7nm, is going to be a bit faster and will consume a fair bit less power, which will be nice, especially if you plan on training neural nets all day.
> The only thing holding back GPU performance is memory bandwidth.
Cooling and getting enough power to the chips is going to be very challenging. The 3080 already needed a redesign of its cooling solution and power delivery mechanism. At 5nm or 3nm, things are going to be a lot more difficult.
Moore's law is about transistors. It's not dead, it's alive.
The problem is that you can't use that law to increase the speed of CPUs. You don't need more transistors, you need faster transistors, and Moore's law does not help with that. We had 4 GHz 15 years ago, and most CPUs still run under 4 GHz today. But with more transistors you can implement some common operations in hardware (like crypto or vector operations). You can also just increase the core count, or put energy-efficient cores alongside energy-hungry ones. And those things are happening with CPUs. Unfortunately, many workloads are still single-thread capped.
GPUs, on the other hand, are inherently parallel. You need 8K? You just need roughly 4x the transistors compared to 4K. So GPUs will keep evolving, and there's no limit, at least until we hit the transistor wall.
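(The 4x is just pixel count; a quick sanity check:)

```python
# 8K has exactly 4x the pixels of 4K and 9x the pixels of 1440p.
resolutions = {"1440p": (2560, 1440), "4K": (3840, 2160), "8K": (7680, 4320)}
pixels = {name: w * h for name, (w, h) in resolutions.items()}
print(pixels["8K"] / pixels["4K"])     # 4.0
print(pixels["8K"] / pixels["1440p"])  # 9.0
```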
You have to compare performance per watt, though, to see the real gains. They can't keep pumping more and more watts through these things indefinitely; surely we are already at the limit.
This is one thing that impressed me about the 3080: it looks like two of these will be around the performance (for some compute tasks) of four 2080 Tis. The four 2080 Tis would max out at ~1 kW, the two 3080s at ~700 W.
The 3090 has a TDP of 350 W, which is a lot, but it's not unprecedented either. There have definitely been cards in that region going back to the Fermi days (the 590?).
Keep in mind that video games are optimizations on top of hacks on top of optimizations on top of hacks on... This is necessary, because we simply don't have the number-crunching capability to simulate the video game world better. If developers could have that horsepower, then they'd probably use it. In other words: as people get faster hardware video games will start using more computational power. This allows developers to scale back some optimizations and usually allows some different styles of games.
Another thing to consider is that the number of pixels pushed to your screen is one thing. Another thing is that you will need higher quality assets with more detail too.
It's funny to me that some people complain about Nvidia price gouging etc., but if the 3080 sales are any indication, then by basic supply and demand they are, if anything, massively underpriced. Lowering the prices would seem to just mean more people disappointed at not actually being able to get one, because the supply is so limited. The little I know about chip manufacturing also suggests that the fabs have likely been producing GA102s at full capacity, and it's not like you can get another fab to produce more of them in the blink of an eye, so availability is probably genuinely constrained by production.
I think this is a marketing problem, where selling the 3080 at $1200 would hurt future sales once supply meets demand, even if the price is lowered later. Maybe the optimizing solution would be something (awful) like a car dealership where the price is negotiable based on volatile local factors.
As a complete layperson when it comes to economics, I’m curious if someone has extensively analyzed ‘market value’ vs MSRP over the lifecycle of a product rollout and come up with a workable formula for calculating the optimal price.
I was actually thinking about this after failing to acquire a 3080. Maybe for limited launches like this it would make sense to auction off the supply? It would arguably be a fairer system for consumers than selling out in seconds, and potentially both more profitable and less stressful for Nvidia. Of course the auction system needs to be carefully designed, but maybe something like a uniform-price sealed-bid auction with last-accepted-bid pricing (and a reserve) would work here? Sealed auctions seem especially well suited because they feel less timing critical.
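A minimal sketch of what I have in mind (purely hypothetical, just to illustrate uniform-price / last-accepted-bid pricing with a reserve; ties and anti-bot measures left out):

```python
def run_auction(bids, supply, reserve):
    """bids: list of (bidder, amount) sealed bids for identical items.
    Returns (clearing_price, winning_bidders)."""
    # Keep bids at or above the reserve, highest first.
    eligible = sorted((b for b in bids if b[1] >= reserve), key=lambda b: b[1], reverse=True)
    winners = eligible[:supply]
    if not winners:
        return None, []
    clearing_price = winners[-1][1]  # last accepted bid: every winner pays this
    return clearing_price, [bidder for bidder, _ in winners]

# Example: 3 cards, $700 reserve.
price, winners = run_auction(
    [("a", 1200), ("b", 900), ("c", 850), ("d", 800), ("e", 650)],
    supply=3, reserve=700,
)
print(price, winners)  # 850 ['a', 'b', 'c'] -- everyone pays $850
```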
The Endgame has to be VR at this point - 4k is more than enough for "non-gameplay" games (i.e. I play CSGO at low settings in a stretched resolution but I play Assassin's Creed as hard as she'll go cap'n).
Until the angular resolution of a pixel is good enough an inch from your eye, HMDs will keep pushing to ever-higher resolutions.
I think it's more that if you can render 8K at 60fps, you can probably render 4K at 180fps or more which gives you a lot of wiggle room on a 60hz or even 120hz display.
Even the 3080 is only just able to render most modern games at 144 fps at full HD (!), and only barely manages half of them at 1440p. There is no such thing as practical 4K or 8K rendering, at least not at pleasant refresh rates and without party tricks (DLSS).
The 3080 can push 65 fps on average in Assassin's Creed Odyssey at maxed graphics settings in 4K (https://www.eurogamer.net/articles/digitalfoundry-2020-nvidi...). That's perfectly playable, and Odyssey isn't particularly well optimized (they ran it maxed out, but it has lots of settings you can turn down that make almost zero difference to the detail your eyes actually notice).
The thing is... just play it without raytracing? I love Control, it's a great game, but the raytracing is such a huge gimmick (in this game) that I don't really understand why it dominates every benchmark other than as a curiosity. I have an RTX card, and raytracing on/off in Control is noticeable... for the first two minutes. Then it just blends in with the rest of the look and I don't even notice those super expensive effects anymore. I switched them all off, kept the rest on ultra, and can easily game at 1440p at 100+ fps. The game is still incredibly pretty and has plenty of fantastic visual effects anyway.
The real benefit from RTX will be on the game development side. A lot of games currently don't look that different with RTX off, because a lot of artist time goes into making sure that non-RTX (which are the majority of the player base by far) lighting/effects still look realistic.
I love using a 32" 8k monitor for working with text/code. To me, as a software engineer who stares at code all day, higher quality text rendering is sufficient justification for 8k.
I tried playing a few games at 8k (just for fun) and found that it simply wasn't worth the frame rate hit, or really even that noticeable an improvement with the assets in the games I tried.
One nice property of 8k resolutions is that they have an integer scaling factor from both 4k (2x) and 1440p (3x), so if you have an 8k monitor you can play games at either of those resolutions with high quality scaling.
Yes, the difference between 4k and 8k text is easily noticeable for a 32" monitor, and I'm not particularly eagle-eyed. Text is sharper and finer details are better rendered. For videos and images I don't notice the difference in practice.
I had a 4k 32" monitor at work and found that it simply didn't give sharp, high resolution text, driven by either a Macbook or by a Linux box. And you wouldn't really expect that, either: a 4k 32" monitor is only ~140dpi, which is only marginally higher resolution than the ~100dpi screens we had for many years.
I think the best point of comparison for the Dell UP3218k monitor which I have is a Retina Macbook Pro screen: subjectively, it's a similar experience in terms of text sharpness and legibility (>200 DPI, glossy), just in a 32" form factor.
I suspect 8k at 32" is actually a bit higher resolution than necessary (~280dpi), but there's nothing else on the >30" high resolution monitor market other than Apple's 6k display, which is significantly more expensive.
Be aware that there's no Mac OS support for 8k displays, but Linux and Windows on a desktop with a reasonably modern NVidia GPU work great.
I have a 32" 4K monitor and the text quality is noticeably worse than on my Macbook Pro w/ Retina display and my iPad Pro. The dpi on the UP3218K is actually higher than my iPad (280 vs 264 iirc). That will likely be my next monitor unless another manufacturer introduces a 30-32" 8K monitor in the next few months.
/r/gaming is fuming about scalpers/resellers, and someone (allegedly) made a bot to bid on eBay listings just so they can't sell. Some were going for over $90,000 as of today.
I just think the problem of bots is hugely exaggerated: just at my office, in my small circle of coworkers, I know at least five people who were trying to buy one yesterday. The actual organic demand must have been huge, bots or no bots.
You're not operating on the same definition of paper launch as everyone else. It's generally used also for limited-quantity releases. Lots of chips can be produced in small batches that can't be mass-produced yet.
>The only atrocious thing is the FE edition is 100$ less
Some people are calling this rather scummy behavior on Nvidia's part.
"Nvidia has the cheapest card that this performance level ever!" is what the headlines will say. But if only a few hundred/thousand of these cards show up at this price over the next 6 months or so, and the AIB models go for $100+ more is really more of a way of manipulating headlines with no desire to follow up on the savings.
I don't think "sparsity" in ML, where something like 10-50% of the matrix is zero, is the same as sparsity in physics, where 99.999% (add more 9s for larger problems) of your matrix is zero.
Half the entries in your matrices need to be zero; the hardware will then compress them and execute the matrix-matrix multiplication twice as fast. This happens at the tensor-instruction level.
> Ampere's benefit is that it can deal with dense and sparse matrices differently. Its cores are twice as fast as Turing's for dense matrix and four times as quick for sparse matrix that have all the needless weights removed. The upshot, per SM, is dense processing at the same speed - it has half the cores, remember - and twice the overall throughput for sparse processing.
Yeah, but ResNet does not have sparse weight matrices, so how could it use them? Post-ReLU activations may be sparse, but I don't think that helps when used with a non-sparse Conv2d.
I don't know if there are any white papers with hard details yet (if anyone knows of one, please share!), but nVidia's marketing material[0] for the Ampere architecture claims the following:
"Sparsity is possible in deep learning because the importance of individual weights evolves during the learning process, and by the end of network training, only a subset of weights have acquired a meaningful purpose in determining the learned output. The remaining weights are no longer needed.
Fine grained structured sparsity imposes a constraint on the allowed sparsity pattern, making it more efficient for hardware to do the necessary alignment of input operands. Because deep learning networks are able to adapt weights during the training process based on training feedback, NVIDIA engineers have found in general that the structure constraint does not impact the accuracy of the trained network for inferencing. This enables inferencing acceleration with sparsity."
So the idea seems to be that at the end of training, there's fine-tuning you can do to figure out which weights can be zeroed out without significantly impacting prediction accuracy, and then you can accelerate inference with sparse matrix multiplication. They consider training acceleration with sparse matrices an "active research area."
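A tiny sketch of what that fine-grained structured pruning looks like, assuming the 2:4 pattern (at most two nonzeros per group of four weights); zeroing the smallest-magnitude weights is my own simplification, not necessarily NVIDIA's exact procedure:

```python
import numpy as np

def prune_2_of_4(weights):
    """Zero the two smallest-magnitude entries in every group of four,
    yielding the 50% structured sparsity pattern the sparse tensor cores expect."""
    w = weights.reshape(-1, 4).copy()
    drop = np.argsort(np.abs(w), axis=1)[:, :2]  # indices of the two smallest |w| per group
    np.put_along_axis(w, drop, 0.0, axis=1)
    return w.reshape(weights.shape)

w = np.random.randn(2, 8).astype(np.float32)
print(prune_2_of_4(w))  # each consecutive group of 4 now has exactly 2 zeros
```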
I could see it being nice for running large language models on consumer hardware, or really cool for the few edge-computing applications that can actually demand and power conventional GPUs (e.g. self-driving cars). It's probably not a great boon to the researcher who wants to reduce their iteration time, though.
The fp16 number for the RTX Titan is actually a bit surprising to me (3.5x faster than fp32). Not sure what's going on (maybe they tweaked the batch size to fit more in GPU memory?). Need to wait for more comprehensive benchmarks.
You're using half as much memory for something that is vaguely O(n^2), so there's your roughly 4x speedup. But since things aren't perfect, it ends up being a bit lower.
They look great, but I would be much more interested in time to a specific accuracy on the validation set. Images/second doesn't really matter when the computation may not be exactly the same (FP16 vs FP32, sparsity, as mentioned in another comment).
As far as I understand it, you cannot re-sell (rent out) processing time on a gaming card from Nvidia for machine learning tasks. This is why you don't see Google and Amazon offering virtual machines with gaming cards. However, if you're the end user, you can do whatever you want with the gaming card you bought (not that Nvidia can enforce this).