
Modern AI/ML is increasingly about neural nets (deep learning), whose performance is based on floating point math - mostly matrix multiplication and multiply-and-add operations. These neural nets are increasingly massive, e.g. GPT-3 has 175 billion parameters, meaning that each pass thru the net (each word generated) is going to involve in excess of 175B floating point multiplications!
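To put a rough number on that (a back-of-the-envelope figure, counting one multiply and one add per parameter per forward pass, and ignoring attention over the context and other smaller terms):

    175e9 parameters x 2 ops (multiply + add) ~= 350 billion floating point ops per generated token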

When you're multiplying two large matrices together (or other similar operations) there are thousands of individual multiply operations that need to be performed, and they can be done in parallel since these are all independent (one result doesn't depend on the other).
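To make that concrete, here's a minimal sketch of a (deliberately naive) CUDA matrix multiply kernel - each GPU thread computes one output element of C = A * B, and no thread depends on any other thread's result:

    // Naive matrix multiply: C = A * B, all N x N, row-major.
    // Each thread computes one element of C completely independently.
    __global__ void matmul(const float* A, const float* B, float* C, int N) {
        int row = blockIdx.y * blockDim.y + threadIdx.y;
        int col = blockIdx.x * blockDim.x + threadIdx.x;
        if (row < N && col < N) {
            float acc = 0.0f;
            for (int k = 0; k < N; ++k)
                acc += A[row * N + k] * B[k * N + col];   // multiply-and-add
            C[row * N + col] = acc;
        }
    }

(Real libraries like cuBLAS use heavily tiled versions of this, but the independence of the output elements is the key property.)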

So, to train/run these ML/AI models as fast as possible requires the ability to perform massive numbers of floating point operations in parallel, but a desktop CPU only has a limited capacity to do that, since it is designed as a general purpose device, not just for math. A modern CPU has multiple "cores" (individual processors that can run in parallel), but only a small number (~10), and each core's floating point math is handled by specialized FPU/SIMD units that can only do a handful of operations per clock.

This is where GPU/TPU/etc "AI/ML" chips come in, and what makes them special. They are designed specifically for this job - to do massive numbers of floating point multiplications in parallel. A GPU of course can run games too, but it turns out the requirements for real-time graphics are very similar - a massive amount of parallelism. In contrast to the CPU's ~10 cores, GPUs have thousands of cores (e.g. the NVIDIA GTX 4070 has 5,888) running in parallel, and these are all floating-point capable. This results in the ability to do huge numbers of floating point operations per second (FLOPS), e.g. the GTX 4070 can do 30 TFLOPS (tera-FLOPS) - i.e. 30,000,000,000,000 floating point operations per second!
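That headline number is easy to sanity-check, roughly - assuming 2 FLOPs per core per clock (one fused multiply-add) and the published boost clock of about 2.5 GHz:

    5,888 cores x 2 FLOPs/clock x ~2.5e9 clocks/sec ~= 29 TFLOPS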

This brings us to the second specialization of these GPU/TPU chips - since they can do this ridiculous number of FLOPS, they need to be fed data at an equally ridiculous rate to keep them busy, so they need massive memory bandwidth - way more than the CPU needs to be kept busy. The normal RAM in a desktop computer is too slow for this, and is in any case in the wrong place - on the motherboard, where it can only be accessed across the PCIe bus, which is again way too slow to keep up. GPUs solve this memory speed problem by having a specially designed memory architecture and lots of very fast RAM co-located very close to the GPU chip. For example, that GTX 4070 has 12GB of RAM and can move data from it into its processing cores at a speed (memory bandwidth) of about 500GB/sec!
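A back-of-the-envelope comparison (using the rough figures above) shows why this is still the bottleneck - if every floating point operation had to fetch a fresh 4-byte value from VRAM, the memory system could never keep up:

    ~30e12 FLOPS x 4 bytes   ~= 120 TB/sec of data demand
    actual memory bandwidth  ~= 0.5 TB/sec

so each value loaded from VRAM has to be reused a couple of hundred times (via registers, caches and tiling) just to keep the cores fed.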

The exact designs of the various chips differ a bit (and a lot is proprietary), but they are all designed to provide these two capabilities - massive floating point parallelism, and massive memory bandwidth to feed it.

If you want to get into this in detail, the best place to start would be to look into low level CUDA programming for NVIDIA's cards. CUDA is the lowest level API that NVIDIA provides to program their GPUs.
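As a flavor of what that looks like, here's a minimal (sketch-level) host program that would launch the matmul kernel from earlier - allocate memory the GPU can see, launch a grid of roughly a million threads, wait for them to finish:

    #include <cuda_runtime.h>
    #include <cstdio>

    // Host-side driver for the matmul kernel sketched earlier.
    int main() {
        const int N = 1024;
        size_t bytes = (size_t)N * N * sizeof(float);

        float *A, *B, *C;                      // unified memory keeps the example short
        cudaMallocManaged(&A, bytes);
        cudaMallocManaged(&B, bytes);
        cudaMallocManaged(&C, bytes);
        for (int i = 0; i < N * N; ++i) { A[i] = 1.0f; B[i] = 2.0f; }

        dim3 block(16, 16);                    // 256 threads per block
        dim3 grid((N + 15) / 16, (N + 15) / 16);
        matmul<<<grid, block>>>(A, B, C, N);   // 64 x 64 blocks of 256 threads = ~1M threads
        cudaDeviceSynchronize();

        printf("C[0] = %f\n", C[0]);           // expect 2048 (= N * 1.0 * 2.0)
        cudaFree(A); cudaFree(B); cudaFree(C);
        return 0;
    }

Compile it with nvcc and time it against a plain CPU loop to see the difference for yourself.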




A few finer points:

1 - It's RTX 4070, not GTX 4070

2 - the 30 TFLOPS you mention are at the very top when overclocked; they go for 22 normally.

3 - Also, those are single precision TFLOPS, as in 32 bit. What really matters nowadays is double precision. And in double precision a 4070 is 0.35 TFLOPS (or 350 GFLOPS) - 2 orders of magnitude lower, still impressive though.
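(That 0.35 comes straight from the FP32 figure, by the way: these consumer cards run FP64 at 1/64 of the FP32 rate, so ~22.6 TFLOPS / 64 ~= 0.35 TFLOPS.)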


For neural nets it's actually the opposite - half-precision bfloat16 is enough. You need large range, but not much accuracy.
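bfloat16 is literally just the top 16 bits of an IEEE float32: same sign bit and same 8-bit exponent (so the same ~1e38 range), but only 7 mantissa bits instead of 23 (roughly 2-3 decimal digits of precision). A quick sketch of a truncating conversion, just to illustrate the format (real hardware and libraries round rather than blindly truncate):

    #include <cstdint>
    #include <cstring>
    #include <cstdio>

    // Truncating float32 -> bfloat16: keep the high 16 bits
    // (sign + 8 exponent bits + 7 mantissa bits), drop the rest.
    uint16_t float_to_bf16(float f) {
        uint32_t bits;
        std::memcpy(&bits, &f, sizeof(bits));
        return (uint16_t)(bits >> 16);
    }

    float bf16_to_float(uint16_t h) {
        uint32_t bits = (uint32_t)h << 16;
        float f;
        std::memcpy(&f, &bits, sizeof(f));
        return f;
    }

    int main() {
        float x = 3.14159265f;
        printf("%f -> %f\n", x, bf16_to_float(float_to_bf16(x)));  // 3.141593 -> 3.140625
        return 0;
    }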

Yes, the exact numbers are going to vary, but I was just giving a data point to indicate the magnitude of the numbers. If you want to quibble, there's CPU SIMD too.


For gaming, it's double precision that matters. And we were talking about a particular GPU, which is used for gaming, not AI. Hence why AI chips exist in the first place - dedicated hardware for dedicated tasks (ASIC for short).


The NVIDIA cards are all dual-use for gaming and compute/ML. Some features like the RTX 4070's Tensor Cores (incl. bfloat16) are there primarily for ML, and other features like ray tracing are there for gaming.


The NVIDIA cards were used for mining crypto-coins too, and they did that successfully for years before being made obsolete in that area by ASICs. Now the same thing is happening in AI/ML, which is why AI chips are being developed - they are the ASICs for this domain. That's the big picture. In 2 to 3 years no one is going to use NVIDIA gaming cards for AI/ML anymore, no matter how many GFLOPS the future 5000/6000 series offer. They will be for gaming only. End of story.


ASICs aren't magic - they are just chips designed to do a single function fast (e.g. run a crypto mining algorithm) as an alternative to using a general purpose CPU/GPU, whose generality comes at the cost of some performance overhead.

If your application calls for generality - like a gaming card's need to run custom shaders, or an ML model's need to run custom compute kernels - then an ASIC won't help you. These applications still need a general purpose processor, just one that provides huge parallelism.

It seems you may be thinking that all an ML chip does is matrix multiplication, and so a specialized ASIC would make sense, but that's not the case - an ML chip needs to run the entire model - think of it as a PyTorch accelerator, not a matmul accelerator.

Finally, the market for consumer (vs data center) ML cards is tiny relative to the gaming market, and these chips/cards are expensive to develop. Unless this changes, it doesn't make sense for companies like NVIDIA to develop ML-only consumer cards when, with minimal effort, they can leverage their data center designs and build dual-use GPU/compute consumer cards.



