Kepler, Nvidia's Strong Start on 28 nm (chipsandcheese.com)
61 points by treesciencebot on Nov 24, 2023 | 59 comments



> Nvidia’s Fermi architecture was ambitious and innovative, offering advances in GPU compute along with features like high tessellation performance. However Terascale 2’s more traditional approach delivered better power efficiency.

Fermi was given the nickname "Thermi" for a good reason. AMD marketing had a field day: https://www.youtube.com/watch?v=2QkyfGJgcwQ

It didn't help that the heatsink of the GTX 480 resembled the surface of a grill: https://i.imgur.com/9YfUifF.jpg


Well, AMD marketing turned out to be a joke. Remember "Poor Volta"? I still think AMD hasn't recovered from that. Their marketing for GPUs has been terrible ever since.


You mean Vega. Volta is an Nvidia arch.

Vega's marketing pushed Nvidia to make what has ended up being the best product series they will ever make: series 10. That isn't much of a joke; it scared the shit out of Nvidia, and they blinked.

Vega was too late in the pipeline to stop, and Raja was ultimately let go for his role in the whole thing. He refused to start making more gamer-friendly cards, and was obsessed with enterprise compute/jack of all trades cards.

Immediately afterwards was a pivot towards a split arch, allowing multiple teams to pursue their intended markets.

It's why AMD won against Nvidia. Nvidia still has no real answer to AMD's success, other than continuing to increase card prices and making ridiculously large chips with poor wafer yields. Nvidia won't even have working chiplets until series 60 or 70, while AMD already has them in a shipping product.


>It's why AMD won against Nvidia.

Won how?

>Nvidia still has no real answer to AMD's success

Which success? Answer to what? Are you from a parallel multiverse?

In this reality, it's the other way around. Nvidia is making so much money from the AI hype that AMD is the one trying to play catch-up.


Yeah, and before that they ruled the bitcoin craze. You literally couldn't buy Nvidia cards for an insanely long time because the 30 series was so cost-effective at mining that everyone and their uncle wanted in on the deal. AMD has some edge in the non-high-end server niche, but Nvidia rules everything else. In the AI craze AMD is not even an afterthought anymore, as everyone wants Nvidia. Even companies like Google and Tesla, who used to make their own AI chips.


It's weird that people keep repeating that. I've been fortunate enough to meet a fair number of programmers who have moved on to make-AI-go-brrrr jobs, and most of what they're handling is AMD hardware.

Nvidia strongly missed the boat by continually pushing CUDA lock-in: despite sitting on the Khronos steering committee and having made important contributions to OpenGL, OpenCL, Vulkan, and the SPIR-V ecosystem, they have pretty poor support for those standard APIs in their own software stack.

Highest perf per watt and perf per dollar is AMD land. AMD keeps moving forward while Nvidia keeps making weird missteps. There is a reason I said series 10 is the best they will ever make; there will never be a return to that. They're stuck in the same loop Raja was: make everything bigger for the sake of bigger, instead of making actual performance-improving changes.


I actually work in the field (for quite a few years by now) and basically everyone I know (who knows what they're doing and isn't just getting started) uses Nvidia. AMD drivers for GPGPU compute sucked so badly for so long that no one even bothered to try them anymore. CUDA might be closed source, but it is light-years ahead in stability and usability. If you want to get things done in AI, you buy Nvidia. Everyone who actually has to make these decisions knows that.


Still, it's weird to hear that "everyone who actually has to make these decisions", the ones who supposedly know, continue to make expensive, suboptimal choices.

To each their own, I guess.


They are optimal in the overall sense, because flops per dollar is not everything in the real world. Though I can see why many people outside the field would think so, since that is what manufacturers write in big letters on their marketing brochures.


You are partially right. Flops isn't everything; it's just a major component of your capex and/or opex if you rely on enterprise compute for the majority of what you do (i.e., machine learning).

The cost of doing business that isn't flops is the realm of developer time. Nvidia tooling is where developer time goes to die. Anyone who keeps claiming Nvidia's tooling is great for developers and easy to use either has a skill mismatch for the industry or, worse, is trying to sell you something.


“Poor Volta” was the line from AMD’s marketing team.


...which is hilarious because later on, the 580 had the same TDP as the 1070 Ti, but half the performance.


On a related topic: does anyone know why NVIDIA keeps reducing the bus width on their latest gen cards?

A 2060 has a 192-bit bus.

A 3060 has a 192-bit bus.

A 4060 has a 128-bit bus!

###

A 2070 has a 256-bit bus.

A 3070 has a 256-bit bus.

A 4070 has a 192-bit bus!


At most points in the product stack, memory frequency increased by enough to compensate for the narrower bus. Dropping to a narrower bus and putting more RAM on each channel allowed for some 50% increases in memory capacity instead of having to wait for a doubling to be economical. And architecturally, the 4000 series has an order of magnitude more L2 cache than the previous two generations (went from 2–6MB to 24–72MB), so they're less sensitive to DRAM bandwidth.


Seems to have gone badly for the 4060 & 4060 Ti specifically though, as they're at the same performance level as the previous gen 3060.

People could buy a 2nd hand 3070 for less money.


Yeah, those are the chips where memory bandwidth actually regressed for two generations in a row. Going from 256-bit to 192-bit to 128-bit in the xx60 segment would have been reasonable if they'd used the faster DRAM that the more expensive cards get, but the 4060 also got a much smaller memory frequency boost than its bigger siblings.


With respect: Have you actually measured performance or are you merely quoting Nvidia marketing?


There's not much measurement necessary for peak DRAM bandwidth; bit rate times bus width is pretty much the whole story when comparing GPUs of similar architecture and the same type of DRAM. That's not to say that DRAM bandwidth is the only relevant performance metric for GPUs (which is why a DRAM bandwidth regression doesn't guarantee worse overall performance), but there's really no need to further justify the arithmetic that says whether a higher bit rate compensates for a narrower bus width.

If you were specifically referring to the performance impact of the big L2 cache increase: I don't know how big a difference that made, but it obviously wasn't zero.


Wafer prices have increased, and so has Nvidia's greed, so you get less hardware for your money every generation.


Pretty much. Also lack of real competition.


Larger caches and compression, as well as considerably higher memory clocks, enable them to reduce the bus width while still hitting the performance target.

Both the 2070 and 3070 have a memory bandwidth of 448 GB/s; the 4070, with its smaller bus, has a memory bandwidth of 504 GB/s.
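
For anyone who wants to sanity-check those numbers, the arithmetic is just per-pin data rate times bus width. Here is a minimal Python sketch, assuming the publicly listed per-pin rates (14 Gbps GDDR6 on the 2070/3070, 21 Gbps GDDR6X on the 4070), which are not stated in the thread:

    # Peak DRAM bandwidth (GB/s) = per-pin data rate (Gbps) * bus width (bits) / 8
    # Per-pin rates below are assumed from public spec sheets.
    cards = {
        "RTX 2070": (14, 256),  # GDDR6, 256-bit
        "RTX 3070": (14, 256),  # GDDR6, 256-bit
        "RTX 4070": (21, 192),  # GDDR6X, 192-bit
    }
    for name, (gbps_per_pin, bus_bits) in cards.items():
        print(f"{name}: {gbps_per_pin * bus_bits / 8:.0f} GB/s")
    # -> 448, 448, and 504 GB/s respectively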


I'll be honest: I find GPUs confusing. I use Hugging Face occasionally. I have no idea what GPU will work with what. How do Fermi, Kepler, Maxwell, Pascal, Turing, Ampere, and Hopper compare? How does the consumer version of each compare to the data center version? What about AMD and Intel?

* Arc A770 seems to provide 16GB for <$300, which seems awesome. Will it work for [X]?

* Older NVidia cards go up to 48GB for about the cost of a modern 24GB card, and some can be paired. Will it work for [X]? (Here, LLMs and large-resolution image generation require lots of RAM.)

I wish there was some kind of chart of compatibility and support.


Although you wouldn't know it from the documentation, both the GK10X and GK11X silicon had serious problems with the global memory barrier instruction that had to be fixed in software after launch. All global memory barriers had to be implemented entirely as patched routines, several thousand times slower than the underlying, broken silicon instruction would have been. Amusingly, that same hardware defect forced the L1 cache to be turned off on the first two Keplers. I suspect that if you ran the same benchmark on GK110 vs. the GK210 used in the article, you'd be surprised to see no effect from the L1 cache at all.


Samsung has a fab. Does anyone know why they don't want to enter the game and create an AI chip?


Create? Perhaps a lack of IP/talent. The Exynos chips have been lagging a bit behind, last I checked. However, they are adding capacity to their fabs for AI chips, so it's possible they may be planning one in the future.

https://www.digitimes.com/news/a20231121VL206/samsung-electr...


GPU architecture is just a simple core repeated a gazillion times plus some memory bus.


"just" is doing an astounding amount of heavy lifting here. Modern GPUs are extraordinarily complex systems.


If GPUs are so easy, then why has Intel taken like three swings at making a GPU in the last ten years and hasn't come close to NVDA perf even once?


they haven't hired that guy you're replying to!


They are, with Tenstorrent. https://www.reuters.com/technology/samsung-manufacture-chips...

CEO: Jim Keller.


This is not a good comparison. Nvidia doesn't have a fab, but they are the lead player in the AI chip space. Intel had both, and look where it got them. TSMC has a good model: you can basically take any of your designs for the same node and manufacture it in any of their plants. The same strategy can be applied to Samsung, and they already help a lot on the memory segment. The new HBM3E memory chips for H200s might even be coming from Samsung.


Intel was infected with marketing people who diseased the entire C-suite and drained the company for 8 years without doing anything other than make up new marketing names for the 5000-11000 series of chips and their stagnant iGPUs. That level of thievery would kill any leading company.


Intel actually did a lot in the last decade. Skylake alone was a massive improvement over the previous generation. AVX-512, countless open source projects—not to mention Optane, which was one of the most radical innovations in hardware in years. They did some really cool stuff with Altera IP, but the market didn't really care for it as much as it probably should have. Much of the value of Mobileye happened under Intel's ownership, too.

You mention their iGPUs, but their iGPUs actually got radically better in the timespan you mention; Iris, if properly cooled and not memory-choked, was actually pretty decent for a lot of purposes.

It does have, and historically has had, a pretty terrible board, and its management hasn't been the greatest. But people who complain about them doing nothing for most of a decade are generally making a reactionary take about the lack of post-Skylake microarchitectures, while ignoring that pretty much everyone's performance gains got swallowed by vulnerability mitigations for years, because they care about video games more than safety.


> Intel actually did a lot in the last decade.

That's the expected case when you have tons of cash and market dominance.

> are making a reactionary take about the lack of post-Skylake microarchitectures

Or about the general inability of large monopolies to effectively compete and continue to deliver according to market expectations.

> for years because they care about video games more than safety.

And yet, they still had a "server chips" division for much of that time.


You forgot to mention they did a great job building a hyper-optimized 14nm process; they spent at least 5 years on it...


NVIDIA Ampere consumer line was manufactured in Samsung fabs:

https://en.m.wikipedia.org/wiki/GeForce_30_series


That's why it was a power hungry hog.


They have fabbed a lot of AI chips. Nvidia's Ampere chips up to the 3090 were made by Samsung on 8 nm.

Interestingly, the chips bigger than the one in the 3090 (the GA100s used in A100s) were made by TSMC on a 7 nm node.

Maybe Samsung's yield was not high enough to produce those large chips (GA100 is 826 mm² and would probably be even bigger on Samsung's node).
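
For intuition on why die size matters so much for yield: with a textbook Poisson yield model (the 0.1 defects/cm² density here is an assumed, purely illustrative number, not anything from the thread), the fraction of defect-free dies falls off exponentially with area:

    import math

    # Poisson yield model: yield = exp(-die_area * defect_density)
    defects_per_cm2 = 0.1  # assumed, illustrative only

    for name, area_mm2 in [("~300 mm^2 midrange die", 300), ("GA100-class die", 826)]:
        y = math.exp(-(area_mm2 / 100.0) * defects_per_cm2)
        print(f"{name}: ~{y * 100:.0f}% defect-free dies")
    # ~74% for the 300 mm^2 die vs ~44% for the 826 mm^2 die, before any
    # on-die redundancy or binning, which real products lean on heavily.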


Don't they already create the majority of the world's AI chips? Their GPUs?


Why are they going back to 28nm?


This is a new article about the historical usage of 28nm.


I checked the title, the 28nm and the date of the article.


It's more like documentary writing on a 3-4 generations old architecture. The opening statements clearly indicate why they published it.


Indeed. These are cards from ~2012 (Kepler came about with the GTX 600 series).


I guess the only thing you’re missing then is the content of the article. ;)


[flagged]


> There is a new wave of silicon companies that use photonics, in-memory compute, deterministic/fogs data flow, etc

Those innovations are really cool sounding and make for great press releases, but are much less amenable to third-party benchmarking and analysis on account of those "innovations" largely still being stuck in the lab and small-scale proof of concept products, whereas Nvidia's GPUs are mass-market products that actually ship.


So they ship because they aren't innovative?

Google TPU, for example, has been shipped.


> So they ship because they aren't innovative?

Ok, you're clearly not trying to make sense here.

And there's no way you can believe that Google's TPUs have shipped as broadly as Nvidia's GPUs (or even just Nvidia's datacenter GPUs).


[flagged]


> November 24, 2023

It was just published a couple of hours ago. Chips and Cheese usually goes back in time and posts deep dives into old chips; not everything has to be cutting edge to get an insightful article.


No it's not.


[flagged]


I don't think a 2.8T market cap company combining with a 1.1T market cap company can be called an "acquisition", nor would three-letter agencies ever approve such a deal.


AMD isn't asleep. MI300X announcement on Dec 6th.


How is AMD addressing CUDA dominance?


There have been several announcements:

"Azure announces new AI optimized VM series featuring AMD's flagship MI300X GPU" https://news.ycombinator.com/item?id=38280974

Doubling down on ROCm: https://www.theverge.com/23894647/amd-ceo-lisa-su-ai-chips-n...

Putting resources into PyTorch: https://pytorch.org/blog/experience-power-pytorch-2.0/


Investing in making their stuff work with the abstraction libraries (PyTorch, etc.). Collaborating with Microsoft to make their hardware easier to get time on and test with. Starting to support tooling on their lower-tier consumer GPUs.
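
Concretely, here's how that shows up for developers. A minimal sketch, assuming the ROCm build of PyTorch, which exposes AMD GPUs through the same torch.cuda device API:

    import torch

    # On ROCm builds of PyTorch, AMD GPUs are exposed through the same
    # torch.cuda interface, so device-agnostic code needs no vendor branching.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    model = torch.nn.Linear(1024, 1024).to(device)
    x = torch.randn(8, 1024, device=device)
    print(model(x).shape, device)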


Build a developer flywheel by making ROCm more accessible.

https://www.tomshardware.com/pc-components/gpus/amd-arms-thr...

That said, I'm approaching this from the other end, by making their high-end GPUs available to developers.


Microsoft might start making their own silicon like everybody else.


They have the Azure Maia 100 (an AI accelerator) and the Azure Cobalt 100 (a 128-core Arm Neoverse-derived CPU).

https://www.theverge.com/2023/11/15/23960345/microsoft-cpu-g...


I hope not, given Microsoft's typical strategy of Embrace, Extend, Extinguish.

It also means less competition. Corporations like Google, Microsoft, and Amazon are the embodiment of what is wrong with late-stage capitalism and the result of a lack of regulation.

If anything, Microsoft and corporations of similar size should have been broken up decades ago.

They have too much power, they don't answer to anybody, and they are a form of unelected government that rules the digital world.



