Kepler, Nvidia's Strong Start on 28 nm (chipsandcheese.com)
61 points by treesciencebot on Nov 24, 2023 | 59 comments



> Nvidia’s Fermi architecture was ambitious and innovative, offering advances in GPU compute along with features like high tessellation performance. However Terascale 2’s more traditional approach delivered better power efficiency.

Fermi was given the nickname "Thermi" for a good reason. AMD marketing had a field day: https://www.youtube.com/watch?v=2QkyfGJgcwQ

It didn't help that the heatsink of the GTX 480 resembled the surface of a grill: https://i.imgur.com/9YfUifF.jpg


Well, AMD marketing turned out to be a joke. Remember "Poor Volta"? I still think AMD hasn't recovered from that. Their marketing for GPUs has been terrible ever since.


You mean Vega. Volta is an Nvidia arch.

Vega's marketing pushed Nvidia to make what has ended up being the best product series they will ever make: series 10. That isn't much of a joke; it scared the shit out of Nvidia, and they blinked.

Vega was too late in the pipeline to stop, and Raja was ultimately let go for his role in the whole thing. He refused to start making more gamer-friendly cards, and was obsessed with enterprise compute/jack of all trades cards.

Immediately afterwards was a pivot towards a split arch, allowing multiple teams to pursue their intended markets.

It's why AMD won against Nvidia. Nvidia still has no real answer to AMD's success, other than continuing to increase card prices and making ridiculously large chips with poor wafer yields. Nvidia won't even have working chiplets until series 60 or 70, while AMD already has them in a shipping product.


>It's why AMD won against Nvidia.

Won how?

>Nvidia still has no real answer to AMD's success

Which success? Answer to what? Are you from a parallel multiverse?

In this reality, it's the other way around. Nvidia is making so much money from the AI hype that AMD is the one trying to play catch-up.


Yeah, and before that they ruled the bitcoin craze. You literally couldn't buy Nvidia cards for an insanely long time because the 30 series was so cost-effective at mining that everyone and their uncle wanted in on the deal. AMD has some edge in the non-high-end server niche, but Nvidia rules everything else. In the AI craze AMD is not even an afterthought anymore, as everyone wants Nvidia. Even companies like Google and Tesla, who used to make their own AI chips.


It's weird that people keep repeating that. I've been fortunate enough to meet a fair number of programmers who have moved on to make-AI-go-brrrr jobs, and most of what they're handling is AMD hardware.

Nvidia strongly missed the boat by continually pushing CUDA lock-in: despite sitting on the Khronos steering committee and having made important contributions to OpenGL, OpenCL, Vulkan, and the SPIR-V ecosystem, they have pretty poor support for those standard APIs in their own software stack.

Highest perf per watt and perf per dollar is AMD land. AMD keeps moving forward while Nvidia keeps making weird missteps. There is a reason I said series 10 is the best they will ever make; there will never be a return to that. They're stuck in the same loop Raja was: make everything bigger for the sake of bigger, instead of making actual performance-improving changes.


I actually work in the field (for quite a few years by now) and basically everyone I know (who knows what they're doing and isn't just getting started) uses Nvidia. AMD drivers for GPGPU compute sucked so badly for so long that no one even bothered to try them anymore. CUDA might be closed source, but it is light-years ahead in stability and usability. If you want to get things done in AI, you buy Nvidia. Everyone who actually has to make these decisions knows that.


Still, it's weird to hear that "everyone who actually has to make these decisions", the ones who supposedly know, continue to make expensive, suboptimal choices.

To each their own, I guess.


They are optimal in the overall sense, because flops per dollar is not everything in the real world. Though I can see why many people outside the field would think so, since that is what manufacturers write in big letters on their marketing brochures.


You are partially right. Flops isn't everything; it's just a major component of your capex and/or opex if you rely on enterprise compute for the majority of what you do (i.e., machine learning).

The cost of doing business that isn't flops is the realm of developer time. Nvidia tooling is where developer time goes to die. Anyone who keeps claiming Nvidia's tooling is great for developers and easy to use either has a skill mismatch for the industry or, worse, is trying to sell you something.


“Poor Volta” was the line from AMD’s marketing team.


...which is hilarious because later on, the 580 had the same TDP as the 1070 Ti, but half the performance.


On a related topic: does anyone know why NVIDIA keeps reducing the bus width on their latest gen cards?

A 2060 has a 192-bit bus.

A 3060 has a 192-bit bus.

A 4060 has a 128-bit bus!

###

A 2070 has a 256-bit bus.

A 3070 has a 256-bit bus.

A 4070 has a 192-bit bus!


At most points in the product stack, memory frequency increased by enough to compensate for the narrower bus. Dropping to a narrower bus and putting more RAM on each channel allowed for some 50% increases in memory capacity instead of having to wait for a doubling to be economical. And architecturally, the 4000 series has an order of magnitude more L2 cache than the previous two generations (went from 2–6MB to 24–72MB), so they're less sensitive to DRAM bandwidth.


Seems to have gone badly for the 4060 & 4060 Ti specifically though, as they're at the same performance level as the previous gen 3060.

People could buy a 2nd hand 3070 for less money.


Yeah, those are the chips where memory bandwidth actually regressed for two generations in a row. Going from 256-bit to 192-bit to 128-bit in the xx60 segment would have been reasonable if they'd used the faster DRAM that the more expensive cards get, but the 4060 also got a much smaller memory frequency boost than its bigger siblings.


With respect: Have you actually measured performance or are you merely quoting Nvidia marketing?


There's not much measurement necessary for peak DRAM bandwidth; bit rate times bus width is pretty much the whole story when comparing GPUs of similar architecture and the same type of DRAM. That's not to say that DRAM bandwidth is the only relevant performance metric for GPUs (which is why a DRAM bandwidth regression doesn't guarantee worse overall performance), but there's really no need to further justify the arithmetic that says whether a higher bit rate compensates for a narrower bus width.

If you were specifically referring to the performance impact of the big L2 cache increase: I don't know how big a difference that made, but it obviously wasn't zero.


Wafer prices have increased, and so has Nvidia's greed, so you get less hardware for your money every generation.


Pretty much. Also lack of real competition.


Larger caches and compression, as well as considerably higher memory clocks, enable them to reduce the bus width while still hitting the performance target.

Both the 2070 and 3070 have a memory bandwidth of 448 GB/s; the 4070, with its smaller bus, has a memory bandwidth of 504 GB/s.
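
For anyone who wants to sanity-check those numbers, the arithmetic is just per-pin data rate times bus width. Here is a minimal Python sketch, assuming the publicly listed per-pin rates (14 Gbps GDDR6 on the 2070/3070, 21 Gbps GDDR6X on the 4070), which are not stated in the thread:

    # Peak DRAM bandwidth (GB/s) = per-pin data rate (Gbps) * bus width (bits) / 8
    # Per-pin rates below are assumed from public spec sheets.
    cards = {
        "RTX 2070": (14, 256),  # GDDR6, 256-bit
        "RTX 3070": (14, 256),  # GDDR6, 256-bit
        "RTX 4070": (21, 192),  # GDDR6X, 192-bit
    }
    for name, (gbps_per_pin, bus_bits) in cards.items():
        print(f"{name}: {gbps_per_pin * bus_bits / 8:.0f} GB/s")
    # -> 448, 448, and 504 GB/s respectively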


I'll be honest: I find GPUs confusing. I use Hugging Face occasionally. I have no idea what GPU will work with what. How do Fermi, Kepler, Maxwell, Pascal, Turing, Ampere, and Hopper compare? How does the consumer version of each compare to the data center version? What about AMD and Intel?

* Arc A770 seems to provide 16GB for <$300, which seems awesome. Will it work for [X]?

* Older NVidia cards go up to 48GB for about the cost of a modern 24GB card, and some can be paired. Will it work for [X]? (Here, LLMs and large-resolution image generation require lots of RAM.)

I wish there was some kind of chart of compatibility and support.


Although you wouldn't know it from the documentation, both the GK10X and GK11X silicon had serious problems with the global memory barrier instruction that had to be fixed in software after launch. All global memory barriers had to be implemented entirely as patched routines, several thousand times slower than the underlying, broken silicon instruction would have been. Amusingly, that same hardware defect forced the L1 cache to be turned off on the first two Keplers. I suspect that if you ran the same benchmark on GK110 vs. the GK210 used in the article, you'd be surprised to see no effect from the L1 cache at all.


Samsung has a fab. Does anyone know why they don't want to enter the game and create an AI chip?


Create? Perhaps a lack of IP/talent. The Exynos chips have been lagging a bit behind, last I checked. However, they are adding capacity to their fabs for AI chips, so it's possible they may be planning one in the future.

https://www.digitimes.com/news/a20231121VL206/samsung-electr...


GPU architecture is just a simple core repeated a gazillion times plus some memory bus.


"just" is doing an astounding amount of heavy lifting here. Modern GPUs are extraordinarily complex systems.


If GPUs are so easy, then why has Intel taken like three swings at making a GPU in the last ten years and hasn't come close to NVDA perf even once?


they haven't hired that guy you're replying to!


They are, with Tenstorrent. https://www.reuters.com/technology/samsung-manufacture-chips...

CEO: Jim Keller.


This is not a good comparison. Nvidia doesn't have a fab, but they are the lead player in the AI chip space. Intel had both, and look where it got them. TSMC has a good model: you can basically take any of your designs for the same node and manufacture it in any of their plants. The same strategy can be applied to Samsung, and they already help a lot on the memory segment. The new HBM3E memory chips for H200s might even be coming from Samsung.


Intel was infected with marketing people who diseased the entire C-suite and drained the company for 8 years without doing anything other than make up new marketing names for the 5000-11000 series of chips and their stagnant iGPUs. That level of thievery would kill any leading company.


Intel actually did a lot in the last decade. Skylake alone was a massive improvement over the previous generation. AVX-512, countless open source projects—not to mention Optane, which was one of the most radical innovations in hardware in years. They did some really cool stuff with Altera IP, but the market didn't really care for it as much as it probably should have. Much of the value of Mobileye happened under Intel's ownership, too.

You mention their iGPUs, but their iGPUs actually got radically better in the timespan you mention; Iris, if properly cooled and not memory-choked, was actually pretty decent for a lot of purposes.

It does have, and historically has had, a pretty terrible board, and its management hasn't been the greatest. But people who complain about them doing nothing for most of a decade are generally making a reactionary take about the lack of post-Skylake microarchitectures, while ignoring that pretty much everyone's performance gains got swallowed by vulnerability mitigations for years, because they care about video games more than safety.


> Intel actually did a lot in the last decade.

That's the expected case when you have tons of cash and market dominance.

> are making a reactionary take about the lack of post-Skylake microarchitectures

Or about the general inability of large monopolies to effectively compete and continue to deliver according to market expectations.

> for years because they care about video games more than safety.

And yet, they still had a "server chips" division for much of that time.


You forgot to mention they did a great job building a hyper-optimized 14nm process; they spent at least 5 years on it...


NVIDIA Ampere consumer line was manufactured in Samsung fabs:

https://en.m.wikipedia.org/wiki/GeForce_30_series


That's why it was a power hungry hog.


They have fabbed a lot of AI chips. Nvidia's Ampere chips up to the 3090 were made by Samsung on 8 nm.

Interestingly, the chips bigger than the one in the 3090 (the GA100s used in A100s) were made by TSMC on a 7 nm node.

Maybe Samsung's yield was not high enough to produce those large chips (GA100 is 826 mm² and would probably be even bigger on Samsung's node).
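
For intuition on why die size matters so much for yield: with a textbook Poisson yield model (the 0.1 defects/cm² density here is an assumed, purely illustrative number, not anything from the thread), the fraction of defect-free dies falls off exponentially with area:

    import math

    # Poisson yield model: yield = exp(-die_area * defect_density)
    defects_per_cm2 = 0.1  # assumed, illustrative only

    for name, area_mm2 in [("~300 mm^2 midrange die", 300), ("GA100-class die", 826)]:
        y = math.exp(-(area_mm2 / 100.0) * defects_per_cm2)
        print(f"{name}: ~{y * 100:.0f}% defect-free dies")
    # ~74% for the 300 mm^2 die vs ~44% for the 826 mm^2 die, before any
    # on-die redundancy or binning, which real products lean on heavily.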


Don't they already create the majority of the world's AI chips? Their GPUs?


Why are they going back to 28nm?


This is a new article about the historical usage of 28nm.


I checked the title, the 28nm and the date of the article.


It's more like documentary writing on a 3-4 generations old architecture. The opening statements clearly indicate why they published it.


Indeed. These are cards from ~2012 (Kepler came about with the GTX 600 series).


I guess the only thing you’re missing then is the content of the article. ;)


[flagged]


> There is a new wave of silicon companies that use photonics, in-memory compute, deterministic/fogs data flow, etc

Those innovations are really cool sounding and make for great press releases, but are much less amenable to third-party benchmarking and analysis on account of those "innovations" largely still being stuck in the lab and small-scale proof of concept products, whereas Nvidia's GPUs are mass-market products that actually ship.


So they ship because they aren't innovative?

Google TPU, for example, has been shipped.


> So they ship because they aren't innovative?

Ok, you're clearly not trying to make sense here.

And there's no way you can believe that Google's TPUs have shipped as broadly as Nvidia's GPUs (or even just Nvidia's datacenter GPUs).


[flagged]


> November 24, 2023

It was just published a couple of hours ago. Chips and Cheese usually goes back in time and posts deep dives into old chips; not everything has to be cutting edge to get an insightful article.


No it's not.


[flagged]


I don't think a 2.8T market cap company combining with a 1.1T market cap company can be called an "acquisition", nor would three-letter agencies ever approve such a deal.


AMD isn't asleep. MI300X announcement on Dec 6th.


How is AMD addressing CUDA dominance?


There have been several announcements:

"Azure announces new AI optimized VM series featuring AMD's flagship MI300X GPU" https://news.ycombinator.com/item?id=38280974

Doubling down on ROCm: https://www.theverge.com/23894647/amd-ceo-lisa-su-ai-chips-n...

Putting resources into PyTorch: https://pytorch.org/blog/experience-power-pytorch-2.0/


Investing in making their stuff work with the abstraction libraries (PyTorch, etc.). Collaborating with Microsoft to make their hardware easier to get time on and test with. Starting to support tooling on their lower-tier consumer GPUs.
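
Concretely, here's how that shows up for developers. A minimal sketch, assuming the ROCm build of PyTorch, which exposes AMD GPUs through the same torch.cuda device API:

    import torch

    # On ROCm builds of PyTorch, AMD GPUs are exposed through the same
    # torch.cuda interface, so device-agnostic code needs no vendor branching.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    model = torch.nn.Linear(1024, 1024).to(device)
    x = torch.randn(8, 1024, device=device)
    print(model(x).shape, device)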


Build a developer flywheel by making ROCm more accessible.

https://www.tomshardware.com/pc-components/gpus/amd-arms-thr...

That said, I'm approaching this from the other end, by making their high-end GPUs available to developers.


Microsoft might start making their own silicon like everybody else.


They have the Azure Maia 100 (an AI accelerator) and the Azure Cobalt 100 (a 128-core Arm Neoverse-derived CPU).

https://www.theverge.com/2023/11/15/23960345/microsoft-cpu-g...


I hope not, given Microsoft's typical strategy of Embrace, Extend, Extinguish.

It also means less competition. Corporations like Google, Microsoft, and Amazon are the embodiment of what is wrong with late-stage capitalism and the result of a lack of regulation.

If anything, Microsoft and corporations of similar size should have been broken up decades ago.

They have too much power, they don't answer to anybody, and they are a form of unelected government that rules the digital world.



