> Nvidia’s Fermi architecture was ambitious and innovative, offering advances in GPU compute along with features like high tessellation performance. However, TeraScale 2’s more traditional approach delivered better power efficiency.
Well, AMD marketing turned out to be a joke. Remember "Poor Volta"? I still think AMD hasn't recovered from that; their GPU marketing has been terrible ever since.
Vega's marketing pushed Nvidia to make what has ended up being the best product series they will ever make: series 10. That wasn't much of a joke; it scared the shit out of Nvidia, and they blinked.
Vega was too late in the pipeline to stop, and Raja was ultimately let go for his role in the whole thing. He refused to start making more gamer-friendly cards and was obsessed with enterprise-compute/jack-of-all-trades cards.
Immediately afterwards came a pivot toward a split architecture, allowing multiple teams to pursue their intended markets.
It's why AMD won against Nvidia. Nvidia still has no real answer to AMD's success, other than continuing to increase card prices and making ridiculously large chips with poor wafer yields. Nvidia won't even have working chiplets until series 60 or 70, while AMD already has them in a shipping product.
Yeah, and before that they ruled the bitcoin craze. You literally couldn't buy Nvidia cards for an insanely long time because the 30 series was so cost-effective at mining that everyone and their uncle wanted in on the deal. AMD has some edge in the non-high-end server niche, but Nvidia rules everything else. In the AI craze AMD isn't even an afterthought anymore; everyone wants Nvidia, even companies like Google and Tesla, which used to make their own AI chips.
It's weird that people keep repeating that. I've been fortunate enough to meet a fair number of programmers who have moved on to make-AI-go-brrrr jobs, and most of what they're handling is AMD hardware.
Nvidia badly missed the boat by continually pushing CUDA lock-in: they sit on the Khronos steering committee and have made important contributions to OpenGL, OpenCL, Vulkan, and the SPIR-V ecosystem, yet their own software stack has pretty poor support for those standard APIs.
Highest perf per watt and perf per dollar is AMD land. AMD keeps moving forward while Nvidia keeps making weird missteps. There's a reason I said series 10 is the best they will ever make and that there will never be a return to it: they're stuck in the same loop Raja was, making everything bigger for the sake of bigger instead of making actual performance-improving changes.
I actually work in the field (and have for quite a few years now), and basically everyone I know (who knows what they're doing and isn't just getting started) uses Nvidia. AMD drivers for GPGPU compute sucked so badly for so long that no one even bothered to try them anymore. CUDA might be closed source, but it is light-years ahead in stability and usability. If you want to get things done in AI, you buy Nvidia. Everyone who actually has to make these decisions knows that.
They are optimal in the overall sense, because flops per dollar is not everything in the real world. Though I can see why many people outside the field would think so, since that is what manufacturers write in big letters on their marketing brochures.
You are partially right. Flops isn't everything; it's just a major component of your capex and/or opex if you rely on enterprise compute for the majority of what you do (i.e., machine learning).
The cost of doing business that isn't flops is the realm of developer time, and Nvidia tooling is where developer time goes to die. Anyone who keeps claiming Nvidia's tooling is great for developers and easy to use either has a skill mismatch for the industry or, worse, is trying to sell you something.
At most points in the product stack, memory frequency increased by enough to compensate for the narrower bus. Dropping to a narrower bus and putting more RAM on each channel allowed for some 50% increases in memory capacity instead of having to wait for a doubling to be economical. And architecturally, the 4000 series has an order of magnitude more L2 cache than the previous two generations (went from 2–6MB to 24–72MB), so they're less sensitive to DRAM bandwidth.
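As a rough sketch of that capacity arithmetic (assuming GDDR6/GDDR6X parts with 32-bit interfaces, available in 1 GB and 2 GB densities, and no clamshell configuration; the exact part choices here are illustrative, not a spec table):

```python
# VRAM capacity = number of memory chips x density per chip.
# Chip count = bus width / 32-bit interface per chip (single-sided, no clamshell).
# The densities below are illustrative assumptions, not an official configuration table.
def capacity_gb(bus_width_bits: int, gb_per_chip: int) -> int:
    return (bus_width_bits // 32) * gb_per_chip

print(capacity_gb(256, 1))  # 8 GB:  wide 256-bit bus with 1 GB chips
print(capacity_gb(192, 2))  # 12 GB: narrower 192-bit bus with denser 2 GB chips (+50%)
print(capacity_gb(256, 2))  # 16 GB: the full doubling you'd otherwise have to wait for
```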
Yeah, those are the chips where memory bandwidth actually regressed for two generations in a row. Going from 256-bit to 192-bit to 128-bit in the xx60 segment would have been reasonable if they'd used the faster DRAM that the more expensive cards get, but the 4060 also got a much smaller memory frequency boost than its bigger siblings.
There's not much measurement necessary for peak DRAM bandwidth; bit rate times bus width is pretty much the whole story when comparing GPUs of similar architecture and the same type of DRAM. That's not to say that DRAM bandwidth is the only relevant performance metric for GPUs (which is why a DRAM bandwidth regression doesn't guarantee worse overall performance), but there's really no need to further justify the arithmetic that says whether a higher bit rate compensates for a narrower bus width.
If you were specifically referring to the performance impact of the big L2 cache increase: I don't know how big a difference that made, but it obviously wasn't zero.
Larger caches and compression, as well as considerably higher memory clocks, enable them to reduce the bus width while still hitting the performance target.
Both the 2070 and 3070 have a memory bandwidth of 448 GB/s; the 4070, with its smaller bus, has 504 GB/s.
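For the peak-bandwidth arithmetic itself, it really is just bus width times per-pin data rate; a quick sketch (the per-pin rates are quoted from memory, so treat them as illustrative rather than authoritative):

```python
# Peak DRAM bandwidth (GB/s) = bus width (bits) * per-pin data rate (Gbit/s) / 8 bits per byte.
def peak_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    return bus_width_bits * data_rate_gbps / 8

# Per-pin rates below are my recollection of the reference specs (assumption, not gospel).
cards = {
    "RTX 2070 (256-bit, 14 Gbps GDDR6)":  (256, 14.0),
    "RTX 3070 (256-bit, 14 Gbps GDDR6)":  (256, 14.0),
    "RTX 4070 (192-bit, 21 Gbps GDDR6X)": (192, 21.0),
}
for name, (width, rate) in cards.items():
    print(f"{name}: {peak_bandwidth_gb_s(width, rate):.0f} GB/s")  # 448, 448, 504
```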
I'll be honest: I find GPUs confusing. I use Hugging Face occasionally, and I have no idea which GPU will work with what. How do Fermi, Kepler, Maxwell, Pascal, Turing, Ampere, and Hopper compare? How does the consumer version of each compare to the data center version? What about AMD and Intel?
* The Arc A770 seems to provide 16GB for <$300, which sounds awesome. Will it work for [X]?
* Older Nvidia cards go up to 48GB for about the cost of a modern 24GB card, and some can be paired. Will they work for [X] (here, LLMs and large-resolution image generation require lots of RAM)?
I wish there was some kind of chart of compatibility and support.
Although you wouldn't know it from the documentation, both the GK10X and GK11X silicon had serious problems with the global memory barrier instruction that had to be fixed in software after launch. All global memory barriers had to be implemented entirely as patched routines, several thousand times slower than the broken hardware path they replaced. Amusingly, that same hardware defect forced the L1 cache to be turned off on the first two Keplers. I suspect that if you ran the same benchmark on the GK110 versus the GK210 used in the article, you'd be surprised to see no effect from the L1 cache at all.
Create? Perhaps a lack of IP/talent. The Exynos chips have been lagging a bit behind, last I checked. However, they are adding capacity to their fabs for AI chips, so they may be planning one in the future.
This is not a good comparison. Nvidia doesn't have a fab, but they are the lead player in the AI chip space. Intel had both, and look where it got them. TSMC has a good model: you can basically take any of your designs for a given node and manufacture it in any of their plants. The same strategy can be applied to Samsung, and they already help a lot on the memory side. The new HBM3E memory chips for H200s might even be coming from Samsung.
Intel was infected with marketing people who diseased the entire C-suite and drained the company for 8 years without doing anything other than make up new marketing names for the 5000-11000 series of chips and their stagnant iGPUs. That level of thievery would kill any leading company.
Intel actually did a lot in the last decade. Skylake alone was a massive improvement over the previous generation. AVX-512, countless open source projects—not to mention Optane, which was one of the most radical innovations in hardware in years. They did some really cool stuff with Altera IP, but the market didn't really care for it as much as it probably should have. Much of the value of Mobileye happened under Intel's ownership, too.
You mention their iGPUs, but those actually got radically better over that timespan; Iris, if properly cooled and not memory-choked, was pretty decent for a lot of purposes.
Intel does have, and historically has had, a pretty terrible board, and its management hasn't been the greatest. But people who complain about the company doing nothing for most of a decade are generally making a reactionary take about the lack of post-Skylake microarchitectures, while ignoring that pretty much everyone's performance gains got swallowed by vulnerability mitigations for years, because those complainers care more about video games than about safety.
> There is a new wave of silicon companies that use photonics, in-memory compute, deterministic/fogs data flow, etc.
Those innovations sound really cool and make for great press releases, but they are much less amenable to third-party benchmarking and analysis, because those "innovations" are largely still stuck in the lab and in small-scale proof-of-concept products, whereas Nvidia's GPUs are mass-market products that actually ship.
It was just published a couple of hours ago. Chips and Cheese usually goes back in time and posts deep dives into old chips; not everything has to be cutting edge to get an insightful article.
I don't think a 2.8T-market-cap company combining with a 1.1T-market-cap company can be called an "acquisition", nor would the three-letter agencies ever approve such a deal.
Investing in making their stuff work with the abstraction libraries (PyTorch etc.), collaborating with Microsoft to make their hardware easier to get time on and test with, and starting to support tooling on their lower-tier consumer GPUs.
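To make "work with the abstraction libraries" concrete, here's a minimal sketch; it assumes a ROCm build of PyTorch, which to my understanding exposes AMD GPUs through the same torch.cuda API, so the code itself stays vendor-agnostic:

```python
import torch

# On a ROCm build of PyTorch, AMD GPUs are reported through torch.cuda as well,
# so this device selection works unchanged on Nvidia (CUDA) or AMD (ROCm/HIP) hardware.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = torch.nn.Linear(1024, 1024).to(device)
x = torch.randn(64, 1024, device=device)
y = model(x)  # identical call path regardless of which vendor's backend is underneath
print(y.shape, device)
```

That's the pitch, anyway: if the framework hides the backend, the hardware choice becomes a procurement question rather than a rewrite.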
I hope not, given Microsoft's typical strategy of:
Embrace
Extend
Extinguish
It also means less competition. Corporations like Google, Microsoft, and Amazon are the embodiment of what is wrong with late-stage capitalism and the result of a lack of regulation.
If anything, Microsoft and corporations of similar size should have been broken up decades ago.
They have too much power, they don't answer to anybody, and they are a form of unelected government that rules the digital world.
Fermi was given the nickname "Thermi" for a good reason. AMD marketing had a field day: https://www.youtube.com/watch?v=2QkyfGJgcwQ
It didn't help that the heatsink of the GTX 480 resembled the surface of a grill: https://i.imgur.com/9YfUifF.jpg