> Number one is this: as you know, we would love to take Nvidia’s IP through ARM’s network. Unless we were one company, I think the ability for us to do that, and to do that with all of our might, is very challenging. I don’t take other people’s products through my channel! I don’t expose my ecosystem to other companies’ products. The ecosystem is hard-earned — it took 30 years for Arm to get here — and so we have an opportunity to offer Nvidia’s IP to that whole network, that vast ecosystem of partners and customers. You can do some simple math, and the economics there should be very exciting.
If Nvidia truly sees ARM as an opportunity to kill off Mali and expand GeForce's install base, they might create an incentive for themselves to keep the ARM ecosystem alive. I think this is a pretty credible take: Samsung's next Exynos processor will have Radeon graphics, and Nvidia can quickly nip that in the bud this way (assuming GeForce is better and cheaper). If it plays out like that, it would simply be great for the ARM ecosystem. If Nvidia can sell GeForce the way ARM sells Mali and leave the ARM ecosystem truly intact, I don't think many will lament the demise of Mali (although I expect some counterviews on this on HN :) ).
Having GeForce be a commanding presence in the ARM ecosystem might be a big problem for the future diversity of GPU vendors, but that's something I'm interested in seeing play out at least. I do hope AMD can take the battle to Nvidia on ARM too, and that Qualcomm and PowerVR find ways to stay relevant.
A perfect replacement: from one unsupported, hard-to-use GPU with non-open-source Linux drivers to another hard-to-use GPU with non-open-source Linux drivers.
> I do hope AMD can take the battle to Nvidia on ARM too, and that Qualcomm and PowerVR find ways to stay relevant.
Qualcomm's Adreno, amusingly enough, came from AMD: it was Imageon, acquired in 2009. I definitely hope AMD can get back in the mobile game though. Good luck to PowerVR and anyone else too; you are probably in for some hard competition soon!
In hindsight, I'm amazed at how well it worked given the schedule and the magnitude of the work involved in fusing those two architectures.
I have seen this concern countless times, but not why it matters to users. I can understand that it matters from Linus's perspective as kernel maintainer, but from a user's perspective I can't really see the issue. Anyway, not all code that runs on your system is open source. Why not demand open source from your bootloader manufacturer with the same intensity? If, say, NVIDIA wants the driver to contain a malicious backdoor, open source is not going to stop them.
No, but if such a backdoor were discovered, it would be possible to do something about it. The quote from the article in top comment here says it well: https://news.ycombinator.com/item?id=23944954
> Anyway, not all code that runs on your system is open source.
Not yet, but it is my goal. If/when that's achieved, I'd also like to run it exclusively on free/libre/open (FLO) hardware.
> Why not demand open source from your bootloader manufacturer with the same intensity?
My bootloader plays a much smaller role in my computing endeavors than my GPU. And, less importantly, as a practical matter there are many more major motherboard vendors and few FLO alternatives, whereas both Nvidia alternatives (AMD, integrated Intel) do have FLO drivers.
We basically have to keep going back to Nvidia, relying on them to be the authorities on their own system and to be acting in everyone's interest when we try to develop extensions like VK_EXT_present_timing. This greatly injures the development of good standards, preventing a collaborative, healthy environment where people can work together to make standards that work well.
Another example is EGLStreams, which is not that bad but is a very different approach to handling video buffers from what everyone else does, and it has been obstructing the use of newer Wayland display servers on Nvidia hardware for 6 years now. Nvidia wants their thing, and closed drivers mean no one can play around and attempt to make the hardware work if they wanted to. Ridiculously harsh limitations, no choice, no experimenting.
This creates a science-free vacuum where research & experimentation & progress wither, where peership dies.
Also: less package management work.
Folks want that too, and lots of ARM platforms use open source bootloaders already, mainly u-boot.
Honestly, I really don't see the GeForce play. Nvidia tried it with Tegra and failed pretty miserably. Mali and Adreno pretty much cornered the market from that era (with PowerVR pivoting over to Apple). I just don't see their IP really hitting home with the type of workloads you see in SoCs.
The primary driver for most SoC GPUs since the qHD days has been pushing pixels for the UI layers, which has a different set of requirements and features compared to modern GPUs used for gaming or ML. They're almost exclusively tiling-based and biased more towards power consumption than raw horsepower.
I want to be nice, but I don't know what rock you've been sleeping under. The TX2 is 3 years old, and it's not just that Nvidia has cornered the entire hapless AI/ML market with proprietary CUDA, which keeps it and the Jetson platform the #1 most obvious go-to for robotics in spite of a fairly mediocre ARM CPU: those couple of NV cores are way better than the rest of the ARM offerings. Even outside ML, the NV GPU ARM offerings radically outstrip everyone else. No one else has the RAM bandwidth to begin to compete, much less the cores. 3 years have passed, and the only part to match the TX2's 60 GB/s is NV's own top-end Xavier at 137 GB/s. No one else is playing in the league NV has been playing in with ARM GPUs. I don't know how you would call this massive, roaring, colossal success a failure. To say nothing of the Nintendo Switch.
It is a failure for NVidia in that they launched it as a mainstream mobile phone/tablet part, and it's not used in anything outside the NVidia Shield in that market that I'm aware of (and the Switch of course).
But it has seen success in robotics and self driving cars, because NVidia makes it easy to use and it has great performance.
So it's not obvious how to judge it. Commercially, compared to their initial goals it is probably a failure. But it has opened new markets that didn't exist so that's successful?
If you want to really succeed in the SoC space (which is where Arm has succeeded), then what you need is volume, and I don't think Tegra ever really made any serious inroads there.
The switch is a game console and so it sits somewhat outside of the traditional high-volume SoC market.
I thought Panfrost was pretty good these days.
Sway refuses to support their nonstandard APIs.
Any kernel bugs you report get dismissed because the proprietary module taints the kernel.
I’m sure there’s more.
I'd expect Nvidia to keep the Arm ecosystem alive, but only where they don't see an opportunity to take control. So they keep Radeon off Exynos (and incidentally why couldn't they do that anyway?) by offering GeForce. But elsewhere they can deny Arm IP to other firms where they have a competitive SoC.
Take one example. So Nvidia / Arm invest heavily in data center focused designs - are they really going to offer this IP to Ampere / Amazon on equal terms when compared to an Nvidia CPU? As Ben says 'color me skeptical'.
Essentially, they will have a full overview of the Arm ecosystem - with lots of confidential information - and can pick and choose where they drive out competitors whilst farming license fees from the rest.
Why try to compete on merit when you can swoop in and dictate the market to do your bidding? Coming at it from this perspective, it's looking like almost the same old Nvidia again :)
> Take one example. So Nvidia / Arm invest heavily in data center focused designs - are they really going to offer this IP to Ampere / Amazon on equal terms when compared to an Nvidia CPU? As Ben says 'color me skeptical'.
That I don't believe, indeed. They will crush their datacenter competition, but I think the server market might ironically be a bit more flexible in moving to different ISAs compared to mobile.
Incidentally Intel's stock seems to have risen over the last day or so which is a bit surprising given the datacenter story?
This is the real danger among everything.
Mobile GPUs are like integrated graphics on CPUs, those didn't kill AMD or Nvidia because that entire market doesn't want to spend a dime to begin with.
(and sure opencl too but that's too loose a standard to have any platform effect, it's just a standard that everyone implements a little differently and needs to be ported to their own compiler/hardware/etc, so there is no common codebase and toolchain that everyone can use like with CUDA.)
Everyone laughed at Huang saying that NVIDIA is a software company. He was right.
Maybe nVidia will decide to push this type of tech a lot harder.
I really don't think so. For the high end PC consumer, integrated performance can be ignored because you'll get a dedicated card. Not so with mobile SOCs.
It makes sense that nVidia might want to squeeze out Qualcomm and get their GPUs in Samsung flagships, for example.
Reading this comment from Huang, it sounds like Nvidia wants to sell this package, but with GeForce IP instead of Mali IP.
My guess is they failed because Geforce at the time was too much of a "desktop architecture", and my guess is that it still is. But now that they own Arm they can just tell all Mali customers "take it or leave it."
They had tablets at one point, and they've continued to refresh their product line for automotive and set-top use.
I think the backlash against Nvidia is due to its somewhat hostile stance towards open source (to be specific, they never open-sourced their graphics drivers).
But Nvidia's drivers work well (their Linux drivers are roughly on par with the Windows ones). I've heard that until AMD open sourced their drivers, a lot of people were saying that Nvidia was the only way to play high-end games on Linux (also CUDA).
Personally, I prefer AMD GPU. I find their drivers to work great with Linux.
I think their unpopularity these days come from three factors:
1. For the gamer crowd, the RTX 2000 series was a price hike, and even the higher-end cards were not that impressive performance-wise. It looks like they were resting on their lead over AMD.
2. People love to root for the underdog and AMD was behind for the longest time.
3. For the developer crowd, the closed source Nvidia drivers were not great for Linux compatibility, and the Mac crowd couldn't care because they couldn't have Nvidia after their fight with Apple.
I think for a sample of the general public who have opinions on Nvidia, it's 1,2 and 3 in that order while for HN users, it's 3, 2 and 1 in that order.
AMD has pretty much always had a solid mid-tier offering in the last five years, but people would still rather buy an nVidia card (like, say, a 1050) instead of the better AMD card at the same price point because of the "1080 Ti" halo effect.
nVidia's software stack is pretty bad. Sure, the driver core works pretty well. But the nVidia control panel looks like it was last updated in 2004, which is actually true (go find some screenshots of it from 15 years ago running on XP; it looks the same). Now, that doesn't mean it has to be bad (don't fix what isn't broken), but the NCP is actually clunky to use. Not to imply AMD's variant is necessarily better, but at least they're working on it.
The nVidia "value-adds" like ShadowPlay and so on are all extremely buggy. And while the core driver may be good, you still get somewhat frequent driver resets and hangs with certain interactive GPU compute applications.
Do people actually think this way? "I'll buy the nvidia 2060 super rather than the amd 5700 xt (for the same price, and benchmarks higher), because nvidia has the 2080 ti"
Also, there have been constant driver problems literally since the ATI days. Just recently, drivers basically crippled Navi for the entire duration of the product generation; drivers crippled Vega previously; Fiji was a driver mess; and Hawaii and Tahiti were not problem-free either (the famous "company A vs B vs C" article notes that "company B's drivers are bad and are on a downtrend", and that was during the heyday of GCN, in 2014 - those were the "stable drivers" that everyone rhapsodizes about in comparison to the mess that is Navi/Vega). Terascale was a fucking mess too.
AMD products often have weaker feature sets. It took them years to catch up with G-Sync (offering low-quality products with poor quality control for years until NVIDIA cleaned things up with GSync Compatible certification). NVENC is far better than AMD's equivalent (Navi's H264 encoder is still broken entirely as far as I know, it provides less than realtime encoding speed and extremely poor quality), so if you want to stream with AMD you have to use your CPU and crush your framerate, or purchase a much more expensive CPU. AMD has no answer for DLSS, and is just implementing their first generation RTX support with the next generation.
This is what I refer to as the "NVIDIA mind control field" theory. That consumers are just so wowed by the NVIDIA brand that they can't help themselves. The reality is AMD simply has not been that compelling an offering for that much of the time. There has been a lot of poor execution over the years from the Radeon team and prices have often not been as good as people remember them to be.
And a few good products don't change that either. It took something like 8 years of AMD slacking off (Raja mentioned in a presentation that around 2012 AMD management thought "discrete GPUs were going away" and pulled the plug on R&D - certainly a very attractive idea given their budgetary problems at the time) for NVIDIA to reach their current level of dominance. AMD has never led the market for 8 years at a time - maybe a year tops; usually NVIDIA has responded pretty quickly with price cuts and new products. NVIDIA has never let off the pedal the way AMD did (and yes, money was the reason, but that doesn't matter to consumers; they're buying products, not giving to charity).
I used the Fury (Fiji) and the VII (Vega 20) on Linux and Windows, and haven't experienced any of the crazy shit people have claimed. Such as...
>(Navi's H264 encoder is still broken entirely as far as I know, it provides less than realtime encoding speed and extremely poor quality), so if you want to stream with AMD you have to use your CPU and crush your framerate
Where did you hear that? Not even slightly true. I used GPU encoding on both cards (@1080p, 60fps, 5500kbps, 1 second keyframe interval, 'quality' preset, full deblocking filters, and pre-pass turned off because I'm not insane) with zero issues. I would have liked to see more of an improvement in the VII's quality vs the Fury, but I wouldn't go as far as calling it non-functional.
I don't know who managed to get any of those to encode at less than realtime. I've done 1440 at 15500kbps to another machine for re-encoding and I can still play star citizen on High@60 fps with the same pc that's encoding.
And for the record, I also have a GTX 1660, 1080, and had a 770 before the Fury. I'm not fanboying, just relaying my experience.
Edit: Just to be clear, I'm not saying Navi is good to go. I'm refuting the general statement "if you want to stream with AMD you have to use your CPU and crush your framerate"
edit2: Full config for reference
"Video.API": "Direct3D 11",
"lastVideo.API": "Direct3D 11",
> I used the Fury (Fiji) and the VII (Vega 20) on Linux and Windows, and haven't experienced any of the crazy shit people have claimed. Such as...
I don't know what kind of reply you're expecting. You've managed to use two AMD cards and not have driver crashes? Uh, good for you, but that doesn't mean the rest of us are "full of crap". I've also owned two AMD cards, and for both of them the drivers were flaky, on Linux and Windows. I'm sure they're not broken for everyone, but they were broken for me. Maybe if I bought another AMD card I'd get lucky this time around, but I'm not going to take the risk.
>I'm refuting the general statement "if you want to stream with AMD you have to use your CPU and crush your framerate"
It turns out anecdotes are different for everyone, that doesn't make them full of crap.
Indeed, wouldn't have included it if I didn't think the same.
I do however think making a blanket statement which has direct anecdotal evidence refuting it is pretty much bunk.
When I used Linux, the closed-source Nvidia drivers were better than anything else and easily available; the complaints around them seemed mostly ideological?
The price complaints seemed mostly about 'value' since the performance was still better than the competition in absolute terms.
The big issue with Nvidia GPUs in Linux these days is with Wayland. There are some graphics APIs that are the current way to create contexts, manage GPU resources etc but Nvidia went their own way which would require compositors to have driver specific code.
Many smaller compositors (such as the most popular tiling one for Wayland) don't want to write or support one implementation for Intel/AMD and one for Nvidia, so they either don't support Nvidia or require snarky-sounding CLI options to enable Nvidia at the cost of support.
I'd suspect part of the reason Nvidia went their own way is because their way is better? Is that the case - or is it more about just keeping things proprietary? Probably both?
If I had to guess, some mixture of ability to improve things faster with tighter integration at the expense of an open standard (pretty much what has generally happened across the industry in most domains).
Though this often leads to faster iteration and better products (at least in the short term, probably not long term).
I'd suggest his users asking for Nvidia support are evidence of this being wrong.
That aside though, it seems like Nvidia's proprietary driver doesn't support some kernel APIs that the other vendors (AMD, Intel) do?
I wonder why I've always had better experience with basic OS usage using the Nvidia proprietary driver over AMD in Linux. Maybe I just didn't use any applications relying on these APIs. Nouveau has never been good though.
Not really a surprise given the tone of that blog post that Nvidia doesn't want to collaborate with OSS community.
Don't people rely on Nvidia for deep learning workflows? I thought that stuff ran on Linux? Maybe this is just about different dev priorities for what the driver supports?
It all comes down to there always being two ways to do things when interacting with a GPU under Linux: The FOSS/vendor-neutral way, and the Nvidia way.
The machine learning crowd has largely gone the Nvidia way. Good luck getting your CUDA codebase working on any other GPU.
The desktop Linux crowd has largely gone the FOSS route. They have software that works well with AMD, Intel, VIA, and other manufacturers. Nvidia is the only one that wants vendor-specific code.
> Nvidia is the only one that wants vendor-specific code.
Isn't that because CUDA is better and that tight software/hardware integration is more powerful?
If it wasn't then presumably people would be using AMD GPUs for their deep learning, but they're not.
NVIDIA has been more or less good with desktop Linux + Xorg for the last 5-7 years (not accounting for the lack of support for hybrid graphics on Linux laptops).
I think you can use an NVIDIA GPU as a pure accelerator without it driving a display very easily.
Just because company A makes the fastest card does not imply that company A makes the fastest card at every price point.
Well - they charged more at each price point because they were faster.
At some prices I think it wasn't enough to justify the extra cost from a price to performance ratio, but that doesn't seem like a reason to think they're bad.
It's possible I'm a little out of date on this, I only keep up to date on the hardware when it's relevant for me to do a new build.
Are you joking? How is it the same price point when they are charging more?
E.g. the bottom of this page:
At the lower price point, nVidia's lineup is similarly crowded:
Either way, there would be no reason to group the $350 and the $400 card but not the $300 and the $350 card.
BTW, AMD definitely didn't always have a price/performance advantage, e.g. the nice scatter plots here from ten years ago (that I randomly found):
Before that mess (around the 900 series), nouveau was fast.
I had my Linux install in a VM for a number of years because fglrx was an aggravating experience to use (unstable, often broke on kernel updates) and radeon had abysmal performance on my 290X. To the point that my laptop's 540M (on nouveau) would outperform the 290X under Linux.
I still have that 290X in one of my systems, and since amdgpu gained support for it, it has been a very pleasant experience. But even that took a while, as they started with the 3xx series and supported the 4xx cards before going back for the 2xx cards.
Windows drivers are starting to work better now that Microsoft has adopted the Linux model of centralizing them. But they still aren't great. For any unusual hardware, the drivers still come bundled with all kinds of shit, stop working after a while, and develop bugs at random.
No. Intel ships broken WHQL-certified WiFi drivers via Windows Update. It took me three weeks of trying all available versions, settings, etc. to troubleshoot. In the end, I rolled back the driver to the earliest version and pinned it there. Now the wireless card works as intended.
Device is an HP Spectre X2 convertible tablet so everything is a package.
Edit: Use kinder words.
The problem is, all of Intel's newer drivers are broken for that particular card, even the ones installed by Intel's Driver Update Utility (yes, I've tried that too).
It's around the same time that Intel's e1000e Linux drivers started breaking older cards, so there's something off in that department IMHO.
To add insult to injury, both the Windows and Linux drivers' release notes claim that the cards in question are supported by the respective newer drivers.
I would also add Nvidia's hostility towards standards. They are pretty much the Apple of GPUs: OptiX, RTX, CUDA, etc. They are screwing up the whole ecosystem by making closed APIs that only work on their platform.
If you are a graphics or ML dev, you can only hate Nvidia for how they are hurting standards and making our jobs so much harder when we could just agree on Khronos standards.
Hopefully some standards catch up (raytracing is coming to Vulkan), but others don't (ML is CUDA-only).
With CUDA I get a comfortable C++ dev environment that integrates with their products AND Visual Studio, FOR FREE. I played a bit with raw OpenCL a couple years ago, and it's.. uncomfortable. I've looked for free (as in beer) SYCL implementations... couldn't find any. IDEs with integrated debugging for OpenCL... couldn't find any, at least not for free. (Intel charged $$$ for their full-fledged OpenCL tools.)
CUDA is popular because the development environment is free AND comfortable. Has the situation changed the last years?
> Khronos standards
You could argue the other way: CUDA has become the de facto standard in certain areas and other vendors should release CUDA-compatible tooling for their products. I as a developer want to get work done and I don't care about ideals like "standards". I choose and recommend tooling that gives the shortest way to the result.
I'm mostly working with REST these days and I curse the HTTP protocol and URIs, etc. that are a mishmash of hacks to get it work with 7-bit charsets. At one occasion I said: "Just because something has a 30 year old RFC doesn't mean it's suitable for use in _today's_ applications." Standards can just as much hold back and prevent better ideas from emerging as they can help with interoperability.
The "I'm just doing my job" mentality often gets us in very bad situations down the road (be it for Privacy, Ecology, Standards, etc.)
Totally agree that for "getting the job done" and having free tools then yes CUDA is better.
For your REST example, I understand your frustration, but now think about what it would be like without standards:
You would have to pay $100 to use Apple.iHTTP©®™, plus implement a version for Google G.HTTP, plus yet another version for Microsoft Visual.HTTP™. For each you would have to get a licence and agree to conditions; you would have to buy both a Windows machine and a Mac because of course they wouldn't be cross-platform; each one would only work in one browser; they could decide overnight to revoke your right to use HTTP; Microsoft Visual.HTTP dev tools would be crap, but you couldn't use the cool ones from another company because they wouldn't be compatible; etc.
It might seem like an exaggeration, but when you look at Epic vs. Apple, Google AMP, DirectX and Metal, Apple developer fees and conditions, web DRM, the lack of cross-compatibility, etc., well, that's what HTTP would look like if people didn't care about standards.
Also, standards don't always have to be backward compatible (e.g. OpenGL vs. Vulkan); that's specific to the web, not to standardisation.
Are you blaming NVIDIA for Vulkan not being on par with their proprietary APIs? Presumably Nvidia is not sabotaging Vulkan, rather the issue could be that Vulkan is an abstraction that supports multiple GPUs, and that improving Vulkan is resource intensive.
They don't even have to do that. All they have to do is allow nouveau to actually drive the graphics card. Maybe it won't be as performant as the proprietary driver but it will actually work out of the box with no system instability.
They have been buying some other data center/HPC companies this year as well, like Mellanox and Cumulus; to me it almost seems like they want to own the entire data center stack and eventually provide an offering similar to AWS, Azure, and Google Cloud.
AWS has been deploying at least one ARM chip (Annapurna/Nitro) on every single EC2 server for 5+ years. Surely they would have made sure their license ensures future rights to keep using ARM at a reasonable price. Their Graviton instances are effectively just one more ARM chip, so an Intel instance has 2 Arm chips (1 for EBS, 1 for networking) while a Graviton/Arm instance has 3 (EBS, networking, CPU).
Unless the Arm license reads something like "If you make a really big multi-core chip we get to charge you more"... how does the x86 -> Arm transition actually move the needle for Arm Holdings / Nvidia?
2. If it's Nvidia / Arm actually selling the CPU (and not just the ISA) you'd expect margins to be higher again.
The hyperscale/cloud datacenter market is single-digit millions of CPUs shipped per year. So if the whole cloud went to Arm tomorrow it would be less than 1% of total Arm chip shipments.
My point is that unless there's a way to extract higher $ per chip in the datacenter then actually it doesn't make a difference for Arm.
Edit: By datacenter market I'm really referring to AWS/GCP/Azure... the rest of the market ain't going en-masse to Arm anytime soon.
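The "less than 1% of total Arm chip shipments" claim above is easy to sanity-check. A quick sketch; both figures are assumptions, not from the thread (Arm has publicly cited annual partner shipments on the order of 20+ billion chips, and "single-digit millions" is taken at its high end):

```python
# Back-of-the-envelope check of the "less than 1%" claim.
arm_chips_per_year = 22e9  # assumed total annual Arm-based chip shipments
cloud_server_cpus = 9e6    # high end of "single-digit millions" per year

share = cloud_server_cpus / arm_chips_per_year
print(f"cloud server CPUs as a share of Arm shipments: {share:.4%}")

assert share < 0.01  # comfortably under 1%, consistent with the comment
```

Even with generous rounding in either direction, the cloud CPU market is a rounding error in unit terms, which is why the per-chip dollar question matters so much.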
That wouldn't make business sense.
But it also wouldn't make business sense for Amazon to hang their hat on ARM without assurances or contractual guarantees that they can keep selling ARM chips for a reasonable price into the future. Where I define "reasonable" as Arm taking a much, much smaller cut of the final price than what Intel can do.
I'm prepared to make a prediction: In 5 years Nvidia will be the only significant Arm supplier in the datacenter and all the other entrants will have given up - and that this explains why Intel's stock has risen over the last couple of days.
- NVIDIA will use RISC-V processors in many of its products
- We are contributing because RISC-V and our interests align
- Contribute to the areas that you feel passionate about!
Since then, I haven't observed much NVIDIA activity in the RISC-V software community, though some NVIDIA people do participate in the Virtual Memory Task Group.
Apparently, their stance has changed.
I.e., a library that's optimized to drive a bunch of hosts with ARM cores, strapped to Nvidia GPUs, connected by Mellanox interconnects.
Sell it as a unit to the public clouds for a massive markup, or maybe offer it on their own cloud at an unbeatably low cost.
ARM is effectively embarrassingly parallel, low-power compute, and GPUs are parallel, high-power compute.
If NVidia got into datacenter design (with ARM+GPU), dropped K8S+S3+Postgres onto it, charged $1999.95/mo for access, what could they deliver to customers?
What would a vertically integrated datacenter look like where you are the designer + manufacturer for everything from the concrete up?
A lot like AWS, Oracle Cloud, and soon to be Google Cloud and Azure?
People are losing their shit, but this isn't even that novel, lol; quite a few other players already own their full stack.
CUDA provides software lock-in to Nvidia GPUs. This would provide software lock in to the combined Nvidia Arm / Nvidia GPU platform.
This all sounds like buzzword bingo and vaporware.
It is one possible future.
It also highly depends on the definition of "our time"
> We are joining arms with Arm to create the leading computing company for the age of AI. AI is the most powerful technology force of our time. Learning from data, AI supercomputers can write software no human can. Amazingly, AI software can perceive its environment, infer the best plan, and act intelligently. This new form of software will expand computing to every corner of the globe.
NVidia will integrate Arm cores into their GPUs, where the GPU itself will start to become a server hosted on a PCIe bus, potentially with its own Infiniband connection.
EDIT - explaining the benefits for developers, and therefore consumers.
A "cloud to edge" stack from hardware to the application layer could create new application patterns that can tremendously accelerate autonomous driving, everyday robots, gaming. It could democratize this for small (maybe indie) developer teams. Wouldn't this have a great impact on consumers?
You should welcome a third competitor to the duopoly that has strangled CPU development. We surely could have made much more progress if x86 were not limited to only two (really three) competitors; you can already see how much change AMD getting back in the game has made.
And I'm not sure Huang is going to burn it all down anyway. That seems like it would be a shortsighted move that would negatively affect the long-term value of ARM.
But I mean - I don't think anyone can deny that Huang would do great things with ARM. Terrible, perhaps, but also great.
(and it says a lot that a lot of people are probably nodding along with a comparison of one of the greatest tech CEOs of all time to literal Voldemort, the public opinions on NVIDIA and Huang are just ridiculously hyperbolic)
Plus CUDA shows what can happen even when alternatives are available (OpenCL for example) you just have to use the hardware / software integration to be sufficiently ahead and establish a virtuous circle.
This suggests that as a consumer, you haven't benefited from this.
I was pointing out that the above commenter's personal experience showed otherwise.
The fact that they own their full stack has resulted in some of the most anti-consumer parts of the business, allowing monopolization of repairs, part pricing, etc.
NVidia doubly fucks over desktop Linux with its proprietary drivers and CUDA beating the Khronos Group stuff, and I don't want their abilities to grow.
I'm so tired of Nvidia getting away with this blatant falsehood.
They weren't even actual cores when Nvidia started counting SIMT lanes as "cores" (how many hands do you have, 2 or 10? And somebody who writes twice as fast as you must obviously have 20 hands, yes?), and now that the cores can, under some conditions, dual-issue, they are counting each one double.
What's next, calling each bit in a SIMT lane a "core"?
No, because that would break the convention they've used since the introduction of the term CUDA core: it's the number of FP32 multiply-accumulate instances in the shader cores. Nothing more, nothing less. If you see more into it, then that's on you.
You may not like that they define core this way (all other GPU vendors do it the same way, of course), but they've never used a different definition.
BTW, I checked the 8800 GTX review and there's no trace of 'core' being used in that way. It was AMD who started calling FP32 ALUs 'shader cores' or 'shaders' with the introduction of the ill-fated 2900 XT. Since comparing the number of same-function resources is one of the favorite hobbies of GPU buyers, Nvidia subsequently started using the same counting method and came up with the term "CUDA core."
The term "core" had a concrete meaning before Nvidia defined it to mean number of FMA units, and it's obviously no accident: they get to claim they have 32x more cores than would be the case by common definition. They are the odd one out; should Intel and AMD start multiplying their number of cores by the SIMD width? You would accept that as a "core"?
Now, as I wrote above, they have changed it AGAIN: they are now double-counting each "core" because it has some (limited) superscalar capability.
It was AMD/ATI who started doing it. Nvidia followed, and they didn't really have a choice if they wanted to avoid an "AMD has 320 shader cores yet Nvidia only has 16" marketing nightmare.
> Now, as I wrote above, they have changed it AGAIN: they are now double-counting each "core" because it has some (limited) superscalar capability.
They did not. In Turing, there's one FP32 and one INT32 pipeline with dual issue; in Ampere, there's one FP32 and one (INT32+FP32) pipeline, allowing dual issue of 2 FP32 ops when INT32 is not being used.
That can only be done if there are 2 physical FP32 instances. There is no double counting.
If your point is that this second FP32 unit can't always be used at 100%, e.g. because the INT32 is used, then see my initial comment: it's the number of physical instances, nothing more, nothing less. It doesn't say anything about their occupancy. The same was obviously always the case for AMD as well, since they had a VLIW5 ISA when they introduced the term, and I'm pretty sure that those were not always fully occupied either.
My point is that the second FP32 unit is not a core, in the sense of https://www.amazon.com/Computer-Architecture-Quantitative-Jo... which, it is my understanding, was a well-established standard; nothing more, nothing less.
But, again, if you want to blame someone for this terrible, terrible marketing travesty, start with AMD. They started it all...
My point was simply that, contrary to your assertion, Nvidia and AMD have never changed the definition of what they consider to be a core, even if that definition doesn't adhere to computer science dogma.
I would love people to backlash on that too. Unfortunately the ship has sailed.
This is not true, just like a shader core with AMD was not a GPU thread.
For example, the 2900 XT had 320 shader cores, but since it used a VLIW-5 ISA, that corresponds to 64 GPU threads.
Similarly, an RTX 3080 has 8704 CUDA cores, but there are 2 FP32 ALUs per thread, resulting in 4352 threads, and 68 SMs since, just like Turing, there are 64 threads per SM.
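The arithmetic behind those counts can be checked directly (using the figures quoted above for the 2900 XT and the RTX 3080):

```python
# 2900 XT: 320 "shader cores" with a VLIW-5 ISA
shader_cores = 320
vliw_width = 5
threads_2900xt = shader_cores // vliw_width   # 64 GPU threads

# RTX 3080: 8704 CUDA cores, 2 FP32 ALUs per thread, 64 threads per SM
cuda_cores = 8704
fp32_per_thread = 2
threads_3080 = cuda_cores // fp32_per_thread  # 4352 threads
sms = threads_3080 // 64                      # 68 SMs

print(threads_2900xt, threads_3080, sms)  # -> 64 4352 68
```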
They then dropped the vCore, so it is now simply Core.
IIRC, ATI started to use proper FP32 ALUs for both pixel shading and video processing/acceleration around the same time. I guess doing this stuff needs more than simple MUL/ACC instructions.
So, there's no misdirection here.
If you're a computer science fundamentalist like pixelpoet who wants to stick to the definition of a core as what AMD and Nvidia call a "CU" and "SM", using "threads and warps" instead of "strands and threads", then "CUDA core" is obviously jarring.
It's simply a case where marketing won. At least Nvidia and AMD, and now Intel, are using the same term. You can go on with your life, or you can whine about it, but it's not going to change.
From my PoV, any functionally complete computational unit (in the context of the device) can be called a core. Should we say that GPUs have no cores because they don't support 3DNow! or SSE or AES instructions?
Consider an FPGA. I can program it to have many small cores which can do a small set of operations, or I can program it to be a single, more capable core. Which one is a real core then?
That is in contrast to your FPGA example. Either one big core or many small cores can all execute their own threads.
It's not that simple either, you've got stuff like SMT and CMT where you have multiple threads executing on a single set of execution resources - but CUs are clearly on the line of "not a self-contained core".
Is it possible for you to point me in the right direction so I can read how these things work and bring myself up to speed?
SMT is more like cramming two threads into a core and hoping they don't compete for the same ports/resources in the core. CMT is well... we've seen how that went.
The short of it is that GP is right: SIMT is more or less "syntactic sugar" that provides a convenient programming model on top of SIMD. You have a "processor thread" that runs one instruction on an AVX unit with 32 lanes. What they are calling a "CUDA core" or a "thread" is analogous to an AVX lane; the software thread is called a "warp" and is executed using SMT on the actual processor core (the "SM" or "Streaming Multiprocessor"). The SM is designed so that a lot of SMT threads (warps) can be resident on the processor at once (potentially dozens per core), each being put to sleep when it needs to do a long-latency data access, at which point the SM swaps to some other warp to process while it waits. This covers for the very long latency of GDDR memory accesses.
The distinction between SIMT and SIMD is that basically instead of writing instructions for the high-level AVX unit itself, you write instructions for what you want the AVX lane to be doing and the warp will map that into a control flow for the processor. It's more or less like a pixel shader type language - since that's what it was originally designed for.
In other words, under AVX you would load some data into the registers, then run an AVX mul. Maybe a gather, AVX mul, and then a store.
In SIMT, you would write: outputArr[threadIdx] = a[threadIdx] * b[threadIdx]; or perhaps otherLocalVar = a[threadIdx] * threadLocalVar; The compiler then maps that into loads and stores and allocates registers and schedules ALU operations for you. And of course like any "auto-generator" type thing this is a leaky abstraction, it behooves the programmer to understand the behavior of the underlying processor, since it will faithfully generate code with suboptimal performance.
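The compiler's job can be mimicked with a scalar simulation: write the "kernel" body per-lane, and let a plain loop stand in for the hardware lanes (a toy model in Python, not actual CUDA):

```python
# The per-lane body, written exactly like the SIMT snippet above:
# each "lane" sees only its own threadIdx.
def kernel(threadIdx, a, b, outputArr):
    outputArr[threadIdx] = a[threadIdx] * b[threadIdx]

a, b = [1, 2, 3, 4], [10, 20, 30, 40]
out = [0] * 4
for threadIdx in range(4):  # the hardware runs these lanes in lockstep
    kernel(threadIdx, a, b, out)
print(out)  # -> [10, 40, 90, 160]
```

On real hardware the loop doesn't exist: all lanes of the warp execute the body simultaneously on the SIMD unit.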
In particular, in order to handle control flow, basically any time you have a code branch ("if/else" statement, etc), the thread will poll all its lanes. If they all go one way it's good, but if you have them go both ways then it has to run both sides, so it takes twice as long. The warp will turn off the lanes that took branch B (so they just run NOPs) and then it will run Branch A for the first set of the cores. Then it turns off the first set of cores and runs Branch B. This is an artifact of the way the processor is built - it is one thread with an AVX unit, each "CUDA core" has no independent control, it is just an AVX lane. So if you have say 8 different ways through a block of code, and all 8 conditions exist in a given warp, then you have to run it 8 times, reducing your performance to 1/8th. Or potentially exponentially more if there is further branching in subfunctions/etc.
(obviously in some cases you can structure your code so that branching is avoided - for example replacing "if" statements with multiplication by a value, and you just multiply-by-1 the elements where "if" is false, or whatever. But in others you can't avoid branching, and regardless you have to manually provide such optimizations yourself in most cases.)
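The divergence handling described above can be sketched as a toy model: treat a warp as a fixed-width set of lanes with an active mask, and run both sides of a branch, each under its mask (hypothetical helper names, nothing from a real API):

```python
def warp_execute(cond, branch_a, branch_b, values):
    """Toy SIMT divergence: run both branches, each under its lane mask."""
    mask_a = [cond(v) for v in values]  # lanes that take branch A
    out = list(values)
    # Pass 1: branch A, with branch-B lanes masked off (effectively NOPs)
    for i, active in enumerate(mask_a):
        if active:
            out[i] = branch_a(out[i])
    # Pass 2: branch B, with branch-A lanes masked off
    for i, active in enumerate(mask_a):
        if not active:
            out[i] = branch_b(out[i])
    return out

# If every lane agrees, only one pass does real work; if lanes diverge,
# the warp pays for both passes, halving throughput for this block.
print(warp_execute(lambda v: v % 2 == 0,
                   lambda v: v * 10,   # branch A
                   lambda v: v + 1,    # branch B
                   [0, 1, 2, 3]))      # -> [0, 2, 20, 4]
```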
AMD broadly works the same way but they have their own marketing names, the AVX lane is a "Stream Processor", the "warp" is a "wavefront", and so on.
AVX-512 actually introduces this programming model to the CPU side, where it is called "Opmask Registers". Same idea, there is a flag bit for each lane that you can use to set which lanes an operation will apply to, then you run some control flow on it.
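The opmask semantics are easy to model: one flag bit per lane, and masked-off lanes keep their previous value (this mirrors merge-masking in intrinsics like `_mm512_mask_add_ps`; the Python below is just an illustration, not the real instruction):

```python
def mask_add(k, src, a, b):
    """Per-lane masked add: lanes with bit set get a+b, others keep src."""
    return [x + y if bit else old
            for bit, old, x, y in zip(k, src, a, b)]

k = [1, 0, 1, 0]  # the "opmask register": one flag bit per lane
print(mask_add(k, [0, 0, 0, 0], [1, 2, 3, 4], [10, 20, 30, 40]))
# -> [11, 0, 33, 0]
```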
1 kiB, on the other hand, is 1024 :)
> Prior to the definition of the binary prefixes, the kilobyte generally represented 1024 bytes in most fields of computer science, but was sometimes used to mean exactly one thousand bytes. When describing random access memory, it typically meant 1024 bytes, but when describing disk drive storage, it meant 1000 bytes. The errors associated with this ambiguity are relatively small (2.4%).
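That 2.4% figure is just the relative gap between the two definitions:

```python
kb = 1000   # kilobyte, SI prefix
kib = 1024  # kibibyte, binary prefix
error_pct = (kib - kb) / kb * 100
print(error_pct)  # -> 2.4 (%)
```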
I think it all ends up being useless at best and completely misleading at worst. The reality is that I don't even know if I want to buy their product or not. I guess that's what reviews are for? But why do reviewers have to do the job of Nvidia's marketing department? Seems strange to me.
In all cases it helps to know your must-haves and prioritize accordingly, given that it's rarely the case that you can "just" get the new one and be happy: even if it benchmarks well, if it isn't reliable or the software you want to run isn't compatible, it'll be a detriment. So you might as well wait for reviews to flesh out those details unless you are deadset on being an early adopter. The specs say very little about the whole of the experience.
I actually hate the idea of having a high-end GPU for personal use these days. It imposes a larger power and cooling requirement, which just adds more problems. I am looking to APUs for my next buy - the AMD 4000G series looks to be bringing in graphics performance somewhere between GT 1030 and GTX 1050 equivalent, which is fine for me, since I mostly bottleneck on CPU load in the games I play now (hello Planetside 2's 96-vs-96 battles, still too intense for my 2017 gaming laptop), and these APUs now come in 6- and 8-core versions with the competitive single-thread performance of more recent Zen chips. I already found recordings of the 2000G chips running this game at playable framerates, so two generations forward I can count on a straight-up improvement. The only problem is availability - OEMs are getting these chips first.
Are you suggesting you would rather read reviews of Nvidia products that are written by Nvidia, and you would trust them more than 3rd party reviews?
> I don't understand Nvidia's marketing at all.
Do read @TomVDB's comments; this isn't Nvidia, this is industry-wide marketing terminology.
Cores are important to developers, so what you're talking about is that some of the marketing is (unsurprisingly) not targeted for you. If you care most about Blender and games, you should definitely seek out the benchmarks for Blender and the games you play. Even if you understood exactly what cores are, that wouldn't change anything here, you would still want to focus on the apps you use and not on the specs, right?
> I have the feeling they're not; if you're using all Y tensor cores, then there aren't Z unused CUDA cores sitting around, right?
FWIW, that's a complicated question. There's more going on than just whether these cores are separate things. The short answer is that they are, but there are multiple subsystems that both types of cores have to share, memory being one of the more critical examples. The better answer here is to compare the perf of the applications you care about, using Nvidia cards to using AMD cards, picking the same price point for each. That's how to decide which to buy, not worrying about the internal engineering.
That's referring to memory and not cores; is that a realistic example? I'm not very aware of Nvidia marketing that does what you said specifically - the example feels maybe a little exaggerated? I will totally grant that there is marketing speak, and understanding the marketing speak for all tech hardware can be pretty frustrating at times.
> if they're like "3% more performance on the 3090 vs. the RTX Titan" then I can just ignore it and not even bother reading the reviews.
Nvidia does publish some perf ratios, benchmarks, and peak perf numbers with each GPU, including for specific applications like Blender. Your comment makes it sound like you haven't seen any of those?
Anyway, I think that would be a bad idea to ignore the reviews and benchmarks of Blender and your favorite games, even if you saw the headline you want. There is no single perf improvement number. There never has been, but it's even more true now with the distinction between ray tracing cores and CUDA cores. It's very likely that your Blender perf ratio will be different than your Battlefield perf ratio.
It has a list of applications and each GPU's relative performance. Y=0 is even on the graph!
Apple is a good example of a company that just doesn't really talk too much about the low-level details of their chips. People buy their products anyway.