I would sincerely hope for a competitive AMD GPU for deep learning. But as long as it's a week-long journey with an unknown ending to try to recompile TensorFlow to support ROCm, everyone I know in AI will firmly stick with NVIDIA and their production-proven drivers and CUDA APIs.
I wish AMD would offer something like NVIDIA's Inception program to gift some accelerators and GPUs to suitable C++ coders (like me) so that there's at least a few tutorials on the internet on how other people managed to successfully use AMD + ROCm for deep learning.
Sounds like AMD might still be using the 'Tesla Roadster' strategy of selling fewer, more lucrative contracts for the time being. Probably not that they don't care, just that for now, they have to focus.
That sounds reasonable on the surface, but cards don't cost a lot, and sending them out for free to leverage community interest for the open source stack could be extremely cost effective.
If they generate demand they can't keep up with (without messing up their strategy), it's entirely possible that the negative perception would hurt them more than the 'don't care' perception they have now.
Even better, you can provide support for gaming cards and then the community pays you for them instead of you having to pay for it out of dev-rel budget.
And yet AMD is actively moving in the opposite direction, dropping support for older hardware both from their Windows driver and from ROCm and their other compute software.
It's hard to overstate what a positive impact gaming-card support, hardware donations to educational institutions, and other devrel have had on CUDA - those are the people who will be writing the next 10 years of your software, which is what sells the actual enterprise hardware. And at this point it's not just "can't afford to support it"; AMD is doing fine these days, they just don't want to. Kind of baffling.
GPGPU has been a thing for two decades; they should have had this solved years ago. Let's hope SYCL gets wide support - Intel seems to be betting on it, and there's a CUDA backend available. If AMD wants to make themselves irrelevant for anything but gaming and HPC, that's their choice.
Years ago AMD was almost bankrupt and put everything into one last attempt at a CPU. Now AMD is hiring like crazy, and I hope they are able to catch up to their larger competitors in tooling and software.
Here's what I think they'll do for the near future. They can't build an all-encompassing library ecosystem that supports all the hardware, all the existing software, has really good tooling, etc. in a reasonable amount of time. In consumer they can mostly get away with just supporting Microsoft's APIs, which they already do pretty well. If they design a really good compute GPU, they probably won't be able to make enough of them. They can sell all of their production with a few big deals to important customers, and they can attach engineers to those deals to help support the specific needs of those customers. That is much more practical than trying to catch up to CUDA in its entirety. Basically, status quo.
As someone who isn't a big customer and who is mad at Nvidia, I hope this doesn't happen, but it seems like the most likely path.
I have friends here in the Austin area who have worked for AMD. From what I’ve heard from them, it’s not that AMD doesn’t care, it’s that AMD is clueless and hopelessly disorganized, and they’re constantly doing whatever they can to chase the latest supercomputing contract, to the exclusion of all else.
It’s a target-rich environment if you want to learn all the bad anti-patterns, so that you can avoid doing them in the rest of your career.
So, it’s not that they don’t care. It’s that they don’t have enough hours in the every-day-is-a-hair-on-fire-day that they would be capable of caring.
Yeah, frankly it's a little misleading to frame this as "AMD won this"; this is a gimme contract to keep them in the game. The DoE is throwing gimme contracts to Intel too for Xe, and they haven't even produced a working product yet.
Their CPUs are pretty much in the same boat too - is it justifiable to buy Intel CPUs right now for HPC applications, especially with AMD supporting AVX-512 in their Zen 4 chips (the counterparts to the Sapphire Rapids parts the DoE is buying)? Not really, but the DoE's interest is in keeping Intel in the game; an AMD monoculture doesn't benefit anyone any more than an Intel monoculture did.
Although of course this is not fab-related, it's the same basic strategy - the US wants as diverse and thriving a tech ecosystem as they can get in the west, and particularly in the US, to counterbalance a rising China. Not that China is anywhere close today, but in the 20-year timeframe it's a major concern.
It's clearly a real problem that AMD's ML software stack isn't quite there and lacks support for the non-specialized cards, but that's not really an issue for these HPC use cases...
Apparently they are going to use “HIP” to let CUDA applications run on AMD hardware:
> The OLCF plans to make HIP available on Summit so that users can begin using it prior to its availability on Frontier. HIP is a C++ runtime API that allows developers to write portable code to run on AMD and NVIDIA GPUs. It is essentially a wrapper that uses the underlying CUDA or ROCm platform that is installed on a system. The API is very similar to CUDA so transitioning existing codes from CUDA to HIP should be fairly straightforward in most cases. In addition, HIP provides porting tools which can be used to help port CUDA codes to the HIP layer, with no loss of performance as compared to the original CUDA application. HIP is not intended to be a drop-in replacement for CUDA, and developers should expect to do some manual coding and performance tuning work to complete the port.
Would this one day lead to a HIP based PyTorch? I hope so!
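For the curious, here's a minimal sketch of what HIP code looks like, assuming a working ROCm install and the hipcc compiler (the same source should also build against HIP's CUDA backend on NVIDIA hardware). The host calls map one-to-one onto their CUDA equivalents, and that mechanical rename is most of what the hipify porting tools mentioned above automate:

    // vadd.hip.cpp - minimal HIP vector add; build with: hipcc vadd.hip.cpp -o vadd
    #include <hip/hip_runtime.h>
    #include <cstdio>
    #include <vector>

    __global__ void vadd(const float* a, const float* b, float* c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    int main() {
        const int n = 1 << 20;
        std::vector<float> ha(n, 1.0f), hb(n, 2.0f), hc(n);

        // Note the 1:1 mapping to CUDA: cudaMalloc -> hipMalloc, cudaMemcpy -> hipMemcpy, ...
        float *da, *db, *dc;
        hipMalloc((void**)&da, n * sizeof(float));
        hipMalloc((void**)&db, n * sizeof(float));
        hipMalloc((void**)&dc, n * sizeof(float));
        hipMemcpy(da, ha.data(), n * sizeof(float), hipMemcpyHostToDevice);
        hipMemcpy(db, hb.data(), n * sizeof(float), hipMemcpyHostToDevice);

        // Kernel launches keep the familiar CUDA triple-chevron syntax under hipcc.
        vadd<<<(n + 255) / 256, 256>>>(da, db, dc, n);

        hipMemcpy(hc.data(), dc, n * sizeof(float), hipMemcpyDeviceToHost);
        std::printf("c[0] = %f\n", hc[0]); // expect 3.0

        hipFree(da); hipFree(db); hipFree(dc);
        return 0;
    }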
You can think of HIP as a fancy batch script that text-substitutes CUDA calls for their ROCm equivalents, and it works about as well as that sounds.
If you're going to have to re-debug it anyway, what's the point? AMD's focus certainly should have been on something like GPU Ocelot or compiling PTX to GCN/RDNA instead.
IMO the deep learning folk need to be working more actively towards the future. The CUDA free ride is amazing, but AMD's HIP already does a good job being CUDA compliant in a general sense. But CUDA also sort of encompasses the massive collection of libraries that Nvidia has written to accelerate a huge amount of use cases. Trying to keep pace with that free-ride is hard.
My hope is that eventually we start to invest in Vulkan Compute. Vulkan is way way way harder than CUDA, but it's the only right way I can see to do things. Getting TensorFlow & other libraries ported to run atop Vulkan is a herculean feat, but once there's a start, I tend to believe most ML practitioners won't have to think about the particulars, and the deep engineering talent will be able to come in, optimize, and improve the Vulkan engines very quickly, rapidly improving whatever it is we start with.
It's a huge task, but it just seems like it's got to happen. I don't see what alternative there is, long term, to starting to get good with Vulkan.
I don't want to presume, but it sounds like you haven't actually tried using ROCm.
My experience with it was an absolute nightmare; I've never gotten ROCm working. Just as well, since it turns out my systems never would have supported it anyway for various reasons (lacking PCIe atomics, for one), but I never actually got far enough to run into the driver problem - I never got the whole custom LLVM fork/ROCm software stack to work.
Caveat: I'm not professionally involved in deep learning or HPC, and as others have mentioned, the framework was only intended for a few specific cards running on very specific hardware for HPC use cases.
But pretending like this is even a fraction as useful for the "average" person trying to experiment or even work at a low-medium level in machine learning feels off to me.
I don't think people will be swayed by platitudes about creating a competitive open-systems ecosystem to use plainly inferior software. Companies aren't going to spend oodles of money (and individuals won't volunteer tons of time) to suffer porting frameworks to target bare-bones APIs for the sake of being good sports.
Until either nvidia screws over everyone so much that using AMD cards becomes the path of least resistance, or AMD/Intel offers products at significantly lower prices than nvidia, I don't see the status quo changing much.
> Vulkan is an API (Application Programming Interface) for graphics and compute hardware.
Vulkan has compute shaders[1], which are generally usable. Libraries like VkFFT[2] demonstrate basic signal processing in Vulkan. There are plenty of other non-graphical compute shader examples, and this is part of the design of Vulkan (and also WebGPU). Further, there is a Vulkan ML TSG (Technical Subgroup)[3], which is supposed to be working on ML. Even Nvidia is participating, with extensions like VK_NV_cooperative_matrix, which specifically target the ML tensor cores. As a more complex & current example, there's Google's IREE, which allows inference/TensorFlow Lite execution on a variety of drivers, including Vulkan[4], with broad portability across hardware & fairly decent performance, even on mobile chips.
There are people who could probably say this better/more specifically, but I'll give it a try: Vulkan is, above all, a general standard for modelling, dispatching & orchestrating work, usually on a GPU. Right now that usage is predominantly graphics, but that is far from a limit. The ideas of representing GPU resources and dispatching/queueing work are generic, apply fairly reasonably to all GPU systems, and can model any workload done on a GPU.
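To make the "compute is first-class" point concrete, here's a small sketch (plain core Vulkan API from C++, no validation layers or extensions) that just lists each GPU's queue families and whether they advertise compute. On most discrete GPUs you'll see compute-capable families alongside, and often separate from, the graphics ones:

    // vkqueues.cpp - enumerate queue families; build with: g++ vkqueues.cpp -lvulkan
    #include <vulkan/vulkan.h>
    #include <cstdio>
    #include <vector>

    int main() {
        // A bare instance is enough just to enumerate devices.
        VkInstanceCreateInfo ici{};
        ici.sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO;
        VkInstance instance;
        if (vkCreateInstance(&ici, nullptr, &instance) != VK_SUCCESS) {
            std::fprintf(stderr, "no Vulkan driver available\n");
            return 1;
        }

        uint32_t gpuCount = 0;
        vkEnumeratePhysicalDevices(instance, &gpuCount, nullptr);
        std::vector<VkPhysicalDevice> gpus(gpuCount);
        vkEnumeratePhysicalDevices(instance, &gpuCount, gpus.data());

        for (VkPhysicalDevice gpu : gpus) {
            VkPhysicalDeviceProperties props;
            vkGetPhysicalDeviceProperties(gpu, &props);

            uint32_t familyCount = 0;
            vkGetPhysicalDeviceQueueFamilyProperties(gpu, &familyCount, nullptr);
            std::vector<VkQueueFamilyProperties> families(familyCount);
            vkGetPhysicalDeviceQueueFamilyProperties(gpu, &familyCount, families.data());

            for (uint32_t i = 0; i < familyCount; ++i) {
                // Compute-capable queues are core Vulkan, independent of graphics.
                int compute  = (families[i].queueFlags & VK_QUEUE_COMPUTE_BIT) ? 1 : 0;
                int graphics = (families[i].queueFlags & VK_QUEUE_GRAPHICS_BIT) ? 1 : 0;
                std::printf("%s  queue family %u: compute=%d graphics=%d\n",
                            props.deviceName, i, compute, graphics);
            }
        }

        vkDestroyInstance(instance, nullptr);
        return 0;
    }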
> Khronos' compute standard that's most similar to Cuda is SYCL.
SYCL is, imo, the opposite of where we need to go. It's the old historical legacy that CUDA has, of writing really dumb ignorant code & hoping the tools can make it run well on a GPU. Vulkan, on the other hand, asks us to consider deeply what near-to-the-metal resources we are going to need, and demands that we define, dispatch, & complete the actual processing engines on the GPU that will do the work. It's a much much much harder task, but it invites in fantastic levels of close optimization & tuning, allows for far more advanced pipelining & possibilities. If the future is good, it should abandon lo-fi easy options like SYCL and CUDA, and bother to get good at Vulkan, which will allow us to work intimately with the GPU. This is a relationship worth forging, and no substitutes will cut it.
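For contrast, here's roughly what the SYCL single-source style looks like - a sketch using SYCL 2020 unified shared memory, assuming an implementation such as DPC++ or hipSYCL is installed. Notice there are no pipelines, descriptor sets, or barriers to manage; the runtime decides how the lambda gets scheduled on the device, which is exactly the trade-off being argued about here:

    // SYCL 2020 USM-style sketch: scale-and-add over a shared array
    #include <sycl/sycl.hpp>
    #include <cstdio>

    int main() {
        sycl::queue q;                                // picks a default device
        const int n = 1 << 20;
        float* a = sycl::malloc_shared<float>(n, q);  // visible to host and device
        for (int i = 0; i < n; ++i) a[i] = 1.0f;

        // One lambda; the toolchain and runtime decide how it runs on the GPU.
        q.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
            a[i] = a[i] * 2.0f + 1.0f;
        }).wait();

        std::printf("a[0] = %f\n", a[0]);             // expect 3.0
        sycl::free(a, q);
        return 0;
    }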
History has shown that it's not necessarily the most performant option that wins; it's usually the most convenient. Otherwise we wouldn't use Python for data science and ML, or JavaScript for IDEs. Remember AMD's Close To Metal? It wasn't a big hit. Take a look at the Blender announcement that's on the front page right now: Blender is removing OpenCL code and replacing it with HIP/CUDA to get a single codebase [0].
There's room for both solutions, but I think it's important to have a relatively easy way to use accelerators like GPUs in a cross platform way, without being an expert or having to rewrite code for new architectures.
It is my understanding that because both Vulkan and SYCL use SPIR-V, the work done on drivers and compilers for one of them benefits the other as well.
AMD's Close To Metal was the blueprint for Vulkan, or at least the biggest contributor (edit: oops, I'm thinking of AMD Mantle, a little bit later & also extremely low level). It won. Its ideas power almost all GPU technology today.
Vulkan's not relatively easy to do from scratch, but it is cross-platform, and again, most folks don't need to write Vulkan. They're using ML frameworks that abstract that away.
Vulkan will enable the use of countless great extensions & capabilities that tools like SYCL won't be smart enough to use. Maybe the SPIR-V can be optimized well by the drivers, but the code SYCL spits out, I'd wager, will be worlds worse than what we can do if we try. This is what CUDA does: it allows bad/cheap SYCL-like code, but most folks use NV's vast, ultra-complete libraries that are far far far better - written in much lower-level code & tweaked for every last ounce of performance. CUDA is basically a scripting language + a close-to-the-metal library. SYCL might eventually be able to become similar, but only if we do the hard work of making really good Vulkan libraries. Otherwise it'll be pretty much trash.
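As a concrete illustration of the "close-to-the-metal library" half of CUDA, this is roughly what leaning on one of NV's tuned libraries looks like (a sketch using cuBLAS; the sizes and values are made up). One call replaces a hand-written GEMM kernel, and this library surface is the bar any Vulkan- or SYCL-based ecosystem would have to clear:

    // gemm.cpp - multiply two matrices with cuBLAS; build with: nvcc gemm.cpp -lcublas
    #include <cublas_v2.h>
    #include <cuda_runtime.h>
    #include <cstdio>
    #include <vector>

    int main() {
        const int n = 512;  // C = alpha * A * B + beta * C, with n x n matrices
        std::vector<float> hA(n * n, 1.0f), hB(n * n, 2.0f), hC(n * n);

        float *dA, *dB, *dC;
        cudaMalloc((void**)&dA, n * n * sizeof(float));
        cudaMalloc((void**)&dB, n * n * sizeof(float));
        cudaMalloc((void**)&dC, n * n * sizeof(float));
        cudaMemcpy(dA, hA.data(), n * n * sizeof(float), cudaMemcpyHostToDevice);
        cudaMemcpy(dB, hB.data(), n * n * sizeof(float), cudaMemcpyHostToDevice);

        cublasHandle_t handle;
        cublasCreate(&handle);

        // One library call instead of a hand-tuned kernel; cuBLAS picks a code
        // path tuned for whatever GPU it finds itself running on.
        const float alpha = 1.0f, beta = 0.0f;
        cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                    n, n, n, &alpha, dA, n, dB, n, &beta, dC, n);

        cudaMemcpy(hC.data(), dC, n * n * sizeof(float), cudaMemcpyDeviceToHost);
        std::printf("C[0] = %f\n", hC[0]);  // expect 1024 (= 512 * 1.0 * 2.0)

        cublasDestroy(handle);
        cudaFree(dA); cudaFree(dB); cudaFree(dC);
        return 0;
    }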
There are a lot of arguments in favor of being bad, of not going for the good stuff. But at the end of the day, we should just do the right thing. Everyone other than NV should make a real start, should make a bid for their own survival & the survival of everyone else. Vulkan is the only real bid I see.
Consoles use their own APIs; even the Switch has its own alternative to Vulkan.
Apple has Metal, and Windows has DirectX.
The only places where Vulkan has "won" are Linux, which is irrelevant given its ~1% share, and Android, where most games are still GL ES if one cares about reaching everyone.
There is a competitive port of TensorFlow for AMD GPUs. The only issue is that it is from Apple and works only on the Mac. TensorFlow fully supports AMD GPUs on the Mac and can max out the graphics card as well. Makes me wonder: if Apple could do it, why can't AMD do the same with Vulkan instead of Metal?
> I wish AMD would offer something like NVIDIA's Inception program to gift some accelerators and GPUs to suitable C++ coders (like me) so that there's at least a few tutorials on the internet on how other people managed to successfully use AMD + ROCm for deep learning.
Why not go a step further and pay some people to integrate good first-party support into the most popular libraries? It would probably be quite cheap in comparison and kickstart adoption.
It will be expensive no matter who does it. Probably more for a contractor, given that they’d have less experience and internal knowledge of the hardware.
And they've been delaying this for months. Back in April they said on GitHub that 5700 XT support should be available in roughly 2-4 months, and it's already November.
Today a Blender beta version with HIP support has been released. This is working on RDNA hardware (RDNA2 officially supported, RDNA1 enabled but not supported).
I guess a release date for ROCm is approaching after all.
EDIT: And it seems ROCm doesn't even support any of those new RDNA2 accelerators or gaming GPUs: https://github.com/RadeonOpenCompute/ROCm/issues/1344
So this is great hardware, but absolutely useless unless you are big enough to write your own GPU drivers from scratch ~_~