
Honestly, I don't get how the current situation could have developed.

Video games have used compute shaders forever, and those work just fine on other cards.

I remember doing some quite involved stuff for my thesis using OpenCL a decade ago, and it worked just fine. Nowadays OpenCL is dead for some reason...

I just don't get what is there in CUDA that makes it so special. As far as I remember, a GPGPU API consists of a shader language, a way to copy stuff to and from the GPU, a way of scheduling and synchronizing work, atomics, groupshared variables, and some interop stuff.
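
For concreteness, here is roughly that whole list in one place, as a minimal CUDA C++ sketch (the reduction kernel, array size and block size are arbitrary illustrative choices, not anything canonical):

  #include <cstdio>
  #include <cuda_runtime.h>

  // Kernel language: each block reduces 256 elements in shared ("groupshared")
  // memory, then one thread per block adds its partial sum with an atomic.
  __global__ void blockSum(const float* in, float* out, int n) {
      __shared__ float tile[256];
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;
      __syncthreads();                                  // intra-block synchronization
      for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
          if (threadIdx.x < stride) tile[threadIdx.x] += tile[threadIdx.x + stride];
          __syncthreads();
      }
      if (threadIdx.x == 0) atomicAdd(out, tile[0]);    // atomics
  }

  int main() {
      const int n = 1 << 20;
      float* h_in = new float[n];
      float h_out = 0.0f;
      for (int i = 0; i < n; ++i) h_in[i] = 1.0f;

      float *d_in, *d_out;
      cudaMalloc(&d_in, n * sizeof(float));
      cudaMalloc(&d_out, sizeof(float));
      cudaMemcpy(d_in, h_in, n * sizeof(float), cudaMemcpyHostToDevice);  // copy to the GPU
      cudaMemset(d_out, 0, sizeof(float));

      blockSum<<<(n + 255) / 256, 256>>>(d_in, d_out, n);                 // schedule the work
      cudaDeviceSynchronize();                                            // wait for it to finish
      cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);   // copy back

      printf("sum = %f\n", h_out);                                        // expect 1048576.0
      cudaFree(d_in);
      cudaFree(d_out);
      delete[] h_in;
      return 0;
  }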

Is the vendor lock-in due to the CUDA libraries? In that case, the problem is not CUDA, but the libraries themselves. Not sure about today, but performance portability basically didn't exist back then; it didn't even exist between different generations of cards from the same vendor. You needed to specialize your code for the GPU architecture at hand. Even if you could run CUDA code on AMD, it would be slow, so you'd need a rewrite anyway.


I don't know specifically about OpenCL, but the big thing about CUDA is that it's supported across operating systems and across the whole spectrum of Nvidia GPUs. AMD has ROCm, but they have to be dragged kicking and screaming into supporting any of the consumer-grade GPUs. ROCm is still pretty much Linux-only, with limited support on Windows, and only the high-end consumer offerings (7900 XT and 7900 XTX) are officially supported, because AMD wants to push compute users (and AI users) onto workstation-grade GPUs. In comparison, CUDA is usable and supported even on mid-range Nvidia GPUs such as the RTX 3060. And most of the software around things like AI is written by enthusiasts with gaming PCs, so they write their software for the interface that supports them.

I have an AMD card that I try to use for AI stuff, and it's an uphill battle. The most popular workloads, such as Stable Diffusion or Llama, more or less work. But as soon as you go into other, less mainstream workloads it starts to fall apart. Projects install CUDA versions of torch by default. To make them work you have to uninstall those manually and install the ROCm versions of torch. Then it turns out some Python dependency also uses CUDA, so you also have to find a ROCm fork of that dependency. Then there's some other dependency that only has CUDA and CPU versions, and you're stuck.


There probably isn't anything in CUDA itself that makes it special. Its libraries are well-optimised math libraries, and the math for most of the important stuff is somewhat trivial. AI seems to be >80% matrix multiplication; a well-optimised BLAS is tricky to implement, but even a bad implementation would be enough for all the major libraries to support AMD.
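
To put that in perspective, the hot path behind most of that matrix multiplication is a single GEMM call. A rough sketch with cuBLAS (the wrapper function and matrix layout are made up for illustration; hipBLAS on the AMD side exposes an almost identical call, so the API surface itself is not the hard part):

  #include <cublas_v2.h>
  #include <cuda_runtime.h>

  // C = A * B for column-major matrices already resident on the device.
  // Error handling omitted for brevity.
  void sgemm_example(const float* d_A, const float* d_B, float* d_C,
                     int m, int n, int k) {
      cublasHandle_t handle;
      cublasCreate(&handle);
      const float alpha = 1.0f, beta = 0.0f;
      cublasSgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N,
                  m, n, k,
                  &alpha, d_A, m,     // A is m x k, leading dimension m
                          d_B, k,     // B is k x n, leading dimension k
                  &beta,  d_C, m);    // C is m x n, leading dimension m
      cublasDestroy(handle);
  }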

The vendor "lock-in" is because it takes a few years for decisions to be expressed in marketable silicon, and literally only Nvidia was trying to be in this market 5 years ago. I've seen a lot of AMD cards that just crashed when used for anything outside OpenGL. I had a bunch of AI-related projects die back in 2019 because initialising OpenCL crashed the drivers. If you believed the official docs, everything should have worked fine. Great card, except for the fact that compute didn't work.

At the time I thought it was maybe just me. But after seeing geohot's saga trying to make tinygrad work on AMD cards, and having a feel for how poorly AMD hardware is supported by the machine learning community, it makes a lot of sense to me that it is a systemic issue, and that AMD didn't have any corporate sense of urgency about fixing those problems.

Maybe there is something magic in CUDA, but if there is it is probably either their memory management model or something quite technical like that. Not the API.
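
If I had to guess at a concrete candidate, it would be something like unified memory: allocate once and let the runtime migrate pages between host and device, instead of juggling explicit copies. A rough sketch of what I mean, not a claim that this is the actual moat:

  #include <cstdio>
  #include <cuda_runtime.h>

  __global__ void scale(float* data, int n, float factor) {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n) data[i] *= factor;
  }

  int main() {
      const int n = 1 << 20;
      float* data;
      cudaMallocManaged(&data, n * sizeof(float));    // one allocation, visible to CPU and GPU
      for (int i = 0; i < n; ++i) data[i] = 1.0f;     // fill on the host, no explicit memcpy

      scale<<<(n + 255) / 256, 256>>>(data, n, 2.0f);
      cudaDeviceSynchronize();                        // wait before touching it on the CPU again

      printf("data[0] = %f\n", data[0]);              // read the result back directly
      cudaFree(data);
      return 0;
  }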


The magic is that CUDA actually works well. There is no reason to pick OpenCL, ROCm, SYCL or the others when CUDA gives you a 10x better developer experience.


> The vendor "lock in" is because it takes a few years for decisions to be expressed in marketable silicon and literally only Nvidia was trying to be in the market 5 years ago.

It's crazy, because even 10 years ago it was already obvious that machine learning was big and was only going to become more important. AlphaGo vs Lee Sedol happened in 2016. Computer vision was making big strides.

5 years ago, large language models hadn't really arrived on the scene yet, at least not as impressively as today, but I think Google, for example, was already using machine learning for Google Translate?


Goes to show how difficult and important execution is. There was also Hangouts vs Zoom.


Or Skype vs Zoom. Skype had a lot of mindshare, it was basically synonymous with calling people via the internet for a while.

But somehow Zoom overtook them during the pandemic.


What do you think about SYCL as a viable cross-platform GPU API?


I'd have been happy to use OpenBLAS if it worked on a GPU. Any API is good enough for me. I have yet to see anything in the machine learning world that requires real complexity; the pain seems to be in figuring out black-box data and models, and in deciphering what people actually did to get their research results.

The problem I had with my AMD card was that SYCL, like every other API, still ends up making calls to AMD's kernel drivers and firmware, which would crash the program or the computer (the crash was inevitable; only how it happened depended on circumstances).

The AMD drivers themselves are actually pretty good overall; if you want a desktop graphics card for Linux, I recommend AMD. The open source drivers have a noticeably higher average quality than the binary stuff Nvidia puts out, and are rock solid most of the time. But for anything involving OpenCL, ROCm or friends I had a very rough experience. It didn't matter which API I used, because the calls eventually go through the kernel, and whatever the root problem is lives somewhere around there.


The biggest problem with SYCL is that AMD doesn't want to back a horse they don't control (the same reason they opposed Streamline), so they won't support it. When the #2 player in a two-player market won't play ball, you don't have a standard.

Beyond that, AMD’s implementation is broken.

Same as Vulkan Compute - SPIR-V could be cool but it’s broken on AMD hardware, and AMD institutionally opposes hitching their horse to a wagon they didn't invent themselves.

This is why people keep saying that NVIDIA isn't acting anticompetitively. They're not; it's the Steam/Valve situation, where their opponents are just intent on constantly shooting themselves in the head while NVIDIA merrily carries on getting their work done.


Writing and porting kernels between different GPU paradigms is relatively trivial; that's not the issue (although I find the code much clunkier in everything other than CUDA). The problem is that the compiler toolchains and the GPU-accelerated libraries for FFT, BLAS, DNN, etc. which come bundled with CUDA are pretty terrible or non-existent everywhere else, and the competitors are still far away from having a good answer to this. Intel has perhaps come closest with oneAPI, but that can't target anything other than NVidia cards anyway, so it's a moot point.
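
As an illustration of what those bundled libraries buy you, a GPU FFT through cuFFT is just a plan plus an execute call; the value is that the library exists, is fast, and ships with the toolkit. A rough sketch (the wrapper function is made up, error handling is omitted, and d_signal is assumed to already hold N complex samples on the device):

  #include <cufft.h>

  // In-place forward 1D complex-to-complex FFT using cuFFT.
  void fft_inplace(cufftComplex* d_signal, int N) {
      cufftHandle plan;
      cufftPlan1d(&plan, N, CUFFT_C2C, 1);                    // one batch of a length-N transform
      cufftExecC2C(plan, d_signal, d_signal, CUFFT_FORWARD);  // run it in place
      cufftDestroy(plan);
  }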


I'm not very familiar with this area, so this is a useful insight. Otherwise I was really confused what's so special about CUDA.


I think a lot of it is customer service. If you work in ML you probably have some NVIDIA engineer's number in your phone. NVIDIA is really good at reaching out and making sure their products work for you. I am not a Torch maintainer but I am sure they have multiple NVIDIA engineers on speed dial that can debug complicated problems gratis.


> Nowadays OpenCL is dead for some reason...

Where does SYCL fit into this picture? Is it a viable replacement for cross-platform GPU access?



