The author of the library has done numerous YouTube rants on how bad he thinks t...

roenxi · on May 29, 2024

Has he? In the ones I'm aware of, he was complaining about the low quality of the generic kernel drivers. AMD software had a tendency to crash when doing anything outside of standard video games (which was my experience too, but I've caved and bought Nvidia since then; average driver quality of Nvidia on Linux seems to be much lower, but the kernel doesn't go down, which is nice. I got a lot of OOM errors in situations where, on ROCm, the kernel froze and required a full system restart).

But this is interesting and probably strong evidence that the CUDA API isn't the moat people thought it was. CUDA multiplies matrices, and that is close to a commodity operation. The moat actually seems to be Nvidia's higher generic software engineering standards, the difficulty of writing job scheduling/memory management infrastructure, and possibly the fact that closed firmware is the norm.

casperb · on May 29, 2024

Yes, he has. I have seen multiple episodes on his YouTube[1] where he absolutely grills the whole company. He also gave them a deadline to open-source the drivers or he would stop trying to make AMD stuff work.

Sorry for no direct link, but he has so many very long videos that it is hard to find the exact spot.

[1] https://www.youtube.com/@geohotarchive

alexbaden · on May 29, 2024

Arguably the Nvidia AI moat is PyTorch and the heavily optimized libraries behind it. The CUDA language and toolchain helped get that effort off the ground, no doubt, but PyTorch is written and optimized for CUDA first. All other backends work best with semantics similar to CUDA's and have to match CUDA semantics to keep their users happy.

JMiao · on May 28, 2024

What does he get wrong?

KaoruAoiShiho · on May 29, 2024

He's not wrong.