I really don't understand why they can't just work with the existing OSS developers who are pulling their hair out trying to make AMD devices work, instead of doing it this way. It's like Mozilla with its questionable decisions.
There are a lot of OSS developers; I doubt AMD has the resources to do that. And realistically they don't need to. I wandered over to watch some George Hotz videos the other day, and it looked like the AMD driver situation has improved to the point where specialist AMD access isn't needed to debug any more. Which is a huge change, and very exciting for me personally, because it means I might be able to jump back to an AMD card and ditch the mess that is Nvidia on Linux.
In theory they might not even need to be involved in optimising compute kernels; there is probably some PhD student who'll do the work because they want to be a kernel-optimising specialist. In practice, a few strategic applications of paid talent are all they really need. Everyone wants to diversify off Nvidia, so there is a lot of interest in supporting AMD if they are willing to push out firmware that multiplies matrices without crashing. Which has been a weird sticking point for AMD for a surprisingly long time.
> Back in the day you had to optimize your card for Quake...
That is exactly the attitude that left AMD out in the cold during the AI revolution; they learned a lot of stupid lessons about optimising for specific games and present-day use cases instead of implementing general capabilities to a higher standard, the way Nvidia did with CUDA. They ended up a decade behind in a multi-trillion dollar market.
PyTorch might be special. I wouldn't be at all surprised if AMD does have a dedicated engineer working on PyTorch. But their problem to date hasn't been their engagement with PyTorch; it's that literally nobody could make PyTorch work on AMD cards, which had buggy and terrible support for GPGPU work. If they fixed that, some random might do the work without their involvement, because a lot of people want to see it happen.
Now that the required task is known, though, it doesn't really matter. If AMD understands that, they should have no problem putting engineers on making PyTorch work well.
Considering its importance, it shouldn't be one engineer. It should be 50+.
I think they've been taken over by exactly the same people leading the AI hype. Funny how in this article they are a) not advertising clearly what they are doing, b) solving a small subset of problems in a way no one asked for (I think most people just want ROCm to work at all...) and c) just adding to a complex product without any consideration of actually integrating with its environment.
> solving a small subset of problems in a way no one asked for
What do you mean? Having ROCm fused MoE and MLA kernels as a counterpart to kernels for CUDA is very useful. AMD needs to provide this if they want to keep AMD accelerators competitive with new models.
Should the matrix multiplication at the core of this not be in a core library? Why are generic layers intermixed with LLM-specific kernels when the generic layers duplicate functionality that's already in torch?
Upstreaming that might actually help researchers doing new stuff, versus the narrow demographic of people speeding up LLMs on MI300Xs.
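To make the split concrete, here's a rough reference-style sketch (my own illustration, not code from the article, with hypothetical shapes and names): the matmul inside the loop is the generic piece torch already provides and dispatches per backend, while the top-k routing and scatter around it is the LLM-specific part that a fused MoE kernel collapses into one launch.

    import torch
    import torch.nn.functional as F

    def moe_forward_reference(x, router_logits, expert_weights, top_k=2):
        # x: [tokens, hidden]; router_logits: [tokens, experts];
        # expert_weights: [experts, hidden, ffn] -- hypothetical shapes for illustration
        probs = F.softmax(router_logits, dim=-1)
        topk_probs, topk_idx = probs.topk(top_k, dim=-1)           # routing: the LLM-specific part
        out = x.new_zeros(x.shape[0], expert_weights.shape[-1])
        for e in range(expert_weights.shape[0]):
            hit = topk_idx == e                                     # [tokens, top_k]
            token_mask = hit.any(dim=-1)
            if token_mask.any():
                gate = (topk_probs * hit).sum(dim=-1)[token_mask]   # gate weight for expert e
                # the generic layer: a plain matmul torch already optimises per backend
                out[token_mask] += gate.unsqueeze(-1) * (x[token_mask] @ expert_weights[e])
        return out

A fused kernel earns its keep by doing the gather, the per-expert GEMMs and the scatter in one pass; the matmul itself is the part that arguably belongs in a shared library rather than being re-rolled per model.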
> I think most people just want ROCm to work at all
I think most people don't want to have to think about vendor lock-in related bullshit. Most people just want their model to run on whatever hardware they happen to have available, don't want to have to worry about whether or not future hardware purchases will be compatible, and don't want to have to rewrite everything in a different framework.
Most people fundamentally don't care about ROCm or CUDA or OneAPI or whatever else beyond being a means to an end.
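And at the top of the stack, that's roughly what PyTorch already offers when the backend works. A minimal sketch (assuming a ROCm build of PyTorch, which exposes AMD GPUs through the same torch.cuda API):

    import torch

    # Same script runs on an Nvidia (CUDA) or AMD (ROCm/HIP) box, or falls back to CPU.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    model = torch.nn.Linear(1024, 1024).to(device)
    x = torch.randn(8, 1024, device=device)
    y = model(x)

The front end has looked like this for years; the complaint in this thread is about the layers underneath (drivers, ROCm, kernels) that determine whether the AMD path actually works.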