"needs work" is an understatement. ROCm only just came out for Windows(!), and so far everyone in the LLM and SD scene says that ROCm on Linux is a pain to get working: it requires a lot of hackery and environment-variable overrides, is buggy, and supports only a small handful of cards, whereas CUDA runs on nearly everything NVIDIA makes, from laptop GPUs to datacenter compute cards. And there's very little support on the third-party software side at the moment.
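For context, the "override environment variables" hackery usually looks something like the sketch below. The specific values are illustrative assumptions, not a recipe: the right GFX version and device index depend entirely on your card and setup.

```shell
# Illustrative only: the kind of overrides people use to coax ROCm into
# running on an officially unsupported consumer card.

# HSA_OVERRIDE_GFX_VERSION makes the runtime treat the GPU as a supported
# ISA (10.3.0 = gfx1030, commonly used as a spoof target for RDNA2 cards).
export HSA_OVERRIDE_GFX_VERSION=10.3.0

# Restrict ROCm to a single GPU; index 0 is an assumption about the setup.
export ROCR_VISIBLE_DEVICES=0
```

Needing to lie to the runtime about what silicon you have, just to get it to initialize, is exactly the kind of friction CUDA users never see.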
That "no consumer hardware support" point sounds trivial, but people getting into ML/AI, grad students, etc. want to be able to mess around and develop/prototype on local hardware they already own.
You cannot do that with ROCm, because it supports only a very, very small number of consumer-level cards, and, oddly, it's a mix of the very high end and the low end, with nothing in the middle.
AMD is also massively behind on hardware: only the current 7xxx-series cards, which just came out, have AI-specific hardware, whereas NVIDIA has shipped tensor cores for three generations of cards.
But of course AMD can't do anything right, so they nerfed the GPU's graphics-processing cores when they added the AI hardware, making the cards a worse deal from a pure gaming standpoint. The 7000-series cards are at best a tiny bump over, and in many games worse than, their 6000-series equivalents. Their only advantages are better hardware video encoding and somewhat better power consumption.
All of what you're saying is true and I can't argue that.
The only difference is that previously you couldn't even rent time on these super-high-end AIAs; they were reserved for supercomputers. Even the MI250X SKU is government/research only, and even I am unable to buy that over the MI250.
So, we need to work on building that developer flywheel. My view is that being able to rent time on one of these systems is a good first step. Nowhere else can you load a model into 192 GB of RAM on a top-end system.
I think that PyTorch is part of the puzzle and it certainly helps that it is supported by AMD [0]. That said, there is code that needs to run closer to the metal too.
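A sketch of why that PyTorch support matters: ROCm builds of PyTorch reuse the `torch.cuda` namespace, so device-agnostic code written for NVIDIA hardware largely runs unmodified. This assumes a ROCm wheel of `torch` is installed; on a CUDA or CPU-only build the same code still runs, just on different hardware.

```python
import torch

# On a ROCm build, torch.version.hip is set and torch.cuda.is_available()
# reports the AMD GPU; the code below is identical to what you'd write
# for an NVIDIA card.
device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(4, 4, device=device)
y = x @ x.T  # matmul dispatched to rocBLAS on ROCm, cuBLAS on CUDA
print(y.shape, device)
```

That portability covers the framework layer, but, as noted above, custom kernels and other close-to-the-metal code still need porting from CUDA to HIP.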