
Unfortunately that support is via ROCm, which doesn't support the last three generations (!) of AMD hardware: https://github.com/ROCm/ROCm.github.io/blob/master/hardware....



ROCm supports Vega, Vega 7nm, and CDNA just fine.

The issue is that AMD has split their compute into two categories:

* RDNA -- consumer cards. A new ISA with new compilers / everything. I don't think it's reasonable to expect AMD's compilers to work on RDNA when such large changes have been made to the architecture. (32-wide wavefronts instead of 64-wide. 1024 registers. Etc. etc.)

* CDNA -- based on Vega's ISA. Despite being a "legacy ISA", it's pretty modern in terms of capabilities. The MI100 is competitive against the A100, and CDNA is likely going to run the Frontier and El Capitan supercomputers.

------------

ROCm focused on CDNA. They've had compilers emit RDNA code, but it's not "official" and still buggy. But if you go with CDNA, the HIP / ROCm stack works well enough for Oak Ridge National Laboratory.

Yeah, CDNA is expensive ($5k for MI50 / Radeon VII, and $9k for MI100). But that's the price of full-speed scientific-oriented double-precision floating point GPUs these days.


> ROCm supports Vega, Vega 7nm, and CDNA just fine.

yeah, but that's exactly what OP said - Vega is three generations old at this point, and it's the last consumer GPU (apart from the Radeon VII, which is a rebranded compute card) that ROCm supports.

On the NVIDIA side, you can run at least basic tensorflow/pytorch/etc on a consumer GPU; that option is not available on the AMD side, where you have to spend $5k to get a GPU that their software actually supports.

Not only that, but on the AMD side it's a completely standalone compute card - none of the supported compute cards do graphics anymore - whereas if you buy a 3090 you can at least game on it too.


I really don't think people appreciate enough that, for developers to care about learning to build software for your platform, you need to make it accessible for them to run that software. That means "runs on the hardware they already have". AMD really needs to push to get ROCm compiling for RDNA-based chips.


yeah, it's hard to overstate the importance of getting your hardware into people's hands to work with. A $5k buy-in before you can even play with programming on a device and figure out if your program is going to work on GPGPU at all is a big deal! CUDA being accessible on a $100 gaming GPU has enabled a lot of development, and NVIDIA has also made big efforts to get its hardware out to universities, which has paid big dividends as well.

Also, just flat-out using your own developers to write the basic code for libraries and frameworks and so on. There are tons of basic libraries like cuRAND and CUB (CUDA UnBound), and I'm sure tons of others, that NVIDIA wrote not because a customer asked for them, but because they were building blocks that others could leverage into products that would sell NVIDIA hardware. In contrast, AMD has fallen behind in that area, and has further fragmented its development work across a bunch of "flavor of the week" frameworks and implementations that usually get discarded within a couple of years (remember HIP? remember Bolt? etc.)


There's unofficial support in the rocm-4.3.0 math-libs for gfx1030 (6800 / 6800 XT / 6900 XT). rocBLAS also includes gfx1010, gfx1011 and gfx1012 (5000 series). If you encounter any bugs in the {roc,hip}{BLAS,SPARSE,SOLVER,FFT} stack with those cards, file GitHub issues on the corresponding project.

I have not seen any problems with those cards in BLAS or SOLVER, though they don't get tested as much as the officially supported cards.
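For anyone who wants a quick sanity check before filing an issue, something like the sketch below is roughly what I'd run. It assumes a ROCm build of PyTorch (which reuses the torch.cuda API for HIP devices) and only exercises the GEMM and dense-solve paths indirectly, so treat it as a smoke test rather than anything exhaustive:

    # Hypothetical smoke test, assuming a ROCm build of PyTorch.
    import torch

    assert torch.cuda.is_available(), "no HIP/ROCm device visible"
    dev = torch.device("cuda")  # ROCm builds expose HIP devices via torch.cuda

    a = torch.randn(1024, 1024, device=dev)
    b = torch.randn(1024, 1024, device=dev)

    c = a @ b                     # GEMM, backed by rocBLAS
    x = torch.linalg.solve(a, b)  # dense solve; exercises the LAPACK-style path
    print("max residual:", (a @ x - b).abs().max().item())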

FWIW, I finally managed to buy an RX 6800 XT for my personal rig. I'll be following up on any issues found in the dense linear algebra stack on that card.

I work for AMD on ROCm, but all opinions are my own.


I've mentioned this on other forums, but it would help to have some kind of easily visible, public tracker for this progress. Even a text file, set of GitHub issues or project board would do.

Why? Because as-is, most people still believe support for gfx1000 cards is non-existent in any ROCm library. Of course that's not the case as you've pointed out here, but without any good sign of forward progress, your average user is going to assume close to zero support. Vague comments like https://github.com/RadeonOpenCompute/ROCm/issues/1542 are better than nothing, but don't inspire that much confidence without some more detail.


You don't think it's reasonable to expect machine learning to work on new cards?

That's exactly the point. ML on AMD is a third-class citizen.


AMD's MI100 has those 4x4 BFloat16 and FP16 matrix multiplication instructions you want, with PyTorch and TensorFlow compiling down into them through ROCm.
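You don't program those matrix instructions directly; the framework's GEMM kernels pick them up. As a rough sketch (assuming a ROCm build of PyTorch on an MI100 - nothing below is MI100-specific, it just runs faster where the hardware has those units):

    # Rough sketch: FP16 / BF16 matmuls on a ROCm build of PyTorch.
    # On an MI100 these should dispatch to rocBLAS kernels that use the
    # matrix (MFMA) instructions; you never write those instructions yourself.
    import torch

    dev = torch.device("cuda")  # ROCm builds reuse the torch.cuda API

    a = torch.randn(4096, 4096, device=dev, dtype=torch.float16)
    b = torch.randn(4096, 4096, device=dev, dtype=torch.float16)
    c = a @ b                          # FP16 GEMM

    a_bf = a.to(torch.bfloat16)
    b_bf = b.to(torch.bfloat16)
    c_bf = a_bf @ b_bf                 # BF16 GEMM

    print(c.dtype, c_bf.dtype)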

Now don't get me wrong: $9000 is a lot for a development system to try out the software. NVidia's advantage is that you can test out the A100 by writing software for cheaper GeForce cards at first.

NVidia also makes it easy, with the DGX line, to quickly get a big A100-based computer. With AMD you've got to shop around with Dell vs Supermicro (etc. etc.) to find someone to build you that computer.


That makes a lot more sense, thanks. They could do with making that a lot clearer on the project page.

It still handicaps them compared to Nvidia, where you can just buy anything recent and expect it to work. I suspect it also means they get virtually no open source contributions from the community, because nobody can run or test the stack on personal hardware.


NVidia can support anything because they have a PTX translation layer between card generations, and they invest heavily in PTX.

The native assembly language changes with each generation of cards. PTX is a "pseudo-assembly" that gets recompiled into each new generation's native code.

---------

AMD has no such technology. When AMD's assembly language changes (e.g. from Vega to RDNA), it's a big compiler change. AMD managed to keep the ISA mostly compatible from the 7xxx-series GCN 1.0 cards in the early 2010s all the way to Vega 7nm in the late 2010s... but RDNA's ISA change was pretty massive.

I think it's only natural that RDNA was going to have compiler issues.

---------

AMD focused on Vulkan / DirectX support for its RDNA cards, while its compute team focused on continuing "CDNA" (which won large supercomputer contracts). So that's just how the business ended up.


I bought an ATI card for deep learning. I'm a big fan of open source. Less than 12 months later, ROCm dropped support. I bought an NVidia, and I'm not looking back.

This makes absolutely no sense to me, and I have a Ph.D:

"* RDNA -- consumer cards. A new ISA with new compilers / everything. I don't think its reasonable to expect AMD's compilers to work on RDNA, when such large changes have been made to the architecture. (32-wide instead of 64-wide. 1024 registers. Etc. etc.) * CDNA -- based off of Vega's ISA. Despite being "legacy ISA", its pretty modern in terms of capabilities. MI100 is competitive against the A100. CDNA is likely going to run Frontier and El Capitan supercomputers. ROCm focused on CDNA. They've had compilers emit RDNA code, but its not "official" and still buggy. But if you went for CDNA, that HIP / ROCm stuff works enough for the Oak Ridge National Labs. Yeah, CDNA is expensive ($5k for MI50 / Radeon VII, and $9k for MI100). But that's the price of full-speed scientific-oriented double-precision floating point GPUs these days.

I neither know nor care what RDNA, CDNA, A100, MI50, Radeon VII, MI100, or all the other AMD acronyms are. Yes, I could figure it out, but I want plug-and-play, stability, and backwards-compatibility. I ran into a whole different minefield with AMD: I'd need to run an old ROCm release, downgrade my kernel, and use a different card to drive my monitors than the one running ROCm. It was a mess.

NVidia gave me plug-and-play. I bought a random NVidia card with the highest "compute level," and was confident everything would work. It does. I'm happy.

Intel has historically had great open source drivers, and if it gives better plug-and-play and open source, I'll buy Intel next time. I'm skeptical, though. For the past few years, Intel has had a hard time tying its own shoelaces. I can't imagine this will be different.


> Yes, I could figure it out, but I want plug-and-play, stability, and backwards-compatibility

It's right there in the ROCm introduction.

https://github.com/RadeonOpenCompute/ROCm#Hardware-and-Softw...

> ROCm officially supports AMD GPUs that use following chips:

> GFX9 GPUs

> "Vega 10" chips, such as on the AMD Radeon RX Vega 64 and Radeon Instinct MI25

> "Vega 7nm" chips, such as on the Radeon Instinct MI50, Radeon Instinct MI60 or AMD Radeon VII, Radeon Pro VII

> CDNA GPUs

> MI100 chips such as on the AMD Instinct™ MI100

--------

The documentation of ROCm is pretty clear that it works on a limited range of hardware, with "unofficial" support at best on other sets of hardware.


Only...

(1) There are a million different ROCm pages and introductions

(2) Even that page is out-of-date, and e.g. claims unofficial support for "GFX8 GPUs: Polaris 11 chips, such as on the AMD Radeon RX 570 and Radeon Pro WX 4100," although those were randomly disabled after ROCm 3.5.1.

... if you have a Ph.D in AMD productology, you might be able to figure it out. If it's merely in computer science, math, or engineering, you're SOL.

There are now unofficial guides to downgrading to 3.5.1, but 3.5.1 doesn't work with many modern frameworks, so you land in a version-incompatibility mess.

These aren't old cards either.

Half-decent engineer time is worth $350/hour, all in (benefits, overhead, etc.). Once you've spent a week futzing with AMD's mess - call it 40 hours, or about $14k - you're behind by the cost of ten NVidia A4000 cards, which Just Work.

As a footnote, I suspect that in the long term, small purchases will be worth more than the supercomputing megacontracts. GPGPU is wildly underutilized right now. That's mostly a gap of software, standards, and support. If we get that right, every computer could put its many teraflops of computing power to work, even for stupid video chat filters and whatnot.


> Half-decent engineer time is worth $350/hour, all in (benefits, overhead, etc.). Once you've spent a week futzing with AMD's mess - call it 40 hours, or about $14k - you're behind by the cost of ten NVidia A4000 cards, which Just Work.

It seems pretty simple to me if we're talking about compute. The MI-cards are AMD's line of compute GPUs. Buy an MI-card if you want to use ROCm with full support. That's MI25, MI50, or MI100.

> As a footnote, I suspect that in the long term, small purchases will be worth more than the supercomputing megacontracts. GPGPU is wildly underutilized right now. That's mostly a gap of software, standards, and support. If we get that right, every computer could put its many teraflops of computing power to work, even for stupid video chat filters and whatnot.

I think you're right, but the #1 use of these devices is running video games (aka: DirectX and Vulkan). Compute capabilities are quite secondary at the moment.


> It seems pretty simple to me if we're talking about compute. The MI-cards are AMD's line of compute GPUs. Buy an MI-card if you want to use ROCm with full support. That's MI25, MI50, or MI100.

For tasks which require that much GPU, I'm using cloud machines. My dev desktop would like a working GPU 24/7, but it doesn't need to be nearly that big.

If I had my druthers, I would have bought an NVidia 3050, since it has adequate compute and should run under $300 once it's available. Of course, anything from the NVidia consumer line is impossible to buy right now, except at scalper prices.

I just did a web search. The only card from that series I could find for sale new was the MI100, which runs about $13k. The MI50 doesn't seem to be sold anywhere, and the MI25 can only be bought used on eBay. Corporate won't do eBay. Even the MI100 would require an exception, since it's only available from unauthorized vendors (Amazon doesn't have it).

Combine that with poor software support, and an unknown EOL, and it's a pretty bad deal.

> I think you're right, but the #1 use of these devices is running video games (aka: DirectX and Vulkan). Compute capabilities are quite secondary at the moment.

Companies should maximize shareholder value. Right now:

- NVidia is building an insurmountable moat. I already bought an NVidia card, and our software already has CUDA dependencies. I started with ROCm. I dropped it. I'm building developer tools, and if they pick up, they'll carry a lot of people.

- It will be years before I'm interested in trying ROCm again. I was oversold, and AMD underdelivered.

- Broad adoption is limited by lack of standards and mature software.

It's fine to say compute capabilities are secondary right now, but I think that will limit AMD in the long term. And I think the lack of standards is to NVidia's advantage right now, but it will hinder long-term adoption.

If I were NVidia, I'd make:

- A reference open-source CUDA implementation which makes CUDA code 100% compatible with Xe and Radeon

- License it under GPL with a CLA, so any Intel and AMD enhancements are open and flow back

- Have nominal optimizations in the open-source reference implementation, while keeping the high-performance optimizations proprietary (and only for NVidia GPUs)

This would encourage broad adoption of GPGPU, since any code I wrote would work on any customer machine - Intel, AMD, or NVidia. On the other hand, it would tilt the playing field toward NVidia, since, as the copyright holder, only NVidia could ship proprietary optimizations. HPC would go to NVidia, as would markets like video editing and CAD.


Hopefully CDNA2 will be similar enough to RDNA2/3 that the same software stack will work with both.


I assume the opposite is going on.

Hopefully the RDNA3 software stack is good enough that AMD decides CDNA2 (or CDNA3) can be based on the RDNA instruction set.

AMD doesn't want to piss off its $100 million+ customers with a crappy software stack.

---------

BTW: AMD is reporting that parts of ROCm 4.3 are working with the 6900 XT GPU (suggesting that RDNA code generation is beginning to work). The ROCm 4.0+ repositories have had a lot of GitHub check-ins suggesting AMD is now actively working on RDNA code generation. It's not officially written into the ROCm documentation yet; it's mostly the discussions in ROCm GitHub issues that note these changes.

It's not official support, and it's literally years late. But it's clear what AMD's current strategy is.



