You can get it to work now on Linux.

https://github.com/RadeonOpenCompute/ROCm/issues/1880




Can confirm, I can run PyTorch with ROCm just fine on a 6900xt and a 7900xt on Debian. The 7900xt does require the nightly build of PyTorch (for ROCm >=5.5 support; I run 5.6) in order to automatically get the gfx1100_42.ukdb MIOpen kernel. I have to specify which GPU and which MIOpen kernel to use when starting Python. My device 0 is the 7900xt and device 1 is the 6900xt, so for each I run the corresponding command:

  $ CUDA_VISIBLE_DEVICES=0 HSA_OVERRIDE_GFX_VERSION=11.0.0 python3

  $ CUDA_VISIBLE_DEVICES=1 HSA_OVERRIDE_GFX_VERSION=10.3.0 python3
GFX 10.3.0 is for gfx1031 and GFX 11.0.0 is for gfx1100, but be aware that the kernel is tied to the series, so even though the 6700 is technically gfx1031, it uses the gfx1030 kernel; the same applies if you use a newer RX 7000 series card.
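
As a quick sanity check after starting Python with those overrides, something like this should show whether the right card was picked up (just a sketch; the ROCm build of PyTorch exposes the HIP device through the regular torch.cuda API):

  import torch

  # ROCm builds of PyTorch reuse the torch.cuda names for the HIP device
  print(torch.cuda.is_available())       # True if the override matched a supported kernel
  print(torch.cuda.get_device_name(0))   # e.g. the 7900xt when launched with CUDA_VISIBLE_DEVICES=0
  print(torch.version.hip)               # ROCm/HIP version the wheel was built against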


I would really like to buy AMD over Nvidia's overpriced RTX 4090. Is it possible for you to run some standard benchmark like this[1]?

[1]: https://huggingface.co/docs/transformers/benchmarks


Seems that specific benchmark is deprecated. I get around 15 iterations/second on my 7900xt when running "stabilityai/stable-diffusion-2-1-base" at 512x512, and around 10 iterations/second when running "stabilityai/sd-x2-latent-upscaler" to upscale a 512x512 image to 1024x1024.
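
In case anyone wants to reproduce a rough number, a minimal diffusers timing loop looks something like this (the prompt is just a placeholder, and this crude end-to-end timing includes scheduler and VAE overhead, so it slightly understates the per-step it/s the progress bar reports):

  import time
  import torch
  from diffusers import StableDiffusionPipeline

  pipe = StableDiffusionPipeline.from_pretrained(
      "stabilityai/stable-diffusion-2-1-base", torch_dtype=torch.float16
  ).to("cuda")  # "cuda" also targets the ROCm/HIP device on AMD builds

  steps = 50
  start = time.time()
  pipe("a photo of an astronaut riding a horse",
       num_inference_steps=steps, height=512, width=512)
  print(f"{steps / (time.time() - start):.1f} it/s")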

Here is a link to a Tom's Hardware stable diffusion benchmark from January to get a rough idea of where various cards fit in for that use case:

https://www.tomshardware.com/news/stable-diffusion-gpu-bench...

In the article they show a performance comparison chart here:

https://cdn.mos.cms.futurecdn.net/iURJZGwQMZnVBqnocbkqPa-970...


Sure, some people say this is possible, but in my experience this is practically impossible.

Maybe that was a failure on my part, but I was never able to get it to work reliably. With a certain combination of tool versions plus some hacks, you can make it work. Then it breaks with the first update to any of those tools.


Nice -- does anyone know if this applies to the Steam Deck? Maybe not a game changer, but certainly a neat thing for us to play around with. (I also have a 4GB RX 570, but I'm fairly certain the Steam Deck is better, yes?)


I would expect ROCm to work just fine with the Steam Deck, given that the Steam Deck apparently uses gfx1033. You would probably need to set the environment variable HSA_OVERRIDE_GFX_VERSION=10.3.0 and use the corresponding gfx1030_20.ukdb MIOpen kernel. I do not own a Steam Deck, but I do have another RDNA2 card, an RX 6700XT, which uses the similar gfx1031 ISA and works just fine.

While I don't have an RX 570, I do have an RX 480, which is also gfx803 like your RX 570, so it should technically work. However, don't expect much in performance or capability, as most workloads expect more than your 4GB of VRAM and much more compute power. You would also need to use older versions of ROCm, as these older cards are deprecated; I think I had to use ROCm <=4.3.1, if I remember correctly. What did you want to run specifically?
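
I haven't tried it on a Deck, but assuming it behaves like my other RDNA2 cards, a one-liner like this would be a quick way to see whether the override gets picked up:

  $ HSA_OVERRIDE_GFX_VERSION=10.3.0 python3 -c "import torch; print(torch.cuda.get_device_name(0))"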



