
Can confirm, I can run PyTorch with ROCm just fine on a 6900xt and a 7900xt on Debian. The 7900xt does require the nightly build of PyTorch (for ROCm >= 5.5 support; I run 5.6) in order to automatically get the gfx1100_42.ukdb MIOpen kernel. I have to specify which GPU and which MIOpen kernel to use when starting Python. My device 0 is the 7900xt and device 1 is the 6900xt, so I run the corresponding command for each:

  $ CUDA_VISIBLE_DEVICES=0 HSA_OVERRIDE_GFX_VERSION=11.0.0 python3

  $ CUDA_VISIBLE_DEVICES=1 HSA_OVERRIDE_GFX_VERSION=10.3.0 python3
GFX 10.3.0 is for gfx1031 and GFX 11.0.0 is for gfx1100, but be aware that the kernel is tied to the series: even though the 6700 is technically gfx1031, it uses the gfx1030 kernel, and the same applies to the newer RX 7000 series cards.
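
To check that the override took effect, here is a minimal sanity check from the Python prompt (a sketch based on my setup above; on ROCm builds of PyTorch the torch.cuda API is a HIP shim, so it works against AMD GPUs):

  # Run after launching python3 with the env vars above, e.g.:
  #   CUDA_VISIBLE_DEVICES=0 HSA_OVERRIDE_GFX_VERSION=11.0.0 python3
  import torch

  print(torch.cuda.is_available())      # expect True
  print(torch.cuda.get_device_name(0))  # only device 0 is visible here (the 7900xt)
  print(torch.version.hip)              # ROCm/HIP version the wheel was built against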



I would really like to buy AMD over an overpriced Nvidia RTX 4090. Is it possible for you to run a standard benchmark like this one[1]?

[1]: https://huggingface.co/docs/transformers/benchmarks


It seems that specific benchmark is deprecated. For what it's worth, I get around 15 iterations/second on my 7900xt when running "stabilityai/stable-diffusion-2-1-base" at 512x512, and around 10 iterations/second when running "stabilityai/sd-x2-latent-upscaler" to upscale a 512x512 image to 1024x1024.
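
If you want to reproduce a rough iterations/second number yourself, here is a minimal sketch using the diffusers library (the prompt and step count are arbitrary choices of mine; the diffusers progress bar also prints it/s directly, and the timing below is just a cross-check):

  import time
  import torch
  from diffusers import StableDiffusionPipeline

  pipe = StableDiffusionPipeline.from_pretrained(
      "stabilityai/stable-diffusion-2-1-base", torch_dtype=torch.float16
  ).to("cuda")  # "cuda" maps to the ROCm device on AMD builds of PyTorch

  steps = 50
  start = time.perf_counter()
  pipe("a photo of an astronaut riding a horse",
       height=512, width=512, num_inference_steps=steps)
  elapsed = time.perf_counter() - start
  # Rough figure: includes text-encode and VAE-decode overhead,
  # so it will read slightly below the per-step it/s in the progress bar.
  print(f"{steps / elapsed:.1f} it/s")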

Here is a link to a Tom's Hardware Stable Diffusion benchmark from January, to get a rough idea of where various cards fall for that use case:

https://www.tomshardware.com/news/stable-diffusion-gpu-bench...

In the article, they show a performance comparison chart here:

https://cdn.mos.cms.futurecdn.net/iURJZGwQMZnVBqnocbkqPa-970...



