
I currently have a project with ORNL OLCF (on Frontier). The short answer is yes. Happy to answer any questions I can.


ROCm or HIP? Did it start out by porting a lot from CUDA, etc., or by starting fresh on top of the AMD APIs?

How much of the project's time goes into that compute-API work compared to the "payload" work?


>ROCm or HIP?

I'm not sure that's the right question to ask. Afaik ROCm is the name of the entire tech stack, and HIP is AMD's equivalent of CUDA C++ (they basically replicated the API and replaced every "cuda" with "hip", so they have functions called hipMalloc and hipMemcpy).

The repository is located at https://github.com/ROCm/HIP.
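To make the "replace every cuda with hip" point concrete, here is a toy sketch of that renaming as a plain text substitution. The mapping is a small illustrative subset I picked (cudaMalloc/hipMalloc, cudaMemcpy/hipMemcpy, the header, etc.); the function name `toy_hipify` is hypothetical and this is not how AMD's actual hipify tools (hipify-perl, hipify-clang) are implemented, which cover far more of the API.

```python
import re

# Illustrative subset of CUDA runtime identifiers and their HIP counterparts.
# Real porting tools (hipify-perl / hipify-clang) handle much more than this.
CUDA_TO_HIP = {
    "cuda_runtime.h": "hip/hip_runtime.h",
    "cudaMalloc": "hipMalloc",
    "cudaMemcpy": "hipMemcpy",
    "cudaMemcpyHostToDevice": "hipMemcpyHostToDevice",
    "cudaMemcpyDeviceToHost": "hipMemcpyDeviceToHost",
    "cudaFree": "hipFree",
    "cudaDeviceSynchronize": "hipDeviceSynchronize",
}

# Try longer names first so "cudaMemcpyHostToDevice" wins over "cudaMemcpy".
_PATTERN = re.compile(
    "|".join(re.escape(k) for k in sorted(CUDA_TO_HIP, key=len, reverse=True))
)


def toy_hipify(source: str) -> str:
    """Naively rewrite known CUDA runtime identifiers to their HIP names."""
    return _PATTERN.sub(lambda m: CUDA_TO_HIP[m.group(0)], source)


if __name__ == "__main__":
    cuda_src = (
        "#include <cuda_runtime.h>\n"
        "cudaMalloc(&d_x, n * sizeof(float));\n"
        "cudaMemcpy(d_x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);\n"
        "cudaFree(d_x);\n"
    )
    print(toy_hipify(cuda_src))
```

Because the HIP calls keep the same argument order as their CUDA counterparts, a lot of straightforward host code ports by renaming alone; kernels and library calls are where the real effort tends to be.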


My project is ROCm (torch, more or less), and in working with OLCF staff I've never heard of HIP being used directly, but based on their training series it is supported[0].

Of course, my personal experience isn't exhaustive, and the ongoing training series suggests that HIP is in use in some cases.

Speaking from personal experience, ROCm itself is... challenging (which I already knew from prior endeavors). We've taken to developing and staging workloads on more typical MI2xx hardware and then porting them over to Frontier.

We currently have 20k node hours on Frontier via a Director's Discretion Project[1]. It's a relatively simple application, and at the end of the day you have access to significant compute, so depending on the workload the extra effort for ROCm, etc. is still worth it.

[0] - https://www.olcf.ornl.gov/hip-training-series/

[1] - https://www.olcf.ornl.gov/for-users/documents-forms/olcf-dir...



