Radeon Open Compute 3.0 (github.com/radeonopencompute)
208 points by mroche on Dec 23, 2019 | 55 comments



For anyone else using NixOS, https://github.com/nixos-rocm/nixos-rocm by Anthony Cowley works very well.

While ROCm works well, I think it is yet another example of what AMD gets wrong software-wise. I like to think I'm fairly familiar with GPGPU, in that I have used OpenCL and CUDA for years, and even wrote my own optimising compiler that generates GPU code. I even use ROCm every day on my home system. Yet, I would be hard pressed to succinctly state what ROCm is. The GitHub page states the following:

> ROCm is designed to be a universal platform for gpu-accelerated computing. This modular design allows hardware vendors to build drivers that support the ROCm framework. ROCm is also designed to integrate multiple programming languages and makes it easy to add support for other languages.

Okay? It then lists some other stuff:

> • Drivers
> • Tools
> • Libraries
> • Source Code

The AMD GPU kernel driver, amdgpu, is upstreamed in Linux, so what exactly does ROCm contain? The OpenCL ICD? (I'm pretty sure it does.) As far as I can see, I don't have any ROCm-specific kernel drivers loaded, but OpenCL on my AMD GPU works fine.

The tools are also confusing. There's an "HCC compiler", but are we even still supposed to use HCC? As I recall, it was/is some heterogeneous compute C++ dialect. There's also HIP, which converts CUDA to something that can run on AMD GPUs. Trying to refresh my memory, I actually can't figure out in two minutes what language the output is. HCC? OpenCL?

Among the other tools, there is inconsistency between ROCM and ROCm (pedantry, I know, I assume it's the same), but there is also the "ROC Profiler" and "ROCr Debug Agent". Are "ROC" and "ROCr" significant terms? And when should I use the "Radeon Compute Profiler" (rocm-profiler) and when should I use the "ROC Profiler" (rocprofiler-dev)?

I do so dearly want to support AMD. Their hardware is good and their fully open source drivers are a wonderful accomplishment. Seriously, I want to underline just how amazed I am that I have a fully free high-performance GPGPU stack running. However, if you compare AMD's software efforts to NVIDIA's, it is clear that NVIDIA deserves their success. It's not just about breadth (NVIDIA is bigger, I can understand they have more resources), but the fact that AMD seems to spread itself too thin over too many efforts, and it makes everything feel kind of shoddy and poorly documented.


> There's also HIP, which converts CUDA to something that can run on AMD GPUs.

Kinda making your point, HIP is actually an (open source) one-to-one replacement for the CUDA API. While there are tools (https://github.com/ROCm-Developer-Tools/HIP/tree/master/hipi... and https://github.com/ROCm-Developer-Tools/HIP/blob/master/bin/...) that convert CUDA to HIP, these are meant to be run once and perhaps touched up manually. HIP source can then be compiled (no human interaction) for AMD (ROCm) devices and NVIDIA (CUDA) devices.
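
A hedged sketch of what that one-to-one mapping looks like in practice; the hip* calls below are the real HIP runtime API, but the surrounding program is illustrative only:

    // Each CUDA runtime call has a direct HIP counterpart (shown in comments).
    #include <hip/hip_runtime.h>

    int main() {
        int count = 0;
        hipGetDeviceCount(&count);                  // cudaGetDeviceCount(&count);

        float* d_buf = nullptr;
        hipMalloc(&d_buf, 1024 * sizeof(float));    // cudaMalloc(&d_buf, ...);
        hipMemset(d_buf, 0, 1024 * sizeof(float));  // cudaMemset(d_buf, 0, ...);
        hipDeviceSynchronize();                     // cudaDeviceSynchronize();
        hipFree(d_buf);                             // cudaFree(d_buf);
        return 0;
    }

This is also roughly what the hipify tools emit: a mechanical s/cuda/hip/ over the API calls, which is why the conversion is meant to be run once and then maintained as HIP source.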

The CUDA brand refers to both the API and the devices. AMD certainly could have done a better job here, but they wanted to distinguish the two, presumably in hopes that existing users of the CUDA API would consider porting their code to HIP even when targeting CUDA devices.


But CUDA C++ is a single-source programming language incorporating both CPU and GPU parts, so what is the one-to-one replacement for writing a kernel and launching it with the angle bracket syntax? Looking at the whitepaper[0] it looks like there is still some language extension going on to support the single-source case (the __global__ qualifier). So HIP is also a C++ compiler frontend?

[0]: https://gpuopen.com/wp-content/uploads/2016/01/7637_HIP_Data...
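
(For what it's worth, HIP is single-source too. A minimal sketch, assuming hipcc from a ROCm install: the __global__ qualifier is kept as in CUDA, and the angle-bracket launch becomes a hipLaunchKernelGGL call.)

    #include <hip/hip_runtime.h>

    // Kernel syntax is the same as in CUDA C++.
    __global__ void saxpy(int n, float a, const float* x, float* y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }

    int main() {
        const int n = 1 << 20;
        float *x = nullptr, *y = nullptr;
        hipMalloc(&x, n * sizeof(float));
        hipMalloc(&y, n * sizeof(float));
        // ... fill x and y ...
        // Replaces saxpy<<<grid, block>>>(n, 2.0f, x, y):
        hipLaunchKernelGGL(saxpy, dim3((n + 255) / 256), dim3(256), 0, 0,
                           n, 2.0f, x, y);
        hipDeviceSynchronize();
        hipFree(x);
        hipFree(y);
        return 0;
    }

So, yes, in effect the HIP toolchain includes a C++ compiler frontend: hipcc drives clang/HCC for AMD targets and nvcc for NVIDIA targets.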


I can never understand why AMD has not thrown its weight around to get support added in PyTorch and TensorFlow. NVidia is minting money on this by charging 2-4X on data center GPUs and eating their lunch, dinner and breakfast. AMD could invest as little as 10 developers and get it done in a few months. It's astounding that they can go all the way to do silicon and fabs and just fall short of adding this final framework support.

https://towardsdatascience.com/on-the-state-of-deep-learning...


Because until very, very recently they didn't have much weight to throw around. They were hemorrhaging money.

These things take time. CUDA took what, a decade to become what it is today? And it was developed by a company that wasn't on its deathbed.

Arguably AMD still doesn't have much weight to throw around. Their profits are meager. Their P/E is absolutely insane for a hardware company (240!), suggesting a dramatic correction in the foreseeable future. Stocks on an upward trajectory help attract top talent. On a downward trajectory they actively repel it.

So they probably have like half a dozen engineers working on this, and not those magical $500K/yr engineers that can actually turn shit into gold, because a hardware company wouldn't be able to hire them (or afford them, for that matter).


AMD employs 10,000 people, yet they are spectators watching NVIDIA rake in monopoly profits on GPGPU year after year. And you're saying it's because they can't afford to get 10 good engineers to work on framework support because their shares may crash at some point in the future?

I'm sorry, but this makes absolutely no sense at all. P/E or corrections have absolutely nothing to do with it. Salaries are paid from revenue, not from profits, and their revenues have been growing for years. They don't need to pay engineers in stock options.

This is about priorities and execution. Getting 10 engineers to focus on an obvious multi-billion dollar opportunity that leverages existing investments is not "throwing your weight around". It's basic management competence.

And it's interesting work. I don't believe that engineers need low P/Es as an incentive to work on a key part of AI infrastructure that only very few people get to work on.


FWIW I applied to AMD for an HPC application engineer position (GPU compute) and a couple of weeks later to a similar position at a direct competitor.

In the time it took the other company to contact me, do 6 interviews, negotiate a contract, and get me signed, I haven’t even been contacted by AMD. Their position is still open, I’m a perfect fit, and I haven’t been rejected either. I suppose I’ll get an automatic rejection letter at some point.

The online application system for AMD is just quite bad. Looking at my application in their system, everything is filled with “Unknown” and I can’t even put in my skills. It wouldn’t surprise me if in their database I’m a horrible candidate.


>> And it's interesting work.

It's also a hard slog, and something very few people _really_ know how to do. Ideally you have to be both deep learning / HPC expert _and_ low level programming expert, and on an "exotic" platform to boot, not on the CPU. Such people are scarcer than hen's teeth, and they have no problem whatsoever getting mid-six-figure employment at companies that don't treat them as an afterthought, and where they can hope for a strong upward trajectory career-wise.

I agree with the rest of your point, and in particular that this is an existential threat to AMD as a whole. I don't get why they don't treat it as one.

Be that as it may, I think you're severely overestimating the present level of "management competence" allocated to something that's not currently perceived as a do-or-die situation at a company that was dying just a year ago.


>Their P/E is absolutely insane for a hardware company (240!),

Yes, in a market that is today trading at a relatively low P/E of ~20-25.

AMD would have to grow its profits 12 times over to normalise that price. As much as I am pro-AMD in the Intel/AMD case, 12 times the profit is a little over the top.


Sorry, but you have to spend money to make money, and AMD could easily do this to increase its margins and revenue.


AMD is a fabless semiconductors company. They are more akin to low-level software than to the majority of hardware companies.


NVIDIA has never owned a fab either. The fabs that AMD once owned were spun out into a separate company, Global Foundries. Intel in some ways is the odd duck in the semiconductors business, still holding on to its fabs.


Not really, no. Unlike a software company, it still costs them a significant amount of cold hard cash to produce each unit they sell. Capital-intensive AF, especially at 7nm. They don't have to build their own fabs, but they DO have to pay for those fabs to be built, one way or another. Being fabless doesn't free you from that.


> I can never understand why AMD has not thrown its weight around to get support added in PyTorch and TensorFlow.

Just for completeness: Tensorflow has (upstream) ROCm support. When training some of our transformer models, the performance difference was not large compared to AMD's Tensorflow fork.


If you have a system with ROCm installed, support for compiling PyTorch on AMD has already been upstreamed. The PyTorch source code ships with a script for building a version using HIP (basically the script converts PyTorch's CUDA code to HIP and adjusts some build settings). So, if you're running a system with ROCm installed and are willing to compile PyTorch from source you can run PyTorch on AMD (admittedly that's several caveats and you're also limited to Linux in order to use ROCm).


I'd like to highlight similar issues. I've been a fanatical AMD supporter since 2008 and refuse to buy or invest in learning about Nvidia GPUs because I honestly care about open source drivers even if that means giving up opportunities. I'm very interested in experimenting with neural networks as an amateur.

I couldn't figure out what ROCM is, what it does, what parts of it I was meant to install (if anything?) and what was a brand name for coordinated upstream efforts in the linux kernel. Does it have an API? Who is meant to be using it for what? I dunno.

I honestly don't understand. I don't know anything about CUDA because, as mentioned, no Nvidia GPU in a decade. My theory is if I pour all my effort into learning about Vulkan that will be good enough. I'll run tensorflow on my CPU or just do without and suffer for it. Whatever ROCMs use case and target audience is I hope it isn't me.


https://rocm.github.io may be a good place to start. I've been meaning to check out TensorFlow for ROCm myself but it's challenging finding general info like what cards are supported (I'm pretty sure Radeon VII is).


The issue with that page is it assumes (1) that the reader is very familiar with CUDA and (2) has a feature checklist vs CUDA.

Eg, there is an "Important features include the following" section where a typical example is "User-mode queues and DMA". We've had user mode queues since Sedgewick's Algorithms in C. Or "Large memory allocations" which we've had since the introduction of the 64 bit CPU. Obviously these must be features in the context of a GPU, but they don't really have a lot of explanatory power for anyone not already intimately familiar with the field.

I suppose the obvious conclusion is ROCM is professionals only; but presumably those professionals are already perfectly happy with CUDA. I'd find it helpful to know what these professionals are expected to implement with all this so I can go figure that out.

Specifically for me I know Vulkan (aka OpenGL++) is a general purpose API for running things on a GPU. I don't understand if that supersedes ROCM, complements ROCM or what. I'm assuming that Vulkan is all anyone needs and then ROCM is going to ... provide something, maybe libraries? ... as middleware to Tensorflow and related tools. But blow me down if I can find anything that fills me with confidence that I've got the right take. ROCM seems to be very low level, which makes little sense to me given that Vulkan (and the slowly obsolescing OpenCL) seem to be where the compute work is being done.

If I wanted to summarise PyTorch it is "a Python API & library that provides convenience functions and data models for machine learning." Or CUDA, at a guess, is "a C++ API for highly parallel compute on Nvidia GPUs". ROCm claims to be "the first open-source HPC/Hyperscale-class platform for GPU computing that’s also programming-language independent" and none of that means much to me. It doesn't hint at what programming language I'm expected to be using, and I don't think I even want Hyperscale computing. But I get the feeling I want ROCm because it is associated with PyTorch on AMD GPUs. Befuddling.

I dunno, the ROCm version of PyTorch looked complicated and convoluted so I'm just going to sit it out and wait for someone to explain who the circus is performing for.


CUDA is language agnostic.

While Khronos was busy pushing the C-only API they got from Apple, Nvidia created a GPGPU platform with support for C, C++, Fortran and any language with a PTX-capable backend.

When Khronos woke up to the fact that most researchers don't want to use plain old C, it was too late.

It remains to be seen how much uptake SYCL or SPIR-V will ever get across GPGPU vendors.

Intel is trying to get back into the game via their extended SYCL, so let's see.

Also note that on mobile devices only Apple has actually supported OpenCL, while Google pushed their Renderscript instead.


They get my vote just for the Spinal Tap creds

> Going to 11: Amping Up the Programming-Language Run-Time Foundation


Wow, you're principled.


Aww, thanks :). It's easy for anyone to be principled if they don't need a GPU, although I've been blessed by nature to enjoy tracking the progress of Open Source up close more than having a working graphics card.

If I wanted to be serious about getting in to AI research I'd probably need to buckle and buy Nvidia.


> I don't have any ROCm-specific kernel drivers loaded

I'm pretty sure it needs amdkfd, which is not really "specific" but doesn't have any other consumers.

> but OpenCL on my AMD GPU works fine

Well, if instead of ROCm (or proprietary "pro" stuff) you have Mesa's Clover… that's the opposite of fine. I mean, fine for simple things, but Darktable won't work (no image support), Blender won't work (just too complex lol)…

---

my casual user opinion: I HATE ALL the special compute-specific GPU APIs. I don't want a special fancy kernel driver that loads everything differently, I don't like OpenCL and other unusual special garbage. Please please please just use Vulkan compute shaders for everything.

(And thankfully, people are starting to do that https://github.com/nihui/waifu2x-ncnn-vulkan https://github.com/hanatos/vkdt …)


For anyone else using NixOS, https://github.com/nixos-rocm/nixos-rocm by Anthony Cowley works very well.

Not just NixOS. Our work AMD GPU server uses Ubuntu 18.04, but I use Anthony's package set for easy use of PyTorch and Tensorflow in various environments (using nix-shell + direnv).


What kind of compiler have you written for GPUs?

I am curious what the motivation is, given that nvcc is fairly competent.


For people trying to install this on Debian/Ubuntu: the group that enables access to the GPGPU device is actually "render" instead of "video".

I spent some time wondering why I couldn't get access to the device until some kind soul pointed me to the different group name.

https://fosstodon.org/@dctrud/103315274791608145


Does ROC support ATI RS600 GPUs?


No, only GCN and later, AFAIK.


Note that one reason AMD have to make this stuff work is their big supercomputing win: https://www.amd.com/en/products/frontier

I don't know if there's any relationship between that and the free software infrastructure, but it's a relief to see an alternative to the CUDA-ish stuff. In the absence of a shared library dummy interface, we can't distribute OS packages of performance engineering tools with NVIDIA support, for instance.


I wish OS X would do better with AMD and also get it working with Tensorflow. No OS X support here sadly.


I wish macOS would allow for PCI passthrough to VMs (like you can do on Linux via OVMF, using QEMU). I wish I could connect an eGPU and pass full access of it to my Linux VM (or Windows VM), whether that GPU is AMD or NVidia shouldn't matter. That would remove the disadvantages Macs have when it comes to GPU and make it the best VM host (currently Linux seems to be the best option for VM host). I'm sure Mac Pro owners could take advantage of this were it possible.

However, I suspect Apple has no intention of doing this, as they are more focused on pushing Metal and reliance on macOS.

I wouldn't mind seeing PCI or GPU passthrough capabilities in Windows 10 Pro (not just Server). With their Linux Subsystem progressing the way it is, they really have an opportunity here. Though since NVidia and CUDA are supported in Windows, maybe it's not as necessary as on macOS.


Check out PlaidML (from, of all companies, Intel) to get AMD GPUs working with the Keras APIs for Tensorflow:

https://github.com/plaidml/plaidml


I've used this; unfortunately I've not had a good experience with it. Say I have a Keras model and run it with a Tensorflow backend, then with the PlaidML backend, and compare the results: they're different in a radical way.

Also, I don't think they've got their driver usage right. Metal on the GPU gets different results than OpenCL. OpenCL is supposed to be the 'beta' and the results are more stable with it - and again, radically different between the PlaidML devices.


I remember running inference on a model that I trained on a pair of NVIDIA 1080TIs with a NVIDIA 2080TI (after upgrading) and observing slightly (albeit not significantly) different results.

We’re dealing with single precision floats for everything. FWIW, every implementation of IEEE 754 floats has its own set of minor oddities and quirks.
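
(The bigger effect is usually that floating-point addition isn't associative, so a different reduction or kernel-fusion order already changes the result. A minimal sketch, just to illustrate the arithmetic point:)

    #include <cstdio>

    int main() {
        // With floats, (a + b) + c and a + (b + c) need not agree:
        float a = 1e8f, b = -1e8f, c = 1.0f;
        float left  = (a + b) + c;  // 0 + 1 -> 1
        float right = a + (b + c);  // b + c rounds back to -1e8, so the 1.0f is lost -> 0
        printf("%g vs %g\n", left, right);  // prints "1 vs 0"
        return 0;
    }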


That's an entirely self-inflicted wound by Apple; they did their best to stop any vendor from developing the driver.


You bought into a more closed ecosystem and that is what you have now.


Why do you need macOS? You can use Linux for your needs, it lets you do what you want and how you want it, instead of Apple telling you you can't do it.


So is this AMD's response to CUDA? Is there any reason to believe this will end up practically cross-platform? Is the implication that OpenCL is dead?


This contains an OpenCL implementation (with support for 2.0 at that!) and the kitchen sink. It's a bit unclear what AMD actually suggests you use for programming their GPUs.


No Navi support? Surprising.


I’m not predicting support until there’s a server oriented model available. So 2020 with Navi 20 and the new Radeon Instinct cards. They do have a single Radeon Pro W card (Navi 10), but I think they’re waiting for their Tesla competitor to be ready.

It’ll be interesting to see how these cards and ROCm perform in comparison to NVIDIA’s offering and CUDA.


The ROCm compilers are slightly different, and the newest GCN arch is probably not well tested for compute yet. The ROCm stack's development and testing effort strongly focuses on their 'pro/server' line cards. Is there a Navi WX or the equivalent yet?


There is a W5700 GPU but it was only made public about a month ago. I think they're still working on improving the base drivers for Navi so they haven't gotten to adding in proper support.

Granted, I haven't really tried properly. The card does show up as a compatible device, but trying to run Tensorflow on it fails due to the associated GPU code not being available. Perhaps it doesn't have official support yet due to needing more compiler work.


What is it, exactly?


More or less AMD's CUDA: a collection of compilers, tools, and libraries. The main difference is that NVIDIA markets CUDA strongly based on CUDA C++. You know the way you write CUDA is through nvcc and the language it accepts, unless you turn out to have exotic needs. Instead, AMD just provides a whole bunch of various things with fuzzy documentation, and lets you pick whatever you like (although I'm not quite sure how you're supposed to know what you like).


Will this help any on getting acceleration for Tensorflow?


Yes, there is supposedly Tensorflow support in ROCm. I have never used it, so I cannot say how well it works.


Just don't expect it to be as good as Tensorflow on CUDA.


What makes you say that? It works perfectly fine, and so far I haven't encountered anything I couldn't do with tensorflow-rocm that should be doable with Tensorflow on Nvidia.


Well they have decided to put the most important information at the bottom of the readme...

https://github.com/RadeonOpenCompute/ROCm/tree/roc-3.0.0#roc...

It's a collection of GPU development tools and libraries.


If I want to do OpenCL on Linux with a new AMD GPU, do I need to now separately install a Radeon Open Compute package before I install the normal OpenCL stuff?


This is what I would recommend, but there are also other OpenCL stacks you could use: at least two from AMD (amdgpu-pro and ROCm), and there is also one from the Mesa project (Clover).
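
(If in doubt about which stack you actually ended up with, enumerating the OpenCL platforms shows it. A minimal sketch in C++; the platform name strings in the comments are what I'd expect from ROCm/amdgpu-pro and Clover, but treat them as assumptions:)

    // Build with something like: g++ list_platforms.cpp -lOpenCL
    #include <CL/cl.h>
    #include <cstdio>
    #include <vector>

    int main() {
        cl_uint n = 0;
        clGetPlatformIDs(0, nullptr, &n);               // how many platforms?
        std::vector<cl_platform_id> platforms(n);
        clGetPlatformIDs(n, platforms.data(), nullptr);
        for (cl_platform_id p : platforms) {
            char name[256] = {0};
            clGetPlatformInfo(p, CL_PLATFORM_NAME, sizeof(name), name, nullptr);
            // ROCm/amdgpu-pro typically report "AMD Accelerated Parallel Processing";
            // Mesa's Clover reports "Clover".
            printf("%s\n", name);
        }
        return 0;
    }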


Would this help in running Tensorflow on the latest MBP 16? If so, is there any guide out there? The documentation is not super helpful.


No, this is Linux-only.


My dual-booting MacBook running Linux with an AMD GPU would disagree.



