Hacker News
AMD at a Tipping Point with Instinct MI100 GPU Accelerators (nextplatform.com)
98 points by Katydid on Nov 16, 2020 | 27 comments



It has lots of flops

> The fun bit is that Arcturus is the first GPU accelerator to break through the 10 teraflops floating point barrier at 64-bit precision – and AMD is doing it in a 300 watt thermal envelope for 11.5 teraflops, not a 400 watt one like Nvidia has with its Ampere A100, which weighs in at only 9.7 teraflops at 64-bit precision floating point

It is expensive

> Pricing on the Instinct MI100 was buried in the footnotes, and is $6,400 a pop in single unit quantities. We are going to try to see what the OEMs are charging for it.

You can connect them together in more efficient "hives"

> Having three Infinity Fabric pipes per Arcturus GPU allows a NUMA-like coupling of four GPUs and 128 GB of HBM2 memory into a much bigger virtual GPU, much like UltraPath Interconnect at Intel

Whereas before it was less efficient

> With only two Infinity Fabric ports on the Instinct MI50 and MI60 cards, banks of GPUs could only be hooked to each other in a ring topology and the larger the number of GPUs in the ring, the more latency between the devices.

I'm sure this is an incomplete list of highlights, but these all jumped out at me
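
On the "hives" point, here is roughly what that coupling looks like from the runtime side. This is only a minimal sketch, assuming HIP's peer-access calls (hipDeviceCanAccessPeer / hipDeviceEnablePeerAccess) behave like their CUDA counterparts; it is the kind of direct GPU-to-GPU mapping that the extra Infinity Fabric links are meant to make cheap:

    // peer_hive.cpp -- sketch: enable direct GPU-to-GPU access across a 4-GPU "hive"
    // (assumes the HIP peer-access API mirrors CUDA's; error handling omitted)
    #include <hip/hip_runtime.h>
    #include <cstdio>

    int main() {
        int n = 0;
        hipGetDeviceCount(&n);                       // e.g. 4 MI100s in one node
        for (int a = 0; a < n; ++a) {
            for (int b = 0; b < n; ++b) {
                if (a == b) continue;
                int ok = 0;
                hipDeviceCanAccessPeer(&ok, a, b);   // can GPU a map GPU b's HBM2?
                if (ok) {
                    hipSetDevice(a);
                    hipDeviceEnablePeerAccess(b, 0); // kernels on GPU a can now dereference GPU b's pointers
                    printf("GPU %d -> GPU %d: peer access enabled\n", a, b);
                }
            }
        }
        return 0;
    }

Whether that traffic actually rides Infinity Fabric rather than PCIe depends on how the four cards are wired up, so treat this as illustrative rather than a benchmark recipe.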


> It is expensive

At $6400 this is still only ~50% of the A100 price.

But as others have said, it does not really matter while ROCm is still so bad.


The interesting part is that instead of adding a lot of (if any, really) "matrix multiply add" cores, they seem to have just dumped in a truckload of FP16 cores. They claim nearly 200 tflops of FP16 in just normal half precision, which is kinda bonkers...

But the issue is always software, ROCm is currently a joke.


Briefly, what would it take to get ROCm to be CUDA-competitive, as a platform?


Feature parity with the core libraries of CUDA.

Flawless Windows and Linux support.

Software updates released in line with GPU releases, and guaranteed continued support throughout their lifecycle.

Full support for consumer grade GPUs and APUs.


Am I mistaken, or is that actually rather cheap for this segment (compared to the A100)?


Oops, a couple comments have corrected me on that point

The A100 is theoretically $12,500 https://www.pcgamer.com/nvidia-ampere-a100-price/


Unfortunately, having a competitive card is only half the battle; they also need all the deep learning libraries to support this card, otherwise nobody is going to bother. I hope AMD understands that and enlists their own engineers to help the community make support for this card solid.


ROCm (https://rocmdocs.amd.com/en/latest/) is their compute framework/stack. Not as good as CUDA, but it has support for TensorFlow etc.


The problem is that ROCm is Linux-only, which is still a huge downside, and it doesn't have good support (or support at all) for consumer grade GPUs; pretty much anything Polaris onwards is good luck, heck, even the Radeon VII isn't well supported.

CUDA works because any NVIDIA GPU will run CUDA. This means it's easier to learn, easier to prototype, and easier to ship, and the code you ship isn't limited to the datacenter.

What AMD needs to do to "win" an HPC GPU launch is to have an event which is 95% "How we fixed ROCm, and here is our full software roadmap and support guarantee for the next 5 years" and the remaining 5% "oh btw here is our new silicon, it's really fast and shiny".
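
To make the "any NVIDIA GPU will run CUDA" point concrete: the same trivial device query builds and runs unchanged on a laptop GTX part and on a datacenter V100/A100. A minimal sketch (error handling omitted):

    // devquery.cu -- the same few lines work on a GTX 1060 laptop or an A100 box
    #include <cuda_runtime.h>
    #include <cstdio>

    int main() {
        int n = 0;
        cudaGetDeviceCount(&n);
        for (int i = 0; i < n; ++i) {
            cudaDeviceProp p;
            cudaGetDeviceProperties(&p, i);
            printf("%d: %s, %.1f GB, compute capability %d.%d\n",
                   i, p.name, p.totalGlobalMem / 1e9, p.major, p.minor);
        }
        return 0;
    }

Compiling that with nvcc on any box with the driver installed is the whole setup story, which is exactly the friction ROCm hasn't matched yet.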


I would be interested in knowing how much compute happens away from Linux. My impression is that almost nobody uses Windows for these tasks, but anecdotes are not data. There are of course workstation-type acceleration tasks like simulation that are very Windows-heavy (e.g. ANSYS), but I am not privy to the breakdown of compute demand per segment.

> What AMD needs to do to "win" an HPC GPU launch is to have an event which is 95% "How we fixed ROCm, and here is our full software roadmap and support guarantee for the next 5 years" and the remaining 5% "oh btw here is our new silicon, it's really fast and shiny".

This is a great point. NVDA has worked on CUDA for years and has a great ecosystem of material and questions on places like stackexchange. AMD will have to work very purposefully to close the gap, but it seems like they are aware and headed in the right direction.


The software side of AMD has been (in preceding years) a disaster. I say that as someone interested in their products, who would love to see a realistic competitor to CUDA.

AMD's Linux support has been somewhere between no-assed and half-assed for the preceding decade. I think they believed that the world would return to Windows for everything. That ship sailed long ago. The point about CUDA being usable up and down the HW stack is quite salient. When I develop GPU things, I start on my laptop GTX 1060, test on my deskside RTX 2060, and run them on V100s. Code is in Julia, C, Fortran, so it should work anywhere with good underlying library support. I've got a Zen laptop with an integrated Radeon. No dice, can't do computing on it (yet).

AMD's function/library support is nascent, and will take years to get to a viable point for many.

I am hoping ... hoping ... that AMD sees this as a long-term opportunity, and not a short-term expense that must provide immediate ROI. SW ecosystems drive HW purchases, but there is usually a lag of years before this engine really gets started.

AMD needs to be in this for the long haul.


Your impression is wrong; there is a metric ton of enterprise and consumer software that uses CUDA and runs only on Windows.

There are also whole "data sciences" divisions in blue-chip companies that are running Windows.

Case in point: I work for a huge financial company and we have CUDA-powered Excel add-ins/macros...

And no I'm not joking https://on-demand.gputechconf.com/gtc/2010/presentations/S12...

And engineering, sciences, and medical consumer applications are also more often than not Windows-only or Windows-first.

Then you have all the less-enterprisy stuff: video and photo editing, filters, chess programs, w/e...

And lastly, the biggest point is that Windows and consumer grade hardware is where most developers and students live. Good luck running ROCm on your laptop, and no, I really mean it: it's officially not supported, and in reality even if you manage to get a moderately compatible chip you'll encounter more bugs than on Klendathu.

Don't underestimate the importance of software that runs everywhere and just works. Node.js didn't become popular because JavaScript on the backend was something that was desperately needed; it became popular because you had a plethora of front-end developers who had little to no knowledge of server-side languages and frameworks.

Unlike what HN and recruiters would like you to believe, most developers can't learn 10 languages and frameworks, and definitely not well. Sure, some can, but the vast majority of developers don't spend 9 hours working and 9 hours hacking; for every dev with a GitHub account that needs its own storage rack there are 10,000 that just do 9 to 5 and check out.

If on one hand you have a solution that forces you to pick from a narrow list of Linux kernels and supported distros and an extremely narrow list of GPUs, and you still encounter bugs on every corner so that you can maybe produce something that, if it runs at all, only runs on a system like yours, while on the other hand you have a solution that runs on any OS that supports an NVIDIA GPU, you'll pick the latter unless you are really, really bored.

And that is before you entertain the marketability and job prospects of learning CUDA vs ROCm. One allows you to get a job at any place that ships something that runs on a GPU, whether it's something that occupies 1000 racks and might become sentient or something that filters Excel spreadsheets faster; the other one doesn't.


> And no I'm not joking https://on-demand.gputechconf.com/gtc/2010/presentations/S12...

Thank you for sharing the link and correcting my information bias. It sounds like the "workstation" compute world is a forest of deep niches.

You make a lot of good points about the staying power of Windows. I am excited about all the moves towards a complete Linux desktop, but am not imagining that it will be mainstream.


I’m not sure it’s a forest of deep niches; at this point I would say that the niche is the 7-figure server racks with A100s outside of the cloud providers...

There are still more use cases for GPU compute on the edge than in the datacenter and that likely won’t change.

And as for Linux on the enterprise desktop, well, ROCm can’t run in WSL2 and CUDA can, so that’s yet another reason to bloody support Windows...

Because WSL2 is ironically probably the way forward for Linux on the desktop for the majority of the computerized workforce.


> There are of course workstation type acceleration tasks like simulation that are very windows heavy(e.g. ANSYS)

Funny you mention ANSYS specifically as they seem to have pretty decent Linux support:

https://www.ansys.com/solutions/solutions-by-role/it-profess...

Although only on nVidia hardware if I'm interpreting it right


> Funny you mention ANSYS specifically as they seem to have pretty decent Linux support:

With ANSYS in particular the question is who is doing the simulation. If the engineer is doing it, many engineering tools are Windows-only (although this has been improving), so it makes sense to run ANSYS under Windows as well so you can be close to your modelling software. If the stress or EM guys are separate from the designers, then it shouldn't matter as much.

I would love to see all productivity tools move to Linux, and things have been getting a lot better over the years. Personally I'm excited about the noise that Microsoft was exploring Office for Linux, as Office is the only reason I ever boot into Windows. What a godsend it would be to be able to program and run all my productivity software at the same time.


Yes, AMD’s fuckup in compute isn’t just ROCm but also their OpenCL support on Linux.

Windows still gets semi-decent support, especially for their WS cards, but Linux, oh boy...


Do not believe it. They wrote it for Windows and ported it badly.

Source: Struggled for months to get ANSYS to work even crappily on three different distros of Linux, two of which were clean installs of "officially supported" distros.

Eventually gave up, bought Windows, and installed it on that. Worked immediately.


That’s kind of the make-or-break plan for Frontier and El Capitan [1]. They’re having all the science folks try using ROCm and the HIP recompiler thing. We’ll see how that shakes out in practice.

[1] https://www.anandtech.com/show/15581/el-capitan-supercompute...


LLNL usually writes their own stack; I don't see the main API for El Capitan being anything but OpenMP, and LLNL can and has written their own compilers and libraries for other GPU-powered supercomputers.


Sure, but other folks have to make use of it, too. Not everyone's code will be abstracted from CUDA. They're trying to get folks on Summit to test out HIP more strongly. Repeating my comment from last summer [1] that linked to the "try to use HIP" [2]:

> The OLCF plans to make HIP available on Summit so that users can begin using it prior to its availability on Frontier. HIP is a C++ runtime API that allows developers to write portable code to run on AMD and NVIDIA GPUs. It is essentially a wrapper that uses the underlying CUDA or ROCm platform that is installed on a system. The API is very similar to CUDA so transitioning existing codes from CUDA to HIP should be fairly straightforward in most cases. In addition, HIP provides porting tools which can be used to help port CUDA codes to the HIP layer, with no loss of performance as compared to the original CUDA application. HIP is not intended to be a drop-in replacement for CUDA, and developers should expect to do some manual coding and performance tuning work to complete the port.

[1] https://news.ycombinator.com/item?id=20495637

[2] https://www.olcf.ornl.gov/wp-content/uploads/2019/05/frontie...
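
To give a feel for the porting surface, here is a minimal HIP vector-add sketch (my own illustration, not taken from the OLCF material). Apart from the hip-prefixed runtime calls and the hipLaunchKernelGGL launch macro, it is essentially line for line what the CUDA version would be, which is why the hipify tools can do most of the conversion mechanically:

    // vadd_hip.cpp -- minimal HIP sketch; the CUDA version differs only in the
    // runtime prefix (cudaMalloc/cudaMemcpy) and the <<<grid, block>>> launch syntax
    #include <hip/hip_runtime.h>
    #include <vector>
    #include <cstdio>

    __global__ void vadd(const float* a, const float* b, float* c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;   // same built-ins as CUDA
        if (i < n) c[i] = a[i] + b[i];
    }

    int main() {
        const int n = 1 << 20;
        std::vector<float> ha(n, 1.0f), hb(n, 2.0f), hc(n);
        float *da, *db, *dc;
        hipMalloc(&da, n * sizeof(float));
        hipMalloc(&db, n * sizeof(float));
        hipMalloc(&dc, n * sizeof(float));
        hipMemcpy(da, ha.data(), n * sizeof(float), hipMemcpyHostToDevice);
        hipMemcpy(db, hb.data(), n * sizeof(float), hipMemcpyHostToDevice);
        hipLaunchKernelGGL(vadd, dim3((n + 255) / 256), dim3(256), 0, 0, da, db, dc, n);
        hipMemcpy(hc.data(), dc, n * sizeof(float), hipMemcpyDeviceToHost);
        printf("c[0] = %f\n", hc[0]);                    // expect 3.0
        hipFree(da); hipFree(db); hipFree(dc);
        return 0;
    }

Built with hipcc this targets ROCm; HIP also ships an NVIDIA backend, so the same source is supposed to compile against CUDA as well, which is the whole portability pitch.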


In case someone from AMD is reading here, you should probably gift out some of the cards to generate some traction. For example, send one to me ;) But joking aside, the ROCm documentation seems so wacky that I would not dare to purchase an expensive AMD card without a recommendation from someone I trust.

For example, rocmdocs.amd.com appears to be the official documentation, but the front page doesn't mention any popular AI framework. Opening the "Deep Learning" box in the sidebar, I see that there should be a TensorFlow 2.2.0-beta1 release. But on https://github.com/ROCmSoftwarePlatform/tensorflow-upstream the featured release is "2.0.0-rocm", the docker link says it's TensorFlow 2.3, and the "Official Builds" at the bottom of the readme point to CUDA versions ... The whole GitHub repo kind of feels like a bad copy&paste job. I eventually managed to download the TensorFlow 2.3.1 + ROCm source from GitHub, only to find that it doesn't compile. I need the hip_hcc library https://github.com/ROCm-Developer-Tools/HIP/ but apparently that one was renamed some months ago, and the ROCm TensorFlow port has not been updated yet.

After spending half a day trying to install a ROCm-compatible TensorFlow, I gave up. Obviously, I wouldn't want to bank on those issues getting resolved when I buy an as-is unusable $6400 accelerator card. That said, the MI100 could be a great deal if I could get it to work. AMD hardware is a lot more affordable than comparable NVIDIA products, plus the MI100 has lots of fast RAM. I work on optical flow (think autonomous selfie drone) and that needs lots and lots of GPU RAM. Plus I'm very happy with my AMD CPU. But like I said in the beginning, I wouldn't dare buy an AMD GPU unless there was believable evidence that I could get my TF2.3 model running with ROCm in a reasonable timeframe.

I'm not trying to blame AMD for having bugs here. I have run into a fair share of issues with NVIDIA GPUs, too, like for example a RAM corruption bug inside CUDA 10.0 and a reproducible kernel freeze in CUDA 10.1. But for NVIDIA, the initial setup is quick and easy. And the CUDA documentation looks like they have a dedicated team that cares about developer productivity. For AMD's ROCm, on the other hand, even the marketing material reads more like someone hopes that the open source community will fix things for free. "first open-source software development platform for HPC/Hyperscale-class GPU computing" sounds great until you notice that in this context, "open-source" apparently means "no binaries + no support".


Seems like a tipping point is really only something that can be pointed out once you are past it.


Twice the flops/$ over the competition is enough to attract a lot of software development. It seems reasonable to say that the introduction of optimized tensor ops is a tipping point. The Nvidia monopoly is definitely at risk now.
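
For reference, a quick back-of-envelope with the article's FP64 numbers and the theoretical $12,500 A100 figure quoted upthread; on those list numbers it works out to roughly 2.3x the FP64 flops per dollar and about 1.6x per watt:

    // flops_per_dollar.cpp -- back-of-envelope only: list prices and peak FP64,
    // not street prices or sustained throughput
    #include <cstdio>

    int main() {
        const double mi100_tf = 11.5, mi100_usd = 6400.0,  mi100_w = 300.0;
        const double a100_tf  = 9.7,  a100_usd  = 12500.0, a100_w  = 400.0;
        printf("MI100: %.2f GF/$  %.1f GF/W\n", 1000 * mi100_tf / mi100_usd, 1000 * mi100_tf / mi100_w);
        printf("A100 : %.2f GF/$  %.1f GF/W\n", 1000 * a100_tf / a100_usd, 1000 * a100_tf / a100_w);
        printf("ratio: %.1fx per dollar\n", (mi100_tf / mi100_usd) / (a100_tf / a100_usd));
        return 0;
    }

So "twice the flops/$" is, if anything, conservative on paper; whether it survives contact with real software is the ROCm question again.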


Huh so "scalability" on that roadmap did not mean chiplets on GPUs?


Is this good for AMD?



