The big problem right now is that ROCm, AMD's equivalent to Cuda for deep learning, is not supported on the consumer Radeon Navi cards. I had hoped it would be supported in ROCm 3.0 when that was released - but there has been no official word, which is worrying. If AMD are to compete in the GPU/deep-learning space, it will be because they have hardware that is competitive (even if not quite as good as Nvidia's), low cost, and doesn't have the same EULA licensing restrictions as Nvidia. Could you imagine taking your TensorFlow program on AWS and just running the same code on an AMD GPU, at 25% of the cost of a V100 (with 80% of the performance)? That is where they should be. And the "run an unchanged TensorFlow program" part is not actually a problem - ROCm has been upstreamed into TensorFlow. It is support for their own Navi GPUs that is the software problem right now!
I agree that the lack of (official) Navi support is just bad strategy on AMD's part, but I disagree with your numbers.
Their goal needs to be performance parity or dominance, and then perhaps offer a slight discount. That's how they're succeeding with Zen on CPUs. Offering the kind of steep discount you write about is just bad business.
From a consumer perspective Ryzen certainly feels like a steep discount. Was there a comparable price/perf to 3900X/3950X in the Intel world? Last I looked it seemed like you had to hop ship to Xeon with more expensive boards and really expensive CPUs to get something similar on paper.
It looks like the 10980XE is the best answer. But in fairness, I believe it and its pricing was exactly an answer to Ryzen 9.
At the high end (V100), Nvidia are price gouging, so there is room for a profitable strategy here where AMD significantly undercut Nvidia. My reasoning comes from my belief that AMD will not reach performance parity in the next generation - for deep learning, at least. If the new Arcturus GPU is competitive with Nvidia's new Ampere chip, that would be great. But they won't sell if they are not significantly cheaper - and they won't be significantly cheaper (Arcturus will be the "enterprise" GPU). However, Big Navi will be much cheaper and can be used in a data center for deep learning without the legal problems that using a 3080 Ti would have. I see this as the same "war" as the one over enterprise hard disks 10-15 years ago. Google came out and built their infrastructure on cheap commodity disks, but Dell, HPE and other vendors would not let you use them - they said they were inferior, etc. Commodity GPUs could "democratize" deep learning - ordinary servers could just include a couple of GPUs as a cheap add-on.
x86 CPUs arrive in a software marketplace with tons of compatible software available for them. AMD GPUs are at an enormous disadvantage without support for Cuda or some equivalent, or even some hint to developers about what they should be programming to (is it still OpenCL?)
They necessarily have to sell at a huge discount or datacenter providers won’t even pick up the phone when they call.
Did they mention it will ever be supported on Navi?
It seems to me AMD is on its way to having a specialised architecture for each segment. Navi and the Navi refresh are gaming GPUs without ray tracing, so basically the low-end to middle segment. Navi 2 supports ray tracing and aims at the highest quality possible.
Arcturus, the uArch name for the Vega successor (also known as Vega 30), was supposed to be double the size of Vega VII, with all the power optimisations they learned from Zen 2 (as in the current variant shown in Ryzen Mobile). It seems AMD want to separate the GPGPU market from the rest, which makes sense because 90% of gamers don't do anything GPGPU-based.
It doesn’t make sense at all. An enormous number of people who work on GPGPU will develop on consumer hardware before they move up to cloud computing or a Quadro or something. Trying to segment the market like this is based on a misunderstanding of developer needs and preferences. Which is pretty typical of ATI/AMD unfortunately.
AMD has no chance until there's broad support for Cuda. Cuda is so deeply entrenched at this point that trying to fragment the space and compete against it is a non-starter.
The consumer/gaming market is nothing compared to the GPGPU market today.
Cuda is a proprietary Nvidia framework. You don't need cuda support to train deep learning models on TensorFlow. ROCm also works - and is built by AMD and is open-source.
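For what it's worth, the swap really can be that small. A rough sketch, assuming a machine that already has AMD's ROCm driver stack installed (`tensorflow-rocm` is the ROCm build of TensorFlow on PyPI; `train.py` here is just a placeholder for your own training script):

```shell
# Install the ROCm build of TensorFlow instead of the CUDA one.
pip install tensorflow-rocm   # rather than: pip install tensorflow-gpu

# The model code itself needs no CUDA-specific changes -
# TensorFlow picks up the ROCm backend automatically.
python train.py
```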
Tensorflow is only a small subset of the massive engineering investment that has been done into cuda by various orgs and open source projects. We have GPU databases, GPU accelerated analytics packages, GPU accelerated DSP, etc. All with hand-written cuda kernels.
Unless AMD can provide a toolchain which takes cuda code and generates whatever it takes to run it with performance parity to Nvidia cards, it'll never take off.
ROCm is a decade too late to simply coexist with cuda and battle for market/mind share. Unless they can offer a massive perf/cost advantage, there is no incentive for anyone to invest in porting their code.
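To be fair, AMD's answer to "take cuda code and generate whatever it takes to run it" is HIP, which ships a translation tool with ROCm. A sketch of that workflow, assuming the ROCm toolchain is installed (`vector_add.cu` is a hypothetical CUDA source file):

```shell
# hipify-perl does a mostly-textual translation of CUDA API calls
# (cudaMalloc -> hipMalloc, cudaMemcpy -> hipMemcpy, etc.) into HIP C++.
hipify-perl vector_add.cu > vector_add.hip.cpp

# hipcc compiles the HIP source for AMD GPUs
# (and can also target Nvidia GPUs by delegating to nvcc).
hipcc vector_add.hip.cpp -o vector_add
```

Whether the result reaches performance parity is a separate question - the translation handles the API surface, not hand-tuned kernel behaviour.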
> Unless AMD can provide a toolchain which takes cuda code and generates whatever it takes to run it with performance parity to Nvidia cards, it'll never take off.
Does HIP support all the programming languages that are able to target CUDA, like Julia, Numba, .NET, Java, Haskell, Fortran, C++?
Khronos made a big mistake in pushing C as the only compute language for so long, and it remains to be seen how much adoption SPIR-V can actually take from PTX.
So unless HIP actually matches PTX, it is a non-starter.
In theory HIP is a modified LLVM, but as far as I know Julia uses a modified LLVM as well, so unless they are upstreamed, we can't use them together :(
I guess I'm staying with NVIDIA for developer laptops for a long time, and maybe using AMD for an ultrabook.
Of course not, and some things would never be ported. But if they had a massive cost advantage for a year or two in data centers, a lot of things would get ported, quickly.
Have you tried installing ROCm? Last I checked two months ago, you had to run an Ubuntu 16.04 container on Ubuntu 18.04 because the AMD drivers were only compiled for bionic and the ROCm software was only compiled for xenial.
I've been running rocm directly on Ubuntu 19.04 for a while now. I didn't have to build from source either. It's odd that they haven't updated their docs about this yet.
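For anyone curious, the native install is now roughly the repo-based route from AMD's docs - a sketch, with the caveat that exact repo URLs and package names vary by ROCm release:

```shell
# Add AMD's ROCm apt repository and its signing key.
wget -qO - http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key | sudo apt-key add -
echo 'deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main' | \
    sudo tee /etc/apt/sources.list.d/rocm.list

# Install the ROCm kernel driver and user-space stack.
sudo apt update && sudo apt install rocm-dkms
```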
The last time we purchased GPUs, we bought two Radeon VIIs alongside RTX 5000s. For training transformers in Tensorflow and PyTorch, they were a steal, because training is only marginally slower than on the RTXes at a fraction of the price.
For LSTMs, we have found performance regressions compared to NVIDIA, but it's largely because it was hard to hit the optimized MIOpen code paths.
We're in the age of serverside GPGPU on the cloud.
"Looking at the GPU segment, the revenues have almost doubled over the past two years. This can primarily be attributed to Nvidia’s foray into data centers, expanding in both High Performance Computing and the cloud. The data center revenues were up a solid 133% in 2017, led by continued growth in its CUDA platform, and increased acceptance of its Volta architecture. "
The peak of Bitcoin was when GPU makers were at their highest revenue. Maybe the word "mining" should not have been there, given mining has mostly moved to ASICs.
Doubling the GPGPU market from 2017 to 2018 doesn't mean "the consumer/gaming market is nothing compared to the GPGPU market today". And even if they managed to grow again in 2019, it is still nowhere near close.
Nvidia's revenue has been consistent over the years: their gaming segment is still their most important source of income, at over 50% [1] of total revenue.