
AMD is edging closer to breaking Nvidia's graphic dominance - toufiqbarhamov
https://www.engadget.com/2019/01/17/amd-versus-nvidia-radeon-vii-7-nanometer/
======
so_tired
Wanna know why AMD drivers are so broken?

I talked once to an AMD project-managery guy. He told me how they have several
teams in different countries, and for each feature/bug different managers have
to "bid" with weekly resource estimates. Bid too high and you don't get enough
work for your people. Bid too low and they end up with unpaid overtime. Rinse
and repeat.

And these are kernel developers...

EDIT: to be clear, I am rooting for competition to Intel and NVIDIA. I just
don't think this kind of culture can work on the software side.

~~~
ezoe
Seriously? That doesn't work within an entity that shares the same revenue.
Everyone loses.

They should hire an economist as their boss instead.

------
dogma1138
With a $5,000+ card (MI50), AMD is forced to sell at a loss to be able to meet
the same performance levels NVIDIA hit for the same amount of money ($700) 2
years ago.

If this is "getting closer", I think they need a better tape measure.

With 16GB of Aquabolt HBM2 on a 7nm node, there is no way AMD is losing less
than $150 per card, even before accounting for opportunity cost.

In fact, this card is the reason the head of the RTG was fired, after he
suggested selling it at $749, at a loss.

The 2080, for all intents and purposes, is a mid-range die.

The only thing AMD is competing on atm is price, and for them it's a lose-lose
situation until Navi comes out. Even then, that assumes Navi can actually be
competitive above the $300 price bracket; as a Polaris successor, it's not
clear if AMD will have anything on the level of the 2080, not to mention the
2080 Ti, based on Navi.

AMD had to drop the price of Vega to around $300 due to the RTX 2060, so they
are likely losing money on that front too. They are losing money on the Radeon
VII, and hopefully they will finally make money with Navi.

~~~
jungler
AMD doesn't have to win in discrete at all to make a killing. They only have
to dominate the lower end, which they are already positioned to do with
current APUs. These chips can manage 30fps/720p playability in most new games;
that's easy to dismiss if your requirement is competitive-gaming minimums or
4K, but in terms of price/performance it's near unbeatable.

The Radeon VII is ultimately just a repurposed workstation card given flagship
treatment. It's not where AMD's business really is, ever since they went down
the route of semi-custom and small dies. While pushing RTX could renew
Nvidia's advantages, the only game console they're on these days is the
Switch, and there's no raytracing to be seen there. Developers accommodate
GeForce cards for PC releases, but AAA going console-first precludes designing
most content around RTX. Nvidia has made a lot of moves to try to repurpose
high-end graphics for other markets, but it looks like the next few years are
going to be very rough for them if they can't find a "blue ocean" that needs
both speed and programmability.

~~~
dogma1138
AMD's APUs are good but aren't really that popular with gamers. Intel's new
iGPU also looks impressive as hell: they are already competitive with Vega 8
with their current lineup, and this new one seems to be a powerhouse for
integrated graphics.

It seems that AMD will have competition on the APU side very soon. I really
hope Intel pulls it off and is actually competitive in the discrete market as
well in 2020 or 2021, and that their GPU adventure isn't going to get quietly
killed off behind closed doors.

~~~
ChrisSD
The problem is most people who game aren't "gamers". By which I mean they
aren't necessarily interested in the hardware, or in anything else except
"will this game run?".

~~~
rubinelli
I'd bet there is a very large market for pre-built "Fortnite battlestations"
in the $500 to $700 range, a little above current-gen consoles. If that's your
budget, then AMD is the obvious choice.

~~~
asdff
Is there data for that? Anecdotally, I was thinking about buying a gaming PC
recently. Every single thread you find on reddit about PC gaming starts with
"don't buy pre-built." It is ingrained in the gaming community at this point
that pre-built wastes money to the tune of hundreds of dollars, and PCs aren't
all that difficult to put together.

As it stands, $500-700 gets you 1080p gaming that's better looking than
current-gen consoles; this is the entry level, and it's already several
notches above an APU.

~~~
ChrisSD
But if you're reading reddit threads on this, I think you're several levels
removed from the average person who plays games. Sure, "gamers" will read up
on this and be part of the gaming community, but PC games are now mass
entertainment, not the niche they used to be.

Few people build their own PC. Many more play games.

------
brandon
Depending on how much stock you put into the Steam Hardware Survey, it's worth
noting that Nvidia still controls 74% of the market relative to AMD's 15%:
[https://store.steampowered.com/hwsurvey/videocard/](https://store.steampowered.com/hwsurvey/videocard/)

I also feel like graphics cards are one of those weird things that command a
lot of brand loyalty, so it's probably going to take more than near-
performance-parity to move the needle.

~~~
j-c-hewitt
I think the market-share gap contributes more to the actual inferiority of AMD
cards at the midrange than the hardware specs do.

Because AMD is a minority and game developers know it, driver-related bugs
that would be priority 1 to fix if they affected nVidia users are bottom
priority for AMD users. The AMD midrange cards are wonderful in terms of
hardware for certain displays and competitive in every way from a hardware
perspective. Especially if you are looking for 60fps+ at 1080p, you can get
awesome deals by going with AMD.

Just be prepared to occasionally have issues like BSODing for six months at a
time whenever a game renders a lot of white textures (snow scenes, say),
unless you downgrade the drivers to a specific version, which will then be
fixed in one driver update and broken again in the next if you are an AMD
user. Or to occasionally hit driver crashes that developers acknowledge and
don't fix for months. And to fuss with shader cache settings and things like
that, because those settings are busted on AMD for some specific game you
want to play.

It's not even necessarily that the drivers suck. They don't suck; they're
fine. It's that developers don't have a big incentive to fix bugs that only
impact a smaller segment of the market. It's not like there aren't a lot of
nVidia-only bugs that crop up, but because of its market share, those bugs are
almost always a higher priority to fix, and you really notice it if you have
used both GPU manufacturers at the same time over a period of years, or have
used both at various times.

Most of the nVidia tech gimmicks usually suck and are uninteresting
(HairWorks, PhysX, realtime ray tracing), and those that don't suck are
usually matched quickly (G-Sync). But the market share alone is sort of a
perversely positive feature, because it just means you are on the same drivers
as the majority of the market, so bugs that impact you are a higher priority
for developers to fix.

~~~
jplayer01
There's nothing gimmicky about real-time raytracing.

~~~
j-c-hewitt
Right now, in the next 3 months, this year? It is a 100% gimmick, running on
hybrid rendering rather than full ray tracing. I own an RTX card, and I can
tell you it is a gimmick that no one turns on except once, to see what it
looks like, and you can count the number of games that support it on one hand
with fingers to spare.

Over the next 3 years and the next generation of cards, absolutely, great
technology.

~~~
jplayer01
Jesus, maybe appreciate the fact that just a few years ago real-time
raytracing was nothing but a pipe dream. Now we have cards on the market that
can do a limited amount of real-time raytracing. That's huge. Next year
they'll get faster. The year after that, they'll get faster still. More games
will support it. Developers will buy in. This is a generational change that
will take 10+ years, and that doesn't make it any less significant. It's not a
gimmick just because it takes time for things to develop. That's how things
work.

~~~
j-c-hewitt
In 2019 it is a technically interesting gimmick that tanks your FPS. This
shouldn't be that controversial: it's the nearly universal opinion of
reviewers and of people who own the card. What is interesting from a technical
perspective can easily be a gimmick from the perspective of a consumer. I
bought an RTX card for the regular performance and maybe DLSS (which, although
not widespread, is not currently a gimmick). By the time actual ray tracing,
rather than hybrid ray tracing, gains wider adoption, the next generation of
RTX cards will already be out.

I can even link you to a video from a YouTuber sponsored by nVidia that
essentially says it is a nice-looking gimmick: great for cinematic
single-player games, but not that useful for fast-paced multiplayer games,
where most gamers will pick performance over image quality most of the time.
DLSS, because it is a performance feature, is in a different category. Like
HairWorks, RTX is a tech-demo-type feature that most users will turn off to
gain dozens of FPS.

~~~
jplayer01
It's either this or we don't get raytracing at all. Do you not understand how
things develop? They have to start implementing it somewhere and develop the
tools and engines to support it; the hardware doesn't yet exist for them to do
raytracing globally. They have to gain experience and develop practices for
implementing it. This is a good first step that will lead to more. How are you
not getting this?

------
jclay
Maybe for the competitive consumer graphics market, but I don't see them
disrupting Nvidia's hold on the HPC world. With Nvidia's NVLink, you can
achieve high-bandwidth data transfer between graphics cards without having to
pass through PCIe to the CPU. If you have large computations that require
syncing data among multiple GPUs, NVLink is your most performant option by a
large margin. That's not to mention how far ahead CUDA is of SYCL, OpenCL,
ROCm, etc. I certainly welcome the competition (and hopefully an open
standard). In my experience, CUDA is ahead on developer tooling, performance,
and productivity features (Thrust, Unified Memory, C++17 support, etc).
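
To give a feel for what I mean by productivity, here's a minimal sketch of my
own (assuming nvcc and a CUDA-capable GPU; not from any particular project): a
full parallel GPU reduction in Thrust is a few lines of ordinary-looking C++.

    // Sum 1M floats on the GPU with a single Thrust call.
    // Compile with: nvcc sum.cu -o sum
    #include <thrust/device_vector.h>
    #include <thrust/reduce.h>
    #include <cstdio>

    int main() {
        thrust::device_vector<float> v(1 << 20, 1.0f);  // data lives on the device
        // One call dispatches an optimized parallel reduction kernel.
        float sum = thrust::reduce(v.begin(), v.end(), 0.0f);
        std::printf("sum = %f\n", sum);
        return 0;
    }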

~~~
ct520
AMD's equivalent of NVLink is Infinity Fabric (MI60, MI50), isn't it? With a
200GB/s card-to-card transfer rate... Did you take that into account? I agree
the toolset is light years ahead of what I've seen from AMD, but I'm not much
into that.

~~~
jclay
Ah, I hadn't seen that announcement. It looks like this was announced recently
(November 2018), and it's still 100GB/s lower throughput than the latest
NVLink. Either way, it could certainly be interesting if they compete on
price, but I cannot find any information yet on actually using it to sync with
OpenCL / ROCm.

------
shmerl
The Radeon VII (Vega 20) is major overkill for gaming, though. It looks more
like a card targeted at 3D rendering. The upcoming Navi cards should
supposedly be more gaming-oriented and more affordable.

~~~
vertexFarm
As somebody who uses my cards for rendering in my freelance work and hasn't
played a AAA game since my teenage years, I'm definitely hoping to see AMD
kick some ass and competitively drive down some prices. Especially since they
actually seem to give a shit about Linux drivers. Those saints!

~~~
shmerl
Sure. Open source Linux drivers are exactly why I'm using AMD and not Nvidia.

------
dragontamer
I'm no graphics programmer. I'm mostly interested in the compute (OpenCL,
CUDA) side of things. I don't own an NVidia GPU, so my experience mostly
relates to AMD.

I'm certainly interested in AMD's Linux "ROCm" push. I really think the
programming model there is relatively easy to understand, but there are major
flaws in the documentation and implementation.

For example, OpenCL 1.2 on ROCm 2.0 isn't stable enough to run Blender Cycles.
Yes, you can render the default cube, but very slowly. On a real scene,
Blender Cycles on OpenCL ROCm can take 500+ seconds to compile, and the actual
execution seems to hang (infinite loop and/or memory segfault, depending on
the scene) on anything close to typical geometry.

Note that Blender's OpenCL code is explicitly written for AMD's older OpenCL
(the AMDGPU-PRO implementation); Blender has a separate CUDA branch for NVidia
cards. So ROCm's OpenCL is at the very least performance-incompatible with
AMDGPU-PRO's. The Blender OpenCL code probably has to be rewritten to work at
all (i.e., not infinite-loop), and perhaps rewritten further to become
efficient, on ROCm's OpenCL.

--------

AMD's hardware is fine (not as power-efficient as NVidia's, but performance is
fine, in theory). But the drivers / software stack is clearly immature. Even
with ROCm at a 2.0 release, these sorts of issues still exist.

AMDGPU-PRO with OpenCL 1.2 is workable, but feels old and cranky. (OpenCL 1.2
was specified in 2011 and is missing key features: its atomics model is
incompatible with C/C++11, it's missing SVM and kernel-side enqueue, etc.
etc.)

AMDGPU-PRO OpenCL 2.0 is theoretically supported, but is still unstable in my
experience. ROCm OpenCL (both 1.2 and 2.0) is still under development, but
doesn't seem to be ready for prime time yet (at least, if Blender 2.79 or 2.80
Cycles is any indication).

AMD HCC seems usable, but there aren't many programs using it. AMD HIP is an
interesting idea but I haven't used it.

I know NVidia has driver issues / software issues too. But CUDA code written 5
years ago will still have similar performance and behavior when run on today's
cards, on today's software stack. I'm not sure the same is true for AMD's
OpenCL code (between AMDGPU-PRO OpenCL 1.2 and ROCm OpenCL 1.2).

----------

Long story short: the only mature AMD OpenCL compute platform seems to be
OpenCL 1.2 on AMDGPU-PRO. Fortunately, it also seems like AMDGPU-PRO will keep
working for the foreseeable future, but AMD really needs to clarify its
platform story to attract developers. (Ex: prioritize testing of ROCm OpenCL
to ensure performance compatibility with existing OpenCL 1.2 code written for
AMDGPU-PRO.)
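
As a starting point, you can at least check which OpenCL stack you're actually
running by listing platform names and versions through the standard host API;
AMDGPU-PRO and ROCm should report different platform strings. A minimal sketch
(link with -lOpenCL):

    // Enumerate OpenCL platforms and print their reported name/version.
    #include <CL/cl.h>
    #include <cstdio>

    int main() {
        cl_uint count = 0;
        clGetPlatformIDs(0, NULL, &count);   // query how many platforms exist
        cl_platform_id ids[8];
        if (count > 8) count = 8;
        clGetPlatformIDs(count, ids, NULL);
        for (cl_uint i = 0; i < count; ++i) {
            char name[256], version[256];
            clGetPlatformInfo(ids[i], CL_PLATFORM_NAME, sizeof(name), name, NULL);
            clGetPlatformInfo(ids[i], CL_PLATFORM_VERSION, sizeof(version), version, NULL);
            std::printf("%s (%s)\n", name, version);
        }
        return 0;
    }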

~~~
shmerl
Is the OpenCL implementation in the PRO package open source?

~~~
dragontamer
No. AMDGPU-PRO is closed-source.

Which is partially why ROCm exists: it's the open-source implementation of
AMD's driver stack. AMD seems to have indicated that the open-source ROCm
drivers are the way of the future. It's a sentiment I can certainly get behind
(and AMD has even pushed the ROCm driver stack into the Linux kernel proper).

But ROCm isn't quite ready yet. So I think in practice people will still be
relying on the older AMDGPU-PRO drivers, at least for the next year.

------
meshenna
Methinks the title is a bit too strongly worded. At that price point, the
Radeon VII is unlikely to win many customers over. If some of the recent
leaks[1] turn out to be true, then we can talk about "edging closer to
breaking Nvidia's graphic dominance."

[1] [https://youtu.be/PCdsTBsH-rI?t=1247](https://youtu.be/PCdsTBsH-rI?t=1247)

Edit: added timestamp to video

------
twtw
I'm not sure I follow the reasoning in the article. It says that if ray
tracing flops, AMD will take the lead, but then goes on to say that AMD
believes ray tracing will be important and is working on it.

~~~
devonkim
It's not too dissimilar from how AVX instructions are so poorly implemented on
AMD CPUs: they may support them, but they may not be core to their strategy
(no pun intended).

~~~
CoolGuySteve
At some point I'm hoping to see AVX completely emulated in microcode and
replaced with an embedded GPU core that can write to an L3 or L4 cache.

This slow scaling from 128 to 256 to 512 bits in the instruction set is more
or less a solved problem in the GPU space with shader compilers, and AVX would
mostly be redundant with GPUs if it weren't for the memory bandwidth
constraint.

i.e.: when it comes to vector processing, go big or go home.

AVX/SSE are a compromise from back when CPU die space and bandwidth were more
precious. Now that we have 8-32 cores on a die with a good bus between them,
duplicating those AVX units 8-32 times seems less than optimal.

~~~
m0zg
What you think of as "vector" processing is currently being used by compilers
to speed up things you didn't think were vectorizable. This is possible only
because these instructions are pretty cheap latency-wise. By introducing huge
latency, you'd be ruining the performance of autovectorization, which accounts
for a lot of the performance gains of the past decade.
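
For example (a minimal sketch of my own): a plain scalar loop like this
compiles to packed SSE/AVX instructions at -O2/-O3 without the programmer
writing any vector code, and that only pays off because each packed
instruction is cheap.

    #include <cstddef>

    // No explicit vector code anywhere; the compiler does the widening.
    void scale(float* __restrict dst, const float* __restrict src,
               float k, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            dst[i] = src[i] * k;  // emitted as vmulps/vmovups on AVX targets
    }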

~~~
CoolGuySteve
After years of staring at HFT disassembly, I'm thoroughly disappointed by
autovectorization.

Here's the simplest godbolt example I could think of to illustrate, summing a
string of fixed length and non-fixed length:
[https://godbolt.org/z/O_M5fU](https://godbolt.org/z/O_M5fU)
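
Roughly, the example looks like this (a from-memory reconstruction; the
godbolt link has the exact code):

    #include <cstddef>

    int sum_fixed(const char* s) {
        int sum = 0;
        for (std::size_t i = 0; i < 4096; ++i)  // length fixed at compile time
            sum += s[i];
        return sum;
    }

    int sum_dynamic(const char* s, std::size_t n) {
        int sum = 0;
        for (std::size_t i = 0; i < n; ++i)     // length known only at runtime
            sum += s[i];
        return sum;
    }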

You can see that the most recent GCC fails to use the AVX-512 zmm registers
even after being configured to do so (afaik), and also fails to use more than
4 registers. Clang does better, using the zmm registers and all of them.

But in both cases, the amount of code generated is quite large. If you compile
with -Os instead of -O3, no vector instructions are used at all, for some
reason.

So when you load this code, no matter what, you're pulling in a bunch of
instruction cache lines, which will destroy most of your latency gain unless
the input is very large. And even if your input is large, you'll fault in that
data anyway.

So what's the point of doing this on the CPU again?

~~~
dragontamer
I don't believe Knights Landing supports an 8-bit ZMM add.

Changing the code to 32-bit ints results in the "vpaddd zmm0, zmm0, ZMMWORD
PTR [rdx]" that you'd expect from the auto-vectorizer.
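
Concretely, the 32-bit version is something like this (my sketch):

    #include <cstddef>

    int sum_ints(const int* a, std::size_t n) {
        int sum = 0;
        for (std::size_t i = 0; i < n; ++i)
            sum += a[i];  // -O3 with AVX-512 emits vpaddd on zmm in the hot loop
        return sum;
    }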

> You can see that the most recent GCC fails to use the AVX512 zmm registers
> even after being configured to do so (afaik) and also fails to use more than
> 4 registers

In the 32-bit vpaddd code, "vpaddd zmm0, zmm0, ZMMWORD PTR [rdx]" becomes a
new zmm register allocation in the register renamer (due to a cut dependency).
I doubt any code would be faster.

In any case, it's not about "the number of registers used"; proper analysis
looks at the depth of the dependency chains. I'd only worry about using few
registers if the dependency chain were long, which doesn't seem to be the case
here.

EDIT: I had some bad analysis. I've erased the bad paragraph.

So it seems pretty good in my eyes.

~~~
CoolGuySteve
The clang code loads the 8-bit words into 16-bit vector instructions and
parallelizes using a tree algorithm to minimize dependencies.

So it's already better than "pretty good", and it's still verbose.

But neither of these compilers is able to optimize for code size while using
AVX, so ((code size)/64-byte cacheline) * (~100ns per load) will still kill
your performance on any data set smaller than a few kilobytes.

~~~
dragontamer
> The clang code loads the 8-bit words into 16-bit vector instructions and
> parallelizes using a tree-algorithm to minimize dependencies.

You specified the "int sum", so that means "sum" needs to follow 32-bit
overflow semantics. I don't think it is possible to do the whole sum in 8-bit
arithmetic without changing those semantics.

> But neither of these compilers are able to optimize for code size while
> using AVX

You can totally do that. You're specifying -O3, which is "speed over size". If
you want size, use -Os, and then add -ftree-vectorize to enable
auto-vectorization.

------
m0zg
I (selfishly) hope they succeed. NVIDIA is getting ridiculous with its
constrained supply of $1,200+ high-end GPUs.

~~~
verelo
as someone that bought AMD stock with the belief that NVIDIA was over priced
and it was about time it swung back the other way...me too!

------
kordlessagain
Related: AMD's stock has been heavily manipulated for years:
[https://seekingalpha.com/article/4153502-institutional-short-sale-amd-abusive](https://seekingalpha.com/article/4153502-institutional-short-sale-amd-abusive)

~~~
dpflan
This is very interesting. Are you aware of any more research into this?

------
Octoth0rpe
It's certainly a very small part of the overall GPU-buying community, but
nearly the entire hackintosh community has more or less abandoned/forsworn
NVIDIA due to the lack of Mojave drivers. Even on older macOS releases, the
nvidia drivers were awful.

------
xvilka
Good. NVIDIA should go bankrupt as soon as possible, because of their stance
on open source.

