
GPU folks: we need to talk about control flow - jsnell
https://medium.com/@afd_icl/gpu-folks-we-need-to-talk-about-control-flow-c20fd225197e#.t8zrzrc46
======
exDM69
Buggy shader compilers are no surprise to me, but they are a serious issue
that needs to be addressed by _all_ GPU vendors. Vulkan and SPIR-V may help a
little, as the GLSL compiler frontend is moved out of the "driver" component
and the resulting SPIR-V blob can be more easily manipulated.

There are some major issues at play:

1) Shipping an OpenGL-based product on multiple vendors' devices (especially
on mobile) requires a huge amount of QA effort and per-device bug workarounds.
Pretty much everyone makes their games on top of Unity/Unreal/etc because the
engine vendors do most of the QA work.

2) Issue #1 is made worse by the fact that many consumer devices do not get
GPU driver updates because the phone/tablet OEM end-of-lifes their products
too soon and/or the mobile operators and other middlemen don't ship updates as
they should. This means that if you ship a GPU-using product, you need to
support devices with old, buggy drivers and this is expensive.

3) There's huge untapped potential in GPUs: very few non-graphics apps take
advantage of the processing power, and I can't see that changing for the
better until GPUs become easier to target and to verify for correctness of
operation.

It would be too easy to blame GPU vendors' software engineering practices, but
I (as a GPU driver programmer) see that the bigger problem is issue #2 (not
that the GPU drivers are faultless). Even if the drivers _do_ get fixed and
updated, getting the updates to the hands of the customers is still going to
be an issue. The middlemen need to be cut out of the equation, we can't be
dependent on the business requirements of OEMs and operators when shipping
mission critical software infrastructure.

This is a bit of a chicken-and-egg problem: games are not important enough
for OEMs and operators to fix their update delivery mechanisms, but no one
dares to use GPUs for anything more important until this issue gets sorted
out.

~~~
DannyBee
You missed one of the biggest issues:

Most of the vendors know they have serious issues here, but hide them.
Literally. You never notice because they don't use open source compilers (even
for things like CUDA, let alone shader compilers), so it's not obviously more
than "just a bug" until it happens to you continuously. Instead, they pretty
much never have to fix the bug until someone notices, and then they hack it
some more and move on, instead of fixing underlying issues in their
structurization/etc passes.

Most vendors I've talked to can't even tell me what control flow breaks their
compiler (again, it doesn't matter if we are talking shaders, CUDA, you name
it - it's all broken); they know plenty does, but are fairly ¯\\_(ツ)_/¯ about
doing more than working around whatever bug they get given.

Meanwhile, over in the open-source clang/llvm world, we can basically
fuzz-test CFGs, etc. for CUDA.

The death of some of these compilers can't come fast enough.

~~~
mrb
_" You never notice because they don't use open source compilers (even for
things like CUDA, let alone shader compilers)"_

Actually, AMD recently open-sourced their entire GPGPU compute stack. See
[https://github.com/RadeonOpenCompute](https://github.com/RadeonOpenCompute)

Compiler, assembler (LLVM-based), linker, driver, etc. To my knowledge,
everything is open.

~~~
DannyBee
Actually, we wrote the CUDA support in clang :)

and yes, AMD has done this because they have nothing to lose anymore :)

------
bhouston
Found a bug on the Nexus 5 that was never fixed - struct constructors in
WebGL GLSL were completely broken:

[https://github.com/mrdoob/three.js/pull/7556](https://github.com/mrdoob/three.js/pull/7556)
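
For context, this is the sort of construct that hits that class of bug (a
minimal sketch only, not the exact three.js repro - the real case is in the
PR above; the struct name is made up for illustration):

    // GLSL ES 1.00 fragment shader. Constructing a struct with its
    // constructor is valid per the spec, but was reported miscompiled on
    // some devices; assigning the fields one by one was the workaround.
    precision mediump float;

    struct PointLight {
        vec3 position;
        vec3 color;
    };

    void main() {
        PointLight light = PointLight(vec3(0.0, 1.0, 0.0), vec3(1.0));
        gl_FragColor = vec4(light.color, 1.0);
    }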

Also, some other code that seemed completely valid failed on Nexus 5 devices,
so we had to revert it:

[https://github.com/mrdoob/three.js/pull/9948](https://github.com/mrdoob/three.js/pull/9948)

Basically, mobile GPUs are a mixed bag of bugs all over the place. Three.js
is an attempt to find a common path through those bugs in order to achieve
reproducible results.

If you want to try your hand at this, there is still this outstanding issue
that we haven't yet tracked down on the Nexus 5 devices:

[https://github.com/mrdoob/three.js/issues/9515](https://github.com/mrdoob/three.js/issues/9515)

------
mattst88
I see "Next stop: Intel."

I sincerely hope that you test our Linux drivers (I work for Intel on them).
We'd be excited to learn about anything you find. Please feel free to file
bugs here
([https://bugs.freedesktop.org/enter_bug.cgi?product=Mesa&comp...](https://bugs.freedesktop.org/enter_bug.cgi?product=Mesa&component=Drivers/DRI/i965))
and ping us on #intel-gfx on Freenode.

~~~
wolfgke
> I sincerely hope that you test our Linux drivers

I sincerely hope that they test _both_ the Linux _and_ the Windows drivers.
It is well known that, at least in the past, these drivers were developed by
two completely different teams, so the results might be quite different and
thus interesting.

~~~
bhouston
There was a ThreeJS bug that differed based on whether it was a Linux or a
Windows Intel driver -- specifically this one:

[https://github.com/mrdoob/three.js/issues/10331](https://github.com/mrdoob/three.js/issues/10331)

The Linux Intel driver didn't reproduce it, but the Windows Intel driver did.
Windows Intel acted like Adreno GPUs, while Linux Intel behaved like NVIDIA
GPUs. Fun times.

------
Sarkie
This is from 2013 but still a fun read.

[https://dolphin-emu.org/blog/2013/09/26/dolphin-emulator-and...](https://dolphin-emu.org/blog/2013/09/26/dolphin-emulator-and-opengl-drivers-hall-fameshame/)

~~~
jsheard
Also this, from the guy who ported Valve's Source engine to OpenGL:
[http://richg42.blogspot.co.uk/2014/05/the-truth-on-opengl-dr...](http://richg42.blogspot.co.uk/2014/05/the-truth-on-opengl-driver-quality.html)

The vendors are Nvidia, AMD, Intel/Linux and Intel/Windows respectively. Yes,
Intel maintains two completely separate OpenGL implementations.

------
_pdp_
Depending on the rendered garbage, this could have security implications!

It is pretty easy to capture the image from a canvas. The rendered garbage
could reveal buffered adjacent/random memory (requires research). The memory
can be extracted from the image with the right interpretation.

------
greggman
You can be proactive in getting these kinds of issues fixed. Make a small
repro and submit it to the WebGL conformance tests.

[https://github.com/KhronosGroup/WebGL](https://github.com/KhronosGroup/WebGL)

There are already something like 2500 tests.

------
raverbashing
Mobile GPU vendors seem to be firmly in the "make some games and common apps
work, and to heck with the specs and all the rest" camp.

AMD and NVIDIA seem to have this better worked out.

~~~
willvarfar
This is a series of articles (previous entries have been on HN in recent
weeks) that finds bugs in other vendors too.

AMD, for example: [https://medium.com/@afd_icl/first-stop-amd-bluescreen-via-we...](https://medium.com/@afd_icl/first-stop-amd-bluescreen-via-webgl-and-more-ba3eaf76c5fb#.z7cgofrih)

They are doing the suppliers alphabetically, so they'll get to NVIDIA in good
time :)

What made you think that driver stability was a mobile problem? :D

~~~
raverbashing
> What made you think that driver stability was a mobile problem? :D

Not an exclusive problem, but it seems to be worse on those platforms :)

------
vvanders
This shouldn't be a surprise to anyone who's worked on mobile GPUs: no one
uses conditionals in shader code, since the performance hit is gnarly (and it
used to be even worse before hardware branching support). It's a fairly
uncommon code path.
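
A minimal sketch of the usual workaround (function and parameter names are
made up for illustration): compute both values and select with arithmetic
instead of branching, so every fragment runs the same instructions:

    // Branchy version - divergent, and historically slow on mobile GPUs:
    vec3 shadeBranchy(float mask, vec3 lit, vec3 shadow) {
        if (mask > 0.5) {
            return lit;
        }
        return shadow;
    }

    // Branchless version - step() + mix() instead of an if:
    vec3 shadeBranchless(float mask, vec3 lit, vec3 shadow) {
        // step(0.5, mask) is 1.0 when mask >= 0.5, otherwise 0.0
        return mix(shadow, lit, step(0.5, mask));
    }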

As far as these bugs go, they aren't that bad; I've seen a highly regarded
vendor's shader compiler die and bring down the whole Android stack, which
resulted in an insta-reboot.

If it isn't in the rendering path for Android (HWUI) or Chrome, then it's
best to tread carefully on mobile GPUs.

------
rosstex
Could someone briefly explain why seemingly innocuous changes to the code can
have such large effects on the resulting images?

------
jlebar
Note that GPGPU compilers have the same sorts of problems; see e.g.
[https://llvm.org/bugs/show_bug.cgi?id=27738](https://llvm.org/bugs/show_bug.cgi?id=27738).

------
ingenter
Would it be better if GPU drivers could compile an open-spec bytecode and
upload the result to the GPU to do all of the computation? That way, OpenGL
could be used as a library shipped with the application.

~~~
wolfgke
Vulkan does this - you can upload shaders in SPIR-V (open-spec bytecode) to
the GPU.

~~~
Athas
Sadly, the translation from SPIR-V to system-dependent machine code can still
be buggy. Hopefully, though, most of the optimisation will take place at the
SPIR-V level, which, as I understand it, is pretty similar to LLVM IR. That
should enable reuse of thoroughly debugged code, instead of each vendor
maintaining their own full compiler.

~~~
wolfgke
> Sadly, the translation from SPIR-V to system-dependent machine code can
> still be buggy.

This is true, but it eliminates at least one of the points where things can
go wrong. Additionally, this approach has the advantage that developers (with
some practice) can read the SPIR-V "assembly" code to make sure it is correct.
With existing solutions, it was already hard to get hold of and interpret the
intermediate code to find out whether the problem was in the frontend or the
backend.

------
willvarfar
Excellent article, but not the one I was expecting ;)

From the title, I thought this would be a post about warp/wavefront
divergence :)

~~~
bhouston
Me too. I was hoping for better coordination of warps.

------
jwatte
To be fair, the reason GPUs can be fast on GPU workloads is that the hardware
makes totally different assumptions about typical control flow across
processed pixel tiles. You can't run Call of Duty or TensorFlow at the speeds
we see today on x86-style control flow. (Knights Landing proved that!)

Bugs are annoying, but so are mismatched programmer expectations.

~~~
14113
There's a difference between speed and correctness. If the spec promises
something (e.g. that unreachable control flow doesn't matter), then in
practice that promise should be upheld. If it's upheld with a 32x slowdown
(e.g. due to warp divergence), then so be it, but the code should still run
correctly according to the spec.
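
As a rough illustration of what "unreachable control flow not mattering"
means here (this mirrors the kind of semantics-preserving change being
discussed, not the article's exact transformations; the uniform name is made
up):

    precision mediump float;
    uniform float uAlwaysZero; // hypothetical uniform, always 0.0 at runtime

    void main() {
        vec3 color = vec3(0.5);
        if (uAlwaysZero > 1.0) {     // never true in practice
            color = vec3(123456.0);  // dead code; must not affect the output
        }
        gl_FragColor = vec4(color, 1.0);
    }

Per the spec, adding the dead branch must not change the rendered image;
changes like this are the sort the article reports tripping up real drivers.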

------
nix0n
GPUs do not actually have control flow.

They are SIMD vector processors (Single Instruction, Multiple Data).

CUDA and OpenCL make this a bit more explicit.

~~~
wolfgke
> GPUs do not actually have control flow.

Wrong. Let's look at the instruction set of the AMD GCN3 ISA:

> [http://gpuopen.com/compute-product/amd-gcn3-isa-architecture...](http://gpuopen.com/compute-product/amd-gcn3-isa-architecture-manual/)

which links to

> [http://32ipi028l5q82yhj72224m8j.wpengine.netdna-cdn.com/wp-c...](http://32ipi028l5q82yhj72224m8j.wpengine.netdna-cdn.com/wp-content/uploads/2016/08/AMD_GCN3_Instruction_Set_Architecture_rev1.1.pdf)

which even has a whole chapter about control flow: "Chapter 4 Program Control
Flow".

~~~
andars
> Wrong.

I'd say it's more like "not quite". ~10 years ago, the statement was pretty
much correct. GPUs were entirely SIMD, so they couldn't truly branch, but
could fake it with predication. Longer branches would involve executing both
branches on every thread and only committing the results of the active branch.

Modern GPUs can do much better, since individual warps/wavefronts can truly
diverge. Within warps, however, it's still a bit of a mess.

~~~
wolfgke
> ~10 years ago, the statement was pretty much correct.

If one looks at the capabilities of CPUs from 10 years ago, one can easily
justify similarly outdated statements about CPUs.

