
Low-Level Thinking in High-Level Shading Languages (2013) [pdf] - pablode
http://www.humus.name/Articles/Persson_LowLevelThinking.pdf
======
dahart
> Nowadays renowned industry luminaries include shader snippets in their GDC
> presentations where trivial transforms would have resulted in a faster
> shader... When the best of the best aren't doing it right, then we have a
> problem as an industry.

This is an amazing presentation, and there are many valuable points on how to
optimize, but I don't love this "motivation"; the presentation would be better
without it, IMO. "Here's how to make the fastest shaders when you need them"
would feel better to me than "you should _always_ optimize, even if that's not
your immediate goal".

The best are "doing it right", but a presentation isn't always the place to do
it right. A GDC presenter should be making concepts easier to understand, and
optimizing makes things harder to understand. Optimizations are often
hardware-, compiler-, platform-, and language-specific.

On top of that, the most important thing to do when optimizing a shader is to
profile the thing. Memorizing every possible instruction and trying to out-
think the compiler and hardware will lead to surprises, guaranteed. I just hit
a case two days ago where I optimized an expression from 24 flops to 13, and
the shader slowed down by 15%. The reason was data dependencies that I
couldn't see, even in the assembly.
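
To illustrate the kind of tradeoff involved (a contrived HLSL sketch, not the
actual expression from my shader): Horner form minimizes operations but
serializes them, while an Estrin-style split spends an extra multiply in
exchange for a shorter dependency chain.

    // Hypothetical cubic polynomial evaluation; k0..k3 are made-up names.
    float cubic_horner(float x, float k0, float k1, float k2, float k3)
    {
        // Minimal ops (3 mads), but each step depends on the previous one.
        return ((k3 * x + k2) * x + k1) * x + k0;
    }

    float cubic_estrin(float x, float k0, float k1, float k2, float k3)
    {
        // One extra multiply, but the two halves are independent, so the
        // hardware can overlap them.
        float x2 = x * x;
        return (k3 * x + k2) * x2 + (k1 * x + k0);
    }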

~~~
zdfjkhiuj
>The reason was data dependencies that I couldn't see, even in the assembly.

I don't understand why that would matter. Aren't GPUs in-order? I don't know
the low-level architecture of GPUs at all.

~~~
AstralStorm
They are parallel; a data dependency cannot be pipelined as easily.

So no, they are not in-order.

Radeons do not have speculative execution, but that does not make them in-
order.

~~~
fyi1183
Radeon compute units are in-order, though, in the usual sense of the word, and
I'd love to hear about it if there really were an out-of-order GPU. It'd be
rather surprising.

One mechanism they do have for dealing with data dependencies is that load
(and texture fetch, etc.) instructions don't block. Instead, there's a
separate instruction for waiting on the result of a previous load.
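
Roughly what that looks like in GCN-style assembly (a hand-written sketch for
illustration, not actual compiler output):

    buffer_load_dword v0, v1, s[4:7], 0 offen  ; issue the load, doesn't block
    v_mul_f32 v2, v3, v4                       ; independent math overlaps it
    s_waitcnt vmcnt(0)                         ; wait on the outstanding load
    v_add_f32 v5, v2, v0                       ; now safe to use the result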

------
xchip
Nice that you came across this presentation; just a small note to say that
pretty much everyone in the industry knows about it :)

~~~
bradford
I write shaders as a hobbyist game developer (I'm not in the gaming/graphics
industry), and I appreciated the submission. Shader code is often
counterintuitive: I tend to think like a mathematician rather than an
optimizer when I read code examples, and I always wonder why the code is
written the way it is. This paper makes it clearer!

One question: on slide 13, the author outlines some of the limitations on the
compiler. In the ~4 years since the paper was published, have compilers gotten
better at solving some of these problems, or are these fundamental problems
that will more or less always be present?

~~~
unsigner
Don't count on it; write it out (unless it's much uglier, in which case still
write it out, but leave the original code in a comment). New architectures,
new platforms, a mess of shader languages, and the practice of customizing
shaders for big titles lead to driver/compiler teams being chronically
understaffed and overworked.
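
For instance (my own toy example, in the spirit of the slides): compilers
generally won't reassociate floating-point math, since the rounding can
differ, so a mad-friendly form has to be written out by hand:

    // Readable original, kept in a comment per the advice above:
    //   float3 n01 = (n + 1.0) * 0.5;
    // Written out as one mad (multiply-add) per component:
    float3 n01 = n * 0.5 + 0.5;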

