This is an amazing presentation with many valuable points on how to optimize, but I don't love this "motivation"; the presentation would be better without it, IMO. "Here's how to make the fastest shaders when you need them" would feel better to me than "you should always optimize, even if that's not your immediate goal".
The best presenters are "doing it right", but a presentation isn't always the place to do it right. A GDC presenter should be making concepts easier to understand, and optimizing makes things harder to understand. Optimizations are often hardware-, compiler-, platform-, and language-specific.
On top of that, the most important thing to do when optimizing a shader is to profile the thing. Memorizing every possible instruction and trying to out-think the compiler and hardware will lead to surprises, guaranteed. I just hit a case two days ago where I optimized an expression from 24 flops to 13, and the shader slowed down by 15%. The reason was data dependencies that I couldn't see, even in the assembly.
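A classic illustration of how fewer flops can still lose (hypothetical numbers, not the actual shader from my case): Horner's rule evaluates a polynomial with the fewest operations but forms one long serial dependency chain, while Estrin's scheme does slightly more work but exposes independent sub-expressions that a pipelined machine can overlap.

```python
# Illustrative only: fewer operations does not always mean faster.

def horner(c, x):
    # ((c3*x + c2)*x + c1)*x + c0 -- 6 ops, one serial chain of length 6
    return ((c[3] * x + c[2]) * x + c[1]) * x + c[0]

def estrin(c, x):
    # (c0 + c1*x) + (c2 + c3*x)*x^2 -- 7 ops, but x*x and the two
    # halves are independent and can execute in parallel
    x2 = x * x
    return (c[0] + c[1] * x) + (c[2] + c[3] * x) * x2

coeffs = [1.0, 2.0, 3.0, 4.0]
print(horner(coeffs, 2.0))  # 49.0
print(estrin(coeffs, 2.0))  # 49.0
```

Both compute the same value; which one is faster depends on the dependency structure the hardware sees, not on the op count. The profiler, not the instruction count, gets the final say.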
I don't understand why that would matter. Aren't GPUs in-order? I don't know the low-level architecture of GPUs at all.
If you want to understand low-level GPU architecture, https://fgiesen.wordpress.com/2011/07/09/a-trip-through-the-... is a great intro.
So no, they are not in-order.
Radeons do not have speculative execution, but that does not make them in-order.
One mechanism they do have for dealing with data dependencies is that load (and texture fetch, etc.) instructions don't block. Instead, there's a separate instruction for waiting on the result of a previous load.
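A rough sketch of what this looks like in AMD GCN-style assembly (illustrative pseudocode, not exact syntax; mnemonics and counters vary by generation):

```
buffer_load_dword v0, ...   ; issue the load; execution does not block
v_mul_f32 v2, v3, v4        ; independent ALU work keeps going
v_add_f32 v2, v2, v5
s_waitcnt vmcnt(0)          ; explicitly wait for outstanding loads
v_mul_f32 v1, v0, v2        ; now it's safe to use the loaded value
```

The compiler tries to pull as much independent work as possible between the load and the wait, which is exactly where invisible data dependencies can make a "cheaper" expression slower.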
GPUs are (generally) in-order within each thread, but they are pipelined. The pipeline is filled with instructions that are ready to execute from across many threads. If all threads have an unmet dependency (previous instruction or memory access), the pipeline will stall.