Slide #33 is actually reasonable advice. Variable shift = 11 cycle latency on PS3/Xbox360, and it blocks both threads and disables interrupts as it runs. (Will the compiler figure this out? Maybe, maybe not. But if you write the code you want - which in this case you might as well, since the transformation is simple - then you won't have to worry about that. As a general point of principle you should write the code you want, rather than something else that you hope will become the code you have in mind; computers are very bad at figuring out intent, but excellent at simply following instructions.)
What are the other bogus suggestions? The overall thrust of the slides seems to me valid: know what's expensive (branches, memory access, mini-pitfalls like the micrododed thing), pick a strategy that avoids all of that, and don't throw away performance. Performance always ends up being an issue; you don't have to carefully preserve every last cycle, but that's no excuse for just pissing away cycles doing pointless stuff.
Not explicitly stated, but apparent from comparing the suggested approach to the original one, is that you can't always just leave this stuff to the last minute, when you've had the profiler tell you what's a bottleneck and what isn't. The requisite changes might have far-reaching consequences, and so it's worth giving a bit of thought to performance matters when you start and things are a bit more in flux.
A bit of domain knowledge also won't hurt. I bet if this function were ten times worse, but called "PostDeserializeFromDVD", it wouldn't attract very much attention.
> but that's no excuse for just pissing away cycles doing pointless stuff.
Yes there is - maintainable code, programmer time and effort. What are you worried about, a large power bill due to your cpu doing what is supposed to be doing?
On an unrelated note this kind of attitude is the first thing I test for during a programmer interview, and is my strongest case for eliminating potential candidates. I made the mistake once of letting one through - his first task was a relatively straightforward modification to a large C program. A week later I was a bit worried he hadn't reported back done, so I went to check up on him and it turns out he was busy changing every line of code to adjust formatting and variable names, not to mention these kinds of pointless micro-optimizations. And he hadn't even checked in once, he was saving the checkin itself for another multiple day effort. Sigh. I tried using the "premature optimization is the root of all evil" line on him to get my point across (and see if he had heard of it), and it was when I saw his eyes flare up in anger I knew he had to go. Sad really because he was otherwise quite bright.
Now I basically put C++/game programmer applications in a "special pile" to be considered as a last resort. I just dont need this kind of arrogance and cowbow mentality wrecking the place. Its like sending a bull into a china shop.
If performance is a requirement, it's a requirement, and you need to bear it in mind. And working in games, it usually is. Virtually every project has problems with performance, and dealing with the issues properly at the end of a project can be very hard. By that point, the code is usually insufficiently malleable to be safely transformed in the necessary fashion, and there's a very good chance that you'll introduce new bugs anyway (or cause problems by fixing existing ones).
So, armed with a few simple rules of thumb about what is particularly expensive (let's say: memory accesses, branching, indirect jumps, square root/trig/pow/etc., integer division), and a bit of experience about which parts tend to cause problems that can be rather intrusive to fix (and object culling is one of these), one might reasonably put in a bit of forethought and try to structure things in a way that means they're relatively efficient from the start. Better that than just producing something that's likely to be a bottleneck, but written in a way that means it is never going to be efficient, whatever you do to it, without a ground-up rewrite. And that's the sort of approach the slide deck appears to be advocating.
Seems uncontroversial, to my mind. Some (and I'd agree) might even just call this planning ahead. Seems that when you plan ahead by drawing up diagrams of classes and objects, because you've been burned by having people just diving in and coming up with a big ball of spaghetti, that's good planning ahead. But when you plan ahead by trying to ensure that the code executes in an efficient manner, because you've been burned by having people come up with slothful code that wastes half its execution time and requires redoing because of that, that's premature optimisation, and a massive waste of time.
As with any time you make plans for the future, sometimes you get it wrong. Ars longa vita brevis, and all that.
> Variable shift = 11 cycle latency on PS3/Xbox360
That's not the issue here. It doesn't matter if the variable shift has to get its results by carrier pigeon from a monk in Tennessee, because the variable shift is eliminated by the compiler. When the programmer is manually doing work that has already been automated, your process is inefficient.
I tried some super-simple test code and ran it through the PS3 compilers... but was unable to get either to turn the variable-width shift into a fixed-width one. With the right flag included, gcc was even good enough to warn me about the microcoded instruction.
I also tried gcc (OS X, x64) and llvm (OS X, x64/ARM) and they didn't do the transformation either. (I'm not sure I would expect this for ARM anyway, but for x64 the variable shift code looked a bit more involved than I was expecting. Perhaps a transformation into a fixed-width shift would be beneficial in that case as well.)
Compile options were "-O3" for SNC (this seems to be about your lot as far as options go) and "-O6 -fstrength-reduce" for gcc (obviously I could spend all day fiddling with all the possible options, but I won't, which I admit is a flaw in my experiment - but I believe snc is supposed to produce much better code anyway). And in both cases, the code for `f' includes a variable shift, and the code for `f2' didn't.
Still, I would stand by my maxim even if the data in this case were against me. It's the winning strategy. Why rely on the compiler's ability to second-guess your intentions, when you could just tell it what you want, straight up?
While, I'm normally of the school of thought to let the compiler do the optimization. Modern compilers often miss what would seem like rather trivial optimizations, And often due to assumptions that the language spec won't allow that the programmer otherwise can make.
I'm not sure there's a point in testing on x64 because I'm not sure that changing to a fixed shift is actually an optimization. I think that the variable shift was slow back in the Pentium days but it's really a thing of the past.
I'm honestly surprised that the variable shift didn't get fixed. This strength reduction will happen if you use a multiply instead of a shift: the multiply will get reduced to an addition. I had assumed that the same would hold for variable and fixed width shifts.
I would file this as a bug against the compilers in question. I don't think it would take very long to fix.
Keep in mind that for games you are often targeting three different consoles plus PC. You may also port to other platforms eventually, and won't be able to easily track down all these lines of code at a later date to guarantee they optimize down correctly on your new target.