Hacker News new | comments | show | ask | jobs | submit login

> Variable shift = 11 cycle latency on PS3/Xbox360

That's not the issue here. It doesn't matter if the variable shift has to get its results by carrier pigeon from a monk in Tennessee, because the variable shift is eliminated by the compiler. When the programmer is manually doing work that has already been automated, your process is inefficient.

I tried some super-simple test code and ran it through the PS3 compilers... but was unable to get either to turn the variable-width shift into a fixed-width one. With the right flag included, gcc was even good enough to warn me about the microcoded instruction.

I also tried gcc (OS X, x64) and llvm (OS X, x64/ARM) and they didn't do the transformation either. (I'm not sure I would expect this for ARM anyway, but for x64 the variable shift code looked a bit more involved than I was expecting. Perhaps a transformation into a fixed-width shift would be beneficial in that case as well.)

Test code:

    float f(const float *f,unsigned mask){
        unsigned i;
        float t=0.f;
        return t;
    float f2(const float *f,unsigned mask){
        unsigned i,m;
        float t=0.f;
        return t;
Compile options were "-O3" for SNC (this seems to be about your lot as far as options go) and "-O6 -fstrength-reduce" for gcc (obviously I could spend all day fiddling with all the possible options, but I won't, which I admit is a flaw in my experiment - but I believe snc is supposed to produce much better code anyway). And in both cases, the code for `f' includes a variable shift, and the code for `f2' didn't.

Still, I would stand by my maxim even if the data in this case were against me. It's the winning strategy. Why rely on the compiler's ability to second-guess your intentions, when you could just tell it what you want, straight up?

Thanks for doing these tests, Super insightful.

While, I'm normally of the school of thought to let the compiler do the optimization. Modern compilers often miss what would seem like rather trivial optimizations, And often due to assumptions that the language spec won't allow that the programmer otherwise can make.

I'm not sure there's a point in testing on x64 because I'm not sure that changing to a fixed shift is actually an optimization. I think that the variable shift was slow back in the Pentium days but it's really a thing of the past.

I'm honestly surprised that the variable shift didn't get fixed. This strength reduction will happen if you use a multiply instead of a shift: the multiply will get reduced to an addition. I had assumed that the same would hold for variable and fixed width shifts.

I would file this as a bug against the compilers in question. I don't think it would take very long to fix.

Thanks for this.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact