Hacker News new | comments | show | ask | jobs | submit login
(x / 2) or (x >> 1)? (stackoverflow.com)
67 points by pykello on June 8, 2012 | hide | past | web | favorite | 41 comments



In general, you're not smarter than your compiler[1].

Even if you're smarter than your compiler it's usually not worth the readability trade-off.

These sorts of optimisations only ever make sense if you're doing something incredibly intensive and you've tested it on real world data. This is rarely the case and the code suffers.

[1]: "Source Code Optimization" => http://www.linux-kongress.org/2009/slides/compiler_survey_fe...


Thanks for this. It's not the 90s anymore, that's for sure.

I just tried my age-old adage of doing bit shifts and adds instead of multiplications. Turns out that's slower on an i7!

The cool thing is that llvm-g++ actually recognizes my bad hand-optimization on -O3 and fixes it.

Code at https://gist.github.com/2896137


>These sorts of optimisations only ever make sense if you're doing something incredibly intensive and you've tested it on real world data. This is rarely the case and the code suffers.

Or are working on something incredibly small with an incredibly stupid compiler.


Perhaps thinking about it as an optimisation and not a standard operation is not the best way to look at it.

For example 99% of the time I want a binary shift and not a /2.In a dynamically typed language 201/2 will give you 100.5 when you'd prefer to keep it integer.


"In a dynamically typed language 201/2 will give you 100.5"

Not necessarily. Python gives 100, PHP gives 100.5, Perl gives 100.5, Ruby gives 100.

Edit: Versions are Python 2.7.3, PHP 5.3.10, Perl 5.14.2 and Ruby 1.8.7, all as shipped with Ubuntu Precise.


Python 3 gives 100.5. Python 2 gives 100 (or 100.5 if you use "from __future__ import division").


But using >> for division is still a bad idea. If you want integer division, use //.


Um, no.

Lisp will give you 201/2.


I agree, except you must be intelligent enough to use the correct algorithm for the job.


Not long ago, I was doing something that needed to be fast and I tried the old the "x>>1" trick and timed it compared to x/2. Much to my surprise "x/2" was faster (which is why you should always time things instead of assuming).

The takeaway is not that "x/2 is faster than x>>1", instead it's that you shouldn't assume some code it faster - always time it.


Nobody would make this argument if, during testing, it transpired that reading files over a 100Mbit network connection was faster than reading files from the local SSD. Sure, one should always measure rather than pontificate. And yes, in this hypothetical case, one has measured, and it truly turns out it's quicker... to read over the network. But one should always consider: do the results make any sense whatsoever at all, or do they suggest that further investigation might be warranted?

And similarly, it would be an unusual CPU indeed - so unusual in fact that you'd probably already know that you were working on it, and would mention it as part of this crazy story of the time that divisions were faster than shifts - that managed to mess up single-bit shifts so badly that they were slower than an integer division.

Perhaps in a higher-level language, one might find that x>>1 were faster than x/2, for some reason unrelated to the relative speeds of the two operations. But if one is working in C or C++ (and I assume this is the case, if one is bothering with stuff like replacing divides with shifts), and yet the shift is still the slower, then this is most unexpected, and probably evidence that something has gone wrong.

(The division algorithm is inherently more complicated than a shift, and so divisions are always going to take more time than a shift. You can expect a shift to take 1 cycle on a modern CPU. A division would take more like 20. It will probably be the slowest integer instruction possible, and it might not be accessible from every pipeline. Sample division latencies for vaguely modern CPUs: 20+ cycles on x86, ~35 cycles on MIPS, ~20 cycles on POWER, ARM doesn't seem to have it. Shifts, on the other hand - always 1 cycle. ARM even gives them away for free.)


I just noticed your response. The machine I timed it on was an older Lenova laptop with some sort of dual core Intel processor - indeed not an "unusual" CPU.

In the end, it's results are what matter even if you (or I) don't understand why. For all I know, the compiler parallelized some of the code with divisions and didn't for the shift using code. The thing is, you can't just look at what "should" be fastest and my experience was a good reminder of that.

In the old days, one optimization technique we used was to "unroll loops" i.e. instead of:

for (int i=0; i<8; i++) a[i] += b[i];

We'd do this:

// Unrolled loop a[0] += b[0]; a[1] += b[1]; a[2] += b[2]; ... a[7] += b[7];

The unroll loop has less clock cycle according to the book. Great - unless the larger "faster" code no longer fits in the CPU cache, now it's actually slower. The only way to tell is to measure it.

Here is the code. Substitute ">>1" for the "/2" portions and compare.

// pBuffer contains a single horizontal line in a bitmap // destBuffer will contain the first derivative convolution i.e. if you graph it, it would look like peaks at the edges of things in the image

void Convolution1D_FastFirstDerivative(int* pBuffer, int sizeOfBuffer, int destBuffer) { int lastPos = sizeOfBuffer - 1; int isum = 0;

// Special case start of buffer where filter "hangs off edge" isum = -pBuffer[0] / 2; // 0.5 //sum += pBuffer[0] * 0; // * 0 isum += pBuffer[1] / 2; destBuffer[0] = isum;

for (int i=1; i<sizeOfBuffer-1; i++) { isum = -pBuffer[i-1] / 2; // * -0.5 //sum += pBuffer[lastPos] * 0; // * 0 isum += pBuffer[i+1] / 2; // * 0.5 destBuffer[i] = isum; }

// Special case end of buffer where filter "hangs off edge" isum = -pBuffer[lastPos-1] / 2; // * -0.5 //sum += pBuffer[lastPos] * 0; // * 0 isum += pBuffer[lastPos] / 2; // * 0.5 destBuffer[lastPos] = isum;

}

Sorry, I wish I could fix the formatting.


They don't have the same results if x is signed (-1 >> 1 gives you -1; -1 / 2 typically gives you 0). A compiler worth its salt will optimize x / 2 appropriately. For unsigned, it will emit the shift; for signed, it will generate (x + 1) >> 1.


(x+1)>>1 isn't equivalent to x/2. It gives the wrong result for positive numbers like (x=1).

Compilers actually generate (x+(x>>31))>>1 for signed division by 2. Which on x86 is three more instructions and requires an additional register.

Using x>>1 makes sense if you want floored division instead of truncating towards zero. Floored division is better if you want to reduce precision of the input or something like that and don't want to bias towards zero.


Most disappointing was the fact that the top comment focused on x=x/2 vs x/=2, the latter of which I believe is more confusing than not. In any event, none of the three options ought to make a difference when you have the luxury of a compiler, which can choose the "best" way to do it.

When you're programming a microcontroller YOU are often the compiler and therefore need to know the relative clock times for each command. I remember one chip where the division command could take 100 cycles while a shift was 2. Big difference there, and you'd definitely pick the shift.

One of the later comments there focused on actually decompiling the resulting code. I was surprised that the compiler did not produce the same output.


That's what I don't like about stackoverflow, this type of "viral" questions that get a lot of attention while the good ones often get lost in the noise.


I'm always surprised when I look at the top questions and it's full of stuff like "why does $#^$&%*# work in perl" or "what does c=c++-1 equal."


Not only that, its also completely irrelevant for 99% of developers. My bet is that most of the (currently) 167 up votes did not have this specific problem.


The Shift vs Division question was also already asked many times on Stackoverflow in the past.


It's a bad question, though. Either you know the answer, or the peephole optimizer will do a better job than you will, and therefore you should use what's most readable. I agree with GP that these kinds of questions are unfortunate noise at stackoverflow.


Many folks commented on the difference between

  x /=2;
and

  x = x / 2;
I object to the combined operator (x /= 2) because it is ill-conceived as a language feature. It only works for a half-dozen cases the language writer chose. I can't extend it to my operations. Why would I do that? Lets assume instead we define '@' as a symbol for 'LHS' (Left-Hand-Side). It now looks like this:

  x = @ / 2;
How does that add value? Imagine 'x' is '(int)(aRec[RecIndex(p->handle)]->itemCountRef)'

I know, could do that already with /= 2; But consider we wanted to do

  x = x / (x+1);
No help in C. But with '@' it becomes

  *(int*)(aRec[RecIndex(p->handle)]->itemCountRef) = @ / (@+1);
Much clearer than the alternative

  *(int*)(aRec[RecIndex(p->handle)]->itemCountRef) = *(int*)(aRec[RecIndex(p->handle)]->itemCountRef) / (*(int*)(aRec[RecIndex(p->handle)]->itemCountRef)+1);
Sure, you can add declarations of intermediate values in C now. But I'm not sure that adds clarity in the same way.


What would this mean?

    x = @ + (y = @+1)
Would this be legal?

    x = 1 + (@ = @ - 1)
etc...


Both could have reasonable definitions. Also a serious concern; even in C simple expresions remained undefined for a decade e.g. foo(i, i++, i--) was implementation-defined for years.

For that matter, in the 1st C implementation the combined-assignment operators were defined this way: i =+ 1; until they noticed it was ambiguious. C has a long history of ambiguity.

So maybe its good that C was no more ambitious that it was.


Why stop there? Add

  x = (@++? @ : @ /= 2);


> How does that add value? Imagine 'x' is '(int)(aRec[RecIndex(p->handle)]->itemCountRef)'

That's a gnarly code smell and a Law of Demeter[1] violation. I suspect you could find a better way to express that operation without requiring that a single line of your code know so much about so many different things (aRec, aRec members, the return value of RecIndex, p, aRec member -> itemCountRef).

1: http://c2.com/cgi/wiki?LawOfDemeter


I guess it still adds value (potentially? Maybe less so with IDEs and maybe not worth the language complexity tradeoff) if you use very long variable names, which do help somewhat with readability.


I think you're missing the point.


Am I? The language doesn't need to make affordances for bad code, bad code needs to be refactored into better code.


Oh for heaven's sake, use some other complex reference then.


Opening "Hacker's Delight", which you (as a C++ programmer) surely have in your collection next to "Code Complete 2", we find:

"Apparently, many people have made the mistake of assuming that a shift right signed of k positions divides a number by 2^k"

Surprisingly, the top few answers do not give the correction for negative numbers, which in "Hacker's Delight", is to add 2^k-1 to negative numbers before shifting.

Obviously, this ignores optimization and readability arguments, which although important, are probably less important than correctness.


I answered almost exactly the same question on Stack Overflow once, except the asker was comparing (x / 2) to (x * 0.5). Someone had previously advised him to use the latter, on the theory that multiplication is cheaper than division. I disassembled both to show that (x * 0.5) is absurdly less efficient: http://stackoverflow.com/questions/895574/what-are-some-good...


In my work, it's generally better to optimize for clarity rather than performance. I suspect this is true more and more as compilers grow in sophistication. In this vein, use:

x = x / 2

To DIVIDE, use the DIVISION operator.


It really depends on what you are doing. I do embedded work on a 40 MIPs PIC dsp. When i want to smooth A/D readings I collect 32 by adding them each time an interrupt is called, and when i reach a count of 32 I use: smooth_value =sum >>5

This is a case where shifting is worthwhile.


its 2012 dammit. x/2


I often find myself using 1 << Y when I mean 2^Y, but just because it is faster to write and generally accepted in all languages than to find pow in the current language I am writing in...

Edit: fixed


You're computing 2^Y there.

pow is usually a floating point operation, that's why it's slow. Modern CPU's may actually be fast at a pow floating point operation, but it's still expensive to move integers from general purpose registers to floating point registers.


You generally calculate 2^Y because you want to work with a Y-bit quantity somehow. Like generating a mask with the Yth bit set, calculating the size of a Y-bit address space, allocating storage for a power set of Y items, ... In these cases it's OK because bits are actually the thing that matters. With division, they usually aren't.


I prefer x/2, the operator precedence is much clearer.


If you're using floating point, x * 0.5 is faster


Given that the question indicates that bit-shifting x would result in a valid answer, I think we can rule out floating point.


I think this is the same these days, due to smarter compiler.




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: