Bit Twiddling Hacks | 60 points by wglb on May 21, 2011 | 14 comments

 It is amazing what you can accomplish with bit twiddling. I've been working on a Protocol Buffer decoder, and I was looking for ways to optimize varint decoding. On a Google-internal list I asked how you'd transform the 64-bit integer:

```
0aaa aaaa 0bbb bbbb 0ccc cccc 0ddd dddd 0eee eeee 0fff ffff 0ggg gggg 0hhh hhhh
```

into:

```
0000 0000 aaaa aaab bbbb bbcc cccc cddd dddd eeee eeef ffff ffgg gggg ghhh hhhh
```

Not two hours later I had several impressive transformations that I never would have come up with on my own. The two most effective:

```
b = ((b & 0x7f007f007f007f00) >> 1) | (b & 0x007f007f007f007f);
b = ((b & 0xffff0000ffff0000) >> 2) | (b & 0x0000ffff0000ffff);
b = ((b & 0xffffffff00000000) >> 4) | (b & 0x00000000ffffffff);
```

and:

```
b += b & 0x007f007f007f007fULL;
b += 3 * (b & 0x0000ffff0000ffffULL);
b += 15 * (b & 0x00000000ffffffffULL);
```

I don't know how these guys come up with this stuff.
 "b += 15 * (b & 0x00000000ffffffffULL)" might look more magical than it actually is. The crux is:

```
b * 4
```

is the same as

```
b << 2
```

thus

```
b += 3 * b
```

is the same as

```
b = b << 2
```

So an add, a multiply, and a mask make it possible to shift only a subset of the bits in the integer. The first transformation uses two masks and a bitwise OR to achieve the same.
 Whenever a discussion on rearranging bits and bytes comes up, I can't help but recommend Hacker's Delight by Henry S. Warren http://amzn.com/0201914654
 Yep, I just got that from Amazon a few days ago. I'm a Python/Web programmer in my day job, but I'm looking to get more into the low level stuff. I recently got a Nerdkit too - lots of fun :)
 Bit-fiddling at the C level isn't necessarily faster on all platforms. E.g., the "merge bits according to a mask" case, where he says that r = a ^ ((a ^ b) & mask) is one operation shorter than r = (a & ~mask) | (b & mask). On ARM, the (a & ~mask) can be performed by one instruction (BIC), so the instruction count is actually the same, and the latter way of doing it parallelizes better (a dependency depth of only two instead of three). I.e., the "optimized" version is actually likely to be slower on some CPUs.
 I had no idea there was a patented way to compute the absolute value of an integer. Insane.
 Nowadays things like these should only be interesting to compiler writers.
 Believe it or not, there are still programs that require blazingly fast bit operations. I used like ten different bit hacks for a game engine I wrote which uses bitboards[1].
 Bitwise operations pop up fairly often in advanced data structures (Hash Array Mapped Tries, Bloom Filters, etc). The big advantage in my view is being able to perform complex, data-intensive tasks in otherwise limited environments (e.g., mobile devices).
 There are algorithms that use bits just like there are algorithms that use trees. You wouldn't expect your compiler to write your algorithms for you, would you? Just recently, I've used the bitboard representation for cellular automata (which lets you compute 64 cells at the same time by doing about 100 non-branching integer instructions). I have difficulty seeing a compiler that can recognize that algorithmic optimization.
 I took him to mean that compiler writers should provide these bit operations in the most efficient manner on the current platform, e.g. __builtin_ctzl instead of directly using assembly or using arithmetical solutions.
 It's no surprise this is hosted on graphics.stanford.edu.
 Might still be useful for some embedded applications.