
Bit Twiddling Hacks - wglb
http://graphics.stanford.edu/~seander/bithacks.html
======
haberman
It is amazing what you can accomplish with bit twiddling.

I've been working on a Protocol Buffer decoder, and I was looking for ways to
optimize varint decoding. On a Google-internal list I asked how you'd
transform the 64-bit integer:

    
    
        0aaa aaaa 0bbb bbbb 0ccc cccc 0ddd dddd 0eee eeee 0fff ffff 0ggg gggg 0hhh hhhh
    

into:

    
    
        0000 0000 aaaa aaab bbbb bbcc cccc cddd dddd eeee eeef ffff ffgg gggg ghhh hhhh
    

Not two hours later I had several impressive transformations that I _never_
would have come up with on my own. The two most effective:

    
    
        b = ((b & 0x7f007f007f007f00) >> 1) | (b & 0x007f007f007f007f);
        b = ((b & 0xffff0000ffff0000) >> 2) | (b & 0x0000ffff0000ffff);
        b = ((b & 0xffffffff00000000) >> 4) | (b & 0x00000000ffffffff);
    

and:

    
    
        b +=       b & 0x007f007f007f007fULL;
        b +=  3 * (b & 0x0000ffff0000ffffULL);
        b += 15 * (b & 0x00000000ffffffffULL);
    

I don't know how these guys come up with this stuff.

~~~
gorset
"b += 15 * (b & 0x00000000ffffffffULL)" might look more magical than it
actually is. The crux is:

    
    
        b * 4
    

is the same as

    
    
        b << 2
    

thus

    
    
        b += 3 * b
    

is the same as

    
    
        b = b << 2
    

So plus, a multiply and a mask make it possible to shift only a subset of the
bits in the integer. The first transformation uses two masks and bitwise or to
achieve the same.

------
scorchin
Whenever a discussion on rearranging bits and bytes comes up, I can't help but
recommend Hacker's Delight by Henry S. Warren <http://amzn.com/0201914654>

~~~
anthonyb
Yep, I just got that from Amazon a few days ago. I'm a Python/Web programmer
in my day job, but I'm looking to get more into the low level stuff. I
recently got a Nerdkit too - lots of fun :)

------
reichstein
Bit-fiddling at the C-level isn't necessarily faster on all platforms. E.g.,
the "masked combine bits" case where he says that r = a ^ ((a ^ b) & mask) is
one operation shorter than r = (a ^ ~mask) | (b ^ mask). On ARM, the (a ^
~mask) can be performed by one instruction, so the instruction count is
actually the same, and the latter way of doing it parallelizes better (only a
dependency depth of two instead of three). I.e., the "optimized" version is
actually likely to be slower on some CPUs.

------
Thasc
I had no idea there was a patented way to compute the absolute value of an
integer. Insane.

------
adrianN
Nowadays things like these should only be interesting to compiler writers.

~~~
cultureulterior
There are algorithms that use bits just like there are algorithms that use
trees. You wouldn't expect your compiler to write your algorithms for you,
would you.

Just recently, I've used the bitboards representation for cellular automata
(which lets you compute 64 cells at the same time by doing about 100 non-
branching integer instructions). I have difficulty seeing a compiler that can
recognize that algorithmic optimization.

~~~
andreasvc
I took him to mean that compiler writers should provide these bit operations
in the most efficient manner on the current platform, e.g. __builtin_ctzl
instead of directly using assembly or using arithmetical solutions.

------
azar1
This page was a life saver in my fall CS class.

