You are right on the time issue (but it does work if a and b hold the same value).
The reason becomes apparent if you write out the assembly needed to compile the XOR swap versus a classic swap.
The XOR version would look something like:
load  $1, [a]      ; $1 = a
load  $2, [b]      ; $2 = b
xor   $3, $1, $2   ; $3 = a ^ b
xor   $4, $3, $2   ; $4 = (a ^ b) ^ b = a
xor   $5, $4, $3   ; $5 = a ^ (a ^ b) = b
store [b], $4      ; b gets the old a
store [a], $5      ; a gets the old b
whereas the classic swap would look something like this:
load  $1, [a]      ; $1 = a
load  $2, [b]      ; $2 = b
store [b], $1      ; b gets the old a
store [a], $2      ; a gets the old b
which is much better.
If you take the pipeline into account, the difference between the swap and the XOR can be huge, because there is a high level of dependence between the XOR instructions: each one must wait for the result of the previous one.
On a theoretical classic 5-stage pipeline, the swap approach needs about 10 cycles, whereas the XOR needs about 18.
In a real processor the difference would probably be much worse.
If you've hit memory, you've already lost. Running instructions will basically be line noise compared to that. Similarly, function call overhead will probably dwarf actually doing the xors.
However, with a good register allocator and inlined functions, your point becomes even stronger. The compiler can simply remove all instructions associated with the naive swapping version and just record that before the swap %eax holds b, and after it, %eax holds a. (Even the most naive of code generators will do this sort of thing as long as the values in the swap don't fall outside of a basic block.) The XORs are far harder to optimize.
Assume you had a sorting algorithm that swaps values, and in some cases would swap two of the same values. It's easier (and often faster, thanks to the high cost of branch misprediction) to simply assume the swap of something with itself is a no-op than to explicitly check the address of the value to make sure it isn't the same. Or you could have just forgotten to make the check, and allowed the user to pass in their own swap functions (say, to deep-copy structs in C).
Your code should be robust enough to provide an intuitive default behavior for corner cases like that. That way the application code doesn't need to be cluttered with special conditions.
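To make the aliasing hazard concrete, here is a minimal C sketch (xor_swap is a hypothetical name, not something from the article). When both pointers refer to the same object, the first XOR zeroes it and the original value is lost:

#include <stdio.h>

void xor_swap(int *a, int *b) {
    *a ^= *b;   /* if a == b, this sets *a to 0 */
    *b ^= *a;   /* 0 ^ 0 is still 0 */
    *a ^= *b;   /* the original value is gone */
}

int main(void) {
    int x = 42;
    xor_swap(&x, &x);
    printf("%d\n", x);   /* prints 0, not 42 */
    return 0;
}

A temporary-variable swap, or treating self-swap as a no-op as suggested above, avoids this failure mode.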
This article is kind of lame. 1-6 should be basic skills from undergrad CS courses. Also, the intro paragraph is a bait and switch:
"Instead of performing some operation (such as counting the 1 bits in an integer) by looping over individual bits, these programming nuggets do the same with one or two carefully chosen bitwise operations."
We never actually learn how to count set bits in an integer. (My initial thought was a look-up table, but is there a better way?)
Indeed. I go to Northwestern University, but I know for a fact that our first lab in Intro to Computer Systems is the same as the one at Carnegie Mellon. Basically, you're given a C file and you have to implement about 20 functions using only specific bitwise operations, and only a certain number of them. The fun part is that the number of operations you used is posted on a website, along with the "professor's" result, so you can compete against other people in the class to do it in the fewest operations, and also against the professor's score.
Basically, if you can't figure these out for yourself, I'm not sure you're a very good programmer.
There is a divide-and-conquer strategy; it will come in the advanced bit hacks article.
I agree 1-5 are basic. That's why I say at #6 "Now it finally gets more interesting!!! Bit hacks #1 - #5 were kind of boring to be honest." I included them for completeness.
Here is a method for counting one bits:
int one_bits(unsigned int x) {
    x = x - ((x >> 1) & 0x55555555);                 /* count bits in each 2-bit field */
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333);  /* sum adjacent counts into 4-bit fields */
    x = (x + (x >> 4)) & 0x0F0F0F0F;                 /* sum into 8-bit fields */
    x = x + (x >> 8);                                /* sum into 16-bit fields */
    x = x + (x >> 16);                               /* sum into the lowest field */
    return x & 0x0000003F;                           /* the count fits in 6 bits (0..32) */
}
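For example, one_bits(0xFF) returns 8 and one_bits(0xFFFFFFFF) returns 32, in five fixed steps rather than up to 32 loop iterations.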
I think it's unfortunate that the article does not provide real examples where these bit hacks are used. Why would I want to be able to unset the right-most 1-bit of a series of bits?
Thanks, that's a good question and comment. I'll try to write another part of the article with applications for these hacks! There are plenty in embedded engineering, kernel programming, cryptographic algorithms, space-efficient data structures, fast image processing algorithms, and in plenty of other computing areas.
To answer your question of why you'd want to set/unset the right-most 1-bit: suppose you have a space-efficient 8-bit data structure that represents 8 devices, where each bit represents whether a device is on or off, and you want to turn off the right-most device that is on. Then you could use that hack (see the sketch below). Or, for example, you want to linearly turn on devices from 1st to 8th; then you can just turn on the rightmost 0-bit eight times, and turn off in the same manner. Or you could have some crazy device that puts data at some memory location and waits for you to clear the rightmost bit before it puts new data at that location. Or you have a number system coded in your byte in such a way that the rightmost low-order bit is always the sign (or some other craziness).
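A quick C sketch of that first example (the variable name and bit layout are made up for illustration):

unsigned char devices = 0x2C;        /* 00101100: devices 2, 3, and 5 are on */
devices = devices & (devices - 1);   /* turn off the rightmost device that is on: 00101000 */
devices = devices | (devices + 1);   /* turn on the rightmost device that is off: 00101001 */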
He didn't even talk about how to do the left-most bit, which is too bad, because getting the left-most bit is more useful than getting the right-most bit.
Unfortunately it is not possible to do that efficiently with just a few bit operations because of the following theorem:
Theorem. A function mapping words to words can be implemented with word-parallel add, subtract, AND, OR, and NOT instructions if and only if each bit of the result depends only on bits at and to the right of each input operand.
The proof and comments are in the Hacker's Delight book!
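Note that shifts are absent from that instruction set. Once you allow them, the standard workaround is to smear the leftmost 1-bit rightward and then keep only the top bit of the resulting run; a sketch for 32-bit words:

unsigned int leftmost_one(unsigned int x) {
    /* propagate the highest set bit into every position to its right */
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    /* x is now a solid run of 1s; subtracting the run shifted right
       by one leaves only the topmost bit (0 stays 0) */
    return x - (x >> 1);
}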
I didn't take the article to be "with just a few bit manipulations", but rather "how to start working with bits". But yeah that's a sensible reason not to attempt it.
I guess I should be the one to provide the usual warning that C and C++ make no guarantees about the in-memory representation of the built-in types, and that this type of code can be very dangerous.
If you read the article instead of sneering at it, you might find it less noisy. It's an idiom. All idioms involve some learning, and generally some odd, seemingly-misapplied abstractions. But they generally have value, too, which is why they persist. In this case, the bit tricks work to map a very common metaphor in the design domain (a set of boolean flags) to a very common low-level implementation choice (the machine word).
Does your application benefit from that kind of mapping? Probably not, if it's a web 2.0 thing. But don't pretend that this sort of thing is never useful either. Somewhere down in the layers you don't understand, there is real machine code running your computer. And that code, like yours, needs compact and simple storage of booleans.
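That mapping, as a minimal C sketch (the flag names are hypothetical):

#include <stdio.h>

#define DEV_READY (1u << 0)   /* one boolean flag per bit of the word */
#define DEV_BUSY  (1u << 1)
#define DEV_ERROR (1u << 2)

int main(void) {
    unsigned flags = 0;
    flags |= DEV_READY;        /* set a flag */
    flags &= ~DEV_BUSY;        /* clear a flag */
    if (flags & DEV_READY)     /* test a flag */
        printf("ready, flags = 0x%x\n", flags);
    return 0;
}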
You're right - my comment was intentionally snarky, and deserved all the downvotes it got.
You're right - these are idioms. Idioms for working with current hardware. Essentially, these are machine implementation details, and while, yes, we currently have to program machines to get anything done, not everyone finds that kind of work rewarding or interesting.
I didn't mean to degrade the type of person that finds this interesting; I wanted to point out the divide between the hardware hacker and, for lack of a better term, the math hacker.
These are the embedded programmer's tools of the trade. For example, you don't want to loop over the bits of a word to count how many are set (not covered in this article, but it will be in the next part). It can be done with several shifts and ANDs instead. Huge speedups!
Another example: finding the lowest-order 1-bit in a 64-bit integer can take up to 64 iterations with a loop. Instead it takes one AND and one negation (= 2 ops total)! A 32x speedup!
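In C, that looks something like this (the function name is mine; the trick isolates the lowest 1-bit):

#include <stdint.h>

uint64_t lowest_one(uint64_t x) {
    /* two's-complement negation flips every bit above the lowest 1
       and leaves the lowest 1 in place, so ANDing isolates it */
    return x & (~x + 1);   /* equivalently: x & -x */
}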
The problem is that learning to think this way in general makes you habitually write things like x >> 8 instead of x / 256, which makes your code hard to read for no benefit if you're using a compiler that was written sometime in, oh, the past 20 years.
EDIT: sometimes x >> 8 is clearer, if you're actually dealing with packing bits into words, but I've just recently seen it done when the code really "meant" division.
I like bit twiddling, but I'm not sure I'd take the job if a primary concern of the interviewer is whether I know the quick way to find the lowest-order bit in a 64-bit int...
I was recently looking at some C code that takes some text and converts it to Base64 encoding for sending it over an HTTP POST request. There was one function in my code that was, IIRC, about 30 lines long and completely unreadable. Later on, I found some code on SourceForge that did the same thing in 3 lines of bit shift operations.
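For reference, the bit-shift core of Base64 encoding is roughly the following. This is my sketch of the standard technique, not the SourceForge code in question, and padding of the final partial block is omitted:

static const char b64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* pack 3 input bytes into a 24-bit group, then slice out four 6-bit indices */
void encode_block(const unsigned char in[3], char out[4]) {
    unsigned int n = (in[0] << 16) | (in[1] << 8) | in[2];
    out[0] = b64[(n >> 18) & 0x3F];
    out[1] = b64[(n >> 12) & 0x3F];
    out[2] = b64[(n >> 6)  & 0x3F];
    out[3] = b64[n & 0x3F];
}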
The purpose of the Hacker News website is to serve past/present/future ycombinator candidates. I'm not even a web app developer, I wrote C code for an $8 processor today.
The title (as submitted by the author of the blog) says "Absolutely Must Know," which is quite hyperbolic.
I would suggest: Basic tricks of bit manipulation.
Consider that this topic is covered in K. N. King's book C Programming: A Modern Approach, in Chapter 20: Low-Level Programming, on page 451.
Yes, even for web applications, bitwise manipulation will come in handy at some point.
Personal anecdote: I recently had to combine two arbitrary unsigned 32 bit integers into one unsigned 64 bit integer to make a unique ID for an index. Not sure how I would have done that without bitwise manipulation.
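With bitwise operations that's one shift and one OR; a minimal C sketch (the function name is mine):

#include <stdint.h>

uint64_t make_id(uint32_t upper, uint32_t lower) {
    /* widen first so the shift doesn't overflow, then pack both halves */
    return ((uint64_t)upper << 32) | lower;
}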
You're XORing 32 into 2, multiplying it by the top word, and adding the bottom word?
Also, you think exponentiation and multiplication are faster than a shift and an OR? I think you're the target audience for this article, and I revise my opinion (thankfully unstated) about whether this is material every developer "must" know.
Surprisingly (I just benchmarked it myself), the times are nearly the same.
Here's my benchmark (in case I made a mistake):
import time

def combine(upper, lower): return (2**32)*upper + lower
def combine2(upper, lower): return (upper << 32) | lower

def test_combine():
    start = time.clock()
    for i in range(10000000):
        combine(i, i)
    end = time.clock()
    output = 'The time taken for combine() is ' + str(end - start)
    print output

def test_combine2():
    start = time.clock()
    for i in range(10000000):
        combine2(i, i)
    end = time.clock()
    output = 'The time taken for combine2() is ' + str(end - start)
    print output

test_combine()
test_combine2()
That's surprising; it's more than twice as slow in Ruby (yeah, yeah). It's also (for obvious reasons) orders of magnitude faster in C --- if you were crazy enough to actually write code that way.
I don't know much about Python so forgive me if this is stupid - but could 2^32 have been optimized as a constant, so that the result is just a multiplication and an addition?
I considered that solution. And in thinking about why I chose to do bit-shifting, I remembered why: I actually needed to combine two _signed_ 32 bit integers in MySQL, and since I wasn't sure how signed integers are stored in MySQL, I figured that I could just use bit-shifting and move on.
http://www.amazon.com/Hackers-Delight-Henry-S-Warren/dp/0201...