You are right on the time issue (but it does work if a and b hold the same value).
The reason becomes apparent if you write out the assembly needed to compile the XOR swap versus a classic swap.
The XOR version would look something like:
load  $1, [a]      ; $1 = a
load  $2, [b]      ; $2 = b
xor   $3, $1, $2   ; $3 = a ^ b
xor   $4, $3, $2   ; $4 = (a ^ b) ^ b = a
xor   $5, $4, $3   ; $5 = a ^ (a ^ b) = b
store [b], $4      ; b gets the old a
store [a], $5      ; a gets the old b
whereas the classic swap would look something like this:
load  $1, [a]      ; $1 = a
load  $2, [b]      ; $2 = b
store [b], $1      ; b gets the old a
store [a], $2      ; a gets the old b
which is much better.
If you take the pipeline into account, the difference between the swap and the XOR can be huge, because there is a high level of dependence between the XOR instructions: each one must wait for the result of the previous one.
On a theoretical classic 5-stage pipeline, the swap approach needs about 10 cycles, whereas the XOR needs about 18.
In a real processor the difference would probably be much worse.
If you've hit memory, you've already lost. Running instructions will basically be line noise compared to that. Similarly, function call overhead will probably dwarf actually doing the xors.
However, with a good register allocator and inlined functions, your point becomes even stronger. The compiler can simply remove all instructions associated with the naive swapping version and just record that before the swap %eax holds b, and after it, %eax holds a. (Even the most naive of code generators will do this sort of thing as long as the values in the swap don't fall outside of a basic block.) The XORs are far harder to optimize.
Assume you had a sorting algorithm that swaps values, and in some cases would swap two of the same values. It's easier (and often faster, thanks to the high cost of branch misprediction) to simply assume the swap of something with itself is a no-op than to explicitly check the address of the value to make sure it isn't the same. Or you could have just forgotten to make the check, and allowed the user to pass in their own swap functions (say, to deep-copy structs in C).
Your code should be robust enough to provide an intuitive default behavior for corner cases like that. That way the application code doesn't need to be cluttered with special conditions.
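To make the aliasing hazard concrete, here is a minimal C sketch (xor_swap is a hypothetical name, not something from the article). When both pointers refer to the same object, the first XOR zeroes it and the original value is lost:

#include <stdio.h>

void xor_swap(int *a, int *b) {
    *a ^= *b;   /* if a == b, this sets *a to 0 */
    *b ^= *a;   /* 0 ^ 0 is still 0 */
    *a ^= *b;   /* the original value is gone */
}

int main(void) {
    int x = 42;
    xor_swap(&x, &x);
    printf("%d\n", x);   /* prints 0, not 42 */
    return 0;
}

A temporary-variable swap, or treating self-swap as a no-op as suggested above, avoids this failure mode.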
This article is kind of lame. 1-6 should be basic skills from undergrad CS courses. Also, the intro paragraph is a bait and switch:
"Instead of performing some operation (such as counting the 1 bits in an integer) by looping over individual bits, these programming nuggets do the same with one or two carefully chosen bitwise operations."
We never actually learn how to count set bits in an integer. (My initial thought was a look-up table, but is there a better way?)
Indeed. I go to Northwestern University, but I know for a fact that our first lab in Intro to Computer Systems is the same as the one at Carnegie Mellon. Basically, you're given a C file and you have to implement about 20 functions using only specific bitwise operations, and only a certain number of them. The fun part is that the number of operations you used is posted on a website, along with the "professor's" result, so you can compete against other people in the class to do it in the fewest operations, and also against the professor's score.
Basically, if you can't figure these out for yourself, I'm not sure you're a very good programmer.
There is a divide-and-conquer strategy; it will come in the advanced bit hacks article.
I agree 1-5 are basic. That's why I say at #6 "Now it finally gets more interesting!!! Bit hacks #1 - #5 were kind of boring to be honest." I included them for completeness.
Here is a method for counting one bits:
int one_bits(unsigned int x) {
    x = x - ((x >> 1) & 0x55555555);                 /* count bits in each 2-bit field */
    x = (x & 0x33333333) + ((x >> 2) & 0x33333333);  /* sum adjacent counts into 4-bit fields */
    x = (x + (x >> 4)) & 0x0F0F0F0F;                 /* sum into 8-bit fields */
    x = x + (x >> 8);                                /* sum into 16-bit fields */
    x = x + (x >> 16);                               /* sum into the lowest field */
    return x & 0x0000003F;                           /* the count fits in 6 bits (0..32) */
}
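For example, one_bits(0xFF) returns 8 and one_bits(0xFFFFFFFF) returns 32, in five fixed steps rather than up to 32 loop iterations.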
I think it's unfortunate that the article does not provide real examples where these bit hacks are used. Why would I want to be able to unset the right-most 1-bit of a series of bits?
Thanks, that's a good question and comment. I'll try to write another part of the article with applications for these hacks! There are plenty in embedded engineering, kernel programming, cryptographic algorithms, space-efficient data structures, fast image processing algorithms, and in plenty of other computing areas.
To answer your question of why you'd want to set/unset the right-most 1-bit: suppose you have a space-efficient 8-bit data structure that represents 8 devices, where each bit represents whether a device is on or off, and you want to turn off the right-most device that is on. Then you could use that hack (see the sketch below). Or, for example, you want to linearly turn on devices from 1st to 8th; then you can just turn on the rightmost 0-bit eight times, and turn off in the same manner. Or you could have some crazy device that puts data at some memory location and waits for you to clear the rightmost bit before it puts new data at that location. Or you have a number system coded in your byte in such a way that the rightmost low-order bit is always the sign (or some other craziness).
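A quick C sketch of that first example (the variable name and bit layout are made up for illustration):

unsigned char devices = 0x2C;        /* 00101100: devices 2, 3, and 5 are on */
devices = devices & (devices - 1);   /* turn off the rightmost device that is on: 00101000 */
devices = devices | (devices + 1);   /* turn on the rightmost device that is off: 00101001 */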
He didn't even talk about how to do the left-most bit, which is too bad, because getting the left-most bit is more useful than getting the right-most bit.
Unfortunately it is not possible to do that efficiently with just a few bit operations because of the following theorem:
Theorem. A function mapping words to words can be implemented with word-parallel add, subtract, AND, OR, and NOT instructions if and only if each bit of the result depends only on bits at and to the right of each input operand.
The proof and comments are in the Hacker's Delight book!
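Note that shifts are absent from that instruction set. Once you allow them, the standard workaround is to smear the leftmost 1-bit rightward and then keep only the top bit of the resulting run; a sketch for 32-bit words:

unsigned int leftmost_one(unsigned int x) {
    /* propagate the highest set bit into every position to its right */
    x |= x >> 1;
    x |= x >> 2;
    x |= x >> 4;
    x |= x >> 8;
    x |= x >> 16;
    /* x is now a solid run of 1s; subtracting the run shifted right
       by one leaves only the topmost bit (0 stays 0) */
    return x - (x >> 1);
}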
I didn't take the article to be "with just a few bit manipulations", but rather "how to start working with bits". But yeah that's a sensible reason not to attempt it.
I guess I should be the one to provide the usual warning that C and C++ make no guarantees about the in-memory representation of the built-in types, and that this type of code can be very dangerous.
If you read the article instead of sneering at it, you might find it less noisy. It's an idiom. All idioms involve some learning, and generally some odd, seemingly-misapplied abstractions. But they generally have value, too, which is why they persist. In this case, the bit tricks work to map a very common metaphor in the design domain (a set of boolean flags) to a very common low-level implementation choice (the machine word).
Does your application benefit from that kind of mapping? Probably not, if it's a web 2.0 thing. But don't pretend that this sort of thing is never useful either. Somewhere down in the layers you don't understand, there is real machine code running your computer. And that code, like yours, needs compact and simple storage of booleans.
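That mapping, as a minimal C sketch (the flag names are hypothetical):

#include <stdio.h>

#define DEV_READY (1u << 0)   /* one boolean flag per bit of the word */
#define DEV_BUSY  (1u << 1)
#define DEV_ERROR (1u << 2)

int main(void) {
    unsigned flags = 0;
    flags |= DEV_READY;        /* set a flag */
    flags &= ~DEV_BUSY;        /* clear a flag */
    if (flags & DEV_READY)     /* test a flag */
        printf("ready, flags = 0x%x\n", flags);
    return 0;
}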
You're right - my comment was intentionally snarky, and deserved all the downvotes it got.
You're right - these are idioms. Idioms for working with current hardware. Essentially, these are machine implementation details, and while, yes, we currently have to program machines to get anything done, not everyone finds that kind of work rewarding or interesting.
I didn't mean to degrade the type of person that finds this interesting; I wanted to point out the divide between the hardware hacker and, for lack of a better term, the math hacker.
These are the embedded programmer's tools of the trade. For example, you don't want to loop over the bits of a word to count how many are set (not covered in this article, but it will be in the next part). It can be done with several shifts and ANDs instead. Huge speedups!
Another example: finding the lowest-order 1-bit in a 64-bit integer can take up to 64 iterations with a loop. Instead it takes one AND and one negation (= 2 ops total)! A 32x speedup!
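In C, that looks something like this (the function name is mine; the trick isolates the lowest 1-bit):

#include <stdint.h>

uint64_t lowest_one(uint64_t x) {
    /* two's-complement negation flips every bit above the lowest 1
       and leaves the lowest 1 in place, so ANDing isolates it */
    return x & (~x + 1);   /* equivalently: x & -x */
}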
The problem is that learning to think this way in general makes you habitually write things like x >> 8 instead of x / 256, which makes your code hard to read for no benefit if you're using a compiler that was written sometime in, oh, the past 20 years.
EDIT: sometimes x >> 8 is clearer, if you're actually dealing with packing bits into words, but I've just recently seen it done when the code really "meant" division.
I like bit twiddling, but I'm not sure I'd take the job if a primary concern of the interviewer is whether I know the quick way to find the lowest-order bit in a 64-bit int...
I was recently looking at some C code that takes some text and converts it to Base64 encoding for sending it over an HTTP POST request. There was one function in my code that was, IIRC, about 30 lines long and completely unreadable. Later on, I found some code on SourceForge that did the same thing in 3 lines of bit shift operations.
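For reference, the bit-shift core of Base64 encoding is roughly the following. This is my sketch of the standard technique, not the SourceForge code in question, and padding of the final partial block is omitted:

static const char b64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* pack 3 input bytes into a 24-bit group, then slice out four 6-bit indices */
void encode_block(const unsigned char in[3], char out[4]) {
    unsigned int n = (in[0] << 16) | (in[1] << 8) | in[2];
    out[0] = b64[(n >> 18) & 0x3F];
    out[1] = b64[(n >> 12) & 0x3F];
    out[2] = b64[(n >> 6)  & 0x3F];
    out[3] = b64[n & 0x3F];
}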
The purpose of the Hacker News website is to serve past/present/future ycombinator candidates. I'm not even a web app developer, I wrote C code for an $8 processor today.
The title (as submitted by the author of the blog) says "Absolutely Must Know," which is quite hyperbolic.
I would suggest: Basic tricks of bit manipulation.
Consider that this topic is covered in K. N. King's book C Programming: A Modern Approach, in Chapter 20: Low-Level Programming, on page 451.
Yes, even for web applications, bitwise manipulation will come in handy at some point.
Personal anecdote: I recently had to combine two arbitrary unsigned 32 bit integers into one unsigned 64 bit integer to make a unique ID for an index. Not sure how I would have done that without bitwise manipulation.
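With bitwise operations that's one shift and one OR; a minimal C sketch (the function name is mine):

#include <stdint.h>

uint64_t make_id(uint32_t upper, uint32_t lower) {
    /* widen first so the shift doesn't overflow, then pack both halves */
    return ((uint64_t)upper << 32) | lower;
}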
You're XORing 32 into 2, multiplying it by the top word, and adding the bottom word?
Also, you think exponentiation and multiplication are faster than a shift and an OR? I think you're the target audience for this article, and I revise my opinion (thankfully unstated) about whether this is material every developer "must" know.
Surprisingly (I just benchmarked it myself), the times are nearly the same.
Here's my benchmark (in case I made a mistake):
import time

def combine(upper, lower): return (2**32)*upper + lower
def combine2(upper, lower): return (upper << 32) | lower

def test_combine():
    start = time.clock()
    for i in range(10000000):
        combine(i, i)
    end = time.clock()
    output = 'The time taken for combine() is ' + str(end - start)
    print output

def test_combine2():
    start = time.clock()
    for i in range(10000000):
        combine2(i, i)
    end = time.clock()
    output = 'The time taken for combine2() is ' + str(end - start)
    print output

test_combine()
test_combine2()
That's surprising; it's more than twice as slow in Ruby (yeah, yeah). It's also (for obvious reasons) orders of magnitude faster in C --- if you were crazy enough to actually write code that way.
I don't know much about Python so forgive me if this is stupid - but could 2^32 have been optimized as a constant, so that the result is just a multiplication and an addition?
I considered that solution. And in thinking about why I chose to do bit-shifting, I remembered why: I actually needed to combine two _signed_ 32 bit integers in MySQL, and since I wasn't sure how signed integers are stored in MySQL, I figured that I could just use bit-shifting and move on.
http://www.amazon.com/Hackers-Delight-Henry-S-Warren/dp/0201...