Hacker News new | past | comments | ask | show | jobs | submit login

You are write on the time issue (but it does work if a and b are the same)

The reason becomes very apparent if you write out the assembly necessary to compile the xor, vs what is needed to compile a classic swap.

The xor would look something like:

load $1, a load $2, b xor $3, $1, $2 xor $4, $3, $2 xor $5, $4, $3 store b, $4 store a, $5

where as the classic swap would look something like this:

load $1, [a] load $2, [b] store b, $1 store a, $2

which is much better.

If you take the pipeline into account, then the difference between the swap and the xor can be huge because their is high level of dependence between the instructions.

On a theoretical classic 5-stage pipeline, the swap approach ends up needing about 10 cycles, where as the xor needs about 18.

In a real processor the difference would probably be much worse.




If you've hit memory, you've already lost. Running instructions will basically be line noise compared to that. Similarly, function call overhead will probably dwarf actually doing the xors.

However, with a good register allocator and inlined functions, your point becomes even stronger. The compiler can simply remove all instructions associated with the naieve swapping version, and simply record that before the swap %eax holds b, and after it, %eax now holds a. (Even the most naieve of code generators will do this sort of thing if the values in the swap don't fall outside of a basic block). The xors are far harder to optimize.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: