The Three Ways of XOR 143 points by horia141 on Feb 12, 2017 | hide | past | web | favorite | 66 comments

 > most programming languages don’t have an explicit “logical operator” for itI guess the author never taught of != . The only thing to be careful with is that it doesn't implicitly convert arguments to boolean, so expressions like " != (flags & Flag)" will go wrong without an explicit conversion "bool(flags & Flag)" or equivalent expression like "((flags & Flag) != 0)".And let's not forget about the friend, ==. I've more than once seen code like "(a && b) || (!a && !b)".A similar interesting pattern many don't think of is "bool(a1) + ... + bool(aN) == M" (particularly with M==1) and instead we see unreadable monstrosities :)
 If you only want to convert the second argument you can use the ==! operator ;)
 C has so many wonderful operators, why don't people use them?`````` while (x --> 0) x goes to zero; while (0 <---- x) goes much faster on some compilers (ymmv);``````
 !!a != !!b also works, but that's unwieldy.I think maybe part of the reason is that usually you're also doing something with the value of the lhs or rhs, so you end up having to separate the cases anyway:`````` if (a && !b) { frob(a); } else if (b && !a) { twiddle(b); } else { panic(); }``````
 You don't need to double-negate,`````` !a != !b `````` is sufficient.
 > The only thing to be careful with is that it doesn't implicitly convert arguments to booleanWell, you could also use bitwise XOR in that case.
 Is "bool(a1) + ... + bool(aN) == M" not just "a1 || a2...||an"?
 No, it's true when exactly M are true. "a1 || a2 || ... || an" is true when at least one is true.The latter is the same as "a1 + ... + an == 1 || a1 + ... + an == 2 || ... || a1 + ... + an == n".
 So your + operator implicitly casts booleans to an integer type, with false=0 and true=1?
 What you are saying is the only way I can imagine to interpret what I wrote, if one assumes I was intending to make any sense.By the way, in Python, bool is a subtype of int (i.e., instanceof(True,int) == True), and True + True == 2. C is the same, and in Javascript booleans are converted to integers for arithmetic operations.
 > What you are saying is the only way I can imagine to interpret what I wroteOne might imagine it (without experience, or not thinking, of any particular programming language) as being a logical OR on Booleans - which in EE at least is frequently written '+'.
 In c++, yes. Not every language will do that though.
 Obviously no, the first (for M==1) means exactly one is true, the latter at least one is true. But a1||...||aN is equivalent to bool(a1)+...+bool(aN)>0 (assuming no integer overflow :).
 > "(a && b) || (!a && !b)"It is XNOR, not XOR (no pun intended)Boolean algebra laws excellently help if there are many input aN. I used to use it to simplify legacy ruby codes written by c-style programmer (you can imagine how the conditions look like)
 Very early on in my computing career (nearly 40 years ago now), I remember being blown away when an older IBM Systems 360 programmer showed me how you can swap the values of two variables over WITHOUT using a third placeholder variable by using pure XOR.I didn't believe him until he showed me. Apparently they used to use it all the time to swap out entire segments of RAM in the S/360 without having to page out to disk or clobber other free RAM segments. It is simply:`````` a = a xor b b = b xor a a = a xor b `````` Three steps, same as using a placeholder 'c' variable. I think he mentioned on most of the processors of the time, 3 XOR instructions actually worked faster than 3 MOV instructions.Illustration for the non-believers:`````` a = 10010110 b = 01100011 a = a xor b a = 11110101 b = 01100011 (unchanged) b = b xor a a = 11110101 (unchanged) b = 10010110 a = a xor b a = 01100011 b = 10010110 `````` Voila!
 This is a classic trick, and as you write could be used for performance benefits in the old day.However, the semantics differ from just using a temporary variable in that if a and b are in the same memory location then the result will be zero.This was used in an entry for the underhanded C contest [1] if I remember correctly where for an implementation of RC4 the author defined the following macro.`````` #define SWAP(x, y) do { x^=y; y^=x; x^=y; } `````` And used it for swapping the values in the substitution table for the cipher, e.g. SWAP(S[i], S[j]). The weakness was that since sometimes the indices are the same in RC4 the substitution table would be gradually replaced with zeroes.
 I like to think about this algebraically:`````` a1 = a0 ^ b0 b1 = b0 ^ a1 = b0 ^ (a0 ^ b0) = (a0 ^ b0) ^ b0 = a0 a2 = a1 ^ b1 = (a0 ^ b0) ^ a0 = (b0 ^ a0) ^ a0 = b0 `````` The xor-linked list (https://en.wikipedia.org/wiki/XOR_linked_list) is another neat idea along the same theme.
 I have a similar memory from my early days of coding.I was implementing a toy rc4 cipher. One of the steps in the ciphers involves swapping entries in an arrays of 256 elements. I thought "hey, I'm a 1337 coder, I'm going to use the xor trick".Except it doesn't work if you're trying to swap something with itself. If you do "a[i] ^= a[j]" and i == j then you're just clearing the entry.Taught me the valuable lesson that I shouldn't try to be a smartass while writing code and the importance of unit tests.
 This was the basis of an excellent entry [0] in the 2007 Underhanded C Contest. It had a correct implementation of the RC4 encryption algorithm, except it used the XOR swap, so on average one byte of the pseudorandom state was zeroed every 256 iterations. Eventually, the state is all zeroes, and the encryption just outputs pure plaintext. Best of all, the first few kilobytes of output looks random at first glance.
 Nice catch - I hadn't though about in-place swap situations. I have always used this trick to switch two variables, but yes, I can see when you are talking about lists and matrix array manipulation, you can easily encounter an edge case which necessitates an 'in place' swap which would fail under these circumstances.
 You can do the same with addition/subtract. Perhaps there is a simpler way, but here is one way to do it:`````` a = a + b b = b + a a = b - a b = b - 2*a``````
 a = a + bb = a - ba = a - bI think this works too, assuming a+b doesn't overflow.
 Good one.> I think this works too, assuming a+b doesn't overflow.Well, in two's complement arithmetic (as is used on most architectures), the intermediate overflow can be ignored, and it will work just fine.
 You have to take overflow into account. In most cases, that would make the algorithm not very useful compared to just doing the swap with a third variable.
 In two's complement arithmetic, you are basically computing modulo N (N = 2 to the power of the number of bits). So overflow will not interfere with the swap operation.
 Hm I don't think overflow will apply. Anything it does in an add will be undone by a subtract?Except that 2a, that might be trouble.
 Imagine you only had 4 bit numbers (range 0-15) and you tried to do swap 14, 15 (1110, 1111). You can do that with xor but not with the add method, because you can't store a + b without a wider variable.
 15 + 15 gives 1414 - 15 gives 15
 The way I wrote this was rubbish, edited. I was trying to get at if you have 1110, 1111 then 1110 + 1111 will overflow, but you could xor them.
 `````` 1110 + 1111 = 1101 1101 - 1110 = 1111 1101 - 1111 = 1110 `````` Overflow doesn't matter if you just want to swap.
 (Assuming two's complement arithmetic, as amelius pointed out - I don't know if all programming languages and platforms use it)
 This also works with `sub` instead of xor. This code (xor swap, sub swap) should nevertheless better not be used on a modern CPU since it can badly be pipelined.
 Yes, in fact compilers these days are smart enough to convert people's xor swaps into mov swaps: https://godbolt.org/g/FYv7xQ
 When I look at the generated code that multiple of the compilers (gcc,clang,icc) generate`````` mov eax, edi mov edi, esi mov esi, eax `````` I would intuitively use the `xchg` instruction that x86-32/x86-64 provides instead. Is there a specific reason why the compiler(s) decide to generate the mentioned code instead?
 When used to swap with data in memory, the reason is that it is faster. xchg is atomic. To do that, it implicitly locks its target address. That makes it slower than the series of moves. http://www.agner.org/optimize/instruction_tables.pdf:"Instructions with a LOCK prefix have a long latency that depends on cache organization and possibly RAM speed. If there are multiple processors or cores or direct memory access (DMA) devices then all locked instructions will lock a cache line for exclusive access, which may involve RAM access. A LOCK prefix typically costs more than a hundred clock cycles, even on single-processor systems. This also applies to the XCHG instruction with a memory operand."
 I know that. But this is only relevant if you exchange registers with memory and is not of relevance if you exchange two registers. I accept that this is a good point if some variables are moved to the stack because of register spilling or because you want to use the address of the variable (which is not the case here).So I still stand by my point: What is the reason why the compiler uses `mov` for exchanging two registers here instead of `xchg`?
 I think that's because (at least on some CPUs) it takes three macro-operations. http://www.agner.org/optimize/microarchitecture.pdf (section 17.4, page 188):"Vector path instructions are less efficient than single or double instructions because they require exclusive access to the decoders and pipelines and do not always reorder optimally. For example:`````` ; Example 17.1. AMD instruction breakdown xchg eax, ebx ; Vector path, 3 ops nop ; Direct path, 1 op xchg ecx, edx ; Vector path, 3 ops nop ; Direct path, 1 op `````` This sequence takes 4 clock cycles to decode because the vector path instructions must decode alone."
 This is indeed a good explanation - I admit I was not aware of this detail of the K8/K10 processors.
 Some of that is weird placeholders for the debugger, for inserting hook instructions or whatnot?
 I think you mean a combination of sub and add (e.g. sub sub add). xor is somewhat special in that it is its own inverse.
 Yes, you are right - I was a little abentminded.
 Yep, that's the 'classic' usage for me. Used to use this all the time if registers were short (and they always were).
 Another way to look at XOR - it's an adder (the sum, without the carry)
 Yeah, surprised this wasn't mentioned; XOR as addition mod 2 (note that they're just talking about XOR here, not bitwise XOR) comes up way more often than it being the most complex binary boolean operation.
 The post states at the beginning of the 6th paragraph:"Some extra treats: XOR can be considered an imparity function. When the number of inputs is even, it outputs zero, while when it is odd, it outputs one. It is also the sum part of an adder, that is, without the carry part. "
 One thing I love about xor is an interesting correspondence between bitwise xor and the outer product of the exterior algebra.Say we have an N=4 dimensional vector space, and we use binary place values to represent units vector in a basis, like this:w = 1000x = 0100y = 0010z = 0001Then a multivector basis could be represented e.g.:xyz = 0111xy = 0110wz = 1001Now, if ^ is the outer product:xy^yz = xz ↔ 0110 xor 0011 = 0101.wx^yz = wxyz ↔ 1100 xor 0011 = 1111.and so on.If anyone has more information about this, I'd be really interested in seeing it!
 I had a similar 'aha' moment a few years ago about inner product (aka dot product) and the choice of '.' as an operator to access elements of a structure.`````` struct Example { int elem1; int elem2; ... }; Example ex1 = { 1, 2, ... }; int a = ex1.elem1; int b = ex1.elem2; ... `````` If we think of ex1 as a vector, and elem1 as a constant vector [1, 0, 0...] (and elem2 as a constant vector [0, 1, 0, ...] and so on) then ex1.elem1 is literally the inner product of ex1 and the constant elem1.I have no idea if that was the original thinking behind dot notation but it's neat and I like it.
 Unless you mean something different from the usual multilinear algebra meaning of "exterior algebra", xy^yz is zero. (It contains two y's.) I think that you must actually mean Clifford algebras. (But there are also signs when in characteristic ≠ 2.)
 Yeah, I do mean Clifford algebras. Thanks for the catch. I'll have to study up on the distinction between "outer products" in Clifford algebras and exterior algebras.
 They are isomorphic as vector spaces (again with characteristic != 2), but their products are not preserved by the isomorphism (unless the Clifford algebra is trivial).
 I remember this and other tricks are implemented in Gaigen (a code generator for geometric algebra).https://sourceforge.net/projects/g25/?source=directoryThe XOR trick is close to slide 16 of http://www.science.uva.nl/research/ias/ga/gaigen/files/20020......How to compute the geometric product of unit orthogonal basis blades (3/3) If we represent each basis vector with a specific bit in a binary number (e1 = 001b, e2 = 010b, e3 = 100b), computing the geometric product of basis blades is exactly the xor operation on binary numbers!`````` (e1^e2)(e2^e3) = e1^e3 011b xor 110b = 101b `````` We have to take care of the signs though:- basis vectors have to be rearranged into a specific order before they can annihilate each other (this rearranging causes a sign change in the result). This can also be computed binary. - signature of annihilated basis vectors can change the sign as well.
 "Sadly XOR doesn’t appear as an equivalent to NOT, AND and OR, as a logical operator on booleans, being relegated to just a bitewise operator in most programming languages."Well, in C there's just no need. The main raison d'etre for && and || over & and | is that you can exploit their short circuiting behaviour. A hypothetical ^^ operator wouldn't bring anything extra to the table.
 It might at least coerce its operands to bool. As things are, 2&&1 is true, and 2&1 is false; hypothetically, 2^^1 could be false while 2^1 would be (as now) true.
 The real problem is that you can not short circuit with XOR.
 You have to evaluate both arguments to a XOR op to know its result. Short circuiting it has no natural meaning.
 Also, 3 XOR gates allows a wire crossing in 2D (where you cannot make a wire crossing with a bridge as that would be 3D)
 Lovely article, in particular the small but deep excursion into AI history and the AI winter after the publication of Minsky/Papert's "Perceptrons" (though it was 70's, not 80's).Wonder when the current AI summer will come to an end...
 I posted this on his site but it needed to be approved. Also, there's actually a comment in the book (I just borrowed my gf's copy) in which they speculate that multilayer networks could be built, but since they only computers they had were .15 MIPS KA-10s (and all the figures in the book were hand drawn) they didn't pursue it.My guess is it's still early days on the AI boom.My comment on AI Winter:Minor correction: Marvin and Seymour's Perceptrons book was published around 1969 and nuked neural nets around there. AI winter set in in the 80s as expert systems and similar crude symbolic systems proved not to be scalable (I was at that AAAI in 1984 and remember that panel well).Moore's law rescued NNs and is about to rescue symbolic AI as well.
 > Wonder when the current AI summer will come to an end...Are we really closer to AI in a significant way today? Chomsky makes an interesting argument to the contrary. He points out that any physical system, say the motion of falling bodies, can be closely modeled statistically given enough data points. However, this modeling gives us very little insight into why the system behaves in that manner, i.e. gravity.Of course, this presumes that the human brain isn't just an advanced statistical model trained by billions of years of evolution.
 The last statement in the article is technically a hardware decision (the article says: “As an extra treat, XOR can be considered an imparity function. When its input has an even number of zeros, the output is zero. Otherwise it is one.”).While 2-input XOR behaves as “one or the other but not both”, a many-input gate is generally implemented by chaining other XORs and the chaining causes even/odd parity instead of a “must be only one” behavior [1].
 It's not "just an implementation decision"[1], it's a reasonable mathematical definition: for a binary operator OP, we define it's n-parameter version whenever a OP (b OP c) == (a OP b) op c, which is true for XOR but not e.g. for XNOR (F == F) == T is F, but F == (F == T) is T.[1] and sorry if I'm falling victim to the difficulty of interpreting tone on the internet
 By symmetry it must be true for XNOR also, and:`` (F == F) == T in fact is T``
 The analog xor is also the basis for CDMA. Fascinating stuff if you dig into it.
 it can also be used to generate simple parity.
 That is mentioned in the penultimate paragraph.

Search: