
The Aggregate Magic Algorithms (2015) - rsp1984
http://aggregate.org/MAGIC/
======
pc86
Maybe it's just my own neuroses around being a self-taught developer and not
having much of any mathematical background, but it always bothered me that
articles like this constantly use phrases like "it's clear that...",
"obviously...", "it's trivial to...", etc.

> _Clearly, floor of base 2 log of x is (WORDBITS-lzc(x))._

Uh, is it?
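For what it's worth, checking this with GCC's __builtin_clz (assuming that's what the article means by lzc) suggests the quoted formula is off by one; a minimal sketch:

```c
/* floor(log2(x)) for x > 0, assuming lzc() means GCC's __builtin_clz,
   which counts leading zero bits in a 32-bit unsigned int.
   (__builtin_clz is undefined for x == 0.) */
static int floor_log2(unsigned x) {
    return 32 - 1 - __builtin_clz(x);  /* WORDBITS - lzc(x) would be one too high */
}
```

floor_log2(8) is 3, while 32 - __builtin_clz(8) gives 4, so either the article's lzc counts differently or the "clearly" glosses over a detail.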

~~~
taneq
It's a bit of a trope that mathematicians consider all knowledge to fall into
two groups: 'unknown' and 'obvious/trivial'.

------
Veedrac
These tricks are cool on their own, but they're really neat when you tie them
together to solve larger (but still toy) puzzles. Two fun ones I can remember:

[https://www.reddit.com/r/programming/comments/4vctor/.../d5x...](https://www.reddit.com/r/programming/comments/4vctor/.../d5xl6vl/)

[https://cs.stackexchange.com/a/74888/16408](https://cs.stackexchange.com/a/74888/16408)

The first asked

> Extra-extra credit can be earned for finding an algorithm that takes no more
> than O(log N) space and no more than O(log N) time, where N is the number
> you are seeking for.

and the bit magic made it O(1) space and time. The second was on similar
lines.

Unfortunately the bit fiddling at my day job tends to be a bit tamer ;).

~~~
Veedrac
A clarification: this isn't to say such tricks are _never_ used in a non-toy
environment. There is, for example, this search algorithm I made for a serious
purpose:

[https://stackoverflow.com/q/40141793/1763356](https://stackoverflow.com/q/40141793/1763356)

The bit math is milder, but it still comes in handy.

------
Palomides
If you like these, definitely check out _Hacker's Delight_ by Warren; it's
pretty readable and thorough. IMO a big advantage of at least skimming it is
that you develop an intuition for what sorts of problems can be solved in a
few bitwise ops.

------
glangdale
There are a few good pages/books on this ("Bit Twiddling Hacks", and Henry
Warren's Hacker's Delight).

On my wishlist: updated versions of these for modern architectures and SIMD.
Full disclosure: am Intel employee. That said, modern chips of all stripes
(x86, ARM64) can do some amazing things with low-cost SIMD and bit-
manipulation instructions. It would be interesting to see what smart folks
come up with using these. Obviously you can make the 'bit extract' function
_really boring_ with PEXT, but what cool tricks can you do when you start with
AVX2 and BMI1/2 as a baseline?
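For readers without BMI2 hardware handy, here is a portable sketch of what PEXT computes (my own reference implementation, not Intel's intrinsic):

```c
#include <stdint.h>

/* Portable model of PEXT: gather the bits of x selected by mask
   into the low bits of the result. With BMI2 this whole loop is a
   single _pext_u64 instruction. */
uint64_t pext64(uint64_t x, uint64_t mask) {
    uint64_t result = 0;
    for (uint64_t bit = 1; mask != 0; bit <<= 1) {
        if (x & mask & -mask)      /* test x at mask's lowest set bit */
            result |= bit;
        mask &= mask - 1;          /* clear that bit of the mask */
    }
    return result;
}
```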

------
zzo38computer
I have found many things like this, and have thought about how some of them
might be done with MMIX too. With MMIX you could, for example, use 2ADDU and
4ADDU to multiply by fifteen or by twelve, etc. Reversing bits (in this
case, of a 64-bit number) is also simple in MMIX:

      MOR $0,#0102040810204080,$0
      MOR $0,$0,#0102040810204080

(That constant has to be loaded into a register first, but the same register
can be used for both instructions.)
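For comparison, doing the same 64-bit bit reversal without MMIX's MOR takes the familiar swap-in-halves sequence; a sketch in portable C:

```c
#include <stdint.h>

/* Reverse all 64 bits by swapping adjacent groups of 1, 2, 4, 8,
   16 and finally 32 bits. */
uint64_t reverse64(uint64_t x) {
    x = ((x & 0x5555555555555555ULL) << 1)  | ((x >> 1)  & 0x5555555555555555ULL);
    x = ((x & 0x3333333333333333ULL) << 2)  | ((x >> 2)  & 0x3333333333333333ULL);
    x = ((x & 0x0F0F0F0F0F0F0F0FULL) << 4)  | ((x >> 4)  & 0x0F0F0F0F0F0F0F0FULL);
    x = ((x & 0x00FF00FF00FF00FFULL) << 8)  | ((x >> 8)  & 0x00FF00FF00FF00FFULL);
    x = ((x & 0x0000FFFF0000FFFFULL) << 16) | ((x >> 16) & 0x0000FFFF0000FFFFULL);
    return (x << 32) | (x >> 32);
}
```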

------
wsgeek
Do these actually run faster than the code produced by a good compiler?
Benchmarks?

~~~
deathanatos
> _Do these actually run faster than the code produced by a good compiler?
> Benchmarks?_

Some of these are absolutely suspect. Now, these are all architecture
specific: my results below are for gcc, targeting the Intel chip in the MBP in
my lap. (Of course, some of the article's code samples assume certain things
about an architecture, and are not portable.)

For example,

> _Absolute Value of a Float_

> _IEEE floating point uses an explicit sign bit, so the absolute value can be
> taken by a bitwise AND with the complement of the sign bit. For IA32 32-bit,
> the sign bit is an int value of 0x80000000, for IA32 64-bit, the sign bit is
> the long long int value 0x8000000000000000. Of course, if you prefer to use
> just int values, the IA32 64-bit sign bit is 0x80000000 at an int address
> offset of +1. For example:_
    
    
      double x;
    
      /* make x = abs(x) */
      *(((int *) &x) + 1) &= 0x7fffffff;
    

First, this is undefined behavior. The portable way to do this is fabs(),
which is part of the standard library. For me, calling fabs() compiles to a
single instruction:

    
    
      andps   LCPI1_0(%rip), %xmm0
    

where LCPI1_0 is

    
    
      .quad   9223372036854775807     ## 0x7fffffffffffffff
      .quad   9223372036854775807     ## 0x7fffffffffffffff
    

fabs() compiled down to essentially the same bitwise &, in assembly, that the
article is doing. The article's "advice" _also_ gets compiled to exactly the
same instruction & set of constants, which I actually find way more
interesting.
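If you do want the bit manipulation spelled out yourself without the undefined behavior, memcpy is the well-defined way to move the bits (a sketch; in my experience gcc and clang optimize the copies away and emit the same single AND as fabs()):

```c
#include <stdint.h>
#include <string.h>

/* abs(x) by clearing the IEEE 754 sign bit, without violating
   strict aliasing: memcpy is the well-defined type pun. */
double abs_via_bits(double x) {
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits &= 0x7FFFFFFFFFFFFFFFULL;   /* clear the sign bit */
    memcpy(&x, &bits, sizeof x);
    return x;
}
```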

> _Integer Constant Multiply_

> _Given an integer value x and an integer or floating point value y, the
> value of x*y can be computed efficiently using a sequence derived from the
> binary value of x. For example, if x is 5 (4 + 1):_
    
    
      y2 = y + y;
      y4 = y2 + y2;
      result = y + y4;
    

> _In the special case that y is an integer, this can be done with shifts:_
    
    
      y4 = (y << 2);
      result = y + y4;
    

Compilers will do this for you, today, automatically. In the example case of
multiplying by 5, gcc will do better than the article's code, by using lea
cleverly:

    
    
      leal    (%rdi,%rdi,4), %eax
    

gcc _also_ compiles the article's shift-then-add strategy to _exactly_ the
same leal.

The article mentions this optimization again, later,

> _Shift-and-Add Optimization_

> _Rather obviously, if an integer multiply can be implemented by a shift-and-
> add sequence, then a shift-and-add sequence can be implemented by
> multiplying by the appropriate constant... with some speedup on processors
> like the AMD Athlon. Unfortunately, GCC seems to believe constant multiplies
> should always be converted into shift-and-add sequences, so there is a
> problem in using this optimization in C source code._

The above leal instruction was the output of GCC, so this is obviously wrong.
One could argue the article is just old, but I'm pretty sure GCC has had the
"use leal to do constant multiplies" optimization for over a decade now.
Perhaps the article really is that old; if so, the advice no longer applies.

Write it the obvious, readable way first. If you decide to optimize at the
level this article is discussing, you should be reading the assembly to see if
your optimizations will have any effect.

But not all of them are bad. The "is this unsigned integer a power of 2" code
is correct, I believe, and if you understand binary, not that hard to
understand.
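For reference, the usual formulation of that check (my paraphrase of the idea, not necessarily the article's exact code):

```c
/* A power of two has exactly one bit set, so x & (x - 1) clears that
   bit and leaves zero. The x != 0 test excludes zero itself. */
int is_power_of_two(unsigned x) {
    return x != 0 && (x & (x - 1)) == 0;
}
```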

~~~
Veedrac
That's not to say these optimisations are no longer needed; you just need to
be smarter about applying them. If you're multiplying by 2^n - 1, for example,
keeping n around and writing a pair of shifts is not an optimisation a
compiler will be able to make, and even when doing the power inline (x * ((1
<< n) - 1)) most compilers aren't able to optimise it properly.

[https://godbolt.org/g/aHmMDL](https://godbolt.org/g/aHmMDL)
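Concretely, the hand-written form being described looks something like this shift-and-subtract (a sketch; the function name is mine):

```c
/* Multiply x by (2^n - 1) with a shift and a subtract, keeping n
   around instead of materializing the constant. Assumes n is small
   enough that x << n does not overflow. */
unsigned mul_by_pow2_minus_1(unsigned x, unsigned n) {
    return (x << n) - x;
}
```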

------
mikeash
This is one of my favorite web pages. There isn't much call to use these
techniques in most code, but understanding why they work helps expand one's
thinking.

------
microtherion
This seems to have some overlap with Henry Warren's _Hacker's Delight_:
[https://www.amazon.com/Hackers-Delight-2nd-Henry-Warren/dp/0...](https://www.amazon.com/Hackers-Delight-2nd-Henry-Warren/dp/0321842685)

