
Libdivide, optimized integer division (by ridiculous_fish) - yan
http://libdivide.com/
======
wbhart
There's a scheme described in this paper: <http://gmplib.org/~tege/division-
paper.pdf> which will work in practice for 2^b bit divisors (e.g. 2^b = 32 or
64). It's trivial to implement and amazingly doing a precomputation plus
division actually _beats the chip_. Normally these methods only beat the chip
when doing the division part after some precomputation, not when you include
the time for the precomputation too! This trick is used internally in the GMP
library and its fork MPIR (and various other places).

------
drallison
Most compilers already do this. For the theory behind it, see Hank Warren's
*Hacker's Delight", a book that ought to be in everyone's library.

~~~
tptacek
Third sentence on the page:

 _Compilers usually do this, but only when the divisor is known at compile
time. libdivide allows you to take advantage of it at runtime._

------
logophobia
I wonder if there are a lot of edge cases (large numbers that aren't easily
decomposed into shifts/additions) where the performance decreases a lot. If
these optimizations are cheap, why aren't they used everywhere?

~~~
nkurz
No, actually the performance is good for almost all divisors. I had the same
thought you did, so I got the SVN and ran the benchmark. If we are try trust
their timing, on a Core i7 the fast divisions were almost all around 4x the
speed of the native. There were surprisingly small differences between the 2^n
numbers and the harder ones --- the magic of long pipelines, I guess.

These optimizations aren't used much because they are specific to the divisor
used. For a single division, it would be faster just to let the silicon do the
work. They only help when you are repeatedly doing division by the same
denominator, which is rarely a bottleneck. But when it is (repeated integer
divisions but not hardcoded), this looks like a fine solution!

