

Toom–Cook multiplication - ahanjura
http://en.wikipedia.org/wiki/Toom%E2%80%93Cook_multiplication

======
psykotic
Daniel J. Bernstein has a nice survey paper covering all the major multidigit
multiplication algorithms:

<http://cr.yp.to/papers/m3.pdf>

What I like about it beyond the comprehensive coverage is that it explains the
mathematical structure that underlies each of the algorithms. Unfortunately
for HN readers, the paper's intended audience is mathematicians, but with a
bit of background in algebra you might still be able to glean some useful
insights.

As an example, I'll try to describe the Karatsuba trick.

We want to multiply two linear polynomials p = a + bx and q = c + dx. That is,
we want to calculate the coefficients e, f, g in (a + bx)(c + dx) = e + fx +
gx^2 in terms of a, b, c, d. The standard algorithm is e = ac, f = ad + bc, g
= bd. This has 4 multiplications and 1 addition.

Here's the Karatsuba trick as usually presented. The word 'trick' is apt
because this makes it seem like pulling a rabbit from a magician's hat. Let u
= (a+b)(c+d) = ac + ad + bc + bd. Then f = u - ac - bd = u - e - g. Thus
Karatsuba's trick calculates u = (a+b)(c+d), e = ac, g = bd, f = u-e-g. This
has 3 multiplications and 4 additions. We've saved 1 multiplication at the
expense of 3 extra additions.
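To make the count concrete, here's a small Python sketch of both versions for linear polynomials (names are mine, not from the paper):

```python
def mul_linear_standard(a, b, c, d):
    # (a + b x)(c + d x) = e + f x + g x^2
    # 4 multiplications, 1 addition.
    e = a * c
    f = a * d + b * c
    g = b * d
    return e, f, g

def mul_linear_karatsuba(a, b, c, d):
    # 3 multiplications, 4 additions (counting subtractions).
    e = a * c
    g = b * d
    u = (a + b) * (c + d)   # = ac + ad + bc + bd
    f = u - e - g           # = ad + bc
    return e, f, g

# (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2
print(mul_linear_standard(1, 2, 3, 4))   # (3, 10, 8)
print(mul_linear_karatsuba(1, 2, 3, 4))  # (3, 10, 8)
```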

If we now apply Karatsuba's trick recursively in divide-and-conquer fashion to
the left and right halves of higher-degree polynomials, we get an algorithm
that is faster than the standard algorithm even if we assume that scalar
additions and multiplications have the same cost. The standard algorithm has
cost O(n^2), where n is the degree of the polynomials, and Karatsuba's
algorithm has cost O(n^(lg 3)), which is approximately O(n^1.585).
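The same trick works on integers, splitting each number into high and low halves. A compact (untuned) Python sketch of the recursion for nonnegative integers, splitting at a bit boundary:

```python
def karatsuba(x, y):
    # Recursive Karatsuba multiplication of nonnegative integers.
    # Real libraries switch to schoolbook below a tuned threshold;
    # here the base case is just "either operand is a single digit".
    if x < 10 or y < 10:
        return x * y
    n = max(x.bit_length(), y.bit_length()) // 2
    # Split around bit position n: x = b*2^n + a, y = d*2^n + c.
    a, b = x & ((1 << n) - 1), x >> n
    c, d = y & ((1 << n) - 1), y >> n
    e = karatsuba(a, c)          # low product
    g = karatsuba(b, d)          # high product
    u = karatsuba(a + b, c + d)  # combined product
    f = u - e - g                # middle coefficient, as before
    return (g << (2 * n)) + (f << n) + e

print(karatsuba(123456789, 987654321) == 123456789 * 987654321)  # True
```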

So what's the structure underlying Karatsuba's trick? Well, you might have
noticed that u is p q evaluated at x = 1. You can evaluate a product of
polynomials without multiplying them out first because evaluation is a
homomorphism, so u = (p q)(1) = p(1) q(1).

Evaluation at a single point is a lossy (non-injective) mapping, so we will
have to evaluate the product p q at three different points (since the product
is quadratic and hence has three coefficients) to recover it uniquely. We've
already evaluated at x = 1. Two other obvious candidate points for cheap
evaluation are x = 0 and x = infinity. Evaluating at x = 0 just gives the
constant term, so (p q)(0) = a c. Evaluating at x = infinity (make the
substitution w = 1/x, clear denominators and evaluate at w = 0) gives the
highest-degree term, so (p q)(infinity) = b d.

Now that we've evaluated the product at three points, all we have to do is
interpolate between them with the Lagrange formula to recover the product.

That's the conceptual, geometric derivation of Karatsuba's trick.
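In code, the three evaluations and the interpolation step reproduce exactly the formulas from the "trick" presentation (a sketch; function and variable names are mine):

```python
def karatsuba_by_evaluation(p, q):
    # p = (a, b) represents a + b x; likewise q = (c, d).
    a, b = p
    c, d = q
    r0 = a * c              # (p q)(0): product of the constant terms
    r1 = (a + b) * (c + d)  # (p q)(1) = p(1) q(1): evaluation is a homomorphism
    rinf = b * d            # (p q)(infinity): product of the leading terms
    # Interpolate e + f x + g x^2 from the three values.
    e, g = r0, rinf
    f = r1 - e - g
    return (e, f, g)

print(karatsuba_by_evaluation((1, 2), (3, 4)))  # (3, 10, 8)
```

The interpolation here is trivial because two of the three sample points hand us coefficients directly; only f needs the Lagrange-style correction.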

This evaluate-and-interpolate approach is also what underlies FFT-based
multiplication algorithms. The n-point FFT efficiently evaluates a polynomial
at the nth roots of unity, which are the vertices of the regular n-gon on the
unit circle in the complex plane. The inverse FFT efficiently interpolates an
(n-1)-degree polynomial from its values at the nth roots of unity. The usual
way of looking at FFT-based multiplication is via the convolution theorem
(polynomial multiplication is convolution of the coefficient sequences). That
may be more direct, but I like the unifying character of the evaluate-and-
interpolate perspective.
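A minimal sketch of that evaluate-and-interpolate scheme with a textbook recursive radix-2 FFT (not how production libraries do it; they worry about floating-point error or use number-theoretic transforms):

```python
import cmath

def fft(a, invert=False):
    # Recursive radix-2 FFT; len(a) must be a power of two.
    n = len(a)
    if n == 1:
        return a[:]
    sign = 1 if invert else -1
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    out = [0] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out

def poly_mul_fft(p, q):
    # Evaluate both polynomials at the nth roots of unity, multiply
    # pointwise, then interpolate with the inverse FFT.
    n = 1
    while n < len(p) + len(q) - 1:
        n *= 2
    fp = fft(p + [0] * (n - len(p)))
    fq = fft(q + [0] * (n - len(q)))
    fr = [x * y for x, y in zip(fp, fq)]
    r = fft(fr, invert=True)
    return [round((x / n).real) for x in r[:len(p) + len(q) - 1]]

print(poly_mul_fft([1, 2], [3, 4]))  # [3, 10, 8]
```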

Rather than use just evaluation, you can apply more general homomorphisms.
That's how you get Toom's trick. If you've taken algebra, you'll recall that
evaluation at a point t is just the quotient homomorphism for the maximal
ideal (x - t).

If you find this intriguing, I suggest you study djb's paper.

~~~
dvdhsu
> This has 3 multiplications and 4 additions. We've saved 1 multiplication at
> the expense of 3 extra additions.

Strassen's algorithm, for multiplying matrices, uses a similar "trick". Naive
matrix multiplication uses 8 multiplications for a (2x2) * (2x2) product, or
n^3 multiplications in general. Strassen's lowers this to 7 multiplications
per 2x2 block, using extra additions instead, thus achieving O(n^(log2 7)) ~
O(n^2.807).

<http://en.wikipedia.org/wiki/Strassen_algorithm>
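For concreteness, the seven products for the 2x2 base case (the standard formulas from that page), sketched in Python on scalars; in the recursive algorithm the entries would be submatrices:

```python
def strassen_2x2(A, B):
    # Strassen's seven products for a 2x2 matrix multiply.
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    # Recombine: 7 multiplications, 18 additions/subtractions.
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4, m1 - m2 + m3 + m6]]

print(strassen_2x2([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```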

~~~
psykotic
Yeah. There should be a similar conceptual explanation for Strassen's
algorithm, but I haven't seen one. The 2x2 block matrix decomposition
corresponds to viewing a (2n)x(2n) matrix as a 2x2 matrix over the ring of nxn
matrices. There isn't any notion of evaluation for a 2x2 matrix ring, but we
might look for other natural ring homomorphisms. The determinant and trace
seem like obvious candidates. Unfortunately, the determinant respects
multiplication but not addition, and the trace respects addition but not
multiplication. Any ideas?

Edit: I've asked djb and we'll see if he responds.

------
carterschonwald
For an even faster algorithm, <http://arxiv.org/pdf/0801.1416v3.pdf> and its
prequel <http://www.cse.psu.edu/~furer/Papers/mult.pdf> both beat strassen.

~~~
wbhart
I don't think this is terribly practical. Just as SSA (Schonhage-Strassen)
wasn't efficient until the advent of modern computers and for integers of
about millions of bits, so Furer's algorithm probably isn't efficient until
integers are so large that their _number_ of bits takes millions of bits to
write down. I won't say never, but it isn't going to be practical any century
soon.

The multimodular version is likewise pretty useless in practice.

~~~
carterschonwald
in practice, certainly! The devil's in those constant factors etc :)

------
jheriko
this is not exactly news... there are fft methods which are even better for
very large numbers and karatsuba is adequate for the most common ranges.

~~~
dalke
The Karatsuba algorithm page says "The Toom–Cook algorithm is a faster
generalization of [the Karatsuba algorithm]", and the Toom-Cook page links to
the GMP document, which says:

"Toom-3 is asymptotically O(N^1.465), the exponent being log(5)/log(3),
representing 5 recursive multiplies of 1/3 the original size each. This is an
improvement over Karatsuba at O(N^1.585), though Toom does more work in the
evaluation and interpolation and so it only realizes its advantage above a
certain size."
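Those exponents come straight from the recurrence "k recursive multiplies of 1/m the size gives O(N^(log k / log m))", which is easy to check:

```python
import math

# Toom-3: 5 recursive multiplies of 1/3 the size.
print(math.log(5) / math.log(3))  # ~1.465

# Karatsuba (= Toom-2): 3 recursive multiplies of 1/2 the size.
print(math.log(3) / math.log(2))  # ~1.585
```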

Reading <http://gmplib.org/manual/Multiplication-Algorithms.html>, it appears
that GMP implements: Karatsuba, Toom-3, Toom-4, Toom-6.5, Toom-8.5, FFT -based
multiplication methods, which I interpret to mean that Toom-Cook is useful for
ranges between where Karatsuba and FFT are most useful.

~~~
kurlberg
The answer to "Who's fastest when?" is a bit complicated - you can find a nice
picture of the "champion algorithm" for different ranges at

<http://gmplib.org/devel/log.i7.1024.png>

More context and explanation can be found at: <http://gmplib.org/devel/>

Short summary: Toom-Cook is nice because you have many parameters to play
with.

~~~
dalke
Pretty! Since I never manually multiply n>10 digit numbers together, it looks
like I can keep doing that the schoolbook way.

