
Even faster integer multiplication - fdej
http://arxiv.org/abs/1407.3360
======
sambeau
Can anyone out there explain this in pseudocode? I tried my best to understand
the paper but was thoroughly defeated by it.

~~~
fdej
All asymptotically fast algorithms for integer multiplication basically reduce
the problem to one or several polynomial multiplications.

Polynomial multiplication can be done in subquadratic time using an
evaluation-interpolation strategy: namely, to multiply two polynomials of
degree less than n, we evaluate both polynomials at 2n points, compute 2n
pointwise products, and recover the polynomial product by interpolation.

When one chooses roots of unity as the evaluation points, the evaluation and
interpolation steps reduce to discrete Fourier transforms (forward and
inverse, respectively), which can be done in O(n log n) arithmetic operations
by means of the fast Fourier transform (the pointwise products require O(n)
arithmetic operations).
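
As a rough illustration, here is a toy version in Python (mine, not from the
paper), using numpy's floating-point FFT; a real implementation has to
control the numerical precision carefully:

    import numpy as np

    def poly_mul_fft(p, q):
        # Evaluation-interpolation: evaluate both polynomials at enough
        # roots of unity (forward FFT), multiply pointwise, and
        # interpolate the product back (inverse FFT).
        m = len(p) + len(q) - 1            # number of product coefficients
        size = 1 << (m - 1).bit_length()   # round up to a power of two
        P = np.fft.fft(p, size)            # zero-pads and evaluates p
        Q = np.fft.fft(q, size)
        return np.rint(np.fft.ifft(P * Q).real[:m]).astype(np.int64)

    # (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2
    print(poly_mul_fft([1, 2], [3, 4]))    # [ 3 10  8]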

Now, there are a couple of difficulties. How do you decompose the integers
into polynomials? Is it better to use short polynomials with large
coefficients, or long polynomials with small coefficients? Since the
coefficients asymptotically grow with the length, you have to account for the
bit complexity of the arithmetic operations in the FFTs and pointwise products
(here you get recursive integer multiplications).

Which FFT algorithm should you use (there are many to choose from)?

Also, the ring of integers does not have enough roots of unity to do FFTs, so
you have to lift the computation to a larger ring, such as the field of
complex numbers (with numerical approximations), an appropriate finite field,
or (as in the Schönhage-Strassen algorithm) a polynomial extension ring. Which
choice gives the best complexity?
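
To make the decomposition concrete, here is a toy answer to the first of
those questions (my own sketch, with an arbitrary chunk size, building on
poly_mul_fft above): split each integer into base-2^16 chunks, so the chunks
become coefficients of a polynomial that evaluates to the integer at
x = 2^16, multiply the polynomials, then propagate carries.

    def int_mul_via_poly(a, b, chunk_bits=16):
        # Split into base-2^chunk_bits chunks: longish polynomials with
        # small coefficients. Floating-point FFT error limits how large
        # the inputs can get; real implementations choose the chunk size
        # so the working precision is provably sufficient.
        base = 1 << chunk_bits
        def chunks(n):
            out = []
            while n:
                out.append(n & (base - 1))
                n >>= chunk_bits
            return out or [0]
        coeffs = poly_mul_fft(chunks(a), chunks(b))
        result = 0
        for shift, c in enumerate(coeffs):            # evaluate at x = base,
            result += int(c) << (shift * chunk_bits)  # i.e. propagate carries
        return result

    print(int_mul_via_poly(123456789, 987654321) == 123456789 * 987654321)  # True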

The improvements to integer multiplication published in the last few decades
have basically been tightened analyses of such tradeoffs. I have not read this
paper in detail yet, but it seems to provide a clear description in the
introduction. Quote:

The idea of the new algorithm is remarkably simple. Given two n-bit integers,
we split them into chunks of exponentially smaller size, say around log n
bits, and thus reduce to the problem of multiplying integer polynomials of
degree O(n/log n) with coefficients of bit size O(log n). We multiply the
polynomials using discrete Fourier transforms (DFTs) over C, with a working
precision of O(log n) bits. To compute the DFTs, we decompose them into "short
transforms" of exponentially smaller length, say length around log n, using
the Cooley-Tukey method. We then use Bluestein's chirp transform to convert
each short transform into a polynomial multiplication problem over C, and
finally convert back to integer multiplication via Kronecker substitution.
These much smaller integer multiplications are handled recursively.
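
For the Bluestein step specifically, here is a toy sketch of the chirp
transform (mine; it does the resulting convolution with ordinary
power-of-two FFTs, whereas the paper converts it back into an integer
multiplication):

    def bluestein_dft(x):
        # Rewrite nk = (n^2 + k^2 - (k - n)^2) / 2, turning a length-N DFT
        # into a convolution with the chirp e^{pi i n^2 / N}.
        x = np.asarray(x, dtype=complex)
        N = len(x)
        n = np.arange(N)
        chirp = np.exp(-1j * np.pi * n * n / N)
        a = x * chirp
        M = 1 << (2 * N - 1).bit_length()   # power-of-two convolution length
        b = np.zeros(M, dtype=complex)
        b[:N] = 1 / chirp                   # chirp at lags 0 .. N-1
        b[M - N + 1:] = b[1:N][::-1]        # chirp at lags -(N-1) .. -1, wrapped
        conv = np.fft.ifft(np.fft.fft(a, M) * np.fft.fft(b))
        return chirp * conv[:N]

    # agrees with numpy for awkward (non-power-of-two) lengths:
    x = np.random.rand(13)
    print(np.allclose(bluestein_dft(x), np.fft.fft(x)))   # True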

~~~
tobz
When I read "integer multiplication", I think of "123 * 456". Is that what is
actually being said here? If so, why would you do anything but "123 * 456" to
arrive at the actual answer?

I'm sure there's a reason, but reading your explanation sounds like the
solution to something much harder than just multiplying two whole numbers.

~~~
oakwhiz
These types of multiplication algorithms can actually compute the result in
fewer steps than a naive algorithm when dealing with very large integers,
such as the integers used in cryptography.
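
Karatsuba multiplication is the simplest example: it replaces one n-bit
product with three products of roughly n/2 bits (instead of the obvious
four), giving about O(n^1.58) instead of O(n^2). A minimal sketch (the
cutoff of 2^32 is arbitrary):

    def karatsuba(x, y):
        # xy = ac * 2^(2m) + ((a+b)(c+d) - ac - bd) * 2^m + bd
        if x < 2**32 or y < 2**32:          # small: fall back to builtin
            return x * y
        m = max(x.bit_length(), y.bit_length()) // 2
        a, b = x >> m, x & ((1 << m) - 1)   # x = a * 2^m + b
        c, d = y >> m, y & ((1 << m) - 1)   # y = c * 2^m + d
        ac = karatsuba(a, c)
        bd = karatsuba(b, d)
        mid = karatsuba(a + b, c + d) - ac - bd   # = ad + bc
        return (ac << (2 * m)) + (mid << m) + bd

    u, v = 2**100 + 12345, 2**90 + 67890
    print(karatsuba(u, v) == u * v)   # True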

~~~
zxcdw
To add to that, there might be applications in, for example, bignum or other
arbitrary-precision math libraries/implementations.

------
taeric
Having just gotten through the Knuth volume on this, I'm curious how this
compares to many of the techniques he touches on.

In particular, I have to admit my mind was blown when he went over the
"balanced ternary" method of representing numbers where multiplication was
closer to addition.
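
For anyone curious: balanced ternary digits are -1, 0, or 1, so every
partial product in long multiplication is just a shifted, possibly negated,
copy of the other operand; no digit-times-digit table is ever needed. A toy
sketch (mine, not Knuth's):

    def bt_digits(n):
        # balanced ternary digits of n, least significant first
        d = []
        while n:
            r = (n + 1) % 3 - 1        # remainder in {-1, 0, 1}
            d.append(r)
            n = (n - r) // 3
        return d or [0]

    def bt_value(d):
        return sum(t * 3**i for i, t in enumerate(d))

    def bt_mul(a, b):
        # each partial product is a shift and possibly a negation
        da, db = bt_digits(a), bt_digits(b)
        partials = [[0] * i + [t * x for x in da]
                    for i, t in enumerate(db) if t]
        # (summed via integer values for brevity; real balanced-ternary
        # arithmetic would add the digit lists with carries)
        return sum(bt_value(p) for p in partials)

    print(bt_mul(123, 456) == 123 * 456)   # True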

Granted, I'm a complete outsider to this field, so I'm sure most of these
techniques would blow my mind. Or, more likely, just go so far over what I'm
used to seeing that it would be incomprehensible. Definitely exciting to grasp
at, though.

Edit: I'm also more than a little curious about (and jealous of) folks who
have these sorts of concerns in what they do. The section on fast evaluation
of polynomials, while ridiculously comprehensive and awesome, was so far from
my field of work that after a while it felt almost like I wasn't a programmer.

Edit2: I should say I did skim this paper, and I see they make references to
Knuth's tightening of the bounds. So, I should clarify that I'm more
interested in knowing roughly how this compares to all of the methods given in
3rd edition. And I'm mostly interested in whether this sees any usage and by
whom.

------
owlish
Undergrad here, what's the best way to keep up with new papers in a field
(e.g. distributed computing or machine learning)? What sources/methods are
good for making the most of your reading time?

~~~
petercooper
For the areas they cover, being a member of the relevant IEEE group can be
useful (I guess ACM probably does something similar but I'm not a member). For
one of the groups I'm in, I receive a pamphlet every few months that
summarizes a ton of papers in the field, along with a CD containing the full
content.
(It's all online as well, but I know I'd never remember to actively check
there.)

You may also find the relevant section of
[http://arxiv.org/](http://arxiv.org/) useful. For example,
[http://arxiv.org/list/cs.SE/recent](http://arxiv.org/list/cs.SE/recent)

------
TazeTSchnitzel
Is this something users of GNU MP will get to see the benefits of?

~~~
jekub
The improvement is only theoretical. Fürer's algorithm is faster than
Schönhage-Strassen only for impractically large numbers, and as far as I know
no arbitrary-precision toolkit uses it.

Fürer's algorithm changes the log log n factor of the complexity to
2^(O(log* n)), which changes nothing in practice for workable numbers, but it
also adds a lot to the constant factor, which is not counted by big-O
notation yet is very important in practice.
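
For reference, if I'm reading the abstracts right, the bounds are:

    Schönhage-Strassen:  O(n log n log log n)
    Fürer:               O(n log n 2^(O(log* n)))
    this paper:          O(n log n 8^(log* n))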

~~~
userbinator
That reminds me of the asymptotically fastest known (in terms of number of
multiplications) matrix-multiplication algorithm, where the constant factor
is so huge that it's of no practical use.

Even the more practical ones, e.g.
[http://en.wikipedia.org/wiki/Strassen_algorithm](http://en.wikipedia.org/wiki/Strassen_algorithm),
which trades 8 multiplications and 4 additions for 7 multiplications and _18_
additions, won't be all that much faster on modern architectures, where
multiplication is often surprisingly fast (costing almost the same as
addition) and where how much data is read/written also matters (memory
bandwidth and latency).
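
For reference, a one-level sketch of Strassen's identities (using numpy),
where the 7 block products and the extra additions are directly visible:

    import numpy as np

    def strassen_step(A, B):
        # one level of Strassen on even-sized square matrices:
        # 7 block multiplications instead of 8, but 18 block adds/subs
        n = A.shape[0] // 2
        A11, A12, A21, A22 = A[:n, :n], A[:n, n:], A[n:, :n], A[n:, n:]
        B11, B12, B21, B22 = B[:n, :n], B[:n, n:], B[n:, :n], B[n:, n:]
        M1 = (A11 + A22) @ (B11 + B22)
        M2 = (A21 + A22) @ B11
        M3 = A11 @ (B12 - B22)
        M4 = A22 @ (B21 - B11)
        M5 = (A11 + A12) @ B22
        M6 = (A21 - A11) @ (B11 + B12)
        M7 = (A12 - A22) @ (B21 + B22)
        C = np.empty_like(A)
        C[:n, :n] = M1 + M4 - M5 + M7
        C[:n, n:] = M3 + M5
        C[n:, :n] = M2 + M4
        C[n:, n:] = M1 - M2 + M3 + M6
        return C

    A, B = np.random.rand(4, 4), np.random.rand(4, 4)
    print(np.allclose(strassen_step(A, B), A @ B))   # True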

~~~
arghbleargh
Those are actually matrix multiplications and additions, so it can make a
significant difference (e.g. it's way faster to add two 50x50 matrices than to
multiply them).

