
Integer multiplication in time O(n log n) [pdf] - throwawaymath
https://hal.archives-ouvertes.fr/hal-02070778/document
======
lifthrasiir
For anyone wondering about the practicality of this algorithm: no, at least
for now. Constants aside, the algorithm currently requires an enormous cutoff
for the recursion (anything below it should be handled by traditional
algorithms):

> Let `n0 := 2^(d^12) >= 2^4096`, and suppose that we wish to multiply
> integers with n bits. For `n >= n0` we will describe a recursive algorithm
> that reduces the problem to a collection of multiplication problems of size
> roughly n^(1/d). We will show that this algorithm achieves `M(n) = O(n log
> n)`, provided that `d >= 1729`. (p. 33)

2^(1729^12) ~= 10^(2.15 * 10^38). The authors do note the possibility of a
much lower cutoff:

> In this section we outline a number of such modifications that together
> reduce the constant to `K = 8 + epsilon`, so that the modified algorithm
> achieves `M(n) = O(n log n)` for any `d >= 9` (rather than `d >= 1729`). (p.
> 39)

Here 2^(9^12) has about 85 billion decimal digits. Much smaller, but still too
big to be practical.
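Those digit counts are easy to reproduce, since the number of decimal digits of 2^k is roughly k·log10(2) (a quick back-of-the-envelope sketch):

```python
from math import log10

# Approximate decimal digit count of the cutoff n0 = 2^(d^12)
# for the paper's two values of d.
for d in (1729, 9):
    digits = d ** 12 * log10(2)
    print(f"d = {d:4d}: 2^(d^12) has about {digits:.3g} decimal digits")
```

This reproduces the ~10^(2.15 * 10^38) figure for d = 1729 and the ~85 billion figure for d = 9.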

~~~
wbhart
There are practical problems in pure mathematics that require multiplication
of numbers with 85 billion digits. For example, polynomial multiplication can
be reduced to integer multiplication, and certain problems, such as computing
congruent numbers or class numbers of quadratic number fields, can be reduced
to that. Naturally, the authors of the paper do not make any claims about the
algorithm being practical, though.
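The polynomial-to-integer reduction mentioned here is typically done via Kronecker substitution: pack the coefficients into one big integer with zero padding between them, perform a single integer multiplication, and unpack. A minimal sketch (the 32-bit slot width is an arbitrary choice for illustration; it only has to exceed the bit length of every coefficient of the product, and coefficients are assumed non-negative):

```python
def polymul_via_int(a, b, shift=32):
    """Multiply polynomials (coefficient lists, lowest degree first) by
    reducing to one big-integer multiplication (Kronecker substitution)."""
    pack = lambda p: sum(c << (i * shift) for i, c in enumerate(p))
    n = pack(a) * pack(b)                # the single big multiplication
    out = []
    while n:                             # unpack `shift`-bit slots
        out.append(n & ((1 << shift) - 1))
        n >>= shift
    return out

# (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2
print(polymul_via_int([1, 2], [3, 4]))  # [3, 10, 8]
```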

~~~
lifthrasiir
Yeah, the recent pi calculation amassed 31.4 _trillion_ decimal digits (about
10^14 binary digits), which would certainly involve multiplying integers of
comparable size, among other operations. I'm not sure it would help, though,
because the currently widespread algorithm (Schönhage-Strassen) is in
practice within two orders of magnitude of O(n log n) [1], and the previous
record holder (Fürer's algorithm) has been shown to be extremely infeasible.
Also note that the 85 billion digits do not relate to the constant itself: it
is the cutoff above which the new algorithm can make a difference; below it,
the algorithm reduces to the base case. I'm not really qualified to estimate
the expected constant of this algorithm, though, so it might actually prove
feasible.

[1] Its time complexity is O(n log n * log log n). The double logarithm grows
extremely slowly; log log 2^(10^14) ~= 32 (natural logarithms) for reference.
At this stage the constant matters much more than the asymptotic complexity
itself. y-cruncher, a record-setting pi computation program, actually has a
set of proprietary algorithms [2] optimized for modern hardware.

[2]
[http://www.numberworld.org/y-cruncher/internals/multiplicati...](http://www.numberworld.org/y-cruncher/internals/multiplication.html)
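For a sense of just how slowly that double logarithm grows across number sizes (a quick sketch; natural logarithms assumed):

```python
from math import log

# ln ln 2^n for an n-bit number, i.e. ln(n * ln 2).
for n in (10**10, 10**14, 10**15, 10**20):
    print(f"n = 1e{len(str(n)) - 1} bits: ln ln 2^n ≈ {log(n * log(2)):.1f}")
```

Even across ten orders of magnitude in n, the double logarithm only moves from the low twenties to the mid forties.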

~~~
dragontamer
> Yeah, the recent pi calculation amassed 31.4 trillion decimal digits (or
> about 10^15 binary digits) of the computation load

But that was done to be cool, not to solve any practical problem.

They needed 156 TiB of storage to hold that single number, and it happened to
be all solid-state storage (because the majority of the time was spent on
I/O, believe it or not!).

That is to say: a faster CPU wouldn't really make the computation much faster.
You need faster storage when you're dealing with numbers that big.

Double-precision floats are what people need most of the time. Crypto people
need 2048- or 4096-bit numbers, or something around those sizes, for
cryptographic purposes. I'm not sure large numbers are really used anywhere
else.

~~~
lifthrasiir
If you agree that mathematical problems can be considered practical, then many
prime-hunting computations are intended to resolve conjectures (Seventeen or
Bust [1] was a good example), and thus solve practical problems. I do of
course think that that pi calculation was nothing more than a GCP
advertisement ;-)

[1]
[https://en.wikipedia.org/wiki/Seventeen_or_Bust](https://en.wikipedia.org/wiki/Seventeen_or_Bust)

------
moab
This very recent paper shows a matching (conditional) lower bound of
Ω(n log n):
[https://arxiv.org/abs/1902.10935](https://arxiv.org/abs/1902.10935) (the
hardness comes from a conjecture in network coding).

~~~
davidivadavid
Very interesting. I was just wondering what the lower bound could be (beyond
Ω(n), which seems fairly obvious, even though as a hobbyist I'd be hard
pressed to prove even that).

------
burk96
I'm making my way through uni currently. I've taken a few CS classes that have
touched briefly on O notation and I am currently in Calc 2. I understand bits
and pieces of this but I am lost in most of the paper. What courses should I
be taking to better my understanding of papers of this sort? Or alternatively,
are there any online resources that could help me work my way through research
papers like these?

~~~
jacobolus
You can learn all of these ideas on your own (by e.g. going through textbooks
and doing a significant proportion of the exercises), but the guidance of a
course / expert is pretty helpful.

To understand computational complexity, take a course with a title like
“theory of computation” or similar.
[https://en.wikipedia.org/wiki/Computational_complexity_theor...](https://en.wikipedia.org/wiki/Computational_complexity_theory)

To understand linear maps, tensor products, etc., take a course (or 2–3
courses) in linear algebra. To understand various matrix decompositions, take
a course in numerical linear algebra.
[https://en.wikipedia.org/wiki/Tensor_product](https://en.wikipedia.org/wiki/Tensor_product)
[https://en.wikipedia.org/wiki/Cholesky_decomposition](https://en.wikipedia.org/wiki/Cholesky_decomposition)

To understand the FFT and convolutions, take a course in signal processing,
maybe after a course in ordinary differential equations.
[https://en.wikipedia.org/wiki/Fast_Fourier_transform](https://en.wikipedia.org/wiki/Fast_Fourier_transform)
[https://en.wikipedia.org/wiki/Convolution](https://en.wikipedia.org/wiki/Convolution)
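The link between the FFT and multiplication is convolution: multiplying two polynomials (or two integers, treating digits as coefficients) amounts to convolving their coefficient sequences, and the FFT computes that convolution in O(n log n) rather than O(n^2). A small illustration of the convolution itself, in the schoolbook O(n^2) form:

```python
def convolve(a, b):
    """Coefficient sequence of the product of two polynomials
    (coefficients listed lowest degree first)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj   # every cross term lands at degree i + j
    return out

# (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2
print(convolve([1, 2], [3, 4]))  # [3, 10, 8]
```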

To understand the theory of polynomial rings, take a course in abstract
algebra.
[https://en.wikipedia.org/wiki/Polynomial_ring](https://en.wikipedia.org/wiki/Polynomial_ring)

To understand numerical approximations and error propagation, take a course in
numerical analysis.
[https://en.wikipedia.org/wiki/Numerical_analysis](https://en.wikipedia.org/wiki/Numerical_analysis)

Courses in discrete math, algorithms, and complex analysis would also be
helpful.

~~~
fromthestart
>but the guidance of a course / expert is pretty helpful.

From my peers I get the feeling that this is underappreciated in the tech
world where many successful developers skipped college.

~~~
thethirdone
> From my peers I get the feeling that this is underappreciated in the tech
> world where many successful developers skipped college.

I agree that most developers probably underestimate the value of guidance
from a course/expert, and I feel that is largely due to the sheer amount of
resources available for learning programming.

I have found that as I get into more and more advanced topics the amount of
resources available has decreased and has made me want guidance a lot more.
Published papers really aren't a great way to learn advanced topics unless you
are already an expert in that field.

Mathematics gets hit harder by this than computer science does. I think this
is because it is incredibly interconnected and each topic is very deep; there
has been lots of time for the cutting edge to grow.

~~~
_delirium
This is something I really appreciated about some of my well-taught graduate
classes (of course, "well-taught" is a big caveat). In undergrad there's often
a sense that the professor is supposed to teach you the material. In advanced
topics, though, I often found it most useful if the professor actually skipped
most of the details, and served as more of a tour guide of the landscape of
material for me. Once I know that some specific topic can be learned from a
specific tutorial, book chapter, etc., then I can just go learn it on my own.
But what's hard is figuring out how to make your way through this sequence
entirely DIY, i.e. knowing what to read in what order, whether you're on the
right track, where to look if you get stuck, and having someone to answer big-
picture questions about how things fit together, why something that seems
obvious to me isn't true, etc., etc. I've found some of the best courses
helped with that.

A good textbook can also serve this role, but I've found good textbooks rarer
than well-taught graduate classes. A lot of textbooks don't really seem to be
designed for self-teaching the topic by reading it sequentially cover-to-
cover. Maybe because they're often intended to be used in courses, they often
have too much material, presented in a somewhat haphazard way, with an
expectation that a course instructor will pick and choose parts and supplement
it with lectures.

------
throwawaymath
The abstract of this paper is refreshingly succinct:

 _We present an algorithm that computes the product of two n-bit integers in
O(n log n) bit operations._

The result is excellent, and it resolves (in the affirmative) the Schönhage-
Strassen conjecture, first postulated in 1971.

~~~
hackcasual
It establishes the upper bound, but I don't believe this paper establishes a
lower bound, so the conjecture is still open (an algorithm asymptotically
faster than n log n would violate it).

~~~
rincebrain
Just to be sure I understand, doesn't the conjecture postulate that O(n log n)
is the lower bound, so _anything_ beneath n log n would violate it?

~~~
lovecg
Yes (technically, anything in small-o(n log n) would violate it). The
conjecture is that it's possible to do in O(n log n) time, which this paper
proves, and that it's not possible to do any faster (which is still an open
problem).

------
amichail
The second author of this paper created TeXmacs btw.

~~~
williamstein
And the first contributed much to the large integer and polynomial
multiplication code in SageMath...

------
hhmc
It's coincidentally pleasing that the cutoff `d >= 1729` is the Hardy-
Ramanujan number.
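For reference, 1729 is the smallest number expressible as a sum of two positive cubes in two different ways, which is easy to verify by brute force:

```python
# Both representations of 1729 as a sum of two positive cubes
# (12 is the largest cube root to check, since 12^3 = 1728).
ways = [(a, b) for a in range(1, 13) for b in range(a, 13) if a**3 + b**3 == 1729]
print(ways)  # [(1, 12), (9, 10)]
```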

------
tombert
Wow, I learned something today; I had always thought of integer multiplication
as a constant-time operation.

This is why I love computer science: no matter how much I think I know, I can
feel like an idiot a day later.

~~~
pragmaticpandy
If by integer you mean the four or so bytes that many languages use to
represent a (bounded) int, then it is indeed constant. This is often the
context in software engineering.

~~~
tombert
I'm somewhat interested in compiler and computation theory, which is why this
was a surprise to me.

------
nraynaud
Tangential question: how does single-cycle multiplication work in ARM Cortex
processors? Do they just run the ALU clock fast enough that the multiplication
finishes in one visible core cycle, or is there a binary trick?

~~~
_0ffh
The paper applies to integer multiplication on a Turing machine. Actual
digital circuits are not bound to that constraint.

~~~
wbhart
Is there a known example of something a digital computer can do that a Turing
machine cannot?

~~~
pmiller2
In terms of computational power, real computers (ignoring physics) are linear
bounded automata, which is a class of automata strictly weaker than Turing
machines. Throwing physics into the mix means they’re unreliable computing
devices.

~~~
kps
Real computers generally have external storage and/or network interfaces,
which makes them unbounded.

~~~
pmiller2
No, you’re still limited by the computational power of the universe. Since
there is a finite amount of energy and a finite number of particles, you’re
limited to a finite number of states. The entire universe is no more powerful
than a very large, but finite, linear bounded automaton.

------
soVeryTired
Are there any non-FFT multiplication algorithms that are faster than O(n^2)?

I wonder if you could try to find a systematic way of 'efficiently'
representing integers that is amenable to multiplication. Consider squaring
9999, for example. The standard algorithm decomposes 9999 as (9000 + 900 + 90
+ 9), then uses distributivity of multiplication, which is clearly n^2 in the
number of digits. However, you can achieve the same result more efficiently
via 9999 = (10000 - 1), which requires fewer multiplications to square. Can
efficient representations of integers be found systematically?
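The 9999 example can be checked directly: squaring via the identity (10^4 - 1)^2 = 10^8 - 2·10^4 + 1 needs only shifts and additions, instead of the sixteen digit-by-digit products of the standard algorithm:

```python
# Schoolbook squaring of 9999: 16 single-digit products.
# Via the identity (10^4 - 1)^2 = 10^8 - 2*10^4 + 1: shifts and adds only.
assert 9999**2 == 10**8 - 2 * 10**4 + 1
print(9999**2)  # 99980001
```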

~~~
kadoban
Yes, the first one found was Karatsuba multiplication. It's based on splitting
the numbers roughly in half and doing fewer smaller multiplications.

It's quite interesting to learn: the algorithm is both easy to understand and
surprising in its result (though less surprising if you know the FFT or this
result).
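Karatsuba's observation is that the two cross terms cost only one extra multiplication, since ad + bc = (a + b)(c + d) - ac - bd. A minimal sketch on decimal digits (real implementations split on machine words rather than digit strings):

```python
def karatsuba(x: int, y: int) -> int:
    """Multiply non-negative integers with three half-size recursive calls."""
    if x < 10 or y < 10:                     # base case: single-digit operand
        return x * y
    m = min(len(str(x)), len(str(y))) // 2   # split point, in decimal digits
    hx, lx = divmod(x, 10 ** m)              # x = hx * 10^m + lx
    hy, ly = divmod(y, 10 ** m)              # y = hy * 10^m + ly
    z0 = karatsuba(lx, ly)                   # low * low
    z2 = karatsuba(hx, hy)                   # high * high
    z1 = karatsuba(lx + hx, ly + hy) - z0 - z2   # cross terms, one multiply
    return z2 * 10 ** (2 * m) + z1 * 10 ** m + z0

print(karatsuba(1234, 5678))  # 7006652
```

Three recursive multiplications of half size give a running time of O(n^(log2 3)) ≈ O(n^1.585).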

~~~
soVeryTired
Nice, thanks (to you and the sibling comment, who got there at the same time)

------
SilasX
A little frustrated here: I was just trying to get background on how fast
integer multiplication is and what the fastest algorithms are (and whether
this paper is an advancement), but the Wikipedia article on the topic barely
mentions big-O complexity.

[https://en.wikipedia.org/wiki/Multiplication_algorithm](https://en.wikipedia.org/wiki/Multiplication_algorithm)

------
jgoodknight
Interesting... what are the chances something like this gets implemented in
silicon and actually speeds up computation, or is this purely of theoretical
interest?

~~~
_0ffh
For actual silicon, this does not seem like a relevant result. I think there
are already O(log n)-time multiplier circuits out there.

Edit: a typical imul will probably be no more than O(n), just to add something
less speculative.

~~~
wbhart
Big-O notation is asymptotic notation, so it is meaningless to describe an
imul as being O(n). Given that an imul does a fixed 64×64-bit multiplication,
it is almost a tautology to say it can be done in a constant number of
ops/cycles.

~~~
_0ffh
That was not what I was trying to say, wrong as I may still be.

I meant to talk about nxn bit multiplication. If you scale n then, given the
same basic architecture, you will also scale the circuit delay. When the delay
scales linearly with the number of bits, I'd call that architecture O(n) in
time. To me that seems to make intuitive sense, even though I might have that
wrong. The term imul I used merely as a short hand for integer multiplication.
I was not alluding to any specific architecture or width, there are plenty of
CPU architectures out there using that mnemonic.

