
Mathematicians discover a perfect way to multiply (2019) - mmphosis
https://www.quantamagazine.org/mathematicians-discover-the-perfect-way-to-multiply-20190411/
======
dreamcompiler
BTW you may have noticed that the pencil-and-paper multiplication method we all
learned in third grade is digit-by-digit _convolution_, which is an O(n^2)
operation. But what does convolution in the time domain become when we switch
to the frequency domain? It becomes pointwise (coefficient-by-coefficient)
multiplication, which is O(n). So now you see why the FFT is essential here.
The extra factors of log n and log(log n) come from computing the FFT itself in
the forward and inverse directions.

So fast multiplication is about looking at numbers as _signals_ and doing
signal processing on them.
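
Here's a toy sketch of that idea in Python/NumPy. This is not the new algorithm, just the classic FFT recipe: treat the digit strings as signals, multiply their spectra pointwise, transform back, and propagate carries. (Real implementations worry about floating-point precision and use number-theoretic transforms instead.)

    import numpy as np

    def fft_multiply(a: int, b: int) -> int:
        # Digits as signals, least-significant digit first.
        xs = [int(d) for d in str(a)][::-1]
        ys = [int(d) for d in str(b)][::-1]
        n = len(xs) + len(ys)              # the product has at most n digits
        # Convolution theorem: FFT both signals, multiply pointwise, inverse FFT.
        coeffs = np.fft.irfft(np.fft.rfft(xs, n) * np.fft.rfft(ys, n), n)
        coeffs = np.rint(coeffs).astype(np.int64)
        # Propagate carries to turn the convolution back into base-10 digits.
        result, carry = 0, 0
        for i, c in enumerate(coeffs):
            carry, digit = divmod(int(c) + carry, 10)
            result += digit * 10 ** i
        return result + carry * 10 ** n

    assert fft_multiply(314159, 271828) == 314159 * 271828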

~~~
photonemitter
There’s a theorem known as the Convolution Theorem:
[https://en.m.wikipedia.org/wiki/Convolution_theorem](https://en.m.wikipedia.org/wiki/Convolution_theorem)

Much used in simplifying kernel operations and convolutions (and some other
nifty tricks.)

Another useful idea is that in the Fourier domain we are working with
exponentials (a Fourier series is a sum of terms $c_n e^{inx}$), and when
multiplying exponentials the exponents add: $e^{ix} \cdot e^{iy} = e^{i(x+y)}$.

Moreover, this is usually coupled with integrating over a periodic signal
(i.e. from 0 to 2pi), and unless the product works out to $e^{i \cdot 0} = 1$,
the integral becomes 0.
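
A quick numerical check of that last point in Python, with a discrete sum standing in for the integral over one period:

    import numpy as np

    # Integrate e^{imx} * e^{inx} = e^{i(m+n)x} over one period [0, 2*pi).
    x = np.linspace(0.0, 2.0 * np.pi, 100_000, endpoint=False)
    dx = x[1] - x[0]

    def integral(m: int, n: int) -> complex:
        return np.sum(np.exp(1j * m * x) * np.exp(1j * n * x)) * dx

    print(abs(integral(3, -3)))  # ~6.2832 = 2*pi, since the exponents cancel to e^{i*0} = 1
    print(abs(integral(3, 5)))   # ~0, a whole number of oscillations averages out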

------
albertzeyer
> Regardless of what computers look like in the future, Harvey and van der
> Hoeven’s algorithm will still be the most efficient way to multiply.

This is only true if you look just at the big-O notation. However, if you also
look at the constants involved, i.e. C * n * log(n) + C2 as an upper bound on
the number of operations for the calculation, there can be faster algorithms
with lower constants C and C2, and those algorithms will also be of practical
interest.

Also, as I understand it, it was not actually proven (yet) that O(n log(n)) is
the fastest possible class.

You could also study the average runtime instead of the worst-case upper
bound. There are cases where the multiplication can be done much more
efficiently than O(n log(n)). A clever algorithm could detect many such cases
and maybe end up in a strictly better average runtime class; at least I'm not
aware of a proof that this is impossible.

Similarly, sorting algorithms are still being optimized, even though the
worst-case complexity of comparison sorting cannot beat O(n log(n)) (ignoring
integer sorting for now). Average and best-case runtime complexity are still
relevant, as are memory usage, other properties, and of course the constants,
which matter in practice.

A big enough C or C2 basically turns these into the galactic algorithms
mentioned in another thread here.

~~~
eru
Another thing to consider: sometimes you don't need to multiply precisely. If
you can live with small errors, further speedups are possible.

------
mellosouls
While this approach may have practical uses, the "sufficiently large numbers"
bound got me reading about Galactic Algorithms...

[https://en.wikipedia.org/wiki/Galactic_algorithm](https://en.wikipedia.org/wiki/Galactic_algorithm)

~~~
adfsgkinio
I found this line amusing:

> This means it will not reach its stated efficiency until the numbers have at
> least 2^1729^12 bits (at least 10^10^38 digits), which is vastly larger than
> the number of atoms in the known universe.

"Vastly larger" is one of the biggest understatements I've ever heard. It's
roughly 10^10^38 times larger.

I can't think of a better way to write that line. I just found it funny.

~~~
wetmore
Neat seeing 1729 show up there.

[https://en.wikipedia.org/wiki/1729_(number)](https://en.wikipedia.org/wiki/1729_\(number\))

~~~
perl4ever
Wikipedia also says that 1729 was the year of the largest/brightest comet ever
seen.

[https://en.wikipedia.org/wiki/Comet_of_1729](https://en.wikipedia.org/wiki/Comet_of_1729)

------
rendall
"For millennia it was widely assumed that there was no faster way to multiply.
Then in 1960, the 23-year-old Russian mathematician Anatoly Karatsuba... found
it."

I have a bizarre mental tic wherein I will often read a story like this and
then fantasize about going back in time to just before the discovery, and
present it myself, in essence stealing it. I'll get quite far along in my
daydream before the absurdity collapses the narrative.

Are there others of my kind, here?

~~~
Agentlien
That's really funny. I love the weird fantasies we cook up.

I have a variation of this, where I travel back in time to before some long
dead person did what made them famous and tell them all about their
discoveries and all the wonders that came from it.

It's usually accompanied by a hint that they won't remember our conversation,
so it won't affect history, but they'll enjoy marveling at it for a short
while.

~~~
Xelbair
I usually have a fantasy where I travel back in time to the Middle Ages or
ancient times and try to:

* convince people I am not a devil worshipper/magician/time traveler

* explain how some concept works.

Imagine trying to explain electricity to a Pope from 1000 AD, or mathematical
functions and integration/derivatives to Greek philosophers, etc.

------
mikorym
Title, paper version:

> Integer multiplication in time O(n log n)

[https://hal.archives-ouvertes.fr/hal-02070778/document](https://hal.archives-ouvertes.fr/hal-02070778/document)

~~~
andyljones
For anyone else just looking for an outline of the new algorithm: last two
paragraphs of p4, first two paragraphs of p5.

Coming from a few abstract algebra classes in college many years ago, I find
all the words and notation familiar, but I am a long way from being able to
follow it.

------
nurettin
Previously on HN
[https://news.ycombinator.com/item?id=19474280](https://news.ycombinator.com/item?id=19474280)

~~~
dang
Also
[https://news.ycombinator.com/item?id=19672835](https://news.ycombinator.com/item?id=19672835)

and
[https://news.ycombinator.com/item?id=19644374](https://news.ycombinator.com/item?id=19644374)

------
mumbisChungo
I think I'm misunderstanding something about the article. In what way is this
method perfect, as is mentioned multiple times? Is it not just an incremental
improvement over standard practice?

~~~
hinkley
In computational complexity, we have some proofs that a certain cost is the
absolute bare minimum that an optimal solution must expend. We have the same
for some mechanical systems too.

If you get the order down to the proven limit, especially for the first time,
people sometimes use the word 'perfect', even if there are potentially other
solutions with lower constant overhead, or lower practical overhead (see also
Timsort, which uses insertion sort for leaf operations).

> Second, in that same paper Schönhage and Strassen conjectured that there
> should be an even faster algorithm than the one they found — a method that
> needs only n × log n single-digit operations — and that such an algorithm
> would be the fastest possible.

Is this saying that there is a proof in their paper that n log n is the
cheapest? If so, it certainly beats around the bush.

~~~
remcob
A conjecture is a proposition that is suspected to be true. So there would be
no proof in the paper, but likely some arguments why the authors expect it to
be true.

~~~
hinkley
If a scientist were writing the article about the paper, yes. But as everyone
has been quick to point out, that is definitely not the case.

------
JakeStone
Gotta admit, I started chuckling at “We use [the fast Fourier transform] in a
much more violent way..." I've known some math geeks in my day, and this
sounded like them.

------
amelius
> Two decades ago, computers performed addition much faster than
> multiplication. The speed gap between multiplication and addition has
> narrowed considerably over the past 20 years to the point where
> multiplication can be even faster than addition in some chip architectures.
> With some hardware, “you could actually do addition faster by telling the
> computer to do a multiplication problem, which is just insane,” Harvey said.

Is this true for popular architectures such as Intel and ARM?

~~~
corsix
An Intel Haswell chip, when doing floating point vector operations, can
dispatch one addition per cycle, or two fused-multiply-adds. The latency is
worse for the FMA (5 cycles versus 3 cycles), but if you've got enough parallel
accumulations to hide that latency, then using FMAs to do your additions will
be faster than using the dedicated add instruction.

This is a consequence of Intel optimising for FLOPs (or - equivalently - for
matrix multiplication), as an FMA counts as two FLOPs, versus addition
counting as just one.

~~~
gowld
Is that just because multiplication is so commonly needed that there is double
hardware allocated for FMA?

Like, if I had a special purpose adding machine and a special purpose multiply
machine, and I bought a second multiply machine, then I could multiply large
numbers faster than adding, but that's not insane, that's resource allocation.

~~~
jcranmer
A floating point add works roughly like this:

0. Extract the mantissas and exponents from the inputs [basically free in a HW
implementation].

1. Combine the two exponents and shift the mantissas so that they share the
same scale.

2. Add the two mantissas together.

3. Normalize the result back into a floating-point number.

Now look at the process for a floating-point multiply operation:

0. Extract the mantissas and exponents from the inputs [basically free in a HW
implementation].

1. Add the two exponents together.

2. Multiply the two mantissas together.

3. Normalize the result back into a floating-point number.

Steps 0 and 3 are identical in both cases. I'm not really knowledgeable about
hardware multiplier implementations, but doing an additional "add" operation
can be pretty close to free in some implementations. In any case, adding in an
extra adder to the process is going to be quite cheap, even if not free.

What this means is that there is not much extra hardware to turn a FMUL ALU op
into an FMA ALU op... and if you make just a single FMA ALU op, you can use
that hardware to implement FADD, FMUL, and FMA with nothing more than
microcode.

In other words, instead of thinking of FMA as something worth pouring more
resources into, think of it as FADD as not worth enough to be made its own
independent unit.
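
For anyone who wants the add steps above spelled out, here is a toy Python sketch of that pipeline. Real hardware works on the raw bit fields with guard bits and rounding modes; this only shows the shape of it.

    import math

    def float_add_sketch(a: float, b: float) -> float:
        # Step 0: extract mantissa and exponent from each input (a == ma * 2**ea).
        ma, ea = math.frexp(a)
        mb, eb = math.frexp(b)
        # Step 1: shift the mantissa with the smaller exponent so both share a scale.
        e = max(ea, eb)
        ma = math.ldexp(ma, ea - e)
        mb = math.ldexp(mb, eb - e)
        # Step 2: add the aligned mantissas.
        m = ma + mb
        # Step 3: normalize the result back into a floating-point number.
        return math.ldexp(m, e)

    assert float_add_sketch(1.5, 0.25) == 1.75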

~~~
dreamcompiler
FP units are not used in integer bignum operations. I'm still curious if
"multiplies can be faster" applies to modern _integer_ ALUs.

------
tdons
Respect :). I implemented Karatsuba a while ago and wondered whether an even
faster method might be available. It is!

I admire the perseverance, grit, and intelligence needed to crack hard nuts
like this.
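
For anyone curious, here is a compact Python sketch of Karatsuba's trick: three half-size multiplies plus additions instead of four. Production bignum code splits on machine words and falls back to schoolbook below a cutoff; this just shows the recursion.

    def karatsuba(x: int, y: int) -> int:
        if x < 10 or y < 10:                      # single digit: multiply directly
            return x * y
        m = max(len(str(x)), len(str(y))) // 2
        base = 10 ** m
        xh, xl = divmod(x, base)                  # split into high/low halves
        yh, yl = divmod(y, base)
        a = karatsuba(xh, yh)                     # high * high
        b = karatsuba(xl, yl)                     # low * low
        c = karatsuba(xh + xl, yh + yl) - a - b   # cross terms from one extra multiply
        return a * base * base + c * base + b

    assert karatsuba(12345678901234567890, 98765432109876543210) == \
        12345678901234567890 * 98765432109876543210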

~~~
jacobolus
This new method will be marginally faster if you ever need to multiply two
numbers on a scale where writing down the bits fills every Planck-length cube
in the known universe.

Otherwise the “practical” benefit is limited to saving a few symbols when
writing abstract math papers.

~~~
croutonix
> “The best we can hope for is we’re three times faster,” van der Hoeven said.

And they talk about billions of digits elsewhere in the article. I doubt that
only applies for the ~8.46e184 digits you seem to be talking about.

~~~
jacobolus
From last time this came up, the number of _bits_ in the numbers involved is
2^(9^12). Storing a count of the number of bits takes about 930 billion
digits. Actually storing the bits or digits of the numbers per se takes a
whole lot more, to say the least.

[https://hal.archives-ouvertes.fr/hal-02070778/document](https://hal.archives-ouvertes.fr/hal-02070778/document)

------
rdtsc
The traditional method in the article looks a bit different from how I learned
it, which was this way:
[https://www.wikihow.com/Do-Double-Digit-Multiplication](https://www.wikihow.com/Do-Double-Digit-Multiplication)

The overall description of it looks right, but in the picture the order of
operations would be more like B, C, D, A, not A, B, C, D.

~~~
Sharlin
Yeah, that's a bit of a nonstandard order, but multiplication, of course, is
associative and commutative, so it doesn't really matter.

------
peter_d_sherman
>"Over the past decade, mathematicians have found successively faster
multiplication algorithms, each of which has inched closer to n × log n,
without quite reaching it.

Then last month, Harvey and van der Hoeven got there.

Their method is a refinement of the major work that came before them. It
splits up digits, uses an improved version of the fast Fourier transform, and
takes advantage of other advances made over the past forty years. “We use [the
fast Fourier transform] in a much more violent way, use it several times
instead of a single time, and replace even more multiplications with additions
and subtractions,” van der Hoeven said.

Harvey and van der Hoeven’s algorithm _proves that multiplication can be done
in n × log n steps_.

However, it doesn’t prove that there’s no faster way to do it."

------
glouwbug
Looks like Joris is now at SFU according to his homepage:
[http://www.texmacs.org/joris/main/joris.html](http://www.texmacs.org/joris/main/joris.html).

Would be cool to run into him someday

~~~
bordercases
Yeah, now that SFU hosts Canada's West Coast supercomputing, they've been
attracting top-shelf talent.

~~~
glouwbug
Yeah, a buddy of mine works at D-Wave. They held the world record for the
longest-held quantum state (or maybe still do, I think it was like 15 mins or
something) and I'd joke that they couldn't get past 15 mins because they'd
lift the lid to check if the "quantum state was still okay".

But seriously, D-Wave is doing some pretty amazing stuff. The Lower Mainland,
in general, is an awesome place to be career-wise. Just a shame about the
housing situation.

------
thelazydogsback
This is amazing to me:

> "Their conjecture was based on a hunch that an operation as fundamental as
> multiplication must have a limit more elegant than n × log n × log(log n)"

Only in maths (and sometimes in its "implementation"/ expression in nature in
conjunction with physical systems) can such a subjective viewpoint be so
powerful.

Also blows me away, at least at first, that multiplying using an _FFT_ would
be the fastest method so far. (I guess I have an older mindset based on how
slow FFTs used to be.) I wonder, in actual implementations, at how many digits
the performance curves break even compared to other methods.

~~~
gowld
When you say FFTs were slow, did you know of any way to multiply large numbers
that was fast? Or did you assume that without evidence?

~~~
thelazydogsback
Huh?

------
jiveturkey
Does this apply to division (division being multiplication by the inverse)? I
wonder because

> And while the new algorithm is important theoretically, in practice it won’t
> change much, since it’s only marginally better than the algorithms already
> being used. “The best we can hope for is we’re three times faster,” van der
> Hoeven said. “It won’t be spectacular.”

Wow, what an understatement! 3x is huge.

So if it speeds up factorization, even by 1.5x, then that's significant for
RSA, isn't it?

------
gerdesj
There's a constraint that I as a civilian am missing. Could someone please put
me out of my misery:

n × m (in integers) can be computed by starting with zero and adding m to it,
n times. That's n operations. Errrrr _cough_, assuming that the computer can
deal with arbitrarily large n and m.

~~~
rat9988
You do n operations, each on a number of size m; that's n·m.

~~~
gerdesj
10 × 10 = 10 + 10 + 10 + 10 ...

That's 10 operations.

10,000,333,999 × 33,123,456

That's 33,123,456 operations.

~~~
reikonomusha
How much does each operation cost? Adding together two 5 digit numbers and two
5,000 digit numbers should be counted differently, don’t you agree? So perhaps
we should count the number of single-digit operations required.

In that case, adding two five digit numbers will require 5 (or so) additions,
and two 5,000 digit numbers will require 5,000 (or so) additions.

What about a 5,000 digit * 5,000 digit multiplication?

Just the very first “big” addition in your scheme will require 5,000 single-
digit additions. But that result might now be 5,001 digits long (think of how
99+99 results in a 3 digit sum). Do it again, and if you have carries, it
might be 5,002 digits long.

So you’ll have 5,000 + 5,001 + ... = 5000N + (1 + 2 + ... + N) operations. How
many times does this repeat, that is, what is N? Well, however big your
multiplier is! Your multiplier is around N = 10^5000. So you’ll need a total
number of additions approximately equal to

    5000 * 10^5000 + 1 + 2 + ... + 10^5000

And what is 1 + 2 + ... + N? It's N(N+1)/2. So we have

    5000 * 10^5000 + 10^5000 * (10^5000 + 1) / 2

which we might as well estimate to be

    (10^5000)^2

That’s a _lot_ of operations. And that’s an _under_ estimate! Even grade
school multiplication is a lot simpler (if you count single-digit
multiplication to have a cost of 1): it only requires 5000^2 elementary
operations.

The goal of these funny multiplication algorithms is to beat (number of
digits)^2 single-digit operations. What the article shows is that if you have
enough digits, you can do only

    (number of digits) * (number of digits in the number of digits)

operations to do the multiplication, perhaps off by some fixed, constant
factor.
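
Python's arbitrary-precision integers make it easy to reproduce that back-of-the-envelope count for two d-digit numbers:

    d = 5000
    N = 10 ** d                                    # size of the multiplier

    repeated_addition = d * N + N * (N + 1) // 2   # the estimate above, roughly (10^d)^2
    schoolbook = d * d                             # grade-school: d^2 single-digit multiplies

    print(f"schoolbook: {schoolbook:,}")                                     # 25,000,000
    print(f"repeated addition: about 10^{len(str(repeated_addition)) - 1}")  # about 10^9999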

------
zitterbewegung
If the title had "almost the perfect way" it would be the perfect way to write
this title.

But, leaving that out increases your earnings by a constant factor.

~~~
MauranKilom
It's not yet known if it's only "almost". If it is indeed optimal, then
"almost" would be wrong (from a big-oh perspective).

------
abetusk
Call 's' the number of bits of the maximum of the two numbers, 'n' and 'm',
that you're trying to multiply (that is s = max(lg(n), lg(m))).

Standard "long multiplication" works in O(s^2) time [1].

To speed things up, consider the univariate polynomial whose coefficients are
the base representation of each number (for example 10 = p(2) = 0*2^0 + 1*2^1 +
0*2^2 + 1*2^3). Multiplying two polynomials algebraically (h(x) = p(x) * g(x) =
sum over i+j=k of p_i * g_j * x^k) and then evaluating at some point (e.g. '2')
is the same as evaluating each polynomial at enough different points,
multiplying the evaluations pointwise, using the resulting points to
interpolate back to the product polynomial, and then evaluating at the same
"base" point (e.g. '2').

That is, take a set of points w_k, evaluate p(x) and g(x) at each of the w_k,
multiply the evaluations together (that is, p(w_k) * g(w_k)), then interpolate
back to get the product polynomial h(x). Evaluate h(x) at the original base
(e.g. '2') to get the resulting number.

Take the w_k's to have some symmetry, say the roots of unity, and you can
exploit various tricks, either the Fourier transform or something similar, to
go back and forth between the algebraic (coefficient) representation and the
'point representation' in O(s lg(s)) time.
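
Here is a small exact Python sketch of that evaluate / pointwise-multiply / interpolate pipeline, using plain integer evaluation points and rational arithmetic. Picking roots of unity as the w_k (and an FFT for the two conversions) is what makes the real algorithms fast.

    from fractions import Fraction

    def digits(n: int, base: int) -> list:
        ds = []                          # coefficients of the polynomial p with p(base) == n
        while n:
            n, r = divmod(n, base)
            ds.append(r)
        return ds or [0]

    def multiply_via_interpolation(a: int, b: int, base: int = 10) -> int:
        p, g = digits(a, base), digits(b, base)
        k = len(p) + len(g) - 1              # h = p*g has k coefficients
        w = [base + 1 + i for i in range(k)] # any k distinct points work; FFTs use roots of unity
        # Evaluate p and g at each point and multiply the evaluations pointwise.
        hv = [sum(c * x ** i for i, c in enumerate(p)) *
              sum(c * x ** i for i, c in enumerate(g)) for x in w]
        # Interpolate h through the points (w_j, hv_j) and evaluate it at the base,
        # in one step, via the Lagrange form (exact thanks to rational arithmetic).
        total = Fraction(0)
        for j in range(k):
            term = Fraction(hv[j])
            for idx in range(k):
                if idx != j:
                    term *= Fraction(base - w[idx], w[j] - w[idx])
            total += term
        return int(total)

    assert multiply_via_interpolation(31415926, 27182818) == 31415926 * 27182818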

The bit complexity of the evaluated points, along with a host of other factors,
meant that the algorithms discovered were O(s lg(s) lg(lg(s))) or nearby. I
would recommend "Fundamental Problems of Algorithmic Algebra" for a good
introduction [2]. The article mentions the various methods; they go by
Karatsuba [3] and Schönhage–Strassen [4], among others.

I haven't read the paper, but it looks like they're exploiting some 'symmetry'
of multivariate rings that are each "simple" and of the form (x^r - 1), where r
divides some exact power of 2, and doing some Fourier-transform-like tricks in
each of those rings.

Pretty cool that they've reduced it down so much.

[1]
[https://en.wikipedia.org/wiki/Multiplication_algorithm#Grid_...](https://en.wikipedia.org/wiki/Multiplication_algorithm#Grid_method)

[2] [https://www.amazon.com/Fundamental-Problems-Algorithmic-Alge...](https://www.amazon.com/Fundamental-Problems-Algorithmic-Algebra-Chee/dp/0195125169)

[3]
[https://en.wikipedia.org/wiki/Multiplication_algorithm#Karat...](https://en.wikipedia.org/wiki/Multiplication_algorithm#Karatsuba_multiplication)

[4]
[https://en.wikipedia.org/wiki/Sch%C3%B6nhage%E2%80%93Strasse...](https://en.wikipedia.org/wiki/Sch%C3%B6nhage%E2%80%93Strassen_algorithm)

------
mgrennan
I can't believe this is a thing. I showed this to my grade school teacher and
she told me I was doing it WRONG. I just knew it worked so I kept using it.

