
Using Fourier Transforms to Multiply Numbers - signa11
http://blog.robertelder.org/fast-multiplication-using-fourier-transform/
======
meuk
Intuitive explanation: if you interpret a time series as the coefficients of a
polynomial, the Fourier transform is just polynomial multiplication (this
makes sense because polynomial multiplication is just convolution). Now we can
write base b numbers as polynomials with single-digit coefficients, evaluated
at x = b. Then after multiplying two such numbers/polynomials, the coefficients
can exceed a single digit. The only thing remaining is to carry the overflows in
the digits to the next digit (much like the school algorithm for adding numbers).
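
Here's a minimal Python sketch of that pipeline (my own illustration; schoolbook
convolution stands in for the FFT here, since the digit handling is the point):

    # Sketch: digits of base-b numbers as polynomial coefficients, product
    # coefficients via convolution, then a carry pass to restore single
    # digits. A real implementation would do the convolution with an FFT.
    def multiply_via_convolution(x, y, base=10):
        xs = [int(d) for d in str(x)][::-1]   # least significant digit first
        ys = [int(d) for d in str(y)][::-1]
        coeffs = [0] * (len(xs) + len(ys) - 1)
        for i, a in enumerate(xs):            # schoolbook convolution
            for j, b in enumerate(ys):
                coeffs[i + j] += a * b
        result, carry = 0, 0                  # carry pass, as described above
        for k, c in enumerate(coeffs):
            c += carry
            carry = c // base
            result += (c % base) * base**k
        return result + carry * base**len(coeffs)

    assert multiply_via_convolution(1234, 5678) == 1234 * 5678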

~~~
gjm11
The Fourier transform isn't polynomial multiplication. (It operates on _one_
thing, not _two_.) What the Fourier transform does is to translate between two
representations of a polynomial: (1) its coefficients, (2) its values at a
suitable set of points. And then the thing that makes FFT multiplication work
is that _pointwise_ multiplication of values is cheaper and simpler than the
convolution operation that gives you the coefficients of the product of two
polynomials.
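
A small demo of that distinction (numpy is just my choice here; any FFT library
would do): translate coefficients to values, multiply pointwise, translate back.

    import numpy as np

    p = [1, 2, 3]                      # 1 + 2x + 3x^2
    q = [4, 5]                         # 4 + 5x
    n = len(p) + len(q) - 1            # the product has n coefficients

    # (1) coefficients -> values at the n-th roots of unity (zero-padded)
    pv, qv = np.fft.fft(p, n), np.fft.fft(q, n)

    # (2) pointwise multiply the values, (3) values -> coefficients
    pq = np.round(np.fft.ifft(pv * qv).real).astype(int)
    print(pq)                          # [ 4 13 22 15]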

~~~
TomVDB
Every once in a while I pick up "The Scientist and Engineer's Guide to Digital
Signal Processing" (freely downloadable:
[http://www.dspguide.com/](http://www.dspguide.com/)) and it's always so
satisfying to see the different relations between domains and how you can
think about it. Your example is no different.

But since I never get to actually use that stuff in my job, I always forget
the details.

------
jlamberts
For anyone who's interested in a more long-form, intuitive explanation of the
Fourier transform and how you can use it to multiply polynomials, I'd highly
recommend Erik Demaine's lecture on the topic:
[https://www.youtube.com/watch?v=iTMn0Kt18tg](https://www.youtube.com/watch?v=iTMn0Kt18tg).

------
tzs
A very interesting look at large integer multiplication, including the FFT
method, that finds underlying mathematical connections between all the
algorithms is given in the paper "Multidigit Multiplication for
Mathematicians" by Daniel J. Bernstein [1].

He ties them all to exploiting certain ring homomorphisms, either to compute r
s with the help of f(r) f(s), or to compute f(r) f(s) with the help of r s.
Karatsuba, Toom, FFT, Schönhage–Strassen...all are using ring homomorphisms to
go between what you want to be doing and something that is easier.

Also see his later "Fast multiplication and its applications" [2].

[1] [https://cr.yp.to/papers/m3.pdf](https://cr.yp.to/papers/m3.pdf)

[2]
[https://cr.yp.to/lineartime/multapps-20080515.pdf](https://cr.yp.to/lineartime/multapps-20080515.pdf)

------
pwsun
I wrote a much more in-depth analysis of Schönhage–Strassen and why exactly
it's an O(N log N log log N) algorithm here:
[https://psun.me/post/fft2/](https://psun.me/post/fft2/)

It's pretty interesting--it's actually a recursive multiplication algorithm
with log log N levels of recursion and N log N work at each level. And it
actually uses _another_ fast multiplication algorithm within it, the Karatsuba
algorithm.
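
For reference, a minimal Python sketch of Karatsuba itself (illustrative, not
the code from the post): it trades one of the four half-size multiplications
for a few additions, giving O(n^log2(3)) instead of the schoolbook O(n^2).

    def karatsuba(x, y):
        if x < 10 or y < 10:
            return x * y
        m = max(len(str(x)), len(str(y))) // 2
        hi_x, lo_x = divmod(x, 10**m)
        hi_y, lo_y = divmod(y, 10**m)
        z0 = karatsuba(lo_x, lo_y)
        z2 = karatsuba(hi_x, hi_y)
        # One multiplication covers both cross terms:
        # (hi_x + lo_x)(hi_y + lo_y) - z2 - z0 = hi_x*lo_y + lo_x*hi_y
        z1 = karatsuba(hi_x + lo_x, hi_y + lo_y) - z2 - z0
        return z2 * 10**(2*m) + z1 * 10**m + z0

    assert karatsuba(123456, 789012) == 123456 * 789012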

------
sneakernets
Man, is there anything Fourier Transforms can't do? For a music project in
college, I created a book of Hammond Organ presets by taking slices of samples
from actual instruments and running FTs on them, then finding the best
approximation in Hammond drawbar notation. The fact that it worked so well was
such a satisfying experience that I managed to make 100+ combinations.

~~~
nitrogen
I sort of did the reverse for a song once, I used FFT to analyze a recording
of a drawbar organ I wanted to mimic, then turned that into a ZynAddSubFX
additive synthesis preset.

~~~
sneakernets
This is where I started, actually! I wondered how far back I could go though,
and the Hammond B3 worked extremely well.

------
earthicus
I have to admit that I've never gotten around to sitting down and thinking
through why fast FFT multiplication works, so I have a question that maybe
someone here could answer. Some background:

If a number is a multiple of two (i.e. n = 2k + 0), we call it even, or if
n = 2k + 1 then it's odd. This is interesting because evenness and oddness form
an _algebraic structure_ - an amoeba of an algebra with only two elements: even
* odd = even, odd * odd = odd, etc...

But a moment's reflection reveals there is nothing special about being a
multiple of _two_. We can just as well consider multiples of three and divide
the universe into three groups: if a number is a multiple of three (i.e.
n = 3k + 0) we call it 3-even, or if n = 3k + 1 then it's 3-odd, or if
n = 3k + 2 then it's 3...blern! Again this is interesting because we have a
mini-algebraic structure, this time with three elements. For example, a 3-odd
times a 3-blern is (3k+1) * (3j+2) = 3(3kj + 2k + j) + 2, which is a 3-blern.

Similarly we have multiples of four: 4-even, 4-odd, 4-blern, 4-pop. Making up
these names is of course absurd, so we instead invent a notation: modular
arithmetic. Traditional even and odd are arithmetic modulo 2 (mod 2 for
short); 3-even 3-odd 3-blern are arithmetic mod 3, etc. The mini-algebras are
the cyclic groups or the multiplicative groups of integers modulo n.

The key word there being _cyclic_: we've identified periodic structures in
the integers with different 'wavelengths' or periods, which is how long it
takes for the even/odd/blern cycle to repeat:

    
    
      (mod 2) 0 1 .. 2 3 .. 4 5 .. 6 7 .. 8 9 .. 10 11 ..
      (mod 3) 0 1 2 .. 3 4 5 .. 6 7 8 .. 9 10 11 ..
      (mod 4) 0 1 2 3 .. 4 5 6 7 .. 8 9 10 11 ..
     

----------

Back to the topic of discussion. Typically we represent a number 'locally'
with coordinates given wrt a base system, but it seems we might describe a
number instead nonlocally, using these cyclic groups. That is, instead of
saying 1 * 10^0 + 1 * 10^1 + 0 * 10^2 ... (11), I would say

    
    
      2-odd:   11 = 2 * 5 + 1
      3-blern: 11 = 3 * 3 + 2
      4-pop:   11 = 4 * 2 + 3
      5-odd:   11 = 5 * 2 + 1
    

So we're doing a kind of spectral decomposition: examining the number modulo
2, then mod 3, mod 4, etc. The key observation is, if two numbers are
represented this way, how do I compute their product? Well, it's very easy,
because I know what happens when I multiply, say, an odd times a blern... I
get a blern! Multiplication becomes a pointwise operation. My question is
this... is this an equivalent encoding to the one being done by the complex
roots of unity in the FFT?

~~~
pbsd
You're essentially describing a residue number system, except the different
moduli need to be coprime for things to be well-defined.

FFT multiplication is basically a residue number system product on
polynomials. Evaluating a polynomial p(x) at a point w_i is the same as
computing p(x) modulo (x - w_i), where w_i here is one of the nth roots of
unity. So the FFT computes p(x) mod (x - w_0), p(x) mod (x - w_1), ...,
multiplies each residue individually, and the inverse FFT recovers the result
modulo (x - w_0)(x - w_1)...(x - w_{n-1}) = x^n - 1.
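
A tiny Python illustration of the integer version of this idea (the moduli
below are my own pairwise-coprime picks): multiply residue by residue, then
use the Chinese Remainder Theorem to recover the product.

    from math import prod

    moduli = [7, 11, 13, 15]           # pairwise coprime; product is 15015
    M = prod(moduli)

    def to_residues(n):
        return [n % m for m in moduli]

    def from_residues(rs):
        # CRT reconstruction, valid because the moduli are pairwise coprime
        x = 0
        for r, m in zip(rs, moduli):
            Mi = M // m
            x += r * Mi * pow(Mi, -1, m)   # pow(..., -1, m): inverse mod m
        return x % M

    a, b = 57, 123                     # a*b = 7011 < 15015, so no wraparound
    prod_residues = [ra * rb % m
                     for ra, rb, m in zip(to_residues(a), to_residues(b), moduli)]
    assert from_residues(prod_residues) == a * b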

~~~
earthicus
Thank you!

------
ulber
One interesting application of NTT polynomial multiplication is in fully
homomorphic encryption libraries, such as Microsoft's SEAL [1], where the
ciphertexts are polynomials of very high degree. (Disclaimer: I work at MSR
using SEAL for machine learning.)

[1]: [https://github.com/Microsoft/SEAL](https://github.com/Microsoft/SEAL)
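
A minimal recursive NTT sketch in Python (purely illustrative, unrelated to
SEAL's actual C++ internals): it's the same divide-and-conquer as the FFT, but
over the integers mod a prime p = c * 2^k + 1, so all arithmetic is exact.

    P, G = 998244353, 3   # P = 119 * 2^23 + 1, G a primitive root mod P

    def ntt(a, invert=False):
        n = len(a)                       # must be a power of two (<= 2^23)
        if n == 1:
            return a
        even, odd = ntt(a[0::2], invert), ntt(a[1::2], invert)
        w = pow(G, (P - 1) // n, P)      # primitive n-th root of unity mod P
        if invert:
            w = pow(w, -1, P)            # Python 3.8+: modular inverse
        out, wk = [0] * n, 1
        for k in range(n // 2):
            t = wk * odd[k] % P
            out[k] = (even[k] + t) % P
            out[k + n // 2] = (even[k] - t) % P
            wk = wk * w % P
        return out

    def poly_mul_mod_p(f, g):
        n = 1
        while n < len(f) + len(g) - 1:
            n *= 2
        fv = ntt(f + [0] * (n - len(f)))
        gv = ntt(g + [0] * (n - len(g)))
        inv_n = pow(n, -1, P)
        return [c * inv_n % P
                for c in ntt([x * y % P for x, y in zip(fv, gv)], invert=True)]

    print(poly_mul_mod_p([1, 2, 3], [4, 5]))  # [4, 13, 22, 15]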

------
Hernanpm
This is well known in competitive programming; usually it's the second-best
alternative after Karatsuba.

~~~
giornogiovanna
Generally Fourier methods are asymptotically better than Karatsuba, but with
much higher constants, which makes them kind-of impractical until you get
_very_ big numbers. Anyway, do competitive programmers tend to program
everything from scratch, or is there another reason they don't just use GMP?

~~~
bcaa7f3a8bbc
> _Anyway, do competitive programmers tend to program everything from scratch,
> or is there another reason they don't just use GMP?_

I think directly using algorithms and programs implemented by others would be
considered cheating (and these libraries are not present on the server which
checks your answer anyway). In competitive programming, participants compete
on their own ability and knowledge to implement algorithms.

~~~
Hernanpm
In C++, thanks to std::complex, the implementation is short; see
[https://ideone.com/tbHUN7](https://ideone.com/tbHUN7)

~~~
bcaa7f3a8bbc
C99 has complex, too. I think most competitive programming contests are now
using C99, finally.

------
aouyang2
This property has helped make matrix profile computations feasible for motif
discovery.
[https://www.cs.ucr.edu/~eamonn/MatrixProfile.html](https://www.cs.ucr.edu/~eamonn/MatrixProfile.html)

------
_fullpint
I don’t think I’ve ever come across an algorithms book that discusses FFT that
doesn’t discuss fast polynomial multiplication when introducing the algorithm.

That’s literally what the FFT does: fast polynomial multiplication.

~~~
planteen
If you come at it from an engineering approach (e.g., electrical engineer), it
doesn't look like that at all. In fact, the focus is on filters and spectral
analysis. It looked something like this when I was at Georgia Tech:

First, you learn about the convolution operation that, given any input
function x(t), get the output of any linear time-invariant circuit as y(t).
The convolution integral is nasty, professors make you feel the pain a bit.
Then introduce the Laplace transform to make the computation much easier. Then
go to continuous time, continuous frequency Fourier transform. Talk about
frequency domain a bunch, learn filter topologies, etc. Circuits class over.

Now comes a signal processing class where you learn about discrete-time
signals. First, talk lots about sampling. Then, get introduced to discrete-time
convolution. Now, learn the z-transform and the discrete-time Fourier
transform (DTFT) for a discrete-time, continuous frequency signal. Mostly
discuss filtering and spectral analysis. Intro to signal processing class
over.

Learn about sampling the DTFT in the frequency domain. This is the DFT, which
is usually presented as a sum that would require O(n^2) operations to compute.
Learn that this corresponds to circular convolution and learn about zero
padding for traditional convolution. Finally, get presented with the radix-2
Cooley-Tukey FFT algorithm. Focus is still signal and spectral analysis. Talk
lots about windowing. You may get a mention that convolution corresponds to
polynomial multiplication here. Or maybe they talk about grade-school
multiplication; it's really the same thing as polynomial multiplication with a
carry. Senior-level signal processing class over.
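
To make the circular-convolution and zero-padding step concrete, a quick
sketch (using numpy, my own choice, not part of the curriculum described):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0])
    h = np.array([1.0, 1.0, 1.0])

    # Length-4 DFTs: the tail of the convolution wraps around (circular).
    circ = np.fft.ifft(np.fft.fft(x, 4) * np.fft.fft(h, 4)).real
    print(np.round(circ))              # [8. 7. 6. 9.]

    # Zero-pad to 4 + 3 - 1 = 6: now it matches np.convolve(x, h).
    n = len(x) + len(h) - 1
    lin = np.fft.ifft(np.fft.fft(x, n) * np.fft.fft(h, n)).real
    print(np.round(lin))               # [1. 3. 6. 9. 7. 4.]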

~~~
_fullpint
I've worked with filters and spectral analysis; in CS you would often do this
in a Digital Image Processing course.

The point isn't what you use the FFT for but what the FFT does. The FFT does
fast polynomial multiplication.

~~~
planteen
Yes that's true. But if you come at it looking at it as a sampled continuous
signal and spectrum, you don't think of the FFT as fast polynomial
multiplication. Neither viewpoint is wrong. Just different lenses for looking
at different problems.

~~~
shaklee3
I think I still think of it that way. In EE you are convolving a filter with a
signal in the time domain, which you can think of as a polynomial. Doing the
FFT, then a pointwise multiply, then an IFFT accomplishes the same thing, but
using the FFT.

~~~
planteen
How can you think of a continuous time signal as a polynomial though? I don't
know of a continuous time analogy. It seems like the polynomial only works for
the discrete case. And FIR filter convolution is rarely computed by FFT in
FPGAs (in stream applications).

Look at a system doing DSP, for example. Let's say you have an input RF signal
that goes into an analog band-pass filter, a mixer, an analog low-pass filter,
an ADC, and an FPGA. The FPGA implements a FIR filter - let's say a Hilbert
transform using a distributed arithmetic algorithm. You are getting envelope
information out of the FPGA. There's no FFT anywhere in the FPGA to compute
this filter output.

You are seeing some weird noise in your DSP output. You dig a little by
looking at the output of the ADC and then look at the FFT magnitude log10. You
see some suspicious spurs popping up, but at really strange discrete
frequencies. You show the spectrum to an RF engineer, then you two go to the
lab and put an analog spectrum analyzer on the input to the ADC. Sure enough,
there are mixing products of your input carrier, mixer frequency, and sampling
clock frequency. The weird spur frequencies make sense now - you are seeing
folding of the mixing products from the sampling. You tweak the sampling clock
to move the spur digital frequencies.

In that long but real-world example, I don't see anywhere where thinking of
things in terms of polynomials would be helpful. It's also going to lead to
confusion of terminology with your RF engineer - they tend to think
continuous.

~~~
shaklee3
Sorry, I think I misspoke. I didn't mean that I think of it in those terms
normally. Working with RF every day, I think of it as time and frequency domain
in the sense that you were describing. But mathematically I could see how the
polynomial makes sense. Also, I was only thinking about discrete time since we
are talking about the FFT.

------
i0exception
If anyone's interested in writing and testing their code for this -
[https://www.spoj.com/problems/MUL/](https://www.spoj.com/problems/MUL/)

