Understanding the FFT Algorithm (jakevdp.github.io)
114 points by linux_devil 1571 days ago | 37 comments



If anyone is interested in more of the mathematics, I have a few posts on Fourier Analysis (starting with four lengthy posts here: http://jeremykun.com/primers/ ) and a follow-up on actually deriving and computing the FFT here: http://jeremykun.com/2012/07/18/the-fast-fourier-transform/ with a (relatively misguided) application of denoising a sound bite.

My implementation is in almost pure python (except for complex number arithmetic) and it shows.


Your primers look great! Do you have any sources for people interested in learning more about the various topics?


I think you mean about Fourier analysis, yes? Brad Osgood's Stanford lectures (available on YouTube) give a very good (read: steady, unassuming) approach and the lecture notes associated with that course are extremely detailed and informative if you want a deeper dive.


Yep, definitely Fourier analysis, I haven't studied it formally at all.

Thanks for the pointer to Osgood's lectures. A co-worker recommended Stein's "Fourier Analysis" after a bit of work on my part. Do you have any thoughts on that text?


Coincidentally, I was just reading through the wiki pages on Fourier transforms the other evening, and having a fantastically hard time wrapping my head around some parts of it. Maybe the wrong place to ask, but does anyone have on hand a good intuitive description of the Fourier transform (potentially directed more towards the programmer than the data scientist)?


I don't know any data scientists, or even what one is really, but here's the EE explanation: the Fourier transform maps a signal in the time domain to the frequency domain, and vice versa. So if you have a sine wave in the time domain [say something at 3 kHz: y = sin(2π · 3000t)], the Fourier transform would turn that into a pair of Dirac deltas (spikes that are infinitely thin but have an area of one) at f = ±3 kHz. Or, if you had a pulse, the Fourier transform would map that to a sinc [sinc(x) = sin(x)/x].

The Fourier transform is related to the Laplace transform, which maps signals in the time domain to the "s domain", a domain whose points are described with complex coordinates and which carries information both on the frequencies of the periodic components of the signal and on any transients (starting conditions, in layman's terms).
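
Here's a quick numerical sketch of that (my own, not from the article; it assumes NumPy and a made-up 8 kHz sample rate): the spectrum of a pure tone is concentrated in two spikes at plus and minus the tone's frequency.

  import numpy as np

  fs = 8000                                # assumed sample rate, samples per second
  t = np.arange(fs) / fs                   # one second of samples
  y = np.sin(2 * np.pi * 3000 * t)         # a 3 kHz tone

  Y = np.fft.fft(y)
  freqs = np.fft.fftfreq(len(y), d=1 / fs)
  peaks = freqs[np.abs(Y) > len(y) / 4]    # the bins holding essentially all the energy
  print(peaks)                             # [ 3000. -3000.]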


Another approach to understanding Fourier transforms at the esteemed BetterExplained: http://betterexplained.com/articles/an-interactive-guide-to-...

This is more about the actual Fourier Transform operation itself, rather than the FFT algorithm. OTOH, it has whizzy interactive animations.

The thing that made it simpler to me was the fact that an FT takes us from an oscilloscope to a spectrum analyser and back again.


If I have a time-varying signal containing a sine wave at some frequency f, and I want to detect that signal, I could generate a bunch of sine waves oscillating at different rates and multiply each one by the original signal, sample by sample, summing the products over time. If one of the test sine waves was close enough in frequency to the signal of interest, that running multiplication in the time domain would leave a nonzero result. I could call that a spectral line that identified the original signal. For those test frequencies not matched to the incoming signal, the accumulated products would produce a zero result.

It's sort of like tuning a radio, trying to detect a narrow incoming signal -- the signal is only detectable at a specific receiver frequency. The difference is the DFT tunes a spectrum of frequencies simultaneously, and presents its results as a set of spectral lines covering a specified frequency domain.

All the players in this story are complex numbers. This allows an unambiguous multiplication result when the signal is compared to the various test frequencies -- for a single-frequency signal, the result is zero except at the frequency of interest.

The FFT is a refinement of the DFT that minimizes the amount of computation, but its results are identical to the DFT.
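
To make the "tuning" idea concrete, here's a rough sketch of my own (not from the article; NumPy, with an assumed 1 kHz sample rate): correlate a 50 Hz tone against a few test frequencies, and only the matching one leaves a nonzero result.

  import numpy as np

  fs = 1000                                      # assumed sample rate
  t = np.arange(fs) / fs                         # one second of samples
  signal = np.sin(2 * np.pi * 50 * t)            # the incoming 50 Hz tone

  for f_test in (30, 50, 70):
      probe = np.exp(-2j * np.pi * f_test * t)   # complex test frequency
      response = abs(np.sum(signal * probe))     # multiply and accumulate over time
      print(f_test, round(response, 3))          # ~500 at 50 Hz, ~0 elsewhere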

One of the more interesting aspects of Fourier analysis is the role of the Euler formula:

http://en.wikipedia.org/wiki/Euler's_formula

Understanding Euler's formula gives one a profound insight into how Fourier and inverse Fourier transforms work. It turns out that an inverse Fourier transform is accomplished by simply changing the sign of a subexpression:

http://arachnoid.com/sage/fourier.html#Fourier_Transform

The same sign change determines whether a DFT converts from time to frequency domain or the reverse:

http://arachnoid.com/sage/fourier.html#Discrete_Fourier_Tran...
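
A small sketch of that sign change (my own code, assuming NumPy): the same routine computes the forward or the inverse DFT depending only on the sign in the exponent, plus the usual 1/N factor.

  import numpy as np

  def dft(x, sign=-1):
      x = np.asarray(x, dtype=complex)
      N = len(x)
      n = np.arange(N)
      M = np.exp(sign * 2j * np.pi * np.outer(n, n) / N)
      return M @ x

  x = np.random.random(8) + 1j * np.random.random(8)
  X = dft(x, sign=-1)                  # time -> frequency
  x_back = dft(X, sign=+1) / len(x)    # frequency -> time: same code, flipped sign
  print(np.allclose(x_back, x))        # True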

HTH


You have a time series of N points. You could use ordinary curve fitting to fit a constant function plus (N-1) sine and cosine waves, and it turns out there is always a perfect fit, whatever your N points are. The discrete Fourier transform is "just" a faster way to compute the same result.
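
Here's a quick check of that claim (my own sketch, assuming NumPy; the choice of N, the coefficient bookkeeping, and the comparison against np.fft.rfft are mine, not the parent's):

  import numpy as np

  rng = np.random.default_rng(0)
  N = 8                                   # even N keeps the bookkeeping simple
  x = rng.standard_normal(N)              # arbitrary real samples
  n = np.arange(N)

  # Basis: constant, cos/sin pairs for k = 1 .. N/2-1, plus the Nyquist cosine.
  cols = [np.ones(N)]
  for k in range(1, N // 2):
      cols.append(np.cos(2 * np.pi * k * n / N))
      cols.append(np.sin(2 * np.pi * k * n / N))
  cols.append(np.cos(np.pi * n))          # k = N/2 term
  A = np.column_stack(cols)               # N basis functions for N points

  coeffs, *_ = np.linalg.lstsq(A, x, rcond=None)
  print(np.allclose(A @ coeffs, x))       # True: the fit is exact

  # The same numbers fall out of the FFT, up to the usual 1/N scaling.
  X = np.fft.rfft(x)
  print(np.allclose(coeffs[0], X[0].real / N))       # constant term
  print(np.allclose(coeffs[1], 2 * X[1].real / N))   # cos coefficient, k = 1
  print(np.allclose(coeffs[2], -2 * X[1].imag / N))  # sin coefficient, k = 1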




Here's a more mathy way of understanding it. You know how Taylor series work in calculus? You have some differentiable function and you are able to approximate it arbitrarily well by adding up more and more polynomial terms.

Turns out you can do the same thing with linear combinations of sines and cosines, with some slightly different conditions on the function. In particular, we need the function to be periodic. This is all that's happening: by adding up sines and cosines so they cancel out "just so," we can approximate other functions. The Fourier transform is the tool by which we calculate the coefficients of the various sines and cosines in the linear combination.
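
To make that concrete, here's a small sketch of my own (assuming NumPy; the square wave and its well-known sine series are just the standard textbook example): partial sums of sines get closer to a square wave as terms are added.

  import numpy as np

  x = np.linspace(0, 1, 1000, endpoint=False)
  square = np.sign(np.sin(2 * np.pi * x))       # the target periodic function

  def partial_sum(terms):
      # Fourier series of the square wave: (4/pi) * sum of sin(2*pi*(2k+1)x)/(2k+1)
      s = np.zeros_like(x)
      for k in range(terms):
          m = 2 * k + 1
          s += (4 / np.pi) * np.sin(2 * np.pi * m * x) / m
      return s

  for terms in (1, 5, 50):
      err = np.mean(np.abs(partial_sum(terms) - square))
      print(terms, round(err, 3))               # the average error shrinks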

Now, if we relax the requirement that the function be periodic, we get the "integral" form of the Fourier transform. We do this by slowly extending the permitted period to infinity, since any function defined on a closed interval can be treated as one period of a periodic function. This shouldn't be too surprising, as the integral is, in a pretty specific sense, the "continuous" version of a sum.

When people talk about "frequency domain" what they mean is the Fourier transform "at x" is the contribution of "frequency x" to the overall value of the function. I give you the frequency, you give me its contribution.

In fact, there's a very general theorem called the Stone-Weierstraß Theorem which tells us when you can approximate a function with linear combinations from a set of other functions. You can then go through a similar process to determine the extent to which each of these other functions contribute to the value of the original.

This is a lot of hand waving and will most likely upset any mathematicians in the audience. I left out mountains of details. :)

If you're comfortable with the idea of vector spaces and orthonormal bases, then the existence of the Fourier series is equivalent to the statement that linear combinations of sines and cosines have a dense span in the space of functions whose integral over the entire real line is finite. Yes, our vectors are functions and the sines and cosines don't quite form a basis, but they do form a "dense span," which is another way of saying that for a given function there's some linear combination that can approximate our functions up to any level of precision we desire.

Here's another statement that wraps up Fourier transforms, Fourier series, and all that in one go. Let C([0,1]/{0,1}) be the space of functions continuous on [0,1] where we identify 0 and 1 as the same point to obtain a circle. Consider the family of functions f_n(x) = e^{2πinx} where n is an integer. Then the set {f_n : n is an integer} has a dense linear span in C([0,1]/{0,1}).

One consequence of this is that the f_n form an orthonormal basis for L^2([0,1]), the space of "square-integrable functions" on [0,1] (http://en.wikipedia.org/wiki/Square-integrable_function).

Remember that e^{ix} = cos(x) + i sin(x), so f_n(x) = e^{2πinx} = cos(2πnx) + i sin(2πnx), which is where the sines and cosines come from.

Sorry for the fact that "pi" and "n" look so similar. It's HN's font.

Anyhow, the above may or may not have been useful, I just thought I'd throw in a more mathy explanation than one usually sees. It didn't really click for me until I was able to relate what was happening with the Fourier series and Fourier transform to how I understood vector spaces and bases.
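
If you want to poke at the orthonormality claim numerically, here's a tiny sketch of mine (NumPy, with a crude Riemann sum standing in for the integral) checking that the f_n are orthonormal on [0,1]:

  import numpy as np

  def inner(n, m, samples=10000):
      # approximate the inner product of f_n and f_m on [0, 1]
      x = np.linspace(0, 1, samples, endpoint=False)
      fn = np.exp(2j * np.pi * n * x)
      fm = np.exp(2j * np.pi * m * x)
      return np.mean(fn * np.conj(fm))

  print(np.round(inner(3, 3), 6))   # ~1: same function, norm one
  print(np.round(inner(3, 5), 6))   # ~0: different frequencies are orthogonal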


Oh, he didn't do the bit reversal trick! That's the really cool part :)
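
For anyone curious, here's a tiny sketch of that permutation (my own code, not from the article): an iterative, in-place radix-2 FFT first reorders the input so that index k goes to the index whose binary digits are reversed.

  def bit_reverse_indices(N):
      # N must be a power of two; returns where each index ends up
      bits = N.bit_length() - 1
      return [int(format(k, f'0{bits}b')[::-1], 2) for k in range(N)]

  print(bit_reverse_indices(8))   # [0, 4, 2, 6, 1, 5, 3, 7]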


I don't know Python, so I may be missing something, but:

I don't understand how the naive approach can be O(n^2) if it involves matrix multiplication.

Surely at best it would be somewhere between O(n^2) and O(n^3), and most likely O(n^3)?


The post means matrix-vector multiplication, which is O(n^2). You're thinking of matrix-matrix multiplication, for which the naive algorithm is O(n^3).

He writes

  X⃗ = M⋅x⃗
and here only M is a matrix. x⃗ is a vector and X⃗ is the discrete Fourier transform of it, also a vector.


Total tangent, but man do I love unicode!

Also for future reference, there's a superscript two (² at U+00B2) and three (³ at U+00B3).


Bah, those two are firmly in 8859-1 territory.


There's no matrix multiplication, just vector dot product. (Yes, you can express it as a matrix multiplication, but nurrrr)


I don't think I've seen the dot product there. Don't just throw terms around, it's very confusing for beginners :-). While I may have missed your idea about the dot product, I'm quite sure you can't express a dot product as a matrix multiplication; perhaps you were thinking you can express matrix multiplication in terms of dot products, but I don't see how that answers OP's question.


A conventional (i.e. with vectors over ℝ) dot product is just a matrix multiplication between a 1×n and an n×1 matrix.
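
In NumPy terms (a trivial sketch of my own):

  import numpy as np

  u = np.array([1.0, 2.0, 3.0])
  v = np.array([4.0, 5.0, 6.0])
  print(np.dot(u, v))                          # 32.0
  print(u.reshape(1, 3) @ v.reshape(3, 1))     # [[32.]] -- same number, as a 1x1 matrix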


That hardly qualifies as "you can express dot product as matrix multiplication" in the context of the discussion about complexity. I also think it's backwards from a theoretical standpoint. Dot product isn't a cornercase of matrix multiplication, it has a meaning of its own that can be viewed without even introducing matrices.

You can view matrix multiplication as computationally equivalent to a sequence of dot products, but that's where it ends; otherwise, for instance, matrix multiplication also doesn't have all the properties that the dot product has (e.g. it's not commutative). I know a lot of books take this shortcut, but the two concepts are different enough that, IMO, they do not warrant alexchamberlain's remark.


In all fairness, I misread the article. However, dot product is a very basic mathematical concept all readers of Hacker News should be aware of. Again, expressing the dot product as a matrix multiplication is basic linear algebra.


The book by Cormen et al. has a fantastic explanation of the FFT too.


This is true. I looked up CLR's treatment recently, and it was surprisingly readable (for CLR). It describes the FT as transforming a polynomial between its coefficient representation and its point-value representation (n point values are sufficient to fully represent a polynomial of degree-bound n). To yield the point values, the polynomial is evaluated at the n complex roots of unity, because these numbers have useful properties.

My background in math is thin when it comes to things like complex numbers, but the explanation more or less made sense to me. CLR was focused on how the FFT was useful for multiplication, so it didn't really go into all the things I would have liked to know.
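
The multiplication application is easy to sketch, though. Here's my own minimal version (NumPy; the example polynomials are arbitrary): evaluate both polynomials at the roots of unity with the FFT, multiply pointwise, and interpolate back with the inverse FFT.

  import numpy as np

  a = [1, 2, 3]            # 1 + 2x + 3x^2
  b = [4, 5]               # 4 + 5x
  size = 1
  while size < len(a) + len(b) - 1:
      size *= 2            # pad up to a power of two that can hold the product

  fa = np.fft.fft(a, size)             # point-value representation of a
  fb = np.fft.fft(b, size)             # point-value representation of b
  c = np.fft.ifft(fa * fb).real[:len(a) + len(b) - 1]

  print(np.round(c))                   # [ 4. 13. 22. 15.]  i.e. 4 + 13x + 22x^2 + 15x^3
  print(np.convolve(a, b))             # same coefficients the O(n^2) way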


Python is too slow for FFT. There is no reason to understand how the FFT could be implemented in Python if, at the end of the day, C or Fortran will be used [1].

[1] http://www.freeware.vasp.de/FFT/fft-pentium.f


Read the article. Quote for you:

> "Though the pure-Python functions are probably not useful in practice, I hope they've provided a bit of an intuition into what's going on in the background of FFT-based data analysis."

There's very little to be gained by reading pure Fortran code, especially if you don't know the mathematical tricks/optimisations used beforehand. The Python code, on the other hand, stays (relatively) readable.

It's not extremely fast (though it stays decent, thanks to NumPy), but that's not the point.


"There's very little to be gained by reading pure Fortran code, especially if you don't know the mathematical tricks/optimisations used beforehand. The Python code, on the other hand, stays (relatively) readable."

Depends. His first Python code was:

  x = np.asarray(x, dtype=float)
  N = x.shape[0]
  n = np.arange(N)
  k = n.reshape((N, 1))
  M = np.exp(-2j * np.pi * k * n / N)
  return np.dot(M, x)
I can write more or less the same in Fortran:

  n = reshape([(i, i=0,len-1)], shape=[1,len])
  k = transpose(n)
  M = exp(-2 * j * pi * matmul(k,n) / len)
  res = matmul(M, x)
But of course Fortran is a typed language, so I need to explicitly write the types for everything, and my whole function gets longer:

  pure function dft_slow(x, len) result(res)
    complex, intent(in) :: x(len)
    integer, intent(in) :: len
    complex             :: n(1,len), k(len,1), M(len, len), res(len)
    integer             :: i

    n = reshape([(i, i=0,len-1)], shape=[1,len])
    k = transpose(n)
    M = exp(-2 * j * pi * matmul(k,n) / len)
    res = matmul(M, x)
  end function dft_slow
and so yes, maybe less readable.

P.S. One must also define somewhere global constants

    complex, parameter  :: j = (0,1)  ! imaginary unit
    real, parameter     :: pi = acos(-1.)
for the above function to work. Also everything is by default in single precision here.


I have a sudden urge to find an excuse to write Fortran again.


> There's very little to be gained by reading pure Fortran code, especially if you don't know the mathematical tricks/optimisations used beforehand. The Python code, on the other hand, stays (relatively) readable.

Don't agree. Why try to understand an implementation in a language that will never be used in production? And there is a lot to gain by reading Fortran code.


Because a good part of the optimizations are language-independent. And you need to understand these optimizations before going further.

The computations which are redundant and can be done once and stored afterwards don't change depending on whether you're coding in Python or Fortran. The symmetries that you can exploit to compute less stuff don't either. The math behind it is always the same.

Once you have done that (and gained two or three orders of magnitude in computation time), and you're still not fast enough, then okay, it's time to drop to C/Fortran and start playing bookkeeper with your memory allocations. But not before.
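
For instance, the core symmetry is easy to show in a few lines of Python (my own sketch, not the article's code; it assumes a power-of-two length): an N-point DFT splits into two N/2-point DFTs on the even and odd samples, and the two halves share the same twiddle factors.

  import numpy as np

  def fft_recursive(x):
      x = np.asarray(x, dtype=complex)
      N = len(x)
      if N == 1:
          return x
      even = fft_recursive(x[0::2])            # DFT of even-indexed samples
      odd = fft_recursive(x[1::2])             # DFT of odd-indexed samples
      twiddle = np.exp(-2j * np.pi * np.arange(N // 2) / N)
      return np.concatenate([even + twiddle * odd,
                             even - twiddle * odd])

  x = np.random.random(16)
  print(np.allclose(fft_recursive(x), np.fft.fft(x)))   # True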


> Once you have done that (and gained two or three orders of magnitude in computation time), and you're still not fast enough, then okay,

You are talking about a language (or rather, an implementation) that is compiled to bytecode. The speed increase that you gain by optimization is an order of magnitude smaller than what you get by just using a language that compiles to machine code.

Once you've understood how it's done in Python, you then need to port it to Fortran/C, so what is the point?

Fortran is not much different from Python in syntax.


> Once you've understood how it's done in Python, you then need to port it to Fortran/C, so what is the point?

At that point, why would you port it to Fortran/C instead of using FFTW? The whole point of the exercise was to learn about and optimize the algorithm. If you can learn most easily using Python, why wouldn't you do your exploration in Python? Going from Python to your language of choice should be the easiest part of the exercise, if you choose to do that at all.


> Why try to understand an implementation in a language that will never be used in production?

It's the underlying mathematics that's most interesting: that's how one goes from O(n^2) to O(n log n). And the mathematics is language independent, so it might as well be demonstrated in a reasonably accessible way.


>Why try to understand an implementation in a language that will never be used in production?

Because the end goal is understanding the algorithm, NOT writing production code.

Those reading the article will never even write an FFT function anyway, since you can use ready-made, highly optimized versions. So knowing what the production code looks like is totally moot to them.

It's like you're asking why ever use pseudo-code. Because it's readable, makes the process easy to follow, and doesn't contain all the boilerplate junk you'd need to get performance in a production implementation. Python serves this role very well, with the added benefit that it's not even pseudo.

Plus, ever heard of prototyping?


The title of the post is just "Understanding the FFT Algorithm", and his goal is to explain the algorithm. He just uses Python instead of pseudocode to explain things.

The HN title has misled you, in this respect.


The usual approach is to use the foreign function interface and work with FFTW or some other high-performance library.
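
Something like this (my own sketch; scipy's fftpack wrapper stands in for FFTW here, and pyFFTW exposes a similar drop-in interface):

  import numpy as np
  from scipy import fftpack     # binding to compiled FFTPACK routines

  x = np.random.random(4096)
  X = fftpack.fft(x)                       # the heavy lifting happens in compiled code
  print(np.allclose(X, np.fft.fft(x)))     # agrees with numpy's own wrapper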


that's correct. AFAIK the point of the OP was to understand how that "high-performance library" works.



