Hacker News new | past | comments | ask | show | jobs | submit login
Fourier Transforms – The Math Trick Behind MP3s, JPEGs, and Homer Simpson’s Face (nautil.us)
356 points by aatish on Nov 6, 2013 | hide | past | web | favorite | 100 comments



This is a great post, but it's a little bit misleading when talking about MP3s and lossy compression and conflates analog fourier analysis with discrete analysis.

When you're talking about a digital signal, it is the sample rate that determines the maximum frequency you can represent. It's not MP3s that "throw out the really high notes" -- it's any digital signal. A discrete fourier transform actually is lossless, but it is bandwidth limited.

The reason audiophiles prefer Flac to MP3s, for instance, is because MP3s do more than just "throw out the high notes." Both are bandwidth limited, but MP3s also throw out other information based on psychoacoustic principles.


I'm an audiophile, and I can tell you from looking at the FFT which settings you used in LAME to make that MP3* :)

Different presets set different lowpass filter values, opting for the best balance between preserving frequencies, achieving a small file size, and the perceived quality. The latter is often measured with various approximations to human perception of sound - psychoacoustics. This is why you can't compare different encoders using an approximation - the algorithms themselves are made using them. It's the Dunning-Kruger effect, but for computers.

You can actually change those presets, so you can hear for yourself what effects changing it has: http://lame.cvs.sourceforge.net/viewvc/lame/lame/USAGE

* Why would anyone bother to learn this skill? Well, I don't mind MP3s (or AACs or OGGs for that matter), and I certainly can't tell the difference between them and lossless formats, but an MP3 that has been re-encoded several times is really atrocious. It's like when you re-compress a jpg a few times, it gets messy. This helps definitively figure out what an MP3 really is or was. Sometimes the tags on the MP3 lie about which encoder was used or what the compression settings were. Other times, the file used to be something else, such as re-encoding a 128kbps MP3 as a LAME V0 preset MP3. Looking at the frequency plot in a sound editor makes this rather obvious, as the lower presets and crummier encoders have much lower lowpass filters.


Thanks for your feedback. Sure, any digital signal is by definition finite in its resolution (the sample rate or bits). I was trying to address the distinction between wave files of the type stored on audio CDs, and MP3s - both digital signals. I agree that the Fourier transform is in principle lossless, but it's particularly useful to use it in a lossy way, i.e. to throw out the least important (to us) components of the signal. If I was particularly misleading about this, I'd like to know.


Well, if you want to be 100% accurate, I think the section talking about how the high notes aren't important could be clarified. The really high "notes" have already been lost when you recoded the wav file digitally. The lossy step of mp3 encoding is not a result of the transform, but what you do with that information and is more complex than just discarding high frequency components.

Also, the the word "note" is confusing in the context of music, since really low notes usually contain a lot of high frequency information.


Correct. (For the author's benefit) We usually call them harmonics, in the context of pitched sounds, but more accurately, the sinusoidal components of any sound are called its partials. Partials differ from harmonics in that harmonics are restricted to be sinusoids with frequencies that are integer multiples of a fundamental frequency. Real world musical notes don't often exactly fit this paradigm [1].

In any case, if discarding high frequency information is all you needed to do to compress, you could simply low-pass filter the time-domain signal. A better description of what goes into MP3 compression is that it omits frequency components in sound that we can't hear because they are shadowed by nearby (in time and/or frequency) components that are louder.

[1] http://en.wikipedia.org/wiki/Piano_tuning#Stretch


For what it's worth, mp3s do not store the signal as a DFT. The frequency domain data is produced as a MDCT (modified discrete cosine transform). A DFT is performed during MP3 encoding, but it's used to apply the psychoacoustic model (which is much more complex than just throwing out high frequencies) to figure out how to layout the frequency bands in the MDCT. I don't believe JPEG uses a DFT at any point, just a DCT.


I think you were kind of hand-wavy, but that's just how it has to be for this kind of article. If you're trying to explain this stuff to a general audience, you can't just jump right into talking about nyquist limits and stuff.

But maybe you could allude to the fact that there's more going on.

Anyways, I liked the article.


Yes, any digital signal has a limited bandwidth

However, MP3s throw more high frequencies than the digital signal at that given sampling rate allows.


Well, sort of. If you have a wav file and an mp3 file, both of which have a sample rate of 44kHz, they will both be able to represent the same maximal frequency of 22kHz. The mp3 wouldn't necessarily discard the high frequency information, but it may do so when it is deemed that the sound wouldn't be perceivable.


But a lot of MP3s do have a 16kHz shelf and higher frequencies are aggressively discarded.


of course, analog or discrete, without the fourier (and related) transforms, you can't even do the analysis that allows you to "throw out information".


There's a bit of a mischaracterization of the way 2 dimensional Fourier transforms operate on images in this post. The 2D DFT (or DCT rather) doesn't deal with anything as complex as tracing shapes on a 2D plane like seen in the video. What it does is treat each 1 dimensional line of the image as a signal/waveform, with the pixel intensity as amplitude. Fourier-family transforms are separable, so the 2D (and ND) case is equivalent to the transform of each line followed by a transform along the resulting columns. So the actual representation that arises from this looks like so, with each image representing the corresponding cosine component: http://0x09.net/img/dct32.png (here grey is 0)

Visually most of the sinusoidal components here are zero or nearly so. However if we scale them logarithmically, we'll see that it's actually not so: http://0x09.net/img/dct32log.png

What transform coders like JPEG do is reduce the precision of these components, causing many of them to become zero. Which is good for the entropy coder, and mostly imperceptible to us. Of course JPEG operates on 8x8 blocks only * rather than a whole image like here.

It's hard to imagine this as an image, so here's a progressive sum starting from the second term, which essentially demonstrates an inverse DCT: http://0x09.net/img/idct32.png

mind that 0 is adjusted to grey in this rendering, and the brightness of the result is not an artifact of the transform.

It's easier to understand what goes on with these transforms if you can visualize things in terms of the basis functions. Which in the case of a 32x32 image like above would be http://0x09.net/img/basis.png (warning: eye strain).

All the examples above pertain to the DCT, partly because of JPEG and partly so I could avoid getting phase involved, but the principles apply equally to the other transforms in the family.

* although recent versions of libjpeg can use other sizes


Ah such a shame. Tried to resist but I can't help pointing out you missed a chance to explain how when Fourier discovered the expansion, it really was an example of "one weird trick all the Mathematical Physicists will hate him for".

It was during his time in Grenoble that Fourier did his important mathematical work on the theory of heat. His work on the topic began around 1804 and by 1807 he had completed his important memoir On the Propagation of Heat in Solid Bodies. The memoir was read to the Paris Institute on 21 December 1807 and a committee consisting of Lagrange, Laplace, Monge and Lacroix was set up to report on the work. Now this memoir is very highly regarded but at the time it caused controversy.

There were two reasons for the committee to feel unhappy with the work. The first objection, made by Lagrange and Laplace in 1808, was to Fourier's expansions of functions as trigonometrical series, what we now call Fourier series. Further clarification by Fourier still failed to convince them. As is pointed out in [4]:-

"All these are written with such exemplary clarity - from a logical as opposed to calligraphic point of view - that their inability to persuade Laplace and Lagrange ... provides a good index of the originality of Fourier's views"

http://www-history.mcs.st-and.ac.uk/Biographies/Fourier.html


These rotating circles that are referenced by the article will forever remain a cautionary tale for me: http://blog.matthen.com/post/42112703604/the-smooth-motion-o...

In pre-Copernican astronomy, where the Earth was considered the center of the cosmos, the erratic orbits of the other planets relative to us were explained away as epicycles within perfect circles. Later, closer examination of the orbits required modeling the orbits as epicycles within epicycles, exactly as depicted in the rotating circles above. In the end, it turned out that there aren't epicycles in the orbits, they were just creating a Fourier transform to explain the orbital paths. We now know that you can create a Fourier transform to generate any arbitrary path.

I sometimes wonder how many of our modern physics models, such as the standard model, differential geometry, and M-theory, are just highly sophisticated versions of the Fourier transform.


ImageMagick has a nice tutorial on the use of Fourier transforms in image processing, if you want to get a more intuitive feel on the subject: http://www.imagemagick.org/Usage/fourier/


Really interesting, thanks!


But why do they use holocaust_tn.gif as an example filename?



They really are a beautiful thing, it is unfortunate that most computer science majors don't get at least a basic introduction to signal processing as part of their standard curriculum. Especially because in my experience, the best way to understand a Fourier transform is to implement it in a program, feed in different signals, and wait for the light in your mind to go off.

Images and audio signals provide a particularly stunning insight.


They don't? At University of Sydney every engineering degree does 2 years of maths and that includes Fourier Series.


Nope, I only learned about it because my cousin went to the same school and was doing his electrical engineering graduate degree at the time. He had a book on signal processing lying around and that's where I was introduced to it.


It's interesting to note that Fourier wasn't trying to any of the things that the Fourier Transform is commonly used for today, namely signal processing. He was trying to solve heat transfer equations when he came up with the Fourier series. I'm not even sure if he was that interested in the Transform as such, e.g. looking at a signal in frequency space and then efficiently applying filters before transforming back to time domain.

My dad remembers his professor, sometime in the 40s, posing the question of calculating when a worm buried in the ground would experience the same temperature we'd experience at Christmas (Erdwuermchen's Weihnachten) and the solution had to be calculated with Fourier's heat transfer equations.


when it first came out most people ridiculed it for being a mere intellectual curiosity.

I would characterize the fourier transform (especially the DFT) as the single most important mathematical innovation that enables the interface between the digital and analog world.


I agree although there's a continuum from Fourier to Cooley-Tukey and so on so it's hard to pinpoint which part is the most important.

Also, maybe it's more beloved by engineers (like myself) than mathematicians.


Hi Hacker News - I'm the author of the piece, also on twitter @aatishb. Look forward to hearing your thoughts. I encourage you to share your thoughts and insights with other readers by leaving a comment on the post, particularly if you know of other interesting applications about the Fourier transform. Cheers!


OK, since you asked:

> The sound wave produced by a piano note is a simple sine wave.

No, it's not. A piano note is a complicated stack of overtones (some of which are harmonic and some of which aren't) and transients. If it was just a sine wave, it would sound like a sine wave and not like a piano.

This is part of why things like Shazam are so difficult: musical notes aren't just a single frequency in the FFT, they are a stack of them.

> You could just tell them a handful of numbers—the sizes of the different circles in the picture above.

This is actually the exact same number of numbers as in the time-domain series. Taking the FFT on its own doesn't reduce the amount of data, it's about discarding or compressing some of the frequencies after you do.

> The really high notes aren’t so important (our ears can barely hear them), so MP3s throw them out, resulting in added data compression.

This is only part of what MP3's do, and at high bitrates this really is inaudible. The main source of compression is that the precision of the numbers used to represent the amplitude of the sine waves is reduced.

When you have a loud sound and a quiet sound at the same time, the loud sound will "drown out" the quiet one (called "auditory masking"). You won't be able to hear the quiet one, or one be able to hear it precisely. MP3 and other audio codecs take advantage of that by encoding quieter frequencies with less fidelity when there are other louder frequencies at the same time. You don't notice the loss of precision since it's buried under louder sounds.

> Just as MP3s throw out the really high notes, JPEGs throw out the really tiny circles.

This is off too. If JPEG discarded high-frequency signals, you would just be blurring the entire image. It would be exactly like saving it scaled down and then scaling it back up with some smooth interpolation.

Obviously, JPEGs don't appear to be stretched out thumbnails, so that isn't what happens. Instead, it's not that high-frequency signals are discarded, it's that their precision is reduced.

Human eyes are quite good at detecting sharp edges and fine details. What they aren't good at is detecting how sharp an edge is. We can definitely see a break between two colors, but we can't accurately detect the magnitude of that the difference.

JPEG takes advantage of that by rounding off those high-frequency variances to nearby values. That means there are fewer possible values at high frequencies, so fewer bits are needed to encode them.

I realize I'm being a negative Nancy here. I really liked your post and I agree 100% on how awesome the Fourier transform is. It's also quite hard to describe it in an approachable way, and you've done an admirable job. I just get bugged when simplifications for a lay audience are actually off the mark.


>> The sound wave produced by a piano note is a simple sine wave.

> No, it's not. A piano note is a complicated stack of overtones (some of which are harmonic and some of which aren't) and transients. If it was just a sine wave, it would sound like a sine wave and not like a piano.

This is really such a fundamental* aspect of frequencies and sound, it should have been the starting point of the article. "Look how noisy and squiggly this wave is, but with FFT we can see that it really only consists of waves at multiples of the same frequency".

* no pun intended


This is actually really valuable feedback and one of the things I like about the HN audience. I struggle to strike a balance between precision and simplicity, and your comments raised many interesting subtleties that I had overlooked. So thanks for helping me get a better handle on this.


Auditory masking is more than that. The psychoacoustic model used for MP3s will discard some frequencies during others, like you said, but it will also discard "information" proceeding and following an acoustic event.

While not a perfect analog, it is similar to how vision works. If you are in a bright room and the lights go out, it takes you a little time to readjust to the darker room before you can see things again. While your irises were adjusting, the chair and take were still there, and light reflecting from those objects were still reaching your eyes, but you were unable to see them. It works the other way too. If you are still in that dark room and the lights come on, it will take your eyes a little time to readjust, less time then it took your eyes to adjust to the dark, but still your eyes are effectively discarding that information.

The psychoacoustic model in MP3s is even more bizarre. It turns out that some frequencies, when heard BEFORE another frequency, your brain will throw that first frequency away. It is unintuitive, but that has been proven in laboratory settings. Knowing how the majority of humans discard the same auditory information under different conditions, MP3s are compressed by throwing away the same information that the brain would normally discard. The Fourier transform is a significant part, but it isn't the whole story.


I love the FFT even more than you and enjoyed that it is getting lauded, but would have found more algorithmic details even more interesting. Breaking down DFT, etc. and then showing the performance magic of FFT is a great way to approach discussion of many issues in problem analysis and algorithm design.

So, Nice enough article for slipping into the topic - now give me more! harder! faster!


Sparse fast Fourier transform is even more "magical" than fast Fourier transform (FFT). If you assume that the discrete Fourier transform (DFT) has only k non zero coefficients, then, there exists an algorithm to compute it in O(k log(n)). That's right, you do not have to see the entire signal to compute the DFT, which is pretty awesome.

If you are interested, see http://groups.csail.mit.edu/netmit/sFFT/ .


I work in this field (though I'm none of the authors). If anybody has any questions about the sFFT, let me know.


> This is actually the exact same number of numbers as in the time-domain series.

In a way, it's twice as many, since they have both real and imaginary components.


Nope - the amount of numbers is in fact identical. The discrete Fourier transform has redundant information in bins above NFFT/2 (where NFFT is the size of the transform/number of samples in the signal) for purely real valued input. This means the "number of numbers" is the same - NFFT/2 bins of necessary complex values after transformation, versus NFFT real values in the original series.


Ah yeah, forgot about that. Thank you.


I liked your article, however, it is not so much the Fourier Transform itself that makes possible all these wonderful applications, but rather the Fast Fourier Transform, which you fail to mention even once.

The "naive" Fourier Transform has a pleasing mathematical simplicity and makes it very useful for deriving all sorts of theoretical properties for scientific analysis, but in practice it has a computational complexity of O(N^2), that would quickly make all those applications you name in your article utterly infeasible.

Instead, only with the discovery of the Fast Fourier Transform algorithm roughly halfway the 20th century[0], lowered the complexity to a very manageable O(N log N). And only then all these applications and properties of the Fourier Transform came within computational reach. Without the FFT, the Fourier Transform would mostly just be useful on paper for functional analysis and math proofs, things like that, but not in practice and we'd be missing out on all those MP3s, JPGs and speech recognition.

It's definitely not an overstatement when the WP article says: Fast Fourier transforms have been described as "the most important numerical algorithm[s] of our lifetime".

[0] or 1805, depending: https://en.wikipedia.org/wiki/Fast_fourier_transform "The basic ideas were popularized in 1965, but some FFTs had been previously known as early as 1805."


Awesome article! Here are two things that stuck out for me...

1.) In the first part of the article talking about decomposing a periodic signal into sinusoids of different frequencies - it would be more correct to refer to it as the Fourier _Series_, which is different from the transform.

2.) The image showing three sinusoids summing together into a non-period signal made me cringe (the sum is also periodic - that's the whole point of the Fourier Series).


1) ...which when applied to a sampled signal, is the discrete-time Fourier transform (DTFT)

2.) It will only be periodic if there exist some frequency for which the frequencies of the sinusoids are integer multiple.


Another cool thing is that orthonormal bases are not unique - there are many other basis functions that you can choose beyond just sine and cosine to decompose a function (or digital signal). Though they are a natural choice if you are specifically looking to analyze periodicities.

One direction to go in for further study:

https://en.wikipedia.org/wiki/Wavelet


Yes! Same goes for pulling eigenvectors out of a data set e.g. for PCA (Principal Components Analysis) -- those form an orthonormal basis. You can even pick a set of random orthogonal vectors of the same number and dimension as your original data and re-represent the data with no loss of information.

I had a lot of fun demonstrating this concept for reconstructing images via a Processing sketch a few years ago and still use it for teaching from time to time. All source code for quick and dirty Haar, Eigen, Random, and Fourier-like methods included here: http://www.cc.gatech.edu/~phlosoft/transforms/


The newer JPEG algorithm JPEG2000 uses Wavelet Transforms which is kind of similar to the Fourier. The Fourier applies a finite window then decomposes into a sum of infinite waveforms. The Wavelet on the other hand applies no windowing function, and directly decomposes the signal into a sum of _finite_ waveforms.

The Fourier has the disadvantage that you can't arrange the components into a time hierarchy; that is, no component occurs "before" any other.

The Wavelet transform _does_ have a natural time hierarchy. This makes it much better for streaming compression like voice calls.

The Fourier perfectly describes signals of infinite duration (think tone or color) while the Wavelet perfectly describes the position of things within a signal (think rhythm or space).

With the Fourier filtering is really easy. You can do hard, hard cutoffs -- literally no contributions within a certain frequency band -- just by removing components of the decomposition. Similarly, you can accurately apply any arbitrary mathematical filtering function.

The disadvantage of the Wavelet is that, well, the only meaningful transformation you can apply to it is compression -- dropping the shorter timescale components. If you want to filter, it's not enough to trim off timescale components because the wavelet itself can contain any frequency components. There's also nothing like a simple mathematical function you can apply to the coefficients to get a smooth filter.

Neat!


> You could just tell them a handful of numbers—the sizes of the different circles in the picture above.

Maybe I'm crazy and just missing something, but this feels a little too good to be true. This would put the set of smooth curves in 1-1 correspondence with the set of finite sets (since each curve is being specified completely by a finite set of numbers). But the set of finite sets is countably infinite since it's a countable union (this may require the axiom of choice) and the set of smooth curves is uncountably infinite, a contradiction.

(Disclaimer: I know nothing about Fourier analysis.)


It's not actually a finite set of numbers - a continuous Fourier transform would have an infinite number of amplitudes, for every possible frequency.

However, you can very closely approximate most curves with a relatively small number of amplitudes.


First of all, there are four kinds of "Fourier transforms" in common use, so the cardinality problem that you have observed cannot really be discussed until we pick one to talk about. For instance, the FT maps uncountable to uncountable and the DFT maps finite to finite (and can be understood with intro Linear Algebra as a change of basis). The things we are calling countable and uncountable are actually dimensionalities of vector spaces, since real multiples of a single function would produce an uncountable set. Anyway, here are the types of transforms:

D=continuous domain, d=discrete domain, E=infinite extent, e=finite extent

time domain <-> frequency domain

Fourier Transform: DE<->DE

Fourier Series: De<->dE

Discrete Time Fourier Transform: dE<->De

Discrete Fourier Transform: de<->de

The cardinality mismatch shows up for FS and DTFT. It is resolved through a different notion of equality than you may be used to. The "distance" between two functions can be measured using several norms, and when this distance is zero we claim two functions are equal. L1, L2, and Linf are the common ones. Linf corresponds to pointwise equality, which is what you assumed (not unreasonably) that they meant. L1 and L2 do not.

L1 norm: d(f,g)=integrate(abs(f(x)-g(x)),a,b)

L2 norm: d(f,g)^2=integrate(abs(f(x)-g(x))^2,a,b)

Linf norm: d(f,g)=max(f(x)-g(x)) over the interval (a,b)

Fourier Series only guarantee reproduction (take the transform and then take the inverse to get the reproduced curve) up to the L2 norm. This makes physicists and engineers happy, since L2 norms correspond to energy measurements, and if the difference between two physical quantities has no energy then it's effectively irrelevant to physical processes.

The remaining hurdles for mathematicians are to show that

1) The reproduction coefficients found by the transform are the best possible coefficients (the curve they generate is closer to the original than for any other set of coefficients of the same size). This is a trivial proof using inner products that usually goes by a name like "best approximation theorem" or "generalized pythagorean theorem."

2) Reproducible curves are dense in the space of actual curves. That is, there is always a reproducible curve that has zero L2 difference from an arbitrary input curve. If this is true, then by #1 we know that the transforms will find it. Unfortunately, this proof is quite involved. A "quick and dirty" method uses the Stone-Wierstrass theorem (polynomials can reproduce continuous functions with arbitrary L1 or L2 fidelity + the FS can reproduce polynomials with arbitrary L2 fidelity). A better method arises in the context of Sturm-Liouville theory that generalizes "density" to solutions of a large class of differential equations. This is important for physicists and engineers because it justifies normal mode expansions even when the normal modes aren't perfect sinusoids.

If you want to know more, any linear algebra book that discusses inner products should hit on #1. For #2 you want an Analysis textbook. Analysis is a tough subject so it's far more important to be sure that the book starts at your level and has an understandable proof of Stone-Wierstrass than it is to be sure that it covers this exact topic. If it doesn't, supplement it with the first few chapters from "Completeness and Basis Properties of Sets of Special Functions" by Higgins and you'll be set.


Some math geek nitpicks to a generally good post:

The things we are calling countable and uncountable are actually dimensionalities of vector spaces...

The dimensionality of L^2(R) is still countable. Proof: f(k,j,x) = e^{2 pi i k x}, x in [j,j+1], k and j both integers forms a basis. So does H_n(x) exp(-x^2/2), for H_n the Hermite polynomials.

The Fourier transform merely does not map L^2(R) -> l^2(Z) in this case - it maps L^2(R) -> L^2(R).

That is, there is always a reproducible curve that has zero L2 difference from an arbitrary input curve.

This isn't what density means. Density means that for any epsilon, there is a reproducible curve with L2 distance < epsilon to that curve.

Linf norm: d(f,g)=max(f(x)-g(x)) over the interval (a,b)

By most standard definitions, this is only true almost everywhere. A typical definition is ||f(x)||_{\infty}= Lim_{p -> infty} ||f(x)||_p, and this allows two functions to differ on a set of measure zero.

As an example, consider f(x)=1 and g(x)={0 on rational numbers, 1 on irrational numbers}. These two functions are equal in any L^p space, and are hence equal in L^infty as well.


>> The dimensionality of L^2(R) is still countable.

At this point I was implicitly talking about Linf(R). Does it still have countable dimension? With the sup norm I'm pretty sure the answer is no, but with Lp taken at p->inf maybe it does?

In any case, thanks for cleaning up the rough edges.


Don't know off the top of my head. I suspect it does have countable dimension but I don't know how to prove it.

The space of continuous functions with the sup norm is actually a much smaller space than the space of L^\infty - the former is not even dense in the latter.

I suspect actual functions on R with the max norm probably is uncountable, but that's also a very weird space. The overwhelming majority of functions in there are unmeasurable.


A related point of entry is the "Gibbs Phenomenon" -- http://en.wikipedia.org/wiki/Gibbs_phenomenon .

If I remember my "history of the Fourier transform" correctly, before the difference between the sup-norm and L2 norms were fully understood by everyone (e.g., by the late 1800s), the sense of convergence of a Fourier series to a square wave was confusing. In what sense did the Fourier series represent the square wave? When was it a good model and when would it fail?


> Fourier Series only guarantee reproduction (take the transform and then take the inverse to get the reproduced curve) up to the L2 norm

This completely resolved the confusion. Thanks very much for typing this up.

This is one of those details which is probably pretty unimportant for the lay reader, but for someone with some mathematical training (outside of this specific domain) its omission threw a big red flag for me.


I wish I could give this post 5 upvotes.


In the article I described the discrete fourier transform, but there is a continuous version, where a signal is represented by an infinite sum (really an integral) of component frequencies. This is the version that is more generally useful. Here's a great animation by LucasVB of the continuous fourier transform in action: http://upload.wikimedia.org/wikipedia/commons/a/a3/Continuou...


I don't know much more about Fourier analysis than you, but theoretically it's an infinite series. So to describe an arbitrary smooth curve, you need a potentially infinite number of circles (or sin terms). I'm not sure if it's a countably infinite series or not. I think in principle you could have a component for every number in the real line.


For a periodic input, it's countably infinite (Fourier series [1]). For aperiodic, its uncountably infinite ((continuous) Fourier transform).

I don't think this implies that there are a countably infinite number of smooth, periodic curves though, because the each one of the parameters is a real.

I'm not a mathematician though, so I could be abusing these concepts...

There are two more major Fourier transforms: the discrete Fourier transform, which maps a discrete aperiodic function to a continuous periodic (i.e. band-limited) frequency domain representation [3], and the discrete-time Fourier transform [4], which maps a discrete period (i.e. time-limited) function to a discrete period (i.e. band-limited) frequency representation.

Notably, in the applications mentioned in the article, what's really being described is the short-time discrete-time Fourier transform (STFT [5]). In this most useful variation, you chop the input signal of arbitrary duration into fixed-sized (and possibly overlapping) chunks, (usually) apply a windowing function to temper down the edges of the chunks, and then apply DTFT to each chunk. The result is a sort of spectrogram -- a time-frequency representation of the input. In image processing this is done in two dimensions.

The famous FFT [6] is a O(n * log(n)) algorithm for calculating DTFTs.

[1] http://en.wikipedia.org/wiki/Fourier_series

[2] http://en.wikipedia.org/wiki/Fourier_transform

[3] http://en.wikipedia.org/wiki/Discrete_Fourier_transform

[4] http://en.wikipedia.org/wiki/Discrete-time_Fourier_transform

[5] http://en.wikipedia.org/wiki/STFT

[6] http://en.wikipedia.org/wiki/FFT


If you are interested in audio fingerprinting using the FFT check out my IPython notebook that explores this idea in more detail. I used a spectrogram and image processing tools to identify what a given audio sample is. You should be able to download the notebook and run all the examples.

http://jack.minardi.org/software/computational-synesthesia/


There is also an extremely important property that would be worth an article of it's own. Namely the fact that pointwise multiplication of 2 fourier transformed functions is the same as convolution of the functions themselves.

What does this mean in practice? Let's take a simple gaussian blur for images. A single output pixel is formed by overlapping the gaussian kernel on top of the image, multiplying then pointwise, then summing the result. Repeat for every pixels. What you can also do is take FFT of the gaussian kernel and multiply it with the FFT of the image and inverse transform and you will get the same result as actually calculating it for every point separately. Is this faster than doing it point by point? Depends on the blur radius.

You can do awesome things with this blazingly fast. As an example a simple water wave simulation can be made by simply taking a fourier transform, multiplying it with the dispersion relation of the water waves and then doing an inverse transformation. http://www.youtube.com/watch?v=MTUztfD2pg0 Just like what is done here. Normally this convolution would take O(N^2) amount of operations where N is amount of vertices but with FFT it's O(N log N).

FFT is for convolution what quicksort is for sorting. Imagine how limited would you be if all your sorts would take O(N^2) time. The examples I gave are quite limited in scope, going trough all the applications of convolution would take textbooks. It's probably one of the most important concepts in electrical engineering.

Oh yeah, convolution is just like cross correlation except in another case the function is reversed. So you can imagine the applications in data mining etc.

All in all Fourier Transform, and related ones, is an extremely huge and massively important concept, it's hard to overstate it's usefulness.

Personally I can say that I've used filter design tools to make a really smooth accelerometer data processing function. It does not jump around like a raw signal does nor does it lag a lot just like the standard exponential smoothing does.


Do you have any write-up on the accelerometer data processing? I'd be interested. Thanks!


I would add to your list of reasons why Fourier Transforms are awesome.

By complete coincidence, Bragg's law, used to do everything from X-Ray Diffraction to particle scattering, just happens to be a fourier transform. Every time we bombard a tiny thing with light or radiation in order to understand the structure, what we literally get out of it is emission dots that correspond to the periodicity of the lattice -- literally the 2D fourier transform of the scattering cross section. When I heard that in Quantum III, it blew my mind. It's straight out of quantum scattering theory.


I find this kind of stuff fascinating.

Are there any decent books (kindle or proper books) with this kind of content? I've got no background in Maths (other than some (UK) A-level maths at school), but always love reading these sort of posts.


aatish, since you seem interested, I'll throw another fun-fact at you that I found very interesting even after years of working with Fourier transforms:

You can view the Fourier transform as a fitting problem. Yes, you fit the data to a function. Ie you take the data points and fit it to a sum of exponential functions. There is actually a much more general approach called "Prony method" that extends the concept and adds a dampening factor into the function to fit:

http://www.engr.uconn.edu/~sas03013/docs/PronyAnalysis.pdf

You can take it further and use matrix pencil methods and eventually you'll see connections to ESPRIT algorithm and even least squares algorithm. It's really interesting how they're all actually connected.

Cheers


EDIT: I mixed up high vs low frequencies, as the reply pointed out, so I've edited this to be correct now.

The reason we can get away with throwing away low frequencies in JPEG is because humans are prone to notice significant details rather than tiny details.

High frequencies of a Fourier transform of an image == tiny detail (like being able to distinguish individual hairs)

Low frequencies of a Fourier transform of an image == huge details (like someone's face).

So you transform, set part of the result to zeroes, and compress. To display it you uncompress, transform back, and display it. The zeroes manifest themselves as an almost-imperceptible blur.


You don't generally throw away the high frequencies in JPEG, you throw away the low-amplitude frequencies. Unless you're talking frequencies that are so high they're really just artifacts of the segmentation of the image (sharp borders show up in every frequency range, so smoothing them a bit helps).


To illustrate, here is an example of zeroing the lowest frequencies:

http://imgur.com/a/wVjYk


Oh, so your brain can still mostly recreate the shape even with no low frequencies!


Depends on how far away from the image you are, or how big it is. By taking the low frequency from one image, and the high frequency from another, you can get some interesting results. Take a look at this image, first sitting near the screen, then step away a couple of meters.

http://cvcl.mit.edu/GroupFaceHybrid.jpg


Holy wow. That's freakin' awesome!

Open that image in a new tab, then hold down CTRL and scroll up/down to zoom in and out. When you zoom out to 25%, each of them switches from smiling to frowning or vice versa. The lady on the right is clearly smiling at 25% zoom. I had no idea that was possible.


I would love to know more about FT's, along with FFT's and how they help with for example signal processing or finding a signal when looking at a sample or multiple samples of a SDR.

Are there any good books/papers/web articles on this topic that are accessible? I often find myself reading papers where some of the math goes over my head.

Something with examples/code (code makes me understand math so much easier!) would be fantastic!


Richard Lyons - Understanding Digital Signal Processing

Focuses more on explaining the concepts behind the math than presenting a wall of theorems. Given that math is the language of DSP though, there's still a reasonable amount of math.

It assumes the reader has an EE or similar background, but I think it's still fairly approachable regardless. Given that my own background is in EE/embedded systems though, I'm not sure what my opinion counts for there.


I have taken classes in electrical engineering and embedded systems, so I am fairly familiar with them.

I will take a look at the book, thanks!


I'm sorry, but what's an SDR?


Software Defined Radio



The Shazam algorithm -- I don't want to be all cynical and dumpy because it's not like I remember exactly how it works either (it's proprietary, after all... and even the explanation I was given was not definitive) but one of my Music Information Retrieval professors once described his anecdotal knowledge of it. It was based on some features derived from FFT for sure but didn't seemed very concerned with note identification, if at all. There are a ton of features that can be post-processed from FFTs that can't be equated to "pitch". Beware misleading analogies... the frequency domain (& quefrency, etc. etc.) is a difficult space to conceptualize.

And when you get into machine learning, some of the operations performed by neural networks and the like don't really represent super linear, human-understandable transformations. It's important to understand feature extraction, but more important in the grand scheme of these things is to understand how to dig data that is useful and how it can be used.


There is a paper [1] describing the algorithms used by Shazam.

[1] http://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf


awesome I will give this a read for sure. Thanks!


If you would like a rigorous but equally enthusiastic and readable treatment of Fourier transforms, then you can't do better than the (free!) book The Scientist's and Engineer's Guide to Digital Signal Procesing: http://www.dspguide.com/


Why is the phase not even mentioned in the article?


This. Phase is the most important subtle point about the FT that everybody misses the first time around.

The Fourier Toy mentioned in the article: http://toxicdump.org/stuff/FourierToy.swf

Notice how you can click and drag to change the size and initial phase of the circle widgets. Try changing the initial phase of any of the larger components without changing the size and see what happens. It goes haywire!

The first FT toy that I wrote years ago also ignored phase. It took me forever to figure out why my reconstructed images looked like crap.

Turns out you can't just throw away half of your transform data (you get frequency and phase for each component you care about) without being fabulously clever.


When we first covered the FS (Fourier Series) and FT (Fourier Transform) and the relation between the two in our Signals and Systems course in EE, I was amazed. It was the greatest thing I've ever learned, and I think it'll be hard to top.

Once you understand the FT, you basically understand how a signal is structured. By converting (or transforming) a time signal to the frequency domain, one can clearly see what frequency components (or harmonics) contribute to said signal. If one were to try the same in the time domain, it would be much more difficult to visualize.


"One weird trick that made pure mathematicians hate him"


Coincidentally, I just had a talk with one of our principle developers about Fourier transforms. He's an audio expert and was trying to explain re-sampling and aliasing to me. I understand the high level steps, but the math is all a blur to me. Recently I've been trying to become much stronger in math, as I eventually want to study aerodynamics and astrophysics. So I've been studying calculus (textbook) and dynamics (edx) lately.


Concerning sampling and aliasing, you might find this video series from xipf interesting: http://www.xiph.org/video/


For those of you seeking an intuitive understanding of the Fourier transform, checkout http://betterexplained.com/articles/an-interactive-guide-to-...

For more details, check out Steven Smith's Digital Signal Processing. The entire book is available to read online, and has an excellent treatment of DSP algorithms.


This article was interesting but didn't really tell me anything I don't already know. Does anyone know where I can find a good article that actually explains the mathematics of performing a Fourier transformation? I thought that is what this article was going to be about.


I'm not sure what level you're looking for, but I found this really good: http://www.mikeash.com/pyblog/friday-qa-2012-10-26-fourier-t...


I found the book, "Who Is Fourier?: A Mathematical Adventure" to be an intuitive explanation of the Fourier Series and transform: http://amzn.com/0964350432


We learned about Fourier Series in my diffyQ course. The basic idea is that any periodic function( like say y=sinx) can be approximated by an infinite sum of sines and cosines.


I like how you make it sound so incredibly easy - especially thinking back to how much I struggled with the math behind this :-) (To be very clear: There's nothing wrong with explaining things in a simple way and leaving out the scary parts)

Great post.


Elegantly explained. Does anyone know of similar posts on other transforms like Z, etc?


It's not a short pithy blog post, but this is a good intro to the Laplace transform: http://ocw.mit.edu/courses/mathematics/18-03-differential-eq...



I'm going to kill the next person I who writes a Fourier Transform article and doesn't talk about the phase data. Its complex in complex out, if you input is real you still get complex out!!!


What you call a "squarish" looking wave is what I call an unfortunate failure of the Fourier transform, that it takes infinite bandwidth to represent a square wave.


What an amazing article.

It has tied together a bunch of seemingly separate ideas that I've often wondered about, and I feel measurably more intelligent having read it.


>The really high notes aren’t so important (our ears can barely hear them), so MP3s throw them out,

Quantization is not the same thing as "throwing out".


What an excellent article. I wish every teacher had the ability to be so clear and concise and most importantly interesting in their work.


Theres a nice approach to this through linear algebra and the discrete cosine transform (DCT), as just another base.


Another family for those interested in FTs are Discrete Cosign Transforms (DCTs). Also widely used.


If your into software and you don't know a out fourior transform, your not into software. this is something programmers without math will discover, along with Pythagorean theorem and basic trig. Otherwise, you are an over-hyped semantic duck-taper.


This needs to be upvoted rather than downvoted.


I disagree.

It's needlessly inflammatory and misspelt throughout. (Some people just have trouble spelling, and that's fair enough. But writing "a out" instead of "about" and not fixing it is just lazy and disrespectful to readers.

AsymetricCom would probably have got a different response had s/he written something like this instead:

"If you're really into software and want to be more than a semantic duck-taper, you need to know about Fourier transforms. Just like the Pythagorean theorem and basic trig, sooner or later you'll find you need it."

(Note 1. Although I have seen a whole lot of Fourier transforms in my time, I don't agree that you can't be truly "into software" without them. Note 2. It should probably be "duct tape" rather than "duck tape" but (a) the history is really complicated -- see [1] for some details -- and (b) I like the parallel with "duck typing"[2].)

[1] http://www.worldwidewords.org/qa/qa-duc4.htm

[2] If it walks like a duck and quacks like a duck, it is a duck. I think the term "duck typing" originated in the Python community, though Python's by no means the only language to have done a lot of things this way.




Applications are open for YC Winter 2020

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: