Hacker News new | comments | ask | show | jobs | submit login

Hi Hacker News - I'm the author of the piece, also on twitter @aatishb. Look forward to hearing your thoughts. I encourage you to share your thoughts and insights with other readers by leaving a comment on the post, particularly if you know of other interesting applications about the Fourier transform. Cheers!



OK, since you asked:

> The sound wave produced by a piano note is a simple sine wave.

No, it's not. A piano note is a complicated stack of overtones (some of which are harmonic and some of which aren't) and transients. If it was just a sine wave, it would sound like a sine wave and not like a piano.

This is part of why things like Shazam are so difficult: musical notes aren't just a single frequency in the FFT, they are a stack of them.

> You could just tell them a handful of numbers—the sizes of the different circles in the picture above.

This is actually the exact same number of numbers as in the time-domain series. Taking the FFT on its own doesn't reduce the amount of data, it's about discarding or compressing some of the frequencies after you do.

> The really high notes aren’t so important (our ears can barely hear them), so MP3s throw them out, resulting in added data compression.

This is only part of what MP3's do, and at high bitrates this really is inaudible. The main source of compression is that the precision of the numbers used to represent the amplitude of the sine waves is reduced.

When you have a loud sound and a quiet sound at the same time, the loud sound will "drown out" the quiet one (called "auditory masking"). You won't be able to hear the quiet one, or one be able to hear it precisely. MP3 and other audio codecs take advantage of that by encoding quieter frequencies with less fidelity when there are other louder frequencies at the same time. You don't notice the loss of precision since it's buried under louder sounds.

> Just as MP3s throw out the really high notes, JPEGs throw out the really tiny circles.

This is off too. If JPEG discarded high-frequency signals, you would just be blurring the entire image. It would be exactly like saving it scaled down and then scaling it back up with some smooth interpolation.

Obviously, JPEGs don't appear to be stretched out thumbnails, so that isn't what happens. Instead, it's not that high-frequency signals are discarded, it's that their precision is reduced.

Human eyes are quite good at detecting sharp edges and fine details. What they aren't good at is detecting how sharp an edge is. We can definitely see a break between two colors, but we can't accurately detect the magnitude of that the difference.

JPEG takes advantage of that by rounding off those high-frequency variances to nearby values. That means there are fewer possible values at high frequencies, so fewer bits are needed to encode them.

I realize I'm being a negative Nancy here. I really liked your post and I agree 100% on how awesome the Fourier transform is. It's also quite hard to describe it in an approachable way, and you've done an admirable job. I just get bugged when simplifications for a lay audience are actually off the mark.


>> The sound wave produced by a piano note is a simple sine wave.

> No, it's not. A piano note is a complicated stack of overtones (some of which are harmonic and some of which aren't) and transients. If it was just a sine wave, it would sound like a sine wave and not like a piano.

This is really such a fundamental* aspect of frequencies and sound, it should have been the starting point of the article. "Look how noisy and squiggly this wave is, but with FFT we can see that it really only consists of waves at multiples of the same frequency".

* no pun intended


This is actually really valuable feedback and one of the things I like about the HN audience. I struggle to strike a balance between precision and simplicity, and your comments raised many interesting subtleties that I had overlooked. So thanks for helping me get a better handle on this.


Auditory masking is more than that. The psychoacoustic model used for MP3s will discard some frequencies during others, like you said, but it will also discard "information" proceeding and following an acoustic event.

While not a perfect analog, it is similar to how vision works. If you are in a bright room and the lights go out, it takes you a little time to readjust to the darker room before you can see things again. While your irises were adjusting, the chair and take were still there, and light reflecting from those objects were still reaching your eyes, but you were unable to see them. It works the other way too. If you are still in that dark room and the lights come on, it will take your eyes a little time to readjust, less time then it took your eyes to adjust to the dark, but still your eyes are effectively discarding that information.

The psychoacoustic model in MP3s is even more bizarre. It turns out that some frequencies, when heard BEFORE another frequency, your brain will throw that first frequency away. It is unintuitive, but that has been proven in laboratory settings. Knowing how the majority of humans discard the same auditory information under different conditions, MP3s are compressed by throwing away the same information that the brain would normally discard. The Fourier transform is a significant part, but it isn't the whole story.


I love the FFT even more than you and enjoyed that it is getting lauded, but would have found more algorithmic details even more interesting. Breaking down DFT, etc. and then showing the performance magic of FFT is a great way to approach discussion of many issues in problem analysis and algorithm design.

So, Nice enough article for slipping into the topic - now give me more! harder! faster!


Sparse fast Fourier transform is even more "magical" than fast Fourier transform (FFT). If you assume that the discrete Fourier transform (DFT) has only k non zero coefficients, then, there exists an algorithm to compute it in O(k log(n)). That's right, you do not have to see the entire signal to compute the DFT, which is pretty awesome.

If you are interested, see http://groups.csail.mit.edu/netmit/sFFT/ .


I work in this field (though I'm none of the authors). If anybody has any questions about the sFFT, let me know.


> This is actually the exact same number of numbers as in the time-domain series.

In a way, it's twice as many, since they have both real and imaginary components.


Nope - the amount of numbers is in fact identical. The discrete Fourier transform has redundant information in bins above NFFT/2 (where NFFT is the size of the transform/number of samples in the signal) for purely real valued input. This means the "number of numbers" is the same - NFFT/2 bins of necessary complex values after transformation, versus NFFT real values in the original series.


Ah yeah, forgot about that. Thank you.


I liked your article, however, it is not so much the Fourier Transform itself that makes possible all these wonderful applications, but rather the Fast Fourier Transform, which you fail to mention even once.

The "naive" Fourier Transform has a pleasing mathematical simplicity and makes it very useful for deriving all sorts of theoretical properties for scientific analysis, but in practice it has a computational complexity of O(N^2), that would quickly make all those applications you name in your article utterly infeasible.

Instead, only with the discovery of the Fast Fourier Transform algorithm roughly halfway the 20th century[0], lowered the complexity to a very manageable O(N log N). And only then all these applications and properties of the Fourier Transform came within computational reach. Without the FFT, the Fourier Transform would mostly just be useful on paper for functional analysis and math proofs, things like that, but not in practice and we'd be missing out on all those MP3s, JPGs and speech recognition.

It's definitely not an overstatement when the WP article says: Fast Fourier transforms have been described as "the most important numerical algorithm[s] of our lifetime".

[0] or 1805, depending: https://en.wikipedia.org/wiki/Fast_fourier_transform "The basic ideas were popularized in 1965, but some FFTs had been previously known as early as 1805."


Awesome article! Here are two things that stuck out for me...

1.) In the first part of the article talking about decomposing a periodic signal into sinusoids of different frequencies - it would be more correct to refer to it as the Fourier _Series_, which is different from the transform.

2.) The image showing three sinusoids summing together into a non-period signal made me cringe (the sum is also periodic - that's the whole point of the Fourier Series).


1) ...which when applied to a sampled signal, is the discrete-time Fourier transform (DTFT)

2.) It will only be periodic if there exist some frequency for which the frequencies of the sinusoids are integer multiple.




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: