
Fourier Transforms – The Math Trick Behind MP3s, JPEGs, and Homer Simpson’s Face - aatish
http://nautil.us/blog/the-math-trick-behind-mp3s-jpegs-and-homer-simpsons-face
======
aliston
This is a great post, but it's a little bit misleading when talking about MP3s
and lossy compression and conflates analog fourier analysis with discrete
analysis.

When you're talking about a digital signal, it is the sample rate that
determines the maximum frequency you can represent. It's not MP3s that "throw
out the really high notes" -- it's any digital signal. A discrete Fourier
transform actually is lossless, but it is bandwidth limited.

The reason audiophiles prefer Flac to MP3s, for instance, is because MP3s do
more than just "throw out the high notes." Both are bandwidth limited, but
MP3s also throw out other information based on psychoacoustic principles.
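
A minimal sketch of both points in plain Python. The sample values and the 8 samples/s rate are made up for illustration, and the naive O(N^2) transform is there for clarity, not speed.

```python
import cmath
import math

def dft(x):
    """Naive discrete Fourier transform, O(N^2)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Inverse of dft()."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

# 1) The transform itself loses nothing: idft(dft(x)) gives x back
#    (up to floating-point rounding).
samples = [0.0, 1.0, 0.5, -0.25, -1.0, 0.75, 0.1, -0.6]   # arbitrary values
roundtrip = [v.real for v in idft(dft(samples))]
max_error = max(abs(a - b) for a, b in zip(samples, roundtrip))

# 2) The sample rate sets the ceiling: at 8 samples per second, a 7 Hz cosine
#    yields exactly the same samples as a 1 Hz cosine, so the 7 Hz tone
#    cannot be represented as such.
RATE = 8
tone_7hz = [math.cos(2 * math.pi * 7 * t / RATE) for t in range(RATE)]
tone_1hz = [math.cos(2 * math.pi * 1 * t / RATE) for t in range(RATE)]
alias_gap = max(abs(a - b) for a, b in zip(tone_7hz, tone_1hz))

print(max_error, alias_gap)   # both are at rounding-error level
```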

~~~
aatish
Thanks for your feedback. Sure, any digital signal is by definition finite in
its resolution (the sample rate or bits). I was trying to address the
distinction between wave files of the type stored on audio CDs, and MP3s -
both digital signals. I agree that the Fourier transform is in principle
lossless, but it's particularly useful to use it in a lossy way, i.e. to throw
out the least important (to us) components of the signal. If I was
particularly misleading about this, I'd like to know.

~~~
aliston
Well, if you want to be 100% accurate, I think the section talking about how
the high notes aren't important could be clarified. The really high "notes"
have already been lost when you recorded the wav file digitally. The lossy step
of mp3 encoding is not a result of the transform, but what you do with that
information and is more complex than just discarding high frequency
components.

Also, the word "note" is confusing in the context of music, since really
low notes usually contain a lot of high frequency information.

~~~
acjohnson55
Correct. (For the author's benefit) We usually call them _harmonics_, in the
context of pitched sounds, but more accurately, the sinusoidal components of
any sound are called its _partials_. Partials differ from harmonics in that
harmonics are restricted to be sinusoids with frequencies that are integer
multiples of a fundamental frequency. Real world musical notes don't often
exactly fit this paradigm [1].

In any case, if discarding high frequency information is all you needed to do
to compress, you could simply low-pass filter the time-domain signal. A better
description of what goes into MP3 compression is that it omits frequency
components in sound that we can't hear because they are shadowed by nearby (in
time and/or frequency) components that are louder.

[1]
[http://en.wikipedia.org/wiki/Piano_tuning#Stretch](http://en.wikipedia.org/wiki/Piano_tuning#Stretch)

------
0x09
There's a bit of a mischaracterization of the way 2 dimensional Fourier
transforms operate on images in this post. The 2D DFT (or DCT rather) doesn't
deal with anything as complex as tracing shapes on a 2D plane like seen in the
video. What it does is treat each 1 dimensional line of the image as a
signal/waveform, with the pixel intensity as amplitude. Fourier-family
transforms are separable, so the 2D (and ND) case is equivalent to the
transform of each line followed by a transform along the resulting columns. So
the actual representation that arises from this looks like so, with each image
representing the corresponding cosine component:
[http://0x09.net/img/dct32.png](http://0x09.net/img/dct32.png) (here grey is
0)

Visually most of the sinusoidal components here are zero or nearly so. However
if we scale them logarithmically, we'll see that it's actually not so:
[http://0x09.net/img/dct32log.png](http://0x09.net/img/dct32log.png)

What transform coders like JPEG do is reduce the precision of these
components, causing many of them to become zero. Which is good for the entropy
coder, and mostly imperceptible to us. Of course JPEG operates on 8x8 blocks
only * rather than a whole image like here.

It's hard to imagine this as an image, so here's a progressive sum starting
from the second term, which essentially demonstrates an inverse DCT:
[http://0x09.net/img/idct32.png](http://0x09.net/img/idct32.png)

Mind that 0 is adjusted to grey in this rendering, and the brightness of the
result is not an artifact of the transform.

It's easier to understand what goes on with these transforms if you can
visualize things in terms of the basis functions. Which in the case of a 32x32
image like above would be
[http://0x09.net/img/basis.png](http://0x09.net/img/basis.png) (warning: eye
strain).

All the examples above pertain to the DCT, partly because of JPEG and partly
so I could avoid getting phase involved, but the principles apply equally to
the other transforms in the family.

* although recent versions of libjpeg can use other sizes
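
To make the separability point concrete, here is a small sketch in plain Python. It uses an unnormalized DCT-II and a made-up 4x4 block (both are assumptions for illustration, not what any particular codec does) and checks that "rows first, then columns" matches the direct 2D double sum.

```python
import math

def dct_1d(v):
    """Unnormalized DCT-II of one line of samples."""
    n = len(v)
    return [sum(v[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
            for k in range(n)]

def dct_2d_separable(img):
    """Transform every row, then every column of the row-transformed result."""
    rows = [dct_1d(row) for row in img]
    cols = [dct_1d(list(col)) for col in zip(*rows)]   # one entry per column
    return [list(r) for r in zip(*cols)]               # transpose back

def dct_2d_direct(img):
    """The same transform written as a single 2D sum over all pixels."""
    h, w = len(img), len(img[0])
    return [[sum(img[y][x]
                 * math.cos(math.pi * (y + 0.5) * u / h)
                 * math.cos(math.pi * (x + 0.5) * v / w)
                 for y in range(h) for x in range(w))
             for v in range(w)]
            for u in range(h)]

image = [[52, 55, 61, 66],      # made-up pixel intensities
         [63, 59, 55, 90],
         [62, 59, 68, 113],
         [63, 58, 71, 122]]

separable = dct_2d_separable(image)
direct = dct_2d_direct(image)
max_diff = max(abs(a - b)
               for row_a, row_b in zip(separable, direct)
               for a, b in zip(row_a, row_b))
print(max_diff)   # rounding-error level: the two computations agree
```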

------
Dn_Ab
Ah such a shame. Tried to resist but I can't help pointing out you missed a
chance to explain how when Fourier discovered the expansion, it really was an
example of "one weird trick all the Mathematical Physicists will hate him
for".

 _It was during his time in Grenoble that Fourier did his important
mathematical work on the theory of heat. His work on the topic began around
1804 and by 1807 he had completed his important memoir On the Propagation of
Heat in Solid Bodies. The memoir was read to the Paris Institute on 21
December 1807 and a committee consisting of Lagrange, Laplace, Monge and
Lacroix was set up to report on the work. Now this memoir is very highly
regarded but at the time it caused controversy.

There were two reasons for the committee to feel unhappy with the work. The
first objection, made by Lagrange and Laplace in 1808, was to Fourier's
expansions of functions as trigonometrical series, what we now call Fourier
series. Further clarification by Fourier still failed to convince them. As is
pointed out in [4]:-

"All these are written with such exemplary clarity - from a logical as opposed
to calligraphic point of view - that their inability to persuade Laplace and
Lagrange ... provides a good index of the originality of Fourier's views"_

[http://www-history.mcs.st-and.ac.uk/Biographies/Fourier.html](http://www-history.mcs.st-and.ac.uk/Biographies/Fourier.html)

------
hobb0001
These rotating circles that are referenced by the article will forever remain
a cautionary tale for me:
[http://blog.matthen.com/post/42112703604/the-smooth-motion-o...](http://blog.matthen.com/post/42112703604/the-smooth-motion-of-rotating-circles-can-be-used)

In pre-Copernican astronomy, where the Earth was considered the center of the
cosmos, the erratic orbits of the other planets relative to us were explained
away as epicycles within perfect circles. Later, closer examination of the
orbits required modeling the orbits as epicycles within epicycles, exactly as
depicted in the rotating circles above. In the end, it turned out that there
aren't epicycles in the orbits, they were just creating a Fourier transform to
explain the orbital paths. We now know that you can create a Fourier transform
to generate _any_ arbitrary path.

I sometimes wonder how many of our modern physics models, such as the standard
model, differential geometry, and M-theory, are just highly sophisticated
versions of the Fourier transform.

------
quasque
ImageMagick has a nice tutorial on the use of Fourier transforms in image
processing, if you want to get a more intuitive feel on the subject:
[http://www.imagemagick.org/Usage/fourier/](http://www.imagemagick.org/Usage/fourier/)

~~~
wlian
But why do they use holocaust_tn.gif as an example filename?

~~~
Renaud
Holocaust Memorial:
[http://en.wikipedia.org/wiki/Memorial_to_the_Murdered_Jews_o...](http://en.wikipedia.org/wiki/Memorial_to_the_Murdered_Jews_of_Europe)

------
zenbowman
They really are a beautiful thing, it is unfortunate that most computer
science majors don't get at least a basic introduction to signal processing as
part of their standard curriculum. Especially because in my experience, the
best way to understand a Fourier transform is to implement it in a program,
feed in different signals, and wait for the light in your mind to go off.

Images and audio signals provide a particularly stunning insight.

~~~
dools
They don't? At University of Sydney every engineering degree does 2 years of
maths and that includes Fourier Series.

~~~
zenbowman
Nope, I only learned about it because my cousin went to the same school and
was doing his electrical engineering graduate degree at the time. He had a
book on signal processing lying around and that's where I was introduced to
it.

------
zwieback
It's interesting to note that Fourier wasn't trying to do any of the things that
the Fourier Transform is commonly used for today, namely signal processing. He
was trying to solve heat transfer equations when he came up with the Fourier
series. I'm not even sure if he was that interested in the Transform as such,
e.g. looking at a signal in frequency space and then efficiently applying
filters before transforming back to time domain.

My dad remembers his professor, sometime in the 40s, posing the question of
calculating when a worm buried in the ground would experience the same
temperature we'd experience at Christmas (Erdwuermchen's Weihnachten) and the
solution had to be calculated with Fourier's heat transfer equations.

~~~
dnautics
When it first came out most people ridiculed it for being a mere intellectual
curiosity.

I would characterize the fourier transform (especially the DFT) as the single
most important mathematical innovation that enables the interface between the
digital and analog world.

~~~
zwieback
I agree although there's a continuum from Fourier to Cooley-Tukey and so on so
it's hard to pinpoint which part is the most important.

Also, maybe it's more beloved by engineers (like myself) than mathematicians.

------
aatish
Hi Hacker News - I'm the author of the piece, also on twitter @aatishb. Look
forward to hearing your thoughts. I encourage you to share your thoughts and
insights with other readers by leaving a comment on the post, particularly if
you know of other interesting applications about the Fourier transform.
Cheers!

~~~
munificent
OK, since you asked:

> The sound wave produced by a piano note is a simple sine wave.

No, it's not. A piano note is a complicated stack of overtones (some of which
are harmonic and some of which aren't) and transients. If it was just a sine
wave, it would _sound_ like a sine wave and not like a piano.

This is part of why things like Shazam are so difficult: musical notes aren't
just a single frequency in the FFT, they are a stack of them.

> You could just tell them a handful of numbers—the sizes of the different
> circles in the picture above.

This is actually the exact same number of numbers as in the time-domain
series. Taking the FFT on its own doesn't reduce the amount of data, it's
about discarding or compressing some of the frequencies after you do.

> The really high notes aren’t so important (our ears can barely hear them),
> so MP3s throw them out, resulting in added data compression.

This is only part of what MP3s do, and at high bitrates this really is
inaudible. The main source of compression is that the precision of the numbers
used to represent the amplitude of the sine waves is reduced.

When you have a loud sound and a quiet sound at the same time, the loud sound
will "drown out" the quiet one (called "auditory masking"). You won't be able
to hear the quiet one, or won't be able to hear it precisely. MP3 and other
audio codecs take advantage of that by encoding quieter frequencies with less
fidelity when there are other louder frequencies at the same time. You don't
notice the loss of precision since it's buried under louder sounds.

> Just as MP3s throw out the really high notes, JPEGs throw out the really
> tiny circles.

This is off too. If JPEG _discarded_ high-frequency signals, you would just be
blurring the entire image. It would be _exactly_ like saving it scaled down
and then scaling it back up with some smooth interpolation.

Obviously, JPEGs don't appear to be stretched out thumbnails, so that isn't
what happens. Instead, it's not that high-frequency signals are _discarded_,
it's that their _precision_ is reduced.

Human eyes are quite good at detecting sharp edges and fine details. What they
aren't good at is detecting _how sharp_ an edge is. We can definitely see a
break between two colors, but we can't accurately detect the _magnitude_ of
that difference.

JPEG takes advantage of that by rounding off those high-frequency variances to
nearby values. That means there are fewer possible values at high frequencies,
so fewer bits are needed to encode them.

I realize I'm being a negative Nancy here. I really liked your post and I
agree 100% on how awesome the Fourier transform is. It's also quite hard to
describe it in an approachable way, and you've done an admirable job. I just
get bugged when simplifications for a lay audience are actually off the mark.
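
A rough sketch of "reduce precision instead of discarding", in plain Python. The pixel row and the step sizes are invented for illustration (real JPEG uses 8x8 blocks and its own quantization tables); the point is only that every frequency survives, just rounded more coarsely the higher it is.

```python
import math

def dct(v):
    """Unnormalized DCT-II."""
    n = len(v)
    return [sum(v[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
            for k in range(n)]

def idct(c):
    """Inverse of dct() above."""
    n = len(c)
    return [c[0] / n + (2 / n) * sum(c[k] * math.cos(math.pi * (i + 0.5) * k / n)
                                     for k in range(1, n))
            for i in range(n)]

row = [52, 55, 61, 66, 70, 61, 64, 73]    # made-up pixel intensities
steps = [4, 4, 8, 8, 16, 16, 32, 32]      # hypothetical steps, coarser at high frequency

coeffs = dct(row)                          # same count of numbers as the input
quantized = [round(c / s) for c, s in zip(coeffs, steps)]   # small integers, cheap to encode
restored = idct([q * s for q, s in zip(quantized, steps)])
worst_pixel_error = max(abs(a - b) for a, b in zip(row, restored))

print(quantized)
print(worst_pixel_error)
```

Note that `len(coeffs) == len(row)`: the transform alone saves nothing. The saving comes from `quantized` holding fewer distinct values than `coeffs`.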

~~~
pistle
I love the FFT even more than you and enjoyed that it is getting lauded, but
would have found more algorithmic details even more interesting. Breaking down
DFT, etc. and then showing the performance magic of FFT is a great way to
approach discussion of many issues in problem analysis and algorithm design.

So, nice enough article for slipping into the topic - now give me more!
harder! faster!
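
For anyone who wants the algorithmic side, a bare-bones sketch in plain Python: the textbook O(N^2) DFT next to a recursive radix-2 Cooley-Tukey FFT, which gets to O(N log N) by splitting the input into even and odd samples. The test signal is arbitrary, and this version assumes the length is a power of two.

```python
import cmath
import math

def dft(x):
    """Direct definition: N output bins, each a sum over N samples."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])            # transform of the even-indexed samples
    odd = fft(x[1::2])             # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle             # first half of the spectrum
        out[k + n // 2] = even[k] - twiddle    # second half reuses the same work
    return out

signal = [math.sin(0.3 * t) + 0.5 * math.cos(1.1 * t) for t in range(16)]
worst = max(abs(a - b) for a, b in zip(dft(signal), fft(signal)))
print(worst)   # rounding-error level: both give the same spectrum
```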

~~~
exg
Sparse fast Fourier transform is even more "magical" than fast Fourier
transform (FFT). If you assume that the discrete Fourier transform (DFT) has
only k non zero coefficients, then, there exists an algorithm to compute it in
O(k log(n)). That's right, you do not have to see the entire signal to compute
the DFT, which is pretty awesome.

If you are interested, see
[http://groups.csail.mit.edu/netmit/sFFT/](http://groups.csail.mit.edu/netmit/sFFT/)
.

~~~
susi22
I work in this field (though I'm not one of the authors). If anybody has any
questions about the sFFT, let me know.

------
thearn4
Another cool thing is that orthonormal bases are not unique - there are many
other basis functions that you can choose beyond just sine and cosine to
decompose a function (or digital signal). Though they are a natural choice if
you are specifically looking to analyze periodicities.

One direction to go in for further study:

[https://en.wikipedia.org/wiki/Wavelet](https://en.wikipedia.org/wiki/Wavelet)

~~~
GrantS
Yes! Same goes for pulling eigenvectors out of a data set e.g. for PCA
(Principal Components Analysis) -- those form an orthonormal basis. You can
even pick a set of random orthogonal vectors of the same number and dimension
as your original data and re-represent the data with no loss of information.

I had a lot of fun demonstrating this concept for reconstructing images via a
Processing sketch a few years ago and still use it for teaching from time to
time. All source code for quick and dirty Haar, Eigen, Random, and Fourier-
like methods included here:
[http://www.cc.gatech.edu/~phlosoft/transforms/](http://www.cc.gatech.edu/~phlosoft/transforms/)
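
A tiny sketch of the random-basis claim in plain Python (this is not the Processing code linked above; the data vector and seed are arbitrary). It builds a random orthonormal basis with Gram-Schmidt, re-expresses the data in it, and reconstructs the original.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def random_orthonormal_basis(dim, seed=0):
    """Gram-Schmidt on random Gaussian vectors."""
    rng = random.Random(seed)
    basis = []
    while len(basis) < dim:
        v = [rng.gauss(0, 1) for _ in range(dim)]
        for b in basis:                       # remove components along earlier vectors
            proj = dot(v, b)
            v = [vi - proj * bi for vi, bi in zip(v, b)]
        norm = dot(v, v) ** 0.5
        if norm > 1e-9:                       # skip the (unlikely) degenerate draw
            basis.append([vi / norm for vi in v])
    return basis

data = [3.0, -1.0, 4.0, 1.5, -5.0, 9.0]
basis = random_orthonormal_basis(len(data))

coeffs = [dot(data, b) for b in basis]        # the data expressed in the new basis
rebuilt = [sum(c * b[i] for c, b in zip(coeffs, basis)) for i in range(len(data))]
worst = max(abs(a - b) for a, b in zip(data, rebuilt))
print(worst)   # rounding-error level: nothing was lost
```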

------
Sheepshow
The newer JPEG algorithm JPEG2000 uses Wavelet Transforms which is kind of
similar to the Fourier. The Fourier applies a finite window then decomposes
into a sum of infinite waveforms. The Wavelet on the other hand applies no
windowing function, and directly decomposes the signal into a sum of _finite_
waveforms.

The Fourier has the disadvantage that you can't arrange the components into a
time hierarchy; that is, no component occurs "before" any other.

The Wavelet transform _does_ have a natural time hierarchy. This makes it much
better for streaming compression like voice calls.

The Fourier perfectly describes signals of infinite duration (think tone or
color) while the Wavelet perfectly describes the position of things within a
signal (think rhythm or space).

With the Fourier filtering is really easy. You can do hard, hard cutoffs --
literally no contributions within a certain frequency band -- just by removing
components of the decomposition. Similarly, you can accurately apply any
arbitrary mathematical filtering function.

The disadvantage of the Wavelet is that, well, the only meaningful
transformation you can apply to it is compression -- dropping the shorter
timescale components. If you want to filter, it's not enough to trim off
timescale components because the wavelet itself can contain any frequency
components. There's also nothing like a simple mathematical function you can
apply to the coefficients to get a smooth filter.

Neat!
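
A small sketch of the "finite waveforms" idea using the Haar wavelet in plain Python. Haar is chosen only because it is the simplest wavelet, and the averaging normalization here is one of several conventions. A spike at one position touches only a few coefficients, and dropping the finest-scale details gives a coarser version of the signal.

```python
def haar_forward(signal):
    """Full Haar decomposition; len(signal) must be a power of two.

    Returns [overall average, coarsest detail, ..., finest details].
    """
    coeffs = []
    current = list(signal)
    while len(current) > 1:
        averages = [(current[i] + current[i + 1]) / 2 for i in range(0, len(current), 2)]
        details = [(current[i] - current[i + 1]) / 2 for i in range(0, len(current), 2)]
        coeffs = details + coeffs      # finer details end up further right
        current = averages
    return current + coeffs

def haar_inverse(coeffs):
    """Rebuild the signal from the output of haar_forward()."""
    current = coeffs[:1]
    pos = 1
    while pos < len(coeffs):
        details = coeffs[pos:pos + len(current)]
        current = [v for a, d in zip(current, details) for v in (a + d, a - d)]
        pos += len(details)
    return current

spike = [0, 0, 0, 0, 0, 8, 0, 0]
coeffs = haar_forward(spike)
nonzero = sum(1 for c in coeffs if c != 0)    # only the coefficients covering index 5

# "Compression": zero the four finest-scale details and rebuild.
smoothed = haar_inverse(coeffs[:4] + [0.0] * 4)
print(coeffs, nonzero, smoothed)
```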

------
nilkn
> You could just tell them a handful of numbers—the sizes of the different
> circles in the picture above.

Maybe I'm crazy and just missing something, but this feels a little too good
to be true. This would put the set of smooth curves in 1-1 correspondence with
the set of finite sets (since each curve is being specified completely by a
finite set of numbers). But the set of finite sets is countably infinite since
it's a countable union (this may require the axiom of choice) and the set of
smooth curves is uncountably infinite, a contradiction.

(Disclaimer: I know nothing about Fourier analysis.)

~~~
jjoonathan
First of all, there are four kinds of "Fourier transforms" in common use, so
the cardinality problem that you have observed cannot really be discussed
until we pick one to talk about. For instance, the FT maps uncountable to
uncountable and the DFT maps finite to finite (and can be understood with
intro Linear Algebra as a change of basis). The things we are calling
countable and uncountable are actually dimensionalities of vector spaces,
since real multiples of a single function would produce an uncountable set.
Anyway, here are the types of transforms:

D=continuous domain, d=discrete domain, E=infinite extent, e=finite extent

time domain <-> frequency domain

Fourier Transform: DE<->DE

Fourier Series: De<->dE

Discrete Time Fourier Transform: dE<->De

Discrete Fourier Transform: de<->de

The cardinality mismatch shows up for FS and DTFT. It is resolved through a
different notion of equality than you may be used to. The "distance" between
two functions can be measured using several norms, and when this distance is
zero we claim two functions are equal. L1, L2, and Linf are the common ones.
Linf corresponds to pointwise equality, which is what you assumed (not
unreasonably) that they meant. L1 and L2 do not.

L1 norm: d(f,g)=integrate(abs(f(x)-g(x)),a,b)

L2 norm: d(f,g)^2=integrate(abs(f(x)-g(x))^2,a,b)

Linf norm: d(f,g)=max(abs(f(x)-g(x))) over the interval (a,b)

Fourier Series only guarantee reproduction (take the transform and then take
the inverse to get the reproduced curve) up to the L2 norm. This makes
physicists and engineers happy, since L2 norms correspond to energy
measurements, and if the difference between two physical quantities has no
energy then it's effectively irrelevant to physical processes.

The remaining hurdles for mathematicians are to show that

1) The reproduction coefficients found by the transform are the best possible
coefficients (the curve they generate is closer to the original than for any
other set of coefficients of the same size). This is a trivial proof using
inner products that usually goes by a name like "best approximation theorem"
or "generalized pythagorean theorem."

2) Reproducible curves are dense in the space of actual curves. That is, there
is always a reproducible curve that has zero L2 difference from an arbitrary
input curve. If this is true, then by #1 we know that the transforms will find
it. Unfortunately, this proof is quite involved. A "quick and dirty" method
uses the Stone-Weierstrass theorem (polynomials can reproduce continuous
functions with arbitrary L1 or L2 fidelity + the FS can reproduce polynomials
with arbitrary L2 fidelity). A better method arises in the context of Sturm-
Liouville theory that generalizes "density" to solutions of a large class of
differential equations. This is important for physicists and engineers because
it justifies normal mode expansions even when the normal modes aren't perfect
sinusoids.

If you want to know more, any linear algebra book that discusses inner
products should hit on #1. For #2 you want an Analysis textbook. Analysis is a
tough subject so it's far more important to be sure that the book starts at
your level and has an understandable proof of Stone-Weierstrass than it is to
be sure that it covers this exact topic. If it doesn't, supplement it with the
first few chapters from "Completeness and Basis Properties of Sets of Special
Functions" by Higgins and you'll be set.
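
A numerical illustration of the L2-versus-pointwise distinction, in plain Python. It uses the standard Fourier series of a square wave, (4/pi) * sum of sin(kx)/k over odd k, and estimates both distances on a finite grid, so treat the numbers as approximations. The choice of 5 and 51 harmonics is arbitrary.

```python
import math

def square_wave(x):
    return 1.0 if math.sin(x) >= 0 else -1.0

def partial_sum(x, max_harmonic):
    """Fourier series of the square wave, truncated at max_harmonic (odd terms only)."""
    return (4 / math.pi) * sum(math.sin(k * x) / k
                               for k in range(1, max_harmonic + 1, 2))

def errors(max_harmonic, points=2000):
    """Return (approximate L2 distance, largest pointwise gap) over (-pi, pi)."""
    dx = 2 * math.pi / points
    xs = [-math.pi + (i + 0.5) * dx for i in range(points)]   # midpoints skip the jumps
    gaps = [abs(square_wave(x) - partial_sum(x, max_harmonic)) for x in xs]
    l2 = math.sqrt(sum(g * g for g in gaps) * dx)
    return l2, max(gaps)

l2_few, sup_few = errors(5)
l2_many, sup_many = errors(51)
print(l2_few, l2_many)     # the L2 distance shrinks as harmonics are added
print(sup_few, sup_many)   # the worst pointwise gap near the jump does not
```

Adding harmonics squeezes the error into an ever narrower region around the discontinuity, which is why the energy (L2) of the difference goes to zero while the largest gap stays put.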

~~~
yummyfajitas
Some math geek nitpicks to a generally good post:

 _The things we are calling countable and uncountable are actually
dimensionalities of vector spaces..._

The dimensionality of L^2(R) is still countable. Proof: f(k,j,x) = e^{2 pi i k
x}, x in [j,j+1], k and j both integers forms a basis. So does H_n(x)
exp(-x^2/2), for H_n the Hermite polynomials.

The Fourier transform merely does not map L^2(R) -> l^2(Z) in this case - it
maps L^2(R) -> L^2(R).

 _That is, there is always a reproducible curve that has zero L2 difference
from an arbitrary input curve._

This isn't what density means. Density means that for any epsilon, there is a
reproducible curve with L2 distance < epsilon to that curve.

 _Linf norm: d(f,g)=max(abs(f(x)-g(x))) over the interval (a,b)_

By most standard definitions, this is only true almost everywhere. A typical
definition is ||f(x)||_{\infty}= Lim_{p -> infty} ||f(x)||_p, and this allows
two functions to differ on a set of measure zero.

As an example, consider f(x)=1 and g(x)={0 on rational numbers, 1 on
irrational numbers}. These two functions are equal in any L^p space, and are
hence equal in L^infty as well.

~~~
jjoonathan
>> The dimensionality of L^2(R) is still countable.

At this point I was implicitly talking about Linf(R). Does it still have
countable dimension? With the sup norm I'm pretty sure the answer is no, but
with Lp taken at p->inf maybe it does?

In any case, thanks for cleaning up the rough edges.

~~~
yummyfajitas
Don't know off the top of my head. I suspect it does have countable dimension
but I don't know how to prove it.

The space of continuous functions with the sup norm is actually a much
_smaller_ space than the space of L^\infty - the former is not even dense in
the latter.

I suspect actual functions on R with the max norm probably is uncountable, but
that's also a very weird space. The overwhelming majority of functions in
there are unmeasurable.

------
doctoboggan
If you are interested in audio fingerprinting using the FFT check out my
IPython notebook that explores this idea in more detail. I used a spectrogram
and image processing tools to identify what a given audio sample is. You
should be able to download the notebook and run all the examples.

[http://jack.minardi.org/software/computational-synesthesia/](http://jack.minardi.org/software/computational-synesthesia/)

------
sharpneli
There is also an extremely important property that would be worth an article
of its own. Namely the fact that pointwise multiplication of two Fourier
transformed functions corresponds to convolution of the functions themselves.

What does this mean in practice? Let's take a simple gaussian blur for images.
A single output pixel is formed by overlapping the gaussian kernel on top of
the image, multiplying them pointwise, then summing the result. Repeat for
every pixel. What you can also do is take the FFT of the gaussian kernel,
multiply it with the FFT of the image, and inverse transform, and you will get
the same result as actually calculating it for every point separately. Is this
faster than doing it point by point? Depends on the blur radius.

You can do awesome things with this blazingly fast. As an example a simple
water wave simulation can be made by simply taking a fourier transform,
multiplying it with the dispersion relation of the water waves and then doing
an inverse transformation.
[http://www.youtube.com/watch?v=MTUztfD2pg0](http://www.youtube.com/watch?v=MTUztfD2pg0)
Just like what is done here. Normally this convolution would take O(N^2)
amount of operations where N is amount of vertices but with FFT it's O(N log
N).

FFT is for convolution what quicksort is for sorting. Imagine how limited
you would be if all your sorts took O(N^2) time. The examples I gave are
quite limited in scope; going through all the applications of convolution would
take textbooks. It's probably one of the most important concepts in electrical
engineering.

Oh yeah, convolution is just like cross correlation except that in one case
one of the functions is reversed. So you can imagine the applications in data
mining etc.

All in all the Fourier Transform, and related ones, is an extremely huge and
massively important concept; it's hard to overstate its usefulness.

Personally I can say that I've used filter design tools to make a really
smooth accelerometer data processing function. It does not jump around like a
raw signal does, nor does it lag as much as standard exponential smoothing
does.
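
The convolution property can be checked in a few lines of plain Python. This sketch uses 1D circular convolution and made-up signal and kernel values. It also uses a naive O(N^2) DFT, so it shows that the two routes agree, not the speedup; for the O(N log N) behaviour you would swap in a real FFT.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def circular_convolve(a, b):
    """Point-by-point circular convolution: slide, multiply, sum."""
    n = len(a)
    return [sum(a[j] * b[(i - j) % n] for j in range(n)) for i in range(n)]

def convolve_via_dft(a, b):
    """Transform both, multiply bin by bin, transform back."""
    product = [x * y for x, y in zip(dft(a), dft(b))]
    return [v.real for v in idft(product)]

signal = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, 0.0, 2.0]
kernel = [0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.25]   # small symmetric blur, wrapped around

direct = circular_convolve(signal, kernel)
spectral = convolve_via_dft(signal, kernel)
worst = max(abs(a - b) for a, b in zip(direct, spectral))
print(worst)   # rounding-error level: both routes give the same blur
```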

~~~
eru
Do you have any write-up on the accelerometer data processing? I'd be
interested. Thanks!

------
neltnerb
I would add to your list of reasons why Fourier Transforms are awesome.

By complete coincidence, Bragg's law, used to do everything from X-Ray
Diffraction to particle scattering, just _happens_ to be a fourier transform.
Every time we bombard a tiny thing with light or radiation in order to
understand the structure, what we literally get out of it is emission dots
that correspond to the periodicity of the lattice -- literally the 2D fourier
transform of the scattering cross section. When I heard that in Quantum III,
it blew my mind. It's straight out of quantum scattering theory.

------
arb99
I find this kind of stuff fascinating.

Are there any decent books (kindle or proper books) with this kind of content?
I've got no background in Maths (other than some (UK) A-level maths at
school), but always love reading these sort of posts.

------
susi22
aatish, since you seem interested, I'll throw another fun-fact at you that I
found very interesting even after years of working with Fourier transforms:

You can view the Fourier transform as a fitting problem. Yes, you fit the data
to a function. I.e., you take the data points and fit them to a sum of
exponential functions. There is actually a much more general approach called
the "Prony method" that extends the concept and adds a damping factor to the
function to fit:

[http://www.engr.uconn.edu/~sas03013/docs/PronyAnalysis.pdf](http://www.engr.uconn.edu/~sas03013/docs/PronyAnalysis.pdf)

You can take it further and use matrix pencil methods and eventually you'll
see connections to the ESPRIT algorithm and even to least squares. It's
really interesting how they're all actually connected.

Cheers

------
sillysaurus2
EDIT: I mixed up high vs low frequencies, as the reply pointed out, so I've
edited this to be correct now.

The reason we can get away with throwing away high frequencies in JPEG is
because humans are prone to notice significant details rather than tiny
details.

High frequencies of a Fourier transform of an image == tiny detail (like being
able to distinguish individual hairs)

Low frequencies of a Fourier transform of an image == huge details (like
someone's face).

So you transform, set part of the result to zeroes, and compress. To display
it you uncompress, transform back, and display it. The zeroes manifest
themselves as an almost-imperceptible blur.
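
A 1D sketch of that transform / zero / transform-back pipeline in plain Python. It zeroes the high-frequency bins of a sharp edge; the signal and the cutoff are arbitrary, and actual JPEG quantizes DCT coefficients per 8x8 block instead of hard-zeroing DFT bins like this.

```python
import cmath
import math

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def low_pass(x, keep):
    """Zero every DFT bin whose frequency index exceeds `keep`, then invert."""
    n = len(x)
    X = dft(x)
    for k in range(n):
        if min(k, n - k) > keep:      # bins k and n-k hold the same frequency
            X[k] = 0
    return [v.real for v in idft(X)]

def largest_jump(x):
    """Biggest difference between neighbouring samples (wrapping around)."""
    return max(abs(x[i] - x[i - 1]) for i in range(len(x)))

edge = [0.0] * 8 + [10.0] * 8         # one row of an image with a hard edge
blurred = low_pass(edge, keep=2)

print(largest_jump(edge), largest_jump(blurred))   # the edge gets softer
print(sum(edge), sum(blurred))                     # overall brightness is unchanged
```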

~~~
eliteraspberrie
To illustrate, here is an example of zeroing the lowest frequencies:

[http://imgur.com/a/wVjYk](http://imgur.com/a/wVjYk)

~~~
sp332
Oh, so your brain can still mostly recreate the shape even with no low
frequencies!

~~~
tobr
Depends on how far away from the image you are, or how big it is. By taking
the low frequency from one image, and the high frequency from another, you can
get some interesting results. Take a look at this image, first sitting near
the screen, then step away a couple of meters.

[http://cvcl.mit.edu/GroupFaceHybrid.jpg](http://cvcl.mit.edu/GroupFaceHybrid.jpg)

~~~
sillysaurus2
Holy wow. That's freakin' awesome!

Open that image in a new tab, then hold down CTRL and scroll up/down to zoom
in and out. When you zoom out to 25%, each of them switches from smiling to
frowning or vice versa. The lady on the right is clearly smiling at 25% zoom.
I had no idea that was possible.

------
X-Istence
I would love to know more about FT's, along with FFT's and how they help with
for example signal processing or finding a signal when looking at a sample or
multiple samples of a SDR.

Are there any good books/papers/web articles on this topic that are
accessible? I often find myself reading papers where some of the math goes
over my head.

Something with examples/code (code makes me understand math so much easier!)
would be fantastic!

~~~
gentlefolk
Richard Lyons - Understanding Digital Signal Processing

Focuses more on explaining the concepts behind the math than presenting a wall
of theorems. Given that math is the language of DSP though, there's still a
reasonable amount of math.

It assumes the reader has an EE or similar background, but I think it's still
fairly approachable regardless. Given that my own background is in EE/embedded
systems though, I'm not sure what my opinion counts for there.

~~~
X-Istence
I have taken classes in electrical engineering and embedded systems, so I am
fairly familiar with them.

I will take a look at the book, thanks!

------
fat0wl
The Shazam algorithm -- I don't want to be all cynical and dumpy because it's
not like I remember exactly how it works either (it's proprietary, after
all... and even the explanation I was given was not definitive) but one of my
Music Information Retrieval professors once described his anecdotal knowledge
of it. It was based on some features derived from FFT for sure but didn't
seem very concerned with note identification, if at all. There are a ton of
features that can be post-processed from FFTs that can't be equated to
"pitch". Beware misleading analogies... the frequency domain (& quefrency,
etc. etc.) is a difficult space to conceptualize.

And when you get into machine learning, some of the operations performed by
neural networks and the like don't really represent super linear, human-
understandable transformations. It's important to understand feature
extraction, but more important in the grand scheme of these things is to
understand how to dig data that is useful and how it can be used.

~~~
danbruc
There is a paper [1] describing the algorithms used by Shazam.

[1]
[http://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf](http://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf)

~~~
fat0wl
awesome I will give this a read for sure. Thanks!

------
anigbrowl
If you would like a rigorous but equally enthusiastic and readable treatment
of Fourier transforms, then you can't do better than the (free!) book _The
Scientist and Engineer's Guide to Digital Signal Processing_:
[http://www.dspguide.com/](http://www.dspguide.com/)

------
xylem
Why is the phase not even mentioned in the article?

~~~
ohazi
This. Phase is the most important subtle point about the FT that _everybody_
misses the first time around.

The Fourier Toy mentioned in the article:
[http://toxicdump.org/stuff/FourierToy.swf](http://toxicdump.org/stuff/FourierToy.swf)

Notice how you can click and drag to change the size _and initial phase_ of
the circle widgets. Try changing the initial phase of any of the larger
components without changing the size and see what happens. It goes haywire!

The first FT toy that I wrote years ago also ignored phase. It took me
_forever_ to figure out why my reconstructed images looked like crap.

Turns out you can't just throw away _half of your transform data_ (you get
magnitude _and_ phase for each frequency component you care about) without
being fabulously clever.
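
To make that concrete, here is a minimal sketch (pure Python, naive O(n^2) DFT, with a made-up 8-sample step signal) that rebuilds the signal once from the full complex spectrum and once from magnitudes only. The helper names `dft` and `idft` are mine, not from any library.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) discrete Fourier transform of a sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    """Inverse of dft() above."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

# Hypothetical test signal: a single step from 0 to 1.
signal = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
spectrum = dft(signal)

# Full reconstruction keeps the magnitude and phase of every component.
full = [v.real for v in idft(spectrum)]

# Magnitude-only reconstruction forces every phase to zero.
magnitude_only = [v.real for v in idft([abs(c) for c in spectrum])]

full_error = max(abs(a - b) for a, b in zip(signal, full))
phase_free_error = max(abs(a - b) for a, b in zip(signal, magnitude_only))
print("error with phase:   ", full_error)
print("error without phase:", phase_free_error)
```

The first error is just floating-point noise; the second is on the order of the signal itself, which is the "looked like crap" effect.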

------
Cyph0n
When we first covered the FS (Fourier Series) and FT (Fourier Transform) and
the relation between the two in our Signals and Systems course in EE, I was
amazed. It was the greatest thing I've ever learned, and I think it'll be hard
to top.

Once you understand the FT, you basically understand how a signal is
structured. By converting (or transforming) a time signal to the frequency
domain, one can clearly see what frequency components (or harmonics)
contribute to said signal. If one were to try the same in the time domain, it
would be much more difficult to visualize.
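
A small sketch of that idea, assuming a made-up signal built from two sinusoids (3 and 10 cycles per window, amplitudes 1.0 and 0.5). The naive `dft` helper is my own; a real program would use an FFT library instead.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) discrete Fourier transform of a sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

n = 64
# Hypothetical time-domain signal: hard to read off the components by eye.
signal = [1.0 * math.sin(2 * math.pi * 3 * t / n)
          + 0.5 * math.sin(2 * math.pi * 10 * t / n)
          for t in range(n)]

spectrum = dft(signal)

# For a real sinusoid, 2*|X[k]|/n recovers its amplitude (bins below Nyquist).
amplitudes = [2 * abs(spectrum[k]) / n for k in range(n // 2)]

# Keep only the bins that carry significant energy.
peaks = [k for k, a in enumerate(amplitudes) if a > 0.1]
print(peaks)
```

In the frequency domain the two components show up as two isolated bins, with everything else essentially zero.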

------
new_test
"One weird trick that made pure mathematicians hate him"

------
kevin_rubyhouse
Coincidentally, I just had a talk with one of our principal developers about
Fourier transforms. He's an audio expert and was trying to explain re-sampling
and aliasing to me. I understand the high level steps, but the math is all a
blur to me. Recently I've been trying to become much stronger in math, as I
eventually want to study aerodynamics and astrophysics. So I've been studying
calculus (textbook) and dynamics (edx) lately.

~~~
hadronzoo
Concerning sampling and aliasing, you might find this video series from Xiph
interesting: [http://www.xiph.org/video/](http://www.xiph.org/video/)

------
mailshanx
For those of you seeking an intuitive understanding of the Fourier transform,
check out [http://betterexplained.com/articles/an-interactive-guide-
to-...](http://betterexplained.com/articles/an-interactive-guide-to-the-
fourier-transform/)

For more details, check out Steven Smith's Digital Signal Processing. The
entire book is available to read online, and has an excellent treatment of DSP
algorithms.

------
data-cat
This article was interesting but didn't really tell me anything I don't
already know. Does anyone know where I can find a good article that actually
explains the mathematics of performing a Fourier transformation? I thought
that is what this article was going to be about.

~~~
gmac
I'm not sure what level you're looking for, but I found this really good:
[http://www.mikeash.com/pyblog/friday-
qa-2012-10-26-fourier-t...](http://www.mikeash.com/pyblog/friday-
qa-2012-10-26-fourier-transforms-and-ffts.html)

------
itengelhardt
I like how you make it sound so incredibly easy - especially thinking back to
how much I struggled with the math behind this :-) (To be very clear: There's
nothing wrong with explaining things in a simple way and leaving out the scary
parts)

Great post.

------
arunc
Elegantly explained. Does anyone know of similar posts on other transforms
like Z, etc?

~~~
defen
It's not a short pithy blog post, but this is a good intro to the Laplace
transform: [http://ocw.mit.edu/courses/mathematics/18-03-differential-
eq...](http://ocw.mit.edu/courses/mathematics/18-03-differential-equations-
spring-2010/video-lectures/lecture-19-introduction-to-the-laplace-transform/)

------
TheMakeA
One of my favorites: [http://www.altdevblogaday.com/2011/05/17/understanding-
the-f...](http://www.altdevblogaday.com/2011/05/17/understanding-the-fourier-
transform/)

------
frozenport
I'm going to kill the next person who writes a Fourier Transform article and
doesn't talk about the phase data. It's complex in, complex out; if your input
is real you still get complex out!!!
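
A tiny illustration, assuming a made-up 4-sample real input and a naive DFT helper of my own: the output bins are complex, and for real input the second half of the spectrum is the complex conjugate of the first half.

```python
import cmath
import math

def dft(x):
    """Naive O(n^2) discrete Fourier transform of a sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

real_input = [1.0, 2.0, 0.0, -1.0]  # hypothetical real-valued samples
spectrum = dft(real_input)

for k, value in enumerate(spectrum):
    # Each bin carries a magnitude and a phase, not just a "strength".
    print(k, value, abs(value), cmath.phase(value))

# Real input gives a conjugate-symmetric spectrum: X[n-k] == conj(X[k]).
n = len(real_input)
symmetric = all(abs(spectrum[n - k] - spectrum[k].conjugate()) < 1e-9
                for k in range(1, n))
print(symmetric)
```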

------
cowsandmilk
What you call a "squarish" looking wave is what I call an unfortunate failure
of the Fourier transform, that it takes infinite bandwidth to represent a
square wave.
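
As a rough sketch of why: the partial Fourier sums of a +/-1 square wave keep overshooting near the jump no matter how many harmonics you add (the Gibbs phenomenon). The function names and grid size below are my own choices for illustration.

```python
import math

def square_partial_sum(x, terms):
    """Sum of the first `terms` odd harmonics of a +/-1 square wave."""
    return (4 / math.pi) * sum(
        math.sin((2 * k + 1) * x) / (2 * k + 1) for k in range(terms)
    )

def peak(terms, samples=5000):
    """Largest value of the partial sum on a grid over (0, pi/2]."""
    return max(
        square_partial_sum(math.pi / 2 * i / samples, terms)
        for i in range(1, samples + 1)
    )

# Compare the overshoot with 10 harmonics and with 100 harmonics.
peaks = {terms: peak(terms) for terms in (10, 100)}
for terms, value in peaks.items():
    print(terms, "harmonics -> peak", round(value, 3))
```

The ideal wave never exceeds 1, but both peaks stay well above it; adding harmonics only squeezes the overshoot closer to the edge.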

------
mpclark
What an amazing article.

It has tied together a bunch of seemingly separate ideas that I've often
wondered about, and I feel measurably more intelligent having read it.

------
mistercow
>The really high notes aren’t so important (our ears can barely hear them), so
MP3s throw them out,

Quantization is not the same thing as "throwing out".
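
A toy sketch of the difference, with made-up coefficients and step sizes (this is not how a real MP3 encoder picks them; real codecs derive the steps from a psychoacoustic model):

```python
# Hypothetical transform coefficients, ordered from low to high frequency.
coeffs = [12.7, -8.3, 5.1, 0.9, -2.6, 0.4]

# Hypothetical quantization steps: coarser for higher frequencies.
steps = [1, 1, 2, 2, 4, 4]

# Quantization: round each coefficient to the nearest multiple of its step.
quantized = [round(c / s) * s for c, s in zip(coeffs, steps)]

# "Throwing out": zero everything above a cutoff index.
truncated = [c if i < 4 else 0.0 for i, c in enumerate(coeffs)]

print(quantized)
print(truncated)
```

With quantization a strong high-frequency component survives, just stored less precisely; with truncation it is gone regardless of how strong it was.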

------
smrtinsert
What an excellent article. I wish every teacher had the ability to be so clear
and concise and most importantly interesting in their work.

------
revelation
There's a nice approach to this through linear algebra and the discrete cosine
transform (DCT), as just another basis.
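
A minimal sketch of that view, assuming the orthonormal DCT-II and a made-up 8-sample vector: the rows of the matrix are an orthonormal basis, the transform is a set of dot products, and the inverse is just the transpose. The helper names are mine.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II basis: row k is the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (t + 0.5) * k / n)
                     for t in range(n)])
    return rows

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

n = 8
basis = dct_matrix(n)
x = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]  # hypothetical samples

# Forward transform: coordinates of x in the cosine basis.
coeffs = [dot(row, x) for row in basis]

# Inverse transform: because the basis is orthonormal, use the transpose.
restored = [sum(coeffs[k] * basis[k][t] for k in range(n)) for t in range(n)]

# Largest deviation of the Gram matrix from the identity.
gram_error = max(abs(dot(basis[i], basis[j]) - (1.0 if i == j else 0.0))
                 for i in range(n) for j in range(n))
print(gram_error, max(abs(a - b) for a, b in zip(x, restored)))
```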

------
xarien
Another family for those interested in FTs is the Discrete Cosine Transforms
(DCTs). Also widely used.

------
AsymetricCom
If your into software and you don't know a out fourior transform, your not
into software. this is something programmers without math will discover, along
with Pythagorean theorem and basic trig. Otherwise, you are an over-hyped
semantic duck-taper.

~~~
anigbrowl
This needs to be upvoted rather than downvoted.

~~~
gjm11
I disagree.

It's needlessly inflammatory and misspelt throughout. (Some people just have
trouble spelling, and that's fair enough. But writing "a out" instead of
"about" and not fixing it is just lazy and disrespectful to readers.)

AsymetricCom would probably have got a different response had s/he written
something like this instead:

"If you're really into software and want to be more than a semantic duck-
taper, you need to know about Fourier transforms. Just like the Pythagorean
theorem and basic trig, sooner or later you'll find you need it."

(Note 1. Although I have seen a whole lot of Fourier transforms in my time, I
don't agree that you can't be truly "into software" without them. Note 2. It
should probably be "duct tape" rather than "duck tape" but (a) the history is
really complicated -- see [1] for some details -- and (b) I like the parallel
with "duck typing"[2].)

[1] [http://www.worldwidewords.org/qa/qa-
duc4.htm](http://www.worldwidewords.org/qa/qa-duc4.htm)

[2] If it walks like a duck and quacks like a duck, it is a duck. I think the
term "duck typing" originated in the Python community, though Python's by no
means the only language to have done a lot of things this way.

