
Digital Music Couldn't Exist Without the Fourier Transform - xoher
http://gizmodo.com/digital-music-couldnt-exist-without-the-fourier-transfo-1699155287?utm_campaign=socialflow_gizmodo_facebook&utm_source=gizmodo_facebook&utm_medium=socialflow
======
fl0wenol
A key insight for someone interested in this sort of thing, not touched on in
the article, is the relationship between the Fourier transform and the
discrete cosine transform.

JPEG uses the DCT in particular because it has the nice property that the "top
left" corner of the block contains the DC offset (since cosine of 0 is 1), and
the coefficients near that corner correspond to half-wave and full cycles,
which get you most of the way to simple gradients of color across the block
with the right coefficients. So for most areas of an image only the top-left
coefficients will be significant. By using a zig-zag pattern for each block we
group the largest values at the front and the zeroes at the back, which, when
coupled with RLE, makes the runs of zeroes in each block a very compact,
further-compressible representation.
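
To make the zig-zag/RLE point concrete, here's a minimal sketch (Python with
NumPy/SciPy; the smooth 8x8 block is made up, and the scan below is a
simplified stand-in for JPEG's exact zig-zag order):

    import numpy as np
    from scipy.fft import dctn

    # Made-up 8x8 block: a smooth horizontal gradient, typical of a flat image region.
    block = np.tile(np.linspace(100, 120, 8), (8, 1))

    # 2D DCT-II, as in JPEG (orthonormal scaling just for readability).
    coeffs = dctn(block, type=2, norm='ortho')

    # Simplified zig-zag: visit coefficients by anti-diagonal, low frequencies first.
    order = sorted(((i, j) for i in range(8) for j in range(8)),
                   key=lambda ij: (ij[0] + ij[1], ij[0]))
    scanned = np.array([coeffs[i, j] for i, j in order])

    # Nearly all the energy sits in the first few entries; the long tail of
    # (near-)zeros is what RLE then squeezes down.
    print(np.round(scanned, 1))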

Meanwhile, a Fourier transform gives you complex coefficients for each
frequency, whose phase corresponds to the shift at which that frequency
matches most strongly (as opposed to being aligned at the corner/beginning of
the integral window). That's not useful in an image format, where you won't
get the transformed magnitudes all nicely grouped for you. It is useful in
audio compression, where we care about finding the location of transients that
correspond to note attacks, percussion strikes, etc. Note that even in MP3
this is only used to drive the psychoacoustic model that decides the frame
type and where to allocate the bits; the audio data itself is taken out of the
time domain by an overlapped DCT, just like in Ogg Vorbis.
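
A tiny NumPy sketch of the magnitude-and-phase point (the tone, its phase
offset, and the sample rate are all made up for illustration):

    import numpy as np

    fs = 1000                          # samples per second, for illustration
    t = np.arange(fs) / fs
    # A 50 Hz tone shifted by a quarter cycle relative to the start of the window.
    x = np.sin(2 * np.pi * 50 * t + np.pi / 2)

    X = np.fft.rfft(x)                 # complex coefficients
    k = 50                             # bin index == frequency in Hz for a 1-second window
    print(abs(X[k]))                   # magnitude: how strongly 50 Hz is present
    print(np.angle(X[k]))              # phase: where in the window that component lines up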

~~~
sinwave
Thought I'd chime in that for image compression (e.g. in the JPEG 2000
standard), the 2D discrete wavelet transform takes advantage of similar pixel
intensities for neighboring pixels at various scales (i.e. "transformed
magnitudes all nice and grouped for you"). The 2D DWT is actually pretty cool
under the hood. And, asymptotically, a bit faster than the FFT (the DWT runs
in O(N), and in 2D, O(width*height)).
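
For the curious, a minimal sketch with PyWavelets (the image is a made-up
gradient and the Haar wavelet is chosen for simplicity; JPEG 2000 itself uses
the CDF 5/3 and 9/7 wavelets):

    import numpy as np
    import pywt

    # Made-up smooth image: neighboring pixels have similar intensities.
    img = np.outer(np.linspace(0, 1, 256), np.linspace(0, 1, 256))

    # One level of the 2D discrete wavelet transform.
    approx, (horiz, vert, diag) = pywt.dwt2(img, 'haar')

    # For smooth regions the detail subbands are close to zero, so the energy
    # is grouped into the approximation band -- exactly the grouping a coder wants.
    print(np.abs(horiz).max(), np.abs(vert).max(), np.abs(diag).max())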

------
kazinator
Digital music could absolutely exist without the Fourier Transform.

The author should familiarize himself with the history of digital audio and
its milestones, such as the development of the compact disc format: long
before MP3, Ogg, the popularization of the Internet, and its use for media
streaming.

At its bare bones, digital audio requires time-domain sampling and
reconstruction, sandwiched between some filters that can be analog. It
requires understanding of the Nyquist limit, which can be stated purely in
time-domain terms (sampling frequently enough to avoid a temporal aliasing
ambiguity in the reconstruction).
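
A small, purely time-domain illustration of that ambiguity (NumPy; the rates
are made up):

    import numpy as np

    fs = 1000                                    # sample rate in Hz, for illustration
    t = np.arange(100) / fs

    # 400 Hz is below fs/2 = 500 Hz: captured unambiguously.
    below = np.sin(2 * np.pi * 400 * t)

    # 700 Hz is above fs/2: its samples coincide with those of a (sign-flipped)
    # 300 Hz tone, so reconstruction can't tell them apart -- that's aliasing.
    above = np.sin(2 * np.pi * 700 * t)
    folded = -np.sin(2 * np.pi * 300 * t)
    print(np.allclose(above, folded))            # True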

Digital _synthesis_ of music can also be as simple as playing recorded samples
in loops, and scaling them in the time domain for various pitches (or changing
the sample rate, or both), which doesn't require Fourier.
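
A sketch of that kind of sample-playback synthesis (NumPy; the "recorded note"
is just a generated 220 Hz tone, and repitch() is a made-up helper):

    import numpy as np

    fs = 44100
    t = np.arange(0, 0.5, 1 / fs)
    note = np.sin(2 * np.pi * 220 * t)           # stand-in for a recorded A3 sample

    def repitch(x, ratio):
        """Read the sample at a different rate via linear interpolation.
        ratio=2.0 plays it back an octave higher (and half as long)."""
        positions = np.arange(0, len(x) - 1, ratio)
        return np.interp(positions, np.arange(len(x)), x)

    octave_up = repitch(note, 2.0)               # ~440 Hz at the same playback rate
    looped = np.tile(note, 4)                    # crude loop to sustain the note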

~~~
agoetz
Claiming that the FT is not necessary for digital audio is like claiming that
you don't need the rocket equation in order to build a missile. Sure, it's
technically possible, and yes, there were probably early pioneers who forged
ahead before the mathematical theory was fully sketched out, but our
understanding of the Fourier transform has drastically increased our ability
to design acoustical systems.

Those analog anti-aliasing and anti-imaging filters are designed using LTI
systems theory, which fundamentally relies on the Fourier transform to reason
about their transfer functions. The Nyquist-Shannon sampling theorem was
proven using the Fourier transform. Without the Fourier transform, you have to
rely entirely on time-domain representations of signals and perform your
analysis using tedious convolutions. You can't use a spectrum analyzer to
examine the signal-to-noise ratio of your CD player. While it's true that
digital music could technically exist without Fourier, there is no way in hell
it would be as pervasive as it is today.
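
For a sense of what "reasoning about transfer functions" looks like in
practice, here's a sketch with SciPy. It's a digital stand-in for the analog
case, and the filter order, corner frequency, and probe frequencies are all
made up:

    import numpy as np
    from scipy import signal

    fs = 44100
    # Toy anti-aliasing lowpass: 8th-order Butterworth with its corner below fs/2.
    sos = signal.butter(8, 20000, btype='low', output='sos', fs=fs)

    # The design is judged entirely in the frequency domain: look at |H(f)|.
    w, h = signal.sosfreqz(sos, worN=4096, fs=fs)
    passband = np.argmin(np.abs(w - 15000))
    stopband = np.argmin(np.abs(w - 21500))
    print(f"gain at 15 kHz:   {20 * np.log10(abs(h[passband])):.1f} dB")
    print(f"gain at 21.5 kHz: {20 * np.log10(abs(h[stopband])):.1f} dB")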

~~~
acqq
I don't agree with you. Read about ADPCM. Try it. Listen to the results.
Apparently ffmpeg has codecs for it.

[http://ffmpeg.org/general.html#Audio-Codecs](http://ffmpeg.org/general.html#Audio-Codecs)

~~~
tacos
The Windows 95 startup sound, created by Brian Eno, was coded in ADPCM. Seemed to
work okay. A billion people heard it. (8 bit MS ADPCM also sounded horrible.)

~~~
acqq
You surely mean 8 bit PCM (without "AD") sounded horrible? ADPCM encodes
differences in just 4 bits but the decoded values are in the range of 16 bits.
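
To illustrate the idea, here's a toy adaptive-delta round trip in Python. It
is not Microsoft's actual ADPCM (which uses a table-driven step size and a
linear predictor); the function and its constants are made up for
illustration:

    import numpy as np

    def toy_adpcm_roundtrip(samples):
        """Encode 16-bit samples as 4-bit difference codes and decode them again."""
        step, predicted, decoded = 16, 0, []
        for s in samples:
            diff = int(s) - predicted
            code = min(abs(diff) // step, 7)      # 3 magnitude bits
            if diff < 0:
                code = -code                      # plus a sign: 4 bits total
            predicted = max(-32768, min(32767, predicted + code * step))
            # Adapt the step: grow after big codes, shrink after small ones.
            if abs(code) >= 6:
                step = min(step * 2, 4096)
            elif abs(code) <= 1:
                step = max(step // 2, 1)
            decoded.append(predicted)
        return np.array(decoded, dtype=np.int16)

    # 16-bit sine; the 4-bit codes still track it because the step adapts.
    x = (8000 * np.sin(2 * np.pi * 440 * np.arange(2000) / 44100)).astype(np.int16)
    y = toy_adpcm_roundtrip(x)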

~~~
tacos
There were 8-bit "ADPCM" (er, ADPCM-ish?) variants, too! They used 2, 3 or 4
bits for encoding but were deltas against an 8 bit target.

~~~
acqq
Not used by MS, as far as I know (you specifically wrote originally "8 bit MS
ADPCM also sounded horrible").

Microsoft's ADPCM was always a 4-bit encoding of the 16-bit sample. The
distorted sound in some files was due to reducing the sampling rate.

[https://support.microsoft.com/en-us/kb/89879](https://support.microsoft.com/en-us/kb/89879)

"ADPCM stores the value differences between two adjacent PCM samples and makes
some assumptions that allow data reduction. Because of these assumptions, low
frequencies are properly reproduced, but any high frequencies tend to get
distorted. The distortion is easily audible in 11 kHz ADPCM files, but becomes
more difficult to discern with higher sampling rates, and is virtually
impossible to recognize with 44 kHz ADPCM files."

I've already linked this article and it has even more details, highly
recommended.

~~~
tacos
I wrote some of those original codecs. I'm aware of what they do. :) The
original SoundBlaster card was 8-bit. Creative ADPCM is 8 bit. Dialogic ADPCM
-- basically every recorded sound you've ever heard over a telephone -- is 12
bit. You are correct with the modern definition but I'm talking about 20 years
ago, so let's not stomp on history for the sake of Hacker News karma points.

The Microsoft article gets a few things wrong. The distorted sound is not due
to reducing the sample rate. The distorted sound comes from taking a
perfectly-good 11k file and then ADPCM compressing it. This is obviously due
to throwing away information on each sample as part of the encoding process,
not anything due to sample rate. (Of course it sounds better at higher sample
rates. More data, more better.)

ADPCM for telephony seldom even hit 11k rates. 6000 and 8000Hz ADPCM files are
common. (And nope, not 16 bit either.)

~~~
acqq
I fully agree with you re 8-bit SoundBlasters and phones. I was talking about
the music recorded for CDs, at 16 bits. Converting that to ADPCM was certainly
not guaranteed to automatically give good results, but it was at least
possible to produce reasonably good sound and save some space.

I'd of course be happy to hear more about the work you did.

------
tacos
DCT, FFT, close enough I guess. No mention of Shannon or Nyquist?

Ugh, I see a trend starting here:

"This is the first in a new experimental series called Favored Equation. Each
month, we’ll dive into a piece of math which makes your life easier in some
way without you even realizing."

This is a spin on
[http://objectsobjectsobjects.com/](http://objectsobjectsobjects.com/) by The
Atlantic:

"Object lessons: An ongoing series about the hidden lives of ordinary things."

I'm a fan of this type of writing. But when Sagan and Feynman did it -- hell
when pornographers did it with OMNI Magazine -- it wasn't quite so rough
around the edges.

I'm now a month into arguing with some ex-Gawker hack at _The Atlantic_ over
quotes like "New effects can change a guitarist’s playing ability completely"
and a declaration that the transistor was invented in the 1960s. No
corrections or retractions imminent.

No interest in battling the newer, younger, even-less-experienced Gawker
editor either.

~~~
disantlor
Tangent but I disagree about one point:

Effects can and do change the guitarist's playing ability. Distortion or
compression, for example, generally lowers the threshold for making a note
sound clearly (or at least clear enough). The result is a smoother-sounding
performance, which can give the player more confidence, which then actually
results in a better performance. It hardly matters how you get to the end
waveforms if they do indeed sound good.

But more than interpreting "ability" to mean a degree of technical skill,
certain effects can completely change the way a good guitar player approaches
the instrument creatively, particularly if they're listening closely and
reacting (not playing from muscle memory).

~~~
tacos
Rephrase it as "colored pencils can change a writer's writing ability
completely" and you can see the logic error.

"Put a little reverb on it" is a good way to comfort a singer or a musician
and perhaps coax a better performance out of them. But the effect itself does
not change the skill level of the performer.

If the article were discussing production techniques I'd agree with you. But
it says things like "Guitar effects have modified their users" and gives a
comical explanation of how a rotating speaker works so I think the author is
just nuts.

~~~
tlb
Rotating speakers don't require less player skill, but distortion does.

Guitar distortion reduces the need to mute adjacent strings, a very difficult
thing to master, because only the strongest tone comes through.

Or, given equal skill, distortion meant you could jump around, play writhing
on your back with your tongue, play with your guitar on fire, which you
absolutely cannot do when playing without distortion.

~~~
tacos
Distortion adds harmonic content. It is not a filter that lets "only the
strongest tone come through." If there's a 600Hz thump as you move your hand
between strings, now you've got that same 600Hz thump plus a 1200Hz and 1800Hz
thump too. That's the definition of harmonic distortion. And those 1200/1800Hz
components are approaching where the ear is most sensitive. So you've made it
worse.

A previous poster said that compression helps. Well, no. Compression reduces
dynamic range. If there's a soft squeak between loud notes, a compressor makes
the soft squeak louder and the loud note quieter. That's the definition of a
compressor.
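
Both claims are easy to check numerically. A toy NumPy sketch (the waveshaper
and the static compress() helper below are made-up stand-ins, not any
particular pedal or plugin):

    import numpy as np

    fs = 48000
    t = np.arange(fs) / fs
    thump = 0.2 * np.sin(2 * np.pi * 600 * t)     # the stray 600 Hz noise

    # Distortion: an asymmetric waveshaper (toy overdrive). The nonlinearity
    # puts energy at 1200 and 1800 Hz that was not in the input at all.
    driven = np.tanh(8 * thump + 0.3)
    driven -= driven.mean()                       # drop the DC offset the asymmetry adds
    spectrum = np.abs(np.fft.rfft(driven))        # 1-second window: bin index == Hz
    print({f: round(float(spectrum[f]), 1) for f in (600, 1200, 1800)})

    # Compression: a crude static compressor. Above the threshold, level growth
    # is divided by the ratio, so the gap between loud note and soft squeak shrinks.
    def compress(levels, threshold=0.1, ratio=4.0):
        levels = np.asarray(levels, dtype=float)
        over = np.maximum(levels - threshold, 0.0)
        return np.minimum(levels, threshold) + over / ratio

    print(compress([0.8, 0.05]))                  # 16:1 gap in, 5.5:1 gap out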

And none of this changes the fundamental skill of the performer, in the same
way that an Instagram filter does not change the fundamental beauty of the
object. Sure, presentation is important and changing your resume font might
even get you a better coding job. But changing the font doesn't make you a
better coder.

~~~
disantlor
IMO, the only fundamental skill that is relevant to music is the ability to
express a particular feeling, and effects can "completely change" one's
ability to do that.

The only way you could quantify the type of skill you seem to be referring to
is to entirely remove any degree of expression or improvisation (or effects)
and boil it down to the raw performance data. You may then succeed in
determining who is objectively a "better" musician but you've lost all the
aspects that make a Beatles song based on a I IV V chord progression sound
different than one by the Velvet Underground, or anyone else.

EDIT: also, yes, to some degree that is how harmonic distortion works, though
the particular harmonics and their amplitudes vary widely, and there is often
some filtering added to rein in those harmonics in a particular way.
Distortion also effectively compresses the signal. Sometimes
it starts oscillating and generating notes that aren't even being played. The
point is that if you're playing _with_ the distortion (not just laying it on
top) then it is changing how you play.

~~~
tacos
You're basically arguing "it's the gear" whereas any musician over a certain
age and skill level will tell you "hell no it isn't."

Instagram filters don't make people better photographers; AutoTune doesn't
make people better singers. Hell, the Abbey Road mic collection didn't make
John and Paul better singers.

By your logic you can flip anything around and make the author's argument:
Shitty British industrial towns modified their users. Beer and the tiny stage
at CBGB modified their users. Cheap Seattle heroin and terrible weather
modified their users.

Is any artistic pursuit highly dependent on the mood of the performer and her
implicit capacity to make you _feel_? Of course.

But the author did not say "playing a guitar through a stomp box changes the
way you play it" or "the Edge's signature delay effect created a new sound
that nobody had heard -- or felt -- before." He said "guitar effects have
modified their users" and "new effects can change a guitarist's playing
ability completely."

If you read the entire article you'll see the author suffers from a bad case
of "word salad" and these are not meant as debatable nuances. He's suffering
from dysgraphia, ignorance of the subject matter as a whole, and a really bad
editor.

Which is why he defines "clipping" with a phrase lifted from the Wikipedia
article for digital clipping, applies it to slicing a speaker with a razor
blade, then claims rotating motors were picked up by speakers in a Leslie
after being picked up by "coiled magnets" in a guitar pickup.

------
cnvogel
...they could at least have used a proper example. Their "squiggly sine lines"
don't make sense at all:

"""But add them together, and that pleasant sounding chord actually looks
altogether more messy, like this:"""

No, it doesn't look like this, at all.

------
jrlocke
Or almost all forms of recorded audio for that matter. It is basically
miraculous that a single speaker can create the impression of an orchestra; it
obviously relies on an implicit instantiation of this same principle.

~~~
kazinator
It's miraculous that pressure variations on your ear drum reproduce the
impression of the orchestra; the speaker just builds on that.

~~~
jrlocke
From that point of view a speaker makes perfect sense, and that insight points
to an even deeper importance of implicit versions of this transform.

~~~
tacos
I don't know what that means. But "Two ears = two speakers" must really blow
your mind.

FFT is closer to what's going on with Cochlear Implants. Which both suck for
some of the same reasons.

------
dwarman
certainly could, and did. What enabled digitization and reproduction of
digital audio was the work of Nyquist and Shannon, which showed how it could
be done. The FFT is an elaboration useful for filtering and spectrum
contouring, and for compression. But "Digital Audio" is not a synonym for
"Compressed Digital Audio". And digitization had to come first before an FFT
filter could be applied to the digitized result. The FFT is _not_ used to
implement the digitization itself, and is useless without a digitized sample
stream to work on.

------
adaml_623
What about Wavelet transforms:
[http://en.wikipedia.org/wiki/Wavelet_transform](http://en.wikipedia.org/wiki/Wavelet_transform)
?

I think it's just a totally different approach to compression that doesn't use
Fourier.

~~~
fl0wenol
It's misleading to say that they don't use Fourier.

The theory around any family of functions useful for compression/feature
detection (like wavelets) is going to have the property that they define a
Hilbert basis. And the idea of how to conceive of such families of functions,
and of their potential, came from generalizing the specific cases of the
Fourier and Laplace transforms. Moreover, wavelets have properties/tradeoffs
defined in terms of time and frequency, which are couched in terms of, and
based on, theories derived from this earlier complex analysis.

------
trhway
A complete space with an orthogonal basis ... without it, digital music would
be the least of our worries (well, if "we" ever existed).

