
Digital audio signals and how they really behave [video] - nullc
http://www.xiph.org/video/vid2.shtml
======
metajack
Monty and the Xiph.org team have been slaving away at this for quite some
time. They put an amazing amount of work into it, and it really shows.

You might be interested to know that there were about 50k lines of code[1]
written for the demonstrations, all of which is available.

Also, don't miss the original video[2], or the article on HD audio[3].

[1]
[https://wiki.xiph.org/Videos/Digital_Show_and_Tell#Use_The_S...](https://wiki.xiph.org/Videos/Digital_Show_and_Tell#Use_The_Source_Luke)

[2] <http://xiph.org/video/vid1.shtml>

[3] <http://people.xiph.org/~xiphmont/demo/neil-young.html>

~~~
keeperofdakeys
The last video must have been more than a year ago now; I've been waiting for
more in this series to appear ever since.

------
Derbasti
Totally awesome presentation! I've never seen anyone describe this stuff this
cleanly (even though I studied it in college).

Plus, I love the audio work on that video. The positioning of his voice to
tell you what he is doing while you don't see him is pure genius. Also, note
how you can hear he is facing away from the microphone even if you can't see
him. Great stuff. I wish more videos would use effects like these.

~~~
cjbprime
> The positioning of his voice to tell you what he is doing while you don't
> see him is pure genius.

I'm told that this is pure accident based on using a stereo mic and having the
input-side equipment be on the left and the output on the right, but I thought
it was a nice touch too. :)

~~~
xiphmont
It was not exactly an accident, though the image was a bit 'wider' than I'd
have preferred. Tradeoffs.

I don't like wearing lavalier mics for this sort of thing (an extra layer of
complexity, and they have to be headsets to avoid vocal amplitude changing)
and the stereo image reduces the apparent echoeyness of the space. The main
goal was to sound better than last vid :-)

~~~
sneak
Could you perhaps make a quick write-up about the making of the video?

I was more impressed with the fact that it was produced using all F/OSS
software than anything. Other than the type on the title cards looking goofy,
the production value is higher than anything I've ever seen come out of a non-
fruity unix box.

FCP is one of the very last reasons I still have a mac...

~~~
xiphmont
There's a fairly detailed writeup about the making of the previous video:

[http://wiki.xiph.org/Videos/A_Digital_Media_Primer_For_Geeks...](http://wiki.xiph.org/Videos/A_Digital_Media_Primer_For_Geeks/making)

Only a few things have changed since then. I've done some more work within
Cinelerra (I wrote a new resampler and a new color grading filter for it this
time around). I'm also using a pair of matched suspension microphones (AT853a
Unipoints) instead of the Crown boundary piezos, which I've decided just
aren't suited to anything I do.

Cinelerra is not for the faint of heart, and I don't casually recommend it. It
is very powerful, rather unfriendly, and picky about the formats it uses.
However, it will do things none of the other FOSS editors can.

------
pslam
This is the clearest presentation I have ever seen about digital audio in
general, even if it's not trying to be that general.

It really is peculiar how many otherwise technically minded people with
experience in DSP and/or audio somehow think there is something "magical"
beyond what can easily be proved, as this video demonstrates. I don't
doubt that there will still be those who contest the results, regardless.

I did especially enjoy the use of analog test equipment, in an effort to put
to rest any conspiracy theory that the talk was flawed by being digital. This
must have taken a lot of time to put together, but it's certainly worth it.
I'm going to drop this URL into a lot of debates in the future.

~~~
xiphmont
It was fun rebuilding that 3585. I had to wait almost six months for a working
CRT that would fit that model to float past on eBay :-)

------
ivan_ah
This is EPIC!

An entire signal processing tutorial in 25 minutes. This guy is a master of
explaining AND of using tech so the combination is very powerful. You will
probably learn more useful stuff from this video than from taking an entire
signals and systems ugrad course.

    
    
        ♫ > #ADC(soundcard) > (wav) > #DSP.compress > (mp3) > #DAC(soundcard) > ♫
    

What else is there in #DSP? Watch this video!

------
aidos
Really interesting and well presented.

As a layperson there were a couple of things I didn't quite get.

I don't 100% get the band limited signal bit. How does band limiting imply
that there's only a single possible reconstruction of the digital signal? I
can kind of picture the fourier transform meaning that there's only one
representation but it's a bit of a leap for me. Can the converter itself not
create the signal incorrectly?

Isn't there also an argument that frequencies above 19 kHz can be heard by some
people, so they need to be accurately represented?

I also don't understand how you can necessarily say that because it's OK for a
sine wave, it's OK for all waves. With the square wave, the output is a sort
of digital approximation, right? Presumably that's a perceptible difference
from the original analogue version? This video was focusing on differences in
digital processing, so maybe that's just not relevant to the discussion.

Would be interesting to see a complex analogue signal being fed through the
digital pipeline and then inverted against the original to see the
differences. Again, maybe that's a known difference and not really part of the
argument here.

Just holes in my understanding, really - credit to this video though; it's
forced me to actually think about this :)

~~~
elwin
> I don't 100% get the band limited signal bit. How does band limiting imply
> that there's only a single possible reconstruction of the digital signal? I
> can kind of picture the fourier transform meaning that there's only one
> representation but it's a bit of a leap for me. Can the converter itself not
> create the signal incorrectly?

Band limiting means that there's a maximum frequency. You can think of it as a
limit on how fast the signal can change. There's only one possible way to draw
a curve that goes through all the points without changing too quickly.

It's something like graphs of polynomials. There's only one possible line
(first-degree polynomial) going through any two points. Three points
completely determine a second-degree polynomial (parabola), etc. There are an
infinite number of parabolas you can draw through two points, just as there
are an infinite number of squiggles you could draw through the digital sample
points. But you can't create a different solution without adding higher-order
terms to the polynomial/higher frequencies to the signal. The number of points
you have is sufficient to completely specify the curve.

Though there's only one theoretical solution, the converter can still "create
it incorrectly".
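For the curious, the uniqueness claim can be poked at numerically. Whittaker-Shannon interpolation (a sum of shifted sinc functions) is that one band-limited curve through the samples. A minimal pure-Python sketch, with an illustrative 1 kHz tone sampled at 8 kHz (all values arbitrary):

```python
import math

def bandlimited_reconstruct(samples, fs, t):
    # Whittaker-Shannon interpolation: a sinc function centered on each
    # sample -- the unique band-limited curve through the points.
    total = 0.0
    for n, s in enumerate(samples):
        x = fs * t - n
        total += s if x == 0 else s * math.sin(math.pi * x) / (math.pi * x)
    return total

fs = 8000.0                       # sample rate (Hz)
f = 1000.0                        # a 1 kHz tone, well below fs/2
samples = [math.sin(2 * math.pi * f * n / fs) for n in range(64)]

# At a sample instant, the sum collapses to the stored sample exactly.
print(bandlimited_reconstruct(samples, fs, 20 / fs) - samples[20])

# Halfway *between* two samples, it still lands back on the original sine,
# up to small edge effects from using a finite window of samples.
t = 20.5 / fs
print(bandlimited_reconstruct(samples, fs, t) - math.sin(2 * math.pi * f * t))
```

Any other squiggle through the same points would need frequency content above fs/2, which band limiting forbids.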

~~~
randy909
> Band limiting means that there's a maximum frequency. You can think of it as
> a limit on how fast the signal can change. There's only one possible way to
> draw a curve that goes through all the points without changing too quickly.

This was incredibly helpful for my understanding. This is the one corner of
this story that was always fuzzy for me. In the video he seemed to be making
this point when he drew silly stuff all over the screen but it didn't quite
click. Thank you.

~~~
TylerE
It's also the same deal as the "wagon wheel" effect in video. At 30fps, a
wheel rotating at 1800rpm (30rps) will appear to be standing still, and wheels
at, say, 2400rpm and 600rpm will appear to be rotating at the same speed.
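The analogy holds up in code: in sampled terms the wagon-wheel effect is aliasing, and two frequencies that differ by exactly the sample rate produce identical samples. A tiny sketch (the rates are arbitrary):

```python
import math

fs = 30.0                  # "frame rate": 30 samples per second
f_slow = 2.0               # a 2 Hz rotation
f_fast = f_slow + fs       # 32 Hz: differs from f_slow by exactly fs

# Sampled at 30 fps, the two rotations are indistinguishable:
slow = [math.sin(2 * math.pi * f_slow * n / fs) for n in range(30)]
fast = [math.sin(2 * math.pi * f_fast * n / fs) for n in range(30)]

print(max(abs(a - b) for a, b in zip(slow, fast)))  # ~0 (rounding error only)
```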

------
AlexDanger
Wow. That video is fantastic. The conceptual leap between DSP theory and audio
behaviour is not a trivial thing to understand. This is the best educational
demonstration of these concepts I've ever seen. I hope high school and
university educators embrace this shining example of online learning.

Is there a Xiph Foundation or something I can sign up for when new stuff is
published? Material of this quality is a great way to raise awareness of
Xiph's core goals.

~~~
xiphmont
Yes, there's an RSS/Atom feed on the video pages, and we also have a 'video'
mailing list at Xiph.org for new video announcements.

------
cmicali
Amazingly well done. Clear, great animations, and does a great job of showing
you exactly what is going on. I wish all learning material/tutorials were this
well produced.

------
tharshan09
Really great video. He goes through things quite fast, but he backs nearly
every explanation with a demo! I can't think of many of my uni lectures on
this topic like that. I hope they release more. Are there any planned?

~~~
nullc
You've seen the first one, right? <http://xiph.org/video/vid1.shtml>

They take an insane amount of work, so they are slow in coming. This latest
one had about 50kloc of code written to build the demo software, animations,
and to improve the FOSS video editing software that was used.

~~~
Rovanion
Though I do find the choice of using Cinelerra to edit the video a bit odd.
Apparently there was an update to the project less than a year ago, but I have
personally never been able to even import a video without it crashing.

Using something like OpenShot, KDenLive, PiTiVi, or even Blender would make
more sense to me.

But Cinelerra obviously worked out for them; the productions are absolutely
spectacular. Loved every bit of this one. The clearest explanation one could
hope to get.

~~~
xiphmont
The sad truth is that all the open source video editors are like that
depending on what you're doing (with the possible exception of OpenShot, which
I found to be reliable but limited).

In short, I found no FOSS video editor that did what I needed reliably out of
the box. Given that problem, Cinelerra at least came close, the code was
readable, and the design sound. Its biggest problem is that its file
loaders/exporters are all old and bitrotting so they have become unreliable
and crash-prone. For my stuff, I use raw video and so don't hit those
problems.

In the process of making Episode 2, I wrote a new compositor resampler and a
new color grading filter for Cinelerra (named Blue Banana). These should be
appearing officially in the next major release. I hope to have some more file
loader work done by then too :-)

------
aidos
There was a lot of discussion around his original article which shows how
contested this topic seems to be.

<http://news.ycombinator.com/item?id=3668310>

~~~
effn
There is a lot of ignorance in that thread though. This is one of those
subjects where you should be very careful about who you listen to, given all
the disinformation floating around on the internet.

~~~
goostavos
Honestly, the ignorance and arrogance regarding audio and its playback is one
of the distasteful things about the audio engineering/audiophile community.
Even in a room full of engineers, you will still get people arguing to the
death about things outside of human perception (my family, full of electrical
engineers, is unfortunately rife with this type). For whatever reason, the
idea that their ears may be just as limited in the audio spectrum as their
eyes are in the light spectrum is a concept just too foreign to grasp.

You have places like Gearslutz, which is, as far as I know (though I haven't
been there in a _long_ time), the biggest audio engineering forum out there.
The community as a whole seems to disagree with the very concept of an A/B or
ABX test. I mean, it approaches flat-out superstition.

The thread that finally made me throw up my hands and leave the place was
when a big flame war erupted after a very well respected member of the
community had the gall to suggest that microphones were not filled with magic,
and were thus limited by their physical properties. It devolved into personal
attacks against the guy and his education, with people asserting that "science
can't measure what I'm hearing."

I hate this industry...

------
Lewton
So good I felt compelled to donate

If I learned that much from every 20 minute video I watched my life would be
amazing

~~~
gameshot911
Same here. It was so well done, so informative, so fun that I was driven to
support this and future endeavors!

~~~
xiphmont
thanks to both of you :-)

------
drudru11
I wish I could upvote this more.

I recently checked out a few online courses on Signal Processing and EE. (Not
the MIT ones, though). Khan Academy is good too, but doesn't cover advanced
topics like this.

Nothing touches this.

When I see this, I get this tremendous optimism. Online video is going to be a
fantastic complement to the internet as a learning platform.

Huge kudos to the xiph.org team for producing this!

------
peripetylabs
Monty was the first to translate FFTPACK into C, in 1996:

<http://www.netlib.org/fftpack/fft.c>

------
sesqu
Excellent video, though I do wonder about the treatment of pixels as point
samples. I've seen it before, and it appears to be a popular viewpoint among
code designers, but it doesn't agree with how I use and create pixels, or with
my understanding of how they are sampled by cameras. So it feels like a
convenient misrepresentation.

Given that, I also wonder about the point sample nature of acoustic signals,
espoused in this video. I know nothing about the sampling, but imagine a
stairstep function could be more accurate for that, just reasonably
represented as point samples transformed with the use of theoretical
understanding of sound.

The square wave example provides a further interesting representational
quandary. Given that it's a simple function, the bandlimiting results in a
surprisingly large signal loss at a notable increase in bit depth. Combined
with the signal loss that results from assuming the frequencies a priori, I
wonder if there is enough signal lost and converted to noise in the
digitization process to spend processing power looking for non-sinusoidal,
perhaps merely skewed, signals.

~~~
mpyne
The math treats pixels as point samples, because that's what sampling _is_.

You can quibble about how a CCD cell in a camera converts its narrow field-of-
view into its output, but at the conclusion of the snapshot you will have a
finite set of data points from that cell (the "pixel"), nothing more or less.

His point is that _after_ this sampling process all you can say about the
original data is that the cell at X,Y recorded a value of RRGGBB (expand to as
many bits as appropriate for your camera). That is a "point sample" by
definition: an ordered pair of [ Point, Sample-Value ]. It doesn't matter how
special the code that uses or creates the pixels is; this is a simple fact of
life of the physical sampling process itself.

Given that you admittedly know nothing about the sampling, I would imagine
that you would be careful about how you judge the adequacy of the fit for a
mapping from the sampled data to arbitrary functions intersecting with the
sample.

~~~
sesqu
Fair enough wrt/ point samples.

But I didn't see his point as being that; I saw him arguing that pixels are
correctly interpreted as representing an area of zero, which is very
convenient when you're fitting a function to interpolate between them (since
that way you don't have to correct for integrals).

I would argue that this approach is what gets you JPEG artifacts, and that
there might be an analogous effect in audio to explore, since the bitrate
avenue has been exhausted. It's perfectly possible that more accurate bases
for the decomposition step would result in unjustifiable bitrates, but I want
to hear of it.

As for judging the fit, my aim is to judge the mapping from input to output,
not from sample to output (the ideal mapping from sample to output would of
course be an identity function). This naturally requires recognizing some
assumptions.

~~~
chipsy
JPEG artifacts are unrelated to the definition of pixels. They're an artifact
of using lossy compression, much as how MP3 exhibits some distinctive
artifacts at lower bitrates. Monty is only talking about plain old
uncompressed sampled data.

Since you're focused on the "size of a pixel" you might also be thinking of
resampling artifacts, which is also beyond the scope of the video. Resampling
is the process of changing the effective sample rate of data by reconstructing
a new, larger or smaller set of samples from the original data. Monty alludes
to this when he says "the samples are not stairstepped", because when we go to
reconstruct samples we have to figure out a method that is NOT stairstepped,
but rather:

1\. Reasonably represents the original signal

2\. Remains band-limited

Fortunately there's theory that explains what a mathematically ideal resampler
should look like. It's good study material for thinking about sampling theory
in more depth and understanding some outcomes that develop from defining
samples as infinitely small points. It also explains why pixel art doesn't
respond well to standard image resampling algorithms.

~~~
sesqu
JPEG artifacts are separate from the definition of a pixel, yes, but not to
the extent you propose. The artifacts are specifically a result of pixels
becoming corrupted during the compress-decompress cycle, and feature most
prominently where the source function has discontinuities. Treating a pixel as
a point sample of a smooth function removes our ability to consider such
things.

Quantization is a form of lossy compression, which is why it adds noise. Monty
certainly spoke of that, and of randomization as a recourse in the face of
systematic error.
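That randomization (dither) is easy to demonstrate: a signal smaller than one quantizer step vanishes under plain rounding, but survives, as a noise floor, once dither is added before quantizing. A rough pure-Python sketch (the step and level values are arbitrary):

```python
import random

random.seed(1)
step = 1.0                  # quantizer step size
level = 0.3 * step          # a DC signal smaller than one step

# Without dither, every sample rounds to 0: the signal is destroyed,
# and the error is pure, signal-correlated distortion.
plain = [step * round(level / step) for _ in range(100_000)]

# TPDF dither (+/- one step, triangular) decorrelates the error from
# the signal: individual samples are noisy, but the average recovers
# the sub-step level.
def tpdf():
    return (random.random() - random.random()) * step

dithered = [step * round((level + tpdf()) / step) for _ in range(100_000)]

print(sum(plain) / len(plain))        # 0.0 -- signal gone
print(sum(dithered) / len(dithered))  # close to 0.3 -- signal preserved
```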

I wasn't thinking about resampling, no, but it certainly fits the topic. Could
you expand on how the ideal resampler explains why standard resamplers are
occasionally poor, and whether this can be affected by treating the sample as
filtered?

~~~
mpyne
> The artifacts are specifically a result of pixels becoming corrupted during
> the compress-decompress cycle, and feature most prominently where the source
> function has discontinuities. Treating a pixel as a point sample of a smooth
> function removes our ability to consider such things.

The pixels are not "corrupted", the source pixels are instead deliberately
replaced by a pixel set that compresses better. The way the JPEG encoder
chooses this replacement pixel set is why you see the artifacts in areas with
significant edges: JPEG deliberately removes the image counterpart to high-
frequency data, much the same way that MPEG audio encoding removes high-
frequency audio.

What you see is the "Gibbs effect" in the visual realm from deliberately
throwing away that high frequency data, which is not at all due to "treating a
pixel as a point sample" (since as before, that's the very definition of a
pixel, a value at a point). The solution is easy enough: force JPEG to keep
the extra data, by driving the quality to the maximum permitted level (at the
penalty of horrible compression).

JPEG has other artifacts too (e.g. you can see the block size used by the
encoder), but those are also not inherent to treating pixels as point samples
but are instead inherent to the design of the encoder.
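The Gibbs effect itself is easy to reproduce: truncate the Fourier series of a square wave and the overshoot next to the jump stays at roughly 9% of the jump no matter how many harmonics you keep. A small pure-Python sketch:

```python
import math

def square_partial(t, n_terms):
    # First n_terms odd harmonics of a unit square wave.
    return (4 / math.pi) * sum(math.sin((2 * k + 1) * t) / (2 * k + 1)
                               for k in range(n_terms))

# Scan near the discontinuity at t = 0 for the first overshoot peak.
for n in (10, 100, 1000):
    peak = max(square_partial(k * math.pi / (8 * n), n) for k in range(1, 40))
    print(n, peak)  # peak stays near 1.18 (about 9% of the jump of 2)
```

Adding harmonics narrows the ringing but never flattens the peak; only keeping *all* the high-frequency data (i.e., not band-limiting the edge) removes it.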

~~~
sesqu
Deliberate corruption is still corruption. As for your comparison to MPEG
audio, I feel that's symptomatic - high-frequency components are perceptually
negligible in audio, but stand out in video (until the stroboscopic effect).
You wouldn't want to use the same encoder for both.

The Gibbs effect is absolutely due to discontinuities not being gracefully
handled by a Fourier transform. Luckily point samples from a continuous
function don't have to worry about that like a fully defined discrete function
might, which is why I expect people want to define pixels as the former.

------
randy909
I've heard that one of the problem areas is that you can't make a perfect,
colorless filter - one where 20 kHz is allowed through but 20.001 kHz is not. A
filter has to slope down at least somewhat gradually. The steeper you make
the filter, the more it colors (distorts) the signal around the cutoff
frequency. One of the purported benefits of a higher sampling rate is that you
can make the high-frequency filter with a more gradual slope, which introduces
less distortion.

I realize any differences in this area would be very slight if they exist at
all. Anyone know more about that?

Edit: Apparently no one can tell a difference in listening tests with well-
made equipment. Theoretically, if a device were built using an extremely cheap
filter you'd be better off with the higher sampling rate, but I can't find any
real examples.
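For what it's worth, the tradeoff is easy to see with a simple FIR design: for a fixed number of taps, the transition band is a fixed fraction of the sample rate, so a higher rate leaves far more room between the cutoff and Nyquist for the same filter cost. A rough pure-Python sketch (the 101-tap Hamming-windowed-sinc design and 20 kHz cutoff are just illustrative choices, not anyone's actual converter filter):

```python
import cmath, math

def lowpass_taps(cutoff_hz, fs, n_taps):
    # Hamming-windowed sinc: a simple low-pass FIR design.
    fc = cutoff_hz / fs
    mid = (n_taps - 1) / 2
    taps = []
    for i in range(n_taps):
        x = i - mid
        h = 2 * fc if x == 0 else math.sin(2 * math.pi * fc * x) / (math.pi * x)
        w = 0.54 - 0.46 * math.cos(2 * math.pi * i / (n_taps - 1))
        taps.append(h * w)
    return taps

def gain_db(taps, f_hz, fs):
    # Frequency response of the FIR evaluated at a single frequency.
    resp = sum(t * cmath.exp(-2j * math.pi * f_hz * i / fs)
               for i, t in enumerate(taps))
    return 20 * math.log10(abs(resp))

# Same 101 taps, same 20 kHz cutoff. At 44.1 kHz the filter must fall from
# passband to stopband inside 20-22.05 kHz; at 96 kHz it has until 48 kHz.
taps_44 = lowpass_taps(20000, 44100, 101)
taps_96 = lowpass_taps(20000, 96000, 101)
print(gain_db(taps_44, 1000, 44100))   # passband: ~0 dB
print(gain_db(taps_44, 22050, 44100))  # Nyquist: deep in the stopband
print(gain_db(taps_96, 48000, 96000))  # Nyquist: deep in the stopband
```

The flip side is the point above: squeezing the whole transition into 20-22.05 kHz forces either a long (ringing-prone) digital filter or a steep analog one, whereas a 96 kHz rate leaves room for a much gentler slope.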

------
femto
In my experience, another source of great DSP explanations is Professor fred
harris [1]. I attended a 2-day lecture by him (20 years ago) and it was
beautiful in its clarity.

[1] <https://en.wikipedia.org/wiki/Fredric_J._Harris>

------
tshadwell
This is absolutely fascinating, and completely transparent.

------
teeja
That series of demos shure puts a few old bedbugs to rest.

(Sorry analog purists)

------
xinsight
Youtube version:

<http://youtu.be/d7kJdFGH-WI>

------
radd9er
Amazing presenter. I will be following this guy.

------
mckoss
Wow!

------
Karn
That was the digital audio version of the Feynman Lectures on Physics. After
years of being bombarded with mediocre TED talks, this is a shining example of
pure excellence.

