This is a very influential video. I often see it referenced when digital audio gets explained.
It is also very insidiously misleading in a way that is hard to fault it for.
That "band limited signal" that uniquely satisfies the Nyquist theorem? That is an infinite, periodic signal.
No finite (e.g., a song), aperiodic signal can be band limited. That includes any signal with transients.
Well, how big is the difference? How much overhead/error/lookahead is needed to approach the Nyquist result? It is never mentioned by people referring to this theorem when talking about audio signal sampling!
While no finite audio clip qualifies as bandlimited, the Nyquist theorem cheats by assuming that the audio clip repeats indefinitely. Doing so results in sharp frequency lines, separated by gaps of zero. Each frequency line lies at an integer multiple of the reciprocal of the audio clip's length, the fundamental frequency.
Equivalently, every finite audio clip has a discrete Fourier spectrum.
Mathematically, an audio clip of length T seconds at a sample rate S Hz must have DFT coefficients separated by 1/T Hz, with a maximum frequency less than S/2 Hz. For example, a 1 second clip at 48000 Hz has DFT coefficients between [0,24000) at every 1 Hz. By increasing the length of the audio clip, the frequency resolution increases.
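To make those numbers concrete, a minimal numpy sketch (the random signal is just a stand-in for a real clip):

    import numpy as np

    T = 1.0     # clip length in seconds
    S = 48000   # sample rate in Hz
    x = np.random.randn(int(T * S))  # stand-in for a 1-second audio clip

    X = np.fft.rfft(x)                       # DFT of the real-valued clip
    freqs = np.fft.rfftfreq(len(x), 1 / S)   # frequency of each coefficient

    print(len(X))               # 24001 coefficients covering 0..24000 Hz
    print(freqs[1] - freqs[0])  # bin spacing = 1/T = 1.0 Hz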
In asking for the error, you ask for the values between the discrete Fourier coefficients. What happens outside of the audio clip determines what happens between coefficients. If the signal repeats, the spectrum is exactly zero between coefficients. If the signal goes to zero outside the clip (so it is not exactly bandlimited), the spectrum between coefficients is a sum of sinc functions, one per coefficient (truncating with a rectangular/boxcar window in time convolves the spectrum with a sinc).
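One way to see this, sketched below for the goes-to-zero case: zero-padding the clip before the DFT samples the spectrum between the original coefficients, and those in-between values are exactly the sinc-interpolated ones.

    import numpy as np

    S = 48000
    x = np.random.randn(S)  # 1-second clip, so 1 Hz bin spacing

    X = np.fft.rfft(x)                     # coefficients at 0, 1, 2, ... Hz
    X_fine = np.fft.rfft(x, n=8 * len(x))  # zero-pad: spectrum every 1/8 Hz

    # The finer spectrum passes exactly through the original coefficients;
    # the values in between are the sinc interpolation described above.
    assert np.allclose(X_fine[::8], X)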
How much overhead/error/lookahead is needed to approach the Nyquist result? Theoretically, none. But in practice, perfect filters don't exist.
In practice, how big is the difference? In order to properly record a real waveform, the signal must go through a physical low-pass filter, or else risk unbounded aliasing. The answer depends on the filter specification. I pulled up a Realtek ALC892 datasheet. When sampling at 44100 Hz, it specifies a -1 dB passband edge at 20158 Hz and a -80 dB ADC stopband starting at 24916 Hz. Yep, that allows aliasing to pass through, yet it remains somewhat passable. No surprises from a cheap chip. Hence the importance of oversampling during recording or reconstruction in better chips. The audio files themselves don't need it because the error comes from imperfect filters.
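To put those figures in perspective, the folding arithmetic (using the datasheet numbers quoted above) is simple:

    fs = 44100      # sample rate
    f_stop = 24916  # -80 dB stopband edge

    # Content between fs/2 and f_stop is attenuated by less than 80 dB and
    # folds back to fs - f: the stopband edge aliases down to 19184 Hz,
    # i.e. right into the top of the audible band.
    print(fs - f_stop)  # 19184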
It does! Thank you a lot. But I still don't know the name for the hard part (how to quantify the "almost" of the reconstruction filter).
Making a signal band-limited by repeating it does a nice hat trick: it makes any part of the signal depend on all preceding and all following samples. That effect is not insignificant -- sinc(x) decays only as 1/x. This requires the low-pass filter either to generate long pre-ringing tails, or to leak well above the cut-off.
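To put a rough number on that, here is a sketch (the 44.1 kHz sample rate and 20 kHz cutoff are my assumptions): the tails of the ideal low-pass impulse response stay above -80 dB for a few thousand samples.

    import numpy as np

    fc = 20000 / 44100                # cutoff, normalized to the sample rate
    n = np.arange(1, 200000)
    h = 2 * fc * np.sinc(2 * fc * n)  # ideal low-pass impulse response tail
    peak = 2 * fc                     # h[0], the center of the sinc

    above = np.nonzero(np.abs(h) > peak * 10**(-80 / 20))[0]
    print(n[above[-1]])  # last tail sample above -80 dB: thousands of samples out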
I'm sure analog filters have pre-ringing too (by delaying the peak), I just don't know what it is called.
But, in short, sampling a signal and then reconstructing it causes it to "travel back in time". Trying to limit the pre-ringing messes up the phase. Nothing in there looks like the ideal iFFT case, and it is hard to find accurate information about all of this.
Chapter 14 demonstrates how digital filters can achieve perfect linear phase accuracy. Chapter 17 demonstrates how digital filters can also compensate for the phase inaccuracy of physical filters. Blew my mind when I read that. I can't thank you enough for spurring me into reading about this.
The book calls "pre-ringing" ripple and overshoot. Ripple happens on both ends: the ringing before, and the delayed peak after. (See the step response examples in Chapter 14, page 267.) We also call it the Gibbs phenomenon, a necessary effect of a bandlimited Fourier series reconstructing a discontinuous waveform. Analog filters don't have ripple because they settle exponentially to a step input, which necessarily implies a non-linear phase response. Symmetric ripple and overshoot in the filter step response demonstrates a linear phase response (Chapter 14, page 268). In fact, I would call the pre-ringing a desirable trait for dealing with audio: a reconstructed bandlimited phase-accurate signal will enter the ear in the same way as the original non-bandlimited signal (unless >20000 Hz actually matters). Additionally, we likely want a windowed sinc filter (Chapter 16), which tames the 1/x ripple decay of the sinc function while keeping acceptable stopband performance.
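As a minimal scipy sketch of such a design (the 501-tap Blackman-windowed choice is mine, not the book's): the symmetry of the taps is exactly what guarantees linear phase, and the step response shows the symmetric ripple.

    import numpy as np
    from scipy import signal

    fs = 44100
    taps = signal.firwin(501, cutoff=20000, window="blackman", fs=fs)

    # Symmetric impulse response -> exactly linear phase.
    assert np.allclose(taps, taps[::-1])

    # Step response: symmetric ripple before and after the transition,
    # like the Chapter 14 examples.
    step = np.cumsum(taps)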
So, with all of that, I see how modern codec chips can achieve a nearly linear phase response in the 0-20000 Hz range, with -80 dB of aliasing noise at 20000 Hz, during both recording and playback.
For recording, these modern chips likely use a cheap physical low pass filter, followed by sampling at a high rate, followed by a digital filter, followed by downsampling. For playback, do the reverse: the chip likely uses upsampling, followed by a digital filter, followed by a delta-sigma circuit (or some other DAC), followed by a cheap physical low pass filter. Check out the block diagram of the ALC892. https://www.alldatasheet.com/html-pdf/1137676/REALTEK/ALC892...
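A sketch of the digital half of such a playback chain, assuming an 8x polyphase upsampler (scipy's resample_poly supplies the sharp anti-imaging FIR internally):

    import numpy as np
    from scipy import signal

    fs = 44100
    x = np.random.randn(fs)  # stand-in for one second of decoded audio

    # 8x oversampling with a built-in polyphase anti-imaging FIR;
    # in hardware, a delta-sigma modulator and a cheap analog RC low-pass
    # would follow at 352.8 kHz.
    y = signal.resample_poly(x, up=8, down=1)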
I think the constraint of using a band limited signal is the big misunderstanding many people have with regard to digital audio.
Yes, you can perfectly reproduce a band limited signal as long as the highest frequency is below fs/2.
But getting a band limited signal from a “real life” signal without any artifacts can be trickier than one might think. Especially when the Nyquist frequency is near the limit of human hearing.
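A minimal illustration of the trickiness: any content above fs/2 that survives the anti-aliasing filter folds down and becomes indistinguishable from in-band content.

    import numpy as np

    fs = 44100
    n = np.arange(fs)

    # A 25 kHz tone sampled at 44.1 kHz produces the same samples as a
    # 19.1 kHz tone: once sampled, the alias cannot be told apart.
    a = np.cos(2 * np.pi * 25000 * n / fs)
    b = np.cos(2 * np.pi * (fs - 25000) * n / fs)
    assert np.allclose(a, b)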
And this is the one big argument in favor of Hi Res audio - moving those filter frequencies high above the hearing threshold.
>And this is the one big argument in favor of Hi Res audio
It's really not. For redbook, fs/2 is 22.05 kHz, and while human hearing maxes out around 20 kHz, that's not a hard cutoff: combine our low sensitivity to high frequencies (cf. ISO 226), the average listener's hearing not reaching much past 18 kHz in practice, and frequency masking by other simultaneous notes, and the small aliasing/imaging issues near fs/2 aren't a real problem.
But the really important factor, rarely mentioned, is the material: the amount of music with meaningful content at a meaningful volume at those frequencies is statistically negligible.
Hi Res audio is snake oil designed to sell the same thing multiple times, period.
"Hi Res audio is snake oil designed to sell the same thing multiple times, period."
The "period" is unwarranted, because there are too important caveats. Firstly, it has been extremely common for albums to be sold with very compressed dynamic range, assuming the average consumer will be listening in noisy environments etc. However, the mastering supplied to Hi Res shops sometimes lacks that compression, so that is where you can hear the album with room to breathe.
Secondly, the SACD format allowed adding a layer for 5.1 surround sound. In classical music, this is especially important for works where the performers are spread out around the hall, not just all on stage in front of the listeners.
So, with Hi Res the higher frequencies and 24 bit depth are snake oil, but the ancillary benefits are audible to anyone with a good listening environment.
> Firstly, it has been extremely common for albums to be sold with very compressed dynamic range, assuming the average consumer will be listening in noisy environments etc. However, the mastering supplied to Hi Res shops sometimes lacks that compression, so that is where you can hear the album with room to breathe
I had a friend who was extolling the virtues of Hi Res for the pop music he was buying, so I asked him to send me a track, and it had the same brickwalled compression as the standard iTunes version and sounded just as flat. (I was hoping that even if it was compressed the same, the extra resolution meant you could recover the detail, but there was no audible improvement.)
If that's what they want to sell, they need to create an actual term for that, like the audio version of "Director's Cut", not just sneak it into some random Hi Res releases and hope you find "the good ones" while the rest are snake oil.
> However, the mastering supplied to Hi Res shops sometimes lacks that compression, so that is where you can hear the album with room to breathe.
I've almost never heard of Hi Res with a totally new master that wouldn't have been previously available as CD, to be honest. This isn't common, right?
> the SACD format allowed adding a layer for 5.1 surround sound
Well, yeah. Too bad I don't have a PS3 to rip the SACD layer =)
"I've almost never heard of Hi Res with a totally new master that wouldn't have been previously available as CD, to be honest. This isn't common, right?"
It has been a few years now since I did all this collecting, but I do remember instances where the CD was brickwalled, but both the Hi Res downloads and the vinyl release got a mastering with more dynamic range.
> Hi Res audio is snake oil designed to sell the same thing multiple times, period
There's a karaoke bar I go to with "Hi Res" logos on the speakers. These are basically MIDI files in a loud bar atmosphere; who is going to hear the difference, haha.
The only thing really in favor of hi-res audio is that it allows rather lazy circuit design. You can have a super lazy, cheap 20 kHz -> 48 kHz anti-aliasing filter... or you could just properly make a nice sharp 20-22 kHz filter, and stop wasting all that bandwidth on worthless doubled-sample-rate audio.
In reality, there is absolutely 0 use for digital audio sampled above 16-bit, 48 kHz for listening. (44.1 kHz is fine too, I guess, but that sample rate is annoying for compatibility with a lot of modern systems.) It has uses for music production, but that's about it. The final mix should be 16-bit/48k.
That's unrelated. This difference is the inter-aural or inter-channel difference and 16/44.1k can capture that to much greater precision than microseconds.
Some math [1]
44.1k file containing pulses with sub-sample delays [2]
Something similar, but square wave, and nicely showing that timing precision actually depends on bit-depth and not the sampling rate [3]
Some practical experiments with capturing the playback of such files and verifying that the delay is preserved: pulse [4] and square [5]
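For a flavor of the idea behind those files (a sketch, not the linked files themselves): a band-limited impulse can be centered between samples, so its timing is not quantized to the sample grid.

    import numpy as np

    # Band-limited impulse delayed by a quarter of a sample: just evaluate
    # the sinc interpolation kernel at the fractional offset. The result is
    # a perfectly representable 44.1 kHz waveform whose peak sits between
    # samples; timing resolution is limited by noise/bit depth, not by fs.
    n = np.arange(-100, 101)
    pulse = np.sinc(n - 0.25)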
Super thanks! This is exactly the kind of non-handwavy math I have trouble coming upon myself.
It gets much more complicated with more and more modern speakers doing room calibration (at least Genelec and Neumann have their own, and Sonos also has some iPhone-waving Trueplay thing). And just saying "Nyquist!" does not help to understand, e.g., how precisely phase can be aligned by applying FIR filters, and what impact the sample rate or bit depth has.
In practice it doesn't really matter. Most good converters have pretty sharp cutoffs on their anti-aliasing filters, so anything that aliases is severely attenuated by the time it folds back into the band. Sure, an aperiodic signal needs infinite bandwidth to 'accurately' represent it, but you're quickly below the noise floor.
Sure. And having that 2+ kHz of headroom surely helps. Still, it is a bit jarring to almost never see the reconstruction filter mentioned (except for sigma-delta, where you simply cannot ignore it), and to always see appeals to the Nyquist theorem as a mathematical proof, even though it does not apply.
Taking the opportunity: what is the math called that actually does apply to this case?
Every ADC/DAC used in audio in existence has an anti-aliasing filter, though. The Xiph video even talks about band-limiting signals with the 21 kHz cutoff filter he shows as the example.
It's not talked about, largely, because modern circuit designs already have great anti-aliasing filters that have quite sharp rejection curves.
And I wish it was mentioned and explained.