I’m not sure how big that market is these days, though. I recently decided to move from a block in the city to a house in the surrounding countryside purely to get a quieter listening environment and really enjoy the high-dynamic-range recordings I have – I listen to a lot of classical music, especially the avant-garde with its even greater range, e.g. the Ligeti cello concerto starting from pppppp. Yet even among my friends who are really obsessed with such music, seeking out a better listening environment seemed an extreme measure to take.
So, people who a generation ago would have invested in higher-end equipment (not audiophile snake oil, just better speakers) and who would have sought silence are now giving in to listening to music on their phones or cheap computer speakers. It’s a big shift.
Music streaming services have since become a very important listening medium, and Spotify, Google Music, Apple Music and others normalize loudness levels. This neutralizes the loudness wars, as it makes loudness-war treatment of music useless, at least when listening on streaming services. Music mastered during the loudness war will still have compromised dynamic range, but the perverse incentives that caused the war are simply not as relevant for new music.
I agree the loudness war was a huge problem a few years ago, but changing trends in the way music is listened to are solving it. The way you present the loudness war problem is therefore somewhat out of touch.
It is especially noticeable if one listens to "modern" vs. older CDs vs. the radio. I suppose producers now feel compelled to exercise subwoofers and thump the audience, or to compensate for the crappy earbuds of the target audience, or maybe that's just what people want, because even radio commercials are complicit -- it's difficult to listen to someone talk with "thump/thump/thump" drumbeats in the background. Often, I simply turn it off.
In my car, I have the bass response dialed back by about 50%, and even dip the mid-range by about 15% to get what sounds to my ears like a flat response.
I have mild hearing loss in the 250 Hz to 1 kHz range, so my ears are already attenuating the signal -- I can only imagine how "bass heavy" it must sound to someone with normal hearing!
There is nothing inherently good or bad in music production, no timeless rules as to how much treble or bass, compression, distortion, reverb or anything else you need. It has all been done to the extreme and what is deemed good is subject to constant change.
My personal example of a song where it doesn't work is Johnny Cash's "Hurt". Around three minutes into the song, Johnny Cash's vocal noticeably distorts. From my perspective, the distortion is absolutely a loudness-war phenomenon: if you look at the waveform in an editor, the song starts off pretty "hot" given that there's a very noticeable crescendo at the end. At the end of the song, where the music is pretty much maxing out, there is no headroom left. There is no option to stand out but to push Johnny's voice into distortion.
I've seen people in message boards say the distortion in the vocals adds "intensity". Personally, I'd love to hear an un-loudness-war version where Johnny Cash gets several dB of room to sing cleanly over the music -- where dynamics, not distortion, drive the intensity. For my tastes, that would be much preferred.
: https://en.wikipedia.org/wiki/Loudness_war#Examples_of_.22lo... (not including the 'remasters')
I found the album a bit tiring to listen to because of the continuous loudness. No particular parts really stood out to me, because everything was just sort of loud and it seemed to go in one ear and out the other. I could pay really close attention -- one can always listen better :) -- but again, it's a bit tiring and you can't do much else.
Then again, I just really prefer their first 2-3 albums, which have quite a different sound altogether.
And I'm curious which Oizo albums you're referring to? I love his stuff, and yes, some of his tracks are quite loud (though not all the time, for the whole track), but they never quite struck me as a typical "loudness war" type of loudness. Unless I'm thinking of the wrong tracks here (No Day Massacre? Last Night a DJ Killed My Dog?), he seems to like to hit a well sound-designed overdriven bass buzz with not too much else playing at the same time (and where there is, close attention to envelopes, cutting off hits, millisecond transient timings). If you do that right and just normalize to max amplitude, you're 95% of the way there (at which point my experience is that compression on top usually fucks up that careful detail work, but maybe I need more practice or a different plugin). Possibly I'm thinking of the wrong tracks, but at least you gave me a reason to re-listen to his stuff with a different ear/attention (loudness sound-design), which is always interesting :)
You'll need an exceptionally good clock to start with, and all other equipment needs to align to that clock. Then every plugin and processing step you use needs to stay in the same 24/192 domain; otherwise your signal is reduced to the limit of that plugin/processing and all previous efforts are lost.
Most music producers use samples, and most samples are 16/44, so what's the point of trying to get that to 24/192, filling the signal with zeros?
If a piece of music is, on very rare occasion, truly 24/192, then the listener who downloaded the track still needs an exceptionally good clock (which is both expensive and hard to find) to play it back without signal reduction.
IMAO 24/192 is just a marketing thing for audiophiles that don't really understand the implications. 24/96 should be a reasonable limit for now, although personally I think 24/48 is enough for very high quality audio.
> Most music producers use samples...
Most people interested in better-quality sound in this particular context aren't listening to contemporary electronic music with samples. 24/192 or SACD is desirable chiefly for reissues of older recordings in pop or jazz genres, where those formats were mastered with higher dynamic range while the available CD versions or lower-bitrate downloads were mastered with loudness-wars compression. The format is also attractive to classical music listeners, because SACD gives you multichannel audio; and some classical labels are now giving loudness-wars treatment to the non-SACD or non-24/192 formats of a particular new release.
In the studio, I would say that 24 bit at least should be the norm for recording purposes.
24-bit recording gives you noticeably increased headroom (about 20 dB). This gives you quite a bit more flexibility to record at lower levels without worrying about the noise floor. The difference isn't huge for most prosumer setups in practice, but given that the processing power and storage of modern computers make recording in 24-bit trivial, there really is no reason not to record 24-bit these days IMHO.
Sample rate also comes into play, mainly if you have older plugins that do not oversample. Some of the mathematical calculations involved, particularly if they respond quickly to audio changes (e.g. limiting/compression, distortion) or use "naive" aliasing-prone algorithms (e.g. a naive sawtooth wave vs. something like BLEP/PolyBLEP), can introduce frequencies beyond the Nyquist limit that translate into aliasing. These days, I would say most plugins do oversample internally or at the very least give you the option to do so. There's also a VST wrapper to oversample older plugins (http://www.experimentalscene.com/software/antialias/). So I do not think recording over 44.1kHz is very necessary these days. I don't discount opinions from people who say recording at 192kHz "sounds better", though, given the possibility that they are using plugins that are prone to aliasing at 44.1kHz rates.
I personally do not see any benefit beyond 16/44.1kHz for playback of most recordings. Maybe 24-bit would be useful for heavily dynamic music (orchestral music being one of the few categories where you generally find this), but I'm thinking even there the 96dB range of 16-bit audio should be enough for most cases.
To be fair, that only applies to the digital part of your signal chain. The analog portion is going to have nowhere near 24 bits of room above the noise floor.
The article is pretty clear that 24/192 can be reasonable for production-- it's just not reasonable for playback.
But your arguments aren't quite right, IMO. If you have a 16/44 sample, and you don't play it at full volume, you get some use out of those extra bits. Especially if you have a volume envelope.
Also, many modern samples are actually saved as 24-bit (or even 32-bit). Especially if they're my own creations from noodling around with softsynths, but they're shared like that as well, obviously.
Then, if you apply a plugin or effect that supports 192/24 output, on a 16/44 sample, you still get useful information in those additional bits, even if the sample did not. Think of the very quiet end of a reverb tail, for instance.
But that's for producers. It's always good to have your working stuff in high quality, you never know how far you'll end up amplifying those low order bits.
So I can see the use for 24-bit audio (in certain contexts), but I'm really not so sure at all what the 192kHz is good for. Since it's waaaayyy above the human hearing range, all I can think of is dithering. You can hide a lot of your dithering noise in the ultrasonic range (which almost seems like a free lunch IMHO) and then... you obtain even more (virtual) bits of fidelity! Which you didn't really need, because you were using 24-bit audio in the first place.
I agree it's mostly a marketing gimmick, otherwise.
> the 24/192 downloads or SACD releases are often the only way to hear the album with real dynamic range.
Also true, and completely absurd: sometimes a vinyl rip of an album will actually have the highest dynamic range, even though vinyl has a much smaller dynamic range than 16/44 audio. Bands often use "loudness wars" mastering for the digital release and then proper mastering for the vinyl release.
Back in the day you'd occasionally see remastered "gold" discs released. The advertising made a big deal about the disc material. Those probably sounded different too (they managed to sell them at a great premium) but with those, they sounded better because of the newer remastering, not the disc technology.
It's certainly possible this is the case with some of those releases remastered for SACD as well. The label probably didn't give Green Day a huge amount of money for production when they made Dookie, for example, but it eventually sold 20 million copies and additional production made sense. If it sounds better to the listener it's a real benefit, too, but it is quite likely not down to the playback technology.
Two oddball things I've noticed about remasters in the last couple of years: there's some agreement out there that the newest remaster of Hendrix's records is not the best. And King Crimson, who have an audiophile following and decorate their catalog with remasters with alarming regularity, removed a little dynamic range (an oversimplification) from the newest remaster of Larks' Tongues in Aspic when remastering the CD and mastering for DVD-A, because people very understandably complained about the (technically very good) previous version being too quiet. Audiophiles say they want dynamic range, but...
> The label probably didn't give Green Day a huge amount of money for production when they made Dookie, for example, but it eventually sold 20 million copies and additional production made sense.
No. It's still entirely silly. The reason? The dynamics are unrecoverable once squashed by a limiter or compressor during the remastering process. The fidelity of the delivery medium is moot after that happens.
If the SACD promised that they reloaded the source tape/Pro Tools session/whatever DAW was used and remixed/remastered the songs to actually have dynamic range, then I would be interested. As far as I am aware this isn't happening, and it is implausible for any record considered a classic.
Many SACD reissues do go back to the source tape. This is a frequent cause of complaint with SACD reissues of classic jazz recordings from the 1960s: sometimes you get better sound in terms of dynamic range than any previous CD issue of that recording, but in the meantime the source tape may have deteriorated in parts.
Even with recordings from the “loudness wars”, there is sometimes room for dynamic range improvement when remastering. A good example is Rush’s album Vapor Trails. This was an infamously botched recording upon its original release, on a scale with Death Magnetic. Because loudness-wars treatment plagued the original tracks before mixing, the damage could never entirely be repaired. However, the additional process of compression applied to the source during transfer to CD could be reversed, so the album was eventually reissued as Vapor Trails Remixed, and while still flawed, that reissue has a lot more room to breathe than the original CD release.
"Remastered" reissues on the other formats – CD or lower-bitrate downloads – are nowadays expected to be listened to through cheap earbuds or speakers and perhaps in noisy urban environments. So, the engineer applies "loudness wars" treatment, compressing their dynamic range so the listener can still hear the quiet parts even with all the noise around them.
The other poster is not saying otherwise, though. They're just saying that "hi-def" formats, while technically unnecessary for end-user listening, are often the only way to obtain a decently-mastered recording.
There's no technical reason for things to be that way. But that's how things are.
It's sort of like buying a new car. You want the more powerful engine? Well then you also have to buy the package with heated seats, the moonroof, and the fog lights. There's no technical reason for it, but that's the only way they'll sell it to you.
It's also what this entire discussion's about.
Of course formats above 44.1/16 are useful for professional work; nobody's ever said otherwise. Just like graphics professionals work with high-res images and lossless formats even if the end product is a .jpg on a web page.
I noticed a similar thing with TV a few years ago. Despite watching a standard-def channel on an SD TV, some programmes had very noticeably better image quality. I think these had been shot and edited in HD, and although the 'last mile' was still in SD, there had been less degradation in the processing, so the final picture was much better.
* the mastering engineer has to be approved, and there are some minimum dynamic range standards
* 24-bit (but not 192kHz) master files also have to be supplied
* reportedly some of the streaming services (Spotify, YouTube) are now applying some 'loudness normalisation', which will bring some of the bad loud masters into line (it won't restore their dynamic range but will make them a similar loudness to other tracks)
* the loudness wars were never about what's good for listeners, but rather a competition for tracks to sound louder than other people's tracks when played consecutively on radio or in your music player
* and the iTunes files are 256 kbps AACs, so you can't hear the compression
remember that 'compression' in this context is data compression and not audio compression (which acts directly on the dynamic range of the source)
I can most certainly hear the compression when compared with CD digital audio.
I fully understand the difference between data compression and dynamic range (not audio) compression.
What I'm saying is, lossy data compressed audio formats are already compromised enough to rule it out as a medium for audiophile use. Worrying about the dynamic range at that point is moot. It's going to be played on tiny, terrible sounding speakers.
With 256kbps AAC, really? Yeah, MP3 is old, and even at 320kbps it throws away everything above 18kHz (which I can't hear personally, but some people can). However, AAC is newer and better and blows MP3 out of the water (so do Vorbis and Opus, btw). We've gotten a lot better at psychoacoustics since MP3 started out. I strongly doubt you could hear it in a proper A/B test.
Then, 24/192 is mostly a "weak signal" to help you estimate if the audio was treated with care.
16 bits: 2^16 × 0.01 ≈ 655 discrete levels
24 bits: 2^24 × 0.01 ≈ 168,000 discrete levels
To help you understand, imagine 1-bit audio. What would it sound like? For each frequency, you can only have a single volume, i.e. it's maximally compressed.
I'm not sure the 1-bit extreme helps with understanding this, particularly because the audio we'd be dealing with is a mix of many frequencies; the waveform is an aggregate of the overall 'signal' at each frequency present in the mix. This is one of the reasons multiband (rather than single-band) compression has become so popular: it allows you to get the 'max' in each frequency band rather than having one band dominate the overall result of the compression.
I think there's a difference to take into account between any given momentary sample and the overall effect. Yes, compression does reduce the dynamic range of the music, but you would need some sort of variable bit depth linked to the instantaneous amount of compression being applied to get any kind of workable lower-bit-depth encoding scheme, which seems like a lot of complexity for no significant gain (to me?).
> Compression means it's effectively using less of the [dynamic] range and the dynamics [range] could be represented with fewer bits in an optimal encoding.
Lower dynamic range can indeed be encoded with fewer bits per sample.
The meme "Math class is hard, let's go shopping" is only slightly apocryphal. Two of the voicebox lines were "Math class is tough" and "Want to go shopping?"
Because we only have 8 bits per channel, we can all see banding in (dynamically) compressed images. That's what increasing the bit depth improves.
You cannot buy good recordings in other formats, in this format you can. So there's a market - not a very big one as such, and maybe created for all the wrong reasons, but it is there.
The problem here is that the "you" in this sentence is not "me", or even an entity I can really influence, much less control.
It's not really even the audio engineer whose boss is telling him to turn up the volume so high that only the top few bits of the 16-bit recording are in use. The boss is saying this so that the song can be heard when the user turns the volume down because of the obnoxiously noisy commercials on the radio. Those commercials have found that they're more effective when they turn the volume way, way up. And they don't give a whit about the quality of their audio as long as they can sell more cars or tickets or whatever, much less the quality of the songs that play after them, much much less the quality of the songs that someone downloads to listen to offline in a nice listening environment without commercials.
The solution isn't just "turn down your compressor ratio", there's a big, hairy Nash equilibrium/politics problem that can be bypassed by offering 24/192 as a secondary product. If you want to remaster it to 16/48 after downloading, you're welcome to do so.
Doing this processing in 24-bit gives you a large margin of error to play with. There's no real point in keeping that for the recording people are going to listen to.
Yes, those high dynamic range releases usually get published in 24/192. No, the fact that they get published in 24/192 does not, as far as human hearing goes, add anything to the dynamic range or otherwise the fidelity of the recording.
Since the correlation is so strong, it is of course entirely understandable that people assume causation exists.
Is there a technical reason why the mastering is so different for the two mediums, CD versus vinyl?
More often than not these days, the same compressed master is used for the vinyl. To combat the groove-jumping problem, the overall level is simply dropped.
Thus, people started mastering CDs for loudness.
An alternative idea of mine is simpler. Loud music is considered worse than quiet music (quiet music sounds worse, but loud music does too, and it also bothers other people). So when you need to pick a volume setting for your collection, you bias towards setting it lower, so the really loud tracks don't become too loud. Thus the quieter CDs are annoying because they always sound quiet, whilst the loud CDs sound about right, because your volume setting is much more suitable for them.
I guess special purpose releases don't usually end up on the radio so they can be mastered for people who actually appreciate music. ;)
As ever this difference can be impossible to detect if the equipment and environment aren't of a sufficient fidelity/quality.
What kind of gaps? There are no gaps. Sampled signals perfectly represent the original wave up to half the sampling frequency. Analog systems are inevitably limited in their frequency response as well, so, given the same bandwidth, there would be no difference at all.
In the real world, imperfect A/D and D/A conversions are typically still far less destructive than all the mechanical and electromagnetic sources of noise that affect analog systems. You can't consider one but not the other.
I think you're right that recording equipment has a long way to go, though; regardless of format I think people can relatively easily distinguish real acoustical instruments from recordings.
Although in any case, nothing justifies the pricetag dished out to audio enthusiasts.
It's only at higher tiers that the difference might really be imperceptible to many ears.
A $200 pickup is certainly better than a $20 needle, maybe even 10x better.
The difference between a $200 pickup and one that goes for $2,000 is minuscule. There certainly is a difference, but it's never as big as between the $20 and the $200 model.
That said, there are listeners who believe in the value of a $2,000 pickup and derive a lot of enjoyment from the difference over a lesser model. Who am I to say they're wrong?
Now when it comes to a manufacturer of very expensive cables (for example): Don't make me laugh...
Two things come to mind though: voltage/current delivery of your amp, and damping ratio.
The first depends on the characteristics of your amplifier. Some are better at delivering current, some voltage. The lower impedance (32) is better suited to high-current/low-voltage sources, which includes most portable devices, phones, etc. Conversely, the higher impedance (250) is better for high-voltage/low-current sources like tube amps.
The second is about the ratio of the headphone impedance to the amp output impedance. You want a high ratio, so if your source has a large output impedance then the higher impedance headphones will sound better. Good headphone amps sometimes specify the output impedance, or you can measure it.
Headphone outputs of mixers fall into this category as well. Proper audio interfaces and sound cards have no issues at all driving 250 Ω to deafening volumes. Laptops have no issues either (for me).
You are misconstruing Monty's argument here. He is very much against mp3...in fact he says he could tell the difference between high bitrate mp3 and 16 bit 44k wav. The real point of the video is that 16 bit 44k wav is beyond sufficient...don't need to go beyond that to 24-bit 192kHz.
In the early days, when all mp3 encoders were pretty bad, I could tell which encoder produced a given file. mp3 encoders today are vastly better. I've not been able to do that party trick in a long time.
But 256+ and I certainly cannot tell the difference reliably.
Pretty much found the same thing for myself at 256+.
CD quality is the very least you could want for a serious big-club or theater system (much less an auditorium). Between peaks and the requirements for deep bass, the peak in a digital audio file is (a) much farther above the body of the music than you'd think, and (b) should never be reached, because that's clipping.
People routinely behave as if the theoretical maximum dynamic range of a Red Book CD is relevant to anything. It's incredibly easy to play back music loud enough that the noise floor gets obnoxious and relevant to listening; it's only 96 dB down. Any small system in an enclosed live space can push louder than that. Cranking music over headphones will blow way past that, and you won't even hear the peaks, but you'll be able to hear how bad the subtleties (or lack of same) on a 16-bit Red Book CD are.
Electronic music, especially live, is where high-resolution audio is totally relevant. I more than suspect some of the big-name acts (for instance, Deadmau5) are running converters to the mains at not less than 24/96. Certain synthesizer mixes are very revealing of faults in the playback. If the live performance over a modern PA sounds absolutely huge and awesome, but not strained or grainy, then they're not using CD quality. The SPLs are more than enough to make these distinctions obvious.
Anyone can get these SPLs over headphones, trivially, and headphones that'll handle it cleanly are only a few hundred dollars.
(Personally I downloaded some of those killer samples and couldn't tell the difference, but other people reliably tell them apart in an ABX test.)
I ABXed 320kbit mp3 from an uncompressed original, I think 9/10 IIRC. It was a recording of castanets, and listening for frequency response differences was useless so I keyed off of 'personality' differences in the sounds and did it that way.
I was also just as horrible at detecting wow and flutter as I was good at detecting lossy compression's 'changes of sonic personality'. That comes from my experience being with analog audio, which is less true of people these days.
The idea that 'the best' listeners cannot tell 256k from even something as limited as 16/44.1 is ridiculous. Any mp3 format… any lossy format… is seriously compromised. Talk to techno/house DJs and producers about how useful it is to perform off mp3s, this is not a hypothetical argument.
Usually what is happening here is you're getting a master that hasn't been compressed to death (see loudness wars). Vinyl is a shit source for 'quality', but most records aren't compressed to death so they can still sound better on a good set of speakers due to the dynamic range.
* With analogue equipment the latency between the input and output of the effect is often below 1 ms (unless the effect is supposed to have a delay).
* Standalone digital effect equipment that processes the samples one at a time can also have a latency below 1 ms (one sample at 48 kHz ≈ 0.02 ms).
* If you use a computer for real-time effects, the samples are not transferred sample by sample from the audio interface, but in a block of many samples. The number of buffers and the size of them can usually be changed by the user. With a buffer of 1024 samples, the oldest sample is already about 21 ms old before the computer can process it. After the buffer is processed it has to be transferred to the audio interface, and that will add another 21 ms. So the minimum latency for any real-time effect in the computer is about 42 ms at 48 kHz if the size of the buffers is 1024 samples. Often it is much worse because the operating system adds more latency. If the equipment can handle a sample rate of 192 kHz, the same latency is about 10 ms. If the computer can handle smaller buffers, the latency can be lowered. With 256 samples per buffer the minimum latency will be about 11 ms at 48 kHz and 3 ms at 192 kHz.
So well prepared, so well presented, so little that could be removed without ruining it.
I aspire to do such good demos but always fall so short.
Setting aside the questions embedded in the profession, and the perception and self-perception of the male geek working and academic world: I rarely see a presenter react to a sense of apparent human warmth in the room and beyond the lens. Even with the most encouraging assistance behind the lens, that is genuinely hard to do. Hard enough that I think it is a classic contribution to the stereotype of the inflated-ego newsreel presenter, which Hollywood loves to satirise -- in my opinion because Hollywood is mocking, from its narrow and insecure view, a subspecies of acting which, when done well, can capture a far greater audience than many serious actors ever manage.
This is more than a little bit of geek knowhow and applied thought, but I think many geeks, by virtue of sheer analysis without the obstruction of ego, could handily outperform the supposedly inherent talent they are "meant" to possess. It may reach well into "real serious" acting, very easily. I don't pretend to be a judge of that, but if acting ability is "I know it when I see it", this is excellent acting indeed.
Edit, is not was, first line. A comma for clarity but later on.
The camerawork was excellent and the demonstration integration/trinket-wielding was seamlessly done. I get the impression the people who made this had a case of "let's use the ThinkPad for this, and let's do it this and that way", and they pulled it off perfectly -- exactly how engineers wish things would happen.
If you ever need a reference demo for "open-source software can make good-looking presentations," this would be on the shortlist, I think. (The credits say it was made using Cinelerra, for reference.)
I have said the same thing. I was a film major, a video producer, and a tech writer (now a programmer), and I am in awe.
Also check out the making-of:
The sound was done similarly to the previous video, which hadn't drawn comments. This time though, a sizable fraction of people said it was distracting.
It is the same with most "well-written" articles in newspapers nowadays. We do not even notice it, but it's just canned food: a lot of jelly and chemical flavour enhancer, with nearly zero real meat inside.
The problem whenever somebody writes about digital audio is that it is very tempting to hold on to sampling theory (Nyquist limit, etc.) and totally discard the problems of implementing an actual analog-to-digital and digital-to-analog chain that works perfectly at a 44100 Hz sample rate.
I agree with the assessment that 16-bit depth is good enough; even 14-bit is good enough and was used with good results in the past (!). However, the problem is with the sampling rate.
> All signals with content entirely below the Nyquist frequency (half the sampling rate) are captured perfectly and completely by sampling;
Here lies the problem. This is what theory says; however, with a 44kHz sample rate, capturing the audio means low-passing at 22kHz. And this is not your gentle (6, 12 or 24dB/octave) low-pass filter; no, this needs to be HARD filtering; nothing should pass beyond 22kHz. And it must be in the analog domain, because your signal is analog. To implement such a filter, you need a brickwall analog filter, and this is not only expensive, it also makes a mess of the audio: 'ringing' effects and/or ripple in the frequency response and/or strong phase shifts.
So for analog-to-digital conversion in 2017, converters should be operating at a higher rate (say, 192kHz), because this makes analog filtering of the signal much easier and avoids those side effects.
Now, for Digital-to-Analog, if your sample rate is 44KHz, you have two alternatives:
a) Analog brickwall filtering, with the problems noted above
b) filtering on the digital domain + using oversampling
the article mentions:
>So the math is ideal, but what of real world complications? The most notorious is the band-limiting requirement. Signals with content over the Nyquist frequency must be lowpassed before sampling to avoid aliasing distortion; this analog lowpass is the infamous antialiasing filter. Antialiasing can't be ideal in practice, but modern techniques bring it very close. ...and with that we come to oversampling."
So they are mentioning alternative (b). The problem is that oversampling does not solve all problems. Oversampling implies that the filtering is done on the digital domain and there are several choices of filtering you could use, for example FIR (Finite Impulse Response), IIR (infinite impulse response), etc.
And each one of these choices has side effects...
In short, the problem is that with a 44kHz sampling rate, your filter cutoff (22kHz) is too close to your desired bandwidth (20Hz-20kHz). Using a sample rate of 192kHz gives the DAC designer much more leeway for a better conversion. And CONVERSION is the key to good digital sound.
>What actually works to improve the quality of the digital audio to which we're listening?
It is interesting that the author mentions things such as "buying better headphones" (agreed), but he never mentions "getting a better digital-to-analog converter", which is highly important!
On the other hand, he backs up his claim that "44kHz is enough" with an interesting AES test I was already aware of:
>Empirical evidence from listening tests backs up the assertion that 44.1kHz/16 bit provides highest-possible fidelity playback. There are numerous controlled tests confirming this, but I'll plug a recent paper, Audibility of a CD-Standard A/D/A Loop Inserted into High-Resolution Audio Playback, done by local folks here at the Boston Audio Society.
This is a very interesting paper, and I do have a copy; however, the test equipment should be checked. There are systems and there are better systems. The AES paper cited above had the particularity that the ADC and DAC used were provided by exactly the same machine (a Sony PCM converter), with the same strategy: no oversampling, brickwall analog filters. I can bet (99% sure) that the brickwall filters were identical on the ADC and the DAC of that machine: packaged Murata-brand filters.
The devil, as they say, is in the details.
> Oversampling is simple and clever. You may recall from my A Digital Media Primer for Geeks that high sampling rates provide a great deal more space between the highest frequency audio we care about (20kHz) and the Nyquist frequency (half the sampling rate). This allows for simpler, smoother, more reliable analog anti-aliasing filters, and thus higher fidelity. This extra space between 20kHz and the Nyquist frequency is essentially just spectral padding for the analog filter.
> That's only half the story. Because digital filters have few of the practical limitations of an analog filter, we can complete the anti-aliasing process with greater efficiency and precision digitally. The very high rate raw digital signal passes through a digital anti-aliasing filter, which has no trouble fitting a transition band into a tight space. After this further digital anti-aliasing, the extra padding samples are simply thrown away. Oversampled playback approximately works in reverse.
> This means we can use low rate 44.1kHz or 48kHz audio with all the fidelity benefits of 192kHz or higher sampling (smooth frequency response, low aliasing) and none of the drawbacks (ultrasonics that cause intermodulation distortion, wasted space). Nearly all of today's analog-to-digital converters (ADCs) and digital-to-analog converters (DACs) oversample at very high rates. Few people realize this is happening because it's completely automatic and hidden.
The main point of the article is to argue that storing or transmitting music above 16-bit, 48 kHz is wasteful and potentially harmful. It still fully condones using higher specs for audio capture, editing, and rendering.
Of course it is acceptable. Even 14 bit audio at 36KHz with a great DAC would be fairly nice, acceptable.
What the article claims is that 192kHz is useless, of no benefit. And I contend that it is of benefit when you want more than just good or acceptable performance. Not if you have a run-of-the-mill DAC and OK headphones/speakers, but it is if you are a music lover and critical listener.
It doesn't matter if you're a music lover or critical listener!
The article claims that 192KHz downloads are of no benefit. It's right there in the article's title. It's difficult to not accuse you of willfully misinterpreting his argument.
>So they are mentioning alternative (b). The problem is that oversampling does not solve all problems. Oversampling implies that the filtering is done on the digital domain and there are several choices of filtering you could use, for example FIR (Finite Impulse Response), IIR (infinite impulse response), etc.
> And each one of these choices has side effects...
Citation needed here: oversampling solves virtually all the problems, and with modern DSP the FIR filters can be made extremely good. The induced noise of modern ADCs/DACs is seriously tiny, and is swamped by the recording noise of your audio.
That is no reason to store or even process your music at higher rates, though.
You are describing alternative (b) I mentioned above: digital filtering plus oversampling. This also isn't without side effects.
Oversampling a 44kHz signal is not the same as having 192kHz material to start with. Very different.
In practice, signal quality issues are usually layout and supply issues, not problems with the converter itself.
In practice, the speakers and ears are the worst parts with the largest non-linearities and a frequency response that looks like a nuke test area compared to the FR of the converter. Of course, in the case of speakers, we have concluded that we want them to have their own sound, because neutral speakers are not possible.
(I tend to avoid discussing the other critical parts — original recording quality and the listener's ears — because these are often immutable constants.)
Consider a dog that lives with a musician who plays, for example, trumpet. The musician plays the trumpet at home to practice, and also records his practices to review.
A trumpet produces significant acoustic energy out to about 100 kHz. When the musician plays live, the dog hears a rich musical instrument. When the musician plays back his recordings, half the frequency range the dog could hear in the live trumpet will be gone. I'd imagine that this makes the recorded trumpet a lot less aesthetically pleasing to the poor dog.
I only say this because most people are happy listening to music on CDs, but in the presence of a live band (e.g. an orchestra) it is suddenly obvious how incredibly loud it is. My brother is a drummer and I find him incredibly loud; I am a bass player and I don't play loud, although he sometimes complains that there's "too much bass". Perhaps we each go deaf in our respective parts of the audio spectrum.
Both of those huge efforts have gone into controlling the humanly audible part of the sound. Whatever sound is accidentally produced at other frequencies is probably at best aesthetically neutral, but more likely distracting.
Though my guess is that trumpeting is just noise to dogs either way.
Then I'll be mighty glad we made all these high-res recordings.
What if this actually becomes possible, but we discover that because we previously couldn't hear these frequencies, our instruments and equipment are HORRIBLY mis-tuned and sound terrible? We may end up having to re-record tons of stuff.
Something something premature optimization. And part of me is glad that the art of hand-making instruments is not yet lost; we might need the originals in the future.
Disclaimer: I say this as a completely naive person when it comes to instruments. The answer to this may be "if it wasn't built to resonate at frequency X, it won't by itself," which would be a good thing.
There's another effect that comes into play, though. There's a minimum pitch separation between simultaneous notes that we expect, and when notes are closer than that, they clash. That separation is usually around a minor third (~300 cents) in most of the human hearing range, but in the bass it's a lot wider, and in the high treble it's smaller. That's why you can play two notes a major second apart (~200 cents) on a piano in the high treble and it sounds okay, but down in the low bass it sounds muddy if they're closer than about a major third or perfect fourth (~400-500 cents). So, if we extrapolate into higher frequency ranges, then it's not unreasonable to expect that we would be able to interpret musical intervals that are a lot closer than 200 cents as consonant.
It's also possible that the minimum note separation thing is just an artifact of how our ears physically work, and that an artificial ear would have no such limitation. Which could open the possibility of enjoying kinds of music that we can't currently imagine as pleasant with our normal ears.
And if they were (such as in 96 kHz hi-res audio), you could just run it through a low-pass filter to strip off the higher frequencies.
And... heh, using a filter to strip out the audio we used all that extra filesize to deliberately store. Haha. :)
The good news is that you can strip out all of this robot propaganda and still hear the exact same music, simply by encoding at a reasonable rate.
Have we asked the dogs?
The frequencies there are no longer musical in the sense of being recognized as musical notes.
That is to say, although in a pure arithmetic sense the frequency 14080 Hz is an A, since it is a power-of-two multiple of 440 (five octaves above it), we don't hear it as an A tone.
The frequencies in that ballpark just have to be present to add a sense of "definition" or "crispness" or "air" to the sound.
In fact, this can be faked!
In the Opus codec, there is a technique along the lines of spectral band replication: basically a hack whereby the upper harmonics of the signal are completely stripped away and then re-synthesized on the other end from a duplicate of the lower harmonics, or something like that. The listener just hears the increased definition.
The thing is, the higher-sample-rate data doesn't actually have a lot of content above 20 kHz.
What the faster sample rate allows is to use a less aggressive filter.
Instead of a "brick wall" filter that rapidly cuts off after around 20 kHz, one with fewer poles can be used which rolls off more gently.
The higher sample rate ensures that there isn't any aliasing.
192 kHz audio does not reproduce flat up to 90 kHz.
(I'm going to gloss over the microphone used on the trumpet, or the loudspeakers which ultimately reproduce it.)
However, all the musicians I know use these high-res formats internally. The reason is that when you apply audio effects, especially complex VST ones, these discretization artifacts noticeably degrade the quality of the result.
Maybe, the musicians who distribute their music in 24/192 format expect their music to be mixed and otherwise processed.
Not 192 kHz; no friggin' way.
Repeated processing through multiple blocks at a given sample rate does not produce cumulative discretization problems in the time domain; it's just not how the math works.
Both your inputs (ADC) and outputs (DAC) are fixed-point. Why would you want to use a floating point in between? Technically, 64-bit floating point format would be enough for precision. But that would inflate both bandwidth and CPU requirements for no value. 32-bit floating point ain’t enough. Many people in the industry already use 32-bit integers for these samples.
> Not 192 kHz; no friggin' way.
I think you’re underestimating the complexity of modern musician-targeted VST effects. Take a look: https://www.youtube.com/watch?v=-AGGl5R1vtY
I’m not an expert, i.e. I’ve never programmed that kind of software. But I’m positive such effects are overwhelmingly more complex than just multiply+add these sample values. Therefore, extra temporal resolution helps.
BTW, professionals have been using 24bit/192kHz audio interfaces for over a decade. E.g. the ESI Juli@ was released in 2004, and it was a very affordable device back then.
> Why would you want to use a floating point in between?
Because 32-bit float has enough mantissa bits to represent all 24-bit integer fixed-point values exactly, so it is at least as good.
Because 32-bit float is friendly to vectorization/SIMD, whereas 24-bit integer is not.
Because with 32-bit integers, you still have to worry about overflow if you start stacking like 65536 voices on top of each other, whereas 32-bit float will behave more gracefully.
Because 32-bit floating-point audio editing is only double the storage/memory requirements compared to 16-bit integer, but it buys you the ultimate peace of mind against silly numerical precision problems.
If you quiet the amplitude by some decibels, that is just decrementing the exponent field in the float; the mantissa stays 24 bits wide.
If you quiet the amplitude of integer samples, they lose resolution (bits per sample).
If you divide a float by two, and then multiply by two, you recover the original value without loss, because just the exponent decremented and then incremented again.
(Of course, I mean: in the absence of underflow. But underflow is far away. If the sample value of 1 is represented as 1.0, you have tons of room in either direction.)
Fixed point arithmetic is non-trivial and not well supported by CPU instruction sets.
(Hint: you can't just use integer add/multiply.)
> I think you’re underestimating the complexity of modern musician-targeted VST effects. I’ve never programmed that kind of software. But I’m positive such effects are overwhelmingly more complex than just multiply+add these sample values. Therefore, extra temporal resolution helps.
Indeed, many audio effects require upsampling to work well with common inputs, e.g highly non-linear effects like distortion/saturation or analog filter models.
However usually they perform upsampling and downsampling internally (commonly between 2x-4x-8x).
While upsampling/downsampling is expensive (especially if you are using multiple of these types of plugins), it's not clear whether running at a higher sample rate across the board is worth it just to save those steps.
But it's not resolution, right? It's extra frequencies outside the audible range. Is there any natural process that would make those affect the audible components, if I were listening to the music live instead of a recording?
If a sonic and ultrasonic frequency are combined together, but a low pass filter doesn't pass the ultrasonic one, the ultrasonic one doesn't exist on the other end.
Hence, there can be no beat.
The main reason is that it solves clipping in the pipeline.
Because if you don't you accumulate small errors at each processing step due to rounding. Remember that it is very common for an input to pass through multiple digital filters, EQs, some compressors, a few plugins, then to be grouped and have more plugins applied to the group. You can end up running the sample through hundreds of equations before final output. Small errors at the beginning can be magnified.
Pretty much all pro-level mix engines use 32-bit floating point for all samples internally. This gives you enough precision that there isn't a useful limit to the number of processing steps before accumulated error becomes a problem. By all samples I mean the input comes from a 24-bit ADC and gets converted to 32-bit FP. From that point on all plugins and processes use 32-bit FP. The final output busses convert back to 24-bit and dither to feed the DAC (for higher-end gear the DAC may handle this in hardware).
As for 192 kHz I've never seen or heard a difference. Even 96 kHz seems like overkill. A lot of albums have been recorded at 48 kHz without any problems. As the video explains there is no "missed" audible information if you're sampling at 48 kHz. I know that seems counter-intuitive but the math (and experiments) bear this out.
An inaccurate but intuitive way to think about it is your ear can't register a sound at a given frequency unless it gets enough of the wave, which has a certain length in the time domain (by definition). If an impulse is shorter than that, then it has a different frequency, again by definition. 1/16th of a 1 kHz wave doesn't actually happen. Even if it did, a speaker is a physical moving object and can't respond fast enough to make that happen (speakers can't reproduce square waves either, for the same reasons - they'll end up smoothing it out somewhat). Even if it could, the air can't transmit 1/16th of a wave - the effect will be a lower-amplitude wave of a different frequency. And again, your ear drum can't transmit such an impulse (nor can it transmit a true square wave).
I've done a lot of live audio mixing and a little bit of studio work, including helping a band cut a vinyl album. Fun fact: almost all vinyl is made from CD masters and has been for years. The vinyl acetate (and master) are cut by squashing the crap out of the CD master and applying a lot of EQ to shape the signal (both to prevent the needle from cutting the groove walls too thin), then having the physical medium itself roll off the highs.
The only case where getting a 24-bit/192kHz recording might be worthwhile is if it is pre-mastering. Then it won't be over-compressed and over-EQ'd, but that applies just as well to any pre-master. (For the vinyl we cut, I compressed the MP3 version myself from the 24-bit 48 kHz masters, so the MP3s had the best dynamic range of anything: better than the CD and far better than the vinyl.)
But no, musicians aren't releasing things at ultra-resolutions because they expect others to reuse their work. The ones that are, are providing multitracks.
That isn't entirely true. For example, it's common for an audio DSP to use fixed-point 24-bit coefficients for an FIR filter. If you're trying to implement a filter at low frequency, there can be significant error due to coefficient precision; that error is reduced by increasing the sampling rate.
It can be useful to run your signal processing chain at a higher rate because many digital effects are not properly bandlimited internally (and it would be pretty CPU hungry to do so).
But that doesn't mean you need to store even data that you'll process later at 192KHz, though it might be easier to do so.
Oversampling in studio recording is mostly about eliminating aliasing in software that's producing distortion, and it's only relevant in that context: I don't think it's nearly so relevant on, say, an EQ.
That's an anthropocentric statement.
I want to leave it at that for now just to see if I get dinged for low effort.
Anyway, check out the frequency range for other animals:
Notice that there are a number of species whose range extends well past 20kHz. Even with 192kHz you're still chopping off the upper end of what dolphins and porpoises can hear and produce.
So please convince Apple and friends that you need 200+kHz to truly capture the "warmth" of Neil Young's live albums. Then we'll be able to crowdsource recording all the animals and eventually synthesize the sounds to communicate with them all.
Maybe then we can synthesize a compelling, "We're really sorry, we promise we're going to fix all the things now," for the dolphins and convince them not to leave. :)
All this comes at a high computational and storage cost though.
I personally use 44khz 24bit settings for DAW use.
Music is different. If you have the original multi-track composition, you can’t reproduce the result unless you also have the original DAW software (of the specific version), and all the original VST plugins (again, of the specific version each).
All that software is non-free, and typically it’s quite expensive (esp. VSTi). Some software/plugins are only available on Windows or Mac. There’s little to no compatibility even across different versions of the same software.
Musicians don’t support or maintain their music. Therefore, DAW software vendors don’t care about open formats, interoperability, or standardization.
BTW, I'm merely a dilettante when it comes to recording and especially mixing.
I know and understand how incorrect downsampling from high sample rates can cause distortion in the form of sub-harmonics in the audible range.
I know about audible dynamic range and how many decibels of extra range 8 extra bits are going to give you.
I know all this, but I still have to admit: if there's a hi-res recording (24-bit, >48kHz) available for download/purchase, I'll always go for that instead of the "regular" 16-bit 44.1/48kHz download. I guess escaping your former audiophile self is very, very hard.
Anyone else want to admit their guilty, stupid pleasures? :)
I'm up to about 300 different models.
Deriving pleasure from listening to music has a large subjective component. So if I've paid more for a track and / or I got the most amount of bits I could I'll probably enjoy it more. Also makes for great conversation topics.
I also have some gear that aliases at a volume you can't hear, but when you plug it into an analog distortion pedal, the aliasing in the high frequencies becomes apparent. This would be avoided if it had a higher sample rate so the aliasing was far out of the audible range.
For other sorts of effects, like spectral processing and pitch shifting, the extra detail above 22kHz really does make a difference, especially when pitching down.
• Less quantisation noise, so your noise floor is a bit lower and therefore, when you're mixing stuff together, you accumulate less noise
• More numerical room to play with, you can scale or shift the volume up and down more without clipping and with less loss of accuracy
With 16-bit CD audio, you can't just convert to 24-bit and suddenly have less noise. You might get more room, though.
As for higher sampling rates (more kHz, if you will), I think Monty mentioned some benefit regarding less sharp anti-aliasing filters (the transition band can stretch from 20kHz to 48kHz, say, rather than from 20kHz to 22.05kHz), but it's not something whose benefit I understand well.
Specifically, if 24/192 AAC is worth it compared to 16/44.1 AAC (and the answer to that is yes, although the answer to 24/192 WAV is no)
The point of higher sample rates isn't to capture bat squeaks, but to relax the engineering constraints on pre-post filtering.
Nyquist is fine in theory, and if you've never actually tried to implement a clean filter you'll likely watch the xiph video and think "Well that makes sense."
If you actually know something about practical DSP and the constraints and challenges of real filter design, you're not going to be quite so easily impressed.
Likewise with higher bit depths. "Common sense" suggests that no one should be able to hear a noise signal at -90dB.
Common sense is dead wrong, because the effects of a single bit of dither are absolutely audible.
And if you can hear the effects of noise added at -90dB, you can certainly hear the effects of quantisation noise artefacts on reverb tails and long decaying notes at -30 to -40dB, added by recording at 16 bits instead of 24 bits.
Whether or not that level of detail is present in a typical pop or classical recording is a different issue. Realistically most music is heavily compressed and limited, so the answer is usually "no."
And not all sources have 24 bits of detail. (Recordings made on the typical digital multitrack machines used in the '80s and '90s certainly don't.)
That doesn't mean that a clean unprocessed recording of music with a wide dynamic range made on no-compromise equipment won't show the difference clearly.
Speaking from experience, it certainly does.
Technically no sources have 24 bits of detail. The best you'll get from a real-world converter is around 22 bits.
What video? This thread is about an article.
> The point of higher sample rates isn't to capture bat squeaks, but to relax the engineering constraints on pre-post filtering.
I just said that, I think.
As Monty demonstrates, it's a fraudulent waste to try to sell the result as a product to the end listener.
The article is highly technical. Does anyone have a way to describe this phenomenon intuitively?
Therefore, rather than just being useless extra data, it can be actively harmful to the listening experience.
- 480p vs 2160p is like measuring the resolution of your cell phone propped up on a pillow at the other end of your living room
- experimental evidence shows that your eyesight is not good enough to pick up on the increased resolution; you've maxed out your sensory perception
- your phone stutters trying to stream at 4K, so the playback might actually be worse
When the high resolution image is downscaled poorly, some high (spatial) frequencies are aliased down into lower (spatial) frequencies, manifesting as blocky/jagged lines. The images are 2D data in the spatial domain, while the audio is 1D data in the temporal domain, but the aliasing of high frequency signal to lower frequency artifact is similar.
Viewing a high resolution image on a panel that has enough pixels to display it without scaling is analogous to listening to 192 kHz audio on a fantastic speaker system that can reproduce those high frequencies accurately, instead of causing distortion by aliasing them to lower frequencies. On the other side, viewing a high resolution image which has been downscaled poorly is analogous to listening to that 192 kHz audio on a realistic speaker system that cannot reproduce high frequencies, which results in those signals aliasing down into the audible range.
And as you say, there is a point where, for the viewer/listener's sake, it doesn't make sense to push for higher frequencies because even if you can build a panel/speaker that will faithfully reproduce those frequencies without aliasing, the eye/ear will not be able to perceive the additional detail.
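The audio side of that analogy is easy to demonstrate in a few lines (a sketch assuming numpy/scipy; the 30 kHz tone is an arbitrary choice): naive decimation folds an inaudible tone down into the audible band, while a proper resampler filters it out first.

    import numpy as np
    from scipy.signal import resample_poly

    fs = 192000
    t = np.arange(fs) / fs
    x = np.sin(2 * np.pi * 30000 * t)   # 30 kHz tone: ultrasonic, inaudible

    naive = x[::4]                      # drop to 48 kHz with no filtering
    clean = resample_poly(x, 1, 4)      # low-pass below 24 kHz, then decimate

    def peak_hz(sig, rate):
        return np.argmax(np.abs(np.fft.rfft(sig))) * rate / sig.size

    print(peak_hz(naive, 48000))   # ~18000: the tone aliased to 48k - 30k = 18 kHz
    print(peak_hz(clean, 48000))   # no real peak left; only filter residue remains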
Yet people have been reliably _unable_ to do this.
The gold standard in listening tests for this is an ABX where you are simply trying to show that you can discern a difference.
When properly setup and calibrated people are unable to show that they can distinguish 48kHz and 192kHz.
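For what "show that they can distinguish" means in practice: an ABX run is scored against the odds of doing that well by guessing. A small sketch (the trial counts are made up):

    from math import comb

    def abx_p_value(correct, trials):
        # Probability of getting at least this many right by coin-flipping
        return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

    print(abx_p_value(12, 16))   # ~0.038: unlikely to be luck, a difference was heard
    print(abx_p_value(9, 16))    # ~0.40: entirely consistent with guessing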
Moreover, by the numbers, audio hardware tends to perform worse at higher rates when there's any difference at all, because running converters at higher rates makes their own internal noise shaping less effective. (Though for anything done well, the distinction should still be inaudible.)
Finally, if you do have cruddy converters that sound significantly different at different rates, nothing stops you from using a transparent software resampler (SSRC is well respected for good reason) to put the audio at whatever rate you want... until you get better hardware. :)
That would have to be the noise. The math doesn't lie...
When you play something stored at 44.1 kHz, there are no ultrasonic sounds recorded.
At 192 kHz, there are, and the speakers may push some of the ultrasonic sounds down into the audible spectrum, causing distortion that the human ear can hear.
Imagine the normal operation of a speaker as a swing, except you are pushing and pulling the swing all throughout the cycle as it goes up and down. Now, you can technically move the swing at a variety of frequencies if you’re holding onto it the whole time. However imagine as you push it back and forth (low frequencies), you also vigorously shake the swing at the same time (high frequencies). This would probably result in the chains rattling, similar to the unwanted distortions in the speakers caused by ultrasonic frequencies.
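The rattling in that analogy is intermodulation distortion, and it's easy to simulate (a toy sketch; the x + 0.1*x**2 term is an arbitrary stand-in for a slightly nonlinear driver): two ultrasonic tones produce a difference tone squarely in the audible band.

    import numpy as np

    fs = 192000
    t = np.arange(fs) / fs
    x = 0.5 * np.sin(2 * np.pi * 24000 * t) + 0.5 * np.sin(2 * np.pi * 25000 * t)

    y = x + 0.1 * x ** 2            # mildly nonlinear "speaker"

    spec = np.abs(np.fft.rfft(y))   # one-second signal, so bin k = k Hz
    print(spec[1000] / spec.max())  # ~0.05: a 1 kHz (25k - 24k) product appears in the audible band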
Another way to look at is that digital audio is not like digital imaging. There aren't pixels. Increasing the data rate does not continue to make the waveform more and more detailed to human auditory perception in the way that raising the pixel density does for human visual perception.
To describe it intuitively, forget your intuition that audio is like visual and start from "there is no metaphor between these two things."
Even analog imaging has concerns that are better described in the frequency domain than the spatial domain:
The difference is audio is "actually" bandlimited and frequency-based, but images are "actually" spatial. When you try using frequency math on images you get artifacts called ringing, which are easy to see with JPEGs or sharp scalers like Lanczos.
Of course audio isn't really frequency-based either, or else it would just be one sound repeated forever. So there's still ringing artifacts (called "pre-echo") and cymbals are the first to go in an MP3.
I.e. audio is sampled along one dimension, while images are sampled along two dimensions. Note that frequency-domain considerations play a crucial role in all optical design, including imaging sensors.
Early low data rate codecs - such as the one used for GSM mobiles - are obviously inferior, but still functional. I think a better analogy is that an iPhone 7 has a 1 megapixel screen, so there's no difference between a 1 megapixel image and a 5 megapixel image, except one is much larger. Of course visually you can zoom in (or move closer in real life), but audibly you can't.
For the mathematically inclined, this would probably be a good time to repeat: pixels are not little squares.
Digital distortion basically simulates high gain with soft clipping, which basically takes a narrow slice of the amplitude domain and magnifies it. The extra resolution has to be there for that not to turn into ugly sounding shit with artifacts.
The smallest time shift a PCM channel can resolve is on the order of 1 / (2 * pi * bandwidth * 2 ^ bitdepth).
So for full scale 44/16 signal it is about 0.1 nanosecond.
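That figure checks out (a one-liner, using 22.05 kHz as the bandwidth of a 44.1 kHz channel):

    from math import pi
    print(1 / (2 * pi * 22050 * 2 ** 16))   # ~1.1e-10 s, i.e. about 0.1 ns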
More here. There are also some sample files for ABX tests there:
 - https://hydrogenaud.io/index.php/topic,108987.msg896449.html...
I haven't had time to check the linked article but if it's the one I'm thinking of I'm pretty sure it goes into this.
In the end though, double blind testing (again, I think mentioned in the article) shows that there's a threshold (a bit below 48kHz, which is probably why 44.1kHz was chosen for CD audio) above which people can't distinguish higher sample rates any better than by randomly guessing.
If we keep that in mind, then the setup of the problem changes significantly and most arguments made here do not apply to the download itself.
320kbit CBR MP3s were... I'd say, generally OK if you did not plan on skewing their tempo much outside of a very small range (give or take 5% speed). Really bad artifacting becomes audible quickly beyond that range. Maybe it was placebo, but I also found differences between FLAC and 320kbit MP3 discernible when working on big, more powerful speakers.
But, with FLAC, it didn't matter if the sound was played at 10% tempo or extreme amplitude, audio was always crisp, clear, and free of compression artifacts (obviously).
On my laptop speakers or earbuds, no way would I be able to tell the difference between any of these today.
This particular article doesn't touch on compression either, which is probably the biggest thing for remixing or DJing. You really want uncompressed audio, or at least high-bitrate AAC.
Could you expand on this?
I don't hear a difference between various rips that are 16/128 or 24/192, but I have noticed a difference listening to the Blu-ray version of it (which is many, many gigabytes in size). It is definitely an interesting experience, but the way I can describe it is as an absolute absence of noise.
Every single version but this one exhibits noise at the start (the heartbeat sound) as the sound goes from very quiet to loud.
But, to be fair, it could just be different masters.
That explains why some people can tell the difference between 44.1 kHz and the higher-resolution sampling rates. It also means that the ideal audiophile sampling rate is somewhere between 58 kHz and 60 kHz, not 192 kHz.
Say I've got an 18kHz wave and a 9kHz wave, and the 9kHz wave is ever so slightly out of phase. Then imagine there are 10 different waves under 20kHz all interfering with each other in different ways.
Is it still possible to reproduce everything accurately?
And on bit-depth and dynamic range: Given that much audio doesn't use the full range available to it, wouldn't higher bit depth increase the fidelity in the range the audio does fill? The article talks about the bit depth and range only in terms of the maximum volume but what about fidelity? What's the minimum difference in volume the human ear can hear?
With a perfect DAC, and perfect filters, and expensive headphones, there is no difference between 16bit/44.1kHz WAV and 24bit/192kHz WAV (until you add an EQ, then there is).
With a usual home DAC, and its filters, and at best $100 headphones, there is a major difference between 16/44.1 AAC and 24/192 AAC.
(And guess what the article attacked? The decision of Apple to sell lossy files in 24/192 for home usage. The discussion was never about lossless files, or professional usage)
Sound is linear, so yes. Any bandlimited signal, phase relationships included, can be represented exactly as a sum of frequencies.
> wouldn't higher bit depth increase the fidelity in the range the audio does fill?
When you do the maths, quantisation turns into the original sound plus quantisation noise. The bit depth determines the noise level (dither helps here). So it's a question of whether the noise is loud enough to hear. There are samples on the internet you can try, to determine for yourself at which point you start to notice the noise. It's shocking how few bits are actually required.
For a sloppy recording, or with repeated processing, the quantisation noise goes up, so more bits are required.
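The rule of thumb that falls out of the quantisation-noise maths is roughly 6 dB of signal-to-noise per bit; for a full-scale sine, SNR = 6.02*N + 1.76 dB:

    # Theoretical SNR of an ideal N-bit quantiser driven by a full-scale sine
    for bits in (8, 12, 16, 24):
        print(bits, round(6.02 * bits + 1.76, 1))   # 8: ~50 dB, 16: ~98 dB, 24: ~146 dB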
> It is also worth mentioning that increasing the bit depth of the audio representation from 16 to 24 bits does not increase the perceptible resolution or 'fineness' of the audio. It only increases the dynamic range, the range between the softest possible and the loudest possible sound, by lowering the noise floor. However, a 16-bit noise floor is already below what we can hear.
But I don't understand, why is this? Is the step in volume between two values fixed regardless of the bit depth? If so, why?
You can think of it as: the sampling rate provides a mathematical solution to the waveform; within the band limit, it's simply 100% accurate all the time. The bit depth adds extra "buckets" to divide the volumes into. If you have 2 bits, you have volume levels 0, 1, 2, 3.
Do you not hear anything? Yeah... all that sound is what amounts to lost fidelity when you down-sample. You can't just argue on bit-rates alone... It's about having head room in the mix and room for more fidelity. Sure, if you mix shit badly, you can't hear the difference, but that's missing the point entirely.
It's not nearly as bad as the time someone tried to tell me that Opus was an adequate audio file format for music, but still... frustrating.
I know I don't have super-human hearing, and I can easily hear the difference. Just because everyone can't hear the difference, or can't tell that they can hear the difference, doesn't negate the fidelity loss of lower bitrates.
Further, given a large enough speaker stack, it's not about what you can hear any more, it's about what you can feel.
And never mind the benefits of a 24-bit DAW pipeline... Hello, low latency?
The higher frequencies (hi-hats, for example) are mushy and sound like they are warbling, and the sound just generally has a lack of depth. What is the explanation for this, or am I just imagining it?
Sirius XM might be one of the last to standardize on a reasonable loudness target of -12 to -16 LUFS.
Bandwidth allocation is per channel, talk only channels are even worse and some words are barely comprehensible.
It's via bluetooth, and the source is Pandora. I even googled Pandora and pitch change, and found some forums with folks discussing bluetooth causing this to happen, which has never been my experience.
It occurred to me that slowing the playback (e.g., causing the pitch to sag) could be a fantastic way of dealing with a connection that's too slow. It would be far better than stuttering, even though many of us would find the pitch change really annoying. The instructor and my wife simply can't hear it, or say "I thought that was just part of the song".
Anyway, has anyone ever heard of this happening? Do certain products do it? Were the Pandora forums wrong and this is actually a Pandora problem? Of all of the bluetooth problems I've had in my life, I've never experienced this before, so I tend to lean towards it being a data rate issue on the cell connection and Pandora slowing the music down.
Many Bluetooth speakers are not lossless. Cheaper ones are likely to support MP3 and that's it (or even worse, just the minimum required SBC).
Pandora is already compressed; if your instructor doesn't know to flip the "HQ sound" toggle (which is turned off by default on mobile, even with a paid subscription!), then 128kbit MP3 recompressed to MP3 yet again is going to sound pretty atrocious.
Not sure if one of the other codecs used for A2DP can cause pitch changes though.
To your point, I would lean towards Bluetooth, as I've never associated the phenomenon with Pandora. My friends and I mostly use Spotify.
I would expect the samples to be interpolated before the final DA conversion takes place, so no extra ultrasonics should be involved. And there would be filters for them anyway, we still have to use them for 44.1, 48, 96 etc.
Who knows, maybe technology or genetic changes will allow future humans to hear in a wider range than we do now. Those people might appreciate being able to hear what our music really sounded like.
In general, bandwidth is more of a concern than disk space. Most media people consume is never stored on the local device.
And if you do any downsampling, or any low pass filtering in general, then having 192kHz sampling is even more useless, in addition to only having the "benefit" of adding a frequency band to the spectrum that's completely inaudible anyway.
For markets with people that are more likely to care about sound quality, though, a much larger dynamic range is preserved. This is why the same album often sounds better on vinyl than on digital media. It has nothing to do with the media, it's the superior mastering that was consciously chosen.
The Wikipedia article on the Loudness War offers a good explanation.
* Well, technically, music with compressed dynamics has less entropy, so it can be encoded at a lower bitrate without loss.
 See this database for example: http://dr.loudness-war.info
Is it that Spotify changes the dynamics for the lower- and higher-quality encodings of the songs?
I only use higher bitrates when creating music: when I mix or record, I don't need to push the levels almost into the red to get a hot enough signal, and I can make all kinds of changes (slowing down or speeding up, distortion, EQ, compression) without losing any perceived detail in the end. If I record at 16/44.1 and then start manipulating the sound, I start losing detail immediately.
But in the end, more than 20/88.2 won't do much for you. 16/44.1 might be a bit less than ideal for music with a lot of dynamics, but it's absolutely fine for most purposes.
Unfortunately I lost a lot of hearing through concerts, headphones, and traffic, but I can still tell the difference between an MP3 and CD-quality music much of the time.
I think you probably understand that people get locked into arguing for victory, and if you tell them you were tested with thus and so range (as a young person, which is plausible) they will simply call you a liar.
My own experience is this: when I was a kid, I got an ear wax problem, and had it removed (hasn't recurred). It was a horrible painful nightmare with nasty tweezers and water squirters, and I was just a little kid… but afterwards, sound (especially very high frequency sound) was a revelation.
Later, when I was a little older, the advent of digital audio (at first, in record albums) was a nightmare to me, because I couldn't understand how or why that stuff sounded SO BAD. And of course my early experience of CDs was pretty nightmarish: pod people music, with all emotion and humanity weirdly excised. That's what got me into audio: wanting to understand this, and then later, fix it.
I did actually succeed: I can produce and mix and process digital audio that young me would not be horrified by. But especially if I had to meet that higher bar, I won't be able to do it at less than say 22 bit/80K, well engineered. If I get to use all my current tricks I could do it at 20 bit/80K: I can cheat word length easier than I can compensate for a nasty brickwall filter.
24/96K is widely prevalent and enough, given good converters. I'm not convinced 192K is at all necessary, but the more people crusade against it, the more contrarian I get ;) I've got a Beauty Pill album mastered in 24/192K and it sounds freaking phenomenal. Mind you, I have professional equipment designed to handle that.
I also got glasses this year for the first time in my life, which the need for came on rather suddenly... So maybe it's linked with some kind of degeneration? I really can't say. I honestly didn't think to mention it to the doctor, but now I'm thinking maybe I should have.
And the "44.1KHz is enough" or "48KHz" is enough" people are, sadly, kind of dumb.
How do I know? Because I was dumb, too.
Being a coding/math/audio/video badass, after a few years of industry experience, I rattled off some mouthy kid quip, saying "well, I don't know why we do 24/96, since nobody can hear above 20K anyway..."
And a very talented, very knowledgable, and generally reserved engineer suddenly perked up one eyebrow and said, incredulously "because temporal and frequency response are inherently linked..."
That look was in 2001, and I still remember that feeling of dread sinking in, realizing that I had no idea what he was talking about, no concept of why that would matter. I knew about Nyquist and could write a quick FFT, but he'd spent four years getting a degree in pure audio engineering at the most selective program in the country. That look, which I completely remember today, was like a deeply disappointed parent after a kid has just been bailed out of jail, from one of the nicest engineers I know.
It was late, I was brash, and I pressed on, asking him what he meant. "What's a transient look like, spectrally?" he asked. He waited for my blinks to make audible sounds (due to the apparent hollowness of my head), then he asked "and how many channels of audio do we listen to?"
He watched me stand there, like a doofus, for what seemed to me like several minutes (probably 5-10 seconds), then he went back to coding.
It didn't hit me until weeks later, and I didn't really internalize until years later, that he was hinting at inter-ear phasing and the other faculties of our auditory systems besides frequency response. Years later, I read up on Georg von Békésy's incredible work (including positional acuity), and I worked as a tech lead at Digidesign on the first-generation Venue live sound system, which operated at 48KHz but with incredibly low latency (processing steps were 1, 2, or 16 samples) due to the requirements of, for instance, vocalists using in-ear monitors.
Along the way, I ran across Microsoft engineers who thought that ~10-20ms inter-channel timing consistency would be okay in Windows Vista (it wasn't), conducted blind tests between 96KHz and 44.1KHz audio (for people who were shocked to immediately notice differences), came across plenty of hot-shot kids who said exactly the same kind of stuff I'd said, and saw postings from xiph making a mix of valid and grossly sophistic arguments ranging from "here's how waveform reconstruction from regularized samples work" (good) to "audio equipment can't even capture signals beyond this range" (dumb). At times, I thought about setting up refutation articles, then I realized, like many, that I had actual work to do.
Von Békésy's work points to positional inter-ear phase fidelity of roughly 10µs. What's the sampling interval at 44.1KHz? >22µs? Good luck rebuilding that stereo field at 44.1...
The trick is that there is a really serious diminishing return on audio sampling rate. 4KHz to 8KHz is enormous... 8KHz to 16KHz is transformative... 16KHz to 32KHz is eye-opening... 32KHz to 48KHz is appreciable... 48KHz to 96KHz is.... pretty minor, especially in the age of crappy $30 earbuds, streaming services delivering heavily compressed audio that will be crammed into a BT headset that may or may not be connected with additional compression, and all of the convenience that those changes bring. You may detect it in some audio if you're really listening, if you know what to listen for, and it may present advantages in system design (converters, processing, etc). From a data-rate perspective, the low-hanging fruit has already been picked.
But people who smugly say that there is "no difference", that audiophiles are buying "snake oil", are letting their ignorance show, and that's including that kid that I was, 16 years ago.
I've since moved out of pure pro media to consumer devices, where precision takes a back seat to the big picture a lot of the time. When discussing an audio fidelity multi-channel problem with a possible vendor last year, I expressed my concern about the inter-channel timing assurance slipping from 1µs to 50µs in product generations. "Depending on the sampling rate, that's several samples of audio", I said.
A very senior engineer on our side (Director equivalent) quickly admonished me, saying "it's microseconds, not milliseconds", to which I said "I know... Which is why it's several samples, not several thousand..."
From the look on his face, I'm 100% sure that he didn't understand me at the time, but I hope he put it together eventually.
In the end, the industry has moved in the opposite direction of 24/192 for a long time. If we can get back to normalization of CD quality audio, I'll be happy.
The xiph-cited studies I've seen show an identification of difference and a preference for... MP3. Hey, we want what we want.
Otherwise, go read Von Békésy's work for the foundation, established in the pre-digital era, but transferable if you understand digital audio.
For recognition of difference in high res audio, see:
Reiss - https://qmro.qmul.ac.uk/xmlui/bitstream/handle/123456789/134...
...And the papers referenced.
It's an interesting meta analysis and a good survey of the last 20 years of publication on the subject.
If you have some properly controlled double blind trials that show no discrimination ability, I'd be happy to read them. I'll admit that I haven't conducted statistically sufficient tests. I have, however, double-blinded (via software pseudo-random sample randomization).
Like I said, though, I've got work to do. Do some listening. Read some papers.
192 and 320 usually refer to the bitrate (in kbit/s) of MP3-compressed audio, which is indeed something different. MP3 compression removes more and more detail from the original audio to fit "inside" this bitrate window, hence higher is better, because less original content is removed. Only great ears can hear what was removed from a 320 stream.
Common sample rates are 44.1 and 48 kHz and their multiples: 96 kHz ... 192 kHz.
Monty is wrong. To cover the range of human listeners, the required specs are more like 19-21 bits of resolution at 55-75K sampling, and that's going by double-blind testing, which is very insensitive: it is geared to indicate the PRESENCE of a difference when one exists, and does NOT indicate or prove the ABSENCE of a difference with a comparable degree of confidence. That is a horrible logical fallacy with real-world consequences.
Beyond this, there's pretty much no problem (unless you are doing further processing: I've established that quantization exists even in floating point, which a surprising number of audio DSP people seem not to understand. There's a tradeoff between the resolution used in typical audio sample values, and the ability of the exponent to cover values way outside what's required)
That said, it is absurd and annoying to strive so tirelessly to limit the format of audio data to EXACTLY the limits of human hearing and not an inch beyond. What the hell? I would happily double it just for comfort and the assurance that nobody would ever possibly have an issue, no matter who they were. Is audio data suddenly so expensive that we can't allow formats to use bytes freely? That's the absurdity I speak of.
Our computers process things in 32-bit chunks (or indeed 64!). If you take great pains to snip away spare bits to where your audio data words are exactly 19 bits or something, the data will only be padded so it can be processed using general purpose computing. It is ludicrous to struggle heroically to limit audio workers and listeners to some word length below 32 bit for their own good, or to save space in a world where video is becoming capable of 1080p uncompressed raw capture. Moore's law left audio behind years ago, never to be troubled by audio's bandwidth requirements again.
Sample rate's another issue, as only very nearby or artificial sounds (or some percussion instruments, notably cymbals) contain large amounts of ultrasonic energy in the first place. However, sharp cutoffs are for synthesizers, not audio. Brickwall filters are garbage, technically awful, and expanding the sample rate allows for completely different filter designs. Neil Young's ill-fated Pono took this route. I've got one and it sounds fantastic (and is also a fine tool for getting digital audio into the analog domain in the studio: drive anything with a Pono and it's pretty much like using a live feed). I've driven powerful amplifiers running horn-loaded speakers, capable of astonishing dynamic range. Total lack of grain or any digital sonic signature, at any playback level.
My choice for sample rate at the extreme would be 96K, not 192K. Why? Because it's substantially beyond my own needs and it's established. I'm not dissing 192K, but I wouldn't go to war for it: as an output format, I would rather leave the super high sample rate stuff to DSD (which is qualitatively different from PCM audio in that the error in DSD is frequency-sensitive: more noise in the highs, progressively less as frequency drops).
Even with DSD, which is known to produce excessive ultrasonic noise even while sounding great, the scaremongering about IM distortion is foolish and wrong. If you have a playback system which is suffering from ultrasonic noise modulating the audio and harming it, I have three words you should be studying before trying to legislate against other people's use of high sample rates.
"Capacitor", and "Ferrite Choke".
Or, you could simply use an interconnect cable which has enough internal capacitance to tame your signal. If you have a playback system that's capable of being ruined just by 192K digital audio, your playback system is broken and it's wrong to blame that on the format. That would be very silly indeed.
I hope this has been expressed civilly: I am very angry with this attitude as expressed by Monty.
The accuracy is limited by the combination of sample rate AND word length: any alteration of the sample's value will also shift the position of the transient in time.
But since the 'timing' issue is a factor of reconstruction, you can improve the 'timing' of transients at 44.1K by moving from 16 to 24 bit. The positioning of samples will be a tiny bit more accurate, and that means the location of the reconstructed wave will be that much more time-accurate, since it's calculated using the known sample positions as signposts.
Positioning of high frequency transients does not occur only at sample boundaries, so that alone isn't an argument for high sample rates. You can already place a transient anywhere between the sample boundaries, in any PCM digital audio system. The argument for higher sample rates is use of less annoying filters, and to some extent the better handling of borderline-supersonic frequencies. For me, the gentler filters is by far more important, and I can take or leave the 'bug killing' super-highs. I don't find 'em that musical as a rule.
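That claim is easy to check numerically (a sketch assuming numpy/scipy): two bandlimited clicks offset by 0.3 samples at 44.1 kHz, about 6.8 µs, keep exactly that offset when reconstructed on a finer grid.

    import numpy as np
    from scipy.signal import resample_poly

    fs = 44100
    n = np.arange(512)
    a = np.sinc(n - 256.0)    # bandlimited click at sample 256
    b = np.sinc(n - 256.3)    # the same click, 0.3 samples (~6.8 us) later

    up = 64                   # reconstruct on a 64x finer time grid
    pa = np.argmax(resample_poly(a, up, 1))
    pb = np.argmax(resample_poly(b, up, 1))
    print((pb - pa) / (fs * up))   # ~6.7e-06 s: well below the 22.7 us sample interval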
Edit: And the linked "Show & Tell" video is a great way to get some "intuition" about the sampling theorem. https://video.xiph.org/vid2.shtml
I mean, it references Steve Jobs in the present tense, ffs.
If you look at the angular resolution of the eye, unless you are sitting very close to the screen, you can't resolve 4k video.
Most living-room TVs are probably placed at a 15- to 25-degree angle of view. 1920x1080 is enough for up to a 32-degree horizontal angle of view, which is 7 feet away from a TV with a 55" diagonal, for example. I will say, however, that movie theaters look a little better with 4K, since 30 degrees is supposed to be the back row, and the front row might be 60 or so.
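The geometry behind those numbers is a one-screen-width calculation (a sketch; it assumes the usual ~1 arcminute figure for 20/20 visual acuity):

    import math

    def max_useful_pixels(width_in, distance_in, acuity_arcmin=1.0):
        # Horizontal pixel count beyond which the extra pixels are unresolvable
        angle_deg = 2 * math.degrees(math.atan(width_in / (2 * distance_in)))
        return angle_deg * 60 / acuity_arcmin

    # A 55" 16:9 TV is about 48" wide; viewed from 7 feet (84"):
    print(max_useful_pixels(48, 84))   # ~1900 pixels: right at 1080p's 1920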
I use a 65" 4k TV with my HTPC in the living room, but often do desktop-ish things on it. (Logitech's wireless keyboards are great).
The difference in resolution between 1080 and 2160 is huge. 1080 is just fuzzy.
Sure, if you're looking at a little 40" TV five meters away, not so much, but for many screens in common use 4K is useful (but getting close to the point of diminishing returns, sure). Whereas as this article says, higher than 48kHz audio should be literally impossible to discern for any human in any setup.
So you have to up-sample the songs to high rates with fewer bits, like 1 to 6 bits, then do the conversion, and get the best SNR you can.
In this sense, there's simply a lot of advantage in using 24/192, since the above conversion can result in less loss and a higher SNR.
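What that trade looks like in code: a toy first-order delta-sigma loop, the idea behind 1-bit oversampled converters (a sketch, not a production modulator):

    import numpy as np

    def delta_sigma_1bit(x):
        # First-order noise shaping: the quantisation error is fed back,
        # which pushes the noise up in frequency, out of the audio band.
        out = np.empty_like(x)
        integ = 0.0
        for i, s in enumerate(x):
            integ += s
            out[i] = 1.0 if integ >= 0 else -1.0
            integ -= out[i]
        return out

    fs = 44100 * 64                        # 64x oversampled 1-bit stream
    t = np.arange(fs // 100) / fs          # 10 ms of signal
    bits = delta_sigma_1bit(0.5 * np.sin(2 * np.pi * 1000 * t))
    # 'bits' holds only +/-1 values, yet a low-pass filter recovers the
    # 1 kHz tone cleanly; the shaped noise sits far above 20 kHz.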
as engineers we will never solve this problem as long as the "44.1kHz is good enough" dogma is perpetuated.
here's a question. why are frequency and bit depth the only two variables under discussion here? how does the human ear locate a sound in space? suppose I place a series of 20kHz tone generators along a wall (and that I can still hear 20kHz :) and trigger them at different times, and record the session in stereo at 44.1kHz with a standard X-Y mic setup. will I be able to reconstruct the performance?
It's the opposite. We are never going to solve this problem if we are going to focus on things that have nothing to do with the problem. Compare and contrast:
>as engineers we will never solve this problem as long as the "copper wires are good enough" dogma is perpetuated
Also, please read the article. The author specifically lists advances in audio tech they think are worthwhile to consider, such as surround sound. This actually addresses the problem you mentioned (reproducing the live performance) and the question you asked, i.e.
>here's a question. why are frequency and bit depth the only two variables under discussion here?
They are not, at least not in the article. Here it's because that's what's in the title, and not everyone gets to the end of the article.
Some comments do talk about the importance of having a good DAC for a good sound.
Of course it does. It’s not meant to provide VR.
Same thing with sampling and bit-depth. Those address digital encoding of analog signals. They have nothing to say about speaker design, number of audio channels, room acoustics, or the myriad other factors that go into replicating a live stage performance.
And you haven't answered my question about the array of 20kHz tone generators. In fact, NOBODY has, and yet the question has been down-voted! How is that even possible? Is posing a novel experiment which might invalidate the populist view considered harmful?
TFA's author is not active in the field of advancing man's ability to recreate live music more convincingly, AFAIK; he writes codecs. He believes people shouldn't purchase 192kHz downloads. He's certainly right that most consumers won't be able to tell the difference with their current equipment. But he makes no mention of the interaural time difference in human auditory perception, so he's already not telling the whole story. There is more to learn here, folks, and down-voting a question is an embarrassing failure of these forums. Why aren't posts in support of music piracy down-voted (read above)?
I imagine a pair of microphones inserted into the ear canals of a dummy head should be able to capture what a real person sitting there would hear. Once the signals are captured, and assuming perfect frequency response of the microphones and noiseless transfer of the signals to an ADC, 44.1kHz would absolutely be enough to perfectly encode everything up to 20kHz.
I put emphasis on the frequency response of the microphones. They’d have to match the frequency response of the human ear. Meaning they would not capture ultrasonics, just like our ears don’t.
I am less sure of the math behind bit-depth and how it relates to our own dynamic range. I also agree that if you intend to transform the recording, mix it with others, etc, then go ahead and encode at higher bitrates and even with a higher frequency (both mic and ADC sampling). But the final product, being sold for direct listening, need not be sampled at a rate that’s beyond our hearing. No more than a video recording should be able to encode colors in the ultraviolet spectrum (A better analogy than my previous one)
As your other questions have been addressed by others, I simply would like to point out that this seems to be quite an arrogant stance to have.
The development of codecs has a lot to do with understanding of how the humans perceive sound, and how to effectively encode and reproduce sounds - which is useful even if you personally never listen to anything but analog recordings on analog systems.
However, we do live in a digital world, and one where codecs are a necessity. Codecs made recording, sharing, and distributing digital media possible at all - and now they are making it possible to create better recordings by any metric you choose.
Consider this: the bandwidth and space savings that codecs give you allow you to record more data with the same equipment at the highest settings. That's why I don't have to worry about running out of memory when I want to record 4-channel surround sound on my Zoom H2N (something that goes further towards a faithful reproduction of being there than, say, bumping the frequency to 192kHz, which, incidentally, is the point of the article).
Unless you are there to record every live show, we'll have to rely on other people doing that - and guess what, they'll use codecs! How do I know that - that's because I do, they do, and the absolute majority of live show recordings that I've seen were not available in lossless formats. For that matter, good codecs contribute directly to the quality of the sound you'll hear.
Therefore, advancing the codecs does advance man's ability to recreate live music more convincingly.
So please, pause before dismissing other people's work.
>But he makes no mention of the interaural time difference in human auditory perception
He also doesn't mention how long it would take from Earth to Mars on a rocket, or the airspeed velocity of an unladen swallow. If you want to make a claim that this is somehow relevant to the question, you need to argue why, with sources - or simply ask the author, who might just answer.
>There is more to learn here, folks, and down-voting a question is an embarrassing failure of these forums. Why aren't posts in support of music piracy down-voted (read above)?
Not all questions are created equal. Your last question is an example of one that rightly deserves to be downvoted, as it contributes nothing to the discussion (of whether 192Khz really does anything for us), appeals to emotion, and derails the conversation off the topic. Please don't do that.
Only where bandwidth and storage are constrained. If we're trying to push the state of the art, it's not going to be with a Zoom H2N.
The best music reproduction systems use lossless compression. Psychoacoustic compression does NOT get us closer to the original performance. I'm stating this as someone who gets 5 out of 5 correct, every time, on the NPR test:
(I'm ignoring the Suzanne Vega vocal-only track due to both its absence of musical complexity and use as test content during the development of the MP3 algorithm.)
While I appreciate xiphmont's codec work, I am dismissive of his open attempt to steer research and commerce in this area.
Why is his article posted as "neil-young.html"? Is that really fair?
> If you want to make a claim that this is somehow relevant to the question, you need to argue why, with sources - or simply ask the author, who might just answer.
Please see chaboud's excellent post above, referencing the work of Georg von Békésy.
> Your last question is an example of one that rightly deserves to be downvoted
You're referring to my array-of-20kHz-tone-generators experiment? Sorry I don't know the answer, but I haven't done the experiment myself; I was hoping someone here had! Where's the appeal to emotion, though? If the experiment shows a higher sample rate is necessary (that's the whole point of the experiment) it's germane.
I.e. everywhere in this universe. There is no such thing as unlimited bandwidth/storage. The gains that codecs give allow us to record information that would otherwise be lost.
>If we're trying to push the state of the art, it's not going to be with a Zoom H2N.
I wish I could see the future so clearly!
I only have guesses, and my guess tells me that audio captured from 10 Zoom H2N's at 48kHz will store more information than audio from a single microphone at 480kHz. Current "state of the art" seems to use fewer channels. An advance in the state of the art in the direction of utilizing more sources seems more than feasible to me.
>Psychoacoustic compression does NOT get us closer to the original performance
I think you have missed my point. Obviously, lossy-compressed data is not going to be better than the uncompressed source.
However, we do not live in a world of infinite resources. Given the constraints, compression offers new possibilities.
At the same space/bandwidth, you can have, e.g.:
- uncompressed audio from a single source
- compressed audio from 5x as many sources
- compressed audio from 2x sources, plus some other data which affects the perception of the sound (???)
This plays right into your question "Why are we only considering bitrate/frequency?" - we aren't. Compression offers more flexibility in making other directions viable.
This is why I believe that codec research is important for advances of the state of the art.
>I am dismissive of his open attempt to steer research and commerce in this area.
In what area exactly? What research? He is not "steering research", he is educating the less knowledgeable general public. So far, your dismissive attitude can also be applied verbatim to anyone who explains why super-thick-golden-cables from MonstrousCable(tm) are a waste of money.
>> Your last question is an example of one that rightly deserves to be downvoted
>You're referring to my array-of-20kHz-tone-generators experiment?
No, I was referring to this:
>Why aren't posts in support of music piracy down-voted (read above)?
The problem is that many readers of neil-young.html will come away thinking they understand human hearing and digital sampling, when in fact the article is far too sparse on details to really explain either; there is no discussion of how sounds are located in 3D space, or of how phase information is recovered. It is amazing that you can completely cover one ear, rub your fingers together behind your head, and precisely pinpoint where your fingers are. It is also amazing that "sampling doesn't affect frequency response or phase", yet xiphmont doesn't explain this at all.
And then there's this lovely quote:
"It's true enough that a properly encoded Ogg file (or MP3, or AAC file) will be indistinguishable from the original at a moderate bitrate."
which is provably wrong. I can very reliably pick the uncompressed WAV each try when compared against 320kbps MP3.
My attitude is in support of furthering research in the area of live sound reproduction. As I've said, we are VERY far away right now. It is foolish to believe we understand human musical perception completely today. We cannot even replicate a simple cymbal strike with today's recording and playback technology.
I would encourage the curious to stand in the center of an outdoor arc of 100 horn players, like this (feel free to skip first 48 seconds):
Once you experience that live, try to figure out how to replicate the input to your two ears. You can't, without 100 brass players.
Interestingly, these two examples of trumpet and cymbal have significant ultrasonic frequency content:
I don't believe it's a coincidence.
The microphones would probably be the bottleneck in reproducing the sound. If your microphone setup doesn't perfectly model the ears of the listener (with respect to how the headphones are worn and their frequency response), you're not going to be able to plausibly reproduce the whole sound field using a stereo recording. That has little to do with sample rate, though.
That being said, I'm using quite a bit less compression than the loudness-war-type mastering that is all too typical with pop music.
Yes, I have. With the right combination of speaker setups and hi fi recordings, it is possible to fool yourself into believing there are musicians there.
That has a lot more to do with the spatial component of the audio than anything else.
Unfortunately, surround sound sufficient to really reproduce acoustic fields (and not just sound effects ping-ponging around) requires more cost and more concessions in the listening room than many are willing to tolerate.
So long as people continue to get the engineering wrong and think the sampling rate and bit-depth have anything to do with it we'll probably continue to see the market invest in the wrong solutions.
So, let's look at a similar issue with video. Your display is likely only 720p or 1080p, but a 4K video on YouTube will still look a lot better, although technically it should show no visible difference.
But the reality is, we don’t get uncompressed video, or uncompressed audio.
We have a choice between audio compressed with lossy codecs at 16bit/44.1kHz or 16bit/96kHz, and 4:2:0 video at 1080p or 4:2:0 video at 4K.
And just like you need 4K 4:2:0 mp4 video to get even close to the quality of uncompressed 1080p 4:4:4 video, you also need far higher sampling rate and depth of highly compressed audio to get a quality similar to 16bit/44.1kHz PCM.
That’s the real reason why 24bit/192kHz AAC downloads were even discussed.
However, if you have a 20 Mbps budget for the 4k to account for having 4 times as much original data, then there shouldn't be much of a difference in the downsampled 1080p video (ignoring peculiarities of the codec).
All this is not very relevant to the audio issue being discussed. It would be relevant if it were physically impossible to perceive the difference between 1080p and 4k video, and if watching 4k video potentially caused optical illusions. In that case, the only reason to prefer the 20 Mbps 4k stream would be if you planned to edit, mix, or zoom around in the video instead of simply watching it.
When it comes to audio, since size isn't as much of a concern as video, in most cases I would say "maybe I'll want to edit it someday" is strong enough reason to get the 24/192 material at a correspondingly high bitrate if it's available.
It’s all about peculiarities of the codec!
The issue at hand is apple selling 24bit/192kHz versions of lossy AAC compressed files, compared to 16bit/44.1kHz versions of AAC files.
And the issue I was comparing with video was the same – with video, codecs enforce Chroma Subsampling, where the resolution for color is half that of the actual imagery.
In the same way, AAC and mp3 heavily reduce the bandwidth for the upper half of the frequency spectrum, spending like 90% of their available bandwidth on the lower half (with 44.1kHz, they prioritize the range between 4 and 8kHz, specifically, where speech is).
The entire topic is whether, with a codec that specifically cuts away the lower and upper parts of the frequency spectrum, increasing the sampled frequency range can improve quality. And yes, it does. Apple is selling AAC, not WAV. Which makes the entire article useless.
Yes, we should all focus on replacing 16bit/44.1kHz AAC with 16bit/44.1kHz FLAC instead of 24bit/192kHz AAC, and we all should focus on replacing 4:2:0 1080p mp4 with 4:4:4 1080p mp4 instead of 4:2:0 4K mp4 (the chroma subsampling issue I mentioned). But that’s not the reality we live in, and given the choice between 16bit/44.1kHz AAC and 24bit/192kHz AAC, I’ll choose the second.
Same for video. YouTube very likely allocates more bandwidth to 4k video than would be required for equivalent quality at lower resolutions, assuming that you are interested in higher overall quality (e.g. fewer artifacts) if you go to the trouble of 4k video. That's a conscious choice, not a technical necessity.
Basically, mp4 and webm video only encode the brightness channel Y at full resolution, and the color channels Cb and Cr at half resolution. In practice you can't have mp4 or webm video without chroma subsampling; the widely supported profiles all mandate 4:2:0.
Audio codecs do something very similar, cutting off a large percentage of the higher and lower frequencies. mp3 (and AAC), for example, allocate almost the entire space to the frequencies around 4 to 8kHz, and then drop a certain percentage of the upper frequencies entirely.
The article talks about uncompressed audio, but the topic he responds to is Apple choosing to sell 24bit/192kHz AAC lossy compressed audio. The author of the article is responding to a business decision, which has nothing to do with the actual topic of the article.
Not at all, bitrate differences are audible.
> And just like you need 4K 4:2:0 mp4 video to get even close to the quality of uncompressed 1080p 4:4:4 video, you also need far higher sampling rate and depth of highly compressed audio to get a quality similar to 16bit/44.1kHz PCM.
Even with the same bitrate, you’ll need 4K video to get quality comparable to uncompressed 1080p.
For example, mp4 defines that the brightness channel Y should be stored at full resolution, and the color channels Cb and Cr at half resolution in each dimension. So in a 4K mp4 video, even at a lossless bitrate, the actual colors will only be stored at 1080p resolution. This is called Chroma Subsampling.
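For the concrete effect, here is a toy numpy sketch of 4:2:0 (real codecs filter rather than just average, but the stored resolutions come out as shown):

    import numpy as np

    def chroma_420(frame):
        # Keep luma (Y) at full resolution; average chroma (Cb, Cr) over 2x2 blocks
        y, cb, cr = frame[..., 0], frame[..., 1], frame[..., 2]
        down2 = lambda c: c.reshape(c.shape[0] // 2, 2, c.shape[1] // 2, 2).mean(axis=(1, 3))
        return y, down2(cb), down2(cr)

    frame = np.random.rand(2160, 3840, 3)   # stand-in for a 4K YCbCr frame
    y, cb, cr = chroma_420(frame)
    print(y.shape, cb.shape)   # (2160, 3840) (1080, 1920): colour is stored at 1080p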
Audio Codecs do something very similar, causing these exact issues.
Of course, cross-channel intra prediction would work better but 4:2:0 is pretty good quality considering you can throw out 3/4 the pixels.
The same issue happens with audio. Nice theory, completely broken realistic implementations.