I’m not sure how big that market is these days, though. I recently decided to move from a block in the city to a house in the surrounding countryside purely to get a quieter listening environment that lets me really enjoy the high-dynamic-range recordings I have – I listen to a lot of classical music, especially the avant-garde with its even greater range, e.g. the Ligeti cello concerto starting from pppppp. Yet even among my friends who are really obsessed with such music, seeking a better listening environment seemed an extreme measure to take.
So, people who a generation ago would have invested in higher-end equipment (not audiophile snake oil, just better speakers) and who would have sought silence are now giving in to listening to music on their phones or cheap computer speakers. It’s a big shift.
Music streaming services have since become a very important music listening medium, and Spotify, Google Music, Apple Music and others normalize loudness levels. This neutralizes the loudness war, since it makes loudness-war treatment of music useless, at least when listened to on streaming services. Music mastered during the loudness war will still have problems with dynamic range, but the perverse incentives that caused the war are simply not as relevant for new music.
I agree the loudness war was a huge problem a few years ago, but changing trends in the way music is listened to are solving it. The way you present the loudness war problem is therefore somewhat out of touch.
It is especially noticeable if one listens to "modern" vs older CDs vs the radio. I suppose producers now feel compelled to exercise subwoofers and thump the audience, or to compensate for crappy earbuds in the target audience; or maybe that's just what people want, because even radio commercials are complicit -- it's difficult to listen to someone talk with "thump/thump/thump" drumbeats in the background. Often, I simply turn it off.
In my car, I have the bass response dialed back by about 50%, and even dip the mid-range by about 15% to get what sounds to my ears like a flat response.
I have mild hearing loss in the 250 Hz to 1 kHz range, so my ears are already attenuating the signal -- I can only imagine how "bass heavy" it must sound to someone with normal hearing!
There is nothing inherently good or bad in music production, no timeless rules as to how much treble or bass, compression, distortion, reverb or anything else you need. It has all been done to the extreme and what is deemed good is subject to constant change.
My personal example of a song where it doesn't work is Johnny Cash's "Hurt". Around 3 minutes into the song, Johnny Cash's vocal noticeably distorts. From my perspective, the distortion is absolutely a loudness war phenomenon: if you look at the waveform in an editor, the song starts off pretty "hot" given that there's a very noticeable crescendo at the end. At the end of the song, where the music is pretty much maxing out, there is no headroom left. There is no other option to stand out but to push Johnny's voice into distortion.
I've seen people in message boards say the distortion in the vocals adds "intensity". Personally, I'd love to hear an un-Loudness-War version where Johnny Cash gets several dB more to cleanly sing over the music. Where dynamics, not distortion, drive the intensity. For my tastes, that would be much preferred.
: https://en.wikipedia.org/wiki/Loudness_war#Examples_of_.22lo... (not including the 'remasters')
I found the album a bit tiring to listen to because of the continuous loudness. No particular part really stood out to me, because everything was just sort of loud and it just seemed to go in one ear and out the other. I could pay really close attention, sure; one can always listen better :) But again, it is a bit tiring and you can't do much else.
Then again, I just really prefer their first 2-3 albums, which have quite a different sound altogether.
And I'm curious which Oizo albums you're referring to? I love his stuff, and yes some of his tracks are quite loud (though not all the time, or the whole track), but they never quite struck me as a typical "loudness war" type of loudness. Unless I'm thinking of the wrong tracks here (No Day Massacre? Last Night a DJ Killed My Dog?), he seems to like to hit a well sound-designed overdriven bass buzz, with not too much else playing at the same time (and where there is, close attention to envelopes, cutting off hits, ms transient timings). If you do that right and just normalize to max amplitude, you're 95% of the way there (at which point my experience is that compression on top of that usually fucks up that careful detail work, but maybe I need more practice or a different plugin). Possibly I'm thinking of the wrong tracks here; at least you gave me a reason to re-listen to his stuff with a different ear/attention (loudness sound-design), which is always interesting :)
You'll need an exceptionally good clock to start with, and all other equipment needs to align to that clock. Then all plugins/processing you use need to be in the same 24/192 domain; otherwise your signal is reduced to the limit of that plugin/processing and all previous efforts are lost.
Most music producers use samples, and most samples are 16/44, so what's the point of trying to get that to 24/192, filling the signal with zeros?
If a piece of music is, on the very rare occasion, truly 24/192, then the listener who downloaded the track still needs an exceptionally good clock (which is both expensive and hard to find) for playback without signal reduction.
IMAO 24/192 is just a marketing thing for audiophiles who don't really understand the implications. 24/96 should be a reasonable limit for now, although personally I think 24/48 is enough for very high quality audio.
> Most music producers use samples...
Most people interested in better-quality sound in this particular context aren't listening to contemporary electronic music with samples. 24/192 or SACD is desirable mainly for reissues of older recordings in pop or jazz genres, where those formats were mastered with higher dynamic range while the available CD versions or lower-bitrate downloads were mastered with loudness-wars compression. The format is also attractive to classical music listeners, because SACD gives you multichannel audio; and some classical labels are now giving loudness-wars treatment to the non-SACD or non-24/192 formats of a particular new release.
In the studio, I would say that 24 bit at least should be the norm for recording purposes.
24-bit recording gives you noticeably more headroom (about 20 dB in practice, given real converters' noise floors; the theoretical difference from 16-bit is 48 dB). This gives you quite a bit more flexibility to record at lower levels without concerning yourself about the noise floor. The difference isn't huge for most prosumer setups in practice, but given that the processing power and storage capacity of computers make recording in 24 bit trivial, there really is no reason not to record 24 bit these days IMHO.
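For reference, here is the rule of thumb behind those numbers, as a quick sketch (the 1.76 dB term is the standard figure for a full-scale sine against the quantization noise of an ideal converter):

    # Theoretical dynamic range of ideal N-bit quantization: ~6 dB per bit
    def dynamic_range_db(bits):
        return 6.02 * bits + 1.76

    print(dynamic_range_db(16))  # ~98 dB
    print(dynamic_range_db(24))  # ~146 dB, i.e. ~48 dB more than 16-bit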
Sample rate also comes into play, mainly if you have older plugins that do not oversample. Some of the mathematical calculations involved, particularly those that respond quickly to audio changes (e.g. limiting/compression, distortion) or that use "naive" aliasing-prone algorithms (e.g. a naive sawtooth wave vs. something like BLEP / PolyBLEP), can introduce frequencies beyond the Nyquist frequency that fold back as aliasing. These days, I would say most plugins oversample internally or at the very least give you the option to do so. There's also a VST wrapper to oversample older plugins (http://www.experimentalscene.com/software/antialias/). So I do not think recording above 44.1kHz is very necessary these days. I don't discount opinions from people who say recording at 192kHz "sounds better", though, given the possibility that they are using plugins prone to aliasing at 44.1kHz rates.
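For anyone wondering what "naive vs. PolyBLEP" means in practice, a minimal sketch (function names are mine; the correction polynomial is the standard PolyBLEP form). A raw phase-accumulator sawtooth has a hard discontinuity whose spectral images alias below Nyquist; PolyBLEP smooths just the samples nearest the discontinuity:

    import numpy as np

    def poly_blep(t, dt):
        # Polynomial band-limited step: correct the two samples nearest the wrap
        if t < dt:                       # just after the discontinuity
            t /= dt
            return t + t - t * t - 1.0
        elif t > 1.0 - dt:               # just before the discontinuity
            t = (t - 1.0) / dt
            return t * t + t + t + 1.0
        return 0.0

    def sawtooth(freq, sr, n, naive=False):
        dt = freq / sr                   # phase increment per sample
        phase, out = 0.0, np.empty(n)
        for i in range(n):
            s = 2.0 * phase - 1.0        # naive saw in [-1, 1]
            if not naive:
                s -= poly_blep(phase, dt)
            out[i] = s
            phase += dt
            if phase >= 1.0:
                phase -= 1.0
        return out

    # An FFT of sawtooth(1234.0, 44100, 65536, naive=True) shows inharmonic
    # aliased partials that the PolyBLEP version largely suppresses.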
I personally do not see any benefit beyond 16/44.1kHz for playback of most recordings. Maybe 24 bit would be useful for heavily dynamic music (orchestral music being one of the few categories where you regularly find this), but I'm thinking even here the 96dB range of 16-bit audio should be enough for most cases.
To be fair, that only applies to the digital part of your signal chain. The analog portion is going to have nowhere near 24 bits of room above the noise floor.
The article is pretty clear that 24/192 can be reasonable for production-- it's just not reasonable for playback.
But your arguments aren't quite right, IMO. If you have a 16/44 sample, and you don't play it at full volume, you get some use out of those extra bits. Especially if you have a volume envelope.
Also, many modern samples are actually saved as 24-bit (or even 32-bit). Especially if they're my own creation from noodling around with softsynths, but they're shared like that as well, obviously.
Then, if you apply a plugin or effect that supports 24/192 output on a 16/44 sample, you still get useful information in those additional bits, even if the sample itself did not have it. Think of the very quiet end of a reverb tail, for instance.
But that's for producers. It's always good to have your working stuff in high quality; you never know how far you'll end up amplifying those low-order bits.
So I can see the use for 24-bit audio (in certain contexts), but I'm really not sure at all what the 192kHz is good for. Since it's waaaayyy above the human hearing range, all I can think of is dither noise shaping. You can hide a lot of your dithering noise in the ultrasonic range (which almost seems like a free lunch IMHO) and then... you obtain even more (virtual) bits of fidelity! Which you didn't really need, because you were using 24-bit audio in the first place.
I agree it's mostly a marketing gimmick, otherwise.
> the 24/192 downloads or SACD releases are often the only way to hear the album with real dynamic range.
Also true. The completely absurd corollary: sometimes a vinyl rip of an album will actually have the highest available dynamic range, even though vinyl has a much smaller dynamic range than 16/44 audio. Bands often use "loudness wars" mastering for the digital release and then proper mastering for the vinyl release.
Back in the day you'd occasionally see remastered "gold" discs released. The advertising made a big deal about the disc material. Those probably sounded different too (and they managed to sell them at a great premium), but they sounded better because of the newer remastering, not the disc technology.
It's certainly possible this is the case with some of those releases remastered for SACD as well. The label probably didn't give Green Day a huge amount of money for production when they made Dookie, for example, but it eventually sold 20 million copies and additional production made sense. If it sounds better to the listener it's a real benefit, too, but it is quite likely not down to the playback technology.
Two oddball things I've noticed about remasters in the last couple of years: there's some agreement out there that the newest remaster of Hendrix's records is not the best. And King Crimson, who have an audiophile following and decorate their catalog with remasters with alarming regularity, removed a little dynamic range (an oversimplification) from the newest remaster of Larks' Tongues in Aspic when remastering the CD and mastering for DVD-A, because people very understandably complained about the (technically very good) previous version being too quiet. Audiophiles say they want dynamic range, but...
> The label probably didn't give Green Day a huge amount of money for production when they made Dookie, for example, but it eventually sold 20 million copies and additional production made sense.
No. It's still entirely silly. The reason? The dynamics are unrecoverable once squashed by a limiter or compressor during the remastering process. The fidelity of the delivery medium is moot after that happens.
If the SACD promised that they reloaded the source tape/Pro Tools/whatever DAW was used and remixed/remastered the songs to actually have dynamic range, then I would be interested. As far as I am aware this isn't happening, and it is implausible for any record considered a classic.
Many SACD reissues do go back to the source tape. This is a frequent cause of complaint with SACD reissues of classic jazz recordings from the 1960s: sometimes you get better sound in terms of dynamic range than any previous CD issue of that recording, but in the meantime the source tape may have deteriorated in parts.
Even with recordings from the “loudness wars”, there is sometimes room for dynamic range improvement when remastering. A good example is Rush’s album Vapor Trails. This was an infamously botched recording upon its original release, on a scale with Death Magnetic. Because loudness-wars treatment plagued the original tracks before mixing, the damage could never entirely be repaired. However, the additional compression applied to the source during the transfer to CD could be avoided the second time around, so the album was eventually reissued as Vapor Trails Remixed, and while still flawed, that reissue has a lot more room to breathe than the original CD release.
"Remastered" reissues on the other formats – CD or lower-bitrate downloads – are nowadays expected to be listened to through cheap earbuds or speakers and perhaps in noisy urban environments. So, the engineer applies "loudness wars" treatment, compressing their dynamic range so the listener can still hear the quiet parts even with all the noise around them.
The other poster is not saying otherwise, though. They're just saying that "hi-def" formats, while technically unnecessary for end-user listening, are often the only way to obtain a decently-mastered recording.
There's no technical reason for things to be that way. But that's how things are.
It's sort of like buying a new car. You want the more powerful engine? Well then you also have to buy the package with heated seats, the moonroof, and the fog lights. There's no technical reason for it, but that's the only way they'll sell it to you.
It's also what this entire discussion's about.
Of course formats above 44.1/16 are useful for professional work; nobody's ever said otherwise. Just like graphics professionals work with high-res images and lossless formats even if the end product is a .jpg on a web page.
I noticed a similar thing with TV a few years ago. Despite watching a standard def. channel on a SD TV some programmes had a very noticeably better image quality. I think these had been shot and edited in HD and although the 'last mile' was still in SD there had been less degradation in the processing so the final picture was much better.
- the mastering engineer has to be approved, and there are some minimum dynamic range standards
- 24-bit (but not 192kHz) master files also have to be supplied
- reportedly some of the streaming services (Spotify, YouTube) are now applying 'loudness normalisation', which will bring some of the bad loud masters into line (it won't restore their dynamic range but will make them similar in loudness to other tracks)
- the loudness wars were never about what's good for listeners, but rather a competition for tracks to sound louder than other people's tracks when played consecutively on radio or in your music player
- and the iTunes files are 256 kbps AAC; you can't hear the compression
remember that 'compression' in this context is data compression and not audio compression (which acts directly on the dynamic range of the source)
I can most certainly hear the compression when compared with CD digital audio.
I fully understand the difference between data compression and dynamic range (not audio) compression.
What I'm saying is, lossy data compressed audio formats are already compromised enough to rule it out as a medium for audiophile use. Worrying about the dynamic range at that point is moot. It's going to be played on tiny, terrible sounding speakers.
With 256kbps AAC, really? Yeah, MP3 is old, and even at 320kbps it throws away everything above 18kHz (which I can't hear personally, but some people can). However, AAC is newer and better and blows MP3 out of the water (so do Ogg Vorbis and Opus, btw). We've gotten a lot better at psychoacoustics since MP3 started out. I strongly doubt you could hear it in a proper A/B test.
Then, 24/192 is mostly a "weak signal" to help you estimate if the audio was treated with care.
16 bits: 2^16 × 0.01 ≈ 655 discrete levels
24 bits: 2^24 × 0.01 ≈ 168,000 discrete levels
To help you understand, imagine 1-bit audio. What would it sound like? For each frequency, you can only have a single volume, i.e. it's maximally compressed.
I'm not sure the 1-bit extreme helps with understanding this, particularly because the audio we'd be dealing with is a mix of many frequencies: the waveform is an aggregate of the overall 'signal' at each frequency present in the mix. This is one of the reasons multiband rather than single-band compression has become so popular: it lets you get the 'max' in each frequency band rather than having one band dominate the overall result of the compression.
I think there's a difference to take into account between any given momentary sample and the overall effect. Yes, compression does reduce the dynamic range of the music, but you would need some sort of variable bit depth linked to the instantaneous amount of compression being applied to get any kind of workable lower-bit-depth encoding scheme, which seems like a lot of complexity for no significant gain (to me?).
> Compression means it's effectively using less of the [dynamic] range and the dynamics [range] could be represented with fewer bits in an optimal encoding.
A lower dynamic range can indeed be encoded with fewer bits per sample.
The meme "Math class is hard, let's go shopping" is only slightly apocryphal. Two of the voicebox lines were "Math class is tough" and "Want to go shopping?"
Because we only have 8 bits per channel, we can all see banding in (dynamically) compressed images. That's what increasing the bit depth improves.
You cannot buy good recordings in other formats, in this format you can. So there's a market - not a very big one as such, and maybe created for all the wrong reasons, but it is there.
The problem here is that the "you" in this sentence is not "me", or even an entity I can really influence, much less control.
It's not really even the audio engineer whose boss is telling him to turn up the volume so high that only the top few bits of the 16-bit recording are in use. The boss is saying this so that the song can be heard when the user turns the volume down because of the obnoxiously noisy commercials on the radio. Those commercials have found that they're more effective when they turn the volume way, way up. And they don't give a whit about the quality of their audio as long as they can sell more cars or tickets or whatever, much less the quality of the songs that play after them, much much less the quality of the songs that someone downloads to listen to offline in a nice listening environment without commercials.
The solution isn't just "turn down your compressor ratio", there's a big, hairy Nash equilibrium/politics problem that can be bypassed by offering 24/192 as a secondary product. If you want to remaster it to 16/48 after downloading, you're welcome to do so.
Doing this process in 24 bit gives you a large margin of error to play with. There's no real point in keeping that for the recording people are going to listen to.
Yes, those high dynamic range releases usually get published in 24/192. No, the fact that they get published in 24/192 does not, as far as human hearing goes, add anything to the dynamic range or otherwise the fidelity of the recording.
Since the correlation is so strong, it is of course entirely understandable that people assume causation exists.
Is there a technical reason why the mastering is so different for the two mediums, CD versus vinyl?
More often than not these days, the same compressed master is used for the vinyl. To combat the groove-jumping problem, the overall level is simply dropped.
Thus, people started mastering CDs for loudness.
An alternative idea of mine is simpler. Music that's too loud is worse than music that's too quiet (too-quiet music sounds bad, but too-loud music sounds bad and bothers other people too). So, when you need to pick a volume setting for your collection, you bias towards setting it lower, so the really loud tracks don't become too loud. Thus, the quieter CDs are annoying because they always sound quiet, while the loud CDs sound about right, because your volume setting suits them much better.
I guess special purpose releases don't usually end up on the radio so they can be mastered for people who actually appreciate music. ;)
As ever, this difference can be impossible to detect if the equipment and environment aren't of sufficient fidelity/quality.
What kind of gaps? There are no gaps. Sampled signals perfectly represent the original wave up to half the sampling frequency. Analog systems are inevitably limited in their frequency response as well, so, given the same bandwidth, there would be no difference at all.
In the real world, imperfect A/D and D/A conversions are typically still far less destructive than all the mechanical and electromagnetic sources of noise that affect analog systems. You can't consider one but not the other.
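A quick way to convince yourself of the "no gaps" claim, as a sketch using numpy (the residual error here comes only from truncating the ideally infinite sinc sum, not from the sampling itself):

    import numpy as np

    sr = 44100
    n = np.arange(441)                        # 10 ms of samples
    x = np.sin(2 * np.pi * 5000 * n / sr)     # a 5 kHz tone, well below Nyquist

    # Shannon reconstruction: evaluate the continuous waveform between samples
    t = np.linspace(0.004, 0.006, 200)        # stay mid-window to limit edge error
    x_hat = np.array([np.sum(x * np.sinc(ti * sr - n)) for ti in t])

    print(np.max(np.abs(x_hat - np.sin(2 * np.pi * 5000 * t))))
    # on the order of 1e-3 here, and it keeps shrinking as the window grows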
I think you're right that recording equipment has a long way to go, though; regardless of format I think people can relatively easily distinguish real acoustical instruments from recordings.
Although in any case, nothing justifies the price tags dished out to audio enthusiasts.
It's only at higher tiers that the difference might really be imperceptible to many ears.
A $200 pickup is certainly better than a $20 needle, maybe even 10x better.
The difference between a $200 pickup and one that goes for $2000 is minuscule. There certainly is a difference, but it's never as big as between the $20 and the $200 model.
That said, there are listeners who believe in the value of a $2000 pickup and derive a lot of enjoyment from the difference over a lesser model. Who am I to say they're wrong?
Now when it comes to a manufacturer of very expensive cables (for example): Don't make me laugh...
Two things come to mind though: voltage/current delivery of your amp, and damping ratio.
The first depends on the characteristics of your amplifier. Some are better at delivering current, some voltage. The lower impedance (32 Ω) is better suited to high-current/low-voltage sources, which includes most portable devices, phones, etc. Conversely, the higher impedance (250 Ω) is better for high-voltage/low-current sources like tube amps.
The second is about the ratio of the headphone impedance to the amp output impedance. You want a high ratio, so if your source has a large output impedance then the higher impedance headphones will sound better. Good headphone amps sometimes specify the output impedance, or you can measure it.
Headphone outputs of mixers fall into this category as well. Proper audio interfaces and sound cards have no issues at all driving 250 Ω to deafening volumes. Laptops no issues as well (for me).
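A toy calculation of why that ratio matters (my own illustrative numbers, not from this thread): the amp's output impedance and the headphone form a voltage divider, and since a headphone's impedance varies with frequency, a lossy divider also colors the frequency response.

    # Fraction of the source voltage that actually reaches the headphone load
    def load_fraction(z_headphone_ohms, z_out_ohms):
        return z_headphone_ohms / (z_headphone_ohms + z_out_ohms)

    for z_out in (0.5, 10.0):        # good headphone amp vs. weak built-in output
        for z_hp in (32.0, 250.0):
            print(f"z_out={z_out} ohm, z_hp={z_hp} ohm: "
                  f"{load_fraction(z_hp, z_out):.3f} delivered")

    # With a 10-ohm source, 250-ohm headphones barely notice the divider;
    # 32-ohm headphones lose level, and their impedance swings color the sound.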
You are misconstruing Monty's argument here. He is very much against MP3... in fact he says he could tell the difference between high-bitrate MP3 and 16-bit/44.1kHz WAV. The real point of the video is that 16-bit/44.1kHz WAV is beyond sufficient... there's no need to go beyond that to 24-bit/192kHz.
In the early days, when all mp3 encoders were pretty bad, I could tell which encoder produced a given file. mp3 encoders today are vastly better. I've not been able to do that party trick in a long time.
But 256+ and I certainly cannot tell the difference reliably.
Pretty much found the same thing for myself at 256+.
CD quality is the very least you could want for a serious big club or theater system (much less an auditorium). Between peaks and the requirements for deep bass, the peak in a digital audio file is (a) much farther above the body of the music than you'd think, and (b) should never be reached, because that's clipping.
People routinely behave as if the theoretical maximum dynamic range of a Red Book CD could never be relevant to anything. But it's incredibly easy to play back music loud enough that the noise floor gets obnoxious and relevant to listening; it's only 96 dB down. Any small system in an enclosed live space can push louder than that. Cranking music over headphones will blow way past that and you won't even hear the peaks, but you'll be able to hear how bad the subtleties (or lack of same) on a 16-bit Red Book CD are.
High-resolution audio is totally relevant to electronic music, especially live. I more than suspect some of the big-name acts (for instance, Deadmau5) are feeding the mains from converters running at no less than 24/96. Certain synthesizer mixes are very revealing of faults in playback. If the live performance over a modern PA sounds absolutely huge and awesome, but not strained or grainy, then they're not using CD quality. The SPLs are more than enough to make these distinctions obvious.
Anyone can get these SPLs over headphones, trivially, and headphones that'll handle it cleanly are only a few hundred dollars.
(Personally I downloaded some of those killer samples and couldn't tell the difference, but other people reliably tell them apart in an ABX test.)
I ABXed 320kbit mp3 from an uncompressed original, I think 9/10 IIRC. It was a recording of castanets, and listening for frequency response differences was useless so I keyed off of 'personality' differences in the sounds and did it that way.
I was also just as horrible at detecting wow and flutter as I was good at detecting lossy compression's 'changes of sonic personality'. That comes from my experience being with analog audio, which is less true of people these days.
The idea that 'the best' listeners cannot tell 256k from even something as limited as 16/44.1 is ridiculous. Any MP3 format... any lossy format... is seriously compromised. Talk to techno/house DJs and producers about how useful it is to perform off MP3s; this is not a hypothetical argument.
Usually what is happening here is you're getting a master that hasn't been compressed to death (see loudness wars). Vinyl is a shit source for 'quality', but most records aren't compressed to death so they can still sound better on a good set of speakers due to the dynamic range.
* With analogue equipment the latency between the input and output of the effect is often below 1 ms (unless the effect is supposed to have a delay).
* Standalone digital effect equipment that processes the samples one at a time can also have a latency below 1 ms (one sample period at 48 kHz ≈ 0.02 ms).
* If you use a computer for real-time effects, the samples are not transferred sample by sample from the audio interface, but in a block of many samples. The number of buffers and the size of them can usually be changed by the user. With a buffer of 1024 samples, the oldest sample is already about 21 ms old before the computer can process it. After the buffer is processed it has to be transferred to the audio interface, and that will add another 21 ms. So the minimum latency for any real-time effect in the computer is about 42 ms at 48 kHz if the size of the buffers is 1024 samples. Often it is much worse because the operating system adds more latency. If the equipment can handle a sample rate of 192 kHz, the same latency is about 10 ms. If the computer can handle smaller buffers, the latency can be lowered. With 256 samples per buffer the minimum latency will be about 11 ms at 48 kHz and 3 ms at 192 kHz.
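The arithmetic in that last point, as a sketch (the function name is mine; real systems add driver and OS overhead on top of this minimum):

    def min_roundtrip_ms(buffer_samples, sample_rate_hz, n_buffers=2):
        # one buffer to get the audio in, one to get it back out
        return 1000.0 * n_buffers * buffer_samples / sample_rate_hz

    print(min_roundtrip_ms(1024, 48000))    # ~42.7 ms, as described above
    print(min_roundtrip_ms(1024, 192000))   # ~10.7 ms
    print(min_roundtrip_ms(256, 48000))     # ~10.7 ms
    print(min_roundtrip_ms(256, 192000))    # ~2.7 ms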
So well prepared, so well presented, so little that could be removed without ruining it.
I aspire to do such good demos but always fall so short.
The camerawork was excellent and the demonstration integration/trinket-wielding was seamlessly done. I get the impression the people who made this said "let's use the ThinkPad for this, and let's do it this and that way," and then pulled it off perfectly, exactly how engineers wish things would happen.
If you ever need a reference demo for "open-source software can make good-looking presentations," this would be on the shortlist, I think. (The credits say it was made using Cinelerra, for reference.)
I have said the same thing. I was a film major, a video producer, and a tech writer (now a programmer), and I am in awe.
Check out also the making of:
The sound was done similarly to the previous video, which hadn't drawn comments. This time though, a sizable fraction of people said it was distracting.
It is the same with most "well-written" articles in newspapers nowadays. We do not even notice it, but it's just canned food: a lot of jelly and chemical taste enhancer, while the amount of real meat inside is near zero.
Setting aside the questions embedded in the profession, and the perception and self-perception of the male geek working and academic world, I so rarely see a presenter reacting to a sense of apparent human warmth in the room and beyond the lens. Even with the most encouraging assistance behind the lens, that is genuinely hard to do. Hard enough that I think it is a classic contributor to the stereotype of the inflated-ego newsreel presenter, which Hollywood loves to satirise; in my opinion because Hollywood is mocking, from its narrow and insecure view, a subspecies of acting which, when done well, can capture a far greater audience than some most serious actors may ever manage.
This is a bit more than a little geek knowhow and applied thought, but I think many geeks, by virtue of sheer analysis without the obstruction of an ego, could handily outperform the supposedly inherent talent they are "meant" to possess. It may reach well into "real serious" acting, very easily. I don't pretend to be a judge of that, but if acting ability is "I know it when I see it", this is excellent acting indeed.
Edit: "is", not "was", in the first line; and a comma for clarity a bit later on.
The problem whenever somebody writes about digital audio is that it is very tempting to hold on to sampling theory (Nyquist limit, etc.) and totally disregard the problems of implementing an actual analog-to-digital and digital-to-analog chain that works perfectly at a 44100Hz sample rate.
I agree with the assessment that 16-bit depth is good enough; even 14 bits is good enough and was used with good results in the past (!). However, the problem is with the sampling rate.
> All signals with content entirely below the Nyquist frequency (half the sampling rate) are captured perfectly and completely by sampling;
Here lies the problem. This is what theory says; however, when using a 44kHz sample rate, capturing the audio means low-passing at 22kHz. And this is not your gentle (6, 12 or 24 dB/octave) low-pass filter; no, this needs to be HARD filtering; nothing should pass beyond 22kHz. And it must be in the analog domain, because your signal is analog. To implement such a filter, you need a brickwall analog filter, and this is not only expensive, it also makes a mess of the audio: 'ringing' effects and/or ripple in the frequency response and/or strong phase shifts.
So for analog-to-digital in 2017, converters should be operating at a higher rate (say, 192kHz), because this makes analog filtering of the signal much easier and without side effects.
Now, for digital-to-analog, if your sample rate is 44kHz, you have two alternatives:
a) Analog brickwall filtering, with the problems noted above
b) filtering on the digital domain + using oversampling
the article mentions:
>So the math is ideal, but what of real world complications? The most notorious is the band-limiting requirement. Signals with content over the Nyquist frequency must be lowpassed before sampling to avoid aliasing distortion; this analog lowpass is the infamous antialiasing filter. Antialiasing can't be ideal in practice, but modern techniques bring it very close. ...and with that we come to oversampling.
So they are mentioning alternative (b). The problem is that oversampling does not solve all problems. Oversampling implies that the filtering is done in the digital domain, and there are several choices of filter you could use, for example FIR (finite impulse response), IIR (infinite impulse response), etc.
And each one of these choices has side effects...
In short, the problem is that with a 44kHz sampling rate, your filter cutoff (22kHz) is too close to your desired bandwidth (20Hz-20kHz). Using a sample rate of 192kHz gives the DAC designer much more leeway for a better conversion. And CONVERSION is the key to good digital sound.
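The leeway argument is easy to quantify on the digital side, as a sketch using scipy's Kaiser-window estimate (the analog case is harder to show in code, but the transition-band arithmetic is the same): ask how long an FIR lowpass must be to stay flat to 20kHz and reach ~96dB of rejection by Nyquist, at each sample rate.

    from scipy.signal import kaiserord

    # Required FIR length for ~96 dB stopband rejection, flat to 20 kHz
    for fs, f_stop in ((44100, 22050), (192000, 96000)):
        width = (f_stop - 20000) / (fs / 2)   # transition width, relative to Nyquist
        numtaps, _beta = kaiserord(96.0, width)
        print(fs, numtaps)

    # At 44.1 kHz the transition band is ~9% of Nyquist -> roughly 130+ taps;
    # at 192 kHz it is ~79% of Nyquist -> under 20 taps.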
>What actually works to improve the quality of the digital audio to which we're listening?
It is interesting that the author mentions things such as "buying better headphones" (agreed), but he never mentions "getting a better digital-to-analog converter", which is highly important!
On the other hand, he backs up his claim that "44kHz is enough" with an interesting AES test I was already aware of:
>Empirical evidence from listening tests backs up the assertion that 44.1kHz/16 bit provides highest-possible fidelity playback. There are numerous controlled tests confirming this, but I'll plug a recent paper, Audibility of a CD-Standard A/D/A Loop Inserted into High-Resolution Audio Playback, done by local folks here at the Boston Audio Society.
This is a very interesting paper, and I have a copy; however, the test equipment should be scrutinized. There are systems, and there are better systems. The AES paper cited above had the particularity that the ADC and DAC used were provided by exactly the same machine (a Sony PCM converter), with the same strategy: no oversampling, brickwall analog filters. I can bet (99% sure) that the brickwall filters were identical on the ADC and the DAC of that machine; Murata-brand filters in a package.
The devil, as they say, is in the details.
> Oversampling is simple and clever. You may recall from my A Digital Media Primer for Geeks that high sampling rates provide a great deal more space between the highest frequency audio we care about (20kHz) and the Nyquist frequency (half the sampling rate). This allows for simpler, smoother, more reliable analog anti-aliasing filters, and thus higher fidelity. This extra space between 20kHz and the Nyquist frequency is essentially just spectral padding for the analog filter.
> That's only half the story. Because digital filters have few of the practical limitations of an analog filter, we can complete the anti-aliasing process with greater efficiency and precision digitally. The very high rate raw digital signal passes through a digital anti-aliasing filter, which has no trouble fitting a transition band into a tight space. After this further digital anti-aliasing, the extra padding samples are simply thrown away. Oversampled playback approximately works in reverse.
> This means we can use low rate 44.1kHz or 48kHz audio with all the fidelity benefits of 192kHz or higher sampling (smooth frequency response, low aliasing) and none of the drawbacks (ultrasonics that cause intermodulation distortion, wasted space). Nearly all of today's analog-to-digital converters (ADCs) and digital-to-analog converters (DACs) oversample at very high rates. Few people realize this is happening because it's completely automatic and hidden.
The main point of the article is to argue that storing or transmitting music above 16-bit, 48 kHz is wasteful and potentially harmful. It still fully condones using higher specs for audio capture, editing, and rendering.
Of course it is acceptable. Even 14-bit audio at 36kHz with a great DAC would be fairly nice and acceptable.
What the article claims is that 192kHz is useless, of no benefit. And I contend that it is of benefit when you want more than just good or acceptable performance. Not if you have a run-of-the-mill DAC and OK headphones/speakers, but it is if you are a music lover and critical listener.
It doesn't matter if you're a music lover or critical listener!
The article claims that 192KHz downloads are of no benefit. It's right there in the article's title. It's difficult to not accuse you of willfully misinterpreting his argument.
>So they are mentioning alternative (b). The problem is that oversampling does not solve all problems. Oversampling implies that the filtering is done in the digital domain, and there are several choices of filter you could use, for example FIR (finite impulse response), IIR (infinite impulse response), etc.
>And each one of these choices has side effects...
Citation needed here; oversampling solves virtually all the problems, and with modern DSP the FIR filters can be made extremely good. The induced noise of modern ADCs/DACs is seriously tiny, and swamped by the noise in your recording.
That is no reason to store or even process your music at higher rates, though.
You are describing alternative (b) I mentioned above: digital filtering plus oversampling. This also isn't without side effects.
Oversampling a 44kHz signal is not the same as having 192kHz material to start with. Very different.
In practice, signal quality issues are usually layout and supply issues, not problems with the converter itself.
In practice, the speakers and ears are the worst parts with the largest non-linearities and a frequency response that looks like a nuke test area compared to the FR of the converter. Of course, in the case of speakers, we have concluded that we want them to have their own sound, because neutral speakers are not possible.
(I tend to avoid discussing the other critical parts — original recording quality and the listener's ears — because these are often immutable constants.)
Consider a dog that lives with a musician who plays, for example, trumpet. The musician plays the trumpet at home to practice, and also records his practices to review.
A trumpet produces significant acoustic energy out to about 100 kHz. When the musician plays live, the dog hears a rich musical instrument. When the musician plays back his recordings, half the frequency range that the dog could hear in the live trumpet will be gone. I'd imagine this makes the recorded trumpet a lot less aesthetically pleasing to the poor dog.
I only say this because most people are happy listening to music on CDs, but in the presence of a live band (e.g. an orchestra) it is suddenly obvious how incredibly loud it is. My brother is a drummer and I find his playing incredibly loud; I am a bass player and I don't play loud, although he sometimes complains that there's "too much bass". Perhaps we each go deaf in our respective parts of the audio spectrum.
Both of those huge efforts have gone into controlling the humanly audible part of the sound. Whatever sound is accidentally produced at other frequencies is probably at best aesthetically neutral, but more likely distracting.
Though my guess is that trumpeting is just noise to dogs either way.
Then I'll be mighty glad we made all these high-res recordings.
What if this actually becomes possible, but we discover that, because we previously couldn't hear these frequencies, our instruments and equipment are HORRIBLY mis-tuned and sound terrible? We may end up having to re-record tons of stuff.
Something something premature optimization. And part of me is glad that the art of hand-making instruments is not yet lost; we might need the originals in the future.
Disclaimer: I say this as a completely naive person when it comes to instruments. The answer to this may be "if it wasn't built to resonate at frequency X, it won't by itself," which would be a good thing.
There's another effect that comes into play, though. There's a minimum pitch separation between simultaneous notes that we expect, and when notes are closer than that, they clash. That separation is usually around a minor third (~300 cents) in most of the human hearing range, but in the bass it's a lot wider, and in the high treble it's smaller. That's why you can play two notes a major second apart (~200 cents) on a piano in the high treble and it sounds okay, but down in the low bass it sounds muddy if they're closer than about a major third or perfect fourth (~400-500 cents). So, if we extrapolate into higher frequency ranges, then it's not unreasonable to expect that we would be able to interpret musical intervals that are a lot closer than 200 cents as consonant.
It's also possible that the minimum note separation thing is just an artifact of how our ears physically work, and that an artificial ear would have no such limitation. Which could open the possibility of enjoying kinds of music that we can't currently imagine as pleasant with our normal ears.
And if they were (such as in 96 kHz hi-res audio), you could just run it through a low-pass filter to strip off the higher frequencies.
And... heh, using a filter to strip out the audio we used all that extra filesize to deliberately store. Haha. :)
The good news is that you can strip out all of this robot propaganda and still hear the exact same music, simply by encoding at a reasonable rate.
Have we asked the dogs?
The frequencies there are no longer musical in the sense of being recognized as musical notes.
That is to say, although in a pure arithmetic sense the frequency 14080 Hz is an A, since it is a power-of-two multiple of 440 (five octaves above it), we don't hear it as an A tone.
The frequencies in that ballpark just have to be present to add a sense of "definition" or "crispness" or "air" to the sound.
In fact, this can be faked!
In the Opus codec, there is a trick along these lines (spectral folding; HE-AAC's version of the idea is called spectral band replication). What this refers to is basically a hack whereby the upper harmonics of the signal are completely stripped away, and then re-synthesized on the other end based on a duplicate of the lower harmonics, or something like that. The listener just hears the increased definition.
The thing is, the higher sample rate data doesn't actually have a lot of the higher components after 20 kHz.
What the faster sample rate allows is to use a less aggressive filter.
Instead of a "brick wall" filter that rapidly cuts off after around 20 kHz, one with fewer poles can be used which rolls off more gently.
The higher sample rate ensures that there isn't any aliasing.
192 kHz audio does not reproduce flat up to 90 kHz.
(I'm going to gloss over the microphone used on the trumpet, or the loudspeakers which ultimately reproduce it.)
However, all the musicians I know use these high-res formats internally. The reason is that when you apply audio effects, especially complex VST ones, these discretization artifacts noticeably degrade the result.
Maybe, the musicians who distribute their music in 24/192 format expect their music to be mixed and otherwise processed.
Not 192 kHz; no friggin' way.
Repeated processing through multiple blocks at a given sample rate does not produce cumulative discretization problems in the time domain; it's just not how the math works.
Both your inputs (ADC) and outputs (DAC) are fixed-point. Why would you want to use floating point in between? Technically, the 64-bit floating point format would be enough for precision, but that would inflate both bandwidth and CPU requirements for no value; and 32-bit floating point ain't enough. Many people in the industry already use 32-bit integers for these samples.
> Not 192 kHz; no friggin' way.
I think you’re underestimating the complexity of modern musician-targeted VST effects. Take a look: https://www.youtube.com/watch?v=-AGGl5R1vtY
I’m not an expert, i.e. I’ve never programmed that kind of software. But I’m positive such effects are overwhelmingly more complex than just multiply-and-add on the sample values. Therefore, extra temporal resolution helps.
BTW, professionals have used 24bit/192kHz audio interfaces for decades already. E.g. the ESI Juli@ was released in 2004, and that was a very affordable device back then.
> Why would you want to use a floating point in between?
Because 32-bit float has enough mantissa bits to represent all 24-bit integer fixed-point values exactly, so it is at least as good.
Because 32-bit float is friendly to vectorization/SIMD, whereas 24-bit integer is not.
Because with 32-bit integers, you still have to worry about overflow if you start stacking like 65536 voices on top of each other, whereas 32-bit float will behave more gracefully.
Because 32-bit floating-point audio editing is only double the storage/memory requirements compared to 16-bit integer, but it buys you the ultimate peace of mind against silly numerical precision problems.
If you quiet the amplitude by 6 dB (a factor of two), that is just decrementing the exponent field in the float; the mantissa stays 24 bits wide.
If you quiet the amplitude of integer samples, they lose resolution (bits per sample).
If you divide a float by two, and then multiply by two, you recover the original value without loss, because just the exponent decremented and then incremented again.
(Of course, I mean: in the absence of underflow. But underflow is far away. If the sample value of 1 is represented as 1.0, you have tons of room in either direction.)
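The divide-then-multiply point is easy to check (a quick Python demonstration; halving and doubling a binary float touches only the exponent, so no mantissa bits are lost, while the integer version discards the low bit):

    x = 0.1234567
    print((x / 2) * 2 == x)    # True: exact, only the exponent moved

    n = 1234567                # contrast with integer samples
    print((n // 2) * 2 == n)   # False: the low bit is gone for good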
Fixed point arithmetic is non-trivial and not well supported by CPU instruction sets.
(Hint: you can't just use integer add/multiply.)
> I think you’re underestimating the complexity of modern musician-targeted VST effects. I’ve never programmed that kind of software. But I’m positive such effects are overwhelmingly more complex than just multiply-and-add on the sample values. Therefore, extra temporal resolution helps.
Indeed, many audio effects require upsampling to work well with common inputs, e.g highly non-linear effects like distortion/saturation or analog filter models.
However usually they perform upsampling and downsampling internally (commonly between 2x-4x-8x).
While upsampling/downsampling is expensive (especially if you are using multiple of these types of plugins), it's not clear whether running at a higher sample rate across the board is worth it just to save those steps.
But it's not resolution, right? It's extra frequencies outside the audible range. Is there any natural process that would make those affect the audible components, if I were listening to the music live instead of a recording?
If a sonic and ultrasonic frequency are combined together, but a low pass filter doesn't pass the ultrasonic one, the ultrasonic one doesn't exist on the other end.
Hence, there can be no beat.
The main reason is that it solves clipping in the pipeline.
Because if you don't you accumulate small errors at each processing step due to rounding. Remember that it is very common for an input to pass through multiple digital filters, EQs, some compressors, a few plugins, then to be grouped and have more plugins applied to the group. You can end up running the sample through hundreds of equations before final output. Small errors at the beginning can be magnified.
Pretty much all pro-level mix engines use 32-bit floating point for all samples internally. This gives you enough precision that there isn't a useful limit to the number of processing steps before accumulated error becomes a problem. By all samples I mean the input comes from a 24-bit ADC and gets converted to 32-bit FP. From that point on all plugins and processes use 32-bit FP. The final output busses convert back to 24-bit and dither to feed the DAC (for higher-end gear the DAC may handle this in hardware).
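A crude way to see that accumulation, as a sketch with illustrative numbers: repeatedly duck a signal and restore the gain, re-quantizing at working precision each time, as a stand-in for a long chain of processing steps.

    import numpy as np

    sr = 48000
    x = 0.5 * np.sin(2 * np.pi * 440 * np.arange(sr) / sr)   # float64 reference

    def roundtrip_error(x, steps, quantize):
        y = x.copy()
        for _ in range(steps):
            y = quantize(y * 0.1)     # attenuate, store at working precision
            y = quantize(y * 10.0)    # restore the gain, store again
        return np.max(np.abs(y - x))

    on_int16_grid = lambda y: np.round(y * 32767) / 32767    # 16-bit fixed point
    as_float32    = lambda y: y.astype(np.float32)

    print(roundtrip_error(x, 100, on_int16_grid))  # error builds up step by step
    print(roundtrip_error(x, 100, as_float32))     # stays near float32 epsilon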
As for 192 kHz I've never seen or heard a difference. Even 96 kHz seems like overkill. A lot of albums have been recorded at 48 kHz without any problems. As the video explains there is no "missed" audible information if you're sampling at 48 kHz. I know that seems counter-intuitive but the math (and experiments) bear this out.
An inaccurate but intuitive way to think about it is that your ear can't register a sound at a given frequency unless it gets enough of the wave, which has a certain length in the time domain (by definition). If an impulse is shorter than that, then it has a different frequency, again by definition. 1/16th of a 1 kHz wave doesn't actually happen. Even if it did, a speaker is a physical moving object and can't respond fast enough to make it happen (speakers can't reproduce square waves either, for the same reason; they'll end up smoothing them out somewhat). Even if it could, the air can't transmit 1/16th of a wave; the effect will be a lower-amplitude wave of a different frequency. And again, your ear drum can't transmit such an impulse (nor can it transmit a true square wave).
I've done a lot of live audio mixing and a little bit of studio work, including helping a band cut a vinyl album. Fun fact: almost all vinyl is made from CD masters and has been for years. The vinyl acetate (and master) are cut by squashing the crap out of the CD master and applying a lot of EQ to shape the signal (partly to prevent the needle from cutting the groove walls too thin), then letting the physical medium itself roll off the highs.
The only case where getting a 24-bit/192kHz recording might be worthwhile is if it is a pre-mastering copy. Then it won't be over-compressed and over-EQ'd, but that applies just as well to any pre-master. (For the vinyl we cut, I compressed the MP3 version myself from the 24-bit/48kHz masters, so it had the best dynamic range of anything: better than the CD and far better than the vinyl.)
But no, musicians aren't releasing things at ultra-resolutions because they expect others to reuse their work. The ones that are, are providing multitracks.
That isn't entirely true. E.g. it's common for an audio DSP to use fixed-point 24-bit coefficients for an FIR filter. If you're trying to implement a filter at low frequency, there can be significant error due to coefficient precision, and that error is reduced by increasing the sampling rate.
It can be useful to run your signal processing chain at a higher rate because many digital effects are not properly bandlimited internally (and it would be pretty CPU hungry to do so).
But that doesn't mean you need to store data at 192kHz, even data you'll process later, though it might be easier to do so.
Oversampling in studio recording is mostly about eliminating aliasing in software that's producing distortion, and it's only relevant in that context: I don't think it's nearly so relevant on, say, an EQ.
That's an anthropocentric statement.
I want to leave it at that for now just to see if I get dinged for low effort.
Anyway, check out the frequency range for other animals:
Notice that there are a number of species whose range extends well past 20kHz. Even with 192kHz you're still chopping off the upper end of what dolphins and porpoises can hear and produce.
So please convince Apple and friends that you need 200+kHz to truly capture the "warmth" of Neil Young's live albums. Then we'll be able to crowdsource recording all the animals and eventually synthesize the sounds to communicate with them all.
Maybe then we can synthesize a compelling, "We're really sorry, we promise we're going to fix all the things now," for the dolphins and convince them not to leave. :)
All this comes at a high computational and storage cost though.
I personally use 44.1kHz/24-bit settings for DAW use.
Music is different. If you have the original multi-track composition, you can’t reproduce the result unless you also have the original DAW software (of the specific version), and all the original VST plugins (again, of the specific version each).
All that software is non-free, and typically it’s quite expensive (esp. VSTi). Some software/plugins are only available on Windows or Mac. There’s little to no compatibility even across different versions of the same software.
Musicians don’t support or maintain their music. Therefore, DAW software vendors don’t care about open formats, interoperability, or standardization.
BTW, I'm merely a dilettante when it comes to recording and especially mixing.
I know and understand how incorrect down-sampling from high sample rates can cause distortion in the form of sub-harmonics in the audible range.
I know about audible dynamic range and how many decibels of extra range 8 extra bits are going to give you (about 48 dB).
I know all this, but I still have to admit: if there's a hi-res recording (24-bit, >48kHz) available for download/purchase, I'll always go for that instead of the "regular" 16-bit 44.1/48kHz download. I guess escaping your former audiophile self is very, very hard.
Anyone else want to admit their guilty, stupid pleasures? :)
I'm up to about 300 different models.
Deriving pleasure from listening to music has a large subjective component. So if I've paid more for a track and / or I got the most amount of bits I could I'll probably enjoy it more. Also makes for great conversation topics.
I also have some gear that aliases at a volume you can't hear, but when you plug it into an analog distortion pedal, the aliasing in the high frequencies becomes apparent. This would be avoided if it had a higher sample rate so the aliasing was far out of the audible range.
For other sorts of effects, like spectral processing and pitch shifting, the extra detail above 22kHz really does make a difference, especially when pitching down.
• Less quantisation noise, so your noise floor is a bit lower and therefore, when you're mixing stuff together, you accumulate less noise
• More numerical room to play with, you can scale or shift the volume up and down more without clipping and with less loss of accuracy
With 16-bit CD audio, you can't just convert to 24-bit and suddenly have less noise. You might get more room, though.
As for higher sampling rates (more kHz, if you will), I think Monty mentioned a benefit regarding less sharp anti-aliasing filters (the transition band can stretch from 20kHz to 48kHz, say, rather than from 20kHz to 22.05kHz), but it's not something I understand the benefit of well.
Specifically, if 24/192 AAC is worth it compared to 16/44.1 AAC (and the answer to that is yes, although the answer to 24/192 WAV is no)
The point of higher sample rates isn't to capture bat squeaks, but to relax the engineering constraints on pre-post filtering.
Nyquist is fine in theory, and if you've never actually tried to implement a clean filter you'll likely watch the xiph video and think "Well that makes sense."
If you actually know something about practical DSP and the constraints and challenges of real filter design, you're not going to be quite so easily impressed.
Likewise with higher bit depths. "Common sense" suggests that no one should be able to hear a noise signal at -90dB.
Common sense is dead wrong, because the effects of a single bit of dither are absolutely audible.
And if you can hear the effects of noise added at -90dB, you can certainly hear the effects of quantisation noise artefacts on reverb tails and long decaying notes at -30 to -40dB, added by recording at 16 bits instead of 24 bits.
Whether or not that level of detail is present in a typical pop or classical recording is a different issue. Realistically most music is heavily compressed and limited, so the answer is usually "no."
And not all sources have 24 bits of detail. (Recordings made on the typical digital multitrack machines used in the 80s and 90s certainly don't.)
That doesn't mean that a clean unprocessed recording of music with a wide dynamic range made on no-compromise equipment won't show the difference clearly.
Speaking from experience, it certainly does.
Technically no sources have 24 bits of detail. The best you'll get from a real-world converter is around 22 bits.
What video? This thread is about an article.
> The point of higher sample rates isn't to capture bat squeaks, but to relax the engineering constraints on pre-post filtering.
I just said that, I think.
As Monty demonstrates, it's a fraudulent waste to try to sell the result as a product to the end listener.
The article is highly technical. Does anyone have a way to describe this phenomenon intuitively?
Therefore, rather than just being useless extra data, it can be actively harmful to the listening experience.
- 480p vs 2160p is compared on a cell phone propped up on a pillow at the other end of your living room
- experimental evidence shows that your eyesight is not good enough to pick up on the increased resolution; you've maxed out your sensory perception
- your phone stutters trying to stream at 4k, so the playback might actually be worse
When the high resolution image is downscaled poorly, some high (spatial) frequencies are aliased down into lower (spatial) frequencies, manifesting as blocky/jagged lines. The images are 2D data in the spatial domain, while the audio is 1D data in the temporal domain, but the aliasing of high frequency signal to lower frequency artifact is similar.
Viewing a high resolution image on a panel that has enough pixels to display it without scaling is analogous to listening to 192 kHz audio on a fantastic speaker system that can reproduce those high frequencies accurately, instead of causing distortion by aliasing them to lower frequencies. On the other side, viewing a high resolution image which has been downscaled poorly is analogous to listening to that 192 kHz audio on a realistic speaker system that cannot reproduce high frequencies, which results in those signals aliasing down into the audible range.
And as you say, there is a point where, for the viewer/listener's sake, it doesn't make sense to push for higher frequencies because even if you can build a panel/speaker that will faithfully reproduce those frequencies without aliasing, the eye/ear will not be able to perceive the additional detail.
Yet people have been reliably _unable_ to do this.
The gold standard in listening tests for this is an ABX where you are simply trying to show that you can discern a difference.
When properly set up and calibrated, people are unable to show that they can distinguish 48kHz from 192kHz.
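For the curious, the statistics behind "reliably unable" are just a one-sided binomial test on the ABX trials (a sketch; p < 0.05 is the usual convention for "reliable"):

    from math import comb

    def p_value(correct, trials):
        # chance of scoring at least this well by pure guessing
        return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

    print(p_value(9, 10))   # ~0.011: 9/10 (as reported elsewhere in the thread)
                            # is very unlikely to be luck
    print(p_value(6, 10))   # ~0.377: 6/10 is entirely consistent with guessing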
Moreover, by the numbers, audio hardware tends to work less well at higher rates (when the rates actually differ), because running at higher rates makes its own internal noise shaping less powerful. (Though for anything done well the distinction should still be inaudible.)
Finally, if you do have cruddy converters that sound significantly different at different rates nothing stops you from using a transparent software resampler (SSRC is well respected for good reason) to put the audio at whatever rate you want.. until you get better hardware. :)
That would have to be the noise. The math doesn't lie...
When you play something stored at 44 kHz, there are no ultrasonic sounds recorded.
At 192 kHz, there are, and the speakers may push some of the ultrasonic content down into the audible spectrum, causing distortion the human ear can hear.
Imagine the normal operation of a speaker as a swing, except you are pushing and pulling the swing all throughout the cycle as it goes up and down. Now, you can technically move the swing at a variety of frequencies if you’re holding onto it the whole time. However imagine as you push it back and forth (low frequencies), you also vigorously shake the swing at the same time (high frequencies). This would probably result in the chains rattling, similar to the unwanted distortions in the speakers caused by ultrasonic frequencies.
Another way to look at is that digital audio is not like digital imaging. There aren't pixels. Increasing the data rate does not continue to make the waveform more and more detailed to human auditory perception in the way that raising the pixel density does for human visual perception.
To describe it intuitively, forget your intuition that audio is like visual and start from "there is no metaphor between these two things."
Even analog imaging has concerns that are better described in the frequency domain than the spatial domain:
The difference is audio is "actually" bandlimited and frequency-based, but images are "actually" spatial. When you try using frequency math on images you get artifacts called ringing, which are easy to see with JPEGs or sharp scalers like Lanczos.
Of course audio isn't really frequency-based either, or else it would just be one sound repeated forever. So there's still ringing artifacts (called "pre-echo") and cymbals are the first to go in an MP3.
I.e. audio is sampled along one dimension, while images are sampled along two dimensions. Note that frequency-domain considerations play a crucial role in all optical design, including imaging sensors.
Early low-data-rate codecs, such as the one used for GSM mobiles, are obviously inferior, but still functional. I think a better analogy is that an iPhone 7 has a 1-megapixel screen, so there's no difference between a 1-megapixel image and a 5-megapixel image, except that one is much larger. Of course, visually you can zoom in (or move closer in real life), but audibly you can't.
For the mathematically inclined, this would probably be a good time to repeat: pixels are not little squares.
Digital distortion basically simulates high gain with soft clipping, which takes a narrow slice of the amplitude domain and magnifies it. The extra resolution has to be there for that not to turn into ugly-sounding shit with artifacts.
1 / (2 * pi * bandwidth * 2 ^ bitdepth)
So for full scale 44/16 signal it is about 0.1 nanosecond.
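Plugging in the numbers for a full-scale 16-bit signal band-limited to 22.05kHz:

    from math import pi
    print(1 / (2 * pi * 22050 * 2 ** 16))   # ~1.1e-10 s, i.e. about 0.1 ns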
More here (there are also some sample files for ABX tests there):
 - https://hydrogenaud.io/index.php/topic,108987.msg896449.html...
I haven't had time to check the linked article but if it's the one I'm thinking of I'm pretty sure it goes into this.
In the end though, double-blind testing (again, I think mentioned in the article) shows that there's a threshold (a bit below 48kHz, probably why 44.1kHz was chosen for CD audio) past which people can't distinguish higher sample rates any better than by randomly guessing.