24/192 Music Downloads Are Very Silly Indeed (2012) (xiph.org)
647 points by Ivoah on Aug 29, 2017 | 428 comments



Not entirely silly. Yes, the purported benefits of this high-fidelity audio are imaginary or even come with undesirable traits. However, when most "remasters" of pop music today have their loudness boosted to "loudness wars" standards for a target audience listening through earbuds in the street, the 24/192 downloads or SACD releases are often the only way to hear the album with real dynamic range.

I’m not sure how big that market is these days, though. I recently decided to move from a block in the city to a house in the surrounding countryside purely to get a quieter listening environment that lets me really enjoy the high-dynamic-range recordings I have – I listen to a lot of classical music, especially the avant-garde with its even greater range, e.g. the Ligeti cello concerto starting from pppppp. Yet even among my friends who are really obsessed with such music, seeking out a better listening environment seemed an extreme measure to take.

So, people who a generation ago would have invested in higher-end equipment (not audiophile snake oil, just better speakers) and who would have sought silence are now giving in to listening to music on their phones or cheap computer speakers. It’s a big shift.


The loudness war seems to have been a temporary problem, prevalent mostly because music was primarily listened to as downloads of individual files or small collections of files.

Music streaming services have since become a very important listening medium. Spotify, Google Music, Apple Music and others normalize loudness levels. This neutralizes the loudness war, since it makes loudness-war treatment of music useless, at least when listening on streaming services [1]. Music mastered during the loudness war will still have problems with dynamic range, but the perverse incentives that caused the war are simply not as relevant for new music.
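To make the mechanism concrete, here is a minimal sketch of what that normalization amounts to -- assuming a crude RMS-based gain towards an illustrative target, not the LUFS measurement the services actually use:

    import numpy as np

    def normalization_gain_db(samples, target_rms_db=-14.0):
        # Gain in dB needed to bring a track's average level to the target.
        # A hyper-compressed "loud" master has a high RMS, so it gets turned
        # down; a dynamic master gets turned up (or left alone). Either way
        # the loudness-war advantage disappears on playback.
        rms = np.sqrt(np.mean(samples.astype(np.float64) ** 2))
        rms_db = 20 * np.log10(rms + 1e-12)   # guard against log(0) on silence
        return target_rms_db - rms_db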

I agree the loudness war was a huge problem a few years ago, but changing trends in how music is listened to are solving it. The way you present the loudness war problem is therefore somewhat out of touch.

[1]: https://motherboard.vice.com/en_us/article/ywgeek/why-spotif...


The loudness war is a matter of degree. Sure, steps have been taken so that we maybe won't see another Death Magnetic, where the compression resulted in outright ear-bleeding distortion. But when most music today is listened to on portable devices in loud environments, it’s hard to believe that we are ever going back to the level of dynamic range common before the loudness war. As I said, even some classical music (and jazz) labels are now issuing their music with considerably limited dynamic range, and a dynamic range as ample as it traditionally was is available only to those who buy the SACD (and listen to the SACD layer of a hybrid SACD, not the CD layer) or the high-resolution download.


Add to this the 'bass arms race' that has been happening over the past 15 years.

It is especially noticeable if one listens to "modern" CDs vs older CDs vs the radio. I suppose producers now feel compelled to exercise subwoofers and thump the audience, or to compensate for the crappy earbuds of the target audience, or maybe that's just what people want, because even radio commercials are complicit -- it's difficult to listen to someone talk with "thump/thump/thump" drumbeats in the background. Often, I simply turn it off.

In my car, I have the bass response dialed back by about 50%, and even dip the mid-range by about 15% to get what sounds to my ears like a flat response.

I have mild hearing loss in the 250 Hz to 1 kHz range, so my ears are already attenuating the signal -- I can only imagine how "bass heavy" it must sound to someone with normal hearing!


The only thing that any good producer feels compelled to do is create the best possible record.

There is nothing inherently good or bad in music production, no timeless rules as to how much treble or bass, compression, distortion, reverb or anything else you need. It has all been done to the extreme and what is deemed good is subject to constant change.


Sometimes car head units have an undocumented bass and treble boost, so flat per the settings on the unit is far from flat. The kinds of compensations you are making get the unit closer to flat when it has been configured with undocumented tuning. You could measure your head unit to find out.


The loudness war was great. I totally appreciate the extremes it reached and the new music production techniques that were developed in response. Some utterly squashed, distorted albums like The Prodigy's Always Outnumbered, Never Outgunned are masterpieces. Same goes for some 10-15 year old Mr. Oizo, SebastiAn albums. Over-compression can be very aesthetic. Dynamic range is overrated...


Some people like it (obviously, the "loudness war" would've never occurred if it didn't work commercially), some people don't.

My personal example of a song where it doesn't work is Johnny Cash's "Hurt". Around 3 minutes into the song, Johnny Cash's vocal noticeably distorts. From my perspective, the distortion is absolutely a loudness-war phenomenon: if you look at the waveform in an editor, the song starts off pretty "hot" given that there's a very noticeable crescendo at the end. At the end of the song, where the music is pretty much maxing out, there is no headroom left. There is no other option to make the voice stand out but to push Johnny's vocal into distortion.

I've seen people on message boards say the distortion in the vocals adds "intensity". Personally, I'd love to hear an un-loudness-war version where Johnny Cash gets several dB more to cleanly sing over the music, where dynamics, not distortion, drive the intensity. For my tastes, it would be much preferred.


50 Cent's Get Rich or Die Tryin' is an excellent example of a poorly mastered album produced around the same time as well, with vocals distorted throughout almost (if not) the entire album. I can't even imagine what the engineers were thinking when recording these. It's similar to the overuse of Auto-Tune in country these days -- see George Strait's "The Cowboy Rides Away".


That's a pretty homogeneous sample. I can't imagine Bergtatt or Beethoven's 6th being as enjoyable on a poorly mastered recording.


I agree that loud albums can still be amazing. Multiple albums on Wikipedia's 'loud' albums list [1] belong to my favorite albums of all time.

[1]: https://en.wikipedia.org/wiki/Loudness_war#Examples_of_.22lo... (not including the 'remasters')


I have to say that (IMHO) The Prodigy went over the top with that on The Day Is My Enemy. Thing is, you can hear it on the tails of the crash cymbals: you can hear them get ducked in a "stuttery" manner when the compression from the drums hits. Great production technique should be able to work around that and make it loud without messing up the cymbal tails (or maybe just truncate them -- just don't let them stutter).

I found the album a bit tiring to listen to because of the continuous loudness. No particular parts really stood out to me, because everything was just sort of loud and it seemed to go in one ear and out the other. I could pay really close attention -- one can always listen harder :) -- but again, it's a bit tiring and you can't do much else.

Then again, I just really prefer their first 2-3 albums, which have quite a different sound altogether.

And I'm curious which Oizo albums you're referring to? I love his stuff, and yes, some of his tracks are quite loud (but not all the time, not the whole track), but they never quite struck me as the typical "loudness war" type of loudness. Unless I'm thinking of the wrong tracks here (No Day Massacre? Last Night a DJ Killed My Dog?), he seems to like to hit a well sound-designed overdriven bass buzz with not too much else playing at the same time (and where there is, close attention to envelopes, cutting off hits, millisecond transient timings). If you do that right and just normalize to max amplitude, you're 95% of the way there (at which point my experience is that compression on top of that usually fucks up that careful detail work, but maybe I need more practice or a different plugin). Possibly I'm thinking of the wrong tracks here; at least you gave me a reason to re-listen to his stuff with a different ear/attention (loudness sound-design), which is always interesting :)


It's a genre thing. Lotta dance music these days sounds wrong if it doesn't have the particular distortion of overdoing the hard limiter. Like rock'n'roll without distortion on the guitar.


Still very silly IMAO. Very, very few recordings are done in 24/192 because of the many implications this has for your entire studio setup.

You'll need an exceptionally good clock to start with, and all other equipment needs to align to that clock. Then all the plugins/processing you use need to be in the same 24/192 domain, otherwise your signal is reduced to the limit of that plugin/processing and all previous efforts are lost.

Most music producers use samples, most of which are 16/44, so what's the point of trying to get that to 24/192, filling the signal with zeros?

If a piece of music is, on the very rare occasion, truly 24/192, then the listener who downloaded the track still needs an exceptionally good clock (both expensive and hard to find) to play it back without signal reduction.

IMAO 24/192 is just a marketing thing for audiophiles who don't really understand the implications. 24/96 should be a reasonable limit for now, although personally I think 24/48 is enough for very high quality audio.


You ought to have read the rest of this thread before leaving this comment. I’d be very happy with 24/96 or 24/48 if I could get recordings at those resolutions that aren't given a loudness-wars treatment. Since I often can't, I have to go for 24/192 or SACD even if all that extra room is completely superfluous, just because that format was decently mastered.

> Most music producers use samples...

Most people interested in better-quality sound in this particular context aren't listening to contemporary electronic music built from samples. 24/192 or SACD is desirable mainly for reissues of older recordings in pop or jazz genres, where those formats were mastered with higher dynamic range while the available CD versions or lower-bitrate downloads were mastered with loudness-wars compression. The format is also attractive to classical music listeners, because SACD gives you multichannel audio; and some classical labels are now giving loudness-wars treatment to the non-SACD or non-24/192 formats of a particular new release.


"some classical labels are now giving loudness-wars treatment"

That's depressing.


Haha, at least classical IS usually recorded well, being an audiophile metal-head these days is depressing. If I use my normal setup almost everything clips badly.


This depends on whether you are in the studio or are just playing back things.

In the studio, I would say that 24 bit at least should be the norm for recording purposes.

24-bit recording gives you noticeably increased headroom (about 20 dB in practice). This gives you quite a bit more flexibility recording lower levels without concerning yourself about the noise floor. The difference isn't huge for most prosumer setups in practice, but given that the processing power and storage capacity of computers make recording in 24 bit trivial to do, there really is no reason not to record 24 bit these days IMHO.

Sample rate also comes into play, mainly if you have older plugins that do not oversample. Some of the mathematical calculations involved, particularly if they are quickly responding to audio changes (eg limiting / compression, distortion), or are using "naive" aliasing-prone algorithms (eg naive sawtooth wave vs. something like BLEP / PolyBLEP etc.), can introduce frequencies beyond the Nyquist that may translate into aliasing. These days, I would say most plugins do oversample internally or at the very least give you the option to do so. There's also a VST wrapper to over-sample older plugins as well (http://www.experimentalscene.com/software/antialias/). So I do not think recording over 44.1kHz is very necessary these days. I don't discount opinions from people that recording at 192kHz "sounds better", though, given the possibility that they are using plugins that are prone to aliasing at 44.1kHz rates.
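To make the aliasing point concrete, here is a toy illustration of my own (not from the article) showing where the harmonics of a naively generated 3 kHz sawtooth land after sampling at 44.1 kHz -- everything above Nyquist folds back into the audible band, which is exactly what oversampling inside the plugin avoids:

    fs = 44100.0
    f0 = 3000.0
    nyquist = fs / 2

    for k in range(1, 16):
        f = k * f0                                   # harmonic of the naive sawtooth
        folded = abs((f + nyquist) % fs - nyquist)   # frequency after folding about Nyquist
        tag = "aliased!" if f > nyquist else "ok"
        print(f"harmonic {k:2d}: {f:8.0f} Hz -> {folded:8.0f} Hz  {tag}")

    # At fs = 192000 all of these harmonics would still sit below Nyquist, which is
    # why a non-oversampling plugin can genuinely sound cleaner at a higher rate.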

I personally do not see any benefit beyond 16/44.1 for playback of most recordings. Maybe 24 bit would be useful for heavily dynamic music (one of the few categories where you generally find this is orchestral music), but I'm thinking even here the 96 dB range of 16-bit audio should be enough for most cases.
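For reference, the back-of-the-envelope numbers behind those figures (a sketch of the standard ~6 dB-per-bit rule; real converters deliver far less than the 24-bit theoretical value, which is where the roughly 20 dB of practical extra headroom mentioned above comes from):

    def dynamic_range_db(bits):
        # quantization-noise model: ~6.02 dB per bit plus 1.76 dB for a full-scale sine
        return 6.02 * bits + 1.76

    print(dynamic_range_db(16))   # ~98 dB
    print(dynamic_range_db(24))   # ~146 dB on paper; no analog chain gets close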


> This gives you quite a bit more flexibility recording lower levels without concerning yourself about the noise floor.

To be fair, that only applies to the digital part of your signal chain. The analog portion is going to have nowhere near 24 bits of room above the noise floor.

The article is pretty clear that 24/192 can be reasonable for production-- it's just not reasonable for playback.


First off, I agree that 24/192 is not very useful in most circumstances (also, for the dynamics thing, you still need a 24/192 master done without all the compression).

But your arguments aren't quite right, IMO. If you have a 16/44 sample, and you don't play it at full volume, you get some use out of those extra bits. Especially if you have a volume envelope.

Also many modern samples are actually saved as 24 (or 32 bit even). Especially if they're my own creation from noodling around with softsynths, but they're shared like that as well, obviously.

Then, if you apply a plugin or effect that supports 24/192 output on a 16/44 sample, you still get useful information in those additional bits, even if the sample did not have it. Think of the very quiet end of a reverb tail, for instance.

But that's for producers. It's always good to have your working stuff in high quality, you never know how far you'll end up amplifying those low order bits.

So I can see the use for 24 bit audio (in certain contexts), but I'm really not so sure at all what the 192kHz is good for. Since it's waaaayyy above the human hearing range, all I can think of is dithering. You can hide a lot of your dithering noise in the ultrasonic range (which almost seems like free lunch IMHO) and then ... you obtained even more (virtual) bits of fidelity! Which you didn't really need cause you were using 24 bit audio in the first place.
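As a concrete illustration of the dithering step -- my own toy sketch with plain TPDF dither and no noise shaping, so take the details with a grain of salt:

    import numpy as np

    def to_16bit_with_dither(x, rng=np.random.default_rng(0)):
        # x: float samples in [-1, 1]. Add +/-1 LSB triangular (TPDF) dither
        # before rounding, so the quantization error becomes benign noise
        # instead of distortion correlated with the signal. A noise-shaping
        # filter would additionally push that noise up towards the ultrasonic
        # range, which is the "free lunch" mentioned above.
        lsb = 1.0 / 32768.0
        dither = (rng.random(x.shape) - rng.random(x.shape)) * lsb
        y = np.clip(x + dither, -1.0, 1.0 - lsb)
        return np.round(y * 32768.0).astype(np.int16)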

I agree it's mostly a marketing gimmick, otherwise.


Studios work in 32-bit float all the time...


    the 24/192 downloads or SACD releases are often the only way to hear the album with real dynamic range.
Yeah, this is often true. Some of Green Day's (a band I've never previously been into) albums were released in 24/192 or 24/96 a few years back and they actually sounded really great, with real dynamic range.

Also true? The completely absurd fact that sometimes a vinyl rip of an album will actually have the highest dynamic range, even though vinyl has a much smaller dynamic range than 16/44 audio. Bands often use "loudness wars" mastering for the digital release and then proper mastering for the vinyl release.


> Yeah, this is often true. Some of Green Day's (a band I've never previously been into) albums were released in 24/192 or 24/96 a few years back and they actually sounded really great, with real dynamic range.

Back in the day you'd occasionally see remastered "gold" discs released. The advertising made a big deal about the disc material. Those probably sounded different too (they managed to sell them at a great premium) but with those, they sounded better because of the newer remastering, not the disc technology.

It's certainly possible this is the case with some of those releases remastered for SACD as well. The label probably didn't give Green Day a huge amount of money for production when they made Dookie, for example, but it eventually sold 20 million copies and additional production made sense. If it sounds better to the listener it's a real benefit, too, but it is quite likely not down to the playback technology.

Two oddball things I've noticed about remasters in the last couple of years: there's some agreement out there that the newest remaster of Hendrix's records is not the best. And King Crimson, who have an audiophile following and decorate their catalog with remasters with alarming regularity, removed a little dynamic range (an oversimplification) from the newest remaster of Larks' Tongues in Aspic when remastering the CD and mastering for DVD-A, because people very understandably complained about the (technically very good) previous version being too quiet. Audiophiles say they want dynamic range, but...


Yeah. And you know, as a casual audiophile, that's my big frustration with this hobby. There's so much bullshit marketing (and audiophiles who believe it) that it really gives the whole hobby a bad name. (On the bright side, there are a lot of objective audiophiles and with just a little bit of knowledge you can get fantastic sound for very little money...)

    The label probably didn't give Green Day a huge amount of
    money for production when they made Dookie, for example,
    but it eventually sold 20 million copies and additional 
    production made sense.
I don't know about Dookie, but their later albums were recorded with full dynamic range and then squashed down into loudness-wars-style mush.


Yeah, it's still the artist's choice how much dynamic range is needed to express their music. More dynamic range just for the sake of it seems just as bad as sausage-flattening to get as loud a sound as possible.


Vinyl DR values are not accurate; vinyl made from the same digital master as a CD will usually have a DR value higher by 3 or more points.


> Not entirely silly. Yes, the purported benefits of this high-fidelity audio are imaginary or even come with undesirable traits. However, when most "remasters" of pop music today have their loudness boosted to "loudness wars" standards for a target audience listening through earbuds in the street, the 24/192 downloads or SACD releases are often the only way to hear the album with real dynamic range.

No. It's still entirely silly. The reason? The dynamics are unrecoverable once squashed by a limiter or compressor during the remastering process. The fidelity of the delivery medium is moot after that happens.


This. Once the "master" is treated with the maximizing limiters that are used in the loudness war, the files are just rendered into their separate file formats.

If the SACD promised that they reloaded the source tape/Pro Tools project/whatever DAW was used and remixed/remastered the songs to actually have dynamic range, then I would be interested. As far as I am aware this isn't happening, and it is implausible for any record considered a classic.


Loudness-wars compression is a product of the digital recording era, and quite a few years into it, too. A lot of music that people still want to listen to today predates all that.

Many SACD reissues do go back to the source tape. This is a frequent cause of complaint with SACD reissues of classic jazz recordings from the 1960s: sometimes you get better sound in terms of dynamic range than any previous CD issue of that recording, but in the meantime the source tape may have deteriorated in parts.

Even with recordings from the “loudness wars”, there is sometimes room for dynamic range improvement when remastering. A good example is Rush’s album Vapor Trails. This was an infamously botched recording upon its original release, on a scale with Death Magnetic. Because loudness-wars treatment plagued the original tracks before mixing, the damage could never entirely be repaired. However, the additional compression that had been applied during the transfer to CD could be left out the second time around, so the album was eventually reissued as Vapor Trails Remixed, and while still flawed, that reissue has a lot more room to breathe than the original CD release.


Why implausible?


The scope of work involved is too great to remix and remaster all the records. I also know that most indie records are not kept in their multitrack form as meticulously as big label acts.


How does 24/192 help with dynamic range?


It has nothing to do with the 24/192 values themselves, but rather the treatment given to the recording during the mastering. Because 24/192 and SACD are "audiophile formats", the engineers who master the recordings expect them to be listened to in serious listening environments that are relatively noise-insulated so that very soft parts can be heard clearly, and there are no complaining neighbours so the loud parts can really blast. Consequently, an ample dynamic range is preserved.

"Remastered" reissues on the other formats – CD or lower-bitrate downloads – are nowadays expected to be listened to through cheap earbuds or speakers and perhaps in noisy urban environments. So, the engineer applies "loudness wars" treatment, compressing their dynamic range so the listener can still hear the quiet parts even with all the noise around them.


You are proving the point. Treating the audio well, not giving in to the loudness, is merely correlated with 24/192 releases (if you say so); there is no causation.


Sure, there is no causation of "higher bitrate necessarily makes for better sound", but the correlation of "higher bitrate format just happens to have better mastering" is often strong enough to make these the go-to formats. There are a large number of classic recordings that are difficult to obtain in the digital era without harsh compression being applied to them, so the 24/192 or SACD release is the most convenient way of hearing them with decent dynamic range.


Right, absolutely. 16-bit samples give 100 dB+ of perceived dynamic range, more than would ever be needed.

The other poster is not saying otherwise, though. They're just saying that "hi-def" formats, while technically unnecessary for end-user listening, are often the only way to obtain a decently-mastered recording.

There's no technical reason for things to be that way. But that's how things are.

It's sort of like buying a new car. You want the more powerful engine? Well then you also have to buy the package with heated seats, the moonroof, and the fog lights. There's no technical reason for it, but that's the only way they'll sell it to you.


But there is a technical reason for them, as the tools used to master/make music work better at the higher formats. It's not "superstition", it's digital audio.


Of course. That's why I specifically said "end-user listening."

It's also what this entire discussion's about.

Of course formats above 44.1/16 are useful for professional work; nobody's ever said otherwise. Just like graphics professionals work with high-res images and lossless formats even if the end product is a .jpg on a web page.


Yes, but this is about Music Downloads.


In other words, it is not a technical advantage, but a social/political one.


> It has nothing to do with the 24/192 values themselves, but rather the treatment given to the recording during the mastering

I noticed a similar thing with TV a few years ago. Despite watching a standard-definition channel on an SD TV, some programmes had noticeably better image quality. I think these had been shot and edited in HD, and although the 'last mile' was still in SD there had been less degradation in the processing, so the final picture was much better.


It’s the same reason downsampling 4K to 1080p looks better than shooting in 1080p.


this is roughly what the "Mastered for iTunes" badge means on iTunes downloads

the mastering engineer has to be approved and there are some minimum dynamic range standards

also 24 bit (but not 192khz) master files have to be supplied

reportedly some of the streaming services (Spotify, YouTube) are now applying some 'loudness normalisation' which will bring some of the bad loud masters into line (it won't restore their dynamic range but will make them similar loudness to other tracks)

the loudness wars were never about what's good for listeners, but rather a competition for tracks to sound louder than other people's tracks when played consecutively on radio or in your music player


Mastering for "high fidelity" 24-bit audio on iTunes is an oxymoron. The compression algorithms used for AAC and MP3 are going to worsen the sound quality in a manner that renders the extended dynamic range of 24-bit audio useless.


even on mp3s where the bitrate is low enough to hear some compression artefacts you would still be able to perceive the difference in dynamic range

and the iTunes files are 256 kbps AACs, you can't hear the compression

remember that 'compression' in this context is data compression and not audio compression (which acts directly on the dynamic range of the source)


> the iTunes files are 256 kbps AACs, you can't hear the compression

I can most certainly hear the compression when compared with CD digital audio.

I fully understand the difference between data compression and dynamic range (not audio) compression.

What I'm saying is, lossy data compressed audio formats are already compromised enough to rule it out as a medium for audiophile use. Worrying about the dynamic range at that point is moot. It's going to be played on tiny, terrible sounding speakers.


> I can most certainly hear the compression when compared with CD digital audio.

with 256kbps AAC, really? yeah, MP3 is old and even at 320kbps it throws away everything above 18kHz (which I can't hear personally, but some people can). However AAC is newer and better and blows MP3 out of the water (so do OGG and OPUS, btw). We've gotten a lot better at psychoacoustics since MP3 started out. I strongly doubt you could hear it in a proper A/B test.


With the higher 24 bit resolution, would it not be possible for someone to "decompress" the dynamic range with less distortion than 16 bit?


No, compression can't be undone. It destroys information. Especially with a limiter. The mastering chain will have typically done other things to the signal as well.


Thanks for the clarification, I kinda agree.

Then, 24/192 is mostly a "weak signal" to help you estimate if the audio was treated with care.


That's the 24-bit part, which provides more headroom than 16-bit. Normally 16 is plenty, but consider an example track which is so heavily compressed that only the top 1% of the range is used:

16 bits = 2^16 * 0.01 ≈ 655 discrete levels

24 bits = 2^24 * 0.01 ≈ 167,772 discrete levels


I think you have the wrong end of the stick there. Even if a track is heavily multiband compressed, it's still using the full range of levels, as audio will oscillate between -ve and +ve for each cycle, potentially using most of the 65,536 levels available at 16-bit (and the 16,777,216 levels at 24-bit). The 0.01 is not relevant.


No, I don't have it wrong. Compression means it's effectively using less of the range and the dynamics could be represented with fewer bits in an optimal encoding. All the levels may be used in track, but it's more biased towards the high end.

To help you understand, imagine 1-bit audio. What would it sound like? For each frequency, you can only have a single volume, i.e. it's maximally compressed.


I still think you have it wrong; (Dynamic Audio) Compression does mean it may be using less of the dynamic range when taken overall, but the waveform will still take up the entire span of the available bit depth. Granted that with an optimal encoding this could be biased towards the "high" end (but bear in mind that would have to be both +ve and -ve), but I still think this would miss the point, and I think it would also create distortion in the 'zero' region of the audio. If you open up heavily compressed music in an audio editor (pick any recent EDM track), it will still cross the zero point, and have maximum and minimum levels present. It will also have 'zero' (i.e. 32768 if the range is 0-65536 in 16-bit audio) present - often at the beginning and end of the track, and the audio will be centred about that in the vast majority of tracks.

I'm not sure the 1-bit extreme helps with understanding this, particularly because the audio we'd be dealing with will be a mix of many frequencies; the waveform is an aggregate of the overall 'signal' at each frequency present in the mix; this is one of the reasons why multiband rather than single-band compression has become so popular (as it allows you to get the 'max' in each frequency band rather than one band dominating the overall result of the compression).

I think there's a difference to take into account when considering any given momentary sample and the overall effect - yes, compression does reduce the dynamic range of the music, but you would need some sort of variable bit depth which was linked to the instantaneous amount of compression being applied to get any kind of workable lower-bit-depth encoding scheme, which seems like a lot of complexity for no significant gain (to me?).


1-bit audio sounds perfectly fine at a high enough sampling rate (even higher than 192kHz), with proper dithering. That's how D/A converters work.


I can confirm, you have it wrong.


But also correct here:

> Compression means it's effectively using less of the [dynamic] range and the dynamics [range] could be represented with fewer bits in an optimal encoding.

Lower DNR can indeed be encoded with fewer bits per sample.
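In the rough ~6 dB-per-bit sense it's easy to put numbers on that (a back-of-the-envelope sketch, ignoring dither):

    import math

    def bits_needed(dynamic_range_db):
        # invert the ~6.02 dB-per-bit plus 1.76 dB rule of thumb
        return math.ceil((dynamic_range_db - 1.76) / 6.02)

    print(bits_needed(96))   # 16 bits
    print(bits_needed(60))   # 10 bits -- a crushed master "uses" far fewer bits than its container holds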


As Barbie would say: Audio engineering is hard! Let's play with the calculator.


Barbie has piloted commercial airliners, been on a scientific space mission, and has an M.D., so I don't think she'd be likely to say that.


https://en.wikipedia.org/wiki/Teen_Talk_Barbie

The meme "Math class is hard, let's go shopping" is only slightly apocryphal. Two of the voicebox lines were "Math class is tough" and "Want to go shopping?"


No, that has nothing to do with it. See my reply above.


Your reply is not the whole picture. Compression reduces the effective number of bits of resolution.


If you had an RGB (24 bits) image that is mostly very dark, would you say that somehow because the image doesn't use the full range of possible values, the image quality suffered? Would you say that changing the format/bit-depth would actually lead to a perceived increase in quality when looking at the image?


I don't understand why you said I was wrong in the other comment, but you give here an appropriate analogy which makes the effect obvious.

Because we only have 8 bits per channel, we all can see banding in (dynamically) compressed images. That's what increasing the depth improves.
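A toy version of the analogy (my own sketch; it just counts surviving quantization levels instead of rendering an actual image):

    import numpy as np

    # A smooth gradient confined to the darkest 1% of the range -- the visual
    # analogue of a signal that only occupies a sliver of the available range.
    gradient = np.linspace(0.0, 0.01, 1000)

    print(np.unique(np.round(gradient * 255)).size, "levels survive at 8 bits")     # 4
    print(np.unique(np.round(gradient * 65535)).size, "levels survive at 16 bits")  # 656

    # Brighten the 8-bit version back to full range and those four levels become
    # visible bands; the 16-bit version still looks smooth.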


Did you check out the article? 16 bits can cover the range very well, more than you (or anyone) could ever distinguish


Sigh. Please see my other reply. It isn't a matter of bit depth or sample rate (indeed, 16 bits could be more than enough). It is a matter of how engineers serve the targeted markets of each format.


Perfectly valid point. It's not a technical limitation that "normal" recordings are spoiled by compression, but it's a commercial limitation (or whatever you call it) that has a very real impact.

You cannot buy good recordings in other formats, in this format you can. So there's a market - not a very big one as such, and maybe created for all the wrong reasons, but it is there.


I'm not sure how it's not even more "silly" then. Investing in additional infrastructure, equipment, and effort, when all you need to do is turn down the compressor ratio in the mix... Sounds silly to me.


> all you need to do is turn down the compressor ratio in the mix

The problem here is that the "you" in this sentence is not "me", or even an entity I can really influence, much less control.

It's not really even the audio engineer whose boss is telling him to turn up the volume so high that only the top few bits of the 16-bit recording are in use. The boss is saying this so that the song can be heard when the user turns the volume down because of the obnoxiously noisy commercials on the radio. Those commercials have found that they're more effective when they turn the volume way, way up. And they don't give a whit about the quality of their audio as long as they can sell more cars or tickets or whatever, much less the quality of the songs that play after them, much much less the quality of the songs that someone downloads to listen to offline in a nice listening environment without commercials.

The solution isn't just "turn down your compressor ratio", there's a big, hairy Nash equilibrium/politics problem that can be bypassed by offering 24/192 as a secondary product. If you want to remaster it to 16/48 after downloading, you're welcome to do so.


Well, SACD at least offers one further advantage beyond lower compression: multichannel audio, which is a big deal for classical music, especially works involving spatialization. Investing in an SACD player is well worth it if that repertoire interests you.


I'm an amateur at all this (and maybe the article stated this explicitly); I use GarageBand. I find the main reason to do your recording and mastering in 24 bit is so you have room to change the level of each track without the sum of them clipping. When you are trying to set the levels of each track you have two constraints if you use 16 bit: you need to keep all of them low enough that the overall recording doesn't clip, but you also don't want to overcompensate and use only half the dynamic range available in 16 bit.

Doing this process in 24 bit gives you a large margin of error to play with. There's no real point to keeping that for the recording people are going to listen to.


It is entirely silly. The belief that there is causation between a high-fidelity, high-dynamic-range release and 24/192 is exactly the issue.

Yes, those high dynamic range releases usually get published in 24/192. No, the fact that they get published in 24/192 does not, as far as human hearing goes, add anything to the dynamic range or otherwise the fidelity of the recording.

Since the correlation is so strong, it is of course entirely understandable that people assume causation exists.


I've noticed the same thing with LPs. While LPs are technically inferior to Red Book CDs, they're often mastered better.


Why is this? I only have 16-bit 44.1kHz "normal" CDs and I notice some terrible mastering jobs, e.g. RHCP's Stadium Arcadium loses the snare on the first track and is like listening to minutes of distortion, yet apparently the vinyl was really well mastered.

Is there a technical reason why the mastering is so different for the two mediums, CD versus vinyl?


When you boost the loudness too much (for your specific turntable to handle), the needle will simply jump out of the groove, not kidding.


This is true, to a certain extent. But it doesn't mean that masters for vinyl always have more dynamics left intact.

More often than not these days, the same compressed master is used for the vinyl. To combat the groove-jumping problem, the overall level is simply dropped.


I've seen it argued that multi-disc setups are where the loudness wars started. When you can switch between CDs quickly, you start comparing how they sound. If the volume knob is left alone, the CD that is mastered louder is going to sound louder, and thus better.

Thus, people started mastering CDs for loudness.

An alternative idea of mine is simpler. Playing music too loud is considered worse than playing it too quiet (too quiet sounds worse, but too loud is still bad, and it also bothers other people). So, when you need to pick a volume setting for your collection, you bias towards setting it lower, so the really loud discs don't become too loud. Thus, the quieter CDs are annoying because they always sound quiet, whilst the loud CDs sound about right, because your volume is much more suitable for them.


Just guessing, but maybe the CD version is also used for radio play, and needs to be mastered as loud as possible so as not to lose the war?

I guess special purpose releases don't usually end up on the radio so they can be mastered for people who actually appreciate music. ;)


Assuming equivalent quality of CD playback equipment and LP playback equipment, LPs provide superior sound quality. LPs are analog, so what reaches your ears is undiluted sound. CDs are digital, so there are gaps in the sound, and there's the analog audio conversion that must happen to record to a digital/CD format, and digital audio conversion back to analog that must happen to hear the sound from a speaker. Both conversions reduce fidelity.

As ever this difference can be impossible to detect if the equipment and environment aren't of a sufficient fidelity/quality.


>digital, so there are gaps in the sound

What kind of gaps? There are no gaps. Sampled signals perfectly represent the original wave up to half the sampling frequency. Analog systems are inevitably limited in their frequency response as well, so, given the same bandwidth, there would be no difference at all.

In the real world, imperfect A/D and D/A conversions are typically still far less destructive than all the mechanical and electromagnetic sources of noise that affect analog systems. You can't consider one but not the other.
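The "no gaps" part is easy to sanity-check numerically. A sketch of my own, assuming an ideally bandlimited input: rebuild the waveform between the samples with sinc interpolation (what an ideal reconstruction filter does) and compare it with the original wave.

    import numpy as np

    fs = 44100
    f = 10000.0                        # well below the 22050 Hz Nyquist limit
    n = np.arange(400)
    samples = np.sin(2 * np.pi * f * n / fs)

    # Evaluate the reconstructed waveform at points *between* the samples.
    t = np.linspace(150 / fs, 250 / fs, 1000)
    recon = np.array([np.sum(samples * np.sinc(ti * fs - n)) for ti in t])
    truth = np.sin(2 * np.pi * f * t)

    # The error is small and shrinks as the window of samples grows: the values
    # between the samples are fully determined, so nothing is "missing".
    print("max reconstruction error:", np.max(np.abs(recon - truth)))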


Try watching this video (hope I found the right one):

https://www.youtube.com/watch?v=cIQ9IXSUzuM

I think you're right that recording equipment has a long way to go, though; regardless of format I think people can relatively easily distinguish real acoustical instruments from recordings.


Yep - non-shitty masterings are the one legitimate reason for "hi-res" audio. Even if you could downsample it to 16/44 with no perceptible difference.


What are some examples of popular songs that had their loudness boosted?


Almost any reissue of material that has already appeared on CD in an earlier generation. Compare the post-millennial CD remasters of Slowdive's Souvlaki or the Cocteau Twins discography to the original CD issues of those albums in the early 1990s. Generally anytime an album is announced as "newly remastered" these days, the sole significant change is decreasing its dynamic range.


Any examples that I can listen to online?


Sadly most people have never experienced good sound. When you hear it, though, you know. But taste can be very different.


I have! I really have! My friend spends all his money on audio. I sadly just don't understand. You are correct. I was going to say that he pays infinite money for what to me is a 10% increase in enjoyment, but perhaps for you and him, your brain is better with sound, and you guys might get a 1000% improvement for your dollar.

Although in any case, nothing justifies the pricetag dished out to audio enthusiasts.


A 100-150 dollar headset of the right type is hardly "infinite money" and I would call it paradigm-changing compared to what most people seem to be using, at least over here.

It's only at higher tiers that the difference might really be imperceptible to many ears.


I always thought pickups (the needle of a turntable) are the best example for that:

A $200 pickup is certainly better than a $20 needle, maybe even 10x better. The difference between a $200 pickup and one that goes for $2000 is minuscule. There certainly is a difference, but it's never as big as between the $20 and the $200 model.

That said, there are listeners who believe in the value of a 2000$ pickup and derive a lot of enjoyment from the difference to a lesser model. Who am I to say they're wrong?

Now when it comes to a manufacturer of very expensive cables (for example): Don't make me laugh...


For all its flaws even a Beyerdynamic DT 770 Pro is a very good headset that's head and shoulders above...


As someone who knows nothing about 'good' headsets, should I get the 32, 80 or 250 Ohm version?


It shouldn't make too much difference unless you like to listen to them really loud (damaging-your-hearing levels).

Two things come to mind though: voltage/current delivery of your amp, and damping ratio.

The first depends on the characteristics of your amplifier. Some are better at delivering current, some voltage. The lower impedance (32) is better suited to high-current/low-voltage sources, which includes most portable devices, phones, etc. Conversely, the higher impedance (250) is better for high-voltage/low-current sources like tube amps.

The second is about the ratio of the headphone impedance to the amp output impedance. You want a high ratio, so if your source has a large output impedance then the higher impedance headphones will sound better. Good headphone amps sometimes specify the output impedance, or you can measure it.
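To put rough numbers on the damping-ratio point (illustrative figures I'm assuming, not specs for any particular device):

    def damping_factor(headphone_ohms, source_output_ohms):
        # Higher is better: the source controls the driver more tightly, and the
        # headphone's impedance swings change the frequency response less.
        return headphone_ohms / source_output_ohms

    print(damping_factor(32, 10))    #  3.2 -- low-impedance cans on a highish-impedance output
    print(damping_factor(250, 10))   # 25.0 -- the 250-ohm version on the same output
    print(damping_factor(32, 0.5))   # 64.0 -- the 32-ohm version on a dedicated headphone amp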


> Conversely, the higher impedance (250) is better for high-voltage/low-current sources like tube amps.

Headphone outputs of mixers fall into this category as well. Proper audio interfaces and sound cards have no issues at all driving 250 Ω to deafening volumes. Laptops have no issues either (for me).


Yeah, the full-on audiophile stuff is mostly nonsense as far as I can tell, but my god, I do not understand how so very many people put up with those crappy Apple earbuds as their primary sound playback. The difference with a proper set of headphones is massive.


Anything is better than the cheap ones you get for free. I guess people want to have "the original" lol. But then when you get better headphones you can hear the background buzz from low-end devices ...


I feel really sorry for audiophiles because they have to spend lots of money on expensive equipment to eliminate the perceived shortcomings that the rest of us are not troubled by during our enjoyment of music.


There is plenty of music that you wouldn't even get to enjoy if you didn't invest in some higher-end equipment and made some effort to improve your listening environment, because you wouldn't be able to hear it at all: the opening of Schnittke's Symphony No. 3 in the BIS recording, Nono’s Prometeo, Ligeti’s Cello Concerto as I mentioned above, most of Knaifel’s music issued on ECM or several of that label’s Pärt recordings, etc. That is music of extremely pianissimo dynamic. And no, you can’t just turn the volume up, because if you did that, the loud parts that come later would blow out your ears: you have to have a listening environment and responsive speakers that are capable of representing the whole dynamic range of the recording.


I'll just wait to hear those pieces in concert I guess : )


There are no shortcuts to good sound; you can buy the most expensive equipment but still have bad sound due to the acoustics of the room. It's also somewhat random, with placement and materials damping different frequencies. So I guess it becomes a sort of addiction.


There's a very straightforward practical reason why you might want to keep downloading 24/192. The people who put those together are on average "audiophile" snobs, so the rips tend to be perfect, often from high-quality sources like SACDs, HDtracks, etc. The source of the recording is usually the best master known for that record (special remastered editions, etc.). If you download mp3/spotify, chances are the copy you download is from a worse source. Sure, you don't need 24/192, a well ripped 320kbps mp3 would be the same in practice, but looking for 24/192 on the internet is an easy way of getting better quality music on average in practice.


> "If you download mp3/spotify"

You are misconstruing Monty's argument here. He is very much against mp3...in fact he says he could tell the difference between high bitrate mp3 and 16 bit 44k wav. The real point of the video is that 16 bit 44k wav is beyond sufficient...don't need to go beyond that to 24-bit 192kHz.


> he says he could tell the difference between high bitrate mp3 and 16 bit 44k wav.

In the early days, when all mp3 encoders were pretty bad, I could tell which encoder produced a given file. mp3 encoders today are vastly better. I've not been able to do that party trick in a long time.


I remember the Xing encoder was blazingly fast, but sounded worse than LAME. This was around 1999, IIRC.


Yep, especially bad on cymbals - always a dead giveaway.


Yes and hihats. But how am I to tell now that I have high frequency tinnitus?? Blasted drummers.


And whenever someone said an 's'. It sounded like 'sshssh'


thank you for clarifying! It seems I slightly misrepresented your views...for that I'm sorry...I was going on memory of what I remember you saying.


As far as I know, people consistently fail to tell the difference in blind A/B tests in listening between (any decent) FLAC and a well ripped mp3 from the same source. I know I can't. I can't link to proper double blind studies but that's the general consensus in the non-BS-audiophile community as far as I know.


My anecdote is that I've done a number of these tests (Foobar has a plugin that allows you to do them on yourself), and I can reliably tell the difference between FLAC and 128 MP3, but can't tell the difference on 256 MP3 and up.
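For anyone running those ABX tests on themselves, the useful follow-up is checking how unlikely the score is under pure guessing -- a quick sketch (my own, not related to the plugin's own statistics output):

    from math import comb

    def p_value(correct, trials):
        # probability of scoring at least this well by flipping a coin
        return sum(comb(trials, k) for k in range(correct, trials + 1)) / 2 ** trials

    print(p_value(14, 16))   # ~0.002 -- very hard to explain by luck
    print(p_value(9, 16))    # ~0.40  -- entirely consistent with guessing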


Yes, I've heard people putting the threshold at ~192kbps. Above that it's pretty much impossible these days. I can also tell the difference at 128kbps, although sometimes it's a bit of nitpicking (I can only hear differences in the sounds of hi-hats and things like that).


I think the type of music you are listening to assists with being able to differentiate. To me (much rock, blues, guitar music) 128kbps MP3 sounds like somebody chewed it first. Cymbals and bizarre snares that sound like the snare is now made of paper indicate the low bitrate.

But 256+ and I certainly cannot tell the difference reliably.


When I was considering Tidal vs. Spotify in the past, I ran across the ABX tests here:

http://abx.digitalfeed.net/list.html

Pretty much found the same thing for myself at 256+.


I had satellite radio for a while but had to cancel it because I couldn't stand the sound of cymbals at whatever bitrate they encode at. Weirdly, when the phone operator asked why I was cancelling and I told him "audio quality" he acted like he had never heard that before.


I'm not an audiophile at all but satellite radio sounds worse to me than anything other than AM/FM radio. Spotify over LTE and Bluetooth sounds night-and-day better.


The same in electronic music since there are many repeating sounds. 128kbps just sounds bad, especially on a proper sound system.


Any serious club system can produce peaks waaaaay hotter than 120db, cleanly. Especially horn loaded systems like most anything on the high end these days. Horn loaded PA speakers can give you perfectly clear peaks at absurdly high amplitudes, cleanly. We don't hear peaks, we hear RMS volumes which are invariably at least 6 db down from the peak if not much more (classic rock routinely has peaks 30 db over the 'body' of the sound as expressed in RMS loudness). This easily raises the grungy one-bit noise floor level of 16 bit CD audio up into the plainly audible, and as for any lossy-encoded format, forget it: an order of magnitude worse.

CD quality is the very least you could want for a serious big club or theater system (much less auditorium). Between peaks and the requirements for deep bass, the peak in a digital audio file is (a)much farther above the body of music than you'd think, and (b) should never be reached, because that's clipping.

People routinely behave as if the theoretical maximum dynamic range of a Red Book CD is relevant to anything. It's incredibly easy to play back music loud enough that the noise floor will get obnoxious and relevant to listening, it's only 96 db down. Any small system in an enclosed live space can push louder than that. Cranking music over headphones will blow way past that and you won't even hear the peaks, but you'll be able to hear how bad the subtleties (or lack of same) on 16 bit red book CD are.

Electronic music, especially live, is totally relevant to high resolution audio. I more than suspect some of the big name acts (for instance, Deadmau5) are using converters to the mains running at not less than 24/96. Certain synthesizer mixes are very revealing of faults in the playback. If the live performance over a modern PA sounds absolutely huge and awesome, but not strained or grainy, then they're not using CD quality. The SPLs are more than enough to make these distinctions obvious.

Anyone can get these SPLs over headphones, trivially, and headphones that'll handle it cleanly are only a few hundred dollars.


Doesn't this mean going past the 85 dBA limit in headphones? That's instant hearing loss.


I only ever recall finding one track where I could hear differences in bass soundstage / positioning on 256+ mp3, vs. uncompressed 44/16, back when I was using stereo subwoofers in a heavily acoustically damped room. Even then, I could only tell when abx'ing.


You can't tell the difference on the vast majority of music. But encoders occasionally mess up, producing an audible difference. In the case of MP3 this often takes the form of a pre-echo. For example, encoding cymbals and harpsichords seems to be difficult, though it's much less of a problem than it used to be. My understanding is that such killer samples exist for good MP3 encoders, even at 320kbit/s, but it has been a couple of years since I last looked into the topic.

(Personally I downloaded some of those killer samples and couldn't tell the difference, but other people reliably tell them apart in an ABX test.)


c't magazine did a test all the way back in 2000 (see https://www.heise.de/ct/artikel/Kreuzverhoertest-287592.html) and even the best test listeners were unable to tell 256kbit mp3 and the original apart.


Was this the Arny Krueger stuff? I did that.

I ABXed 320kbit mp3 from an uncompressed original, I think 9/10 IIRC. It was a recording of castanets, and listening for frequency response differences was useless so I keyed off of 'personality' differences in the sounds and did it that way.

I was also just as horrible at detecting wow and flutter, as I was good at detecting lossy compression 'changes of sonic personality'. Comes from my experience being with analog audio, which is less true of people these days.

The idea that 'the best' listeners cannot tell 256k from even something as limited as 16/44.1 is ridiculous. Any mp3 format… any lossy format… is seriously compromised. Talk to techno/house DJs and producers about how useful it is to perform off mp3s, this is not a hypothetical argument.


> The source of the recording is usually the best master known for that record (special remastered editions, etc.).

Usually what is happening here is you're getting a master that hasn't been compressed to death (see loudness wars). Vinyl is a shit source for 'quality', but most records aren't compressed to death so they can still sound better on a good set of speakers due to the dynamic range.


So, in other words: bite the bandwidth bullet and get the 192 material, but for Pete's sake, do yourself a favor and downsample to 44.1.


If you have the patience, sure. In my desktop I keep everything on the big file source and listen to it that way. I also have dedicated DACs/AMPs that can decode all the way up to DSD. I know it probably doesn't make a difference but whatever, storage is cheap. For my phone I listen to spotify.


Similarly, seeing 24/192 on a rip from an analogue source (e.g. vinyl) usually implies that a high-quality player+amp+ADC was used to digitize the album.


It seems to me what we really need then is some other kind of litmus test that indicates a quality recording, without having to waste a ton of bandwidth and storage to get it. Are there other things to look for that also tend to correlate with quality? Newer formats maybe?



The 192 is really for low latency, not for anything you can actually hear. You can't hear the difference past 48kHz. An audio interface that can do 192 gets you down to 5ms latency, where you can finger-drum without being thrown off. But it has nothing to do with audio quality. For example, if you use Guitar Rig you will have the lowest possible latency for real-time computer-generated guitar effects at 192.


Where do you get the 5ms from? Why is 192 more likely to have lower latency? I would think the higher bandwidth required by 192 has the possibility to slow your computer down and cause more latency.


I believe the parent is referring to the latency of real-time effects used during performance:

* With analogue equipment the latency between the input and output of the effect is often below 1 ms (unless the effect is supposed to have a delay).

* Standalone digital effect equipment that processes the samples one at a time can also have a latency below 1 ms. (One sample period at 48 kHz ≈ 0.02 ms.)

* If you use a computer for real-time effects, the samples are not transferred sample by sample from the audio interface, but in a block of many samples. The number of buffers and the size of them can usually be changed by the user. With a buffer of 1024 samples, the oldest sample is already about 21 ms old before the computer can process it. After the buffer is processed it has to be transferred to the audio interface, and that will add another 21 ms. So the minimum latency for any real-time effect in the computer is about 42 ms at 48 kHz if the size of the buffers is 1024 samples. Often it is much worse because the operating system adds more latency. If the equipment can handle a sample rate of 192 kHz, the same latency is about 10 ms. If the computer can handle smaller buffers, the latency can be lowered. With 256 samples per buffer the minimum latency will be about 11 ms at 48 kHz and 3 ms at 192 kHz.
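Following the arithmetic in the last point, a minimal sketch of that round-trip buffering latency (assuming exactly one input and one output buffer and ignoring any extra OS or driver buffering):

    def round_trip_latency_ms(buffer_samples, sample_rate, buffers=2):
        # one buffer to collect the input block, one to play the processed block back
        return buffers * buffer_samples / sample_rate * 1000

    print(round_trip_latency_ms(1024, 48000))    # ~42.7 ms
    print(round_trip_latency_ms(1024, 192000))   # ~10.7 ms
    print(round_trip_latency_ms(256, 48000))     # ~10.7 ms
    print(round_trip_latency_ms(256, 192000))    # ~2.7 ms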


I believe the parent might be referring to the myth about attack of signal falling "between the cracks" of digital sampling. This is debunked in the video linked elsewhere, in the chapter titled "bandlimitation and timing". I might be wrong about the parent poster's meaning, though.


See the excellent answer by zaxomi for another, more favorable interpretation of the parent. https://news.ycombinator.com/item?id=15131280


I've said it before, this video demo is one of the very best I've ever seen.

https://xiph.org/video/vid2.shtml

So well prepared, so well presented, so little that could be removed without ruining it.

I aspire to do such good demos but always fall so short.


Thanks for sharing that! I was convinced to watch it from all the other recommendations and can concur (FWIW at this point) that it was well worth it.

The camerawork was excellent and the demonstration integration/trinket-wielding was seamlessly done. I get the impression the people who made this had a case of "let's use the ThinkPad for this, and let's do it this and that way," and they pulled it off so perfectly, exactly how engineers wish would happen.

If you ever need a reference demo for "open-source software can make good looking presentations," this would be on the shortlist, I think. (The credits say it was made using Cinelerra, for reference.)


I think one thing that made this happen is that this is a presentation made to exercise an interactive software tool that—although it can also be used for one's own experimentation—was mainly written to be used in the presentation. It's a case of making a custom "musical instrument" crafted to play exactly the song you're about to play on it. A lot like Bret Victor's "puppet controller" app in his presentation Stop Drawing Dead Fish (https://vimeo.com/64895205); or the actually-an-instrument "MIDI fighter" created by Shawn Wasabi for his music (https://www.youtube.com/watch?v=qAeybdD5UoQ, https://www.youtube.com/watch?v=Cj2eUpt3MVE).


> this video demo is one of the very best I've ever seen

I have said the same thing. I was a film major, a video producer, and a tech writer (now a programmer), and I am in awe.



I loved the casual and original expressions of the host; it made the whole demo look much more authentic and convincing. Presentations like this are truly a work of art. Hats off.


Thank you, this really is a great watch. Particularly the zero point stuff re: stairstepping - I had no idea this was merely a function of display.


Monty Montgomery, the author, hangs around here as id=xiphmont :)


That is fantastic. The presenter manages to pull off "charismatic engineer" really well.


Well, he is a very charismatic engineer...


09:30 to 10:30 explains the effects of quantisation extremely well! Overall, that is one of the clearest, no frills, to the point videos I've ever seen.


I thought you were being a bit hyperbolic until I watched it. Nope, it was stellar. Thanks for posting it for those of us who hadn't seen it, it was fun, and even better, I learned some things about something I thought I already understood.


Apart from it being a great video, I'm sad to say that I'm surprised my phone isn't burning my fingers after watching all the way through it. It's like these people know something about multimedia on the internet.


Noise shaping dither filters are amazing and so is the video, quite educational. I have never been more interested in signal processing than during this video.


Read the article, but didn't look at the video until seeing your comment. Thanks, that was indeed an excellent demo video.


Unfortunately this video doesn't work on iOS. I'll try to remember to watch this next time I'm at a computer.


It's available on YouTube[1] as well, which should work just about everywhere.

[1]: https://www.youtube.com/watch?v=cIQ9IXSUzuM


Thanks!


Thanks for sharing! Well worth the watch, and gave me a bonus dose of ASMR. Commenting needlessly to show my appreciation and to hopefully inspire a future version of myself to watch this again.


This reminds me, class D amplifiers are effectively using only 1-bit sampling (!!), the trick being it uses a high sample rate. And these are some of the best amplifiers around.
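
For intuition, here is a toy first-order delta-sigma loop (purely illustrative; real class-D and DAC modulators are higher-order and run much faster). Averaging the 1-bit stream with a low-pass filter recovers the input:

    import numpy as np

    fs_audio = 48_000
    osr = 64                                   # oversampling ratio
    fs_mod = fs_audio * osr                    # 1-bit stream at ~3 MHz

    t = np.arange(fs_mod // 100) / fs_mod
    x = 0.5 * np.sin(2 * np.pi * 1_000 * t)    # 1 kHz test tone

    bits = np.empty_like(x)
    integrator = 0.0
    feedback = 0.0
    for i, sample in enumerate(x):
        integrator += sample - feedback        # accumulate the quantisation error
        bits[i] = 1.0 if integrator >= 0 else -1.0
        feedback = bits[i]

    # Crude reconstruction: average the bitstream over one audio-rate period.
    recovered = np.convolve(bits, np.ones(osr) / osr, mode="same")
    print("max reconstruction error:", np.max(np.abs(recovered - x)))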


That's pretty amazing... why wouldn't he set up his own sound to eliminate more of that signal generator fan noise, though? So distracting.


I did, but I didn't want to wear a lapel mic for the video and deal with the extra hassle. It was just me (no helpers) and a couple cameras. Wired mics suspended overhead kept sound setup from adding to the logistics for each take.

The sound was done similarly to the previous video, which hadn't drawn comments. This time though, a sizable fraction of people said it was distracting.


And yet you insist on being the authority for what's permissible? I think you should stop publicly clamoring for limitations to audio resolution on the grounds that you're not representative of enough listeners. I don't mind one bit that you don't hear these things, or that you represent even a majority of people who don't hear these things… but they don't represent the market for commercial audio. You're doing harm, please stop.


Wow, that was awesome!


Wow, I hadn't seen this until now, but it is definitely the best video I've ever watched to explain the basics of digital sampling. Great find!


Well, I disagree. The background music on the transitional slides could be removed. The rhetorical questions ("let's pretend we do not know about digital audio") are just time fillers. Waving hands toward an on-screen overlay looks as silly as the weather presenters "playing with flies in front of a green screen" on TVs worldwide.

It is the same with most "well-written" newspaper articles nowadays. We do not even notice it, but it's just canned food: a lot of jelly and chemical flavour enhancer, while the amount of real meat inside is near zero.


Thanks, I appreciate it. I have probably forwarded the article to 20 or so people since it first came out. Didn't know about this video.


That is amazingly well presented.

Agreed. Wow!


I felt that the presenter is a very rare example of an engineer whose expression and purpose genuinely connected with his imagined audience.

Setting aside the questions embedded in the profession, and in the perception and self-perception of the male geek in working and academic life, I rarely see a presenter responding to a sense of human warmth in the room and beyond the lens. Even with the most encouraging help behind the camera, that is genuinely hard to do. Hard enough, I think, that it explains the stereotype of the inflated-ego newsreel presenter that Hollywood loves to satirise; in my opinion Hollywood is mocking, from its narrow and insecure vantage point, a sub-species of acting which, done well, can capture a far larger audience than many serious actors ever manage.

This takes more than a little geek know-how and applied thought, but I think many geeks, by sheer analysis and without the obstruction of ego, could handily outperform the talent they are supposedly "meant" to possess. It may reach well into "real, serious" acting. I don't pretend to be a judge of that, but if acting ability is "I know it when I see it", this is excellent acting indeed.

Edit: "is", not "was", in the first line; and a comma for clarity later on.


What kind of software generated this comment?


Hah, so cynical. No, this is just how audiophiles describe anything. Even their floorboards.


Could it be cargo cult science (language)? By making it sound like an academic paper, it passes for scientific.


I agree, Monty is a good presenter and a good speaker. You can tell he loves his subject matter, and he's eloquent and understandable in sharing it.


Are you the kind of person who considers 24/384 not good enough? Chasing 384 kHz and beyond by buying a $1000 USB cable to the DAC?


If you're striving for clarity in this comment, I suggest rewriting it from scratch :)


Good points in the article, but it has some flaws.

The problem whenever somebody writes about digital audio is that it is very tempting to hold on to sampling theory (Nyquist limit, etc.) and totally discard the problems of implementing an actual analog-to-digital and digital-to-analog chain that works perfectly at a 44100 Hz sample rate.

I agree with the assessment that 16-bit depth is good enough; even 14 bits is good enough and was used with good results in the past (!). However, the problem is with the sampling rate.

> All signals with content entirely below the Nyquist frequency (half the sampling rate) are captured perfectly and completely by sampling;

Here lies the problem. This is what theory says; however, when using a 44 kHz sample rate, this means that to capture the audio you need to low-pass at 22 kHz. And this is not your gentle (6, 12 or 24 dB) low-pass filter; no, this needs to be HARD filtering; nothing should pass beyond 22 kHz. And this must happen in the analog domain, because your signal is analog. To implement such a filter, you need a brickwall analog filter, and this is not only expensive, it also makes a mess of the audio, introducing 'ringing' effects and/or ripple in the frequency response and/or strong phase shifts.

So for analog-to-digital conversion in 2017, converters should be operating at a higher rate (say, 192 kHz), because this makes analog filtering of the signal much easier and free of side effects.

Now, for Digital-to-Analog, if your sample rate is 44KHz, you have two alternatives:

a) Analog brickwall filtering, with the problems noted above

or

b) filtering on the digital domain + using oversampling

the article mentions:

>So the math is ideal, but what of real world complications? The most notorious is the band-limiting requirement. Signals with content over the Nyquist frequency must be lowpassed before sampling to avoid aliasing distortion; this analog lowpass is the infamous antialiasing filter. Antialiasing can't be ideal in practice, but modern techniques bring it very close. ...and with that we come to oversampling."

So they are mentioning alternative (b). The problem is that oversampling does not solve all problems. Oversampling implies that the filtering is done in the digital domain, and there are several choices of filtering you could use, for example FIR (finite impulse response), IIR (infinite impulse response), etc.

And each one of these choices has side effects...

In short, the problem is that with a 44 kHz sampling rate, your filter cutoff (22 kHz) is too close to your desired bandwidth (20 Hz-20 kHz). Using a sample rate of 192 kHz gives the DAC designer much more leeway for a better conversion. And CONVERSION is the key to good digital sound.
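
As a rough illustration of the breathing room the higher rate buys the filter designer, here is a Kaiser-window estimate of how long a digital FIR anti-aliasing filter needs to be for a 96 dB stopband (the analog case differs, but the trend is the same; numbers are illustrative):

    from scipy.signal import kaiserord

    # Transition band runs from the 20 kHz passband edge up to the Nyquist frequency.
    for fs, nyquist in [(44_100, 22_050), (192_000, 96_000)]:
        width = (nyquist - 20_000) / (fs / 2)   # transition width, normalized to Nyquist
        numtaps, beta = kaiserord(96, width)
        print(f"fs={fs} Hz: {nyquist - 20_000} Hz of transition room -> ~{numtaps} taps")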

>What actually works to improve the quality of the digital audio to which we're listening?

It is interesting that the author mentions things such as "buying better headphones" (agreed), but he never mentions "getting a better digital-to-analog converter", which is highly important!

On the other hand, he backs up his claim that "44KHz is enough" with an interesting AES test I was already aware of:

>Empirical evidence from listening tests backs up the assertion that 44.1kHz/16 bit provides highest-possible fidelity playback. There are numerous controlled tests confirming this, but I'll plug a recent paper, Audibility of a CD-Standard A/D/A Loop Inserted into High-Resolution Audio Playback, done by local folks here at the Boston Audio Society.

This is a very interesting paper, and I have a copy; however, the test equipment should be examined. There are systems and then there are better systems. The AES paper cited above has the particularity that the ADC and DAC used were provided by exactly the same machine (a Sony PCM converter), with the same strategy: no oversampling, brickwall analog filters. I can bet (99% sure) that the brickwall filters were identical on the ADC and the DAC of that machine; Murata-brand filters in a package.

The devil, as they say, is in the details.


I don't think there are flaws in the article as you claim. It already explicitly explains that oversampling at the ADC or DAC is an acceptable engineering solution:

> Oversampling is simple and clever. You may recall from my A Digital Media Primer for Geeks that high sampling rates provide a great deal more space between the highest frequency audio we care about (20kHz) and the Nyquist frequency (half the sampling rate). This allows for simpler, smoother, more reliable analog anti-aliasing filters, and thus higher fidelity. This extra space between 20kHz and the Nyquist frequency is essentially just spectral padding for the analog filter.

> That's only half the story. Because digital filters have few of the practical limitations of an analog filter, we can complete the anti-aliasing process with greater efficiency and precision digitally. The very high rate raw digital signal passes through a digital anti-aliasing filter, which has no trouble fitting a transition band into a tight space. After this further digital anti-aliasing, the extra padding samples are simply thrown away. Oversampled playback approximately works in reverse.

> This means we can use low rate 44.1kHz or 48kHz audio with all the fidelity benefits of 192kHz or higher sampling (smooth frequency response, low aliasing) and none of the drawbacks (ultrasonics that cause intermodulation distortion, wasted space). Nearly all of today's analog-to-digital converters (ADCs) and digital-to-analog converters (DACs) oversample at very high rates. Few people realize this is happening because it's completely automatic and hidden.

The main point of the article is to argue that storing or transmitting music above 16-bit, 48 kHz is wasteful and potentially harmful. It still fully condones using higher specs for audio capture, editing, and rendering.
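
A sketch of that playback-side flow with made-up numbers (4x oversampling of a 48 kHz stream via a polyphase resampler), just to make the quoted mechanism concrete:

    import numpy as np
    from scipy.signal import resample_poly

    fs = 48_000
    t = np.arange(fs) / fs
    x = np.sin(2 * np.pi * 15_000 * t)      # a 15 kHz tone stored at 48 kHz

    # The sharp filtering happens digitally inside the resampler; the analog filter
    # after the DAC then only needs to roll off somewhere between 24 kHz and 96 kHz.
    x_oversampled = resample_poly(x, up=4, down=1)   # 192 kHz stream fed to the DAC
    print(len(x), "->", len(x_oversampled), "samples")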


> I don't think there are flaws in the article as you claim. It already explicitly explains that oversampling at the ADC or DAC is an acceptable engineering solution

Of course it is acceptable. Even 14 bit audio at 36KHz with a great DAC would be fairly nice, acceptable.

What the article claims is that 192KHz is useless, of no benefit. And I contend that it is of benefit when you want more than just good or acceptable performance. Not if you have a run-of-the-mill DAC and OK headphones/speakers, but it is if you are a music lover and critical listener.


You've missed the point though - no human has demonstrated the ability to be able to distinguish the fidelity of audio above 44.1kHz sampling in a properly controlled comparison test (double blind). This empirical result is to be expected given the biology of the ear and the science of sampling theory, as the article explains.

It doesn't matter if you're a music lover or critical listener!


> What the article claims is that 192KHz is useless, of no benefit.

The article claims that 192KHz downloads are of no benefit. It's right there in the article's title. It's difficult to not accuse you of willfully misinterpreting his argument.


Modern DACs and ADCs (sigma-delta) get around this entirely by oversampling/upsampling and then low-pass filtering in the digital domain, with a minimal (if any) analog LPF. This avoids all the issues you describe, and with noise shaping the added noise is inaudible (unless you want to claim you can hear -100 dBFS, which would be pretty amazing).

>So they are mentioning alternative (b). The problem is that oversampling does not solve all problems. Oversampling implies that the filtering is done on the digital domain and there are several choices of filtering you could use, for example FIR (Finite Impulse Response), IIR (infinite impulse response), etc. And each one of these choices have side effects...

Citation needed here: oversampling solves virtually all the problems, and with modern DSP the FIR filters can be made extremely good. The induced noise of modern ADCs/DACs is seriously tiny, and swamped by the recording noise of your audio.


There is no reason the DAC or ADC can't internally do the resampling itself. In fact, most take it to the extreme - they resample to several MHz, but at 1 bit per sample, with noise shaped dithering as explained in the video linked elsewhere. They then have a very simple analog low pass filter - the noise is shaped far into the stop band of the filter.

That is no reason to store or even process your music at higher rates, though.


> There is no reason the DAC or ADC can't internally do the resampling itself.

You are describing alternative (b) I mentioned above: digital filtering plus oversampling. This also isn't without side effects.

Oversampling a 44 kHz signal is not the same as having 192 kHz material to start with. Very different.


Yes, and the only difference is the lack of unwanted ultrasonic signals.


No, it is not; please take a deeper look at oversampling in D/A conversion and the artifacts of digital filtering. Articles are out there.


In practice, all audio converters are modulation converters of some sort, because it is simply too expensive to achieve the high AC linearity wanted at the required sampling frequencies with other kinds of converters. This already relaxes LPF requirements.

In practice, signal quality issues are usually layout and supply issues, not problems with the converter itself.

In practice, the speakers and ears are the worst parts with the largest non-linearities and a frequency response that looks like a nuke test area compared to the FR of the converter. Of course, in the case of speakers, we have concluded that we want them to have their own sound, because neutral speakers are not possible.


The speakers are often the most influential parts of a hi-fi, but they're not the worst part. The worst part is almost always the room itself. Nothing comes close for screwing up the signal path.

(I tend to avoid discussing the other critical parts — original recording quality and the listener's ears — because these are often immutable constants.)


What about for non-humans?

Consider a dog that lives with a musician who plays, for example, trumpet. The musician plays the trumpet at home to practice, and also records his practices to review.

A trumpet produces significant acoustic energy out to about 100 KHz [1]. When the musician plays live the dog hears a rich musical instrument. When the musician plays back his recordings, half the frequency range that the dog could hear in the live trumpet will be gone. I'd imagine that this makes the recorded trumpet a lot less aesthetically pleasing to the poor dog.

[1] https://www.cco.caltech.edu/~boyk/spectra/spectra.htm


In my experience as a long-time trumpet-playing dog owner, the dog is generally thrilled when there is less richness in the noise, and generally less trumpet playing, period. I say this with experience owning many dogs and having played the trumpet at every level from really badly and inexpertly to pretty good... they don't care. Trumpets in person are just way too loud and a bit too much in general for most dogs.


And perhaps too loud for humans?

I only say this because most people are happy listening to music on CDs but when in the presence of a live band (eg an orchestra) it is suddenly obvious how incredibly loud it is. My brother is a drummer and I find it incredibly loud; I am a bass player and I don't play loud although he sometimes complains that there's "too much bass". Perhaps we just go deaf in our relative audio spectrum.


The human has trained hard for years to master an instrument that has been perfected over centuries.

Both those huge efforts have gone into controlling the humanly audible part of the sound. Whatever sound is accidentally produced at other frequencies is probably at best aesthetically neutral, but more likely distracting.

Though my guess is that trumpeting is just noise to dogs either way.


What about transhumans? I'm looking forward to my HiFi Cochleatron 9000(tm) when my hearing starts to go.

Then I'll be mighty glad we made all these high-res recordings.


That makes for some genuinely interesting thought experiments.

What if this actually becomes possible, but we discover that, because we previously couldn't hear these frequencies, our instruments and equipment are HORRIBLY mis-tuned and sound terrible? We may end up having to re-record tons of stuff.

Something something premature optimization. And part of me is glad that the art of hand-making instruments is not yet lost; we might need the originals in the future.

Disclaimer: I say this as a completely naive person when it comes to instruments. The answer to this may be "if it wasn't built to resonate at frequency X, it won't by itself," which would be a good thing.


Most instruments generate harmonics that are integer multiples of the fundamental frequency, and some go a lot higher than human hearing. As you go up the harmonic series, the harmonics get (logarithmically) closer together. Our brains interpret close-together notes as dissonant, so higher harmonics could be kind of obnoxious to hear together. They might be "in-tune", but just too close to be enjoyable. (Imagine an extremely bright harpsichord.)
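
For a concrete sense of how the spacing shrinks, the interval between adjacent harmonics n and n+1 is 1200 * log2((n+1)/n) cents:

    import math

    for n in (1, 2, 4, 8, 16, 32):
        cents = 1200 * math.log2((n + 1) / n)
        print(f"harmonic {n} -> {n + 1}: {cents:.0f} cents")
    # 1200 (octave), 702 (fifth), 386, 204, 105, 53 ... ever more crowded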

There's another effect that comes into play, though. There's a minimum pitch separation between simultaneous notes that we expect, and when notes are closer than that, they clash. That separation is usually around a minor third (~300 cents) in most of the human hearing range, but in the bass it's a lot wider, and in the high treble it's smaller. That's why you can play two notes a major second apart (~200 cents) on a piano in the high treble and it sounds okay, but down in the low bass it sounds muddy if they're closer than about a major third or perfect fourth (~400-500 cents). So, if we extrapolate into higher frequency ranges, then it's not unreasonable to expect that we would be able to interpret musical intervals that are a lot closer than 200 cents as consonant.

It's also possible that the minimum note separation thing is just an artifact of how our ears physically work, and that an artificial ear would have no such limitation. Which could open the possibility of enjoying kinds of music that we can't currently imagine as pleasant with our normal ears.


Because of the power decrease as you go up the overtone series, I'd suspect that being able to hear higher frequencies wouldn't cause very much trouble. However, the ability to hear higher fundamental frequencies would surely change harmonic theory! This is assuming that our pitch perception of high fundamental frequencies increased accordingly.


Higher frequencies aren't even in most recordings, so we wouldn't have to re-record them for that reason.

And if they were (such as in 96 kHz hi-res audio), you could just run it through a low-pass filter to strip off the higher frequencies.


Even if the engineer pressed the "192" button on their field recorder or DAW, chances are the instrument isn't going to be doing much interesting over 20kHz and/or the microphone isn't going to be particularly sensitive over 20kHz.


Ah, good point.

And... heh, using a filter to strip out the audio we used all that extra filesize to deliberately store. Haha. :)


My webapp manages photos for photography competitions. People upload 30MB JPEGs that would be visually lossless at a tenth of that file size. And I keep the originals, but actually resize down to 300KB for every function within the software. Haven't had a single complaint about image quality... :)


There's the all too real prospect that Hatsune Miku has been singing secret messages calling for the robot uprising, and none of us humans can hear those frequencies. Winter is coming desu~!

The good news is that you can strip out all of this robot propaganda and still hear the exact same music, simply by encoding at a reasonable rate.


What makes you think your consciousness is capable of processing sounds in that range even if the ear is physically capable of it?


How do we know that the trumpet still sounds harmonically pleasing above 20 kHz? It may sound nice to the human ear, and a horrible mess to anyone with ultrasonic hearing capabilities.

Have we asked the dogs?


If we boost the frequencies even in just the range 10-20 kHz, things sound harsh and fizzy to the human ear.

The frequencies there are no longer musical in the sense of being recognized as musical notes.

That is to say, although in a pure arithmetic sense the frequency 14080 Hz is an A, since it is a power-of-two multiple of 440 (five octaves above it), we don't hear it as an A tone.

The frequencies in that ballpark just have to be present to add a sense of "definition" or "crispness" or "air" to the sound.

In fact, this can be faked!

In HE-AAC, for example, there is something called SBR (spectral band replication): basically a hack whereby the upper harmonics of the signal are completely stripped away, and then re-synthesized on the other end from a copy of the lower bands plus a little side information. The Opus codec's CELT layer does something similar with band folding. The listener just hears the increased definition.


> What about for non-humans?

The thing is, the higher sample rate data doesn't actually have a lot of the higher components after 20 kHz.

What the faster sample rate allows is to use a less aggressive filter.

Instead of a "brick wall" filter that rapidly cuts off after around 20 kHz, one with fewer poles can be used which rolls off more gently.

The higher sample rate ensures that there isn't any aliasing.

192 kHz audio does not reproduce flat up to 90 kHz.

(I'm going to gloss over the microphone used on the trumpet, or the loudspeakers which ultimately reproduce it.)


If the only thing you do with your music is listen, then yes, 24/192 delivers questionable value compared to the more popular 16/44.1 or 16/48 formats.

However, all the musicians I know use these high-res formats internally. The reason is that when you apply audio effects, especially complex VST ones, discretization artifacts noticeably degrade the result.

Maybe the musicians who distribute their music in 24/192 expect it to be mixed and otherwise processed.


I do not believe it. 24 bits is definitely needed for processing (better yet, use floating-point).

Not 192 kHz; no friggin' way.

Repeated processing through multiple blocks at a given sample rate does not produce cumulative discretization problems in the time domain; it's just not how the math works.


> better yet, use floating-point

Both your inputs (ADC) and outputs (DAC) are fixed-point. Why would you want to use a floating point in between? Technically, 64-bit floating point format would be enough for precision. But that would inflate both bandwidth and CPU requirements for no value. 32-bit floating point ain’t enough. Many people in the industry already use 32-bit integers for these samples.

> Not 192 kHz; no friggin' way.

I think you’re underestimating the complexity of modern musician-targeted VST effects. Take a look: https://www.youtube.com/watch?v=-AGGl5R1vtY I’m not an expert, i.e. I’ve never programmed that kind of software. But I’m positive such effects are overwhelmingly more complex than just multiply+add these sample values. Therefore, extra temporal resolution helps.

BTW, professionals have been using 24-bit/192 kHz audio interfaces for decades already. E.g. the ESI Juli@ was released in 2004, and it was a very affordable device back then.


I habitually edit audio using 32-bit float, not 16-bit integer.

> Why would you want to use a floating point in between?

Because 32-bit float has enough mantissa bits to represent all 24-bit integer fixed-point values exactly, so it is at least as good.

Because 32-bit float is friendly to vectorization/SIMD, whereas 24-bit integer is not.

Because with 32-bit integers, you still have to worry about overflow if you start stacking like 65536 voices on top of each other, whereas 32-bit float will behave more gracefully.

Because 32-bit floating-point audio editing is only double the storage/memory requirements compared to 16-bit integer, but it buys you the ultimate peace of mind against silly numerical precision problems.


float has scale-independent error because it is logarithmic/exponential.

If you halve the amplitude (take it down about 6 dB), that is just decrementing the exponent field of the float; the mantissa stays 24 bits wide.

If you quiet the amplitude of integer samples, they lose resolution (bits per sample).

If you divide a float by two, and then multiply by two, you recover the original value without loss, because just the exponent decremented and then incremented again.

(Of course, I mean: in the absence of underflow. But underflow is far away. If the sample value of 1 is represented as 1.0, you have tons of room in either direction.)
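
A tiny sketch of that halve-then-double round trip, comparing a 16-bit integer sample with the same value held as a 32-bit float (the sample value is arbitrary):

    import numpy as np

    x_int = np.int16(12345)
    x_float = np.float32(12345.0)

    int_round_trip = np.int16(np.int16(x_int // 2) * 2)         # halving drops the low bit
    float_round_trip = np.float32(np.float32(x_float / 2) * 2)  # only the exponent moves

    print(int_round_trip)    # 12344 -- the least significant bit is gone
    print(float_round_trip)  # 12345.0 -- recovered exactly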


Yeah I edit at 32 bit float myself, then I export down to 16 bit 44100 flac for distribution.


> Both your inputs (ADC) and outputs (DAC) are fixed-point. Why would you want to use a floating point in between?

Fixed point arithmetic is non-trivial and not well supported by CPU instruction sets. (Hint: you can't just use integer add/multiply.)

> I think you’re underestimating the complexity of modern musician-targeted VST effects. I’ve never programmed that kind of software. But I’m positive such effects are overwhelmingly more complex than just multiply+add these sample values. Therefore, extra temporal resolution helps.

Indeed, many audio effects require upsampling to work well with common inputs, e.g. highly non-linear effects like distortion/saturation or analog filter models. However, they usually perform the upsampling and downsampling internally (commonly between 2x and 8x). While upsampling/downsampling is expensive (especially if you are using multiple plugins of this type), it's not clear whether running at a higher sample rate across the board is worth it just to save those steps.
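
A sketch of that internal oversample-distort-decimate pattern (4x oversampling around an arbitrary tanh waveshaper; not modeled on any particular plugin):

    import numpy as np
    from scipy.signal import resample_poly

    def distort(x, factor=4, drive=8.0):
        up = resample_poly(x, up=factor, down=1)         # run the non-linearity at 4x rate
        shaped = np.tanh(drive * up)                     # harmonics are generated here
        return resample_poly(shaped, up=1, down=factor)  # filter and decimate back down

    fs = 48_000
    t = np.arange(fs) / fs
    clean = 0.5 * np.sin(2 * np.pi * 5_000 * t)
    print(distort(clean).shape)   # same length as the input, aliasing largely suppressed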


> Therefore, extra temporal resolution helps.

But it's not resolution, right? It's extra frequencies outside the audible range. Is there any natural process that would make those affect the audible components, if I were listening to the music live instead of a recording?


Yes, you can get a beat effect (https://en.wikipedia.org/wiki/Beat_(acoustics)) as the ultrasonic frequencies interfere with the audible ones. It's generally not considered desirable.


Only if the two are mixed in a nonlinear way (e.g. "heterodyned").

If a sonic and ultrasonic frequency are combined together, but a low pass filter doesn't pass the ultrasonic one, the ultrasonic one doesn't exist on the other end.

Hence, there can be no beat.


> Both your inputs (ADC) and outputs (DAC) are fixed-point. Why would you want to use a floating point in between?

The main reason is that it solves clipping in the pipeline.


> Why would you want to use a floating point in between?

Because if you don't, you accumulate small errors at each processing step due to rounding. Remember that it is very common for an input to pass through multiple digital filters, EQs, some compressors, and a few plugins, then to be grouped and have more plugins applied to the group. You can end up running the sample through hundreds of equations before final output. Small errors at the beginning can be magnified.

Pretty much all pro-level mix engines use 32-bit floating point for all samples internally. This gives you enough precision that there isn't a useful limit to the number of processing steps before accumulated error becomes a problem. By all samples I mean the input comes from a 24-bit ADC and gets converted to 32-bit FP. From that point on all plugins and processes use 32-bit FP. The final output busses convert back to 24-bit and dither to feed the DAC (for higher-end gear the DAC may handle this in hardware).
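
A toy illustration of that accumulation (not a real mix engine: 200 arbitrary gain stages, re-quantizing to 16 bits after every stage versus staying in 32-bit float throughout):

    import numpy as np

    fs = 48_000
    x = 0.5 * np.sin(2 * np.pi * 997 * np.arange(fs) / fs)
    gains = [0.7, 1 / 0.7] * 100            # 200 stages, net gain ~1.0

    y_16bit = x.copy()
    for g in gains:
        y_16bit = np.round(y_16bit * g * 32767) / 32767   # re-quantize each stage

    y_float32 = x.astype(np.float32)
    for g in gains:
        y_float32 = y_float32 * np.float32(g)

    ref = x * np.prod(gains)                # exact double-precision result
    for name, y in [("16-bit per stage", y_16bit), ("32-bit float", y_float32.astype(np.float64))]:
        err_db = 20 * np.log10(np.sqrt(np.mean((y - ref) ** 2)))
        print(f"{name}: accumulated error ~{err_db:.0f} dB re full scale")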

As for 192 kHz I've never seen or heard a difference. Even 96 kHz seems like overkill. A lot of albums have been recorded at 48 kHz without any problems. As the video explains there is no "missed" audible information if you're sampling at 48 kHz. I know that seems counter-intuitive but the math (and experiments) bear this out.

An inaccurate but intuitive way to think about it: your ear can't register a sound at a given frequency unless it gets enough of the wave, which has a certain length in the time domain (by definition). If an impulse is shorter than that, then it has a different frequency, again by definition. 1/16th of a 1 kHz wave doesn't actually happen. Even if it did, a speaker is a physical moving object and can't respond fast enough to make that happen (speakers can't reproduce square waves either, for the same reasons; they'll end up smoothing it out somewhat). Even if it could, the air can't transmit 1/16th of a wave: the effect will be a lower-amplitude wave of a different frequency. And again, your ear drum can't transmit such an impulse (nor can it transmit a true square wave).

I've done a lot of live audio mixing and a little bit of studio work, including helping a band cut a vinyl album. Fun fact: almost all vinyl is made from CD masters and has been for years. The vinyl acetate (and master) are cut by squashing the crap out of the CD master and applying a lot of EQ to shape the signal (both to prevent the needle from cutting the groove walls too thin), then having the physical medium itself roll off the highs.

The only case where getting a 24-bit/192kHz recording might be worthwhile is if it is pre-mastering. Then it won't be over-compressed and over-EQ'd, but that applies just as well to any master. (For the vinyl we cut I compressed the MP3 version myself from the 24-bit 48 kHz masters so they had the best dynamic range of anything: better than the CD and far better than the Vinyl).


Unless you are altering the time or pitch, which, these days, you are more than you are not.

But no, musicians aren't releasing things at ultra-resolutions because they expect others to reuse their work. The ones that are, are providing multitracks.


> Repeated processing through multiple blocks at a given sample rate does not produce cumulative discretization problems in the time domain; it's just not how the math works.

That isn't entirely true. E.g., it's common for an audio DSP to use fixed-point 24-bit coefficients for an FIR filter. If you're trying to implement a filter at low frequency, then there can be significant error due to coefficient precision; that error is reduced by increasing the sampling rate.


> Not 192 kHz; no friggin' way.

It can be useful to run your signal processing chain at a higher rate because many digital effects are not properly bandlimited internally (and it would be pretty CPU hungry to do so).

But that doesn't mean you need to store even data that you'll process later at 192 kHz, though it might be easier to do so.


What if I simply want to slow down the signal 2 times (without pitch correction)? Then 44.1 kHz is obviously not enough. Maybe 192 kHz is way overkill though, but I would argue that that's the point. You don't want it to become a bottleneck ever for any effects.


It is only insufficient if there are components beyond 20 kHz that you want to bring down into the audio range as you're slowing down.


Lots of my plugin users are resorting to massive oversampling, even when it's not appropriate, simply because in their experience so many of the plugins they use are substantially better when oversampled. 192K is an extremely low rate for oversampling.


I wonder how you measure the quality difference for higher rates? 384K, 512K and beyond? I hear from audiophiles that there is a very distinct difference, but there is absolutely no basis for it in science.


Not so: this oversampling is mostly about generating distortion products without aliasing, so in the context I mean, the difference is obvious. But it's a tradeoff: purer behavior under distortion, versus a dead quality that comes from extreme over-processing. I've never seen an audiophile talk about 384 kHz+ sampling rates unless they mean DSD, and with that, it's about getting the known unstable HF behavior up and out of the audible range.

Oversampling in studio recording is mostly about eliminating aliasing in software that's producing distortion, and it's only relevant in that context: I don't think it's nearly so relevant on, say, an EQ.


It could simply be that some of these plugins are written to work correctly only at 192 kHz; i.e., it's a matter of matching the format they expect.


> Not 192 kHz; no friggin' way.

That's an anthropocentric statement.

I want to leave it at that for now just to see if I get dinged for low effort.


Only one ding? I'm disappointed.

Anyway, check out the frequency range for other animals:

https://en.wikipedia.org/wiki/Hearing_range

Notice that there are a number of species whose range extends well past 20kHz. Even with 192kHz you're still chopping off the upper end of what dolphins and porpoises can hear and produce.

So please convince Apple and friends that you need 200+kHz to truly capture the "warmth" of Neil Young's live albums. Then we'll be able to crowdsource recording all the animals and eventually synthesize the sounds to communicate with them all.

Maybe then we can synthesize a compelling, "We're really sorry, we promise we're going to fix all the things now," for the dolphins and convince them not to leave. :)


For DAW use (this has been mentioned before in this thread), 192 kHz has two beneficial effects: lower latency for real-time playing, and it "forces" some synths and effects to oversample, thus reducing aliasing.

All this comes at a high computational and storage cost, though.

I personally use 44.1 kHz / 24-bit settings for DAW use.


TFA specifically addresses this and agrees 100% with you on it. Develop in 24/192, but ship in 16/48. Good point about supporting better downstream remixing!


I feel like there is some parallel to open source and GPL type licenses: you can never know which of your customers may wish to remix or further work on your materials, so you should ship the "source" material.


To truly support downstream remixing, musicians would need to distribute the original, separate tracks. Occasionally an independent musician will do this, but it's not at all common AFAIK.


For software, you have a source code, you compile it with a standard-complying compiler, and you’ll more or less reproduce the result.

Music is different. If you have the original multi-track composition, you can’t reproduce the result unless you also have the original DAW software (of the specific version), and all the original VST plugins (again, of the specific version each).

All that software is non-free, and typically it’s quite expensive (esp. VSTi). Some software/plugins are only available on Windows or Mac. There’s little to no compatibility even across different versions of the same software.

Musicians don’t support nor maintain their music. Therefore, DAW software vendors don’t care about open formats, interoperability, or standardization.


All true. One independent musician that I like, Kevin Reeves, made the tracks from his first album available for free download for a while. But he published them as WAV files. So, of course, one could only create one's own mixes, not easily reproduce his. But if he had just posted the files saved by his DAW (or his mix engineer's DAW), that would have been useless to just about everyone. Aside: Though the album was completed in mid-2006, it was recorded with DAW software that was obsolete even then, specifically an old version of Pro Tools for Mac OS 9, because both Kevin and his mix engineer (for that album) are blind, and Pro Tools for OS X was inaccessible to blind users at the time.

BTW, I'm merely a dilettante when it comes to recording and especially mixing.


Traktor stems and remix decks though.


I've done signal- and sound-processing courses at university. I know the Nyquist-Shannon theorem. I know all about samples, and that digital sound is not square staircases.

I know and understand how incorrect down-sampling from high frequencies can cause distortion in the form of aliasing artifacts in the audible range.

I know about audible dynamic range and how many decibels of extra range 8 bits are going to give you.

I know all this, but I still have to admit: if there's a hi-res recording (24-bit, >48kHz) available for download/purchase, I'll always go for that instead of the "regular" 16-bit 44.1/48kHz download. I guess escaping your former audiophile self is very, very hard.

Anyone else want to admit their guilty, stupid pleasures? :)


I collect vintage headphones, and especially relish the truly absurd ones I can't believe anyone thought were a saleable item.

I'm up to about 300 different models.


Would love to see a shot of the collection on /r/headphones.


Yeah seriously, that's really cool.... and on that note what headphones does he actually use?!


More is better, right?

Deriving pleasure from listening to music has a large subjective component. So if I've paid more for a track and/or got the most bits I could, I'll probably enjoy it more. It also makes for great conversation topics.


raises hand


While the extra quality is lost on listeners, from experience I've found that super-HQ source material can change your results (for the better) when fed through distortion/compression/warming effects processors.


Sure, when you're editing music it can be helpful to have super high sample rates and lots of bits to work with. But for listening, it's just a waste of space.


He was talking about listening.


tomc1985 said "fed through distortion/compression/warming effects processors" which is editing, not listening.


Yeah, editing


Lots of pre-amps can do this in real time, and some installation audio systems do it as part of pre-processing (e.g. compression, limiter, etc).


This is specifically addressed in the link under the section titled "When does 24 bit matter?"


Is that because of the HQ audio, or because the processing software introduces fewer artifacts at those sample rates/bit depths? I.e. could you get the same results by up-sampling CD-quality audio?


For distortion and compression, the only possible difference I can think of is aliasing, and yeah you could just oversample your input, then downsample again on the way out. This is really really common in most professional plugins, often with an adjustable amount to balance quality and CPU usage.

I also have some gear that aliases at a volume you can't hear, but when you plug it into an analog distortion pedal, the aliasing in the high frequencies becomes apparent. This would be avoided if it had a higher sample rate so the aliasing was far out of the audible range.

For other sorts of effects, like spectral processing and pitch shifting, the extra detail above 22 kHz really does make a difference, especially when pitching down.


More bits means:

• Less quantisation noise, so your noise floor is a bit lower and therefore, when you're mixing stuff together, you accumulate less noise

• More numerical room to play with, you can scale or shift the volume up and down more without clipping and with less loss of accuracy

With 16-bit CD audio, you can't just convert to 24-bit and suddenly have less noise. You might get more room, though.
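
Roughly, each extra bit buys about 6.02 dB of dynamic range, so the headline numbers for the two common bit depths are:

    import math

    for bits in (16, 24):
        print(f"{bits} bits: ~{20 * math.log10(2 ** bits):.1f} dB of dynamic range")
    # 16 bits: ~96.3 dB; 24 bits: ~144.5 dB (before dither/noise shaping)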

As for higher sampling rates (more kHz, if you will), I think Monty mentioned some benefit regarding less sharp anti-aliasing filters (the transition band can stretch from 20 kHz up to, say, 48 kHz, rather than having to fit between 20 kHz and 22.05 kHz), but it's not something I understand the benefit of well.


Filters with a sharper cutoff will 'ring' more, smearing sharp changes in the signal out over time. It's a pretty subtle effect when it's all happening above 20 kHz, though.


I imagine it's because lossy codecs are tuned for human perceptual limitations, and when you process audio, it can "pull" areas of the sound that are otherwise hidden from your perception into perceptual ranges, analogous to how fiddling with the brightness/contrast of a highly compressed JPEG image can accentuate the artifacts.


We are not really talking about lossy codecs here, though, are we?


Ah, my mistake. But perhaps you could consider 24/192 -> 16/48 a lossy codec in this context; the same argument does apply, for the same reasons.


That way it does make sense: editing might require higher fidelity than we can perceive in the final project, in order to avoid artifacts.


Actually, the article is not talking about it, but the topic very much is lossy codecs.

Specifically, whether 24/192 AAC is worth it compared to 16/44.1 AAC (and the answer to that is yes, although the answer for 24/192 WAV is no).


But what about lossless 16/44.1 kHz audio? It's not a "compressed" version of 24/192 kHz. Chances are that your 192 kHz music will be low-pass filtered anyway; 192 kHz is a trick that sound chipset vendors added so that integrators wouldn't have to make high-quality analog filters for their ADCs. There's likely nothing above 22.05 kHz except shaped noise, production artifacts, and very, very quiet ultrasound that has nothing to do with the music.


There's no such thing as lossless 16/44.1 because it's practically impossible to make an artefact-free antialiasing/reconstruction filter pair clocked at 44.1k.

The point of higher sample rates isn't to capture bat squeaks, but to relax the engineering constraints on pre-post filtering.

Nyquist is fine in theory, and if you've never actually tried to implement a clean filter you'll likely watch the xiph video and think "Well that makes sense."

If you actually know something about practical DSP and the constraints and challenges of real filter design, you're not going to be quite so easily impressed.

Likewise with higher bit depths. "Common sense" suggests that no one should be able to hear a noise signal at -90dB.

Common sense is dead wrong, because the effects of a single bit of dither are absolutely audible.

And if you can hear the effects of noise added at -90dB, you can certainly hear the effects of quantisation noise artefacts on reverb tails and long decaying notes at -30 to -40dB, added by recording at 16 bits instead of 24 bits.

Whether or not that level of detail is present in a typical pop or classical recording is a different issue. Realistically most music is heavily compressed and limited, so the answer is usually "no."

And not all sources have 24-bits of detail [1]. (Recordings made on the typical digital multitrack machines used in the 80s and 90s certainly don't.)

That doesn't mean that a clean unprocessed recording of music with a wide dynamic range made on no-compromise equipment won't show the difference clearly.

Speaking from experience, it certainly does.

[1] Technically no sources have 24-bits of detail. The best you'll get from a real world converter is around 22-bits.


> you'll likely watch the xiph video and think "Well that makes sense."

What video? This thread is about an article.

> The point of higher sample rates isn't to capture bat squeaks, but to relax the engineering constraints on pre-post filtering.

I just said that, I think.


Because modern production can involve hundreds or even thousands of digital effects. So 24-bit, or even 32-bit float just because you can, is standard.

As Monty demonstrates, it's a fraudulent waste to try to sell the result as a product to the end listener.


I think it's because processes that affect things at the individual sample level (which distortion/compression often do) have a lot more detail to work with. The author mentions this a bit in the article.


> Its playback fidelity is slightly inferior to 16/44.1 or 16/48, and it takes up 6 times the space.

The article is highly technical. Does anyone have a way to describe this phenomenon intuitively?


24/192 can capture audio well beyond what we can hear. These sounds can, though, lead to distortion that is within our hearing range, depending on the equipment used.

Therefore, rather than just being useless extra data, it can be actively harmful to the listening experience.


It's not that different from thinking about 480p vs 4K/2160p -- dramatically higher resolution, increasing the width and length of the 'grid' the sound is recorded into -- except that in the case of audio, pushing fidelity past a certain point causes problems with playback systems, resulting in audible artifacts and distortion.


That analogy works if you modify it:

- 480p vs 2160p is measuring the resolution of your cell phone propped up on a pillow at the other end of your living room

- experimental evidence shows that your eyesight is not good enough to pick up on the increased resolution, you've maxed out your sensory perception

- Your phone stutters trying to stream at 4k so the playback might actually be worse.


The analogy with images is demonstrated in this screenshot [1], which is referenced in a StackOverflow answer on the topic of image scaling in web browsers [2].

When the high resolution image is downscaled poorly, some high (spatial) frequencies are aliased down into lower (spatial) frequencies, manifesting as blocky/jagged lines. The images are 2D data in the spatial domain, while the audio is 1D data in the temporal domain, but the aliasing of high frequency signal to lower frequency artifact is similar.

Viewing a high resolution image on a panel that has enough pixels to display it without scaling is analogous to listening to 192 kHz audio on a fantastic speaker system that can reproduce those high frequencies accurately, instead of causing distortion by aliasing them to lower frequencies. On the other side, viewing a high resolution image which has been downscaled poorly is analogous to listening to that 192 kHz audio on a realistic speaker system that cannot reproduce high frequencies, which results in those signals aliasing down into the audible range.

And as you say, there is a point where, for the viewer/listener's sake, it doesn't make sense to push for higher frequencies because even if you can build a panel/speaker that will faithfully reproduce those frequencies without aliasing, the eye/ear will not be able to perceive the additional detail.

[1] http://www.maxrev.de/files/2012/08/screenshot_interpolation_...

[2] https://stackoverflow.com/a/11987735
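
A one-dimensional version of that "poor downscaling" effect, for anyone who wants to play with it (an arbitrary sweep; the naive path just drops samples, the proper path low-pass filters first):

    import numpy as np
    from scipy.signal import resample_poly

    fs = 192_000
    t = np.arange(fs) / fs
    sweep = np.sin(2 * np.pi * (1_000 + 40_000 * t) * t)   # sweeps well past 24 kHz

    naive = sweep[::4]                    # drop samples: content above 24 kHz folds back down
    proper = resample_poly(sweep, 1, 4)   # filter, then decimate: aliasing suppressed
    print(len(naive), len(proper))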


Technically this is not aliasing; rather, the large and varied non-linearities of a speaker can act like a frequency mixer, which is why you'll get a 3 kHz sound (the difference) when playing, say, 20 and 23 kHz.
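
A quick numerical check of that mixing effect, with a made-up mild non-linearity standing in for the speaker (look for the energy that appears at 3 kHz):

    import numpy as np

    fs = 192_000
    t = np.arange(fs) / fs
    x = 0.5 * np.sin(2 * np.pi * 20_000 * t) + 0.5 * np.sin(2 * np.pi * 23_000 * t)

    y = x + 0.1 * x ** 2                  # crude second-order non-linearity
    spectrum = np.abs(np.fft.rfft(y)) / len(y)
    freqs = np.fft.rfftfreq(len(y), 1 / fs)
    print("level at 3 kHz:", spectrum[np.argmin(np.abs(freqs - 3_000))])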


All good points! Indeed, all that extra data makes the decode stage that much more difficult. The same happens with audio too, but high-res audio isn't nearly as bulky as video, nor nearly as resource-intensive to decode.


Very good analogy.


Increasing the bits doesn't increase the accuracy of the audio in terms of frequencies. It's 100% accurate regardless of bits. That's why trying to compare it to video resolution doesn't work. Bit depth is more akin to the screen brightness. If you turn down the brightness to 1 step from black, it's impossible to distinguish features in the image.


But weren't those "outside our hearing range sounds that can distort our hearing range" present in real life when the recording was made? So I would have heard the distortions had I been there in real life?


The distortions are not present in the source. They occur because the speaker is a physical device subject to physical limitations and can't perfectly reproduce the waveform. In the process of trying to reproduce the ultrasonic signal, the speaker can produce audible distortions. If the ultrasonic signal is not there to begin with, then it can't possibly cause distortion.


The problem is that your speakers were not designed to reproduce sounds at those frequencies, so instead of faithfully reproducing those sounds (that you can't hear anyway) they may emit unintended artifacts/distortion (that you can hear).


The distortions he's talking about would be introduced during playback by DAC, signal processing, speakers, or amplification.


I doubt that is the case. Most systems which I have used actually sound better at the higher sampling rates. This is not because the sampling rates cause anything to sound better, but because the hardware works better at those rates. In a blind test, you wouldn't be able to decide which one you liked better, only that there is a difference. At least on the machines I have, there are some clearly audible differences.


> you wouldn't be able to decide which one you liked better, only that there is a difference.

Yet people have been reliably _unable_ to do this.

The gold standard in listening tests for this is an ABX where you are simply trying to show that you can discern a difference.

When the test is properly set up and calibrated, people are unable to show that they can distinguish 48 kHz from 192 kHz.

Moreover, by the numbers, audio hardware tends to work less well at higher rates if they're different, because running them at higher rates makes their own internal noise shaping less powerful. (Though for anything done well the distinction should still be inaudible).

Finally, if you do have cruddy converters that sound significantly different at different rates nothing stops you from using a transparent software resampler (SSRC is well respected for good reason) to put the audio at whatever rate you want.. until you get better hardware. :)


"In a blind test, you wouldn't be able to decide which one you liked better, only that there is a difference"

That would have to be the noise. The math doesn't lie...


I'm not talking about noise or other artifacts from the conversion process. I am talking about differences in bass and treble balance that come from different engineering in the converters at different frequencies. In some converters, there are actually completely different circuits that engage for different frequencies. The converters that I have experienced do this, and they all have a noticeable effect on what we hear.


The core is here: https://xiph.org/~xiphmont/demo/neil-young.html#toc_1ch

When you play something stored at 44 khz, there are no ultrasonic sounds recorded. At 192 khz instead, there are, and the speakers may push some of the ultrasonic sounds down to the audible spectrum, causing distortion that the human ear can hear.


It’s mostly because speakers aren’t the best at reproducing ultrasonic frequencies that are captured by the high resolution audio files. You might have components of the speaker that have resonances that manifest the ultrasonic frequencies at a lower frequency.

Imagine the normal operation of a speaker as a swing, except you are pushing and pulling the swing all throughout the cycle as it goes up and down. Now, you can technically move the swing at a variety of frequencies if you’re holding onto it the whole time. However imagine as you push it back and forth (low frequencies), you also vigorously shake the swing at the same time (high frequencies). This would probably result in the chains rattling, similar to the unwanted distortions in the speakers caused by ultrasonic frequencies.


Basically, the sampling theorem says you can reconstruct the exact waveform with a certain number of samples. Adding a bunch more samples bulks up the file, but you didn't need them to restore the exact waveform. However, the unnecessary samples are in the file between you and the next sample you do need. At high enough levels of waste this creates an I/O bottleneck that hampers performance.

Another way to look at it is that digital audio is not like digital imaging. There aren't pixels. Increasing the data rate does not continue to make the waveform more and more detailed to human auditory perception in the way that raising the pixel density does for human visual perception.

To describe it intuitively, forget your intuition that audio is like visual and start from "there is no metaphor between these two things."


Digital imaging definitely has sampling concerns too:

https://en.wikipedia.org/wiki/Aliasing

Even analog imaging has concerns that are better described in the frequency domain than the spatial domain:

https://en.wikipedia.org/wiki/Airy_disk

https://en.wikipedia.org/wiki/Optical_transfer_function


Interesting!


Images and audio are pretty similar, and sampling theory works for both: a low-pass filter on audio and a blur filter on images are the same thing.

The difference is audio is "actually" bandlimited and frequency-based, but images are "actually" spatial. When you try using frequency math on images you get artifacts called ringing, which are easy to see with JPEGs or sharp scalers like Lanczos.

Of course audio isn't really frequency-based either, or else it would just be one sound repeated forever. So there's still ringing artifacts (called "pre-echo") and cymbals are the first to go in an MP3.


> The difference is audio is "actually" bandlimited and frequency-based, but images are "actually" spatial.

I.e. audio is sampled along one dimension, while images are sampled along two dimensions. Note that frequency-domain considerations play a crucial role in all optical design, including imaging sensors.


It's a little more than that. A sine wave originates in the frequency domain but actually sounds like something, but images originating in the frequency domain are just noise and not pictures of anything.


> Increasing the data rate does not continue to make the waveform more and more detailed to human auditory perception in the way that raising the pixel density does for human visual perception.

Early low data rate codecs - such as the one used for GSM mobiles - are obviously inferior, but still functional. I think a better analogy is that an iPhone 7 has a 1 megapixel screen, so there's no difference between a 1 megapixel image and a 5 megapixel image, except one is much larger. Of course visually you can zoom in (or move closer in real life), but audibly you can't.


I get what you're saying, that if you remove the ability to zoom you can equalize things, but without that twist I think this preserves the essential problem with the analogy that Monty works so hard to correct. If your data rate is too low, you don't have enough samples to recreate the waveform. So you create an approximate. But there is a finite number of samples that you need to recreate all of the information in the waveform, so once you have that, there isn't any additional information there for you to obtain if you continue to increase the sample rate.


To be fair, phone audio signals have until recently been bandlimited to just a few kHz (roughly 300 Hz to 3.4 kHz).


> Another way to look at is that digital audio is not like digital imaging. There aren't pixels.

For the mathematically inclined, this would probably be a good time to repeat: pixels are not little squares[1].

[1] http://alvyray.com/Memos/CG/Microsoft/6_pixel.pdf


"Diminishing returns"?


In the amplitude domain, you definitely need more bits for that type of processing, and many other types.

Digital distortion basically simulates high gain with soft clipping, which takes a narrow slice of the amplitude domain and magnifies it. The extra resolution has to be there for that not to turn into ugly sounding shit with artifacts.


Well, yeah. For properly mixed audio, you don't need anything beyond 44.1 kHz/16-bit. If you need to do extensive processing, or you're working with raw audio and want extra margin to recover a sound that may be too quiet (or too loud) or otherwise compromised, then more samples and greater bit depth can be highly useful.


It's also great if you're doing a bunch of time/pitch mangling.


I've heard (and it passes the smell test) that even if your ears can't resonate at frequencies beyond human hearing, the space you are in can, and your ears are sensitive to differences in arrival time smaller than the period of the pitches we can't hear. So, at least with high-sample-rate music, the imaging could potentially be cleaner. Assuming super-great everything else.


That's one of those things that sounds true without putting thought into it, but is ultimately false. A 44.1 kHz sampled sound can encode phase offsets much smaller than 22 µs (1/44.1k). How much smaller? I'm not exactly sure, but I believe it depends on your sample size. My guess is 1/2^16 (assuming a sample resolution of 16 bits), but I might be off by a factor of 2 (or indeed 100% wrong) there. Caveat emptor. If I'm right then that's 346 picoseconds, which I'm betting is smaller than your brain can differentiate.


The formula, as far as I could find, is:

1 / (2 * pi * bandwidth * 2 ^ bitdepth)

So for full scale 44/16 signal it is about 0.1 nanosecond.
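
Plugging numbers into that formula for a full-scale 16-bit signal with a 20 kHz bandwidth:

    import math

    print(1 / (2 * math.pi * 20_000 * 2 ** 16))   # ~1.2e-10 s, i.e. roughly 0.1 ns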

More here [1]. There are also some sample files for ABX tests there.

[1] - https://hydrogenaud.io/index.php/topic,108987.msg896449.html...


What you're sort of describing - having more of the higher frequencies resonate together - can actually cause intermodulation distortion (which itself can be in the audible range), which can make it sound worse...

I haven't had time to check the linked article but if it's the one I'm thinking of I'm pretty sure it goes into this.

In the end though, double-blind testing (again, I think mentioned in the article) shows that there's a threshold (a bit below 48 kHz, probably why 44.1 kHz was chosen for CD audio) after which people can't distinguish between higher sample rates any better than by random guessing.


I think it's the intermodulation distortion, or as I call it "the room sound", that your non-conscious mind perceives. The best example of this I've heard is listening to identical masters on SACD/DSD vs PCM. Notably different sound when reproduced via loudspeakers.


Arrival time differences are essentially phase shifts, and assuming infinitely fine quantization you can encode phase shifts with arbitrary accuracy in a band-limited sampled system.

So no.

