24/192 Music Downloads Are Very Silly Indeed (2012) (xiph.org)
647 points by Ivoah 141 days ago | 428 comments



Not entirely silly. Yes, the purported benefits of this high-fidelity audio are imaginary or even include undesirable traits. However, when most "remasters" of pop music today involve the dynamics being boosted to "loudness wars" standards for a target audience of people listening to the music through earbuds in the street, the 24/192 downloads or SACD releases are often the only way to hear the album with real dynamic range.

I’m not sure how big that market is these days, though. I recently decided to move from a block in the city to a house in the surrounding countryside purely to get a quieter listening environment and let me really enjoy the high-dynamic-range recordings I have – I listen to a lot of classical music, especially the avant-garde with its even greater range, e.g. the Ligeti cello concerto starting from pppppp. Yet even among my friends who are really obsessed with such music, seeking a better listening environment seemed an extreme measure to take.

So, people who a generation ago would have invested in higher-end equipment (not audiophile snake oil, just better speakers) and who would have sought silence are now giving in to listening to music on their phones or cheap computer speakers. It’s a big shift.


The loudness war seems to have been a temporary problem, prevalent mostly because music was primarily listened to as downloads of individual files or small collections of files.

Music streaming services have since become a very important music listening medium. Spotify, Google Music, Apple Music and others normalize loudness levels. This neutralizes the loudness war, because it makes loudness-war treatment of music useless, at least when listened to on streaming services [1]. Music mastered during the loudness war will still have problems with dynamic range, but the perverse incentives that caused the war are simply not as relevant for new music.

I agree the loudness war was a huge problem a few years ago, but changing trends in the way music is listened to are solving it. The way you present the loudness war problem is therefore somewhat out of touch.

[1]: https://motherboard.vice.com/en_us/article/ywgeek/why-spotif...
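To illustrate why normalization removes the incentive to master loud, here is a minimal sketch. It assumes a plain RMS level measurement and a hypothetical target of -14 dBFS; real services measure loudness differently (typically with a perceptual loudness standard), so the function names and the target value here are illustrative assumptions only.

```python
import math

def rms_dbfs(samples):
    """RMS level of float samples (full scale = 1.0), in dB relative to full scale."""
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        raise ValueError("a silent track has no defined level")
    return 10 * math.log10(mean_square)

def normalization_gain_db(samples, target_dbfs=-14.0):
    """Gain in dB that moves the track's RMS level onto the (assumed) target."""
    return target_dbfs - rms_dbfs(samples)

def apply_gain(samples, gain_db):
    """Scale every sample by the linear factor equivalent to gain_db."""
    factor = 10 ** (gain_db / 20)
    return [s * factor for s in samples]

def sine(amplitude, freq_hz=1000, rate=44100, seconds=1):
    """Hypothetical test signal standing in for a mastered track."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * n / rate)
            for n in range(rate * seconds)]

loud_master = sine(0.9)   # stands in for a "hot" master
quiet_master = sine(0.2)  # stands in for a more conservative master

loud_gain = normalization_gain_db(loud_master)    # negative: turned down
quiet_gain = normalization_gain_db(quiet_master)  # positive: turned up
```

After the gain is applied, both tracks play back at the same measured level, so the louder master gains nothing from its extra level.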


The loudness war is a matter of degree. Sure, steps have been taken so that we maybe won't see another Death Magnetic where the compression has resulted in outright ear-bleeding distortion. But when most music today is listened to from portable devices in loud environments, it’s hard to believe that we are ever again going back to the level of dynamic range common before the loudness war. As I said, even some classical music (and jazz) labels are now issuing their music with considerably limited dynamic range, and a dynamic range as ample as it traditionally was is available only to those who buy the SACD (and listen to the SACD layer of a hybrid SACD, not the CD layer) or the high-resolution download.


Add to this the 'Bass arms race' happening in the past 15 years.

It is especially noticeable if one listens to "modern" vs older CDs vs the radio. I suppose producers now feel compelled to exercise subwoofers and thump the audience, or compensate for crappy earbuds in the target audience, or maybe that's just what people want because even radio commercials are complicit -- it's difficult to listen to someone talk with "thump/thump/thump" drumbeats in the background. Often, I simply turn it off.

In my car, I have the bass response dialed back by about 50%, and even dip the mid-range by about 15% to get what sounds to my ears like a flat response.

I have mild hearing loss in the 250 Hz to 1 kHz range, so my ears are already attenuating the signal -- I can only imagine how "bass heavy" it must sound to someone with normal hearing!


The only thing that any good producer feels compelled to do is create the best possible record.

There is nothing inherently good or bad in music production, no timeless rules as to how much treble or bass, compression, distortion, reverb or anything else you need. It has all been done to the extreme and what is deemed good is subject to constant change.


Sometimes car head units have an undocumented bass and treble boost, so flat per the settings on the unit is far from flat. The kinds of compensations you are making get the unit closer to flat when it is a unit configured with undocumented tuning. You could measure your head unit to find out.


The loudness war was great. I totally appreciate the extremes it reached and the new music production techniques that were developed in response. Some utterly squashed, distorted albums like The Prodigy's Always Outnumbered, Never Outgunned are masterpieces. Same goes for some 10-15 year old Mr. Oizo, SebastiAn albums. Over-compression can be very aesthetic. Dynamic range is overrated...


Some people like it (obviously, the "loudness war" would've never occurred if it didn't work commercially), some people don't.

My personal example of a song where it doesn't work is Johnny Cash's "Hurt". Around 3 minutes into the song, Johnny Cash's vocal noticeably distorts. From my perspective, the distortion is absolutely a loudness war phenomenon: if you look at the waveform in an editor, the song starts off pretty "hot" given that there's a very noticeable crescendo at the end. At the end of the song, where the music is pretty much maxing out, there is no bandwidth left. There is no other option to stand out but to push Johnny's voice to distortion.

I've seen people in message boards say the distortion in the vocals adds "intensity". Personally, I'd love to see an un-Loudness War version where Johnny Cash gets several dB more to cleanly sing over the music. Where dynamics, not distortion, is what is driving the intensity. For my tastes, it would be much preferred.


50 Cent's Get Rich or Die Tryin' is an excellent example of a poorly mastered album produced around the same time as well, with vocals distorted throughout almost (if not) the entire album. I can't even imagine what the engineers were thinking when recording these. It's similar to the overuse of autotune in country these days; see George Strait's The Cowboy Rides Away.


That's a pretty homogeneous sample. I can't imagine Bergtatt or Beethoven's 6th being as enjoyable on a poorly mastered recording.


I agree that loud albums can still be amazing. Multiple albums on Wikipedia's 'loud' albums list [1] belong to my favorite albums of all time.

[1]: https://en.wikipedia.org/wiki/Loudness_war#Examples_of_.22lo... (not including the 'remasters')


I have to say that (IMHO) Prodigy went over the top with that on The Day Is My Enemy. Thing is you can hear it on the tails of the crash cymbals, you can hear them get ducked in a "stuttery" manner when the compression from the drums hits. Actually, great production technique should be able to work around that and make it both loud without messing up the cymbal tails (or maybe just truncate them, just not have them stutter).

I found the album a bit tiring to listen to because of the continuous loudness. No particular parts really stood out to me, because everything was just sort of loud and it just seemed to go in one ear, out the other. I could pay real close attention, one can always listen better :) But again it is a bit tiring and you can't do much else.

Then again, I just really prefer their first 2-3 albums, which have quite a different sound altogether.

And I'm curious which Oizo albums you're referring to? I love his stuff, and yes some of his tracks are quite loud (but not all the time, the whole track), but they never quite struck me as a typical "loudness war" type of loudness. Unless I'm thinking of the wrong tracks here (No Day Massacre? Last Night a DJ Killed My Dog?), he seems to like to hit a well sound-designed overdriven bass buzz, with not too much else playing at the same time (and where it does, close attention to envelopes, cutting off hits, ms transient timings). If you do that right and just normalize to max amplitude, you're 95% of the way there (at which point my experience is that compression on top of that usually fucks up that careful detail work, but maybe I need more practice or a different plugin). Possibly I'm thinking of the wrong tracks here, but at least you gave me a reason to re-listen to his stuff with a different ear/attention (loudness sound-design), which is always interesting :)


It's a genre thing. Lotta dance music these days sounds wrong if it doesn't have the particular distortion of overdoing the hard limiter. Like rock'n'roll without distortion on the guitar.


Still very silly IMAO. Very, very few recordings are done in 24/192 because of the many implications this has on your entire studio setup.

You'll need an exceptionally good clock to start with, and all other equipment needs to align to that clock. Then all plugins/processing you use need to be in the same 24/192 domain, otherwise your signal is reduced to the limit of that plugin/processing and all previous efforts are lost.

Most music producers use samples, and most are 16/44, so what's the point of trying to get that to 24/192 by filling the signal with zeros?

If a piece of music is, on a very rare occasion, truly 24/192, then the listener who downloaded the track still needs an exceptionally good clock (which is both expensive and hard to find) to play it back without signal reduction.

IMAO 24/192 is just a marketing thing for audiophiles that don't really understand the implications. 24/96 should be a reasonable limit for now, although personally I think 24/48 is enough for very high quality audio.


You ought to have read the rest of this thread before leaving this comment. I’d be very happy with 24/96 or 24/48 if I could get recordings at that bitrate that aren't given a loudness wars treatment. Since I often can't, I have to go for 24/192 or SACD even if all that extra room is completely superfluous, just because that format was decently mastered.

> Most music producers use samples...

Most people interested in better-quality sound in this particular context aren't listening to contemporary electronic music with samples. 24/192 or SACD is so desirable for reissues of older recordings in pop or jazz genres where those formats were mastered with higher dynamic range, while the available CD versions or lower-bitrate downloads were mastered with loudness-wars compression. The format is also attractive to classical music listeners, because SACD gives you multichannel audio; and some classical labels are now giving loudness-wars treatment to the non-SACD or non-24/192 formats of a particular new release.


"some classical labels are now giving loudness-wars treatment"

That's depressing.


Haha, at least classical IS usually recorded well, being an audiophile metal-head these days is depressing. If I use my normal setup almost everything clips badly.


This depends on whether you are in the studio or are just playing back things.

In the studio, I would say that 24 bit at least should be the norm for recording purposes.

24 bit recording gives you very noticeably increased headroom (about 20dB). This gives you quite a bit more flexibility recording lower levels without concerning yourself about the noise floor. The difference isn't huge for most prosumer setups in practice, but given that the processing power and storage power of computers makes recording in 24 bit trivial to do, there really is no reason not to record 24 bit these days IMHO.

Sample rate also comes into play, mainly if you have older plugins that do not oversample. Some of the mathematical calculations involved, particularly if they are quickly responding to audio changes (eg limiting / compression, distortion), or are using "naive" aliasing-prone algorithms (eg naive sawtooth wave vs. something like BLEP / PolyBLEP etc.), can introduce frequencies beyond the Nyquist that may translate into aliasing. These days, I would say most plugins do oversample internally or at the very least give you the option to do so. There's also a VST wrapper to over-sample older plugins as well (http://www.experimentalscene.com/software/antialias/). So I do not think recording over 44.1kHz is very necessary these days. I don't discount opinions from people that recording at 192kHz "sounds better", though, given the possibility that they are using plugins that are prone to aliasing at 44.1kHz rates.
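The aliasing described above can be sketched with a few lines of arithmetic. This is only an illustration of where a frequency component lands after sampling; the function name and the 5 kHz sawtooth example are assumptions chosen for the sketch, not taken from any particular plugin.

```python
def alias_frequency(freq_hz, sample_rate_hz):
    """Frequency at which a component at freq_hz shows up after sampling.

    Components above the Nyquist frequency (half the sample rate) fold
    back into the band between 0 and Nyquist.
    """
    folded = freq_hz % sample_rate_hz
    nyquist = sample_rate_hz / 2
    return sample_rate_hz - folded if folded > nyquist else folded

# A naive sawtooth at 5 kHz contains harmonics at 5, 10, 15, 20, 25, 30 kHz, ...
harmonics = [5000 * k for k in range(1, 7)]

# At 44.1 kHz the 25 kHz and 30 kHz harmonics fold back into the audible band.
at_44k = [alias_frequency(h, 44100) for h in harmonics]

# At 192 kHz the same harmonics stay where they are, above the audible band.
at_192k = [alias_frequency(h, 192000) for h in harmonics]
```

Oversampling inside a plugin has the same effect as the 192 kHz case here: the out-of-band harmonics can be filtered away before the signal returns to the project's sample rate.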

I personally do not see any benefit beyond 16/44.1kHz for playback of most recordings. Maybe 24 bit would be useful for heavily dynamic music (one of the few categories where you generally find this is orchestral music), but I'm thinking even for here the 96dB range of 16 bit audio should be enough for most cases.
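For reference, the 96dB figure follows from the ratio between the largest and smallest step of an N-bit integer sample. The sketch below computes that theoretical ratio only; it ignores dither and the analog noise of real converters, which in practice land well below the 24-bit figure.

```python
import math

def theoretical_dynamic_range_db(bit_depth):
    """Ratio between full scale and one quantization step, in dB (about 6.02 dB per bit)."""
    return 20 * math.log10(2 ** bit_depth)

range_16 = theoretical_dynamic_range_db(16)  # roughly 96 dB
range_24 = theoretical_dynamic_range_db(24)  # roughly 144 dB
```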


> This gives you quite a bit more flexibility recording lower levels without concerning yourself about the noise floor.

To be fair, that only applies to the digital part of your signal chain. The analog portion is going to have nowhere near 24 bits of room above the noise floor.

The article is pretty clear that 24/192 can be reasonable for production-- it's just not reasonable for playback.


First off, I agree that 24/192 is not very useful in most circumstances (also for the dynamics thing, you still need a 24/192 master done without all the compression).

But your arguments aren't quite right, IMO. If you have a 16/44 sample, and you don't play it at full volume, you get some use out of those extra bits. Especially if you have a volume envelope.

Also many modern samples are actually saved as 24-bit (or even 32-bit). Especially if they're my own creation from noodling around with softsynths, but they're shared like that as well, obviously.

Then, if you apply a plugin or effect that supports 192/24 output, on a 16/44 sample, you still get useful information in those additional bits, even if the sample did not. Think of the very quiet end of a reverb tail, for instance.
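A small sketch of the point about the extra bits: take a few neighbouring 16-bit sample values, turn them down by 20 dB, and round the result to either a 16-bit or a 24-bit integer grid. The sample values and the helper name are hypothetical, chosen only to make the rounding visible.

```python
def requantize(sample_16, gain, out_bits):
    """Scale a 16-bit integer sample by gain and round it onto an out_bits-wide integer grid."""
    scale = 2 ** (out_bits - 16)  # 1 for 16-bit output, 256 for 24-bit output
    return round(sample_16 * gain * scale)

samples = [1000, 1001, 1002, 1003, 1004]  # hypothetical neighbouring 16-bit values
gain = 0.1                                 # play the sample 20 dB quieter

out_16 = [requantize(s, gain, 16) for s in samples]
out_24 = [requantize(s, gain, 24) for s in samples]

# In 16-bit output the five distinct inputs collapse onto one value;
# in 24-bit output they stay distinct.
distinct_16 = len(set(out_16))
distinct_24 = len(set(out_24))
```

The same thing happens to a fading reverb tail or a volume envelope: the quieter the signal gets, the more of its detail lives in the bits below the original 16.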

But that's for producers. It's always good to have your working stuff in high quality, you never know how far you'll end up amplifying those low order bits.

So I can see the use for 24 bit audio (in certain contexts), but I'm really not so sure at all what the 192kHz is good for. Since it's waaaayyy above the human hearing range, all I can think of is dithering. You can hide a lot of your dithering noise in the ultrasonic range (which almost seems like free lunch IMHO) and then ... you obtained even more (virtual) bits of fidelity! Which you didn't really need cause you were using 24 bit audio in the first place.

I agree it's mostly a marketing gimmick, otherwise.


Studios work in 32bit-Float all the time...


> the 24/192 downloads or SACD releases are often the only way to hear the album with real dynamic range.

Yeah, this is often true. Some of Green Day's (a band I've never previously been into) albums were released in 24/192 or 24/96 a few years back and they actually sounded really great, with real dynamic range.

Also true? The completely absurd fact that sometimes a vinyl rip of an album will actually have the highest dynamic range, even though vinyl has a much smaller dynamic range than 16/44 audio. Bands often use "loudness wars" mastering for the digital release and then proper mastering for the vinyl release.


> Yeah, this is often true. Some of Green Day's (a band I've never previously been into) albums were released in 24/192 or 24/96 a few years back and they actually sounded really great, with real dynamic range.

Back in the day you'd occasionally see remastered "gold" discs released. The advertising made a big deal about the disc material. Those probably sounded different too (they managed to sell them at a great premium) but with those, they sounded better because of the newer remastering, not the disc technology.

It's certainly possible this is the case with some of those releases remastered for SACD as well. The label probably didn't give Green Day a huge amount of money for production when they made Dookie, for example, but it eventually sold 20 million copies and additional production made sense. If it sounds better to the listener it's a real benefit, too, but it is quite likely not down to the playback technology.

Two oddball things I've noticed about remasters in the last couple of years: there's some agreement out there that the newest remaster of Hendrix's records is not the best. And King Crimson, who has an audiophile following and decorates their catalog with remasters with alarming regularity, removed a little dynamic range (an oversimplification) from the newest remaster of Larks' Tongues in Aspic when remastering the CD and mastering for DVD/A because people very understandably complained about the (technically very good) previous version being too quiet. Audiophiles say they want dynamic range, but...


Yeah. And you know, as a casual audiophile, that's my big frustration with this hobby. There's so much bullshit marketing (and audiophiles who believe it) that it really gives the whole hobby a bad name. (On the bright side, there are a lot of objective audiophiles and with just a little bit of knowledge you can get fantastic sound for very little money...)

> The label probably didn't give Green Day a huge amount of money for production when they made Dookie, for example, but it eventually sold 20 million copies and additional production made sense.

I don't know about Dookie but their later albums were recorded with full dynamic range and then squashed down into loudness-wars style mush.


Yeah, it's still the artist's choice how much dynamic range is needed to express their music. More dynamic range just for the sake of it seems just as bad as sausage flattening to get as loud a sound as possible.


Vinyl DR values are not accurate; vinyl made from the same digital master as the CD will usually have a DR value higher by 3 or more points.


> Not entirely silly. Yes, the purported benefits of this high-fidelity audio are imaginary or even include undesirable traits. However, when most "remasters" of pop music today involve the dynamics being boosted to "loudness wars" standards for a target audience of people listening to the music through earbuds in the street, the 24/192 downloads or SACD releases are often the only way to hear the album with real dynamic range.

No. It's still entirely silly. The reason? The dynamics are unrecoverable once squashed by a limiter or compressor during the remastering process. The fidelity of the delivery medium is moot after that happens.


This. Once the "master" is treated with the maximizing limiters that are used in the loudness war the files are just rendered into their separate file formats.

If the SACD promised that they reloaded the source tape/Pro Tools/whatever DAW was used and remixed/remastered the songs to actually have dynamic range, then I would be interested. As far as I am aware this isn't happening, and it is implausible for any record considered a classic.


Loudness-wars compression is a product of the digital recording era, and quite a few years into it, too. A lot of music that people still want to listen to today predates all that.

Many SACD reissues do go back to the source tape. This is a frequent cause of complaint with SACD reissues of classic jazz recordings from the 1960s: sometimes you get better sound in terms of dynamic range than any previous CD issue of that recording, but in the meantime the source tape may have deteriorated in parts.

Even with recordings from the “loudness wars”, there is sometimes room for dynamic range improvement when remastering. A good example is Rush’s album Vapor Trails. This was an infamously botched recording upon its original release, on a scale with Death Magnetic. Because loudness-wars treatment plagued the original tracks before mixing, the damage could never entirely be repaired. However, the additional process of compression applied to the source during transfer to CD could be reversed, so the album was eventually reissued as Vapor Trails Remixed, and while still flawed, that reissue has a lot more room to breathe than the original CD release.


Why implausible?


The scope of work involved is too great to remix and remaster all the records. I also know that most indie records are not kept in their multitrack form as meticulously as big label acts.


How does 24/192 help with dynamic range?


It has nothing to do with the 24/192 values themselves, but rather the treatment given to the recording during the mastering. Because 24/192 and SACD are "audiophile formats", the engineers who master the recordings expect them to be listened to in serious listening environments that are relatively noise-insulated so that very soft parts can be heard clearly, and there are no complaining neighbours so the loud parts can really blast. Consequently, an ample dynamic range is preserved.

"Remastered" reissues on the other formats – CD or lower-bitrate downloads – are nowadays expected to be listened to through cheap earbuds or speakers and perhaps in noisy urban environments. So, the engineer applies "loudness wars" treatment, compressing their dynamic range so the listener can still hear the quiet parts even with all the noise around them.


You are proving the point. Treating the audio well, not giving in to the loudness, has only correlation with 24/192 releases (if you say so), there is no causation.


Sure, there is no causation of "higher bitrate necessarily makes for better sound", but the correlation of "higher bitrate format just happens to have better mastering" is often strong enough to make these the go-to formats. There are a large number of classic recordings that are difficult to obtain in the digital era without harsh compression being applied to them, so the 24/192 or SACD release is the most convenient way of hearing them with decent dynamic range.


Right, absolutely. 16 bit samples give 100db+ of perceived dynamic range, more than would ever be needed.

The other poster is not saying otherwise, though. They're just saying that "hi-def" formats, while technically unnecessary for end-user listening, are often the only way to obtain a decently-mastered recording.

There's no technical reason for things to be that way. But that's how things are.

It's sort of like buying a new car. You want the more powerful engine? Well then you also have to buy the package with heated seats, the moonroof, and the fog lights. There's no technical reason for it, but that's the only way they'll sell it to you.


But there is a technical reason for them, as the tools used to master/make music work better at the higher formats. It's not "superstition"; it's digital audio.


Of course. That's why I specifically said "end-user listening."

It's also what this entire discussion's about.

Of course formats above 44.1/16 are useful for professional work; nobody's ever said otherwise. Just like graphics professionals work with high-res images and lossless formats even if the end product is a .jpg on a web page.


Yes, but this is about Music Downloads.


In other words, it is not a technical advantage, but a social/political one.


> It has nothing to do with the 24/192 values themselves, but rather the treatment given to the recording during the mastering

I noticed a similar thing with TV a few years ago. Despite watching a standard def. channel on a SD TV some programmes had a very noticeably better image quality. I think these had been shot and edited in HD and although the 'last mile' was still in SD there had been less degradation in the processing so the final picture was much better.


It’s the same reason downsampling 4K to 1080p looks better than shooting in 1080p.


this is roughly what the "Mastered for iTunes" badge means on iTunes downloads

the mastering engineer has to be approved and there are some minimum dynamic range standards

also 24 bit (but not 192khz) master files have to be supplied

reportedly some of the streaming services (Spotify, YouTube) are now applying some 'loudness normalisation' which will bring some of the bad loud masters into line (it won't restore their dynamic range but will make them similar loudness to other tracks)

the loudness wars were never about what's good for listeners, but rather a competition for tracks to sound louder than other people's tracks when played consecutively on radio or in your music player


Mastering for "high fidelity" 24-bit audio on iTunes is an oxymoron. The compression algorithms used for AAC and MP3 are going to worsen the sound quality in a manner that renders the extended dynamic range of 24-bit audio useless.


even on mp3s where the bitrate is low enough to hear some compression artefacts you would still be able to perceive the difference in dynamic range

and the iTunes files are 256 kbps AACs, you can't hear the compression

remember that 'compression' in this context is data compression and not audio compression (which acts directly on the dynamic range of the source)


> the iTunes files are 256 kbps AACs, you can't hear the compression

I can most certainly hear the compression when compared with CD digital audio.

I fully understand the difference between data compression and dynamic range (not audio) compression.

What I'm saying is, lossy data compressed audio formats are already compromised enough to rule them out as a medium for audiophile use. Worrying about the dynamic range at that point is moot. It's going to be played on tiny, terrible sounding speakers.


> I can most certainly hear the compression when compared with CD digital audio.

with 256kbps AAC, really? yeah MP3 is old and even at 320kbps it throws away everything above 18kHz (which I can't hear personally, but some people can). However AAC is newer and better and blows MP3 out of the water (so do OGG and OPUS, btw). We got a lot better at psychoacoustics since MP3 started out. I strongly doubt you could hear it in a proper A/B test.


With the higher 24 bit resolution, would it not be possible for someone to "decompress" the dynamic range with less distortion than 16 bit?


No, compression can't be undone. It destroys information. Especially with a limiter. The mastering chain will have typically done other things to the signal as well.
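A minimal sketch of why limiting destroys information, using the simplest possible limiter (a hard clip at a fixed ceiling). Real mastering limiters are more sophisticated than this, so treat the function and the sample values as illustrative assumptions; the point is only that two different inputs can produce the same output, which no later processing can tell apart.

```python
def hard_limit(samples, ceiling):
    """Clamp every sample into the range [-ceiling, +ceiling]."""
    return [max(-ceiling, min(ceiling, s)) for s in samples]

# Two hypothetical passages that differ only in how high the peak goes.
soft_peak = [0.2, 0.5, 0.8, 0.5, 0.2]
loud_peak = [0.2, 0.5, 1.0, 0.5, 0.2]

limited_soft = hard_limit(soft_peak, 0.5)
limited_loud = hard_limit(loud_peak, 0.5)

# Both inputs now look identical, so the original peak height cannot be recovered,
# no matter how many bits the delivery format has.
same_after_limiting = limited_soft == limited_loud
```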


Thanks for the clarification, I kinda agree.

Then, 24/192 is mostly a "weak signal" to help you estimate if the audio was treated with care.


That's the 24-bit part, which provides more play room than 16-bit. Normally 16 is plenty, but consider an example track which is heavily compressed so only the top 1% of range is used:

16-bits = 2^16 * 0.01 = 655 discrete levels

24-bits = 2^24 * 0.01 ≈ 167772 discrete levels
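The arithmetic above can be reproduced as follows. The sketch takes the comment's premise at face value (that a fixed fraction of the integer range is "in use"), which is itself an assumption about how compressed audio occupies the sample range; the function name is hypothetical.

```python
def levels_in_fraction(bit_depth, fraction):
    """Number of whole integer levels inside a given fraction of an N-bit range."""
    return int(2 ** bit_depth * fraction)

levels_16 = levels_in_fraction(16, 0.01)
levels_24 = levels_in_fraction(24, 0.01)

# The ratio between the two is 2^8, i.e. the 8 extra bits.
ratio = 2 ** 24 / 2 ** 16
```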


I think you have the wrong end of the stick there. Even if a track is heavily multiband compressed, it's still using the full range of levels, as audio will oscillate between -ve and +ve for each cycle, potentially using most of the 65,536 levels available at 16-bit (and the 16777216 levels at 24 bit). The 0.01 is not relevant.


No, I don't have it wrong. Compression means it's effectively using less of the range and the dynamics could be represented with fewer bits in an optimal encoding. All the levels may be used in track, but it's more biased towards the high end.

To help you understand, imagine 1-bit audio. What would it sound like? For each frequency, you can only have a single volume, i.e. it's maximally compressed.


I still think you have it wrong; (Dynamic Audio) Compression does mean it may be using less of the dynamic range when taken overall, but the waveform will still take up the entire span of the available bit depth. Granted that with an optimal encoding this could be biased towards the "high" end (but bear in mind that would have to be both +ve and -ve), but I still think this would miss the point, and I think it would also create distortion in the 'zero' region of the audio. If you open up heavily compressed music in an audio editor (pick any recent EDM track), it will still cross the zero point, and have maximum and minimum levels present. It will also have 'zero' (i.e. 32768 if the range is 0-65535 in 16-bit audio) present - often at the beginning and end of the track, and the audio will be centred about that in the vast majority of tracks.

I'm not sure the 1-bit extreme helps with understanding this, particularly because the audio we'd be dealing with will be a mix of many frequencies; the waveform is an aggregate of the overall 'signal' at each frequency present in the mix; this is one of the reasons why multiband rather than single-band compression has become so popular (as it allows you to get the 'max' in each frequency band rather than one band dominating the overall result of the compression).

I think there's a difference to take into account when considering any given momentary sample and the overall effect - yes, compression does reduce the dynamic range of the music, but you would need some sort of variable bit depth which was linked to the instantaneous amount of compression being applied to get any kind of workable lower-bit-depth encoding scheme, which seems like a lot of complexity for no significant gain (to me?).


1-bit audio sounds perfectly fine at a high enough sampling rate (even higher than 192kHz), with proper dithering. That's how D/A converters work.


I can confirm, you have it wrong.


But also correct here:

> Compression means it's effectively using less of the [dynamic] range and the dynamics [range] could be represented with fewer bits in an optimal encoding.

Lower DNR can indeed be encoded with fewer bits per sample.


As Barbie would say: Audio engineering is hard! Let's play with the calculator.


Barbie has piloted commercial airliners, been on a scientific space mission, and has an M.D., so I don't think she'd be likely to say that.


https://en.wikipedia.org/wiki/Teen_Talk_Barbie

The meme "Math class is hard, let's go shopping" is only slightly apocryphal. Two of the voicebox lines were "Math class is tough" and "Want to go shopping?"


No, that has nothing to do with it. See my reply above.


Your reply is not the whole picture. Compression reduces the effective number of bits of resolution.


If you had an RGB (24 bits) image that is mostly very dark, would you say that somehow because the image doesn't use the full range of possible values, the image quality suffered? Would you say that changing the format/bit-depth would actually lead to a perceived increase in quality when looking at the image?


I don't understand why you said I was wrong in the other comment, but you give here an appropriate analogy which makes the effect obvious.

Because we only have 8 bits per channel, we all can see banding in (dynamically) compressed images. That's what increasing the depth improves.


Did you check out the article? 16 bits can cover the range very well, more than you (or anyone) could ever distinguish


Sigh. Please see my other reply. It isn't a matter of bit depth (indeed, 16 bits could be more than enough). It is a matter of how engineers serve the targeted markets of each format.


Perfectly valid point. It's not a technical limitation that "normal" recordings are spoiled by compression, but it's a commercial limitation (or whatever you call it) that has a very real impact.

You cannot buy good recordings in other formats, in this format you can. So there's a market - not a very big one as such, and maybe created for all the wrong reasons, but it is there.


I'm not sure how it's not even more "silly" then. Investing in additional infrastructure, equipment, and effort, when all you need to do is turn down the compressor ratio in the mix... Sounds silly to me.


> all you need to do is turn down the compressor ratio in the mix

The problem here is that the "you" in this sentence is not "me", or even an entity I can really influence, much less control.

It's not really even the audio engineer whose boss is telling him to turn up the volume so high that only the top few bits of the 16-bit recording are in use. The boss is saying this so that the song can be heard when the user turns the volume down because of the obnoxiously noisy commercials on the radio. Those commercials have found that they're more effective when they turn the volume way, way up. And they don't give a whit about the quality of their audio as long as they can sell more cars or tickets or whatever, much less the quality of the songs that play after them, much much less the quality of the songs that someone downloads to listen to offline in a nice listening environment without commercials.

The solution isn't just "turn down your compressor ratio", there's a big, hairy Nash equilibrium/politics problem that can be bypassed by offering 24/192 as a secondary product. If you want to remaster it to 16/48 after downloading, you're welcome to do so.


Well, SACD at least offers one further advantage beyond lower compression: multichannel audio, which is a big deal for classical music, especially works involving spatialization. Investing in an SACD player is well worth it if that repertoire concerns you.


I'm an amateur at all this (and maybe the article stated this explicitly), I use Garageband. I find the main reason to do your recording and mastering in 24 bit is so you have room to change the level of each track without the sum of them clipping. When you are trying to set the levels of each track you have two constraints if you use 16 bit. The first is that you need to keep all of them low enough that the overall recording doesn't clip; the second is that you don't want to overcompensate and use only half the dynamic range available in 16 bit.

Doing this process in 24 bit gives you a large margin of error to play with. No real point to keeping that for the recording people are going to listen to
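A rough sketch of that margin, assuming the worst case where every track peaks at full scale at the same instant (real mixes rarely do) and the usual rule of thumb of about 6.02 dB per bit. The function names are made up for illustration:

```python
import math

def worst_case_headroom_db(n_tracks: int) -> float:
    """Peak growth in dB when n full-scale tracks happen to sum in phase."""
    return 20 * math.log10(n_tracks)

def bits_left(bit_depth: int, n_tracks: int) -> float:
    """Bits of resolution left per track after turning each track down
    far enough that the in-phase sum cannot clip (one bit per doubling)."""
    return bit_depth - math.log2(n_tracks)

for depth in (16, 24):
    print(f"{depth}-bit, 8 tracks: "
          f"{worst_case_headroom_db(8):.2f} dB of headroom needed, "
          f"{bits_left(depth, 8):.1f} bits left per track")
```

With 8 tracks the worst case costs about 18 dB, or 3 bits: that leaves 13 bits per track at 16 bit but 21 at 24 bit, which is the margin of error described above.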


It is entirely silly. The belief that there is causation between a high fidelity, high dynamic range release and 24/192 is exactly the issue.

Yes, those high dynamic range releases usually get published in 24/192. No, the fact that they get published in 24/192 does not, as far as human hearing goes, add anything to the dynamic range or otherwise the fidelity of the recording.

Since the correlation is so strong, it is of course entirely understandable that people assume causation exists.


I've noticed the same thing with LPs. While LPs are technically inferior to Red Book CDs, they're often mastered better.


Why is this? I only have 16 bit 44.1kHz "normal" CDs and I notice some terrible mastering jobs, eg. RHCP Stadium Arcadium loses the snare for the first track and is like listening to minutes of distortion yet apparently the vinyl was really well mastered.

Is there a technical reason why the mastering is so different for the two mediums, CD versus vinyl?


When you boost the loudness too much (for your specific turntable to handle), the needle will simply jump out of the groove, not kidding.


This is true, to a certain extent. But it doesn't mean that masters for vinyl always have more dynamics left intact.

More often than not these days, the same compressed master is used for the vinyl. To combat the groove-jumping problem, the overall level is simply dropped.


I've seen it argued that multi-disc setups are where the loudness wars started. When you can switch between CDs quickly, you start comparing how they sound. If the volume knob is left alone, the CD that is mastered louder is going to sound louder, and thus better.

Thus, people started mastering CDs for loudness.

An alternative idea of mine is simpler. Loud music is considered worse than quiet music (quiet music sounds worse, but loud music still does, and also bothers other people). So, when you need to pick a volume setting for your collection, you bias towards setting it lower, so the really loud ones don't become too loud. Thus, the quieter CDs are annoying because they always sound quiet, whilst the loud CDs sound about right, because your volume is much more suitable for them.


Just guessing, but maybe the CD version is also used for radio play, and needs to be mastered as loud as possible not to lose the war?

I guess special purpose releases don't usually end up on the radio so they can be mastered for people who actually appreciate music. ;)


Assuming equivalent quality of CD playback equipment and LP playback equipment, LPs provide superior sound quality. LPs are analog, so what reaches your ears is undiluted sound. CDs are digital, so there are gaps in the sound, and there's the analog audio conversion that must happen to record to a digital/CD format, and digital audio conversion back to analog that must happen to hear the sound from a speaker. Both conversions reduce fidelity.

As ever this difference can be impossible to detect if the equipment and environment aren't of a sufficient fidelity/quality.


>digital, so there are gaps in the sound

What kind of gaps? There are no gaps. Sampled signals perfectly represent the original wave up to half the sampling frequency. Analog systems are inevitably limited in their frequency response as well, so, given the same bandwidth, there would be no difference at all.

In the real world, imperfect A/D and D/A conversions are typically still far less destructive than all the mechanical and electromagnetic sources of noise that affect analog systems. You can't consider one but not the other.
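The "up to half the sampling frequency" condition is the whole trick. As a small sketch (the tone frequencies and the 48 kHz rate are arbitrary choices for illustration), a tone above the Nyquist frequency yields exactly the same samples as its mirror image below it, which is why the input must be band-limited before sampling:

```python
import math

def sample_tone(freq_hz: float, rate_hz: float, n_samples: int) -> list:
    """Sample a unit-amplitude cosine at the given rate."""
    return [math.cos(2 * math.pi * freq_hz * n / rate_hz)
            for n in range(n_samples)]

RATE = 48_000                            # Nyquist frequency is 24 kHz
below = sample_tone(18_000, RATE, 64)    # 6 kHz below Nyquist
above = sample_tone(30_000, RATE, 64)    # 6 kHz above Nyquist

# The two sample sequences agree to within floating-point rounding.
max_diff = max(abs(a - b) for a, b in zip(below, above))
print(max_diff)
```

Once everything above 24 kHz has been filtered out, that ambiguity is gone and the samples pin down a single waveform.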


Try watching this video (hope I found the right one):

https://www.youtube.com/watch?v=cIQ9IXSUzuM

I think you're right that recording equipment has a long way to go, though; regardless of format I think people can relatively easily distinguish real acoustical instruments from recordings.


Yep - non-shitty masterings are the one legitimate reason for "hi-res" audio. Even if you could downsample it to 16/44 with no perceptible difference.


What are some examples of popular songs that had their loudness boosted?


Almost any reissue of material that has already appeared on CD in an earlier generation. Compare the post-millennial CD remasters of Slowdive's Souvlaki or the Cocteau Twins discography to the original CD issues of those albums in the early 1990s. Generally anytime an album is announced as "newly remastered" these days, the sole significant change is decreasing its dynamic range.


Any examples that I can listen to online?


Sadly most people have never experienced good sound. When you hear it though. But taste can be very different.


I have! I really have! My friend spends all his money on audio. I sadly just don't understand. You are correct. I was going to say that he pays infinite money for what to me is a 10% increase in enjoyment, but perhaps for you and him, your brain is better with sound, and you guys might get a 1000% improvement for your dollar.

Although in any case, nothing justifies the pricetag dished out to audio enthusiasts.


A 100-150 dollar headset of the right type is hardly "infinite money" and I would call it paradigm-changing compared to what most people seem to be using, at least over here.

It's only at higher tiers that the difference might really be imperceptible to many ears.


I always thought pickups (the needle of a turntable) are the best example for that:

A 200$ pickup is certainly better than a 20$ needle, maybe even 10x better. The difference between a 200$ pickup and one that goes for 2000$ is minuscule. There certainly is a difference, but it's never as big as between the 20$ and the 200$ model.

That said, there are listeners who believe in the value of a 2000$ pickup and derive a lot of enjoyment from the difference to a lesser model. Who am I to say they're wrong?

Now when it comes to a manufacturer of very expensive cables (for example): Don't make me laugh...


For all its flaws even a Beyerdynamic DT 770 Pro is a very good headset that's head and shoulders above...


As someone who knows nothing about 'good' headsets, should I get the 32, 80 or 250 Ohm


It shouldn't make too much difference unless you like to listen to them really loud (damaging-your-hearing levels).

Two things come to mind though: voltage/current delivery of your amp, and damping ratio.

The first depends on the characteristics of your amplifier. Some are better at delivering current, some voltage. The lower impedance (32) is better suited to high-current/low-voltage sources, which includes most portable devices, phones, etc. Conversely, the higher impedance (250) is better for high-voltage/low-current sources like tube amps.

The second is about the ratio of the headphone impedance to the amp output impedance. You want a high ratio, so if your source has a large output impedance then the higher impedance headphones will sound better. Good headphone amps sometimes specify the output impedance, or you can measure it.
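To put rough numbers on that ratio, here is a sketch that treats the headphone as a plain resistor (real headphone impedance varies with frequency) and assumes a hypothetical source with 10 Ω output impedance:

```python
import math

def damping_ratio(headphone_ohms: float, amp_output_ohms: float) -> float:
    """Ratio of headphone impedance to amplifier output impedance."""
    return headphone_ohms / amp_output_ohms

def level_loss_db(headphone_ohms: float, amp_output_ohms: float) -> float:
    """Level lost in the voltage divider formed by the amp's output
    impedance in series with the headphone."""
    fraction = headphone_ohms / (headphone_ohms + amp_output_ohms)
    return -20 * math.log10(fraction)

AMP_OUTPUT_OHMS = 10.0  # hypothetical value, not a measured device
for ohms in (32, 80, 250):
    print(f"{ohms} ohm: ratio {damping_ratio(ohms, AMP_OUTPUT_OHMS):.1f}, "
          f"loss {level_loss_db(ohms, AMP_OUTPUT_OHMS):.2f} dB")
```

On that assumed source the 250 Ω model gets a ratio of 25 against 3.2 for the 32 Ω one, so it is much less sensitive to the source's output impedance.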


> Conversely, the higher impedance (250) is better for high-voltage/low-current sources like tube amps.

Headphone outputs of mixers fall into this category as well. Proper audio interfaces and sound cards have no issues at all driving 250 Ω to deafening volumes. Laptops no issues as well (for me).


Yeah the full on audiophile stuff is mostly nonsense as far as I can tell, but my god I do not understand how so very many people put up with those crappy Apple earbuds as their primary sound playback. The difference with a proper set of headphones is massive.


Anything is better than the cheap ones you get for free. I guess people want to have "the original" lol. But then when you get better headphones you can hear the background buzz from low end devices ...


I feel really sorry for audiophiles because they have to spend lots of money on expensive equipment to eliminate the perceived shortcomings that the rest of us are not troubled by during our enjoyment of music.


There is plenty of music that you wouldn't even get to enjoy if you didn't invest in some higher-end equipment and made some effort to improve your listening environment, because you wouldn't be able to hear it at all: the opening of Schnittke's Symphony No. 3 in the BIS recording, Nono’s Prometeo, Ligeti’s Cello Concerto as I mentioned above, most of Knaifel’s music issued on ECM or several of that label’s Pärt recordings, etc. That is music of extremely pianissimo dynamic. And no, you can’t just turn the volume up, because if you did that, the loud parts that come later would blow out your ears: you have to have a listening environment and responsive speakers that are capable of representing the whole dynamic range of the recording.


I'll just wait to hear those pieces in concert I guess : )


There are no short cuts to good sound, you can buy the most expensive equipment but still have bad sound due to the acoustics of the room. It's also somewhat random, like placement and materials dampening different frequencies. So I guess it becomes a sort of addiction.


There's a very straightforward practical reason why you might want to keep downloading 24/192. The people who put those together are on average "audiophile" snobs so the rips tend to be perfect, often from high quality sources like SACDs, hdtracks, etc. The source of the recording is usually the best master known for that record (special remastered editions, etc.). If you download mp3/spotify chances are the copy you download is from a worse source. Sure, you don't need 24/192, a well ripped 320kbps mp3 would be the same in practice, but looking for 24/192 in the internet is an easy way of getting better quality music on average in practice.


> "If you download mp3/spotify"

You are misconstruing Monty's argument here. He is very much against mp3...in fact he says he could tell the difference between high bitrate mp3 and 16 bit 44k wav. The real point of the video is that 16 bit 44k wav is beyond sufficient...don't need to go beyond that to 24-bit 192kHz.


> he says he could tell the difference between high bitrate mp3 and 16 bit 44k wav.

In the early days, when all mp3 encoders were pretty bad, I could tell which encoder produced a given file. mp3 encoders today are vastly better. I've not been able to do that party trick in a long time.


I remember the Xing encoder was blazingly fast, but sounded worse than LAME. This was around 1999, IIRC.


Yep, especially bad on cymbals - always a dead giveaway.


Yes and hihats. But how am I to tell now that I have high frequency tinnitus?? Blasted drummers.


And whenever someone said an 's'. It sounded like 'sshssh'


thank you for clarifying! It seems I slightly misrepresented your views...for that I'm sorry...I was going on memory of what I remember you saying.


As far as I know, people consistently fail to tell the difference in blind A/B tests in listening between (any decent) FLAC and a well ripped mp3 from the same source. I know I can't. I can't link to proper double blind studies but that's the general consensus in the non-BS-audiophile community as far as I know.


My anecdote is that I've done a number of these tests (Foobar has a plugin that allows you to do them on yourself), and I can reliably tell the difference between FLAC and 128 MP3, but can't tell the difference on 256 MP3 and up.


Yes, I've heard people putting the threshold at ~192kbps. Above that it's pretty much impossible these days. I can also tell the difference at 128kbps, although sometimes it's a bit of nitpicking (can only hear differences in the sounds of hihats and things like that)


I think the type of music you are listening to assists with being able to differentiate. To me (much rock, blues, guitar music) 128kbps MP3 sounds like somebody chewed it first. Cymbals and bizarre snares that sound like the snare is now made of paper indicate the low bitrate.

But 256+ and I certainly cannot tell the difference reliably.


When I was considering Tidal vs. Spotify in the past, I ran across the ABX tests here:

http://abx.digitalfeed.net/list.html

Pretty much found the same thing for myself at 256+.


I had satellite radio for a while but had to cancel it because I couldn't stand the sound of cymbals at whatever bitrate they encode at. Weirdly, when the phone operator asked why I was cancelling and I told him "audio quality" he acted like he had never heard that before.


I'm not an audiophile at all but satellite radio sounds worse to me than anything other than AM/FM radio. Spotify over LTE and Bluetooth sounds night-and-day better.


The same in electronic music since there are many repeating sounds. 128kbps just sounds bad, especially on a proper sound system.


Any serious club system can produce peaks waaaaay hotter than 120db, cleanly. Especially horn loaded systems like most anything on the high end these days. Horn loaded PA speakers can give you perfectly clear peaks at absurdly high amplitudes, cleanly. We don't hear peaks, we hear RMS volumes which are invariably at least 6 db down from the peak if not much more (classic rock routinely has peaks 30 db over the 'body' of the sound as expressed in RMS loudness). This easily raises the grungy one-bit noise floor level of 16 bit CD audio up into the plainly audible, and as for any lossy-encoded format, forget it: an order of magnitude worse.

CD quality is the very least you could want for a serious big club or theater system (much less auditorium). Between peaks and the requirements for deep bass, the peak in a digital audio file is (a)much farther above the body of music than you'd think, and (b) should never be reached, because that's clipping.

People routinely behave as if the theoretical maximum dynamic range of a Red Book CD is relevant to anything. It's incredibly easy to play back music loud enough that the noise floor will get obnoxious and relevant to listening, it's only 96 db down. Any small system in an enclosed live space can push louder than that. Cranking music over headphones will blow way past that and you won't even hear the peaks, but you'll be able to hear how bad the subtleties (or lack of same) on 16 bit red book CD are.

Electronic music, especially live, is totally relevant to high resolution audio. I more than suspect some of the big name acts (for instance, Deadmau5) are using converters to the mains running at not less than 24/96. Certain synthesizer mixes are very revealing of faults in the playback. If the live performance over a modern PA sounds absolutely huge and awesome, but not strained or grainy, then they're not using CD quality. The SPLs are more than enough to make these distinctions obvious.

Anyone can get these SPLs over headphones, trivially, and headphones that'll handle it cleanly are only a few hundred dollars.
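For reference, the 96 dB figure above is just the ratio between full scale and one quantisation step. A minimal sketch of that arithmetic, which ignores dither and noise shaping (both covered in the article):

```python
import math

def pcm_dynamic_range_db(bits: int) -> float:
    """Ratio of full scale to one quantisation step, in dB."""
    return 20 * math.log10(2 ** bits)

for bits in (16, 24):
    print(f"{bits}-bit: {pcm_dynamic_range_db(bits):.2f} dB")
```

That works out to roughly 96.33 dB for 16 bit and 144.49 dB for 24 bit.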


Doesn't this mean going past the 85 dBA limit in headphones? That's instant hearing loss.


I only ever recall finding one track where I could hear differences in bass soundstage / positioning on 256+ mp3, vs. uncompressed 44/16, back when I was using stereo subwoofers in a heavily acoustically damped room. Even then, I could only tell when abx'ing.


You can't tell the difference on the vast majority of music. But encoders occasionally mess up, producing an audible difference. In the case of MP3 this often takes the form of a pre-echo. For example encoding cymbals and harpsichords seems to be difficult, though much less of a problem than it used to be. My understanding is that such killer samples exist for good MP3 encoders, even at 320kbit/s, but it has been a couple of years since I last looked into the topic.

(Personally I downloaded some of those killer samples and couldn't tell the difference, but other people reliably tell them apart in an ABX test.)


c't magazine did a test all the way back in 2000 (see https://www.heise.de/ct/artikel/Kreuzverhoertest-287592.html) and even the best test listeners were unable to tell 256kbit mp3 and the original apart.


Was this the Arny Krueger stuff? I did that.

I ABXed 320kbit mp3 from an uncompressed original, I think 9/10 IIRC. It was a recording of castanets, and listening for frequency response differences was useless so I keyed off of 'personality' differences in the sounds and did it that way.
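For scale, a score like 9/10 can be checked against pure guessing with a one-sided binomial tail. This is only a sketch of the usual ABX arithmetic (the function name is made up), assuming every trial is an independent 50/50 guess:

```python
from math import comb

def abx_p_value(correct: int, trials: int) -> float:
    """Probability of getting at least `correct` answers right out of
    `trials` by guessing at random (p = 0.5 per trial)."""
    hits = sum(comb(trials, k) for k in range(correct, trials + 1))
    return hits / 2 ** trials

print(abx_p_value(9, 10))
```

For 9 out of 10 that is 11/1024, a bit over 1%.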

I was also just as horrible at detecting wow and flutter, as I was good at detecting lossy compression 'changes of sonic personality'. Comes from my experience being with analog audio, which is less true of people these days.

The idea that 'the best' listeners cannot tell 256k from even something as limited as 16/44.1 is ridiculous. Any mp3 format… any lossy format… is seriously compromised. Talk to techno/house DJs and producers about how useful it is to perform off mp3s, this is not a hypothetical argument.


> The source of the recording is usually the best master known for that record (special remastered editions, etc.).

Usually what is happening here is you're getting a master that hasn't been compressed to death (see loudness wars). Vinyl is a shit source for 'quality', but most records aren't compressed to death so they can still sound better on a good set of speakers due to the dynamic range.


So, in other words: bite the bandwidth bullet and get the 192 material, but for Pete's sake, do yourself a favor and downsample to 44.1.


If you have the patience, sure. In my desktop I keep everything on the big file source and listen to it that way. I also have dedicated DACs/AMPs that can decode all the way up to DSD. I know it probably doesn't make a difference but whatever, storage is cheap. For my phone I listen to spotify.


Similarly, seeing 24/192 on a rip from an analogue source (e.g. vinyl) usually implies that a high-quality player+amp+ADC was used to digitize the album.


It seems to me what we really need then is some other kind of litmus test that indicates a quality recording, without having to waste a ton of bandwidth and storage to get it. Are there other things to look for that also tend to correlate with quality? Newer formats maybe?



The 192 is really for low latency, not for anything you can actually hear. You can't hear the difference past 48 kHz. An audio interface that can do 192 gets you down to 5ms latency, where you can finger drum without being thrown off. But it has nothing to do with audio quality. For example, if you use Guitar Rig you will have the lowest possible latency for realtime computer-generated guitar effects at 192.


Where do you get the 5ms from? Why is 192 more likely to have lower latency? I would think the higher bandwidth required by 192 has the possibility to slow your computer down and cause more latency.


I believe the parent is referring to the latency of real-time effects used during performance:

* With analogue equipment the latency between the input and output of the effect is often below 1 ms (unless the effect is supposed to have a delay).

* Standalone digital effect equipment that processes the samples one at a time can also have a latency below 1 ms. (One sample period at 48 kHz ≈ 0.02 ms.)

* If you use a computer for real-time effects, the samples are not transferred sample by sample from the audio interface, but in a block of many samples. The number of buffers and the size of them can usually be changed by the user. With a buffer of 1024 samples, the oldest sample is already about 21 ms old before the computer can process it. After the buffer is processed it has to be transferred to the audio interface, and that will add another 21 ms. So the minimum latency for any real-time effect in the computer is about 42 ms at 48 kHz if the size of the buffers is 1024 samples. Often it is much worse because the operating system adds more latency. If the equipment can handle a sample rate of 192 kHz, the same latency is about 10 ms. If the computer can handle smaller buffers, the latency can be lowered. With 256 samples per buffer the minimum latency will be about 11 ms at 48 kHz and 3 ms at 192 kHz.
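The buffer arithmetic in the last bullet can be sketched as follows, assuming exactly one input buffer plus one output buffer and no extra latency from the operating system or drivers:

```python
def round_trip_latency_ms(buffer_samples: int, rate_hz: int,
                          n_buffers: int = 2) -> float:
    """Minimum latency in ms: one full buffer in, one full buffer out."""
    return n_buffers * buffer_samples / rate_hz * 1000

for size in (1024, 256):
    for rate in (48_000, 192_000):
        print(f"{size} samples @ {rate} Hz: "
              f"{round_trip_latency_ms(size, rate):.1f} ms")
```

This gives about 42.7 ms and 10.7 ms for 1024 samples, and 10.7 ms and 2.7 ms for 256 samples. Note that 256 samples at 48 kHz costs the same as 1024 samples at 192 kHz: the latency depends only on how long a buffer lasts, so a smaller buffer helps as much as a higher rate.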


I believe the parent might be referring to the myth about attack of signal falling "between the cracks" of digital sampling. This is debunked in the video linked elsewhere, in the chapter titled "bandlimitation and timing". I might be wrong about the parent poster's meaning, though.


See the excellent answer by zaxomi for another more favorable interpretation of the parent. https://news.ycombinator.com/item?id=15131280


I've said it before, this video demo is one of the very best I've ever seen.

https://xiph.org/video/vid2.shtml

So well prepared, so well presented, so little that could be removed without ruining it.

I aspire to do such good demos but always fall so short.


I felt that the presenter is a very rare example of a engineer who felt that his expression and purpose was attractive to his imagined audience.

Without diversion via the questions embedded in the profession and the perception and self perception of the male geek working and academic world, I so rarely see a presenter reacting to a sense of apparent human warmth in the room, and beyond the lens, which even with the most encouraging assistance behind the lens, is genuinely hard to do. Hard enough that I think it is a classic contribution to the stereotypes of inflated ego newsreel presenters, which Hollywood loves to satirise, in my opinion because Hollywood is mocking, to their narrow and insecure view, a subspecies of acting which when done well, can so massively capture the greater audience than ever some most serious actors may manage to capture.

This is a bit more than a little bit of geek knowhow and applied thought, but I think many geeks by virtue of sheer analysis without a obstruction of a ego, could be handily outperforming the supposedly inherent talent they are "meant" to possess. It may be reaching well into "real serious" acting, very easily. I don't pretend to be a judge of that, but if acting abilities are "I know it when I see it", this is excellent acting indeed.

Edit, is not was, first line. A comma for clarity but later on.


What kind of software generated this comment?


Hah, so cynical. No, this is just how audiophiles describe anything. Even their floorboards.


Could it be cargo cult science (language)? By making it sound like an academic paper, it passes for scientific.


I agree, Monty is a good presenter and a good speaker. You can tell he loves his subject matter, and he's eloquent and understandable in sharing it.


Are you the kind of person who considers 24/384 not being good enough? Chasing 384KHz and beyond by buying $1000 USB cable to the DAC?


If you're striving for clarity in this comment, I suggest rewriting it from scratch :)


Thanks for sharing that! I was convinced to watch it from all the other recommendations and can concur (FWIW at this point) that it was well worth it.

The camerawork was excellent and the demonstration integration/trinket-wielding was seamlessly done. I get the impression the people who made this had a case of "let's use the ThinkPad for this, and let's do it this and that way," and they pulled it off so perfectly, exactly how engineers wish would happen.

If you ever need a reference demo for "open-source software can make good looking presentations," this would be on the shortlist, I think. (The credits say it was made using Cinelerra, for reference.)


I think one thing that made this happen is that this is a presentation made to exercise an interactive software tool that—although it can also be used for one's own experimentation—was mainly written to be used in the presentation. It's a case of making a custom "musical instrument" crafted to play exactly the song you're about to play on it. A lot like Bret Victor's "puppet controller" app in his presentation Stop Drawing Dead Fish (https://vimeo.com/64895205); or the actually-an-instrument "MIDI fighter" created by Shawn Wasabi for his music (https://www.youtube.com/watch?v=qAeybdD5UoQ, https://www.youtube.com/watch?v=Cj2eUpt3MVE).


> this video demo is one of the very best I've ever seen

I have said the same thing. I was a film major, a video producer, and a tech writer (now a programmer), and I am in awe.



I loved the casual and original expressions of the host; they made the whole demo look much more authentic and convincing. A pure presentation like this is truly a work of art. Hats off to it.


Thank you, this really is a great watch. Particularly the zero point stuff re: stairstepping - I had no idea this was merely a function of display.


Monty Montgomery, the author, hangs around here as id=xiphmont :)


That is fantastic. The presenter manages to pull off "charismatic engineer" really well.


Well, he is a very charismatic engineer...


09:30 to 10:30 explains the effects of quantisation extremely well! Overall, that is one of the clearest, no frills, to the point videos I've ever seen.


I thought you were being a bit hyperbolic until I watched it. Nope, it was stellar. Thanks for posting it for those of us who hadn't seen it, it was fun, and even better, I learned some things about something I thought I already understood.


Apart from it being a great video, I'm sad to say that I'm surprised my phone isn't burning my fingers after watching all the way through it. It's like these people know something about multimedia on the internet.


Noise shaping dither filters are amazing and so is the video, quite educational. I have never been more interested in signal processing than during this video.


Read the article, but didn't look at the video until seeing your comment. Thanks, that was indeed an excellent demo video.


Unfortunately this video doesn't work on iOS. I'll try to remember to watch this next time I'm at a computer.


It's available on YouTube[1] as well, which should work just about everywhere.

[1]: https://www.youtube.com/watch?v=cIQ9IXSUzuM


Thanks!


Thanks for sharing! Well worth the watch, and gave me a bonus dose of ASMR. Commenting needlessly to show my appreciation and to hopefully inspire a future version of myself to watch this again.


This reminds me, class D amplifiers are effectively using only 1-bit sampling (!!), the trick being it uses a high sample rate. And these are some of the best amplifiers around.
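A toy illustration of how a fast 1-bit stream can carry a finer-grained level. Note the swap: this uses a first-order delta-sigma modulator because it fits in a few lines, whereas many class D designs use pulse-width modulation instead, so treat it as a sketch of the idea, not of any particular amplifier:

```python
def delta_sigma_1bit(samples):
    """First-order delta-sigma modulator: turns values in (-1, 1) into a
    stream of -1.0 / +1.0 whose running average tracks the input."""
    integrator = 0.0
    bits = []
    for x in samples:
        y = 1.0 if integrator >= 0 else -1.0
        integrator += x - y   # accumulate the error between input and output
        bits.append(y)
    return bits

bits = delta_sigma_1bit([0.3] * 10_000)   # constant input level of 0.3
average = sum(bits) / len(bits)
print(average)
```

Every output value is just +1 or -1, yet the average lands very close to 0.3. Low-pass filtering the stream (which averaging crudely stands in for here) recovers the original level.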


That's pretty amazing... why wouldn't he set up his own sound to eliminate more of that signal generator fan noise though? So distracting.


I did, but I didn't want to wear a lapel mic for the video and deal with the extra hassle. It was just me (no helpers) and a couple cameras. Wired mics suspended overhead kept sound setup from adding to the logistics for each take.

The sound was done similarly to the previous video, which hadn't drawn comments. This time though, a sizable fraction of people said it was distracting.


And yet you insist on being the authority for what's permissible? I think you should stop publicly clamoring for limitations to audio resolution on the grounds that you're not representative of enough listeners. I don't mind one bit that you don't hear these things, or that you represent even a majority of people who don't hear these things… but they don't represent the market for commercial audio. You're doing harm, please stop.


Wow, that was awesome!


Wow, I hadn't seen this until now, but it is definitely the best video I've ever watched to explain the basics of digital sampling. Great find!


Thanks, I appreciate it. I have probably forwarded the article to 20 or so people since it first came out. Didn't know about this video.


That is amazingly well presented.

Agreed. Wow!


Well I disagree. The background music on transitional slides can be removed. The rhetorical questions ("let's pretend we do not know about digital audio") are just time fillers. Waving the hands toward an on-screen overlay looks as silly as the weather girls "playing with flies in front of a green screen" on all TVs worldwide.

It is the same with most "well-written" articles in newspapers nowadays. We do not even feel it but it's just canned food, a lot of jelly and chemical taste enhancer but the amount of real meat inside is near zero.


Good points in the article, but it has some flaws.

The problem whenever somebody writes about digital audio, is that it is very tempting to hold on to sampling theory (Nyquist limit, etc) and totally discard the problems of implementing an actual Analog-Digital and Digital-Analog chain that works perfectly at 44100Hz sample rate.

I agree with the assessment that 16 bit depth is good enough; even 14 bit is good enough and was used with good results in the past (!). However, the problem is with the sampling rate.

> All signals with content entirely below the Nyquist frequency (half the sampling rate) are captured perfectly and completely by sampling;

Here lies the problem. This is what theory says, however, when using 44KHz sample rate, this means that to capture the audio you need to low-pass at 22KHz. And this is not your gentle (6, 12 or 24db) low-pass filter; no, this needs to be HARD filtering; nothing should pass beyond 22KHz. And this must be in the analog domain, because your signal is analog. To implement such a filter, you need a brickwall analog filter and this is not only expensive, but it also makes a mess of the audio, with 'ringing' effects and/or ripple on the frequency response and/or strong phase shifts.

So on Analog-to-digital in 2017, converters should be operating at a higher rate (say, 192KHz), because this makes analog filtering of the signal much easier and without side effects.
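To get a feel for how much the transition band matters, here is a back-of-the-envelope sketch. It assumes a Butterworth response (chosen only because its order formula is simple; real converters use other filter types or oversampling), a -3 dB corner at 20 kHz, and a 96 dB attenuation target at the Nyquist frequency picked to match 16-bit dynamic range:

```python
import math

def butterworth_order(pass_hz: float, stop_hz: float,
                      stop_atten_db: float) -> int:
    """Smallest Butterworth order with its -3 dB corner at pass_hz that
    reaches stop_atten_db of attenuation at stop_hz."""
    needed = math.log10(10 ** (stop_atten_db / 10) - 1)
    return math.ceil(needed / (2 * math.log10(stop_hz / pass_hz)))

for rate in (44_100, 192_000):
    order = butterworth_order(20_000, rate / 2, 96)
    print(f"{rate} Hz sampling: order {order}")
```

Under those assumptions the 44.1 kHz case needs an order of 114, which is hopeless as an analog circuit, while the 192 kHz case needs an order of 8.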

Now, for Digital-to-Analog, if your sample rate is 44KHz, you have two alternatives:

a) Analog brickwall filtering, with the problems noted above

or

b) filtering on the digital domain + using oversampling

the article mentions:

>So the math is ideal, but what of real world complications? The most notorious is the band-limiting requirement. Signals with content over the Nyquist frequency must be lowpassed before sampling to avoid aliasing distortion; this analog lowpass is the infamous antialiasing filter. Antialiasing can't be ideal in practice, but modern techniques bring it very close. ...and with that we come to oversampling.

So they are mentioning alternative (b). The problem is that oversampling does not solve all problems. Oversampling implies that the filtering is done on the digital domain and there are several choices of filtering you could use, for example FIR (Finite Impulse Response), IIR (infinite impulse response), etc.

And each one of these choices has side effects...

In short, the problem is that with a 44KHz sampling rate, your filter cutoff (22KHz) is too close to your desired bandwidth (20Hz-20KHz). Using a sample rate of 192KHz gives the DAC designer much more leeway for a better conversion. And CONVERSION is the key to good digital sound.

>What actually works to improve the quality of the digital audio to which we're listening?

It is interesting that the author mentions things such as "buying better headphones" (agree), but he never mentions "Getting a better Digital to Analog converter", which is highly important !!

On the other hand, he backs up his claim that "44KHz is enough" with an interesting AES test i was already aware of in the past:

>Empirical evidence from listening tests backs up the assertion that 44.1kHz/16 bit provides highest-possible fidelity playback. There are numerous controlled tests confirming this, but I'll plug a recent paper, Audibility of a CD-Standard A/D/A Loop Inserted into High-Resolution Audio Playback, done by local folks here at the Boston Audio Society.

This is a very interesting paper, and I did have the copy, however the test equipment should be checked. There are systems and better systems. The AES paper cited above had the particularity that the ADC and DAC used were provided by exactly the same machine (a Sony PCM converter), with the same strategy: no oversampling, brickwall analog filters. I can bet (99% sure) that the brickwall filters were identical on the ADC and the DAC on that machine; Murata-brand filters in a package.

The devil, as they say, is in the details.


I don't think there are flaws in the article as you claim. It already explicitly explains that oversampling at the ADC or DAC is an acceptable engineering solution:

> Oversampling is simple and clever. You may recall from my A Digital Media Primer for Geeks that high sampling rates provide a great deal more space between the highest frequency audio we care about (20kHz) and the Nyquist frequency (half the sampling rate). This allows for simpler, smoother, more reliable analog anti-aliasing filters, and thus higher fidelity. This extra space between 20kHz and the Nyquist frequency is essentially just spectral padding for the analog filter.

> That's only half the story. Because digital filters have few of the practical limitations of an analog filter, we can complete the anti-aliasing process with greater efficiency and precision digitally. The very high rate raw digital signal passes through a digital anti-aliasing filter, which has no trouble fitting a transition band into a tight space. After this further digital anti-aliasing, the extra padding samples are simply thrown away. Oversampled playback approximately works in reverse.

> This means we can use low rate 44.1kHz or 48kHz audio with all the fidelity benefits of 192kHz or higher sampling (smooth frequency response, low aliasing) and none of the drawbacks (ultrasonics that cause intermodulation distortion, wasted space). Nearly all of today's analog-to-digital converters (ADCs) and digital-to-analog converters (DACs) oversample at very high rates. Few people realize this is happening because it's completely automatic and hidden.

The main point of the article is to argue that storing or transmitting music above 16-bit, 48 kHz is wasteful and potentially harmful. It still fully condones using higher specs for audio capture, editing, and rendering.


> I don't think there are flaws in the article as you claim. It already explicitly explains that oversampling at the ADC or DAC is an acceptable engineering solution

Of course it is acceptable. Even 14 bit audio at 36KHz with a great DAC would be fairly nice, acceptable.

What the article claims is that 192KHz is useless, of no benefit. And i contend that it is of benefit when you want more than just good or acceptable performance. Not if you have a run of the mill DAC and OK headphones/speakers, but it is if you are a music lover and critical listener.


You've missed the point though - no human has demonstrated the ability to distinguish the fidelity of audio above 44.1kHz sampling in a properly controlled comparison test (double blind). This empirical result is to be expected given the biology of the ear and the science of sampling theory, as the article explains.

It doesn't matter if you're a music lover or critical listener!


> What the article claims is that 192KHz is useless, of no benefit.

The article claims that 192KHz downloads are of no benefit. It's right there in the article's title. It's difficult to not accuse you of willfully misinterpreting his argument.


Modern DACs and ADCs (sigma-delta) get around this entirely by oversampling/upsampling then low pass filtering in the digital domain with a minimal if any analog LPF. This avoids all the issues you describe and with noise shaping the added noise is inaudible (unless you want to claim you can hear -100dBFS which would be pretty amazing)

>So they are mentioning alternative (b). The problem is that oversampling does not solve all problems. Oversampling implies that the filtering is done on the digital domain and there are several choices of filtering you could use, for example FIR (Finite Impulse Response), IIR (infinite impulse response), etc. And each one of these choices have side effects...

Citation needed here, oversampling solves virtually all the problems + with modern DSP the FIR filters can be made extremely good. the induced noise of modern adc/dac's is seriously tiny, and swamped by the recording noise of your audio.


There is no reason the DAC or ADC can't internally do the resampling itself. In fact, most take it to the extreme - they resample to several MHz, but at 1 bit per sample, with noise shaped dithering as explained in the video linked elsewhere. They then have a very simple analog low pass filter - the noise is shaped far into the stop band of the filter.
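To make the 1-bit idea concrete, here is a minimal sketch of a first-order delta-sigma modulator. It is only an illustration: real converters use higher-order (often multi-bit) modulators with dither, and the function name is an assumption of this example.

```python
def delta_sigma_1bit(samples):
    """First-order delta-sigma modulator: turn samples in (-1, 1)
    into a stream of +1.0 / -1.0 values whose local average tracks the input."""
    integrator = 0.0
    previous_output = 0.0
    stream = []
    for x in samples:
        # Accumulate the difference between the input and the last 1-bit output.
        integrator += x - previous_output
        previous_output = 1.0 if integrator >= 0 else -1.0
        stream.append(previous_output)
    return stream


# A constant input of 0.5 becomes a +1/-1 pattern that averages to about 0.5.
bits = delta_sigma_1bit([0.5] * 10_000)
mean = sum(bits) / len(bits)
print(sorted(set(bits)), round(mean, 3))
```

The quantisation error ends up as fast toggling between +1 and -1, i.e. at high frequencies, which is why a simple analog low-pass is enough to recover the signal.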

That is no reason to store or even process your music at higher rates, though.


> There is no reason the DAC or ADC can't internally do the resampling itself.

You are describing alternative (b) i mentioned above: digital filtering plus oversampling. This also isn't without side effects.

Oversampling a 44KHz signal is not the same as having a 192Khz material to start with. Very different.


Yes, and the only difference is the lack of unwanted ultrasonic signals.


No, it is not, please do take a deep look at oversampling in D/A conversion and the artifacts of digital filtering, articles are out there.


In practice, all audio converters are modulation converters of some sort, because it is simply too expensive to achieve the high AC linearity wanted at the required sampling frequencies with other kinds of converters. This already relaxes LPF requirements.

In practice, signal quality issues are usually layout and supply issues, not problems with the converter itself.

In practice, the speakers and ears are the worst parts with the largest non-linearities and a frequency response that looks like a nuke test area compared to the FR of the converter. Of course, in the case of speakers, we have concluded that we want them to have their own sound, because neutral speakers are not possible.


The speakers are often the most influential parts of a hi-fi, but they're not the worst part. The worst part is almost always the room itself. Nothing comes close for screwing up the signal path.

(I tend to avoid discussing the other critical parts — original recording quality and the listener's ears — because these are often immutable constants.)


What about for non-humans?

Consider a dog that lives with a musician who plays, for example, trumpet. The musician plays the trumpet at home to practice, and also records his practices to review.

A trumpet produces significant acoustic energy out to about 100 KHz [1]. When the musician plays live the dog hears a rich musical instrument. When the musician plays back his recordings, half the frequency range that the dog could hear in the live trumpet will be gone. I'd imagine that this makes the recorded trumpet a lot less aesthetically pleasing to the poor dog.

[1] https://www.cco.caltech.edu/~boyk/spectra/spectra.htm


In my experience as a long time trumpet playing dog owner, the dog is generally thrilled when there is less richness in the noise, and generally less trumpet playing period. I say this with experience owning many dogs and having played the trumpet in a range from really badly and inexpertly, to pretty good... they don't care. Trumpets in person are just way too loud and a bit too much in general for most dogs.


And perhaps too loud for humans?

I only say this because most people are happy listening to music on CDs but when in the presence of a live band (eg an orchestra) it is suddenly obvious how incredibly loud it is. My brother is a drummer and I find it incredibly loud; I am a bass player and I don't play loud although he sometimes complains that there's "too much bass". Perhaps we just go deaf in our relative audio spectrum.


The human has trained hard for years to master an instrument that has been perfected over centuries.

Both those huge efforts have gone into controlling the humanly audible part of the sound. Whatever sound is accidentally produced in other frequencies is probably at best aesthetically neutral, but more likely distracting.

Though my guess is that trumpeting is just noise to dogs either way.


What about transhumans? I'm looking forward to my HiFi Cochleatron 9000(tm) when my hearing starts to go.

Then I'll be mighty glad we made all these high-res recordings.


That makes for some genuinely interesting thought experiments.

What if this actually becomes possible, but we discover that because we previously couldn't hear these frequencies, our instruments and equipment are HORRIBLY mis-tuned and sound terrible? We may end up having to re-record tons of stuff.

Something something premature optimization. And part of me is glad that the art of hand-making instruments is not yet lost; we might need the originals in the future.

Disclaimer: I say this as a completely naive person when it comes to instruments. The answer to this may be "if it wasn't built to resonate at frequency X, it won't by itself," which would be a good thing.


Most instruments generate harmonics that are integer multiples of the fundamental frequency, and some go a lot higher than human hearing. As you go up the harmonic series, the harmonics get (logarithmically) closer together. Our brains interpret close-together notes as dissonant, so higher harmonics could be kind of obnoxious to hear together. They might be "in-tune", but just too close to be enjoyable. (Imagine an extremely bright harpsichord.)
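For a sense of how quickly neighbouring harmonics crowd together, the interval between harmonic n and harmonic n + 1 is 1200 · log2((n + 1) / n) cents. A small sketch (the function name is just for illustration):

```python
import math


def cents_between_harmonics(n):
    """Interval in cents between harmonic n and harmonic n + 1."""
    return 1200 * math.log2((n + 1) / n)


for n in (1, 2, 4, 8, 16, 32):
    print(f"harmonics {n} and {n + 1}: {cents_between_harmonics(n):.1f} cents")
```

The first pair is a full octave (1200 cents), harmonics 8 and 9 are about a whole tone apart, and by harmonics 16 and 17 the gap is down to roughly a semitone.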

There's another effect that comes into play, though. There's a minimum pitch separation between simultaneous notes that we expect, and when notes are closer than that, they clash. That separation is usually around a minor third (~300 cents) in most of the human hearing range, but in the bass it's a lot wider, and in the high treble it's smaller. That's why you can play two notes a major second apart (~200 cents) on a piano in the high treble and it sounds okay, but down in the low bass it sounds muddy if they're closer than about a major third or perfect fourth (~400-500 cents). So, if we extrapolate into higher frequency ranges, then it's not unreasonable to expect that we would be able to interpret musical intervals that are a lot closer than 200 cents as consonant.

It's also possible that the minimum note separation thing is just an artifact of how our ears physically work, and that an artificial ear would have no such limitation. Which could open the possibility of enjoying kinds of music that we can't currently imagine as pleasant with our normal ears.


Because of the power decrease as you go up the overtone series, I'd suspect that being able to hear higher frequencies wouldn't cause very much trouble. However, the ability to hear higher fundamental frequencies would surely change harmonic theory! This is assuming that our pitch perception of high fundamental frequencies increased accordingly.


Higher frequencies aren't even in most recordings, so we wouldn't have to re-record them for that reason.

And if they were (such as in 96 kHz hi-res audio), you could just run it through a low-pass filter to strip off the higher frequencies.


Even if the engineer pressed the "192" button on their field recorder or DAW, chances are the instrument isn't going to be doing much interesting over 20kHz and/or the microphone isn't going to be particularly sensitive over 20kHz.


Ah, good point.

And... heh, using a filter to strip out the audio we used all that extra filesize to deliberately store. Haha. :)


My webapp manages photos for photography competitions. People upload 30MB JPEGs that would be visually lossless at a tenth of that file size. And I keep the originals, but actually resize down to 300KB for every function within the software. Haven't had a single complaint about image quality... :)


There's the all too real prospect that Hatsune Miku has been singing secret messages calling for the robot uprising, and none of us humans can hear those frequencies. Winter is coming desu~!

The good news is that you strip out all of this robot propaganda and still hear the exact same music, simply by encoding at a reasonable rate.


what makes you think your consciousness is capable of processing sounds in that range even if the ear is physically capable of it?


How do we know that the trumpet still sounds harmonically pleasing above 20 kHz? It may sound nice to the human ear, and a horrible mess to anyone with ultrasonic hearing capabilities.

Have we asked the dogs?


If we boost the frequencies even in just the range 10-20 kHz, things sound harsh and fizzy to the human ear.

The frequencies there are no longer musical in the sense of being recognized as musical notes.

That is to say, although in a pure arithmetic sense frequency 14080 is an A, since it is a power-of-two multiple of 440 (five octaves above it), we don't hear it as an A tone.

The frequencies in that ballpark just have to be present to add a sense of "definition" or "crispness" or "air" to the sound.

In fact, this can be faked!

In the HE-AAC codec, there is something called SBR: spectral band replication. What this refers to is basically a hack whereby the upper harmonics of the signal are completely stripped away, and then re-synthesized on the other end based on a duplicate of the lower harmonics or something like that. The listener just hears the increased definition.


> What about for non-humans?

The thing is, the higher sample rate data doesn't actually have a lot of the higher components after 20 kHz.

What the faster sample rate allows is to use a less aggressive filter.

Instead of a "brick wall" filter that rapidly cuts off after around 20 kHz, one with fewer poles can be used which rolls off more gently.

The higher sample rate ensures that there isn't any aliasing.

192 kHz audio does not reproduce flat up to 90 kHz.

(I'm going to gloss over the microphone used on the trumpet, or the loudspeakers which ultimately reproduce it.)


If the only thing you do with your music is listen, then yes, 24/192 delivers questionable value compared to the more popular 16/44.1 or 16/48 formats.

However, all musicians I know use these high-rez formats internally. The reason is that when you apply audio effects, especially complex VST ones, discretization artifacts noticeably degrade the result.

Maybe, the musicians who distribute their music in 24/192 format expect their music to be mixed and otherwise processed.


I do not believe it. 24 bits is definitely needed for processing (better yet, use floating-point).

Not 192 kHz; no friggin' way.

Repeated processing through multiple blocks at a given sample rate does not produce cumulative discretization problems in the time domain; it's just not how the math works.


> better yet, use floating-point

Both your inputs (ADC) and outputs (DAC) are fixed-point. Why would you want to use a floating point in between? Technically, 64-bit floating point format would be enough for precision. But that would inflate both bandwidth and CPU requirements for no value. 32-bit floating point ain’t enough. Many people in the industry already use 32-bit integers for these samples.

> Not 192 kHz; no friggin' way.

I think you’re underestimating the complexity of modern musician-targeted VST effects. Take a look: https://www.youtube.com/watch?v=-AGGl5R1vtY I’m not an expert, i.e. I’ve never programmed that kind of software. But I’m positive such effects are overwhelmingly more complex than just multiply+add these sample values. Therefore, extra temporal resolution helps.

BTW, professionals have been using 24bit/192kHz audio interfaces for decades already. E.g. ESI Juli@ was released in 2004, and that was a very affordable device back then.


I habitually edit audio using 32-bit float, not 16-bit integer.

> Why would you want to use a floating point in between?

Because 32-bit float has enough mantissa bits to represent all 24-bit integer fixed-point values exactly, so it is at least as good.

Because 32-bit float is friendly to vectorization/SIMD, whereas 24-bit integer is not.

Because with 32-bit integers, you still have to worry about overflow if you start stacking like 65536 voices on top of each other, whereas 32-bit float will behave more gracefully.

Because 32-bit floating-point audio editing is only double the storage/memory requirements compared to 16-bit integer, but it buys you the ultimate peace of mind against silly numerical precision problems.
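The exactness claim is easy to spot-check. This sketch round-trips values through IEEE 754 single precision with the standard `struct` module; the helper name is made up, and it checks a sample of the signed 24-bit range rather than all of it.

```python
import struct


def to_float32(value):
    """Round-trip a number through IEEE 754 single precision."""
    return struct.unpack("<f", struct.pack("<f", float(value)))[0]


# Signed 24-bit samples span -2**23 .. 2**23 - 1; check the edges and a stride through the range.
candidates = [-(2 ** 23), -1, 0, 1, 2 ** 23 - 1] + list(range(-(2 ** 23), 2 ** 23, 4099))
all_exact = all(to_float32(v) == v for v in candidates)

# The 24-bit significand runs out just past 2**24: this odd value cannot be stored exactly.
first_inexact = 2 ** 24 + 1
print(all_exact, to_float32(first_inexact) == first_inexact)
```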


float has scale-independent error because it is logarithmic/exponential.

If you quiet the amplitude by some decibels, that is just decrementing the exponent field in the float; the mantissa stays 24 bits wide.

If you quiet the amplitude of integer samples, they lose resolution (bits per sample).

If you divide a float by two, and then multiply by two, you recover the original value without loss, because just the exponent decremented and then incremented again.

(Of course, I mean: in the absence of underflow. But underflow is far away. If the sample value of 1 is represented as 1.0, you have tons of room in either direction.)
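A small sketch of the difference, using a made-up sample value and emulating 32-bit float with the standard `struct` module (Python's own floats are 64-bit). Halving twenty times and doubling twenty times stands in for a large fade down and back up.

```python
import struct


def to_float32(value):
    """Round a Python float to IEEE 754 single precision."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


HALVINGS = 20
original_int = 12345                             # a 16-bit integer sample
original_float = to_float32(12345 / 32768)       # the same sample scaled to +/-1.0

int_sample = original_int
float_sample = original_float
for _ in range(HALVINGS):
    int_sample //= 2                             # integer halving drops the lowest bit each time
    float_sample = to_float32(float_sample / 2)  # float halving only lowers the exponent
for _ in range(HALVINGS):
    int_sample *= 2
    float_sample = to_float32(float_sample * 2)

print(original_int, "->", int_sample)
print(original_float, "->", float_sample)
```

The integer sample comes back as 0, while the float comes back bit-for-bit identical.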


Yeah I edit at 32 bit float myself, then I export down to 16 bit 44100 flac for distribution.


> Both your inputs (ADC) and outputs (DAC) are fixed-point. Why would you want to use a floating point in between?

Fixed point arithmetic is non-trivial and not well supported by CPU instruction sets. (Hint: you can't just use integer add/multiply.)

> I think you’re underestimating the complexity of modern musician-targeted VST effects. I’ve never programmed that kind of software. But I’m positive such effects are overwhelmingly more complex than just multiply+add these sample values. Therefore, extra temporal resolution helps.

Indeed, many audio effects require upsampling to work well with common inputs, e.g. highly non-linear effects like distortion/saturation or analog filter models. However, they usually perform upsampling and downsampling internally (commonly 2x, 4x or 8x). While upsampling/downsampling is expensive (especially if you are using several of these types of plugins), it's not clear that running at a higher sample rate across the board is worth it just to save those steps.
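Here is a toy sketch of why such effects oversample, under simplifying assumptions: the "effect" is a bare cubic `x**3`, the input is a single 15 kHz tone, and instead of a real upsampler the tone is simply synthesised at 4x the rate. All names and constants are illustrative, not taken from any plugin.

```python
import cmath
import math

FS = 44_100        # base sample rate in Hz
OVERSAMPLE = 4     # assumed oversampling factor
TONE_HZ = 15_000   # input tone; cubing it creates a 45 kHz third harmonic
ALIAS_HZ = 900     # where 45 kHz folds back to at 44.1 kHz (45000 - 44100)
N = 4_410          # 0.1 s at the base rate: a whole number of cycles of both tones


def sine(freq, fs, count):
    return [math.sin(2 * math.pi * freq * i / fs) for i in range(count)]


def cubic_distortion(samples):
    """Stand-in nonlinear effect: x**3 adds a third harmonic."""
    return [s ** 3 for s in samples]


def tone_amplitude(samples, freq, fs):
    """Amplitude of one frequency, measured with a single-bin DFT."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq * i / fs)
              for i, s in enumerate(samples))
    return 2 * abs(acc) / len(samples)


def lowpass_taps(cutoff, fs, num_taps=127):
    """Hamming-windowed sinc low-pass FIR, normalised to unity gain at DC."""
    mid = (num_taps - 1) // 2
    taps = []
    for k in range(num_taps):
        t = k - mid
        if t == 0:
            ideal = 2 * cutoff / fs
        else:
            ideal = math.sin(2 * math.pi * cutoff * t / fs) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * k / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]


def filter_and_decimate(samples, taps, factor):
    """Circular FIR filtering (the test signal is periodic), keeping every factor-th output."""
    size = len(samples)
    return [sum(h * samples[(factor * i - k) % size] for k, h in enumerate(taps))
            for i in range(size // factor)]


# Distortion applied directly at 44.1 kHz: the 45 kHz harmonic aliases to 900 Hz.
naive = cubic_distortion(sine(TONE_HZ, FS, N))

# Distortion applied at 4x the rate, then low-passed at 20 kHz and decimated back.
high_rate = cubic_distortion(sine(TONE_HZ, FS * OVERSAMPLE, N * OVERSAMPLE))
oversampled = filter_and_decimate(high_rate, lowpass_taps(20_000, FS * OVERSAMPLE), OVERSAMPLE)

alias_naive = tone_amplitude(naive, ALIAS_HZ, FS)
alias_oversampled = tone_amplitude(oversampled, ALIAS_HZ, FS)
print(f"alias at {ALIAS_HZ} Hz: naive {alias_naive:.4f}, oversampled {alias_oversampled:.6f}")
```

Without oversampling, a quarter of the input amplitude lands at 900 Hz, squarely in the audible band. With the internal 4x stage, the harmonic is filtered out before it can fold back, and the output is still plain 44.1 kHz audio.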


> Therefore, extra temporal resolution helps.

But it's not resolution, right? It's extra frequencies outside the audible range. Is there any natural process that would make those affect the audible components, if I were listening to the music live instead of a recording?


Yes, you can get a beat effect (https://en.wikipedia.org/wiki/Beat_(acoustics)) as the ultrasonic frequencies interfere with the audible ones. It's generally not considered desirable.


Only if the two are mixed in a nonlinear way (e.g. "heterodyned").

If a sonic and ultrasonic frequency are combined together, but a low pass filter doesn't pass the ultrasonic one, the ultrasonic one doesn't exist on the other end.

Hence, there can be no beat.
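A quick numerical sketch of the same point, with illustrative values: two ultrasonic tones (24 kHz and 25 kHz) are summed, and the amplitude at their 1 kHz difference frequency is measured before and after a mild, made-up nonlinearity.

```python
import cmath
import math

FS = 192_000   # sample rate high enough to represent both ultrasonic tones
N = 1_920      # 10 ms: a whole number of cycles of every frequency involved


def sine(freq):
    return [math.sin(2 * math.pi * freq * i / FS) for i in range(N)]


def tone_amplitude(samples, freq):
    """Amplitude of one frequency, measured with a single-bin DFT."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq * i / FS)
              for i, s in enumerate(samples))
    return 2 * abs(acc) / len(samples)


# Linear mixing: a plain sum of the two tones.
linear_mix = [a + b for a, b in zip(sine(24_000), sine(25_000))]

# Nonlinear mixing: the same sum passed through a slightly curved transfer function.
nonlinear_mix = [x + 0.1 * x * x for x in linear_mix]

diff_linear = tone_amplitude(linear_mix, 1_000)
diff_nonlinear = tone_amplitude(nonlinear_mix, 1_000)
print(f"1 kHz component: linear {diff_linear:.6f}, nonlinear {diff_nonlinear:.6f}")
```

The plain sum contains nothing at 1 kHz. Only the squared term produces an audible difference tone, which is the intermodulation distortion the article warns about when ultrasonics reach real amplifiers and speakers.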


> Both your inputs (ADC) and outputs (DAC) are fixed-point. Why would you want to use a floating point in between?

The main reason is that it solves clipping in the pipeline.


> Why would you want to use a floating point in between?

Because if you don't you accumulate small errors at each processing step due to rounding. Remember that it is very common for an input to pass through multiple digital filters, EQs, some compressors, a few plugins, then to be grouped and have more plugins applied to the group. You can end up running the sample through hundreds of equations before final output. Small errors at the beginning can be magnified.

Pretty much all pro-level mix engines use 32-bit floating point for all samples internally. This gives you enough precision that there isn't a useful limit to the number of processing steps before accumulated error becomes a problem. By all samples I mean the input comes from a 24-bit ADC and gets converted to 32-bit FP. From that point on all plugins and processes use 32-bit FP. The final output busses convert back to 24-bit and dither to feed the DAC (for higher-end gear the DAC may handle this in hardware).

As for 192 kHz I've never seen or heard a difference. Even 96 kHz seems like overkill. A lot of albums have been recorded at 48 kHz without any problems. As the video explains there is no "missed" audible information if you're sampling at 48 kHz. I know that seems counter-intuitive but the math (and experiments) bear this out.

An inaccurate but intuitive way to think about it is your ear can't register a sound at a given frequency unless it gets enough of the wave which has a certain length in the time domain (by definition). If an impulse is shorter than that then it has a different frequency, again by definition. 1/16th of a 1 kHz wave doesn't actually happen. Even if it did a speaker is a physical moving object and can't respond fast enough to make that happen (speakers can't reproduce square waves either for the same reasons - they'll end up smoothing it out somewhat). Even if it could the air can't transmit 1/16th of a wave - the effect will be a lower-amplitude wave of a different frequency. And again your ear drum can't transmit such an impulse (nor can it transmit a true square wave).

I've done a lot of live audio mixing and a little bit of studio work, including helping a band cut a vinyl album. Fun fact: almost all vinyl is made from CD masters and has been for years. The vinyl acetate (and master) are cut by squashing the crap out of the CD master and applying a lot of EQ to shape the signal (both to prevent the needle from cutting the groove walls too thin), then having the physical medium itself roll off the highs.

The only case where getting a 24-bit/192kHz recording might be worthwhile is if it is pre-mastering. Then it won't be over-compressed and over-EQ'd, but that applies just as well to any master. (For the vinyl we cut I compressed the MP3 version myself from the 24-bit 48 kHz masters so they had the best dynamic range of anything: better than the CD and far better than the Vinyl).


Unless you are altering the time or pitch, which, these days, you are more than you are not.

But no, musicians aren't releasing things at ultra-resolutions because they expect others to reuse their work. The ones that are, are providing multitracks.


> Repeated processing through multiple blocks at a given sample rate does not produce cumulative discretization problems in the time domain; it's just not how the math works.

That isn't entirely true. e.g. It's common for an audio DSP to use fixed point 24bit coefficients for an FIR filter. If you're trying to implement a filter at low frequency then there can be significant error due to coefficient precision, that error is reduced by increasing the sampling rate.


> Not 192 kHz; no friggin' way.

It can be useful to run your signal processing chain at a higher rate because many digital effects are not properly bandlimited internally (and it would be pretty CPU hungry to do so).

But that doesn't mean you need to store at 192KHz even the data that you'll process later, though it might be easier to do so.


What if I simply want to slow down the signal 2 times (without pitch correction)? Then 44.1 kHz is obviously not enough. Maybe 192 kHz is way overkill though, but I would argue that that's the point. You don't want it to become a bottleneck ever for any effects.


It is only insufficient if there are components beyond 20 kHz that you want to bring down into the audio range as you're slowing down.
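The arithmetic, as a sketch with made-up helper names and 20 kHz assumed as the top of the audible band: slowing playback by a factor divides every frequency by that factor.

```python
def top_frequency_after_slowdown(sample_rate_hz, slowdown):
    """Highest frequency left in the result after slowing a recording down."""
    return (sample_rate_hz / 2) / slowdown


def sample_rate_to_fill_audible_band(slowdown, audible_top_hz=20_000):
    """Source sample rate needed for the slowed result to still reach audible_top_hz."""
    return 2 * audible_top_hz * slowdown


print(top_frequency_after_slowdown(44_100, 2))    # top of the band after a 2x slowdown
print(sample_rate_to_fill_audible_band(2))        # rate needed to keep content up to 20 kHz
```

So a 44.1 kHz recording slowed 2x tops out around 11 kHz, and a source sampled at 80 kHz or more would cover the full band, provided the instrument and microphone actually delivered anything between 20 and 40 kHz.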


Lots of my plugin users are resorting to massive oversampling, even when it's not appropriate, simply because in their experience so many of the plugins they use are substantially better when oversampled. 192K is an extremely low rate for oversampling.


I wonder how you measure the quality difference for higher rates? 384K, 512K and beyond? I hear from audiophiles that there is a very distinct difference, but there is absolutely no basis for it in science.


Not so: this oversampling is mostly about generating distortion products without aliasing, so in the context I mean, the difference is obvious. But, it's a tradeoff- purer behavior under distortion, versus a dead quality that comes from extreme over-processing. I've never seen an audiophile talk about 384K+ sampling rate unless they mean DSD, and with that, it's for getting the known unstable HF behavior up and out of the audible range.

Oversampling in studio recording is mostly about eliminating aliasing in software that's producing distortion, and it's only relevant in that context: I don't think it's nearly so relevant on, say, an EQ.


That could simply be because some of these plugins are written to work correctly only at 192 kHz; i.e. it's a matter of matching the format they expect.


> Not 192 kHz; no friggin' way.

That's an anthropocentric statement.

I want to leave it at that for now just to see if I get dinged for low effort.


Only one ding? I'm disappointed.

Anyway, check out the frequency range for other animals:

https://en.wikipedia.org/wiki/Hearing_range

Notice that there are a number of species whose range extends well past 20kHz. Even with 192kHz you're still chopping off the upper end of what dolphins and porpoises can hear and produce.

So please convince Apple and friends that you need 200+kHz to truly capture the "warmth" of Neil Young's live albums. Then we'll be able to crowdsource recording all the animals and eventually synthesize the sounds to communicate with them all.

Maybe then we can synthesize a compelling, "We're really sorry, we promise we're going to fix all the things now," for the dolphins and convince them not to leave. :)


for DAW use - this has been mentioned before on this thread, but 192khz has two beneficial effects - lower latency for realtime playing, and it "forces" some synths and effects to oversample, thus reducing aliasing.

All this comes at a high computational and storage cost though.
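To put rough numbers on both sides of that trade-off, here is a sketch assuming a 256-frame audio buffer and uncompressed stereo PCM. Both are illustrative assumptions, and real round-trip latency includes more than one buffer.

```python
def buffer_latency_ms(buffer_frames, sample_rate_hz):
    """Time one audio buffer represents, in milliseconds."""
    return 1000 * buffer_frames / sample_rate_hz


def pcm_bytes_per_second(sample_rate_hz, bits_per_sample, channels=2):
    """Raw data rate of uncompressed PCM."""
    return sample_rate_hz * (bits_per_sample // 8) * channels


for rate, bits in ((44_100, 16), (44_100, 24), (192_000, 24)):
    print(f"{bits}/{rate}: {buffer_latency_ms(256, rate):.2f} ms per 256-frame buffer, "
          f"{pcm_bytes_per_second(rate, bits)} bytes/s")
```

The same buffer size is more than four times shorter in time at 192 kHz, but every track costs several times the storage and processing of 16/44.1.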

I personally use 44khz 24bit settings for DAW use.


TFA specifically addresses this and agrees 100% with you on it. Develop in 24/192, but ship in 16/48. Good point about supporting better downstream remixing!


I feel like there is some parallel to open source and GPL type licenses: you can never know which of your customers may wish to remix or further work on your materials, so you should ship the "source" material.


To truly support downstream remixing, musicians would need to distribute the original, separate tracks. Occasionally an independent musician will do this, but it's not at all common AFAIK.


For software, you have source code, you compile it with a standard-compliant compiler, and you’ll more or less reproduce the result.

Music is different. If you have the original multi-track composition, you can’t reproduce the result unless you also have the original DAW software (of the specific version), and all the original VST plugins (again, of the specific version each).

All that software is non-free, and typically it’s quite expensive (esp. VSTi). Some software/plugins are only available on Windows or Mac. There’s little to no compatibility even across different versions of the same software.

Musicians don’t support nor maintain their music. Therefore, DAW software vendors don’t care about open formats, interoperability, or standardization.


All true. One independent musician that I like, Kevin Reeves, made the tracks from his first album available for free download for a while. But he published them as WAV files. So, of course, one could only create one's own mixes, not easily reproduce his. But if he had just posted the files saved by his DAW (or his mix engineer's DAW), that would have been useless to just about everyone. Aside: Though the album was completed in mid-2006, it was recorded with DAW software that was obsolete even then, specifically an old version of Pro Tools for Mac OS 9, because both Kevin and his mix engineer (for that album) are blind, and Pro Tools for OS X was inaccessible to blind users at the time.

BTW, I'm merely a dilettante when it comes to recording and especially mixing.


Traktor stems and remix decks though.


I've done signal and sound-processing courses at the university. I know the Nyquist-Shannon theorem. I know all about samples, and digital sound not being square staircases.

I know and understand how incorrect down-sampling from high frequencies can cause distortion in the form of sub-harmonics in the audible range.

I know about audible dynamic range and how many decibels of extra range 8-bits are going to give you.

I know all this, but I still have to admit: if there's a hi-res recording (24-bit, >48kHz) available for download/purchase, I'll always go for that instead of the "regular" 16-bit 44.1/48kHz download. I guess escaping your former audiophile self is very, very hard.

Anyone else want to admit their guilty, stupid pleasures? :)


I collect vintage headphones, and especially relish the truly absurd ones I can't believe anyone thought were a saleable item.

I'm up to about 300 different models.


Would love to see a shot of the collection on /r/headphones.


Yeah seriously, that's really cool.... and on that note what headphones does he actually use?!


More is better right?

Deriving pleasure from listening to music has a large subjective component. So if I've paid more for a track and / or I got the most amount of bits I could I'll probably enjoy it more. Also makes for great conversation topics.


raises hand


While the extra quality is lost on listeners, from experience I've found that super-HQ source material can change your results (for the better) when fed through distortion/compression/warming effects processors


Sure, when you're editing music it can be helpful to have super high sample rates and lots of bits to work with. But for listening, it's just a waste of space.


He was talking about listening.


tomc1985 said "fed through distortion/compression/warming effects processors" which is editing, not listening.


Yeah, editing


Lots of pre-amps can do this in real time, and some installation audio systems do it as part of pre-processing (e.g. compression, limiter, etc).


This is specifically addressed in the link under the section titled "When does 24 bit matter?"


Is that because of the HQ audio, or because the processing software introduces fewer artifacts at those sample rates/bit depths? I.e. could you get the same results by up-sampling CD quality audio?


For distortion and compression, the only possible difference I can think of is aliasing, and yeah you could just oversample your input, then downsample again on the way out. This is really really common in most professional plugins, often with an adjustable amount to balance quality and CPU usage.

I also have some gear that aliases at a volume you can't hear, but when you plug it into an analog distortion pedal, the aliasing in the high frequencies becomes apparent. This would be avoided if it had a higher sample rate so the aliasing was far out of the audible range.

For other sorts of effects like spectral processing, pitch shifting, the extra detail above 22khz really does make a difference, especially if pitching down.
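A rough sketch of why oversampling helps with distortion, using only arithmetic: distortion adds harmonics, and any harmonic above half the sample rate folds back down into the audible band. The 5 kHz tone, the choice of odd harmonics, and the `alias_frequency` helper are assumptions for illustration, not taken from any particular plugin.

```python
def alias_frequency(f_hz, fs_hz):
    """Frequency at which a component at f_hz shows up after sampling at fs_hz."""
    f = f_hz % fs_hz                      # sampling cannot tell f from f + k*fs
    return fs_hz - f if f > fs_hz / 2 else f

tone = 5_000                              # Hz, hypothetical input tone
harmonics = [k * tone for k in (3, 5, 7, 9)]   # odd harmonics, as symmetric clipping produces

for fs in (44_100, 4 * 44_100):           # plain rate vs. 4x oversampled
    print(fs, [alias_frequency(h, fs) for h in harmonics])
```

At 44.1 kHz the 7th harmonic (35 kHz) lands at 9.1 kHz, which is not harmonically related to the input. At 4x the rate every harmonic stays where it belongs and can be filtered off before downsampling.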


More bits means:

• Less quantisation noise, so your noise floor is a bit lower and therefore, when you're mixing stuff together, you accumulate less noise

• More numerical room to play with, you can scale or shift the volume up and down more without clipping and with less loss of accuracy

With 16-bit CD audio, you can't just convert to 24-bit and suddenly have less noise. You might get more room, though.

As for higher sampling rates (more kHz, if you will), I think Monty mentioned some benefit regarding less sharp anti-aliasing filters (you can have a larger transition band from 20–48kHz, say, rather than from 20–22.05kHz), but it's not something I understand the benefit of well.
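To put a number on "less quantisation noise": the textbook figure for a full-scale sine against ideal uniform quantisation noise is roughly 6.02 dB per bit plus 1.76 dB. A minimal sketch, assuming that idealised model (real converters do worse); the function name is made up for illustration.

```python
import math

def quantization_snr_db(bits):
    """Theoretical SNR of a full-scale sine against uniform quantisation noise."""
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)

for bits in (16, 24):
    print(bits, "bits:", round(quantization_snr_db(bits), 2), "dB")

# What the extra 8 bits buy you in this idealised model
extra_db = quantization_snr_db(24) - quantization_snr_db(16)
print("extra range:", round(extra_db, 2), "dB")
```

So going from 16 to 24 bits lowers the theoretical noise floor by about 48 dB, which is the headroom that matters when many tracks and effects are summed.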


filters with a sharper cutoff will 'ring' more, smearing sharp changes in the signal out over time. It's a pretty subtle effect when it's all happening above 20 kHz though


I imagine it's because lossy codecs are tuned for human perceptual limitations, and when you process audio, it can "pull" areas of the sound that are otherwise hidden from your perception into perceptual ranges, analogous to how fiddling with the brightness/contrast of a highly compressed JPEG image can accentuate the artifacts.


We are not really talking about lossy codecs here, though, are we?


Ah, my mistake. But perhaps you could consider 24/192 -> 16/48 a lossy codec in this context; the same argument does apply, for the same reasons.


That way it does make sense, editing might require higher fidelity than we can perceive in the final project to avoid artifacts.


Actually, the article is not talking about it, but the topic very much is lossy codecs.

Specifically, if 24/192 AAC is worth it compared to 16/44.1 AAC (and the answer to that is yes, although the answer to 24/192 WAV is no)


But what about lossless 16/44.1KHz audio? It's not a "compressed" version of 24/192Khz. Chances are that your 192KHz music will be low pass filtered anyway, 192KHz is a trick that sound chipset vendors added so that integrators wouldn't have to make high quality analog filters for their ADCs. There's likely nothing above 22.05KHz except shaped noise, production artifacts, and very, very quiet ultrasound noises which have nothing to do with the music.


There's no such thing as lossless 16/44.1 because it's practically impossible to make an artefact-free antialiasing/reconstruction filter pair clocked at 44.1k.

The point of higher sample rates isn't to capture bat squeaks, but to relax the engineering constraints on pre-post filtering.

Nyquist is fine in theory, and if you've never actually tried to implement a clean filter you'll likely watch the xiph video and think "Well that makes sense."

If you actually know something about practical DSP and the constraints and challenges of real filter design you're not going to be quite so easily impressed.

Likewise with higher bit depths. "Common sense" suggests that no one should be able to hear a noise signal at -90dB.

Common sense is dead wrong, because the effects of a single bit of dither are absolutely audible.

And if you can hear the effects of noise added at -90dB, you can certainly hear the effects of quantisation noise artefacts on reverb tails and long decaying notes at -30 to -40dB, added by recording at 16 bits instead of 24 bits.

Whether or not that level of detail is present in a typical pop or classical recording is a different issue. Realistically most music is heavily compressed and limited, so the answer is usually "no."

And not all sources have 24-bits of detail [1]. (Recordings made on the typical digital multitrack machines used in the 80s and 90s certainly don't.)

That doesn't mean that a clean unprocessed recording of music with a wide dynamic range made on no-compromise equipment won't show the difference clearly.

Speaking from experience, it certainly does.

[1] Technically no sources have 24-bits of detail. The best you'll get from a real world converter is around 22-bits.


> you'll likely watch the xiph video and think "Well that makes sense."

What video? This thread is about an article.

> The point of higher sample rates isn't to capture bat squeaks, but to relax the engineering constraints on pre-post filtering.

I just said that, I think.


Because modern production can involve hundreds or even thousands of digital effects. So 24-bit, or even 32-bit float just because you can, is standard.

As Monty demonstrates, it's a fraudulent waste to try to sell the result as a product to the end listener.


I think it's because processes that affect things at the individual sample level (which distortion/compression often do) have a lot more detail to work with. The author mentions this a bit in the article


> Its playback fidelity is slightly inferior to 16/44.1 or 16/48, and it takes up 6 times the space.

The article is highly technical. Does anyone have a way to describe this phenomenon intuitively?


24/192 can capture audio well beyond what we can hear. These sounds can, though, lead to distortion that is within our hearing range, depending on the equipment used.

Therefore, rather than just being useless extra data, it can be actively harmful to the listening experience.


It's not that much different to think of than 480p vs 4K/2160p -- dramatically higher resolution, increasing the width and length of the 'grid' that sound is recorded into -- except that in the case of audio fidelity, quality after a certain point causes problems with playback systems, resulting in audible artifacts and distortion.


That analogy works if you modify it:

- 480p vs 2160p is measuring the resolution of your cell phone propped up on a pillow at the other end of your living room

- experimental evidence shows that your eyesight is not good enough to pick up on the increased resolution, you've maxed out your sensory perception

- Your phone stutters trying to stream at 4k so the playback might actually be worse.


The analogy with images is demonstrated in this screenshot [1], which is referenced in a StackOverflow answer on the topic of image scaling in web browsers [2].

When the high resolution image is downscaled poorly, some high (spatial) frequencies are aliased down into lower (spatial) frequencies, manifesting as blocky/jagged lines. The images are 2D data in the spatial domain, while the audio is 1D data in the temporal domain, but the aliasing of high frequency signal to lower frequency artifact is similar.

Viewing a high resolution image on a panel that has enough pixels to display it without scaling is analogous to listening to 192 kHz audio on a fantastic speaker system that can reproduce those high frequencies accurately, instead of causing distortion by aliasing them to lower frequencies. On the other side, viewing a high resolution image which has been downscaled poorly is analogous to listening to that 192 kHz audio on a realistic speaker system that cannot reproduce high frequencies, which results in those signals aliasing down into the audible range.

And as you say, there is a point where, for the viewer/listener's sake, it doesn't make sense to push for higher frequencies because even if you can build a panel/speaker that will faithfully reproduce those frequencies without aliasing, the eye/ear will not be able to perceive the additional detail.

[1] http://www.maxrev.de/files/2012/08/screenshot_interpolation_...

[2] https://stackoverflow.com/a/11987735


Technically this is not aliasing; rather, the large and varied non-linearities of a speaker can act like a frequency mixer, which is why you'll get a 3 kHz sound (the difference) when playing, say, 20 and 23 kHz.
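The mixing effect is easy to reproduce numerically. A minimal sketch, assuming a toy speaker model that adds a small squared term to the ideal output (the 0.1 coefficient and the helper name are invented for illustration): two ultrasonic tones go in, and a 3 kHz component comes out.

```python
import cmath
import math

FS = 192_000      # sample rate, Hz
N = 1_920         # 10 ms, so 3, 20 and 23 kHz all complete whole cycles

def tone_amplitude(signal, freq_hz):
    """Amplitude of one frequency component, measured with a single-bin DFT."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq_hz * n / FS)
              for n, s in enumerate(signal))
    return 2 * abs(acc) / len(signal)

# Two tones, both above the range of hearing
ultrasonic = [math.sin(2 * math.pi * 20_000 * n / FS) +
              math.sin(2 * math.pi * 23_000 * n / FS) for n in range(N)]

# Hypothetical non-linear speaker: ideal output plus a small second-order term
distorted = [x + 0.1 * x * x for x in ultrasonic]

clean_3k = tone_amplitude(ultrasonic, 3_000)
distorted_3k = tone_amplitude(distorted, 3_000)
print("3 kHz before:", clean_3k, "after:", distorted_3k)
```

With a perfectly linear system there is nothing at 3 kHz. The squared term produces the difference frequency (23 - 20 = 3 kHz) at an amplitude set by the non-linearity.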


All good points! Indeed all that extra data makes the decode stage that much more difficult. Same happens with audio too, but high-res audio isn't nearly as bulky as video or nearly as resource-intensive to decode


very good analogy.


Increasing the bits doesn't increase the accuracy of the audio in terms of frequencies. It's 100% accurate regardless of bits. That's why trying to compare it to video resolution doesn't work. Bit depth is more akin to the screen brightness. If you turn down the brightness to 1 step from black, it's impossible to distinguish features in the image.


But weren't those "outside our hearing range sounds that can distort our hearing range" present in real life when the recording was made? So I would have heard the distortions had I been there in real life?


The distortions are not present in the source. They occur because the speaker is a physical device subject to physical limitations and can't perfectly reproduce the waveform. In the process of trying to reproduce the ultrasonic signal, the speaker can produce audible distortions. If the ultrasonic signal is not there to begin with, then it can't possibly cause distortion.


The problem is that your speakers were not designed to reproduce sounds at those frequencies, so instead of faithfully reproducing those sounds (that you can't hear anyway) they may emit unintended artifacts/distortion (that you can hear).


The distortions he's talking about would be introduced during playback by DAC, signal processing, speakers, or amplification.


I doubt that is the case. Most systems which I have used actually sound better at the higher sampling rates. This is not because the sampling rates cause anything to sound better, but because the hardware works better at those rates. In a blind test, you wouldn't be able to decide which one you liked better, only that there is a difference. At least on the machines I have, there are some clearly audible differences.


> you wouldn't be able to decide which one you liked better, only that there is a difference.

Yet people have been reliably _unable_ to do this.

The gold standard in listening tests for this is an ABX where you are simply trying to show that you can discern a difference.

When the test is properly set up and calibrated, people are unable to show that they can distinguish 48kHz and 192kHz.

Moreover, by the numbers, audio hardware tends to work less well at higher rates if they're different, because running them at higher rates makes their own internal noise shaping less powerful. (Though for anything done well the distinction should still be inaudible).

Finally, if you do have cruddy converters that sound significantly different at different rates nothing stops you from using a transparent software resampler (SSRC is well respected for good reason) to put the audio at whatever rate you want.. until you get better hardware. :)


"In a blind test, you wouldn't be able to decide which one you liked better, only that there is a difference"

That would have to be the noise. The math doesn't lie...


I'm not talking about noise or other artifacts from the conversion process. I am talking about differences in bass and treble balance that come from different engineering in the converters at different frequencies. In some converters, there are actually completely different circuits that engage for different frequencies. The converters that I have experienced do this, and they all have a noticeable effect on what we hear.


The core is here: https://xiph.org/~xiphmont/demo/neil-young.html#toc_1ch

When you play something stored at 44 khz, there are no ultrasonic sounds recorded. At 192 khz instead, there are, and the speakers may push some of the ultrasonic sounds down to the audible spectrum, causing distortion that the human ear can hear.


It’s mostly because speakers aren’t the best at reproducing ultrasonic frequencies that are captured by the high resolution audio files. You might have components of the speaker that have resonances that manifest the ultrasonic frequencies at a lower frequency.

Imagine the normal operation of a speaker as a swing, except you are pushing and pulling the swing all throughout the cycle as it goes up and down. Now, you can technically move the swing at a variety of frequencies if you’re holding onto it the whole time. However imagine as you push it back and forth (low frequencies), you also vigorously shake the swing at the same time (high frequencies). This would probably result in the chains rattling, similar to the unwanted distortions in the speakers caused by ultrasonic frequencies.


Basically, the sampling theorem says you can reconstruct the exact waveform with a certain number of samples. Adding a bunch more samples bulks up the file, but you didn't need them to restore the exact waveform. However, the unnecessary samples are in the file between you and the next sample you do need. At high enough levels of waste this creates an I/O bottleneck that hampers performance.

Another way to look at it is that digital audio is not like digital imaging. There aren't pixels. Increasing the data rate does not continue to make the waveform more and more detailed to human auditory perception in the way that raising the pixel density does for human visual perception.

To describe it intuitively, forget your intuition that audio is like visual and start from "there is no metaphor between these two things."


Digital imaging definitely has sampling concerns too:

https://en.wikipedia.org/wiki/Aliasing

Even analog imaging has concerns that are better described in the frequency domain than the spatial domain:

https://en.wikipedia.org/wiki/Airy_disk

https://en.wikipedia.org/wiki/Optical_transfer_function


Interesting!


Images and audio are pretty similar, and sampling theory works for both - a low-pass filter on audio and a blur filter on images is the same.

The difference is audio is "actually" bandlimited and frequency-based, but images are "actually" spatial. When you try using frequency math on images you get artifacts called ringing, which are easy to see with JPEGs or sharp scalers like Lanczos.

Of course audio isn't really frequency-based either, or else it would just be one sound repeated forever. So there's still ringing artifacts (called "pre-echo") and cymbals are the first to go in an MP3.


> The difference is audio is "actually" bandlimited and frequency-based, but images are "actually" spatial.

I.e. audio is sampled along one dimension, while images are sampled along two dimensions. Note that frequency-domain considerations play a crucial role in all optical design, including imaging sensors.


It's a little more than that. A sine wave originates in the frequency domain but actually sounds like something, but images originating in the frequency domain are just noise and not pictures of anything.


> Increasing the data rate does not continue to make the waveform more and more detailed to human auditory perception in the way that raising the pixel density does for human visual perception.

Early low data rate codecs - such as the one used for GSM mobiles - are obviously inferior, but still functional. I think a better analogy is that an iPhone 7 has a 1 megapixel screen, so there's no difference between a 1 megapixel image and a 5 megapixel image, except one is much larger. Of course visually you can zoom in (or move closer in real life), but audibly you can't.


I get what you're saying, that if you remove the ability to zoom you can equalize things, but without that twist I think this preserves the essential problem with the analogy that Monty works so hard to correct. If your data rate is too low, you don't have enough samples to recreate the waveform. So you create an approximate. But there is a finite number of samples that you need to recreate all of the information in the waveform, so once you have that, there isn't any additional information there for you to obtain if you continue to increase the sample rate.


To be fair, phone audio signals have until recently been bandlimited to just a few kHz centered around 3kHz.


> Another way to look at it is that digital audio is not like digital imaging. There aren't pixels.

For the mathematically inclined, this would probably be a good time to repeat: pixels are not little squares[1].

[1] http://alvyray.com/Memos/CG/Microsoft/6_pixel.pdf


"Diminishing returns"?


In the amplitude domain, you definitely need more bits for that type of processing, and many other types.

Digital distortion basically simulates high gain with soft clipping, which basically takes a narrow slice of the amplitude domain and magnifies it. The extra resolution has to be there for that not to turn into ugly sounding shit with artifacts.


Well, yeah. For properly mixed audio, you don't need anything beyond 44-khz/16-bit. If you need to do extensive processing, or you're working with raw audio and want the extra margin to be able to recover a sound when it may be too quiet (or too loud) or otherwise compromised then more samples and greater bit-depth can be highly useful.


It's also great if you're doing a bunch of time/pitch mangling.


I've heard (and it passes the smell test) that even if your ears can't resonate at frequencies higher than human hearing, the space you are in can, and your ears are sensitive to the differences in arrival time that are smaller than the pitches we can't hear. So at least in high sample rate music the imaging could potentially be cleaner. Assuming super great everything else.


That's one of those things that sounds true without putting thought into it but is ultimately false. A sound sampled at 44.1kHz can encode phase offsets smaller than 22µs (1/44.1k). How much smaller? I'm not exactly sure, but I believe it depends on your sample size. My guess is the sample period divided by 2^16 (assuming a sample resolution of 16 bits), but I might be off by a factor of 2 (or indeed 100% wrong) there. Caveat emptor. If I'm right then that's 346 picoseconds, which I'm betting is smaller than your brain can differentiate.


The formula, as far as I could find, is:

1 / (2 * pi * bandwidth * 2 ^ bitdepth)

So for full scale 44/16 signal it is about 0.1 nanosecond.
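Plugging numbers into that formula, as a quick sanity check. This assumes "bandwidth" means the 22.05 kHz Nyquist limit of a 44.1 kHz stream, which matches the "about 0.1 nanosecond" figure; the function name is just for illustration.

```python
import math

def timing_resolution_s(bandwidth_hz, bit_depth):
    """Smallest resolvable time shift per the formula quoted above, in seconds."""
    return 1.0 / (2 * math.pi * bandwidth_hz * 2 ** bit_depth)

cd = timing_resolution_s(22_050, 16)      # assumed: bandwidth = Nyquist frequency
sample_period = 1 / 44_100

print("resolution:", cd * 1e9, "ns")
print("sample period:", sample_period * 1e6, "µs")
print("ratio:", sample_period / cd)
```

The resolvable shift comes out several orders of magnitude smaller than the spacing between samples, which is the point: the sample period is not the timing resolution.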

More here [1]. There are also some sample files for ABX tests there.

[1] - https://hydrogenaud.io/index.php/topic,108987.msg896449.html...


What you're sort of describing - having more of the higher frequencies resonate together - can actually cause intermodulation distortion (which itself can be in the audible range), which can make it sound worse...

I haven't had time to check the linked article but if it's the one I'm thinking of I'm pretty sure it goes into this.

In the end though, double blind testing (again, I think mentioned in the article) shows that there's a threshold (a bit below 48kHz, probably why 44.1kHz was chosen for CD audio) after which people can't distinguish between higher sample rates with any more accuracy than randomly guessing.


I think it's the intermodulation distortion or as I call it "the room sound" that your non-conscious mind perceives. The best example of this I've heard is when listening to identical masters on SACD/DSD vs PCM audio. Notably different sound when reproduced via loudspeakers.


Arrival time differences are essentially phase shifts and assuming infinite quantization you can encode phase shifts with infinite accuracy on a band limited sampled system.

So no.


If you are downloading to simply consume, then sure. But as others have noted this article assumes that mastering is always the last step in the editing process. Anyone who ever remixed, sampled or DJ'ed will disagree.

If we keep that in mind, then the setup of the problem changes significantly and most arguments made here do not apply to the download itself.


Right. I used to DJ throughout college and tried out a variety of formats. At high amplification, artifacts become apparent. VBR0 and 256kbit MP3s did not hold up, at all. Even smaller speakers and headphones didn't hide some of the compression artifacts for me.

320kbit CBR MP3s were... I'd say, generally OK if you did not plan on skewing their tempo much outside of a very small range (give or take 5% speed). Really bad artifacting becomes audible quickly beyond that range. Maybe it was placebo, but I also found differences between FLAC and 320kbit MP3 discernible when working on big, more powerful speakers.

But, with FLAC, it didn't matter if the sound was played at 10% tempo or extreme amplitude, audio was always crisp, clear, and free of compression artifacts (obviously).

On my laptop speakers or earbuds, no way would I be able to tell the difference between any of these today.


He specifically addresses audio engineering (which can be extrapolated to DJing and remixing). As he says, what you need in that case is 24-bit audio to have more dynamic range to play with. >48kHz is still useless to human hearing.

This particular article doesn't touch on compression as well, which is probably the biggest thing for remixing or DJing. You really want uncompressed or at least high-bitrate AAC.


> Anyone who ever remixed, sampled or DJ'ed will disagree.

Could you expand on this?


Super-high sample rate could make a difference if you're sampling it and pitching it down heavily to make a bass-line. But in practice, I think most such audio has very little information above 20kHz. Ordinary microphones are designed for audible frequencies, and ultra-sonics are easily lost in production. What music actually contains interesting ultra-sonic information for slowed-down listening?
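Some back-of-the-envelope arithmetic for the pitching-down case. Slowing playback scales every frequency by the same factor, so the highest frequency left after the shift is the source's Nyquist limit times the speed. The helper name and the 0.25x speed are assumptions for illustration.

```python
def max_output_frequency(fs_hz, speed):
    """Highest frequency that can remain after playing an fs_hz recording at `speed`x."""
    return (fs_hz / 2) * speed

for fs in (44_100, 192_000):
    top = max_output_frequency(fs, 0.25)   # two octaves down
    print(fs, "Hz source ->", top, "Hz top end after the shift")
```

A 44.1 kHz source pitched down two octaves has nothing above about 5.5 kHz, while a 192 kHz source could in principle still fill the audible band, but only if the recording really contains content up there.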


If anything, wouldn't the high sample rate be a liability when trying to shift pitch? If it captures too much outside the range of human hearing, then that ultrasound might suddenly become audible when you change pitch, which could sound weird.


Exactly, and the weirdness might be interesting.


Ah, right. And if ultrasound is actually much like regular sound, it may sound perfectly normal.


My go to comparison track for sound quality for a new device, headphones, DAC, amp, codec etc, is "Speak To Me" from Dark Side Of The Moon.

I don't hear a difference between various rips that are 16/128 or 24/192, but I have noticed a difference listening to the Blu-ray version of it (which is many-many gigabytes in size). It is a definitely interesting experience, but the way I can describe it is as an absolute absence of noise.

Every single version but this one exhibits noise at the start (the heartbeat sound) as the sound goes from very quiet to loud.

But, to be fair, it could just be different masters.


Which would be in line with the article: a SACD release pressed onto a CD-R sounded better than the CD release, because the SACD release had a better master.


You should try downsampling the Blu-ray version yourself and comparing then.


Hah. I use the Dark Side of the Moon album for comparisons as well. I find that you need to use something that you are very well acquainted with. Oddly enough I also use Electric Heart, by Sia, as it has such a bizarrely flooded mid-range :)


almost certainly different masters :)


"Under ideal laboratory conditions, humans can hear sound as low as 12 Hz[8] and as high as 28 kHz..." From https://en.m.wikipedia.org/wiki/Hearing_range#Humans

That explains why some people can tell the difference between 44.1 khz and the higher-resolution sampling rates. It also means that the ideal audiophile sampling rate is somewhere between 58khz and 60khz, not 192khz.


You just need these speakers (http://magico.net/product/ultimate.php) to reproduce those frequencies! The problem, of course, is whether even "prosumer" recordings keep frequencies higher than 20 or 22kHz, even if they were recorded on equipment that sampled at 192? Seems like most would LPF the rest away in the process of mastering for the masses.


Perhaps this is a silly question but the explanations mostly make perfect sense to me but in the case of sample rate, what happens when two waves with very different frequencies overlap?

Say I've got an 18kHz wave and a 9kHz wave and the 9kHz wave is ever so slightly out of phase. Then imagine there are 10 different waves under 20kHz all interfering with each other in different ways.

Is it still possible to reproduce everything accurately?

And on bit-depth and dynamic range: Given that much audio doesn't use the full range available to it, wouldn't higher bit depth increase the fidelity in the range the audio does fill? The article talks about the bit depth and range only in terms of the maximum volume but what about fidelity? What's the minimum difference in volume the human ear can hear?


With regards to the first part of your question - as long as the signal is bandlimited the Nyquist-Shannon theorem applies. If all of the waves are under 20kHz, regardless of how they're interfering with each other, the signal can be reconstructed perfectly with a 40kHz sample rate.


Incorrect. Nyquist-Shannon theorem does not guarantee phase! It only guarantees frequency reproduction. This paper illustrates the issue succinctly: http://www.wescottdesign.com/articles/Sampling/sampling.pdf


Assuming that you have a perfect DAC (which you don’t), with perfect analog brickwall filters (which you don’t), with no compressed audio (which isn’t the case).

With a perfect DAC, and perfect filters, and expensive headphones, there is no difference between 16bit/44.1kHz WAV and 24bit/192kHz WAV (until you add an EQ, then there is).

With a usual home DAC, and its filters, and at best $100 headphones, there is a major difference between 16/44.1 AAC and 24/192 AAC.

(And guess what the article attacked? The decision of Apple to sell lossy files in 24/192 for home usage. The discussion was never about lossless files, or professional usage)


So long as all frequencies you're trying to represent are below the Nyquist frequency, there's still only one band-limited reconstruction of the samples. So yes, it's still possible to reproduce everything accurately so long as all of the individual frequencies (no matter their relationships) fall below 1/2 the sample rate.
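A small numerical check of the 9 kHz + 18 kHz case from the question. This is not a full sinc reconstruction, just a sketch showing that the 44.1 kHz samples still pin down each component's amplitude and phase. The amplitudes, phases, and helper name are made up for the test.

```python
import cmath
import math

FS = 44_100
N = 441            # 10 ms: 9 kHz and 18 kHz both complete whole cycles

# Hypothetical test signal: (frequency in Hz, amplitude, phase in radians)
components = [(9_000, 1.0, 0.3), (18_000, 0.5, 1.1)]

samples = [sum(a * math.sin(2 * math.pi * f * n / FS + p) for f, a, p in components)
           for n in range(N)]

def measure(signal, freq_hz):
    """Recover amplitude and phase of one sine component from the samples."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq_hz * n / FS)
              for n, s in enumerate(signal))
    amplitude = 2 * abs(acc) / len(signal)
    phase = cmath.phase(acc) + math.pi / 2   # sine reference, hence the quarter turn
    return amplitude, phase

recovered = [measure(samples, f) for f, _, _ in components]
for (f, a, p), (ra, rp) in zip(components, recovered):
    print(f, "Hz: amplitude", a, "->", ra, " phase", p, "->", rp)
```

Even the 18 kHz component, at under 2.5 samples per cycle, comes back with its phase intact.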


> Is it still possible to reproduce everything accurately?

Sound is linear, so yes. Any signal can be represented correctly as a sum of frequencies.

> wouldn't higher bit depth increase the fidelity in the range the audio does fill?

When you do the maths quantisation turns into the original sound + quantization noise. The bit range determines the noise (dither helps here). So it's a question of whether the noise is loud enough to hear. There are samples on the internet you can try to determine yourself at which stage you can notice the noise. It's shocking how few bits are actually required.

For a sloppy recording or repeated processing the quantisation noise goes up, so more bits are required.
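If you'd rather measure than trust the maths, here is a minimal sketch that quantises a sine at a few bit depths and measures the resulting noise. It assumes a full-scale 997 Hz tone, plain rounding, and no dither; the function names are invented for the example.

```python
import math

FS = 44_100
FREQ = 997.0       # Hz; picked so the tone does not repeat in step with the sample clock

def quantize(x, bits):
    """Round x in [-1, 1] to the nearest level a signed integer of this width offers."""
    scale = 2 ** (bits - 1) - 1
    return round(x * scale) / scale

def measured_snr_db(bits, n_samples=FS):
    signal = [math.sin(2 * math.pi * FREQ * n / FS) for n in range(n_samples)]
    noise = [quantize(s, bits) - s for s in signal]      # what quantisation added
    p_signal = sum(s * s for s in signal) / n_samples
    p_noise = sum(e * e for e in noise) / n_samples
    return 10 * math.log10(p_signal / p_noise)

snr = {bits: measured_snr_db(bits) for bits in (8, 12, 16)}
for bits, value in snr.items():
    print(bits, "bits:", round(value, 1), "dB")
```

Each extra bit should buy roughly 6 dB, which is why the noise at 16 bits ends up so far below the signal.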


I believe that recorded music is a ton of waves of different frequencies going at the same time. You still just need to sample at twice the highest frequency. Overlapping frequencies may have some audible effects that may be undesirable (like beating [0]), but those would be audible in the real (analog) signal as well as the sampled signal.

[0] https://en.wikipedia.org/wiki/Beat_(acoustics)


IIRC, I think the article's point about maximum volume was about how you'd need to turn the volume up to deafening levels before the small differences would be noticeable.


It doesn't seem to say anything like that. Looking at it again, I see this:

> It is also worth mentioning that increasing the bit depth of the audio representation from 16 to 24 bits does not increase the perceptible resolution or 'fineness' of the audio. It only increases the dynamic range, the range between the softest possible and the loudest possible sound, by lowering the noise floor. However, a 16-bit noise floor is already below what we can hear.

But I don't understand, why is this? Is the step in volume between two values fixed regardless of the bit depth? If so, why?


The video linked in the top-most comment will beautifully answer your questions. It's also on YouTube.

You can think of it this way: the sampling rate provides a mathematical solution to the waveform, so it's simply 100% accurate all the time. The bit depth adds extra "buckets" to divide the volumes into. If you have 2 bits, you then have volume levels 0, 1, 2, 3.


I challenge y'all to take two copies of the same professionally mastered track, 16 vs 24 bit... and phase invert one, and put them on top of each other. What you can then hear is basically like an auditory diff.

Do you not hear anything? Yeah... all that sound is what amounts to lost fidelity when you down-sample. You can't just argue on bit-rates alone... It's about having head room in the mix and room for more fidelity. Sure, if you mix shit badly, you can't hear the difference, but that's missing the point entirely.

It's not nearly as bad as the time someone tried to tell me that Opus was an adequate audio file format for music, but still... frustrating.

I know I don't have super-human hearing, and I can easily hear the difference. Just because everyone can't hear the difference, or can't tell that they can hear the difference, doesn't negate the fidelity loss of lower bitrates.

Further, given a large enough speaker stack, it's not about what you can hear any more, it's about what you can feel.

And never-mind the benefits of a 24bit DAW pipeline... Hello, low latency?
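For a sense of scale, the null test described above can be simulated. This is a sketch under stated assumptions: the "master" is a synthetic 440 Hz tone at half of full scale, both versions are made by plain rounding with no dither, and nothing here stands in for a real professionally mastered track.

```python
import math

FS = 44_100

def quantize(x, bits):
    """Round x in [-1, 1] to the nearest level a signed integer of this width offers."""
    scale = 2 ** (bits - 1) - 1
    return round(x * scale) / scale

# Hypothetical "master": one second of a 440 Hz tone at half of full scale
master = [0.5 * math.sin(2 * math.pi * 440 * n / FS) for n in range(FS)]

version_24 = [quantize(s, 24) for s in master]
version_16 = [quantize(s, 16) for s in master]

# Invert one version and sum: whatever does not cancel is the difference
residual = [a - b for a, b in zip(version_24, version_16)]

peak = max(abs(r) for r in residual)
peak_dbfs = 20 * math.log10(peak)
print("residual peak:", round(peak_dbfs, 1), "dBFS")
```

In this idealised setup the residual is bounded by about half a 16-bit step. A real 16-bit release would normally be dithered, which raises the residual somewhat, and any difference in mastering between the two files would show up in the null as well.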


Kind of a side topic, but does anyone know what's up with Sirius XM satellite radio? Whenever I listen to that in my wife's car, I find the sound quality to be obnoxiously bad, to the point that I choose to listen to plain old FM if I'm driving her car.

The higher frequencies (hi hats for example) are mushy and sound like they are warbling, and the sound just generally has a lack of depth. What is the explanation for this, or am I just imagining it?


I notice it too, and it's because the music is extremely low bitrate - usually 48kbps, but sometimes as low as 32. It's HE-AACv2 - not as good as Opus, but close. Its characteristic artifacts include destroying transients, like your hi-hats.


Probably compressing the living daylights out of it. By squashing it down, they flatten the amplitude in the music and things with a lot of transients (by definition, drums) suffer. It results in a lot of fatigue, because the higher the compression the more it turns into white noise, essentially.

Sirius XM might be one of the last to standardize on a reasonable LUFS setting of 12 to 16.


Highly compressed AAC. Answers on Quora say about 48kb/s of bandwidth, so equivalent to about a 96kb/s MP3, which sounds about right.

Bandwidth allocation is per channel, talk only channels are even worse and some words are barely comprehensible.


No, I hear it too and would like to know how it works. I think some stations get more fidelity than others, but it's almost always compressed as you described.


Off topic, but I've noticed in the gym, sometimes the music playing from the instructor's iPhone through one of those cheap Ion Block Rocker bluetooth speakers (big boxy one, looks like a guitar amp) has very very noticeable pitch sag in various parts of various songs.

It's via bluetooth, and the source is Pandora. I even googled Pandora and pitch change, and found some forums with folks discussing bluetooth causing this to happen, which has never been my experience.

It occurred to me that slowing the playback (eg, causing the pitch to sag) could be a fantastic way of dealing with a connection that's too slow. It would be far better than stuttering, even though many of us would find the pitch change really annoying. The instructor and my wife simply can't hear it, or say "I thought that was just part of the song".

Anyway, has anyone ever heard of this happening? Do certain products do it? Were the Pandora forums wrong and this is actually a Pandora problem? Of all of the bluetooth problems I've had in my life, I've never experienced this before, so I tend to lean towards it being a data rate issue on the cell connection and Pandora slowing the music down.


> Off topic, but I've noticed in the gym, sometimes the music playing from the instructor's iPhone through one of those cheap Ion Block Rocker bluetooth speakers (big boxy one, looks like a guitar amp) has very very noticeable pitch sag in various parts of various songs.

Many Bluetooth speakers are not lossless. Cheaper ones are likely going to support MP3 and that's it (or even worse, just the minimum required SBC).

Pandora is already compressed. If your instructor doesn't know to flip the "HQ sound" toggle (which is turned off by default on Mobile, even with a paid subscription!), then 128kbit MP3 recompressed to MP3 yet again is going to sound pretty atrocious.

Not sure if one of the other codecs used for A2DP can cause pitch changes though.


At my greatest aspiration, I'm no more than a casual music listener - however I am very conscious of pitch. On more than one occasion, I've noticed that a song I'm familiar with seems off pitch, as in lower or higher key than I think of it in my head - and I seem to notice that particularly around shitty speakers or bluetooth or some combination of the two. Glad to know I'm not crazy :)

To your point, I would lean towards bluetooth as I've never associated the phenomenon with Pandora. My friends and I mostly use Spotify.


I use Spotify as well. And what I describe is a sag for 5 or 10 or 15 seconds, not just a song that's playing too slowly. It's really weird.


Would it perhaps be because the bluetooth speakers typically cannot reproduce bass below a certain range and so play the harmonics of the bass note or use the resonance of the item they are placed on to "produce" the bass note? In truth you are not hearing the bass note - you are hearing the harmonics and your brain is making up the missing low note.


Or people who pirate their music by ripping YouTube videos that are just the songs pitch shifted to avoid automated copyright strikes


FTA: "192kHz digital music files offer no benefits. They're not quite neutral either; practical fidelity is slightly worse. The ultrasonics are a liability during playback."

I would expect the samples to be interpolated before the final DA conversion takes place, so no extra ultrasonics should be involved. And there would be filters for them anyway, we still have to use them for 44.1, 48, 96 etc.


But if you're filtering out ultrasonics, then you've guaranteed that the extra samples will be wasted. There won't be any difference in the reconstructed signal at all.


The extra bits mean you can change how you filter the sound later, depending on what range your headphones/speakers are capable of reproducing without distortion, or what your ears are capable of hearing.

Who knows, maybe technology or genetic changes will allow future humans to hear in a wider range than we do now. Those people might appreciate being able to hear what our music really sounded like.


As an engineer, those extra samples are never wasted, and as a listener, I could care less about the "wasted space." Drive space is meaningless at this point. When I buy a piece of audio, give me the highest quality available and I will make my own decisions regarding the frequency I want to listen to it at. The reasons behind not giving the user the highest quality possible all sound like a misguided excuse to prevent piracy.


Ok... but the point stands that the signal will be the same if you low-pass filter it no matter how many samples you take. You might as well distribute the file at 44.1 kHz and then interpolate to something higher yourself, if you want to play with it later.


> as a listener, I could care less about the "wasted space."

In general, bandwidth is more of a concern than disk space. Most media people consume is never stored on the local device.


So 192/1024? 1024/4M? 1G/1T? Like is there a practical limit for you when you say "highest quality possible"?


I see no issue with storing 24/192 multichannel audio like what comes on blu-ray audio discs, or DSD streams. For my current hard drive situation, I could store 5 GB per song and easily have enough space to have a very large catalog. But drive space is only getting cheaper.


Wasn't the point that if you have a system capable of outputting a 192kHz analog signal on the DAC, essentially hitting your speakers with 96kHz audio, any non-linearity starting at the DAC and beyond could cause distortion in the actually audible range?

And if you do any downsampling, or any low pass filtering in general, then having 192kHz sampling is even more useless, in addition to only having the "benefit" of adding a frequency band to the spectrum that's completely inaudible anyway.
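
A rough sketch of the intermodulation concern described above, assuming a made-up quadratic nonlinearity (`y = x + 0.1*x*x`) standing in for an amplifier or driver; real equipment will behave differently. Two ultrasonic tones at 30 kHz and 33 kHz go in, and the script measures how much 3 kHz (audible) energy comes out. The tone frequencies, amplitudes, and function names are illustrative choices, not taken from the article.

```python
import math

FS = 192_000          # sample rate in Hz
N = 1920              # 10 ms; every tone below completes whole cycles in this window

def tone_amplitude(signal, freq_hz):
    """Amplitude of one frequency component, measured with a single-bin DFT."""
    w = 2 * math.pi * freq_hz / FS
    re = sum(s * math.cos(w * n) for n, s in enumerate(signal))
    im = sum(s * math.sin(w * n) for n, s in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)

def nonlinear(x, k=0.1):
    """Hypothetical second-order distortion, chosen only for illustration."""
    return x + k * x * x

# Two equal-level ultrasonic tones, both far above the audible range.
clean = [0.5 * (math.cos(2 * math.pi * 30_000 * n / FS)
                + math.cos(2 * math.pi * 33_000 * n / FS)) for n in range(N)]
distorted = [nonlinear(x) for x in clean]

audible_before = tone_amplitude(clean, 3_000)
audible_after = tone_amplitude(distorted, 3_000)
print(f"3 kHz in clean signal:    {audible_before:.6f}")
print(f"3 kHz after nonlinearity: {audible_after:.6f}")
```

With these assumed numbers the difference tone (33 kHz minus 30 kHz) shows up at 3 kHz only after the nonlinearity; how large it is on real hardware depends entirely on how nonlinear that hardware actually is.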


We need the extra frequencies to communicate over the air updates to our mind control implants, obviously.


I've always been able to tell when something on Spotify I've heard before is lower quality now. I can instinctively tell in my car when that's the case. It's mostly due to how loud certain parts of the song are or how it sounds at higher volumes. I guess, given what he says, that wouldn't be the case if I made sure they were the same dB.


It is well known that music media for "popular" consumption are often intentionally mastered with much less dynamic range. This is called "compression", but has, on the surface, nothing to do with compression of bits*. It's the dynamic range that's compressed, bringing quieter and louder sounds closer together.

For markets with people that are more likely to care about sound quality, though, a much larger dynamic range is preserved. This is why the same album often sounds better on vinyl than on digital media[1]. It has nothing to do with the media, it's the superior mastering that was consciously chosen.

The Wikipedia article on the Loudness War[2] offers a good explanation.

* Well, technically, music with compressed dynamic range has less entropy, so can be encoded at a lower bitrate without loss.

[1] See this database for example: http://dr.loudness-war.info [2] https://en.wikipedia.org/wiki/Loudness_war
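
To make "compressing the dynamic range" concrete, here is a minimal sketch of a static downward compressor curve. The threshold (-20 dB) and ratio (4:1) are arbitrary values picked for illustration; real mastering chains also involve attack/release times, makeup gain, and limiting, none of which are modeled here.

```python
def compress_db(level_db, threshold_db=-20.0, ratio=4.0):
    """Static compressor curve: levels above the threshold are scaled down by `ratio`."""
    if level_db <= threshold_db:
        return level_db                      # quiet material passes unchanged
    return threshold_db + (level_db - threshold_db) / ratio

quiet_db, loud_db = -30.0, -6.0              # hypothetical quiet and loud passages
range_before = loud_db - quiet_db
range_after = compress_db(loud_db) - compress_db(quiet_db)
print(f"range before: {range_before} dB, after: {range_after} dB")
```

With these assumed settings the gap between the quiet and loud passages shrinks from 24 dB to 13.5 dB; the whole track can then be turned up, which is the "loudness" part.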


It's true that much popular music is heavily compressed, but you normally wouldn't encounter two differently-compressed masters of the same song. Two exceptions would be radio (some stations used to apply their own compression, and perhaps they still do) and some high-quality releases aimed at audiophiles that are produced from a less-compressed master. I wouldn't think that streaming services like Spotify would re-compress songs.


I was a bit imprecise, but the "high-quality releases aimed at audiophiles" is essentially what I was referring to. The record stores are full of remastered vinyl editions.


So how about for Spotify? It's all digital media, but I pay more to be able to have the highest quality version streamed/pre-downloaded for songs and I can pretty much tell the difference. I've had anecdotal experiences where I noticed a distinct difference and checked to find out somehow my settings were changed.

Is it that Spotify changes the dynamic for lower and high quality encodings of the songs?


I don't know what Spotify does. It's possible that they use compression on the "non high quality" media, it's also possible that you are hearing other differences that are the result of different encoding parameters.


You could be hearing that Spotify applies dynamic compression if you have the 'set the same volume level for all songs' setting enabled (it's on by default).


Without even reading the article I already know that hearing the difference between 16/44.1 and 24/192 is really hard, if not impossible.

I only use higher bitrates when creating music, so that when I mix or record I don't need to push the levels almost into the red to get a hot enough signal, and I can make all kinds of changes like slowing down or speeding up, distortion, EQ, and compression without losing any perceived detail in the end. If I record at 16 / 44.1 and then start manipulating the sound, I start losing detail immediately.

But in the end more than 20 / 88.2 won't do much for you. 16 / 44.1 might be a bit less than ideal for music with a lot of dynamics but it's absolutely fine for most purposes.
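
For a sense of how much headroom the extra bits buy, here is a small sketch of the textbook figure of roughly 6 dB per bit. It computes the ratio between full scale and one quantization step; it ignores dither and converter noise, so treat the numbers as theoretical ceilings rather than what any particular interface delivers.

```python
import math

def dynamic_range_db(bits):
    """Ratio between full scale and one quantization step, in dB (about 6.02 dB per bit)."""
    return 20 * math.log10(2 ** bits)

for bits in (16, 20, 24):
    print(f"{bits}-bit: {dynamic_range_db(bits):.1f} dB")
```

The roughly 48 dB between 16 and 24 bits is the margin that lets you record well below the red and still have detail left after processing.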


No, not "really hard if not impossible", it is impossible. The article itself cites that no one has ever been observed to be able to do so.


Yeah, when you leave the possibility open, everyone wants to believe they are that special one with amazing ears that can tell the difference where no one else can, but it's always placebo.


More than 44 is useful for tempo matching while mixing


Yes, the article also mentions how 24/192 can be useful for production and editing.



Makes me long for the day when consumer audio products were proudly labeled as having a 1-bit DAC. Nowadays you couldn't sell audio hardware that made such truthful statements about its internals. Gotta have the bits.


Hey, they still do. And it's 'audiophile.' DSD is 1-bit, but with a very high sampling rate (2.8224 MHz). Though, as far as I know there isn't actually any quality boost vs PCM.


That's my point. The golden ears wouldn't deign to buy a "cheesy" 1-bit DAC. It has to be recast as something else to fool them.


Funny that the author says we haven't found any such people in the past 100 years of testing with truly exceptional hearing, such as a greatly extended hearing range, so they probably don't exist, yet I was tested at around 13 as having 33KHz upper range, which allowed me to hear a beetle walking over leaves and grass in a basement window sill at about 20 feet away...

Unfortunately I lost a lot of hearing through concerts, headphones, and traffic, but I can still tell the difference between an MP3 and CD-quality music much of the time.


That's not even an additional octave over 20K. It amazes me when people behave as if there's a magical hard limit that's really, really precise and applies to all humans.

I think you probably understand that people get locked into arguing for victory, and if you tell them you were tested with thus and so range (as a young person, which is plausible) they will simply call you a liar.

My own experience is this: when I was a kid, I got an ear wax problem, and had it removed (hasn't recurred). It was a horrible painful nightmare with nasty tweezers and water squirters, and I was just a little kid… but afterwards, sound (especially very high frequency sound) was a revelation.

Later, when I was a little older, the advent of digital audio (at first, in record albums) was a nightmare to me, because I couldn't understand how or why that stuff sounded SO BAD. And of course my early experience of CDs was pretty nightmarish: pod people music, with all emotion and humanity weirdly excised. That's what got me into audio: wanting to understand this, and then later, fix it.

I did actually succeed: I can produce and mix and process digital audio that young me would not be horrified by. But especially if I had to meet that higher bar, I won't be able to do it at less than say 22 bit/80K, well engineered. If I get to use all my current tricks I could do it at 20 bit/80K: I can cheat word length easier than I can compensate for a nasty brickwall filter.

24/96K is widely prevalent and enough, given good converters. I'm not convinced 192K is at all necessary, but the more people crusade against it, the more contrarian I get ;) I've got a Beauty Pill album mastered in 24/192K and it sounds freaking phenomenal. Mind you, I have professional equipment designed to handle that.


Your describing of “MP3” as a level of quality doesn't really give credence to your claim.


Contrary to what he says, I can most certainly see some IR remotes. Not super bright but enough to notice. Same goes for the IR lights in my Kinect.


"The original version of this article stated that IR LEDs operate from 300-325THz (about 920-980nm), wavelengths that are invisible. Quite a few readers wrote to say that they could in fact just barely see the LEDs in some (or all) of their remotes. Several were kind enough to let me know which remotes these were, and I was able to test several on a spectrometer. Lo and behold, these remotes were using higher-frequency LEDs operating from 350-380THz (800-850nm), just overlapping the extreme edge of the visible range. "


Interesting. I had not read the footnotes. They should update the actual post rather than simply amending it in the footnotes.


Have you had cataract surgery? I know that can have an impact on visible spectrum depending on the replacement lens they put in.


Nope, I'm afraid not. I never noticed until maybe 3 years ago, so I think it's a newer development. I'm 31.

I also got glasses this year for the first time in my life, which the need for came on rather suddenly... So maybe it's linked with some kind of degeneration? I really can't say. I honestly didn't think to mention it to the doctor, but now I'm thinking maybe I should have.


I'm not an ophthalmologist, so if you are concerned about your eye health talk to one. That being said I was referencing that the human eye (retina) is capable of seeing a wider spectrum than we typically see. The natural lens filters out part of the spectrum (UV) that the synthetic replacement lenses do not filter out.


I expect this is because some actually produce light in the visible spectrum too.


I want as much resolution as possible, because when you do like I do and introduce pitch-shifting after the fact (re-tuning a Floyd Rose for every other song is not really conducive to a jam session) 24/96 just doesn't cut it at all, and you get bad artifacting further than a half step either way.


Minor niggle: we can see ultraviolet just a little because it makes various bits of our eyes glow. Perceptually this shows up to us as a little bit of fuzziness near a UV source. But no, we can't see it with anything like the resolution we see visible light.


Excellent writing. Information conveyed concisely, but targeted at non-audiophiles.


So, having come from an image-processing background to put in 15 years of experience at audio processing companies, I've seen the xiph arguments, or something like them, several times. Heck, I worked for Sony (where that precious 22KHz cliff was made) for close to ten years... in professional media... many of those years with Sony professional audio people.

And the "44.1KHz is enough" or "48KHz is enough" people are, sadly, kind of dumb.

How do I know? Because I was dumb, too.

Being a coding/math/audio/video badass, after a few years of industry experience, I rattled off some mouthy kid quip, saying "well, I don't know why we do 24/96, since nobody can hear above 20K anyway..."

And a very talented, very knowledgable, and generally reserved engineer suddenly perked up one eyebrow and said, incredulously "because temporal and frequency response are inherently linked..."

That look was in 2001, and I still remember that feeling of dread sinking in, realizing that I had no idea what he was talking about, no concept of why that would matter. I knew about Nyquist and could write a quick FFT, but he'd spent four years getting a degree in pure audio engineering at the most selective program in the country. That look, which I completely remember today, was like a deeply disappointed parent after a kid has just been bailed out of jail, from one of the nicest engineers I know.

It was late, I was brash, and I pressed on, asking him what he meant. "What's a transient look like, spectrally?" he asked. He waited for my blinks to make audible sounds (due to the apparent hollowness of my head), then he asked "and how many channels of audio do we listen to?"

He watched me stand there, like a doofus, for what seemed to me like several minutes (probably 5-10 seconds), then he went back to coding.

It didn't hit me until weeks later, and I didn't really internalize until years later, that he was hinting at inter-ear phasing and the other facilities of our auditory systems besides frequency response. Years later, I read up on Georg von Békésy's incredible work (including positional acuity), and I worked as a tech lead at Digidesign on the first-generation Venue live sound system which operated at 48KHz but with incredibly low latency (processing steps were 1, 2, or 16 samples) due to the requirements of, for instance, vocalists using in-ear monitors.

Along the way, I ran across Microsoft engineers who thought that ~10-20ms inter-channel timing consistency would be okay in Windows Vista (it wasn't), conducted blind tests between 96KHz and 44.1KHz audio (for people who were shocked to immediately notice differences), came across plenty of hot-shot kids who said exactly the same kind of stuff I'd said, and saw postings from xiph making a mix of valid and grossly sophistic arguments ranging from "here's how waveform reconstruction from regularized samples works" (good) to "audio equipment can't even capture signals beyond this range" (dumb). At times, I thought about setting up refutation articles, then I realized, like many, that I had actual work to do.

Von Békésy's work points to positional inter-ear phase fidelity of roughly 10µs. What's the sampling interval at 44.1KHz? >22µs? Good luck rebuilding that stereo field at 44.1...
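
The interval quoted above is plain arithmetic (one second divided by the sample rate). A quick sketch for the common rates follows; the function name is just for illustration. Note that it only computes the spacing between samples and says nothing by itself about how finely a reconstructed waveform can be positioned in time.

```python
def sample_interval_us(rate_hz):
    """Time between consecutive samples, in microseconds."""
    return 1_000_000 / rate_hz

for rate in (44_100, 48_000, 96_000, 192_000):
    print(f"{rate} Hz: {sample_interval_us(rate):.2f} us")
```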

The trick is that there is a really serious diminishing return on audio sampling rate. 4KHz to 8KHz is enormous... 8KHz to 16KHz is transformative... 16KHz to 32KHz is eye-opening... 32KHz to 48KHz is appreciable... 48KHz to 96KHz is.... pretty minor, especially in the age of crappy $30 earbuds, streaming services delivering heavily compressed audio that will be crammed into a BT headset that may or may not be connected with additional compression, and all of the convenience that those changes bring. You may detect it in some audio if you're really listening, if you know what to listen for, and it may present advantages in system design (converters, processing, etc). From a data-rate perspective, the low-hanging fruit has already been picked.

But people who smugly say that there is "no difference", that audiophiles are buying "snake oil", are letting their ignorance show, and that's including that kid that I was, 16 years ago.

I've since moved out of pure pro media to consumer devices, where precision takes a back seat to the big picture a lot of the time. When discussing an audio fidelity multi-channel problem with a possible vendor last year, I expressed my concern about the inter-channel timing assurance slipping from 1µs to 50µs in product generations. "Depending on the sampling rate, that's several samples of audio", I said.

A very senior engineer on our side (Director equivalent) quickly admonished me, saying "it's microseconds, not milliseconds", to which I said "I know... Which is why it's several samples, not several thousand..."

From the look on his face, I'm 100% sure that he didn't understand me at the time, but I hope he put it together eventually.
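
The "several samples, not several thousand" remark is easy to check. A minimal sketch, with an illustrative function name, converting a timing error into sample periods at a few common rates:

```python
def skew_in_samples(skew_us, rate_hz):
    """How many sample periods a given timing error spans."""
    return skew_us * 1e-6 * rate_hz

for rate in (44_100, 48_000, 96_000, 192_000):
    print(f"50 us at {rate} Hz = {skew_in_samples(50, rate):.1f} samples")

# For comparison, the same figure in milliseconds rather than microseconds:
print(f"50 ms at 44100 Hz = {skew_in_samples(50_000, 44_100):.0f} samples")
```

So 50µs works out to somewhere between about 2 and 10 samples depending on the rate, while 50ms would be a couple of thousand.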

In the end, the industry has moved in the opposite direction of 24/192 for a long time. If we can get back to normalization of CD quality audio, I'll be happy.


All that still doesn't explain why in properly controlled double blind trials, people still haven't been able to demonstrate the ability to distinguish between different sample rates above 44.1kHz though...


Citations? I'm happy to read any research you have. (You can also come over, crack a beer, and record balloon pops and acoustic instruments... and an electric guitar... and electric kazoo.)

The xiph-cited studies I've seen show an identification of difference and a preference for... MP3. Hey, we want what we want.

Otherwise, go read Von Békésy's work for the foundation, established in the pre-digital era, but transferable if you understand digital audio.

For recognition of difference in high res audio, see:

Reiss - https://qmro.qmul.ac.uk/xmlui/bitstream/handle/123456789/134...

...And the papers referenced.

It's an interesting meta analysis and a good survey of the last 20 years of publication on the subject.

If you have some properly controlled double blind trials that show no discrimination ability, I'd be happy to read them. I'll admit that I haven't conducted statistically sufficient tests. I have, however, double-blinded (via software pseudo-random sample randomization).

Like I said, though, I've got work to do. Do some listening. Read some papers.


This seems the best place to plug one of my favorite musicians, 24192(https://m.soundcloud.com/x24192)


Am I the only one who has no idea what 16/44.1 or 16/48 mean? I initially thought 192 here was referring to 192 vs 320 etc but this is apparently about something completely different?


16/44.1 - 16 bits at 44.1 kHz sampling frequency. 16/48 is 16 bits at 48 kHz sampling frequency. Both refer to the sampled but uncompressed, "raw" audio.

192 and 320 usually refer to the bitrate (in kbit/s) of MP3-compressed audio, which indeed is something different. The MP3 compression removes more and more detail from the original audio to fit within this bitrate budget, hence higher is better because less of the original content is removed. Only great ears can hear what was removed from a 320 stream.
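
To put the two kinds of numbers on the same scale, here is a small sketch that works out the raw data rate of uncompressed PCM. It assumes stereo (2 channels), which the notation itself doesn't specify.

```python
def pcm_kbps(bits, rate_hz, channels=2):
    """Raw (uncompressed) PCM data rate in kilobits per second."""
    return bits * rate_hz * channels / 1000

print(pcm_kbps(16, 44_100))    # 16/44.1 stereo
print(pcm_kbps(16, 48_000))    # 16/48 stereo
print(pcm_kbps(24, 192_000))   # 24/192 stereo
```

Under that stereo assumption, 16/44.1 comes to about 1411 kbit/s and 24/192 to 9216 kbit/s, versus 192 or 320 kbit/s for the MP3 figures.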


bit depth / sample rate

common sample rates are 44.1 and 48 kHz and multiples: 96 kHz ... 192 kHz


I don't know. It could be very useful once cellphone makers find out how to blast audio directly into nerve impulses. New and innovative audio filters. Humans with custom DNA. Etc.


There's something satisfying about having really fat audio files on the hard drive.


Exactly. In 2017 it looks silly, but it was popular at one time.


When I listen to MP3s via a 24/192 M-Audio S/PDIF output into a Denon 1705, there is a huge difference compared to just optical from the motherboard. Why is that?


There can be dozens of completely different reasons why that might be the case, without requiring inaudible frequencies to be involved.


I work in this industry, and I have produced what is almost certainly the most high-performance dither to date, which works through noise-shaped Benford Realness calculations: http://www.airwindows.com/not-just-another-dither/ I mention this to say that I can absolutely make 16 bit 'acceptable' or 'listenable', even for audiophiles. I do that for a living. And yet…

Monty is wrong. To cover the range of human listeners, the required specs even through use of very insensitive double blind testing (which is geared to substantially indicate the PRESENCE of a difference between examples if that's present, and does NOT similarly indicate/prove the absence of a difference with a comparable degree of confidence: that is a horrible logical fallacy with realworld consequences) are more like 19-21 bit resolution at 55-75K sampling.

Beyond this, there's pretty much no problem (unless you are doing further processing: I've established that quantization exists even in floating point, which a surprising number of audio DSP people seem not to understand. There's a tradeoff between the resolution used in typical audio sample values, and the ability of the exponent to cover values way outside what's required)

That said, it is absurd and annoying to strive so tirelessly to limit the format of audio data to EXACTLY the limits of human hearing and not an inch beyond. What the hell? I would happily double it just for comfort and assurance that nobody would ever possibly have an issue, no matter who they were. Suddenly audio data is so expensive that we can't allow formats to use bytes freely? That's the absurdity I speak of.

Our computers process things in 32-bit chunks (or indeed 64!). If you take great pains to snip away spare bits to where your audio data words are exactly 19 bits or something, the data will only be padded so it can be processed using general purpose computing. It is ludicrous to struggle heroically to limit audio workers and listeners to some word length below 32 bit for their own good, or to save space in a world where video is becoming capable of 1080p uncompressed raw capture. Moore's law left audio behind years ago, never to be troubled by audio's bandwidth requirements again.

Sample rate's another issue as only very nearby or artificial sounds (or some percussion instruments, notably cymbals) contain large amounts of supersonic energy in the first place. However, sharp cutoffs are for synthesizers, not audio. Brickwall filters are garbage, technically awful, and expanding sample rate allows for completely different filter designs. Neil Young's ill-fated Pono took this route. I've got one and it sounds fantastic (and is also a fine tool for getting digital audio into the analog domain in the studio: drive anything with a Pono and it's pretty much like using a live feed). I've driven powerful amplifiers running horn-loaded speakers, capable of astonishing dynamic range. Total lack of grain or any digital sonic signature, at any playback level.

My choice for sample rate at the extreme would be 96K, not 192K. Why? Because it's substantially beyond my own needs and it's established. I'm not dissing 192K, but I wouldn't go to war for it: as an output format, I would rather leave the super high sample rate stuff to DSD (which is qualitatively different from PCM audio in that the error in DSD is frequency-sensitive: more noise in the highs, progressively less as frequency drops).

Even with DSD, which is known to produce excessive supersonic noise even while sounding great, the scaremongering about IM distortion is foolish and wrong. If you have a playback system which is suffering from supersonic noise modulating the audio and harming it, I have three words you should be studying before trying to legislate against other people's use of high sample rates.

"Capacitor", and "Ferrite Choke".

Or, you could simply use an interconnect cable which has enough internal capacitance to tame your signal. If you have a playback system that's capable of being ruined just by 192K digital audio, your playback system is broken and it's wrong to blame that on the format. That would be very silly indeed.

I hope this has been expressed civilly: I am very angry with this attitude as expressed by Monty.


I will add that the concerns of transient timing are actually a fallacy: given correct reconstruction, sampling is more than capable of producing a high-frequency transient that crosses a given point at a given moment in time that's NOT simply defined by discrete samples. Reconstruction is key here, and no special technique is required: sampling and reconstruction alone will produce this 'analog' positioning of the transient anywhere along a stretch of time.

The accuracy is limited by the combination of sample rate AND word length: any alteration of the sample's value will also shift the position of the transient in time.

But since the 'timing' issue is a factor of reconstruction, you can improve the 'timing' of transients at 44.1K by moving from 16 to 24 bit. The positioning of samples will be a tiny bit more accurate, and that means the location of the reconstructed wave will be that much more time-accurate, since it's calculated using the known sample positions as signposts.

Positioning of high frequency transients does not occur only at sample boundaries, so that alone isn't an argument for high sample rates. You can already place a transient anywhere between the sample boundaries, in any PCM digital audio system. The argument for higher sample rates is use of less annoying filters, and to some extent the better handling of borderline-supersonic frequencies. For me, the gentler filters is by far more important, and I can take or leave the 'bug killing' super-highs. I don't find 'em that musical as a rule.
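
A small sketch of the sub-sample timing point, under simplifying assumptions: it uses a steady 1 kHz tone rather than a real transient, 16-bit rounding without dither, and a 5µs delay chosen arbitrarily to be well under the roughly 22.7µs sample spacing at 44.1K. The delay is then estimated from the samples alone by correlating against sine and cosine references.

```python
import math

FS = 44_100            # sample rate in Hz
F0 = 1_000             # test-tone frequency in Hz
N = 441                # exactly 10 cycles of F0 at FS
TRUE_DELAY = 5e-6      # 5 us, a fraction of one sample period

def quantize16(x):
    """Round to the nearest 16-bit step (no dither, for simplicity)."""
    return round(x * 32767) / 32767

# Sample a delayed sine and quantize it as a 16-bit converter would.
samples = [quantize16(0.5 * math.sin(2 * math.pi * F0 * (n / FS - TRUE_DELAY)))
           for n in range(N)]

# Estimate the tone's phase by correlating with sine and cosine references.
w = 2 * math.pi * F0 / FS
in_phase = sum(s * math.sin(w * n) for n, s in enumerate(samples))
quadrature = sum(s * math.cos(w * n) for n, s in enumerate(samples))
estimated_delay = math.atan2(-quadrature, in_phase) / (2 * math.pi * F0)

print(f"true delay:      {TRUE_DELAY * 1e6:.3f} us")
print(f"estimated delay: {estimated_delay * 1e6:.3f} us")
```

The estimate lands within a small fraction of a microsecond of the true delay, with the residual error coming from the 16-bit rounding, which is consistent with the point that word length, not just sample rate, bounds the timing accuracy.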


This is fabulous


(2012) but still an excellent article.

Edit: And the linked "Show & Tell" video is a great way to get some "intuition" about the sampling theorem. https://video.xiph.org/vid2.shtml


Yeah, this post title should be dated, I definitely read this exact post years ago.

I mean it references Steve Jobs in the present ffs.


Thanks! We've updated the headline.


Still relevant, as well.


It really needs a TL;DR because it's all buried in so much fluff I didn't understand why. (I understand the rest of you all really enjoy that and wouldn't call it fluff but I'm not an audio engineer).


To be replaced soon by "4k video is very silly indeed"

If you look at the angular resolution of the eye, unless you are sitting very close to the screen, you can't resolve 4k video.


Yes, I've measured my eyes' resolution to be about 60 pixels per degree. I made a black-and-white grid and set it as my desktop picture, with a one-to-one magnification. Then I backed away until I could not discern the pixels. There's even a moire effect, like with cameras, as I reached my threshold. Then I used trigonometry to calculate my angle of view.

Most living-room TVs are probably placed at a 15- to 25-degree angle of view. 1920x1080 is enough for up to a 32-degree horizontal angle of view, which is 7 feet away from a TV with a 55" diagonal, for example. I will say, however, that movie theaters look a little better with 4K, since 30 degrees is supposed to be the back row, and the front row might be 60 or so.
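
The trigonometry behind those figures, as a quick sketch. It assumes a flat 16:9 screen viewed head-on and treats "pixels per degree" as uniform across the screen, which is only an approximation at wider angles; the function names are illustrative.

```python
import math

def screen_width_in(diagonal_in, aspect=(16, 9)):
    """Width of a screen from its diagonal and aspect ratio."""
    w, h = aspect
    return diagonal_in * w / math.hypot(w, h)

def distance_for_angle_ft(diagonal_in, angle_deg):
    """Viewing distance (feet) at which the screen spans `angle_deg` horizontally."""
    half_width = screen_width_in(diagonal_in) / 2
    return half_width / math.tan(math.radians(angle_deg / 2)) / 12

max_angle = 1920 / 60          # pixels across / pixels per degree
distance = distance_for_angle_ft(55, max_angle)
print(f"1080p resolves up to {max_angle:.0f} degrees")
print(f'55" screen spans that at about {distance:.1f} ft')
```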


Can't tell if this is sarcasm or not, but I think that's sort of the point. You want a high enough pixel density so you can't perceive pixels at normal viewing distance, and I can definitely see pixels on my 42in 1080p screen from across the room.


I'm going to go for "sarcasm" there.

I use a 65" 4k TV with my HTPC in the living room, but often do desktop-ish things on it. (Logitech's wireless keyboards are great).

The difference in resolution between 1080 and 2160 is huge. 1080 is just fuzzy.


65 freaking inches! Of course you are gonna need 4k


That's not true at all. I can visually discern 4K from 1080p on my desktop monitors (4K 27 inch) and even on my 15 inch Retina MacBook if I try. On a large TV it's generally fairly easy too.

Sure, if you're looking at a little 40" TV five meters away, not so much, but for many screens in common use 4K is useful (but getting close to the point of diminishing returns, sure). Whereas as this article says, higher than 48kHz audio should be literally impossible to discern for any human in any setup.


The devil is in the DAC and ADC. You just can't turn 16bit/24bit data directly to/from analog without much loss. A voltage divider with 1/65536 accuracy simply doesn't exist.

So, you have to up-sample the songs to high rates with less bits, like 1bit to 6bits, then do the conversion, and get the best SNR you can.

In this sense, there's a real advantage to using 24/192, since the above conversion can result in less loss and higher SNR.
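
As a toy illustration of trading bits for rate, here is a first-order delta-sigma modulator that turns a value into a stream of +1/-1 outputs whose average tracks the input. This is a simplified sketch, not how any particular DAC is built: real converters typically use higher-order, often multi-bit modulators plus analog filtering, and the oversampling factor of 1000 here is arbitrary.

```python
def delta_sigma_1bit(samples):
    """First-order delta-sigma modulator: turns values in [-1, 1] into a +1/-1 stream."""
    integrator = 0.0
    feedback = 0.0
    bits = []
    for x in samples:
        integrator += x - feedback                      # accumulate the running error
        feedback = 1.0 if integrator >= 0 else -1.0     # 1-bit quantizer
        bits.append(feedback)
    return bits

OVERSAMPLE = 1000              # 1-bit outputs per input value (illustrative)
target = 0.3
stream = delta_sigma_1bit([target] * OVERSAMPLE)
average = sum(stream) / len(stream)
print(f"target {target}, average of 1-bit stream {average:.3f}")
```

The only "precision" component needed is a two-level output; the resolution comes from running fast and averaging, which is the general idea behind the 1-bit DACs mentioned elsewhere in the thread.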


why was this down-voted too? it's the truth!


and yet, we are nowhere near being able to electronically reproduce a live acoustic music performance. have you ever walked by a bunch of musical sound coming out of a room and thought to yourself, "wow, those live musicians sound great" only to discover it was just a stereo playing? nope.

as engineers we will never solve this problem as long as the "44.1kHz is good enough" dogma is perpetuated.

here's a question. why are frequency and bit depth the only two variables under discussion here? how does the human ear locate a sound in space? suppose I place a series of 20kHz tone generators along a wall (and that I can still hear 20kHz :) and trigger them at different times, and record the session in stereo at 44.1kHz with a standard X-Y mic setup. will I be able to reconstruct the performance?


>as engineers we will never solve this problem [reproduce a live acoustic music performance] as long as the "44.1kHz is good enough" dogma is perpetuated

It's the opposite. We are never going to solve this problem if we are going to focus on things that have nothing to do with the problem. Compare and contrast:

>as engineers we will never solve this problem as long as the "copper wires are good enough" dogma is perpetuated

Also, please read the article. The author specifically lists advances in audio tech they think are worthwhile to consider, such as surround sound. This actually addresses the problem you mentioned (reproducing the live performance) and the question you asked, i.e.

>here's a question. why are frequency and bit depth the only two variables under discussion here?

They are not, at least not in the article. Here it's because that's what's in the title, and not everyone gets to the end of the article.

Some comments do talk about the importance of having a good DAC for a good sound.


interesting viewpoint, however, did you think about the experiment I presented? without an answer, sample rate and cabling cannot be considered equivalent distractions on the road to high fidelity.


You’re talking about something orthogonal to the question at hand. It’s like complaining that the 4K TV sucks at VR.

Of course it does. It’s not meant to provide VR.

Same thing with sampling and bit-depth. Those address digital encoding of analog signals. They have nothing to say about speaker design, number of audio channels, room acoustics, or the myriad other factors that go into replicating a live stage performance.


It's not obvious that 2 channels of recorded audio aren't sufficient to recreate a convincing stereo image; suggesting that I'm seeking the equivalent of VR is specious.

And you haven't answered my question about the array of 20kHz tone generators. In fact, NOBODY has, and yet the question has been down-voted! How is that even possible? Posing a novel experiment which might invalidate the populist view considered harmful?

TFA's author is not active in the field of advancing man's ability to recreate live music more convincingly, AFAIK; he writes codecs. He believes people shouldn't purchase 192kHz downloads. He's certainly right that most consumers won't be able to tell the difference with their current equipment. But he makes no mention of the interaural time difference in human auditory perception, so he's already not telling the whole story. There is more to learn here, folks, and down-voting a question is an embarrassing failure of these forums. Why aren't posts in support of music piracy down-voted (read above)?


Regarding your question about the wall of tone generators:

I imagine a pair of microphones inserted into the ear canals of a dummy head should be able to capture what a real person sitting there would. Once the signals are captured, and assuming perfect frequency response of the microphones and noiseless transfer of the signals to an ADC, 44.1kHz would absolutely be enough to perfectly encode the 20kHz frequencies.

I put emphasis on the frequency response of the microphones. They’d have to match the frequency response of the human ear. Meaning they would not capture ultrasonics, just like our ears don’t.

I am less sure of the math behind bit-depth and how it relates to our own dynamic range. I also agree that if you intend to transform the recording, mix it with others, etc, then go ahead and encode at higher bitrates and even with a higher frequency (both mic and ADC sampling). But the final product, being sold for direct listening, need not be sampled at a rate that’s beyond our hearing. No more than a video recording should be able to encode colors in the ultraviolet spectrum (A better analogy than my previous one)
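On the tone-generator question, here is a minimal sketch of why arrival-time differences much shorter than one sample period are still captured at 44.1kHz. It is idealized: a steady, perfectly band-limited 20kHz tone, no noise, no quantization, and a 5 microsecond delay between the two channels that I picked for illustration. The function names are hypothetical.

```python
import math

FS = 44100.0        # sample rate in Hz
TONE_HZ = 20000.0   # tone frequency, below the 22.05 kHz Nyquist limit
N = 441             # 441 samples at 44.1 kHz hold exactly 200 cycles of 20 kHz

def sample_tone(delay_s):
    """Sample a 20 kHz sine that arrives delay_s seconds late."""
    return [math.sin(2 * math.pi * TONE_HZ * (n / FS - delay_s)) for n in range(N)]

def estimate_delay(samples):
    """Recover the arrival delay from the samples by measuring the tone's phase."""
    w = 2 * math.pi * TONE_HZ / FS
    i_sum = sum(x * math.cos(w * n) for n, x in enumerate(samples))
    q_sum = sum(x * math.sin(w * n) for n, x in enumerate(samples))
    phase_lag = math.atan2(-i_sum, q_sum)
    return phase_lag / (2 * math.pi * TONE_HZ)

left = sample_tone(0.0)
right = sample_tone(5e-6)   # 5 microseconds late, under a quarter of a sample period
itd = estimate_delay(right) - estimate_delay(left)
print("recovered inter-channel delay (s):", itd)
```

The sample period at 44.1kHz is about 22.7 microseconds, and the recovered delay still matches the 5 microseconds that was put in. Timing resolution between channels is not limited to whole samples as long as the signal is within the band limit.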


>TFA's author is not active in the field of advancing man's ability to recreate live music more convincingly, AFAIK; he writes codecs

As your other questions have been addressed by others, I simply would like to point out that this seems to be quite an arrogant stance to have.

The development of codecs has a lot to do with understanding of how the humans perceive sound, and how to effectively encode and reproduce sounds - which is useful even if you personally never listen to anything but analog recordings on analog systems.

However, we do live in a digital world, and one where codecs are a necessity. Codecs made recording, sharing, and distributing digital media at all possible - and now, they are making it possible to create better recordings by any metric you choose.

Consider this: the bandwidth and space savings that codecs give you allow you to record more data with the same equipment at the highest settings. That's why I don't have to worry about running out of memory when I want to record 4-channel surround sound on my Zoom H2N (something that definitely goes towards a more faithful reproduction of being there than, say, bumping the frequency to 192kHz, which, incidentally, is the point of the article).

Unless you are there to record every live show, we'll have to rely on other people doing that - and guess what, they'll use codecs! How do I know that - that's because I do, they do, and the absolute majority of live show recordings that I've seen were not available in lossless formats. For that matter, good codecs contribute directly to the quality of the sound you'll hear.

Therefore, advancing the codecs does advance man's ability to recreate live music more convincingly.

So please, pause before dismissing other people's work.

>But he makes no mention of the interaural time difference in human auditory perception

He also doesn't mention how long it would take from Earth to Mars on a rocket, or the airspeed velocity of an unladen swallow. If you want to make a claim that this is somehow relevant to the question, you need to argue why, with sources - or simply ask the author, who might just answer.

>There is more to learn here, folks, and down-voting a question is an embarrassing failure of these forums. Why aren't posts in support of music piracy down-voted (read above)?

Not all questions are created equal. Your last question is an example of one that rightly deserves to be downvoted, as it contributes nothing to the discussion (of whether 192kHz really does anything for us), appeals to emotion, and derails the conversation off the topic. Please don't do that.


> Therefore, advancing the codecs does advance man's ability to recreate live music more convincingly.

Only where bandwidth and storage are constrained. If we're trying to push the state of the art, it's not going to be with a Zoom H2N.

The best music reproduction systems use lossless compression. Psychoacoustic compression does NOT get us closer to the original performance. I'm stating this as someone who gets 5 out of 5 correct, every time, on the NPR test:

http://www.npr.org/sections/therecord/2015/06/02/411473508/h...

(I'm ignoring the Suzanne Vega vocal-only track due to both its absence of musical complexity and use as test content during the development of the MP3 algorithm.)

While I appreciate xiphmont's codec work, I am dismissive of his open attempt to steer research and commerce in this area.

Why is his article posted as "neil-young.html"? Is that really fair?

> If you want to make a claim that this is somehow relevant to the question, you need to argue why, with sources - or simply ask the author, who might just answer.

Please see chaboud's excellent post above, referencing the work of Georg von Bekesy.

> Your last question is an example of one that rightly deserves to be downvoted

You're referring to my array-of-20kHz-tone-generators experiment? Sorry I don't know the answer, but I haven't done the experiment myself; I was hoping someone here had! Where's the appeal to emotion, though? If the experiment shows a higher sample rate is necessary (that's the whole point of the experiment) it's germane.


>Only where bandwidth and storage are constrained

I.e. everywhere in this universe. There is no such thing as unlimited bandwidth/storage. Gains that codecs give allow us to record information that otherwise would be lost.

>If we're trying to push the state of the art, it's not going to be with a Zoom H2N.

I wish I could see the future so clearly!

I only have guesses, and my guess tells me that audio captured from 10 Zoom H2N's at 48kHz will store more information than audio from a single microphone at 480kHz. Current "state of the art" seems to use fewer channels. An advance in the state of the art in the direction of utilizing more sources seems more than feasible to me.

>Psychoacoustic compression does NOT get us closer to the original performance

I think you have missed my point. Lossy-compressed data is obviously not going to be better than the uncompressed source.

However, we do not live in a world of infinite resources. Given the constraints, compression offers new possibilities.

At the same space/bandwidth, you can have, e.g.:

- uncompressed audio from a single source

- compressed audio from 5x as many sources

- compressed audio from 2x sources, plus some other data which affects the perception of the sound (???)

This plays right into your question "Why are we only considering bitrate/frequency?" - we don't. Compression offers more flexibility in making other directions viable.

This is why I believe that codec research is important for advances of the state of the art.
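To put rough numbers on the fixed-budget trade-off above, here is a back-of-the-envelope sketch. The 128 kbps per-channel lossy bitrate is purely an assumed figure for illustration, not a measurement of any particular codec or recorder.

```python
def pcm_bitrate(sample_rate_hz, bits_per_sample, channels=1):
    """Raw (uncompressed) PCM bitrate in bits per second."""
    return sample_rate_hz * bits_per_sample * channels

# Budget: one uncompressed 16-bit / 48 kHz mono channel.
budget_bps = pcm_bitrate(48000, 16)

# Assumed lossy bitrate per channel (hypothetical figure).
ASSUMED_LOSSY_BPS_PER_CHANNEL = 128_000
lossy_channels_that_fit = budget_bps // ASSUMED_LOSSY_BPS_PER_CHANNEL

# Ten channels at 48 kHz cost the same raw bitrate as one channel at 480 kHz.
ten_at_48k = pcm_bitrate(48000, 16, channels=10)
one_at_480k = pcm_bitrate(480000, 16, channels=1)

print(budget_bps, lossy_channels_that_fit, ten_at_48k == one_at_480k)
```

Under that assumption, the space taken by one uncompressed channel holds several compressed ones, which is the flexibility being argued for.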

>I am dismissive of his open attempt to steer research and commerce in this area.

In what area exactly? What research? He is not "steering research", he is educating the less knowledgeable general public. So far, your dismissive attitude can also be applied verbatim to anyone who explains why super-thick-golden-cables from MonstrousCable(tm) are a waste of money.

>> Your last question is an example of one that rightly deserves to be downvoted

>You're referring to my array-of-20kHz-tone-generators experiment?

No, I was referring to this:

>Why aren't posts in support of music piracy down-voted (read above)?


xiphmont's primary goal appears to be to stop Neil Young from selling 24/192 audio to the general public; that's why he called the page neil-young.html. Sure, few buyers have the ears or equipment to pursue anything beyond the compact disc.

The problem is that many readers of neil-young.html will come away thinking they understand human hearing and digital sampling, when in fact the article is far too sparse on details to understand either; there is no discussion of how sounds are located in 3D space, or of how phase information is recovered. It is amazing that you can completely cover one ear, rub your fingers together behind your head and precisely pinpoint where your fingers are. It is also amazing that "Sampling doesn't affect frequency response or phase" but xiphmont doesn't explain this at all.

And then there's this lovely quote:

"It's true enough that a properly encoded Ogg file (or MP3, or AAC file) will be indistinguishable from the original at a moderate bitrate."

which is provably wrong. I can very reliably pick the uncompressed WAV each try when compared against 320kbps MP3.

My attitude is in support of furthering research in the area of live sound reproduction. As I've said, we are VERY far away right now. It is foolish to believe we understand human musical perception completely today. We cannot even replicate a simple cymbal strike with today's recording and playback technology.

I would encourage the curious to stand in the center of an outdoor arc of 100 horn players, like this (feel free to skip first 48 seconds):

https://www.youtube.com/watch?v=2EDIDCdy5Es

Once you experience that live, try to figure out how to replicate the input to your two ears. You can't, without 100 brass players.

Interestingly, these two examples of trumpet and cymbal have significant ultrasonic frequency content:

https://www.cco.caltech.edu/~boyk/spectra/spectra.htm

I don't believe it's a coincidence.


Did you think about your experiment? What's your own conclusion, and what other conclusions would you expect others to make? "Just think about it" is not a very convincing argument.

The microphones would probably be the bottleneck in reproducing the sound. If your microphone setup doesn't perfectly model the ears of the listener (with respect to how the headphones are worn and their frequency response), you're not going to be able to plausibly reproduce the whole sound field using a stereo recording. That has little to do with sample rate, though.


In my own experience, the vast majority of consumer audio gear doesn't even take full advantage of 16/44.1. I've made live recordings of my classical chorus, and if I want it to be listenable on typical consumer gear, I have to apply a significant amount of dynamic range compression - otherwise, it will be either too quiet, or it won't handle the peaks when played at concert volume.
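For reference, the theoretical dynamic range that a given bit depth offers can be computed directly. This sketch uses the standard textbook figures for ideal, undithered linear PCM; real gear and real rooms fall well short of them.

```python
import math

def dynamic_range_db(bits):
    """Ratio of full scale to one quantization step, in dB."""
    return 20 * math.log10(2 ** bits)

def sine_sqnr_db(bits):
    """Signal-to-quantization-noise ratio for a full-scale sine (6.02*N + 1.76 dB)."""
    return 6.02 * bits + 1.76

for bits in (16, 24):
    print(bits, round(dynamic_range_db(bits), 2), round(sine_sqnr_db(bits), 2))
```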

That being said, I'm using quite a bit less compression than the loudness-war-type mastering that is all too typical with pop music.


That's the funny thing about those tools -- applied properly, it can be a great asset. Applied carelessly, you end up with clipping and brickwalls of audio.


> have you ever walked by a bunch of musical sound coming out of a room and thought to yourself, "wow, those live musicians sound great" only to discover it was just a stereo playing?

Yes, I have. With the right combination of speaker setups and hi fi recordings, it is possible to fool yourself into believing there are musicians there.


Note that sitting in front of a speaker setup playing a carefully selected recording is not the same as the scenario above. We do not yet have the technology to fool someone walking by a room, especially for orchestral works or even an acoustic drum kit.


But I have experienced exactly this. Not an orchestra, but a recording of a band. I guess you can split hairs about the definition of "fool", but I have been fooled as you describe.


Well that's exciting then! What was the setup??


This is probably the best reason why people should, at least once, go see a symphony!


> and yet, we are nowhere near being able to electronically reproduce a live acoustic music performance.

That has a lot more to do with the spatial component of the audio than anything else.

Unfortunately, surround sound sufficient to really reproduce acoustic fields (and not just sound effects ping ponging around) requires more cost and concessions in the listening room than many are willing to tolerate.

So long as people continue to get the engineering wrong and think the sampling rate and bit-depth have anything to do with it we'll probably continue to see the market invest in the wrong solutions.


This article makes the same mistake that is frequently made with video.

So, let’s look at a similar issue with video. Your display is likely only 720p, or 1080p, but a 4K video on youtube will still look a lot better, although technically it should have no visible difference.

But the reality is, we don’t get uncompressed video, or uncompressed audio.

We have a choice between audio or video compressed with lossy codecs, at 16bit/44kHz or 16bit/96kHz, or 4:2:0 video at 1080p or 4:2:0 video at 4K.

And just like you need 4K 4:2:0 mp4 video to get even close to the quality of uncompressed 1080p 4:4:4 video, you also need far higher sampling rate and depth of highly compressed audio to get a quality similar to 16bit/44.1kHz PCM.

That’s the real reason why 24bit/192kHz AAC downloads were even discussed.


Suppose you have a 5 Mbps data budget, 1080p display, and 4k source material. You will get better quality by first downsampling the 4k to 1080p and then compressing and distributing the result. If you compress and distribute the 4k, followed by downsampling to display at 1080p, you cannot recover the color and/or motion information that was lost in order to fit all of those pixels into 5 Mbps.

However, if you have a 20 Mbps budget for the 4k to account for having 4 times as much original data, then there shouldn't be much of a difference in the downsampled 1080p video (ignoring peculiarities of the codec).
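The budget argument above comes down to bits available per pixel. A rough sketch, assuming 30 frames per second and treating bits per pixel as a crude proxy for quality (real codecs do not scale this linearly):

```python
import math

def bits_per_pixel(bitrate_bps, width, height, fps):
    """Average number of encoded bits available for each pixel of each frame."""
    return bitrate_bps / (width * height * fps)

FPS = 30  # assumed frame rate

bpp_1080p_5mbps = bits_per_pixel(5e6, 1920, 1080, FPS)
bpp_4k_5mbps = bits_per_pixel(5e6, 3840, 2160, FPS)
bpp_4k_20mbps = bits_per_pixel(20e6, 3840, 2160, FPS)

print(bpp_1080p_5mbps, bpp_4k_5mbps, bpp_4k_20mbps)
```

At 5 Mbps the 4k stream gets a quarter of the bits per pixel that the 1080p stream gets; at 20 Mbps the two are even.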

All this is not very relevant to the audio issue being discussed. It would be relevant if it were physically impossible to perceive the difference between 1080p and 4k video, and if watching 4k video potentially caused optical illusions. In that case, the only reason to prefer the 20 Mbps 4k stream would be if you planned to edit, mix, or zoom around in the video instead of simply watching it.

When it comes to audio, since size isn't as much of a concern as video, in most cases I would say "maybe I'll want to edit it someday" is strong enough reason to get the 24/192 material at a correspondingly high bitrate if it's available.


Of course your theory is quite sound, but I will point out that in practice most 4k streaming content uses an HEVC codec, while most 1080p streaming content uses an AVC codec, so you'll likely have much better results on your data budget with the 4k signal even if significantly downsampled to your display.


But that’s the exact issue that this entire article is missing!

It’s all about peculiarities of the codec!

The issue at hand is Apple selling 24bit/192kHz versions of lossy AAC compressed files, compared to 16bit/44.1kHz versions of AAC files.

And the issue I was comparing with video was the same – with video, codecs enforce Chroma Subsampling, where the resolution for color is half that of the actual imagery.

In the same way, AAC and mp3 heavily reduce the bandwidth for the upper half of the frequency spectrum, spending like 90% of their available bandwidth on the lower half (with 44.1kHz, they prioritize the range between 4 and 8kHz, specifically, where speech is).

The entire question is whether, with a codec that specifically cuts away the lower and upper parts of the frequency spectrum, widening the frequency spectrum can improve quality. And yes, it does. Apple is selling AAC, not WAV. Which makes the entire article useless.

Yes, we should all focus on replacing 16bit/44.1kHz AAC with 16bit/44.1kHz FLAC instead of 24bit/192kHz AAC, and we all should focus on replacing 4:2:0 1080p mp4 with 4:4:4 1080p mp4 instead of 4:2:0 4K mp4 (the chroma subsampling issue I mentioned). But that’s not the reality we live in, and given the choice between 16bit/44.1kHz AAC and 24bit/192kHz AAC, I’ll choose the second.


The article was arguing about 24bit/192kHz digital audio, not about what codecs do or don't do with it. If you need 24bit/192kHz with both inaudible frequencies and inaudible dynamic steps to make 16bit/48kHz music sound better at the same bitrate, then the parameters you've given to your codec suck.

Same for video. YouTube very likely allocates more bandwidth for 4k video than would be required for equivalent quality at lower resolutions, assuming that you are interested in higher overall quality (e.g. less artifacts) if you go through the trouble of 4k video. That's a conscious choice, not a technical necessity.


With video it’s not about bitrate, but Chroma Subsampling.

Basically, mp4 and webm video only encode the brightness channel Y at full resolution, and the color channels Pb Pr at half resolution. You can’t have mp4 or webm video without Chroma Subsampling, it’s defined in the codec standards.

Audio codecs do something very similar, cutting off a large percentage of the higher and lower frequencies. mp3 (and AAC) for example allocate almost the entire space for the frequencies around 4 to 8kHz, and then drop a certain percentage of the upper frequencies entirely.

The article talks about uncompressed audio, but the topic he responds to is Apple choosing to sell 24bit/192kHz AAC lossy compressed audio. The author of the article is responding to a business decision, which has nothing to do with the actual topic of the article.


Chroma subsampling is optional in AVC and HEVC and IMO we should stop using it for 480p or smaller video. Not because the information is important - color is usually bandlimited - but because all the decoders use the worst resampling algorithm to display it.


Funny anecdote, I first became aware of this while watching Aliens. When the aliens cut the power and the set is bathed in the red of the emergency backup lights, the resolution of the video appears much worse.


> Your display is likely only 720p, or 1080p, but a 4K video on youtube will still look a lot better, although technically it should have no visible difference.

Not at all, bitrate differences are audible.

> And just like you need 4K 4:2:0 mp4 video to get even close to the quality of uncompressed 1080p 4:4:4 video, you also need far higher sampling rate and depth of highly compressed audio to get a quality similar to 16bit/44.1kHz PCM.

…how so?


> Not at all, bitrate differences are audible.

Even with the same bitrate, you’ll need 4K video to get quality comparable to uncompressed 1080p.

For example, mp4 defines that the brightness channel Y should be stored with full resolution, and the color channels Pb Pr should be stored at half resolution. So in a 4K mp4 video at lossless bitrate, the actual colors will only be stored at 1080p. This functionality is called Chroma Subsampling.

Audio Codecs do something very similar, causing these exact issues.
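The plane sizes behind the chroma subsampling point can be worked out directly. This is a sketch of the usual 4:4:4 / 4:2:2 / 4:2:0 layouts only; the function names are made up for illustration and it says nothing about what any particular container or codec mandates.

```python
# Horizontal and vertical chroma divisors for common subsampling schemes.
CHROMA_DIVISORS = {"4:4:4": (1, 1), "4:2:2": (2, 1), "4:2:0": (2, 2)}

def plane_sizes(width, height, scheme):
    """Return (luma_size, chroma_size) as (width, height) tuples."""
    dx, dy = CHROMA_DIVISORS[scheme]
    return (width, height), (width // dx, height // dy)

def samples_per_frame(width, height, scheme):
    """Total stored samples: one luma plane plus two chroma planes."""
    luma, chroma = plane_sizes(width, height, scheme)
    return luma[0] * luma[1] + 2 * chroma[0] * chroma[1]

luma_4k, chroma_4k = plane_sizes(3840, 2160, "4:2:0")
luma_hd, chroma_hd = plane_sizes(1920, 1080, "4:4:4")

print("4K 4:2:0 chroma plane:", chroma_4k)
print("1080p 4:4:4 chroma plane:", chroma_hd)
```

With 4:2:0, the chroma planes of a 4K frame have the same dimensions as the chroma planes of a 1080p 4:4:4 frame, and a 4:2:0 frame stores half as many samples as a 4:4:4 frame of the same size.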


It doesn't matter what resolution the color is stored at, as long as the high frequencies are the same as the Y channel. You won't have lost anything.

Of course, cross-channel intra prediction would work better[1] but 4:2:0 is pretty good quality considering you can throw out 3/4 the pixels.

[1] https://people.xiph.org/~unlord/spie_cfl.pdf


That’s again the same issue. Nice theory, and theoretically it’s true, but all real algorithms also subsample high frequencies, and scale them back up with nearest-neighbor.

The same issue happens with audio. Nice theory, completely broken realistic implementations.


It's true that nearest neighbor is bad, but the important part is the YUV->RGB conversion. It distributes the missing high frequencies out of the Y channel into all three RGB images.
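A small sketch of that last point: two neighboring pixels share one (subsampled) chroma value but differ in luma, and after conversion the luma step shows up in R, G and B alike. The full-range BT.601 coefficients are an assumption here; actual video frequently uses BT.709 and limited range.

```python
def ycbcr_to_rgb(y, cb, cr):
    """Full-range BT.601 conversion; cb and cr are centered on zero."""
    r = y + 1.402 * cr
    g = y - 0.344136 * cb - 0.714136 * cr
    b = y + 1.772 * cb
    return r, g, b

# One chroma sample shared by both pixels, as after chroma subsampling.
shared_cb, shared_cr = 0.1, -0.05

dark = ycbcr_to_rgb(0.30, shared_cb, shared_cr)
bright = ycbcr_to_rgb(0.70, shared_cb, shared_cr)

# Per-channel difference across the edge between the two pixels.
edge = tuple(hi - lo for lo, hi in zip(dark, bright))
print(edge)
```

Each of the three differences equals the luma step, so the sharp edge carried by Y survives in every RGB channel even though the chroma was stored at lower resolution.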



