I have a 96kHz/24-bit interface and ATH-M30X headphones, and I can tell a difference between at least some 24-bit FLAC files and 16-bit highest-quality-possible MP3s. I was mixing my own music and the difference was quite obvious to me. The notable thing was that drum cymbals seemed to have a bit less sizzle and such.
Now that being said, if I hadn't heard the song a million times in its lossless form from trying to mix it, I probably wouldn't have noticed, and even then it didn't actually affect my "experience".
I'm one of those guys that downloads vinyl rips as well, but I do that mostly just to experience the alternative mastering, not that I think it's higher quality or anything. (though I have heard a terrible loudness-war CD master that sounded great on vinyl with a different master)
They're pointless for playback.
That is really the central issue. It's much like imaging since the time of Ansel Adams: the sensor can capture more dynamic range than the human eye can experience. The producer may have use for that range when editing, but the audience will never know what was -- may have been -- missed. And we're not talking about limits of reproduction. We're talking about both the instantaneous and the absolute upper and lower bounds of the human sensor.
That's not really true. Dynamic range refers to the difference between the biggest value that isn't clipped and the smallest value that isn't rounded to zero. The human eye is a logarithmic detector, cameras are linear. The only reason HDR is a thing is because cameras DON'T have enough dynamic range.
However, that's neither here nor there because the human eye is not the real bottleneck here. The media we use to display photos are. Printed photographs have approximately six stops of DR; typical monitors have eight. Modern cameras capture much more information than can be displayed, and the raw sensor data must be tone-mapped either by the camera software or in post-processing to produce a viewable image. There is a lot of latitude in deciding how to map 2^14 discrete values of input to a mere 2^8 values of output.
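To make that mapping concrete, here's a rough sketch of one way a simple gamma curve could squeeze 14-bit linear sensor values into 8-bit output (the curve and numbers are purely illustrative, not any camera's actual pipeline):

    import numpy as np

    def tone_map(raw14, gamma=1/2.2):
        x = raw14.astype(np.float64) / (2**14 - 1)  # normalise 14-bit linear values to 0..1
        y = x ** gamma                              # simple gamma curve, standing in for real tone-mapping
        return np.round(y * 255).astype(np.uint8)   # quantise to 8-bit display values

    raw = np.array([0, 16, 256, 4096, 16383])
    print(tone_map(raw))  # 2^14 possible inputs collapse into 2^8 outputs; the curve decides which tones keep the detail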
Nice computer displays (and mobile device displays) without glare and with the brightness cranked all the way up can get up to something like 9.5–10.5 stops.
Of course, that range still pales in comparison to the contrast between shadows and highlights on a sunny day, which can be more like 16+ stops.
First, as msandford pointed out, the human eye has significantly better dynamic range than image sensors. Technically, our eyes have a lower range at any specific instant, but due to the way our eyes work we effectively see upwards of 20 stops of dynamic range. The best sensors available (in medium format cameras, pro-DSLRs, etc) can only capture 14-15 stops.
Second, some black and white films have a better dynamic range than digital sensors, so it's also not the case that digital is strictly better. 18-20 stops isn't unheard of for some types of film.
If I AirPlay a song from my iPhone and have the volume at 50% set in software, then a few extra bits can help. Not sure if it makes a noticeable difference, but it's a digital mixing scenario occurring at playback. If you play at extremely low volume it should be noticeable.
I see it this way: let's say 16 bits is needed to represent the entire discernible dynamic range between the threshold of hearing and the threshold of pain. If you turn the volume to 50%, then you throw out 8 bits, but you also only need to represent 8 bits' worth of hearing range.
With a 24-bit stream you can easily give up a few bits without losing dynamic range.
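Just to make that concrete, a toy example (hypothetical numbers, and a flat 50% amplitude scale rather than whatever curve the player actually uses):

    sample = 12345                     # one 16-bit PCM sample
    scaled = sample * 0.5              # software volume at half amplitude -> 6172.5
    out16 = round(scaled)              # a 16-bit output path has to round the .5 away
    out24 = round(scaled * 256) / 256  # a 24-bit output path can carry it exactly
    print(out16, out24)                # 6172 vs 6172.5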
The basic problem: the quieter a sound or detail gets, the fewer bits of resolution are used to represent it.
In 16-bit recording, there simply aren't enough bits to represent very low level details without distorting them with a subtle but audible crunchy digital halo of quantisation noise.
In a 24-bit recording, there are.
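A quick way to see this claim in numbers (plain rounding with no dither, so it somewhat overstates the real-world case):

    import numpy as np

    t = np.arange(48000) / 48000.0
    quiet = 10**(-60/20) * np.sin(2 * np.pi * 440 * t)   # a 440 Hz tone 60 dB below full scale

    def quantise(x, bits):
        steps = 2**(bits - 1)
        return np.round(x * steps) / steps

    for bits in (16, 24):
        err = quantise(quiet, bits) - quiet
        print(bits, "bits, error RMS:", round(20*np.log10(np.sqrt(np.mean(err**2))), 1), "dBFS")
    # the 16-bit error floor sits roughly 48 dB above the 24-bit one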
Talking about dynamic range completely misses the point. It's the not the absolute difference between the loudest and quietest sounds that matters - it's the accuracy with which the quieter sounds are reproduced.
This is because in a studio, the 0dB full-scale meter redline is calibrated to a standard voltage reference, and both consumer and professional audio have equivalent standard levels for the loudest level possible.
These levels don't change for different bit depths, and they're used on both analog and digital equipment. (In fact they've been standard for decades now.)
This is why using more bits does not mean you can "reproduce music with a bigger dynamic range" - not without turning the volume up, anyway.
What actually happens is that the maximum possible volume of a playback system stays the same, but quieter sounds are reproduced with more or less accuracy.
In a 16-bit recording quiet sounds below around -50 dB have 1-8 bits of effective resolution, which is nowhere near enough for truly accurate reproduction. (Try listening to an 8-bit recording to hear what this means.)
You might think it doesn't matter because they're quiet. Not so. 50dB is a long way from being inaudible, ears can be incredibly good at spectral estimation, and your brain parses spectral content and volume as separate things.
There's a wide range between "loud enough to hear" and "too loud" and 24-bit covers that whole range accurately. 16-bit is fine for louder sounds, but the quieter details just above "loud enough to hear" get audibly bit-crushed.
The effect isn't glaringly disturbing, and adding dither helps make it even less obvious. But it's still there.
24-bit doesn't need tricks like dither - because it does the job properly in the first place.
Now - whether or not commercial recordings have enough musical detail to take full advantage of 24-bits is a different question. For various reasons - compression, mastering, cheapness - many don't.
But if you have any kind of aural sensitivity, you really should be able to A/B the difference between a 24-bit uncompressed orchestral recording and a 16-bit recording using an otherwise identical studio-grade mixer/mike/recorder/speaker system without too much difficulty.
You are slightly confused. (It may help to remember that a decibel always refers to a ratio, so the setting of your volume knob is not important.) Greater bit depth does allow for greater dynamic range; this stems directly from the definition of dynamic range. 16-bit and 24-bit audio have theoretical dynamic ranges of:
10 * log10((2^16)^2) ~ 96 dB
10 * log10((2^24)^2) ~ 144 dB
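The arithmetic, for anyone who wants to check it:

    import math
    for bits in (16, 24):
        print(bits, "bits:", round(10 * math.log10((2**bits)**2), 1), "dB")
    # 16 bits: 96.3 dB, 24 bits: 144.5 dB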
The idea that having bits in excess of this amount will somehow result in the perception of a smoother or more accurate sound is fallacious. Even at maximum playback volume, this information will exist well below the noise floor and will simply not be perceived. In fact, this information will likely exist well below the noise floor of the recording studio and thus, in some sense, will not even be recorded.
"Talking about dynamic range completely misses the point."
There is nothing magic about 24 bits here. Record something with 48 bits but set up your equipment all screwy so you're only actually using the first 8 bits... and you've got an effectively 8-bit recording.
In real world applications the codec is giving you trouble with the low amplitude stuff, not the quantizer. Not that in realistic situations your equipment is likely to be able to generate this cleanly anyway.
"24-bit doesn't need tricks like dither - because it does the job properly in the first place."
On playback, the issue goes the other way around. If you've mastered things correctly you'll be using the available dynamic range of the output in such a way that the information content of your signal is well represented. This is sufficient at CD rates for all practical listening scenarios.
As someone who both records/mixes albums and plays live instruments, I can tell you a live instrument in the room sounds utterly different from any recording. Not necessarily worse, just different. The pursuit of "accuracy" in audio playback is childish and naive. The sound of a recording is a function of technical limitations, compromises, and aesthetic decisions as much as it is a product of the raw source sounds. Don't make it sound accurate, make it sound GOOD! And that usually means a lot of compression, and often deliberate distortion.
The article is still correct, just like it always was.
Ironically, most of your analysis is also correct. Somewhere in your understanding though, you're leaping sideways to an incorrect conclusion.
>The basic problem: the quieter a sound or detail gets, the fewer bits of resolution are used to represent it.
So far so good, but you're about to go wrong again once you start thinking in terms of stairsteps and boxes and looking instead of hearing.
Back to the bits.
What lower amplitude (and fewer used bits) means is that the sound, as represented, is not as far above the noise floor as a full-amplitude sound. The digital noise floor is completely analogous to, eg, the noise floor of analog tape. If you use a dithered digital representation, you get something that behaves exactly as analog does. You hear and perceive both the same way.
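If it helps, here's a minimal sketch of what "dithered" means in practice, using simple TPDF dither (real mastering tools use fancier, noise-shaped dither):

    import numpy as np

    def quantise_16(x, dither=True):
        steps = 2**15
        y = x * steps
        if dither:
            # TPDF dither: two uniform randoms sum to a triangular +/-1 LSB distribution
            y = y + np.random.uniform(-0.5, 0.5, y.shape) + np.random.uniform(-0.5, 0.5, y.shape)
        return np.round(y) / steps

    t = np.arange(48000) / 48000.0
    quiet = 10**(-80/20) * np.sin(2 * np.pi * 440 * t)   # a tone 80 dB below full scale
    for d in (False, True):
        err = quantise_16(quiet, d) - quiet
        print("dithered" if d else "plain", round(20*np.log10(np.std(err)), 1), "dBFS error")
    # without dither the error is correlated with the signal (distortion);
    # with dither it becomes a steady, signal-independent hiss, i.e. a noise floor like tape's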
>In 16-bit recording, there simply aren't enough bits to represent very low level details without distorting them with a subtle but audible crunchy digital halo of quantisation noise.
On an audio tape, the magnetic grains are just too large to represent very low level details without distorting them with a subtle but audible crunchy halo of analogue distortion and hiss.
In a 24 bit recording, the noise you mention is still there! It's just shifted down [theoretically] 8 bits or -48dB. That's the only difference. The noise floor is lower.
[In reality, 24 bit isn't. Most recordings don't even hit a full 16 bits, and no recordings, unless they're mathematically rendered, can get deeper than about 21 bits. There is no such thing as a 24-bit audio ADC/DAC that delivers 24 bits. The very best available today are about 21 bits of signal + 3 bits of noise.]
So the difference in playback between 16 bits and 24 bits is about 5 actual bits. If you're complaining about soft sounds in a 16-bit recording 'not having enough detail' because they're down at, say, 3 bits of resolution, are you saying it's all fixed by using 8? Aren't 8 bits woefully too few for any kind of quality sound?
(I hope at this point, you realize you're barking up an incorrect tree)
If you're following me so far, we can continue, but I expect even this much is going to require more conversation.
This is only minor nitpicking, but the standard 0dB levels for professional audio (0dB reference at +4dBu == 1.23Vrms) and consumer audio (0dB reference at -10dBV == 0.32Vrms) are not meant to indicate the maximum ("loudest level possible") but just serve as a reference point, e.g. for the 1kHz sine you inject when setting up your gain throughout the signal chain. On most studio gear, you'll easily have +15dB headroom left.
AD/DA converters haven't really standardized on a full-scale level and there are quite a few different definitions in use: https://en.wikipedia.org/wiki/DBFS. Most "line level" ADC/DACs will have switches or jumpers to select between two or so settings. You'll choose them so that you are not likely to clip your ADC, and will only playback on your DAC with an appropriate level trimmed to not clip your analog gear.
In a 16-bit master, a noise shaping function is applied during down-conversion, by which quantization noise will be re-distributed so that most of the noise energy goes to the high frequencies (>15k) where it is completely inaudible.
For a good example of such a recording, see Ahn-Plugged (Ahn Trio, 2000, Sony BMG Masterworks). Fire up a good spectrum analyzer. You'll find the noise floor is well below -110 dB throughout most of the spectrum, even though it's 'just' a 16-bit CD.
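For the curious, the core trick is roughly this (a plain first-order error-feedback shaper, far simpler than the psychoacoustically weighted curves real mastering tools use):

    import numpy as np

    def noise_shaped_quantise_16(x):
        steps = 2**15
        out = np.empty_like(x)
        err = 0.0
        for n, s in enumerate(x * steps):
            v = s - err          # subtract the previous sample's quantisation error
            q = np.round(v)
            err = q - v          # remember the new error and feed it into the next sample
            out[n] = q / steps
        return out
    # the error now has a (1 - z^-1) high-pass shape: the total noise power is unchanged,
    # but most of it is pushed up to high frequencies where it's far less audible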
Besides, MP3 [audio] compression has difficulty handling specific kinds of material (e.g. sharp attacks), and it may manifest artifacts independently of the bitrate; MP3, AFAIK, also has a ceiling of 320 kbps within the standard specification, which certainly doesn't help.
Secondly, I'm not sure if you process the MP3s further (when you refer to mixing), but if you do, you're definitely going to create noticeable artifacts that weren't there in the unprocessed MP3 form.
It's possible you are just hearing the difference between codecs. You'd have a fairer comparison with 24-bit vs 16-bit FLAC.
Even 128 kbps MP3s render cymbals better.
Yes, non-linear effects can be sample-rate sensitive. However, this really means that their internal model is aliasing and not faithfully simulating an infinite-sample-rate system.
In an ideal world, effects that needed more sample rate would internally upsample/downsample (or be constructed in a way that they didn't need to). Then they would behave consistently across rates, though doing this would waste CPU cycles.
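Roughly what that internal upsample/downsample looks like, sketched with scipy's polyphase resampler and a hard clipper standing in for "the effect":

    import numpy as np
    from scipy.signal import resample_poly

    def clip(x):                          # a stand-in non-linear effect
        return np.clip(x, -0.3, 0.3)

    def clip_oversampled(x, factor=4):
        up = resample_poly(x, factor, 1)  # upsample so the distortion harmonics have room above the original band
        return resample_poly(clip(up), 1, factor)  # filter + downsample: out-of-band harmonics are removed instead of aliasing

    fs = 48000
    t = np.arange(fs) / fs
    x = 0.9 * np.sin(2 * np.pi * 5000 * t)
    # clip(x) folds its upper harmonics back into the audible band;
    # clip_oversampled(x) stays much closer to what a true high-rate system would do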
In any case, the article is all about distribution. Having excess rate in mastering is cheap and harmless, and, for these reasons, can be practically pretty useful.
The difference you hear is the difference between FLAC's lossless format and MP3's lossy format; it has nothing to do with 16-bit versus 24-bit.