
I can't hear a difference between 96kHz/44kHz in its raw form. However, I can tell the difference from effects in audio mixing. The extra detail can really make a difference in how well an audio effect VST works.

I have a 96kHz/24-bit interface that I use and ATH-M30X headphones, and I can tell a difference between at least some 24-bit FLAC files and 16-bit highest-quality-possible MP3s. I was mixing my own music and the difference was quite obvious to me. The notable thing was that drum cymbals in the MP3s seemed to have a bit less sizzle and such.

Now that being said, if I hadn't heard the song a million times in its lossless form from trying to mix it, I probably wouldn't have noticed, and even then it didn't actually affect my "experience".

I'm one of those guys that downloads vinyl rips as well, but I do that mostly just to experience the alternative mastering, not that I think it's higher quality or anything. (though I have heard a terrible loudness-war CD master that sounded great on vinyl with a different master)




The article is pretty clear about this too - higher bit depths and sampling rates can be quite useful in mixing and recording situations.

They're pointless for playback.


> higher bitdepths and sampling rates can be quite useful in mixing

That is really the central issue. It's much like imaging since the time of Ansel Adams: the sensor can capture more dynamic range than the human eye can experience. The producer may have use for that range when editing, but the audience will never know what was -- may have been -- missed. And we're not talking about limits of reproduction. We're talking about the human sensor's instantaneous and absolute upper and lower bounds.


> That is really the central issue. It's much like imaging since the time of Ansel Adams: the sensor can capture more dynamic range than the human eye can experience.

That's not really true. Dynamic range refers to the difference between the biggest value that isn't clipped and the smallest value that isn't rounded to zero. The human eye is a logarithmic detector, cameras are linear. The only reason HDR is a thing is because cameras DON'T have enough dynamic range.

http://photo.stackexchange.com/questions/21579/how-does-the-...


The instantaneous dynamic range of the human eye is estimated to be less than eight stops. Both film and modern camera sensors can capture much more than that. The reason we're able to adequately perceive scenes with 12+ stops of dynamic range is that our visual system does continuous integration and reconstructs an HDR "mosaic" in real time. HDR photography is required in situations where even the 12-14 stops that a modern sensor is capable of is not enough.

However, that's neither here nor there because the human eye is not the real bottleneck here. The media we use to display photos are. Printed photographs have approximately six stops of DR; typical monitors have eight. Modern cameras capture much more information than can be displayed, and the raw sensor data must be tone-mapped either by the camera software or in post-processing to produce a viewable image. There is a lot of latitude in deciding how to map 2^14 discrete values of input to a mere 2^8 values of output.
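
(As a toy illustration of that latitude, here is one of the countless possible mappings, a bare gamma curve from hypothetical 14-bit linear values down to 8 bits; real cameras use far more sophisticated tone curves:)

  import numpy as np

  raw = np.arange(2**14)                         # linear 14-bit sensor values
  # One of many possible tone maps: a plain gamma curve into 8 bits
  display = np.round(255 * (raw / (2**14 - 1)) ** (1 / 2.2)).astype(np.uint8)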


Nice photo prints can typically show something like 100:1 to 200:1 contrast (~6.5–7.5 stops) between white and black, or at the top end, some dye transfer prints under carefully controlled lighting can get up to about 500:1 (9 stops).

Nice computer displays (and mobile device displays) without glare and with the brightness cranked all the way up can get up to something like 9.5–10.5 stops.

Of course, that range still pales in comparison to the contrast between shadows and highlights on a sunny day, which can be more like 16+ stops.
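
(Quick back-of-the-envelope check, since a stop is just log2 of the contrast ratio:)

  import math

  for ratio in (100, 200, 500):
      print(f"{ratio}:1 contrast = {math.log2(ratio):.1f} stops")
  # 100:1 = 6.6, 200:1 = 7.6, 500:1 = 9.0 stops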


You're thinking about absolute range of a human system. Yes, I can look into shadows and my iris will imperceptibly dilate. Similarly, the stapedius muscle can dampen input to the oval window. So you can appreciate a wide dynamic range. But instantaneously, your retina is only good for about 7-8 EV steps. Modern imaging sensors deliver well over 10 EV, some deliver more than 14. Depending on which format you save to, you could be throwing away half the information! So, much like an artist should record 24/192, a photographer should be saving to RAW. But no monitor or film or the human eye will ever capture all that range in one instant. So, especially for temporal media, like audio and video, that space can be scoped down in the interest of space saving without any perceptible loss.


Define HDR. Almost all displays people have these days are not capable of reproducing the dynamic range that most cameras can capture. Hence why we have colour grading, which is the image equivalent of audio mastering.


I don't know about music and audio, but your photography comparison is incorrect for two reasons.

First, as msandford pointed out, the human eye has significantly better dynamic range than image sensors. Technically, our eyes have a lower range at any specific instant, but due to the way our eyes work we effectively see upwards of 20 stops of dynamic range. The best sensors available (in medium format cameras, pro-DSLRs, etc) can only capture 14-15 stops.

Second, some black and white films have a better dynamic range than digital sensors, so it's also not the case that digital is strictly better. 18-20 stops isn't unheard of for some types of film.


Doesn't that depend on what type of playback you are doing? More and more playback these days is done via digital transfer with the volume set in software at the sending end, to amplifiers at fixed volume, such as many multi-room systems.

If I airplay a song from my iPhone and have the volume at 50% set in software, then a few extra bits can help. Not sure if it makes a noticeable difference, but it's a digital mixing scenario occurring at playback. If you play at extremely low volume it should be noticeable.


An aside: I've never understood why the logic around digital signal-path volume adjustment isn't "keep the volume number around all the way to the end; throw the signal through the DAC at 100% gain, and then attenuate the signal on the analogue path using the digital volume setting." Uses more power, I suppose.


That's how it should work. Not sure why it doesn't. Needs some updated standard for digital transfer I suppose. Updates to AirPlay, Toslink etc.


no, because if you play at lower volume, that simply means that the quietest parts are closer to (or underneath) the threshold of hearing.

I see it this way: let's say 16 bits is needed to represent the entire discernable dynamic range between the threshold of hearing and the threshold of pain. if you turn the volume to 50%, then you throw out 8 bits, but you also only need to represent 8 bits worth of hearing range.


turning the volume down 50% would yield 15 bits, not 8 bits, which would be discernible.
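
(Sanity-checking the arithmetic, given that one bit is roughly 6 dB:)

  import math

  # Halving the amplitude costs 20*log10(0.5) ~ -6 dB, i.e. about one bit,
  # so a 16-bit (~96 dB) stream at 50% volume still has ~15 bits / ~90 dB left.
  print(20 * math.log10(0.5))   # -6.02...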


er, my log math is bad, but the point is still valid.


Why would a few extra bits in the source data help there? A DAC with more bits would, but that can easily be scaled from the existing data.


If you attenuate digitally you bit shift things out. If you have data in a 16bit (-32k..32k) stream and set a low volume in software before you send it, then it will scale the samples to (say) -8k..8k which is basically now a 14bit stream.

With a 24bit stream you can easily give up a few bits without losing dynamic range.
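
(A toy numpy sketch of that bit-shifting argument; the sample values are made up:)

  import numpy as np

  x16 = np.array([-32000, -12345, 0, 12345, 32000], dtype=np.int16)

  # Attenuate in the 16-bit domain: the bottom two bits are gone for good
  quiet16 = (x16 // 4).astype(np.int16)      # samples now span roughly -8k..8k

  # The same attenuation inside a 24-bit container keeps all 16 bits of the source
  x24 = x16.astype(np.int32) << 8            # 16-bit data padded up to 24 bits
  quiet24 = x24 // 4                         # still more than 16 bits of resolution left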


Sure, but that only applies to the stream you actually send to the DAC, not the source material? (as in, you can take a 16 bit stream, scale it to 24 bits and then lower the volume) Am I missing something?


I think the point was that sometimes you do want to apply some effect to the sound at playback time, e.g. an equalizer, and in that case a higher bit depth could maybe conceivably become useful.


No they're not. And no matter how many times this gets linked to on the Internet, it's still wrong.

The basic problem: the quieter a sound or detail gets, the fewer bits of resolution are used to represent it.

In 16-bit recording, there simply aren't enough bits to represent very low level details without distorting them with a subtle but audible crunchy digital halo of quantisation noise.

In a 24-bit recording, there are.

Talking about dynamic range completely misses the point. It's not the absolute difference between the loudest and quietest sounds that matters - it's the accuracy with which the quieter sounds are reproduced.

This is because in a studio, 0dB full-scale meter redline is calibrated to a standard voltage reference, and both consumer and professional audio have equivalent standard levels for the loudest level possible.

These levels don't change for different bit depths, and they're used on both analog and digital equipment. (In fact they've been standard for decades now.)

This is why using more bits does not mean you can "reproduce music with a bigger dynamic range" - not without turning the volume up, anyway.

What actually happens is that the maximum possible volume of a playback system stays the same, but quieter sounds are reproduced with more or less accuracy.

In a 16-bit recording quiet sounds below around 50dB have 1-8 bits of effective resolution, which is nowhere near enough for truly accurate reproduction. (Try listening to an 8-bit recording to hear what this means.)

You might think it doesn't matter because they're quiet. Not so. 50dB is a long way from being inaudible, ears can be incredibly good at spectral estimation, and your brain parses spectral content and volume as separate things.

There's a wide range between "loud enough to hear" and "too loud" and 24-bit covers that whole range accurately. 16-bit is fine for louder sounds, but the quieter details just above "loud enough to hear" get audibly bit-crushed.

The effect isn't glaringly disturbing, and adding dither helps make it even less obvious. But it's still there.

24-bit doesn't need tricks like dither - because it does the job properly in the first place.

Now - whether or not commercial recordings have enough musical detail to take full advantage of 24-bits is a different question. For various reasons - compression, mastering, cheapness - many don't.

But if you have any kind of aural sensitivity, you really should be able to A/B the difference between a 24-bit uncompressed orchestral recording and a 16-bit recording using an otherwise identical studio-grade mixer/mike/recorder/speaker system without too much difficulty.


"This is why using more bits does not mean you can "reproduce music with a bigger dynamic range" - not without turning the volume up, anyway. What actually happens is that the maximum possible volume of a playback system stays the same, but quieter sounds are reproduced with more or less accuracy."

You are slightly confused. (It may help to remember that a decibel always refers to a ratio, so the setting of your volume knob is not important.) Greater bit depth does allow for greater dynamic range, this stems directly from the definition of dynamic range. 16-bit audio has a theoretical dynamic range of:

  10 * log10((2^16)^2) ~ 96dB

24-bit audio has a theoretical dynamic range of:

  10 * log10((2^24)^2) ~ 144dB

For reference, a quiet recording room has a noise floor of ~40dB SPL and the loudest amplified music rarely exceeds 115dB SPL. This gives a dynamic range of 75dB, which indicates that a well-recorded 16-bit master should be more than adequate.

The idea that having bits in excess of this amount will somehow result in the perception of a smoother or more accurate sound is fallacious. Even at maximum playback volume, this information will exist well below the noise floor and will simply not be perceived. In fact, this information will likely exist well below the noise floor of the recording studio and thus, in some sense, will not even be recorded.
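
(For anyone who wants to reproduce the numbers above:)

  import math

  for bits in (16, 24):
      print(f"{bits}-bit: ~{20 * math.log10(2**bits):.0f} dB theoretical dynamic range")
  # 16-bit: ~96 dB, 24-bit: ~144 dB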


If your mastering is done competently, you really aren't going to be able to hear it in a realistic scenario. Which is why:

  "Talking about dynamic range completely misses the point."
isn't really sensible. It's the use of dynamic range that decides how much useful resolution you have when quantizing a signal. This is really why higher bit depths in recording and mixing are useful - they let you be sloppier with the inputs without losing much information before you've had a chance to work with it. It still doesn't gain you anything fundamental, but it does mean that if you got the levels a bit wrong you can salvage it. Higher bit depths here are excellent.

There is nothing magic about 24 bits here. Record something with 48 bits but set up your equipment all screwy so you're only actually using the first 8 bits... and you've got an effectively 8-bit recording.

In real world applications the codec is giving you trouble with the low amplitude stuff, not the quantizer. Not that in realistic situations your equipment is likely to be able to generate this cleanly anyway.

   "24-bit doesn't need tricks like dither - because it does the job properly in the first place."
No. Dither isn't a trick, it is a fundamental approach to quantization error at any depth.

On playback, the issue goes the other way around. If you've mastered things correctly you'll be using the available dynamic range of the output in such a way that the information content of your signal is well represented. This is sufficient at CD rates for all practical listening scenarios.


Mastering, especially modern mastering, compresses the bejesus out of the end product. Trust me, you do not want to listen to uncompressed recordings under real-world conditions. Details will be so quiet you can't hear them. It'll sound thin and dull. Most modern pop has maybe 5-6dB of dynamic range. Really loose, open mastering will be 20dB or so.

As someone who both records/mixes albums and plays live instruments, I can tell you a live instrument in the room sounds utterly different from any recording. Not necessarily worse, just different. The pursuit of "accuracy" in audio playback is childish and naive. The sound of a recording is a function of technical limitations, compromises, and aesthetic decisions as much as it is a product of the raw source sounds. Don't make it sound accurate, make it sound GOOD! And that usually means a lot of compression, and often deliberate distortion.


> No they're not. And no matter how many times this gets linked to on the Internet, it's still wrong.

The article is still correct, just like it always was.

Ironically, most of your analysis is also correct. Somewhere in your understanding though, you're leaping sideways to an incorrect conclusion.

>The basic problem: the quieter a sound or detail gets, the fewer bits of resolution are used to represent it.

So far so good, but you're about to go wrong again once you start thinking in terms of stairsteps and boxes and looking instead of hearing.

Back to the bits.

What lower amplitude (and fewer used bits) means is that the sound, as represented, is not as far above the noise floor as a full-amplitude sound. The digital noise floor is completely analogous to, eg, the noise floor of analog tape. If you use a dithered digital representation, you get something that behaves exactly as analog does. You hear and perceive both the same way.

>In 16-bit recording, there simply aren't enough bits to represent very low level details without distorting them with a subtle but audible crunchy digital halo of quantisation noise.

On an audio tape, the magnetic grains are just too large to represent very low level details without distorting them with a subtle but audible crunchy halo of analogue distortion and hiss.

In a 24 bit recording, the noise you mention is still there! It's just shifted down [theoretically] 8 bits or -48dB. That's the only difference. The noise floor is lower.

[In reality, 24 bit isn't. Most recordings don't even hit a full 16 bits, and no recordings, unless they're mathematically rendered, can get deeper than about 21 bits. There is no such thing as a 24-bit audio ADC/DAC that delivers 24 bits. The very best available today are about 21 bits of signal + 3 bits of noise.]

So the difference in playback between 16 bits and 24 bits is about 5 actual bits. If you're complaining about soft sounds in a 16-bit recording 'not having enough detail' because they're down at, say, 3 bits of resolution, are you saying it's all fixed by using 8? Aren't 8 bits woefully too few for any kind of quality sound?

(I hope at this point, you realize you're barking up an incorrect tree)

If you're following me so far, we can continue, but I expect even this much is going to require more conversation.


For a properly dithered recording, bit depth is exactly equivalent to noise floor. If low-volume details are lost in 16 bit recordings it is because their volume falls near or below the noise floor imposed by the 16 bit recording. 16 bits is good enough because the noise floor is low enough to be imperceptible to a human, if the gain of the recording is such that the full dynamic range available is used.
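
(A minimal numpy sketch of that equivalence; the 997 Hz tone and the textbook 2-LSB TPDF dither are just arbitrary choices for illustration:)

  import numpy as np

  fs = 48000
  t = np.arange(fs) / fs
  x = 0.5 * np.sin(2 * np.pi * 997 * t)               # a 997 Hz tone at half of full scale

  lsb = 1.0 / 2**15                                    # one 16-bit step when full scale is +/-1.0
  tpdf = (np.random.uniform(-0.5, 0.5, fs) +
          np.random.uniform(-0.5, 0.5, fs)) * lsb      # triangular (TPDF) dither, 2 LSB peak-to-peak
  q = np.round((x + tpdf) / lsb) * lsb                 # dithered quantization to 16-bit steps

  err = q - x
  print(20 * np.log10(np.sqrt(np.mean(err ** 2))))     # ~ -96 dB below full scale, independent of the signal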


""" in a studio, 0dB full-scale meter redline is calibrated to a standard voltage reference, and both consumer and professional audio has equivalent standard levels for the loudest level possible"""

This is only minor nitpicking, but the standard 0dB levels for professional audio (0dB reference at +4dBu == 1.23Vrms) and consumer audio (0dB reference at -10dBV == 0.32Vrms) are not meant to indicate the maximum ("loudest level possible") but just serve as a reference point, e.g. for the 1kHz sine you inject when setting up your gain throughout the signal chain. On most studio gear, you'll easily have +15dB headroom left.
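
(Worked out from the definitions, 0 dBu being referenced to 0.775 Vrms and 0 dBV to 1.0 Vrms:)

  print(0.775 * 10 ** (4 / 20))     # +4 dBu  -> ~1.23 Vrms (pro reference)
  print(1.000 * 10 ** (-10 / 20))   # -10 dBV -> ~0.32 Vrms (consumer reference)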

AD/DA converters haven't really standardized on a full-scale level and there are quite a few different definitions in use: https://en.wikipedia.org/wiki/DBFS. Most "line level" ADC/DACs will have switches or jumpers to select between two or so settings. You'll choose them so that you are not likely to clip your ADC, and will only playback on your DAC with an appropriate level trimmed to not clip your analog gear.


Are you an expert?

In a 16-bit master, a noise shaping function is applied during down-conversion, by which quantization noise will be re-distributed so that most of the noise energy goes to the high frequencies (>15k) where it is completely inaudible.
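
(For the curious, the simplest possible form of the idea is a first-order error-feedback requantizer like the sketch below; real mastering-grade dither uses psychoacoustically weighted filters, but the principle is the same:)

  import numpy as np

  def to_int16_noise_shaped(x):
      # Requantize float samples in [-1, 1) to int16 with first-order error feedback.
      # Subtracting the previous quantization error before rounding gives the error
      # a (1 - z^-1) highpass shape, pushing its energy toward high frequencies.
      out = np.empty(len(x), dtype=np.int16)
      err = 0.0
      for i, s in enumerate(x):
          v = s * 32767.0 - err
          q = min(32767, max(-32768, int(round(v))))
          err = q - v                   # error carried into the next sample
          out[i] = q
      return out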

For a good example of such a recording, see Ahn-Plugged (Ahn Trio, 2000, Sony BMG Masterworks). Fire up a good spectrum analyzer. You'll find the noise floor is well below -110 dB throughout most of the spectrum, even though it's 'just' a 16 bit CD.


Compressed formats are not really relevant to considerations about the bit depth itself.

Besides, MP3 [audio] compression has difficulty handling specific samples, or types of samples (e.g. sharp attacks), and it may manifest artifacts independently of the bitrate; MP3, AFAIK, also has a ceiling of 320 kbps within the standard specification, which certainly doesn't help.

Secondly, I'm not sure if you process the MP3s further (when you refer to mixing), but if you do, you're definitely going to make artifacts noticeable which weren't so in the unprocessed MP3 form.


I mean, I'm just some hobbyist, but my understanding is everyone renders to lossless and then converts that to the various MP3/AAC formats, never changing anything solely because of the final compressed format.


Old thread, but I thought I made it clear that I wasn't talking about data compression, but rather dynamic range compression.


MP3's perceptual model can still throw away information at the highest quality settings. FLAC doesn't throw away anything.

It's possible you are just hearing the difference between codecs. You'd have a fairer comparison with 24-bit vs 16-bit FLAC.


Cymbals are the biggest tell for compressed music. Even with crappy speakers they sound very strange.


SiriusXM satellite radio is probably the worst offender. Makes music unlistenable to me.

Even 128Kbps MP3s render cymbals better.


" can tell the difference from effects in audio mixing."

Yes, non-linear effects can be sample rate sensitive. However-- this really means that their internal model is aliasing and not faithfully simulating an infinite sample rate system.

In an ideal world, effects that needed more sample rate would internally upsample/downsample (or be constructed in a way that they didn't need to). Then they would behave consistently across rates, though doing this would waste CPU cycles.
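
(A minimal sketch of that internal upsample/process/downsample idea, assuming scipy; the tanh waveshaper and the 4x factor are arbitrary, and real plugins use much better filters:)

  import numpy as np
  from scipy.signal import resample_poly

  def soft_clip_oversampled(x, factor=4):
      up = resample_poly(x, factor, 1)          # run the nonlinearity at 4x the sample rate
      shaped = np.tanh(3.0 * up)                # the waveshaper creates new high harmonics
      return resample_poly(shaped, 1, factor)   # low-pass + downsample removes the would-be aliases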

In any case, the article is all about distribution. Having excess rate in mastering is cheap and harmless and, for these reasons, can be practically pretty useful.


For those interested in different versions of an album where it was mastered by a different audio engineer, you should check out the Steve Hoffman forums: http://forums.stevehoffman.tv/


Or even the same engineer years later.


That's your processing overhead: you can mess with the sound a lot more at 96k before you hear audio issues.

The difference you hear is the difference between FLAC's lossless format and MP3's lossy format; it has nothing to do with 16-bit versus 24-bit.



