There is no point to distributing music in 24-bit/192kHz format. (xiph.org)
732 points by nullc on Mar 5, 2012 | 316 comments

I must say I get rather irritated when people spend time worrying about dubious 'tweaker' methods to improve their audio, when the most under-performing component of most people's sound equipment also has the lowest-hanging fruit: The room itself.

When people ask me where they should spend money to improve the quality of their hi-fi or home theater system, in nearly every case my response will be something like "get a thicker rug" or "put something on this wall to absorb sound reflections, even if it's just a bookshelf."

Beyond that, I'd tend to say something like "stop being so paranoid about what you think you can't hear, and enjoy the damn music."

stop being so paranoid about what you think you can't hear, and enjoy the damn music.

I'm a composer who works in film/games. I can assure you this is exactly what I'd like people to do when they listen to my music. I spend 99% of my time trying to create good musical ideas, and I spend 1% of my time getting the mix down. I get criticized (rightly) for this quite a bit, but it is hard to care about someone sitting in a >$10,000 labyrinth of sound equipment when I'd rather write a catchy tune.

Then again, when I write sheet music I have to endure some of the most soul-crushingly awful MIDI sequencing in order to check my work, so perhaps I'm too tolerant of terrible sound quality. Still, I'd rather people listened to the music, not the sound of it.

What's wrong with caring about both sides of it?

A good sounding catchy tune is something worth spending that little bit more time on.

I don't necessarily mind effort spent on making sure that music is presented properly-- what I mind is when it supersedes all other concerns about the music.

> A good sounding catchy tune is something worth spending that little bit more time on.

But in reality most of your customers want a ton-frakk of compression, loudness and filters on that catchy tune so it ultimately sounds HUGE on the tiniest phone, radio and car speakers... so all that audiophile mixing and dynamics are completely lost anyway.

Agree one-hundred percent about the room (although the prescription isn't always as simple as "get a thicker rug" etc).

The other issue regarding high-frequency sound reproduction is that in most cases the loudspeaker won't be outputting much beyond 22-25 kHz (assuming very good quality loudspeakers; cheap consumer-grade units might struggle to hit a -6 dB point at 18 kHz). And even for the speakers that have usable output in that range, the directivity at those frequencies will be so narrow that your head will have to be locked in the perfect "sweet spot" to hear anything.

  > Agree one-hundred percent about the room
With a username like yours, I'm not surprised. :-)

  > although the prescription isn't always as
  > simple as "get a thicker rug" etc
A prescription is only as good as the likelihood that it will be heeded by the patient. A rug is an easy win; acoustic ceiling tiles and bass traps are a bit harder...

I might sound/be stupid for asking, but what's the actual physical response from something at 22 kHz+? I have a hard time picking up a pure sine > 17 kHz. I doubt I'd get any aural response from anything at 22 kHz, so what's the deal?

The deal is just that you're getting older. Your ears just don't work as well as a 12-year-old's. Neither do anybody else's your age (within the bounds of typical human variation - probably well over 95% of us _never_ heard 22kHz, no matter _how_ "young" our ears were).

I was once in a small, treated room working with some rather large PA speakers. I was curious how far my hearing range actually extended, and did something very unwise: I played a 20kHz tone and very briefly ramped the volume up and down. I definitely heard it, but I also induced quite a lot of pain. I learned two lessons: 1. my threshold of hearing at 20kHz is near or above the threshold of pain, and 2. don't do that ever again.

heh, your story reminded me of the bar scene in Good Will Hunting:

WILL: The sad thing is, in about 50 years you might start doin' some thinkin' on your own and by then you'll realize there are only two certainties in life.

CLARK: Yeah? What're those?

WILL: One, don't do that. Two -- you dropped a hundred and fifty grand on an education you coulda' picked up for a dollar fifty in late charges at the Public Library.

Yeah I get that I'm getting older. It's just, what's the point of having a stereo that gives perfect playback at 22 kHz if you can't hear it? I'm guessing there must be something since people buy gear like that, or is just a case of deranged audiophiles?

You might not hear a pure 22kHz sine, but any sound from, say, a harpsichord will have much of these highs, and some think they're a part of the sound, one you feel without actually hearing it. I'm not endorsing this view; sound is like wine tasting, a lot of hand waving and little solid ground.

There's blind wine tasting, if you want the real deal. Without the hand waving.

...of course I mean audible. Or spectral? Not aural, anyway. English is not my first language. Sorry.

That's really good actually, I'm not even sure I know the difference between aural and audible. (<- also, that sentence is a run-on and not good English haha)

I don't really know much about this, but wouldn't the 22kHz sounds potentially create beats in the lower frequencies?

Acoustic "beat tones" aren't "real" tones— you hear them because of non-linearies in the ear-brain system, but you have to hear the initial tones first. (Well, unless you're talking >>130dB SPL levels where the air starts becoming non-linear, but then lower frequency recording would capture it fine)

If you could hear subharmonic beats from ultrasonics then it would be _very_ easy to demonstrate, alas.

Curious, what does non-linear mean in this context?

IIRC, linearity is when you put a sound wave of some frequency into the medium (air) at some point, and you can predict the frequency of the sound wave at some other place using a linear function - meaning that there is no distortion. Non-linear is when the physics of the medium starts screwing with that function.

I believe that it means that the superposition principle (http://en.wikipedia.org/wiki/Superposition_principle) doesn't hold: the net response at a point is not just the weighted sum of the individual responses.
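Right. A rough numerical sketch (Python, illustrative only) of what that means: for a linear system the response to a sum of inputs equals the sum of the responses, while any nonlinearity (here an arbitrary tanh soft-clip standing in for an overdriven medium) breaks that identity.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 1000)
a = np.sin(2 * np.pi * 3 * t)  # one input wave
b = np.sin(2 * np.pi * 5 * t)  # another input wave

def linear(x):
    return 2.0 * x             # a linear medium: pure scaling

def nonlinear(x):
    return np.tanh(2.0 * x)    # soft clipping stands in for a nonlinear medium

# Superposition holds for the linear system...
lin_err = np.max(np.abs(linear(a + b) - (linear(a) + linear(b))))

# ...and fails for the nonlinear one.
nonlin_err = np.max(np.abs(nonlinear(a + b) - (nonlinear(a) + nonlinear(b))))

print(lin_err, nonlin_err)  # lin_err is essentially zero, nonlin_err is not
```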

so without very high power sounds and the nonlinearity business the whole "Sound from Ultrasound" wouldn't work? Huh, I guess all this time I misunderstood it.

for those curious http://en.wikipedia.org/wiki/Sound_from_ultrasound

Well, what I can think of is that of course you need to sample at > 2*max frequency if you do uniform sampling to avoid aliasing (by Nyquist), but that's not the same as playback.
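To make the Nyquist point concrete (a Python sketch, illustrative only): a tone above half the sample rate doesn't just vanish when sampled, it comes back as a lower "alias" frequency, which is why ultrasonics have to be filtered out before sampling.

```python
import numpy as np

fs = 48_000                      # sample rate
n = np.arange(fs)                # one second of sample indices
f_ultra = 30_000                 # a tone above the 24 kHz Nyquist limit

# Sampling a 30 kHz tone at 48 kHz...
x = np.sin(2 * np.pi * f_ultra * n / fs)

# ...yields exactly the same samples as an inverted 18 kHz tone
# (48 kHz - 30 kHz = 18 kHz): the ultrasonic content folds down
# into the audible band as distortion.
alias = -np.sin(2 * np.pi * (fs - f_ultra) * n / fs)

max_diff = np.max(np.abs(x - alias))
print(max_diff)  # effectively zero
```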

Yes, there will be inter-modulations from higher frequencies. There are also from the audible spectrum but if the amp is linear enough they will be low.

I agree. It should be about the performance, not the sonics. There are plenty of old Motown and even Beatles recordings with distorted vocals, bad edits, etc. Your brain passes right over them because of the emotional content of the music.

That's because they focussed on the most likely end-user experience:

>While Motown shortened songs to fit into radio time, the company also produced records specifically with car radio audio quality in mind. Motown recording engineers set up car speakers in the studio so that they could simulate and perfect how a song would sound emanating from a car radio

- what's the point of engineering things to a set of conditions virtually none of your target audience possesses?



I worked out a long time ago, that I enjoy listening to _music_, not HiFi gear.

My advice to people who ask how to make their system sound better? Buy some music you enjoy more…

I can enjoy a wonderful performance of a great tune played through my laptop speakers - much more than I enjoy test tones or gear-demo-tracks through sound gear worth something north of a new car…

(not that I haven't been "that guy" in my past…)

> My advice to people who ask how to make their system sound better? Buy some music you enjoy more…

Good advice, but you do need some baseline quality equipment to start with. Got my car with one speaker blown out, speakers wired semi randomly (left-right and front-rear faders don't work as they should), also powering line-in source from cigarette lighter results in funky background noise. Sounds great--when a good tune is playing and I'm able to recognize it ;-)

I'd disagree, at least among the people I know: they all have cheap HTIB systems and the single biggest, most cost-effective improvement you can make from there is to buy better speakers.

Then you can worry about your room.

No, even in that case, the room can still overpower the speaker. A $200 HTIB system in a properly treated room will sound a lot better than $28,000 Wilson Watt Puppies in a bathroom for example.

Hmm, my experience (I was a sound engineer in a previous life) is that the first thing to do to fix bad sound is to flatten the equalizer and remove the bass enhancer. Then I'd put the speakers on a solid table, roughly symmetric relative to the listener, while checking they have the correct phase. All the rest is rarely necessary.

Sure, but that's a rather contrived example-- most people have a fairly normal room, and the average joe would be best served by getting a good pair of speakers, and a reasonable amplifier and DAC, before worrying seriously about room acoustics.

A fairly normal room is pretty awful these days, particularly with the trend towards timber flooring and sparse furnishing.

I'd even skip the DAC (I mean, I might not... but a decent amplifier, and I just mean decent, like something that still works from the 70's or 80's) and a decent pair of tower speakers (needn't be expensive), and, well, just don't use the shittiest cables you can find (as long as they're thicker than a few human hairs you're okay) - it'll sound far, far better.

Or get some half decent headphones.

You might be surprised how much of a difference EQ can make. As an experiment I once used 12 bands of parametric EQ to adjust the speakers in a cheap, old LCD monitor. Sure, you're not going to get any better bass response than before, but stereo imaging (nonexistent before EQ, perfect phantom localization after), spectral balance, clarity, distortion (due to not exciting resonances in the monitor), etc. were significantly improved. Most people could have added an EQ'd subwoofer to those tinny LCD speakers and been completely satisfied.

One of the many things EQ can't fix, of course, is room reflections, which can be helped by room treatments and speakers with a directivity better suited to the room.
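For the curious, one band of the kind of parametric EQ described above can be sketched in a few lines of Python; the peaking-filter coefficients follow the well-known RBJ "Audio EQ Cookbook" formulas (the function name and parameter choices here are mine, not from any particular product).

```python
import numpy as np

def peaking_eq(x, fs, f0, gain_db, q):
    """One band of parametric EQ: an RBJ-cookbook peaking biquad."""
    A = 10 ** (gain_db / 40)
    w0 = 2 * np.pi * f0 / fs
    alpha = np.sin(w0) / (2 * q)
    b = np.array([1 + alpha * A, -2 * np.cos(w0), 1 - alpha * A])
    a = np.array([1 + alpha / A, -2 * np.cos(w0), 1 - alpha / A])
    b, a = b / a[0], a / a[0]
    y = np.empty(len(x))
    x1 = x2 = y1 = y2 = 0.0    # filter state (direct form I)
    for i, xn in enumerate(x):
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        y[i] = yn
        x2, x1, y2, y1 = x1, xn, y1, yn
    return y

fs = 48_000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1_000 * t)

# Boost 1 kHz by +6 dB: the tone's steady-state amplitude roughly doubles.
boosted = peaking_eq(tone, fs, f0=1_000, gain_db=6.0, q=1.0)
ratio = np.max(np.abs(boosted[fs // 2:])) / np.max(np.abs(tone[fs // 2:]))
print(round(ratio, 2))
```

At the center frequency the gain is exactly the requested +6 dB (a factor of about two); off-center frequencies are affected less, falling off according to the Q.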

Of course EQ can help with room reflections! In a square room you'll have a resonance at a given frequency, and you can mitigate this problem a bit with an EQ. But usually EQs are used to add bass and do more harm than good.

By room reflections I mean higher frequency reflections that result in comb filtering and spatial and temporal smearing of the sound, rather than lower frequency resonances that result in the standing waves you mention. EQ can reduce the effect of room resonances, but it still can't fix the extended decay time at those frequencies[0].

[0] DRC can improve this, but only within a small sweet spot.

I should point out that after room treatment, my next recommendation tends to be an amplifier with Audyssey MultEQ in-room calibration. I've never heard a listening environment that didn't sound unambiguously better with it enabled.

Did you read the article? Are you sure it wasn't just unambiguously louder? ;)

Yes. Because one of the settings shown after the full room color sweeps is dB, and the sweeps will often set various speakers a bit quieter. (The goal of Audyssey calibration is to eliminate frequency resonance hot spots in common listening positions, as well as get a flat response from your speaker setup.)

Audio tweaker with hacker leanings but without carpentry skills? Start your engines: http://drc-fir.sourceforge.net/

We have a lot in common.

It is said that in most cases a 192 kbit/s .mp3 is indistinguishable from higher bitrates, and blind tests support that. Granted, there are instruments like castanets which make it easier to hear the difference. In general though, I can't distinguish 128 from 192, and I listen to music a lot. Also, it's unlikely that my hearing is already damaged, because I try to keep the volume low.

But I've noticed that where I put the speakers makes a huge difference. I can easily tell the difference from speakers on the floor versus speakers on my desk. Where I'm at the moment also matters a lot. If I lie on the floor, floor speakers don't sound as bad anymore.

In the end, I use headphones. Midrange Audio-Technica ones, and I'm probably already overpaying a bit. But I bought them for build quality and comfort, and I wasn't disappointed. I can wear them for hours (not healthy I guess, but I'm used to wearing them even with no music playing). Headphones have the advantage that it suddenly stops mattering where your speakers are and where you are relative to them.

Is the 192 the bitrate or the frequency response? I thought 192 in MP3 was the bit rate, not the maximum frequency response...

This effect isn't sooo surprising seeing as it even occurs with dumb mono guitar cab speakers and is very, very, VERY clearly audible there, even just moving your head a few cm in or out of the cones' axis.

Amen. Someone got the point of the article.

> Stop being so paranoid about what you think you can't hear, and enjoy the damn music.

Yes. Do a couple of blind tests with your acoustic system first.

> It's true enough that a properly encoded Ogg file (or MP3, or AAC file) will be indistinguishable from the original at a moderate bitrate.

Disagree. This claim seems to be ungrounded compared with others.

I can believe the limitations with bit depth and sampling rate (although I'll take a chance to test myself if I get near a good enough acoustic system). However, I could definitely discern in a blind test whether the music I listened to was stored in a lossy format at a reasonable bitrate. It's usually quite audible with rock music that involves cymbals.

There's a specific "bug" in the mp3 encoding scheme which means that you get a pre-echo effect on fast attack waveforms. It's inherent in the encoding, so it can't be eliminated (although the higher the bitrate, the less obvious it is, IIRC). If you know how to listen out for it then you'll spot it immediately.

AAC / Ogg don't have that limitation & at high enough bitrate should be indistinguishable from the source in a blind listening test, as demonstrated in a number of Hydrogen Audio listening tests down the years, unless of course you're using crappy encoders at which point all bets are off...

(Really, LAME is very good indeed these days. I eventually decided that I was going to get with the program and just encode all my CDs (backed up to flac files) as mp3 for portable listening. It's good enough, and I've decided not to listen for the pre-echo artifacts so that I won't notice them :) )

Well, actually it was that long cymbal sounds "fade" more quickly and just sound different with lossy encoding.

IIRC I distinguished an mp3 encoded by iTunes at 192 or 256 kbps from its original in Apple Lossless (both played on the same cheap acoustic system). I probably should test with AAC or Ogg, too. Although I have a feeling that it's pretty much impossible to keep those high-frequency-rich cymbals intact while keeping the file size compact.

> I've decided not to listen for the pre-echo artifacts so that I won't notice them :)

You're much better at controlling your mind. =) After I once verified that the difference is audible even on cheap speakers, I can't switch back to lossy formats. It means constantly wondering whether that's how it's supposed to sound or not…

That's, by the way, why Apple's idea of having ‘Mastered for iTunes’ label[0] IMO is worthwhile—at least you can be sure that mastering engineer listened to it this way. =)

[0] http://arstechnica.com/apple/news/2012/02/mastered-for-itune...

Might be interesting to try AAC or Ogg Vorbis & see if they're any better. In these days of ever increasing cheap portable storage carrying a bunch of flacs around isn't quite as nuts as it used to be of course.

(Cymbals seem to be a particular bugbear for mp3 encoding; cymbal-heavy tracks tend to suffer the most from obvious encoding artifacts once you know what to listen for.)

It's certainly true. Ogg's artifacts are far less audible than mp3's - it's actually very decent at bit rates as low as 64kbps.

"I distinguished"

Was this in a blind test?

Please refer to my comment above[0]. Yes, it was a blind test. The person helping me might've looked up bit rates, so it was not a double-blind experiment, but I could not (nor did I want to) see what's being played, and relied only on hearing.

[0] http://news.ycombinator.com/item?id=3669893

A thousand times this. I never cease to be amazed at the number of people who will vocally argue the benefits of solid-silver wundercable, but who've never heard of mirror points or bass traps. $20,000 hifi systems in rooms with bare wooden floors and bare concrete walls. Subwoofers in untreated cubic rooms. People praising the transient response of their PMC MB2s in a room with chronic flutter echo. It's utterly dispiriting.

I agree, the damn room matters so much more. Also, the distortion of most speakers is already the bottleneck in most people's systems.

Exactly! I could not have put it better myself!

" … when the most under-performing component of most people's sound equipment also has the lowest-hanging fruit:"

And I was _so_ sure the next sentence was going to be something like:

"No, do not have any suggestions that will make your sound equipment make Justin Bieber sound better…"

There's a lot of scientific-sounding content in this, but unfortunately most of it couldn't be further from the truth. I'm an ex-audio engineer and studied digital and analog audio engineering; this has been debated to death over the last 15 years.

Digitally recording a triangle is the best example of why 48kHz is very limiting. The distinct sound of the triangle consists of a high fundamental frequency, ballpark 5kHz, and of many very high-pitched harmonics. Most of these harmonics are above 20kHz. The harmonics are what make it sound like a triangle, not the frequencies below 20kHz. This is why the triangle is one of the hardest instruments to record digitally. It always sounds like crap.

In theory, it's true that the human ear can't hear above ~18kHz, but it can hear the influence of the very high-pitched harmonics on a lower frequency.

EDIT: here's more data backing what I said http://www.cco.caltech.edu/~boyk/spectra/spectra.htm

EDIT 2: typos, frequency mistake

> Digitally recording a triangle is the best example of why 48kHz is very limiting

The article's about distribution, not recording. I don't think anybody disputes the usefulness of higher sampling rates when recording.

> In theory, it's true that the human hear can't hear above ~18kHz, but it can hear the influence of the very high pitch harmonics on a lower frequency.

...and 48kHz audio contains those lower frequencies.

Stripping frequencies above 20kHz negates the effect on the lower frequencies, since those lower frequencies are not "modified" by the higher ones. The human ear can actually hear the very high harmonics when they're combined with a lower fundamental frequency.

For example, the human ear will hear a 30kHz frequency if its fundamental is 10kHz. If it's played at 44.1kHz, the 30kHz frequency is gone and all you'll hear is 10kHz, not a "different sounding" 10kHz.

For example, the human ear will hear a 30kHz frequency if its fundamental is 10kHz

You are going to have to provide me with a citation to back that up, because that goes against everything I've learned and experienced in 17 years of working in acoustics.

Here is a Wikipedia article on the subject: http://en.wikipedia.org/wiki/Sound_from_ultrasound

Basically, if you produce two ultrasonic frequencies, they will create an interference pattern at a much lower frequency than either of the individual frequencies. Modulate a signal on the difference between two signals, and you can create a directional speaker, since ultrasonic sounds tend to be highly directional (so long as the diameter of the transducer is greater than 1/2 wavelength, which is almost guaranteed with ultrasonic signals). This is how the "sound cannons" that are being deployed for crowd control work.
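The linear-vs-nonlinear distinction here is easy to poke at numerically (a Python sketch, illustrative only): a plain sum of 40 kHz and 41 kHz tones contains no energy at all at the 1 kHz difference frequency; the difference tone only appears once a nonlinearity (crudely modeled here by squaring the signal) is introduced.

```python
import numpy as np

fs = 192_000
t = np.arange(fs) / fs                           # one second, so 1 Hz FFT bins
x = np.sin(2 * np.pi * 40_000 * t) + np.sin(2 * np.pi * 41_000 * t)

def level_at(signal, freq_hz):
    """Normalized magnitude of the FFT bin at freq_hz (1 Hz resolution)."""
    return np.abs(np.fft.rfft(signal))[freq_hz] / len(signal)

# A linear medium just carries the sum: nothing at the 1 kHz difference...
linear_1k = level_at(x, 1_000)

# ...but pass it through a nonlinearity (squaring, a crude stand-in for
# high-amplitude air) and a 1 kHz difference tone appears.
distorted_1k = level_at(x * x, 1_000)

print(linear_1k, distorted_1k)  # ~0 vs. a clearly nonzero level
```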

That article describes heterodyning, which happens because ultrasonic frequencies at high amplitudes interact nonlinearly with air. You are not going to see that effect with sound waves generated near the audible spectrum, and normal loudspeakers are not going to generate ultrasonic sound waves at those amplitudes.

Yes, but the effect of interference patterns between multiple ultrasonic frequencies is the same, and it definitely does affect the audible spectrum. This is why we must filter the square wave that comes out of a DAC. And the limitations of filters (phase shifts and roll-off) are why modern CD players oversample the signal--so that the filtering can be performed well beyond the audible spectrum.


Yes, but the effect of interference patterns between multiple ultrasonic frequencies is the same, and it definitely does affect the audible spectrum

has nothing to do with this:

This is why we must filter the square wave that comes out of a DAC

The only reason that square waves "must" be filtered is to reduce the potential of damaging tweeters. If you want to record a square wave with the purpose of later reproducing the square wave, then you don't want to filter it - once you filter it, it's no longer a square wave.

OK, if you say so. I think you're misunderstanding a fundamental concept of digital to analog converters. But if you think it's just to prevent blowing your speakers, that's OK.

The reason that square wave sucks is because it introduces tons of high frequency content (your amp probably won't reproduce the high frequency content anyway, so I don't think most Japanese consumer amps will damage your speakers--that is, the amp will act like a filter anyway). That high frequency content then creates alias effects (think of moire patterns when looking at super high-res photos that are scaled down without anti-aliasing). Those alias effects sound like shit to the human ear.

The point of filtering is to anti-alias the resulting analog signal after conversion from digital to analog. The point of upsampling is to move that filter well beyond the audible range, so you can use a 1st-order filter (gentle slope, but it introduces no phase effects). The fact that a square wave hurts your speakers is inconsequential--the amp will effectively filter the signal anyway. Unfortunately, it will filter the signal without anti-aliasing, which introduces those nasty interference patterns within the audible spectrum (that is, if you feed a straight 44.1KHz sampled square wave to your speakers without upsampling/filtering).

Recording music is supposed to be a snapshot (with room for interpretation) of the composition at play.

Trying to record an edge case like this is the same as recording in a room with bad acoustics. So you end up with some weird (but not faithful) representation of the sound which is a snapshot of the microphone's characteristics and directionality of the ultrasonic tones. It's not reasonable to assume any microphone will behave exactly like a human ear. Even if you could, you're going to have to mimic the tiny random movements a normal person would make listening to a sound, movements which would definitely impact the perception of the sound, because microphones are much more stationary than any human would be.

The "different sounding" argument two posts above is silly, because sound is almost never that monochromatic, and if it is, it's usually boring. Also I don't understand how missing out on an odd order harmonic would be a bad thing :) The reality is none of these arguments are based in a reality of what people would hear, and because of that, the arguments aren't practical.

In reality, 20 bits at 48kHz (or 64kHz) would be more than acceptable for even the most discerning of ears and probably the most practical in terms of space and fidelity, but it'd be a weird format to distribute in.

That's very cool, but it requires pretty high-intensity ultrasound to be noticeable. I doubt that will be the case with ordinary music.

> Basically, if you produce two ultrasonic frequencies, they will create an interference pattern at a much lower frequency than either of the individual frequencies.

So the interference pattern will be made up of one low frequency sound and higher frequency harmonics. Once again the higher frequency harmonics are redundant, because you only need to record the lower frequency sound.

The only possible way ultrasound can be picked up by the ear is if the ear has a non-linear response to the input sound. Going by the information in the article linked, it is highly unlikely that any significant non-linearity exists in the ear.

It's definitely possible for two sounds to be indistinguishable when played separately, but when played together it is revealed that they are in fact different (see link below). Whether this applies for sounds with frequencies above 20kHz I don't know. I'd like to see a citation as well. Doesn't seem like it would be the hardest experiment to set up either.


Me and my brother would sing at each other in certain tones such that we created harmonics in both our ears. It wasn't pleasant, but it was interesting. Regardless, I'd smash my equipment if it made harmonics like that.

Humans will hear the impact a >20kHz frequency has on the lower frequencies, not the 30kHz frequency itself. That's been proven a million times.

If that is true, surely in your up-thread example of recording a triangle, the "impact on lower-than-20kHz frequencies" would already have happened during the recording process between the triangle and the microphone, and would have been captured perfectly on recording equipment that's proven capable of capturing everything below 20kHz? So we'd "hear" the effect as part of the recording instead of requiring it to happen in our listening room…

>That's been proven a million times.

Then you should be able to provide at least one citation.

If you're not going to hear the frequency, then there's no reason to record it, so I don't see what your objection is.

Yes, but if you sample the frequency to create a step wave, then neglect to filter the results, you will end up reproducing tons of high frequencies. That is why we need to filter the output for signals >20KHz...to remove these harmonics that result from reproducing the square wave.

Of course, filters aren't perfect, and result in phase shift and roll-off. So we over-sample the signal to create a signal with a much higher frequency than 20KHz, so that the filtering occurs well outside the audible band, allowing us to filter out all of these harmonics without affecting the desired signal.

Basically, the end result is that by sampling the signal, you are introducing high frequency content that must be removed prior to playback. This high frequency content is one of the reasons old CD players from the 80s and 90s cause "listener fatigue", although I have no sources to back up that last statement.

Yup... people need to get very clear in their heads the difference between the recording/sampling/mixing/mastering stages, where high bitrate/width/gear/knowledge is helpful, and playback, which is a completely different thing.

(not for eatmyshorts -you get this I gather) - everyone gets that "upsampling" can't add detail to a recording right? You can't get more than you've got.... no matter what you do. There is no magic. You upsample so you drive harmonics generated in the digital-to-analog process during playback further up in the spectrum so when you get to the analog stage you can use a nice gentle analog filter to filter them out. Without the upsampling, you need a nasty steep analog filter to filter them out, and that can have audible side-effects (or at least measurable) in the audible spectrum. eatmyshorts - correct me if I mis-stated any of that please....

You got it 100% correct. You upsample simply to move the frequency of the analog filter higher, with a gentle rolloff (and ideally a 1st order filter, so you introduce no phase effects) to get your final signal.
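That whole story can be sketched numerically (Python, illustrative only): zero-stuffing a 48 kHz signal up to 192 kHz adds nothing below 24 kHz, it just creates high-frequency images of the original spectrum, which a gentle low-pass (a simple windowed sinc here) can then remove far away from the audible band.

```python
import numpy as np

fs, L = 48_000, 4                          # original rate, oversampling factor
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 1_000 * t)          # a 1 kHz tone, one second at 48 kHz

# Upsample by zero-stuffing: images of the 1 kHz tone appear at
# 47, 49, 95, 97, 143 and 145 kHz -- all far above the audible band.
up = np.zeros(len(x) * L)
up[::L] = x

spec = np.abs(np.fft.rfft(up)) / len(up)   # 1 Hz bins
image_before = spec[47_000]                # the lowest image, clearly present

# A modest windowed-sinc low-pass at the 192 kHz rate removes the images
# without touching the baseband (the factor L restores the passband gain).
taps = 255
n = np.arange(taps) - taps // 2
fc = 20_000 / (fs * L)                     # normalized cutoff
h = 2 * fc * np.sinc(2 * fc * n) * np.hamming(taps) * L
y = np.convolve(up, h, mode="same")

spec_y = np.abs(np.fft.rfft(y)) / len(y)
print(spec_y[1_000], image_before, spec_y[47_000])
```

After filtering, the 1 kHz tone survives at full level while the 47 kHz image is attenuated to near nothing; a much steeper (and more audibly intrusive) analog filter would be needed to do the same job at the original 48 kHz rate.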

In other words, your theory is that the superposition principle doesn't hold for sound waves.

Well, the superposition principle only holds in linear media. Sound waves can propagate in linear media, but they can also propagate in nonlinear media, and any medium that can carry sound will go nonlinear at sufficiently high amplitudes.

Note the lack of citation


I don't know about the physics of the speaker itself generating the overtone (in cabinet), but it could certainly resonate a wine glass in the room, for example.

Yes, overtones exist, and yes, overtones affect the sound, and yes, if you filtered the sound to remove overtones in the audible range then it would sound different. However, if you remove overtones outside the audible range then it will not make an audible difference (this is what xiphmont was saying in TFA).

So no, your wikipedia link is not a citation for the claim that cmer made.

"A/B or GTFO," I believe is the parlance of our times.

Are they talking about "beat frequency" type effects?

Yes. A 40KHz tone and a 41KHz tone will interfere with each other and can create 1KHz tones that are audible. Edited to correct error, thanks anechoic.

No, Holosonics is not creating sound from beating, they are using heterodyning, which takes advantage of how high-amplitude ultrasonic sound waves interact with the atmosphere, that's different from beating.

They don't— the air is linear (except at insane sound pressures) so there is no interference. While the ear is not linear, it doesn't respond at those frequencies.

If it really worked that way it would be trivial to demonstrate. Alas, it doesn't.

> For example, the human hear will hear a 30kHz frequency if it's fundamental is 10kHz.

No. It won't.

I think you're referring to this: http://en.wikipedia.org/wiki/Missing_fundamental

Which, since it's a psychoacoustic phenomenon, wouldn't hold true when the partials involved are above the audible spectrum.

I know where you're going with this I think, and I'm not disagreeing outright, but wouldn't this be captured during the high-bitrate (or good analog?) recording and mixing phase if the recording/mixing/mastering engineer were doing things right? At least, as well as possible?

> The article's about distribution, not recording. I don't think anybody disputes the usefulness of higher sampling rates when recording.

Didn't read the article, so commenting out of context, however it needs to be said that in sample-based music genres the distributed music gets used as if it were a recording. Maybe then it could be argued that higher sampling/bit rates should be available, if only for those who are sampling.

> In theory, it's true that the human hear can't hear above ~18kHz, but it can hear the influence of the very high pitch harmonics on a lower frequency.

That may well be true. But those mixed-down harmonics that are heard "live" would then be captured by the 16/44 (or whatever) sampling. IOW, the recording captures what you heard. Those upper harmonics have no emergent properties. Their effect is captured.


I'm no sound engineer, but as far as I can tell, the main point of that paper is that some instruments produce harmonics at frequencies greater than 20kHz, not that these frequencies matter to humans. However, section X references other papers that apparently make this claim.

Just because it is difficult to record a triangle does not necessarily mean it is impossible to accurately recreate the sound (to human ears) using 48kHz.

> I'm no sound engineer, but as far as I can tell, the main point of that paper is that some instruments produce harmonics at frequencies greater than 20kHz, not that these frequencies matter to humans. However, section X references other papers that apparently make this claim.

Yes, you're right.

In fact, some of the section X references don't even mention hearing; they talk about "alpha-EEG rhythms" (in this case "listeners explicitly denied that the reproduced sound was affected by the ultra-tweeter") and "bone-conducted ultrasonic hearing" through the "saccule" ("organ that responds to acceleration and gravity and may be responsible for transduction of sound after destruction of the cochlea").


In fact, most of the claims of the article are around the fact that there is energy over 20kHz and how it can affect the recording process.

This is a well known fact, and this is exactly why engineers filter out sub-sonic and super-sonic frequencies, especially today: stuff that you can't hear (or feel) will just suck your headroom and make you lose the loudness war.

The only "good sounding" triangles you'll hear are those buried in a mix. Alone, it always sounds weird and "muted".

EDIT: Listen to the triangle at the beginning of Rush's YYZ. It's an old recording, and the digital version sounds significantly worse than the analog one. It was digitally mastered some time ago, so if it were mastered today it would probably sound better, but still not great. I heard a rumor that Rush is remastering all their albums "for iTunes" at the moment, so hopefully we'll be able to compare soon!

Not a very good example, because that's a Crotale (A Flat, ~4" cymbal, basically), not a triangle.

Wow! I didn't know that! All these years I was convinced it was a triangle just like pretty much everybody I guess. Thanks :)

It's not like it has been a secret: http://en.wikipedia.org/wiki/YYZ_%28instrumental%29

Yes, but our ears only hear 20Hz-20kHz. So, according to Nyquist theory, you can recreate the entire signal that the human ear hears, including any audible artifacts that result from interference between ultrasonic harmonics.

So while it's true that the human ear can't hear much above ~18kHz, even if interference between high-order harmonics is audible, a properly recorded signal, sampled at 44.1kHz, oversampled, and filtered, can reproduce the exact signal the human ear is capable of hearing. At least according to theory.

The human ear is capable of detecting sound pressure as well as sound intensity, and while playback of the interference between harmonics can be reproduced faithfully in the sound intensity realm, the sound pressure levels will differ, and it is theorized that people may be able to tell the difference between the two. However, as far as I am aware, nobody has been able to demonstrate this reliably in practice.
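The Nyquist claim above can be checked directly. A sketch (NumPy assumed; frequencies chosen to give whole cycles in the window): a 15 kHz tone sampled at 44.1 kHz, then ideally band-limit-interpolated onto a 10x finer grid, matches a directly generated fine-grained tone to machine precision.

```python
import numpy as np

f0 = 15_000                      # an audible test tone
fs_lo, fs_hi = 44_100, 441_000   # CD rate and a 10x reference rate
n_lo, n_hi = 441, 4_410          # 10 ms at each rate (150 whole cycles)

coarse = np.sin(2 * np.pi * f0 * np.arange(n_lo) / fs_lo)
target = np.sin(2 * np.pi * f0 * np.arange(n_hi) / fs_hi)

# Ideal band-limited interpolation: zero-pad the spectrum and invert.
spec = np.fft.rfft(coarse)
padded = np.zeros(n_hi // 2 + 1, dtype=complex)
padded[: len(spec)] = spec
rebuilt = np.fft.irfft(padded, n_hi) * (n_hi / n_lo)

print(np.max(np.abs(rebuilt - target)))  # ~1e-13: exact reconstruction
```

The 44.1 kHz samples carry everything needed to rebuild the sub-Nyquist waveform; nothing audible is lost in the sampling itself.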

What about sound outside 20-20k that affects us via mechanisms other than being directly sensed in the air by our ears? For instance, consider frequencies below 20 Hz that we can feel with our feet as vibrations in the floor, instead of hear with our ears? Or what about the possibility of sound above 20k causing a vibration in something other than our ears, which could have a subharmonic in 20-20k that gets conducted to our ears via bone?

I'd prefer recording technology to err on the side of capturing what we need to reproduce all of that, even if we aren't sure that we need it.

>I'd prefer recording technology to err on the side of capturing what we need to reproduce all of that, even if we aren't sure that we need it.

Again, this article is about distribution, not recording.

Nyquist is true for static signals. Music is not static. Brick wall at 20 kHz and get audible phasing artifacts! (Even if your filter is phase linear)

I'm an audio engineer, too, and I agree that this has been debated to death. And I agree that frequencies above the threshold of hearing are more important than standard dogma (based on Nyquist theory combined with Pure tone audiometry) allows. It helps explain how audio gear with a 100kHz bandwidth sounds clearer than gear with a 20kHz bandwidth even when they measure the same in the audible band.

Have you read the Audio Technology magazine interview with Rupert Neve?

Greg Simmons: Geoff Emerick, the famous British Producer ?

Rupert Neve: Yes, he started me off on this trail. A 48 input console had been delivered to George Martin's Air Studios, and Geoff Emerick was very unhappy about it. It was a new console, made not long after I had sold the Neve company in 1977. George Martin called me and said, "please come and make Geoff happy, while he's unhappy we can't do any work".

They'd had engineers from the company there, and so on. The danger is that if you are not sensitive to people like Geoff Emerick, and you don't respect them for what they have done, then you are not going to listen to them. Unfortunately, there was a breed of young engineers in the company (I hasten to say this was after I sold it!) who couldn't understand what he was bitching about. So they went back to the company and just made a report saying the customer was mad and there wasn't really a problem. Leave it alone, forget it, the problem will go away. They were acting like used car salesmen. I was very angry about it. So I went and spent time there, at George Martin's request, and Geoff finally managed to show me what it was that he could hear, and then I began to hear it, too.

Now Geoff was The Golden Ears - and he still is - and he was perceiving something that I wasn't looking for. And it wasn't until I had spent some time with him, as it were, being led by him through the sounds, that I began to pick up what he was listening to. And once I'd heard it, oh yes, then I knew what he was talking about. We measured it and found that in three out of the full 48 channels, the output transformers had not been correctly terminated and were producing a 3dB rise at 54kHz. And so people said, "oh no, he can't possibly hear that". But when we corrected that problem, and it was only one capacitor that had to be added to each of those three channels, I mean, Geoff's face just lit up! Here you have the happiness/unhappiness mood thing the Japanese were talking about.

copy here: http://poonshead.com/Reading/Articles.aspx

The article doesn't suggest only using 48kHz for recording and mixing. I don't think the author would disagree that recording triangles is difficult. He would argue that once you've decided what final audible frequencies you want to present to the listener, the best way to distribute them is at 16-bit 44.1/48kHz. It's a compelling case.

What if you want to sample the song later?

That's one thing I find concerning with the move to digital. With analog media, you can go back, re-record and get an improved result (provided the source is good), but District 9 (which was shot on the Red One) will never improve beyond resampling, because the source is fixed at a particular digital format with its associated data quality.

There seems to be some strange idea that analogue means 'infinite detail'. In this particular case, there's no significant difference between being limited by the original digital recording resolution and the grain size of a film recording.

"[...] provided the source is good" is begging the question; it's no different from saying "District 9 could be better if they hadn't recorded in 4k (or whatever the Red One was using) and downsampled it for my DVD." The nature of the source is irrelevant, barring the fact that film might provide higher resolution if film-scanning technology improves, and you can afford to capture on film, process and store the film properly (archiving film is rather difficult, I believe), and get the best quality digitisation possible.

Obviously, I am not claiming infinite detail. There is going to be a limit based on the grain and the size of the film (35mm, Super, IMAX). 65mm film is going to be of higher quality than what digital is capable of today.

While I have no doubt that digital will eventually catch up and surpass film, there inevitably is going to be a transition period where quality films were recorded at a constrained input (let's just say 2k) and extrapolation is the only available option.

4k is the current state of the art. It will not be so forever, and because a film is recorded at 4k, we can't go back and extract more dynamic range than the sensor's limitations allowed. Whereas you can go back, redigitize an IMAX film (say Chronos, shot in 1985) that is in good condition, and get way more info than something shot on 4k yesterday.

TL;DR IMO input still absolutely matters. 35mm is not the upper limit. We went through this with photography and are now doing the same with video/film.

EDIT: After thinking more about it, here's a more extreme example. I purchased a Kodak DC20 back in the 90s (early adopter yay!). Even if the camera had decent glass, there's no way I can go back to an image captured by that camera and magically get the equivalent of a 22MP 5D image by resampling. If I had used a film camera, I could get a much improved scan.

EDIT2: Here's a good example. Slumdog Millionaire was mostly shot on a SI-2K which recorded at 2k. You can't go back and get 4k output on the digital portions. So generations later, we will be stuck enjoying an Academy Award winning film at that level of quality.


And we'll never be able to go back and "re-film" "The Texas Chainsaw Massacre" on 35mm; it'll forever be marred by the grain and poor low-light performance of 16mm. I guess I'm not sure what your point is. The best digital can present is currently worse than the best film can present, yes. That doesn't mean we shouldn't use it.

My original response was to the effect that the output should be high quality so that data is preserved if sampled.

Digital is the future. Hence it behooves us to have the maximal input & output possible at this time. Unfortunately, this is not common now and the price paid is that content created during this period will be stuck at the same quality level.

I'm entirely in favour of increasing the resolution/bit-depth for video, but I think the general problem is more complicated by external factors.

The cost of renting a red one and recording straight digital vs hiring a film camera, process lab, and all the other parts needed quite possibly means that some films might never have been produced due to filming costs.

What measure of quality can compare X against X, if it was never made?

I imagine (I have very little actual experience here, so it's perfectly possible I'm wrong) that digital recording might make it easier/cheaper to retake shots/scenes repeatedly to get them right as well, offering another 2nd order quality effect.

I completely disagree with the article having heard the difference many times myself. You can't record at 192kHz and hope to keep the same quality by distributing the final mix in 44.1kHz. It just doesn't work that way.

Well there is also the aliasing in that resampling. Recording at 192 for shipping at 48 surely makes more sense than shipping at 44.1? Some audio seems to be done at 88.2 but rarely 176.4.

Would you like to post double-blind test results?

We actually took those challenges in school :) Lots of fun if you're an audio nerd!

OK well then, what were the results?

Hey cmer, thanks for posting

I don't think I understand quite what you're saying and wondered if you could explain more. You and the article both say that humans can't hear above about 20kHz. If there are higher frequencies that create a harmonic at a lower frequency (e.g. a 33kHz harmonic that produces a sound at 16.5kHz) then surely that lower harmonic (16.5kHz in this case) will be recorded by the original recording equipment assuming it is recording at a frequency at least twice that of the highest audible frequency (let's say that this would be 48kHz, although there might be other DAC-related reasons to go higher).

I'm possibly being very daft here!

Let's make things super simple. Let's say you record 4 sine waves at a 192kHz sampling rate: 15kHz, 30kHz, 45kHz and 60kHz. All 4 frequencies will be captured, and the 15kHz frequency will sound different to your ear because of its harmonics.

If you take this recording and master it for a CD (44.1kHz), you'll effectively get up to ~20kHz (since there's a low-pass filter starting at around 16-18kHz). This means that only our first frequency will be captured: 15kHz. It will be exactly the same as if you had recorded 15kHz alone. The harmonics don't modify the fundamental frequency, they just trick the human ear. But when they're gone, they have no effect whatsoever.

Hope this helps!

EDIT: the frequency numbers I used are actually somewhat of a bad example. Harmonics are never exactly double, triple the fundamental. Those would be mostly inaudible. But you get the idea.
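The four-sine-wave thought experiment can be run as code. A sketch (NumPy assumed; 48 kHz is used instead of 44.1 kHz so the numbers stay whole): after an ideal band-limit to below 24 kHz, the downsampled mix is numerically identical to recording the 15 kHz tone alone.

```python
import numpy as np

n_hi, n_lo = 192, 48               # one millisecond at 192 kHz / 48 kHz
t_hi = np.arange(n_hi) / 192_000
t_lo = np.arange(n_lo) / 48_000

# 15 kHz plus three ultrasonic partials, captured at 192 kHz.
mix = sum(np.sin(2 * np.pi * f * t_hi)
          for f in (15_000, 30_000, 45_000, 60_000))

# Ideal downsample: discard every spectral component above 24 kHz.
spec = np.fft.rfft(mix)
down = np.fft.irfft(spec[: n_lo // 2 + 1], n_lo) * (n_lo / n_hi)

alone = np.sin(2 * np.pi * 15_000 * t_lo)  # 15 kHz recorded by itself
print(np.max(np.abs(down - alone)))        # ~1e-15: indistinguishable
```

The ultrasonic partials leave no fingerprint below the cutoff, which is the whole point of the parent comment.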

I don't think I understand how it could sound different to my ear. My understanding is that my ear doesn't have the sensory equipment to detect signals above ~20kHz - this is what I was told at university, and a decent trawl of the web suggests this is still true. If there is any sound that is in the range 20Hz-20kHz then why doesn't the microphone pick it up?

Or am I wrong, and the ear is able to detect frequencies above 20kHz?

> Harmonics are never exactly double, triple the fundamental. Those would be mostly inaudible. But you get the idea.

Actually, they are: https://en.wikipedia.org/wiki/Harmonics

The second half of the statement is wrong, but the first half is right. Harmonics in real-world instruments are not usually exact multiples of the fundamental. A simple diffeq model of a rigid oscillator will show you this mathematically.

An extreme example is present on modern pianos, where the high rigidity of the loud, heavy piano strings can cause tuners to stretch the lowest and highest notes as much as a half-semitone so that their harmonics are in tune with the note the next octave down or up. In other words, the first harmonic on the lowest note of a piano can be as much as 1/2 of a note sharp.

And when your oscillator is no longer one-dimensional, most harmonics aren't even close to integer multiples. The harmonics of bells, cymbals and drums are all over the place. That's what gives them their percussive sound. (Edit: some of these modes of vibration aren't harmonics in the linear sense.)

Harmonics in real-world instruments are not usually exact multiples of the fundamental. A simple diffeq model of a rigid oscillator will show you this mathematically.

That is absolutely incorrect; mathematically speaking, harmonics are by definition "integral multiples of the fundamental" (Fundamentals of Acoustics, Kinsler & Frey).

People from a musical, non-signals background tend to use 'harmonics' as a synonym for 'overtones' or 'partial tones', which is where the confusion arises, I suspect.

There's a measure -- inharmonicity[1] -- of how far the actual overtones of a particular instrument differ from their theoretical fundamental multiples.

[I suspect you already know this. This reply is probably for others' benefit]

[1] https://en.wikipedia.org/wiki/Inharmonicity
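For a stiff string, the standard model places partial n at f_n = n·f0·√(1 + B·n²), where B is the inharmonicity coefficient. A quick illustration in code (the B value here is hypothetical, merely in a plausible range for a heavy bass piano string):

```python
import math

f0 = 27.5    # A0, the lowest note on a standard piano
B = 0.0005   # illustrative inharmonicity coefficient (hypothetical value)

partials = []
for n in (1, 2, 4, 8):
    ideal = n * f0                              # exact integer multiple
    actual = n * f0 * math.sqrt(1 + B * n * n)  # stiff-string partial
    cents_sharp = 1200 * math.log2(actual / ideal)
    partials.append((n, round(actual, 2), round(cents_sharp, 1)))
    print(partials[-1])
```

The sharpness grows rapidly with n, which is exactly why piano tuners stretch octaves on the extreme registers.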

Then one would be forced to conclude that many instruments have no harmonics at all, which is obviously not what 'harmonic' is referring to in this thread of discussion. Why be pedantic when it's obvious what everyone is talking about?

Anyway, it's not as though mathematical literature requires you to use a term exactly one way. I had a diff eqs textbook that used the word 'harmonic' in exactly the way I used it above when I made reference to diff eqs...

> I had a diff eqs textbook that used the word 'harmonic' in exactly the [incorrect] way I used it above

So you had a textbook with a mistake in it. What book was it?

> And when your oscillator is no longer one-dimensional, most harmonics aren't even close to integer multiples. The harmonics of bells, cymbals and drums are all over the place. That's what gives them their percussive sound.

But those aren't harmonics, they're inharmonic partials.

> The harmonics don't modify the fundamental frequency, they just trick the human ear. But when they're gone, they have no effect whatsoever.

This is the part I really do not understand... either my ear CAN pick up those frequencies, maybe the harmonics are "tickling" the little hairs inside my cochlea and ultimately the frequencies I can actually hear were altered in my perception that way - or I can not hear or sense the harmonics and they physically alter the "original" wave that I end up actually hearing.

Either way, pretty much the exact same thing should happen in a studio microphone. Those all have frequency limitations: AKG, Royer, Rode, Shure, Sennheiser, Audio-Technica, what-have-you pretty much all go up to 15kHz or 20kHz according to their specs, if I understand them correctly, but not further. If a frequency isn't even recorded, it can NOT alter my perception, so the ultrasonics would HAVE to somehow change the frequencies that I can hear and that are being recorded. On top of that, you are making "room" for frequencies up to, say, 60kHz, but I very strongly doubt your mics can go even remotely that high.

The linked article was accurate. You are confused.

"I'm an ex-audio engineer"

Hard to believe.

"The distinct sound of the triangle constitutes of a high fundamental frequency, ballpark 10kHz"

That's a pretty high note - higher than the top key on the piano. But an "audio engineer" would know that.

"many very high-pitch harmonics"

Since the next harmonic after the fundamental would be at 20khz, which only young people can hear, and none of the others are audible to any human, I don't understand what you are talking about.

"Most of these harmonics are 20kHz."

OK, you don't either.

"it can hear the influence of the very high pitch harmonics on a lower frequency."


You clearly have little to no musical background, and think that your basic math skills are a substitute. The overtones present in a cymbal or triangle are not straight multiples of the fundamental, they are chaotic, and are very important in determining the timbre. Anyone (and I mean that) can easily tell the difference between a cymbal with and without a low-pass filter with the threshold around 22kHz, because these "inaudible" frequencies are lost.

If anyone can hear it, then surely it must have been verified through a double-blind test. Can you provide a citation?

I don't know of any to point you to. They probably exist, but I haven't read them. Let me know if you stir some up.

This is a much more polite response than what I had in mind. Better this way I guess :)

The undertones created by the high overtones are realized in the anti-aliasing filter during recording. That's not actually the reason 44 kHz sampling isn't enough.

1: He said "harmonics", not overtones. 2: You can not hear inaudible frequencies. Because they are inaudible.

1: You're still wrong. One person's typo is not a slight against physics. 2: That's what "sarcastic quotes" are for.

Playing Devil's Advocate...

The statement that frequencies above 20kHz don't matter rests upon the assumption that the ear is linear. If the ear is not linear (I don't know whether it is or not) then frequencies above 20kHz will matter, as the ear will be able to mix higher frequencies down to less than 20kHz. For example, if we have frequencies of 56kHz and 59kHz, the ear MIGHT be able to discern a difference frequency of 3kHz. No doubt this effect could be reproduced by a signal with a sampling rate of 44.1kHz, but only if the analogue systems, before the sampling stage, reproduce any non-linearity in the human ear.

Incidentally, you can get speakers that create a localised beam of sound that the person sitting next to you cannot hear. They work by transmitting frequencies above the audible range. These high frequencies can be beamformed by a relatively small speaker array, so the sound is localised. They then rely on the non-linearity of the ear (or maybe the air around the ear?) to mix the ultrasonic frequencies down to audible frequencies. I guess there must be non-linearity in the human auditory system!

On the subject of 24-bits my understanding is that 16-bits is adequate, provided the levels (scaling) are set correctly in the recording. What 24-bits delivers is the ability to do a crappy job of the mixing, and still end up with the full dynamic range of the human ear. 24-bits is probably a temporary solution though, as manufacturers will engage in the usual Loudness War [1], and push the signal to the top of the dynamic range. Before long 24-bit audio will be equivalent to 16-bits (since the 8 least significant bits will be unused) and the next big thing will be 32-bit audio.

Having said all that, I'd guess that the speakers will be the limiting factor in most sound systems, not the recording format.

[1] http://en.wikipedia.org/wiki/Loudness_war
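For reference, the dynamic-range arithmetic behind the bit-depth argument: each bit of resolution buys about 6 dB between full scale and one quantization step.

```python
import math

def dynamic_range_db(bits: int) -> float:
    # Ratio of full scale to one quantization step, in decibels.
    return 20 * math.log10(2 ** bits)

for bits in (16, 24):
    print(bits, round(dynamic_range_db(bits), 1))  # 16 -> 96.3, 24 -> 144.5
```

So a loudness-war master that only ever exercises the top 16 bits of a 24-bit file is indeed leaving roughly 48 dB of its theoretical range unused.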

Nonlinearity of the ear is thought to be the explanation for sum and difference tones, which most certainly exist: https://en.wikipedia.org/wiki/Combination_tone.
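Combination tones are straightforward to demonstrate numerically. A toy sketch (NumPy assumed): push two ultrasonic tones through a memoryless quadratic nonlinearity, a crude stand-in for any nonlinear stage, and sum and difference tones appear in the spectrum.

```python
import numpy as np

fs = 192_000
t = np.arange(fs) / fs  # one second, so FFT bin k sits exactly at k Hz
x = np.sin(2 * np.pi * 40_000 * t) + np.sin(2 * np.pi * 41_000 * t)

# Add a small quadratic term: y = x + 0.1 * x^2. The x^2 cross-product
# of the two tones contains cos(2*pi*1kHz*t) and cos(2*pi*81kHz*t).
y = x + 0.1 * x**2

spec = np.abs(np.fft.rfft(y)) / len(t)
print(spec[1_000])   # difference tone at 1 kHz, amplitude ~0.05
print(spec[81_000])  # sum tone at 81 kHz, amplitude ~0.05
```

Remove the quadratic term and both components vanish, so the audible difference tone depends entirely on where in the chain the nonlinearity sits.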

> Having said all that, I'd guess that the speakers will be the limiting factor in most sound systems, not the recording format.

Yes. And DACs, which normally have filters too.

Yes, though I tend to think of the reconstruction filters as being part of the recording format.

Here's an interesting article:

In 1975, the Canadian Broadcasting Corporation was using a head-shaped microphone, which was presumably an attempt to reproduce the non-linearity of the ear. It would be interesting to do such experiments with digital sampling.

Thinking about it, if every person has a different non-linear response, in theory the only way to reproduce sound beyond a certain threshold of fidelity would be to reproduce the ultrasonic components, so each person would hear their own non-linearity. (That would be beyond what I can hear or care about, but it would be fun to play with. Beyond a certain level we also get to the point where we need to ask what it means to hear a sound.)

  > the speakers will be the limiting factor
  > in most sound systems
I disagree -- in most sound systems, the room is generally the most limiting factor.

Pardon the reductio ad absurdum, but would you prefer to listen to $1,000 speakers in a dry, padded listening room, or to $100,000 speakers in a tile bathroom? Obviously the room matters; I think most people underestimate by how much.

Probably. I should have left it at "it's not the recording format" and not nominated a limiting factor.

I'd take the bathroom, given that my singing voice sounds less worse there! :-)

Another "ex audio engineer" here, you can believe or not at your leisure. Many hours spent in high-end recording and mastering environments.

I'm not sure what your background in audio is, but everything he says is correct. High-end frequencies well past 15k and up (22.05k actually) are widely acknowledged to influence the lower frequencies and play a huge role in the perception of the quality of a recording. This is an old debate with pros and cons on both sides, but in general you'll find the "Golden Ears" mastering engineers (Stephen Marcussen, Bob Ludwig, etc.) come down on the side of higher sampling rates.

Now, if your original recording was mastered to 16/44.1, then a transfer by way of 24/192 will probably actually hurt the recording. But if you're mastering from an original analog or high-quality digital, in my experience there's no question, higher sampling rates deliver better experiences.

I have also spent many hours spent in high-end recording and mastering environments, and it's my observation that most engineers suffer from confirmation bias just like everyone else on the planet.

I've caught engineers using L1-Ultramaximizer (or similar) to bounce a recording down to 16-bit/44.1khz as part of the mastering process, and they're always surprised when they're completely unable to hear the difference even in the most simple cover-the-screen-and-toggle-bypass test.

Audio, perhaps like the wine industry, is a vast bastion of confirmation bias and subjectivity, no argument there.

But I know what my ears hear, and IMO there is absolutely a vast difference between 44.1 and 192. I'm not sure how you can even question it. Someone else in the thread was saying it's impossible to hear the difference between 16-bit and 24-bit. I don't even know what to say to that. It's like telling me the glass of Gallo "Table Red" you're drinking is as good as my '75 Lafite. All I can say is "cheers" and just enjoy.

Regarding your wine anecdote: http://www.theatlantic.com/health/archive/2011/10/you-are-no...

If I gave you a bottle of "Table Red" with a '75 Lafite label, I'm sure you'd tell me how rich and wonderful it was. The problem here is that, as you said, "I know what my ears hear". You know you're listening to 192, wow it sure sounds great!

You just need to consider more plausible explanations for the difference you are hearing, such as low-quality sample rate conversion on the playback devices you are using, clock jitter that is less audible at 192k than 44.1k, no dithering on the 16-bit output resulting in quantization noise on quiet sounds, etc.

If you're listening to quiet music in a quiet room at high volumes on very low noise equipment, you can hear a difference in the noise floor level between dithered 16-bit and 24-bit, but at that volume level if that music (or movie) also has full-amplitude signals you'll be reaching peaks over 110dB SPL.

  > I'm not sure how you can even question it.
With evidence.

As much respect as I have for Bob Ludwig's hard won mastering skills, he also strongly believes in $n,000/foot speaker cable, which is what he has installed at Gateway. So by all means give him well deserved props, but don't assume he's an expert on all aspects of audio theory or practice.

I've always thought the most expensive speaker cable sounds a lot better... to the wallet of the salesperson.

That data doesn't back your point at all. That data concerns what frequencies are present, not what frequencies can be heard.

He raises a lot of valid points. However...

192 kHz is clearly overkill for listening. Not so for further editing of the data.

Same goes for 16/24 bit; however, the difference between 16 and 24 bit is actually audible.

44100 is not a bad sampling rate, but it necessitates very sharp aliasing filters, which are audibly bad. A bit more headroom is well needed there.

That bit about intermodulation distortion is complete bogus. He talks about problems when resampling high-fs audio data. However, you would never do that. You would digitally process 192kHz all the way. Only your loudspeakers or ears would introduce a low-pass filter, and a rather benign (flat) one at that. There is certainly no aliasing going on there unless you resample (wrongly). Intermodulation distortion is not the fault of the sample rate.

I majored in hearing technology. Calling 192/24 worse than 44.1/16 is total BS. How useful it is is a different debate.

>Same goes for 16/24 bit, however, the difference between 16 and 24 bit is actually audible.

The conclusions of this study [1] (widely accepted in the scientific audio community) disagree with your assertion.

>44100 is not a bad sampling rate, but it necessitates very sharp aliasing filters, which are audibly bad.

This is not the 1980s, hardware has progressed beyond that point. Modern (i.e. anything from 1995 onwards) DACs do not suffer from aliasing problems. Also see [1]

>That bit about intermodulation distortion is complete bogus. He talks about problems when resampling high-fs audio data.

I did not notice that in the article. It talks about IMD in the context of the analog chain and the transducers following the DAC, and it's possible that high frequencies can increase it.

[1] http://www.aes.org/e-lib/browse.cfm?elib=14195

> Modern (i.e. anything from 1995 onwards) DACs do not suffer from aliasing problems.

True, but they do so using (long, high-quality) high-cut filters. And these filters are pretty sharp, as they have to close within, say, 18-22.1 kHz. You can design them as linear-phase FIR filters with oversampling and all the good stuff, but physics dictates that sharp filters introduce distortion. A sharp filter like that is audible.

I'm not aware of any (blind) listening tests actually showing that a modern, high-quality DAC for 44 kHz audio introduces audible distortion compared to a similarly high-quality DAC for, say, 96 kHz audio, though. It's not theoretically impossible that the lowpass would introduce some sort of noticeable distortion, but I haven't run into substantiated evidence that it actually does.

44100 is not a bad sampling rate, but it necessitates very sharp aliasing filters,

When you're talking about recording, sure, but in terms of storage and playback, we solved that problem 20 years ago with oversampling.

You will still need an anti-aliasing filter that cuts off between, say, 18 and 22.05 kHz to avoid aliasing noise. That is one sharp filter no matter how you look at it. You can use a high-quality, long, linear-phase FIR filter, but you can't cheat physics: sharp filters necessarily introduce distortion, and such a sharp filter so close to the hearing threshold does not go unnoticed.

I don't see how a sharp filter could be needed if the DAC is oversampling.

Obviously. Analogue audio does not have a sampling rate. The ADC however can oversample all it wants, but if the output is 44.1 kHz, it needs an aliasing filter that cuts off at 22.05 kHz.

We appear to be in complete agreement with each other.
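How sharp is "sharp"? Kaiser's standard estimate for FIR filter length, taps ≈ (A − 7.95)/(2.285·Δω), makes the trade-off concrete; the pass/stop edges below are illustrative choices, not anyone's actual DAC design.

```python
import math

def kaiser_taps(atten_db: float, f_pass: float, f_stop: float, fs: float) -> int:
    # Kaiser's FIR length estimate for a given stopband attenuation
    # and transition width (Oppenheim & Schafer).
    delta_omega = 2 * math.pi * (f_stop - f_pass) / fs
    return math.ceil((atten_db - 7.95) / (2.285 * delta_omega))

# CD-rate anti-alias filter: pass 18 kHz, stop by 22.05 kHz, 96 dB down.
print(kaiser_taps(96, 18_000, 22_050, 44_100))  # 67 taps

# Same attenuation at 96 kHz with a relaxed 20 kHz -> 40 kHz transition.
print(kaiser_taps(96, 20_000, 40_000, 96_000))  # 30 taps
```

Neither count is remotely demanding for modern hardware, which is consistent with the view that a competent oversampling DAC handles 44.1 kHz fine; the dispute is over whether the remaining difference is audible at all.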

Same goes for 16/24 bit, however, the difference between 16 and 24 bit is actually audible

No, the difference is not audible at all. At 16 bits of depth on a normal low-level audio signal (~0.3 volts), we're talking about less than 0.000005 volts per amplitude step. This difference gets lost in the THD already at the DAC in your audio output stage. Then it gets lost again in the amplifier. And again in the cable to your speakers or headphones. And then it gets lost again in the speaker elements. What survives in a normal low-level audio signal is about 14 bits of resolution.

44100 is not a bad sampling rate, but it necessitates very sharp aliasing filters, which are audibly bad. A bit more headroom is well needed there.

44.1khz IS a bad sampling rate for accurately reproducing anything except a triangle wave or square wave above 5khz.

Why do you think "this difference gets lost in the THD already at the DAC"? Do you have numbers to back it up? What's the noise floor of a DAC? What's the noise floor of an output stage? Do you have the numbers?

The number is somewhere between the amplitude of an ant pissing on cotton, and an ant not even thinking about pissing on cotton.

High dynamic range is not about the lowest volume you can hear; it's about the voltage resolution between this sampling point and the next. Based on your assumption, we could all see black whether we use 16-bit RGB color or 24-bit RGB color, so what's the point of using 24-bit RGB?

Many years of building audio equipment (in particular analog synthesizers), and equally many years of being meticulously anal with getting the best components for my circuits, reading specifications of down to every single op-amp I've ever employed, is why I think so.

I am not saying that there aren't DACs on the planet that can handle five millionths of a volt, but I am saying that five millionths of a volt isn't surviving through the particular DACs and the rest of the electronics used in your PC/living room hi-fi audio equipment.

Heh, it's funny to see this late-nineties debate get re-hashed here. Also kind of fun.

If it were true that there's no audible difference between 16 and 24 bit, companies like Alesis, Otari, ProTools, etc. wouldn't have spent the last 15 years ditching 16 bit like an old pair of smelly sneakers. (better metaphors welcome).

Seriously, anyone who has sat down in a real listening environment for 5 minutes A/Bing 16 vs 20 bit, 16 vs 24, etc. hears the difference immediately. There's no question. This is why you can buy ADAT 16 bit 'blackfaces' for $100, down from their original $4,000.

Sure, moving up from 16bit recording was an improvement, but having done engineering for a company listed above for over a decade, I can tell you that we went 24bit/192kHz because of market demand, not for any real technical reasons. We thought it was fairly unnecessary ourselves. It was also kind of an arms race with other companies, much like the megapixel arms race for digital cameras.

Bigger numbers are better. Right?

It's all marketing, baby!

...And the new Pro Tools 10 just added the ability to record in 32-bit floating point. http://www.avid.com/US/products/Pro-Tools-Software

Yes, and anyone who has ever sat down in front of an LCD flatscreen watching their favorite movie on DVD/BD using a gold-plated $200 HDMI cable instead of a $4.99 Walmart HDMI cable sees the extra sharpness immediately. This is why non-gold-plated non-OFC HDMI cables are down to $4.99 a piece from their original $49.99 at introduction.

I can't tell if you're being sarcastic or not.

I'm going to go ahead and say yes: that seems to be blatant sarcasm, or at least a reference to the placebo effect / being a sucker.

The difficulty I had is that the same person claimed they could hear the difference between 44 kHz and 96 kHz, when the article (and all other comments which cited outside sources) claims that is well outside of human capability.

That's cute. Obviously you've never recorded a rock band while riding the pre to compensate for 16bit's terrible noise floor and horribly limited headroom. You've never had the joy of ruining a perfectly good take because of that wonderful sound it makes when the volume spikes into digital distortion despite compressing the wazoo out of the input source. Glorious sound, digital distortion. Run a dentist drill through an old Speak & Spell and you'd just about have it.

You've never rented an expensive tube EQ during a mix to cover up 16bit's grating harshness from 10k to 15k. Or tried like mad to make the bass drum sound like a freaking bass drum and not a pie pan slamming against the back of a plastic trash can. And yes, we had good mics and pres, all standard studio stuff. Decent, not brilliant, converters, but it was the 16bit that was the problem. Getting those 20bit XTs for the first time was like walking into the Promised Land.

Sure, there's lots of marketing ploys out there, lots of snake oil. Moving up from 16 bit was not one of them.

It looks like you are jumping in without actually having read the article in question. That's ok, but you are wasting space building a straw man and then vigorously demolishing it.

The original article explicitly mentions how 24bit is useful for recording.

Speaking of jumping in without reading...I wasn't responding to the article. I was responding to the commenter that said you couldn't tell the difference between 16 and 24 bit.

And you cannot tell the difference. The reason to record using 24 bits is so you don't have to be as precise in centering the recording level. If the level is centered, then you can capture it fine with 16 bits (by the way, that is also explained in the article).

Did you read the original article at all?

> Professionals use 24 bit samples in recording and production [11] for headroom, noise floor, and convenience reasons.


> Modern work flows may involve literally thousands of effects and operations. The quantization noise and noise floor of a 16 bit sample may be undetectable during playback, but multiplying that noise by a few thousand times eventually becomes noticeable. 24 bits keeps the accumulated noise at a very low level. Once the music is ready to distribute, there's no reason to keep more than 16 bits.

The original article does say that yes, during recording and production, 24 bit audio gives you a lot more room to play with. That doesn't mean that you can hear the difference between 16 and 24 bits for the final recording; just that 24 bits give you more room to keep out of trouble during production.
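
A toy model makes the article's accumulation argument concrete. The "gain ping-pong" below stands in for "thousands of effects and operations" (an assumption; real workflows are far more varied), and each pass requantizes the mix at the working bit depth:

```python
import numpy as np

fs = 48000
t = np.arange(fs) / fs
x = 0.5 * np.sin(2 * np.pi * 440 * t)  # one second of a 440 Hz tone

def quantize(v, bits):
    """Round to a given bit depth (no dither; worst case)."""
    scale = 2.0 ** (bits - 1)
    return np.round(v * scale) / scale

def snr_after(bits, passes):
    """SNR in dB after `passes` gain-then-requantize operations."""
    y = x.copy()
    for i in range(passes):
        g = 1.1 if i % 2 == 0 else 1 / 1.1  # net gain is 1 per pair
        y = quantize(y * g, bits)
    noise = y - x
    return 10 * np.log10(np.sum(x ** 2) / np.sum(noise ** 2))

snr16 = snr_after(16, 1000)
snr24 = snr_after(24, 1000)
print(f"16-bit after 1000 passes: {snr16:.0f} dB SNR")
print(f"24-bit after 1000 passes: {snr24:.0f} dB SNR")
```

The 24-bit pipeline ends up roughly 48 dB (8 bits' worth) quieter after the same abuse, which is exactly the headroom argument being quoted.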

Did you read the comment thread at all? I wasn't responding to the article, I was responding to a comment:


>Same goes for 16/24 bit, however, the difference between 16 and 24 bit is actually audible

No, the difference is not audible at all.


For those of you who are interested in just how much of a golden ear you truly are: download Harman's "How to Listen" software for Windows or Mac OS X http://harmanhowtolisten.blogspot.com/ (scroll down).

Harman requires its trained listeners to pass tests based on this software before participating in juries to evaluate Harman products. It doesn't directly address the sample rate/bit depth issues discussed in the linked article, but it does address a lot of the issues brought up in the HN discussion, so you can have a chance to see how much those characteristics really matter.

You may be surprised.

Even without debating the science and signal processing arguments raised...

In any test where a listener can tell two choices apart via any means apart from listening, the results will usually be what the listener expected in advance; this is called confirmation bias and it's similar to the placebo effect. It means people 'hear' differences because of subconscious cues and preferences that have nothing to do with the audio, like preferring a more expensive (or more attractive) amplifier over a cheaper option.

The human brain is designed to notice patterns and differences, even where none exist. This tendency can't just be turned off when a person is asked to make objective decisions; it's completely subconscious. Nor can a bias be defeated by mere skepticism. Controlled experimentation shows that awareness of confirmation bias actually increases rather than decreases the effect!

Doesn't that completely negate his conclusion, that there is no point to distributing 24/192 music? If people want to pay for 24/192, and even he just admitted that they will legitimately enjoy it more, how can you conclude there is no point?

Life is short. I want to enjoy things. Whether or not my enjoyment can be quantified or scientifically defended, I really don't give a shit. But that's okay, if you don't want to sell me 24/192 music, Amazon will. Between this and DRM-free content, it's no wonder I buy all my music from Amazon these days.

There is a perversion going on at both ends here. And by perversion I mean a distortion of truth in a bid to make a profit. This is not the worst that can happen, but it is worth mentioning. You probably put it more mildly, but I am a bit more harsh. Some people are irrational and spend money on stuff that they don't need, and another group of people are perpetuating the lies and the marketing in an effort to extract the maximum amount of money from the first group (in other words, your basic market setup).

Audiophiles are quite a fascinating group. These are people who can be rather rational in some respects (they could be doing research in some lab somewhere), but when it comes to audio equipment they will shell out $2000 for HDMI cables. The salesmen and manufacturers that make these things ("high end" HDMI cables, 192kHz recordings) know this very well, and they aggregate around this target set of clients.

I think that is exactly what is happening here. At some point storage capacity is just good enough and one can distribute 48kHz, 16-bit audio to everyone. But what do you do next? Everyone is getting that and it is not new and cool anymore. What to do? Well, increase the frequency and sell everyone a newer, better, higher-fidelity thing, even though objectively human ears cannot really hear the difference. Subjectively, though, there is a huge difference. If you ask someone who just spent $50 for a 192kHz record whether they like it better than, say, a $20 48kHz one, I bet you 100% of people will confirm that 192kHz sounds better and will be ready to go and buy more.

> Doesn't that completely negate his conclusion, that there is no point to distributing 24/192 music? If people want to pay for 24/192, and even he just admitted that they will legitimately enjoy it more, how can you conclude there is no point?

Ultimately, sure. The world is full of products and services which only add value in this weak sense.

If the same wine tastes better if it's priced higher, then it still tastes better. But I think it's only honest that the consumer be aware that the increased utility from being priced higher is due solely to the fact of it being priced higher. Beyond that, I don't care.

One thing we can all agree on is that music is much more enjoyable if you think you're listening to it through good equipment or from a good source. Ultimately it's only the 'thinking' part that matters. So I would make two points:

1. One point he's making is that playing audio sampled at 192 kHz through regular equipment actively distorts the music in negative ways. So if you know this, you should now enjoy that music _less_.

2. If you're adept at metacognition (maybe that's not the right word), you'll realize a) you can get most of the enjoyment by buying equipment that's 'pretty decent', and then not worry about it too much; b) you're probably fooling yourself by spending so much time/money worrying about having the best equipment, so you're probably not getting the maximum utility from the experience anyway. Or maybe it's the experience of trying to get the best equipment itself that's enjoyable, not necessarily the increased audio fidelity.

> If people want to pay for 24/192, and even he just admitted that they will legitimately enjoy it more, how can you conclude there is no point?

Sorry, no time to reply. I gotta run and write up my biz plan to distribute 32/384 audio.

SUCKER! I'm already working on 48/768 audio. It's amazing how clear the recordings are.

Well, if we accept that argument, then just about any means of signalling "this sounds better!" will work. How about we choose something that doesn't waste bandwidth?

That's true. It's kind of the Monster Cable model. BTW, I'm not saying that marketing and whatnot should deceive less technical consumers and trick them into spending more money than they should (which is basically what Monster Cable does). But when you explain to technical people why something like 24/192 isn't better (as other people in this thread have pointed out, this isn't totally accurate in the first place), and they understand what you're saying but still prefer it, by all means, let them buy it.

This is the same reasoning that somebody used when I was debating with her whether insurance should reimburse homeopathic and other alternative treatments. Her reasoning was 'well, if it works it should be reimbursed; it doesn't matter if it's a placebo effect or not'; my position is that they shouldn't be reimbursed, but quite honestly, I don't really have a rational reason for it (at first I thought I had one, but it turned out I couldn't formulate it, which is the same as not having it).

So, while I have no option (for now) but to acknowledge your position, I still feel dirty for doing so.

If there's no point arguing against something that people will eat up regardless of evidence or fact,

why are you arguing against the conclusion of an article that has this many upvotes on HN?

This article is one of the most lucid and accurate that I have read on this topic.

However, one thing that's missing here (and in nearly all other similar pieces) is a full discussion of the prerequisites of the sampling theorem. For example, the signal must be bandwidth-limited (and no finite-time signal can be).

But this is a minor concern, as there are many elements in the analog domain of the recording and playback chains that serve as low-pass filters - starting with the mics. So bandwidth-limiting is effectively achieved.

For a similar reason, I find the discussion of the "harmful" effect of high frequencies on playback electronics and loudspeakers to be a bit overdone, IMO. Peruse the excellent lab results of modern audio gear on Stereophile's web site. You'll find that bandwidths exceeding 30kHz are rare.

One last thing. When doing subjective "testing," keep in mind that what some folks are hearing may be limitations of their gear. For example, most DACs derive their clocks for higher sampling rates (88/96/176/192) by clock-multiplier circuits. IOW, 44kHz and 48kHz are the only ones clocked directly by a crystal. These multiplier circuits are often noisy, contributing to jitter. The audible effect of this jitter is hard to predict.


PS As an avid audiophile, I find the clash of subjectivists and objectivists on this normally-buttoned-down forum to be a bit of a trip.

You always record stuff at 24-bit/192 kHz for many reasons, usually involving minimizing analog artifacts and giving you a lot of information to work with. You use 32-bit float wavs to transport stuff around so you don't have to worry about normalizing levels and clipping. Lossless formats improve the quality of transients by an enormous degree. But every single objection to this is either ignoring the points of the article, or talking about the benefits of recording at high fidelity, when this entire article is pointing out that once you have _finished a mix_, there is no reason to distribute things in 24-bit/192kHz. Most speakers can't even play above 20 kHz anyway, which makes the entire point moot. I don't care if you have a bajillion kHz; the speakers can't play above 20 kHz, so you're screwed.

You're getting two entirely different things mixed up.

192 kHz is the sample rate. 192,000 slices per second. It does not refer to the audible sound spectrum.

20 kHz in speakers refers to the cycles per second of the audible waveform. Normal human hearing range is 20 Hz to 20 kHz. For most people, it's less than that.

A speaker can certainly play back music sampled 192,000 times per second. Most of them can't play tones that are higher pitched than 20 kHz, which is fine because mostly only dogs can hear up there anyway.

I am not getting these things mixed up, because the sample rate is related to the maximum frequency that can be stored, and lo and behold, look at all these people claiming that those higher frequencies matter. A 44.1 kHz sample rate can only encode tones up to about 22 kHz, whereas 192 kHz can encode frequencies of up to 96 kHz, and those people up there are arguing that these higher frequencies are exactly why 192 kHz is superior. Now, if you want to say that sampling a tone 44,100 times per second somehow won't sound as good as 192,000 times per second, I'm not saying that isn't possible, but I don't really take that claim seriously at all.
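
The sample-rate/bandwidth relationship under discussion is just fs/2, and it's easy to demonstrate with a few lines of NumPy (a toy sketch; the 25 kHz test tone is just an arbitrary above-Nyquist choice):

```python
import numpy as np

# A 25 kHz tone sampled at 44.1 kHz aliases: 25 kHz is above the
# Nyquist frequency (fs/2 = 22.05 kHz), so the sampled data is
# indistinguishable from a tone at fs - 25000 = 19.1 kHz.
fs = 44100
n = np.arange(fs)       # one second of sample indices
f_in = 25000            # above Nyquist

x = np.sin(2 * np.pi * f_in * n / fs)

spectrum = np.abs(np.fft.rfft(x))
peak_hz = int(np.argmax(spectrum))  # 1 Hz bins for a 1 s window
print(peak_hz)  # 19100
```

This is exactly why the ADC needs the anti-aliasing filter mentioned upthread: anything above fs/2 that reaches the converter folds back down into the audible band.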

The fact is, simply distributing music in lossless format carries the vast majority of audible improvements. Arguing over whether it's 24-bit or 16-bit, or making a chunk of sound last 5.2 microseconds instead of 22.67, seems incredibly stupid to me, because you're better off simply improving the mix itself than fiddling over such microscopic differences. These things only become relevant if your mix and performance and recording equipment (or synths) are absurdly close to perfection. This becomes even LESS relevant in an age of indie musicians.

The sampling theorem is for static signals and perfect filters. Turns out, music isn't static. Once you have transients in the program, you need higher bandwidth or you will end up with phasing effects (time domain aliasing.) This is plain from the math!

Filters are also not perfect (but good oversampling filters are not the weakest link)

Further, even perfectly dithered 16 bit data can't go 20 dB below the quantization floor, unless you give up on frequency response on the high end. Again, this is plain math.

With a calibrated 105 dB low-distortion sound system, in a quiet room, I can hear imperfections from 16 bit, 44 kHz material, especially in soft flutes and triangle type percussion. Of course, D class amplifiers, and MP3 encoding, do worse things to the signal, so let's start there. But 20 bit, 96 kHz (or at least 64 kHz) are scientifically defensible, when analyzing the math and the physics involved. No snake oil needed!

For an article containing a lot of "well, if you knew signal processing..." there are two fairly major oversights:

1) Any well-designed system is going to have headroom. Period. Even though 48 kHz can theoretically capture the frequencies humans hear, it's always good to have a little wiggle room. This comes into play even more in interactive situations: humans are particularly sensitive to jitter. Having an "overkill" sample rate lets you seamlessly sync things easier without anyone noticing.

2) 192kHz comes with an additional benefit besides higher frequencies: it also means more granular timing for the start and stop of transients. More accurate reverb would be the obvious example. I don't know if the human ear can discern the difference between 0.03ms and 0.005ms but it's something I don't see mentioned often.

1) 48kHz sampling does include headroom.

2) increased sampling rate does not improve timing. This also has been researched in detail (because it sounds like it could possibly be true given that the ears can phase match to much greater granularity than the sample clock). It was found false in practice, and in retrospect, the sampling theorem explains why. The Griesinger link discusses this with illustrations, and provides a bibliography.

To avoid the trouble of digging up the link: http://www.davidgriesinger.com/intermod.ppt

Slides 29-35 address this point.

> it's always good to have a little wiggle room

48kHz already has enough 'wiggle room'. How many people do you personally know that can hear a 24kHz sine tone?

> more granular timing for the start and stop of transients. ... it's something I don't see mentioned often.

Probably because it doesn't make sense. Human ears cannot hear frequencies above 24 kHz, and Nyquist tells us that 48 kHz is enough to completely capture all the detail of a signal at that frequency and below.

  > Having an "overkill" sample rate lets you seamlessly
  > sync things easier without anyone noticing.
You can get the same theoretical benefit by oversampling on playback. And a lot of audio equipment does just that.

  > 192kHz ... also means more granular timing for the
  > start and stop of transients.
Not really, for two reasons -- unless you're talking about glitch music, transients are unlikely to ever be so sudden that the difference between 0.03ms and 0.005ms could possibly matter.

I'm pretty sure that #2 isn't true; signal processing folks will be able to phrase this better than I can, but I think that if you have enough information to capture the waveform at a given frequency, you also have enough information to precisely place it in time - phasing errors are more likely due to quantization error, which is about bit depth, not sample rate. No?

[edited: I was wrong]

This is completely incorrect, by Shannon (http://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_samplin...). The sampling frequency determines the maximum frequency that can be captured, not the temporal resolution. That said, a transient containing higher frequencies will be sharper than a transient that doesn't, but its onset time resolution will not be determined at all by the sample rate.

Said another way, two band limited pulse signals with different onset times, no matter how arbitrarily close, will result in different sampled signals.
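
That claim can be checked numerically. The sketch below shifts an approximately band-limited pulse by one microsecond, a small fraction of a 44.1 kHz sample period, and shows that the sampled values change measurably. (The pulse shape, a Gaussian-windowed sinc with ~15 kHz bandwidth, is my own arbitrary choice for the demo.)

```python
import numpy as np

fs = 44100.0
t = np.arange(-256, 256) / fs  # sample instants around t = 0

def pulse(delay):
    """Approximately band-limited pulse, shifted by `delay` seconds."""
    tau = t - delay
    return np.sinc(2 * 15000 * tau) * np.exp(-(tau * fs / 64.0) ** 2)

a = pulse(0.0)
b = pulse(1e-6)  # onset moved by 1 us, ~1/23 of a sample period
diff = np.max(np.abs(a - b))
print(diff)  # clearly nonzero
```

So a sub-sample shift in onset time does produce a different sample sequence, just as the comment says; no extra sample rate is required to encode it.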

> two band limited pulse signals with different onset times, no matter how arbitrarily close, will result in different sampled signals.

This is true, but different than what I am arguing. You're saying that a listener over time will be able to tell that the two signals differ. I am saying that a listener will be able to determine this at fractional wavelengths.

It's similar to dithering a high dynamic range signal onto a lower bit depth: more than two samples are required for "evidence" of two different signals, while sampling at a high enough rate will tell you this almost instantly.

Again, I don't know if human ears are able to detect this, just that I haven't seen it addressed in these discussions.

I'm not sure what you're getting at.

As a thought experiment, let's consider a pulse that has been band-limited to 20kHz. Are you arguing that the analog output of a (filtered, idealized) DAC would look different depending on whether the dac was running at 44.1kHz vs 192kHz? If so, I don't think many people would agree with you.

Any difference in the "timing" of the output wave would have to come from energy that falls above nyquist of the slower sample rate. So, while I agree with you that the timing would be sharper, this is exactly caused by "higher frequencies", not by some other sort of timing improvement.

> Are you arguing that the analog output of a (filtered, idealized) DAC would look different depending on whether the dac was running at 44.1kHz vs 192kHz?

No. I'm arguing this: take a 44.1kHz signal and upsample it to 192. It's the same signal, same bandwidth and everything. Duplicate the stream and add a 1 sample delay to one of the channels. When you hit play, that delay would be there. If you downsampled the 44.1kHz signals after applying the delay to one of the channels, you would almost hear the same thing. The difference is that you could not detect the difference between the signals until after a few samples. With the 192kHz stream it would be unambiguous after 2.

Remember, Nyquist-Shannon holds if you have an infinite number of samples. If your ears could look into the future then what you say is perfectly correct, but they need time to collect enough samples to identify any timing discrepancies.

You are right.

I think what jaylevitt is referring to is that there is interpolation going on in the DAC. That could mean (I'm no DAC expert, so not sure) that the DAC could guess the start points (of a transient, e.g.) at a finer granularity than the sampling rate would allow.

But the question for me is how exact that guessing is. Correct me if I'm wrong, but that interpolation happens twice: when recording, by the ADC, and on playback, by the DAC.

So a lot of this whole discussion (yeah, finally something about acoustics :) depends on how accurately interpolation works in ADCs and DACs.

This is the core secret of the sampling theorem. It says that if you have signals of a particular type (bandlimited), you can do a certain kind of interpolation and recover the original exactly. This is no more surprising than the fact that you can recover the coefficients of a degree-(N-1) polynomial from any N points on it, though the computation is easier.

It turns out that if you reproduce a digital signal using stair steps you get an infinite number of harmonics, but _all_ of them are above the Nyquist frequency. The frequencies below Nyquist are undisturbed. When you then apply a lowpass filter to remove these harmonics (after all, we said at the start that the signal was bandlimited), you get the original back unmolested.

Because analog filters are kinda sucky (and because converters with high bit depth aren't very linear), modern ADCs and DACs are oversampling— they internally resample the signal to a few MHz and apply those reconstruction filters digitally with stupidly high precision. Then they only need a very simple analog filter to cope with their much higher frequency sampling.
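
A minimal sketch of that interpolation, using a truncated ideal (sinc) filter rather than the carefully designed polyphase filters real converters use, so the result is approximate by construction:

```python
import numpy as np

fs = 48000.0
f0 = 1000.0
n = np.arange(512)
samples = np.sin(2 * np.pi * f0 * n / fs)  # a sampled 1 kHz tone

def sinc_interp(x, t_sec):
    """Band-limited interpolation at an arbitrary time (seconds).
    Truncated to the available samples, hence approximate."""
    k = np.arange(len(x))
    return np.sum(x * np.sinc(t_sec * fs - k))

t_mid = 255.5 / fs  # midway between two sample instants
reconstructed = sinc_interp(samples, t_mid)
true_value = np.sin(2 * np.pi * f0 * t_mid)
err = abs(reconstructed - true_value)
print(err)  # small: the between-samples value is recovered
```

The value between the sample points isn't "guessed"; for a bandlimited signal it's fully determined by the samples, up to the truncation error of the filter.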

But at a given sample rate, if I'm sampling at bit depth 2, doesn't that quantization error end up temporally shifting the sine wave I'm reconstructing?

> I don't know if the human ear can discern the difference between 0.03ms and 0.005ms but it's something I don't see mentioned often

That's the time it takes sound to travel 8mm. Do you think you could tell if an instrument was positioned differently by 8mm?

The ears distinguish directional audio in part from timing differences in what hits each ear.

http://en.wikipedia.org/wiki/Sound_localization cites http://web.archive.org/web/20100410235208/http://www.cs.ucc.... that suggests the brain is sensitive to timing differences between ears as low as 10 microseconds, or 0.01ms.

It's not the timing differences, it's the phase differences. The ear is exceptionally sensitive to phase differences between the ears below 1kHz. This information is captured exactly (to well beyond the naive precision of the sampling clock) for any frequency below Nyquist.
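
This is straightforward to demonstrate: a 10 microsecond interchannel delay, well under the ~22.7 us period of the 44.1 kHz sample clock, survives sampling and can be recovered from the phase difference. (The 500 Hz tone and FFT-based phase measurement are just one convenient way to show it.)

```python
import numpy as np

fs = 44100.0
f0 = 500.0
delay = 10e-6  # 10 us, far below one sample period
n = np.arange(4096)

left = np.sin(2 * np.pi * f0 * n / fs)
right = np.sin(2 * np.pi * f0 * (n / fs - delay))

win = np.hanning(n.size)
spec_l = np.fft.rfft(left * win)
spec_r = np.fft.rfft(right * win)
k = int(round(f0 * n.size / fs))  # bin nearest 500 Hz
phase_diff = np.angle(spec_l[k]) - np.angle(spec_r[k])
recovered = phase_diff / (2 * np.pi * f0)
print(recovered)  # very close to 1e-05 seconds
```

The sub-sample delay comes back out of the sampled data essentially exactly, which is the "well beyond the naive precision of the sampling clock" point above.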

0.03ms is 33kHz - you can't, no matter how much you want to, make a granular timing that is faster than at least one cycle of the frequency you are using. 0.005ms is 200kHz BTW.

This isn't true. Sample a bandlimited impulse. The exact timing is encoded into the gibbs oscillations of the signal. So long as you have a high enough SNR you can have timing as precise as you want. (and because the ear doesn't work with ultrasonics— it is itself bandlimited— it uses the same phenomena for timing)

humans are particularly sensitive to jitter.

Humans are sensitive to jitter, but jitter isn't a major problem with modern digital electronics and reclocking strategies. This ArsT thread hashed out these issues a couple of months ago: http://arstechnica.com/civis/viewtopic.php?f=6&t=1164451...

What I would love to have is: independent instrument/vocals tracks along with a default recommended "mix". The default mix would be used for normal playback and independent tracks would be great for custom mix / karaoke etc.

Is this too unrealistic to expect? Has something like this been tried before?

Trent Reznor / Nine Inch Nails has done it several times: http://www.ninremixes.com/multitracks.php

Plenty of other artists have as well, but this is the most high profile example I can think of. I agree it would be great if it happened more often.

I'm not a huge NIN fan, but Trent is truly awesome when it comes to digital music. You can add excellent mastering and dedicated surround mixes too..(rec: Social network soundtrack). Also a former oink'er.

The Beatles multi-tracks are also available (although they were only recorded 4-track, so not every instrument always has its own track), and there have been a handful of artists who have released the samples of one song for remix competitions (Daft Punk, Royksopp, Booka Shade).

There are two reasons I don't think this will happen:

1. People would use the tracks to create custom remixes which they would then distribute. What happens when a remix becomes more popular than the original track? Artists generally have to pay other artists to remix their songs (usually via royalties).

2. Creativity. When an artist creates something they want you to hear it the way it was intended. Allowing you to remix it however you like takes away a lot of the creative control from the artist.

Regarding remixing: artists usually don't "pay" each other, but return the favor, if that's the right way to put it. E.g. artist A remixes a song of artist B, and artist B in turn does the same for artist A. Or if they are all on the same record label, artist A does a remix for artist B and later B makes a collaboration with A. I've noticed this among electronica/EDM artists at least.

And another important remark: some artists are flattered when someone asks them to make a remix for their song. (Imagine you're an artist and your idol asks you to make a remix of his song.)

True. I still think point 2 stands though. If your idol or another artist you respect, asks to remix your song, you may be fine with that. But if every person that buys (or pirates) your song can remix it you might be less happy about it.

I write music and have considered releasing separate tracks so people can freely remix it but I prefer just having mixes that are controlled by me. Allowing another artist you know to remix your track still allows you some sort of control (you know their style so have some idea of how the remix will go). Giving up that control is a big step and, I think, an unnecessary one.

> Giving up that control is a big step and, I think, an unnecessary one.

Why do you need that control? Someone creating something new with your work, doesn't seem to damage your work in any way.

I don't think there is really a logical answer, it's more of an emotional thing. Perhaps it is related to my personality (I like to be in control), in that case other musicians might feel differently.

That was predicted (and suggested) by Glenn Gould some forty years ago. At the time, anything with higher fidelity than, say, a bad telephone connection was analogue, and we were just stepping into the world of quadraphonic sound (which soon died in the analogue kingdom). But he was a big proponent of the listener as participant (hey, it was Toronto, and McLuhan was still around) and was convinced that technology was the only limiting factor at the time. (To put things in perspective, he was also very much anti-concert: he hated what he called the "non-take-two-ness" of live performance.) Let's just say that the idea was no more popular among artists then than it is now.

I'd love that too... damn, should have put it in the article...

Closest I've found is to take the .mogg files out of the Guitar Hero games and use those to make new mixes. :-)

Many of the groups I listen to do do this -- this is certainly not that rare. Sometimes they go for a bit more money by releasing a separate CD with karaoke tracks, for example, but at least if you want it, it's available.

Of course, you can very easily get just the vocal track by subtracting the two. Sometimes the "non-vocal" track will still include backing vocals or the like in appropriate places, and just pull out the main vocal track.

Some musicians have even released every single track of their work separately; see "Desperate Religion" on http://en.wikipedia.org/wiki/Trilogy_(ATB_album) , intentionally inviting remixes.

For music where the vocal tracks aren't released separately, you can often pull them out nevertheless. The best is if you can get the audio in 5.1 -- vocals are almost always center-panned, which makes extracting them quite easy.
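
The subtraction trick is simple enough to show in a few lines. The "instruments" here are just sine tones (an assumption for the demo); the point is that anything identical in both channels cancels in L minus R:

```python
import numpy as np

fs = 44100
t = np.arange(fs) / fs

vocal = np.sin(2 * np.pi * 440 * t)         # center-panned: same in L and R
guitar = 0.5 * np.sin(2 * np.pi * 220 * t)  # panned hard left
bass = 0.5 * np.sin(2 * np.pi * 110 * t)    # panned hard right

left = vocal + guitar
right = vocal + bass

side = left - right  # the center-panned vocal cancels
vocal_residual = np.max(np.abs(side - (guitar - bass)))
print(vocal_residual)  # effectively zero
```

Real mixes are messier (stereo reverb on the vocal, non-identical channel processing), which is why the 5.1 center channel gives much cleaner results than L-R on a plain stereo mix.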

Some songs, when released as a single, have a cappella and instrumental versions as well. There are also compilations with only a cappellas and compilations with only instrumental versions of the songs.

And when you have them, just use something like Ableton Live and that will be it. I think that's what you mean, right?

It would be a great idea to have tracks released as several 'layers' so that the user can choose which of them to play and which not; for example the bass/beats layer, a layer with melodies, a layer with the percussion, and a layer with the vocals, of course. But that sounds like semi-studio production.

I have to say that was probably the most comprehensive treatment of the issue of sample rates I've ever come across. I'm not going to make the mistake others have of claiming falsehoods (all the ones I've read so far have been debunked to my satisfaction by the HN users; I'm impressed, guys).

As pointed out, mastering has a vastly greater effect on the audio quality (and is often pretty poor[1]), and is the reason vinyl records can often sound better than their digital counterparts, despite being an inferior technology[2]. The DAC used also has a massive effect on the sound once you get into decent-quality equipment.

Like the author, i'd also love to see some expansion of mixed-for-surround music.

[1] a lot because of loudness wars, as pointed out in the post, but also just due to a lack of time/care/love(/demand?).

[2] http://www.hydrogenaudio.org/forums/index.php?showtopic=6175... This thread explores the bit-depth of vinyl records, beginning with a claim of a maximum 11-bit resolution-- limited by the width of a PVC molecule the record is made from.

My hearing has declined over the years, to the point where audiophile gear is a complete waste of money. For example, I can no longer hear the difference between a cassette tape and an LP. I still listen to and enjoy music all day, but no longer worry at all about the sonic quality of it.

My advice to you younger guys is to keep the windows rolled up while driving. I have no other explanation for why my left ear is much worse than my right.

This is a really convincing article that makes me want to set up a double blind test for myself with my own equipment.

In my own tests I believed that I couldn't tell the difference between 16/44 and 24/96 on high-quality loudspeakers, but that I could with high-quality headphones. The studies cited all seem to use loudspeakers in testing.

Also worth noting, the article states that obtaining 24/96 source material sometimes means you get better mastered material, which still sounds better after down-sampling back to 16/44.

You weren't just imagining things. The difference between a 44kHz and a 96kHz sample rate is very noticeable even with mediocre audio equipment. It's an overstatement to refer to the situation as a "hi-fi case". 16 vs. 24 bits, however, makes no difference at all except in the size of the material.

I know a bit about sound engineering, waves and so on. I totally agree with the title and the first 60 lines of the article, and I'll add my POV: 1. Most people don't care, 2. What Apple did is just marketing, 3. Most of the people who say they care are pretending, 4. Zeppelin still rock the shit in a poor-quality mono mp3 recorded by a drunk guy in the audience of a concert in '73.

I do care, but I'm not the average user. Apple has always catered well for those in audio and video, up to professional levels. These are markets that retain Apple users, even when Steve Jobs was between Apples. It seems like Apple is only requesting masters to come in a higher resolution, not that consumers will generally end up with these. I think this is entirely fair since before you want to modify something (e.g. to remaster it for iTunes) you want to start off at a good quality high resolution.

That said, if Apple also allows high-quality recordings to be sold, it will be useful. For example, a cappellas, instrumental tracks or samples would be convenient for others who want to remix them, and iTunes would be a platform for this trade.

Also for tracks DJs play. Most compression throws away a lot of the bass that people can't hear, but this is bass you can feel rumbling through your guts on a big sound system, and it's part of the experience.

For the rest, they were happy with low rate AAC files on the early iPods, they are happy with the sound coming from their crappy little iPod dock, for them it won't make a difference as long as it's a chart music track from a memorable and impressionable time of their life.

In normal listening conditions and for most people the difference between 16/44 and 24/192 is inaudible.

Given a 5 minute song, if I have the choice to download an 11MB file (320kbps MP3) or a 330MB file (24/192) I would of course choose the 11MB file. The sound quality is perfectly acceptable and the file size much more convenient to manage (storage, backups, etc.).

In terms of the convenience of managing the file size and sound quality I think 320kbps MP3 is the best compromise.

Here's a file size comparison of a 5 minute stereo song:

MP3 128kbps > 5 MB

MP3 320kbps > 11 MB

Uncompressed 16/44 > 50 MB

Uncompressed 24/192 > 330 MB
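The uncompressed figures check out as raw PCM arithmetic (rate x bytes per sample x channels x seconds, reported here in MiB); a quick sketch:

```python
def pcm_size_mib(rate_hz, bit_depth, channels=2, seconds=300):
    """Uncompressed PCM size of a 5-minute stereo track, in MiB."""
    return rate_hz * (bit_depth // 8) * channels * seconds / 2**20

print(f"16/44.1: {pcm_size_mib(44_100, 16):.0f} MiB")   # ~50
print(f"24/192:  {pcm_size_mib(192_000, 24):.0f} MiB")  # ~330
```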

When talking about sound quality there is a much more relevant issue: the amplitude compression (distortion) abuse used by mastering engineers and producers that totally destroys the dynamics and life of the sound. That is a real issue. When buying a song there should be two versions to choose from:

A) "Loud", dynamically destroyed / distorted version.

B) Normal, dynamic, non-distorted version.

Today only version A is available to buy.
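What "dynamically destroyed" means in practice: pushing the signal into a brick-wall limiter to win loudness adds harmonics that weren't in the original. A numpy sketch, using hard clipping as a crude stand-in for a real limiter:

```python
import numpy as np

fs = 48_000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1_000 * t)        # clean 1kHz test tone

# "Louder" version: drive 1.5x into a hard brick-wall limit at +/-1.0
limited = np.clip(1.5 * tone, -1.0, 1.0)

spec = np.abs(np.fft.rfft(limited)) / (fs / 2)
freqs = np.fft.rfftfreq(fs, 1 / fs)

# 3rd harmonic at 3kHz, absent from the clean input tone
h3 = spec[np.argmin(np.abs(freqs - 3_000))]
print(f"3rd-harmonic distortion level: {h3:.3f}")
```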

But then for every 10 people like you there is 1 person who is willing to pay 20x as much so they can get a "higher fidelity" product.

For a producer and manufacturer the rational approach would be to cater to that craziness and extract as much money from it as possible. In other words, if you are selling HDMI cables, spend $2/cable to make them, sell most for $5, then re-brand some and sell them for $500. It only takes 1 out of 100 people buying that to make the same profit. You know these people are obsessed and irrational, so you cater to that. And that's basically how we end up with ridiculously overpriced Monster cables and recordings distributed to customers @ 192kHz.

Agreed, that market exists. My point is, why discuss the subtle difference between 16/44 vs 24/192 when there are far more audible and damaging practices going on in the music industry? For example, aggressive compression and brick-wall limiting, which add distortion to achieve maximum loudness ('loudness wars').

I mostly agree with the article in the context of distribution of a final mix. However, the article ignores one glaringly obvious reason to distribute in 24/192 format: to allow the listener to be a participant in the creative process, enabling better results for amateur musician listeners who want to sample or remix the audio or for DJs to get better results when altering the tempo for beat matching one track with another, etc. Of course, if you're going to do that, you might as well distribute in a multi-track format instead to maximize flexibility for the end user (Want to sing karaoke? Just turn off the lead vocal track for playback).

Yeah, and if bandwidth/storage is at all an issue, the 6x size bloat from 24/192 pays for 6 separate tracks. (Actually more, because multitrack is more losslessly compressible while 24/192 is less.) If you're already providing multitrack then 24-bit audio would make sense... otherwise, meh.

There is no harm in releasing higher quality uncompressed or loss-less tracks. At the worst they will bring in some new customers, such as myself, that currently will not buy music online. Why would I pay $10 for an album as a highly compressed download when I can pay the same price for the CD and rip it to FLAC myself? I realize I am in the minority here, but as CDs phase out even more, there has to be some other way for consumers to obtain high quality versions of tracks.

Footnote: you don't have to have a >$10,000 setup to benefit from higher quality tracks (compared to the downloads that sometimes have 'questionable' quality). I have two systems: a full-range stereo (front left and right) setup for nearfield listening at my desk that's +/-1dB from 50Hz-20kHz, and a stereo setup in my media room, a 2-way quarter-wave transmission line, +/-3dB from 40Hz-20kHz. The point is, there are a lot of people with less than $1200 in audio gear who still want lossless tracks made available. Who cares if the human ear can't discern much of the extra information; we still want it.

A few years ago I became really interested in recording music. I had been writing a little with a friend, using whatever crap equipment we could afford, the results weren't amazing but we were having fun and staying focussed on the music itself.

Then we started recording other people. I became obsessed with gear, software and all the associated toys that go with any technical pursuit. I'm a programmer, so it's easy to understand how that happens, but I totally lost sight of the music, spent way too much money on equipment that was nowhere near required, and generally lost the plot. I was tracking everything at 24-bit/96kHz and bemoaning the loss of quality when I mixed down for CD.

Anyway, the TL;DR version of what followed was that we recorded quite a bit, lost interest in making our own music and then the whole adventure came to an end. Now my gear is leaving via eBay and I'm finding my way back to just playing guitar and trying to write good music.

24-bit/192kHz - pointless. Give me a small venue and a guy with an acoustic guitar any day.

This is a good article; however, the guy who has been pushing this for years and years now is a man called Dan Lavry. In fact he wrote a very good, rigorous explanation a few years back, in a very readable and well-written form.


Minor nitpick

> The FLAC file is also smaller than the WAV, and so a random corruption would be less likely because there's less data that could be affected.

At the same time, if you flip a bit in a WAV file, you may hear a "pop" sound. In a FLAC file, the whole encoded block may be corrupted (or worse).

Hearing is a time-domain thing, not a frequency-domain thing. It's the frequency response of all the frequency components added together. People might not be able to respond well to a single high-frequency tone, but might respond well to a combination of tones.

No. It's both.

The basilar membrane is a loosely tuned resonator. The hair cells placed on it fire beginning on the positive zero crossing. So, to a first approximation, the ear is in fact a filterbank.

There is a time domain component in that the cochlear nucleus contains nerve cells that watch multiple hair cells at a time and correlate the firing in several different ways. Some attempt to discriminate pitch, some convolve and correlate in-phase firing energy, some look for tones to end, etc. This information is then forwarded on to the brain.

However, getting back to your point, no hair cells will fire if the basilar membrane doesn't move, and it's tuned to a frequency range.

I find mp3 and aac compression artifacts to be monstrously irritating. I have no idea how the majority of the world seemingly can ignore them.

Further, I can hear a difference between 44.1kHz and 96kHz. Whether you can hear that difference is up to you. (The word-length is a red herring - there's no new information contained in a 24-bit recording vs 16.)

IMO anything less than flac and you're missing something. Higher sampling frequencies do add to the sound, but in a way that is almost invisible to the untrained ear. Perhaps these should be distributed at a premium the way SACDs and similar "audiophile" formats were in the past?

So, presuming we take this example:


The key to reproducing the original signal from the digital signal is a low-pass filter that rejects everything above half the sampling rate, correct?

That is to say, what I am getting at is that while the original signal can be reproduced, it requires properly tuned, and probably reasonably high-performance, hardware to remove the higher-frequency components of that square wave. Can you count on consumer-grade hardware to do this well?

Yes, that's basically it. They do this _exceptionally_ well, in fact.

Typically the technique used inside a DAC is to digitally upsample the signal (by duplicating samples, often to a few MHz, which also allows the use of a low bit-depth DAC), then apply a very sharp "perfect" digital filter to cut it right at the proper passband (half the sampling rate). The analog output then contains only a tiny amount of ultrasonic aliasing, which is so far out that it's easily rolled off by simple inductance in the output.

This isn't just theory. Here is a wav file I made at a 1kHz sampling rate, where every other sample is -.25/.25: http://people.xiph.org/~greg/1khz-sampled.wav (so a 500Hz tone, the highest you can represent with 1kHz sampling).

Feeding that file to a boring resampler (I used SSRC, but anything should give roughly the same result, at least when not quite so ridiculously close to Nyquist; most will attenuate near-Nyquist data extensively), I get this: http://people.xiph.org/~greg/1khz-sampled-to-48khz.wav

Here are the two signals plotted against each other: http://people.xiph.org/~greg/1khz-to-48khz.png

As you can see, the 500Hz sinewave is reconstructed perfectly. (Of course, a 500Hz square wave would not be; you'd get a sinewave out. But that is because a 500Hz square wave contains energy far beyond the Nyquist frequency of 1kHz sampling.)

Here is a spectrograph of the same signal http://people.xiph.org/~greg/1khz-to-48khz-spec.png showing that the tone is indeed pure (the faint background noise is the dither the resampler applies when requantizing its high precision intermediate format back to 16 bits).
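The reconstruction being demonstrated can be sketched in a few lines of numpy. This is textbook Whittaker-Shannon (sinc) interpolation rather than the SSRC resampler used above, and it uses a hypothetical 450Hz tone slightly below Nyquist, since a tone exactly at Nyquist is a degenerate case for a finite sum:

```python
import numpy as np

fs_in, fs_out = 1_000, 48_000    # sample rates from the demo above
f = 450.0                        # test tone just below Nyquist (500Hz)

n = np.arange(200)                            # 0.2s of input samples
x = 0.25 * np.sin(2 * np.pi * f * n / fs_in)

# Whittaker-Shannon reconstruction: y(t) = sum_n x[n] * sinc(t*fs_in - n)
t = np.arange(int(0.2 * fs_out)) / fs_out
y = (x * np.sinc(t[:, None] * fs_in - n)).sum(axis=1)

# Compare against the ideal continuous tone away from the edges
# (truncating the infinite sinc sum causes error near the ends).
mid = slice(len(t) // 3, 2 * len(t) // 3)
ideal = 0.25 * np.sin(2 * np.pi * f * t)
err = np.max(np.abs(y[mid] - ideal[mid]))
print(f"max reconstruction error: {err:.5f}")
```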

Your question is somewhat amusing. A standard CD player uses a 1-bit DAC (it's either on or off) at a yet-higher frequency to achieve better linearity. Filtering is quite easy in the analog world.

I was under the impression that two inaudible high frequency tones could interfere with each other to create an audible interference pattern. (I think known as a "beat frequency").

If this is the case, then all of the arguments in the world about the maximum audible single frequency are irrelevant. Imagine music composed entirely of these beat frequencies and performed with a pair of oscillators between 25kHz and 35kHz. Without higher resolution encoding, it would be audible IRL but the recording would be silence.

If the beat frequency is audible, it will be on the recording. Obviously.

That would suppose that the recording device precisely matched the orientation of the listener, and that the recording was not created digitally (in multi-track fashion, for example). There would have to be air space for the interference pattern to set up in.

So you'd be right if your mics were head spaced and in the venue. But you'd still have secondary data, with the original lost.

> But you'd still have secondary data, with the original lost.

By that standard, the original is always lost unless you have a completely holographic recording. 192kHz doesn't help with that problem at all.

Alas, they don't; you can easily demonstrate this for yourself. Start up an audio editor and generate tones at 25k and 28k (make sure you can't hear them individually; otherwise you have severe distortion screwing up your test), then play both at once. You will not hear a 3kHz tone.

The tone you get from an acoustic beat is not a real tone; it's a perceptual quirk that requires you to be able to hear the tones in the first place.
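This is easy to verify numerically: the sum of two sines contains no spectral energy at their difference frequency, so there is nothing at 3kHz for a linear playback chain to reproduce. A numpy sketch using the frequencies from the comment above:

```python
import numpy as np

fs = 96_000                      # high enough to represent 25/28kHz tones
t = np.arange(fs) / fs           # one second, so FFT bins are 1Hz apart
sig = np.sin(2 * np.pi * 25_000 * t) + np.sin(2 * np.pi * 28_000 * t)

spec = np.abs(np.fft.rfft(sig)) / (fs / 2)   # normalized amplitude spectrum
freqs = np.fft.rfftfreq(fs, 1 / fs)

def level(f_hz):
    """Amplitude at the bin nearest f_hz."""
    return spec[np.argmin(np.abs(freqs - f_hz))]

print(f"25kHz: {level(25_000):.3f}  28kHz: {level(28_000):.3f}  "
      f"3kHz beat: {level(3_000):.2e}")
```

The real tones show up at full amplitude while the 3kHz "beat" bin is at numerical-noise level; any audible difference tone has to come from nonlinearity (in the equipment or in the ear), not from the signal itself.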

I tried this in Audacity, with the project set to 96kHz and two tones at 25kHz and 28kHz. I couldn't hear either of the tones individually, but I could hear a tone when played together. This is on Windows 7 with the sound card configured for 24-bit/48kHz. Am I running into resampling artifacts somewhere in the chain?

EDIT: it turns out Audacity won't generate a tone above 20kHz (the UI accepts the value, but when you reopen it the value has been rounded down), so both of my generated tones were actually 20kHz.

I tried this with 19kHz and 20kHz. Couldn't hear any of them on their own, but very clearly together.

EDIT: You can generate higher than 20kHz by increasing the pitch of a tone lower than 20kHz. Upon doing this, I could hear 24kHz and 26kHz together.
