I must say I get rather irritated when people spend time worrying about dubious 'tweaker' methods to improve their audio, when the most under-performing component of most people's sound equipment also has the lowest-hanging fruit: The room itself.
When people ask me where they should spend money to improve the quality of their hi-fi or home theater system, in nearly every case my response will be something like "get a thicker rug" or "put something on this wall to absorb sound reflections, even if it's just a bookshelf."
Beyond that, I'd tend to say something like "stop being so paranoid about what you think you can't hear, and enjoy the damn music."
> stop being so paranoid about what you think you can't hear, and enjoy the damn music.
I'm a composer who works in film/games. I can assure you this is exactly what I'd like people to do when they listen to my music. I spend 99% of my time trying to create good musical ideas, and I spend 1% of my time getting the mix down. I get criticized (rightly) for this quite a bit, but it is hard to care about someone sitting in a >$10,000 labyrinth of sound equipment when I'd rather write a catchy tune.
Then again, when I write sheet music I have to endure some of the most soul-crushingly awful midi sequencing in order to check my work, so perhaps I'm too tolerant to terrible sound quality. Still, I'd rather people listened to the music, not the sound of it.
> A good sounding catchy tune is something worth spending that little bit more time on.
But in reality most of your customers want a ton-frakk of compression, loudness and filters on that catchy tune so it ultimately sounds HUGE on the tiniest phone, radio and car speakers... so all that audiophile mixing and dynamics are completely lost anyway.
Agree one-hundred percent about the room (although the prescription isn't always as simple as "get a thicker rug" etc).
The other issue regarding high-frequency sound reproduction is that in most cases, the loudspeaker won't be outputting much beyond 22-25 kHz (assuming very good quality loudspeakers, cheap consumer grade units might struggle to hit a -6 dB point at 18 kHz) and even for the speakers that have usable output at that range, the directivity at those frequencies will be so narrow that your head will have to be locked in the perfect "sweet spot" to hear anything.
I might sound/be stupid for asking, but what's the actual physical response from something at 22 kHz+? I have a hard time picking up a pure sine > 17 kHz. I doubt I'd get any aural response from anything at 22 kHz, so what's the deal?
The deal is just that you're getting older. Your ears just don't work as well as a 12-year-old's. Neither do anybody else's your age (within the bounds of typical human variation - probably well over 95% of us _never_ heard 22kHz, no matter _how_ "young" our ears were).
I was once in a small, treated room working with some rather large PA speakers. I was curious how far my hearing range actually extended, and did something very unwise: I played a 20kHz tone and very briefly ramped the volume up and down. I definitely heard it, but I also induced quite a lot of pain. I learned two lessons: 1. my threshold of hearing at 20kHz is near or above the threshold of pain, and 2. don't do that ever again.
Yeah I get that I'm getting older. It's just, what's the point of having a stereo that gives perfect playback at 22 kHz if you can't hear it? I'm guessing there must be something since people buy gear like that, or is it just a case of deranged audiophiles?
You might not hear a pure 22kHz sine, but any sound from, say, a harpsichord will have plenty of these highs, and some think they are a part of the sound that one feels without actually hearing it. I'm not endorsing this view; sound is like wine tasting: a lot of hand-waving and little solid ground.
Acoustic "beat tones" aren't "real" tones; you hear them because of non-linearities in the ear-brain system, but you have to hear the initial tones first. (Well, unless you're talking >>130dB SPL levels where the air starts becoming non-linear, but then lower frequency recording would capture it fine.)
If you could hear subharmonic beats from ultrasonics then it would be _very_ easy to demonstrate, alas.
IIRC, linearity is when you put a sound wave frequency into the medium (air) at some point and you can predict the frequency of the sound wave at some other place using a linear function, meaning that there is no distortion. Non-linear is when the physics of the medium starts screwing with that function.
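A quick numerical sketch of that distinction, for anyone who wants to poke at it. It assumes two made-up ultrasonic tones (24 kHz and 25 kHz) and a hypothetical squared distortion term standing in for a non-linear medium; it is an illustration, not a model of air or of the ear.

```python
import math

FS = 192_000              # sample rate in Hz (assumed; high enough for both tones)
N = 1920                  # 10 ms of samples, so every tone below fits a whole number of cycles
F1, F2 = 24_000, 25_000   # two illustrative ultrasonic tones


def tone_level(signal, freq):
    """Amplitude of `signal` at `freq`, measured with a single-bin DFT."""
    re = sum(s * math.cos(2 * math.pi * freq * n / FS) for n, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq * n / FS) for n, s in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)


# Linear medium: the two tones simply add (superposition).
linear = [math.sin(2 * math.pi * F1 * n / FS) + math.sin(2 * math.pi * F2 * n / FS)
          for n in range(N)]

# Hypothetical non-linear medium: the same signal plus a small squared term.
nonlinear = [x + 0.1 * x * x for x in linear]

diff = F2 - F1  # the 1 kHz difference frequency
linear_level = tone_level(linear, diff)
nonlinear_level = tone_level(nonlinear, diff)

print(f"1 kHz content, linear sum:    {linear_level:.6f}")
print(f"1 kHz content, with x^2 term: {nonlinear_level:.6f}")
```

In the purely linear sum there is no energy at the 1 kHz difference frequency at all; it only appears once something non-linear is in the chain.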
I agree. It should be about the performance, not the sonics. There are plenty of old Motown and even Beatles recordings with distorted vocals, bad edits, etc. Your brain passes right over them because of the emotional content of the music.
That's because they focussed on the most likely end-user experience:
>While Motown shortened song to fit into radio time, the company also produced records specifically with car radio audio quality in mind. Motown recording engineers set up car speakers in the studio so that they could simulate and perfect how a song would sound emanating from a car radio
- what's the point of engineering things to a set of conditions virtually none of your target audience possesses?
> My advice to people who ask how to make their system sound better? Buy some music you enjoy more…
Good advice, but you do need some baseline quality equipment to start with.
Got my car with one speaker blown out, speakers wired semi randomly (left-right and front-rear faders don't work as they should), also powering line-in source from cigarette lighter results in funky background noise. Sounds great--when a good tune is playing and I'm able to recognize it ;-)
Hum, my experience (I was sound engineer in a previous life) is that the first thing to fix bad sound is to flatten the equalizer and to remove bass enhancer. Then I'd put the speakers on a solid table in a relative symmetry regarding the listener, while checking they have the correct phase. All the rest is rarely necessary.
Sure, but that's a rather contrived example-- most people have a fairly normal room, and the average joe would be best served by getting a good pair of speakers, and a reasonable amplifier and DAC, before worrying seriously about room acoustics.
I'd even skip the DAC (I mean, I might not...). Just get a decent amplifier (and I just mean decent, like something from the 70s or 80s that still works) and a decent pair of tower speakers (needn't be expensive), and, well, just don't use the shittiest cables you can find (as long as they are thicker than a few human hairs you're okay). It'll sound far, far better.
You might be surprised how much of a difference EQ can make. As an experiment I once used 12 bands of parametric EQ to adjust the speakers in a cheap, old LCD monitor. Sure, you're not going to get any better bass response than before, but stereo imaging (nonexistent before EQ, perfect phantom localization after), spectral balance, clarity, distortion (due to not exciting resonances in the monitor), etc. were significantly improved. Most people could have added an EQ'd subwoofer to those tinny LCD speakers and been completely satisfied.
One of the many things EQ can't fix, of course, is room reflections, which can be helped by room treatments and speakers with a directivity better suited to the room.
Of course EQ can help with room reflection! In a square room you'll have a resonance at a given frequency, and you can mitigate this problem a bit with an EQ. But usually EQs are used to add bass and do more harm than good.
By room reflections I mean higher frequency reflections that result in comb filtering and spatial and temporal smearing of the sound, rather than lower frequency resonances that result in the standing waves you mention. EQ can reduce the effect of room resonances, but it still can't fix the extended decay time at those frequencies.
DRC (digital room correction) can improve this, but only within a small sweet spot.
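For the low-frequency resonances mentioned above, a rough sketch of where they land. It uses the textbook axial-mode formula f = n·c/(2L) and a hypothetical 4 m × 4 m × 2.5 m room; real rooms also have tangential and oblique modes, which this ignores.

```python
SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 °C


def axial_modes(length_m, count=3):
    """First `count` axial standing-wave frequencies (Hz) along one room dimension."""
    return [n * SPEED_OF_SOUND / (2 * length_m) for n in range(1, count + 1)]


# Hypothetical room dimensions in metres (an assumption for illustration).
room = {"length": 4.0, "width": 4.0, "height": 2.5}

for name, size in room.items():
    modes = ", ".join(f"{f:.1f} Hz" for f in axial_modes(size))
    print(f"{name:>6} ({size} m): {modes}")
```

In a square floor plan the length and width modes coincide, so the same frequencies get reinforced twice, which is one reason square and cubic rooms have such a bad reputation.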
I should point out that after room treatment, my next recommendation tends to be an amplifier with Audyssey MultEQ in-room calibration. I've never heard a listening environment that didn't sound unambiguously better with it enabled.
Yes. Because one of the settings shown after the full room color sweeps is dB, and the sweeps will often set various speakers a bit quieter. (The goal of Audyssey calibration is to eliminate frequency resonance hot spots in common listening positions, as well as get flat response from your speaker setup.)
It is said that in most cases 192 kbit .mp3 is indistinguishable from >192, and blind tests support that. Granted, there are instruments like castanets which make it easier to hear the difference. In general though, I can't distinguish 128 from 192 and I listen to music a lot. Also it's unlikely that my hearing is already damaged because I try to keep volume low.
But I've noticed that where I put the speakers makes a huge difference. I can easily tell the difference from speakers on the floor versus speakers on my desk. Where I'm at the moment also matters a lot. If I lie on the floor, floor speakers don't sound as bad anymore.
In the end, I use headphones. Midrange Audio Technica ones, and I'm probably already overpaying a bit. But I bought them for build quality and comfort, and I wasn't disappointed. I can wear them for hours (not healthy, I guess, but I'm used to wearing them even with no music playing). Headphones have the advantage that it suddenly stops mattering where your speakers are and where you are relative to them.
This effect isn't sooo surprising seeing as it even occurs with dumb mono guitar cab speakers and is very, very, VERY clearly audible there, even just moving your head a few cm in or out of the cones' axis.
> Stop being so paranoid about what you think you can't hear, and enjoy the damn music.
Yes. Do a couple of blind tests with your acoustic system first.
> It's true enough that a properly encoded Ogg file (or MP3, or AAC file) will be indistinguishable from the original at a moderate bitrate.
Disagree. This claim seems to be ungrounded compared with others.
I can believe the limitations with bit depth and sampling rate (although I'll take the chance to test myself if I get near a good enough acoustic system). However, I definitely could discern in a blind test whether the music I listened to was stored in a lossy format at a reasonable bitrate. It's usually quite audible with rock music that involves cymbals.
There's a specific "bug" in the mp3 encoding scheme which means that you get a pre-echo effect on fast attack waveforms. It's inherent in the encoding, so it can't be eliminated (although the higher the bitrate, the less obvious it is IIRC). If you know how to listen out for it then you'll spot it immediately.
AAC / Ogg don't have that limitation & at high enough bitrate should be indistinguishable from the source in a blind listening test, as demonstrated in a number of Hydrogen Audio listening tests down the years, unless of course you're using crappy encoders at which point all bets are off...
(Really, LAME is very good indeed these days. I eventually decided that I was going to get with the program and just encode all my CDs (backed up to flac files) as mp3 for portable listening. It's good enough, and I've decided not to listen for the pre-echo artifacts so that I won't notice them :) )
Well, actually it was that long cymbal sounds “fade” quicker and just sound different with lossy music.
IIRC I distinguished an mp3 encoded by iTunes at a bit rate of 192 or 256 kbps from its original in Apple Lossless (both played on the same cheap acoustic system). I probably should test with AAC or Ogg, too. Although I have a feeling that it's pretty much impossible to keep those high-frequency-rich cymbals intact while keeping a compact file size.
> I've decided not to listen for the pre-echo artifacts so that I won't notice them :)
You're much better at controlling your mind. =) Once I verified that the difference is audible even on cheap speakers, I couldn't switch back to lossy formats. It means constantly wondering if that's how it's supposed to sound or not…
That's, by the way, why Apple's idea of having ‘Mastered for iTunes’ label IMO is worthwhile—at least you can be sure that mastering engineer listened to it this way. =)
Might be interesting to try AAC or Ogg Vorbis & see if they're any better. In these days of ever increasing cheap portable storage carrying a bunch of flacs around isn't quite as nuts as it used to be of course.
(Cymbals seem to be a particular bugbear for mp3 encoding; cymbal-heavy tracks tend to suffer the most from obvious encoding artifacts once you know what to listen for.)
Please refer to my comment above. Yes, it was a blind test. The person helping me might've looked up bit rates, so it was not a double-blind experiment, but I could not (nor did I want to) see what's being played, and relied only on hearing.
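For anyone wanting to judge a blind test like this, a minimal sketch of the usual sanity check: how likely is the score if the listener were just guessing? It assumes independent trials with a 50% chance each (the standard ABX setup); the function name and the example scores are made up for illustration.

```python
from math import comb


def abx_p_value(correct, trials):
    """Chance of getting at least `correct` right out of `trials` by pure guessing."""
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


# Hypothetical scores, not results from anyone in this thread.
for correct, trials in [(7, 10), (9, 10), (14, 16)]:
    print(f"{correct}/{trials}: p = {abx_p_value(correct, trials):.4f}")
```

By this measure 7 out of 10 is still quite consistent with guessing (p is about 0.17), while 9 out of 10 is not (p is about 0.011). More trials make the result much harder to argue with.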
A thousand times this. I never cease to be amazed at the number of people who will vocally argue the benefits of solid-silver wundercable, but who've never heard of mirror points or bass traps. $20,000 hifi systems in rooms with bare wooden floors and bare concrete walls. Subwoofers in untreated cubic rooms. People praising the transient response of their PMC MB2s in a room with chronic flutter echo. It's utterly dispiriting.
There's a lot of scientific-sounding content in this, but unfortunately most of it couldn't be further from the truth. I'm an ex-audio engineer and studied digital and analog audio engineering; this has been debated to death over the last 15 years.
Digitally recording a triangle is the best example of why 48kHz is very limiting. The distinct sound of the triangle consists of a high fundamental frequency, ballpark 5kHz, and of many very high-pitched harmonics. Most of these harmonics are above 20kHz. The harmonics are what makes it sound like a triangle, not the frequencies below 20kHz. This is why the triangle is one of the hardest instruments to digitally record. It always sounds like crap.
In theory, it's true that the human ear can't hear above ~18kHz, but it can hear the influence of the very high pitch harmonics on a lower frequency.
I'm an audio engineer, too, and I agree that this has been debated to death. And I agree that frequencies above the threshold of hearing are more important than standard dogma (based on Nyquist theory combined with Pure tone audiometry) allows. It helps explain how audio gear with a 100kHz bandwidth sounds clearer than gear with a 20kHz bandwidth even when they measure the same in the audible band.
Have you read the Audio Technology magazine interview with Rupert Neve?
Greg Simmons: Geoff Emerick, the famous British Producer ?
Rupert Neve: Yes, he started me off on this trail. A 48 input console had been delivered to George Martin's Air Studios, and Geoff Emerick was very unhappy about it. It was a new console, made not long after I had sold the Neve company in 1977. George Martin called me and said, "please come and make Geoff happy, while he's unhappy we can't do any work".
They'd had engineers from the company there, and so on. The danger is that if you are not sensitive to people like Geoff Emerick, and you don't respect them for what they have done, then you are not going to listen to them. Unfortunately, there was a breed of young engineers in the company ( I hasten to say this was after I sold it !) who couldn't understand what he was bitching about. So they went back to the company and just made a report saying the customer was mad and there wasn't really a problem. Leave it alone, forget it, the problem will go away. They were acting like used car salesmen. I was very angry with it. So I went and spent time there, at George Martin's request, and Geoff finally managed to show me what it was that he could hear, and then I began to hear it, too.
Now Geoff was The Golden Ears - and he still is - and he was perceiving something that I wasn't looking for. And it wasn't until I had spent some time with him, as it were, being led by him through the sounds, that I began to pick up what he was listening to. And once I'd heard it, oh yes, then I knew what he was talking about. We measured it and found that in three out of the full 48 channels, the output transformers had not been correctly terminated and were producing a 3dB rise at 54kHz. And so people said, "oh no, he can't possibly hear that". But when we corrected that problem, and it was only one capacitor that had to be added to each of those three channels, I mean, Geoff's face just lit up! Here you have the happiness/unhappiness mood thing the Japanese were talking about.
Stripping frequencies above 20kHz negates the effect on the lower frequencies since those lower frequencies are not "modified" by the higher ones. The human ear can actually hear the very high harmonics when they're combined with a lower fundamental frequency.
For example, the human ear will hear a 30kHz frequency if its fundamental is 10kHz. If it's played at 44.1kHz, the 30kHz frequency is gone and all you'll hear is 10kHz, not a "different sounding" 10kHz.
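To put numbers on the 10 kHz / 30 kHz example: a small sketch of the Nyquist limit and of where a too-high tone would fold to if it were sampled with no anti-alias filter. The helper names are made up; whether the missing 30 kHz partial is audible is exactly what the thread is arguing about, and this does not settle it.

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2


def alias_frequency(freq_hz, sample_rate_hz):
    """Where a tone ends up after sampling if nothing filters it out first."""
    folded = freq_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


fundamental, third_harmonic = 10_000, 30_000

for rate in (44_100, 96_000):
    limit = nyquist_hz(rate)
    kept = "kept" if third_harmonic <= limit else "must be filtered out"
    print(f"{rate} Hz sampling: Nyquist {limit:.0f} Hz, 30 kHz partial {kept}, "
          f"unfiltered it would appear at {alias_frequency(third_harmonic, rate)} Hz")
```

At 44.1 kHz the 30 kHz partial is above the 22.05 kHz limit, so a properly filtered recording simply drops it; sampled without the filter it would fold down to 14.1 kHz, which is worse than losing it.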
Basically, if you produce two ultrasonic frequencies, they will create an interference pattern at a much lower frequency than either of the individual frequencies. Modulate a signal on the difference between two signals, and you can create a directional speaker, since ultrasonic sounds tend to be highly directional (so long as the diameter of the transducer is greater than 1/2 wavelength, which is almost guaranteed with ultrasonic signals). This is how the "sound cannons" that are being deployed for crowd control work.
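A back-of-the-envelope sketch of the half-wavelength point, assuming sound travels at 343 m/s and picking 40 kHz as a hypothetical ultrasonic carrier (the comment does not say which frequencies such devices actually use).

```python
SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 °C


def half_wavelength_mm(freq_hz):
    """Half the wavelength of a tone in air, in millimetres."""
    return SPEED_OF_SOUND / freq_hz / 2 * 1000


# 40 kHz is an assumed carrier; 1 kHz stands in for an ordinary audible tone.
for freq in (40_000, 1_000):
    print(f"{freq:>6} Hz: transducer must exceed {half_wavelength_mm(freq):.1f} mm")
```

A transducer only a few millimetres across already meets the condition at ultrasonic frequencies, whereas a 1 kHz tone would need a driver well over 15 cm wide to beam in the same way.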
That article describes heterodyning, which happens because ultrasonic frequencies at high amplitudes interact nonlinearly with air. You are not going to see that effect with sound waves generated near the audible spectrum, and normal loudspeakers are going to generate ultrasonic sound waves.
Yes, but the effects of interference patterns between multiple ultrasonic frequencies are the same, and definitely do affect the audible spectrum. This is why we must filter the square wave that comes out of a DAC. And the limitations of filters (phase shifts and roll-off) are why modern CD players oversample the signal--so that the filtering can be performed well beyond the audible spectrum.
> Yes, but the effects of interference patterns between multiple ultrasonic frequencies are the same, and definitely do affect the audible spectrum
has nothing to do with this:
> This is why we must filter the square wave that comes out of a DAC
The only reason that square waves "must" be filtered is to reduce the potential of damaging tweeters. If you want to record a square wave with the purpose of later reproducing the square wave, then you don't want to filter it - once you filter it, it's no longer a square wave.
OK, if you say so. I think you're misunderstanding a fundamental concept of digital to analog converters. But if you think it's just to prevent blowing your speakers, that's OK.
The reason that square wave sucks is because it introduces tons of high frequency content (your amp probably won't reproduce the high frequency content anyway, so I don't think most Japanese consumer amps will damage your speakers--that is, the amp will act like a filter anyway). That high frequency content then creates alias effects (think of moire patterns when looking at super high-res photos that are scaled down without anti-aliasing). Those alias effects sound like shit to the human ear.
The point of filtering is to anti-alias the resulting analog signal after conversion from digital to analog. The point of upsampling is to move that filter well beyond the audible range, so you can use a 1st-order filter (gentle slope, but it introduces no phase effects). The fact that a square wave hurts your speakers is inconsequential--the amp will effectively filter the signal anyway. Unfortunately, it will filter the signal without anti-aliasing, which introduces those nasty interference patterns within the audible spectrum (that is, if you feed a straight 44.1KHz sampled square wave to your speakers without upsampling/filtering).
Recording music is supposed to be a snapshot (with room for interpretation) of the composition at play.
Trying to record an edge case like this is the same as recording in a room with bad acoustics. So you end up with some weird (but not faithful) representation of the sound which is a snapshot of the microphone's characteristics and directionality of the ultrasonic tones. It's not reasonable to assume any microphone will behave exactly like a human ear. Even if you could, you're going to have to mimic the tiny random movements a normal person would make listening to a sound, movements which would definitely impact the perception of the sound, because microphones are much more stationary than any human would be.
The "different sounding" argument two posts above is silly, because sound is almost never that monochromatic, and if it is, it's usually boring. Also I don't understand how missing out on an odd order harmonic would be a bad thing :) The reality is none of these arguments are based in a reality of what people would hear, and because of that, the arguments aren't practical.
In reality, 20 bits at 48kHz (or 64kHz) would be more than acceptable for even the most discerning of ears and probably the most practical in terms of space and fidelity, but it'd be a weird format to distribute in.
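For comparing formats like these, a rough sketch using the textbook figure for the dynamic range of ideal quantization with a full-scale sine (about 6.02 dB per bit plus 1.76 dB) and the raw stereo PCM data rate. It ignores dither and noise shaping, so treat the numbers as ballpark only.

```python
import math


def dynamic_range_db(bits):
    """Theoretical SNR of an ideal `bits`-bit quantizer with a full-scale sine."""
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)


def pcm_kbps(sample_rate_hz, bits, channels=2):
    """Uncompressed PCM data rate in kilobits per second."""
    return sample_rate_hz * bits * channels / 1000


formats = [(44_100, 16), (48_000, 20), (192_000, 24)]
for rate, bits in formats:
    print(f"{bits}-bit / {rate} Hz: about {dynamic_range_db(bits):.1f} dB, "
          f"{pcm_kbps(rate, bits):.1f} kbps stereo")
```

On these assumptions 20-bit/48 kHz stereo costs roughly a third more data than CD audio, while 24-bit/192 kHz costs more than six times as much.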
> Basically, if you produce two ultrasonic frequencies, they will create an interference pattern at a much lower frequency than either of the individual frequencies.
So the interference pattern will be made up of one low frequency sound and higher frequency harmonics. Once again the higher frequency harmonics are redundant, because you only need to record the lower frequency sound.
The only possible way ultrasound can be picked up by the ear is if the ear has a non-linear response to the input sound. Going by the information in the article linked, it is highly unlikely that any significant non-linearity exists in the ear.
It's definitely possible for two sounds to be indistinguishable when played separately, but when played together it is revealed that they are in fact different (see link below). Whether this applies for sounds with frequencies above 20kHz I don't know.
I'd like to see a citation as well. Doesn't seem like it would be the hardest experiment to set up either.
Me and my brother would sing at each other in certain tones such that we created harmonics in both our ears. It wasn't pleasant, but it was interesting. Regardless, I'd smash my equipment if it made harmonics like that.
If that is true, surely in your upthread example of recording a triangle, the "impact on lower than 20kHz frequencies" would already have happened during the recording process in between the triangle and the microphone, and would have been captured perfectly on recording equipment that's proven capable of capturing everything below 20kHz? So we'd "hear" the effect as part of the recording instead of requiring it to happen in our listening room…
Yes, but if you sample the frequency to create a step wave, then neglect to filter the results, you will end up reproducing tons of high frequencies. That is why we need to filter the output for signals >20KHz...to remove these harmonics that result from reproducing the square wave.
Of course, filters aren't perfect, and result in phase shift and roll-off. So we over-sample the signal to create a signal with a much higher frequency than 20KHz, so that the filtering occurs well outside the audible band, allowing us to filter out all of these harmonics without affecting the desired signal.
Basically, the end result is that by sampling the signal, you are introducing high frequency content that must be removed prior to playback. This high frequency content is one of the reasons old CD players from the 80s and 90s cause "listener fatigue", although I have no sources to back up that last statement.
Yup... people need to get very clear in their heads the difference between the recording/sampling/mixing/mastering stages, where high bitrate/width/gear/knowledge is helpful, and playback, which is a completely different thing.
(Not for eatmyshorts; you get this, I gather.) Everyone gets that "upsampling" can't add detail to a recording, right? You can't get more than you've got... no matter what you do. There is no magic. You upsample so you drive harmonics generated in the digital-to-analog process during playback further up in the spectrum, so when you get to the analog stage you can use a nice gentle analog filter to filter them out. Without the upsampling, you need a nasty steep analog filter to filter them out, and that can have audible (or at least measurable) side-effects in the audible spectrum.
eatmyshorts - correct me if I mis-stated any of that please....
You got it 100% correct. You upsample simply to move the frequency of the analog filter higher, with a gentle rolloff (and ideally a 1st order filter, so you introduce no phase effects) to get your final signal.
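A small sketch of why the filter can be gentler after upsampling. It assumes the first unwanted spectral image of a tone at f sits at (sample rate − f), and that the digital interpolation filter in the oversampling stage has already removed the images below the new rate; the 8x factor is just an example.

```python
import math


def first_image_hz(signal_hz, sample_rate_hz):
    """Lowest spectral image of a tone produced by the digital-to-analog step."""
    return sample_rate_hz - signal_hz


def transition_octaves(passband_edge_hz, sample_rate_hz):
    """Room (in octaves) the analog filter has between passband edge and first image."""
    image = first_image_hz(passband_edge_hz, sample_rate_hz)
    return math.log2(image / passband_edge_hz)


BASE_RATE = 44_100
PASSBAND_EDGE = 20_000  # top of the band we want to keep untouched

for factor in (1, 8):   # 8x oversampling is an assumed example
    rate = BASE_RATE * factor
    print(f"{factor}x ({rate} Hz): first image at "
          f"{first_image_hz(PASSBAND_EDGE, rate)} Hz, "
          f"{transition_octaves(PASSBAND_EDGE, rate):.2f} octaves of room")
```

Without oversampling the analog filter has to go from "pass" at 20 kHz to "stop" at 24.1 kHz, about a quarter of an octave; at 8x it has roughly four octaves to do the same job.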
Well, the superposition principle only holds in linear media. Sound waves can propagate in linear media, but they can also propagate in nonlinear media, and any medium that can carry sound will go nonlinear at sufficiently high amplitudes.
Yes, overtones exist, and yes, overtones affect the sound, and yes, if you filtered the sound to remove overtones in the audible range then it would sound different. However, if you remove overtones outside the audible range then it will not make an audible difference (this is what xiphmont was saying in TFA).
So no, your wikipedia link is not a citation for the claim that cmer made.
No, Holosonics is not creating sound from beating, they are using heterodyning, which takes advantage of how high-amplitude ultrasonic sound waves interact with the atmosphere, that's different from beating.
I know where you're going with this I think, and I'm not disagreeing outright, but wouldn't this be captured during the high-bitrate (or good analog?) recording and mixing phase if the recording/mixing/mastering engineer were doing things right? At least, as well as possible?
> The article's about distribution, not recording. I don't think anybody disputes the usefulness of higher sampling rates when recording.
Didn't read the article, so commenting out of context, however it needs to be said that in sample-based music genres the distributed music gets used as if it were a recording. Maybe then it could be argued that higher sampling/bit rates should be available, if only for those who are sampling.
> In theory, it's true that the human ear can't hear above ~18kHz, but it can hear the influence of the very high pitch harmonics on a lower frequency.
That may well be true. But those mixed-down harmonics that are heard "live" would then be captured by the 16/44 (or whatever) sampling. IOW, the recording captures what you heard. Those upper harmonics have no emergent properties. Their effect is captured.
I'm no sound engineer, but as far as I can tell, the main point of that paper is that some instruments produce harmonics at frequencies greater than 20kHz, not that these frequencies matter to humans. However, section X references other papers that apparently make this claim.
Just because it is difficult to record a triangle does not necessarily mean it is impossible to accurately recreate the sound (to human ears) using 48kHz.
> I'm no sound engineer, but as far as I can tell, the main point of that paper is that some instruments produce harmonics at frequencies greater than 20kHz, not that these frequencies matter to humans. However, section X references other papers that apparently make this claim.
Yes, you're right.
In fact, some of the section X references don't even mention hearing; they talk about "alpha-EEG rhythms" (in this case "listeners explicitly denied that the reproduced sound was affected by the ultra-tweeter") and "bone-conducted ultrasonic hearing" through the "saccule" ("organ that responds to acceleration and gravity and may be responsible for transduction of sound after destruction of the cochlea").
In fact, most of the claims of the article are around the fact that there is energy over 20kHz and how it can affect the recording process.
This is a well-known fact, and this is exactly why engineers filter out subsonic and ultrasonic frequencies, especially today: stuff that you can't hear (or feel) will just suck your headroom and make you lose the loudness war.
The only "good sounding" triangles you'll hear are those buried in a mix. Alone, it always sounds weird and "muted".
EDIT: Listen to the triangle at the beginning of Rush's YYZ. It's an old recording, but it sounds significantly worse than the analog version. It's been digitally mastered some time ago so if it was mastered today, it would probably sound better, but still not great. I heard a rumor that Rush is remastering all their albums "for iTunes" at the moment, so hopefully we'll be able to compare soon!
Yes, but our ears only hear 20Hz-20KHz. So, according to Nyquist theory, you can recreate the entire signal that the human ear hears by recording those sonic artifacts that result from interference between ultrasonic harmonics.
So while it's true that the human ear can't hear well above ~18KHz, and the interference between high order harmonics is audible, it's also true that a properly recorded signal, sampled at 44.1KHz, oversampled, and filtered, can reproduce the exact signal the human ear is capable of hearing. At least according to theory.
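The "according to theory" part can be tried out numerically. This is a minimal sketch of Whittaker-Shannon (sinc) reconstruction of a 10 kHz tone from 44.1 kHz samples, evaluated halfway between two samples. It uses a finite window of 1001 samples, so a small truncation error remains; a real DAC approximates this with filters rather than summing sinc pulses.

```python
import math

FS = 44_100   # CD sample rate in Hz
F = 10_000    # test tone in Hz, below the 22.05 kHz Nyquist limit


def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def reconstruct(samples, t_in_samples):
    """Whittaker-Shannon interpolation: one sinc pulse per sample, summed."""
    return sum(s * sinc(t_in_samples - n) for n, s in enumerate(samples))


samples = [math.sin(2 * math.pi * F * n / FS) for n in range(1001)]

t = 500.5  # halfway between samples 500 and 501, in the middle of the window
estimate = reconstruct(samples, t)
actual = math.sin(2 * math.pi * F * t / FS)
error = abs(estimate - actual)

print(f"reconstructed {estimate:+.4f}, actual {actual:+.4f}, error {error:.5f}")
```

The value between the samples is recovered to within the truncation error of the finite window, even though there are fewer than five samples per cycle of the tone.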
The human ear is capable of detecting sound pressure as well as sound intensity, and while playback of the interference between harmonics can be reproduced faithfully in the sound intensity realm, the sound pressure levels will differ, and it is theorized that people may be able to tell the difference between the two. However, as far as I am aware, nobody has been able to demonstrate this reliably in practice.
What about sound outside 20-20k that affects us via mechanisms other than being directly sensed in the air by our ears? For instance, consider frequencies below 20 Hz that we can feel with our feet as vibrations in the floor, instead of hear with our ears? Or what about the possibility of sound above 20k causing a vibration in something other than our ears, which could have a subharmonic in 20-20k that gets conducted to our ears via bone?
I'd prefer recording technology to err on the side of capturing what we need to reproduce all of that, even if we aren't sure that we need it.
The article doesn't suggest only using 48kHz for recording and mixing. I don't think the author would disagree that recording triangles is difficult. He would argue that once you've decided what final audible frequencies you want to present to the listener, the best way to distribute them is at 16-bit 44.1/48kHz. It's a compelling case.
That's one thing I find concerning with the move to digital. With analog media, you can go back, re-record and get an improved result (provided the source is good) but District 9 (which was shot on Red One) will never have improved quality other than resampling because the source is set to a particular digital format with associated data quality.
There seems to be some strange idea that analogue means 'infinite detail'. In this particular case, there's no significant difference between being limited by the original digital recording resolution and the grain size of a film recording.
"[...] provided the source is good" is begging the question; it's no different from saying "District 9 could be better if they hadn't recorded in 4k (or whatever the Red One was using) and downsampled it for my DVD" The nature of the source is irrelevant, barring the fact that film might provide a higher resolution, if film scanning technology increases, and you can afford to both capture on film, process and store your film properly (archiving film is rather difficult, I believe), and get the best quality digitisation possible.
Obviously, I am not claiming infinite detail. There is going to be a limit based on the grain and the size of the film (35mm, Super, IMAX). Film shot on 65mm is going to be of higher quality than what digital is capable of today.
While I have no doubt that digital will eventually catch up and surpass film, there is inevitably going to be a transition period where quality films were recorded (let's just say at 2k) where the input is constrained and extrapolation will be the only available option.
4k is the current state of the art. It will not be so forever and because it's recorded at 4k, we can't go back and extract more dynamic range due to the limitation of the sensor. Whereas you can go back, redigitize an IMAX film (say Chronos shot in 1985) that is in good condition and get way more info than something shot on 4k yesterday.
TL;DR IMO input still absolutely matters. 35mm is not the upper limit. We went through this with photography and are now doing the same with video/film.
EDIT: After thinking more about it, here's a more extreme example. I purchased a Kodak DC20 back in the 90s (early adopter yay!). Even if the camera had decent glass, there's no way I can go back to an image captured by that camera and magically get the equivalent of a 22mp 5D camera by resampling. If I had used a film camera, I could get a much improved scan.
EDIT2: Here's a good example. Slumdog Millionaire was mostly shot on an SI-2K which recorded at 2k. You can't go back and get 4k output on the digital portions. So generations later, we will be stuck enjoying an Academy Award winning film at that level of quality.
And we'll never be able to go back and "re-film" "The Texas Chainsaw Massacre" on 35mm; it'll forever be marred by the grain and poor low-light performance of 16mm. I guess I'm not sure what your point is. The best digital can present is currently worse than the best film can present, yes. That doesn't mean we shouldn't use it.
My original response was to the effect that the output should be high quality so that data is preserved if sampled.
Digital is the future. Hence it behooves us to have the maximal input & output possible at this time. Unfortunately, this is not common now and the price paid is that content created during this period will be stuck at the same quality level.
I'm entirely in favour of increasing the resolution/bit-depth for video, but I think the general problem is more complicated by external factors.
The cost of renting a Red One and recording straight to digital, versus hiring a film camera, a processing lab, and all the other parts needed, quite possibly means that some films might never have been produced at all if film had been the only option.
What measure of quality can compare X against X, if it was never made?
I imagine (I have very little actual experience here, so it's perfectly possible I'm wrong) that digital recording might make it easier/cheaper to retake shots/scenes repeatedly to get them right as well, offering another 2nd order quality effect.
I completely disagree with the article having heard the difference many times myself. You can't record at 192kHz and hope to keep the same quality by distributing the final mix in 44.1kHz. It just doesn't work that way.
I don't think I understand quite what you're saying and wondered if you could explain more. You and the article both say that humans can't hear above about 20kHz. If there are higher frequencies that create a harmonic at a lower frequency (e.g. a 33kHz harmonic that produces a sound at 16.5kHz) then surely that lower harmonic (16.5kHz in this case) will be recorded by the original recording equipment assuming it is recording at a frequency at least twice that of the highest audible frequency (let's say that this would be 48kHz, although there might be other DAC-related reasons to go higher).
Let's make things super simple. Let's say you record 4 sine waves at a 192kHz sampling rate: 15kHz, 30kHz, 45kHz and 60kHz. All 4 frequencies will be captured and the 15kHz frequency will sound different to your ear because of its harmonics.
If you take this recording and master it for a CD (44.1kHz), you'll effectively get up to ~20kHz (since there's a low-pass filter starting at around 16-18kHz). This means that only our first frequency will be captured: 15kHz. It will be exactly the same as if you had recorded 15kHz alone. The harmonics don't modify the fundamental frequency, they just trick the human ear. But when they're gone, they have no effect whatsoever.
Hope this helps!
EDIT: the frequency numbers I used are actually somewhat of a bad example. Harmonics are never exactly double, triple the fundamental. Those would be mostly inaudible. But you get the idea.
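To make the four-sine-wave example concrete, here is a minimal Python sketch. It assumes an idealized brick-wall low-pass at the Nyquist frequency (half the sample rate), which simplifies a real mastering chain, and the function name `surviving_partials` is made up for illustration.

```python
def surviving_partials(partials_hz, sample_rate_hz):
    """Return the partials that fit below the Nyquist frequency.

    Assumes an ideal brick-wall low-pass at sample_rate / 2; real
    anti-aliasing filters start rolling off somewhat earlier.
    """
    nyquist_hz = sample_rate_hz / 2
    return [f for f in partials_hz if f < nyquist_hz]


partials = [15_000, 30_000, 45_000, 60_000]
at_192k = surviving_partials(partials, 192_000)  # Nyquist is 96 kHz
at_cd = surviving_partials(partials, 44_100)     # Nyquist is 22.05 kHz
print("192 kHz keeps:", at_192k)
print("44.1 kHz keeps:", at_cd)
```

Under that assumption only the 15kHz tone is left in the CD master, which is the situation the comment describes.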
I don't think I understand how it could sound different to my ear. My understanding is that my ear doesn't have the sensory equipment to detect signals above ~20kHz - this is what I was told at university, and a decent trawl of the web suggests this is still true. If there is any sound that is in the range 20Hz-20kHz then why doesn't the microphone pick it up?
Or am I wrong, and the ear is able to detect frequencies above 20kHz?
The second half of the statement is wrong, but the first half is right. Harmonics in real-world instruments are not usually exact multiples of the fundamental. A simple diffeq model of a rigid oscillator will show you this mathematically.
An extreme example is present on modern pianos, where the high rigidity of the loud, heavy piano strings can cause tuners to stretch the lowest and highest notes as much as a half-semitone so that their harmonics are in tune with the note the next octave down or up. In other words, the first harmonic on the lowest note of a piano can be as much as 1/2 of a note sharp.
And when your oscillator is no longer one-dimensional, most harmonics aren't even close to integer multiples. The harmonics of bells, cymbals and drums are all over the place. That's what gives them their percussive sound. (Edit: some of these modes of vibration aren't harmonics in the linear sense.)
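As a rough illustration of the stiff-string point, a commonly used model puts partial n at f_n = n * F * sqrt(1 + B * n^2), where F is the fundamental of an ideal flexible string and B is an inharmonicity coefficient. The sketch below is Python; the values of F and B are made-up placeholders, not measurements of any real piano, so the size of the result is not meant to reproduce the half-semitone figure above.

```python
import math


def stiff_string_partial(n, ideal_fundamental_hz, inharmonicity_b):
    """Frequency of partial n for a stiff string: n * F * sqrt(1 + B * n^2)."""
    return n * ideal_fundamental_hz * math.sqrt(1 + inharmonicity_b * n * n)


F = 27.5     # hypothetical ideal fundamental in Hz (roughly a piano's lowest A)
B = 0.0004   # hypothetical inharmonicity coefficient, chosen for illustration

f1 = stiff_string_partial(1, F, B)
f2 = stiff_string_partial(2, F, B)

# How far the second partial sits above an exact octave of the first, in cents.
sharp_cents = 1200 * math.log2(f2 / (2 * f1))
print(f"partial 1: {f1:.3f} Hz, partial 2: {f2:.3f} Hz, {sharp_cents:.2f} cents sharp")
```

Any positive B pushes the upper partials sharp of exact integer multiples, and the effect grows with n.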
Then one would be forced to conclude that many instruments have no harmonics at all, which is obviously not what 'harmonic' is referring to in this thread of discussion. Why be pedantic when it's obvious what everyone is talking about?
Anyway, it's not as though mathematical literature requires you to use a term exactly one way. I had a diff eqs textbook that used the word 'harmonic' in exactly the way I used it above when I made reference to diff eqs...
> And when your oscillator is no longer one-dimensional, most harmonics aren't even close to integer multiples. The harmonics of bells, cymbals and drums are all over the place. That's what gives them their percussive sound.
But those aren't harmonics, they're inharmonic partials.
> The harmonics don't modify the fundamental frequency, they just trick the human ear. But when they're gone, they have no effect whatsoever.
This is the part I really do not understand... either my ear CAN pick up those frequencies, maybe the harmonics are "tickling" the little hairs inside my cochlea and ultimately the frequencies I can actually hear were altered in my perception that way - or I can not hear or sense the harmonics and they physically alter the "original" wave that I end up actually hearing.
Either way, pretty much the exact same thing should happen in a studio microphone. Those all do have frequency limitations and AKG, Royer, Rode, Shure, Sennheiser, Audio Tech, what-have-you pretty much all go up to 15kHz or 20kHz according to specs, if I understand them correctly, but not further than that. If it isn't even recorded, those frequencies I also cannot hear can NOT alter my perception so they HAVE to somehow change the frequencies I can hear and are being recorded... on top of that you are making "room" for frequencies up to, say, 60kHz but I very strongly doubt your mics can go even remotely that high.
You clearly have little to no musical background, and think that your basic math skills are a substitute. The overtones present in a cymbal or triangle are not straight multiples of the fundamental, they are chaotic, and are very important in determining the timbre. Anyone (and I mean that) can easily tell the difference between a cymbal with and without a low-pass filter with the threshold around 22kHz, because these "inaudible" frequencies are lost.
The statement that frequencies above 20kHz don't matter rests upon the assumption that the ear is linear. If the ear is not linear (I don't know whether it is or not) then frequencies above 20kHz will matter, as the ear will be able to mix higher frequencies down to less than 20kHz. For example, if we have frequencies of 56kHz and 59kHz, the ear MIGHT be able to discern a difference frequency of 3kHz. No doubt this effect could be reproduced by a signal with a sampling rate of 44.1kHz, but only if the analogue systems, before the sampling stage, reproduce any non-linearity in the human ear.
Incidentally, you can get speakers that create a localised beam of sound that the person sitting next to you cannot hear. They work by transmitting frequencies above the audible range. These high frequencies can be beamformed by a relatively small speaker array, so the sound is localised. They then rely on the non-linearity of the ear (or maybe the air around the ear?) to mix the ultrasonic frequencies down to audible frequencies. I guess there must be non-linearity in the human auditory system!
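The difference-frequency idea can be sketched numerically. The Python below is only an illustration under stated assumptions: the "non-linearity" is a made-up small quadratic term (not a model of the ear or of air), and the 480 kHz simulation rate is chosen purely so that none of the mixing products alias.

```python
import cmath
import math

FS = 480_000   # assumed simulation rate, high enough that no product aliases
N = 480        # 1 ms of signal, so every tone below completes whole cycles


def tone_amplitude(signal, freq_hz):
    """Amplitude of one frequency component, measured with a single-bin DFT."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq_hz * n / FS)
              for n, s in enumerate(signal))
    return 2 * abs(acc) / len(signal)


# Two ultrasonic tones at 56 kHz and 59 kHz, as in the comment.
two_tone = [math.sin(2 * math.pi * 56_000 * n / FS)
            + math.sin(2 * math.pi * 59_000 * n / FS)
            for n in range(N)]

# Hypothetical non-linear system: linear response plus a small squared term.
distorted = [x + 0.1 * x * x for x in two_tone]

amp_linear = tone_amplitude(two_tone, 3_000)
amp_nonlinear = tone_amplitude(distorted, 3_000)
print(f"3 kHz amplitude, linear: {amp_linear:.6f}, non-linear: {amp_nonlinear:.6f}")
```

A purely linear system leaves nothing at 3kHz, while the squared term creates a component there, because the product of two sines contains their difference frequency.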
On the subject of 24 bits, my understanding is that 16 bits is adequate, provided the levels (scaling) are set correctly in the recording. What 24 bits delivers is the ability to do a crappy job of the mixing, and still end up with the full dynamic range of the human ear. 24-bit is probably a temporary solution though, as manufacturers will engage in the usual Loudness War, and push the signal to the top of the dynamic range. Before long 24-bit audio will be equivalent to 16-bit (since the 8 least significant bits will be unused) and the next big thing will be 32-bit audio.
Having said all that, I'd guess that the speakers will be the limiting factor in most sound systems, not the recording format.
In 1975, the Canadian Broadcasting Corporation was using a head shaped microphone, which was presumably an attempt to reproduce the non-linearity of the ear. It would be interesting to do such experiments with digital sampling.
Thinking about it, if every person has a different non-linear response, in theory the only way to reproduce sound beyond a certain threshold of fidelity would be to reproduce the ultrasonic components, so each person would hear their own non-linearity. (That would be beyond what I can hear or care about, but it would be fun to play with. Beyond a certain level we also get to the point where we need to ask what it means to hear a sound.)
> the speakers will be the limiting factor
> in most sound systems
I disagree -- in most sound systems, the room is generally the most limiting factor.
Pardon the reductio ad absurdum, but would you prefer to listen to $1,000 speakers in a dry, padded listening room, or to $100,000 speakers in a tile bathroom? Obviously the room matters; I think most people underestimate by how much.
Another "ex audio engineer" here, you can believe or not at your leisure. Many hours spent in high-end recording and mastering environments.
I'm not sure what your background in audio is, but everything he says is correct. High end frequencies well past 15k and up (22.1k actually) are widely acknowledged to influence the lower frequencies and play a huge role in the perception of the quality of a recording. This is an old debate with pros and cons on both sides, but in general you'll find the "Golden Ears" mastering engineers (Stephen Marcussen, Bob Ludwig, etc.) come down on the side of higher sampling rates.
Now, if your original recording was mastered to 16/44.1, then a transfer by way of 24/192 will probably actually hurt the recording. But if you're mastering from an original analog or high-quality digital, in my experience there's no question, higher sampling rates deliver better experiences.
I have also spent many hours in high-end recording and mastering environments, and it's my observation that most engineers suffer from confirmation bias just like everyone else on the planet.
I've caught engineers using L1-Ultramaximizer (or similar) to bounce a recording down to 16-bit/44.1khz as part of the mastering process, and they're always surprised when they're completely unable to hear the difference even in the most simple cover-the-screen-and-toggle-bypass test.
Audio, perhaps like the wine industry, is a vast bastion of confirmation bias and subjectivity, no argument there.
But I know what my ears hear, and IMO there is absolutely a vast difference between 44.1 and 192. I'm not sure how you can even question it. Someone else on the thread was saying it's impossible to hear the difference between 16bit and 24bit. I don't even know what to say to that. It's like telling me the glass of Gallo "Table Red" you're drinking is as good as my '75 Lafite. All I can say is "cheers" and just enjoy.
If I gave you a bottle of "Table Red" with a '75 Lafite label, I'm sure you'd tell me how rich and wonderful it was. The problem here is that, as you said, "I know what my ears hear". You know you're listening to 192, wow it sure sounds great!
You just need to consider more plausible explanations for the difference you are hearing, such as low-quality sample rate conversion on the playback devices you are using, clock jitter that is less audible at 192k than 44.1k, no dithering on the 16-bit output resulting in quantization noise on quiet sounds, etc.
If you're listening to quiet music in a quiet room at high volumes on very low noise equipment, you can hear a difference in the noise floor level between dithered 16-bit and 24-bit, but at that volume level if that music (or movie) also has full-amplitude signals you'll be reaching peaks over 110dB SPL.
As much respect as I have for Bob Ludwig's hard won mastering skills, he also strongly believes in $n,000/foot speaker cable, which is what he has installed at Gateway. So by all means give him well deserved props, but don't assume he's an expert on all aspects of audio theory or practice.
192 kHz is clearly overkill for listening. Not so for further editing of the data.
Same goes for 16/24 bit, however, the difference between 16 and 24 bit is actually audible.
44100 is not a bad sampling rate, but it necessitates very sharp aliasing filters, which are audibly bad. A bit more headroom is well needed there.
That bit about intermodulation distortion is completely bogus. He talks about problems when resampling high-fs audio data. However, you would never do that. You would digitally process 192kHz all the way. Only your loudspeakers or ears would introduce a low-pass filter, and a rather benign (flat) one at that. There is certainly no aliasing going on there unless you resample (wrongly). Intermodulation distortion is not the fault of the sample rate.
I majored in hearing technology. Calling 192/24 worse than 44.1/16 is total BS. How useful it is is a different debate.
> Modern (i.e. anything from 1995 onwards) DACs do not suffer from aliasing problems.
True, but they do so using (long, high-quality) high-cut filters. And these filters are pretty sharp, as they have to close within, say, 18-22.1 kHz. You can design them as linear-phase FIR filters with oversampling and all the good stuff, but physics dictates that sharp filters introduce distortion. A sharp filter like that is audible.
I'm not aware of any (blind) listening tests actually showing that a modern, high-quality DAC for 44 kHz audio introduces audible distortion compared to a similarly high-quality DAC for, say, 96 kHz audio, though. It's not theoretically impossible that the lowpass would introduce some sort of noticeable distortion, but I haven't run into substantiated evidence that it actually does.
You will still need an aliasing filter that cuts off between, say, 18 and 22.5 kHz to avoid aliasing noise. That is one sharp filter no matter how you look at it. You can use a high quality, long, linear-phase FIR filter, but you can't cheat physics: sharp filters necessarily introduce distortion, and such a sharp filter so close to the hearing threshold does not go unnoticed.
> Same goes for 16/24 bit, however, the difference between 16 and 24 bit is actually audible
No, the difference is not audible at all. At 16 bits of depth on a normal low-level audio signal (~0.3 volts), we're talking about less than 0.000005 volts per amplitude step. This difference gets lost in the THD already at the DAC in your audio output stage. Then it gets lost again in the amplifier. And again in the cable to your speakers or headphones. And then it gets lost again in the speaker elements. What survives in a normal low-level audio signal is about 14 bits of resolution.
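The step-size figure can be checked with a few lines of Python. This is a back-of-the-envelope sketch that takes the ~0.3 V level from the comment as the full range and simply divides it into 2^bits equal steps; the function name is made up for illustration.

```python
FULL_SCALE_VOLTS = 0.3  # the ~0.3 V signal level quoted in the comment


def step_size_volts(bits, full_scale_volts=FULL_SCALE_VOLTS):
    """Voltage per quantization step when the range is split into 2**bits levels."""
    return full_scale_volts / (2 ** bits)


step_16 = step_size_volts(16)
step_24 = step_size_volts(24)
print(f"16-bit step: {step_16 * 1e6:.2f} microvolts")
print(f"24-bit step: {step_24 * 1e9:.2f} nanovolts")
```

The 16-bit step comes out a little under five millionths of a volt, consistent with the figure above, and the 24-bit step is 256 times smaller again.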
> 44100 is not a bad sampling rate, but it necessitates very sharp aliasing filters, which are audibly bad. A bit more headroom is well needed there.
44.1khz IS a bad sampling rate for accurately reproducing anything except a triangle wave or square wave above 5khz.
Why do you think "This difference gets lost in the THD already at the DAC"? Do you have numbers to back it up? What's the noise floor of a DAC? What's the noise floor of an output stage? Do you have the numbers?
High dynamic range is not about the lowest volume you can hear, it's about the voltage resolution between this sampling point and the next. Based on your assumption, we can all see black whether we use 16bit RGB color or 24bit RGB color, so what's the point of using 24bit RGB?
Many years of building audio equipment (in particular analog synthesizers), and equally many years of being meticulously anal with getting the best components for my circuits, reading specifications of down to every single op-amp I've ever employed, is why I think so.
I am not saying that there aren't any DACs on the planet that can handle five millionths of a volt, but I am saying that five millionths of a volt isn't surviving through the particular DACs and the rest of the electronics used in your PC/living room hi-fi audio equipment.
Heh, it's funny to see this late-nineties debate get re-hashed here. Also kind of fun.
If it were true that there's no audible difference between 16 and 24 bit, companies like Alesis, Otari, ProTools, etc. wouldn't have spent the last 15 years ditching 16 bit like an old pair of smelly sneakers. (better metaphors welcome).
Seriously, anyone who has sat down in a real listening environment for 5 minutes A/Bing 16 vs 20 bit, 16 vs 24, etc. hears the difference immediately. There's no question. This is why you can buy ADAT 16 bit 'blackfaces' for $100, down from their original $4,000.
Sure, moving up from 16bit recording was an improvement, but having done engineering for a company listed above for over a decade, I can tell you that we went 24bit/192kHz because of market demand, not for any real technical reasons. We thought it was fairly unnecessary ourselves. It was also kind of an arms race with other companies, much like the megapixel arms race for digital cameras.
Yes, and anyone who has ever sat down in front of an LCD flatscreen watching their favorite movie on DVD/BD using a gold-plated $200 HDMI cable instead of a $4.99 Walmart HDMI cable sees the extra sharpness immediately. This is why non-gold-plated, non-OFC HDMI cables are down to $4.99 a piece from their original $49.99 at introduction.
The difficulty I had is that the same person claimed they could hear the difference between 44 kHz and 96 kHz, when the article (and all other comments which cited outside sources) claims that is well outside of human capability.
That's cute. Obviously you've never recorded a rock band while riding the pre to compensate for 16bit's terrible noise floor and horribly limited headroom. You've never had the joy of ruining a perfectly good take because of that wonderful sound it makes when the volume spikes into digital distortion despite compressing the wazoo out of the input source. Glorious sound, digital distortion. Run a dentist drill through an old Speak & Spell and you'd just about have it.
You've never rented an expensive tube EQ during a mix to cover up 16bit's grating harshness from 10k to 15k. Or tried like mad to make the bass drum sound like a freaking bass drum and not a pie pan slamming against the back of a plastic trash can. And yes, we had good mics and pres, all standard studio stuff. Decent, not brilliant, converters, but it was the 16bit that was the problem. Getting those 20bit XTs for the first time was like walking into the Promised Land.
Sure, there's lots of marketing ploys out there, lots of snake oil. Moving up from 16 bit was not one of them.
> Professionals use 24 bit samples in recording and production  for headroom, noise floor, and convenience reasons.
> Modern work flows may involve literally thousands of effects and operations. The quantization noise and noise floor of a 16 bit sample may be undetectable during playback, but multiplying that noise by a few thousand times eventually becomes noticeable. 24 bits keeps the accumulated noise at a very low level. Once the music is ready to distribute, there's no reason to keep more than 16 bits.
The original article does say that yes, during recording and production, 24 bit audio gives you a lot more room to play with. That doesn't mean that you can hear the difference between 16 and 24 bits for the final recording; just that 24 bits give you more room to keep out of trouble during production.
And you cannot tell the difference. The reason to record using 24 bits is so you don't have to be as precise centering the recording level. If that level is centered then you can capture fine with 16 bits (by the way that is also explained in the article).
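For a sense of how much extra room 24 bits gives during recording, here is a small Python calculation. It uses the simple ratio of full scale to one quantization step, 20 * log10(2^bits), and ignores dither and noise shaping, so treat the numbers as ballpark figures rather than measured performance.

```python
import math


def dynamic_range_db(bits):
    """Ratio of full scale to one quantization step, in dB: 20 * log10(2**bits)."""
    return 20 * math.log10(2 ** bits)


dr_16 = dynamic_range_db(16)
dr_24 = dynamic_range_db(24)
extra_db = dr_24 - dr_16
print(f"16-bit: {dr_16:.1f} dB, 24-bit: {dr_24:.1f} dB, extra: {extra_db:.1f} dB")
```

The roughly 48 dB of extra range is what lets a recording engineer leave generous headroom without pushing quiet passages into the noise floor.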
For those of you who are interested in just how much of a golden ear you truly are: download Harman's "How to Listen" software for Windows or Mac OS X http://harmanhowtolisten.blogspot.com/ (scroll down).
Harman requires its trained listeners to pass tests based on this software before participating in juries to evaluate Harman products. It doesn't directly address the sample rate/bit depth issues discussed in the linked article, but it does address a lot of the issues brought up in the HN discussion, so you can have a chance to see how much those characteristics really matter.
Even without debating the science and signal processing arguments raised...
In any test where a listener can tell two choices apart via any means apart from listening, the results will usually be what the listener expected in advance; this is called confirmation bias and it's similar to the placebo effect. It means people 'hear' differences because of subconscious cues and preferences that have nothing to do with the audio, like preferring a more expensive (or more attractive) amplifier over a cheaper option.
The human brain is designed to notice patterns and differences, even where none exist. This tendency can't just be turned off when a person is asked to make objective decisions; it's completely subconscious. Nor can a bias be defeated by mere skepticism. Controlled experimentation shows that awareness of confirmation bias actually increases rather than decreases the effect!
Doesn't that completely negate his conclusion, that there is no point to distributing 24/192 music? If people want to pay for 24/192, and even he just admitted that they will legitimately enjoy it more, how can you conclude there is no point?
Life is short. I want to enjoy things. Whether or not my enjoyment can be quantified or scientifically defended, I really don't give a shit. But that's okay, if you don't want to sell me 24/192 music, Amazon will. Between this and DRM-free content, it's no wonder I buy all my music from Amazon these days.
There is a perversion going on at both ends here. And by perversion I mean a distortion of truth in a bid to make a profit. This is not the worst that can happen, but it is worth mentioning. You probably put it more mildly, but I am a bit more harsh. Some people are irrational and spend money on stuff that they don't need, and another group of people are perpetuating the lies and the marketing in an effort to extract the maximum amount of money from the first group (in other words, your basic market setup).
Audiophiles are quite a fascinating group. These are people that can be rather rational in some respects (they could be doing research in some lab somewhere) but when it comes to audio equipment they will shell $2000 for HDMI cables. The salesmen and manufacturers that make these things ("high end" HDMI cables, 192kHz recordings) know this very well and they aggregate around this target set of clients.
I think that is exactly what is happening here. At some point storage capacity is just good enough and one can distribute 48kHz, 16bit audio to everyone. But what do you do next? Everyone is getting that and it is not new and cool anymore. What to do? Well, increase the frequency and sell everyone a newer, better, higher fidelity thing, even though objectively human ears cannot really hear the difference. Subjectively though, there is a huge difference. If you ask someone who just spent $50 for a 192kHz record if they like it better than say a $20 48kHz one, I bet you 100% of people will confirm that 192kHz sounds better and will be ready to go and buy more.
> Doesn't that completely negate his conclusion, that there is no point to distributing 24/192 music? If people want to pay for 24/192, and even he just admitted that they will legitimately enjoy it more, how can you conclude there is no point?
Ultimately, sure. The world is full of products and services which only add value in this weak sense.
If the same wine tastes better if it's priced higher, then it still tastes better. But I think it's only honest that the consumer be aware that the increased utility from being priced higher is due solely to the fact of it being priced higher. Beyond that, I don't care.
One thing we can all agree on is that music is much more enjoyable if you think you're listening to it through good equipment or from a good source. Ultimately it's only the `thinking' part that matters. So I would make two points:
1. One point he's making is that playing audio sampled at 192khz through regular equipment actively distorts the music in negative ways. So if you know this now, you should enjoy that music _less_.
2. If you're adept at metacognition (maybe that's not the right word), you'll realize a) you can get most of the enjoyment by buying equipment that's `pretty decent', and then not worry about it too much. b) you're probably fooling yourself by spending so much time/money worrying about having the best equipment, so you're probably not getting the maximum utility from the experience anyway. Or maybe it's the experience of trying to get the best equipment itself that's enjoyable, not necessarily the increased audio fidelity.
That's true. It's kind of the Monster Cable model. BTW, I'm not saying that marketing and whatnot should deceive less technical consumers and trick them into spending more money than they should (which is basically what Monster Cable does). But when you explain to technical people why something like 24/192 isn't better (as other people in this thread have pointed out, this isn't totally accurate in the first place), and they understand what you're saying but still prefer it, by all means, let them buy it.
This is the same reasoning that somebody used when I was debating with her if insurances should reimburse homeopathic and other alternative treatments. Her reasoning was 'well if it works, it should be reimbursed, doesn't matter if it's from a placebo effect or not'; my position is that they shouldn't be reimbursed, but quite honestly, I don't really have a rational reason for it (at first I thought I had but it turned out I couldn't formulate it, which is the same as not having it).
So, while I have no option (for now) but to acknowledge your position, I still feel dirty for doing so.
This article is one of the most lucid and accurate that I have read on this topic.
However, one thing that's missing here (and in nearly all other similar pieces) is a full discussion of the prerequisites of the sampling theorem. For example, the signal must be bandwidth-limited (and no finite-time signal can be).
But this is a minor concern, as there are many elements in the analog domain of the recording and playback chains that serve as low-pass filters - starting with the mics. So bandwidth-limiting is effectively achieved.
For a similar reason, I find the discussion of the "harmful" effect of high frequencies on playback electronics and loudspeakers to be a bit overdone. Peruse the excellent lab results of modern audio gear on Stereophile's web site. You'll find that bandwidths exceeding 30kHz are rare.
One last thing. When doing subjective "testing," keep in mind that what some folks are hearing may be limitations of their gear. For example, most DACs derive their clocks for higher sampling rates (88/96/176/192) by clock-multiplier circuits. IOW, 44kHz and 48kHz are the only ones clocked directly by a crystal. These multiplier circuits are often noisy, contributing to jitter. The audible effect of this jitter is hard to predict.
PS As an avid audiophile, I find the clash of subjectivists and objectivists on this normally-buttoned-down forum to be a bit of a trip.
You always record stuff at 24-bit/192 kHz for many reasons, usually involving minimizing analog artifacts and giving yourself a lot of information to work with. You use 32-bit float wavs to transport stuff around so you don't have to worry about normalizing levels and clipping. Lossless formats drastically improve the quality of transients. But every single objection to this is either ignoring the points of the article, or talking about the benefits of recording at high fidelity, when this entire article is pointing out that once you have _finished a mix_, there is no reason to distribute things in 24-bit/192kHz. Most speakers can't even play above 20kHz anyway, which makes the entire point moot. I don't care if you have a bajillion kHz, the speakers can't play above 20 kHz, so you're screwed.
You're getting two entirely different things mixed up.
192 kHz is the sample rate. 192,000 slices per second. It does not refer to the audible sound spectrum.
20 kHz in speakers refers to the cycles per second of the audible waveform. Normal human hearing range is 20 Hz - 20 kHz. For most people, it's less than that.
A speaker can certainly play back music sampled 192,000 times per second. Most of them can't play tones that are higher pitched than 20 kHz, which is fine because mostly only dogs can hear up there anyway.
I am not getting these things mixed up, because the sample rate is related to the maximum frequency that can be stored, and lo and behold, look at all these people claiming that those higher frequencies matter. A 44.1 kHz sample rate can only encode tones up to about 22 kHz, whereas 192 kHz can encode frequencies of up to 96 kHz, and those people up there are arguing that these higher frequencies are exactly why 192 kHz is superior. Now, if you want to say that sampling a tone 44100 times per second somehow won't sound as good as sampling it 192000 times per second, I'm not saying that isn't possible, but I don't really take that claim seriously at all.
The fact is, simply distributing music in a lossless format carries the vast majority of audible improvements. Arguing over whether or not it's 24-bit or 16-bit, or making a chunk of sound last 5.2 microseconds instead of 22.67, seems incredibly stupid to me, because you're better off simply improving the mix itself than fiddling over such microscopic differences. These things only become relevant if your mix and performance and recording equipment (or synths) are absurdly close to perfection. This becomes even LESS relevant in an age of indie musicians.
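The numbers in this comment follow from two one-line formulas: the highest frequency a sample rate can encode is half that rate (the Nyquist limit), and the time between samples is the reciprocal of the rate. A quick Python check, with function names made up for illustration:

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency a given sample rate can represent (half the rate)."""
    return sample_rate_hz / 2


def sample_period_us(sample_rate_hz):
    """Time between consecutive samples, in microseconds."""
    return 1e6 / sample_rate_hz


for rate in (44_100, 192_000):
    print(f"{rate} Hz: Nyquist {nyquist_hz(rate) / 1000:.2f} kHz, "
          f"sample period {sample_period_us(rate):.2f} us")
```

That gives 22.05 kHz and about 22.68 microseconds for CD audio, against 96 kHz and about 5.21 microseconds for 192 kHz.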
The sampling theorem is for static signals and perfect filters. Turns out, music isn't static. Once you have transients in the program, you need higher bandwidth or you will end up with phasing effects (time domain aliasing.) This is plain from the math!
Filters are also not perfect (but good oversampling filters are not the weakest link)
Further, even perfectly dithered 16 bit data can't go 20 dB below the quantization floor, unless you give up on frequency response on the high end. Again, this is plain math.
With a calibrated 105 dB low-distortion sound system, in a quiet room, I can hear imperfections from 16 bit, 44 kHz material, especially in soft flutes and triangle type percussion. Of course, D class amplifiers, and MP3 encoding, do worse things to the signal, so let's start there. But 20 bit, 96 kHz (or at least 64 kHz) are scientifically defensible, when analyzing the math and the physics involved. No snake oil needed!
For an article containing a lot of "well, if you knew signal processing..." there are two fairly major oversights:
1) Any well-designed system is going to have headroom. Period. Just because 48kHz can theoretically capture the frequencies the human ear can hear, it's always good to have a little wiggle room. This comes into play even more with interactive situations: humans are particularly sensitive to jitter. Having an "overkill" sample rate lets you seamlessly sync things more easily without anyone noticing.
2) 192kHz comes with an additional benefit besides higher frequencies: it also means more granular timing for the start and stop of transients. More accurate reverb would be the obvious example. I don't know if the human ear can discern the difference between 0.03ms and 0.005ms but it's something I don't see mentioned often.
2) increased sampling rate does not improve timing. This also has been researched in detail (because it sounds like it could possibly be true given that the ears can phase match to much greater granularity than the sample clock). It was found false in practice, and in retrospect, the sampling theorem explains why. The Griesinger link discusses this with illustrations, and provides a bibliography.
48kHz already has enough 'wiggle room'. How many people do you personally know that can hear a 24kHz sine tone?
> more granular timing for the start and stop of transients.
... it's something I don't see mentioned often.
Probably because it doesn't make sense. Human ears cannot hear frequencies above 24kHz and Nyquist tells us that 48kHz is enough to completely capture all the detail of a signal at that frequency and below.
I'm pretty sure that #2 isn't true; signal processing folks will be able to phrase this better than I can, but I think that if you have enough information to capture the waveform at a given frequency, you also have enough information to precisely place it in time - phasing errors are more likely due to quantization error, which is about bit depth, not sample rate. No?
This is completely incorrect, by Shannon (http://en.wikipedia.org/wiki/Nyquist%E2%80%93Shannon_samplin...). The sampling frequency determines the maximum frequency that can be captured, not the temporal resolution. That said, a transient containing higher frequencies will be sharper than a transient that doesn't, but its onset time resolution will not be determined at all by the sample rate.
Said another way, two band limited pulse signals with different onset times, no matter how arbitrarily close, will result in different sampled signals.
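A quick way to see this for yourself is the sketch below. It assumes an idealized band-limited pulse (a sinc) and hypothetical helper names, and samples the pulse once on the sample grid and once shifted by a hundredth of a sample period.

```python
import math


def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    if x == 0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


def sample_pulse(onset, n_range=8):
    """Sample an ideal band-limited pulse centred at `onset` (in sample periods)."""
    return [sinc(n - onset) for n in range(-n_range, n_range + 1)]


on_grid = sample_pulse(0.0)
shifted = sample_pulse(0.01)   # one hundredth of a sample period later

# The two sample sequences are measurably different even though the shift
# is far smaller than the spacing between samples.
max_diff = max(abs(a - b) for a, b in zip(on_grid, shifted))
print(max_diff)
```

The shift shows up as small changes spread across many neighbouring samples rather than as the pulse jumping to a different sample slot.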
> two band limited pulse signals with different onset times, no matter how arbitrarily close, will result in different sampled signals.
This is true, but different than what I am arguing. You're saying that a listener over time will be able to tell that the two signals differ. I am saying that a listener will be able to determine this at fractional wavelengths.
It's similar to dithering a high dynamic range signal onto a lower bit depth: more than two samples are required for "evidence" of two different signals, while sampling at a high enough rate will tell you this almost instantly.
Again, I don't know if human ears are able to detect this, just that I haven't seen it addressed in these discussions.
As a thought experiment, let's consider a pulse that has been band-limited to 20kHz. Are you arguing that the analog output of a (filtered, idealized) DAC would look different depending on whether the dac was running at 44.1kHz vs 192kHz? If so, I don't think many people would agree with you.
Any difference in the "timing" of the output wave would have to come from energy that falls above nyquist of the slower sample rate. So, while I agree with you that the timing would be sharper, this is exactly caused by "higher frequencies", not by some other sort of timing improvement.
> Are you arguing that the analog output of a (filtered, idealized) DAC would look different depending on whether the dac was running at 44.1kHz vs 192kHz?
No. I'm arguing this: take a 44.1kHz signal and upsample it to 192. It's the same signal, same bandwidth and everything. Duplicate the stream and add a 1 sample delay to one of the channels. When you hit play, that delay would be there. If you downsampled the 44.1kHz signals after applying the delay to one of the channels, you would almost hear the same thing. The difference is that you could not detect the difference between the signals until after a few samples. With the 192kHz stream it would be unambiguous after 2.
Remember, Nyquist-Shannon holds if you have an infinite number of samples. If your ears could look into the future then what you say is perfectly correct, but they need time to collect enough samples to identify any timing discrepancies.
i think what jaylevitt is referring to is that there is interpolation going on in the dac. that could mean (i'm no dac expert, so not sure) that the dac could guess the start points (of a transient, e.g.) more granularly than the sampling rate would allow
but the question for me is how exact that guessing is.
correct me if i'm wrong but, that interpolation happens twice: when recording by the adc and on playback by the dac.
so a lot of that whole discussion (yeah, finally something about acoustics :) depends on how accurate interpolation works in adcs and dacs.
This is the core secret of the sampling theorem. It says if you have signals of a particular type (bandlimited) you can do a certain kind of interpolation and recover the original exactly. This is no more surprising than the fact that you can recover the coefficients of a degree-N polynomial from any N+1 distinct points on it, though the computation is easier.
It turns out that if you reproduce a digital signal using stair steps you get an infinite number of harmonics— but _all_ of them are above the nyquist frequency. The frequencies below the nyquist are undisturbed. Then you apply a lowpass filter to the signal to remove these harmonics— after all, we said at the start that the signal was bandlimited— you get the original back unmolested.
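The stair-step claim can be checked numerically. This is a rough sketch under assumed parameters (a 64-sample frame, a tone at bin 5, each sample held 4 times); the names are illustrative only. It builds the staircase and lists which frequency bins carry energy.

```python
import cmath
import math


def dft_magnitudes(x):
    """Plain O(N^2) DFT magnitudes, normalised by the length."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) / n
            for k in range(n)]


BASE_LEN = 64   # samples at the original rate
TONE_BIN = 5    # the tone completes 5 cycles in those 64 samples
HOLD = 4        # each sample is repeated 4 times to form the stair steps

base = [math.sin(2 * math.pi * TONE_BIN * t / BASE_LEN) for t in range(BASE_LEN)]
stairs = [s for s in base for _ in range(HOLD)]

mags = dft_magnitudes(stairs)
original_nyquist_bin = BASE_LEN // 2   # bin 32 of the 256-point spectrum

# Bins (up to the new Nyquist) that contain non-negligible energy.
significant = [k for k in range(len(stairs) // 2 + 1) if mags[k] > 1e-9]
print(significant)
```

Apart from the original tone at bin 5, every bin in the list sits above the original Nyquist bin, which is the part a lowpass reconstruction filter removes.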
Because analog filters are kinda sucky (and because converters with high bit depth aren't very linear), modern ADCs and DACs are oversampling— they internally resample the signal to a few MHz and apply those reconstruction filters digitally with stupidly high precision. Then they only need a very simple analog filter to cope with their much higher frequency sampling.
It's not the timing differences, it's the phase differences. The ear is exceptionally sensitive to phase differences between the ears below 1kHz. This information is captured exactly (to well beyond the naive precision of the sampling clock) for any frequency below Nyquist.
This isn't true. Sample a bandlimited impulse. The exact timing is encoded into the Gibbs oscillations of the signal. So long as you have a high enough SNR you can have timing as precise as you want. (And because the ear doesn't work with ultrasonics, being itself bandlimited, it uses the same phenomenon for timing.)
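As a sketch of how sub-sample timing can be read back out of the samples, the code below delays a band-limited pulse by a tenth of a sample and recovers that delay from the phase of one DFT bin. It assumes a noise-free, periodic frame, and the frame length, bandwidth and function names are arbitrary choices for the demo.

```python
import cmath
import math

N = 64          # samples per analysis frame
TOP_BIN = 10    # highest harmonic present, well below Nyquist (bin 32)


def bandlimited_pulse(delay):
    """Periodic band-limited pulse delayed by `delay` samples (may be fractional)."""
    return [sum(math.cos(2 * math.pi * k * (n - delay) / N)
                for k in range(1, TOP_BIN + 1))
            for n in range(N)]


def dft_bin(x, k):
    """Single DFT coefficient of x at bin k."""
    return sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))


def estimate_delay(x, k=1):
    """Recover the delay in samples from the phase of one DFT bin."""
    return -cmath.phase(dft_bin(x, k)) * N / (2 * math.pi * k)


true_delay = 0.1   # a tenth of a sample period
estimate = estimate_delay(bandlimited_pulse(true_delay))
print(estimate)
```

With real recordings the precision is limited by noise rather than by the sample clock, which matches the SNR caveat above.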
What I would love to have is: independent instrument/vocals tracks along with a default recommended "mix". The default mix would be used for normal playback and independent tracks would be great for custom mix / karaoke etc.
Is this too unrealistic to expect? Has something like this been tried before?
I'm not a huge NIN fan, but Trent is truly awesome when it comes to digital music. You can add excellent mastering and dedicated surround mixes too..(rec: Social network soundtrack). Also a former oink'er.
The Beatles multi-tracks are also available (although they were only recorded 4-track so not every instrument always has its own track), and there have been a handful of artists who have released their samples of one song for remix competitions (Daft Punk, Royksopp, Booka Shade).
There are two reasons I don't think this will happen:
1. People would use the tracks to create custom remixes which they would then distribute. What happens when a remix becomes more popular than the original track? Artists generally have to pay other artists to remix their songs (usually via royalties).
2. Creativity. When an artist creates something they want you to hear it the way it was intended. Allowing you to remix it however you like takes away a lot of the creative control from the artist.
Regarding remixing. Artists usually don't "pay" each other, but return the favor - if it's the right term to say. E.g. artist A remixes a song of artist B and artist B in turn does the same for artist A. Or if they are all on the same record label artist A does a remix for artist B and later B makes a collaboration with A. I've noticed this in electronica/edm music artists at least.
And another important remark: some artists are flattered when someone asks them to make a remix for their song. (Imagine you're an artist and your idol asks you to make a remix of his song.)
True. I still think point 2 stands though. If your idol or another artist you respect, asks to remix your song, you may be fine with that. But if every person that buys (or pirates) your song can remix it you might be less happy about it.
I write music and have considered releasing separate tracks so people can freely remix it but I prefer just having mixes that are controlled by me. Allowing another artist you know to remix your track still allows you some sort of control (you know their style so have some idea of how the remix will go). Giving up that control is a big step and, I think, an unnecessary one.
That was predicted (and suggested) by Glenn Gould some forty years ago. At the time, anything with higher fidelity than, say, a bad telephone connection was analogue, and we were stepping into the world of quadraphonic sound (which died soon after in the analogue kingdom). He was a big proponent of the listener as participant (hey, it was Toronto and McLuhan was still around) and was convinced that technology was the only limiting factor at the time. (To put things in perspective, he was also very much anti-concert--he hated what he called the "non-take-two-ness" of live performance.) Let's just say that the idea was no more popular among artists then than it is now.
Many of the groups I listen to do do this -- this is certainly not that rare. Sometimes they go for a bit more money by releasing a separate CD with karaoke tracks, for example, but at least if you want it, it's available.
Of course, you can very easily get just the vocal track by subtracting the two. Sometimes the "non-vocal" track will still include backing vocals or the like in appropriate places, and just pull out the main vocal track.
For music where the vocal tracks aren't released separately, you can often pull them out nevertheless. The best is if you can get the audio in 5.1 -- vocals are almost always center-panned, which makes extracting them quite easy.
Some songs when released as a single have Acapella and Instrumental versions of them as well. There are also compilations with only acapellas and compilations with only instrumental versions of the songs.
And when you have them, just use something like Ableton Live and that will be it. I think that's what you mean, right?
It will be a great idea to have tracks released as several `layers` so that the user can choose which of them to play and which not, for example the bass/beats layer, layer with melodies, layer with the percussions, layer with the vocals of course, but that sounds like semi-studio production.
I have to say that was probably the most comprehensive dealings with the issue of sample-rates I've ever come across. I'm not going to make the mistake others have of claiming falsehoods (all of which i've read so far have been debunked to my satisfaction by the HN users-- i'm impressed, guys).
As pointed out, mastering has vastly greater effect on the audio quality (and is often pretty poor), and is the reason vinyl records often can sound better than their digital counterpart, despite being an inferior technology. The DAC used also has a massive effect on the sound once you get into decent quality equipment.
Like the author, i'd also love to see some expansion of mixed-for-surround music.
 a lot because of loudness wars, as pointed out in the post, but also just due to a lack of time/care/love(/demand?).
My hearing has declined over the years, to the point where audiophile gear is a complete waste of money. For example, I can no longer hear the difference between a cassette tape and an LP. I still listen to and enjoy music all day, but no longer worry at all about the sonic quality of it.
My advice to you younger guys is to keep the windows rolled up while driving. I have no other explanation why my left ear is much worse than my right.
This is a really convincing article that makes me want to set up a double blind test for myself with my own equipment.
In my own tests I believed that I couldn't tell the difference between 16/44 and 24/96 on high quality loudspeakers, but I could with high quality headphones. The studies cited all seem to use loudspeakers in testing.
Also worth noting, the article states that obtaining 24/96 source material sometimes means you get better mastered material, which still sounds better after down-sampling back to 16/44.
You weren't just believing things. The difference between 44kHz and 96kHz sample rate is very noticeable even with mediocre audio equipment. It's an overstatement to refer to the situation as a "hi-fi case". 16/24 bits however makes no difference at all except on the size of the material.
In normal listening conditions and for most people the difference between 16/44 and 24/192 is inaudible.
Given a 5 minute song, if I have the choice to download an 11MB file (320kbps MP3) or a 330MB file (24/192) I would of course choose the 11MB file. The sound quality is perfectly acceptable and the file size much more convenient to manage (storage, backups, etc.).
In terms of the convenience of managing the file size and sound quality I think 320kbps MP3 is the best compromise.
Here's a file size comparison of a 5 minute stereo song:
MP3 128kbps: ~5 MB
MP3 320kbps: ~11 MB
Uncompressed 16/44: ~50 MB
Uncompressed 24/192: ~330 MB
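The figures in that comparison can be roughly reproduced with a few lines of arithmetic. This is a back-of-the-envelope sketch: it assumes constant-bitrate MP3, ignores file headers and tags, and reports sizes in MiB (2^20 bytes), which is the unit that seems to match the numbers quoted. The function names are made up.

```python
def pcm_size_mib(sample_rate_hz, bits, seconds, channels=2):
    """Uncompressed PCM size in MiB: rate x bytes per sample x channels x time."""
    return sample_rate_hz * (bits // 8) * channels * seconds / 2**20


def cbr_size_mib(kbps, seconds):
    """Constant-bitrate compressed size in MiB (1 kbps = 1000 bits per second)."""
    return kbps * 1000 / 8 * seconds / 2**20


SONG_SECONDS = 5 * 60

print(f"MP3 128kbps:         {cbr_size_mib(128, SONG_SECONDS):6.1f} MiB")
print(f"MP3 320kbps:         {cbr_size_mib(320, SONG_SECONDS):6.1f} MiB")
print(f"Uncompressed 16/44:  {pcm_size_mib(44100, 16, SONG_SECONDS):6.1f} MiB")
print(f"Uncompressed 24/192: {pcm_size_mib(192000, 24, SONG_SECONDS):6.1f} MiB")
```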
When talking about sound quality there is a much more relevant issue: the amplitude compression (distortion) abuse used by mastering engineers and producers that totally destroys the dynamic and life of the sound. That is a real issue. When buying a song there should be two versions to choose from:
A) "Loud", dynamically destroyed / distorted version.
But then for every 10 people like you there is 1 person who is willing to pay 20x as much so they can get a "higher fidelity" product.
For a producer and manufacturer the rational approach would be to cater to that craziness and extract as much money from it as possible. In other words, if you are selling HDMI cables, spend $2/cable to make them, then sell most for $5 and re-brand some and sell them for $500. It only takes 1 out of 100 people buying that to make the same profit. You know these people are obsessed and irrational so you cater to that. And that's basically how we end up with ridiculously overpriced Monster cables and recordings distributed to customers @ 192kHz.
Agreed, that market exists. My point is, why discuss the subtle difference between 16/44 vs 24/192 when there are far more audible and damaging practices going on in the music industry. For example, aggressive compression and brick limiting which adds distortion to achieve maximum loudness ('loudness wars').
I know a bit of sound engineering, waves and so..
I totally agree with the title and the first 60 lines of article, and I add my POV:
1. Most people don't care,
2. What Apple did is just about marketing,
3. Most of the people who say they care are pretending,
4. Zeppelin still rock the shit in a poor quality mono mp3 recorded by a drunk guy in the audience of a concert in 73.
I do care, but I'm not the average user. Apple has always catered well for those in audio and video, up to professional levels. These are markets that retain Apple users, even when Steve Jobs was between Apples. It seems like Apple is only requesting masters to come in a higher resolution, not that consumers will generally end up with these. I think this is entirely fair since before you want to modify something (e.g. to remaster it for iTunes) you want to start off at a good quality high resolution.
That said, if Apple also allows high quality recordings to be sold, it will be useful. For example, if artists sold their acapellas, instrumental tracks or samples, it would be convenient for others who want to remix them, and iTunes would be a platform for this trade.
Also for tracks DJs play. Most compression throws away a lot of the bass which people can't hear, but this is bass you can feel rumbling through your guts on a big sound system and is part of the experience.
For the rest, they were happy with low rate AAC files on the early iPods, they are happy with the sound coming from their crappy little iPod dock, for them it won't make a difference as long as it's a chart music track from a memorable and impressionable time of their life.
I mostly agree with the article in the context of distribution of a final mix. However, the article ignores one glaringly obvious reason to distribute in 24/192 format: to allow the listener to be a participant in the creative process, enabling better results for amateur musician listeners who want to sample or remix the audio or for DJs to get better results when altering the tempo for beat matching one track with another, etc. Of course, if you're going to do that, you might as well distribute in a multi-track format instead to maximize flexibility for the end user (Want to sing karaoke? Just turn off the lead vocal track for playback).
Yea, and if the bandwidth/storage is at all an issue, the 6x size bloat from 24/192 pays for 6 separated tracks. (Actually more, because multitrack is more losslessly compressible while 24/192 is less.) If you're already providing multitrack then 24 bit audio would make sense... otherwise, meh.
There is no harm in releasing higher quality uncompressed or loss-less tracks. At the worst they will bring in some new customers, such as myself, that currently will not buy music online. Why would I pay $10 for an album as a highly compressed download when I can pay the same price for the CD and rip it to FLAC myself? I realize I am in the minority here, but as CDs phase out even more, there has to be some other way for consumers to obtain high quality versions of tracks.
Footnote: you don't have to have a >$10,000 setup to benefit from higher quality tracks (compared to the downloads that sometimes have 'questionable' quality). I have two systems. One is a full range stereo (front left and right) setup for nearfield listening at my desk that's +/- 1 dB from 50 Hz-20 kHz. The other is a stereo setup in my media room: 2 way quarter wave transmission line, +/- 3 dB from 40 Hz-20 kHz. The point is, there are a lot of people with less than $1200 in audio gear that still want lossless tracks made available. Who cares if the human ear can't discern much of the extra information, we still want it.
A few years ago I became really interested in recording music. I had been writing a little with a friend, using whatever crap equipment we could afford, the results weren't amazing but we were having fun and staying focussed on the music itself.
Then we started recording other people. I became obsessed with gear, software and all the associated toys that go with any technical pursuit. I'm a programmer, so it's easy to understand how that happens, but I totally lost sight of the music, spent way too much money on equipment that was nowhere near being required and generally lost the plot. I was tracking everything 24-bit/96kHz and bemoaning the loss of quality when I mixed down for CD.
Anyway, the TL;DR version of what followed was that we recorded quite a bit, lost interest in making our own music and then the whole adventure came to an end. Now my gear is leaving via eBay and I'm finding my way back to just playing guitar and trying to write good music.
24-bit/192kHz - pointless. Give me a small venue and a guy with an acoustic guitar any day.
This is a good article; however, the guy who has been pushing this for years and years now is a man called Dan Lavry. In fact he wrote a very good, rigorous explanation a few years back, in very readable and well written form.
The hearing of ears is a time-domain thing, not a frequency domain thing. It's the frequency response of all the frequency components added together. People might not be able to respond well to a single high frequency tone, but might respond well to a combination of tones.
The basilar membrane is a loosely tuned resonator. The hair cells placed on it fire beginning on the positive zero crossing. So, to a first approximation, the ear is in fact a filterbank.
There is a time domain component in that the cochlear nucleus contains nerve cells that watch multiple hair cells at a time and correlate the firing in several different ways. Some attempt to discriminate pitch, some convolve and correlate in-phase firing energy, some look for tones to end, etc. This information is then forwarded on to the brain.
However, getting back to your point, no hair cells will fire if the basilar membrane doesn't move, and it's tuned to a frequency range.
I find mp3 and aac compression artifacts to be monstrously irritating. I have no idea how the majority of the world seemingly can ignore them.
Further, I can hear a difference between 44.1kHz and 96kHz. Whether you can hear that difference is up to you. (The word-length is a red herring - there's no new information contained in a 24-bit recording vs 16.)
IMO anything less than flac and you're missing something. Higher sampling frequencies do add to the sound, but in a way that is almost invisible to the untrained ear. Perhaps these should be distributed at a premium the way SACDs and similar "audiophile" formats were in the past?
I was under the impression that two inaudible high frequency tones could interfere with each other to create an audible interference pattern. (I think known as a "beat frequency").
If this is the case, then all of the arguments in the world about the maximum audible single frequency are irrelevant. Imagine music composed entirely of these beat frequencies and performed with a pair of oscillators between 25kHz and 35kHz. Without higher resolution encoding, it would be audible IRL but the recording would be silence.
That would suppose that the recording device precisely matched the orientation of the listener, and that the recording was not created digitally (in multi-track fashion, for example). There would have to be air space for the interference pattern to set up in.
So you'd be right if your mics were head spaced and in the venue. But you'd still have secondary data, with the original lost.
Alas, they don't, and you can easily demonstrate this for yourself. Start up an audio editor and generate tones at 25k and 28k (make sure you can't hear them individually, otherwise you have severe distortion screwing up your test), then play both at once. You will not hear a 3kHz tone.
The tone you get from an acoustic beat is not a real tone; it's a perceptual quirk that requires you to be able to hear the tones in the first place.
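The same experiment can be run numerically instead of by ear. The sketch below is only an illustration with assumed parameters: it sums 25 kHz and 28 kHz tones at a 96 kHz sample rate and measures how much energy sits at 3 kHz. It then repeats the measurement after a made-up distorting stage (a small squared term) to show what the "severe distortion" case would look like.

```python
import cmath
import math

FS = 96000   # sample rate, high enough to hold both ultrasonic tones
N = 960      # 10 ms frame, so DFT bins are exactly 100 Hz apart


def tone_amplitude(x, freq_hz):
    """Amplitude of the sinusoidal component at freq_hz (must fall on a bin)."""
    k = round(freq_hz * N / FS)
    coeff = sum(x[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
    return 2 * abs(coeff) / N


# Clean, linear sum of the two ultrasonic tones.
linear = [math.sin(2 * math.pi * 25000 * n / FS) + math.sin(2 * math.pi * 28000 * n / FS)
          for n in range(N)]

# Hypothetical distorting playback chain: a small squared term is added.
distorted = [v + 0.1 * v * v for v in linear]

print("3 kHz in clean sum:    ", tone_amplitude(linear, 3000))
print("3 kHz after distortion:", tone_amplitude(distorted, 3000))
```

In the clean sum there is nothing at 3 kHz to hear; a difference tone only appears once something nonlinear in the chain creates it.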
I tried this in Audacity, with the project set to 96kHz and two tones at 25kHz and 28kHz. I couldn't hear either of the tones individually, but I could hear a tone when played together. This is on Windows 7 with the sound card configured for 24-bit/48kHz. Am I running into resampling artifacts somewhere in the chain?
EDIT: it turns out Audacity won't generate a tone above 20kHz (the UI accepts the value, but when you reopen it the value has been rounded down), so both of my generated tones were actually 20kHz.
TL;DR - long and detailed information about why if you got music in 24/192 format you couldn't tell the difference between it and 16/48 music.
I chuckled because this is so true, and yet tell that to the people who buy oxygen free copper 'monster' cables for their speakers, being careful to align the arrows with the direction of the music from the amplifier to the speaker. People, even otherwise reasonable people, will swear up and down they can hear the difference.
Just because a person cannot hear a 22kHz tone doesn't mean he cannot hear a sound that contains 22kHz components. For example, a square wave contains lots of high-frequency harmonics; the more high-frequency harmonics it has, the "squarer" the square wave gets. An ideal square wave forms ideal "0" and "1" states. A person's ear might not be able to hear a 22kHz sine wave tone, but he might be able to sense the steepness of the transition between the "0" and "1" states.
First, if you can "hear the steepness" you really can hear higher frequencies, but I assume you meant "maybe you can hear higher frequencies but not higher frequency _tones_"...
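To make the square wave discussion concrete, here is a small sketch listing which harmonics of an ideal square wave fall at or below a cutoff. It relies on the textbook Fourier series (odd harmonics only, amplitude 4/(pi k)); the 20 kHz cutoff is an assumption taken from the usual hearing range quoted in this thread, and the function name is made up.

```python
import math


def audible_square_harmonics(fundamental_hz, cutoff_hz=20000):
    """(frequency, relative amplitude) pairs for an ideal square wave up to cutoff_hz.

    An ideal square wave contains only odd harmonics k = 1, 3, 5, ...
    with amplitude 4 / (pi * k).
    """
    harmonics = []
    k = 1
    while k * fundamental_hz <= cutoff_hz:
        harmonics.append((k * fundamental_hz, 4 / (math.pi * k)))
        k += 2
    return harmonics


for f0 in (1000, 5000, 8000):
    freqs = [f for f, _ in audible_square_harmonics(f0)]
    print(f"{f0} Hz square wave, components up to 20 kHz: {freqs}")
```

Under that cutoff, a square wave with a fundamental above roughly 6.7 kHz has no harmonics left besides the fundamental, so the "steepness" in question lives entirely in components above the cutoff.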
People have suggested this. It's been tested in rigorous double blind tests— involving both real music signals as well as special test tones (Linked from the article). The tests were unable to show that people could hear the ultrasonics. Moreover, there isn't any physiological basis to expect people to be able to. You can't expect a stronger result than that.
Common 48KHz audio already goes a bit beyond what adults are known to be able to hear, so you've already got some headroom for "but what if a few people hear better than anyone the researchers have been able to find!".
I know this is slightly tangential but are hi-end DACs really worth it? I have always been amazed how much audiophile DACs cost ($300-1000). The reality is I listen to 320kbps music that was most likely recorded at 44100. DAC technology is not exactly new. So why the price?
Another tangent: To me it seems audio engineering should fix the "woofer". That is it seems subwoofers have terrible distortion.
A low-end dedicated DAC is likely to be a substantial upgrade over a built-in soundcard (I'm assuming we're talking PC sound here). A PC case is a pretty noisy place, electrically - on one work PC I once had, you could actually hear the mouse move if you had headphones on and cranked the volume with nothing playing - horizontal and vertical movement had different frequencies.
The move from a low end ($150-300) DAC to one much more expensive will be considerably less drastic, and likely won't matter until you've dropped at least $5k in to the rest of your system.
That said, you may already own a DAC without realising it... as long as you're taking the signal out _digitally_ (e.g. S/PDIF or digital coax) to an external receiver, you're already in a pretty decent place.
Oh yes I agree an off board DAC is better. I own the Fiio E7 which I highly recommend for laptops and only costs $80.00. In fact I run a 50 foot USB cable from my laptop to my DAC and the improvement is much better than running a 50 foot 3.5mm TRS.
But the high end ones that are 24 bit 192kHz and cost $1k (the Cambridge Audio DacMagic comes to mind), I have to seriously doubt I'm going to hear it. I really only hear the DAC difference (compared to my laptop and FiiO) when I use headphones.
Has anyone had a look at their hi-fi amp recently? It probably doesn't handle much more than 80 kHz and your speakers probably don't respond to anything over 20 kHz. So yes, 192 kHz is pointless UNLESS you intend using it for studio quality editing/mixing - and I'm sure Steve Jobs would not have encouraged this!
From Footnote 1:
[...] If we were to use the full dynamic range of 24bit and a listener had the equipment to reproduce it all, there is a fair chance, depending on age and general health, that the listener would die instantly. The most fit would probably just go into coma for a few weeks and wake up totally deaf.
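For a sense of scale behind that footnote, the usual rule of thumb is about 6 dB of dynamic range per bit. The sketch below uses the standard formula for an ideal quantizer with a full-scale sine wave; it ignores dither and noise shaping, and the function name is made up.

```python
import math


def ideal_dynamic_range_db(bits):
    """Theoretical SNR of an ideal N-bit quantizer for a full-scale sine wave.

    Equivalent to the familiar 6.02 * N + 1.76 dB approximation.
    """
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)


for bits in (16, 24):
    print(f"{bits}-bit: about {ideal_dynamic_range_db(bits):.1f} dB")
```

The gap between the two results is roughly 48 dB, which is the extra range the footnote is joking about.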
The article AFAIK states little about distortions introduced in remixes & samples. I would expect certain high frequency samples, when mixed together to overlap in time, would introduce moire artifacts (beats).
I think this only applies to headphones. People also 'hear' sound with their body (skin). Maybe you could call it experiencing sound.
And then there are resonating sounds that cannot be heard but help to create other sounds. But maybe this won't apply to a recording because you will record the result and not the tones that make the result.
This is a great article but I'm still not convinced people cannot have a sensation of sound outside their hearing range.
I would guess that's because some earlier remotes used a higher frequency IR emitter that was in fact touching into the red.
These days, the various IR communication protocols have been standardized and virtually all use 920nm, 940nm or 980nm emitters, all of which will be invisible. I mentioned the Apple IR remote specifically because it's a remote most people reading TFA will have, and it's known to be a 980nm emitter.
The key to reproducing the original signal from the digital signal is a low-pass filter that rejects everything above half the sampling rate, correct?
That is to say, what I am getting at is while the original signal can be reproduced, it requires properly tuned, and probably reasonably high performance, hardware to remove the higher frequency components of that square wave. Can you count on consumer grade hardware to do this well?
Yes, that's basically it. They do this _exceptionally_ well in fact.
Typically the technique used inside a DAC is to digitally upsample the signal (by duplicating samples, often to a few MHz, which also allows the use of a low bit-depth DAC) and then apply a very sharp "perfect" digital filter to cut it right to the proper passband (half the sampling rate). The analog output then contains only a tiny amount of ultrasonic aliasing, which is so far out that it's easily rolled off by simple induction in the output.
This isn't just theory. Here is a wav file I made at a 1kHz sampling rate, where every other sample is -.25/.25: http://people.xiph.org/~greg/1khz-sampled.wav (so a 500Hz tone, the highest you can represent with 1kHz sampling).
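If you'd like to build a similar test file yourself, a sketch along these lines should do it. It is not the script used for the linked file: the one-second length, mono layout and 16-bit depth are assumptions, and the function name is made up.

```python
import io
import struct
import wave


def make_nyquist_wav(num_samples=1000, sample_rate=1000, level=0.25):
    """Return WAV bytes whose samples alternate between -level and +level."""
    peak = int(round(level * 32767))   # scale to 16-bit signed integers
    samples = [(-peak if n % 2 == 0 else peak) for n in range(num_samples)]
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)              # mono (assumption)
        w.setsampwidth(2)              # 16-bit PCM (assumption)
        w.setframerate(sample_rate)
        w.writeframes(struct.pack("<%dh" % num_samples, *samples))
    return buf.getvalue()


wav_bytes = make_nyquist_wav()
# To keep the file: open("1khz-sampled.wav", "wb").write(wav_bytes)
```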
Feed that file to a boring resampler (I used SSRC, but anything should give roughly the same result, at least when not quite so ridiculously close to nyquist; most will attenuate near-nyquist data extensively) and you get this: http://people.xiph.org/~greg/1khz-sampled-to-48khz.wav
As you can see— the 500Hz sinewave is reconstructed perfectly. (Of course, a 500Hz square wave would not be (you'd get a sinewave out) but this is because a 500Hz square wave contains energy far beyond the nyquist of 1kHz sampling).
Here is a spectrograph of the same signal http://people.xiph.org/~greg/1khz-to-48khz-spec.png showing that the tone is indeed pure (the faint background noise is the dither the resampler applies when requantizing its high precision intermediate format back to 16 bits).
There is no point in going over 16 bits, but there is definitely a point in going over 44.1kHz, as it allows you to reproduce waveforms more accurately than 44.1kHz does. Try reproducing e.g. a sinewave accurately over 4-5kHz with a sample rate of just 44.1kHz - it cannot be done, and at this point we haven't even taken into account the issue of varying slew-rate characteristics of the thousands or so different DAC output stages in use in personal audio equipment.
44.1khz gives too much aliasing distortion, but 192khz is quite the overkill. Ideally, digital audio could sit on 16 bits of depth sampled at 96khz.
No. This really is not the case. The article _specifically_ addresses this misconception.
The signal reproduced from your 44.1kHz sampled digital input is not a stair-step like some broken waveform editor might display: On output it goes through a matched reconstruction filter (which may, in fact, be digital and involve an oversampled DAC or it could be analog though those are harder to build without compromise). After the reconstruction filter the output is _EXACT_, assuming the input only contained energy below the nyquist (well, and was sufficiently far away from the reconstruction lowpass).
So even a 5khz sine wave is reproduced perfectly with 44.1kHz sampling.
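The claim is easy to check numerically. The sketch below samples a 5 kHz sine at 44.1 kHz and then evaluates the signal *between* the samples using a Hann-windowed sinc kernel. That kernel is my stand-in for a reconstruction filter (a real DAC uses its own filter design, and the ideal one would need infinitely many samples), so the result is very close rather than mathematically exact. The names `reconstruct` and `half_width` are illustrative assumptions.

```python
import math

def sinc(x):
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, t, half_width=32):
    """Estimate the band-limited signal at fractional sample index t
    using a Hann-windowed sinc kernel spanning 2 * half_width samples."""
    centre = math.floor(t)
    total = 0.0
    for n in range(centre - half_width + 1, centre + half_width + 1):
        if 0 <= n < len(samples):
            d = t - n
            window = 0.5 * (1.0 + math.cos(math.pi * d / half_width))
            total += samples[n] * sinc(d) * window
    return total

FS = 44100.0      # sampling rate, Hz
F = 5000.0        # tone frequency, Hz
samples = [math.sin(2 * math.pi * F * n / FS) for n in range(400)]

# Compare the reconstruction with the true sine at ten points per
# sample interval, staying away from the ends of the record.
max_error = 0.0
for i in range(1000, 3000):
    t = i / 10.0
    true_value = math.sin(2 * math.pi * F * t / FS)
    max_error = max(max_error, abs(reconstruct(samples, t) - true_value))

print("largest deviation from the true sine:", max_error)
```

The deviation that remains comes from truncating the kernel to a finite length, not from the sampling rate; there is no stair-step anywhere in the output.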
@nullc: of course you're right, and the commenter you're replying to does not understand the Nyquist-Shannon sampling theorem. Which is a shame, because the article specifically addressed this point.
These discussions of audio standards always get sidetracked by people who don't understand or believe this result. (Have to admit, the result is surprising).
I think there may be problems with the argument in TFA, which is based exclusively on standard linear systems theory.
Of course, the ear and some of its perceptual components may be significantly nonlinear, and thus not covered by the frequency response graphs of TFA.
These graphs assume linear systems, in which you put two frequencies in, and the same frequencies pop out in scaled form. Nonlinear systems can produce new frequencies in response, and this possibility is not discussed in TFA. Probably these effects are quite minor, but may be audible to some listeners on some equipment for some choices of source material.
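A toy example of the difference, as a sketch only: two ultrasonic tones go through a purely linear gain and through a system with a small squared term, and we measure what shows up at the (audible) difference frequency. The 0.1 coefficient and the 24/27 kHz tone choice are arbitrary assumptions for illustration, not a model of the ear or of any real amplifier.

```python
import cmath
import math

def tone_amplitude(signal, freq, fs):
    """Amplitude of the sinusoidal component at `freq`, measured with a
    single-bin DFT. Exact when the record holds whole cycles of freq."""
    n = len(signal)
    acc = sum(s * cmath.exp(-2j * math.pi * freq * k / fs)
              for k, s in enumerate(signal))
    return 2.0 * abs(acc) / n

FS = 192000.0                 # sample rate high enough to carry ultrasonics
F1, F2 = 24000.0, 27000.0     # two tones, both above the audible band
N = 192                       # 1 ms: whole cycles of F1, F2 and F2 - F1

two_tones = [math.sin(2 * math.pi * F1 * k / FS) +
             math.sin(2 * math.pi * F2 * k / FS) for k in range(N)]

linear = [0.5 * x for x in two_tones]                # pure scaling
nonlinear = [x + 0.1 * x * x for x in two_tones]     # mild 2nd-order term

diff_linear = tone_amplitude(linear, F2 - F1, FS)
diff_nonlinear = tone_amplitude(nonlinear, F2 - F1, FS)
print(diff_linear, diff_nonlinear)
```

The linear system leaves essentially nothing at 3 kHz, while the squared term creates a 3 kHz component that was in neither input tone.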
Indeed, but if there were nonlinearities in the ear (there are many, of course) which allowed detection of ultrasonics (less likely, because the first stage of the ear is impressively linear) you'd expect them to show up in the actual listening tests.
TFA does at least make this the-proof-is-in-the-pudding point somewhere in its depths. :)
Couldn't agree more with you! 192kHz is overkill as a "final" format.
16 bits is very limiting for music with lots of dynamics (e.g. classical). Very quiet sounds sound quite bad at 16 bits, but since most pop music has about 6-12db of dynamic range, it doesn't make much of a difference.
I always thought the sweet spot would be 96-24. But the truth is, the market wants smaller and portable digital files, not higher quality music. Anything MP3 encoded will sound significantly worse than a CD anyways.
16 bits is not "very limiting" for anything, unless you think your ears themselves are very limiting.
Many things are mastered poorly— recording engineers crushing the dynamics in order to get the loudest possible signals— mostly a problem for pop music, but nothing is immune.
It's been observed that the various 'higher-definition' recordings have less brutal mastering— no doubt owing to the different audience they are marketed to. But this isn't a property of 24-bit vs 16-bit distribution.
Sometimes less is more. The debate goes on. Why not just let the music play? And by that I mean high resolution music. All you need is one person who can hear high frequencies, and all the technical mumbo-jumbo becomes hogwash.
People actually _believe_ the 20KHz argument that anything above is inaudible. That's hogwash. I know because I can hear (or sense) higher frequencies, and I do not have the absolute best ears I've ever "met."
For example, last week I attended an A/V equipment event with very high-end equipment. It was packed --- over 600 people for one evening. 6 rooms of equipment. I'm sure all six served the same fare according to the 20-20KHz argument of this piece, yet they all sounded quite (or even extremely) different.
The 20 KHz argument is a myth. For people who can't hear the difference, no problem. But please do refrain from ruining or hobbling music for the rest of us... who can hear a wider frequency range.
Yes, some people are color blind. Does that mean the rest of us shouldn't use color? I hope not.
Music is an important wholesome and potentially emotional part of human life. Please do not cap it with "false optimizations".
24-bit/192 KHz is not inferior to CD quality sound. If you don't believe me, try a Linn system sourced on a Klimax DS with some high bitrate Linn classical music (or the Beatles Masters USB release!). If you can't hear the difference compared to low bit-rate (including CD quality) material, I assure you someone can. The low bit-rate will sound flat, hollow, less lively, and/or more coarse. Any number of problems show up at inadequate bit levels.
Vinyl is analogue quality (no discrete digital distortion). CD quality is a large step down from vinyl. A/V is just trying to get vinyl like quality from digital. We don't need nay-sayers impeding progress. If you can't hear the difference, please let someone who can hear make the informed decisions.
It's not a myth, but a fact established in laboratory studies. Your anecdotal claims to hear frequencies that scientific evidence suggests you cannot hear don't overturn science. I'd be convinced if you correctly identified which speakers were reproducing 21 kHz frequencies in a double-blind test, though.
Isn't science verified through (wait for it...) experimentation? So how does my hearing not invalidate your science?
That's the problem with the theoretical science. When it's false, it's false. Come up with a new hypothesis; this one's false as it pertains to human hearing. There's information theory, and then there's auditory reality. Reality confounds the theory as applied to hearing. I don't know where the fault lies, and I don't really care.
But it's really annoying and frustrating having people nix progress out of idealistic theory, "laboratory" studies, and ignorance. The experiments (my experiences and numerous others) don't lie.
Double-blind is great, but I can already tell the differences between all six rooms of equipment from last week. One of the rooms was so extreme, I wanted to run out of the room due to discomfort (but I was polite and stayed all 30 minutes). In other words, double-blind was unnecessary. Someone whose ears I respect a great deal, loved that room. Even golden ears don't all hear the same. But I don't need double-blind to confirm trivial experience. The proof is already in the listening.
> So how does my hearing not invalidate your science?
Because it's not a blind study. In audio, claiming something sounds better than something else is low-strength evidence, because it doesn't: 1) distinguish psychological bias (which is very strong in this area) from actual audible results; or 2) distinguish which characteristics of speakers, if any, you may be hearing.
If you can consistently ABX two speakers that have similar characteristics except that one reproduces frequencies over 20 kHz while the other doesn't (with identical performance below 20 kHz), I'd be convinced. One possibility is to use the same speaker but insert a high-quality 20 kHz lowpass in the chain during part of the test; or use the same speaker but with 44 kHz versus 96 kHz source material. I've never seen a controlled, blind case where a human can tell the difference there.
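"Consistently" can be made concrete. In an ABX round a pure guesser is right half the time, so the chance of reaching a given score by luck is a binomial tail. A minimal sketch (the function name `abx_p_value` and the 16-trial session length are just assumptions for illustration):

```python
import math

def abx_p_value(correct, trials):
    """Probability of getting at least `correct` answers right out of
    `trials` ABX rounds by pure guessing (each round is a coin flip)."""
    hits = sum(math.comb(trials, k) for k in range(correct, trials + 1))
    return hits / 2 ** trials

for correct in (9, 12, 14, 16):
    print(correct, "of 16:", abx_p_value(correct, 16))
```

Getting 12 of 16 right happens by chance a bit under 4% of the time, which is why a handful of lucky rounds in an informal test says very little.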
The psychological component is a red herring. Even though I already have a system (bias), I don't care about the other systems. I went to the show for enjoyment, education, and appreciation. Some of the systems were unknown to me (no bias), and some were known and surprised me in some ways (again, some bias overridden). So bias can be important, but it's not relevant in this case. So bias doesn't invalidate my experience.
As for the double-blind and high frequencies, I believe I've already done the test. I have had my hearing tested several times. One of them, I recall the tester actually asked me to repeat some tests... it was funny. The testing was at very high frequencies. I believe she thought I was guessing the higher/lower frequencies... and getting lucky. So (I strongly suspect) she wanted to "prove" to herself what you want to prove --- that no one can hear above 20KHz. I disappointed her. I think she even threw in some placebo tests (no frequencies at all). It was funny. She never explained herself. I suspect she just thought I got lucky again.
How to really test this stuff? Get one of the audio designers to test... but they will laugh in the testers' faces. They do this stuff for a living... to build real products... for real live customers who can hear the differences. Dave Wilson was at the A/V show. Try listening to a pair of Wilson Audio speakers. I bet he can hear better than just about anyone... His speakers (when sourced and driven properly) are that good. But he wouldn't waste his time on such tests. He has customers to serve and a business to run.
I doubt lab experiments look to disprove their theories once and for all. That's a social prejudice built into the lab experiments. Fix that, and you'll end up with a better hypothesis.
The article states that greater than 16 bit has value in recording, just not in playback. If you took your 24 bit recording and translated the best 16 bits of it, then output it through your 24 bit DAC, they are saying you won't hear a difference. I say output through your 24 bit DAC so you aren't simply hearing a better DAC.
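For what "translating to 16 bits" can look like in practice, here is a rough sketch of reducing one signed 24-bit sample to 16 bits with plain (unshaped) triangular dither. This is one common textbook approach and an assumption on my part, not necessarily what the article or any particular mastering tool does; `to_16bit` is a made-up name.

```python
import math
import random

def to_16bit(sample24, rng=random):
    """Reduce one signed 24-bit sample to signed 16-bit with TPDF dither."""
    scaled = sample24 / 256.0                  # 24-bit units -> 16-bit units
    dither = rng.random() - rng.random()       # triangular, within +/-1 LSB
    quantized = math.floor(scaled + dither + 0.5)   # round to nearest
    return max(-32768, min(32767, quantized))  # clamp to the 16-bit range

# A very quiet 24-bit sine, only slightly more than one 16-bit step tall.
rng = random.Random(0)
quiet_24bit = [round(300 * math.sin(2 * math.pi * k / 50)) for k in range(200)]
as_16bit = [to_16bit(s, rng) for s in quiet_24bit]
```

To run the comparison described above, the 16-bit values would be padded back out to 24 bits (multiplied by 256) and played through the same DAC as the original.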
One thing I don't see addressed is the experience of feeling frequencies that can't directly be heard. There was a study done with a particular piece of classical music, with and without a particular inaudible component to it. The presence of the inaudible component drastically changed the listeners' perception of the music. They described it as more dark or creepy (perhaps not the actual words used, but it matches the sentiment). The point is that there may be value in reproducing frequencies that we can't "hear", as inaudible notes can alter the experience of the music.
The author completely ignores infrasonics and writes under the incorrect assumption that our only perception of wave pressure comes from our eardrums.
I've never been able to enjoy listening to my favorite classical music on headphones or even smaller speakers, and it's largely because of the effect you describe.
At this point I'm resigned to preserving my treasured (and cumbersome) vinyl collections. Maybe if Apple comes up with some snazzy marketing term (e.g. "Retina") for 24/192 or even 24/96, and starts distributing it on iTunes, things might start to change.
You don't need a higher sample rate to capture or play back infrasonic pressure waves, but most recordings are mastered to remove DC offset and rumble <20Hz, as reproducing those components requires specialized equipment, such as a rotary subwoofer.
I don't understand why anyone gets down on 24-bit consumer audio.
Specifically because CD-quality 16/44 audio has midrange distortion present during complex passages that is completely eliminated and non-present in 24/96 sources.
Listen to "Us and Them" off a 16/44 CD version of the Pink Floyd album Dark Side of the Moon. When it kicks into the chorus, it becomes totally distorted and everything in the midrange bleeds into each other. It's a mess.
Then, try listening to the 24/96 Immersion box set copy or a vinyl-sourced 24/96 rip and you'll find it's gone. When the song gets complex and loud, everything remains totally clear, each instrument stands on its own, it doesn't become an awful distorted jumble.
You could argue that it's just the quality of the master that makes the difference; but if you take a copy of the original transcoded to 16/44 and compare it again with the 24/96 copy you can hear the same effect.
Why would anyone argue against high-resolution audio anyway? Sure, most everyone will probably just continue downloading 16/44 MP3s, but at least give us the option to have 24bit FLACs of the stuff we really like. Please and thank you.
You could argue it's the quality of the master, and the mastering process, and you'd be right. That's a no-brainer.
"but if you take a copy of the original transcoded to 16/44 and compare it again with the 24/96 copy you can hear the same effect."
I could believe that, but do you mean to do the transcoding yourself? In this case you become the engineer, and the tools you use and all that become vital as well.
Having heard stunningly awesome CDs of DSOTM on a homebuilt Heathkit amp and some old speakers, and not believing my ears when I saw what the setup was, I'm skeptical... can't help it.
Huh, I think people truly advocating 192 as a distribution format will be few and far between; a really good and cheaper sampling system can be put together at 96. Still, a lot of things in this article perplex me.
Human hearing is limited to 20k because frequencies higher than that are perceived as painful? Don't agree with that one.
24 bit doesn't offer any advantages to sound quality? Sheesh.
And the crux of the argument is intermodulation distortion increases when you try to represent more frequencies? Isn't that an argument for a faster power amp?
"Human hearing is limited to 20k because frequencies higher than that are perceived as painful? Don't agree with that one."
Yeah, that's a silly one. I disagree with it, too. It's a good thing it appears nowhere in the fine article. Are you actually confused about the difference between frequency and amplitude? Or did you misread the article?
"24 bit doesn't offer any advantages to sound quality? Sheesh."
As brazzy rightly points out, "Sheesh" isn't a reasoned statement. It's an ejaculation. And, it turns out, the author talked about why sound engineers record with 24 bits; it has to do with pragmatic reasons about leaving room for the highest and lowest frequencies in the audio being recorded without clipping, as well as with the author's discussion of Nyquist considerations in the distributed product.
Your post is wrong in so many ways that would have been easily fixed by reading the linked article with even 8th-grade reading skills that the reasonable reader has to wonder if you're being deliberately obtuse. Are you?
> Human hearing is limited to 20k because frequencies higher than that are perceived as painful? Don't agree with that one.
You misread the article. It's because there is so little response that being able to hear it would blow your eardrums (and even then, it might still be beyond your ability to hear it). There's no value in that.
> 24 bit doesn't offer any advantages to sound quality? Sheesh.
Not quite what TFA says. According to the article, 16 bits effectively covers the dynamic range of human hearing, so more than that is pointless for music consumed by human beings (hence all the stuff about 24bit being a good idea for mastering & production). If you're storing integers in the 0~16384 range, going from 16 bit integers to 32 bit ones is not going to give you "better ints", it's just going to waste 2 bytes per int. Same thing here.
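For a rough sense of the numbers, the textbook signal-to-noise figure for an ideal N-bit quantizer (full-scale sine against flat quantization noise) is about 6.02 * N + 1.76 dB. The sketch below just evaluates that formula; it assumes plain, unshaped quantization noise, so it does not model the noise-shaped dither that the larger 16-bit figures quoted in this thread rely on. `ideal_dynamic_range_db` is an illustrative name.

```python
import math

def ideal_dynamic_range_db(bits):
    """SNR of a full-scale sine against flat quantization noise,
    i.e. the textbook 6.02 * bits + 1.76 dB figure."""
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)

for bits in (16, 24):
    print(bits, "bits:", round(ideal_dynamic_range_db(bits), 1), "dB")
```

That works out to roughly 98 dB for 16 bits and roughly 146 dB for 24 bits. The extra 48 dB is useful headroom while recording and mixing, which is the article's point about production versus distribution.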
There will always be people with slightly wider sensory perception than others, but are you honestly saying that you can easily hear sounds "well above" 20kHz? Have you tested this in a double blind test? I know that sounds like a lot of effort, but it would be interesting. You would be somewhat unusual if you could hear, say, 22kHz.
If you test this and hear something, it's almost certainly because the ultrasonic signal is being distorted by your amplifier and speakers and you're hearing distortion products that are ending up at frequencies you can hear.
As he mentions, with 16-bit it's easy to significantly reduce the dynamic range or clip; you only get the full 120dB range of 16-bit with careful handling and calibration. You don't have to worry about all this with 24-bit - you'd almost have to deliberately screw up the signal to reduce the dynamic range below that of a human's.
Ideally audio engineers would take the effort to do good 16-bit conversion for distribution, but I realize that's too much to expect of them.