When people ask me where they should spend money to improve the quality of their hi-fi or home theater system, in nearly every case my response will be something like "get a thicker rug" or "put something on this wall to absorb sound reflections, even if it's just a bookshelf."
Beyond that, I'd tend to say something like "stop being so paranoid about what you think you can't hear, and enjoy the damn music."
I'm a composer who works in film/games. I can assure you this is exactly what I'd like people to do when they listen to my music. I spend 99% of my time trying to create good musical ideas, and I spend 1% of my time getting the mix down. I get criticized (rightly) for this quite a bit, but it is hard to care about someone sitting in a >$10,000 labyrinth of sound equipment when I'd rather write a catchy tune.
Then again, when I write sheet music I have to endure some of the most soul-crushingly awful MIDI sequencing in order to check my work, so perhaps I'm too tolerant of terrible sound quality. Still, I'd rather people listened to the music, not the sound of it.
A good sounding catchy tune is something worth spending that little bit more time on.
But in reality most of your customers want a ton-frakk of compression, loudness and filters on that catchy tune so it ultimately sounds HUGE on the tiniest phone, radio and car speakers... so all that audiophile mixing and dynamics are completely lost anyway.
The other issue regarding high-frequency sound reproduction is that in most cases, the loudspeaker won't be outputting much beyond 22-25 kHz (assuming very good quality loudspeakers, cheap consumer grade units might struggle to hit a -6 dB point at 18 kHz) and even for the speakers that have usable output at that range, the directivity at those frequencies will be so narrow that your head will have to be locked in the perfect "sweet spot" to hear anything.
> Agree one-hundred percent about the room
> although the prescription isn't always as
> simple as "get a thicker rug" etc
WILL: The sad thing is, in about 50 years you might start doin' some thinkin' on your own and by then you'll realize there are only two certainties in life.
CLARK: Yeah? What're those?
WILL: One, don't do that. Two -- you dropped a hundred and fifty grand on an education you coulda' picked up for a dollar fifty in late charges at the Public Library.
If you could hear subharmonic beats from ultrasonics then it would be _very_ easy to demonstrate, alas.
for those curious
>While Motown shortened songs to fit into radio time, the company also produced records specifically with car radio audio quality in mind. Motown recording engineers set up car speakers in the studio so that they could simulate and perfect how a song would sound emanating from a car radio
- what's the point of engineering things to a set of conditions virtually none of your target audience possesses?
I worked out a long time ago, that I enjoy listening to _music_, not HiFi gear.
My advice to people who ask how to make their system sound better? Buy some music you enjoy more…
I can enjoy a wonderful performance of a great tune played through my laptop speakers - much more than I enjoy test tones or gear-demo-tracks through sound gear worth something north of a new car…
(not that I haven't been "that guy" in my past…)
Good advice, but you do need some baseline quality equipment to start with.
Got my car with one speaker blown out, speakers wired semi randomly (left-right and front-rear faders don't work as they should), also powering line-in source from cigarette lighter results in funky background noise. Sounds great--when a good tune is playing and I'm able to recognize it ;-)
Then you can worry about your room.
Or get some half decent headphones.
One of the many things EQ can't fix, of course, is room reflections, which can be helped by room treatments and speakers with a directivity better suited to the room.
DRC can improve this, but only within a small sweet spot.
It is said that in most cases 192 kbit .mp3 is indistinguishable from >192, and blind tests support that. Granted, there are instruments like castanets which make it easier to hear the difference. In general though, I can't distinguish 128 from 192 and I listen to music a lot. Also it's unlikely that my hearing is already damaged because I try to keep volume low.
But I've noticed that where I put the speakers makes a huge difference. I can easily tell the difference from speakers on the floor versus speakers on my desk. Where I'm at the moment also matters a lot. If I lie on the floor, floor speakers don't sound as bad anymore.
In the end, I use headphones. Midrange Audio Technica ones, and I'm probably already overpaying a bit. But I bought them for build quality and comfort, and I wasn't disappointed. I can wear them for hours (not healthy I guess, but I'm used to wearing them even with no music being played). Headphones have the advantage that it suddenly stops mattering where your speakers are and where you are relative to them.
Yes. Do a couple of blind tests with your acoustic system first.
> It's true enough that a properly encoded Ogg file (or MP3, or AAC file) will be indistinguishable from the original at a moderate bitrate.
Disagree. This claim seems to be ungrounded compared with others.
I can believe limitations with bit depth and sampling rate (although I'll take a chance to test myself if I get near a good enough acoustic system). However, I definitely could discern in a blind test whether music I listened to was stored using a lossy format with a reasonable bitrate. It's usually quite audible with rock music that involves cymbals.
AAC / Ogg don't have that limitation & at high enough bitrate should be indistinguishable from the source in a blind listening test, as demonstrated in a number of Hydrogen Audio listening tests down the years, unless of course you're using crappy encoders at which point all bets are off...
(Really, LAME is very good indeed these days. I eventually decided that I was going to get with the program and just encode all my CDs (backed up to flac files) as mp3 for portable listening. It's good enough, and I've decided not to listen for the pre-echo artifacts so that I won't notice them :) )
IIRC I distinguished an mp3 encoded by iTunes with a bit rate of 192 or 256 kbps from its original in Apple Lossless (both played on the same cheap acoustic system). I probably should test with AAC or Ogg, too. Although I have a feeling that it's pretty much impossible to keep cymbals, which are so rich in high frequencies, intact while keeping the file size compact.
> I've decided not to listen for the pre-echo artifacts so that I won't notice them :)
You're much better at controlling your mind. =) After I once verified that the difference is audible even on cheap speakers, I can't switch back to lossy formats. It means constantly wondering if that's how it's supposed to sound or not…
That's, by the way, why Apple's idea of having a ‘Mastered for iTunes’ label IMO is worthwhile: at least you can be sure that the mastering engineer listened to it this way. =)
(Cymbals seem to be a particular bugbear for mp3 encoding; cymbal-heavy tracks tend to suffer the most from obvious encoding artifacts once you know what to listen for.)
Was this in a blind test?
And I was _so_ sure the next sentence was going to be something like:
"No, do not have any suggestions that will make your sound equipment make Justin Bieber sound better…"
Digitally recording a triangle is the best example of why 48kHz is very limiting. The distinct sound of the triangle consists of a high fundamental frequency, ballpark 5kHz, and of many very high-pitch harmonics. Most of these harmonics are above 20kHz. The harmonics are what makes it sound like a triangle, not the frequencies below 20kHz. This is why the triangle is one of the hardest instruments to digitally record. It always sounds like crap.
In theory, it's true that the human ear can't hear above ~18kHz, but it can hear the influence of the very high pitch harmonics on a lower frequency.
EDIT: here's more data backing what I said http://www.cco.caltech.edu/~boyk/spectra/spectra.htm
EDIT 2: typos, frequency mistake
The article's about distribution, not recording. I don't think anybody disputes the usefulness of higher sampling rates when recording.
> In theory, it's true that the human ear can't hear above ~18kHz, but it can hear the influence of the very high pitch harmonics on a lower frequency.
...and 48kHz audio contains those lower frequencies.
For example, the human ear will hear a 30kHz frequency if its fundamental is 10kHz. If it's played at 44.1kHz, the 30kHz frequency is gone and all you'll hear is 10kHz, not a "different sounding" 10kHz.
You are going to have to provide me with a citation to back that up because that goes against everything I've learned and experienced in 17 years of working in acoustics.
Basically, if you produce two ultrasonic frequencies, they will create an interference pattern at a much lower frequency than either of the individual frequencies. Modulate a signal on the difference between two signals, and you can create a directional speaker, since ultrasonic sounds tend to be highly directional (so long as the diameter of the transducer is greater than 1/2 wavelength, which is almost guaranteed with ultrasonic signals). This is how the "sound cannons" that are being deployed for crowd control work.
Yes, but the effects of interference patterns between multiple ultrasonic frequencies are the same, and definitely do affect the audible spectrum
has nothing to do with this:
This is why we must filter the square wave that comes out of a DAC
The only reason that square waves "must" be filtered is to reduce the potential of damaging tweeters. If you want to record a square wave with the purpose of later reproducing the square wave, then you don't want to filter it - once you filter it, it's no longer a square wave.
The reason that square wave sucks is because it introduces tons of high frequency content (your amp probably won't reproduce the high frequency content anyway, so I don't think most Japanese consumer amps will damage your speakers--that is, the amp will act like a filter anyway). That high frequency content then creates alias effects (think of moire patterns when looking at super high-res photos that are scaled down without anti-aliasing). Those alias effects sound like shit to the human ear.
The point of filtering is to anti-alias the resulting analog signal after conversion from digital to analog. The point of upsampling is to move that filter well beyond the audible range, so you can use a 1st-order filter (gentle slope, but it introduces no phase effects). The fact that a square wave hurts your speakers is inconsequential--the amp will effectively filter the signal anyway. Unfortunately, it will filter the signal without anti-aliasing, which introduces those nasty interference patterns within the audible spectrum (that is, if you feed a straight 44.1KHz sampled square wave to your speakers without upsampling/filtering).
Trying to record an edge case like this is the same as recording in a room with bad acoustics. So you end up with some weird (but not faithful) representation of the sound which is a snapshot of the microphone's characteristics and directionality of the ultrasonic tones. It's not reasonable to assume any microphone will behave exactly like a human ear. Even if you could, you're going to have to mimic the tiny random movements a normal person would make listening to a sound, movements which would definitely impact the perception of the sound, because microphones are much more stationary than any human would be.
The "different sounding" argument two posts above is silly, because sound is almost never that monochromatic, and if it is, it's usually boring. Also I don't understand how missing out on an odd order harmonic would be a bad thing :) The reality is none of these arguments are based in a reality of what people would hear, and because of that, the arguments aren't practical.
In reality, 20 bits at 48kHz (or 64kHz) would be more than acceptable for even the most discerning of ears and probably the most practical in terms of space and fidelity, but it'd be a weird format to distribute in.
So the interference pattern will be made up of one low frequency sound and higher frequency harmonics. Once again the higher frequency harmonics are redundant, because you only need to record the lower frequency sound.
The only possible way ultrasound can be picked up by the ear is if the ear has a non-linear response to the input sound. Going by the information in the article linked, it is highly unlikely that any significant non-linearity exists in the ear.
Then you should be able to provide at least one citation.
Of course, filters aren't perfect, and result in phase shift and roll-off. So we over-sample the signal to create a signal with a much higher frequency than 20KHz, so that the filtering occurs well outside the audible band, allowing us to filter out all of these harmonics without affecting the desired signal.
Basically, the end result is that by sampling the signal, you are introducing high frequency content that must be removed prior to playback. This high frequency content is one of the reasons old CD players from the 80s and 90s cause "listener fatigue", although I have no sources to back up that last statement.
(not for eatmyshorts -you get this I gather) - everyone gets that "upsampling" can't add detail to a recording right? You can't get more than you've got.... no matter what you do. There is no magic. You upsample so you drive harmonics generated in the digital-to-analog process during playback further up in the spectrum so when you get to the analog stage you can use a nice gentle analog filter to filter them out. Without the upsampling, you need a nasty steep analog filter to filter them out, and that can have audible side-effects (or at least measurable) in the audible spectrum.
eatmyshorts - correct me if I mis-stated any of that please....
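To put rough numbers on the steep-versus-gentle filter point above, here is a minimal sketch. It assumes the analog filter has to pass everything up to 20 kHz and be fully down by the first image at (sample rate - 20 kHz); the 96 dB attenuation target and the function name are my own assumptions for illustration, not anything stated in the thread.

```python
import math

def required_slope_db_per_octave(sample_rate_hz, passband_hz=20_000.0, attenuation_db=96.0):
    """Average slope needed between the passband edge and the first image."""
    # The lowest image of the audio band sits at sample_rate - passband.
    first_image_hz = sample_rate_hz - passband_hz
    octaves = math.log2(first_image_hz / passband_hz)
    return attenuation_db / octaves

# Compare plain 44.1 kHz playback with 4x and 8x oversampling (assumed factors).
for factor in (1, 4, 8):
    fs = 44_100 * factor
    slope = required_slope_db_per_octave(fs)
    print(f"{factor}x ({fs} Hz): about {slope:.0f} dB/octave")
```

Under these assumptions the non-oversampled case needs a slope in the hundreds of dB per octave, while 8x oversampling gets by with a few tens, which is the "nice gentle analog filter" argument in numeric form.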
I don't know about the physics of the speaker itself generating the overtone (in cabinet), but it could certainly resonate a wine glass in the room, for example.
So no, your wikipedia link is not a citation for the claim that cmer made.
If it really worked that way it would be trivial to demonstrate. Alas, it doesn't.
No. It won't.
Which, since it's a psychoacoustic phenomenon, wouldn't hold true when the partials involved are above the audible spectrum.
Didn't read the article, so commenting out of context, however it needs to be said that in sample-based music genres the distributed music gets used as if it were a recording. Maybe then it could be argued that higher sampling/bit rates should be available, if only for those who are sampling.
That may well be true. But those mixed-down harmonics that are heard "live" would then be captured by the 16/44 (or whatever) sampling. IOW, the recording captures what you heard. Those upper harmonics have no emergent properties. Their effect is captured.
Just because it is difficult to record a triangle does not necessarily mean it is impossible to accurately recreate the sound (to human ears) using 48kHz.
Yes, you're right.
In fact, some of the section X references don't even mention hearing, they talk about "alpha-EEG rhythms" (in this case "listeners explicitly denied that the reproduced sound was affected by the ultra-tweeter") and "bone-conducted ultrasonic hearing" through the "saccule" ("organ that responds to acceleration and gravity and may be responsible for transduction of sound after destruction of the cochlea").
In fact, most of the claims of the article are around the fact that there is energy over 20khz and how it can affect recording process.
This is a well known fact, and this is exactly why engineers filter out sub-sonic and super-sonic frequencies, especially today: stuff that you can't hear (or feel) will just suck your headroom and make you lose the loudness war.
EDIT: Listen to the triangle at the beginning of Rush's YYZ. It's an old recording, but it sounds significantly worse than the analog version. It's been digitally mastered some time ago so if it was mastered today, it would probably sound better, but still not great. I heard a rumor that Rush is remastering all their albums "for iTunes" at the moment, so hopefully we'll be able to compare soon!
So while it's true that the human ear can't hear well above ~18KHz, and the interference between high order harmonics is audible, it's also true that a properly recorded signal, sampled at 44.1KHz, oversampled, and filtered, can reproduce the exact signal the human ear is capable of hearing. At least according to theory.
The human ear is capable of detecting sound pressure as well as sound intensity, and while playback of the interference between harmonics can be reproduced faithfully in the sound intensity realm, the sound pressure levels will differ, and it is theorized that people may be able to tell the difference between the two. However, as far as I am aware, nobody has been able to demonstrate this reliably in practice.
I'd prefer recording technology to err on the side of capturing what we need to reproduce all of that, even if we aren't sure that we need it.
Again, this article is about distribution, not recording.
Have you read the Audio Technology magazine interview with Rupert Neve?
Greg Simmons: Geoff Emerick, the famous British Producer ?
Rupert Neve: Yes, he started me off on this trail. A 48 input console had been delivered to George Martin's Air Studios, and Geoff Emerick was very unhappy about it. It was a new console, made not long after I had sold the Neve company in 1977. George Martin called me and said, "please come and make Geoff happy, while he's unhappy we can't do any work".
They'd had engineers from the company there, and so on. The danger is that if you are not sensitive to people like Geoff Emerick, and you don't respect them for what they have done, then you are not going to listen to them. Unfortunately, there was a breed of young engineers in the company ( I hasten to say this was after I sold it !) who couldn't understand what he was bitching about. So they went back to the company and just made a report saying the customer was mad and there wasn't really a problem. Leave it alone, forget it, the problem will go away. They were acting like used car salesmen. I was very angry with it. So I went and spent time there, at George Martin's request, and Geoff finally managed to show me what it was that he could hear, and then I began to hear it, too.
Now Geoff was The Golden Ears - and he still is - and he was perceiving something that I wasn't looking for. And it wasn't until I had spent some time with him, as it were, being led by him through the sounds, that I began to pick up what he was listening to. And once I'd heard it, oh yes, then I knew what he was talking about. We measured it and found that in three out of the full 48 channels, the output transformers had not been correctly terminated and were producing a 3dB rise at 54kHz. And so people said, "oh no, he can't possibly hear that". But when we corrected that problem, and it was only one capacitor that had to be added to each of those three channels, I mean, Geoff's face just lit up! Here you have the happiness/unhappiness mood thing the Japanese were talking about.
copy here: http://poonshead.com/Reading/Articles.aspx
That's one thing I find concerning with the move to digital. With analog media, you can go back, re-record and get an improved result (provided the source is good) but District 9 (which was shot on Red One) will never have improved quality other than resampling because the source is set to a particular digital format with associated data quality.
"[...] provided the source is good" is begging the question; it's no different from saying "District 9 could be better if they hadn't recorded in 4k (or whatever the Red One was using) and downsampled it for my DVD" The nature of the source is irrelevant, barring the fact that film might provide a higher resolution, if film scanning technology increases, and you can afford to both capture on film, process and store your film properly (archiving film is rather difficult, I believe), and get the best quality digitisation possible.
While I have no doubt that digital will eventually catch up and surpass film, there inevitably is going to be a transition period where quality films were recorded (let's just say at 2k) where the input is constrained and extrapolation will be the only available option.
4k is the current state of the art. It will not be so forever and because it's recorded at 4k, we can't go back and extract more dynamic range due to the limitation of the sensor. Whereas you can go back, redigitize an IMAX film (say Chronos shot in 1985) that is in good condition and get way more info than something shot on 4k yesterday.
TL;DR IMO input still absolutely matters. 35mm is not the upper limit. We went through this with photography and are now doing the same with video/film.
EDIT: After thinking more about it, here's a more extreme example. I purchased a Kodak DC20 back in the 90s (early adopter yay!), even if the camera had decent glass, there's no way I can go back to an image captured by that camera and magically get the equivalent of 22mp 5D camera by resampling. If I had used a film camera, I can get a much improved scan.
EDIT2: Here's a good example. Slumdog Millionaire was mostly shot on an SI-2K which recorded at 2k. You can't go back and get 4k output on the digital portions. So generations later, we will be stuck enjoying an Academy Award winning film at that level of quality.
Digital is the future. Hence it behooves us to have the maximal input & output possible at this time. Unfortunately, this is not common now and the price paid is that content created during this period will be stuck at the same quality level.
The cost of renting a Red One and recording straight digital vs hiring a film camera, process lab, and all the other parts needed quite possibly means that some films might never have been produced due to filming costs.
What measure of quality can compare X against X, if it was never made?
I imagine (I have very little actual experience here, so it's perfectly possible I'm wrong) that digital recording might make it easier/cheaper to retake shots/scenes repeatedly to get them right as well, offering another 2nd order quality effect.
I don't think I understand quite what you're saying and wondered if you could explain more. You and the article both say that humans can't hear above about 20kHz. If there are higher frequencies that create a harmonic at a lower frequency (e.g. a 33kHz harmonic that produces a sound at 16.5kHz) then surely that lower harmonic (16.5kHz in this case) will be recorded by the original recording equipment assuming it is recording at a frequency at least twice that of the highest audible frequency (let's say that this would be 48kHz, although there might be other DAC-related reasons to go higher).
I'm possibly being very daft here!
If you take this recording and master it for a CD (44.1kHz), you'll effectively get up to ~20kHz (since there's a low-pass filter starting at around 16-18kHz). This means that only our first frequency will be captured: 15kHz. It will be exactly the same as if you only recorded 15kHz alone. The harmonics don't modify the fundamental frequency, they just trick the human ear. But when they're gone, they have no effect whatsoever.
Hope this helps!
EDIT: the frequency numbers I used are actually somewhat of a bad example. Harmonics are never exactly double, triple the fundamental. Those would be mostly inaudible. But you get the idea.
Or am I wrong, and the ear is able to detect frequencies above 20kHz?
Actually, they are: https://en.wikipedia.org/wiki/Harmonics
An extreme example is present on modern pianos, where the high rigidity of the loud, heavy piano strings can cause tuners to stretch the lowest and highest notes as much as a half-semitone so that their harmonics are in tune with the note the next octave down or up. In other words, the first harmonic on the lowest note of a piano can be as much as 1/2 of a note sharp.
And when your oscillator is no longer one-dimensional, most harmonics aren't even close to integer multiples. The harmonics of bells, cymbals and drums are all over the place. That's what gives them their percussive sound. (Edit: some of these modes of vibration aren't harmonics in the linear sense.)
That is absolutely incorrect. Mathematically speaking, harmonics are by definition "integral multiples of the fundamental." (Fundamentals of Acoustics, Kinsler & Frey).
There's a measure -- inharmonicity -- of how far the actual overtones of a particular instrument differ from their theoretical fundamental multiples.
[I suspect you already know this. This reply is probably for others' benefit]
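For anyone who wants to see what inharmonicity looks like numerically, here is a small sketch using the standard stiff-string approximation, where partial n sits at n * f0 * sqrt(1 + B * n^2). The coefficient B and the 110 Hz fundamental below are hypothetical values picked only for illustration; real values depend on the string and the instrument.

```python
import math

def partial_frequency_hz(n, ideal_fundamental_hz, inharmonicity_b):
    """Frequency of the n-th partial of a stiff string (B = 0 gives exact harmonics)."""
    return n * ideal_fundamental_hz * math.sqrt(1.0 + inharmonicity_b * n * n)

def deviation_cents(n, inharmonicity_b):
    """How sharp the n-th partial is relative to the exact integer multiple."""
    return 1200.0 * math.log2(math.sqrt(1.0 + inharmonicity_b * n * n))

B = 0.0004  # hypothetical inharmonicity coefficient, for illustration only

for n in (1, 2, 4, 8, 16):
    freq = partial_frequency_hz(n, 110.0, B)
    cents = deviation_cents(n, B)
    print(f"partial {n}: {freq:.2f} Hz, {cents:.1f} cents sharp")
```

With B set to zero the partials land exactly on integer multiples, which matches the textbook definition of harmonics; any positive B pushes the upper partials progressively sharp, which is the stretch that piano tuners compensate for.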
Anyway, it's not as though mathematical literature requires you to use a term exactly one way. I had a diff eqs textbook that used the word 'harmonic' in exactly the way I used it above when I made reference to diff eqs...
So you had a textbook with a mistake in it. What book was it?
But those aren't harmonics, they're inharmonic partials.
This is the part I really do not understand... either my ear CAN pick up those frequencies, maybe the harmonics are "tickling" the little hairs inside my cochlea and ultimately the frequencies I can actually hear were altered in my perception that way - or I can not hear or sense the harmonics and they physically alter the "original" wave that I end up actually hearing.
Either way, pretty much the exact same thing should happen in a studio microphone. Those all do have frequency limitations and AKG, Royer, Rode, Shure, Sennheiser, Audio Tech, what-have-you pretty much all go up to 15kHz or 20kHz according to specs, if I understand them correctly, but not further than that. If it isn't even recorded, those frequencies I also cannot hear can NOT alter my perception so they HAVE to somehow change the frequencies I can hear and are being recorded... on top of that you are making "room" for frequencies up to, say, 60kHz but I very strongly doubt your mics can go even remotely that high.
"I'm an ex-audio engineer"
Hard to believe.
"The distinct sound of the triangle constitutes of a high fundamental frequency, ballpark 10kHz"
That's a pretty high note - higher than the top key on the piano. But an "audio engineer" would know that.
"many very high-pitch harmonics"
Since the next harmonic after the fundamental would be at 20khz, which only young people can hear, and none of the others are audible to any human, I don't understand what you are talking about.
"Most of these harmonics are 20kHz."
OK, you don't either.
"it can hear the influence of the very high pitch harmonics on a lower frequency."
The statement that frequencies above 20kHz don't matter rests upon the assumption that the ear is linear. If the ear is not linear (I don't know whether it is or not) then frequencies above 20kHz will matter, as the ear will be able to mix higher frequencies down to less than 20kHz. For example, if we have frequencies of 56kHz and 59kHz, the ear MIGHT be able to discern a difference frequency of 3kHz. No doubt this effect could be reproduced by signal with a sampling rate of 44.1KHz, but only if the analogue systems, before the sampling stage, reproduce any non-linearity in the human ear.
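The 56kHz/59kHz example is easy to check numerically. This is only a sketch of the maths, not a model of the ear: the quadratic term and its 0.1 coefficient are assumptions chosen to represent "some non-linearity", and the 384 kHz simulation rate is just a convenient value high enough to represent the ultrasonic tones and their products.

```python
import math

def tone_amplitude(samples, freq_hz, sample_rate_hz):
    """Amplitude of one frequency component, measured with a single-bin DFT."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / sample_rate_hz)
             for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / sample_rate_hz)
             for i, s in enumerate(samples))
    return 2 * math.hypot(re, im) / n

SIM_RATE = 384_000  # simulation rate (assumption), well above the ultrasonic tones
N = 3840            # 10 ms, so 3, 56 and 59 kHz all complete whole cycles

ultrasonic = [
    math.sin(2 * math.pi * 56_000 * i / SIM_RATE)
    + math.sin(2 * math.pi * 59_000 * i / SIM_RATE)
    for i in range(N)
]

# A perfectly linear system passes the two tones through unchanged.
linear_3k = tone_amplitude(ultrasonic, 3_000, SIM_RATE)

# A hypothetical mild non-linearity: output = x + 0.1 * x^2.
nonlinear = [x + 0.1 * x * x for x in ultrasonic]
nonlinear_3k = tone_amplitude(nonlinear, 3_000, SIM_RATE)

print(f"3 kHz content, linear system:     {linear_3k:.6f}")
print(f"3 kHz content, non-linear system: {nonlinear_3k:.6f}")
```

The linear case has no energy at 3 kHz at all, while the squared term produces a difference tone there. So the whole argument hinges on whether a non-linearity of meaningful size exists at realistic listening levels, which the code cannot tell you.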
Incidentally, you can get speakers that create a localised beam of sound, that the person sitting next to you cannot hear. They work by transmitting frequencies above the audible range. These high frequencies can be beamformed by a relatively small speaker array, so the sound is localised. They then rely on the non-linearity of the ear (or maybe the air around the ear?) to mix the ultrasonic frequencies down to audible frequencies. I guess there must be non-linearity in the human auditory system!
On the subject of 24-bits my understanding is that 16-bits is adequate, provided the levels (scaling) are set correctly in the recording. What 24-bits delivers is the ability to do a crappy job of the mixing, and still end up with the full dynamic range of the human ear. 24-bits is probably a temporary solution though, as manufacturers will engage in the usual Loudness War, and push the signal to the top of the dynamic range. Before long 24-bit audio will be equivalent to 16-bits (since the 8 least significant bits will be unused) and the next big thing will be 32-bit audio.
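As a rough guide to what the extra bits buy, the textbook figure for an ideal converter is about 6.02 dB per bit plus 1.76 dB (the signal-to-quantization-noise ratio of a full-scale sine). A minimal sketch, assuming an ideal quantizer with no dither or noise shaping; the function name is mine:

```python
import math

def ideal_dynamic_range_db(bits):
    """Theoretical SNR of a full-scale sine against quantization noise."""
    return 20 * math.log10(2 ** bits) + 10 * math.log10(1.5)

for bits in (16, 24):
    print(f"{bits}-bit: {ideal_dynamic_range_db(bits):.1f} dB")

# If 8 bits of a 24-bit word carry no useful signal, what is left
# is the same figure as a 16-bit word.
unused_bits = 8
effective = ideal_dynamic_range_db(24 - unused_bits)
print(f"24-bit with {unused_bits} bits unused: {effective:.1f} dB")
```

That works out to roughly 98 dB for 16 bits and roughly 146 dB for 24 bits under these idealized assumptions; real converters and real rooms fall well short of the 24-bit figure.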
Having said all that, I'd guess that the speakers will be the limiting factor in most sound systems, not the recording format.
Yes. And DACs, which normally have filters too.
Here's an interesting article:
Thinking about it, if every person has a different non-linear response, in theory the only way to reproduce sound beyond a certain threshold of fidelity would be to reproduce the ultrasonic components, so each person would hear their own non-linearity. (That would be beyond what I can hear or care about, but it would be fun to play with. Beyond a certain level we also get to the point where we need to ask what it means to hear a sound.)
> the speakers will be the limiting factor
> in most sound systems
Pardon the reductio ad absurdum, but would you prefer to listen to $1,000 speakers in a dry, padded listening room, or to $100,000 speakers in a tile bathroom? Obviously the room matters; I think most people underestimate by how much.
I'd take the bathroom, given that my singing voice sounds less worse there! :-)
I'm not sure what your background in audio is, but everything he says is correct. High end frequencies well past 15k and up (22.1k actually) are widely acknowledged to influence the lower frequencies and play a huge role in the perception of the quality of a recording. This is an old debate with pros and cons on both sides, but in general you'll find the "Golden Ears" mastering engineers (Stephen Marcussen, Bob Ludwig, etc.) come down on the side of higher sampling rates.
Now, if your original recording was mastered to 16/44.1, then a transfer by way of 24/192 will probably actually hurt the recording. But if you're mastering from an original analog or high-quality digital, in my experience there's no question, higher sampling rates deliver better experiences.
I've caught engineers using L1-Ultramaximizer (or similar) to bounce a recording down to 16-bit/44.1khz as part of the mastering process, and they're always surprised when they're completely unable to hear the difference even in the most simple cover-the-screen-and-toggle-bypass test.
But I know what my ears hear, and IMO there is absolutely a vast difference between 44.1 and 192. I'm not sure how you can even question it. Someone else on the thread was saying it's impossible to hear the difference between 16bit and 24bit. I don't even know what to say to that. It's like telling me the glass of Gallo "Table Red" you're drinking is as good as my '75 Lafite. All I can say is "cheers" and just enjoy.
If I gave you a bottle of "Table Red" with a '75 Lafite label, I'm sure you'd tell me how rich and wonderful it was. The problem here is that, as you said, "I know what my ears hear". You know you're listening to 192, wow it sure sounds great!
If you're listening to quiet music in a quiet room at high volumes on very low noise equipment, you can hear a difference in the noise floor level between dithered 16-bit and 24-bit, but at that volume level if that music (or movie) also has full-amplitude signals you'll be reaching peaks over 110dB SPL.
> I'm not sure how you can even question it.
192 kHz is clearly overkill for listening. Not so for further editing of the data.
Same goes for 16/24 bit, however, the difference between 16 and 24 bit is actually audible.
44100 is not a bad sampling rate, but it necessitates very sharp aliasing filters, which are audibly bad. A bit more headroom is well needed there.
That bit about intermodulation distortion is completely bogus. He talks about problems when resampling high-fs audio data. However, you would never do that. You would digitally process 192kHz all the way. Only your loudspeakers or ears would introduce a low-pass filter, and a rather benign (flat) one at that. There is certainly no aliasing going on there unless you resample (wrongly). Intermodulation distortion is not the fault of the sample rate.
I majored in hearing technology. Calling 192/24 worse than 44.1/16 is total BS. How useful it is is a different debate.
This (widely accepted in the scientific audio community) study's conclusions disagree with your assertion.
>44100 is not a bad sampling rate, but it necessitates very sharp aliasing filters, which are audibly bad.
This is not the 1980s, hardware has progressed beyond that point. Modern (i.e. anything from 1995 onwards) DACs do not suffer from aliasing problems. Also see 
>That bit about intermodulation distortion is complete bogus. He talks about problems when resampling high-fs audio data.
I did not notice that in the article. It talks about IMD in the context of the analog chain and the transducers following the DAC, and it's possible that high frequencies can increase it.
True, but they do so using (long, high-quality) high-cut filters. And these filters are pretty sharp, as they have to close within, say, 18-22.1 kHz. You can design them as linear-phase FIR filters with oversampling and all the good stuff, but physics dictates that sharp filters introduce distortion. A sharp filter like that is audible.
When you're talking about recording, sure, but in terms of storage and playback, we solved that problem 20 years ago with oversampling.
No, the difference is not audible at all. At 16 bits of depth on a normal low-level audio signal (~0.3 volts), we're talking about less than 0.000005 volts per amplitude step. This difference gets lost in the THD already at the DAC in your audio output stage. Then it gets lost again in the amplifier. And again in the cable to your speakers or headphones. And then it gets lost again in the speaker elements. What survives in a normal low-level audio signal is about 14 bits of resolution.
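The step-size figure is easy to check yourself. A rough sketch, assuming the ~0.3 V in the comment is the full span that the 2^16 levels are spread across (that reading is my assumption, as are the function and variable names):

```python
def step_size(full_scale_volts: float, bits: int) -> float:
    """Voltage of one quantization step when 2**bits levels span full_scale_volts."""
    return full_scale_volts / (2 ** bits)

FULL_SCALE = 0.3  # volts, the figure quoted in the comment (assumed to be the full span)

step_16 = step_size(FULL_SCALE, 16)
step_24 = step_size(FULL_SCALE, 24)

print(f"16-bit step: {step_16 * 1e6:.2f} microvolts")  # 4.58 microvolts
print(f"24-bit step: {step_24 * 1e9:.2f} nanovolts")   # 17.88 nanovolts
```

So "less than 0.000005 volts" holds for 16 bits under that assumption, and a 24-bit step is 256 times smaller again.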
44.1khz IS a bad sampling rate for accurately reproducing anything except a triangle wave or square wave above 5khz.
I am not saying that no DAC on the planet can handle five millionths of a volt, but I am saying that five millionths of a volt isn't surviving through the particular DACs and the rest of the electronics used in your PC/living room hi-fi audio equipment.
If it were true that there's no audible difference between 16 and 24 bit, companies like Alesis, Otari, ProTools, etc. wouldn't have spent the last 15 years ditching 16 bit like an old pair of smelly sneakers. (better metaphors welcome).
Seriously, anyone who has sat down in a real listening environment for 5 minutes A/Bing 16 vs 20 bit, 16 vs 24, etc. hears the difference immediately. There's no question. This is why you can buy ADAT 16 bit 'blackfaces' for $100, down from their original $4,000.
It's all marketing, baby!
You've never rented an expensive tube EQ during a mix to cover up 16bit's grating harshness from 10k to 15k. Or tried like mad to make the bass drum sound like a freaking bass drum and not a pie pan slamming against the back of a plastic trash can. And yes, we had good mics and pres, all standard studio stuff. Decent, not brilliant, converters, but it was the 16bit that was the problem. Getting those 20bit XTs for the first time was like walking into the Promised Land.
Sure, there's lots of marketing ploys out there, lots of snake oil. Moving up from 16 bit was not one of them.
The original article explicitly mentions how 24bit is useful for recording.
> Professionals use 24 bit samples in recording and production  for headroom, noise floor, and convenience reasons.
> Modern work flows may involve literally thousands of effects and operations. The quantization noise and noise floor of a 16 bit sample may be undetectable during playback, but multiplying that noise by a few thousand times eventually becomes noticeable. 24 bits keeps the accumulated noise at a very low level. Once the music is ready to distribute, there's no reason to keep more than 16 bits.
The original article does say that yes, during recording and production, 24 bit audio gives you a lot more room to play with. That doesn't mean that you can hear the difference between 16 and 24 bits for the final recording; just that 24 bits give you more room to keep out of trouble during production.
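To put rough numbers on the "accumulated noise" argument, here is a back-of-the-envelope sketch. It assumes an ideal quantizer (the textbook 6.02*N + 1.76 dB figure for a full-scale sine) and that every processing step adds independent noise of equal power, so the powers simply add. Real effect chains are messier, and the function names are mine, not the article's.

```python
import math

def quantization_snr_db(bits: int) -> float:
    """Theoretical SNR of an ideal N-bit quantizer for a full-scale sine."""
    return 6.02 * bits + 1.76

def accumulated_noise_floor_db(bits: int, operations: int) -> float:
    """Noise floor (dB relative to full scale) after `operations` requantizations,
    assuming each one adds independent noise of equal power."""
    return -quantization_snr_db(bits) + 10 * math.log10(operations)

for bits in (16, 24):
    for ops in (1, 1000):
        floor = accumulated_noise_floor_db(bits, ops)
        print(f"{bits}-bit, {ops} operation(s): {floor:.1f} dBFS")
```

Under those assumptions a thousand operations lift the floor by 30 dB: 16-bit ends up near -68 dBFS, while 24-bit is still around -116 dBFS, which is the "room to keep out of trouble" during production.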
>Same goes for 16/24 bit, however, the difference between 16 and 24 bit is actually audible
No, the difference is not audible at all.
Harman requires its trained listeners to pass tests based on this software before participating in juries to evaluate Harman products. It doesn't directly address the sample rate/bit depth issues discussed in the linked article, but it does address a lot of the issues brought up in the HN discussion, so you can have a chance to see how much those characteristics really matter.
You may be surprised.
In any test where a listener can tell two choices apart via any means apart from listening, the results will usually be what the listener expected in advance; this is called confirmation bias and it's similar to the placebo effect. It means people 'hear' differences because of subconscious cues and preferences that have nothing to do with the audio, like preferring a more expensive (or more attractive) amplifier over a cheaper option.
The human brain is designed to notice patterns and differences, even where none exist. This tendency can't just be turned off when a person is asked to make objective decisions; it's completely subconscious. Nor can a bias be defeated by mere skepticism. Controlled experimentation shows that awareness of confirmation bias actually increases rather than decreases the effect!
Doesn't that completely negate his conclusion, that there is no point to distributing 24/192 music? If people want to pay for 24/192, and even he just admitted that they will legitimately enjoy it more, how can you conclude there is no point?
Life is short. I want to enjoy things. Whether or not my enjoyment can be quantified or scientifically defended, I really don't give a shit. But that's okay, if you don't want to sell me 24/192 music, Amazon will. Between this and DRM-free content, it's no wonder I buy all my music from Amazon these days.
Audiophiles are quite a fascinating group. These are people that can be rather rational in some respects (they could be doing research in some lab somewhere) but when it comes to audio equipment they will shell $2000 for HDMI cables. The salesmen and manufacturers that make these things ("high end" HDMI cables, 192kHz recordings) know this very well and they aggregate around this target set of clients.
I think that is exactly what is happening here. At some point storage capacity is just good enough and one can distribute 48kHz, 16bit audio to everyone. But what do you do next? Everyone is getting that and it is not new and cool anymore. What to do? Well, increase the frequency and sell everyone a newer, better, higher fidelity thing, even though objectively human ears cannot really hear the difference. Subjectively though, there is a huge difference. If you ask someone who just spent $50 for a 192kHz record if they like it better than say a $20 48kHz one, I bet you 100% of people will confirm that 192kHz sounds better and will be ready to go and buy more.
Ultimately, sure. The world is full of products and services which only add value in this weak sense.
If the same wine tastes better if it's priced higher, then it still tastes better. But I think it's only honest that the consumer be aware that the increased utility from being priced higher is due solely to the fact of it being priced higher. Beyond that, I don't care.
One thing we can all agree on is that music is much more enjoyable if you think you're listening to it through good equipment or from a good source. Ultimately it's only the `thinking' part that matters. So I would make two points:
1. One point he's making is that playing audio sampled at 192khz through regular equipment actively distorts the music in negative ways. So if you know this, you should now enjoy that music _less_.
2. If you're adept at metacognition (maybe that's not the right word), you'll realize a) you can get most of the enjoyment by buying equipment that's `pretty decent', and then not worry about it too much. b) you're probably fooling yourself by spending so much time/money worrying about having the best equipment, so you're probably not getting the maximum utility from the experience anyway. Or maybe it's the experience of trying to get the best equipment itself that's enjoyable, not necessarily the increased audio fidelity.
Sorry, no time to reply. I gotta run and write up my biz plan to distribute 32/384 audio.
So, while I have no option (for now) but to acknowledge your position, I still feel dirty for doing so.
why are you arguing against the conclusion of an article that has this many upvotes on HN?
However, one thing that's missing here (and in nearly all other similar pieces) is a full discussion of the prerequisites of the sampling theorem. For example, the signal must be bandwidth-limited (and no finite-time signal can be).
But this is a minor concern, as there are many elements in the analog domain of the recording and playback chains that serve as low-pass filters - starting with the mics. So bandwidth-limiting is effectively achieved.
For a similar reason, I find the discussion of the "harmful" effect of high frequencies on playback electronics and loudspeakers to be a bit overdone. Peruse the excellent lab results of modern audio gear on Stereophile's web site. You'll find that bandwidths exceeding 30kHz are rare.
One last thing. When doing subjective "testing," keep in mind that what some folks are hearing may be limitations of their gear. For example, most DACs derive their clocks for higher sampling rates (88/96/176/192) by clock-multiplier circuits. IOW, 44kHz and 48kHz are the only ones clocked directly by a crystal. These multiplier circuits are often noisy, contributing to jitter. The audible effect of this jitter is hard to predict.
PS As an avid audiophile, I find the clash of subjectivists and objectivists on this normally-buttoned-down forum to be a bit of a trip.
192 kHz is the sample rate. 192,000 slices per second. It does not refer to the audible sound spectrum.
20 kHz in speakers refers to the cycles per second of the audible waveform. Normal human hearing range is 20 Hz - 20 kHz. For most people, it's less than that.
A speaker can certainly play back music sampled 192,000 times per second. Most of them can't play tones that are higher pitched than 20 kHz, which is fine because mostly only dogs can hear up there anyway.
The fact is, simply distributing music in lossless format carries the vast majority of audible improvements. Arguing over whether or not it's 24-bit or 16-bit or making a chunk of sound last 5.2 microseconds instead of 22.67 seems incredibly stupid to me, because you're better off simply improving the mix itself than fiddling over such microscopic differences. These things only become relevant if your mix and performance and recording equipment (or synths) are absurdly close to perfection. This becomes even LESS relevant in an age of indie-musicians.
Filters are also not perfect (but good oversampling filters are not the weakest link)
Further, even perfectly dithered 16 bit data can't go 20 dB below the quantization floor, unless you give up on frequency response on the high end. Again, this is plain math.
With a calibrated 105 dB low-distortion sound system, in a quiet room, I can hear imperfections from 16 bit, 44 kHz material, especially in soft flutes and triangle type percussion. Of course, D class amplifiers, and MP3 encoding, do worse things to the signal, so let's start there. But 20 bit, 96 kHz (or at least 64 kHz) are scientifically defensible, when analyzing the math and the physics involved. No snake oil needed!
1) Any well-designed system is going to have headroom. Period. Just because 48kHz can theoretically capture the frequencies the human ear can hear, it's always good to have a little wiggle room. This comes into play even more with interactive situations: humans are particularly sensitive to jitter. Having an "overkill" sample rate lets you seamlessly sync things easier without anyone noticing.
2) 192kHz comes with an additional benefit besides higher frequencies: it also means more granular timing for the start and stop of transients. More accurate reverb would be the obvious example. I don't know if the human ear can discern the difference between 0.03ms and 0.005ms but it's something I don't see mentioned often.
2) increased sampling rate does not improve timing. This also has been researched in detail (because it sounds like it could possibly be true given that the ears can phase match to much greater granularity than the sample clock). It was found false in practice, and in retrospect, the sampling theorem explains why. The Griesinger link discusses this with illustrations, and provides a bibliography.
Slides 29-35 address this point.
48kHz already has enough 'wiggle room'. How many people do you personally know that can hear a 24kHz sine tone?
> more granular timing for the start and stop of transients.
... it's something I don't see mentioned often.
Probably because it doesn't make sense. Human ears cannot hear frequencies above 24kHz and Nyquist tells us that 48kHz is enough to completely capture all the detail of a signal at that frequency and below.
> Having an "overkill" sample rate lets you seamlessly
> sync things easier without anyone noticing.
> 192kHz ... also means more granular timing for the
> start and stop of transients.
Said another way, two band limited pulse signals with different onset times, no matter how arbitrarily close, will result in different sampled signals.
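A small sketch of that claim, for anyone who wants to poke at it. It samples an idealized band-limited pulse (a sinc, which is my stand-in for "a pulse band-limited to 20 kHz") at 44.1 kHz, once centered on a sample instant and once shifted by a single microsecond, far less than the 22.7 microsecond sample period. The names and the 1 microsecond shift are illustrative assumptions.

```python
import math

def sinc(x: float) -> float:
    """Normalized sinc: sin(pi*x) / (pi*x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def sample_pulse(onset_s: float, fs: float, bandwidth_hz: float, n: int) -> list:
    """Sample an ideal band-limited pulse centered at onset_s seconds."""
    return [sinc(2 * bandwidth_hz * (k / fs - onset_s)) for k in range(n)]

FS = 44_100.0          # sample rate in Hz
BW = 20_000.0          # pulse bandwidth in Hz
center = 32 / FS       # pulse centered exactly on sample 32
shift = 1e-6           # 1 microsecond, a small fraction of one sample period

a = sample_pulse(center, FS, BW, 64)
b = sample_pulse(center + shift, FS, BW, 64)

max_diff = max(abs(x - y) for x, y in zip(a, b))
print(f"largest per-sample difference: {max_diff:.4f} of full scale")
print(f"that is about {max_diff * 32768:.0f} 16-bit steps")
```

The two sample sequences differ by far more than one 16-bit step, so the sub-sample timing is encoded in the sample values rather than being rounded to the nearest sample instant.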
This is true, but different than what I am arguing. You're saying that a listener over time will be able to tell that the two signals differ. I am saying that a listener will be able to determine this at fractional wavelengths.
It's similar to dithering a high dynamic range signal onto a lower bit depth: more than two samples are required for "evidence" of two different signals, while sampling at a high enough rate will tell you this almost instantly.
Again, I don't know if human ears are able to detect this, just that I haven't seen it addressed in these discussions.
As a thought experiment, let's consider a pulse that has been band-limited to 20kHz. Are you arguing that the analog output of a (filtered, idealized) DAC would look different depending on whether the dac was running at 44.1kHz vs 192kHz? If so, I don't think many people would agree with you.
Any difference in the "timing" of the output wave would have to come from energy that falls above nyquist of the slower sample rate. So, while I agree with you that the timing would be sharper, this is exactly caused by "higher frequencies", not by some other sort of timing improvement.
No. I'm arguing this: take a 44.1kHz signal and upsample it to 192. It's the same signal, same bandwidth and everything. Duplicate the stream and add a 1 sample delay to one of the channels. When you hit play, that delay would be there. If you downsampled the 44.1kHz signals after applying the delay to one of the channels, you would almost hear the same thing. The difference is that you could not detect the difference between the signals until after a few samples. With the 192kHz stream it would be unambiguous after 2.
Remember, Nyquist-Shannon holds if you have an infinite number of samples. If your ears could look into the future then what you say is perfectly correct, but they need time to collect enough samples to identify any timing discrepancies.
but the question for me is how exact that guessing is.
correct me if i'm wrong but, that interpolation happens twice: when recording by the adc and on playback by the dac.
so a lot of that whole discussion (yeah, finally something about acoustics :) depends on how accurate interpolation works in adcs and dacs.
It turns out that if you reproduce a digital signal using stair steps you get an infinite number of harmonics— but _all_ of them are above the nyquist frequency. The frequencies below the nyquist are undisturbed. Then you apply a lowpass filter to the signal to remove these harmonics— after all, we said at the start that the signal was bandlimited— you get the original back unmolested.
Because analog filters are kinda sucky (and because converters with high bit depth aren't very linear), modern ADCs and DACs are oversampling— they internally resample the signal to a few MHz and apply those reconstruction filters digitally with stupidly high precision. Then they only need a very simple analog filter to cope with their much higher frequency sampling.
That's the time it takes sound to travel 8mm. Do you think you could tell if an instrument was positioned differently by 8mm?
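For reference, the arithmetic behind numbers like these, as a quick sketch. I'm assuming the interval in question is roughly one sample period at 44.1 kHz and a speed of sound of 343 m/s (dry air at about 20 °C); both are my assumptions, not something stated in the thread.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at roughly 20 degrees C

def sample_period_us(fs_hz: float) -> float:
    """Duration of one sample in microseconds."""
    return 1e6 / fs_hz

def distance_per_sample_mm(fs_hz: float) -> float:
    """How far sound travels in air during one sample period, in millimetres."""
    return SPEED_OF_SOUND_M_S / fs_hz * 1000.0

for fs in (44_100, 48_000, 192_000):
    print(f"{fs} Hz: {sample_period_us(fs):.2f} us per sample, "
          f"{distance_per_sample_mm(fs):.2f} mm of travel")
```

One 44.1 kHz sample period comes out at about 22.7 microseconds, or just under 8 mm of travel; at 192 kHz it is about 5.2 microseconds and under 2 mm.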
http://en.wikipedia.org/wiki/Sound_localization cites http://web.archive.org/web/20100410235208/http://www.cs.ucc.... that suggests the brain is sensitive to timing differences between ears as low as 10 microseconds, or 0.01ms.
Humans are sensitive to jitter, but jitter isn't a major problem with modern digital electronics and reclocking strategies. This ArsT thread hashed out these issues a couple of months ago: http://arstechnica.com/civis/viewtopic.php?f=6&t=1164451...
Is this too unrealistic to expect? Has something like this been tried before?
Plenty of other artists have as well, but this is the most high profile example I can think of. I agree it would be great if it happened more often.
The Beatles multi-tracks are also available (although they were only recorded 4-track so not every instrument always has its own track), and there has been a handful of artists who have released their samples of one song for remix competitions (Daft Punk, Royksopp, Booka Shade).
1. People would use the tracks to create custom remixes which they would then distribute. What happens when a remix becomes more popular than the original track? Artists generally have to pay other artists to remix their songs (usually via royalties).
2. Creativity. When an artist creates something they want you to hear it the way it was intended. Allowing you to remix it however you like takes away a lot of the creative control from the artist.
And another important remark: some artists are flattered when someone asks them to make a remix for their song. (Imagine you're an artist and your idol asks you to make a remix of his song.)
I write music and have considered releasing separate tracks so people can freely remix it but I prefer just having mixes that are controlled by me. Allowing another artist you know to remix your track still allows you some sort of control (you know their style so have some idea of how the remix will go). Giving up that control is a big step and, I think, an unnecessary one.
Why do you need that control? Someone creating something new with your work, doesn't seem to damage your work in any way.
Closest I've found is to take the .mogg files out of the Guitar Hero games and use those to make new mixes. :-)
Of course, you can very easily get just the vocal track by subtracting the two. Sometimes the "non-vocal" track will still include backing vocals or the like in appropriate places, and just pull out the main vocal track.
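The subtraction trick is literally sample-by-sample arithmetic. A toy sketch, assuming "the two" means a full mix and a matching non-vocal version that are sample-aligned and otherwise identical (same master, no lossy encoding in between); the tiny lists stand in for decoded PCM and the values are made up:

```python
def subtract_tracks(full_mix, instrumental):
    """Sample-by-sample difference: what is left is whatever only the full mix contains."""
    if len(full_mix) != len(instrumental):
        raise ValueError("tracks must be the same length and sample-aligned")
    return [m - i for m, i in zip(full_mix, instrumental)]

# Hypothetical sample values standing in for real decoded audio.
instrumental = [0.10, -0.20, 0.30, 0.00]
vocals = [0.05, 0.05, -0.10, 0.20]
full_mix = [i + v for i, v in zip(instrumental, vocals)]

recovered = subtract_tracks(full_mix, instrumental)
print(recovered)
```

In practice any difference in mastering, timing offset, or lossy compression between the two versions leaves residue, so the result is rarely this clean.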
Some musicians have even released every single track of their work separately; see "Desperate Religion" on http://en.wikipedia.org/wiki/Trilogy_(ATB_album) , intentionally inviting remixes.
For music where the vocal tracks aren't released separately, you can often pull them out nevertheless. The best is if you can get the audio in 5.1 -- vocals are almost always center-panned, which makes extracting them quite easy.
And when you have them, just use something like Ableton Live and that will be it. I think that's what you mean, right?
It will be a great idea to have tracks released as several `layers` so that the user can choose which of them to play and which not, for example the bass/beats layer, layer with melodies, layer with the percussions, layer with the vocals of course, but that sounds like semi-studio production.
As pointed out, mastering has vastly greater effect on the audio quality (and is often pretty poor), and is the reason vinyl records often can sound better than their digital counterpart, despite being an inferior technology. The DAC used also has a massive effect on the sound once you get into decent quality equipment.
Like the author, i'd also love to see some expansion of mixed-for-surround music.
 a lot because of loudness wars, as pointed out in the post, but also just due to a lack of time/care/love(/demand?).
 http://www.hydrogenaudio.org/forums/index.php?showtopic=6175... This thread explores the bit-depth of vinyl records, beginning with a claim of a maximum 11-bit resolution-- limited by the width of a PVC molecule the record is made from.
My advice to you younger guys is to keep the windows rolled up while driving. I have no other explanation why my left ear is much worse than my right.
In my own tests I believed that I couldn't tell the difference between 16/44 and 24/96 on high quality loudspeakers, but I could with high quality headphones. The studies cited all seem to use loudspeakers in testing.
Also worth noting, the article states that obtaining 24/96 source material sometimes means you get better mastered material, which still sounds better after down-sampling back to 16/44.
That said, if Apple also allows high quality recordings to be sold, it will be useful. For example, with acapellas, instrumental tracks or samples, it would be convenient for others who want to remix them, and iTunes would be a platform for this trade.
Also for tracks DJs play. Most compression throws away a lot of the bass which people can't hear, but this is bass you can feel rumbling through your guts on a big sound system and is part of the experience.
For the rest, they were happy with low rate AAC files on the early iPods, they are happy with the sound coming from their crappy little iPod dock, for them it won't make a difference as long as it's a chart music track from a memorable and impressionable time of their life.
Given a 5 minute song, if I have the choice to download an 11MB file (320kbps MP3) or a 330MB file (24/192) I would of course choose the 11MB file. The sound quality is perfectly acceptable and the file size much more convenient to manage (storage, backups, etc.).
In terms of the convenience of managing the file size and sound quality I think 320kbps MP3 is the best compromise.
Here's a file size comparison of a 5 minute stereo song:
MP3 128kbps > 5 MB
MP3 320kbps > 11 MB
Uncompressed 16/44 > 50 MB
Uncompressed 24/192 > 330 MB
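Those figures can be reproduced with a few lines. This is a sketch that assumes constant-bitrate MP3, plain uncompressed stereo PCM with no container overhead, and that "MB" above means 2^20 bytes; the helper names are mine.

```python
def pcm_size_bytes(seconds, sample_rate_hz, bits_per_sample, channels=2):
    """Raw PCM payload size, ignoring any file header."""
    return seconds * sample_rate_hz * (bits_per_sample // 8) * channels

def cbr_size_bytes(seconds, kbps):
    """Size of a constant-bitrate stream, with kbps meaning 1000 bits per second."""
    return seconds * kbps * 1000 // 8

MIB = 1024 * 1024
DURATION_S = 5 * 60  # the 5 minute song from the comparison

sizes = {
    "MP3 128kbps": cbr_size_bytes(DURATION_S, 128),
    "MP3 320kbps": cbr_size_bytes(DURATION_S, 320),
    "Uncompressed 16/44.1": pcm_size_bytes(DURATION_S, 44_100, 16),
    "Uncompressed 24/192": pcm_size_bytes(DURATION_S, 192_000, 24),
}

for name, size in sizes.items():
    print(f"{name}: {size / MIB:.1f} MiB")
```

The results land at roughly 4.6, 11.4, 50.5 and 329.6 MiB, which matches the rounded numbers in the list.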
When talking about sound quality there is a much more relevant issue: the amplitude compression (distortion) abuse used by mastering engineers and producers that totally destroys the dynamic and life of the sound. That is a real issue. When buying a song there should be two versions to choose from:
A) "Loud", dynamically destroyed / distorted version.
B) Normal, dynamic, non-distorted version.
Today only version A is available to buy.
For a producer and manufacturer the rational approach would be to cater to that craziness and extract as much money from it as possible. In other words if you are selling HDMI cables, spend $2/cable to make it, then sell most for $5 and then re-brand some and sell for $500. It only takes 1 out of 100 people buying that to make the same profit. You know these people are obsessed and irrational so you cater to that. And that's basically how we end up with ridiculously overpriced Monster cables and recordings distributed to customers @ 192kHz.
Footnote, you don't have to have a >$10,000 setup to benefit from higher quality tracks (compared to the downloads that sometimes have 'questionable' quality). I have two systems, a full range stereo (front left and right) setup for nearfield listening at my desk that's +/- 1 dB from 50 Hz-20 kHz. The other is a stereo setup in my media room; 2 way quarter wave transmission line, +/- 3 dB 40 Hz-20 kHz. The point is, there are a lot of people with less than $1200 in audio gear that still want lossless tracks made available. Who cares if the human ear can't discern much of the extra information, we still want it.
Then we started recording other people. I became obsessed with gear, software and all the associated toys that go with any technical pursuit. I'm a programmer, so it's easy to understand how that happens but I totally lost sight of the music, spent way too much money on equipment that was nowhere near being required and generally lost the plot. I was tracking everything 24-bit/96kHz and bemoaning the loss of quality when I mixed down for CD.
Anyway, the TL;DR version of what followed was that we recorded quite a bit, lost interest in making our own music and then the whole adventure came to an end. Now my gear is leaving via eBay and I'm finding my way back to just playing guitar and trying to write good music.
24-bit/192kHz - pointless. Give me a small venue and a guy with an acoustic guitar any day.
> The FLAC file is also smaller than the WAV, and so a random corruption would be less likely because there's less data that could be affected.
At the same time, if you flip a bit on a WAV file, you may hear a "pop" sound. On a FLAC file, the whole encoding block may be inaudible (or worse).
The basilar membrane is a loosely tuned resonator. The hair cells placed on it fire beginning on the positive zero crossing. So, to a first approximation, the ear is in fact a filterbank.
There is a time domain component in that the cochlear nucleus contains nerve cells that watch multiple hair cells at a time and correlate the firing in several different ways. Some attempt to discriminate pitch, some convolve and correlate in-phase firing energy, some look for tones to end, etc. This information is then forwarded on to the brain.
However, getting back to your point, no hair cells will fire if the basilar membrane doesn't move, and it's tuned to a frequency range.
Further, I can hear a difference between 44.1kHz and 96kHz. Whether you can hear that difference is up to you. (The word-length is a red herring - there's no new information contained in a 24-bit recording vs 16.)
IMO anything less than flac and you're missing something. Higher sampling frequencies do add to the sound, but in a way that is almost invisible to the untrained ear. Perhaps these should be distributed at a premium the way SACDs and similar "audiophile" formats were in the past?
The key to reproducing the original signal from the digital signal is a low-pass filter that rejects everything above the sampling rate, correct?
That is to say, what I am getting at is while the original signal can be reproduced, it requires properly tuned, and probably reasonably high performance, hardware to remove the higher frequency components of that square wave. Can you count on consumer grade hardware to do this well?
Typically the technique used inside a DAC is to digitally upsample the signal (by duplicating samples, often to a few MHz, which also allows the use of a low bit-depth DAC) and then apply a very sharp "perfect" digital filter to cut it right to the proper passband (half the sampling rate). The analog output then contains only a tiny amount of ultrasonic aliasing which is so far out that it's easily rolled off by simple induction in the output.
This isn't just theory. Here is a wav file I made at a 1kHz sampling rate, where every other sample is -.25/.25: http://people.xiph.org/~greg/1khz-sampled.wav (so a 500Hz tone, the highest you can represent with 1kHz sampling).
Feed that file to a boring resampler (I used SSRC, but anything should give roughly the same result, at least when not quite so ridiculously close to nyquist; most will attenuate near-nyquist data extensively) and you get this: http://people.xiph.org/~greg/1khz-sampled-to-48khz.wav
Here are the two signals plotted against each other:
As you can see— the 500Hz sinewave is reconstructed perfectly. (Of course, a 500Hz square wave would not be (you'd get a sinewave out) but this is because a 500Hz square wave contains energy far beyond the nyquist of 1kHz sampling).
Here is a spectrograph of the same signal http://people.xiph.org/~greg/1khz-to-48khz-spec.png showing that the tone is indeed pure (the faint background noise is the dither the resampler applies when requantizing its high precision intermediate format back to 16 bits).
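If you'd rather not trust someone else's wav files, the same idea can be checked numerically. This sketch is not the SSRC resampler used above; it is plain textbook Whittaker-Shannon (sinc) interpolation, and it uses a 400 Hz tone instead of exactly 500 Hz so that a truncated sum behaves well (sitting right on Nyquist is the "ridiculously close" case). The constants and names are my own choices.

```python
import math

def sinc(x):
    """Normalized sinc: sin(pi*x) / (pi*x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def reconstruct(samples, first_index, t):
    """Whittaker-Shannon interpolation at time t, measured in sample periods."""
    return sum(s * sinc(t - (first_index + k)) for k, s in enumerate(samples))

FS = 1000.0        # sampling rate in Hz, as in the example above
TONE_HZ = 400.0    # assumed test tone, safely below the 500 Hz Nyquist limit
HALF_SPAN = 1000   # samples kept on each side of t = 0 (the ideal sum is infinite)

samples = [0.25 * math.sin(2 * math.pi * TONE_HZ * n / FS)
           for n in range(-HALF_SPAN, HALF_SPAN + 1)]

# Evaluate at 48 instants inside one 1 kHz sample period (i.e. on a 48 kHz grid).
worst = 0.0
for j in range(48):
    t = j / 48.0
    truth = 0.25 * math.sin(2 * math.pi * TONE_HZ * t / FS)
    worst = max(worst, abs(reconstruct(samples, -HALF_SPAN, t) - truth))

print(f"worst error between the original samples: {worst:.2e}")
```

The leftover error comes from cutting the infinite sum off at 1000 samples per side, not from the sampling itself; lengthening the span shrinks it further.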
If this is the case, then all of the arguments in the world about the maximum audible single frequency are irrelevant. Imagine music composed entirely of these beat frequencies and performed with a pair of oscillators between 25kHz and 35kHz. Without higher resolution encoding, it would be audible IRL but the recording would be silence.
So you'd be right if your mics were head spaced and in the venue. But you'd still have secondary data, with the original lost.
By that standard, the original is always lost unless you have a completely holographic recording. 192kHz doesn't help with that problem at all.
The tone you get from an acoustic beat is not a real tone; it's a perceptual quirk that requires you to be able to hear the tones in the first place.
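You can see this in the numbers. The sketch below adds two ultrasonic sines (25 kHz and 26 kHz, picked arbitrarily from the range mentioned above) and measures how much signal sits at the 1 kHz difference frequency using a single-bin DFT. It only models ideal linear summation; a nonlinear speaker, amplifier or ear is a different story.

```python
import math

FS = 192_000   # sample rate in Hz
N = 1920       # 10 ms, a whole number of cycles for every frequency probed below

def tone(freq_hz):
    """Unit-amplitude sine sampled at FS for N samples."""
    return [math.sin(2 * math.pi * freq_hz * n / FS) for n in range(N)]

def amplitude_at(signal, freq_hz):
    """Amplitude of the freq_hz component, via correlation with a sine/cosine pair."""
    re = sum(s * math.cos(2 * math.pi * freq_hz * n / FS) for n, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq_hz * n / FS) for n, s in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)

mix = [a + b for a, b in zip(tone(25_000), tone(26_000))]

for f in (1_000, 25_000, 26_000):
    print(f"{f} Hz component: {amplitude_at(mix, f):.6f}")
```

The 25 kHz and 26 kHz components come out at full amplitude and the 1 kHz component is zero to within rounding: the envelope of the sum wobbles at 1 kHz, but there is no 1 kHz energy in the signal itself.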
EDIT: it turns out Audacity won't generate a tone above 20kHz (the UI accepts the value, but when you reopen it the value has been rounded down), so both of my generated tones were actually 20kHz.
EDIT: You can generate higher than 20kHz by increasing the pitch of a tone lower than 20kHz. Upon doing this, I could hear 24kHz and 26kHz together.