24/192 music downloads make no sense (xiph.org)
325 points by LERobot on Nov 6, 2015 | 228 comments



The single worst thing that harms audio quality is excessive compression. It's ruining everything, I'd say. Heck, I'm sure everyone who knows their stuff would agree that compression harms audio quality.

Recommended reading: https://en.wikipedia.org/wiki/Loudness_war

edit: compression as in dynamic range compression, not data compression like mp3 in audio


I've played acoustic and electric instruments for over 30 years, and recorded numerous albums.

Compression is the best tool we have for accurately reproducing the musicality and emotion of a musical performance. Without compression, most recordings would be unlistenable.

Don't confuse the foolishness of the loudness wars for "compression is bad". That's like saying the internet is bad because there's porn on it.


I respect your opinion, but it's only your opinion, not a global truth.

Compression is a style. There is far more to musicality and emotion than compression. The problem compression solves is that the environments where industrialized cultures now listen are not dedicated listening rooms but alternately loud and quiet places, so compression makes all parts of the music almost equally loud and there are no drop-outs where the quieter parts would be. There is no need to compress music for headphones, for example, to the extent that it is currently compressed.

I find that compression and other techniques, such as removing vocal breath sounds, make most recordings unlistenable. They don't sound like humans anymore, but like synthetic puppets animated by humans with conflicting values. Take the Foo Fighters, for example. They're popular, sure, but all of their songs sound like one continuous din. Between the compression induced by the guitar distortion settings, the compression added to the recording, and the compression added by the radio station, it just sounds like a waterfall with a few bandpass filters changing between the verse and chorus.

Also, their vocals have no dynamics. When he yells, the vocals don't get louder; only the timbre changes. That turns it from cathartic to strained. The dynamics have all been flattened.

Why do you think the indie rock movement and bands and styles with wide dynamic range, like the Pixies, Nirvana and dubstep, got so popular? They eschewed the trend of hardline compression in favor of alternating loud and quiet parts. They match the rhythm of human thought and motion, which has fast and slow, detailed and empty parts.

> That's like saying the internet is bad because there's porn on it.

Yes but on the internet you can go where there is no porn. Where can you find music with no compression?


> bands and styles with wide dynamic range like the Pixies, Nirvana and dubstep

One of these things is not like the others.

I can't tell if you're trolling here, or if you legitimately don't know much about audio engineering, because Dubstep (and electronic music in general) is probably the most aggressively compressed and limited genre of music out there.

If you actually do feel that Dubstep has a lot of dynamics, then you're misattributing the lack of dynamic range in a lot of modern music to audio engineering. What you're really bothered by is the songwriting and musicianship, not the engineering or compression.


Heh, no, sorry, I didn't mean to sound like I'm trolling -

it is extra-ironic, b/c yes, dubstep uses compression a lot, but at the same time a lot of the dubstep I've heard uses silence judiciously to create huge amounts of contrast in between the speaker-shreddingly compressed passages.

Both house music and dnb IME tend to employ quiet as well, to make the loud seem louder. It's mostly just rock and some Americana on the radio these days that gets to me. Actually, country is one of the most egregious genres too. It sounds totally flat, like AM radio.


> it just sounds like a waterfall with a few bandpass filters changing between the verse and chorus.

That's a good way to put it. I hate over-compressed music. It sounds terrible and is no fun to listen to once you notice the issue.

But, are you an audio engineer? I ask because I work in the field, and all engineers I know accept that dynamic range compression is a necessary part of creating recorded music. The problem is when you abuse compression to create a wall of noise with no dynamic range.

It's like autotune: you can use it to correct one bad note, or you can abuse it to create T-Pain.

The engineers behind the Pixies and Nirvana [1], and all dubstep producers, use or have used compression at various stages of the creative process.

[1] I guarantee you that whoever mixed Smells Like Teen Spirit spent a lot of time doing the compression on that song.


Obviously there is a difference between "excessive compression" (the term the GP used...) and "no compression at all" (what you seem to be arguing against).


> compression makes all parts of the music almost equally loud

What you are referring to is only one particular use case for which a "compressor" is used during music production: most often, people apply dedicated plugins to the "master" mix (the final result to be put onto CD or sold as a file). These plugins apply what's called a "multiband compressor", which can reduce dynamics individually in different parts of the frequency range, and often do quite a lot more magic than I claim to understand. -- Done excessively, the result is the "always one volume" sound of the loudness wars.

The compressor as a tool isn't limited to this use case, though. You will, for example, put it on an individual instrument's track to shape the relative strength of the percussive and decaying content of an instrument, a drum or a plucked guitar for example: typical compressor plugins have an attack time, which shapes how fast the reduction of gain follows the input signal. If this is slower than the duration of the percussive sound, the instrument will sound more "aggressive" instead of being leveled down, because the "attack" sound is increased relative to the resonating portion. -- The end result is the exact opposite of the excessive master compression people complain about.
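To make the attack-time behavior concrete, here's a rough Python sketch of a feed-forward compressor. The function name and parameter values are my own toy choices, not any particular plugin's:

```python
import numpy as np

def compress(signal, sr, threshold_db=-20.0, ratio=4.0,
             attack_ms=10.0, release_ms=100.0):
    """Feed-forward compressor: gain reduction follows the signal's
    envelope with separate attack and release time constants."""
    # Per-sample smoothing coefficients derived from the time constants.
    atk = np.exp(-1.0 / (sr * attack_ms / 1000.0))
    rel = np.exp(-1.0 / (sr * release_ms / 1000.0))
    env = 0.0
    out = np.empty_like(signal)
    for i, x in enumerate(signal):
        level = abs(x)
        # Envelope follower: rises at the attack rate, falls at release.
        coeff = atk if level > env else rel
        env = coeff * env + (1.0 - coeff) * level
        level_db = 20.0 * np.log10(max(env, 1e-9))
        # Above threshold, shave off (1 - 1/ratio) of the overshoot.
        over = max(0.0, level_db - threshold_db)
        gain_db = -over * (1.0 - 1.0 / ratio)
        out[i] = x * 10.0 ** (gain_db / 20.0)
    return out
```

With a slow attack, the first milliseconds of a drum hit pass through nearly untouched before the gain reduction catches up and clamps the sustain, which is exactly the "more aggressive transient" effect described above.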

Then you will invariably have unintended peaks in real-world recordings, and if you don't want to level down the whole recording to account for those peaks, or manually level them down for every instance, you'll also employ a compressor plugin on problematic tracks.

Then there's the possibility to filter the part of the spectrum that will trigger the compressor (its "sidechain"), or to let a compressor be triggered by one instrument (or group of instruments) and act on the signal of another instrument (or vocals, or a group of tracks): that way you increase the perceived separation of different voices in your mix, again increasing the perceived dynamics of a song.
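A toy sketch of that sidechain idea: one signal (say, a kick drum) acting as the "key" that ducks another track. All names and parameters here are made up for illustration:

```python
import numpy as np

def sidechain_duck(target, key, sr, depth_db=-8.0, release_ms=150.0):
    """Duck `target` whenever the `key` (sidechain) signal is hot.
    The gain reduction is driven by the key, not by `target` itself."""
    rel = np.exp(-1.0 / (sr * release_ms / 1000.0))
    env = 0.0
    out = np.empty_like(target)
    for i in range(len(target)):
        level = abs(key[i])
        env = level if level > env else rel * env  # instant attack
        # Full depth_db of reduction when the key envelope reaches 1.0.
        gain_db = depth_db * min(env, 1.0)
        out[i] = target[i] * 10.0 ** (gain_db / 20.0)
    return out
```

The classic "pumping" sound of dance music is this pattern: the bass dips on every kick hit and swells back between hits, which is one way engineers carve out that perceived separation.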

So, if you really want to find music with no compression at all, you'll likely only find classical recordings made only with one single X/Y microphone pair... :-)


Compression is a tool, not a style. It has existed almost as long as recording itself. It solves fundamental problems.

Problem #1 - Audio speakers are not the same as instruments! If you've ever had the pleasure of getting your hair blown back by a cranked guitar amp or a drum kit... well, you just can't reproduce that experience through a set of earbuds. You can't exactly reproduce the sound of an acoustic guitar in your lap, or a violin in a beautiful hall. Speakers just aren't up to it. A recording is a miniature of a sound. Like all miniatures, it exaggerates some details and obscures others.

Problem #2 - Most modern music consists of many instruments of different volumes and dynamic ranges played together. Mixing the sounds so everything sounds clear and balanced is extremely difficult. Some of the most important instruments are the worst offenders - acoustic guitars and drum kits in particular walk all over everything else, with wide dynamic and frequency ranges that compete with steadier sounds. And certain harmonically dense sounds like female vocals and violins can easily distort the audio amps used in reproduction. Compression, and its close cousin equalization, help engineers "carve" spaces for each instrument to live in.

You have probably never heard a recording without extensive compression, unless you've mixed records. And if you've been actively involved in recording, you'll already know all this.


I have to admit that you might have some kind of point regarding FF, but they are one of the more tolerable mainstream bands. Funny that you use two bands that Grohl was/is a part of as a counterpoint. It would almost be like me saying Starship is everything wrong with '80s hair bands and then saying that Jefferson Airplane is what music should be like. In reality, they are the same band and a metaphor for the demise of the baby boomer 'locust' generation.


> Where can you find music with no compression?

the pub?


"Compression is the best tool we have for accurately reproducing the musicality and emotion of a musical performance. Without compression, most recordings would be unlistenable."

That is your artistic choice, as it should be.

"The MP3 only has 5 percent of the data present in the original recording. … The convenience of the digital age has forced people to choose between quality and convenience, but they shouldn’t have to make that choice." -- Neil Young. [0]

Not every artist wants this to happen. They have no choice, and listeners get a fraction of the sound recorded. This was not the case with vinyl.

@mborch, the exact compression method is of less importance than recognising that all the compression being discussed is a retrograde step from vinyl. Why?

[0] http://allthingsd.com/20120131/neil-young-and-the-sound-of-m...



Thanks @Leo, point noted. So DRC affects the audio amplitude? A lot of the sounds I listen to were recorded in analog tape-based studios; is DRC used in digital studios and post-processing?


No, compressors have also been used in analog studios for a long time... See, for example, this device... http://www.audioblendmastering.com/fairchild-670/

Often digital plugins try to recreate the characteristics of such old tube monsters, because their sound is considered superior by some.

On the other hand, mastering plugins (including compressors and much more) nowadays are algorithmically so complicated that an analog version is not feasible.

EDIT: sorry, link is to an old limiter, not strictly a compressor, but one gets the idea


Different kinds of compression are being talked about here.


thx @mborch


Dynamic range compression is an important part of the aesthetic of pop music today, like it or not. Pop hits don't sound the same without it. For this reason it will be slow to go away, since that aesthetic is unaffected by normalization on iTunes/YouTube.


Did some brief Googling to see if I could participate in a blind listening test and hear the difference between 24- and 16-bit recordings. I found this instead [0], an even more interesting gap to see if you can hear the difference between 8- and 16-bit!

The source song, PSY's Gangnam Style, is the epitome of modern pop. I got 3/10 on the listening test with a decent pair of Sennheiser headphones in a quiet room.

Some people are commenting that modern pop pairs well with 16-bit because of the heavy-handed mastering techniques and that older music thrives under 24-bits. Well, Audio Check offers the same 16 vs 8 test, using a Neil Young track from 1989... I couldn't fool myself into hearing any differences between the source WAVs at all and didn't even attempt to score the 10 soundbites.

[0] http://www.audiocheck.net/blindtests_16vs8bit.php
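If you want to recreate the gist of that test yourself, requantizing to a lower bit depth is nearly a one-liner. This is my own sketch, not Audio Check's method; it also shows the roughly 6 dB-per-bit noise-floor difference that makes 8-vs-16 surprisingly hard to hear on loud pop material:

```python
import numpy as np

def quantize(signal, bits):
    """Requantize a float signal in [-1, 1] to the given bit depth."""
    levels = 2 ** (bits - 1)
    return np.round(signal * levels) / levels

# Quantization noise drops ~6 dB per extra bit, so 16-bit sits roughly
# 48 dB below 8-bit -- audible mainly in quiet, unmasked passages.
t = np.arange(48000)
sine = 0.5 * np.sin(2 * np.pi * 440 * t / 48000)  # a 440 Hz test tone
for bits in (8, 16):
    noise = quantize(sine, bits) - sine
    snr = 10 * np.log10(np.mean(sine ** 2) / np.mean(noise ** 2))
    print(bits, "bit SNR:", round(snr, 1), "dB")
```

On heavily compressed pop, the music itself masks the 8-bit noise floor most of the time, which is consistent with people scoring near chance.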


4/10. "Only slightly worse... than a random guess!"

I already knew that 24-bit vs 16-bit would be indistinguishable for pop music but I would not have expected that the same would go for 16-bit vs 8-bit. Or, well, I was kind of expecting it since otherwise the person who put this test together likely wouldn't have bothered.

However, I don't mind. I've never claimed to be an "audiophile" -- a fact which is reflected by the inexpensive headphones I use :)


With these online tests, can anyone shed some light on how browser/plugin codecs come into play? I recently took Tidal's test to see if I could distinguish their lossless FLAC option from 320 kbps AAC. Using a decent DAC and some Sennheiser HD202s I managed to score 0/10, which is somehow impressive.


I believe that in various tests no one has been able to reliably discern lossless audio from 320 kbps MP3, even though MP3 performs comparatively poorly at lower bitrates compared to more modern lossy codecs.


This is amazing! Thanks for linking it. 8 bits... sheesh.

I'm impressed.


This is true. HDTracks.com offers one of the Green Day records with no mastering compression.

Yes, it has much more dynamic range, but it sounds wrong. Compression basically emulates what our ears naturally do when hearing very loud material, so compression gives one the feeling that the music is LOUD.


If you listen to pop recordings between 1950-1980, you'll notice there's more depth and character to every instrument.

Yes, dynamic range compression is currently used/abused extensively in pop music productions, but if mixes weren't compressed they would have a much wider range for the sounds to play around in.

  Not only is Justin Bieber's My World 2.0 louder than 
  Metallica's The Black Album, it's louder than The Sex 
  Pistols' Never Mind The Bollocks. 
http://www.sonicstate.com/news/2011/02/21/why-is-justin-bieb...


That is tragically and hilariously ironic.

With older songs, you can turn them up louder, and then the dynamic parts really punch you.

With newer songs, it's just a steady fatigue.


Yep.

Compression is an important part of both experiences. I like to think of it this way:

The soundscape has two axes: one is the overall perception of loudness, the other is the range of frequencies occupied by a given element, or track if you will, in a piece of music. In a stereo or multi-track recording such as surround sound, there are more axes involving perception of placement, or "imaging". But the basics are all that are needed to consider this compression matter.

When you take one of those tracks, say the drums or the bass guitar, in isolation, a raw playback at volume on a reference system is going to largely reproduce what went in. So far so good.

Now, start adding the various bits and what happens?

That soundscape gets crowded! Little quiet bits you could appreciate are suddenly lost amidst all that is going on, so what to do?

Compress!

And that's a very good thing. First, few people actually have systems that can reproduce high-dynamic-range material in a quality way, and they don't have a listening environment that would make any sense if they did. So there is that. But more importantly, the details, the subtle bits, really are quiet! They will get lost, and so we compress the track overall to make sure those are present in the final mix, and pleasing to the ear too. Yes, there is a style and art to this, no doubt. But the compression really is necessary too.

A secondary reason for compression, and light processing by things like auto-tune, is to polish up a performance. A vocalist might be a little soft on part of a phrase, or someone playing guitar might not deliver the same solid stroke every time. These things stand right out on a raw recording. If they get mixed in, they will get lost, or feel wrong, weak, etc... Compression can level this out and result in a solid, consistent sound. Again, the mind's ear tends to want to hear this.

Ever listen to something a bit distorted and then play it back in your mind? Notice how your mind tosses out a lot of stuff, leaving you with that which you really craved? It's actually pretty difficult to recall something with high aural clarity for most people. Good production involves training the mind and the ear to actually pick this out so it can be managed into something people will really crave over and over.

One might also process things a bit too. This may be done to emphasize some characteristics of the sound with respect to the overall context of the mix. For a vocal, maybe it makes sense to punch the formant frequencies a bit, or dampen them, to maximize the color and character in the vocalist's vocalizations, for example. Our brain is an awesome audio machine, and it does a lot for us that recordings do not do. So we must bring those things out and make them available to listeners of those recordings.

Now, comes more of the basics in the art!

What the mind's ear hears is what we want to put on the record. Take that bass, compress it so that it occupies a smaller range in the loudness department (reduce its overall dynamic range), and set its LEVEL to one that's appropriate for its overall contribution to the mix, which itself represents the overall perception of the music. Think two components here: the individual component of the mix, or track, has an overall dynamic range, but it also has a level at which it's present in the mix. What makes sense here varies a lot and is highly subjective. Good production involves listening to the music and picking out what defines it as good, that which is resonant in terms of style, color, etc... A strong bass in one tune might make great sense, yet in another it might be pushed more to the background, etc... Depends.

A good producer will spread these things out in the soundscape so that the listener can hear them! Bringing up the little details while bringing down the punch is needed to make room for it all to get in there and have an impact without fatigue or overloading the medium itself. A CD has its limits, cassette other limits, radio still others, etc... An appropriate balance must be struck here, and it's not always optimal either.

This is what you are hearing when you listen to those older tunes, well produced. And it's damn good stuff too. You aren't wrong about it at all. Just blaming a good tool, when you should be blaming who ever is wielding it poorly.

(This is why remastered recordings exist.) Here's something fun: go and get the DVD or Blu-ray of something you really like and compare it to what you might hear on the radio, or off a CD, or a download. Often those are pushed out to the edges with the assumption that the consumer has much better gear and/or will pay for better production. Sometimes shitty production on a CD can be avoided this way. (Interestingly, some video game music gets remastered, and I've heard more than a few wall-of-sound tracks sound great off my PS3...)

When this is done properly, the dynamic range is filled with a lot of things, each occupying its frequency range, sometimes overlapping, sometimes not, and each having an overall level that makes sense and is aesthetically pleasing to the listener. This is why you can hear a great vocal right on top of that awesome guitar lick and drum set, despite the fact that they may share a significant set of common frequency components.

Someone applied appropriate compression and some processing to make room for everything when it's mixed down into the final track. When they get it right, you don't even notice. It's just fucking good sound you crave. When they get it wrong, it's the tiring wall, or you find yourself straining to hear interesting bits that should just flow.

And I love this done well. To me, it's the most important thing I can say differentiates producers I love. Good production qualities in these areas are what make a recording "pop" and make you feel "there". No joke. This stuff, right here, is what makes a recording "immersive" for you. The sound goes right in, the audio engineers have made sure that's gonna happen, and it tickles your mind, taking you away from the fact that it's a recording.

Now, that brick wall... Take all that nice work, and listen to it on a great system, appropriate volume level, and the music will just stand out there, crisp, clear, every important part audible and enjoyable right?

Right.

A final processing layer can further crush this into a wall of sound, removing more and more of the dynamics to a point where it's all one intense thing, yet people can largely still differentiate the details! This can also be done, and frequently is, in the mixing stage. But I'm mentioning it because commercial radio employs this processing extensively, due to the limitations inherent in broadcast. FM, for example, has bandwidth and dynamic range / signal-to-noise limits that require this kind of processing to combat road and car noise.

Newer digital radios are controversial, and a topic for another day, but they do not have those same limits and can deliver the perception of a much better overall experience, which many listeners will say is comparable to a CD. (It's not, though; again, another day.)

It's the abuse that you don't like. Neither do I. Often the clowns even let it clip a little for "grit" or some other BS. That really sucks, because we don't even get the music, crushed as it is! But, let's set that ugly crime aside, and just stay with the brick wall for a moment longer.

It's all still there, LOUD, and that is tiring to you because it does not "breathe", "punch", etc... Graphically, it's a lot like cranking the gamma up on images. More subtle details stand out in more conditions, but the overall depth and feel are lost... and it's tiring to listen to, even at low volume. Our mind's expectation of what things sound like can clash with this kind of excessive production, even though the first impression can be good. Producing for that initial "wow" by maxing out the medium is, IMHO, always an ugly mistake, but the brutal truth is that sales metrics select for LOUD over GREAT.

All that said, compression is good. We need it. Vocals, in particular, can very seriously benefit from appropriate compression and processing to really bring out the harmonics inherent in a great set of pipes owned by someone who knows who they are and how to express that well. But it all adds up, and it all needs that bit of compression love to make sure the good stuff isn't lost on the way into your head.

Maybe this helps a little to get at where some of the pain and fatigue really is.


I recall listening to a vinyl rip of Lady Gaga's first album, and found that "Pokerface" was much more enjoyable with the additional dynamic range. So I'm not sure if overcompression is necessary for newer songs, but rather that it's just what the mass market has become accustomed to because of the loudness war.


It's a common misconception that vinyl necessarily imparts all possible audio benefits to whatever is pressed into it.

Now, I'm not Gaga's recording engineer, and for all I know you're absolutely right and she and her team remastered each track without compression for the vinyl pressing. Considering how rare such a process is in industry recording, however, I doubt it. I assume that the difference you're hearing is really typical of all vinyl recordings: a type of distortion that comes from both pressing and playback through an RIAA preamplifier.

People use all sorts of terms to describe the difference between vinyl & digital sound, but mostly descriptors center around words like "warmer," or "more open" or "brighter." These are really just beneficial side-effects of the standards developed to overcome vinyl's limitations as a recording medium.

So, in reality, quality and compression come into play long before the pressing actually occurs, on the mastered track itself. I've come to realize that a poorly handled transfer to vinyl will often sound nowhere near as clear or detailed as a high-res or even "Mastered for iTunes" track.


Mastering for vinyl is very common in the industry. I don't have time to dig up a reference right now, but the medium has different constraints than digital, so it is usually mastered separately, especially for big-budget stuff. The differing constraints are mainly that while CD has a hard amplitude limit (hence brickwall limiting), the constraint on vinyl is not hard in the same way -- it mostly has to do with the amplitude of low frequencies and skipping grooves.


I agree that mastering for vinyl happens, but you're not right about the dynamic range. CDs were popular initially because they have a wider dynamic range; the overly compressed "loudness wars" tracks are nowhere near the limits of the medium. It is true that vinyl these days often has wider dynamic range - because the market for how songs sound is different for people buying vinyl, so records are mastered with wider range.


The comment was not about dynamic range, but about the nature of the upper amplitude limit (hard vs. soft/frequency-dependent). That says nothing about the dynamic range, which is still a creative decision during mastering. I was trying to point out that vinyl mastering is common (contrary to the parent post) and that one reason is the technical differences between media. I agree, though, that market pressures are a strong driver of the differing creative choices during mastering, but the decision to master separately for vinyl is often independent of that.


My understanding is that with vinyl, if the signal has too high an RMS at low frequencies, the needle can literally skip out of the groove, and for this reason vinyl can't necessarily be as compressed and loud.


    • The audio is subjected to low-pass or all-pass filtering, which can result in broad peaks becoming slanted ramps.
    • The amount and stereo separation of deep bass content is reduced for vinyl, to keep the stylus from being thrown out of the groove.

http://wiki.hydrogenaud.io/index.php?title=Myths_%28Vinyl%29...

http://www.laweekly.com/music/why-cds-may-actually-sound-bet...


Not necessarily, but full-power vinyl grooves would cause all sorts of distortion and misbehavior due to the very mechanical nature of cartridges and tonearms. Likewise, high frequencies would be so small they couldn't really be picked up.

Hence the RIAA curve, an EQ curve applied during mastering, before the audio is put to vinyl, that reduces the lows and exaggerates the highs. It is reversed by the phono preamp during playback.
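For the curious, the RIAA playback (de-emphasis) response is defined by three time constants: 3180 µs, 318 µs and 75 µs. A quick sketch of its magnitude curve, normalized to 0 dB at 1 kHz (my own derivation from those constants, so treat the exact figures as illustrative):

```python
import math

# RIAA playback de-emphasis time constants, in seconds.
T1, T2, T3 = 3180e-6, 318e-6, 75e-6

def riaa_playback_db(f):
    """Magnitude of the RIAA playback (de-emphasis) curve at f Hz,
    normalized so that 1 kHz sits at 0 dB."""
    def mag(freq):
        w = 2 * math.pi * freq
        num = math.hypot(1.0, w * T2)                             # zero
        den = math.hypot(1.0, w * T1) * math.hypot(1.0, w * T3)   # poles
        return num / den
    return 20 * math.log10(mag(f) / mag(1000.0))

# Playback boosts the lows (roughly +19 dB at 20 Hz) and cuts the highs
# (roughly -20 dB at 20 kHz), undoing the inverse curve cut at mastering.
```

That roughly 40 dB of total tilt across the audio band is why playing a record without an RIAA preamp sounds thin and shrill.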


This is true. Mastering for vinyl is basically a completely different set of skills than mastering for other media.


> I recall listening to a vinyl rip of Lady Gaga's first album, and found that "Pokerface" was much more enjoyable with the additional dynamic range.

I was skeptical of this, so I sought out a copy on Youtube[0]. "Better" is subjective, and this is definitely lower quality overall (given the Youtube compression). But it is undeniably a very different song. I've listened to the CD version of this song countless times, and I don't think I ever remember hearing the male voice in the intro so prominently.

For someone hearing the song for the first time (or even the first few), it may not be as obvious, but as someone who's used to hearing that song on a regular basis, it definitely jumps out as having a crisper sound.

[0] https://www.youtube.com/watch?v=98AgdAkjOjo


To me it just sounds like the original song put through a high-pass filter. All the subwoofer bass that makes it a dance song is just gone. That's not a lack of loudness-war-style compression; it mostly sounds like bad equipment.

There's a major 60hz ground loop in there, too.


Reading "compression" as both a programmer and audio-minded person made that first sentence difficult to parse at first. :)

But yeah, it's obvious when I have my car stereo nearly on max to listen to classical or jazz, and then if I turn the radio and get a pop music station my ears are about to explode.


Don't worry, the loudness war is basically over. iTunes and YouTube normalize everything to -15 dB RMS and -13 dB RMS respectively. Spotify uses the ReplayGain algorithm. Broadcasters worldwide now use the EBU R128 standard, which is a more effective form of loudness normalization, hopefully to be adopted by online services too at some point. All that's left is for personal media players to widely implement loudness normalization (by default) for preexisting records in people's collections.

http://productionadvice.co.uk/lufs-dbfs-rms/
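In the simplest terms, that kind of normalization is one constant gain per track toward a target level. A toy RMS-based sketch (the real services use more sophisticated LUFS/ReplayGain loudness measures, so this is illustration only, with names of my own invention):

```python
import numpy as np

def rms_dbfs(signal):
    """RMS level of a float signal (full scale = 1.0), in dBFS."""
    return 20 * np.log10(np.sqrt(np.mean(signal ** 2)))

def normalization_gain_db(signal, target_dbfs=-15.0):
    """Constant gain a player would apply to bring a track's overall
    RMS to the target. Quiet tracks get boosted, loud ones turned
    down; the track's internal dynamics are untouched."""
    return target_dbfs - rms_dbfs(signal)
```

A brickwalled track sitting at -8 dBFS RMS would simply be turned down about 7 dB toward a -15 target, which is why normalization removes the incentive to master hot.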


The normalization is song-by-song to my knowledge. It doesn't normalize the audio within the song itself. (Over)compression is a key part of the aesthetic of a lot of pop hits today, and while in the face of normalization you won't be able to make your song overall louder than the next, it won't necessarily stop you from creating a "wall of sound" waveform if that's the sound you're looking for.


Absolutely, and nor should it. But what it should mean is that if you make your song mostly quiet, your peaks will be louder than a wall of sound.


Wait, what does that mean for someone who records music as a hobby? If my recording sits at around -10 dB RMS the whole time, will it sound as loud to the listener as if it were at -15 dB RMS (iTunes)?

Also, how exactly does it sound when most of my recording is at -10 dB RMS and a couple of peaks are at -5 dB RMS?
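Assuming the player applies a single constant gain per track (which is how these normalization schemes work), the arithmetic for the numbers in that question is just:

```python
# Hypothetical numbers from the question above: a track sitting at
# -10 dB RMS with peaks at -5 dBFS, played at a -15 dB RMS target.
track_rms = -10.0    # overall RMS level of the recording, dBFS
target = -15.0       # player's normalization target, dBFS
gain = target - track_rms    # constant gain applied: -5 dB
peaks_after = -5.0 + gain    # the peaks simply move with it
# The whole track is turned down 5 dB; the peaks land at -10 dBFS,
# and the internal 5 dB gap between RMS and peaks is preserved.
```

So it sounds exactly like your mix, only quieter overall: the relationship between the body of the track and its peaks is untouched.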


> itunes and youtube normalize everything to -15dbRMS and -13dbRMS respectively.

Are you using "normalize" in the audio production sense, as in adding or subtracting a constant gain to an entire track? I'm pretty sure that iTunes doesn't do this, at least not by default.


Yeah, I should have read the wikipedia article in full before commenting. I'm glad the war is over!


It's not just the acoustic properties of dynamic-compressed music. The compression forces music to be more simplistic, because there's less "room" for complexity. Michael Jackson's "Thriller" is a standard pop hit, engineered for chart performance and not artistic virtuosity, but it is still a rich, deep, satisfying piece of music to listen to in part because there's a whole lot of stuff going on. If you tried to cram that many instrument parts into a modern Katy Perry tune, it would clip like crazy and become unlistenable hash.


No, compression doesn't force the music to be more simplistic. It enables simplistic music to sound fuller.

Anyone who has mixed 120+ tracks for a single "modern rock" song with a compressor at the end would tell you so.


I mostly play acoustic instruments now, but I discovered what you were talking about when I used to make electronic music. I'd naively use some heavy compression for loudness, but as I added more parts things would get quite muddled. I'm no great audio engineer so maybe something else was the cause, but I'm pretty sure this was it.


I have mixed feelings about it. But I fully agree all music should be uncompressed by default.

The good thing about compression is that it allows you to save your hearing quite a bit. Some music has dramatic parts that get super loud, which can produce an awesome emotional response; but it takes a toll on your hearing unless you are in a very silent environment or have fantastic headphone isolation (I have none) -- so compression actually allows me to hear everything the music has to offer. I also use a compression tool on my sound card to play most games, especially FPSes, which have incredibly loud bangs while you still need to hear footsteps and quiet environmental noises -- with a compressor that's possible without blowing out your ears.


I'm truly curious how having good isolation would help your hearing. Is room noise a large additive factor?


Yes, especially in the quieter parts (where the signal-to-noise ratio is lower) I noticed a considerable difference, at least in my urban environment. You can get away with lower sound pressure levels [volume].


Oh, I see. The problem is that they cause you to turn your volume up, lol. That's my favorite part of Pink Floyd's The Wall: most of the album is really quiet, which lets it get REALLY LOUD on a couple of occasions.


The irony is that extreme dynamic compression only became popular after all of the media we listen to (CDs, MP3, digital radio, etc.) went digital and we got >90 dB of dynamic range and signal-to-noise ratio. Tapes and records are typically less than 60 dB. So our digital music is capable of being as dynamically expressive as our ears, but we clamp/ruin the dynamic range down to <20 dB.


There's a fundamental difference between analog and digital, though. On digital you have to stay well away from the limits or you will get extreme artifacts. On analog, you can stomp on the limits, and reconstruct the original signal with filters (analog or digital) after the fact, because tape doesn't simply stop responding, it just responds less as the inputs get more extreme.


You are talking about (dynamic) compression applied to an entire song with the sole purpose of making it as loud as possible to the human ear while still complying with the maximum allowed peaks (or with the energy of the signal).

This does degrade the quality, sometimes heavily. Nevertheless it is done by mastering engineers (who rarely enjoy it) as well as by radio and TV stations, extensively, because of the psycho-acoustic fact that a song appears to sound better when it is played louder. This gives them an advantage over the competition: on average, people searching for a radio station are more likely to listen to your radio station if it is louder than the competition.

The main issue lies in the fact that the current peak measurement of audio signals only marginally correlates with perceived loudness, and heavy compression is used to trick this system. The broadcasting industry is aware of this. An open and quite effective loudness measurement algorithm [0] was introduced a few years ago and is slowly being adopted all over the world through new broadcasting regulations: AGCOM 219/09/CSP (Italy), ARIB TR-B32 (Japan), ATSC A/85 PRSS CALM Act (US), EBU R128 (Europe) and OP-59 (Australia). iTunes Sound Check is also based on [0], and since this year YouTube applies it to newly uploaded videos as well [1]. Even games use [0] to keep their audio at a consistent loudness.

So, slowly, overuse of compression is ceasing to give music producers and broadcasters any advantage, and beautiful dynamic music will be competitive again.

I have collected some links [2] about this topic. Because of the lack of any affordable implementation at the time I created one myself [3] with some additional notes [4].

[0] ITU-R BS.1770, http://www.itu.int/dms_pubrec/itu-r/rec/bs/R-REC-BS.1770-4-2...
[1] http://productionadvice.co.uk/youtube-loudness/
[2] https://www.klangfreund.com/lufsmeter/manual/#about_loudness
[3] https://github.com/klangfreund/LUFSMeter
[4] https://github.com/klangfreund/LUFSMeter/tree/master/docs/de...
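The peak-vs-loudness mismatch described above is easy to demonstrate numerically. A hedged sketch in plain NumPy (a tanh waveshaper stands in for a real limiter, and every signal parameter is invented for illustration): two signals with identical peak level but very different RMS energy, which tracks perceived loudness far better than the peak does.

```python
import numpy as np

fs = 48000
t = np.arange(fs) / fs

# A "dynamic" test signal: a 440 Hz tone whose level alternates
# between quiet passages (0.2) and loud passages (1.0).
envelope = np.where((t % 0.5) < 0.25, 0.2, 1.0)
x = 0.99 * np.sin(2 * np.pi * 440 * t) * envelope

# Crude dynamic-range compression: a tanh waveshaper (not a real
# limiter), renormalized so it has exactly the same peak as the original.
c = np.tanh(2.5 * x)
c *= np.max(np.abs(x)) / np.max(np.abs(c))

rms = lambda s: np.sqrt(np.mean(s ** 2))

print("peak original/compressed:", np.max(np.abs(x)), np.max(np.abs(c)))
print("RMS  original/compressed:", rms(x), rms(c))
```

Both signals hit the same peak, so a peak meter scores them identically, but the compressed one carries noticeably more energy and therefore sounds louder -- which is exactly the loophole that BS.1770-style loudness measurement closes.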


This actually helps me articulate what I don't like about Spotify and why I don't use it: they only seem to have modern remasterings of all of the albums from the 70s, 80s and 90s that I'd want to listen to, but those lose a lot of the feel of the originals I'm used to.


This is a good point. Further recommended listening, "Neil Young on Why High-Resolution Music Matters" ~ https://www.youtube.com/watch?v=5oTtylYR76o (55min, 2015JAN17)


This. The reality is that we're arguing over whether it's best to use 16 bits or 24 bits to distribute 8-12 bits worth of information. The problem that really needs to be addressed is in the studio, not in the iTunes store. Compression-happy producers are doing more than their share of the work required to make music suck. </pet_peeve>

Edit: Some useful educational material to read before moderating: https://en.wikipedia.org/wiki/Loudness_war .


Excessive data compression will also lower dynamic range, though, no?


Not really, no.

Maybe if you really mangled your audio by encoding at extremely low bit rates.

But in general, no.


Actually, yes.

mp3, for example, loves to save space by cutting away sounds just above the noise floor, and some less-noticeable frequencies.


Go find a CD that has been mastered to be just below the clipping point. (e.g. Fallujah - The Flesh Prevails)

Convert it to MP3 directly, use whatever settings you want

Watch as the mp3 rip is now clipping due to further compression of the dynamic range.


That's not because of dynamic range compression; it's because MP3 discards some phase and frequency information -- e.g. harmonics that wouldn't have clipped get phase-shifted on top of a peak and clip.
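You can reproduce this effect without an MP3 encoder at all. A hedged NumPy sketch (brickwall FFT filtering stands in for a codec's band-limiting; all parameters are invented): a full-scale square wave that fits exactly within [-1, 1] exceeds full scale once its upper harmonics are removed, because of Gibbs overshoot.

```python
import numpy as np

fs = 48000
t = np.arange(fs) / fs

# Full-scale 1 kHz square wave: every sample is exactly +1 or -1.
x = np.where(np.sin(2 * np.pi * 1000 * t) >= 0, 1.0, -1.0)

# Brickwall lowpass at 10 kHz via the FFT -- a crude stand-in for a
# codec discarding high-frequency content.
X = np.fft.rfft(x)
freqs = np.fft.rfftfreq(len(x), 1 / fs)
X[freqs > 10000] = 0
y = np.fft.irfft(X, n=len(x))

print("peak before:", np.max(np.abs(x)))   # exactly 1.0
print("peak after :", np.max(np.abs(y)))   # > 1.0: Gibbs overshoot
```

A signal mastered right up against full scale can therefore clip after lossy encoding even though no dynamic-range compression was applied.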


I think you are confusing limiting with compression.


I can't hear a difference between 96kHz/44kHz in its raw form. However, I can tell the difference from effects in audio mixing. The extra detail can really make a difference in how well an audio effect VST works.

I have a 96kHz/24-bit interface that I use and ATH-M30X headphones, and I can tell a difference between at least some 24-bit FLAC files and 16-bit highest-quality-possible MP3s. I was mixing my own music and the difference was quite obvious to me. The notable thing was that drum cymbals seemed to have a bit less sizzle and such.

Now that being said, if I hadn't heard the song a million times in its lossless form from trying to mix it, I probably wouldn't have noticed, and even then it didn't actually affect my "experience".

I'm one of those guys that downloads vinyl rips as well, but I do that mostly just to experience the alternative mastering, not that I think it's higher quality or anything. (though I have heard a terrible loudness-war CD master that sounded great on vinyl with a different master)


The article is pretty clear about this too - higher bit depths and sampling rates can be quite useful in mixing and recording situations.

They're pointless for playback.


> higher bit depths and sampling rates can be quite useful in mixing

That is really the central issue. It's much like imaging since the time of Ansel Adams: the sensor can capture more dynamic range than the human eye can experience. The producer may have use for that range when editing, but the audience will never know what was -- may have been -- missed. And we're not talking about limits of reproduction. We're talking about the human sensor's instantaneous and absolute upper and lower bounds.


> That is really the central issue. It's much like imaging since the time of Ansel Adams: the sensor can capture more dynamic range than the human eye can experience.

That's not really true. Dynamic range refers to the difference between the biggest value that isn't clipped and the smallest value that isn't rounded to zero. The human eye is a logarithmic detector, cameras are linear. The only reason HDR is a thing is because cameras DON'T have enough dynamic range.

http://photo.stackexchange.com/questions/21579/how-does-the-...


The instantaneous dynamic range of the human eye is estimated to be less than eight stops. Both film and modern camera sensors can capture much more than that. The reason we're able to adequately perceive scenes with 12+ stops of dynamic range is that our visual system does continuous integration and reconstructs an HDR "mosaic" in real time. HDR photography is required in situations where even the 12-14 stops that a modern sensor is capable of is not enough.

However, that's neither here nor there because the human eye is not the real bottleneck here. The media we use to display photos are. Printed photographs have approximately six stops of DR; typical monitors have eight. Modern cameras capture much more information than can be displayed, and the raw sensor data must be tone-mapped either by the camera software or in post-processing to produce a viewable image. There is a lot of latitude in deciding how to map 2^14 discrete values of input to a mere 2^8 values of output.


Nice photo prints can typically show something like 100:1 to 200:1 contrast (~6.5–7.5 stops) between white and black, or at the top end, some dye transfer prints under carefully controlled lighting can get up to about 500:1 (9 stops).

Nice computer displays (and mobile device displays) without glare and with the brightness cranked all the way up can get up to something like 9.5–10.5 stops.

Of course, that range still pales in comparison to the contrast between shadows and highlights on a sunny day, which can be more like 16+ stops.


You're thinking about the absolute range of a human system. Yes, I can look into shadows and my iris will imperceptibly dilate. Similarly, the stapedius muscle can dampen input to the oval window. So you can appreciate a wide dynamic range. But instantaneously, your retina is only good for about 7-8 EV steps. Modern imaging sensors deliver well over 10 EV, and some deliver more than 14. Depending on which format you save to, you could be throwing away half the information! So, much like an artist should record at 24/192, a photographer should be saving to RAW. But no monitor or film or the human eye will ever capture all that range in one instant. So, especially for temporal media like audio and video, that space can be scoped down in the interest of space savings without any perceptible loss.


Define HDR. Almost all displays people have these days are not capable of reproducing the dynamic range that most cameras can capture. Hence colour grading, which is the image equivalent of audio mastering.


I don't know about music and audio, but your photography comparison is incorrect for two reasons.

First, as msandford pointed out, the human eye has significantly better dynamic range than image sensors. Technically, our eyes have a lower range at any specific instant, but due to the way our eyes work we effectively see upwards of 20 stops of dynamic range. The best sensors available (in medium format cameras, pro-DSLRs, etc) can only capture 14-15 stops.

Second, some black and white films have a better dynamic range than digital sensors, so it's also not the case that digital is strictly better. 18-20 stops isn't unheard of for some types of film.


Doesn't that depend on what type of playback you are doing? More and more playback these days is done via digital transfer, with the volume set in software at the sending end, to amplifiers at fixed volume, such as in many multi-room systems.

If I AirPlay a song from my iPhone and have the volume set at 50% in software, then a few extra bits can help. Not sure if it makes a noticeable difference, but it's a digital mixing scenario occurring at playback. If you play at extremely low volume it should be noticeable.


An aside: I've never understood why the logic around digital signal-path volume adjustment isn't "keep the volume number around all the way to the end; throw the signal through the DAC at 100% gain, and then attenuate the signal on the analogue path using the digital volume setting." Uses more power, I suppose.


That's how it should work. Not sure why it doesn't. Needs some updated standard for digital transfer I suppose. Updates to AirPlay, Toslink etc.


No, because if you play at lower volume, that simply means that the quietest parts are closer to (or underneath) the threshold of hearing.

I see it this way: let's say 16 bits is needed to represent the entire discernible dynamic range between the threshold of hearing and the threshold of pain. If you turn the volume to 50%, then you throw out 8 bits, but you also only need to represent 8 bits' worth of hearing range.


Turning the volume down 50% would yield 15 bits, not 8 bits, which would be discernible.


er, my log math is bad, but the point is still valid.


Why would a few extra bits in the source data help there? A DAC with more bits would, but that can easily be scaled from the existing data.


If you attenuate digitally you bit-shift things out. If you have data in a 16-bit (-32k..32k) stream and set a low volume in software before you send it, then it will scale the samples to (say) -8k..8k, which is basically now a 14-bit stream.

With a 24-bit stream you can easily give up a few bits without losing dynamic range.
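That bit-shifting argument can be sketched numerically. A hedged NumPy example (round-to-nearest stands in for whatever a real playback chain does, and the test levels are arbitrary): attenuate a signal inside a 16-bit stream vs. a 24-bit stream, undo the gain, and compare the damage.

```python
import numpy as np

fs = 48000
t = np.arange(fs) / fs
x = 0.9 * np.sin(2 * np.pi * 440 * t)  # near-full-scale source material

def quantize(sig, bits):
    """Round to the nearest step of a signed fixed-point grid."""
    scale = 2 ** (bits - 1)
    return np.round(sig * scale) / scale

gain = 1 / 16  # software volume at about -24 dB: shifts out 4 bits

# Attenuate in the digital domain, then quantize to the stream's depth.
y16 = quantize(x * gain, 16)
y24 = quantize(x * gain, 24)

# Undo the gain (as if the amp were turned up) and measure the error.
err16 = np.sqrt(np.mean((y16 / gain - x) ** 2))
err24 = np.sqrt(np.mean((y24 / gain - x) ** 2))
print(err16, err24)  # the 16-bit path is far coarser after attenuation
```

The 16-bit path loses the shifted-out bits for good; the 24-bit path has bits to spare, which is the parent's point.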


Sure, but that only applies to the stream you actually send to the DAC, not the source material? (As in, you can take a 16-bit stream, scale it to 24 bits and then lower the volume.) Am I missing something?


I think the point was that sometimes you do want to apply some effect to the sound at playback time, e.g. an equalizer, and in that case a higher bit depth could maybe conceivably become useful.


No they're not. And no matter how many times this gets linked to on the Internet, it's still wrong.

The basic problem: the quieter a sound or detail gets, the fewer bits of resolution are used to represent it.

In 16-bit recording, there simply aren't enough bits to represent very low level details without distorting them with a subtle but audible crunchy digital halo of quantisation noise.

In a 24-bit recording, there are.

Talking about dynamic range completely misses the point. It's not the absolute difference between the loudest and quietest sounds that matters - it's the accuracy with which the quieter sounds are reproduced.

This is because in a studio, 0dB full-scale meter redline is calibrated to a standard voltage reference, and both consumer and professional audio has equivalent standard levels for the loudest level possible.

These levels don't change for different bit depths, and they're used on both analog and digital equipment. (In fact they've been standard for decades now.)

This is why using more bits does not mean you can "reproduce music with a bigger dynamic range" - not without turning the volume up, anyway.

What actually happens is that the maximum possible volume of a playback system stays the same, but quieter sounds are reproduced with more or less accuracy.

In a 16-bit recording, quiet sounds below around 50 dB have 1-8 bits of effective resolution, which is nowhere near enough for truly accurate reproduction. (Try listening to an 8-bit recording to hear what this means.)

You might think it doesn't matter because they're quiet. Not so. 50 dB is a long way from being inaudible, ears can be incredibly good at spectral estimation, and your brain parses spectral content and volume as separate things.

There's a wide range between "loud enough to hear" and "too loud", and 24-bit covers that whole range accurately. 16-bit is fine for louder sounds, but the quieter details just above "loud enough to hear" get audibly bit-crushed.

The effect isn't glaringly disturbing, and adding dither helps make it even less obvious. But it's still there.

24-bit doesn't need tricks like dither - because it does the job properly in the first place.

Now - whether or not commercial recordings have enough musical detail to take full advantage of 24-bits is a different question. For various reasons - compression, mastering, cheapness - many don't.

But if you have any kind of aural sensitivity, you really should be able to A/B the difference between a 24-bit uncompressed orchestral recording and a 16-bit recording using an otherwise identical studio-grade mixer/mike/recorder/speaker system without too much difficulty.


"This is why using more bits does not mean you can "reproduce music with a bigger dynamic range" - not without turning the volume up, anyway. What actually happens is that the maximum possible volume of a playback system stays the same, but quieter sounds are reproduced with more or less accuracy."

You are slightly confused. (It may help to remember that a decibel always refers to a ratio, so the setting of your volume knob is not important.) Greater bit depth does allow for greater dynamic range, this stems directly from the definition of dynamic range. 16-bit audio has a theoretical dynamic range of:

  10 * log10 (2^16)^2 ~ 96dB
24-bit audio has a theoretical dynamic range of:

  10 * log10 (2^24)^2 ~ 144dB
For reference, a quiet recording room has a noise floor of ~40 dB SPL and the loudest amplified music rarely exceeds 115 dB SPL. This gives a dynamic range of 75 dB, which indicates that a well-recorded 16-bit master should be more than adequate.

The idea that having bits in excess of this amount will somehow result in the perception of a smoother or more accurate sound is fallacious. Even at maximum playback volume, this information will exist well below the noise floor and will simply not be perceived. In fact, this information will likely exist well below the noise floor of the recording studio and thus, in some sense, will not even be recorded.
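The arithmetic above is quick to check (a throwaway sketch; 6.02 dB per bit is the standard rule of thumb):

```python
import math

def dynamic_range_db(bits):
    # 10 * log10((2^bits)^2) == 20 * log10(2^bits), i.e. ~6.02 dB per bit
    return 20 * math.log10(2 ** bits)

print(round(dynamic_range_db(16), 2))  # ~96.33 dB
print(round(dynamic_range_db(24), 2))  # ~144.49 dB
```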


If your mastering is done competently, you really aren't going to be able to hear it in a realistic scenario. Which is why:

  "Talking about dynamic range completely misses the point."
isn't really sensible. It's the use of dynamic range that decides how much useful resolution you have when quantizing a signal. This is really why higher bit depths in recording and mixing are useful - they let you be sloppier with the inputs without losing much information before you've had a chance to work with it. It still doesn't gain you anything fundamental, but it does mean that if you got the levels a bit wrong you can salvage it. Higher bit depths here are excellent.

There is nothing magic about 24 bits here. Record something with 48 bits but set up your equipment all screwy so you're only actually using the first 8 bits... and you've got an effectively 8-bit recording.

In real world applications the codec is giving you trouble with the low amplitude stuff, not the quantizer. Not that in realistic situations your equipment is likely to be able to generate this cleanly anyway.

   "24-bit doesn't need tricks like dither - because it does the job properly in the first place."
No. Dither isn't a trick, it is a fundamental approach to quantization error at any depth.

On playback, the issue goes the other way around. If you've mastered things correctly you'll be using the available dynamic range of the output in such a way that the information content of your signal is well represented. This is sufficient at CD rates for all practical listening scenarios.
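The "dither isn't a trick" point can be shown in a few lines. A hedged NumPy sketch (TPDF dither of +/-1 LSB, a deliberately tiny test tone; not anyone's production dither chain): a sine wave at 0.4 LSB simply vanishes under plain rounding, but survives -- buried in noise -- once dither is added.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1 << 16
t = np.arange(n)

# A tone whose amplitude is 0.4 LSB, i.e. below one quantization step.
x = 0.4 * np.sin(2 * np.pi * 440 * t / 48000)

# Plain rounding: every sample rounds to zero; the signal is erased.
plain = np.round(x)

# TPDF dither: add triangular noise of +/-1 LSB before rounding.
dither = rng.uniform(-0.5, 0.5, n) + rng.uniform(-0.5, 0.5, n)
dithered = np.round(x + dither)

print("nonzero samples, plain   :", np.count_nonzero(plain))   # 0
print("dithered correlates with input:",
      np.corrcoef(dithered, x)[0, 1])                          # clearly > 0
```

This is why dither is a fundamental part of quantization at any bit depth, not a 16-bit workaround: it decorrelates the quantization error from the signal and lets sub-LSB detail through as noise rather than deleting it.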


Mastering, especially modern mastering, compresses the bejesus out of the end product. Trust me, you do not want to listen to uncompressed recordings under real-world conditions. Details will be so quiet you can't hear them. It'll sound thin and dull. Most modern pop has maybe 5-6 dB of dynamic range. Really loose, open mastering will be 20 dB or so.

As someone who both records/mixes albums and performs as a live-instrument musician, I can say that a live instrument in the room sounds utterly different from any recording. Not necessarily worse, just different. The pursuit of "accuracy" in audio playback is childish and naive. The sound of a recording is a function of technical limitations, compromises, and aesthetic decisions as much as it is a product of the raw source sounds. Don't make it sound accurate, make it sound GOOD! And that usually means a lot of compression, and often deliberate distortion.


> No they're not. And no matter how many times this gets linked to on the Internet, it's still wrong.

The article is still correct, just like it always was.

Ironically, most of your analysis is also correct. Somewhere in your understanding though, you're leaping sideways to an incorrect conclusion.

>The basic problem: the quieter a sound or detail gets, the fewer bits of resolution are used to represent it.

So far so good, but you're about to go wrong again once you start thinking in terms of stairsteps and boxes and looking instead of hearing.

Back to the bits.

What lower amplitude (and fewer used bits) means is that the sound, as represented, is not as far above the noise floor as a full-amplitude sound. The digital noise floor is completely analogous to, e.g., the noise floor of analog tape. If you use a dithered digital representation, you get something that behaves exactly as analog does. You hear and perceive both the same way.

>In 16-bit recording, there simply aren't enough bits to represent very low level details without distorting them with a subtle but audible crunchy digital halo of quantisation noise.

On an audio tape, the magnetic grains are just too large to represent very low level details without distorting them with a subtle but audible crunchy halo of analogue distortion and hiss.

In a 24 bit recording, the noise you mention is still there! It's just shifted down [theoretically] 8 bits or -48dB. That's the only difference. The noise floor is lower.

[In reality, 24 bit isn't. Most recordings don't even hit a full 16 bits, and no recordings, unless they're mathematically rendered, can get deeper than about 21 bits. There is no such thing as a 24-bit audio ADC/DAC that delivers 24 bits. The very best available today are about 21 bits of signal + 3 bits of noise.]

So the difference in playback between 16 bits and 24 bits is about 5 actual bits. If you're complaining about soft sounds in a 16-bit recording 'not having enough detail' because they're down at, say, 3 bits of resolution, are you saying it's all fixed by using 8? Aren't 8 bits woefully too few for any kind of quality sound?

(I hope at this point, you realize you're barking up an incorrect tree)

If you're following me so far, we can continue, but I expect even this much is going to require more conversation.


For a properly dithered recording, bit depth is exactly equivalent to noise floor. If low-volume details are lost in 16-bit recordings, it is because their volume falls near or below the noise floor imposed by the 16-bit recording. 16 bits is good enough because the noise floor is low enough to be imperceptible to a human, provided the gain of the recording is such that the full available dynamic range is used.


""" in a studio, 0dB full-scale meter redline is calibrated to a standard voltage reference, and both consumer and professional audio has equivalent standard levels for the loudest level possible"""

This is only minor nitpicking, but the standard 0dB levels for professional audio (0dB reference at +4dBu == 1.23Vrms) and consumer audio (0dB reference at -10dBV == 0.32Vrms) are not meant to indicate the maximum ("loudest level possible") but just serve as a reference point, e.g. for the 1kHz sine you inject when setting up your gain throughout the signal chain. On most studio gear, you'll easily have +15dB of headroom left.

AD/DA converters haven't really standardized on a full-scale level and there are quite a few different definitions in use: https://en.wikipedia.org/wiki/DBFS. Most "line level" ADC/DACs will have switches or jumpers to select between two or so settings. You'll choose them so that you are not likely to clip your ADC, and will only playback on your DAC with an appropriate level trimmed to not clip your analog gear.


Are you an expert?

In a 16-bit master, a noise shaping function is applied during down-conversion, by which quantization noise will be re-distributed so that most of the noise energy goes to the high frequencies (>15k) where it is completely inaudible.

For a good example of such a recording, see Ahn-Plugged (Ahn Trio, 2000, Sony BMG Masterworks). Fire up a good spectrum analyzer. You'll find the noise floor is well below -110 dB throughout most of the spectrum, even though it's 'just' a 16-bit CD.
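First-order noise shaping itself is only a few lines. A hedged NumPy sketch (a crude 8-bit quantizer so the noise is easy to see, and simple error feedback rather than the psychoacoustically weighted filters real mastering tools use): the total quantization noise stays, but its energy moves out of the low band.

```python
import numpy as np

fs = 48000
t = np.arange(fs) / fs
x = 0.5 * np.sin(2 * np.pi * 1000 * t)
step = 1 / 128  # crude 8-bit quantizer step, chosen for visible noise

# Plain quantization.
plain = np.round(x / step) * step

# First-order noise shaping: feed each sample's quantization error
# back into the next sample, pushing the noise toward high frequencies.
shaped = np.empty_like(x)
err = 0.0
for i, s in enumerate(x):
    v = s + err
    q = round(v / step) * step
    err = v - q
    shaped[i] = q

def low_band_noise(y):
    """Error energy below 6 kHz, relative to the unquantized signal."""
    spec = np.abs(np.fft.rfft(y - x))  # bin k == k Hz (1-second signal)
    return np.sum(spec[:6000] ** 2)

print(low_band_noise(plain), low_band_noise(shaped))
```

The shaped version has markedly less noise in the band where hearing is most sensitive, which is how a 16-bit master can measure well below its nominal noise floor across most of the audible spectrum.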


Compressed formats are not really relevant to considerations about the bit depth itself.

Besides, MP3 [audio] compression has difficulty handling specific samples, or types of samples (e.g. sharp attacks), and it may manifest artifacts independently of the bitrate; MP3, AFAIK, also has a ceiling of 320 kbps within the standard specification, which certainly doesn't help.

Secondly, I'm not sure if you process the MP3s further (when you refer to mixing), but if you do, you're definitely going to make artifacts noticeable that weren't in the unprocessed MP3 form.


I mean, I'm just some hobbyist, but my understanding is everyone renders to lossless and then converts that to the various MP3/AAC formats, never changing anything solely because of the final compressed format.


Old thread, but I thought I made it clear that I wasn't talking about data compression, but rather dynamic range compression.


MP3's perceptual model can still throw away information at the highest quality settings. FLAC doesn't throw away anything.

It's possible you are just hearing the difference between codecs. You'd have a fairer comparison with 24-bit vs 16-bit FLAC.


Cymbals are the biggest tell for compressed music. Even with crappy speakers they sound very strange.


SiriusXM satellite radio is probably the worst offender. It makes music unlistenable to me.

Even 128Kbps MP3s render cymbals better.


" can tell the difference from effects in audio mixing."

Yes, non-linear effects can be sample-rate sensitive. However -- this really means that their internal model is aliasing and not faithfully simulating an infinite-sample-rate system.

In an ideal world, effects that needed more sample rate would internally upsample/downsample (or be constructed in a way that they didn't need to). Then they would behave consistently across rates, though doing this would waste CPU cycles.

In any case, the article is all about distribution. Having excess rate in mastering is cheap and harmless and -- for these reasons -- can be practically pretty useful.
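The aliasing described above is easy to provoke. A hedged NumPy sketch (a bare cubic waveshaper as the "non-linear effect", with frequencies chosen so the fold-over lands at an obvious spot): distorting a 17 kHz tone at a 48 kHz sample rate generates a 51 kHz harmonic that folds back to an audible 3 kHz -- junk that an internally oversampled effect would have avoided.

```python
import numpy as np

fs = 48000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 17000 * t)

# A memoryless non-linear "effect": cubic waveshaping.
# sin^3 contains components at f and 3f; 3f = 51 kHz is above
# Nyquist (24 kHz), so it aliases down to |51 - 48| = 3 kHz.
y = x ** 3

spec = np.abs(np.fft.rfft(y))  # bin k == k Hz (1-second signal)
print("energy at 17 kHz:", spec[17000])
print("energy at  3 kHz:", spec[3000])  # aliased product in the audible band
```

Run the same nonlinearity at a higher internal rate and the 51 kHz harmonic stays above the audible band, which is why such plugins behave differently across sample rates.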


For those interested in different versions of an album where it was mastered by a different audio engineer, you should check out the Steve Hoffman forums: http://forums.stevehoffman.tv/


Or even the same engineer years later.


That's your processing overhead: you can mess with the sound a lot more at 96k before you hear audio issues.

The difference you hear is the difference between FLAC's lossless format and MP3's lossy format; it has nothing to do with 16-bit versus 24-bit.


This article hits close to home: before I became a programmer I worked as an audio engineer at a fledgling studio in my hometown.

The amount of misinformation / junk-science in the audio world is preposterous. There's a religious-cult of an industry that feeds off the ignorance and placebos of its participants. I have many friends who swear by their What.cd 24/192 FLAC vinyl rips and spend hundreds of dollars on audiophile AC wall outlets. Not to say that there are no differences in high-end audio equipment, but so much of what's "good" is subjective.


In the case of sites like what.cd, I think that FLAC 16/44 rips of CDs and vinyl are useful for creating distributed backups of our cultural corpus. But I agree that 24/192 FLACs of vinyl are ridiculous.


I agree, in fact I very much like the sound of vinyl, but to say it's more "accurate" or of higher fidelity and dynamic range than 16/44 is completely false.


This is the first mention of what.cd outside of the tracker scene I've ever seen. Funny when you think about it.



First, let me state that I believe that CD audio, played through a modern DAC and quality stereo equipment is pretty much the pinnacle of home audio listening. That is to say, I think 44.1kHz 16-bit PCM audio is plenty good and I'm in no rush to replace my CD collection, nor do I think significant investment in higher bandwidth audio (for playback, mixing and mastering are another story) buys you much.

That said, there's one thing the article does not address and that is "beating", or really inter-modulation distortion from instrumental overtones.

Instruments are not limited to 20-20kHz. They can have overtones well above this range. Additionally, note that short pulse-width signals, i.e. transients like drum strikes, especially involving wooden percussion, can have infinite bandwidth. (Not really infinite, but pulse-width is inversely proportional to bandwidth.)

In a real listening environment (i.e. live performance) these overtones have a chance to interact with one another in the air. It is possible that these overtones may beat with one another and cause inter-modulation products in the audible range. For an example of this, play a 1000 Hz tone through your left speaker, and a 1001 Hz through your right speaker. You will hear a distinct 1 Hz "beat". The audibility of these are largely dependent on listening position and amplitude, but it is possible to occur with instruments. Since most recordings are done using a "close mic" technique (placing the microphone very close to the source) the interactions such as this are never recorded.

However, if full bandwidth of the producing instruments is preserved, these interactions of the overtones can be reproduced in a playback environment given equipment having a wide enough bandwidth and degree of quality.


Nope. Intermodulation distortion for out-of-range frequencies is inaudible. The 1 Hz beat you are hearing is not a 1 Hz sound wave; it's a 1000.5 Hz sound wave becoming louder and softer once per second.

The comparison of a 1 Hz beat to a 1 Hz sound should be absurd on its face: you need about 20-30 Hz to become audible, and it's a low rumble more felt than heard. Very low frequencies sound absolutely nothing like intermodulation beats.


Intermodulation creates both sum and difference frequencies, the latter can certainly fall into the audible spectrum, assuming the ultrasonic frequencies are within the passband of your recording medium. The sum component can also alias back into the passband as well...


Is there a difference? Audio is one-dimensional, frequency is just the derivative of amplitude. An arbitrarily high frequency sound wave becoming louder and softer 440 times per second is just as much an A as a 440 Hz sound wave at constant volume. A lot of cheap audio gear even uses a "1-bit DAC" that is just very high frequency PWM.


The signal you describe in your comment is very much NOT an A. Look at the Fourier transform of a 1kHz signal modulated by a 440Hz signal, and you won't see any frequency component at 440Hz, nor at any integer multiple of 440Hz!

You can look at your "A440 at a constant volume" example as a 0Hz (DC) signal getting louder and softer 440 times a second, but this is the only case in which your example holds. Amplitude modulation creates sum and difference frequencies, so the A that you hear is 440Hz + 0Hz. If you change that 0Hz to 1kHz, you get a signal that's the sum of a sinewave at 560Hz and a sinewave at 1.44kHz, neither of which is an A.

The distinction is that the 1Hz signal is modulating the audible signal, not adding to it, if you look at the spectrum of the sum of those frequencies there is no 1Hz component, whereas if you added a 1Hz signal you'd get something completely different. And in this case the amplitude of the signal is always changing faster than 1Hz.

Edit: another way of looking at it: You wouldn't say you can "hear DC" because you can hear an A440 played at a perfectly constant volume.


All of that makes sense if we consider the case of amplitude modulation, which is multiplication. But if we are talking about the interference patterns caused by two overlapping audio signals, that is addition, is it not?


The two signals which are "interfering" are added together. The amplitude of the resultant signal varies sinusoidally as the instantaneous phase difference between the two signals goes from 0 to 2pi. One way of describing the signal would be that it is a separate tone (sitting in the middle of the two frequencies) being amplitude modulated by a signal at the beat frequency -- which is what you hear, and why you "hear" the beat frequency. I went with this way of describing the signal because you were talking about "an arbitrarily high frequency signal getting louder and softer 440 times per second", which is the definition of amplitude modulation.

Counterintuitively, there is no frequency component generated at the beat frequency when you sum a 1kHz and a 1.001kHz signal. It's easy to test that out with MATLAB, Octave, scipy/numpy/matplotlib, etc. Generate the two signals, add them together, and look at the Fourier transform: you'll see two components, one at 1kHz and one at 1.001kHz (assuming you take a long enough window to have that kind of resolution) and no component at the beat frequency. A third sinewave doesn't just jump out of nowhere when you add two separate sinewaves together.

If you take the sum of those two signals and run them through an ideal brickwall highpass filter at 999Hz so there are no frequency components below 999Hz, you'll still "hear" the beat frequency, because it isn't a separate spectral component; it's just the two signals slowly going out of phase, cancelling each other out, and then going back in phase and boosting the amplitude.
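That test is quick to sketch in numpy (illustrative only; same tones as the example above):

```python
import numpy as np

fs = 48000
t = np.arange(fs) / fs         # exactly 1 second -> 1 Hz bin spacing
x = np.sin(2*np.pi*1000*t) + np.sin(2*np.pi*1001*t)

mag = np.abs(np.fft.rfft(x))
print(round(mag[1000]), round(mag[1001]), round(mag[1]))
# two spectral lines at 1000 and 1001 Hz, nothing at the 1 Hz beat rate
```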


"...and cause inter-modulation products in the audible range." AFAIK, this is true in acoustic environments under conditions as described in the original post.


There are no inter-modulation products in the sense rplst8 expects. The interaction of waves of 1000Hz and 1001Hz doesn't produce 1Hz waves in linear media.

Even if it did produce some new frequencies through some non-linearity (which is negligible in most environments afaik), the recording equipment would capture the low frequency waves produced by those interactions. So the only question is whether there are significant non-linearities in our hearing system, and the overwhelming evidence is no again afaik.


The recording equipment won't necessarily pick it up, e.g. if the microphones are placed very close to the instruments relative to the other instruments creating the interaction.


If no recording equipment would pick it up, no human would pick it up either.

Quoting xiphmont

> Once you're driving air so hard it becomes nonlinear, thus introducing intermodulation distortion in the air, that distortion produces actual audible-range distortion products. And because the distortion you're hearing is in the audible range, a recording will sample and reproduce it accurately.

> You're hearing the audible _result_ of IMD, you're not somehow listening to the distortion curve itself.


> If no recording equipment would pick it up, no human would pick it up either.

Have you been to a concert? There's no recording/playback technology that can reproduce anything close to the sound of a full orchestra. It's all lossy.


That's an entirely different discussion having nothing to do with intermodulation products.

Have you read the article/quantization discussion on what is meant by lossy? If your recording equipment is good and your reproduction equipment is good, 16/44 is enough to reproduce the concert sound perfectly (as far as human hearing goes). What you do not experience is everything else but the sound -- the vibration of the super loud bass on your skin, the energy of the public, the beauty of the venue.


First off, from the point of view of the linear theory of sound waves, you're plain wrong. Two waves of any frequencies will only interact additively in linear media -- so no low frequencies are created through their interaction (unless non-linear effects come into play, but those usually create higher, not lower, order harmonics afaik). Beating is merely an interpretation of a modulating wave, the reality is the Fourier spectrum.

Second, as far as I know our hearing is composed of linear excitation elements (they have a definite bandwidth), and this is confirmed pretty well by experiments with human hearing -- you can see the threshold of our hearing at about 20kHz and that we experience tones of different frequencies fairly independently. Those assumptions imply that two tones, one at e.g. 50kHz and another at 50.001kHz are inaudible, end of story.

You can actually do this experiment yourself if you have a signal generator that can do 1Hz amplitude modulation and drive a transducer with a non-negligible sensitivity in that range.


But the within-audible-range-"beat" at the recorded "listening position" (where the microphones are located) would be recorded anyway, no?[1] So how does hi-res audio help in this case?

[1] AFAIK most music is not recorded like that, instruments are recorded separately and then overlaid; but then adding realistic-sounding "beats" based on whatever positioning the sound engineer envisions should be possible in software?


>That said, there's one thing the article does not address and that is "beating", or really inter-modulation distortion from instrumental overtones.

Beating and intermodulation distortion are entirely different things. They look similar on an oscilloscope, but they're not and they don't sound the same.

>Instruments are not limited to 20-20kHz. They can have overtones well above this range.

Correct. You can't hear the overtones beyond the upper portion of the hearing range (many people believe you can).

>In a real listening environment (i.e. live performance) these overtones have a chance to interact with one another in the air.

In reality they do not, unless you're driving the air so hard the trough rarefaction is approaching hard vacuum. (That's not actually impossible; it's how ultrasonic audio 'beaming' devices work.) Some performances are powerful enough to get close, e.g. if you're sitting six feet from the pipe organ.

Once you're driving air so hard it becomes nonlinear, thus introducing intermodulation distortion in the air, that distortion produces actual audible-range distortion products. And because the distortion you're hearing is in the audible range, a recording will sample and reproduce it accurately.

You're hearing the audible _result_ of IMD, you're not somehow listening to the distortion curve itself.

> It is possible that these overtones may beat with one another

You're continuing to confuse beats and IMD, but here you're talking about beat frequencies, so yes. But beat frequencies are a sort of auditory illusion. If one of the frequencies that would produce a beat is inaudible, there's no beat. Easy to test, go try it.

> and cause inter-modulation products in the audible range.

IMD is not a beat. Inaudible ultrasonics will produce audible artifacts when the underlying reproduction system is nonlinear (another way of saying 'there's intermodulation distortion'). However, that's a playback artifact. If the IMD products were audible in the original signal, audible range sampling would reproduce them.

If it wasn't audible in the original performance, it should not be part of the recording, and it should not be part of the playback.


Dumb question, but wouldn't you need to reproduce position of audio sources as well in order to replicate that?


It is not just headphones that are the problem, it is the speakers.

People today are often amazed when they listen to CD or turntable content through 70's era crossover speakers. Back in the 70's you'd have a stereo with 2 "speakers" that each had 3 subspeakers, for a total of six speakers. The fad today is to have 5.1 sound with a single driver in each satellite, also a total of six speakers. The spatial resolution increase is good for movies, games and TV, but surround sound in music is marginal. An amazing number of old "classic rock" recordings were done in quad, and anything by Donald Fagen will sound pretty good w/ Dolby Pro Logic, there are some more recent Bjork recordings, but almost everything is mixed for stereo, and what you lose in frequency response is not compensated by anything, except perhaps the ability to produce more volume with more speakers.


If you want to know more, Monty made one of the best intros to digital sampling I've ever seen: https://www.xiph.org/video/vid2.shtml


I have a pair of Roger Sound Labs studio monitors for my speakers at home. I got to look at their insides when a technician was replacing a blown midrange speaker (they have a "lifetime" warranty, however that warranty expired when RSL did). Looking at the crossover filter network I could see a network selecting for frequencies > 20kHz that was shunted to a resistor. I asked about it, and the response was exactly like the author's: by filtering out signals higher than the tweeter could reproduce, they improved the listening experience.

It made sense to me, and I love how the speakers sound. Understanding that it's not inserting distortion makes even more sense.


Why do high quality DACs clearly sound better then? And they sound better with better files. Maybe it really is all in my head but I mean listening to a £20000 hifi the other day (vinyl) really just shocked me.

I was listening to Marvin Gaye on my friends system and I could hear that there were several different backing singers all moving and at different distances from the microphone.

Are there any double blind trials anywhere of Vinyl/CD/24-192khz with super high end hifi systems? Mostly I see people suggesting that these tests are performed from the phono output of a mac with a pair of average ear buds...


>Why do high quality DACs clearly sound better then? And they sound better with better files. Maybe it really is all on my head but I mean listening to a £20000 hifi the other day...

You were listening to £20,000 worth of amps and speakers, and you were most likely in an acoustically treated room.

Also, novelty is almost always euphonic when it isn't overtly bad. This fact is often neglected. You hear something you didn't hear before and your brain immediately tells you that it sounds better, even if it doesn't actually represent higher fidelity. Actually making an objective judgement requires a career's worth of experience, or a test lab and the skills to use it.

For example: you were listening to vinyl, which is covered in delicious noise and warm harmonic distortion, and is mastered differently. Highly euphonic, very novel if you've only ever heard the CD version before, but definitely not higher fidelity.

BTW higher end DACs do sound better, but the rest of your signal chain needs to be really good for you to notice it. It's often to do with better phase accuracy between the left and right channels, which affects the soundstage, or stereo image. If your speakers/amp have loose timing however, you'll never be able to tell.


> higher end DACs do sound better

This hasn't passed the blind tests either. A good $100 DAC (a Schiit or an ODAC) will sound just as good as a $1000 DAC.


Higher end DACs. My fireface 400 has many DACs in it and it cost £549 new when it came out, in 2007. It sounds obviously better than the ~£80 USB soundcards and various built-in DACs in laptops. I've done A/B testing. (I didn't buy it for the converters btw, I bought it for the features and the build quality, which are unmatched.)

Forget about spending £1000s though. I'm sure that ODAC thing sounds better than the RME, I saw that test where it was identical to the various industry standard units.


Designing the circuitry around the DAC to provide ultra low noise reference voltages, to fully isolate ground loops, and provide a fully linear response (AC coupled, but linear in the relevant range) is not trivial even for 12-bits. It's not the hardest thing in the world, but lots of people do it wrong. Trivial sounding things like using a cheaper capacitor in the low pass filter or not finely regulating your power supply voltage can put audible noise onto things.

Worse of course if the sound card is integrated into a computer due to the opportunity to pick up much greater ambient RF noise from other components, although that is less of a problem now than it used to be back when I could hear my hard drive kicking up on my speakers...

In any event, I'd absolutely believe that the quality between a $100 DAC assembly and a $200 system is enough to be noticeable. More than that and I'm very skeptical. So I guess I don't really disagree with your statement, but I think that in current dollars $100 isn't necessarily enough to pay for solid underlying engineering and good components.


Vinyl actually has far less fidelity. You also physically change the recording every time you play it back. Even on the same equipment, no two plays of a vinyl LP sound exactly the same, unlike digital.

This fact alone should cause you to question your subjective experience. You have no idea what part of that system was contributing to what you found pleasant. Someone who knew what they were doing could probably build a $2000 system that would blow you away just the same.

And if you were playing vinyl, there wasn't even a DAC present in the signal chain :)


>Vinyl actually has far less fidelity.

Vinyl mastering is sometimes better than CD mastering though, due to the loudness war.

I would love to sell my turntable and vinyl collection and rely purely on digital formats. Takes up less space, technically superior format, etc.

But one thing keeps me buying vinyl:

AWFUL mastering on CDs. A significant portion of LPs are released with more normal mastering on the vinyl, while the CD will be brickwalled all to hell.

I listen to metal, and rock as a broader genre is particularly bad about it. One of my favorite albums of last year, Fallujah's The Flesh Prevails, had a dynamic range of 2 to 3 on almost every track on the CD. The vinyl master? 9 to 10. Still not great, but leaps and bounds better. The CD actually clips if you convert the songs into MP3.

Until they go back to not murdering CD mastering, I'll continue buying vinyl :(

(I know your comment isn't directly about vinyl being bad or anything - I just have a compulsion to bitch about the loudness war any chance I can)


How do the versions on streaming services compare, are they usually copies of the CD masterings ?


Usually the CD master.

Optimistic people have said that iTunes and YouTube not allowing high-volume compression will kill the loudness war [1][2], but it is still happening [3].

[1] http://www.digitalmusicnews.com/2013/10/28/itunesloudness/ [2] http://productionadvice.co.uk/youtube-loudness/ [3] http://dr.loudness-war.info/album/list/year/desc


> You also physically change the recording every time you play it back

Not on a laser stylus turntable.

http://diffuser.fm/laser-turntable/


These are not recommended for high-fidelity playback. They're good for archiving records you don't want to damage by playing, but the laser picks up every tiny speck of dust that a needle would plow inaudibly through.


Here's a classic "OMG I can hear him moving around" recording that works with the cheapest of headphones: https://www.youtube.com/watch?v=IUDTlvagjJA Most of the audio experience quality comes out of the production phase, and artists can put as much or little effort into that as possible and consider several mediums of listening (headphones, TV, surround, concert, vinyl) and make tradeoffs for the medium's particular experience.

I don't know about double blind trials but people do tests on their own. It's further complicated though because the hardware you use could be optimized for certain types of music, e.g. have a read through http://arstechnica.com/gadgets/2014/07/some-of-the-worlds-mo...


That is awesome. At first, I actually thought people around me were making the noise and speaking. I thought the audio sample had not yet started.


Wine tastes better when it's poured out of fancy bottles, too.

The article mentions just such a study performed with high end equipment.


Better masters are the key difference. See for example [1]; everyone agreed that the DVD-A and SACDs, even truncated to 16 bits, sounded better than the pressed CD.

1: http://drewdaniels.com/audible.pdf



From my experience, what matters more than sample rate is 24 bit vs. 16 bit sampling in the recording/production process. Using heavy compression and EQ can mean that very quiet sounds become louder; in this case 24 bit recording is ideal. Sample-rate wise, anything above 40kHz is fine for most ears (I've probably lost a few kHz in the upper range anyways). Another note is that most converters operate at a multiple of 48kHz, so it makes sense to use 48/96kHz if you are recording. It all comes down to how much disk space you have, and want to use up.


I can still hear around the 20k range, so let's not exclude a few listeners just because we wear hearing protection when it matters. In practice, 20k audio content means 44k or higher sample rates, due to the fact that actual filters have finite transition bands. There's an unfortunate history of engineers with poor hearing who inflict pain on others, such as the horizontal retrace on NTSC TVs which still annoys me when I encounter one.

24bit also means we don't have to record at 0dBFS, which saves a lot of time.


I'm very jealous of your healthy hearing range... I now wear hearing protection, but the damage has been done. When it comes to file storage, I will happily take a 44.1/48kHz 24 bit FLAC, since it usually comes out to about the same size as a CD-quality WAV file anyways. I see no reason why everything shouldn't be in this format, but CDs have made a serious dent in formatting standards/habits.


24bit has been shown to be excessive in actual tests, and it doesn't really pass the "bullshit test". 24 bit allows you to switch between a whisper in a library and a jet engine or shotgun blast in a single recording—yes, it would instantly damage your hearing.

By comparison, 16 bit audio can "only" record a whisper in a library and a motorcycle or jackhammer.

Double blind tests show that 8 bits are not enough, but 14 bits are.
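Those figures follow from the rule of thumb that linear PCM gives roughly 6.02dB of dynamic range per bit; a quick check:

```python
import math

# theoretical dynamic range of linear PCM is ~6.02 dB per bit
for bits in (8, 14, 16, 24):
    print(bits, round(20 * math.log10(2 ** bits), 1), "dB")
```

So 16 bits gives ~96dB (whisper to jackhammer), 24 bits ~144dB (whisper to instant hearing damage).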


Citation? The article is talking about dithering, which you really don't want to do, not least because the end result will probably compress worse than the higher-bit-depth version. The fact that they suggest it at all implies, to my mind at least, that 16 bits isn't enough.


You absolutely do want to do dithering. Dithering converts distortion (error correlated with the signal, which is bad) to noise (error uncorrelated with the signal, which is less bad). This means that even though the noise floor is higher, you can recover more of the original signal. There is virtually no case where that is not desirable.

You're right of course that it will compress less well, but that's to be expected because you've lost less information!
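A minimal numpy sketch of that tradeoff (the tone level and bit depth here are arbitrary choices): a tone only 0.4 LSB in amplitude rounds to pure silence without dither, but survives TPDF dithering, buried in noise:

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 48000
t = np.arange(fs) / fs
q = 1 / 2**15                                # 16-bit step for a +/-1 full scale
x = 0.4 * q * np.sin(2 * np.pi * 1000 * t)   # a tone only 0.4 LSB in amplitude

plain = np.round(x / q) * q                  # undithered: rounds to silence
tpdf = rng.uniform(-0.5, 0.5, fs) + rng.uniform(-0.5, 0.5, fs)  # TPDF dither (LSBs)
dithered = np.round(x / q + tpdf) * q

ref = np.sin(2 * np.pi * 1000 * t)
print(np.abs(plain).max())                     # 0.0 -- the tone is gone entirely
print(np.dot(dithered, ref) / np.dot(x, ref))  # ~1.0: tone still there, under noise
```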


Or because you've added randomness?

Store the 24-bit signal, and you could do a dithered downsample to 16-bit on playback if you think that's a good idea. Wouldn't that be better all round?


It's my understanding that 24 bit audio can capture the quieter sounds in greater detail than 16 bit. So say if you EQ out a frequency range in order to "zoom in" on a much quieter range, like the motorcycle vs. whisper, you can hear the whisper in as much detail as a full CD quality recording.

For playback though, I agree that 14 bits are probably enough. Even high quality mastering tape has the equivalent of about 12 bits of dynamic range, which is fine. Many fabled analog pieces of equipment have terrible signal-noise characteristics, but are still valued for other reasons (coloration, distortion etc...)-which is all fine by me.


Telephone audio is 8kHz, so recording at a multiple of 8k helps with downsampling for IVR prompts or hold music. With dithering, it isn't terrible to downsample from 44.1k to 8k, but it's nice to avoid it.


Have you tried listening to SACD? The high sample rate might not give you more reproduction of audible frequencies, but the difference in arrival times it can encode makes well recorded stereo stuff more interesting to listen to, in my limited experience.


I know this seems counterintuitive, but there is literally no difference in arrival times (over audible frequencies) that can be encoded at higher sampling rates. Digital sampling does not quantize over the time domain for any frequency below the Nyquist frequency.

If you have the time, watch the two videos that xiph.org did a few years ago[0]. There's a great in-depth explanation, as well as a hands on demonstration to demonstrate this reality.

[0] https://xiph.org/video/
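A quick numpy sketch of that point (tone frequency and delay chosen arbitrarily): a 5µs delay, far less than the ~22.7µs sample period at 44.1kHz, is captured by sampling and recovered exactly:

```python
import numpy as np

fs, f = 44100, 1000
tau = 5e-6                       # 5 us delay: under a quarter of one sample period
t = np.arange(fs) / fs           # 1 second -> 1 Hz bins, f is an exact bin

a = np.sin(2 * np.pi * f * t)
b = np.sin(2 * np.pi * f * (t - tau))   # same tone delayed 5 us, then "sampled"

# the phase difference at the 1 kHz bin recovers the sub-sample delay exactly
dphi = np.angle(np.fft.rfft(a)[f]) - np.angle(np.fft.rfft(b)[f])
print(dphi / (2 * np.pi * f))    # ~5e-06 seconds
```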


This is directly addressed by the article under the "Sampling fallacies and misconceptions". You don't lose "arrival time" (AKA phase) when you use a lower bitrate. They have a video that explains it very well: http://xiph.org/video/vid2.shtml


I would be very curious to listen to SACD on some good headphones in a quiet room. Not sure if I've ever even seen a SACD player aside from maybe in the Sony store 10 years ago. The trick would be to find something that would be mastered for the format.


Are you sure? Many so-called "universal" bluray players can play SACDs.

I got a Denon one. I haven't played any SACD on it yet (I got it for bluray), though I guess I could easily find some at that video rental store (in Tokyo).


  Because digital filters have few of the practical  
  limitations of an analog filter, we can complete the 
  anti-aliasing process with greater efficiency and 
  precision digitally. The very high rate raw digital 
  signal passes through a digital anti-aliasing filter, 
  which has no trouble fitting a transition band into a 
  tight space.
I always thought digital anti-aliasing filters were creatures from a fairy-tale world. Much talked about but no one has ever seen one.

My understanding: if you have an analog filter of a given steepness, the only way to further reduce aliasing effects digitally is oversampling. Or: a less steep (cheaper) analog filter plus oversampling is the same as a steeper (more expensive) analog filter. People tend to say "digital anti-aliasing filters" when they really mean oversampling.

"24/192 music downloads make no sense" seems to be a thoroughly researched and carefully written article. It explains oversampling very well, possible confusion with digital filtering (anti-aliasing or not) is out of question. But then it goes on to talk about digital anti-aliasing filters, which makes me afraid I could be wrong.

Do digital anti-aliasing filters exist?


The digital anti-aliasing filter can only ever work on a digital -> digital signal, but they're still useful in the analog->digital process.

> My understanding: If you have a an analog filter of a given steepness the only way to further reduce aliasing effects digitally is oversampling. Or less steep (cheaper) analog filter plus oversampling is the same as steeper (more) expensive analog filter. People tend to say digital anti-aliasing filters when they really mean oversampling

You're right, and it's actually both. The ADC can run at a much higher sample rate with a cheaper analog filter, and then that digital signal is again passed through a digital filter and downsampled.


Yes, but obviously they don't work if your signal is already aliased. The most common example would be when decimating a signal, e.g using a digital filter with a cutoff at 22khz before down sampling from 192kHz to 44.1kHz. This is often realized in a single step... Check out polyphase interpolation/decimation if you're interested in learning more
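A bare-bones numpy sketch of that decimation step (filter length and cutoff are arbitrary choices here, not from any real converter): lowpass with a windowed sinc, then keep every 4th sample, and an ultrasonic component that would otherwise alias into the audible band is suppressed:

```python
import numpy as np

fs_in, decim = 192000, 4                     # 192 kHz down to 48 kHz
t = np.arange(fs_in) / fs_in                 # 1 second of signal
x = np.sin(2*np.pi*1000*t) + np.sin(2*np.pi*30000*t)   # audible + ultrasonic

# windowed-sinc FIR lowpass, cutoff ~20 kHz: the "digital anti-aliasing filter"
m = 201
k = np.arange(m) - (m - 1) / 2
h = np.sinc(2 * 20000 / fs_in * k) * np.hamming(m)
h /= h.sum()                                 # unity gain at DC

y = np.convolve(x, h, mode='same')[::decim]  # filter, then keep every 4th sample

mag = np.abs(np.fft.rfft(y))                 # 1 s at 48 kHz -> 1 Hz bins
print(round(mag[1000]), round(mag[18000]))   # 30 kHz would alias to 18 kHz
```

Without the filter, the 30kHz tone would fold down to a clearly audible 18kHz.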


Test how old your ears are: https://www.youtube.com/watch?v=VxcbppCX6Rk

Or generate a tone sweep in Audacity: Generate->Chirp http://www.audacityteam.org/

You lose the ability to hear high frequency sounds as you age.

Personally I can hear up to about 14kHz
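If you'd rather script the sweep than use Audacity, here's a stdlib-only Python sketch that writes a 16-bit linear chirp to a WAV file (filename and sweep range are arbitrary):

```python
import math, struct, wave

fs, dur = 44100, 5.0
f0, f1 = 1000.0, 20000.0          # sweep 1 kHz -> 20 kHz

frames = bytearray()
for i in range(int(fs * dur)):
    t = i / fs
    # linear chirp: instantaneous frequency is f0 + (f1 - f0) * t / dur
    phase = 2 * math.pi * (f0 * t + (f1 - f0) * t * t / (2 * dur))
    frames += struct.pack('<h', int(0.5 * 32767 * math.sin(phase)))

with wave.open('sweep.wav', 'wb') as w:
    w.setnchannels(1)             # mono
    w.setsampwidth(2)             # 16-bit samples
    w.setframerate(fs)
    w.writeframes(bytes(frames))
```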


Huh. Downloaded a tone generator, and found that while in the video I heard a series of clicks at 16kHz and beyond, I could in fact hear 16kHz if I raised and lowered the volume from nothing to loudest in the app. It sounded like a whine, and it was much harder to hear than I expected, distinguished most easily when it quickly went from present to not present and back. In fact, I kept going up the scale doing that, and raising the volume, and found that I was able to hear even 19 to low 20kHz as a high-pitched noise, very quiet even at -6dB. So ... yeah, probably does me no good considering that the loudness of other pitches makes it near impossible for me to hear anything practical in those frequencies. Of course, then I go to listen to music and wow, I can hear all this detail. I think I trained my ears for it, or I'm losing it. ;-)


Wouldn't this question be answered with a large-scale double blind trial?

If more people prefer the sound at the higher bitrate and sampling rate, then that's the better format, even if there's no technical reason why that format is superior.

Much like how some people prefer the "warm" sound of tube amps, even if that means more distortion.


From the article:

>Empirical evidence from listening tests backs up the assertion that 44.1kHz/16 bit provides highest-possible fidelity playback.

You can read the article if you want to find the actual references. No one is arguing that higher rates/bits produces any sort of distortion that anyone would prefer.


> Much like how some people prefer the "warm" sound of tube amps, even if that means more distortion.

The difference from my perspective is that an amp is a tool for sound production while a digital music format is a tool for sound reproduction. When producing sound, choosing more distortion over less distortion is a valid choice. When reproducing sound, the goal should be accurate reproduction of the original.


I can hear insects and buzzing electronic devices, and my partner thinks I'm crazy sometimes. Thinking I might have golden ears, I tested * my range and I could hear up to 18kHz.

* http://onlinetonegenerator.com/hearingtest.html


Honestly, depending on your age, that still could be “golden” — I’m 31, I’ve taken very good care of my hearing, I’m very acutely aware of audio subtleties, and my hearing range tops out around 16.5kHz. The so-called standard upper limit of 20kHz really only applies to young children, which is why CD audio being able to reproduce frequencies up to 22.05kHz is already beyond ideal, and calls for 48! no, 96!, no, 192! (or higher) is literally insane for playback.


Using a tone generator on my computer and a pair of headphones, I found that I couldn't easily hear past 15kHz or so myself; then I started turning up the volume, or playing with turning the volume all the way up, then all the way down. Using that technique, I was able to distinguish noise and high pitches up to 20.2kHz or so. So I think from now on, if I hear some whine, I'm going to trust that it's there and not my imagination. Of course, it's also the definition of going deaf, I suppose, that I have to turn the pitches up to such a loud volume to hear them in the first place...


Some [consumer] digital low-pass filters can benefit from higher sampling rates, leading to an overall better representation of the analog signal up to 20kHz. But there are diminishing returns as the filter "folds" the octaves above 22kHz; a rate of 96k for certain lowpass filters is better than 48k, but at some point there's little (if any) benefit in going to 192k or 384k. For recording studios, go as high as you can in both sample rate and bit depth, especially when you're processing the signal "in the box". Give the software as much data as possible to operate on without introducing errors and artifacts. There are diminishing returns there as well, but RTFM for (for example) UA gear and software and you're good to go.


TFA mentions that for recording and mastering there is a use. Furthermore, the headline implies it: see the term "downloads" in the title.


Yep, I read that too. Even so, there are low-pass filters in some consumer gear that benefit from, say, 96k sampling rates and result in better quality sound. This does imply that at 44.1 or 48 they don't represent up to 20kHz properly, of course.


Lossless upsampling the 44.1 kHz recording to, say 192 kHz is trivial for the reproduction equipment. That the LPF on the reproduction end wants the DAC to run at greater than 44.1 kHz has no bearing on the sampling rate of the distribution format.
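A minimal numpy sketch of such an upsampler (zero-stuffing plus a windowed-sinc lowpass; an integer 4x ratio is used here for simplicity, whereas 44.1k to 192k needs a rational 640/147 resampler): the audible tone survives intact while the spectral images introduced by zero-stuffing are filtered out:

```python
import numpy as np

fs, up = 48000, 4                 # 48 kHz up to 192 kHz
t = np.arange(fs) / fs            # 1 second
x = np.sin(2 * np.pi * 1000 * t)

# zero-stuff, then lowpass at the original Nyquist (24 kHz) to kill the images
z = np.zeros(len(x) * up)
z[::up] = x
m = 201
k = np.arange(m) - (m - 1) / 2
h = np.sinc(2 * 24000 / (fs * up) * k) * np.hamming(m)
h *= up / h.sum()                 # gain of 'up' restores the amplitude
y = np.convolve(z, h, mode='same')

mag = np.abs(np.fft.rfft(y))      # 1 s at 192 kHz -> 1 Hz bins
print(round(mag[1000]), round(mag[95000]))  # tone kept, image near 95 kHz gone
```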


Are you talking about digital low-pass filters? In the event that they are using one that benefits from a higher sampling rate, they could upsample before applying. Or just use a better low-pass filter implementation.


24/192 lossless is a digital Veblen good; some people will pay more for it (and/or the HW to play it & store it), and almost all of them will enjoy it more, if only because it costs more. Whether it actually sounds better is somewhat tangential.


So, what are the better settings for ripping songs?


I rip to FLAC, not because I think it sounds better, but simply because if some newer better codec comes along in the future that will let me compress my songs on my smartphone even smaller (Opus?) I don't want to have to get my CDs out again. I can just transcode from the FLAC files.


when ripping songs you are probably starting out with either 44.1 (CD) or 48kHz (DVD) sound. Just keep whatever the native sampling rate is.


Compression-wise I go for 256kbps AAC. It's quite superior to MP3 as a codec.


I just recently purchased iZotope Ozone 7 Advanced. One feature it has is "codec preview", which lets you "solo" the codec artifacts for the MP3 and AAC formats. Even at high bit rates it's amazing how swishy bit reduction sounds. It also made me realize that what I was hearing with MP3s was artifacts from compression. That said, it's not unlike tape hiss or vinyl noise. In fact I think it can have its own charm and in some cases make the music sound more full. It's also probably why 24/192 digital audio can sound so "cold" or lifeless.


From mastering records at home, I've found that in all but the most golden-ears focused listening, I can't hear the difference between 192kbps MP3 and 44.1/16 CD quality. But 128kbps MP3 is audibly degraded and irritating.

That's a pretty cool feature for Ozone 7, for sure! I'm still using Ozone 5 and don't feel a need to upgrade, but that might make it...


I love the idea the author mentions in passing of a dedicated speaker assembly for ultrasonics. This seems like something that could be a huge margin business, and the parts costs would be as low as you wanted.


You're late to the game. Such high margin products have existed for some time. There are even published papers about them!

The published papers tend to be by the same people making the supertweeters


They are useful if you're resampling them or editing them, but I doubt that's something consumer music services are overly concerned with.


I'll note that's the entire point of Monty's (great) article, which has this near the top:

Unfortunately, there is no point to distributing music in 24-bit/192kHz format. Its playback fidelity is slightly inferior to 16/44.1 or 16/48, and it takes up 6 times the space.

This has all been known to anyone with actual signal processing and/or audio engineering knowledge for a long time now. As in, common knowledge to the kinds of folks attending the AES conference at least back to ~2001 or so. The high sample rate/bit depth stuff is useful for production process, but irrelevant for final distribution.


There's a reasonable argument that fits within DSP theory that frequencies sampled above audible range could have harmonics down in the audible range.


This is addressed in the article. While theoretically relevant to some recording applications, (overdubbing a string section one violin at a time, why would you want to do that?) this kind of intermodulation distortion can only harm the reproduction of mixed material.


Or if you apply an equalizer, like lots of people do in consumer applications.


There's a big difference in impulse response with different sample rates; anyone can see it on an oscilloscope, and I bet someone can hear the difference.

Those who don't have an oscilloscope can see the picture here: http://i.imgur.com/wY0wzcW.png


What you are showing is _precisely_ the effect of low-passing, nothing more, nothing less.

See the digital media primer 2 for more information on that: https://wiki.xiph.org/Videos/Digital_Show_and_Tell

If humans were able to hear audio above 22kHz (or whatnot) in any meaningful way, we'd expect to be able to demonstrate that effect in carefully controlled studies, and then that lack of low-passing might matter; but that isn't what the best evidence so far shows.


The low-passing with a brick wall filter on 44.1kHz audio can be a bad thing sometimes, for example, pre-echo: https://en.wikipedia.org/wiki/Pre-echo You won't hear the pre-echo on 2.8MHz DSD audio.


In the real world, it is almost impossible to make a voltage divider with 24 bit resolution. So all the DAC makers have to convert 24 bit audio into lower bit depths (6 bits down to 1 bit), and this step requires oversampling the original audio. It is a lot easier to oversample 192kHz/24bit audio than 44.1kHz/24bit audio, and the ringing is much less after oversampling the 192kHz/24bit audio.


The two pictures don't have the same vertical scaling, and it's clear that the probe is ahead of the LPF in the signal chain.


The probe is placed on the headphone jack. The difference in vertical scaling is that a 0dB DSD signal is 6dB below 0dB PCM.


The brick wall filters used on low sample rate sound cause ringing in the time domain, which can "blur" the neighboring impulse.


My third time reading this, and a new question popped into my head: Are there any volume adjustments (on software or hardware) that take into account the pain threshold curves? That is, volume adjustments that aren't flat, but that will attenuate the frequencies that will cause discomfort at the lowest volumes?


So, no point in 24/192 because it makes no difference in playback... but having lossless downloads is important in part for enabling remix culture? There's a bit of a double-standard here. Maybe I can't hear 24/192 audio, but isn't it better input for sampling?


The article is calling 24/192 useless for playback quality only. Halfway down he addresses the benefits of 24/192 for the sake of mixing and mastering different digital audio signals, but for a final mix there is no benefit to the human listener in choosing between 16bit/44.1kHz and 24bit/192kHz.


What I was trying to get across is: every file has two potential purposes, listening and serving as input for sampling. So, if we care about enabling "remix culture", wouldn't it make sense to offer a "24/192 FLAC" option for download, push DVDA over CD, etc., anyway?

I've never seen the hype from artists about 24/192 as being about better listening experience. It's about handing their consumers a better master so as to encourage and enable more of them to be remixers.


Yeah, I think that's not obvious from the title of the article: 24/192 downloads are useful for the sake of editing.


[deleted]


The number after the slash in this conversation is 192kHz (sample rate), not 192kbps (compressed bitrate). The "44100Hz PCM audio" you see on your CDA/WAV/FLAC source, before lossy encoding, would be "192000Hz PCM audio" instead. Unlike bitrate, this number does not affect the "quality" of the sound (unless it's really low); rather, it acts as a lowpass limit, dropping any frequencies above half of it (the Nyquist frequency).
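The reason content above half the sample rate must be lowpassed away: it doesn't disappear, it folds back down as an alias. A quick sketch (my own toy numbers):

```python
import math

fs = 44100.0           # CD sample rate; Nyquist = fs / 2 = 22050 Hz
f_in = 30000.0         # an ultrasonic tone, above Nyquist
f_alias = fs - f_in    # 14100 Hz, squarely in the audible band

# Sampled at 44.1 kHz, the 30 kHz sine produces *exactly* the same
# samples as a phase-inverted 14.1 kHz sine: they are indistinguishable.
max_diff = max(
    abs(math.sin(2 * math.pi * f_in * n / fs)
        - (-math.sin(2 * math.pi * f_alias * n / fs)))
    for n in range(1000)
)
```

This is why the antialiasing filter has to run before sampling: after the fact, the 30 kHz tone and the 14.1 kHz tone are the same data.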


To what can I attribute the consistently horrible quality of 64kHz streams ten or fifteen years ago? Would that fall under the "bad encoder" bucket?

Edit: christ, I mixed up bitrates (e.g. 192kbps) with sampling frequency (e.g. 192kHz) again. I was referring to 64kbps streams.


64 kHz isn't a standard sample rate -- you're probably thinking of the bit rate of an MP3 or AAC file. A 64-kbit MP3 does sound pretty awful.


Yup. Further confusing me was the fact that (if memory serves) Apple did offer MP3's at 192kbps for a while, before upping to 320kbps.

Edit: apparently my memory is worse than I thought.


Apple doesn't sell 320kbps anything, but 256kbps AAC, which is probably better than 320kbps MP3.


Hasn't Apple always been offering AAC? At first at 128 kbps and then 256 kbps.


iTunes has offered 128kbps AAC and 256kbps AAC files. Now it's only the 256kbps versions.


mp3 encoders have gotten better over time. As well as general improvements in fidelity, older encoders had bugs that would cause occasional terrible encoding for fragments of a sample.


Everything else is fine and good in the article, but I can see the infrared in the Apple remote (and all the other IR remotes I've tried). It's faint, but plainly visible. Am I the only one?


Went to a dark room with an Apple Remote; let my eyes adjust for a little while. Pressed it many times; I couldn't see the infrared coming from the LED with my naked eye. (But the camera on my iPhone imaged the infrared from the remote's LED.) I envy your biological wavelength detection.


Hmm, that's odd. I've noticed this with lots of remotes, I usually just look at them to tell if the batteries are dead. I wonder why I can see it.


This article really misses the facts of the Nyquist-Shannon theorem.

In order to decimate a signal to 44.1 or 48khz, and preserve high-frequency content, high frequencies need to be phase-shifted.

This phase-shift is similar to how lossy codecs work.

For what it's worth: I'm a big fan of music in surround, and most of it comes in high sampling rates. When I investigated ripping my DVD-As and Blurays, I found that they never have music over 20khz. It's all filtered out. However, downsampling to 44.1 or 48khz isn't "lossless" because of the phase shift needed due to the Nyquist-Shannon theorem.

I still rip my DVD-As at 48khz, though. There isn't a good lossless codec that can preserve phase at high frequencies, yet approach the bitrate of 12/48 flac.


> In order to decimate a signal to 44.1 or 48khz, and preserve high-frequency content, high frequencies need to be phase-shifted.

Your understanding of the sampling theorem is incorrect. Sampling alone (not quantization, of course) is completely lossless under the critical frequency.

We demonstrated this in a very clear way near the end, at about 21 minutes in, on the primer two video: http://www.xiph.org/video/vid2.shtml where we show a square wave being phase-shifted by tiny fractions of the intersample length.
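That demonstration is easy to reproduce numerically: Whittaker-Shannon interpolation recovers the signal between samples, so sub-sample timing and phase are not lost. A windowed-sinc sketch (my own toy, not xiph's code):

```python
import math

fs, f, N = 48000.0, 1000.0, 4000   # a 1 kHz tone, far below Nyquist
samples = [math.sin(2 * math.pi * f * n / fs) for n in range(N)]

def reconstruct(t, taps=600):
    """Evaluate the sampled signal at an arbitrary time t (seconds)
    via a Hann-windowed, truncated Whittaker-Shannon sinc sum."""
    pos = t * fs
    c = int(round(pos))
    total = 0.0
    for n in range(max(0, c - taps), min(N, c + taps)):
        x = pos - n
        sinc = 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)
        window = 0.5 * (1.0 + math.cos(math.pi * x / taps))  # tame truncation
        total += samples[n] * sinc * window
    return total

# Ask for the value exactly half-way between two samples:
t = 1234.5 / fs
err = abs(reconstruct(t) - math.sin(2 * math.pi * f * t))
```

The reconstructed value between the samples matches the continuous sine; there is no grid the waveform is "snapped" to.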


When you say

> In order to decimate a signal to 44.1 or 48khz, and preserve high-frequency content, high frequencies need to be phase-shifted.

What do you mean by high frequency? If you mean frequencies below but near the Nyquist frequency then no, there is no phase shift. If you mean at or above...

I'm struggling to avoid a blatant appeal to authority here, but your position is that the author of the Ogg Vorbis codec doesn't understand digital sampling, which seems hard to believe.


no, it addresses it fairly clearly (albeit briefly):

> So the math is ideal, but what of real world complications? The most notorious is the band-limiting requirement. Signals with content over the Nyquist frequency must be lowpassed before sampling to avoid aliasing distortion; this analog lowpass is the infamous antialiasing filter. Antialiasing can't be ideal in practice, but modern techniques bring it very close. ...and with that we come to oversampling.

if you accept that the limit of hearing is around 20 kHz, then you must also accept that frequencies above that can freely be removed without loss of fidelity to the human ear.

the article notes that higher frequencies can be heard, but only in the form of ultrasonic intermodulation distortion. (i.e. not in fact the higher frequencies at all)
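That intermodulation effect is easy to simulate: push two ultrasonic tones through a slightly nonlinear "system" and an audible difference tone appears. A sketch in Python (a hypothetical second-order nonlinearity, not a model of any real tweeter):

```python
import math

fs, n_samp = 192000, 19200   # 0.1 s at 192 kHz
f1, f2 = 25000.0, 26000.0    # two inaudible ultrasonic tones
f_diff = f2 - f1             # 1000 Hz: very audible

def tone_pair(nonlinear):
    """Two summed sines, optionally through a mild 2nd-order nonlinearity."""
    out = []
    for n in range(n_samp):
        t = n / fs
        x = math.sin(2 * math.pi * f1 * t) + math.sin(2 * math.pi * f2 * t)
        out.append(x + 0.5 * x * x if nonlinear else x)
    return out

def amplitude_at(sig, f):
    """Magnitude of the component at frequency f (single-bin DFT)."""
    c = sum(v * math.cos(2 * math.pi * f * n / fs) for n, v in enumerate(sig))
    s = sum(v * math.sin(2 * math.pi * f * n / fs) for n, v in enumerate(sig))
    return 2.0 * math.hypot(c, s) / len(sig)

clean_1k = amplitude_at(tone_pair(False), f_diff)  # ~0: nothing audible
dirty_1k = amplitude_at(tone_pair(True), f_diff)   # ~0.5: a loud 1 kHz tone
```

The linear path contains nothing at 1 kHz; the distorted path produces a strong difference tone there, which is the distortion product the article describes, not the ultrasonics themselves.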


It may be worth noting (though it doesn't change any of the science) that this is from 2012.


With today's bandwidth, why do we keep using lossy compression for songs?


Lossy vs. lossless compression is orthogonal to the sampling rate and bit depth (which is what this article is about). While the MP3 standard effectively means sampling above 48kHz is useless, there's no reason you can't have a lossy compression scheme that attempts to capture higher frequencies.


But the article makes a compelling case for why > 48kHz is completely pointless.


A lot of listening today occurs through streaming services, and huge uncompressed songs will eat into data plans quickly.


Dithering is a horrible thing to be doing, and 44.1 is an awkward rate. So while I agree that 192khz is dumb, 24/48 is a better standard than CD.


No, dithering (properly) is usually what you should be doing when you quantize.

See Vanderkooy and Lipshitz 1987 for why.


Paper seems to be paywalled. I can't imagine any possible purpose for dithering before encoding that wouldn't be better served by dithering on playback.


At 16 bits dithering is probably pointless for listening purposes.

What dithering does is decorrelate the quantization noise from the signal. Absent it, quantization generates harmonic spurs. In theory, on a very clean and noiseless signal these harmonic spurs might be more audible than you'd expect from the overall quantization level.

In practice, 16 bits is enough precision that these harmonics are inaudible even in fairly pathological cases. But dithering eliminates the potential problem by replacing the harmonic content with white noise.

Adding noise on playback just adds noise; it would not remove the harmonics.

The _best_ kind of dithering scheme is subtractive dither, where noise is added before quantization and then the _same_ noise is subtracted from the dequantized signal on playback. This is best in the sense that it completely eliminates the distortion with the least amount of additional noise power. But it's never used for audio applications, due to the additional complexity of managing the synchronized noise on each side.
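The non-subtractive version is easy to demonstrate: with TPDF dither, the expected quantized value equals the input, so sub-LSB information survives as noise rather than being erased. A quick sketch (toy example, units of one LSB):

```python
import random

random.seed(42)
x = 0.25   # a signal level only a quarter of an LSB above zero

def tpdf():
    """Triangular-PDF dither, 2 LSB peak-to-peak: the sum of two uniforms."""
    return random.uniform(-0.5, 0.5) + random.uniform(-0.5, 0.5)

trials = 20000
plain    = [round(x) for _ in range(trials)]            # always 0: level erased
dithered = [round(x + tpdf()) for _ in range(trials)]   # flickers between codes
avg = sum(dithered) / trials                            # ~0.25: level preserved
```

Without dither the quarter-LSB signal quantizes to silence every time; with dither its average survives exactly, at the cost of a constant noise floor.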


> But it's not ever used for audio applications due to the additional complexity of managing the synchronized noise on each side.

Mersenne twister with a shared seed in metadata?


The article is saying you can use dithering to represent sounds quieter than your 16-bit "0000000000000001". That's what I'm objecting to.


Consider the case of a 1-bit image. Let's say the "signal" is a smooth gradient of black to white from one side of the image to the other. If you simply quantize to the nearest value, one half of the image ends up solid black and the other half, solid white. No amount of after-the-fact "dithering" of this image will recover the original gradient - it is lost forever.

Now supposing we add noise to our signal before we quantize. A given pixel at 25% gray (which under the previous scheme would always end up solid black) now has a 25% chance of ending up white. A contiguous block of such pixels will have an average value of 25% gray, even though an individual pixel can only be black or white. Thus, by flip-flopping between the two closest values ("dithering") in statistical proportion to the original signal, information is preserved.

https://en.wikipedia.org/wiki/File:1_bit.png
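Here's the same experiment in a few lines of Python (toy numbers, not the linked image):

```python
import random

random.seed(7)
width, rows = 16, 4000   # 16 gray levels, averaged over many dithered rows

def to_1bit(level, dither):
    """Quantize a gray level in [0, 1] to black (0) or white (1)."""
    noise = random.uniform(-0.5, 0.5) if dither else 0.0
    return 1 if level + noise >= 0.5 else 0

hard, soft = [], []
for col in range(width):
    level = (col + 0.5) / width          # the gradient's true gray level
    hard.append(sum(to_1bit(level, False) for _ in range(rows)) / rows)
    soft.append(sum(to_1bit(level, True) for _ in range(rows)) / rows)
# hard: half the columns all-black, half all-white; the gradient is gone.
# soft: each column's average tracks its true gray level.
```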


Sure, I know how it works. But it sacrifices resolution (spatial in this example, temporal in the case of audio) and compresses poorly. Rather than dithering, you should use a higher bit depth so that you can represent your original gradient (with the desired smoothness) directly.


You dither before quantization in order to decorrelate the quantization error. If you don't do this, you risk artifacts in any digital filtering (or for that matter, playback) done afterwards. This includes any requantization.

In audio, if I recall correctly, it is also important to avoid obvious noise modulation.


This is totally offtopic but I can't stand the "XXX considered harmful" stuff. I had to rage-quit the article.



