Recommended reading: https://en.wikipedia.org/wiki/Loudness_war
edit: compression here meaning dynamic range compression, not data compression like MP3.
Compression is the best tool we have for accurately reproducing the musicality and emotion of a musical performance. Without compression, most recordings would be unlistenable.
Don't confuse the foolishness of the loudness wars for "compression is bad". That's like saying the internet is bad because there's porn on it.
Compression is a style. There is far more to musicality and emotion than compression. The problem compression solves is that the environments where industrialized cultures now listen are not dedicated listening rooms but alternately loud and quiet places, so compression makes all parts of the music almost equally loud and the quieter passages don't drop out. There is no need to compress music for headphone listening, for example, to the extent that it is currently compressed.
I find that compression and other techniques, such as removing vocal breath sounds, make most recordings unlistenable. They don't sound like humans anymore, but like synthetic puppets animated by humans with conflicting values. Take the Foo Fighters, for example. They're popular, sure, but all of their songs sound like one continuous din. Between the compression induced by the guitar distortion settings, the compression added to the recording, and the compression added by the radio station, it just sounds like a waterfall with a few bandpass filters changing between the verse and chorus.
Also their vocals have no dynamics. When he yells loudly, the vocals don't get louder; only the timbre changes. That changes it from cathartic to strained. The dynamics have all been flattened.
Why do you think the indie rock movement, and bands and styles with wide dynamic range like the Pixies, Nirvana, and dubstep, got so popular? They eschewed the trend of hardline compression in favor of alternating loud and quiet parts. They match the rhythm of human thought and motion, which has fast and slow, detailed and empty parts.
> That's like saying the internet is bad because there's porn on it.
Yes but on the internet you can go where there is no porn. Where can you find music with no compression?
One of these things is not like the others.
I can't tell if you're trolling here, or if you legitimately don't know much about audio engineering, because Dubstep (and electronic music in general) is probably the most aggressively compressed and limited genre of music out there.
If you actually do feel that Dubstep has a lot of dynamics, then you're misattributing the lack of dynamic range in a lot of modern music to audio engineering. What you're really bothered by is the songwriting and musicianship, not the engineering or compression.
it is extra-ironic, b/c yes dubstep uses compression a lot, but at the same time, a lot of the dubstep I've heard uses silence judiciously to create huge amounts of contrast - in between the speaker-shreddingly compressed passages.
Both house music and dnb IME tend to employ quiet as well to make the loud seem louder. It's mostly just rock and some americana these days on the radio that gets to me. Actually country is one of the most egregious genres, too. Sounds totally flat like AM radio.
That's a good way to put it. I hate over-compressed music. It sounds terrible and is no fun to listen to once you notice the issue.
But, are you an audio engineer? I ask because I work in the field, and all engineers I know accept that dynamic range compression is a necessary part of creating recorded music. The problem is when you abuse compression to create a wall of noise with no dynamic range.
It's like autotune: you can use it to correct one bad note, or you can abuse it to create T-Pain.
The engineers behind The Pixies and Nirvana, and all dubstep producers, use or have used compression at various stages of the creative process.
I guarantee you that whoever mixed Smells Like Teen Spirit spent a lot of time doing the compression on that song.
What you are referring to is only one particular use-case for which a "compressor" is used during music production: most often, people apply dedicated plugins to the "master" mix (the final result to be put onto CD or sold as a file). These plugins apply what's called a "multiband compressor", which can reduce dynamics individually on different parts of the frequency range, and often do quite a lot more magic than I claim to understand. -- Done excessively, the result is the "always one volume" sound of the loudness wars.
The compressor as a tool isn't limited to this use-case, though. You will, for example, put it on an individual instrument's track to shape the relative strength of the percussive and decaying content of an instrument, a drum or a plucked guitar for example: typical compressor plugins have an attack time which shapes how fast the gain reduction follows the input signal. If you set this slower than the duration of the percussive sound, the instrument will sound more "aggressive" instead of being leveled down, as the "attack" sound is increased relative to the resonating portion. -- The end result is just about the exact opposite of the excessive master compression people complain about.
Then you will invariably have unintended peaks in real-world recordings, and if you don't want to level down the whole recording to account for those peaks, or manually level them down for every instance, you'll also employ a compressor plugin on problematic tracks.
Then there's the possibility to filter the part of the spectrum that will trigger the compressor (its "sidechain"), or to let a compressor be triggered by one instrument (or group of instruments) and act on the signal of another instrument (or vocals, or group of tracks): that way you increase the perceived separation of different voices in your mix, again increasing the perceived dynamics of a song.
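To make the attack-time behaviour concrete, here is a minimal feed-forward compressor sketch in Python/numpy. The threshold, ratio, and time constants are arbitrary assumptions; real plugins add lookahead, soft knees, program-dependent release, and so on:

    # Minimal feed-forward compressor sketch (illustrative only).
    import numpy as np

    def compress(x, sr, threshold_db=-20.0, ratio=4.0,
                 attack_ms=10.0, release_ms=100.0):
        # One-pole smoothing coefficients for the envelope follower.
        att = np.exp(-1.0 / (sr * attack_ms / 1000.0))
        rel = np.exp(-1.0 / (sr * release_ms / 1000.0))
        env = 0.0
        out = np.empty_like(x)
        for i, s in enumerate(x):
            level = abs(s)
            # Envelope rises at the attack rate, falls at the release rate.
            coeff = att if level > env else rel
            env = coeff * env + (1.0 - coeff) * level
            level_db = 20.0 * np.log10(max(env, 1e-9))
            # Above the threshold, reduce gain so the output slope is 1/ratio.
            over_db = max(level_db - threshold_db, 0.0)
            gain_db = -over_db * (1.0 - 1.0 / ratio)
            out[i] = s * 10.0 ** (gain_db / 20.0)
        return out

With attack_ms set longer than a drum hit's transient, the hit passes through at full level before the gain comes down, which is the "more aggressive" effect described above.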
So, if you really want to find music with no compression at all, you'll likely only find classical recordings made only with one single X/Y microphone pair... :-)
Problem #1 - Audio speakers are not the same as instruments! If you've ever had the pleasure of getting your hair blown back by a cranked guitar amp or a drum kit... well, you just can't reproduce that experience through a set of earbuds. You can't exactly reproduce the sound of an acoustic guitar in your lap, or a violin in a beautiful hall. Speakers just aren't up to it. A recording is a miniature of a sound. Like all miniatures, it exaggerates some details and obscures others.
Second, most modern music consists of many instruments of different volumes and dynamic ranges played together. Mixing the sounds so everything sounds clear and balanced is extremely difficult. Some of the most important instruments are the worst about it - acoustic guitars and drum kits in particular walk all over everything else, with wide dynamic and frequency ranges that compete with steadier sounds. And certain harmonically-dense sounds like female vocals and violins can easily distort the audio amps used in reproduction. Compression, and its close cousin equalization, help engineers "carve" spaces for each instrument to live in.
You have probably never heard a recording without extensive compression, unless you've mixed records. And if you've been actively involved in recording, you'll already know all this.
That is your artistic choice, as it should be.
"The MP3 only has 5 percent of the data present in the original recording. … The convenience of the digital age has forced people to choose between quality and convenience, but they shouldn’t have to make that choice." -- Neil Young. 
Not every artist wants this to happen. They have no choice and listeners get a fraction of the sound recorded. This was not the case with vinyl.
@mborch, the exact compression method is of less importance than recognising that all the compression being discussed is a retrograde step from vinyl. Why?
Often digital plugins try to recreate the characteristics of such old tube monsters, because their sound is considered superior by some.
On the other hand, mastering plugins (including compressors and much more) nowadays are algorithmically so complicated that an analog version is not feasible.
EDIT: sorry, link is to an old limiter, not strictly a compressor, but one gets the idea
The source song, PSY's Gangnam Style, is the epitome of modern pop. I got a 3/10 on the listening test on a decent pair of Sennheiser headphones in a quiet room.
Some people are commenting that modern pop pairs well with 16-bit because of the heavy-handed mastering techniques and that older music thrives under 24-bits. Well, Audio Check offers the same 16 vs 8 test, using a Neil Young track from 1989... I couldn't fool myself into hearing any differences between the source WAVs at all and didn't even attempt to score the 10 soundbites.
I already knew that 24-bit vs 16-bit would be indistinguishable for pop music but I would not have expected that the same would go for 16-bit vs 8-bit. Or, well, I was kind of expecting it since otherwise the person who put this test together likely wouldn't have bothered.
However, I don't mind. I've never claimed to be "audiophile" -- a fact which is reflected by the inexpensive headphones I use :)
Yes, it has much more dynamic range, but it sounds wrong. Compression basically emulates what our ears naturally do when hearing very loud material, so compression gives one the feeling that the music is LOUD.
Yes, dynamic range compression is currently used/abused extensively in pop music productions, but if mixes weren't compressed they would have a much wider range for the sounds to play around in.
> Not only is Justin Bieber's My World 2.0 louder than Metallica's The Black Album, it's louder than The Sex Pistols' Never Mind The Bollocks.
with older songs, you can turn them up louder, then the dynamic parts really punch you.
with newer songs, it's just a steady fatigue.
Compression is an important part of both experiences. I like to think of it this way:
The soundscape has two axes: one is the overall perception of loudness, the other is the range of frequencies occupied by a given element, or track if you will, in a piece of music. In a stereo or multi-track recording such as surround sound, there are more axes involving the perception of placement, or "imaging". But the basics are all that are needed to consider this compression matter.
When you take one of those tracks, say the drum, or the bass guitar, in isolation a raw playback at volume on a reference system is going to largely reproduce what went in. So far so good.
Now, start adding the various bits and what happens?
That soundscape gets crowded! Little quiet bits you could appreciate are suddenly lost amidst all that is going on, so what to do?
And that's a very good thing. First, few people actually have systems that can reproduce high dynamic range material in a quality way, and they don't have a listening environment that would make any sense if they did. So there is that. But more importantly, the details, subtle bits really are quiet! They will get lost, and so we compress the track overall to make sure those are present in the final mix, and pleasing to the ear too. Yes, there is a style and art to this, no doubt. But the compression really is necessary too.
A secondary reason for compression, and light processing by things like auto-tune, is to polish up a performance. A vocalist might be a little soft on part of an expression, or someone playing guitar might not deliver the same solid stroke every time. These things stand right out on a raw recording. If they get mixed in, they will get lost, or feel wrong, weak, etc... Compression can level this out and result in a solid, consistent sound. Again, the mind's ear tends to want to hear this.
Ever listen to something a bit distorted and then play it back in your mind? Notice how your mind tosses out a lot of stuff, leaving you with that which you really craved? It's actually pretty difficult to recall something with high aural clarity for most people. Good production involves training the mind and the ear to actually pick this out so it can be managed into something people will really crave over and over.
One might also process things a bit too. This may be done to emphasize some characteristics of the sound with respect to the overall context of the mix. For a vocal, maybe it makes sense to punch the formant frequencies a bit, or dampen them, to maximize the color and character in the vocalist's vocalizations, for example. Our brain is an awesome audio machine, and it does a lot for us that recordings do not do. So we must bring those out and make them available to listeners of those recordings.
Now come more of the basics of the art!
What the mind's ear hears is what we want to put on the record. Take that bass, compress it so that it occupies a smaller range in the loudness department (reduce its overall dynamic range), and set its LEVEL to one that's appropriate for its overall contribution to the mix, which itself represents the overall perception of the music. Think two components here. The individual component of the mix, or track, has an overall dynamic range, but it also has a level at which it's present in the mix. What makes sense here varies a lot and is highly subjective. Good production involves listening to the music and picking out what defines it as good, that which is resonant in terms of style, color, etc... A strong bass in one tune might make great sense, yet on another it might be pushed to the background more, etc... Depends.
A good producer will spread these things out in the soundscape so that the listener can hear them! Bringing up the little details, while bringing down the punch, is needed to make room for it all to get in there and have an impact without fatigue or overloading the medium itself. A CD has its limits, cassette other limits, radio still others, etc... An appropriate balance must be struck here, and that's not always optimal either.
This is what you are hearing when you listen to those older tunes, well produced. And it's damn good stuff too. You aren't wrong about it at all. You're just blaming a good tool, when you should be blaming whoever is wielding it poorly.
(This is why remastered recordings exist. Here's something fun: go and get the DVD or Blu-ray of something you really like and compare it to what you might hear on the radio, or off a CD, or a download. Often those are pushed out to the edges with the assumption the consumer has much better gear and/or will pay for better production. Sometimes, shitty production on a CD can be avoided this way. Interestingly, some video game music gets remastered, and I've heard more than a few wall-of-sound tracks sound great off my PS3...)
When this is done properly, the dynamic range is filled with a lot of things, each occupying their frequency range, sometimes overlapping, sometimes not, and each having an overall level that makes sense and that is aesthetically pleasing to the listener. This is why you can hear a great vocal right on top of that awesome guitar lick and drum set, despite the fact that they may be sharing a significant set of common frequency components.
Someone applied appropriate compression and some processing to make room for everything when it's mixed down into the final track. When they get it right, you don't even notice. It's just fucking good sound you crave. When they get it wrong, it's the tiring wall, or you find yourself straining to hear interesting bits that should just flow.
And I love this done well. To me, it's the most important thing I can say differentiates producers I love. Good production qualities in these areas are what makes a recording "pop" and you feel "there". No joke. This stuff, right here, is what makes a recording "immersive" for you. The sound goes right in, the audio engineers have made sure that's gonna happen, and it tickles your mind, taking you away from the fact that it's a recording.
Now, that brick wall... Take all that nice work, and listen to it on a great system, appropriate volume level, and the music will just stand out there, crisp, clear, every important part audible and enjoyable right?
A final processing layer can further crush this into a wall of sound, removing more and more of the dynamics to a point where it's all one intense thing, yet people can largely still differentiate the details! This can also be done, and likely and frequently is, in the mixing stage. But I'm mentioning it because commercial radio employs this processing extensively, due to the limitations inherent in broadcast. FM, for example, has bandwidth and dynamic range / signal-to-noise limits that require this kind of processing to combat road and car noise.
Newer digital radios are controversial, and a topic for another day, but they do not have those same limits, and can deliver the perception of a much better overall experience that many listeners will say is comparable to a CD. (It's not though, again another day.)
It's the abuse that you don't like. Neither do I. Often the clowns even let it clip a little for "grit" or some other BS. That really sucks, because we don't even get the music, crushed as it is! But, let's set that ugly crime aside, and just stay with the brick wall for a moment longer.
It's all still there, LOUD, and that is tiring to you because it does not "breathe", "punch", etc... Graphically, it's a lot like cranking the gamma up on images. More subtle details stand out in more conditions, but the overall depth and feel are lost... and it's tiring to listen to, even at low volume. Our mind's expectation of what things sound like can clash with this kind of excessive production, even though the first impression can be good. Producing for that initial "wow" by maxing out the medium is, IMHO, always an ugly mistake, but the brutal truth is sales metrics select for LOUD over GREAT.
All that said, compression is good. We need it. Vocals, in particular, can very seriously benefit from appropriate compression and processing to really bring out the harmonics inherent in a great set of pipes owned by someone who knows who they are and how to express that well. But it all adds up, and it all needs that bit of compression love to make sure the good stuff isn't lost on the way into your head.
Maybe this helps a little to get at where some of the pain and fatigue really is.
Now, I'm not Gaga's recording engineer, and for all I know you're absolutely right and her & her team remastered each track without compression for the vinyl pressing. Considering how rare such a process is in industry recording, however, I doubt it. I assume that really the difference you're hearing is typical of all vinyl recordings: a type of distortion that comes from both pressing and playback through an RIAA preamplifier.
People use all sorts of terms to describe the difference between vinyl & digital sound, but mostly descriptors center around words like "warmer," or "more open" or "brighter." These are really just beneficial side-effects of the standards developed to overcome vinyl's limitations as a recording medium.
So, in reality quality and compression come in to play long before the pressing actually occurs, on the mastered track itself. I've come to realize that a poorly handled transfer to vinyl will often sound not nearly as clear or detailed as a high-res or even "Mastered for iTunes" track.
• The audio is subjected to low-pass or all-pass filtering, which can result in broad peaks becoming [...]
• The amount and stereo separation of deep bass content is reduced for vinyl, to keep the stylus from being thrown out of the groove.
Hence the RIAA curve, which is an eq curve applied to the mastering before put to vinyl that reduces the lows and exaggerates the highs. It is reversed by the phono preamp during playback.
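A sketch of that curve built from its standard time constants (3180 µs, 318 µs, 75 µs), evaluating the analog playback (de-emphasis) response with scipy; normalizing to 0 dB at 1 kHz is conventional:

    # RIAA playback de-emphasis from the standard time constants.
    # The pre-emphasis applied before cutting the lacquer is its inverse.
    import numpy as np
    from scipy import signal

    t1, t2, t3 = 3180e-6, 318e-6, 75e-6      # seconds
    num = [t2, 1.0]                           # zero near 500 Hz
    den = np.polymul([t1, 1.0], [t3, 1.0])    # poles near 50 Hz and 2122 Hz
    freqs = np.logspace(1, 4.3, 200)          # 10 Hz .. 20 kHz
    w, h = signal.freqs(num, den, worN=2 * np.pi * freqs)
    gain_db = 20 * np.log10(np.abs(h))
    gain_db -= np.interp(1000.0, freqs, gain_db)   # 0 dB at 1 kHz
    # gain_db is roughly +19 dB at 20 Hz and -20 dB at 20 kHz: bass boosted,
    # treble cut on playback, mirroring the opposite EQ cut into the groove.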
I was skeptical of this, so I sought out a copy on Youtube. "Better" is subjective, and this is definitely lower quality overall (given the Youtube compression). But it is undeniably a very different song. I've listened to the CD version of this song countless times, and I don't think I ever remember hearing the male voice in the intro so prominently.
For someone hearing the song for the first time (or even the first few), it may not be as obvious, but as someone who's used to hearing that song on a regular basis, it definitely jumps out as having a crisper sound.
There's a major 60hz ground loop in there, too.
But yeah, it's obvious when I have my car stereo nearly on max to listen to classical or jazz, and then if I turn the radio and get a pop music station my ears are about to explode.
Also, how exactly does it sound when most of my recording is at -10 dB RMS and a couple of peaks are at -5 dB RMS?
Are you using "normalize" in the audio production sense, as in adding or subtracting a constant gain to an entire track? I'm pretty sure that iTunes doesn't do this, at least not by default.
Anyone that has mixed 120+ tracks for a single "modern rock" song with a compressor at the end would tell you so.
The good thing about compression is that it allows you to save your hearing quite a bit. Some music has dramatic parts that get super loud, which can have an awesome emotional response; but it does take a toll on your hearing, unless you are in a very silent environment or have fantastic headphone isolation (I have none) -- so compression actually allows me to hear everything the music has to offer. I also use the compression tool on my soundcard to play most games, especially FPSes that have incredibly loud bangs and yet require you to hear footsteps and quiet environmental noises -- with a compressor that's possible without blowing up your ears.
This does degrade the quality, sometimes heavily. Nevertheless it is done by mastering engineers (who rarely enjoy it) as well as by radio and TV stations, extensively, because of the psycho-acoustic fact that a song appears to sound better if it is played louder. This gives them an advantage over the competition: on average, people searching for a radio station are more likely to listen to your radio station if it is louder than the competition.
The main issue lies in the fact that the current peak measurement of audio signals only marginally correlates with perceived loudness, and heavy compression is used to trick this system. The broadcasting industry is aware of this. An open and quite effective loudness measurement algorithm, ITU-R BS.1770, was introduced a few years ago and is slowly being adopted all over the world by new broadcasting laws: AGCOM 219/09/CSP (Italy), ARIB TR-B32 (Japan), ATSC A/85 PRSS CALM Act (US), EBU R128 (Europe) and OP-59 (Australia). iTunes Soundcheck is also based on it, and since this year YouTube applies it to newly uploaded videos as well. Even games use it to keep their audio at a consistent loudness.
So slowly, the over-usage of compression does not give music producers and broadcasters any advantage anymore and beautiful dynamic music will be competitive again.
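For anyone who wants to try a BS.1770-style measurement themselves, a minimal sketch using the open-source pyloudnorm library; the filename and the -23 LUFS target are placeholder assumptions:

    # Measure integrated loudness per ITU-R BS.1770 with pyloudnorm.
    # pip install soundfile pyloudnorm; "song.wav" is a placeholder path.
    import soundfile as sf
    import pyloudnorm as pyln

    data, rate = sf.read("song.wav")           # float ndarray + sample rate
    meter = pyln.Meter(rate)                   # K-weighted BS.1770 meter
    loudness = meter.integrated_loudness(data)
    print(f"Integrated loudness: {loudness:.1f} LUFS")

    # Normalize to a broadcast-style target, e.g. -23 LUFS (EBU R128).
    normalized = pyln.normalize.loudness(data, loudness, -23.0)
    sf.write("song_r128.wav", normalized, rate)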
I have collected some links about this topic. Because of the lack of any affordable implementation at the time, I created one myself, with some additional notes.
 ITU-R BS.1770, http://www.itu.int/dms_pubrec/itu-r/rec/bs/R-REC-BS.1770-4-2...
Edit: Some useful educational material to read before moderating: https://en.wikipedia.org/wiki/Loudness_war.
Maybe if you really mangled your audio by encoding at extremely low bit rates.
But in general, no.
mp3, for example, loves to save space by cutting away sounds just above the noise floor, and some less-noticeable frequencies.
Convert it to MP3 directly, use whatever settings you want
Watch as the MP3 rip now clips: the lossy encoding overshoots the peaks of an already-maximized master.
I have a 96 kHz/24-bit interface that I use and ATH-M30X headphones, and I can tell a difference between at least some 24-bit FLAC files and 16-bit highest-quality-possible MP3s. I was mixing my own music and the difference was quite obvious to me. The notable thing was that drum cymbals seemed to have a bit less sizzle and such.
Now that being said, if I hadn't heard the song a million times in its lossless form from trying to mix it, I probably wouldn't have noticed, and even then it didn't actually affect my "experience".
I'm one of those guys that downloads vinyl rips as well, but I do that mostly just to experience the alternative mastering, not that I think it's higher quality or anything. (though I have heard a terrible loudness-war CD master that sounded great on vinyl with a different master)
They're pointless for playback.
That is really the central issue. It's much like imaging since the time of Ansel Adams: the sensor can capture more dynamic range than the human eye can experience. The producer may have use for that range when editing, but the audience will never know what was -- may have been -- missed. And we're not talking about limits of reproduction. We're talking about the human sensor's instantaneous and absolute upper and lower bounds.
That's not really true. Dynamic range refers to the difference between the biggest value that isn't clipped and the smallest value that isn't rounded to zero. The human eye is a logarithmic detector, cameras are linear. The only reason HDR is a thing is because cameras DON'T have enough dynamic range.
However, that's neither here nor there because the human eye is not the real bottleneck here. The media we use to display photos are. Printed photographs have approximately six stops of DR; typical monitors have eight. Modern cameras capture much more information than can be displayed, and the raw sensor data must be tone-mapped either by the camera software or in post-processing to produce a viewable image. There is a lot of latitude in deciding how to map 2^14 discrete values of input to a mere 2^8 values of output.
Nice computer displays (and mobile device displays) without glare and with the brightness cranked all the way up can get up to something like 9.5–10.5 stops.
Of course, that range still pales in comparison to the contrast between shadows and highlights on a sunny day, which can be more like 16+ stops.
First, as msandford pointed out, the human eye has significantly better dynamic range than image sensors. Technically, our eyes have a lower range at any specific instant, but due to the way our eyes work we effectively see upwards of 20 stops of dynamic range. The best sensors available (in medium format cameras, pro-DSLRs, etc) can only capture 14-15 stops.
Second, some black and white films have a better dynamic range than digital sensors, so it's also not the case that digital is strictly better. 18-20 stops isn't unheard of for some types of film.
If I airplay a song from my iPhone and have the volume at 50% set in software, then a few extra bits can help. Not sure if it makes a noticeable difference, but it's a digital mixing scenario occurring at playback. If you play at extremely low volume it should be noticeable.
I see it this way: let's say 16 bits is needed to represent the entire discernible dynamic range between the threshold of hearing and the threshold of pain. If you turn the volume down, every 6 dB of attenuation throws out one bit, but you also need correspondingly less of that hearing range.
With a 24bit stream you can easily give up a few bits without losing dynamic range.
The basic problem: the quieter a sound or detail gets, the fewer bits of resolution are used to represent it.
In 16-bit recording, there simply aren't enough bits to represent very low level details without distorting them with a subtle but audible crunchy digital halo of quantisation noise.
In a 24-bit recording, there are.
Talking about dynamic range completely misses the point. It's the not the absolute difference between the loudest and quietest sounds that matters - it's the accuracy with which the quieter sounds are reproduced.
This is because in a studio, 0dB full-scale meter redline is calibrated to a standard voltage reference, and both consumer and professional audio has equivalent standard levels for the loudest level possible.
These levels don't change for different bit depths, and they're used on both analog and digital equipment. (In fact they've been standard for decades now.)
This is why using more bits does not mean you can "reproduce music with a bigger dynamic range" - not without turning the volume up, anyway.
What actually happens is that the maximum possible volume of a playback system stays the same, but quieter sounds are reproduced with more or less accuracy.
In a 16-bit recording, quiet sounds more than about 50 dB below full scale have 1-8 bits of effective resolution, which is nowhere near enough for truly accurate reproduction. (Try listening to an 8-bit recording to hear what this means.)
You might think it doesn't matter because they're quiet. Not so. 50 dB down is a long way from inaudible, ears can be incredibly good at spectral estimation, and your brain parses spectral content and volume as separate things.
There's a wide range between "loud enough to hear" and "too loud", and 24-bit covers that whole range accurately. 16-bit is fine for louder sounds, but the quieter details just above "loud enough to hear" get audibly bit-crushed.
The effect isn't glaringly disturbing, and adding dither helps make it even less obvious. But it's still there.
24-bit doesn't need tricks like dither - because it does the job properly in the first place.
Now - whether or not commercial recordings have enough musical detail to take full advantage of 24-bits is a different question. For various reasons - compression, mastering, cheapness - many don't.
But if you have any kind of aural sensitivity, you really should be able to A/B the difference between a 24-bit uncompressed orchestral recording and a 16-bit recording using an otherwise identical studio-grade mixer/mike/recorder/speaker system without too much difficulty.
You are slightly confused. (It may help to remember that a decibel always refers to a ratio, so the setting of your volume knob is not important.) Greater bit depth does allow for greater dynamic range, this stems directly from the definition of dynamic range. 16-bit audio has a theoretical dynamic range of:
10 * log10((2^16)^2) ≈ 96 dB
10 * log10((2^24)^2) ≈ 144 dB
The idea that having bits in excess of this amount will somehow result in the perception of a smoother or more accurate sound is fallacious. Even at maximum playback volume, this information will exist well below the noise floor and will simply not be perceived. In fact, this information will likely exist well below the noise floor of the recording studio and thus, in some sense, will not even be recorded.
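A quick check of those figures in plain Python, using the ~6.02 dB-per-bit rule of thumb:

    # Theoretical dynamic range of an n-bit quantizer: 20*log10(2**n),
    # equivalently 10*log10((2**n)**2) -- roughly 6.02 dB per bit.
    import math

    for bits in (8, 14, 16, 21, 24):
        print(f"{bits:2d} bits: {20 * math.log10(2 ** bits):6.1f} dB")
    # 16 bits -> ~96.3 dB, 24 bits -> ~144.5 dB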
"Talking about dynamic range completely misses the point."
There is nothing magic about 24 bits here. Record something with 48 bits but set up your equipment all screwy so you're only actually using the first 8 bits... and you've got an effectively 8-bit recording.
In real world applications the codec is giving you trouble with the low amplitude stuff, not the quantizer. Not that in realistic situations your equipment is likely to be able to generate this cleanly anyway.
"24-bit doesn't need tricks like dither - because it does the job properly in the first place."
On playback, the issue goes the other way around. If you've mastered things correctly you'll be using the available dynamic range of the output in such a way that the information content of your signal is well represented. This is sufficient at CD rates for all practical listening scenarios.
As someone who both records/mixes albums and a live-instrument musician, a live instrument in the room sounds utterly different than any recording. Not necessarily worse, just different. The pursuit of "accuracy" in audio playback is childish and naive. The sound of a recording is a function of technical limitations, compromises, and aesthetic decisions as much as it is a product of the raw source sounds. Don't make it sound accurate, make it sound GOOD! And that usually means a lot of compression, and often deliberate distortion.
The article is still correct, just like it always was.
Ironically, most of your analysis is also correct. Somewhere in your understanding though, you're leaping sideways to an incorrect conclusion.
>The basic problem: the quieter a sound or detail gets, the fewer bits of resolution are used to represent it.
So far so good, but you're about to go wrong again once you start thinking in terms of stairsteps and boxes and looking instead of hearing.
Back to the bits.
What lower amplitude (and fewer used bits) means is that the sound, as represented, is not as far above the noise floor as a full-amplitude sound. The digital noise floor is completely analogous to, eg, the noise floor of analog tape. If you use a dithered digital representation, you get something that behaves exactly as analog does. You hear and perceive both the same way.
>In 16-bit recording, there simply aren't enough bits to represent very low level details without distorting them with a subtle but audible crunchy digital halo of quantisation noise.
On an audio tape, the magnetic grains are just too large to represent very low level details without distorting them with a subtle but audible crunchy halo of analogue distortion and hiss.
In a 24 bit recording, the noise you mention is still there! It's just shifted down [theoretically] 8 bits or -48dB. That's the only difference. The noise floor is lower.
[In reality, 24 bit isn't. Most recordings don't even hit a full 16 bits, and no recordings, unless they're mathematically rendered, can get deeper than about 21 bits. There is no such thing as a 24-bit audio ADC/DAC that delivers 24 bits. The very best available today are about 21 bits of signal + 3 bits of noise.]
So the difference in playback between 16 bits and 24 bits is about 5 actual bits. If you're complaining about soft sounds in a 16-bit recording 'not having enough detail' because they're down at, say, 3 bits of resolution, are you saying it's all fixed by using 8? Aren't 8 bits woefully too few for any kind of quality sound?
(I hope at this point, you realize you're barking up an incorrect tree)
If you're following me so far, we can continue, but I expect even this much is going to require more conversation.
This is only minor nitpicking, but the standard 0dB levels for professional audio (0dB reference at +4dBu == 1.23Vrms) and consumer audio (0dB reference at -10dBV == 0.32Vrms) are not meant to indicate the maximum ("loudest level possible") but just serve at a reference point, for e.g. the 1kHz sine you inject when setting up your gain throughout the signal chain. On most studio gear, you'll easily have +15dB headroom left.
AD/DA converters haven't really standardized on a full-scale level and there are quite a few different definitions in use: https://en.wikipedia.org/wiki/DBFS. Most "line level" ADC/DACs will have switches or jumpers to select between two or so settings. You'll choose them so that you are not likely to clip your ADC, and will only playback on your DAC with an appropriate level trimmed to not clip your analog gear.
In a 16-bit master, a noise shaping function is applied during down-conversion, by which quantization noise will be re-distributed so that most of the noise energy goes to the high frequencies (>15k) where it is completely inaudible.
For a good example of such a recording, see Ahn-Plugged (Ahn Trio, 2000, Sony BMG Masterworks). Fire up a good spectrum analyzer. You'll find the noise floor is well below -110 dB throughout most of the spectrum, even though it's 'just' a 16 bit CD.
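If you want to run that check yourself, a minimal sketch with scipy's Welch PSD estimate; the filename is a placeholder for a decoded rip of the track:

    # Inspect a track's noise floor with a Welch PSD estimate.
    # pip install soundfile scipy matplotlib; "track.wav" is a placeholder.
    import numpy as np
    import soundfile as sf
    from scipy import signal
    import matplotlib.pyplot as plt

    data, rate = sf.read("track.wav")
    if data.ndim > 1:
        data = data.mean(axis=1)          # fold stereo to mono for one trace

    f, psd = signal.welch(data, fs=rate, nperseg=8192)
    plt.plot(f, 10 * np.log10(psd + 1e-20))   # dB relative to full scale = 1.0
    plt.xlabel("Frequency (Hz)")
    plt.ylabel("PSD (dB/Hz)")
    plt.show()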
Besides, MP3 [audio] compression has difficulty handling specific samples, or types of samples (e.g. sharp attacks), and may manifest artifacts independently of the bitrate; MP3, AFAIK, also has a ceiling of 320 kbps within the standard specification, which certainly doesn't help.
Secondly, I'm not sure if you process the MP3s further (when you refer to mixing), but if you do, you're definitely going to create noticeable artifacts which weren't noticeable in the unprocessed MP3 form.
It's possible you are just hearing the difference between codecs. You'd have a fairer comparison with 24-bit vs 16-bit FLAC.
Even 128Kbps MP3s render cymbals better.
Yes, non-linear effects can be sample rate sensitive. However-- this really means that their internal model is aliasing and not faithfully simulating an infinite sample rate system.
In an ideal world, effects that needed more sample rate would internally upsample/downsample (or be constructed in a way that they didn't need to). Then they would behave consistently across rates, though doing this would waste CPU cycles.
In any case, the article is all about distribution. Having excess rate in mastering is cheap and harmless, and-- because of these reasons, can be practically pretty useful.
The difference you hear is the difference between FLAC's lossless format and MP3's lossy format; it has nothing to do with 16 bit versus 24 bit.
The amount of misinformation / junk-science in the audio world is preposterous. There's a religious-cult of an industry that feeds off the ignorance and placebos of its participants. I have many friends who swear by their What.cd 24/192 FLAC vinyl rips and spend hundreds of dollars on audiophile AC wall outlets. Not to say that there are no differences in high-end audio equipment, but so much of what's "good" is subjective.
Previous discussion https://news.ycombinator.com/item?id=3668310
That said, there's one thing the article does not address and that is "beating", or really inter-modulation distortion from instrumental overtones.
Instruments are not limited to 20-20kHz. They can have overtones well above this range. Additionally, note that short pulse-width signals, i.e. transients, like drum strikes, especially involving wooden percussion, can have infinite bandwidth. (Not really infinite, but pulse width is inversely proportional to bandwidth.)
In a real listening environment (i.e. live performance) these overtones have a chance to interact with one another in the air. It is possible that these overtones may beat with one another and cause inter-modulation products in the audible range. For an example of this, play a 1000 Hz tone through your left speaker, and a 1001 Hz through your right speaker. You will hear a distinct 1 Hz "beat". The audibility of these are largely dependent on listening position and amplitude, but it is possible to occur with instruments. Since most recordings are done using a "close mic" technique (placing the microphone very close to the source) the interactions such as this are never recorded.
However, if full bandwidth of the producing instruments is preserved, these interactions of the overtones can be reproduced in a playback environment given equipment having a wide enough bandwidth and degree of quality.
The comparison of a 1hz beat to a 1hz sound should be absurd on its face: you need about 20-30hz to become audible, and it's a low rumble more felt than heard. Very low frequencies sound absolutely nothing like intermodulation beats.
You can look at your "A440 at a constant volume" example as a 0Hz (DC) signal getting louder and softer 440 times a second, but this is the only case in which your example holds. Amplitude modulation creates sum and difference frequencies, so the A that you hear is 440Hz + 0Hz. If you change that 0Hz to 1kHz, you get a signal that's the sum of a sinewave at 560Hz and a sinewave at 1.44kHz, neither of which is an A.
The distinction is that the 1Hz signal is modulating the audible signal, not adding to it, if you look at the spectrum of the sum of those frequencies there is no 1Hz component, whereas if you added a 1Hz signal you'd get something completely different. And in this case the amplitude of the signal is always changing faster than 1Hz.
Edit: another way of looking at it: You wouldn't say you can "hear DC" because you can hear an A440 played at a perfectly constant volume.
Counterintuitively, there is no frequency component generated at the beat frequency when you sum a 1kHz and a 1.001kHz signal; it's easy to test that out with matlab, octave, scipy/numpy/matplotlib, etc. Generate the two signals, add them together, and look at the Fourier transform: you'll see two components, one at 1kHz and one at 1.001kHz (assuming you take a long enough window to have that kind of resolution) and no component at the beat frequency. A third sinewave doesn't just jump out of nowhere when you add two separate sinewaves together.
If you take the sum of those two signals and run them through an ideal brickwall highpass filter at 999Hz, so there are no frequency components below 999Hz, you'll still "hear" the beat frequency, because it isn't a separate spectral component; it's just the two signals slowly going out of phase, canceling each other out, and then going back in phase and boosting the amplitude.
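Here's that test as a numpy sketch; the tone frequencies and 4-second window are arbitrary (the long window just buys sub-Hz FFT resolution):

    # Sum 1000 Hz and 1001 Hz sines; the spectrum contains exactly those
    # two components and nothing at the 1 Hz beat rate.
    import numpy as np

    rate = 48_000
    t = np.arange(4 * rate) / rate               # 4 s -> 0.25 Hz resolution
    x = np.sin(2 * np.pi * 1000 * t) + np.sin(2 * np.pi * 1001 * t)

    spectrum = np.abs(np.fft.rfft(x)) / len(x)   # normalized magnitude
    freqs = np.fft.rfftfreq(len(x), 1 / rate)

    print(freqs[spectrum > 0.01])                # [1000. 1001.] -- no 1 Hz line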
Even if it did produce some new frequencies through some non-linearity (which is negligible in most environments afaik), the recording equipment would capture the low frequency waves produced by those interactions. So the only question is whether there are significant non-linearities in our hearing system, and the overwhelming evidence is no again afaik.
> Once you're driving air so hard it becomes nonlinear, thus introducing intermodulation distortion in the air, that distortion produces actual audible-range distortion products. And because the distortion you're hearing is in the audible range, a recording will sample and reproduce it accurately.
> You're hearing the audible _result_ of IMD, you're not somehow listening to the distortion curve itself.
Have you been to a concert? There's no recording/playback technology that can reproduce anything close to the sound of a full orchestra. It's all lossy.
Have you read the article/quantization discussion on what is meant by lossy? If your recording equipment is good and your reproduction equipment is good, 16/44 is enough to reproduce the concert sound perfectly (as far as human hearing goes). What you do not experience is everything else but the sound -- the vibration of the super loud bass on your skin, the energy of the public, the beauty of the venue.
Second, as far as I know our hearing is composed of linear excitation elements (they have a definite bandwidth), and this is confirmed pretty well by experiments with human hearing -- you can see the threshold of our hearing at about 20kHz and that we experience tones of different frequencies fairly independently. Those assumptions imply that two tones, one at e.g. 50kHz and another at 50.001kHz are inaudible, end of story.
You can actually do this experiment yourself if you have a signal generator that can do 1Hz amplitude modulation and drive a transducer with a non-negligible sensitivity in that range.
 AFAIK most music is not recorded like that, instruments are recorded separately and then overlaid; but then adding realistic-sounding "beats" based on whatever positioning the sound engineer envisions should be possible in software?
Beating and intermodulation distortion are entirely different things. They look similar on an oscilloscope, but they're not and they don't sound the same.
>Instruments are not limited to 20-20kHz. They can have overtones well above this range.
Correct. You can't hear the overtones beyond the upper portion of the hearing range (many people believe you can).
>In a real listening environment (i.e. live performance) these overtones have a chance to interact with one another in the air.
In reality they do not, unless you're driving the air so hard the trough rarefaction is approaching hard vacuum. (That's not actually impossible. It's how ultrasonic audio 'beaming' devices work.) Some performances are powerful enough to get close, e.g. if you're sitting six feet from the pipe organ.
Once you're driving air so hard it becomes nonlinear, thus introducing intermodulation distortion in the air, that distortion produces actual audible-range distortion products. And because the distortion you're hearing is in the audible range, a recording will sample and reproduce it accurately.
You're hearing the audible _result_ of IMD, you're not somehow listening to the distortion curve itself.
> It is possible that these overtones may beat with one another
You're continuing to confuse beats and IMD, but here you're talking about beat frequencies, so Yes. But beat frequencies are a sort of auditory illusion. If one of the frequencies that would produce a beat is inaudible--- there's no beat. Easy to test, go try it.
> and cause inter-modulation products in the audible range.
IMD is not a beat. Inaudible ultrasonics will produce audible artifacts when the underlying reproduction system is nonlinear (another way of saying 'there's intermodulation distortion'). However, that's a playback artifact. If the IMD products were audible in the original signal, audible range sampling would reproduce them.
If it wasn't audible in the original performance, it should not be part of the recording, and it should not be part of the playback.
People today are often amazed when they listen to CD or turntable content through 70's-era crossover speakers. Back in the 70's you'd have a stereo with 2 "speakers" that each had 3 sub-speakers, for a total of six drivers. The fad today is to have 5.1 sound with a single driver in each satellite, also a total of six speakers. The spatial resolution increase is good for movies, games and TV, but surround sound in music is marginal. An amazing number of old "classic rock" recordings were done in quad, and anything by Donald Fagen will sound pretty good with Dolby Pro Logic; there are some more recent Bjork recordings, but almost everything is mixed for stereo, and what you lose in frequency response is not compensated by anything, except perhaps the ability to produce more volume with more speakers.
It made sense to me, and I love how the speakers sound. Understanding that it's not inserting distortion makes even more sense.
I was listening to Marvin Gaye on my friend's system and I could hear that there were several different backing singers, all moving and at different distances from the microphone.
Are there any double blind trials anywhere of Vinyl/CD/24-192khz with super high end hifi systems? Mostly I see people suggesting that these tests are performed from the phono output of a mac with a pair of average ear buds...
You were listening to £20,000 worth of amps and speakers, and you were most likely in an acoustically treated room.
Also, novelty is almost always euphonic when it isn't overtly bad. This fact is often neglected. You hear something you didn't hear before and your brain immediately tells you that it sounds better, even if it doesn't actually represent higher fidelity. Actually making an objective judgement requires a career's worth of experience, or a test lab and the skills to use it.
For example: you were listening to vinyl, which is covered in delicious noise and warm harmonic distortion, and is mastered differently. Highly euphonic, very novel if you've only ever heard the CD version before, but definitely not higher fidelity.
BTW higher end DACs do sound better, but the rest of your signal chain needs to be really good for you to notice it. It's often to do with better phase accuracy between the left and right channels, which affects the soundstage, or stereo image. If your speakers/amp have loose timing however, you'll never be able to tell.
This hasn't passed the blind tests either. A good, 100 dollar DAC (a Schiit or an ODAC) will sound just as good as a 1000 dollar DAC.
Forget about spending £1000s though. I'm sure that ODAC thing sounds better than the RME, I saw that test where it was identical to the various industry standard units.
Worse, of course, if the sound card is integrated into a computer, due to the opportunity to pick up much greater ambient RF noise from other components, although that is less of a problem now than it used to be back when I could hear my hard drive kicking up on my speakers...
In any event, I'd absolutely believe that the quality between a $100 DAC assembly and a $200 system is enough to be noticeable. More than that and I'm very skeptical. So I guess I don't really disagree with your statement, but I think that in current dollars $100 isn't necessarily enough to pay for solid underlying engineering and good components.
This fact alone should cause you to question your subjective experience. You have no idea what part of that system was contributing to what you found pleasant. Someone who knew what they were doing could probably build a $2000 system that would blow you away just the same.
And if you were playing vinyl, there wasn't even a DAC present in the signal chain :)
Vinyl mastering is sometimes better than CD mastering though, due to the loudness war.
I would love to sell my turntable and vinyl collection and rely purely on digital formats. Takes up less space, technically superior format, etc.
But one thing keeps me buying vinyl:
AWFUL mastering on CDs. A significant portion of LPs are released with more normal mastering on the vinyl, while the CD will be brickwalled all to hell.
I listen to metal, and rock as a broader genre is particularly bad about it. One of my favorite albums of last year, Fallujah's The Flesh Prevails, had a dynamic range of 2 to 3 on almost every track on the CD. The vinyl master? 9 to 10. Still not great, but leaps and bounds better. The CD actually clips if you convert the songs into MP3.
Until they go back to not murdering CD mastering, I'll continue buying vinyl :(
(I know your comment isn't directly about vinyl being bad or anything - I just have a compulsion to bitch about the loudness war any chance I can)
Optimistic people have said that iTunes and YouTube no longer rewarding high-volume compression would kill the loudness war, but it is still happening.
Not on a laser stylus turntable.
I don't know about double blind trials but people do tests on their own. It's further complicated though because the hardware you use could be optimized for certain types of music, e.g. have a read through http://arstechnica.com/gadgets/2014/07/some-of-the-worlds-mo...
The article mentions just such a study performed with high end equipment.
24-bit also means we don't have to record close to 0 dBFS, which saves a lot of time.
By comparison, 16-bit audio can "only" span the range from a whisper in a library to a motorcycle or jackhammer.
Double blind tests show that 8 bits are not enough, but 14 bits are.
You're right of course that it will compress less well, but that's to be expected because you've lost less information!
Store the 24-bit signal, and you could do a dithered downsample to 16-bit on playback if you think that's a good idea. Wouldn't that be better all round?
For playback though, I agree that 14 bits are probably enough. Even high quality mastering tape has the equivalent of about 12 bits of dynamic range, which is fine. Many fabled analog pieces of equipment have terrible signal-noise characteristics, but are still valued for other reasons (coloration, distortion etc...)-which is all fine by me.
If you have the time, watch the two videos that xiph.org did a few years ago. There's a great in-depth explanation, as well as a hands-on demonstration of this reality.
I got a Denon one. I haven't played any SACD on it yet (I got it for bluray), though I guess I could easily find some at that video rental store (in Tokyo).
> Because digital filters have few of the practical limitations of an analog filter, we can complete the anti-aliasing process with greater efficiency and precision digitally. The very high rate raw digital signal passes through a digital anti-aliasing filter, which has no trouble fitting a transition band into a [...]
My understanding: if you have an analog filter of a given steepness, the only way to further reduce aliasing effects digitally is oversampling. Or: a less steep (cheaper) analog filter plus oversampling is the same as a steeper (more expensive) analog filter. People tend to say digital anti-aliasing filters when they really mean oversampling.
"24/192 music downloads make no sense" seems to be a thoroughly researched and carefully written article. It explains oversampling very well, possible confusion with digital filtering (anti-aliasing or not) is out of question. But then it goes on to talk about digital anti-aliasing filters, which makes me afraid I could be wrong.
Do digital anti-aliasing filters exist?
> My understanding: if you have an analog filter of a given steepness, the only way to further reduce aliasing effects digitally is oversampling. Or: a less steep (cheaper) analog filter plus oversampling is the same as a steeper (more expensive) analog filter. People tend to say digital anti-aliasing filters when they really mean oversampling
You're right, and it's actually both. The ADC can run at a much higher sample rate with a cheaper analog filter, and then that digital signal is again passed through a digital filter and downsampled.
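A sketch of that two-stage idea with scipy; the rates and test frequencies are arbitrary assumptions:

    # Oversampled capture, then a digital anti-aliasing filter plus
    # downsampling. A 192 kHz "capture" is reduced 4:1 to 48 kHz; the FIR
    # lowpass inside scipy.signal.decimate is the digital anti-alias filter.
    import numpy as np
    from scipy import signal

    rate_hi = 192_000
    t = np.arange(rate_hi) / rate_hi
    # Audible 1 kHz tone plus a 60 kHz ultrasonic component that a cheap
    # analog filter would have let through.
    x = np.sin(2 * np.pi * 1_000 * t) + 0.5 * np.sin(2 * np.pi * 60_000 * t)

    y = signal.decimate(x, 4, ftype="fir")   # lowpass + 4:1 downsample
    # Without the digital filter, 60 kHz would alias down to 12 kHz; here
    # it is attenuated by the FIR lowpass before the rate is reduced.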
Or generate a tone sweep in audacity.
You lose the ability to hear high-frequency sounds as you age.
Personally I can hear up to about 14 kHz.
If more people prefer the sound at the higher bitrate and sampling rate, then that's the better format, even if there's no technical reason why that format is superior.
Much like how some people prefer the "warm" sound of tube amps, even if that means more distortion.
>Empirical evidence from listening tests backs up the assertion that 44.1kHz/16 bit provides highest-possible fidelity playback.
You can read the article if you want to find the actual references. No one is arguing that higher rates/bits produces any sort of distortion that anyone would prefer.
The difference from my perspective is that an amp is a tool for sound production while a digital music format is a tool for sound reproduction. When producing sound, choosing more distortion over less distortion is a valid choice. When reproducing sound, the goal should be accurate reproduction of the original.
That's a pretty cool feature for Ozone 7, for sure! I'm still using Ozone 5 and don't feel a need to upgrade, but that might make it...
The published papers tend to be by the same people making the supertweeters.
Unfortunately, there is no point to distributing music in 24-bit/192kHz format. Its playback fidelity is slightly inferior to 16/44.1 or 16/48, and it takes up 6 times the space.
This has all been known to anyone with actual signal processing and/or audio engineering knowledge for a long time now. As in, common knowledge to the kinds of folks attending the AES conference at least back to ~2001 or so. The high sample rate/bit depth stuff is useful for production process, but irrelevant for final distribution.
Those who don't have an oscilloscope can see the picture here:
See the digital media primer 2 for more information on that: https://wiki.xiph.org/Videos/Digital_Show_and_Tell
If humans were able to hear audio above 22kHz (or whatnot) in any meaningful way, we'd expect to be able to demonstrate that effect in carefully controlled studies, and then that lack of low-passing may matter; but that isn't what the best evidence so far shows.
I've never seen the hype from artists about 24/192 as being about better listening experience. It's about handing their consumers a better master so as to encourage and enable more of them to be remixers.
Edit: christ, I mixed up bitrates (e.g. 192kbps) with sampling frequency (e.g. 192kHz) again. I was referring to 64kbps streams.
Edit: apparently my memory is worse than I thought.
In order to decimate a signal to 44.1 or 48khz, and preserve high-frequency content, high frequencies need to be phase-shifted.
This phase-shift is similar to how lossy codecs work.
For what it's worth: I'm a big fan of music in surround, and most of it comes in high sampling rates. When I investigated ripping my DVD-As and Blurays, I found that they never have music over 20khz. It's all filtered out. However, downsampling to 44.1 or 48khz isn't "lossless" because of the phase shift needed due to the Nyquist-Shannon theorem.
I still rip my DVD-As at 48khz, though. There isn't a good lossless codec that can preserve phase at high frequencies, yet approach the bitrate of 12/48 flac.
Your understanding of sampling theorem is incorrect. Sampling alone (not quantization, of course) is completely lossless under the critical frequency.
We demonstrated this in a very clear way near the end, at about 21 minutes in, on the primer two video: http://www.xiph.org/video/vid2.shtml where we show a square wave being phase shifted tiny fractions of the intersample length.
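A self-contained way to see the same thing with scipy/numpy; FFT-based resampling is assumed, and the integer-cycle test tone keeps the signal exactly periodic so the result is clean:

    # Downsample a band-limited tone 2:1 and check that its sub-sample
    # time shift survives the rate change.
    import numpy as np
    from scipy import signal

    rate_hi, rate_lo = 96_000, 48_000
    delay = 1e-6                                  # 1 us, far below one sample
    t_hi = np.arange(rate_hi) / rate_hi
    t_lo = np.arange(rate_lo) / rate_lo

    x_hi = np.sin(2 * np.pi * 1000 * (t_hi - delay))
    x_lo = signal.resample(x_hi, rate_lo)         # ideal lowpass + 2:1
    reference = np.sin(2 * np.pi * 1000 * (t_lo - delay))

    print(np.max(np.abs(x_lo - reference)))       # ~1e-12: shift preserved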
> In order to decimate a signal to 44.1 or 48khz, and preserve high-frequency content, high frequencies need to be phase-shifted.
What do you mean by high frequency? If you mean frequencies below but near the Nyquist frequency then no, there is no phase shift. If you mean at or above...
I'm struggling to avoid a blatant appeal to authority here, but your position is that the author of the Ogg Vorbis codec doesn't understand digital sampling, which seems challenging to believe.
> So the math is ideal, but what of real world complications? The most notorious is the band-limiting requirement. Signals with content over the Nyquist frequency must be lowpassed before sampling to avoid aliasing distortion; this analog lowpass is the infamous antialiasing filter. Antialiasing can't be ideal in practice, but modern techniques bring it very close. ...and with that we come to oversampling.
if you accept that the limit of hearing is around 20 kHz, then you must also accept that frequencies above that can freely be removed without loss of fidelity to the human ear.
the article notes that higher frequencies can be heard, but only in the form of ultrasonic intermodulation distortion. (i.e. not in fact the higher frequencies at all)
See Vanderkooy and Lipshitz 1987 for why.
What dithering does is it decorrelates the quantization noise with the signal. Absent it, quantization generates harmonic spurs. In theory, on a very clean and noiseless signal these harmonic spurs might be more audible than you'd expect from the overall quantization level.
In practice, 16 bits is enough precision that these harmonics are inaudible even in fairly pathological cases. But dithering eliminates the potential problem by replacing the harmonic content with white noise.
Adding noise on playback just adds noise; it would not remove the harmonic generation.
The _best_ kind of dithering scheme is a subtractive dither, where noise is added before quantization and then the _same_ noise is subtracted from the dequantized signal on playback. This is best in the sense that it's the scheme that completely eliminates the distortion with the least amount of additional noise power. But it's not ever used for audio applications, due to the additional complexity of managing the synchronized noise on each side.
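A sketch of that subtractive scheme in numpy, with a shared PRNG seed standing in for the synchronized noise; the seed, tone, and quantizer step are arbitrary:

    # Subtractive dither: encoder adds noise before quantizing; decoder
    # regenerates the identical noise from a shared seed and subtracts it,
    # leaving decorrelated error with no harmonic spurs.
    import numpy as np

    SEED = 1234                                  # would ship with the audio
    step = 1 / 2 ** 15                           # 16-bit quantizer step

    t = np.arange(48_000) / 48_000
    x = 0.3 * np.sin(2 * np.pi * 440 * t)        # test tone

    enc_noise = np.random.default_rng(SEED).uniform(-step / 2, step / 2, x.size)
    stored = np.round((x + enc_noise) / step) * step    # what gets stored

    dec_noise = np.random.default_rng(SEED).uniform(-step / 2, step / 2, x.size)
    recovered = stored - dec_noise               # playback-side subtraction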
Mersenne twister with a shared seed in metadata?
Now supposing we add noise to our signal before we quantize. A given pixel at 25% gray (which under the previous scheme would always end up solid black) now has a 25% chance of ending up white. A contiguous block of such pixels will have an average value of 25% gray, even though an individual pixel can only be black or white. Thus, by flip-flopping between the two closest values ("dithering") in statistical proportion to the original signal, information is preserved.
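That image example is easy to reproduce numerically; a tiny sketch, assuming a threshold-at-0.5 one-bit quantizer and uniform noise:

    # Quantize a flat 25%-gray "image" to one bit, with and without noise
    # added before the quantizer.
    import numpy as np

    rng = np.random.default_rng(0)
    gray = np.full(100_000, 0.25)

    hard = (gray > 0.5).astype(float)            # no dither: everything black
    dithered = (gray + rng.uniform(-0.5, 0.5, gray.size) > 0.5).astype(float)

    print(hard.mean())                           # 0.0  -- the 25% level is lost
    print(dithered.mean())                       # ~0.25 -- preserved on average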
In audio, if I recall correctly, it is also important to avoid obvious noise modulation.