This was posted almost a year ago and enjoyed a decent conversation in the comments [0].
I'll repeat my comment from the time:
"For what it's worth, the presence of seemingly significant signal in the difference between the original and compressed tracks does not necessarily mean that significant sonic/perceptual loss has occurred. Operating correctly, the encoder is designed to cut not just sounds that the human ear cannot hear in general (e.g. sounds above 22 kHz) but also sounds which may not be perceptible in context (e.g. the quietest signals in a loud section). So if you find something beautiful about the ghost tracks (and I think there is something beautiful to find), don't immediately jump to concluding that mp3 is awful for cutting these sounds; they might be hardly perceptible when added to the mix.
Of course, at high compression rates mp3 does begin to significantly degrade fidelity.
Edit: all of this is not to put down the project. I still think it's pretty cool as art and as a demonstration of the encoder; I just didn't want people to think that this was some sort of massive failing of mp3."
The information present here isn't really "lost" as much as it can't be heard in the context of the other sounds in the original recording. These forms of audio compression take advantage of auditory masking[1] which means those sounds likely wouldn't be heard in the original.
A quick scan didn't reveal to me whether he's time-aligning the signals before the subtraction. I've played with listening to the wav-minus-mp3 difference signal before, and I seem to recall that the mp3 encoder would introduce a little delay.
I added a transient pulse in front of the music so that I could (visually) time align the signals before subtracting them.
(Dependencies: Python 2.7, lame, GNU make, mpg123, and (if you use FLAC files as input) flac. Tested on my Linux PC with LAME 64bits version 3.99.5 and mpg123 1.14.4 from the Debian stable package repository. Run with -h to get some "help".)
It uses lame to compress and mpg123 to decompress, and I don't know if there's something special going on but the output WAVs always seem to have the same number of samples as the original. And they seem to be aligned - if you use this program you'll find that the difference between WAV and 128kbps MP3 is somewhat noisy, but WAV vs 320kbps MP3 is pretty much silent.
(Or maybe you'll find something totally different! Who knows. I only tested this on my system.)
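One way to put a number on "somewhat noisy" versus "pretty much silent" is the RMS level of the difference signal. The sketch below is an assumption about how one might measure it, not part of the program described above; it expects samples already scaled to the range -1.0 to 1.0, and `rms_dbfs` is a made-up helper name.

```python
import math


def rms_dbfs(samples):
    """RMS level in dB relative to full scale (1.0); -inf for pure silence."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)


# Toy residuals standing in for two difference signals.
louder_residual = [0.1, -0.1, 0.1, -0.1]
quieter_residual = [0.001, -0.001, 0.001, -0.001]
print(rms_dbfs(louder_residual), rms_dbfs(quieter_residual))
```

A lower (more negative) figure means a quieter residual, though it says nothing about whether the residual would be audible in context.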
Neat, thanks for running the experiment to see how differing MP3 encoder settings affect the lost portion. This explains why 320kbps is generally accepted amongst DJs, as any loss is significantly less than that caused by the club sound system :P
I did a blind test when I was in my 20s, and while on a couple of tracks I could actually tell the difference between 320kbps and the original, I did have to concentrate. And I couldn't really have said that one was necessarily better than the other; the effect was as if one type of noise-y sound was being replaced with a subtly different type of sound with the same noise-y quality. Different, but overall the same.
Listening to the diff of one of those tracks today was interesting! All I can hear is the drums... and where the sound I'm thinking of plays, it sounds like rather quiet interference! But the drums as I recall sounded absolutely identical. Interesting that the ears can detect one thing but not the other.
(I didn't bother to re-run the full comparison, as I'm no longer in my 20s. One good (?) thing about getting old is that your hearing deteriorates, and issues such as this become moot. You can also afford the disk space to just compress everything at 320kbps. Then you don't have to worry about it, and it fits OK on your phone too.)
(Note: no idea what mp3 encoder Audacity uses, and I'm sure the results will vary with encoder settings as well.)
I just fired up Audacity and generated a click track, with the first click at 1 second in. The exported 44.1k wav file, when loaded in Audacity, shows the click at exactly 44100 samples in.
The exported mp3 file, when loaded, shows the click to be around 46357 samples in. (It's a bit hard to measure, because the encoder has smeared the pulse.) That is somewhere between 51 and 52 ms late relative to the wav file.
Listening to the wav and mp3 ticks summed, the delay is obvious--they are not in sync at all. Adding 2257 samples of silence to the front end of the wav file puts them back in audible sync.
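The arithmetic behind those figures, as a small Python sketch. The constants are the numbers quoted above; `pad_front` is a hypothetical helper standing in for "add silence to the front" in an editor.

```python
SAMPLE_RATE = 44100   # Hz, the exported wav's sample rate
WAV_CLICK = 44100     # sample index of the first click in the wav
MP3_CLICK = 46357     # approximate index of the same click in the decoded mp3

delay_samples = MP3_CLICK - WAV_CLICK              # 2257
delay_ms = 1000.0 * delay_samples / SAMPLE_RATE    # a little over 51 ms


def pad_front(samples, count):
    """Prepend `count` samples of silence."""
    return [0.0] * count + list(samples)


# A one-click stand-in for the wav: silence, then a single full-scale sample.
wav = [0.0] * WAV_CLICK + [1.0]
padded = pad_front(wav, delay_samples)
# After padding, the click sits where the mp3's click was measured.
```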
He is probably dealing with this, given that the audio piece is not just "tomsdiner.wav - tomsdiner.mp3". There's a lot of processing happening after that:
----
Verse one finds the narrator in a bustling diner, making observations about her environment. The focus of this text is external to its author, as opposed to later verses which exist in a more subjective, internal space. Using different settings to harvest the lost material, I was able to isolate both clear, pitched content and more ephemeral transient signals.
Using the python library headspace, and a reverb model of a small diner, I began to construct a virtual 3-d space. Beginning by fragmenting and scrambling the more transient material, I applied head related transfer functions to simulate the background conversation one might hear in a diner. Tracking the amplitude of the original melody in the verse, I applied a loose amplitude envelope to these signals. Thus, a remnant of the original vocal line comes through in its amplitude contour.
Having constructed this background, prominent pitches from the original melody appear and disappear, located variously in this virtual space. These ephemeral sounds hint at a familiar melody, playing with aural memory and imagination, a flickering apparition hovering at the border of consciousness.
It played for me 30 minutes ago, but it doesn't anymore.
I thought it was an embedded YouTube video at first, but it's actually a .mov file hosted in Google Docs. First time I've noticed that way of hosting. Maybe they have a bandwidth limit?
How does he get the information lost in compression? Does he put the compressed and uncompressed version on two different tracks with one phase-flipped?
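On the mechanics of the question: mixing one track with a polarity-inverted ("phase-flipped") copy of the other is the same operation as subtracting them sample by sample. The sketch below only illustrates that equivalence in Python with made-up sample values; it is not a claim about how the artist actually produced the material.

```python
def invert(samples):
    """Flip polarity: every sample is negated."""
    return [-s for s in samples]


def mix(a, b):
    """Sum two equal-length tracks sample by sample."""
    return [x + y for x, y in zip(a, b)]


# Made-up, already time-aligned sample values.
original = [0.5, -0.25, 0.125, 0.0]
compressed = [0.5, -0.125, 0.125, 0.25]

# Whatever the two tracks have in common cancels; the rest is the "ghost".
ghost = mix(original, invert(compressed))
```

The cancellation only works if the two tracks are aligned to the sample, so any encoder delay has to be removed first.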
[0]: https://news.ycombinator.com/item?id=7955917