The Ghost in the MP3

daturkel · on Feb 14, 2015

This was posted almost a year ago and enjoyed a decent conversation in the comments [0].

I'll repeat my comment from the time:

"For what it's worth, the presence of seemingly significant signal in the difference between the original and compressed tracks does not necessarily mean that significant sonic/perceptual loss has occurred. When operating correctly, the encoder is designed to cut not just sounds that the human ear cannot hear in general (e.g. sounds above 22kHz) but also sounds which may not be perceptible in context (e.g. the quietest signals in a loud section). So if you find something beautiful about the ghost tracks (and I think there is something beautiful to find), don't immediately jump to concluding that mp3 is awful for cutting these sounds; they might be hardly perceptible when added to the mix.

Of course, at high compression ratios mp3 does begin to degrade fidelity significantly.

Edit: all of this is not to put down the project—I still think it's pretty cool as art and as a demonstration of the encoder, I just didn't want people to think that this was some sort of massive failing of mp3."

[0]: https://news.ycombinator.com/item?id=7955917

dankoss · on Feb 14, 2015

The information present here isn't really "lost" as much as it can't be heard in the context of the other sounds in the original recording. These forms of audio compression take advantage of auditory masking[1], which means those sounds likely wouldn't be heard in the original.
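To make the masking idea concrete, here is a rough sketch that asks whether a quiet tone would be hidden by a single louder tone nearby. It uses the Zwicker and Terhardt approximation of the Bark critical-band scale and the Schroeder spreading function from the psychoacoustics literature. The function names and the fixed `offset_db` are hypothetical illustration values; real encoders such as LAME use much richer psychoacoustic models (many maskers, the absolute threshold of hearing, tonal vs. noise-like components), so treat this as a toy, not as what an MP3 encoder actually computes.

```python
import math

def hz_to_bark(f_hz):
    """Zwicker & Terhardt approximation of the Bark critical-band scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def schroeder_spreading_db(dz):
    """Schroeder spreading function in dB; dz = probe Bark minus masker Bark.
    Peaks near 0 dB at dz = 0 and falls off faster below the masker than above it."""
    return 15.81 + 7.5 * (dz + 0.474) - 17.5 * math.sqrt(1.0 + (dz + 0.474) ** 2)

def is_masked(masker_hz, masker_db, probe_hz, probe_db, offset_db=10.0):
    """Crude single-masker check: is the probe below the masker's spread threshold?
    offset_db is an illustrative assumption, not a value from any MP3 encoder."""
    dz = hz_to_bark(probe_hz) - hz_to_bark(masker_hz)
    threshold_db = masker_db + schroeder_spreading_db(dz) - offset_db
    return probe_db < threshold_db

# A quiet 1.1 kHz tone next to a loud 1 kHz tone vs. the same quiet tone at 4 kHz.
print(is_masked(1000, 80, 1100, 40))  # True
print(is_masked(1000, 80, 4000, 40))  # False
```

The asymmetry of the spreading function (a shallower slope above the masker than below it) is why a loud tone hides quieter sounds slightly higher in pitch more readily than sounds below it.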

[1] http://en.wikipedia.org/wiki/Auditory_masking

sejje · on Feb 14, 2015

The information is lost in the sense that it's in the original, and not in the lossy version.

Whether or not you can hear it, the information is gone.

mdisraeli · on Feb 14, 2015

If you can't hear it, was it ever information?

daeken · on Feb 14, 2015

We can't see in infrared, but there's clearly information there. Same with infrasound/ultrasound and other sounds that are buried in our hearing.

It may not be pertinent information in the case of music, but it's definitely information.

sejje · on Feb 15, 2015

If you cover your eyes and can't see me, do I still exist?

TheOtherHobbes · on Feb 14, 2015

MP3 is audibly lossy, especially at lower bitrates.

So yes - you can hear it's gone.

And if you look at a spectrogram, you can see it's gone, too.

joe_bleau · on Feb 14, 2015

A quick scan didn't reveal to me whether he's time-aligning the signals during the subtraction. I've played with listening to the wav-mp3 signal before, and I seem to recall that the mp3 encoder would introduce a little delay.

I added a transient pulse in front of the music so that I could (visually) time align the signals before subtracting them.
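A minimal sketch of that click-based alignment, assuming both files are already decoded to float samples in [-1, 1] at the same sample rate and that the click is the first sample crossing a threshold. The function names and the 0.5 threshold are hypothetical; because the encoder smears the pulse, the detected onset can land a little early or late, so a lower threshold or a manual check may be needed.

```python
def first_onset(samples, threshold=0.5):
    """Index of the first sample whose magnitude reaches threshold (the click), or None."""
    for i, x in enumerate(samples):
        if abs(x) >= threshold:
            return i
    return None

def align_to_click(original, decoded, threshold=0.5):
    """Shift the decoded signal so its click lines up with the original's click.
    Returns (aligned_decoded, shift_in_samples)."""
    a = first_onset(original, threshold)
    b = first_onset(decoded, threshold)
    if a is None or b is None:
        raise ValueError("no click found above threshold")
    shift = b - a  # positive when the decoded copy lags the original
    if shift >= 0:
        aligned = list(decoded[shift:])  # drop the extra leading samples
    else:
        aligned = [0.0] * (-shift) + list(decoded)  # pad with silence
    return aligned[:len(original)], shift
```

This trims the decoded copy rather than padding the original with silence as described above; either way the two signals line up for subtraction.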

to3m · on Feb 14, 2015

I wondered what my favourite test tracks would sound like, so I made a (somewhat stupid, and a bit slow...) program to produce the difference between an MP3 and a WAV: https://github.com/tom-seddon/bin/blob/master/find_mp3_resid...

(Dependencies: python 2.7, lame, GNU make, mpg123, and (if you use FLAC files as input) flac. Tested on my Linux PC with LAME 64bits version 3.99.5 and mpg123 1.14.4 from the debian stable package repository. Run with -h to get some "help".)
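For readers without the script, the encode/decode round trip it describes can be sketched with Python's `subprocess`, assuming `lame` and `mpg123` are on the PATH. This is not to3m's actual program (which is Python 2.7 and driven by GNU make); the function names are hypothetical, and only the basic `lame -b <kbps> in.wav out.mp3` and `mpg123 -w out.wav in.mp3` invocations are used.

```python
import subprocess

def round_trip_commands(wav_in, mp3_path, wav_out, bitrate_kbps=128):
    """Build the encode (LAME, constant bitrate) and decode (mpg123 to WAV) command lines."""
    encode = ["lame", "-b", str(bitrate_kbps), wav_in, mp3_path]
    decode = ["mpg123", "-w", wav_out, mp3_path]
    return encode, decode

def round_trip(wav_in, mp3_path, wav_out, bitrate_kbps=128):
    """Run both steps in order; raises CalledProcessError on a non-zero exit."""
    for cmd in round_trip_commands(wav_in, mp3_path, wav_out, bitrate_kbps):
        subprocess.run(cmd, check=True)
```

Running it once at 128 and once at 320 gives the two decoded files to compare against the original.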

It uses lame to compress and mpg123 to decompress. I don't know if there's something special going on, but the output WAVs always seem to have the same number of samples as the original, and they seem to be aligned. If you use this program, you'll find that the difference between WAV and 128kbps MP3 is somewhat noisy, but WAV vs 320kbps MP3 is pretty much silent.
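One way to put a number on "somewhat noisy" vs. "pretty much silent" is to compare the RMS level of the residual with that of the original. A minimal Python 3 sketch, assuming mono 16-bit WAVs that are already decoded and time-aligned; the helper names are hypothetical and not taken from the linked script.

```python
import math
import sys
import wave
from array import array

def read_pcm16_mono(path):
    """Read a mono 16-bit PCM WAV and return samples as floats in [-1, 1)."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2 or w.getnchannels() != 1:
            raise ValueError("expected mono 16-bit PCM")
        data = array("h")
        data.frombytes(w.readframes(w.getnframes()))
    if sys.byteorder == "big":
        data.byteswap()  # WAV sample data is little-endian
    return [x / 32768.0 for x in data]

def residual(original, decoded):
    """Sample-by-sample difference (the 'ghost'), truncated to the shorter signal."""
    n = min(len(original), len(decoded))
    return [original[i] - decoded[i] for i in range(n)]

def rms(samples):
    return math.sqrt(sum(x * x for x in samples) / len(samples)) if samples else 0.0

def relative_level_db(res, original):
    """Residual RMS relative to the original's RMS, in dB (more negative = quieter)."""
    r, o = rms(res), rms(original)
    if o == 0.0:
        raise ValueError("original is silent")
    if r == 0.0:
        return float("-inf")
    return 20.0 * math.log10(r / o)
```

Usage would be along the lines of `relative_level_db(residual(orig, dec), orig)` for each bitrate; a lower (more negative) figure for the 320kbps file would match the "pretty much silent" observation.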

(Or maybe you'll find something totally different! Who knows. I only tested this on my system.)

mdisraeli · on Feb 15, 2015

Neat, thanks for running the experiment to see how differing MP3 encoder settings affect the lost portion. This explains why 320kbps is generally accepted amongst DJs, as any loss is significantly less than that caused by the club sound system :P

to3m · on Feb 15, 2015

I did a blind test when I was in my 20s, and while on a couple of tracks I could actually tell the difference between 320kbps and the original, I did have to concentrate. And I couldn't really have said that one was necessarily better than the other; the effect was as if one type of noisy sound was being replaced with a subtly different type of sound with the same noisy quality. Different, but overall the same.

Listening to the diff of one of those tracks today was interesting! All I can hear is the drums... and where the sound I'm thinking of plays, it sounds like rather quiet interference! But the drums as I recall sounded absolutely identical. Interesting that the ears can detect one thing but not the other.

(I didn't bother to re-run the full comparison, as I'm no longer in my 20s. One good (?) thing about getting old is that your hearing deteriorates, and issues such as this become moot. You can also afford the disk space to just compress everything at 320kbps. Then you don't have to worry about it, and it fits OK on your phone too.)

mdisraeli · on Feb 14, 2015

320kbps with the highest quality setting is pretty much an industry standard now, and many DJs, myself included, make use of that.

As you've looked into this before, do you know what the similar difference is like for such professional-grade encoding?

joe_bleau · on Feb 14, 2015

(Note: no idea what mp3 encoder Audacity uses, and I'm sure the results will vary with encoder settings as well.)

I just fired up Audacity and generated a click track, with the first click at 1 second in. The exported 44.1 kHz wav file, when loaded in Audacity, shows the click at exactly 44100 samples in.

The exported mp3 file, when loaded, shows the click to be around 46357 samples in. (It's a bit hard to measure, because the encoder has smeared the pulse.) That's somewhere between 51 and 52 ms late relative to the wav file.

Listening to the wav and mp3 ticks summed, the delay is obvious--they are not in sync at all. Adding 2257 samples of silence to the front end of the wav file puts them back in audible sync.
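The delay figures above are consistent with each other: the offset in samples divided by the 44.1 kHz sample rate gives the time delay.

```latex
\Delta n = 46357 - 44100 = 2257\ \text{samples}, \qquad
\Delta t = \frac{2257}{44100\ \text{Hz}} \approx 0.0512\ \text{s} \approx 51.2\ \text{ms}
```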

egypturnash · on Feb 14, 2015

He is probably dealing with this, given that the audio piece is not just "tomsdiner.wav - tomsdiner.mp3". There's a lot of processing happening after that:

----

Verse one finds the narrator in a bustling diner, making observations about her environment. The focus of this text is external to its author, as opposed to later verses which exist in a more subjective, internal space. Using different settings to harvest the lost material, I was able to isolate both clear, pitched content and more ephemeral transient signals.

Using the python library headspace, and a reverb model of a small diner, I began to construct a virtual 3-d space. Beginning by fragmenting and scrambling the more transient material, I applied head related transfer functions to simulate the background conversation one might hear in a diner. Tracking the amplitude of the original melody in the verse, I applied a loose amplitude envelope to these signals. Thus, a remnant of the original vocal line comes through in its amplitude contour.

Having constructed this background, prominent pitches from the original melody appear and disappear, located variously in this virtual space. These ephemeral sounds hint at a familiar melody, playing with aural memory and imagination, a flickering apparition hovering at the border of consciousness.

----

- found near the bottom of http://theghostinthemp3.com/theghostinthemp3.html
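The "loose amplitude envelope" step described in that quote can be approximated with a standard one-pole envelope follower. This is a generic sketch, not the artist's actual processing (the post only names the `headspace` library, a reverb model, and HRTFs); the function names and the attack and release times are arbitrary assumptions.

```python
import math

def envelope_follower(samples, sample_rate, attack_s=0.01, release_s=0.2):
    """One-pole peak follower: moves toward |x| quickly when the level rises
    (attack) and slowly when it falls (release)."""
    attack = math.exp(-1.0 / (attack_s * sample_rate))
    release = math.exp(-1.0 / (release_s * sample_rate))
    env, out = 0.0, []
    for x in samples:
        level = abs(x)
        coeff = attack if level > env else release
        env = coeff * env + (1.0 - coeff) * level
        out.append(env)
    return out

def apply_envelope(signal, envelope):
    """Scale each sample of signal by the corresponding envelope value."""
    return [s * e for s, e in zip(signal, envelope)]
```

In this setup the envelope would be taken from the original vocal and applied to the scrambled residual fragments, so the melody's loudness contour survives even when its pitches do not.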

sukilot · on Feb 14, 2015

That seems to me to mean that the author composed new audio, and isn't presenting "wav minus mp3".

mdisraeli · on Feb 14, 2015

That would explain the phasing/flanging-like sound which gives the ghost recording such an eerie feel.

oakwhiz · on Feb 14, 2015

You could solve this automatically with time shifted convolution/correlation with the original signal.
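A brute-force version of that idea: slide one signal against the other and keep the lag with the largest cross-correlation. This is a pure-Python sketch with a hypothetical function name; it assumes equal sample rates, and for whole songs an FFT-based correlation (or correlating only a short excerpt) would be much faster.

```python
def estimate_delay(reference, delayed, max_lag):
    """Return the lag (in samples) in [-max_lag, max_lag] that maximizes the
    cross-correlation sum(reference[n] * delayed[n + lag]).
    Brute force, O(len(reference) * max_lag): fine for short excerpts only."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, x in enumerate(reference):
            m = n + lag
            if 0 <= m < len(delayed):  # only count overlapping samples
                score += x * delayed[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

A positive result means `delayed` lags `reference`; drop that many leading samples from it (or pad the reference) before subtracting.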

im3w1l · on Feb 14, 2015

The file we are watching is lossily compressed. So we are watching the lossy compression of a delta between original and lossy compression.

How good is the lossy compression at capturing that delta?

Buge · on Feb 14, 2015

It gives an error when I try to play the video in Firefox or Chrome.

_jomo · on Feb 14, 2015

Searched for it on YouTube; someone uploaded it 2 minutes ago: https://www.youtube.com/watch?v=DkQ2p5QSbyc

tveita · on Feb 14, 2015

It played for me 30 minutes ago, but it doesn't anymore.

I thought it was an embedded YouTube video at first, but it's actually a .mov file hosted in Google Docs. First time I've noticed that way of hosting; maybe they have a bandwidth limit?

eitland · on Feb 14, 2015

Played fine on my phone (Android) just now. Seems like they are using Vimeo now.

rMBP · on Feb 14, 2015

Safari checking in.

chanux · on Feb 14, 2015

Suzanne Vega - Tom's Dinner https://www.youtube.com/watch?v=FLP6QluMlrg

intopieces · on Feb 14, 2015

If this kind of thing interests you, I highly recommend the book "MP3: The Meaning of a Format" by Jonathan Sterne [0]

[0] https://www.dukeupress.edu/MP3/

MrJagil · on Feb 14, 2015

How does he get the information lost in compression? Does he put the compressed and uncompressed versions on two different tracks with one phase-flipped?
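Summing a track with a polarity-inverted copy of another is arithmetically the same as subtracting it, which is the usual "null test" in a DAW. Strictly, this is a polarity inversion (every sample's sign flipped) rather than a phase shift. A sketch on plain float samples with hypothetical function names; it assumes the two tracks are already time-aligned and at identical gain, otherwise the result is dominated by the misalignment rather than by what the encoder discarded.

```python
def invert_polarity(samples):
    """Flip the sign of every sample (what a DAW's 'phase invert' button does)."""
    return [-x for x in samples]

def mix(a, b):
    """Sum two tracks sample by sample at unity gain."""
    return [x + y for x, y in zip(a, b)]

def null_test(original, decoded):
    """Original plus polarity-inverted decode, i.e. original minus decoded."""
    return mix(original, invert_polarity(decoded))
```

Identical inputs cancel to silence; whatever survives is the difference between the two files.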

magwhyr · on Feb 14, 2015

on vimeo: https://vimeo.com/107845118