
MP3 for Image Compression (2006) - joshumax
http://keyj.emphy.de/mp3-for-image-compression/
======
robert_foss
I just posted the article[1] that I think prompted this script to be brought
up.

What I did instead was to run images through an audio editing tool, which lets
you apply echoes or do mind-boggling things like change the volume of the
image. The script can be found on GitHub[2].

[1] [http://memcpy.io/audio-editing-images.html](http://memcpy.io/audio-editing-images.html)

[2] [https://github.com/robertfoss/audio_shop/](https://github.com/robertfoss/audio_shop/)
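
Roughly the idea, as a minimal Python sketch (not the actual audio_shop script; the filenames, the 44.1 kHz / 8-bit interpretation and the echo parameters are just placeholders, and it assumes Pillow plus a sox binary on PATH):

    import subprocess
    from PIL import Image

    img = Image.open("in.png").convert("L")        # grayscale, one byte per pixel
    raw = img.tobytes()
    with open("in.raw", "wb") as f:
        f.write(raw)

    # Pretend the pixel stream is 44.1 kHz unsigned 8-bit mono audio and add an echo.
    subprocess.run(
        ["sox", "-t", "raw", "-r", "44100", "-e", "unsigned-integer", "-b", "8",
         "-c", "1", "in.raw", "-t", "raw", "-e", "unsigned-integer", "-b", "8",
         "-c", "1", "out.raw", "echo", "0.8", "0.88", "60", "0.4"],
        check=True)

    # The effect can lengthen the stream, so pad/trim back to the original pixel count.
    with open("out.raw", "rb") as f:
        out = f.read().ljust(len(raw), b"\x00")[:len(raw)]
    Image.frombytes("L", img.size, out).save("out.png")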

~~~
lcrs
Nice script :) I did a similar thing by hand with graphicsmagick and sox a
couple years ago - the most fun result was from wahwah, pretty much a wide
bandpass filter with centre frequency varying cyclically over time:
[http://lewisinthelandofmachines.tumblr.com/post/59405537096/...](http://lewisinthelandofmachines.tumblr.com/post/59405537096/the-cover-of-ripley-through-a-wah-wah-audio)

~~~
robert_foss
Thank you!

I haven't even tried the wahwah effect; you've definitely piqued my interest.

I had a look through the SoX effects list[1] for a wahwah but couldn't seem to
find it. The closest thing I found was the 'flanger' effect[2].

[1] [http://sox.sourceforge.net/sox.html#EFFECTS](http://sox.sourceforge.net/sox.html#EFFECTS)

[2] [https://allg.one/zKMP](https://allg.one/zKMP)

~~~
lcrs
Shamefully I can't remember which wahwah I used - I was treating the audio
with both Audacity and Pro Tools as well as sox though...

------
kardos
The obvious question here is: what does JPEG encoded music sound like?

~~~
JoshCole
I'm more curious about a different idea. If you hook electrodes up to someone's
tongue and start activating them according to the amount of light being
received, that person quickly develops the ability to see via their tongue.

So what happens when you encode visual information into sound, so that you have
a continuous sound representation of the environment, and you just keep that
running in real time? We already know that this mechanism exists in nature and
can even be developed in humans, since echolocation is a thing.

Edit: It occurs to me that actually doing sensory experiments like this might
really mess with someone's head... maybe not a good idea to play around with it
casually.

Edit 2: If we were going to swap out one of our senses for another, which
should we get rid of and which should we gain?

~~~
dr_zoidberg
Actually, there are (and have been for a few years already) glasses that scan
the image you'd see and translate it into a sound that takes a bit of training
to understand as an image[0]. Some people have gotten used to them, while
others have gone the way of the bat and learned to echolocate[1].

[0] [http://gizmodo.com/these-synthesia-glasses-help-blind-people...](http://gizmodo.com/these-synthesia-glasses-help-blind-people-see-via-son-745027691)

[1] [https://en.wikipedia.org/wiki/Human_echolocation](https://en.wikipedia.org/wiki/Human_echolocation)

------
xamuel
There's a whole subreddit for things like this.
[https://www.reddit.com/r/glitch_art/](https://www.reddit.com/r/glitch_art/)

------
bwang29
Now I wonder: what happens if you add sound effects to change the pitch or
tone, or to widen the sound stage? What would that do to the decoded image?

I think the 2.00 bits/pixel result looks a fair bit more "analog", with a
film-grain effect.

~~~
moutansos
I did this with Audacity to some bmp images. I got some interesting results. I
just read in the bmp files as raw data and manipulated them that way.

~~~
gedy
I did something similar a long time ago with zip-compressed TIFFs and JPGs.
Added reverb and the results were really unusual:
[https://vimeo.com/105317804](https://vimeo.com/105317804)

~~~
peterwwillis
I really wanted salad fingers to pop out of my screen.

Thanks for that.

------
avian
I did something similar with Ogg Vorbis back in 2006.

There are also some results of experiments with the Opus codec posted in the
comments.

[https://www.tablix.org/~avian/blog/archives/2006/01/lossy_co...](https://www.tablix.org/~avian/blog/archives/2006/01/lossy_compression/)

------
amelius
MP3 is inherently a one-dimensional codec, whereas JPEG is two-dimensional. No
wonder JPEG performs much better.

~~~
pslam
JPEG transforms 2D images into 1D arrays using a "zigzag" ordering of 8x8
blocks. See this diagram on wikipedia:

[https://en.wikipedia.org/wiki/JPEG#/media/File:JPEG_ZigZag.s...](https://en.wikipedia.org/wiki/JPEG#/media/File:JPEG_ZigZag.svg)

It's a shame the author didn't do the same transformation, because it would
de-correlate a lot of the error noise. You can see at the highest compression
settings that the "MP3" image compression smears everything horizontally. If it
used a zigzag transformation, the result would be smeared both horizontally and
vertically, but probably less visually objectionable.
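
For reference, a tiny Python sketch of that zigzag scan (a hypothetical helper, not code from the article): walk the anti-diagonals of the block, alternating direction, which orders the 64 positions roughly from low to high frequency.

    def zigzag_indices(n=8):
        order = []
        for s in range(2 * n - 1):                  # s = row + col, one anti-diagonal
            diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
            if s % 2 == 0:
                diag.reverse()                      # even diagonals run bottom-left to top-right
            order.extend(diag)
        return order

    # zigzag_indices()[:6] -> [(0,0), (0,1), (1,0), (2,0), (1,1), (0,2)],
    # matching the Wikipedia diagram linked above.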

~~~
cperciva
_JPEG transforms 2D images into 1D arrays using a "zigzag" ordering of 8x8
blocks._

No. The zigzag ordering is applied to the _frequency components_, not to the
_image pixels_.

~~~
xorblurb
What if you try to compress the frequency components, scanned in zig-zag order,
using MP3 (minus the initial FFT-like layers, if they exist, I guess)? If that
even makes any sense...

~~~
cperciva
That doesn't really make sense. MP3 takes inputs in signal-space, not
frequency-space. You _could_ run:

1. Divide image into blocks (as in JPEG),

2. Perform two-dimensional FFTs (as in JPEG),

3. Scan frequency components in zig-zag order (as in JPEG),

4. Run all of the steps of MP3 compression aside from the initial "split audio into blocks" and "perform FFTs" stages.

That would pretty much just give you a less efficient version of JPEG; both
JPEG and MP3 take advantage of knowing how much each frequency component
"matters" (i.e., how precisely it's necessary to encode the value to avoid
artifacts noticed by humans), so using the MP3 quantization logic on frequency
amplitudes from images would result in wasting bits by encoding certain
amplitudes more precisely than is useful.
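
A rough Python sketch of steps 1-3 (nobody's actual codec; note that JPEG's transform is a DCT rather than an FFT, so this uses scipy's dctn, plus a zigzag helper like the one sketched a few comments up):

    import numpy as np
    from scipy.fft import dctn

    def image_to_coeff_stream(img, n=8):
        """img: 2D uint8 array. Returns a 1D stream of block-DCT coefficients."""
        h, w = img.shape[0] - img.shape[0] % n, img.shape[1] - img.shape[1] % n
        zz = zigzag_indices(n)                      # helper from the earlier sketch
        stream = []
        for y in range(0, h, n):
            for x in range(0, w, n):
                block = img[y:y + n, x:x + n].astype(float) - 128.0  # level shift, as in JPEG
                coeffs = dctn(block, norm="ortho")                   # step 2: 2D DCT per block
                stream.extend(coeffs[i, j] for i, j in zz)           # step 3: zigzag scan
        return np.asarray(stream)

    # Step 4 would then hand this stream to the MP3 quantizer/Huffman stages,
    # which is where the perceptual-model mismatch described above kicks in.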

------
dheera
Our ears are more sensitive to amplitude errors than phase errors (as a
function of frequency, in frequency space). Our eyes are the opposite.

------
heywire
I love this. I am really hoping others come to the comments to share similar
"misuse" of technology stories.

~~~
dhbx9
As long as we're dealing with signals, everything is within the realm of
possibility.

~~~
amelius
I'm hoping for someone to drive the ignition timing of a spark-ignition
internal-combustion engine with MP3 data, and report what it sounds like.

~~~
heywire
Not exactly what you were asking for, but made me think of these:

[https://www.youtube.com/watch?v=Tr4zb-HHZs4](https://www.youtube.com/watch?v=Tr4zb-HHZs4)

and

[https://www.youtube.com/watch?v=Ee5evlN8Bbs](https://www.youtube.com/watch?v=Ee5evlN8Bbs)

and if you go down the rabbit hole, you end up here:

[https://www.youtube.com/watch?v=b9UO9tn4MpI](https://www.youtube.com/watch?v=b9UO9tn4MpI)

------
BinaryBullet
Not really related, but a while back I wrote a script to visualize/hear audio
generation loss with different file formats:

[https://github.com/skratchdot/audio-generation-loss/tree/mas...](https://github.com/skratchdot/audio-generation-loss/tree/master/files/loop01)

So, MP3s add a bunch of silence to the beginning of the file, and Ogg files
start to "chirp". I never got around to putting this info in a consumable,
easy-to-understand format though. The videos in these folders just
continuously re-encode a source file with a given lossy format.

See also:
[https://en.wikipedia.org/wiki/Generation_loss](https://en.wikipedia.org/wiki/Generation_loss)
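
The core loop is basically this (an illustrative Python sketch, not the actual script from the repo; it assumes an ffmpeg build with libmp3lame, and the filenames, generation count and bitrate are made up):

    import subprocess

    src = "loop01.wav"
    for gen in range(1, 51):
        mp3 = f"gen{gen:03d}.mp3"
        wav = f"gen{gen:03d}.wav"
        # Encode the previous generation to MP3, then decode it back to WAV,
        # so each pass accumulates another round of lossy-codec damage.
        subprocess.run(["ffmpeg", "-y", "-i", src, "-codec:a", "libmp3lame",
                        "-b:a", "128k", mp3], check=True)
        subprocess.run(["ffmpeg", "-y", "-i", mp3, wav], check=True)
        src = wav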

~~~
jedimastert
Kinda sounds like "I Am Sitting in a Room"

[https://en.m.wikipedia.org/wiki/I_Am_Sitting_in_a_Room](https://en.m.wikipedia.org/wiki/I_Am_Sitting_in_a_Room)

~~~
BinaryBullet
Thanks for the link! I hadn't heard of this before.

------
E6300
It would be interesting to see if the horizontal artifacts could be avoided by
feeding the pixels in a different order to the encoder.

~~~
derefr
Ordering the pixel data by their locations along a space-filling curve, maybe.

~~~
jay-anderson
Definitely:
[https://en.wikipedia.org/wiki/Hilbert_curve](https://en.wikipedia.org/wiki/Hilbert_curve).
I'd still expect the final result to look worse than jpeg, but it'd be a much
more interesting comparison.
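
A minimal sketch of the standard distance-to-coordinate routine from the Wikipedia article (illustrative only; the 512x512 grid in the usage note is just an example):

    def hilbert_d2xy(n, d):
        """Map distance d along the curve to (x, y) on an n x n grid (n a power of 2)."""
        x = y = 0
        s, t = 1, d
        while s < n:
            rx = 1 & (t // 2)
            ry = 1 & (t ^ rx)
            if ry == 0:                      # rotate the quadrant when needed
                if rx == 1:
                    x, y = s - 1 - x, s - 1 - y
                x, y = y, x
            x += s * rx
            y += s * ry
            t //= 4
            s *= 2
        return x, y

    # Feed pixels to the encoder in Hilbert order instead of raster order:
    # order = [hilbert_d2xy(512, d) for d in range(512 * 512)]
    # samples = [pixels[y][x] for x, y in order]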

~~~
mxfh
Or split the image into its frequency domains first and then run the MP3
compression along a spatial curve, but at this point you're already doing a
[https://en.wikipedia.org/wiki/Discrete_cosine_transform](https://en.wikipedia.org/wiki/Discrete_cosine_transform),
a core part of JPEG compression, which was actually adapted for MP3:
[https://en.wikipedia.org/wiki/Modified_discrete_cosine_trans...](https://en.wikipedia.org/wiki/Modified_discrete_cosine_transform)

~~~
leetbulb
Before I clicked the OP link, I originally thought it was going to be this
type of implementation...cool nonetheless, but would really like to see
someone try this and share!

------
tonymillion
They're both based on the DCT. MP3 (and AAC, Vorbis, and more) uses a modified
DCT that overlaps blocks to mitigate aberrations at the block boundaries.

It's no surprise it works; however, you wouldn't necessarily get "as good"
compression as you would from an optimized DCT coder (JPEG etc.), because of
the data duplication (2x for the overlapping blocks).

See
[https://en.wikipedia.org/wiki/Modified_discrete_cosine_trans...](https://en.wikipedia.org/wiki/Modified_discrete_cosine_transform)
[https://en.wikipedia.org/wiki/Discrete_cosine_transform](https://en.wikipedia.org/wiki/Discrete_cosine_transform)
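
For the curious, a toy (naive, O(N^2)) MDCT of one window, just to show the overlap structure described above: 2N input samples yield N coefficients, and consecutive windows advance by N samples, so every sample is covered by two windows.

    import numpy as np

    def mdct(x):
        """Naive MDCT: x has length 2N, returns N coefficients."""
        N = len(x) // 2
        n = np.arange(2 * N)
        k = np.arange(N).reshape(-1, 1)
        basis = np.cos(np.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
        return basis @ x

    # Real codecs also apply a window function (e.g. a sine window) before the
    # transform and overlap-add on decode to cancel block-boundary artifacts.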

------
ManlyBread
I wonder if it's possible to create a set of data that produces an actual image
when decoded as JPEG and actual music when decoded as MP3 (for example, a JPEG
picture of a pianist that also plays as an MP3 piano piece).

~~~
pizza
well.. does this count?
[https://en.wikipedia.org/wiki/Windowlicker#Hidden_images](https://en.wikipedia.org/wiki/Windowlicker#Hidden_images)

------
throwaway19373
While amusing, this is bound to be quite bad, since the function basis is
restricted to a single axis.

------
sakawa
Curiously enough, years ago I saw a blog post where the author used PNG's
lossless compression on FLAC audio. I guess there's still a lot of room to
improve both image and audio compression, especially since we still end up
using JPEG and MP3.

~~~
QuercusMax
Isn't PNG just DEFLATE? You should get basically the same result by just
gzipping it.

------
peterburkimsher
Is there a way to use image compression for MP3s?

I'd like to store music files in my phone's camera roll, and easily upload
them to a website where some Javascript could decode and play them.

------
randcraw
Now for big time shits-and-grins, add a deep learning GAN to generate a more
refined signal during the decompression / upsampling stage.

That's something your Turbo Pascal code never attempted.

------
hgears
I think the question really is, does it do anything for file size? If there
were a radical difference in total size, the quality degradation might be an
interesting compromise.

~~~
vernie
The side-by-side assessment compares JPEG and MP3 at the same bitrate.

------
bluedino
I remember reading an article where file data (a ZIP file I believe) was
converted into a bitmap image, and then it was compressed with PNG for another
few % of compression.

~~~
alkoumpa
While it might be possible in special cases [1], I have my doubts: you really
can't compress (losslessly) beyond the (information-theoretic) entropy, and in
this case one would be trying to compress DEFLATE (zip) with DEFLATE (png).

[1]: I remember taking an information theory class at university and telling
the professor (a video/JPEG compression guru) that I had compressed an AVI file
with my dumb arithmetic-coding implementation. He was shocked; it turned out
the file had some large, crappy header from the editor.

------
MrBra
What about turning the image into a sound, then compressing it with MP3, then
turning it back to an image again?

------
jacquesm
The search for an image that encodes the Close Encounters theme is on.

------
gwbas1c
Where are the images???

------
marvy
I'm amazed this works at all!

~~~
acjohnson55
It kind of has to -- both compression algorithms work by doing a bunch of
lossless steps to get to a format that can easily be quantized for lossy
compression. How much quantization to apply is chosen according to perceptual
models of vision/hearing, devoting more bits to the components we can actually
notice.

Needless to say, a perceptual model designed for audio is a pretty bad choice
for a long string of grayscale pixels, interpreted as sampled audio. It looks
like a lot of high-frequency content was discarded, resulting in horizontal
blurring.
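
To make the quantization point concrete, here's a made-up Python example (illustrative numbers only): coarser step sizes are used for the components a perceptual model says we won't notice, which is exactly the knob that's mis-tuned when an audio model is pointed at pixel data.

    import numpy as np

    coeffs = np.array([812.0, -31.4, 7.9, 2.2, 0.6])    # made-up transform output, low to high frequency
    steps  = np.array([16.0, 24.0, 40.0, 80.0, 120.0])  # coarser steps where errors are judged less noticeable

    quantized = np.round(coeffs / steps)                 # small integers, cheap to entropy-code
    restored  = quantized * steps                        # lossy reconstruction: fine detail goes first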

~~~
marvy
Has to? Are you saying that the whole thing would not fall apart if, say,
someone used color images instead of greyscale?

~~~
acjohnson55
If you ordered each subpixel linearly, that would work too. You'd see pretty
much the same effect of horizontal distortion. Because the input data is PCM
(just magnitude values), same as audio, it "works" just fine. If the input
data were some other representation, like text, you'd get gibberish out.

~~~
marvy
Okay, so it does depend on the representation, but most reasonable
representations will work. I guess the big exception would be trying to do
this on something that is already compressed, like a gzipped bmp, or
something.

------
lifterbro
Interesting!

