
Unlimited Data Storage Using Image Steganography and Cat GIFs - minimaxir
http://minimaxir.com/2016/03/gif-unlimited/
======
ejcx
That's pretty neat, but it can be way easier. File formats have never really
been robustly verified. The difference between `file` telling you that a
binary file is "data" versus your image format is about 3 bytes.

I wrote a program[0] a few years ago that does exactly this "steganography"
without encoding the data as pixels. It just prints a tiny PNG and then
concatenates your data onto the back of it. Even my script could be way
simpler if I had been a better programmer at the time =]. To image sites, etc.,
it's all the same. I thought it was pretty cool when I did it. Simple concept.
Worked well. All you need is to encrypt your data first, since stego is
hand-wavy mumbo jumbo. Here[1] is an example.

[0] -
[https://github.com/ejcx/pnger/blob/master/pnger.py](https://github.com/ejcx/pnger/blob/master/pnger.py)

[1] - [http://i.imgur.com/J9OJZfS.png](http://i.imgur.com/J9OJZfS.png)
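
A minimal sketch of the append-after-the-image idea described above. This is not the linked pnger.py; the helper names (`tiny_png`, `hide`, `reveal`) are made up for illustration, and it assumes the host serves the file byte-for-byte without re-encoding it.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_chunk(kind: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def tiny_png() -> bytes:
    """A valid 1x1, 8-bit grayscale PNG."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    scanline = b"\x00\x00"  # filter type 0, then a single black pixel
    return (
        PNG_SIGNATURE
        + png_chunk(b"IHDR", ihdr)
        + png_chunk(b"IDAT", zlib.compress(scanline))
        + png_chunk(b"IEND", b"")
    )


def hide(payload: bytes) -> bytes:
    """Append the payload after the image; decoders stop reading at IEND."""
    return tiny_png() + payload


def reveal(blob: bytes) -> bytes:
    """Walk the chunk list to IEND and return whatever follows it."""
    if not blob.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG")
    pos = len(PNG_SIGNATURE)
    while True:
        (length,) = struct.unpack(">I", blob[pos:pos + 4])
        kind = blob[pos + 4:pos + 8]
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
        if kind == b"IEND":
            return blob[pos:]
```

Walking the chunks rather than searching for the string "IEND" means a payload that happens to contain those bytes does not confuse the extractor.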

------
ikeboy
>The potential amount of encodable data is larger than the file size of the
image itself! Screw secret messages, we’re literally creating data storage out
of nothing!

Uh, nope. The file is small because the frames are non-random, and thus
heavily compressed. If you wanted to store arbitrary data, the frames would be
far more random, and therefore take up more space.

(The above was written halfway through the article)

I disagree that you need "deep knowledge of how image compression works". I
have very little idea how the actual compression works, the only thing I
needed to know was that images are compressed and random/arbitrary data
doesn't compress as well for fundamental algorithmic reasons. It's obvious
that changing images can make them larger even without any knowledge of the
particular compression used.
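
A quick way to see the point about random data, as a rough sketch. It uses zlib (DEFLATE) as a stand-in, which is an assumption on my part: GIF actually uses LZW, but the same argument applies to any lossless compressor.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes after lossless compression at the highest zlib level."""
    return len(zlib.compress(data, 9))


n = 100_000
rng = random.Random(0)  # fixed seed so the run is repeatable

# Highly regular "image-like" data versus noise of the same length.
patterned = bytes([0, 255]) * (n // 2)
noisy = bytes(rng.getrandbits(8) for _ in range(n))

print("patterned:", compressed_size(patterned), "bytes")
print("noisy:    ", compressed_size(noisy), "bytes")
```

The patterned input shrinks to a tiny fraction of its size, while the noisy input stays at roughly its original length.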

~~~
zerohm
I thought the same thing from the opposite direction. My understanding of jpg
compression is that the algorithm says, "I see you have a lot of color X, X+1,
and X+2 in this area. Let's just call all of that region color X+1." Now you
only have the ability to hide a bit in that whole region, instead of for every
pixel within.

~~~
ikeboy
That's lossy compression. If you try to embed data and then use lossy
compression, your data will be messed up. If you insist on not losing
anything, it will take up as much space as it would otherwise.
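
A toy illustration of why lossy compression wrecks embedded data. The `quantize` function is only a crude stand-in for a lossy codec (real JPEG works on frequency coefficients, not raw pixel values), and all names here are hypothetical.

```python
def embed_lsb(pixels, bits):
    """Overwrite the least significant bit of the first len(bits) pixels."""
    stego = [(p & ~1) | b for p, b in zip(pixels, bits)]
    return stego + pixels[len(bits):]


def extract_lsb(pixels, count):
    """Read back the least significant bit of the first `count` pixels."""
    return [p & 1 for p in pixels[:count]]


def quantize(pixels, step=4):
    """Crude lossy step: snap every value down to a multiple of `step`."""
    return [(p // step) * step for p in pixels]


pixels = list(range(100, 116))
bits = [1, 0, 1, 1, 0, 0, 1, 0]

stego = embed_lsb(pixels, bits)
print(extract_lsb(stego, len(bits)) == bits)            # lossless round trip
print(extract_lsb(quantize(stego), len(bits)) == bits)  # after the lossy step
```

The first check passes and the second fails, because quantizing zeroes out exactly the bits that carried the message.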

------
xhrpost
Can this be demonstrated as true? I'm pretty sure the only reason it appears
that the cat GIF could contain itself is because of compression. I imagine
there's some mathematical proof that by effectively adding more information,
you'll likely reduce the compression ratio, and thus this infinite "trick"
doesn't hold up. Happy to be proved wrong.

~~~
minimaxir
If you haven't, scroll down past the email address.

There's the issue with image compression amidst noise, but there's a _second_
issue with GIF animation compression specifically which makes the math I
provide a complete sleight of hand.

I discuss the GIF file size issue and the quest to minimize it in my original
GIF post
([http://minimaxir.com/2015/08/gif-to-video-osx/](http://minimaxir.com/2015/08/gif-to-video-osx/)),
which is why I think the mechanics of GIF file size are interesting and why I
will write a followup.

~~~
xhrpost
Completely missed that, well played. Oddly enough I tried PixelJihad on an
image and the PNG size was actually reduced nearly 25%, while resolution and
quality stayed the same.

------
ChuckMcM
I'm on the fence about whether this is horrible or brilliant. On the one hand
the technique has been available for decades, and UMich's CITI group did an
interesting project to see both whether people could exchange messages on eBay
using pictures of things for sale and, if so, whether any images on eBay had
steganographic content (at the time it was the largest source of user-supplied
images). They didn't find any (except for their own test images). But it
presumably continues to be a channel people check.

Now you flood the web with images that have had data added to them, and it
creates potentially millions of false positives. That then actually _enables_
using the channel for covert conversations because so many images are now
carrying data, they no longer stand out.

Anyway, a 2TB drive is like $90, so you can create a dual-parity RAID set of 6
for less than $600. That seems like way more storage than you'll probably use,
and if you amortize it over 3 years it's way less than the cost of your
Internet connection over that same time.
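
The back-of-the-envelope numbers above, worked through. This counts the drives only (no enclosure or controller), which is an assumption added here.

```python
drive_price_usd = 90
drive_size_tb = 2
drives = 6
parity_drives = 2  # dual parity: two drives' worth of capacity goes to parity
months = 36        # amortized over 3 years

total_cost_usd = drives * drive_price_usd
usable_tb = (drives - parity_drives) * drive_size_tb
cost_per_month_usd = total_cost_usd / months

print(total_cost_usd, usable_tb, cost_per_month_usd)
```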

~~~
minimaxir
The goal of the post was "brilliantly horrible" so it appears I have
succeeded. :p

Another issue I am looking into is "could an image steganography SaaS startup
exist today?" I believe the answer is no because web services are inconsistent
about how they modify the picture in transit. (e.g. Twitter lossily compresses
the photo with no way to retrieve the original.)

~~~
rzzzt
Digital watermarking methods are usually resilient against some levels and
kinds of manipulation of the original image (flipping horizontally or
vertically, cropping, magnification); JPEG compression can probably be
accounted for as well.

------
brad0
When you understand basic information theory you know this is impossible.

In gifs you have a separate "color map" that stores at most 256 colors. So by
modifying the LSB of each color you get at most 256 bits aka 32 bytes.
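
The palette arithmetic above, spelled out. The per-channel figure is an extra assumption not in the comment: a GIF color table entry is three bytes (R, G, B), so flipping the LSB of every channel instead of one bit per entry would triple the number.

```python
MAX_PALETTE_ENTRIES = 256  # a GIF color table holds at most 256 colors
CHANNELS_PER_ENTRY = 3     # each entry is one byte each of R, G, B


def palette_capacity_bits(entries: int, bits_per_entry: int) -> int:
    """Bits that fit if `bits_per_entry` LSBs are changed in each color."""
    return entries * bits_per_entry


one_bit_per_color = palette_capacity_bits(MAX_PALETTE_ENTRIES, 1)
one_bit_per_channel = palette_capacity_bits(MAX_PALETTE_ENTRIES, CHANNELS_PER_ENTRY)

print(one_bit_per_color // 8, "bytes at one bit per color")
print(one_bit_per_channel // 8, "bytes at one bit per channel")
```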

------
te_platt
Many years ago I was asked how to convert a .gif file to a .wav file. I
remember taking the time to patiently explain that it just doesn't work that
way. Looks like I owe someone an apology.

~~~
nickpsecurity
rename *.gif *.wav

One of my actual obfuscations a long time ago against casual users. Except it
worked the other way around, to hide MIDIs and MP3s as broken executables,
etc. :)

~~~
Negative1
Nope; your header won't match which is an easy thing to detect and red flag.
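
A rough sketch of that kind of header check, covering only GIF and WAV. The function names are hypothetical; real tools like `file` know many more signatures.

```python
def sniff(header: bytes) -> str:
    """Guess the format from the leading magic bytes."""
    if header[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    # WAV is a RIFF container: "RIFF", a 4-byte size, then "WAVE".
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return "wav"
    return "unknown"


def extension_matches(filename: str, header: bytes) -> bool:
    """True when the file extension agrees with the sniffed format."""
    extension = filename.rsplit(".", 1)[-1].lower()
    return sniff(header) == extension


# A GIF renamed to .wav still starts with the GIF signature.
print(extension_matches("cat.wav", b"GIF89a\x01\x00\x01\x00"))
```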

With regard to the original challenge, you could convert a GIF to WAV a number
of ways, the simplest being just copying the image data as raw PCM.
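
A minimal sketch of the raw-PCM approach using Python's standard `wave` module. The sample rate and the choice of 8-bit mono are arbitrary assumptions, and `bytes_to_wav` is a made-up name; the input can be any bytes, such as decoded pixel values.

```python
import io
import wave


def bytes_to_wav(data: bytes, sample_rate: int = 8000) -> bytes:
    """Wrap arbitrary bytes in a WAV container as 8-bit mono PCM samples."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(1)   # one byte per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(data)
    return buffer.getvalue()


# Stand-in for image data: a ramp of byte values repeated a few times.
pixel_bytes = bytes(range(256)) * 4
wav_bytes = bytes_to_wav(pixel_bytes)
print(len(wav_bytes), "bytes of WAV, starting with", wav_bytes[:4])
```

Unlike a plain rename, this produces a file whose header really is a WAV header, so a player will accept it (and play something that sounds like noise).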

Of course you can make it even more interesting.

Imagine turning each color channel into notes (dark red = C, blue = D, etc...).
Better yet, what if you remapped each byte value of a color into a table of
chord progressions? What if you only allowed harmonized chord progressions and
made it in a 4/4 rhythm? You just turned a gif into a song.

If you wanted to go even more ambitious, use something like OpenCV to identify
shapes, objects, people/animals. Take what you find and create a sound mixer
that generates a cacophony of sounds.

Use your imagination; code can be beautiful.

~~~
nickpsecurity
"casual users"

It won't matter given my use case of non-technical people snooping on
computers or half-assed educators finding what I stashed on barely maintained
PCs. I'm not speculating: they were fooled to the point they gave up trying,
as far as I know.

"Of course you can make it even more interesting."

This is true. Your ideas are interesting. My brief foray into this sort of
thing brought in a problem: compression passes applied at different points in
apps and services. Your file might get modified at some point by a cheap
storage service.

Also, the readers are lossy by design. So, these two things must be considered
in any of these stego designs using multimedia formats disguised as actual
multimedia. Plenty of methods still left.

------
nickpsecurity
Neat trick. I'll pass on dropping tens of millions on your startup. Might
still use the trick for storage, though. I'll run the stuff through GPG or
bcrypt first. :)

------
kafkaesq
Except that if your adversary finds your secret trove of a billion cat GIFs
somewhere... then it isn't "steganography" anymore.

