In the years since I posted that, twitter has changed some of their image processing pipeline, so that precise technique no longer works if you want to upload new files (last time I checked - and I didn't investigate further or look for workarounds).
However, it's much easier to embed data into PNGs, and the best twitter-compatible implementation of this I'm aware of can be found here: https://github.com/CleasbyCode/pdvzip
Edit: also, the standard `unzip` utility has become a bit stricter about the validity of input files - in ways which would have been possible to work around, but of course not retroactively.
Using binwalk, you can extract the zip file which in turn contains various rar files which when uncompressed give a 6.8M .html file containing the works of Shakespeare.
$ nix run nixpkgs#binwalk DqteCf6WsAAhqwV.jpg
DECIMAL HEXADECIMAL DESCRIPTION
--------------------------------------------------------------------------------
0 0x0 JPEG image data, JFIF standard 1.01
182 0xB6 Zip archive data, at least v1.0 to extract, compressed size: 64512, uncompressed size: 64512, name: shakespeare.part001.rar
65575 0x10027 Zip archive data, at least v1.0 to extract, compressed size: 64512, uncompressed size: 64512, name: shakespeare.part002.rar
131112 0x20028 Zip archive data, at least v1.0 to extract, compressed size: 64512, uncompressed size: 64512, name: shakespeare.part003.rar
...
You shouldn't really need binwalk given how zipfiles work with the directory at the end of the file. You should just be able to unzip the file as-is. Though I suppose that could vary based on what you're unzipping it with. They should support it, though, as that's how a self-extracting zip is supported...ignoring any extra bytes at the top.
Info-ZIP's zip on Linux also has "zip -A file.zip", which adjusts the stored offsets to account for the non-zip preamble, and "zip -J file.zip", which strips the preamble entirely.
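To make the "directory at the end of the file" point concrete, here is a rough Python sketch of how a reader can find the archive from the back and work out how many foreign bytes sit in front of it. The function name `preamble_length` is made up for illustration, and it assumes a plain (non-zip64) archive whose trailing comment does not itself contain the end-of-central-directory signature.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"  # end-of-central-directory (EOCD) signature


def preamble_length(data: bytes) -> int:
    """Return how many non-zip bytes precede the archive in `data`.

    Hypothetical helper: assumes no zip64 records and that the last
    occurrence of the signature is the real EOCD record.
    """
    pos = data.rfind(EOCD_SIG)
    if pos < 0:
        raise ValueError("no end-of-central-directory record found")
    # EOCD layout: signature(4) disk(2) cd_disk(2) entries_on_disk(2)
    # entries_total(2) cd_size(4) cd_offset(4) comment_len(2)
    cd_size, cd_offset = struct.unpack_from("<II", data, pos + 12)
    # The central directory ends where the EOCD begins. The stored offset
    # is relative to the start of the original archive, so any difference
    # is prepended data.
    return pos - cd_size - cd_offset


# Build a tiny archive in memory to try it on.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hello.txt", b"hi")
zip_bytes = buf.getvalue()

print(preamble_length(b"junk in front" + zip_bytes))
```

A tool that reads archives this way never needs to look at the first bytes of the file, which is why self-extracting archives and image/zip polyglots can work at all.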
I am surprised that it survives twitter's image compression. I'm not familiar with JPEG format, but I'm familiar with PNG. I guess it's using some zlib-compressed EXIF-like metadata field, seeing as ZIP also uses DEFLATE/zlib/whatever?
Some eager image compression techniques will strip metadata, including profiles (e.g. ImageOptim: https://imageoptim.com/color-profiles.html). But generally you are correct.
How would you do that nicely and automatically, to do funny things like that in a way that would be resistant to the accidental optimizations/resizing that will be applied to the JPG?
It's a polyglot file. JPEG decoders will ignore data that comes after the end-of-image (EOI) marker, while zip decoders ignore data that comes before the first archive entry.
Essentially, they ran `cat pic.jpg arch.zip > output.jpg`.
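For anyone who wants to try the concatenation idea without touching real files, here is a minimal Python sketch. The helper names (`build_zip`, `make_polyglot`) and the placeholder "JPEG" (just the start and end markers around some filler) are assumptions for illustration, and this says nothing about whether the result survives Twitter's processing.

```python
import io
import zipfile


def build_zip(files: dict[str, bytes]) -> bytes:
    """Create an uncompressed zip archive in memory (illustrative helper)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_STORED) as zf:
        for name, payload in files.items():
            zf.writestr(name, payload)
    return buf.getvalue()


def make_polyglot(image_bytes: bytes, zip_bytes: bytes) -> bytes:
    """In-memory equivalent of `cat pic.jpg arch.zip > output.jpg`."""
    return image_bytes + zip_bytes


# Placeholder standing in for a real JPEG: SOI marker, filler, EOI marker.
fake_jpeg = b"\xff\xd8" + b"\x00" * 64 + b"\xff\xd9"
polyglot = make_polyglot(fake_jpeg, build_zip({"hello.txt": b"hi"}))

# Python's zipfile module tolerates the prepended image bytes.
with zipfile.ZipFile(io.BytesIO(polyglot)) as zf:
    names = zf.namelist()
    content = zf.read("hello.txt")

print(names, content)
```

The output still starts with the JPEG start-of-image bytes, so anything that sniffs the file type from the header treats it as an image.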
I'm on Firefox, so I used: Right click image -> Open in new window -> Right click -> Save as. I renamed the resulting .jpg file to .zip, then used 7-Zip to extract it to a folder. That folder contained multiple .rar files, so I used 7-Zip again on the first file to extract it to another folder. That produced a shakespeare.html file, which is indeed a Project Gutenberg collection of all his work. Awesome.
I think that's a little bit different from steganography... it looks more like a trick where the author attaches a zip file to the end of a picture and does some work so twitter keeps that file, whereas steganography embeds the file in the image data itself, so it's right in front of your eyes even though you can't see it.