JPEG of Shakespeare is also a zip file containing his complete works (2018) (twitter.com/david3141593)
182 points by metadat on Oct 24, 2022 | 26 comments



In the years since I posted that, twitter has changed some of their image processing pipeline, so that precise technique no longer works if you want to upload new files (last time I checked - and I didn't investigate further or look for workarounds).

However, it's much easier to embed data into PNGs, and the best twitter-compatible implementation of this I'm aware of can be found here: https://github.com/CleasbyCode/pdvzip

Edit: also, the standard `unzip` utility has become a bit stricter about the validity of input files - in ways which would have been possible to work around, but of course not retroactively.


I had to run it like this

UNZIP_DISABLE_ZIPBOMB_DETECTION=TRUE unzip DqteCf6WsAAhqwV.jpg


Using binwalk, you can extract the zip file which in turn contains various rar files which when uncompressed give a 6.8M .html file containing the works of Shakespeare.

  $ nix run nixpkgs#binwalk DqteCf6WsAAhqwV.jpg
  
  DECIMAL       HEXADECIMAL     DESCRIPTION
  --------------------------------------------------------------------------------
  0             0x0             JPEG image data, JFIF standard 1.01
  182           0xB6            Zip archive data, at least v1.0 to extract, compressed size: 64512, uncompressed size: 64512, name: shakespeare.part001.rar
  65575         0x10027         Zip archive data, at least v1.0 to extract, compressed size: 64512, uncompressed size: 64512, name: shakespeare.part002.rar
  131112        0x20028         Zip archive data, at least v1.0 to extract, compressed size: 64512, uncompressed size: 64512, name: shakespeare.part003.rar
  ...
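A listing like the one binwalk prints can be approximated by scanning for ZIP local file header signatures. The following is only a rough Python sketch, assuming the whole file fits in memory; `scan_local_headers` is a hypothetical helper name (not part of binwalk), and the demo bytes are synthetic rather than the real image.

```python
import io
import struct
import zipfile

LOCAL_SIG = b"PK\x03\x04"                # ZIP local file header signature
LOCAL_FMT = "<4s5H3L2H"                  # fixed part of the local file header
LOCAL_SIZE = struct.calcsize(LOCAL_FMT)  # 30 bytes


def scan_local_headers(blob):
    """Yield (offset, name, compressed_size, uncompressed_size) per signature hit."""
    pos = blob.find(LOCAL_SIG)
    while pos != -1 and pos + LOCAL_SIZE <= len(blob):
        fields = struct.unpack_from(LOCAL_FMT, blob, pos)
        csize, usize, name_len = fields[7], fields[8], fields[9]
        start = pos + LOCAL_SIZE
        name = blob[start:start + name_len].decode("cp437", "replace")
        yield pos, name, csize, usize
        # A plain signature scan can produce false positives; fine for a sketch.
        pos = blob.find(LOCAL_SIG, pos + 4)


# Synthetic demo: 182 bytes standing in for JPEG data, then a stored-only zip.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w", zipfile.ZIP_STORED) as zf:
    zf.writestr("part001.rar", b"A" * 100)
    zf.writestr("part002.rar", b"B" * 100)
blob = b"\xff\xd8" + b"\x00" * 180 + archive.getvalue()

entries = list(scan_local_headers(blob))
for offset, name, csize, usize in entries:
    print(f"{offset:<8} 0x{offset:<8X} {name} compressed={csize} uncompressed={usize}")
```

To try it on a downloaded file, read it with `open(path, "rb").read()` and pass the bytes to `scan_local_headers`.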


You shouldn't really need binwalk given how zipfiles work with the directory at the end of the file. You should just be able to unzip the file as-is. Though I suppose that could vary based on what you're unzipping it with. They should support it, though, as that's how a self-extracting zip is supported...ignoring any extra bytes at the top.

The Info-ZIP zip on Linux has "zip -A file.zip" functionality that adjusts the stored offsets to account for a non-zip preamble, and "zip -J file.zip" will strip the preamble.
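The "directory at the end of the file" point can be checked by hand. A minimal Python sketch, assuming a single-disk, non-Zip64 archive with no stray signature in the comment: it finds the end-of-central-directory record and works out how many bytes sit in front of the archive. `preamble_length` is a made-up helper name, and the demo data is synthetic.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"  # end-of-central-directory signature
EOCD_FMT = "<4s4H2LH"     # fixed 22-byte part of the record


def preamble_length(blob):
    """Return the number of bytes in front of the zip data, judged by stored offsets."""
    pos = blob.rfind(EOCD_SIG)
    if pos < 0:
        raise ValueError("no end-of-central-directory record found")
    fields = struct.unpack_from(EOCD_FMT, blob, pos)
    cd_size, cd_offset = fields[5], fields[6]
    # The central directory ends where the EOCD record begins, so it really
    # starts at pos - cd_size; the record claims it starts at cd_offset.
    return pos - cd_size - cd_offset


# Synthetic demo: an ordinary zip with 182 unrelated bytes in front of it.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("hello.txt", "hi")
blob = b"\xff\xd8" + b"\x00" * 180 + archive.getvalue()

print(preamble_length(blob))
```

A result of 0 on a real file would mean its stored offsets already account for any leading data, which is the state an offset-adjusting tool aims for.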


I am surprised that it survives twitter's image compression. I'm not familiar with JPEG format, but I'm familiar with PNG. I guess it's using some zlib-compressed EXIF-like metadata field, seeing as ZIP also uses DEFLATE/zlib/whatever?


It's embedded within an ICC colour profile, and therefore doesn't get touched by any image compression/resizing logic.
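For anyone curious where an ICC profile lives in a JPEG: it is carried in APP2 segments tagged "ICC_PROFILE", each with a sequence number and a chunk count. Here is a rough Python sketch that reassembles those chunks; the function names are hypothetical, the demo file is synthetic, and it is not the tool used to make the original image.

```python
import struct

ICC_TAG = b"ICC_PROFILE\x00"  # 12-byte identifier at the start of each APP2 payload


def iter_segments(jpeg):
    """Yield (marker, payload) for each header segment before start-of-scan."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = jpeg[pos + 1]
        if marker == 0xDA:  # SOS: compressed image data follows, stop here
            return
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        yield marker, jpeg[pos + 4:pos + 2 + length]  # length includes its own 2 bytes
        pos += 2 + length


def extract_icc_profile(jpeg):
    """Concatenate the ICC profile chunks in sequence order."""
    chunks = {}
    for marker, payload in iter_segments(jpeg):
        if marker == 0xE2 and payload.startswith(ICC_TAG):
            sequence_number = payload[12]  # payload[13] is the total chunk count
            chunks[sequence_number] = payload[14:]
    return b"".join(chunks[k] for k in sorted(chunks))


def app2_chunk(seq, total, data):
    """Build one APP2 ICC segment (demo helper, assumption-level only)."""
    payload = ICC_TAG + bytes([seq, total]) + data
    return b"\xff\xe2" + struct.pack(">H", len(payload) + 2) + payload


# Synthetic demo: SOI, two ICC chunks, then SOS and EOI markers.
demo = b"\xff\xd8" + app2_chunk(1, 2, b"hello ") + app2_chunk(2, 2, b"world") + b"\xff\xda\xff\xd9"
profile = extract_icc_profile(demo)
print(profile)
```

Because a segment length is a 16-bit field, each chunk carries a little under 64 KiB, which fits the roughly 64 KiB spacing of the entries in the binwalk output earlier in the thread.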


Some eager image compression techniques will strip metadata including profiles (e.g. ImageOptim: https://imageoptim.com/color-profiles.html). But generally you are correct.


any image compression/resizing logic [in twitter.com] [in 2018]


How would one do that nicely and automatically, to do funny things like that in a way that is resistant to the accidental optimization/resizing that will be applied to the JPG?


It's a polyglot file. JPEG decoders will ignore data that comes after the end-of-image (EOI) marker, while zip decoders ignore data that comes before the first archive entry.

Essentially, they ran `cat pic.jpg arch.zip > output.jpg`.
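The concatenation described in this comment can be tried without touching real files. A small Python sketch, with a placeholder byte string standing in for a real JPEG and `make_polyglot` as a hypothetical helper; it only illustrates plain appending, and relies on Python's `zipfile` module tolerating leading non-zip data.

```python
import io
import zipfile


def make_polyglot(image_bytes, zip_bytes):
    """In-memory equivalent of `cat pic.jpg arch.zip > output.jpg`."""
    return image_bytes + zip_bytes


# Placeholder bytes: SOI marker, filler, EOI marker (not a decodable image).
fake_jpeg = b"\xff\xd8\xff\xe0" + b"\x00" * 64 + b"\xff\xd9"

archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("sonnet18.txt", "Shall I compare thee to a summer's day?")

polyglot = make_polyglot(fake_jpeg, archive.getvalue())

# The zip reader finds the directory at the end and skips the leading image bytes.
with zipfile.ZipFile(io.BytesIO(polyglot)) as zf:
    names = zf.namelist()
    text = zf.read("sonnet18.txt").decode()

print(names, text)
```

Data appended this way sits outside the image proper, so anything that re-encodes or rewrites the image is free to drop it.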


"Back in the day" we had some issues on wikipedia with people quietly uploading book cover images where the jpg was also a rar containing the book...


Nothing new. Steam eventually had to patch this in Steam users' artwork uploads. People shared game hacks this way, heh.




On Ubuntu I can't unzip it with unzip, but 7z works for me. This

      wget https://pbs.twimg.com/media/DqteCf6WsAAhqwV.jpg ; 7z e DqteCf6WsAAhqwV.jpg

should fill the directory with a lot of rars.


I'm on Firefox, so I used: Right click image -> Open in new window -> Right click -> Save as. I renamed the resulting .jpg file to .zip, then used 7-Zip to extract it to a folder. That folder contained multiple .rar files, so I used 7-Zip again on the first one to extract it to another folder. That produced a shakespeare.html file, which is indeed a Project Gutenberg collection of all his works. Awesome.


If you are looking to learn more about these techniques, the field is named steganography: https://en.wikipedia.org/wiki/Steganography.


I think that's a little bit different from steganography... it looks more like a trick where the author attaches a zip file to the end of a picture and does some work so Twitter keeps that file, while steganography embeds the file in the image data itself, so it's right in front of your eyes even though you can't see it.



Ah, I had played around with this the first time the story came out:

  $ curl -s https://pbs.twimg.com/media/Dq2sPGNU0AEKyyC.jpg | dd status=none bs=1 skip=599 count=40| sh


A similar method is used by Pico-8 and other "fantasy consoles" to create "cartridges" that contain cover art but also the code of the game.


Is there a way to use this method to stash a crypto key inside the file that could be used to authenticate an NFT against a blockchain?


Yes, but then the key can also be trivially removed, leaving you with the same image, sans the key.

Artificial digital scarcity and general-purpose computers don't mix.

(Which is why we've been fighting and losing the war on the latter for a good decade now.)


For more "X is also a Y"-style fun, check out PoC||GTFO - "This PDF is also an MBR boot sector" is just the beginning.


7zip unzips it just fine if you rename it to a zip, but Total Commander fails to enter it.




