curl 'https://pbs.twimg.com/media/DqteCf6WsAAhqwV.jpg' > shakespeare.zip
unrar e shakespeare.part001.rar
Here you go...
$ curl -s https://pbs.twimg.com/media/Dq2sPGNU0AEKyyC.jpg | dd status=none bs=1 skip=599 count=40| sh
$ unrar e shakespeare.part001.rar
unknown archive type, only plain RAR 2.0 supported(normal and solid archives), SFX and Volumes are NOT supported!
$ sudo apt install rar
$ rar x shakespeare.part001.rar
... shakespeare.html OK
$ unrar e shakespeare.part001.rar
UNRAR 5.30 beta 2 freeware Copyright (c) 1993-2015 Alexander Roshal
Extracting from shakespeare.part001.rar
$ dpkg -l "*unrar*"
un unrar <none> <none> (no description available)
ii unrar-free 1:0.0.1+cvs2014070 amd64 Unarchiver for .rar files
un unrar-nonfree <none> <none> (no description available)
(7-zip works fine.)
7-zip reads the zip file by scanning from the beginning of the file for the first entry signature (which need not be at the start of the file, which is why this trick works at all.)
WinRAR reads the central directory located at the end of the file, which is technically how you're supposed to do it, but entry offsets given in the central directory are messed up by the existence of the JPG data at the beginning of the file. There also appears to be ~61KB of garbage data at the end of the file, which will trip up some zip readers that use the central directory.
tl;dr: The zip file is technically malformed; whether or not a given archive reader chokes on it depends on which technique it uses and how forgiving its implementation is.
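To make the two reading strategies concrete, here is a minimal Python sketch. It is not how 7-zip or WinRAR are implemented; the helper names (`build_polyglot`, `first_local_header`, `central_directory_info`) are made up for illustration, and the prefix is just a stand-in for the JPEG data. It shows that a forward scan finds the first entry regardless of what precedes it, while the offset stored in the end-of-central-directory record is off by exactly the length of the prepended data.

```python
import io
import struct
import zipfile


def build_polyglot(prefix: bytes) -> bytes:
    """Return prefix + a tiny ZIP archive (a stand-in for JPEG data + ZIP)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("hello.txt", "hello")
    return prefix + buf.getvalue()


def first_local_header(data: bytes) -> int:
    """Forward scan: offset of the first local file header signature."""
    return data.find(b"PK\x03\x04")


def central_directory_info(data: bytes):
    """Backward scan: find the end-of-central-directory (EOCD) record.

    Returns (offset the record claims for the central directory,
             offset where the central directory really starts).
    Assumes no ZIP64 and no archive comment, so the directory sits
    immediately before the EOCD record.
    """
    eocd = data.rfind(b"PK\x05\x06")
    # The EOCD stores the directory size at byte 12 and its offset at byte 16.
    cd_size, cd_offset = struct.unpack_from("<II", data, eocd + 12)
    return cd_offset, eocd - cd_size


# Hypothetical 182-byte prefix; any length works.
data = build_polyglot(b"\xff\xd8" + b"\x00" * 180)
claimed, actual = central_directory_info(data)
print("first entry found at", first_local_header(data))
print("directory claimed at", claimed, "but really at", actual)
```

A reader that trusts the claimed offset lands in the wrong place unless it corrects for the prefix, which is the forgiving behaviour some implementations have and others lack.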
This is (or at least used to be) popular to share arbitrary files on imageboards.
$ curl 'https://pbs.twimg.com/media/DqteCf6WsAAhqwV?format=jpg' > output
$ file output
output: JPEG image data, JFIF standard 1.01, aspect ratio, density 1x1, segment length 16
$ strings output.jpg | grep PK
$ ./binwalk output.jpg
$ 7z x shakespeare.part001.rar
$ file shakespeare.html
shakespeare.html: XML 1.0 document, UTF-8 Unicode (with BOM) text, with CRLF line terminators
Taken from 0x16's introduction:
This file, pocorgtfo16.pdf, is a polyglot that is valid as a PDF document,
a ZIP archive, and a Bash script that runs a Python webserver which hosts
Kaitai Struct’s WebIDE which, allowing you to view the file’s own annotated bytes.
0x17 is a PDF that's also a valid ZIP and valid firmware for the Apollo Guidance Computer.
And that's just the tip of the iceberg. This project is absolutely crazy, in the most positive possible way.
Edit: 0x15 has the additional feature that viewing it in Chrome makes my work machine throw a "Threat detected: A threat has been blocked and quarantined" message, so I think I'm going to avoid this site now. Not to say that I don't trust this random PDF hacker (...that sounded less sarcastic in my head), but I don't need IT asking me pointed questions about what sites I've been visiting.
PoC||GTFO is a well known project, and has a tradition of turning their PDFs into absurdly polyglot files. Your Chrome is giving you a false positive here.
DECIMAL       HEXADECIMAL     DESCRIPTION
0             0x0             JPEG image data, JFIF standard 1.01
182           0xB6            Zip archive data, at least v1.0 to extract, ..., name: shakespeare.part001.rar
1971177       0x1E13E9        End of Zip archive
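For anyone without binwalk handy, a listing like the one above can be roughly approximated by searching for well-known magic bytes. This is only a sketch: real binwalk validates each hit and knows far more signatures, and the `SIGNATURES` table and function names here are my own, not part of binwalk.

```python
# Hypothetical, much-reduced signature table (binwalk's real one is far larger).
SIGNATURES = {
    b"\xff\xd8\xff": "JPEG image data",
    b"PK\x03\x04": "Zip local file header",
    b"PK\x05\x06": "End of Zip central directory",
}


def scan_signatures(data: bytes):
    """Return sorted (offset, description) pairs for every signature hit."""
    hits = []
    for magic, description in SIGNATURES.items():
        start = data.find(magic)
        while start != -1:
            hits.append((start, description))
            start = data.find(magic, start + 1)
    return sorted(hits)


def format_report(hits) -> str:
    """Lay the hits out in the same three columns as the listing above."""
    lines = [f"{'DECIMAL':<14}{'HEXADECIMAL':<16}DESCRIPTION"]
    for offset, description in hits:
        hex_text = f"0x{offset:X}"
        lines.append(f"{offset:<14}{hex_text:<16}{description}")
    return "\n".join(lines)
```

Usage would be something like `print(format_report(scan_signatures(open("output.jpg", "rb").read())))`. Expect false positives, since the raw byte patterns can also occur by chance inside compressed data.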
"It survives all of twitters scaling, compression and thumbnailing. /how/ is left as an exercise to the reader :P"
I can't quite figure out what additional tricks the author has used to disperse the image data inside the body to evade entropy analysis, as shown by the binwalk analysis posted in this thread; however, image processing software seems to have no trouble parsing the file and extracting only the image data without the embedded zip.
I've done this before by appending data to the end of the file but if I could make it more resistant to reprocessing that would be useful.
The difference is that Twitter applies a series of operations to all uploaded images (stripping EXIF data, recompressing, etc.), which would normally be difficult to work around.
There are lots of apps/software to do just that.
So then I tried the same thing with MP4 video on twitter. Twitter didn't like that at all :) https://imgur.com/a/O0QFZC9
Basically the "top level" zip file section lives at the end of the zip file and contains pointers back into the file for the actual data. This is so that you can keep adding to a zip file without having to rewrite anything you have done previously - just append a few more files and then your updated directory at the end.
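The append behaviour is easy to see with Python's `zipfile` module in `"a"` mode. A rough sketch, with `append_member` being a made-up helper name: the bytes of the existing member stay exactly where they were, the new member is written over the old directory, and a fresh directory goes at the end.

```python
import io
import zipfile


def append_member(archive: bytes, name: str, payload: bytes) -> bytes:
    """Append one member to an existing archive and return the new bytes."""
    buf = io.BytesIO(archive)
    with zipfile.ZipFile(buf, "a") as zf:
        zf.writestr(name, payload)
    return buf.getvalue()


# Build a one-file archive, then append a second file to it.
first = io.BytesIO()
with zipfile.ZipFile(first, "w") as zf:
    zf.writestr("a.txt", b"alpha")
old = first.getvalue()
new = append_member(old, "b.txt", b"beta")

with zipfile.ZipFile(io.BytesIO(new)) as zf:
    print(zf.namelist())
```

Only the directory at the tail is regenerated; everything before it is left alone, which is why tacking unrelated data onto the front of the file doesn't have to disturb the archive.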
JPEG segments contain two special opening bytes, maybe a length, and then the actual data. So it would be trivial to have the last jpeg section in a jpeg file end with zip directory bytes. This directory could point back to data in other sections of the jpeg.
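Here is a small sketch of walking that segment structure (marker bytes, a big-endian length, then the payload). It is a simplification under stated assumptions: it only handles the header segments up to the start of the scan data and ignores the markers that carry no length field. `walk_segments` is a name I made up.

```python
import struct

SOI = b"\xff\xd8"  # start of image
SOS = 0xDA         # start of scan: compressed image data follows
EOI = 0xD9         # end of image


def walk_segments(data: bytes):
    """Return (marker, offset, payload_length) for each header segment."""
    if not data.startswith(SOI):
        raise ValueError("not a JPEG: missing SOI marker")
    segments = []
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = data[pos + 1]
        if marker == EOI:
            break
        # The 16-bit length counts its own two bytes but not the marker.
        (length,) = struct.unpack_from(">H", data, pos + 2)
        segments.append((marker, pos, length - 2))
        if marker == SOS:
            break  # entropy-coded data follows; stop walking here
        pos += 2 + length
    return segments
```

Since the length field is 16 bits, a single segment holds a bit under 64 KiB, so a large payload hidden this way would have to be spread over many application segments.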
..my guess, anyway.
What's surprising is that it works on Twitter. I tried the same thing with image hosts and the data got scrubbed. I guess they don't re-encode small images but I dunno.
cat file.jpg file.zip > magic.jpg
(Although if they do something with this file and you still are able to do this trick that would seem even more .. interesting on their side than leaving images below certain size untouched)
It's similar in the sense that there's more than one possible operation on any file (decompress it, or compress it), and different in that the file type doesn't change.
2 MB / 512 GB * 100 mm^2 ~= 400 µm^2
400 square micrometers is the area of a square with 20 micrometer sides. Approx. 1/100th of the cross-section of a human hair.
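Spelling out the arithmetic, as a quick sanity check. I'm assuming binary units (1 MB = 2^20 bytes, 1 GB = 2^30 bytes) and taking the 100 mm^2 figure from the comment as given; the function name is made up.

```python
import math

MB = 1024 ** 2
GB = 1024 ** 3


def payload_area_um2(payload_bytes: int, storage_bytes: int, storage_area_mm2: float) -> float:
    """Area taken by the payload if storage density is uniform, in square micrometers."""
    share = payload_bytes / storage_bytes
    return share * storage_area_mm2 * 1_000_000  # 1 mm^2 = 10^6 um^2


area = payload_area_um2(2 * MB, 512 * GB, 100.0)
side = math.sqrt(area)
print(f"{area:.1f} um^2, i.e. a square about {side:.1f} um on a side")
```

That comes out a little under 400 µm^2, so the rounded figure and the 20 µm side both hold up.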
And Hamlet would be split amongst dozens of these books.
25^3200 is a large number.
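To put a rough size on it, the number of decimal digits in 25^3200 can be estimated with a logarithm rather than by printing the number (a minimal sketch; `decimal_digits` is my own helper name):

```python
import math


def decimal_digits(base: int, exponent: int) -> int:
    """Number of decimal digits in base**exponent, via log10."""
    return math.floor(exponent * math.log10(base)) + 1


digits = decimal_digits(25, 3200)
print(digits)  # → 4474
```

So it is a number with several thousand digits, far beyond anything that could be enumerated.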
Steganography can be used to hide data in clever ways, but it isn't a substitute for encryption -- anybody who knows whatever trick you used can extract the payload. You could always encrypt the payload, but all you're doing is giving your adversary another (trivial) hoop to jump through when you could've just encrypted the message to begin with.
> Can it be made resistant to detection?
Yes and no. There are some clever steganographic techniques that take advantage of PNG and JPEG implementation details to foil basic entropy checks, but anybody who knows the algorithm can trivially extract the payload. In other words, it's security by obscurity, not by any sort of strong cryptographic property.
Wouldn't "an encrypted message that no one is sure you sent in the first place" sometimes be more useful than "an encrypted message that any eavesdropper knows you sent" in oppressive-surveillance-state scenarios? (It seems that, if you find some subset of bits in a JPEG/PNG that normally have random distribution and don't affect the image that much, putting an encrypted message into those bits might be indistinguishable from a "completely normal" image even to a well-informed attacker.)
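A toy version of the "put the message into bits that barely affect the image" idea, as a sketch only. It works on a raw buffer of sample bytes rather than a real JPEG or PNG, assumes the payload has already been encrypted elsewhere, and makes no claim about surviving statistical detection or recompression. The function names are hypothetical.

```python
def embed_lsb(cover: bytes, payload: bytes) -> bytes:
    """Store each payload bit in the least significant bit of one cover byte."""
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(cover):
        raise ValueError("cover is too small for this payload")
    out = bytearray(cover)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit  # each sample changes by at most 1
    return bytes(out)


def extract_lsb(stego: bytes, payload_len: int) -> bytes:
    """Read payload_len bytes back out of the least significant bits."""
    out = bytearray()
    for i in range(payload_len):
        byte = 0
        for sample in stego[i * 8:(i + 1) * 8]:
            byte = (byte << 1) | (sample & 1)
        out.append(byte)
    return bytes(out)
```

If the payload is well-encrypted, the embedded bits look like noise; whether that noise matches what the low bits of a natural image look like is exactly the open question in the comment above.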
> without massive amount of false positives
China, at the very least, has no problems with this!
This was done because it allows adding files to a zip archive without rewriting the whole file: you just start writing where the index starts and add an updated index at the end. Please note that over the years there have been multiple methods of doing this, including partial indexes which don't even rewrite the index.
If the file you're adding things to is also tolerant of wrong file sizes and extra data at the end (like JPG and many others), you can just:
cat someJPG.jpg someZIP.zip > wth.jpg
feh wth.jpg => shows image
mv wth.jpg wth.zip; unzip wth.zip => works
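The same trick can be checked in Python without touching the shell. A minimal sketch, with made-up helper names and a few placeholder bytes standing in for a real JPEG: Python's `zipfile` is one of the forgiving readers, so it opens the concatenated bytes directly.

```python
import io
import zipfile


def make_polyglot(image_bytes: bytes, zip_bytes: bytes) -> bytes:
    """Equivalent of `cat someJPG.jpg someZIP.zip > wth.jpg`."""
    return image_bytes + zip_bytes


def read_member(polyglot: bytes, name: str) -> bytes:
    """Open the combined bytes as a ZIP archive and read one member."""
    with zipfile.ZipFile(io.BytesIO(polyglot)) as zf:
        return zf.read(name)


# Placeholder "image": SOI marker, filler, EOI marker (not a viewable JPEG).
fake_jpeg = b"\xff\xd8\xff\xe0" + b"\x00" * 100 + b"\xff\xd9"

zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w") as zf:
    zf.writestr("play.txt", "To be, or not to be")

wth = make_polyglot(fake_jpeg, zip_buf.getvalue())
print(read_member(wth, "play.txt").decode())
```

The file still starts with the JPEG bytes, which is all an image viewer looks at, while the archive reader works backwards from the end.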
Steganography can be done even without file-format hacks; all that's special about this hack is its simplicity.
It could easily be defeated -- I'm sure Twitter would have no trouble sanitizing uploaded image files if they wanted to.
Well, it's still using the trick I pointed out, placing the ZIP file index at the end.
So from one viewpoint it's a JFIF("jpg") file with large application segments containing the zipfile data for the shakespeare.part0xx.rar files.
From another viewpoint, reading from the back of the file, it's an incremental zip file (not compressed in one go), with the garbage bytes (the "overwritten" bytes) in the zipfile updates forming a valid JFIF file.
When did resumable BBS download protocols become a thing? ZMODEM at least was widely used by the early '90s.
Edit: Tried it, and got it to work. The zip command on Linux has a -A option that will fix the offset headers after you cat an mp3 and a zip file together. Just "cat file.mp3 file.zip > newfile.mp3;zip -A newfile.mp3" and you're done.
I did notice that some mp3 file upload sites reprocess the mp3 file and strip out the extra bits. Some don't though.
Here's a Bach mp3 that you can play online: https://instaud.io/2RLM
But if you download it via the buttons on the page, you'll see it's also a valid zip file containing a jpg meme of Bach.
I had a school friend that used 7" records as frisbees and made 12" records into plant pot holders. This was fine until he started doing this with my prized records.
It is the same here, there is a disconnect going on, a lack of appreciation for Shakespeare which is more than mere 'classic literature', it is an education.
PoC||GTFO indeed. TLAs are not helpful.
There is nothing new in hiding text in images, in fact I do this for myself to see if images can do better in Google SEO, and if indexed images can be searched for by unique phrases hidden in the binary as well as the EXIF data. I do this with the permission of the owner of the images and I don't blindly take images or text for this that others might consider sacred.
I think that Shakespeare 'himself' would have been okay with having his complete works stuffed into a tweet as 'he' did steal most of his stories from elsewhere.
Joint Photographic Experts Group -> Joint Photographic Group
No experts to be found here! :P
So I honestly don't get why people still use them.
Extensions that particularly annoy me are:
yml instead of yaml
jpg instead of jpeg
htm instead of html
Not sure why you'd say "objectively" when it's clearly not, the hint being the word "ugly."
I can certainly appreciate the limitation of "extensions will always be three characters". In fact, I wasn't exposed to extensions habitually longer than that until I did iOS dev. "Main.storyboard" "Model.xcdatamodeld" So if we're going to talk about ugly, I'd start there. ;)
xcdatamodeld is definitely ugly (though main.storyboard is ok in my personal opinion) and I'm sure there will be other examples where file extensions have been taken too far. I think if one wants to use long extensions then the extension needs to at least be readable at a glance (eg xcdatamodeld is ugly to read - even if you switched it the other way around, xcdatamodeld.model would still be unpleasant). So I'd argue that xcdatamodeld is more of an example of a poor naming convention than a problem with longer extensions (I say this with no experience of iOS development, however I haven't seen the same problem quite as acutely in the other languages and platforms I've developed on).
It should also be noted that my argument was really more focused on people who shorten already short 4 character acronyms - just for the purpose of making it 3 letters (eg the yaml and html examples I gave).
The 3 character file extension is much older than FAT32. It was part of the original FAT that was created nearly 20 years earlier, and later part of FAT12 and FAT16. But it predated all that in various other microcomputer DOS, as well as minicomputer systems going back into the 60s.
I didn't think it was quite as old as the 60s though? CP/M and FAT are mid-to-late 70s. VMS was around the same time too, or maybe slightly earlier. What else stored the extension as a separate meta-property to the file name?