
JPEG image of Shakespeare which is also a zip file containing his complete works - firasd
https://twitter.com/David3141593/status/1057042085029822464
======
BeefySwain
Just tried it out and it does indeed work.

    
    
      curl 'https://pbs.twimg.com/media/DqteCf6WsAAhqwV.jpg' > shakespeare.zip
      unzip shakespeare.zip
      unrar e shakespeare.part001.rar
    

I'm excited for the inevitable curl | unzip | bash method of installing
software directly from twitter that is sure to follow.

~~~
fireattack
Interesting that WinRAR seems to have trouble uncompressing the first layer of
it (zip to rars).

(7-zip works fine.)

~~~
jethro_tell
Well, if someone out there had paid for winrar, perhaps the support would be
better XD

~~~
tobyhinloopen
“I paid for Winrar, AMA”

~~~
rzzzt
[https://www.reddit.com/r/paidforwinrar](https://www.reddit.com/r/paidforwinrar)

------
bayindirh
Most issues of PoC||GTFO [0] are polyglot files.

Taken from 0x16 [1]'s introduction:

    
    
        This file, pocorgtfo16.pdf, is a polyglot that is valid as a PDF document,
        a ZIP archive, and a Bash script that runs a Python webserver which hosts
        Kaitai Struct’s WebIDE, allowing you to view the file’s own annotated bytes.
    

[0]:
[https://www.alchemistowl.org/pocorgtfo/](https://www.alchemistowl.org/pocorgtfo/)

[1]:
[https://www.alchemistowl.org/pocorgtfo/pocorgtfo16.pdf](https://www.alchemistowl.org/pocorgtfo/pocorgtfo16.pdf)

~~~
TeMPOraL
0x15 is a laser-projectable PDF that's also a ZIP containing, among other
things, another PDF that is also a Git repo of its own source code.

0x17 is a PDF that's also a valid ZIP and valid firmware for Apollo Guidance
Computer.

And that's just the tip of an iceberg. This project is absolutely crazy, in
the most positive possible way.

~~~
PhasmaFelis
> _0x17 is a PDF that's also a valid ZIP and valid firmware for Apollo
> Guidance Computer._

Wait, what?

Edit: 0x15 has the additional feature that viewing it in Chrome makes my work
machine throw a "Threat detected: A threat has been blocked and quarantined"
message, so I think I'm going to avoid this site now. Not to say that I don't
trust this random PDF hacker (...that sounded less sarcastic in my head), but
I don't need IT asking me pointed questions about what sites I've been
visiting.

~~~
TeMPOraL
Yes, that.

PoC||GTFO is a well known project, and has a tradition of turning their PDFs
into absurdly polyglot files. Your Chrome is giving you a false positive here.

------
diimdeep

       binwalk DqteCf6WsAAhqwV.jpg
       DECIMAL       HEXADECIMAL     DESCRIPTION
       --------------------------------------------------------------------------------
       0             0x0             JPEG image data, JFIF standard 1.01
       182           0xB6            Zip archive data, at least v1.0 to extract, ..., name: shakespeare.part001.rar
       ...
       1971177       0x1E13E9        End of Zip archive
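
For anyone without binwalk, the core idea can be approximated with a plain signature search. This is only a rough sketch: the function name `scan_signatures` is made up for illustration, and a bare byte search like this will produce false positives inside compressed data that a real tool would filter out.

```python
# Magic numbers from the JPEG and ZIP formats; the helper name is hypothetical.
SIGNATURES = {
    b"\xff\xd8\xff": "JPEG start-of-image",
    b"PK\x03\x04": "ZIP local file header",
    b"PK\x05\x06": "ZIP end of central directory",
}


def scan_signatures(data: bytes):
    """Return sorted (offset, description) pairs for every signature hit."""
    hits = []
    for magic, description in SIGNATURES.items():
        start = data.find(magic)
        while start != -1:
            hits.append((start, description))
            start = data.find(magic, start + 1)
    return sorted(hits)


# Synthetic stand-in for a polyglot file: JPEG header, filler, then ZIP records.
sample = (
    b"\xff\xd8\xff\xe0" + b"\x00" * 10
    + b"PK\x03\x04" + b"data"
    + b"PK\x05\x06" + b"\x00" * 18
)
for offset, description in scan_signatures(sample):
    print(f"{offset:<8} 0x{offset:<8X} {description}")
```

To try it on the real file, read the JPEG with `open(path, "rb").read()` and pass the bytes to `scan_signatures`.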

------
tyingq
This is also somewhat interesting...

_"It survives all of twitters scaling, compression and thumbnailing. /how/
is left as an exercise to the reader :P"_

~~~
stupidbird
Guessing Twitter doesn't scale/thumbnail/compress already tiny images... which
is probably an oversight in this case because the file size is
disproportionately large.

~~~
martincmartin
No, the rars are in the image metadata, not in the actual pixels at all.

~~~
stupidbird
They still contribute to filesize.

~~~
daveFNbuck
They don't address the metadata in compression, which is probably what the bug
report [1] was about. This thumbnail is over 2MB. With the metadata removed,
it's only about 2KB.

[1]
[https://twitter.com/David3141593/status/1057042727945322496](https://twitter.com/David3141593/status/1057042727945322496)
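
To see how much of a JPEG is metadata rather than pixel data, one option is to walk the segments before the scan data and add up the APPn and COM segments. A minimal sketch, assuming a well-formed file; `metadata_bytes` is a made-up helper name, and it counts every APPn segment (including the small JFIF header) as metadata.

```python
import struct


def metadata_bytes(jpeg: bytes) -> int:
    """Sum the sizes of APPn and COM segments that precede the scan data."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    pos, total = 2, 0
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = jpeg[pos + 1]
        if marker == 0xDA:  # SOS: compressed pixel data follows, stop here
            break
        # The 2-byte big-endian length covers itself plus the payload.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        if 0xE0 <= marker <= 0xEF or marker == 0xFE:  # APP0..APP15 or COM
            total += 2 + length  # marker bytes plus length-prefixed payload
        pos += 2 + length
    return total
```

Comparing `metadata_bytes(data)` with `len(data)` gives a rough idea of how much of the file an image pipeline could drop without touching the pixels.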

------
AdmiralAsshat
This is hardly new. I remember collecting ebooks on 4chan circa 2005 via this
method. The fact that it was a jpg allowed you to post it on the imageboard
with the zip/RAR file therein containing a txt/doc/rtf of the ebook.

~~~
DavidBuchanan
And yet, nobody has done the same for Twitter until now.

The difference is that twitter applies a series of operations to all uploaded
images, stripping EXIF data, recompressing, etc., which would normally be
difficult to work around.

~~~
awirth
Did people do this back in the day? 4chan used to be totally fine with just
uploading a jpeg concatenated with a zip, but I haven't seen this ICC profile
trick before today.

~~~
palunon
Well, 4chan served you the original file, so you didn't need to evade image
processing at all

------
gm3dmo
I do extend him, sir, within himself; Crush him together rather than unfold

~~~
Symbiote
Well found!

[https://www.opensourceshakespeare.org/views/plays/play_view....](https://www.opensourceshakespeare.org/views/plays/play_view.php?WorkID=cymbeline&Act=1&Scene=1&Scope=scene&LineHighlight=30#30)

------
tyingq
I tried doing something similar with an mp3 first, and that worked. You can't
upload mp3 files to twitter, but I have one here:
[https://instaud.io/2RLM](https://instaud.io/2RLM) Play it on the page, and
it's an mp3, download it via the buttons, and it's also a valid zip file with
a jpg in it.

So then I tried the same thing with MP4 video on twitter. Twitter didn't like
that at all :) [https://imgur.com/a/O0QFZC9](https://imgur.com/a/O0QFZC9)

------
willnewman
Oh I did this too a few years ago. Except I used gifs instead of jpegs, and
then wrote a Ruby wrapper for backing up files to Flickr. Blog post about it
here: [https://namwen.svbtle.com/hoardr](https://namwen.svbtle.com/hoardr)

~~~
ricardobeat
Did the same a few months before with PNGs!
[https://github.com/ricardobeat/filr](https://github.com/ricardobeat/filr)

------
catpolice
Anybody have any insight into what kind of trickery goes into this? I know a
decent amount about both of the compression algorithms, but I've forgotten the
structure of the file formats so I can't tell if this required any really
funny business.

~~~
comboy

        cat file.jpg file.zip > magic.jpg
    

that's all there is to it (tested under linux, file compressed and
decompressed with zip command)

~~~
Retr0spectrum
That's not all there is to it. Did you try uploading the resulting file to
twitter?

~~~
comboy
I did not. This is just a response to the parent comment not to the post
itself.

(Although if they do something with this file and you still are able to do
this trick that would seem even more .. interesting on their side than leaving
images below certain size untouched)

------
JayXon
If you are interested in this sort of thing, make sure to check out Funky File
Formats by Ange Albertini
[https://events.ccc.de/congress/2014/Fahrplan/events/5930.htm...](https://events.ccc.de/congress/2014/Fahrplan/events/5930.html)

~~~
theoh
Something else interesting from a few years ago: a compression algorithm
that's bijective.
[http://www3.sympatico.ca/mt0000/bicom/bicom.html](http://www3.sympatico.ca/mt0000/bicom/bicom.html)

It's similar in the sense that there's more than one possible operation on any
file (decompress it, or compress it), and different in that the file type
doesn't change.

------
barking
Very thought provoking. The file size is 1.93 MB. I wonder what dimensions a
disk just big enough to store that much would have. Not long ago I read of a
suggestion to archive information as DNA (for its durability iirc). But in
terms of physical space required to store stuff, is DNA more or less efficient
than today's disks?

~~~
avian
There are 512 GB micro SD cards on the market today. So at that density
roughly:

2 MB / 512 GB * 100 mm2 ~= 400 um2

400 square micrometers is the area of a square with 20 micrometer sides.
That is roughly a tenth of the cross-section of a typical human hair (assuming
a diameter of about 70 micrometers).
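
The estimate can be reproduced in a few lines. This is a back-of-the-envelope sketch: the 100 mm2 card area is the round figure used above, not a measured value, and decimal megabytes and gigabytes are assumed.

```python
import math

payload_bytes = 2e6      # ~2 MB file, rounded as in the estimate above
card_bytes = 512e9       # 512 GB micro SD card
card_area_mm2 = 100.0    # assumed usable card area (round figure)

# Scale the card area by the fraction of its capacity the file would occupy.
area_mm2 = payload_bytes / card_bytes * card_area_mm2
area_um2 = area_mm2 * 1e6          # 1 mm^2 = 10^6 um^2
side_um = math.sqrt(area_um2)      # side of a square with that area

print(f"area ~ {area_um2:.0f} um^2, square side ~ {side_um:.1f} um")
```

The exact figure comes out slightly under 400 um2, which is where the 20 micrometer side comes from.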

~~~
barking
Thanks, I like that comparison. Now I wish I could think of a way to mesh this
with the infinite monkey theorem!

~~~
sonofgod
So Borges' Library of Babel -- every possible 3200 character book from a
25-character alphabet; essentially the 'space' equivalent of the infinite
monkey theorem's 'time' domain -- is sufficiently large that if we took this
universe and shrunk it down to the Planck length (the smallest length there
is) and filled another universe with these femtouniverses, and then scaled
that one down likewise, and repeated this process EIGHTY times, we'd be able
to fit all the books in.

And Hamlet would be split amongst dozens of these books.

25^3200 is a large number.

------
alanwong
Can the technique be used to bypass content censorship (e.g. in China)? Can it
be made resistant to detection?

~~~
woodruffw
> Can the technique be used to bypass content censorship (e.g. in China)?

Steganography can be used to hide data in clever ways, but it isn't a
substitute for encryption -- anybody who knows whatever trick you used can
extract the payload. You could always encrypt the payload, but all you're
doing is giving your adversary another (trivial) hoop to jump through when you
could've just encrypted the message to begin with.

> Can it be made resistant to detection?

Yes and no. There are some clever steganographic techniques that take
advantage of PNG and JPEG implementation details to foil basic entropy checks,
but anybody who knows the algorithm can trivially extract the payload. In
other words, it's security by obscurity, not by any sort of strong
cryptographic property.

~~~
waterhouse
> You could always encrypt the payload, but all you're doing is giving your
> adversary another (trivial) hoop to jump through when you could've just
> encrypted the message to begin with.

Wouldn't "an encrypted message that no one is sure you sent in the first
place" sometimes be more useful than "an encrypted message that any
eavesdropper knows you sent" in oppressive-surveillance-state scenarios? (It
seems that, if you find some subset of bits in a JPEG/PNG that normally have
random distribution and don't affect the image that much, putting an encrypted
message into those bits might be indistinguishable from a "completely normal"
image even to a well-informed attacker.)

~~~
0-_-0
That's true, the least significant bits of a png encoding of a photograph
could be effectively random (if the photo had high noise) and could be
replaced with random looking data.

------
candiodari
For those who don't know the trick: ZIP files have their index at the end of
the file. So you can append a zip file to anything else and it will still be
unzippable.

This was done because it allows adding files to a zip archive without
rewriting the whole file: you just start writing where the index starts
and add an updated index at the end. Please note that over the years there
have been multiple methods of doing this, including partial indexes which
don't even rewrite the index.

If the file you're adding things to is also tolerant of wrong file sizes and
extra data at the end (like JPG and many others), you can just:

    cat someJPG.jpg someZIP.zip > wth.jpg
    feh wth.jpg                         # shows the image
    mv wth.jpg wth.zip; unzip wth.zip   # works
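
The same concatenation trick can be checked without touching the filesystem. A minimal sketch, assuming a placeholder in place of a real JPEG (the `fake_jpeg` bytes are not a viewable image) and a made-up helper name `make_polyglot`:

```python
import io
import zipfile


def make_polyglot(image_bytes: bytes, zip_bytes: bytes) -> bytes:
    """Equivalent of `cat image zip > out`: plain concatenation."""
    return image_bytes + zip_bytes


# Stand-in for a real JPEG: SOI marker, filler, EOI marker.
fake_jpeg = b"\xff\xd8" + b"\x00" * 64 + b"\xff\xd9"

# Build a small ZIP archive in memory.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("sonnet18.txt", "Shall I compare thee to a summer's day?")
zip_bytes = buf.getvalue()

polyglot = make_polyglot(fake_jpeg, zip_bytes)

# zipfile finds the end-of-central-directory record by searching from the end
# of the file, so the leading image bytes do not stop it from reading members.
with zipfile.ZipFile(io.BytesIO(polyglot)) as zf:
    names = zf.namelist()
    text = zf.read("sonnet18.txt").decode()

print(names, text)
```

Note that this only demonstrates the basic append trick; it says nothing about surviving a site that reprocesses uploaded images.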

~~~
AnabeeKnox
It's astonishing that it's as simple as CATting a zip file to the end of a
jpg. I feel there are consequences here for any website that accepts image
uploads.

~~~
MaxBarraclough
> I feel there are consequences here for any website that accepts image
> uploads

Steganography can be done even without file-format hacks; all that's special
about this hack is its simplicity.

It could easily be defeated -- I'm sure Twitter would have no trouble
sanitizing uploaded image files if they wanted to.

------
rgrieselhuber
Would be interesting if Renaissance paintings were actually similar
repositories.

~~~
emiliobumachar
Quite a what-if. "The Library of Alexandria was not lost, merely hidden"

~~~
Avamander
That would be a great scifi plot.

------
judge2020
There was a similar technique used in the TV show Mr. Robot, season 3 episode
9, to hide some secret information (trying not to spoil anything). Ryan
Kazanciyan, the Technical Consultant, wrote a blog piece on this technique.

[https://link.medium.com/x7YFdksWrR](https://link.medium.com/x7YFdksWrR)

------
nan0
Dr. Mike Pound from Computerphile did a video on this: "Secrets Hidden in
Images (Steganography)" [Youtube video]
[https://youtu.be/TWEXCYQKyDc](https://youtu.be/TWEXCYQKyDc)

~~~
chaoticmass
Good video, but what was done here is not Steganography. No ZIP data was
embedded into the pixels of the image.

------
thomastjeffery
Is there any reason the zip contains .rar files?

~~~
bigdubs
To get around JPEG block size limits.

~~~
fireattack
Is it possible to have split files with zip?

~~~
romed
Yes, it is. I feel like this hack is a wry nod to the old warez scene where
multipart archives within archives were, for some reason, de rigueur.

~~~
oorza
No way to resume BBS and later FTP uploads back in the day.

~~~
PhasmaFelis
I thought it was assumed that the files, once downloaded, would be distributed
locally on floppies. The sub-archives were usually 1.44MB to fit on a 3.5".

When did resumable BBS download protocols become a thing? ZMODEM at least was
widely used by the early '90s.

------
dexterdog
I dealt with this problem on a photo site 15 years ago where people were
embedding RAR pieces in their JPGs to use us as a distribution repository. The
bandwidth increase was a giveaway so I just wrote an incoming processor that
re-compressed everything that had more than X bytes per pixel. I can't believe
Twitter doesn't have similar logic given its scale.

~~~
Sargos
Twitter does re-compress everything. This method survives re-compression.

~~~
dexterdog
Then it's not recompression. Recompression is pulling (let's say) a JPG into
memory so it becomes a bitmap and writing it back out to a JPG. If you want to
keep metadata you only keep a specific set and you cap the value lengths.
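
A bytes-per-pixel check along those lines might look roughly like this. It is a sketch under assumptions: the helper names are hypothetical, the 2.0 threshold is an arbitrary stand-in for "X", and the parser expects a well-formed JPEG with its frame header before the scan data.

```python
import struct

# SOF markers that carry frame dimensions (baseline, progressive, and others).
SOF_MARKERS = {0xC0, 0xC1, 0xC2, 0xC3, 0xC5, 0xC6, 0xC7,
               0xC9, 0xCA, 0xCB, 0xCD, 0xCE, 0xCF}


def jpeg_dimensions(jpeg: bytes):
    """Return (width, height) from the first SOF segment."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        marker = jpeg[pos + 1]
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        if marker in SOF_MARKERS:
            # Payload layout: precision (1 byte), height (2), width (2), ...
            height, width = struct.unpack(">HH", jpeg[pos + 5:pos + 9])
            return width, height
        pos += 2 + length
    raise ValueError("no SOF segment found")


def looks_oversized(jpeg: bytes, max_bytes_per_pixel: float = 2.0) -> bool:
    """Flag files whose size is out of proportion to their pixel count."""
    width, height = jpeg_dimensions(jpeg)
    pixels = width * height
    if pixels == 0:
        raise ValueError("frame header reports zero pixels")
    return len(jpeg) / pixels > max_bytes_per_pixel
```

Files that trip the check would then be decoded to a bitmap and re-encoded, which is the step that actually discards any embedded payload.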

------
enimodas
Similarly, here is one with music. Winamp necessary.
[https://a.doko.moe/uigihv.jpg](https://a.doko.moe/uigihv.jpg)

------
JetSpiegel
Great What If question: what are the odds of taking a picture of a Shakespeare
play and getting that play's text in the resulting JPG?

------
expertentipp
Do this for mp3.

~~~
tyingq
Probably easier. The MP3 standard didn't include any standard for tags, so the
tag standards evolved separately. Basically, you can append any random crap
you want at the end of an mp3. Since a zip file has its headers at the
bottom, it wouldn't be hard to do. I don't know if popular mp3 sites
scrub/transform mp3 files in the way image sites do to images though.

Edit: Tried it, and got it to work. The zip command on Linux has a -A option
that will fix the offset headers after you cat an mp3 and a zip file together.
Just "cat file.mp3 file.zip > newfile.mp3;zip -A newfile.mp3" and you're done.

I did notice that some mp3 file upload sites reprocess the mp3 file and strip
out the extra bits. Some don't though.

Here's a Bach mp3 that you can play online:
[https://instaud.io/2RLM](https://instaud.io/2RLM)

But, if you download it, via the buttons on the page, you'll see it's also a
valid zip file containing a jpg meme of Bach.

------
germs12
I'm pretty sure Ender's Game has this in it. It was how one of the characters
snuck info out.

------
rootsudo
Simple Steganography. I wrote a similar article to use Amazon Photo storage as
a free way to store files if anyone is curious how this works:

[https://medium.com/@eirurueta/free-unlimited-cloud-storage-o...](https://medium.com/@eirurueta/free-unlimited-cloud-storage-on-amazon-w-this-one-weird-trick-6f50d35b1159)

~~~
comboy
While technically it would probably fit the definition, usually what's meant
by steganography regarding images is hiding the data within the image itself
(using differences in pixel values that are not easy to spot). As far as I
understand it, this just makes use of the file structure. It's beyond me
though how it survives twitter thumbnailer. I would assume that one of its
purposes is to strip any unnecessary metadata from the image, making it as
small as possible.

~~~
logfromblammo
The easiest implementation would be to use the 2 least significant bits in the
blue channel, and the least significant bit of the red channel, to encode your
data payload. Just sticking the data in the metadata portions of an image file
seems like hiding behind the only bush in an otherwise featureless field.
("Reginald Maudlin, will you stand up, please?")
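
That bit layout can be sketched on a plain list of (r, g, b) tuples. This is an illustration only: `embed` and `extract` are made-up names, the bit order is an arbitrary choice, and the result only survives lossless formats such as PNG, since JPEG recompression rewrites the low bits.

```python
def embed(pixels, payload: bytes):
    """Hide payload bits: 1 bit in red's LSB, 2 bits in blue's two LSBs."""
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    bits += [0] * (-len(bits) % 3)  # pad so every pixel carries exactly 3 bits
    if len(bits) // 3 > len(pixels):
        raise ValueError("payload too large for this image")
    out = list(pixels)
    for i in range(0, len(bits), 3):
        r, g, b = out[i // 3]
        r = (r & ~1) | bits[i]
        b = (b & ~3) | (bits[i + 1] << 1) | bits[i + 2]
        out[i // 3] = (r, g, b)  # green is left untouched
    return out


def extract(pixels, n_bytes: int) -> bytes:
    """Read back n_bytes using the same bit layout as embed()."""
    bits = []
    for r, g, b in pixels:
        bits += [r & 1, (b >> 1) & 1, b & 1]
    bits = bits[: n_bytes * 8]
    return bytes(
        sum(bit << (7 - j) for j, bit in enumerate(bits[k:k + 8]))
        for k in range(0, len(bits), 8)
    )


cover = [(200, 100, 50)] * 20      # flat placeholder image data
stego = embed(cover, b"Bard")
recovered = extract(stego, 4)
```

The receiver has to know the payload length (here passed as `n_bytes`); a real scheme would need to carry it in-band, and would usually encrypt the payload first so the altered bits look like noise.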

------
ainiriand
I f***ing love technology...

------
angel_j
are the complete works actually encoded in the pixels?

------
llIIllIIllIIl
RarJPEG? This trick was used on anonymous forums for years.

~~~
mikorym
Is there a recent repo?

------
spacelama
Love it

------
DerekL
Title should say “JPEG”, not “JPG”. “JPG” is just a file extension, not the
name of the compression method.

~~~
jjoonathan
I know that the real reason is a requirement for length 3 combined with a
slight preference for vowel elision, but I just realized that if one were so
inclined it could be interpreted as a snub.

Joint Photographic Experts Group -> Joint Photographic Group

No experts to be found here! :P

~~~
laumars
I never really understood why people still adhere to the 3 character file
extension limit when that is a throwback to the 8.3 filenames of FAT on DOS
and pre-Windows 95 systems (and some very old mainframes which shouldn't be
internet connected anyway). Yet some
people still favor the 3 character extension despite it looking objectively
uglier (as a dyslexic I do find them harder to read) and dropping a solitary
vowel in a file extension saves you nothing in terms of development time /
file size / etc.

So I honestly don't get why people still use them.

Extensions that particularly annoy me are:

    
    
        yml instead of yaml
    

Why do people do this when even the earliest YAML spec is much newer than the
last of the systems that had a 3 character extension limit?

    
    
        jpg instead of jpeg
    

Nearly all of the .jpg's in the last 10 years would be too large to load on
any systems that couldn't handle a 4 character extension anyway. So backwards
compatibility isn't even an argument here.

    
    
        htm instead of html
    

Similar argument as JPG except this time it's more to do with the HTML
specification and browsers capable of rendering them.

~~~
wild_preference
> objectively uglier

Not sure why you'd say "objectively" when it's clearly not, the hint being
the word "ugly."

I can certainly appreciate the limitation of "extensions will always be three
characters". In fact, I wasn't exposed to extensions habitually longer than
that until I did iOS dev. "Main.storyboard" "Model.xcdatamodeld" So if we're
going to talk about ugly, I'd start there. ;)

~~~
laumars
Sorry, as another poster commented, I did mean "subjectively" (doh!)

xcdatamodeld is definitely ugly (though main.storyboard is ok in my personal
opinion) and I'm sure there will be other examples where file extensions have
been taken too far. I think if one wants to use long extensions then the
extension needs to at least be readable at a glance (eg xcdatamodeld is ugly
to read - even if you switched it the other way around, xcdatamodeld.model
would still be unpleasant). So I'd argue that xcdatamodeld is more of an
example of a poor naming convention than a problem with longer extensions (I
say this with no experience of iOS development, however I see the same problem
quite as acutely in the other languages and platforms I've developed on).

It should also be noted that my argument was really more focused on people who
shorten already short 4 character acronyms - just for the purpose of making it
3 letters (eg the yaml and html examples I gave).

------
mygo
Could the future of open source and P2P be tweeting apps to one another?

