
A file that’s both an acceptable HTML page and a JPEG (2012) - ScottWRobinson
http://lcamtuf.coredump.cx/squirrel/
======
EMRZ
Last week I found this [1] thread by Notch. He is working on an entry for
js13k [2] (a game jam for HTML5 games restricted to 13 KB).

He bundled the whole source code and binary data of the game/tech demo into
an image so he could use the browser's built-in image decompression, avoiding
custom (de)compression code and saving KBs this way.

Then he just eval()s the needed parts for each thing.
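A minimal sketch of the idea, assuming a plain 8-bit grayscale PNG with one payload byte per pixel (Notch's actual packing isn't shown in the thread). `pack_bytes_as_png` and `unpack_png_bytes` are hypothetical helpers built only on Python's `zlib` and `struct`. In the browser, the unpacking half would instead be done by the built-in image decoder, e.g. by drawing the image onto a canvas and reading the pixels back with `getImageData` before handing the text to `eval()`.

```python
import struct
import zlib


def _chunk(kind: bytes, data: bytes) -> bytes:
    """Build one PNG chunk: length, type, data, then CRC32 over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def pack_bytes_as_png(payload: bytes, width: int = 256) -> bytes:
    """Hypothetical helper: store arbitrary bytes as 8-bit grayscale pixels."""
    height = -(-len(payload) // width)              # ceiling division
    padded = payload.ljust(width * height, b"\0")   # zero-pad the last row
    raw = b"".join(
        b"\0" + padded[y * width:(y + 1) * width]   # filter type 0 (None) per row
        for y in range(height)
    )
    # width, height, bit depth 8, colour type 0 (grayscale), no interlace
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    return (b"\x89PNG\r\n\x1a\n"
            + _chunk(b"IHDR", ihdr)
            + _chunk(b"IDAT", zlib.compress(raw, 9))  # DEFLATE does the real work
            + _chunk(b"IEND", b""))


def unpack_png_bytes(png: bytes, length: int) -> bytes:
    """Hypothetical helper: decode PNGs written above (filter type 0 only)."""
    pos, idat, width = 8, b"", 0                    # skip the 8-byte signature
    while pos < len(png):
        (size,) = struct.unpack(">I", png[pos:pos + 4])
        kind = png[pos + 4:pos + 8]
        data = png[pos + 8:pos + 8 + size]
        if kind == b"IHDR":
            width = struct.unpack(">I", data[:4])[0]
        elif kind == b"IDAT":
            idat += data
        pos += 12 + size                            # length + type + data + CRC
    raw = zlib.decompress(idat)
    rows = [raw[i + 1:i + 1 + width] for i in range(0, len(raw), width + 1)]
    return b"".join(rows)[:length]                  # drop the row padding
```

The decoder only understands what this encoder writes; a real PNG decoder also has to undo the other filter types, which is exactly the part the browser provides for free.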

I find this extremely clever. Game jam limitations, especially size limits,
lead to clever programming tricks that are interesting both to code and to
look at.

[1]
[https://twitter.com/notch/status/1035811278520872960](https://twitter.com/notch/status/1035811278520872960)

[2] [http://js13kgames.com/](http://js13kgames.com/)

~~~
jfries
While definitely cool, I wonder how effective this is. Code/random binary data
should have different characteristics than image data.

~~~
boomlinde
DEFLATE, which is used by PNG, is a general purpose LZ77-like compression
algorithm suitable for all kinds of repetitive data.
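A quick way to check jfries's concern, using Python's `zlib` (the same zlib/DEFLATE stream format PNG stores in its IDAT chunks): repetitive source text compresses very well, while random bytes don't compress at all, whatever container they end up in. The sample inputs are made up for illustration.

```python
import os
import zlib


def ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, 9)) / len(data)


js_like = b"function f(x){return x*x;}\n" * 200   # repetitive, code-like text
random_bytes = os.urandom(len(js_like))            # incompressible noise

print(f"repetitive code: {ratio(js_like):.3f}")
print(f"random bytes:    {ratio(random_bytes):.3f}")
```

Real game code is less repetitive than this toy string, so the actual savings would sit somewhere in between.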

~~~
tialaramex
PNG filters each horizontal line of the image data before running it through
Deflate, with a pre-processing step.

Each of the possible filters is very simple, but the difference between a
great PNG exporter, a mediocre one, and a trash fire is the use of a heuristic
to decide which of the pre-processing filters should be applied to each line
of output. libpng, the free implementation, includes a heuristic that does a
fairly OK job. If you don't implement a heuristic at all and use no filtering,
the result is enormous PNG files, as seen in turn-of-the-century Adobe
Photoshop. So, no, a general purpose compression algorithm isn't very good for
image data on its own.
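A sketch of the per-line filtering described above, assuming 8-bit samples (`bpp` is bytes per complete pixel). The five filter types (None, Sub, Up, Average, Paeth) follow the PNG specification; `choose_filter` uses the "minimum sum of absolute differences" heuristic that the specification suggests, which is not necessarily libpng's exact logic.

```python
def paeth(a: int, b: int, c: int) -> int:
    """Paeth predictor: pick whichever neighbour is closest to a + b - c."""
    p = a + b - c
    pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
    if pa <= pb and pa <= pc:
        return a
    if pb <= pc:
        return b
    return c


def filter_row(ftype: int, row: bytes, prev: bytes, bpp: int) -> bytes:
    """Apply PNG filter type 0-4 (None, Sub, Up, Average, Paeth) to one scanline.

    `prev` is the unfiltered previous scanline (all zeros for the first row).
    """
    out = bytearray(len(row))
    for i, x in enumerate(row):
        a = row[i - bpp] if i >= bpp else 0      # byte to the left
        b = prev[i]                              # byte above
        c = prev[i - bpp] if i >= bpp else 0     # byte above-left
        pred = (0, a, b, (a + b) // 2, paeth(a, b, c))[ftype]
        out[i] = (x - pred) & 0xFF               # differences wrap modulo 256
    return bytes(out)


def choose_filter(row: bytes, prev: bytes, bpp: int = 1) -> tuple[int, bytes]:
    """Try all five filters and keep the one whose output, read as signed
    bytes, has the smallest sum of absolute values."""
    def cost(data: bytes) -> int:
        return sum(min(v, 256 - v) for v in data)

    candidates = [(t, filter_row(t, row, prev, bpp)) for t in range(5)]
    return min(candidates, key=lambda tc: cost(tc[1]))
```

Ties go to the lowest filter type because `min` keeps the first minimum: a smooth horizontal ramp picks Sub, and a row identical to the one above picks Up. Skipping this step and always using None is what produces the bloated files mentioned above.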

(Choose for yourself whether you think Adobe wanted to discourage use of a
popular free image format in favour of licensed formats for which it held
relevant IP, or their development team are just incompetent morons, or both).

~~~
boomlinde
_> PNG filters each horizontal line of the image data before running it
through Deflate, with a pre-processing step._

Yes, as a means of making the data fed into the compressor repetitive enough
to benefit from the compression algorithm.

_> So, no, a general purpose compression algorithm isn't very good for image
data on its own._

That's not what I said.

------
bitofhope
A similarly interesting hybrid is K Lange's résumé, which is both a PDF and an
ISO image of a bootable x86 operating system:

[http://r.dakko.us/](http://r.dakko.us/)

Also of note, klange's unofficial résumé in C
[https://gist.github.com/klange/4042963](https://gist.github.com/klange/4042963)

------
vandot
This looks like the original thread:
[https://news.ycombinator.com/item?id=4209052](https://news.ycombinator.com/item?id=4209052)

~~~
jwilk
Also discussed in 2016:

[https://news.ycombinator.com/item?id=12262470](https://news.ycombinator.com/item?id=12262470)

------
kyo3
Found the real gold:
[http://lcamtuf.coredump.cx/prep/](http://lcamtuf.coredump.cx/prep/)

~~~
wumms
This links to a detailed article titled "Doomsday planning for less crazy
folk". Should I try loading it as JS? :)

~~~
xaedes
No, nothing special tech-wise, but very sane content about disaster planning!

------
notadog
EDIT: I realize that the subject of this thread is not a polyglot, but I'm
leaving this comment up in case it interests someone. :/

In case anyone's interested in what these are called, they are called polyglots.

> In computing, a polyglot is a computer program or script written in a valid
> form of multiple programming languages, which performs the same operations
> or output independent of the programming language used to compile or
> interpret it.

[1]
[https://en.wikipedia.org/wiki/Polyglot_(computing)](https://en.wikipedia.org/wiki/Polyglot_\(computing\))

~~~
13of40
I always thought it was called a chimera.

~~~
Drup
If I remember Ange Albertini's nomenclature correctly, chimeras are a
particular type of polyglot in which a single piece of data is disguised as
several different file formats. For example, consider a file that is both a
JPEG of a picture and a PDF containing the same picture, where the picture
data is present only once in the file.

------
blattimwind
I think the most irritating thing about this is right-clicking on the image
and selecting "View image", just to be confronted with the very same page as
before, since the browser continues to interpret it as HTML.

------
wimagguc
Nice one! So is the trick that:

1\. JPEG allows text comments

2\. Browsers don't enforce correct HTML (alternatively, is
"����JFIF,,��r<html>" correct HTML?)

3\. And EXIF tools mostly work based on the header & statistical analysis?

Or am I completely wrong here?
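On point 1: JPEG's COM (0xFFFE) marker segment can hold arbitrary bytes, and that is where this file keeps its HTML, as cryptonaut's exiftool output further down shows. On point 2: the JFIF header bytes aren't valid HTML, but HTML5 parsing rules define how to recover from such junk; the page hides it with `body { visibility: hidden; }` and ends its markup with an unclosed `<!--` so the remaining image data is swallowed as a comment. A minimal sketch of building such a file, assuming the input JPEG starts with SOI, optionally followed by an APP0/JFIF segment; `insert_html_comment` is a hypothetical helper, not lcamtuf's tool.

```python
import struct


def insert_html_comment(jpeg: bytes, html: str) -> bytes:
    """Return a copy of `jpeg` with a COM (0xFFFE) segment holding `html`.

    The segment goes right after the APP0/JFIF segment if there is one,
    otherwise right after the SOI marker, so the HTML sits near the top of
    the file where a lenient HTML parser will find it.
    """
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    if jpeg[2:4] == b"\xff\xe0":                 # APP0 (JFIF) present: skip it
        (seg_len,) = struct.unpack(">H", jpeg[4:6])
        pos = 4 + seg_len                        # length field counts itself
    payload = html.encode("utf-8")
    if len(payload) > 0xFFFF - 2:
        raise ValueError("comment too long for one COM segment")
    com = b"\xff\xfe" + struct.pack(">H", len(payload) + 2) + payload
    return jpeg[:pos] + com + jpeg[pos:]
```

The 16-bit length field includes its own two bytes, so a single COM segment holds at most 65533 bytes of HTML.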

------
peterburkimsher
It's a pity that phones recompress the JPG when doing Save To Camera Roll.

I'd like to build a system for distributing mini-apps as images, bypassing the
App Store.

A JavaScript bookmarklet could act as a basic "bootloader" that provides an
"upload image" button. Then a binary representation of the HTML could be
loaded from the image data.

The problem I had was lossy recompression - colours bleed across edges, so the
binary data was corrupt after the first save. I'm still not sure how to work
around that problem.
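One possible mitigation, sketched under loud assumptions: instead of storing raw bytes in pixel values, store each bit as a full-contrast pixel (0 or 255) and decode by thresholding at 128, so moderate changes to pixel values don't flip bits. `encode_bits` and `decode_bits` are hypothetical helpers that only model independent per-pixel noise. Real recompression also blurs across edges and 8×8 blocks, so a practical scheme would probably need to spread each bit over a block of pixels and add error correction.

```python
def encode_bits(data: bytes) -> list[int]:
    """Map each bit to a pixel value of 0 or 255, most significant bit first."""
    return [255 if (byte >> (7 - i)) & 1 else 0 for byte in data for i in range(8)]


def decode_bits(pixels: list[int]) -> bytes:
    """Threshold each pixel at 128 and regroup every 8 bits into a byte."""
    out = bytearray()
    for start in range(0, len(pixels) - 7, 8):
        byte = 0
        for value in pixels[start:start + 8]:
            byte = (byte << 1) | (1 if value >= 128 else 0)
        out.append(byte)
    return bytes(out)
```

This costs 8 pixels per byte instead of 1, which is the usual trade: capacity in exchange for surviving a lossy channel.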

~~~
opencl
My phone left the JPG entirely intact when downloading it. I suppose you're
talking about iOS devices?

------
Drup
On that topic, I recently wrote a tool that takes a PDF file and an OCaml
bytecode file (OCaml can compile to both bytecode and native code) and smashes
them together into a file that is both a valid PDF and valid bytecode.

[https://github.com/Drup/bytepdf](https://github.com/Drup/bytepdf)

------
jaclaz
The best example/explanation of polyglots I've ever found is in this
"pocorgtfo" file [2015] (a 12 MB PDF, but not only a PDF):

[https://www.alchemistowl.org/pocorgtfo/](https://www.alchemistowl.org/pocorgtfo/)

[https://www.alchemistowl.org/pocorgtfo/pocorgtfo07.pdf](https://www.alchemistowl.org/pocorgtfo/pocorgtfo07.pdf)

6 Abusing file formats; or, Corkami, the Novella by Ange Albertini

And more examples/PoCs:

[https://code.google.com/archive/p/corkami/](https://code.google.com/archive/p/corkami/)

[https://github.com/corkami](https://github.com/corkami)

------
3chelon
The first thing to cross my mind when I saw this was XSS vulnerabilities.
Presumably there's nothing to stop this page running JS?

~~~
kenbellows
Assuming you can include JS in the file, I imagine it would only run if the
file was being parsed as HTML at the moment, not when parsed as a JPEG. That's
the main point here: this file can be interpreted in two ways, but the data in
it is treated very differently and has different effects depending on the way
it's currently interpreted.

------
tomglynch
Explanation please?

~~~
saagarjha
Took a look at the page source, and it's full of binary garbage. It looks to
me like a JPEG image that has been slightly tweaked to contain just enough
HTML to get a (lenient) browser to display it.

~~~
cryptonaut
It's a JPEG with the HTML written in the comment field:

    
    
      # exiftool -b -comment <(curl -s http://lcamtuf.coredump.cx/squirrel/)
      <html><body><style>body { visibility: hidden; } .n { visibility: visible; position: absolute; padding: 0 1ex 0 1ex; margin: 0; top: 0; left: 0; } h1 { margin-top: 0.4ex; margin-bottom: 0.8ex; }</style><div class=n><h1><i>Hello, squirrel fans!</i></h1>This is an embedded landing page for an image. You can link to this URL and get the HTML document you are viewing right now (soon to include essential squirrel facts); or embed the exact same URL as an image on your own squirrel-themed page:<p><xmp><a href="http://lcamtuf.coredump.cx/squirrel/">Click here!</a></xmp><xmp><img src="http://lcamtuf.coredump.cx/squirrel/"></xmp><p>No server-side hacks involved - the magic happens in your browser. Let's try embedding the current page as an image right now (INCEPTION!):<p><img src="#" style="border: 1px solid crimson"><p>Pretty radical, eh? Send money to: lcamtuf@coredump.cx<!--
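
For reference, getting the comment out needs no special tooling: walk the JPEG marker segments from SOI and collect any COM (0xFFFE) payloads. A minimal sketch that roughly mirrors what `exiftool -b -comment` prints here; `jpeg_comments` is a hypothetical helper, not exiftool's implementation, and it ignores corner cases such as 0xFF fill bytes between markers.

```python
import struct


def jpeg_comments(data: bytes) -> list[bytes]:
    """Collect the payloads of all COM (0xFFFE) segments before the scan data.

    Walks marker segments from SOI and stops at SOS (0xFFDA), after which
    entropy-coded image data follows, or at EOI (0xFFD9).
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    comments, pos = [], 2
    while pos + 4 <= len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        if marker in (0xDA, 0xD9):               # SOS or EOI: stop walking
            break
        (seg_len,) = struct.unpack(">H", data[pos + 2:pos + 4])
        if marker == 0xFE:                       # COM segment
            comments.append(data[pos + 4:pos + 2 + seg_len])
        pos += 2 + seg_len                       # marker + segment (incl. length)
    return comments
```

Usage, assuming the page was saved under the hypothetical name `squirrel.jpg`: `print(jpeg_comments(pathlib.Path("squirrel.jpg").read_bytes())[0].decode("utf-8", "replace"))`.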

~~~
mikeryan
It also uses a (non terminated - AFAICT) HTML comment to hide the rest of the
JPEG data.

~~~
saagarjha
> non terminated - AFAICT

Yes, it shows up as all green in Safari. And:

    
    
        $ { curl -s http://lcamtuf.coredump.cx/squirrel/?1 | grep -- '-->' ; } || echo "Not found"
        Not found

------
niceoutput
P01 also used that technique in 2015. You can see it here:
[http://www.p01.org/jsconf_asia_2015/](http://www.p01.org/jsconf_asia_2015/)

------
VeXocide
It causes a 500 Internal Server Error when passed to the W3C validator:
[https://validator.w3.org/check?uri=http%3A%2F%2Flcamtuf.core...](https://validator.w3.org/check?uri=http%3A%2F%2Flcamtuf.coredump.cx%2Fsquirrel%2F&charset=%28detect+automatically%29)

~~~
sebazzz
Might be a vulnerability, thinks the hacker in me.

------
amelius
But what does the Unix "file" command say?

~~~
duckerude

      $ curl -s http://lcamtuf.coredump.cx/squirrel/ | file -
      /dev/stdin: JPEG image data, JFIF standard 1.01, resolution (DPI), density 300x300, segment length 16, comment: "<html><body><style>body { visibility: hidden; } .n { visibility: visible; position: absolute; ", baseline, precision 8, 1000x667, frames 3
    

file(1) version 5.32-2ubuntu0.1.
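The "baseline, precision 8, 1000x667, frames 3" part of that output comes from the frame header (SOF0 for baseline JPEG), whose fields are sample precision, height, width and number of colour components; the last one is what this output labels "frames". A minimal sketch of reading those fields, assuming a well-formed file with the SOF segment before the first scan; `jpeg_frame_info` is hypothetical and illustrates the header layout, not how file(1)'s magic database works.

```python
import struct


def jpeg_frame_info(data: bytes) -> tuple[int, int, int, int]:
    """Return (precision, width, height, components) from the first SOFn segment."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    pos = 2
    while pos + 4 <= len(data) and data[pos] == 0xFF:
        marker = data[pos + 1]
        if marker == 0xDA:                       # SOS reached without a frame header
            break
        (seg_len,) = struct.unpack(">H", data[pos + 2:pos + 4])
        # SOF0..SOF15, excluding DHT (C4), JPG (C8) and DAC (CC)
        if 0xC0 <= marker <= 0xCF and marker not in (0xC4, 0xC8, 0xCC):
            # frame header layout: precision, height, width, component count
            precision, height, width, components = struct.unpack(
                ">BHHB", data[pos + 4:pos + 10])
            return precision, width, height, components
        pos += 2 + seg_len
    raise ValueError("no SOF segment found before the scan data")
```

Note the header stores height before width, which is easy to get backwards.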

~~~
londons_explore
file knows...

~~~
ddalex
/dev/stdin: JPEG image data, JFIF standard 1.01, resolution (DPI), density
300x300, segment length 16, comment: "<html><body><style>body { visibility:
hidden; } .n { visibility: visible; position: absolute; ", baseline, precision
8, 1000x667, frames 3

so you get even more details...

------
iforgotpassword
It should be possible to make this a zip file too, since zip files are read
from the end according to the official spec.
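A sketch of why that works, using Python's standard `zipfile` module: zip readers locate the End of Central Directory (EOCD) record by scanning backwards from the end of the file, so arbitrary bytes (say, a JPEG/HTML polyglot) can precede the archive. `find_eocd`, `central_directory_offset` and `make_prefixed_zip` are hypothetical helpers. The offsets stored in the archive don't account for the prefix; Python's `zipfile` compensates for that, but other readers may warn about or reject unadjusted offsets.

```python
import io
import struct
import zipfile

EOCD_SIG = b"PK\x05\x06"  # End of Central Directory record signature


def find_eocd(data: bytes) -> int:
    """Scan backwards for the EOCD record, as zip readers do.

    The record is at least 22 bytes and may be followed by a comment of up
    to 65535 bytes, so only that tail region needs to be searched.
    """
    start = max(0, len(data) - 22 - 0xFFFF)
    pos = data.rfind(EOCD_SIG, start)
    if pos < 0:
        raise ValueError("no End of Central Directory record found")
    return pos


def central_directory_offset(data: bytes) -> int:
    """Offset of the central directory as recorded in the EOCD record."""
    pos = find_eocd(data)
    return struct.unpack("<I", data[pos + 16:pos + 20])[0]


def make_prefixed_zip(prefix: bytes, name: str, content: bytes) -> bytes:
    """Append a freshly built zip archive to arbitrary leading bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr(name, content)
    return prefix + buf.getvalue()
```

Usage: `zipfile.ZipFile(io.BytesIO(make_prefixed_zip(jpeg_bytes, "hello.txt", b"hi")))` opens fine even though `jpeg_bytes` (a placeholder) sits in front of the archive.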

------
ramshorns
It doesn't even use JavaScript.

------
bigiain
POC||GTFO issue 2 (from 2013) was a PDF and a bootable OS at the same time:

From section 8:

"A careful reader may have noticed that a bootable OS image was hidden in the
last issue of PoC||GTFO, as one of the files in its dual PDF/ZIP structure
(if you haven’t, download and extract it now!). This time, though, let’s hide
it in plain sight. You will find by running ‘qemu-system-i386 -fda
pocorgtfo02.pdf’ that the PDF file you are reading is also a bootable disk
image."

[https://greatscottgadgets.com/pocorgtfo/pocorgtfo02.pdf](https://greatscottgadgets.com/pocorgtfo/pocorgtfo02.pdf)

And issue 3:

"This file, pocorgtfo03.pdf, complies with the PDF, JPEG, and ZIP file
formats. When encrypted with AES in CBC mode with an IV of 5B F0 15 E2 04 8C
E3 D3 8C 3A 97 E7 8B 79 5B C1 and a key of “Manul Laphroaig!”, it becomes a
valid PNG file. Treated as single-channel raw audio, 16-bit signed little-
endian integer, at a sample rate of 22,050 Hz, it contains a 2400 baud AFSK
transmission."

~~~
na85
Any reading you can link to on how to accomplish stuff like this? I.e.,
valid PDFs that are also valid bootable images?

~~~
qu4z-2
They usually document their tricks in the file itself. For example, see
Chapter 8 in GP's link.
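Not a how-to, but a sketch of the two header constraints such a PDF/boot-disk file has to satisfy at once, assuming a BIOS-style floppy boot: the BIOS loads the first 512-byte sector and expects the bytes 0x55 0xAA at offsets 510-511, while common PDF readers accept the `%PDF-` header anywhere in the first 1024 bytes rather than only at offset 0 (a widely relied-upon leniency, not a guarantee of every parser). The checker `looks_like_pdf_and_boot_sector` is hypothetical and only tests those two conditions; the issues themselves explain the actual layouts.

```python
BOOT_SIGNATURE = b"\x55\xaa"   # last two bytes of a 512-byte x86 boot sector
PDF_MAGIC = b"%PDF-"


def looks_like_pdf_and_boot_sector(data: bytes) -> dict[str, bool]:
    """Check the two header requirements a PDF/boot-disk polyglot must meet."""
    return {
        # BIOS checks the signature at offsets 510-511 of the first sector
        "boot_signature": data[510:512] == BOOT_SIGNATURE,
        # lenient PDF readers search the first 1024 bytes for the header
        "pdf_header_in_first_kb": PDF_MAGIC in data[:1024],
    }
```

Passing both checks is necessary but nowhere near sufficient: the boot code still has to jump over the PDF bytes, and the PDF's object offsets still have to be right.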

------
byron_fast
Cool, but it's a chipmunk, not a squirrel. Just saying!

~~~
f-
Hey - I'm the author of that page. Your comment is a common misconception, but
the animal pictured is actually a golden-mantled ground squirrel. You can tell
because the stripe doesn't extend to the eye. Thank you for subscribing to
squirrel facts!

~~~
mistersquid
I can attest to the veracity of your claims. :-) [0]

[0]
[https://news.ycombinator.com/item?id=12264275](https://news.ycombinator.com/item?id=12264275)

------
notadog
If you think that this is interesting, you should check out Daeken's Magister:
[http://demoseen.com/windowpane/magister.png.html](http://demoseen.com/windowpane/magister.png.html)

It's a PNG that's interpreted as HTML and loads itself as compressed
JavaScript!

~~~
gammatrigono
[https://news.ycombinator.com/item?id=4209168](https://news.ycombinator.com/item?id=4209168)

So, literally copying previous top comments on old posts is now a popular
thing?

~~~
dang
That's indeed weird. Please don't do that, GP!

It seems to have been a one-off, so we won't ban the account.

