A file that's both an acceptable HTML page and a JPEG (view source on it) (coredump.cx)
536 points by mcfunley on July 6, 2012 | 128 comments



If you think that's cool, look at Daeken's Magister: http://demoseen.com/windowpane/magister.png.html

A PNG that's interpreted as HTML and loads itself as compressed JavaScript!


Note: Chrome or Firefox with WebGL required. I also wrote an article on how I got this down to 1kb, http://daeken.com/superpacking-js-demos , and just released a new demo based on the same techniques (747 bytes): http://demoseen.com/windowpane/nufl0wer.png.html


Holy Fuck.... that looked really cool as it brought my whole system to a dead stop. Took forever to recover enough to close the tab. :/


Really? I see no slowdown whatsoever. What are you running?


It didn't die this last time so I must have just had a perfect storm of activity right at that moment.


Yeah it locks up Chrome for me, pressing Ctrl+w closed the tab after about 10 seconds. Chrome 20 on Ubuntu 12.04 x86_64. Core 2 Duo with Intel graphics.


It did fine for me on Chrome 20, Arch 64bit. Also a dual core, but nVidia graphics. Maybe the intel graphics drivers are being weird?


works fine on Win7 32bit + opera 12


I started zooming out and my graphics driver crashed, which is quite a rare occurrence these days.


Works perfectly here, in both "in browser" as well as "totally blowing my mind" mode. Kudos, sir.



Matraka is seriously worth a watch. All 2d canvas, and it actually has music in a 1k. What's funny is that p01 is the reason I got into web demo development, and he's actually using the png technique I came up with. Makes me proud to be a hacker.

Edit: It also won a much-deserved first place in the DemoJS 1k compo.


Wait, is it using the image data as code to evaluate? My mind is too blown to appreciate this.

EDIT: Oh, okay. It's not a valid PNG. That would have been all sorts of incredible. Still great, though.


I wrote up a little explanation of how it works here https://gist.github.com/3039247


Very interesting, thanks!

By the way, the for loop seems to be irrelevant to the invocation, (1,eval) just returns eval. (1,console.log)("hi") looks like it should work, except it raises an error. (1,2)+3 returns 5, however, and (1,console.log) returns the log function.


it's a performance thing. i forgot the details, but if you change the `(1,eval)` to `eval`, it's all much slower. something with the scope of the code that's being eval'ed, if i recall correctly.


(1,eval)('2*2') evaluates in the global context, so it may be slower I would think. Here's a really long and insanely detailed post about this odd feature: http://perfectionkills.com/global-eval-what-are-the-options/


Trying to right click and save page as in Chrome crashes OS 10.7.4 for me. Pretty cool stuff.


WATCH IT. This file hard-froze my machine (osx/chrome) and I now have to rebuild my dev environment.


How does a hard freeze force you to rebuild your dev env?


Someone doesn't use dotfiles, I'm guessing.


it's god punishing you for F5'ing HN when you should be working.


totally displays binary on iOS Safari.


froze browser for a minute, alerted of unresponsive script and printed some binary


Here's the same thing done with a compiled executable using the padding bits in an ELF file.

http://cs.unm.edu/~eschulte/data/webpage.html

download webpage.html and it should run on any 64-bit linux machine as an executable printing out the same text shown on the web page. Here's the C file used to compile the original executable, nothing exciting...

http://cs.unm.edu/~eschulte/data/webpage.c


I'm from the Internet, and I can confirm that he promises that this file is not malicious.



But valid HTML5! Just kidding, 934 Errors, 73 warning(s) when validated as HTML5.


He never said valid, just acceptable ;-)


You can also use this trick to launch cross-site script attacks against sites that allow you to upload images.

Step 1: upload the "image" to the site. Let the site do whatever it does to ensure it has received a valid image. Nine validators out of ten will happily accept the file; the case that is likeliest to shoot you down is if the site modifies the image by cropping, resizing, or watermarking it.

Step 2: point your victim back to the uploaded "image" as though it's actually a page, and presto!, it's a page -- a page with malicious javascript in it.

Step 3: profit!


There used to be a vulnerability where you could combine jars and gifs to similar effect; gifs are read front to back and jars (well, zip archives) are read back to front, so all you needed to do was concatenate them, upload your gifar, embed an applet pointing to the gifar into a page you owned and get a person to visit. Pretty sure it was patched ages ago though :)


But almost any site will be sending the image along with a Content-type header, so your browser would still open it up as an image, not an HTML page with JavaScript? Or no?


If you can control the filename, you can do things like embed <?php something_malicious(); ?> into an image, put it up as foo.jpg.php, and then execute it by hitting the 'image' directly. That's... sadly common.


Filename validation, I would imagine, is far more common than content validation.

If you are inspecting binary data for validity, and not checking the parameter (filename) that affects how Apache serves your file, you are doing something wrong.


Checking a filename may leave bugs to exploit. It's quite unlikely, but why break your head over a possible way to exploit your validation when you can just rename the file to something of your liking? Check the file for a png, jpg, etc. header, append that as extension (erroring when none was found), and done; no risk of it being executed.
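
A minimal sketch of that "sniff the header, pick the extension yourself" approach, in Python. The function name `safe_upload_name` and the table of magic bytes are made up for illustration, and the list of formats is deliberately short. Note what this does and does not buy you: it keeps the server from executing an upload as foo.jpg.php, but the squirrel file from this post starts with a perfectly good JPEG header and would pass.

```python
import secrets

# Leading "magic" bytes for a few common image formats (not exhaustive).
MAGIC = {
    b"\xff\xd8\xff": "jpg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}

def safe_upload_name(data: bytes) -> str:
    """Return a server-chosen filename whose extension matches the sniffed header."""
    for magic, ext in MAGIC.items():
        if data.startswith(magic):
            # Ignore the client-supplied filename entirely and generate our own.
            return f"{secrets.token_hex(8)}.{ext}"
    # No known image header: refuse the upload instead of guessing.
    raise ValueError("unrecognized image header")
```

Because the stored name never contains anything the uploader typed, there is no double extension left for the web server to misinterpret.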


You'd think this would be enough protection, but it turns out that some browsers (looking at you, IE) actually try to infer the content type from the page content. See http://msdn.microsoft.com/en-us/library/ms775147(v=vs.85).as... for more info.


This is why you should really be careful to get Content-Type correct and use X-Content-Type-Options: nosniff.
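
Roughly what that looks like when serving user uploads, as a Python sketch. The helper name `image_response_headers` and the extension table are assumptions for illustration, not any particular framework's API. The idea is just that the server sends exactly one image type for a given file and tells the browser not to second-guess it.

```python
# The only content types we are willing to send for uploaded files.
CONTENT_TYPES = {"jpg": "image/jpeg", "png": "image/png", "gif": "image/gif"}

def image_response_headers(filename: str) -> dict:
    """Build response headers that pin the content type and disable sniffing."""
    ext = filename.rsplit(".", 1)[-1].lower()
    if ext not in CONTENT_TYPES:
        raise ValueError("refusing to serve unknown file type")
    return {
        "Content-Type": CONTENT_TYPES[ext],
        # Ask the browser not to infer a different type from the body.
        "X-Content-Type-Options": "nosniff",
    }
```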


Congratulations, the user is on your website, running your malicious javascript. Which is going to do what, exactly? It doesn't have access to any other site's cookies or information.


I think (s?)he meant upload the image to someone else's site. The malicious code would be running on that site... not your own. So then it does have access to that site's cookies.


nope, in this case the image will be interpreted as an image, not a script.


Right. I just took the example squirrel page, saved it, altered the comment section of the image to insert some javascript code alert('Hello') and opened it in my browser. It works but only if it is interpreted as html. So you'd need to be able to control it more.


Yeah, what jack-r-abbit said: the point is you've got malicious script embedded in a page from somebody else's web site, so you have access to cookies and can inspect and/or manipulate the user's session arbitrarily.


Nope, you have an image embedded in somebody else's web site, the script never runs.

Also that's completely different than what you originally said.


Andy, you're very confused.


Look. You are the one making the claim that you can exploit this. I call bullshit. So either prove it, or drop it. Accusing me of being "confused" does not provide evidence for your claim.


I'm not sure where I appeared to contradict myself in my earlier posts, so I'm unsure how to clarify this for you. Best I can do is this:

Here is a link to a variation of the "image" file which is the subject of this post: https://dl.dropbox.com/u/131649/squirrel.html

I have embedded harmless (-- honest! --) script in the file to demonstrate that your browser will execute the script in the context of the site where the file is hosted.

So, click the link. (Again I promise that no harm will come to your computer.) Now imagine that dl.dropbox.com is, instead, some hypothetical site where users are expected to upload images, but not HTML documents containing arbitrary script, and the security implications should be fairly obvious.


Be incredibly annoying and make you look like an idiot? e.g. the MySpace worm http://namb.la/popular/tech.html


well, after all samy is my hero.


I've always wondered how the site snag.gy does something similar. Take this link, for example (you'll have to disable AdBlock if you want to see the ad): http://i.snag.gy/0obAy.jpg (ignore the image itself; it was one of the first to pop up in my history)

The source is just the image, and you can embed the image, but there's an ad under the image. Also, right click -> view image or copy image location point to the same URL.


I made Snaggy! It's pretty hacky; each browser sends a slightly different header depending on if you're opening the image directly or if you're embedding it. I just have a check for each browser and serve the appropriate page. It doesn't work great on all browsers though (I haven't even tested mobile). I'm going to try to clean it up sometime.


Hey, just wanted to say Snaggy is awesome. I saw your post a long time ago on Reddit, and you have saved me many seconds in the screenshot process. Not a huge fan of the fact that "View image" doesn't work exactly as I'd expect, but I understand the ads are necessary, so no complaints. Thanks for the awesome site!


I suspect that's doing some kind of autodetection to decide which version of the page to serve. (Perhaps based on what MIME type the browser requests.)


Yep. In Chrome, pasting the URL in the address bar:

  Accept:text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8

Embedding the URL as an <img> element in an HTML file:

  Accept:*/*
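
Going only by the two Chrome headers above, the server-side check could be as small as this Python sketch. The function name `wants_html` is hypothetical, and this is a simplification: other browsers send different Accept values for image loads, which is presumably why the real site needs a check per browser.

```python
def wants_html(accept_header: str) -> bool:
    """Guess whether a request is a top-level navigation rather than an <img> load."""
    # Drop parameters such as ";q=0.9" and compare the bare media types.
    media_types = [
        part.split(";")[0].strip().lower()
        for part in accept_header.split(",")
    ]
    return "text/html" in media_types
```

A server would then return the HTML wrapper page when `wants_html` is true and the raw image bytes otherwise, both from the same URL.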


Depends on the MIME type the browser requests.

If your browser requests a html page, it will respond with a html page... and vice versa.


Perhaps I don't know enough about how this works, but couldn't you use this to inject runnable javascript into a page? If this is possible it's pretty scary as it would allow you to upload a hidden payload into an otherwise innocent looking image.


Yes, certain versions of IE can be tricked into executing javascript in images: http://www.h-online.com/security/features/Risky-MIME-sniffin...



I was able to add javascript code in there and it executed properly, but the browser won't parse the jpg as html unless I give it a .html filename extension. I don't see how this could be easily exploitable.


Is it possible for the file extension to say one thing and the MIME type to say something else? So the file extension could be .jpg (reassuring the user that it is only an image) but the HTTP response says it is text/html?

I think a similar exploit was used recently with .svg images - they can contain javascript (being XML) which will be executed by the browser. Not sure about the details however.


>I think a similar exploit was used recently with .svg images - they can contain javascript (being XML) which will be executed by the browser. Not sure about the details however.

However, the JavaScript shouldn't execute if the image is embedded via <img>.


Interesting thought but browsers should not interpret javascript inside an image. I would expect image rendering to be separated. Can someone with an expertise in browser design tell us how this actually works?


On slightly tangential lines, it's possible to manipulate D's compiler to output object code that renders as a graphic: http://h3.gd/ctrace/


I guess the HTML renderer skips the JPEG information and the image renderer skips the HTML information. Smart!


The CSS on the page makes the JPEG header (html body) invisible:

body { visibility: hidden; } .n { visibility: visible; position: absolute; padding: 0 1ex 0 1ex; margin: 0; top: 0; left: 0; }

the html portion comments the remaining part of the data with <!--


Can someone explain what is going on here?


The file has been created in such a way that the web browser is ignoring the non-html parts of the document, while the image renderer is ignoring the parts that make up the html page.

The first part probably isn't too hard, since most web browsers go to great lengths to render non-standard html in a sensible way. I'm not too sure about the second part. I'm guessing the jpeg spec has some variable length space in some kind of file header that the html for the page can be put into.

I read something similar a while back (I think it was called a Jafar attack) where a clever person worked out how to create a file that was both a valid .gif image and .jar java executable.


jar files are just zip files, which put the header info at the end of the file, making it very easy to construct a jar/zip that's also got a different file header at the front. bad news for web apps which allow such files to be uploaded without inspecting them. it's not a terrible idea to always transcode all uploaded images/videos to prevent that.
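
To see how little work the front-header-plus-trailing-zip construction takes, here is a Python sketch. The "GIF" is just a placeholder header followed by zero bytes (an assumption for illustration, not a decodable image), and `make_gif_zip` is a made-up helper name. Python's zipfile module locates the archive from the end of the data, so the leading bytes do not get in its way.

```python
import io
import zipfile

def make_gif_zip(gif_bytes: bytes, files: dict) -> bytes:
    """Concatenate image bytes and a zip archive into a single blob."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, content in files.items():
            zf.writestr(name, content)
    # Image readers look at the front, zip readers look at the back.
    return gif_bytes + buf.getvalue()

fake_gif = b"GIF89a" + b"\x00" * 16  # placeholder, not a real image
combined = make_gif_zip(fake_gif, {"hello.txt": "hi"})

# The combined blob still opens as a zip archive despite the GIF prefix.
with zipfile.ZipFile(io.BytesIO(combined)) as zf:
    names = zf.namelist()
    payload = zf.read("hello.txt")
```

A validator that only checks the first few bytes sees a GIF; anything that reads the trailing directory sees an archive. Transcoding the upload, as suggested above, throws the appended part away.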


> I think it was called a Jafar attack

GIFAR: http://en.wikipedia.org/wiki/GIFAR


Web browsers need to be very loose in how they interpret data for historical reasons. A lot of this is even codified in the current HTML standard, like always content-sniffing images and identifying data that can be ignored during parsing. You also have HTML comments, which is where most of the JPEG data is packed in this example. Combine that with the fact that image formats generally allow you to pack comments or other arbitrary metadata into fields, and you end up with a file that can be read as either a JPEG or HTML. Also, Michal has a weird thing for squirrels.


view source shows that he has an html document embedded in the jpeg.

Apparently, the jpeg format allows this.


The HTML document is in the "comment" field of the JPEG, which is perfectly reasonable.

What is surprising is that web browsers just ignore the 24 bytes of binary data between the start of the file and the start of the HTML.


It's interpreting that data as "text" and sticking it at the beginning of the <body> of the document. The css makes the body invisible so you don't see it on the screen (unless you disable that rule) - take a look in the DOM inspector.


As pointed out by someone else, the browser has been instructed via CSS to hide the body. If you inspect the page and manipulate the CSS to show the body, those odd bytes (ÿØÿàJFIF,,ÿþr) do get rendered.


That extra binary data starts with an HTML comment tag (<!--) which is never closed, so it makes sense that it is ignored.

Edit: Misread your comment...the bytes at the beginning of the file are hidden by CSS (as pointed out by others).


Commenter is referring to the binary data at the beginning of the file, which makes up the file header for the jpeg. It is before the <html> tag and is neither commented out nor actually ignored.

The browser actually picks that "text" up and shows it on page. It's just the html content itself contains some css rule to make that text not visible.


JPEG allows for additional data chunks (that's how thumbnails, EXIF data, ... are added). The HTML uses CSS to hide the "body" (since that would include the JPEG header), putting the real content in a container element that poses as new root.

Neat hack.


W3C squirrelpocalypse?


>Pretty radical, eh? Send money to: lcamtuf@coredump.cx

Would have been smarter to put a bitcoin address :)


There's a practical side to that trick. I have altered a posterous template to make my posterous a working JSONP response. http://zbyszek.posterous.com


How did you do it?

ps. you have two uncommented slashes at the top of your <body> tag


Yeah, the slashes are important. See my answer to the same question below in this topic. I don't want to repost more in case it's not welcome by the community :)


I call such things "chameleon files"

JS and PHP is also possible http://tantek.pbworks.com/w/page/19402872/CassisProject

JS and HTML http://project.mahemoff.com/josh/ (also demonstrated by Tantek Çelik earlier on in a project that eventually led him to Cassis.)


The term most commonly used for such things is a 'polyglot': http://en.wikipedia.org/wiki/Polyglot_(computing)


Didn't know that, thanks!


Hello everyone, i appreciate the great solution that this is but i have a similar problem that could be solved by this solution but has not been solved.

My problem is that i want to publish a series of JPEG images as a Kindle book, but i can't, since the reader slices some of my images and puts padding around them. I would prefer that the images render like the cover page, in full screen, but this is impossible to achieve despite saving the images in 600 * 800 like the cover page.

How can i use this great wisdom to create an .epub file that then becomes a Kindle book.

PS; The scans are a business book that is made up entirely of mindmaps, which are like spatial roadmaps on paper. The book has been written to teach newbies in business the most important things and all the trade-offs involved in these important things.

I think that that sort of thing would do very well on the Kindle platform but i am unable to do it.


You can't do it with epub because they would just get converted to Mobi when it gets to Amazon. You might be able to do something with the new Mobi 8 format though: http://www.amazon.com/gp/feature.html?docId=1000729511


The problem with KF8 is that i now will be limited to a fraction of the kindle crowd i was targeting. KF8 does look promising though, truly a new age of publishing. What about the iBooks platform, any hope over there?


When I saw this, the first thing I immediately thought was: why don't webcomic authors use this to fix the problem of people linking directly to their images instead of the pages their images are on? This could revolutionize how webcomics and social aggregators interact.


Hotlinking isn't actually a big problem these days, or at least I don't hear anyone complaining about it. A bigger issue is people who take the comic, remove any watermarks, and then upload it somewhere else.

I don't know why they remove the watermarks, but that step alone invalidates your suggestion.


Looks more like a chipmunk to me.


I think it's a 13-lined ground squirrel. Commonly mistaken for a chipmunk. Not all squirrels have bushy tails.


Golden-mantled ground squirrel.


My browsers, Chrome 19.0.1084.56 and Firefox 13.0.1, on Linux, both render it as a bunch of garbage characters. This does not appear to be valid HTML to them.

However I can download the file, rename it to .jpg, and view the image just fine.


This trick does not work correctly in IE9, due to the unclosed comment tag.


This looks pretty scary on Windows Phone 7. Anybody else getting chunks of video memory all over the page? (HTC Arrive).


> No server-side hacks involved

Well, the JPEG file doesn't have the correct mime-type. Chrome warns, "Resource interpreted as Image but transferred with MIME type text/html" in the console. Apparently in the context of an <img src=""> URL it figures it out though.


I don’t have GraphicsMagick installed on this machine, else I would try this:

    $ gm convert http://lcamtuf.coredump.cx/squirrel/ -comment '' x.jpg

…and…

    $ gm identify -format '%c' http://lcamtuf.coredump.cx/squirrel/


is apt-get/yum not working for you?


Or no root privileges.


Or I’m on a phone.


Or you're a phone.


Apparently OP uses some JPEG feature to create a custom header (EXIF?). It allows embedding HTML close to the start of the file. The HTML ends with <!--, which saves the HTML parser from choking on the actual image data that comes afterwards.


Reminds me of this story (JPEG and ZIP as one file):

http://www.reddit.com/comments/arc79/reddit_i_got_the_best_p...


That just looks like the "append rar to a jpg" trick that /i/nsurgents have been using to pass files around on the *chans for ages ( see: dangerous kitten http://encyclopediadramatica.se/Dangerous_kitten )


Indeed it is.


I used to use a similar trick with windows PE executables and ZIP files, basically making self extracting ZIP applications.


He should do it with his 404 page, too:

http://lcamtuf.coredump.cx/squirrel/404.html

:)


Any practical use for this or just for fun?


I posted my comment right before spotting your question... I used a trick like that to be able to load my posterous posts with JSONP. http://zbyszek.posterous.com is loaded as content in http://naugtur.pl


Please explain~ (I can tell the magic is happening in fun.js and I could figure it out if I spend some time on it, but wouldn't mind the explanation handed to me on silver platter.)

http://zbyszek.posterous.com - the theme for this, did you create it? It's neat. There are a pair of forward slashes at the beginning of the page though. In Chrome at least.

http://naugtur.pl - Love the categorization and of course the animation. What are you using to do the animation?

Thanks~


my posterous page starts with two slashes. They are discreet and they allow me to put JS code in a HTML comment. If you look at the source, you'll notice that when interpreted as JS it feeds a big array to window.posterousCB(). The rest is plain old JSONP - define the global function first, and then load external cross-domain resource that runs the function with data.

As for animations - It's just a CSS transition definition and a single rule to rotate and scale an element. I created a hack that makes the CSS rule apply recursively. fun.js has some code that converts the simple html into a deeper structure.

I think someone else got the idea of _recursive CSS_ first, but it was used for drawing shapes as far as I remember.


There was a 'virus' that spread on 4chan years ago that did something like this. AFAIK when saved as a .js file and run it would post itself back to 4chan to continue spreading.


For a "positive" 4chan use, this allowed people to concatenate zip files at the end of jpegs to make "books". It was common to use a picture of a book, and then concat the book in a zipped pdf or other text file and then post it.


The best part about that virus was it also zipped up a random file from your pc to upload along with it. Everyone in /g/ was grabbing for these things. Can you say identity fraud.


Did this virus operate on the honor system by any chance?


This is awesome. It's been a long time since I was blown away by an HTML hack but this blew me away. Yes!


This is what Dropbox needs to do to get everyone to stop complaining about dropping the public folder.


You can put HTML files in public folders. Files in public folders are loaded on a separate domain.


No, I mean for new users they are doing away with the public folders. You can now get a public link to any folder in your dropbox, it's just it goes to a splash page where they let you view the pic, or download the html. Basically new users won't be able to use their dropbox public folder as a webserver because of that anymore, no hotlinking images with dropbox links.


I know. My point was that this isn't a good justification to remove the public folder, as people could always put HTML files there and it was safe to do so.


Using firefox, right click on the image in that page and select "View Image"


It's stuff like this that makes me smile at humanity.


How does it work?


trick summary:

enter in your jpeg comment field: "<html>...your page...</html><!--"

then the "image" will look like:

   @^PJFIF^@^A^A^A^A,^A,^@^@000^Cr<html>...your page...</html><!-- rest of garbage

to the browser this is just the same as:

   \n
   \n
   \n
   <html>...your page...</html>
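
The summary above can be sketched in Python. This assumes the usual JPEG segment layout (a two-byte marker followed by a two-byte big-endian length that counts itself) and uses the COM marker 0xFFFE for the comment; `embed_html_comment` is a made-up helper name, not how the original file was necessarily produced. With a standard 16-byte JFIF segment the HTML ends up 24 bytes into the file, which matches the count mentioned elsewhere in this thread.

```python
import struct

def embed_html_comment(jpeg: bytes, html: str) -> bytes:
    """Insert a COM segment holding HTML plus an unclosed '<!--' after the JFIF header."""
    if jpeg[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG: missing SOI marker")
    payload = html.encode("ascii") + b"<!--"
    if len(payload) > 65533:
        raise ValueError("comment too long for a single segment")
    pos = 2
    # Skip the APP0 (JFIF) segment if present so the file keeps its normal header.
    if jpeg[pos:pos + 2] == b"\xff\xe0":
        (seg_len,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        pos += 2 + seg_len
    # The length field counts its own two bytes plus the payload.
    com = b"\xff\xfe" + struct.pack(">H", len(payload) + 2) + payload
    return jpeg[:pos] + com + jpeg[pos:]
```

Strictly, the image data that follows the comment segment is not covered by its length field; it is the trailing `<!--` that makes the HTML parser swallow it, which is why the trick reportedly misbehaves in browsers that handle the unclosed comment differently.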


lol "send money" for discovering an idea that is at least 15 years old? gtfo.


This is just more evidence that we should strive to do everything in a browser. Or an app that functions like one. It is more secure. Details should not be exposed to the user.

Remember there is no file system. In fact, there are no files.

We hid them so they do not exist. Out of sight, out of mind.

There's no such thing as binary. That only existed when you were younger. Now it no longer exists. The numbers are gone. They do not exist.

What's really important is how good fonts look. The javascript, the CSS, the browser!

No user cares about content like text, audio and video, they care about window dressing: html and browsers. They care about what you can do with javascript. What can you do? Show me some tricks.

Content alone is not enough. Who wants to read a story or download a video clip? You have to present it; you must entertain and you must persuade, by trickery if necessary. It's not the content, silly. It's the webpage. No javascript, no dice. Don't just deliver the content, entertain me for a few minutes first. Tell me about something else.

No one cares about TV programming. They care about the TV's setup screens and onscreen channel guide. They want these menus to come to life. They want their TV's to become "intelligent".

A webpage without javascript is like a lifeless onscreen TV channel guide that does not track what you watch and report it to marketers, or make automatic suggestions on what you should watch, or display animations while you sit and wait for seconds while the TV's software is "Loading..." in response to your last button push. Boring.

Users want books, newspapers, radios and TV's that have "artificial intelligence". They want others to know what they are reading and watching and they want advertisers to address them by name. Let's get with it. Bring us the future.


I am a html5 JS developer taking part in all this and I happen to share your opinion from time to time. I enjoy using lynx (ok, links2 actually).

Don't worry, there's lots of people who are trying to keep the web worth your attention :)


Few other "browsers" beat links (or links2 nogui), and they are not even "browsers": e.g., the original netcat, tcpclient and BSD's ftp with http support. No, curl is not on the list. It's dog slow.

No one needs to cater to my attention. Apparently I'm not today's end user. I'm for all intents and purposes a blind user. The web is not for me. I don't even start X11 if it's not necessary. I work with text. Graphics and multimedia are for recreation.

One of my favorite recent HN comments/stories was from Diego Basch. He described what happened at Inktomi, an early search engine that eventually was made all but obsolescent by Google.

In his story, he stated what he saw as one of the sure signs that Inktomi was being overtaken by Google. He said he saw that Inktomi engineers did not use Inktomi's search. They used Google.

Are complex browsers the way of the future? I find it easier to work on _operating systems_ than I do to work on today's "modern browsers". That is how complex (and thereby insecure) the code has gotten. I would rather try to understand the code for ffmpeg or mplayer than I would for Chrome or Mozilla. But as I said, I'm not the "end user" to focus on.

Developers/engineers gotta eat. Do what works today. Focus on what you think "end users" are doing. Try to anticipate what they "want".

When I'm pondering "the next big thing" and what may work tomorrow to pay the bills, I will always remember Diego's story of the Inktomi engineers.



