Show HN: Embed your source code in PNG files

derefr · on Dec 23, 2021

I thought that this is what PICO-8 cartridge files (.p8.png) did; but it turns out that those use steganography within the image pixmap itself, rather than taking advantage of ancillary PNG chunks. Kind of a strange choice, honestly.

On a separate note, a fun fact: PNG uses what is basically a de-facto v3 of the https://en.m.wikipedia.org/wiki/Interchange_File_Format . PNGs can be parsed and generated with generic IFF tools. (Which can also be used to operate on AIFF, TIFF, and—perhaps surprisingly—Erlang .beam files.)

IFF is, IMHO, an incredibly underutilized “metaformat” for how simple it is to work with, how observable/inspectable the results are (for a binary file format), and yet how efficient it is (compared to text-based formats.)

PNG’s (backward-compatible) extensions over IFF are all pretty great ideas as well — e.g. using chunk name capitalization as metadata to mark chunks as optional (plus a few other things); linking chunks with checksums to indicate when derived chunks need to be recalculated; etc. If these extensions were promoted to features of the metaformat itself, that’d make probably the best document-oriented container metaformat around, beating e.g. “a zip file with a META-INF dir inside” by a wide margin. Sadly, AFAIK, nobody has tried to write a formal IFFv3 RFC to formalize these extensions. (Maybe something I’ll do one day myself.)
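
To make the capitalization trick concrete, here is a minimal Python sketch of how a reader could decode it. The PNG spec stores one property in bit 5 (the lowercase bit) of each of the four chunk-type letters; the function name and the dictionary keys are illustrative choices, not part of any library.

```python
def chunk_properties(chunk_type: bytes) -> dict:
    """Decode the four property bits PNG stores in chunk-name letter case."""
    if len(chunk_type) != 4 or not chunk_type.isalpha():
        raise ValueError("chunk type must be 4 ASCII letters")
    # Bit 5 (0x20) of an ASCII letter is set when the letter is lowercase.
    lower = [bool(b & 0x20) for b in chunk_type]
    return {
        "ancillary": lower[0],         # lowercase 1st letter: decoder may ignore it
        "private": lower[1],           # lowercase 2nd letter: not a registered public chunk
        "reserved_bit_set": lower[2],  # 3rd letter must be uppercase in current PNG
        "safe_to_copy": lower[3],      # lowercase 4th letter: editors may copy it blindly
    }


if __name__ == "__main__":
    for name in (b"IHDR", b"IDAT", b"tEXt"):
        print(name.decode(), chunk_properties(name))
```

So `IHDR` is critical and public, while `tEXt` is ancillary, public, and safe to copy even by an editor that does not understand it.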

spicybright · on Dec 23, 2021

Wow, I never knew that about PICO-8. Very strange choice indeed given how much easier they could have done it.

I remember downloading a lot of albums off 4chan where you would save the image, then rename it to .zip to get a folder of MP3s. Good times!

I do wonder though how "stable" embedding files is. Like do most major hosting services process images to the point it strips that stuff out?

tyingq · on Dec 23, 2021

>I do wonder though how "stable" embedding files is.

I was surprised to see this was still working 3 years later:

  $ curl -s https://pbs.twimg.com/media/Dq2sPGNU0AEKyyC.jpg | dd status=none bs=1 skip=599 count=40| sh

From https://news.ycombinator.com/item?id=18347985

thrashh · on Dec 23, 2021

It’s not that strange if you consider that you use image files to transfer images. Trying to store data outside of that (in a custom chunk) isn’t a use case anyone is accommodating so it will get stripped even by accident.

So if you use stego and store data in the image, you have a bigger chance of preserving the data.
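
For readers who have not seen it, "storing data in the image" usually means something like least-significant-bit embedding. The sketch below is a generic illustration over a raw byte buffer of pixel samples; it is not PICO-8's actual layout, and the function names are made up for this example.

```python
def embed_lsb(pixels: bytes, payload: bytes) -> bytes:
    """Hide payload bits in the lowest bit of successive pixel sample bytes."""
    bits = [(byte >> shift) & 1 for byte in payload for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("payload does not fit in this many samples")
    out = bytearray(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit  # overwrite only the lowest bit
    return bytes(out)


def extract_lsb(pixels: bytes, n_bytes: int) -> bytes:
    """Read n_bytes back out of the lowest bits, most significant bit first."""
    out = bytearray()
    for i in range(n_bytes):
        value = 0
        for sample in pixels[i * 8:(i + 1) * 8]:
            value = (value << 1) | (sample & 1)
        out.append(value)
    return bytes(out)


if __name__ == "__main__":
    cover = bytes(range(256))           # stand-in for decoded pixel samples
    stego = embed_lsb(cover, b"hi")
    print(extract_lsb(stego, 2))
```

Each sample changes by at most 1, which is why the picture looks the same, and also why any lossy re-encoding that nudges sample values will scramble the payload.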

derefr · on Dec 25, 2021

Depends on whether you're expecting people to treat the files as "images" that happen to contain other data, and so e.g. upload them to photo-sharing sites, imageboards, etc.; or whether you're expecting people to treat the files as "programs" that happen to render with a thumbnail by default on most Operating Systems.

Personally, I don't see a PICO-8 .p8.png cartridge as an "image" any more than a Fireworks project file is an "image." It's a document that wraps itself in an image container to enable the 'document' to be previewed. It just so happens that you're able to very carefully treat the document as "an image" in some contexts (e.g. if you put it up on your own web server, and then embed the resulting URL in a webpage, people who right-click "Save As..." the 'image' will get the original document.) But this isn't really the goal (since you could do that just as well by generating an ancillary "cover art" file to go with the cartridge, and linking to the cartridge file using the cover art file.) The goal of such embeddings is just to make your document visually "self-describing" when examined with regular OS tools.

Of course, if you're considering designing your own PNG-embedded document format, and sharing the document losslessly via imageboards, photo-sharing sites, etc. is explicitly the goal of your format choice of PNG; then yes, steganography is the way to go.

But, well... if you are going to go the "embed the data in the pixmap" route — why not go all the way? Skip steganography (which will survive re-containerization, but won't survive the slightest lossy re-encoding), and instead just generate a "cover art" image containing a QR code that embeds the document data. Then the document would even survive digital-analogue-digital conversion!

(For the PICO-8 case, if the .p8.png files were simply art containing QR codes that the software could read directly, then a PICO-8 mobile app could support importing cartridges using the camera. Then people could just stick their carts up as posters at indie game conferences, or give them out as business cards.)

cyansmoker · on Dec 23, 2021

If you are like me, you spend a non-negligible amount of time creating architectural diagrams using a DSL (UML, python libraries or what not), exporting them to a portable image format and uploading these to some form of Wiki.

I would like to be able to edit everything I store in said Wiki, and this flow breaks when it comes to images. Inspired by draw.io, I created this simple util that lets me (and you) store the diagram's code with the image. Now, as long as you have the final image, you and others can keep editing your diagram!

vicaya · on Dec 23, 2021

BTW, plantuml has been storing the source uml dsl in the metadata of generated png for years. cf. https://plantuml.com/command-line#ce21470ab49d1d19

jamietanna · on Dec 24, 2021

I've been using Kroki for this - https://kroki.io is a hosted server for it - and it's pretty great, and supports a tonne of formats

userbinator · on Dec 23, 2021

I thought the first question in the FAQ was amusing:

FAQ Is this an Electron app?

It's a few MB, which is two orders of magnitude below Electron size, but still seems rather large for what it does; especially the CLI version.

latchkey · on Dec 23, 2021

Looking at the makefile, it is possible to strip out a decent amount of a golang binary size with go build -ldflags '-w -s'...

https://stackoverflow.com/a/22276273/253773

javajosh · on Dec 23, 2021

Cool. But something I want to know is: what are the limits of the text chunk in PNG? I just browsed the specification[1] but there was no mention of it.

It's also interesting to me that the spec says that the viewer should give the user a way to look at all the textual parts of a png (there are three), although I've never seen this offered.

[1] https://www.w3.org/TR/PNG/#13Text-chunk-processing

userbinator · on Dec 23, 2021

https://www.w3.org/TR/PNG/#5DataRep

2GB like all the other chunks.
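
As a rough illustration of where that limit comes from: every chunk is a 4-byte big-endian length, a 4-byte type, the data, and a CRC-32 over type plus data, and the spec caps the length at 2^31 - 1 bytes. A minimal Python sketch of building a tEXt chunk (the helper name is an assumption for this example):

```python
import struct
import zlib

MAX_CHUNK_LEN = 2**31 - 1  # largest value the spec allows in the length field


def make_text_chunk(keyword: str, text: str) -> bytes:
    """Build a tEXt chunk: Latin-1 keyword, a NUL separator, then Latin-1 text."""
    if not 1 <= len(keyword) <= 79:
        raise ValueError("tEXt keyword must be 1 to 79 characters")
    data = keyword.encode("latin-1") + b"\x00" + text.encode("latin-1")
    if len(data) > MAX_CHUNK_LEN:
        raise ValueError("chunk data exceeds the 2^31 - 1 byte limit")
    crc = zlib.crc32(b"tEXt" + data)  # CRC covers type and data, not the length
    return struct.pack(">I", len(data)) + b"tEXt" + data + struct.pack(">I", crc)


if __name__ == "__main__":
    chunk = make_text_chunk("Source", "a -> b")
    print(len(chunk), chunk[4:8])
```

In practice the file size you are willing to ship is likely to be the real limit long before the length field is.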

MontagFTB · on Dec 23, 2021

Adobe Fireworks used to do something similar to this. Their base file format was a PNG of the document composite, and their proprietary data was stored in a nonstandard chunk (which was safely ignored by standard PNG readers). Thus a client could always see the latest version of a file simply by you sharing it with them; no Fireworks required.

jamiethompson · on Dec 23, 2021

Draw.io does this. When you export a diagram as a PNG, there is an option to embed the source file in the PNG. If you subsequently open one of those PNGs in Draw.io you can carry on editing it. I find it really handy.

tomcam · on Dec 28, 2021

Whoa. Had no idea!

a9h74j · on Dec 23, 2021

Has one use-case been creating "ebooks" (including visible covers), but not ebook per se?

Terry_Roll · on Dec 23, 2021

Is this a troll post considering the NSO hacking post seen here https://news.ycombinator.com/item?id=29640474 and the google project zero post? https://googleprojectzero.blogspot.com/2021/12/a-deep-dive-i...

What is being described will get you on your way to the NSO hack as a service, because their hack was using a decompression algo to build a virtual cpu of sorts and run it, in a single pass of the decompression process.

How hard would it be to embed source code in such a way that you could also build a limited cpu to then run this embedded source code in memory with a single pass of graphic processing or decompression algo?

oshiar53-0 · on Dec 23, 2021

No, that's irrelevant here.

PNGSource embeds code in ancillary chunks. That's it. No code execution. No steganography (yet).
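
To show how inert this is, here is a minimal Python sketch of the general technique: walk the chunk list, slip one extra ancillary chunk in before IEND, and read it back as plain bytes. The chunk name `srCe` and all function names are hypothetical; this is not PNGSource's actual chunk name or code.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_chunk(ctype: bytes, data: bytes) -> bytes:
    crc = zlib.crc32(ctype + data)
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", crc)


def iter_chunks(png: bytes):
    """Yield (type, data) for each chunk up to and including IEND."""
    if png[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        ctype = png[pos + 4:pos + 8]
        yield ctype, png[pos + 8:pos + 8 + length]
        pos += 12 + length  # length + type + data + CRC
        if ctype == b"IEND":
            break


def embed(png: bytes, ctype: bytes, payload: bytes) -> bytes:
    """Return a copy of png with one extra chunk placed just before IEND."""
    out = [PNG_SIGNATURE]
    for t, d in iter_chunks(png):
        if t == b"IEND":
            out.append(make_chunk(ctype, payload))
        out.append(make_chunk(t, d))
    return b"".join(out)


def extract(png: bytes, ctype: bytes):
    for t, d in iter_chunks(png):
        if t == ctype:
            return d
    return None


def tiny_png() -> bytes:
    """A 1x1 8-bit grayscale PNG, used only as a test fixture."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)
    idat = zlib.compress(b"\x00\x00")  # filter byte 0 plus one black pixel
    return (PNG_SIGNATURE + make_chunk(b"IHDR", ihdr)
            + make_chunk(b"IDAT", idat) + make_chunk(b"IEND", b""))


if __name__ == "__main__":
    png = embed(tiny_png(), b"srCe", b"digraph { a -> b }")
    print([t for t, _ in iter_chunks(png)], extract(png, b"srCe"))
```

Nothing here is ever interpreted; a decoder that does not know the chunk skips over it using the length field.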

Terry_Roll · on Dec 23, 2021

Yeah, but because of NSO I now look at every mandatory or common practice process that is used on a file to see if the NSO methods can be used for exploitation.

For example, PNG seems benign, but what if it was stored in a zip file of sorts? Could the MS Windows zip process be exploited, could 7-Zip be exploited or even PKZIP for that matter? Do you see where I am coming from?

What about if I embedded some icons and image files as a resource in an application exe or dll? You have persistence then, even if it's just a beacon or some unique domain name lookup to track the app online. https://docs.microsoft.com/en-us/windows/win32/menurc/enumer...

Likewise, what about compression built into HTML/Web browsers, could that be exploited? https://en.wikipedia.org/wiki/HTTP_compression

Would it be possible to build something into a webpage or image file on a popular website where it can exploit the methods NSO have used or are using? Maybe we should go back to reading the internet using wget?

oshiar53-0 · on Dec 24, 2021

> Yeah, but because of NSO I now look at every mandatory or common practice process that is used on a file to see if the NSO methods can be used for exploitation.

NSO is definitely neither the first nor the only one to do this, but let's move on.

> For example, PNG seems benign, but what if it was stored in a zip file of sorts? Could the MS Windows zip process be exploited, could 7-Zip be exploited or even PKZIP for that matter? Do you see where I am coming from?

Any nontrivial parser written in an unsafe language has a potential for being exploitable, that's for sure.

> What about if I embedded some icons and image files as a resource in an application exe or dll? You have persistence then, even if it's just a beacon or some unique domain name lookup to track the app online.

This is why we have code signing. Well, that works unless the ASN.1 parser or the signature verifier has got some security issues, of course.

> Likewise, what about compression built into HTML/Web browsers, could that be exploited? https://en.wikipedia.org/wiki/HTTP_compression

It's usually much easier to just exploit the renderer/JavaScript engine.

> Would it be possible to build something into a webpage or image file on a popular website where it can exploit the methods NSO have used or are using?

This is basically how malware distribution works over the web, just look for some VirusTotal samples...

> Maybe we should go back to reading the internet using wget?

If we're at that level of paranoid, bugs in the HTTP parser, TLS implementation, or the TCP/IP stack should be just as sensitive.

Terry_Roll · on Dec 27, 2021

>If we're at that level of paranoid, bugs in the HTTP parser, TLS implementation, or the TCP/IP stack should be just as sensitive.

How many zero days exist when you put a new distro online in order to update? And that's without looking at the firmware for bugs.

oshiar53-0 · on Dec 27, 2021

...why are you even talking to some stranger here then? Is it worth the risk of being exploited with an RCE 0day?

Like, uh, just define a clear threat model, accept risks, and move on??? Or just don't use computers

withinrafael · on Dec 23, 2021

Is this preferable over concatenating the code onto the end of the file? The PNG structure remains intact and no app needed for insertion and extraction, right?
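
For comparison, the concatenation approach can be sketched in a few lines of Python. The helper names are assumptions for this example; the one fixed fact it relies on is that the IEND chunk is always the same 12 bytes and marks the end of the PNG datastream.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
IEND_CHUNK = b"\x00\x00\x00\x00IEND\xaeB`\x82"  # zero length, type, fixed CRC


def png_end(blob: bytes) -> int:
    """Return the offset just past IEND, found by walking the chunk list."""
    if blob[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    pos = 8
    while pos + 12 <= len(blob):
        (length,) = struct.unpack(">I", blob[pos:pos + 4])
        ctype = blob[pos + 4:pos + 8]
        pos += 12 + length
        if ctype == b"IEND":
            return pos
    raise ValueError("no IEND chunk found")


def append_payload(png: bytes, payload: bytes) -> bytes:
    return png + payload  # the shell equivalent is: cat image.png code > out.png


def trailing_payload(blob: bytes) -> bytes:
    return blob[png_end(blob):]


if __name__ == "__main__":
    stub = PNG_SIGNATURE + IEND_CHUNK  # structural stand-in, not a viewable image
    print(trailing_payload(append_payload(stub, b"print('hi')\n")))
```

The trade-off, as far as I can tell, is that trailing bytes sit outside the PNG structure entirely, so any tool that re-writes the file is likely to drop them, whereas a chunk marked safe-to-copy at least asks editors to preserve it.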

netr0ute · on Dec 23, 2021

What about the opposite, embedding PNG files into source code?

shakna · on Dec 23, 2021

xxd can output any arbitrary set of bytes as C code.

For example:

    > xxd -i tmp.f

    unsigned char tmp_f[] = {
        0x38, 0x4a, 0x39, 0x6f, 0x61
    };
    unsigned int tmp_f_len = 5;

I use it as part of a number of build scripts.
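
If xxd is not available on a build machine, the same kind of output can be approximated in a few lines of Python. This is a sketch that mimics the general shape of `xxd -i` output, not a byte-for-byte reimplementation; the function name and the 12-bytes-per-row default are choices made for this example.

```python
import re


def to_c_array(name: str, data: bytes, per_line: int = 12) -> str:
    """Render data as a C unsigned char array plus a length variable."""
    ident = re.sub(r"\W", "_", name)  # "tmp.f" becomes "tmp_f"
    rows = []
    for i in range(0, len(data), per_line):
        row = ", ".join(f"0x{b:02x}" for b in data[i:i + per_line])
        rows.append("  " + row)
    body = ",\n".join(rows)
    return (
        f"unsigned char {ident}[] = {{\n{body}\n}};\n"
        f"unsigned int {ident}_len = {len(data)};\n"
    )


if __name__ == "__main__":
    print(to_c_array("tmp.f", b"8J9oa"), end="")
```

Feeding it the bytes of a PNG gives you an array you can compile straight into a C program.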

rmbyrro · on Dec 23, 2021

Well, isn't it the binary representation of the PNG already?

blacksmith_tb · on Dec 23, 2021

Well, that's not exactly novel[1], though it can be handy.

1: https://png-pixel.com/

colejohnson66 · on Dec 23, 2021

IIRC, HolyC from TempleOS could embed arbitrary files into a source file.