Hello, PNG (da.vidbuchanan.co.uk)
268 points by EntICOnc on Jan 18, 2023 | 155 comments



Honestly, I don't consider PNG a simple format. The CRC and the compression are non-trivial. If you're using a new language that doesn't have those features built in, and/or you don't have a reasonable amount of programming experience, then you're likely going to fail (or learn a ton). zlib is 23k lines. "Simple" is not a word I'd use to describe PNG.

Simple formats are like certain forms of .TGA and .BMP. A simple header and then the pixel data. No CRCs, no compression. Done. You can write an entire reader in 20-30 lines of code and a writer in another 20-30 lines. Both of those formats have options that can probably make them more work, but if you're storing 24-bit "true color" or 32-bit "true color + alpha" then they are way easier formats.
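
To illustrate just how short that writer can be, here's a minimal sketch of an uncompressed 24-bit TGA writer in Python (field layout per the TGA spec; the BGR byte order and the top-left-origin descriptor bit are assumptions on my part, so check against a real reader):

    import struct

    def write_tga(path, width, height, pixels):
        """pixels: list of (r, g, b) tuples, row-major, top-to-bottom."""
        header = struct.pack(
            "<BBBHHBHHHHBB",
            0,        # no image ID
            0,        # no colour map
            2,        # image type 2 = uncompressed true-color
            0, 0, 0,  # colour map spec (unused)
            0, 0,     # x, y origin
            width, height,
            24,       # bits per pixel
            0x20,     # descriptor: top-left origin (assumed bit layout)
        )
        with open(path, "wb") as f:
            f.write(header)
            for r, g, b in pixels:
                f.write(bytes((b, g, r)))  # TGA stores pixels as BGR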

Of course they're not common formats so you're stuck with complex formats like PNG


For audio, my all-time favorite format to work with is raw PCM.

One time, I had to split a bunch of WAV files at precise intervals. I first tried ffmpeg, but its seeking algorithm was nowhere near accurate enough. I finally wrote a bash script that did the splitting much more accurately. All I had to do to find the byte offset for a timestamp in a raw PCM audio file was multiply the timestamp (in seconds) by the sample rate (in Hz) by the bit depth (in bytes) by the number of channels. The offset was then rounded up to the nearest multiple of the bit depth (in bytes) times the number of channels (this avoids the stereo channels getting swapped at cut points).

Once I had the byte offset, I could use the head and tail commands to manipulate the audio streams to get perfectly cut audio files. I had to admire the simplicity of dealing with raw data.
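
A minimal sketch of that offset math in Python (the rounding up to a frame boundary is exactly what keeps the channels from swapping; the head/tail line at the end is just an illustrative shell pattern, not the original script):

    import math

    def pcm_byte_offset(seconds, sample_rate, bytes_per_sample, channels):
        frame_size = bytes_per_sample * channels          # one frame = one sample per channel
        raw = seconds * sample_rate * frame_size          # byte offset, possibly mid-frame
        return math.ceil(raw / frame_size) * frame_size   # round up to a frame boundary

    # e.g. cutting 16-bit stereo 44.1 kHz audio at 12.5 s:
    offset = pcm_byte_offset(12.5, 44100, 2, 2)
    # shell equivalent: tail -c +$((offset + 1)) in.raw | head -c $length > out.raw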


Smart file systems should offer a way to access raw datastreams and elements within more complex filetypes. e.g. one could call fopen("./my_sound.wav/pcm_data") and not have to bother with the header. This would blur the distinction between file and directory, requiring new semantics.


That sounded quite fun, thanks for sharing.


PNG is not a format for uncompressed or RLE "hello worlds". It's a format designed for the Web, so it has to have a decent compression level. Off-the-shelf DEFLATE implementations were easily available since its inception.

I think it is pretty pragmatic and relatively simple, even though in hindsight some features were unnecessary. The CRC was originally a big feature, because back then filesystems didn't have checksums, people used unreliable disks, and FTPs with automatic DOS/Unix/Mac line ending conversions were mangling files.

PNG could be simpler now if it didn't support 1/2/4-bit depths, keyed 1-bit alpha for opaque modes, or interlacing. But these features were needed to compete with GIF on low-memory machines and slow modems.

Today, the latest image formats play the same game of ticking every checkbox, to an even worse degree: adding animation that is worse than any video format from the last 20 years, supporting all the obsolete analog video color spaces, carrying redundant ICC color profiles alongside better built-in color spaces, etc. By modern standards PNG is super simple.


There was talk about upgrading PNG to support the equivalent of animated GIFs but it never really happened because of complexity, see

https://en.wikipedia.org/wiki/Multiple-image_Network_Graphic...

As for color spaces that is a case where things get worse before they get better. In the 1990s I remember the horror of making images for the web with Photoshop because inevitably Photoshop would try some kind of color correction that would have been appropriate for print output but it ensured that the colors were wrong every time on the screen.

Today I see my high-gamut screen as a problem rather than a solution. I like making red-cyan anaglyph images, and found out that Windows gives me (16,176,16) when I ask for (0,180,0): it wants to save my eyes from the laser-pointer green of the monitor by desaturating it to something that looks like sRGB green to my eyes. But looking through the 3D glasses, that means the right channel bleeds into the left channel. To get the level of control I need for this application, it turns out I have to make both sRGB and high-gamut images and display the right one... which is a product of the complexity of display technology and how it gets exposed to developers.


Animated PNGs eventually did quietly happen in the form of APNG.

https://en.wikipedia.org/wiki/APNG


> There was talk about upgrading PNG to support the equivalent of animated GIFs but it never really happened because of complexity

This was mostly due to overengineering on the part of the PNG committee. Why stop at animated PNGs, when we could support sound and interactivity! MNG is not a simple format, and the spec has MNG-LC ("low complexity") and MNG-VLC ("very low complexity") subsets, because the whole thing is too complex. Did you know you can embed JPEGs in MNGs? That it has synchronization points for sound, even though sound is still "coming at a later date"? That it allows pasting other images into the movie at arbitrary 2D transforms?

MNG's complexity is self-inflicted, because they second-system effect'd their way into features nobody wanted.

APNG, by contrast, is a series of PNG chunks with a couple extra fields on top for timing and control information.


> Today, latest image formats also do this competition of ticking every checkbox to even worse degree by adding animation that is worse than any video format in the last 20 years,

Yet just seeking in any random VPx / H.26x / ... format is a PITA compared to trusty old GIFs. It's simple: if you cannot display any random frame N, in any random order, in constant (and very close to zero) time, it's not a good animation format.


You can't do that for GIF. Each frame can be composited on top of the last frame (ie. no disposal; this allows storing only the part that changed), so to seek to a random frame you may need to replay the whole GIF from the start.

The reason you can seek to any frame is GIFs tend to be small, so your browser caches all the frames in memory.


GIF has no seeking at all. It's entirely sequential and requires single-threaded decoding from start to finish.

OTOH every video format has keyframes, is more friendly to post-90s CPUs, and recent formats are parallelizable.

You have only been able to notice how "heavy" VPx is because the lengths and resolutions used with that format are way beyond GIF's capabilities.

Try encoding a mere 10-minute 30fps 1080p video in a GIF, I dare you.


That’s not a format problem so much as a viewer software problem


Simple formats are PPM / Netpbm; they're ASCII text with an identifier line ("P1" for mono, "P2" for grayscale, or "P3" for colour), a width and height in pixels (e.g. 320 200), then a stream of numbers for pixel values. Line breaks optional. Almost any language that can count and print can make them; you can write them from APL if you want.

As ASCII they can pass through email and UUNET and clipboards without BASE64 or equivalent. With flexible line breaks they can even be laid out so the monochrome ones look like the image they describe in a text editor.

See the examples at https://en.wikipedia.org/wiki/Netpbm#


The Netpbm format is amazing if you quickly want to try something out and need to generate an image of some sort. The P6 binary format is even simpler: you write the header followed by a raw pixel data blob, e.g.:

    fprintf(stdout, "P6\n%d %d\n255\n", WIDTH, HEIGHT);
    fwrite(image, 1, WIDTH * HEIGHT * 3, stdout);
Yes, I know, this obviously misses error handling, etc... The snippet is from a simple Mandelbrot renderer I cobbled together for a school exam exercise many moons ago: https://gist.github.com/AgentD/86445daed5fb21def3699b8122ea2...

The simplicity of the format nicely limits the image output to the last 2 lines here.


I use this all the time. I love that it's simple enough that I can type something like those two lines off the top of my head at this point. And as an alternative to that fwrite(), another common pattern that I use is:

    for (int y = 0; y < HEIGHT; ++y)
        for (int x = 0; x < WIDTH; ++x)
        {
            // ... compute r, g, and b one pixel at a time
            printf("%c%c%c", r, g, b);
        }
I also find ImageMagick very convenient for working with the format when my program writes a PPM to stdout:

    ./myprog | convert - foo.png
or:

    ./myprog | display -


I don't consider ASCII simple because it needs to be parsed (more than a binary format).

As a point of comparison, a binary format could be as simple as

    struct Header {
      uint32_t width;
      uint32_t height;
    };

    struct Image {
      Header header;
      uint8_t* data;
    };

    Image* readIMG(const char* filename) {
      int fd = open(filename, ...);
      Image* image = new Image();
      read(fd, &image->header, sizeof(image->header));
      size_t size = image->header.width * image->header.height * 4;
      image->data = new uint8_t[size];
      read(fd, image->data, size);
      close(fd);
      return image;
    }
Yeah, I know, that's not a complete example: endian issues, error checking, and so on.

Reading a PPM file is only simple if you already have something to read buffered strings and parse numbers, etc. And it's slow and large, especially for today's files.


It would be nice if the CRCs and compression were optional features, but perversely that would increase the overall complexity of the format. Having compression makes it more useful on the web, which is why we're still using it today (most browsers do support BMP, but nobody uses it)

The fun thing about DEFLATE is that compression is actually optional, since it supports a non-compressed block type, and you can generate a valid stream as a one-liner* (with maybe a couple of extra lines to implement the adler32 checksum which is part of zlib)

The CRCs are entirely dead weight today, but in general I'd say PNG was right in the sweet-spot of simplicity versus practical utility (and yes, you could do better with a clean-sheet design today, but convincing other people to use it would be a challenge).

*Edit: OK, maybe more than a one-liner, but it's not that bad https://gist.github.com/DavidBuchanan314/7559825adcf96dcddf0...

Edit 2: Actual zlib deflate oneliner, just for fun:

  deflate=lambda d:b"\x78\x01"+b"".join(bytes([(i+0x8000)>=len(d)])+len(d[i:i+0x8000]).to_bytes(2,"little")+(len(d[i:i+0x8000])^0xffff).to_bytes(2,"little")+d[i:i+0x8000]for i in range(0,len(d),0x8000))+(((sum(d)+1)%65521)|(((len(d)+sum((len(d)-i)*c for i,c in enumerate(d)))%65521)<<16)).to_bytes(4,"big")


>The CRCs are entirely dead weight today

Why?

The usual answer is that "checksumming should be part of the FS layer".

My usual retort to such an assertion is that filesystem checksums won't save you when the data given to the FS layer is already corrupted, due to bit flips in the writer process's memory. I personally have encountered data loss due to faulty RAM (admittedly non-ECC, thanks to Intel) when copying large amounts of data from one machine to another. You need end-to-end integrity checks. Period.


I agree with the "usual" answer, or more generally, "the layer above". We shouldn't expect every file format to roll its own error detection.

If you truly care about detecting bit-flips in a writer process's memory, that's a very niche use-case - and maybe you should wrap your files in PAR2 (or even just a .zip in store mode!).

99% of in-the-wild PNGs are checksummed or cryptographically signed at a layer above the file format (e.g. as part of a signed software package, or served over SSL).

Edit: Furthermore, the PNG image data is already checksummed as part of zlib (with the slightly weaker adler32 checksum), so the second layer of checksumming is mostly redundant.


> We shouldn't expect every file format to roll its own error detection.

On the other hand, why not? If you are dealing with files that are usually 200kB+, putting 4 or 16 bytes towards a checksum is not a big deal and can help in some unusual situations. Even if the decoder ignores it for speed, the cost is very low.


The space cost is negligible, but the time cost for the encoder is real. Since most decoders do verify checksums, you can't just skip it. Take fpng[1] as an example, which tries to push the boundaries of PNG encode speed.

> The above benchmarks were made before SSE adler32/crc32 functions were added to the encoder. With 24bpp images and MSVC2022 the encoder is now around 15% faster.

I can't see the total percentage cost of checksums mentioned anywhere on the page, but we can infer that it's at least 15% of the overall CPU time, on platforms without accelerated checksum implementations.

[1] https://github.com/richgel999/fpng


I didn't infer 15% from the way it was written there. But most platforms these days have some form of CRC32 "acceleration". Adler32 is easy to compute so I'm even less concerned there.

I spent a bunch of time optimising the code in fpnge (https://github.com/veluca93/fpnge), which is often notably faster than fpng (https://github.com/nigeltao/qoir/blob/5671f584dcf84ddb71e28d...), yet checksum time is basically negligible.

Having said that, the double-checksum aspect of PNG does feel unnecessary.


Does 15% more time to encode matter? How much time is spent encoding files vs decoding? That is probably still a negligible amount of compute out of the total compute spent on PNGs.

Your specific number seems to come from an (old version of) an encoder that has super-optimized encoding and not (yet) optimized CRC.


If the 15% didn't matter, the optimization wouldn't have been made.


CRC can't save you from faulty RAM. It can save you from bitrot in data at rest and from transmission errors. If you have faulty RAM, all bets are off. The data could be corrupted after it's been processed by the CPU (to compute the CRC) and before it's been sent to the storage device.

Arguably, the real reason CRC is useless is that most people don't care about the data integrity of their PNGs. Those who do care probably already have a better system of error detection, or maybe even correction.


My retort is "please please give me a filesystem for windows that checksums and isn't highly buggy".


Edit 3: simplified the adler32 implementation

  deflate=lambda d:b"\x78\x01"+b"".join(bytes([(i+0x8000)>=len(d)])+len(d[i:i+0x8000]).to_bytes(2,"little")+(len(d[i:i+0x8000])^0xffff).to_bytes(2,"little")+d[i:i+0x8000]for i in range(0,len(d),0x8000))+(((sum(d)+1)%65521)|(((sum((len(d)-i)*c+1 for i,c in enumerate(d)))%65521)<<16)).to_bytes(4,"big")


Agree. Programming video games in the early 2000s, TGA was my go-to format. Dead simple to parse and upload to OpenGL, support for transparency, true colors, all boxes ticked.


I had forgotten about that but yes, TGA was easy to deal with even doing low level programming.


I always used PCX for some reason I can't remember.


I once wrote a PCX decoder in Pascal outputting VGA w/mode 13. The cool part for me was it had run length encoding, which I was able to figure out trivially just reading the spec. May not have been the most efficient, but way easier than trying to figure out GIF!


Possibly because it was also used by Andre LaMothe's original Tricks of the Game Programming Gurus book?


I have implemented a zlib / DEFLATE decompressor (RFC 1951) in 4000 characters of JavaScript. It could be shorter if I hadn't tried to optimize.

E.g. this C implementation of Deflate adds 2 kB to a binary file: https://github.com/jibsen/tinf


I really like QOI (The Quite OK Image format). It achieves similar compression to PNG, but it's ridiculously easy to implement (the entire spec fits on a single page), and its encoding and decoding times are many times faster than PNG.

https://qoiformat.org


I'm also a big fan of QOI as a simple image format.

Yes, it's not as good as PNG (as the sibling comments point out), but I view it more as an alternative to PPM (and maybe a BMP subset): something I can semi-quickly write an encoder/decoder for if needed.

IMO, PNG is on a completely different level. Case in point: in the linked article the author says not to worry about the CRC implementation and to "just use a lib"... If that's the case, why not just use a PNG lib?


> It achieves similar compression to PNG

It really doesn't. Even on QOI's own curated corpus, QOI is often >30% larger, and in worst-case scenarios it can reach 4x.


It depends on the implementation. fpng can beat QOI in both speed and compression ratio https://github.com/richgel999/fpng


It depends mostly on the year of birth of the beholder.

I imagine in a couple of decades that "built-in features" of a programming environment will include Bayesian inference, GPT-like frameworks and graph databases, just as now Python, Ruby, Go, etc. include zlib by default, and Python even includes SQLite by default.


Some languages will. However, there will also be a constant resurgence of brand new "simple" languages without all of that cruft that "you don't need" (read: whatever the person who came up with the language doesn't need).


Another relatively simple format, which is apparently also superior to PNG in terms of compression and speed, is the Quite OK Image format (QOI):

https://qoiformat.org/

(And OT, but interesting, regarding their acronyms:

P -> Q

N -> O

G->->I ...so close!)


> complex formats like PNG

I have written TIFF readers.

Hold my ginger ale.


PPM reader!


What is PPM? I’m not familiar with the acronym.


It's a particular variant of the Netpbm image format: https://netpbm.sourceforge.net/doc/ppm.html

It's dead simple to emit. The P6 binary version is just a short header, followed by RGB pixel data, one byte per channel.

If you don't have a PNG encoder handy and need a quick "I just need to dump this image to disk to view it" for debugging, PPM is a great format due to how trivial it is. But it doesn't fit a lot of use cases (e.g., files are huge, because no compression).


Ah, got it.

TIFF, on the other hand is a "highest common denominator, lowest common denominator, what the hell, let's just throw every denominator -including uncommon ones- in there" format.

For example, you can have images with four (or more) color channels, of different bit lengths, and different gammas and image characteristics (I actually saw these, in early medical imaging). You can have multiple compression schemes, tile-based, or strip-based layout, etc. A lot of what informed early TIFF, was drum scanners and frame captures.

Writing TIFF: Easy.

Reading TIFF: Not so easy. We would usually "cop out," and restrict to just the image formats our stuff wrote.


I would say Netpbm is similar. Writing it is easy. … reading it … not so much.

PPM is just one format; Netpbm is a whole family. The "P6" identifier says we're using that particular format; other identifiers pick out other formats, like greyscale, or monochrome, or pixel data encoded in ASCII. The header is text and permits more flexibility than it probably should. Channels wider than a byte are supported.

Writing a parser for the whole lot would be more complex. (I think TIFF would still beat it, though.) Just dumping RGB? Easy.


Not sure how commonly known it is, but TIFF's extended cousin, GeoTIFF, is a standard for GIS data because of the flexibility you describe, especially the (almost) limitless number of channels and the different data format in channels.

At that point you're not dealing with 'images', but instead raster datasets: gridded data. So, you can combine byte t/f results with int16 classification codes, with float32 elevation data, with 4 channels of RGB+Near Infrared imagery data in uint32, plus some arbitrary number of gridded satellite data sources.

That can all be given lossless compression and assigned geotagging headers, and the format itself is (afaik) essentially open.

https://gdal.org/drivers/raster/gtiff.html is a good resource for anyone interested.

Edit: Plus, its magic number is 42, which is clearly great:

https://www.itu.int/itudoc/itu-t/com16/tiff-fx/docs/tiff6.pd...

"Bytes 2-3 An arbitrary but carefully chosen number (42) that further identifies the file as a TIFF file"


PPM is the ultimate simple format, particularly the plain form:

https://netpbm.sourceforge.net/doc/ppm.html


There's also the X Bitmap format: https://en.wikipedia.org/wiki/X_BitMap

GIMP outputs it, which means you can make pretty much any image embeddable into C source.


> zlib is 23k lines

I don't know about that as a measure; zlib makes concessions for every imaginable platform, has special optimizations for them, plus it's in C, which isn't particularly logic-dense.


> The CRC and the compression are non-trivial.

CRC is a table and 5 lines of code. That's trivial.
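
For what it's worth, a sketch of that table-plus-a-few-lines version in Python (this is the reflected CRC-32 with polynomial 0xEDB88320, the same one zlib and PNG use; not tuned for speed):

    CRC_TABLE = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = 0xEDB88320 ^ (c >> 1) if c & 1 else c >> 1
        CRC_TABLE.append(c)

    def crc32(data, crc=0):
        crc ^= 0xFFFFFFFF
        for byte in data:
            crc = CRC_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
        return crc ^ 0xFFFFFFFF

    # sanity check: agrees with zlib.crc32 on the same input, e.g. crc32(b"IDAT")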

>zlib is 23k lines

It's not needed to make a PNG reader/writer; zlib is massive overkill for that. Here's a tiny deflate/inflate implementation [2] in under 1k lines (and it could be much smaller if needed).

stb [0] has single headers of ~7k lines total including all of the formats: PNG, JPG, BMP, PSD, GIF, HDR, and PIC. Here's [1] a 3k-line single-file PNG version with tons of #ifdefs for all sorts of platforms. Remove those and I'd not be surprised if you could do it in ~1k lines (which I'd consider quite simple compared to most of today's media formats).

>Of course they're not common formats so you're stuck with complex formats like PNG

BMP is super common and easy to use anywhere.

I use flat image files all the time for quick and dirty stuff. They quickly saturate disk and network speeds (say, recording a few decent-speed cameras), and I've found that PNG compression alleviates that but then saturates CPU speeds (some libs are super slow, some are vastly faster). I've many times made custom compression formats to balance these for high-performance tools, when neither something like BMP nor something like PNG would suffice.

[0] https://github.com/nothings/stb

[1] https://github.com/richgel999/fpng/blob/main/src/fpng.cpp

[2] https://github.com/jibsen/tinf/tree/master/src


While PNG is definitely not as simple as TGA, I'd say it's "simple" in that its spec is mostly unambiguous and implementing it is straightforward. For its relative simplicity it's very capable and works in a variety of situations.

One nice aspect of PNG is that it gives a reader a bunch of data to validate before it even starts decoding image data. For instance, a decoder can check for the magic bytes, the IHDR, and then the IEND chunk, and reasonably guess the file is trying to be a PNG. Each chunk also carries some metadata you can validate before you start decoding its contents. There are a lot of chances to bail early on a corrupt file and avoid decode errors or exploits.

A format like TGA, with a simplistic header and a blob of bytes, is hard to validate before you start decoding. A file extension or a MIME header doesn't tell you what the bytes actually are, only what some external system thinks they are.


> zlib is 23k lines.

The zlib format includes uncompressed* chunks, and CRC is only non-trivial if you're also trying to do it quickly, so a faux-zlib can be much, much smaller.

(I don't recall if I've done this with PNG specifically, but consider suitably crafted palettes for byte-per-pixel writing: quick-n-dirty image writers need not be much more complex than they would've been for netpbm)

* exercise: why is this true of any reasonable compression scheme?


I've done this. For a project where I didn't want any external dependencies, I wrote an uncompressed PNG writer for RGBA8 images in a single function. It's just over 90 lines of C++:

https://github.com/a-e-k/canvas_ity/blob/f32fbb37e2fe7c0fcae...


The "compressed" file may end up larger than the original?


why not? most formats have some headers and some kind of frames with data (additional headers)


> why is this true of any reasonable compression scheme?

Any? I wouldn't say that. If you took LZ4 and made it even simpler by removing uncompressed chunks, you would only have half a percent of overhead on random data. A thousandth of a percent if you tweaked how it represents large numbers.


TIL. IIUC, LZ4 doesn't care about the compression ratio (to which, you are correct, I had been alluding) but does strongly care about guaranteeing a maximum block size. (So still the same kind of concern, just on an absolute rather than a relative basis.)


Just simplify it further. Get rid of the implicit +4 to the match size. 0-15 instead of 4-19. Now you can guarantee any block size you want.

If you wanted to go even simpler, here's an entire compression format described in one line:

one byte literal length, one byte match length, two bytes match offset, 0-255 literal bytes, repeat
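
A sketch of a decoder for that one-line format in Python, with a few assumptions filled in on my side (little-endian offsets, the stream ending when the input runs out, and a zero offset meaning "no match"):

    def decompress(data):
        out = bytearray()
        i = 0
        while i + 4 <= len(data):
            lit_len = data[i]
            match_len = data[i + 1]
            offset = int.from_bytes(data[i + 2:i + 4], "little")  # endianness is an assumption
            i += 4
            out += data[i:i + lit_len]          # literal run
            i += lit_len
            if offset:                          # copy byte-by-byte so overlapping matches work
                for _ in range(match_len):
                    out.append(out[-offset])
        return bytes(out)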


BMP is really great: the whole format is described on Wikipedia with enough detail to code it yourself in literally 10 minutes, and the 'hardest' part of creating (or parsing) a BMP is counting the bytes to pad the data correctly, and remembering where [0,0] is :)

https://en.wikipedia.org/wiki/BMP_file_format#Example_1


But there are lots of BMP versions - wiki says "Many different versions of some of these structures can appear in the file, due to the long evolution of this file format."


Exactly. It is easy to write a BMP reader, but if you want to read any BMP file that you might find in the wild then you're going to have a fun time.


There's even some very niche extensions to BMP which allow it to be used as a container for PNG or JPEG data.

https://learn.microsoft.com/en-us/windows/win32/gdi/jpeg-and...


If you think PNG is complex have a gander at webp. That plane crash is a single frame of vp8 video. Outside of a Rube Goldberg web browser the format is useless.


I don't know about other platforms but .webp is very well supported on Linux. I've got .webp files showing up just fine from Emacs and picture viewers and ImageMagick's tools do support .webp just fine.

Lossless WEBP is smaller than optimized/crushed PNG files.

And I'd say that's quite a feat, which may explain the complexity of the format.

So WEBP may be complicated but if my OS supports it by default, where's the problem? It's not as if I needed to write another encoder/decoder myself.


If you want to handle the format by yourself from scratch it's super complex indeed, but OTOH everyone just uses libwebp which has a very simple API, especially compared to something like libpng. I have added WebP support via libwebp into Allegro 5 myself and didn't even have to stop to think, it was as straightforward as it gets - and implementing animated WebPs wasn't hard either.


WebP is useful for lossless image storage for games/game engines, it takes roughly 80% of the time to load/decode vs the same image stored as a png, and is usually significantly (multiple megabytes) smaller for large textures. That stuff doesn't matter too much in a web browser, but in a game where you have potentially hundreds of these images being loaded and unloaded dynamically and every millisecond counts, it's worthwhile.


Erm, aren't both WebP and PNG rather useless for games? How do you convert those formats on the fly into one of the hardware-compressed texture formats consumed by the GPU (like BCx, ETC or ASTC)? If you're decoding PNG or WebP to one of the linear texture formats, you're wasting a ton of GPU memory and texture sampling bandwidth.

(these are probably better alternatives: https://github.com/BinomialLLC/basis_universal, or http://www.radgametools.com/oodletexture.htm)


Hardware compressed texture formats introduce compression artifacts, which is fine for some art styles or PBR maps that don't need to be super accurate, but for some styles (such as pixel art or "clean" non-pixel styles, in both 2D and 3D) lossless compression is preferred, and yeah, they're just decoded into bitmap data on the fly. Whether that wastes memory or not is subjective and dependent on use case. Yeah, if you're pushing 4K PBR maps for terrain to the GPU, using lossless formats for storage is not smart, but you could argue that for many textures, using VRAM formats wastes disk/download space vs lossless (especially on mobile devices or webgl/wasm where space matters more). If disk space/download size isn't a concern then uncompressed VRAM formats can work for smaller textures.

There is an initial decoding/upload cost to compressed lossless images, and they're not optimised well for streaming, but at least with pixel art that's not a huge concern, as textures tend to have small dimensions, though a spritesheet in a VRAM format can quickly balloon to ridiculous sizes for what is otherwise low-resolution artwork.

Of all the open formats that support lossless compression, are easy to link against, and have wide platform support, WebP is good, and smaller/faster than PNG for basically all images. Basis Universal is a decent solution to the disk-size problem of traditional VRAM formats, but it still isn't lossless (afaik?). Oodle is new to me; it looks good and appears to solve all of the above if the blurb is to be believed. It's a shame it's proprietary; I'd use it right away if it were FOSS.


IME most 2D games use uncompressed textures. Looking perfect matters less if you're going to stretch it across a 3D tri and do a bunch of fancy lighting.


That's a limited use case that I would consider embedded. The game player isn't interacting with those files directly.


So?


Even simpler is farbfeld, which supports 16 bits per channel + alpha. The header is nothing more than a magic string and the image dimensions.


One of the annoyances of the TGA format is that it has no signature at the beginning of the file. The signature is at the bottom. This allows you to craft a TGA file that could be misidentified.


Um... for the record: BMP and TGA may have compression. And, since it is rarely implemented, you may crash a lot of stuff with your RLE bitmap )


Agree.

My go-to graphics format in the days of MCGA was PCX. Very easy to decode even with a small assembler routine.


Having implemented most of the PNG specification from scratch in the past month, I agree with all of the features highlighted by the author in the article's introduction. Although there are some minor things I don't like, overall it is a very well-designed format that has minimal ambiguity and stands the test of time.

You can find my modern Java PNG library at: https://www.nayuki.io/page/png-library , https://github.com/nayuki/PNG-library

I also made a web-based tool to dissect PNG files and show all the fields and errors: https://www.nayuki.io/page/png-file-chunk-inspector

And my own "minimum-viable PNG writer" in ~140 lines of Java back in the year 2012: https://www.nayuki.io/page/dumb-png-output-java


> What makes this library modern? It [...] consumes more CPU and memory to simplify the logic and improve reliability.

Intriguingly unapologetic, but I think I'll stick to the PNG libraries that are mature enough to be both reliable and fast :-)


These might change your mind on reliability: https://github.com/glennrp/libpng/issues ; http://www.libpng.org/pub/png/libpng.html section "Security and Crash Bugs in Older Versions"; https://www.cvedetails.com/vulnerability-list/vendor_id-7294...

Regarding performance, I already lost the game before it started because I'm writing Java. If I wanted to squeeze CPU time, I would be writing C/C++/asm. So I decided to aim for conciseness and reliability instead of the endless stream of vulnerabilities.


Point made, but I was actually thinking of the default Java platform support for writing PNG files (javax.imageio + zlib, which has a decent track record).

https://www.cvedetails.com/product/111843/Zlib-Zlib.html?ven...


> a "PNG four byte integer", which is limited to the range 0 to 231-1, to defend against the existence of C programmers.

Kind of an odd thing to say, considering the existence and prevalence of libpng, which is written in C, and which uses setjmp() and longjmp() as part of its API. It's difficult to think of a more ill-advised and bonkers but extremely C-centric thing to do.


I guess that's one important reason why stb_image.h became so popular. Last time I tried to integrate libpng into a project I just gave up (that was on Windows, I guess on Linux it would just be an 'apt install libpng-dev').


Are there reasons to interpret `setjmp` and `longjmp` as anything other than a `C` (/hardware) representation of Effects? (In the sense of exceptions, coroutines/await, etc.)

If so, then why aren't they fundamentally quite reasonable?


Some platforms don't support full setjmp/longjmp feature set (WASM for instance). As far as I'm aware libpng also works without setjmp/longjmp support though via a build config option (it's still not fun to integrate into a project if you need to build it from source).


As of libpng 1.6.0, a so-called "simplified API" was added, which does not use setjmp/longjmp. A while back I had a C project using the old API, and I converted it to C++, and the interaction of setjmp/longjmp with exceptions was giving me headaches. I switched to the simplified API, and it was a breeze. So much less code, and no hacky C "exceptions". If you can require libpng 1.6 or newer, it's worth looking at the simplified API, if it supports your needs.

It's described here: http://www.libpng.org/pub/png/libpng-manual.txt


That first paragraph is a master class in making me feel old. 1996 was yesterday for some of us.


I recently heard someone refer to this period as "the late 1900s" and proceeded to shrivel into dust.


As a young programmer back then, the guys who could write image libraries were wizards to me.


It's the "similarily old" ZIP comparison for me


This article makes me feel old, having lived through a time when you avoided using PNGs on the web because IE6 didn't support the transparency


A workaround introduced me to the CSS `filter` property. (Some monster value starting with progid:DXImageTransform...).

Crazy considering how much I use the property now.


IE6 as well as some of the later versions also got the gamma wrong, so even if you used a non-transparent PNG the colors would be subtly different from the surrounding CSS colors.


> you avoided using PNGs on the web because IE6 didn't support the transparency

Nonsense. The only reason to avoid PNG was IE’s gamma issues: IE6 did not support progressive transparency, so at worst you had full-color gifs.


Absolutely not. Not supporting more than 1-bit alpha was a huge reason to avoid doing anything serious with PNGs if you cared about IE6; gamma was like a cherry on top.

You could actually make it work with non-standard DirectX filters, but it came with its own set of drawbacks and wasn't always a viable option.


> Absolutely not. Not supporting more than 1-bit alpha was a huge reason to avoid doing anything serious with PNGs if you cared about IE6

That makes absolutely no sense, because there was no other format which could do progressive transparency in IE, and short of animation, palettised PNG is superior to GIF. And for non-photographic full-color images, PNG is generally much better than JPEG.

So avoiding PNG just gave you larger files or worse results for no gain.


Your options weren't "either use PNG or some other format", but rather "either ignore IE and design things that can use 8-bit alpha thanks to PNG, or don't even consider the possibilities PNG could give you when designing things at all".

So yes, you did use PNGs in some areas (illustrations etc.), but anything that would make you think "I'd have to use PNG to do that" meant "no-go because of IE". Which generally means avoiding PNGs.

When you add gamma issues on top of that, it was pretty rare to ever use PNGs for web designs if you cared about IE6.


I love this kind of introduction to "simple" formats! Thanks for sharing.

Always good insight to know how the basic concepts of these work, without needing the hours of deep, format-specific learning you'd only invest if you had to work directly with the format, like if you were writing a PNG lib.


It's a tiny issue, but what I like least about the PNG format is the checksum at the end of each block.

From what I gather, the checksum is there for two reasons: 1) a check on archival integrity, based on experience with its usefulness in ZIP files, and 2) a way to check for download errors early, before reaching the end-of-file, which was more important in the slow and noisy modem era of the 1990s.

However, it's still possible for a change in chunk type to go undetected, for example, if "zTXt" were transformed to "xTXt" - a one bit change.

(I think it's also possible to construct a chunk such that if the length changed to just the right value then it could be interpreted as two chunks (with a smaller length) or be merged with later chunks. This requires getting the CRCs to align just right, and even harder to have just a single bit change.)

My belief is that removing the per-block CRC32 and putting the checksum in the IEND at the very end of the PNG data stream, and using a stronger checksum - even MD5 - would be more effective at archival integrity.

This of course can't happen now. Still, I regard it as a small bit of 1990s cruft.

When I developed my own format, for non-image data, I started with PNG as a guideline, then found that dealing with the checksum, even just to always generate a valid value, was a nuisance, with seemingly no good reason to justify its overhead.

I decided to drop the checksum, with a hand-waving argument that people should use other tools to detect and even repair file corruption, depending on their specific requirements.

This more generic and widespread family of formats is called "FourCC" for "four-character code" (https://en.wikipedia.org/wiki/FourCC).


The check value is doubly stupid because there are two of them: zlib compression adds the Adler32 of the uncompressed data, and then the IDAT adds a CRC32 of the compressed data.


Not all chunk types use compression. The CRC is a generic mechanism that is agnostic of the chunk type.


I think that per-chunk checksums can be helpful if you are manipulating PNG files. For example, imagine that you are "cleaning" a PNG by removing metadata such as location information and timezone. The checksum can help make sure that you don't mess up your copying of unchanged chunks (or at least you notice your mistake faster). Even if your code is perfect it could help detect a bitflip during processing. Admittedly minor, but why not?

I think the biggest mistake is that the checksum doesn't cover the type and length. If it did, then most of your concerns would be resolved. It may also make sense to have a full-file checksum in the IEND, though the only thing that could really detect is whole chunks being dropped perfectly somehow, so not much added value; but again, 4 bytes seems worth it.


While I've never done the task you describe, it's similar across many FourCC formats so I think my experience is applicable.

I strongly suspect the input checksum won't be checked against the output data. Data ingestion might/should verify the checksum, which is then thrown away.

This is especially true if working in a language with immutable strings, or using a functional-style immutable approach, where it's easier to know the payload doesn't change.

The checksum will be recomputed on egress.

As an alternative approach, the entire chunk might be stored in a single block, and either filtered or written as a single block, with no need to change anything, so no need to recompute the checksum.

In any case, if the developer thought this was appropriate, it's easy to add any sort of checksum or hash fingerprint as part of the chunk reader API, without it being present in the file.

> doesn't cover the type and length

While it could cover type, length is harder for some use cases. If you have a seekable output file, and don't have the ability to buffer all the data in memory, you might be able to process a segment at a time, write the crc, seek to the beginning of the chunk, then write the size.

Oh! I just realized that if the CRC were in the order typecode, data, length (which is different than the presentation order in the PNG data stream) then it would be possible to include the length in the CRC.

Though I don't think including the length would improve things as I think the failure modes are identical. Maybe?


If the length comes last, how would you know how much data to read? Remember that chunks like tEXt are variable-length and not self-terminating; it relies on the outer level to signal the end of data.


The length would still come first in the PNG datastream.

My idea is that those 4 bytes could have been added to the CRC after processing the chunk tag and data. Here's an example:

  import zlib

  class BlockWriter:
    def __init__(self, output, chunk_type):
        self.output = output
        self._start = output.tell()
        output.write(b"\0\0\0\0")
        output.write(chunk_type)
        self._crc = zlib.crc32(chunk_type)
        self._chunk_length = 0
    
    def write(self, data):
        self.output.write(data)
        self._crc = zlib.crc32(data, self._crc)
        self._chunk_length += len(data)
    
    def finish(self):
        chunk_length_bytes = self._chunk_length.to_bytes(4, "big")
        self._crc = zlib.crc32(chunk_length_bytes, self._crc) # !! NEW  !!
        crc_bytes = self._crc.to_bytes(4, "big")
        self.output.write(crc_bytes)
        # Go back and update the length
        end = self.output.tell()
        self.output.seek(self._start)
        self.output.write(chunk_length_bytes)
        self.output.seek(end)
It could be used like:

    writer = BlockWriter(png_f, b"IDAT")
    writer.write(b"shrdlu")
    writer.write(b"etaoin")
    writer.finish()
which produces the correct block structure:

    '\x00\x00\x00\x0cIDATshrdluetaoin\xed?\xa6\xa4'
     |--------------| four byte length = 12
                     |--| four byte character code 'IDAT'
                         |----------| = 12 bytes of payload
                                     |------------| = four byte CRC
I'll show the CRC is in the order {tag}, {data}, {length}:

   >>> zlib.crc32(b"IDAT" + b"shrdluetaoin" + b"\x00\x00\x00\x0c"
           ).to_bytes(4, "big")
   b'\xed?\xa6\xa4'


@parent and @grandparent: The chunk CRC-32 does cover the chunk type.

> A four-byte CRC (Cyclic Redundancy Code) calculated on the preceding bytes in the chunk, including the chunk type field and chunk data fields, but not including the length field. -- https://www.w3.org/TR/2003/REC-PNG-20031110/#5Chunk-layout


Thank you for the correction!

It has been many years since I looked into this matter, and I seem to have forgotten that detail.

Digging through my sent box, to png-mng-misc, I see that I knew that back in 2012!

I also wrote that most of the tools I checked didn't verify the CRC:

> I tried a PNG with an IDAT chunk with an invalid CRC on various software on my Mac. Unless I messed up my testing, the desktop, email preview, OmniGraffle, and Pixen.app all used the chunk with the invalid CRC.

My thread was "I would like some insight about PNG CRC and other experience" in case someone wants to dig it up.

I was able to make a PNG with a length field which, if changed, would produce another PNG showing a different result. At least one of the PNGs had an invalid checksum, but was still displayed.

EDIT: I found a public copy of the thread at https://png-mng-misc.narkive.com/YgUoUekk/i-would-like-some-... .


For what it's worth, here's an example of two PNGs which differ by 1 bit, in the length.

http://dalkescientific.com/xo.png is a PNG for an "O" with two IDAT blocks:

  'IHDR' 13  (this is the chunk data size, excluding the 4 bytes of crc)
  'xtra' 0
  'IDAT' 1012  <-- this one is displayed
  'IDAT' 1090  <-- this one is ignored
  'IEND' 0
http://dalkescientific.com/xo_bad.png is an invalid PNG for an "X" with one IDAT block and with an invalid checksum:

  'IHDR' 13
  'xtra' 1024  <-- the extra length contains what was the first IDAT
  'IDAT' 1090  <-- this one is displayed
  'IEND' 0
Both display in Preview.app, Firefox, and Safari.

They differ only in byte offset 35, where one has a chr(0) and the other a chr(4):

  % python -c 'print(set(enumerate(open("xo.png", "rb").read()))\
                .symmetric_difference(\
                     set(enumerate(open("xo_bad.png", "rb").read()))))'
  {(35, 0), (35, 4)}
With a bit more work I could probably construct PNGs which are both valid, and which differ only by a bit.

It would require brute-forcing some chunk data so the CRCs would get a match.

I have some ideas on how to make it so there's only one valid (and different) IDAT block as well, but that's more than I can do in an evening.


> which is limited to the range 0 to 2^31-1, to defend against the existence of C programmers.

What is this nonsense? You mean Java, right? C has always had unsigned types.


>"31-bit" is not a typo - PNG defines a "PNG four byte integer", which is limited to the range 0 to 231-1, to defend against the existence of C programmers.

What exactly is implied here? Silly guard against overflow?


It's "in order to accommodate languages that have difficulty with unsigned four-byte values" ie. Java.


That makes more sense


Mostly just a joke at C's expense. When reviewing C code, any time the sign-bit gets touched I consider the danger zone to have been entered (especially in the context of parsing data). Limiting the range of ints is a good defensive programming tactic.

As another commenter points out, the real reason is about practicality - some languages like Java don't natively support unsigned ints.


Maybe I'm old, but I don't get it. I'd say signed is more dangerous, as the overflow is undefined. And if you read into an unsigned type, you are just wasting range.


Yes, that's the point. Signed ints are a potential footgun, which can be partially mitigated by limiting the range to 0 - 2^31-1.

Good specs pre-emptively mitigate implementation bugs.


Yeah, I found that part a bit opaque as well. What about alignment? Does it just use 4-byte integers but leave one bit unused? Isn't that a much clearer way of putting it?


I found the info-graphics in this repository to be of great use when I was trying to digest different image formats: https://github.com/corkami/pics/blob/master/binary/PNG.png


Due to the PNG signature ("magic bytes"), every PNG file starts with the sequence "89 50 4E 47 0D 0A 1A 0A".

In ASCII: "[\x89]PNG\r\n[SUB]\n"

Is there any information on the origin of these bytes? Why were they chosen like that?


So you'd get "PNG" printed out when you accidentally "type"ed that file in MS-DOS. The 1A was the EOF character for text data back then.


Ah, that makes a lot of sense!

So what's 0x89 at the start? It's outside the ASCII range.


So that when a 7-bit transfer masks out all the high bits, it becomes an invalid signature instead of the pixel data getting silently corrupted.
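
A quick sketch of the check that protects against (the signature bytes are from the spec; the 7-bit mangling is just simulated here):

    PNG_SIG = b"\x89PNG\r\n\x1a\n"

    def looks_like_png(data):
        return data[:8] == PNG_SIG

    # simulate a 7-bit transfer stripping the high bit of every byte:
    mangled = bytes(b & 0x7F for b in PNG_SIG)
    assert looks_like_png(PNG_SIG)
    assert not looks_like_png(mangled)  # 0x89 became 0x09, so the check fails loudly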


They are explained in detail on the Wikipedia page: https://en.wikipedia.org/wiki/Portable_Network_Graphics#File...


Last time I was on the scene.org FTP(-style) download page, there was a .png from a party. Years ago, when my internet connection was damn slow, I watched a picture of 'HAL' (the computer from the movie) build up starting from the 'raster point' (the image build-up start point) at the bottom right, running to the left and up, line by line...

Maybe I calculated its bits wrong, but over a hundred thousand bits for a picture of 300 x XXX?

Here's a (German) comic in 12.5 kB, but I think they don't like hotlinking ^^

> //i.ibb.co/TPgSkF6/10126-DER-WARME-PULLI-EIN-AKT-FINAL-Mail.png

regards...


Just in case others find it useful, I recently came across Netpbm[0] which is comically simple to implement![1]

It’s a pretty good image format if one wants to quickly make something visual with code.

[0] https://en.m.wikipedia.org/wiki/Netpbm

[1] https://git.sr.ht/~benjcal/bc_libs/tree/main/item/bc_buffer....


No alpha channel unfortunately.


yeah :(


Netpbm (the spec) supports alpha. It's in the PAM format: https://netpbm.sourceforge.net/doc/pam.html

I've never tried that variant, nor do I think I've seen files in the wild, so IDK how widely stuff supports it. If I have need for an alpha channel, I usually reach for a PNG encoder at that point…


Do browsers support it?


Not sure; I tested it with Firefox and it offered to download the file. I don't think it is a good format for the web though, because it takes too much space.

I use it to make images with C locally and monitor them with a quick image viewer[0] I wrote

[0] https://git.sr.ht/~benjcal/bc_tools/tree/main/item/bc_viewer...


I've written a CC0-licensed (basically public domain) implementation of PNG/ZLIB:

http://public-domain.advel.cz/

It also contains a simple version without actual compression, which is actually a good alternative to BMP files. I was quite confused by the specifications for BMP, so I wrote a PNG implementation instead.


What's more, PNG is flexible enough that Macromedia Fireworks used it as the native file format for its documents (like a more reasonable PSD equivalent).


You know you’re old when you still reflexively flinch just a bit when you see PNGs mentioned, as you have battle scars from working with PNG-24 files in IE6.

https://24ways.org/2007/supersleight-transparent-png-in-ie6


… one of the many reasons that led to Firefox's rise.


Does anyone know why filtering is mandatory?

It seems like operations on the format would be a bit faster if the pre-compression data were just a framebuffer dump, instead of prefixing each row with a 1-byte "filter ID", possibly breaking data alignment.


I suspect even if you weren't using heuristics you would do best to pick a different static filter than "None". For example, I would expect that "Up" does much better on average, as you are essentially bringing in context that the compressor would struggle to line up on its own. I think most encoders would probably only use "None" in cases where heuristics show that nothing else helps.

So it is probably mandatory because only a tiny minority of images wouldn't use a filter. So it is better to just require it to avoid one more condition in the decoder.
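
For reference, a small sketch of the "Up" filter (PNG filter type 2) on raw scanlines; per the spec it's just a byte-wise difference against the row above, mod 256:

    def filter_up(rows):
        """rows: list of equal-length bytes objects (raw scanline data)."""
        filtered = []
        prev = bytes(len(rows[0]))  # the row above the first row is treated as all zeros
        for row in rows:
            filtered.append(bytes((2,)) + bytes((c - p) & 0xFF for c, p in zip(row, prev)))
            prev = row
        return filtered

    # a smooth vertical gradient turns into mostly-repeated small values,
    # which DEFLATE compresses far better than the raw pixels:
    rows = [bytes([y] * 8) for y in range(4)]
    print(filter_up(rows))  # each filtered row starts with the filter ID byte 2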


Adding 1 byte to each line of raw pixel data is not that much. You can set it to zero to avoid filtering. But it gives you a chance to improve the compression.


The assumption was that it improves compression of gradients and photographic data. In order for compressors to be able to use it, it has to be mandatory for decompressors. Anything that is optional becomes unusable (e.g. arithmetic coding in JPEG).

Filtering as a separate preprocessing step allowed PNG to use off-the-shelf zlib for compression without needing to modify it.


Can’t answer as to why it is mandatory but it is known to improve compression.


Are there formats that try to do more of a 2D compression? With PNG, horizontal lines compress way better than vertical lines. Something like what JBIG2 does, but without the focus on letters.


JPEG-XL's lossless mode has something called the "squeeze transform" which operates over the image data in 2D. It's a variation of the Haar Transform, which works on a similar principle as DCT

https://en.wikipedia.org/wiki/Haar_wavelet#Haar_transform

It also has "meta-adaptive filters", which act similarly to PNG's filters except you get to encode a custom decision-tree that defines how pixels are filtered, as a function of their neighboring pixels


Most image compression formats work in pixel blocks instead of scanlines. For example, JPEG uses an 8x8 block size.

This way rotating the image doesn't affect compression performance (unlike PNG).


I like how that blog / website looks with that ASCII header! I often wish I could come up with such a design. And yes, I also like PNG a lot :)


I've been using this generator for years for ASCII headers: http://patorjk.com/software/taag/#p=display&f=ANSI%20Shadow&...


Oh nice! That seems to be using the tool 'figlet' (or maybe toilet, from libcaca), and the font is banner3-D, that's available here: https://github.com/xero/figlet-fonts

If you install the figlet (or toilet) tool and clone that font repo you can do:

   figlet -d ./figlet-fonts -f Banner3-D My text
  '##::::'##:'##:::'##::::'########:'########:'##::::'##:'########:
   ###::'###:. ##:'##:::::... ##..:: ##.....::. ##::'##::... ##..::
   ####'####::. ####::::::::: ##:::: ##::::::::. ##'##:::::: ##::::
   ## ### ##:::. ##:::::::::: ##:::: ######:::::. ###::::::: ##::::
   ##. #: ##:::: ##:::::::::: ##:::: ##...:::::: ## ##:::::: ##::::
   ##:.:: ##:::: ##:::::::::: ##:::: ##:::::::: ##:. ##::::: ##::::
   ##:::: ##:::: ##:::::::::: ##:::: ########: ##:::. ##:::: ##::::
  ..:::::..:::::..:::::::::::..:::::........::..:::::..:::::..:::::
Toilet also has colour effects and can output in different formats:

   toilet -E list
  Available export formats:
  "caca": native libcaca format
  "ansi": ANSI
  "utf8": UTF-8 with ANSI escape codes
  "utf8cr": UTF-8 with ANSI escape codes and MS-DOS \r
  "html": HTML
  "html3": backwards-compatible HTML
  "bbfr": BBCode (French)
  "irc": IRC with mIRC colours
  "ps": PostScript document
  "svg": SVG vector image
  "tga": TGA image
  "troff": troff source
Cool!

Edit: added image formats


Take a look at the CSS scaling on the site too. It scales the <pre> block responsively so it works on mobile browsers. The powers of figlet/toilet and CSS combine for a cool look.


> PNG is my favourite file format of all time. Version 1.0 of the specification was released in 1996 (before I was born!)

god i feel ancient.


Doesn't PNG also support arbitrary amounts of non-image data? I thought things like the PICO-8 would share programs as .png images. Or is this a hack?


You can add your own chunks, and you can mark them as optional.

It wasn't specified in the article, but IIRC the casing of each letter in the chunk type has special meaning. So AAAA and aaaa have implicit meanings such as "required", "keep when processing", and I guess two other flags.


PICO-8 uses steganography, encoding the data into the 2 least significant bits of the ARGB channels, so 1 byte per pixel.


It's just steganography: each pixel stores a byte in the least significant 2 bits of ARGB.

https://pico-8.fandom.com/wiki/P8PNGFileFormat
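
A sketch of that trick in Python; which two bits of the byte land in which channel is an assumption here (the P8PNG wiki page linked above documents the exact layout):

    def embed_byte(pixel, value):
        """pixel: (a, r, g, b) ints 0-255; value: the byte to hide."""
        pieces = [(value >> shift) & 0b11 for shift in (6, 4, 2, 0)]  # assumed bit order
        return tuple((chan & 0xFC) | piece for chan, piece in zip(pixel, pieces))

    def extract_byte(pixel):
        value = 0
        for chan in pixel:
            value = (value << 2) | (chan & 0b11)
        return value

    px = embed_byte((255, 130, 200, 17), 0xA7)
    assert extract_byte(px) == 0xA7   # visually the pixel barely changes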


Remember when IE didn't support it? And the goofy hacks Microsoft made people do to support transparency on websites?


Oh, IE "supported" PNGs almost maliciously. If you had a True Color PNG? No problem. Paletted 8-bit? No problem. 8-but with alpha? Fine. True Color with alpha? I hope you like seeing your background color you meant to alpha out.

You had to use Microsoft's DirectX filtering CSS extensions to properly handle the alpha channel of True Color PNGs.


> 8-bit with alpha? Fine.

Not exactly; palettised PNG supports a full alpha channel, which did not work in IE.

Though you had to work to get that, as usually software would limit palettised output to GIF (if you didn't outright have to create your PNGs from GIFs).


Oh god. filter:progid:DXImageTransform.Microsoft.AlphaImageLoader(...)


I love this comment: PNG defines a "PNG four byte integer", which is limited to the range 0 to 2^31-1, to defend against the existence of C programmers.


PNG IS great but it’s dog slow.



