Honestly, I don't consider PNG a simple format. The CRC and the compression are non-trivial. If you're using a new language that doesn't have those features built in and/or you don't have a reasonable amount of programming experience then you're likely going to fail (or learn a ton). zlib is 23k lines. "Simple" is not a word I'd use to describe PNG.
Simple formats are like certain forms of .TGA and .BMP. A simple header and then the pixel data. No CRCs, no compression. Done. You can write an entire reader in 20-30 lines of code and a writer in another 20-30 lines of code as well. Both of those formats have options that can make them more work, but if you're storing 24-bit "true color" or 32-bit "true color + alpha" then they are way easier formats.
Of course they're not common formats so you're stuck with complex formats like PNG
For audio, my all-time favorite format to work with is raw PCM.
One time, I had to split a bunch of WAV files at precise intervals. I first tried ffmpeg, but its seeking algorithm was nowhere near accurate enough. I finally wrote a bash script that did the splitting much more accurately. All I had to do to find the byte offset from a timestamp in a raw PCM audio file is multiply the timestamp (in seconds) by the sample rate (in Hz) by the bit depth (in bytes) by the number of channels. The offset was then rounded up to the nearest multiple of the bit depth (in bytes) times the number of channels (this avoids inversions of the stereo channels at cut points).
Once I had the byte offset, I could use the head and tail commands to manipulate the audio streams to get perfectly cut audio files. I had to admire the simplicity of dealing with raw data.
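The same arithmetic, sketched in C rather than bash (the function name and parameters here are mine, not from the original script; bytes_per_sample is the bit depth divided by 8):

#include <stdint.h>

/* Byte offset into a headerless PCM stream for a timestamp in seconds,
   rounded up to a frame boundary so a cut never swaps the stereo channels. */
uint64_t pcm_byte_offset(double seconds, unsigned sample_rate,
                         unsigned bytes_per_sample, unsigned channels)
{
    uint64_t frame = (uint64_t)bytes_per_sample * channels;
    uint64_t off   = (uint64_t)(seconds * sample_rate * bytes_per_sample * channels);
    if (off % frame)
        off += frame - off % frame;   /* round up to the next frame boundary */
    return off;
}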
Smart file systems should offer a way to access raw datastreams and elements within more complex filetypes. e.g. one could call fopen("./my_sound.wav/pcm_data") and not have to bother with the header. This would blur the distinction between file and directory, requiring new semantics.
PNG is not a format for uncompressed or RLE "hello worlds". It's a format designed for the Web, so it has to have a decent compression level. Off-the-shelf DEFLATE implementations were easily available since its inception.
I think it is pretty pragmatic and relatively simple, even though in hindsight some features were unnecessary. The CRC was originally a big feature, because back then filesystems didn't have checksums, people used unreliable disks, and FTPs with automatic DOS/Unix/Mac line ending conversions were mangling files.
PNG could be simpler now if it didn't support 1/2/4-bit depths, keyed 1-bit alpha for opaque modes, or interlacing. But these features were needed to compete with GIF on low-memory machines and slow modems.
Today, the latest image formats play this checkbox-ticking game to an even worse degree: they add animation that is worse than any video format of the last 20 years, support all the obsolete analog video color spaces, carry redundant ICC color profiles alongside better built-in color spaces, etc. By modern standards PNG is super simple.
As for color spaces, that is a case where things get worse before they get better. I remember the horror of making images for the web with Photoshop in the 1990s: inevitably Photoshop would try some kind of color correction that would have been appropriate for print output, which ensured the colors were wrong on screen every time.
Today I see my wide-gamut screen as a problem rather than a solution. I like making red-cyan anaglyph images, and I found out that Windows gives me (16,176,16) when I ask for (0,180,0): it wants to save my eyes from the laser-pointer green of the monitor by desaturating it to something that looks like sRGB green to my eyes. But looking through 3D glasses, that means the right channel bleeds into the left channel. To get the level of control I need for this application, it turns out I have to make both sRGB and wide-gamut images and display the right one... which is a product of the complexity of display technology and how it gets exposed to developers.
> There was talk about upgrading PNG to support the equivalent of animated GIFs but it never really happened because of complexity
This was mostly due to overengineering on the part of the PNG committee. Why stop at animated PNGs, when we could support sound and interactivity! MNG is not a simple format, and the spec has MNG-LC ("low complexity") and MNG-VLC ("very low complexity") subsets, because the whole thing is too complex. Did you know you can embed JPEGs in MNGs? That it has synchronization points for sound, even though sound is still "coming at a later date"? That it allows pasting other images into the movie at arbitrary 2D transforms?
MNG's complexity is self-inflicted, because they second-system effect'd their way into features nobody wanted.
APNG, by contrast, is a series of PNG chunks with a couple extra fields on top for timing and control information.
> Today, latest image formats also do this competition of ticking every checkbox to even worse degree by adding animation that is worse than any video format in the last 20 years,
Yet just seeking in any random vpX / h26x / ... format is a PITA compared to trusty old GIFs. It's simple: if you cannot display any random frame N, in any random order, in constant (and very close to zero) time, it's not a good animation format.
You can't do that for GIF. Each frame can be composited on top of the last frame (ie. no disposal; this allows storing only the part that changed), so to seek to a random frame you may need to replay the whole GIF from the start.
The reason you can seek to any frame is GIFs tend to be small, so your browser caches all the frames in memory.
Simple formats are PPM / Netpbm; they're ASCII text with an identifier line ("P1" for mono, "P2" for grayscale or "P3" for colour), a width and height in pixels (e.g. 320 200), then a stream of numbers for pixel values. Line breaks optional. Almost any language that can count and print can make them; you can write them from APL if you want.
As ASCII they can pass through email and UUNET and clipboards without BASE64 or equivalent. With flexible line breaks they can even be laid out so the monochrome ones look like the image they describe in a text editor.
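For instance, here's a tiny P1 that is both a valid 5x5 bitmap and a recognisable picture of itself in a text editor (1 = black):

P1
5 5
0 0 1 0 0
0 0 1 0 0
1 1 1 1 1
0 0 1 0 0
0 0 1 0 0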
The Netpbm format is amazing if you quickly want to try something out and need to generate an image of some sort. The P6 binary format is even simpler: you write the header followed by a raw pixel data blob.
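Something along these lines (a sketch rather than the exact snippet; WIDTH, HEIGHT and the pixels buffer are placeholders), where the two lines that matter are the printf and the fwrite:

#include <stdio.h>

#define WIDTH  256
#define HEIGHT 256

static unsigned char pixels[WIDTH * HEIGHT * 3];   /* RGB, one byte per channel */

int main(void)
{
    /* fill pixels[] however you like, then: */
    printf("P6\n%d %d\n255\n", WIDTH, HEIGHT);          /* the text header  */
    fwrite(pixels, 3, (size_t)WIDTH * HEIGHT, stdout);  /* the raw RGB blob */
    return 0;
}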
I use this all the time. I love that it's simple enough that I can type something like those two lines off the top of my head at this point. And as an alternative to that fwrite(), another common pattern that I use is:
for (int y = 0; y < HEIGHT; ++y)
    for (int x = 0; x < WIDTH; ++x)
    {
        // ... compute r, g, and b one pixel at a time
        printf("%c%c%c", r, g, b);
    }
I also find ImageMagick very convenient for working with the format when my program writes a PPM to stdout:
Yeah, I know, that's not a complete example: endian issues, error checking, and so on.
Reading a PPM file is only simple if you already have something to read buffered strings and parse numbers, etc. And it's slow and large, especially for today's files.
It would be nice if the CRCs and compression were optional features, but perversely that would increase the overall complexity of the format. Having compression makes it more useful on the web, which is why we're still using it today (most browsers do support BMP, but nobody uses it)
The fun thing about DEFLATE is that compression is actually optional, since it supports a non-compressed block type, and you can generate a valid stream as a one-liner* (with maybe a couple of extra lines to implement the adler32 checksum which is part of zlib)
The CRCs are entirely dead weight today, but in general I'd say PNG was right in the sweet-spot of simplicity versus practical utility (and yes, you could do better with a clean-sheet design today, but convincing other people to use it would be a challenge).
Edit 2: Actual zlib deflate oneliner, just for fun:
deflate=lambda d:b"\x78\x01"+b"".join(bytes([(i+0x8000)>=len(d)])+len(d[i:i+0x8000]).to_bytes(2,"little")+(len(d[i:i+0x8000])^0xffff).to_bytes(2,"little")+d[i:i+0x8000]for i in range(0,len(d),0x8000))+(((sum(d)+1)%65521)|(((len(d)+sum((len(d)-i)*c for i,c in enumerate(d)))%65521)<<16)).to_bytes(4,"big")
The usual answer is that "checksumming should be part of the FS layer".
My usual retort to such an assertion is that filesystem checksums won't save you when the data given to the FS layer is already corrupted, due to bit flips in the writer process's memory. I personally have encountered data loss due to faulty RAM (admittedly non-ECC, thanks to Intel) when copying large amounts of data from one machine to another. You need end-to-end integrity checks. Period.
I agree with the "usual" answer, or more generally, "the layer above". We shouldn't expect every file format to roll its own error detection.
If you truly care about detecting bit-flips in a writer process's memory, that's a very niche use-case - and maybe you should wrap your files in PAR2 (or even just a .zip in store mode!).
99% of in-the-wild PNGs are checksummed or cryptographically signed at a layer above the file format (e.g. as part of a signed software package, or served over SSL).
Edit: Furthermore, the PNG image data is already checksummed as part of zlib (with the slightly weaker adler32 checksum), so the second layer of checksumming is mostly redundant.
> We shouldn't expect every file format to roll its own error detection.
On the other hand, why not? If you are dealing with files that are usually 200kB+, putting 4 or 16 bytes towards a checksum is not a big deal and can help in some unusual situations. Even if the decoder ignores it for speed, the cost is very low.
The space cost is negligible, but the time cost for the encoder is real. Since most decoders do verify checksums, you can't just skip it. Take fpng[1] as an example, which tries to push the boundaries of PNG encode speed.
> The above benchmarks were made before SSE adler32/crc32 functions were added to the encoder. With 24bpp images and MSVC2022 the encoder is now around 15% faster.
I can't see the total percentage cost of checksums mentioned anywhere on the page, but we can infer that it's at least 15% of the overall CPU time, on platforms without accelerated checksum implementations.
I didn't infer 15% from the way it was written there.
But most platforms these days have some form of CRC32 "acceleration". Adler32 is easy to compute so I'm even less concerned there.
Does 15% more time to encode matter? How much time is spent encoding files vs decoding? That is probably still a negligible amount of compute, out of the total compute spent on PNGs.
Your specific number seems to come from an (old version of an) encoder that has a super-optimized encode path but not (yet) an optimized CRC.
CRC can't save you from faulty RAM. It can save you from bitrot in data at rest and from transmission errors. If you have faulty RAM, all bets are off. The data could be corrupted after it's been processed by the CPU (to compute the CRC) and before it's been sent to the storage device.
Arguably, the real reason CRC is useless is that most people don't care about the data integrity of their PNGs. Those who do care probably already have a better system of error detection, or maybe even correction.
deflate=lambda d:b"\x78\x01"+b"".join(bytes([(i+0x8000)>=len(d)])+len(d[i:i+0x8000]).to_bytes(2,"little")+(len(d[i:i+0x8000])^0xffff).to_bytes(2,"little")+d[i:i+0x8000]for i in range(0,len(d),0x8000))+(((sum(d)+1)%65521)|(((sum((len(d)-i)*c+1 for i,c in enumerate(d)))%65521)<<16)).to_bytes(4,"big")
Agree. Programming video games in the early 2000s, TGA was my go-to format. Dead simple to parse and upload to OpenGL, support for transparency, true color, all boxes ticked.
I once wrote a PCX decoder in Pascal outputting VGA w/mode 13. The cool part for me was it had run length encoding, which I was able to figure out trivially just reading the spec. May not have been the most efficient, but way easier than trying to figure out GIF!
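For the curious, PCX's RLE really is that trivial: if the top two bits of a byte are set, the low six bits are a run count and the next byte is the value to repeat; otherwise the byte is a literal pixel. A sketch of the decode loop (from the spec, not from that old Pascal code):

#include <stddef.h>

/* Decode PCX run-length data from src into dst; returns bytes written. */
size_t pcx_rle_decode(const unsigned char *src, size_t src_len,
                      unsigned char *dst, size_t expected)
{
    size_t in = 0, out = 0;
    while (out < expected && in < src_len) {
        unsigned char b = src[in++];
        if ((b & 0xC0) == 0xC0) {          /* top two bits set: a run */
            unsigned count = b & 0x3F;
            if (in >= src_len) break;
            unsigned char value = src[in++];
            while (count-- && out < expected)
                dst[out++] = value;
        } else {
            dst[out++] = b;                /* literal byte */
        }
    }
    return out;
}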
I really like QOI (The Quite OK Image format). It achieves similar compression to PNG, but it's ridiculously easy to implement (the entire spec fits on a single page), and its encoding and decoding times are many times faster than PNG.
I'm also a big fan of QOI as a simple image format.
Yes, it's not as good as PNG (as the sibling comments point out), but I view it more as an alternative to PPM (and maybe a BMP subset), as something that I can semi-quickly write an encoder/decoder if needed.
IMO, PNG is in a completely different level. Case in point, in the linked article the author mentions to not worry about the CRC implementation and "just use a lib"... If that's the case, why not just use a PNG lib?
It depends mostly on the year of birth of the beholder.
I imagine in a couple of decades that "built-in features" of a programming environment will include Bayesian inference, GPT-like frameworks and graph databases, just as now Python, Ruby, Go, etc. include zlib by default, and Python even includes SQLite by default.
Some languages will. However, there will also be a constant resurgence of brand-new "simple" languages without all of that cruft that "you don't need" (read: that whoever came up with the language doesn't need).
Another relatively simple format, that is apparently additionally superior to PNG in terms of compression and speed, is the Quite OK Image format (QOI):
It's dead simple to emit. The P6 binary version is just a short header, followed by RGB pixel data, one byte per channel.
If you don't have a PNG encoder handy and need a quick "I just need to dump this image to disk to view it" for debugging, PPM is a great format due to how trivial it is. But it doesn't fit a lot of use cases (e.g., files are huge, because no compression).
TIFF, on the other hand is a "highest common denominator, lowest common denominator, what the hell, let's just throw every denominator -including uncommon ones- in there" format.
For example, you can have images with four (or more) color channels, of different bit lengths, and different gammas and image characteristics (I actually saw these, in early medical imaging). You can have multiple compression schemes, tile-based, or strip-based layout, etc. A lot of what informed early TIFF, was drum scanners and frame captures.
Writing TIFF: Easy.
Reading TIFF: Not so easy. We would usually "cop out," and restrict to just the image formats our stuff wrote.
I would say Netpbm is similar. Writing it is easy. … reading it … not so much.
PPM is just one format; Netpbm is like a whole family. The "P6" is sort of the identifier that we're using that format — the other identifiers can identify other formats, like greyscale, or monochrome, or the pixel data is encoded in ASCII. The header is in text and permits more flexibility than it probably should. Channels greater than a byte are supported.
Writing a parser for the whole lot would be more complex. (I think TIFF would still beat it, though.) Just dumping RGB? Easy.
Not sure how commonly known it is, but TIFF's extended cousin, GeoTIFF, is a standard for GIS data because of the flexibility you describe, especially the (almost) limitless number of channels and the different data format in channels.
At that point you're not dealing with 'images', but instead raster datasets: gridded data. So, you can combine byte t/f results with int16 classification codes, with float32 elevation data, with 4 channels of RGB+Near Infrared imagery data in uint32, plus some arbitrary number of gridded satellite data sources.
That can all be given lossless compression and assigned geotagging headers, and the format itself is (afaik) essentially open.
I don't know, because zlib makes concessions for every imaginable platform, has special optimizations for them, plus is in C which isn't particularly logic-dense.
> The CRC and the compression are non-trivial.
CRC is a table and 5 lines of code. That's trivial.
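For reference, roughly what that looks like in C, using the bit-reflected CRC-32 with the 0xEDB88320 polynomial that PNG specifies (a sketch):

#include <stdint.h>
#include <stddef.h>

static uint32_t crc_table[256];

static void crc_init(void)                 /* the table: call once at startup */
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? 0xEDB88320u ^ (c >> 1) : c >> 1;
        crc_table[n] = c;
    }
}

static uint32_t crc32_buf(const uint8_t *buf, size_t len)   /* ...and the 5 lines */
{
    uint32_t c = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++)
        c = crc_table[(c ^ buf[i]) & 0xFF] ^ (c >> 8);
    return c ^ 0xFFFFFFFFu;
}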
> zlib is 23k lines
It's not needed to make a PNG reader/writer. zlib is massive overkill for only making a PNG reader or writer. Here's a tiny deflate/inflate code [2] under 1k lines (and could be much smaller if needed).
stb[0] has single headers of ~7k lines total, including all of the formats: PNG, JPG, BMP, PSD, GIF, HDR, and PIC. Here's [1] a 3k-line single-file PNG version with tons of #ifdefs for all sorts of platforms. Remove those and I'd not be surprised if you could do it in ~1k lines (which I'd consider quite simple compared to most of today's media formats).
> Of course they're not common formats so you're stuck with complex formats like PNG
BMP is super common and easy to use anywhere.
I use flat image files all the time for quick and dirty stuff. They quickly saturate disk and network speeds (say, recording a few decent-speed cameras), and I've found that using PNG compression to alleviate that just saturates CPU speeds instead (some libs are super slow, some are vastly faster). I've many times made custom compression formats to balance these for high-performance tools when neither something like BMP nor something like PNG would suffice.
While PNG is definitely not as simple as TGA, I'd say it's "simple" in that its spec is mostly unambiguous and implementing it is straightforward. For its relative simplicity it's very capable and works in a variety of situations.
One nice aspect of PNG is it gives a reader a bunch of data to validate the file before it even starts decoding image data. For instance a decoder can check for the magic bytes, the IHDR, and then the IEND chunk and reasonably guess the file is trying to be a PNG. The chunks also give you some metadata about the chunk to validate those before you even start decoding. There's a lot of chances to bail early on a corrupt file and avoid decode errors or exploits.
A format like TGA with a simplistic header and a blob of bytes is hard to try validating before you start decoding. A file extension or a MIME header don't tell you what the bytes actually are, only what some external system thinks they are.
The zlib format includes uncompressed* chunks, and CRC is only non-trivial if you're also trying to do it quickly, so a faux-zlib can be much, much smaller.
(I don't recall if I've done this with PNG specifically, but consider suitably crafted palettes for byte-per-pixel writing: quick-n-dirty image writers need not be much more complex than they would've been for netpbm)
* exercise: why is this true of any reasonable compression scheme?
I've done this. For a project where I didn't want any external dependencies, I wrote an uncompressed PNG writer for RGBA8 images in a single function. It's just over 90 lines of C++:
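The original code isn't reproduced here, but a sketch of the same approach in C (stored DEFLATE blocks so there's no zlib dependency, hand-rolled CRC-32 and Adler-32; a sketch, not the parent's actual code) looks roughly like this:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static uint32_t crc32_update(uint32_t c, const uint8_t *p, size_t n)
{
    while (n--) {                               /* bitwise CRC-32, poly 0xEDB88320 */
        c ^= *p++;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? 0xEDB88320u ^ (c >> 1) : c >> 1;
    }
    return c;
}

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24); p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);  p[3] = (uint8_t)v;
}

static void write_chunk(FILE *f, const char *type, const uint8_t *data, uint32_t len)
{
    uint8_t hdr[8], tail[4];
    put_be32(hdr, len);
    memcpy(hdr + 4, type, 4);
    fwrite(hdr, 1, 8, f);
    if (len) fwrite(data, 1, len, f);
    uint32_t c = crc32_update(0xFFFFFFFFu, (const uint8_t *)type, 4);
    c = crc32_update(c, data, len) ^ 0xFFFFFFFFu;   /* CRC covers type + data */
    put_be32(tail, c);
    fwrite(tail, 1, 4, f);
}

/* rgba: width*height*4 bytes, row-major, top to bottom; assumes w, h >= 1 */
void write_png_rgba8(FILE *f, const uint8_t *rgba, uint32_t w, uint32_t h)
{
    static const uint8_t sig[8] = {137, 'P', 'N', 'G', '\r', '\n', 26, '\n'};
    fwrite(sig, 1, 8, f);

    uint8_t ihdr[13];
    put_be32(ihdr, w);
    put_be32(ihdr + 4, h);
    ihdr[8] = 8;                        /* bit depth */
    ihdr[9] = 6;                        /* color type 6 = RGBA */
    ihdr[10] = ihdr[11] = ihdr[12] = 0; /* compression, filter, interlace */
    write_chunk(f, "IHDR", ihdr, 13);

    /* Raw image data: each scanline gets a leading filter byte (0 = None). */
    size_t row = 1 + 4 * (size_t)w, raw_len = (size_t)h * row;
    uint8_t *raw = malloc(raw_len);
    for (uint32_t y = 0; y < h; y++) {
        raw[y * row] = 0;
        memcpy(raw + y * row + 1, rgba + (size_t)y * 4 * w, 4 * (size_t)w);
    }

    /* Wrap it in a zlib stream made only of stored (uncompressed) DEFLATE blocks. */
    size_t nblocks = (raw_len + 65534) / 65535;
    uint8_t *idat = malloc(2 + raw_len + 5 * nblocks + 4), *out = idat;
    *out++ = 0x78; *out++ = 0x01;       /* zlib header */
    for (size_t pos = 0; pos < raw_len; ) {
        size_t n = raw_len - pos > 65535 ? 65535 : raw_len - pos;
        *out++ = (uint8_t)(pos + n == raw_len);   /* BFINAL bit; BTYPE=00 (stored) */
        *out++ = (uint8_t)(n & 0xFF);  *out++ = (uint8_t)((n >> 8) & 0xFF);
        *out++ = (uint8_t)(~n & 0xFF); *out++ = (uint8_t)((~n >> 8) & 0xFF);
        memcpy(out, raw + pos, n); out += n; pos += n;
    }
    uint32_t a = 1, b = 0;              /* Adler-32 of the raw (uncompressed) data */
    for (size_t i = 0; i < raw_len; i++) {
        a = (a + raw[i]) % 65521;
        b = (b + a) % 65521;
    }
    put_be32(out, (b << 16) | a); out += 4;

    write_chunk(f, "IDAT", idat, (uint32_t)(out - idat));
    write_chunk(f, "IEND", NULL, 0);
    free(raw);
    free(idat);
}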
> why is this true of any reasonable compression scheme?
Any? I wouldn't say that. If you took LZ4 and made it even simpler by removing uncompressed chunks, you would only have half a percent of overhead on random data. A thousandth of a percent if you tweaked how it represents large numbers.
TIL. IIUC, LZ4 doesn't care about the compression ratio (to which you are correct I had been alluding) but does strongly care about guaranteeing a block maximum size. (so still the same kind of concern, just on an absolute and not a relative basis)
BMP is really great, the whole format is described on wikipedia with enough detail to code it yourself in literally 10 minutes, and the 'hardest' part of creating (or parsing) a bmp is counting the bytes to pad the data correctly, and remembering where [0,0] is :)
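For the common 24-bit bottom-up flavour, those two gotchas boil down to one stride formula and one index flip (a sketch, assuming y is counted from the top of the image):

#include <stddef.h>

/* Byte offset of pixel (x, y) within the pixel data of a classic
   24-bit, bottom-up BMP, with y = 0 at the top of the image. */
size_t bmp24_pixel_offset(int width, int height, int x, int y)
{
    size_t stride = ((size_t)width * 3 + 3) & ~(size_t)3;  /* rows padded to 4 bytes */
    return (size_t)(height - 1 - y) * stride + (size_t)x * 3;
}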
But there are lots of BMP versions - wiki says "Many different versions of some of these structures can appear in the file, due to the long evolution of this file format."
If you think PNG is complex have a gander at webp. That plane crash is a single frame of vp8 video. Outside of a Rube Goldberg web browser the format is useless.
I don't know about other platforms but .webp is very well supported on Linux. I've got .webp files showing up just fine from Emacs and picture viewers and ImageMagick's tools do support .webp just fine.
Lossless WEBP is smaller than optimized/crushed PNG files.
And I'd say that's quite a feat, which may explain the complexity of the format.
So WEBP may be complicated but if my OS supports it by default, where's the problem? It's not as if I needed to write another encoder/decoder myself.
If you want to handle the format by yourself from scratch it's super complex indeed, but OTOH everyone just uses libwebp which has a very simple API, especially compared to something like libpng. I have added WebP support via libwebp into Allegro 5 myself and didn't even have to stop to think, it was as straightforward as it gets - and implementing animated WebPs wasn't hard either.
WebP is useful for lossless image storage for games/game engines, it takes roughly 80% of the time to load/decode vs the same image stored as a png, and is usually significantly (multiple megabytes) smaller for large textures. That stuff doesn't matter too much in a web browser, but in a game where you have potentially hundreds of these images being loaded and unloaded dynamically and every millisecond counts, it's worthwhile.
Erm, aren't both WebP and PNG rather useless for games? How do you convert those formats on the fly into one of the hardware-compressed texture formats consumed by the GPU (like BCx, ETC or ASTC)? If you're decoding PNG or WebP to one of the linear texture formats, you're wasting a ton of GPU memory and texture sampling bandwidth.
Hardware-compressed texture formats introduce compression artifacts, which is fine for some art styles or PBR maps that don't need to be super accurate, but for some styles (such as pixel art or "clean" non-pixel styles, in both 2D and 3D) lossless compression is preferred, and yeah, they're just decoded into bitmap data on the fly. Whether it wastes memory or not is subjective and dependent on use case.

Yeah, if you're pushing 4K PBR maps for terrain to the GPU, using lossless formats for storage is not smart, but you could argue that for many textures, using VRAM formats wastes disk/download space vs lossless (especially on mobile devices or webgl/wasm where space matters more). If disk space/download size isn't a concern then uncompressed VRAM formats can work for smaller textures. Though there is an initial decoding/upload cost to compressed lossless images, and they're not optimised well for streaming; at least with pixel art that's not a huge concern as textures tend to have small dimensions, though a spritesheet in a VRAM format can quickly balloon to ridiculous sizes for what is otherwise low-resolution artwork.

Of all the open formats that support lossless compression, are easy to link against, and have wide platform support, WebP is good, and smaller/faster than PNG for basically all images. Basis Universal is a decent solution to the disk-size problem of traditional VRAM formats, but it still isn't lossless (afaik?). Oodle is new to me; it looks good and appears to solve all of the above if the blurb is to be believed; it's a shame it's proprietary. I'd use it right away if it was FOSS.
IME most 2D games use uncompressed textures. Looking perfect matters less if you're going to stretch it across a 3D tri and do a bunch of fancy lighting.
One of the annoyances of the TGA format is that it has no signature at the beginning of the file. The signature is at the end. This allows you to craft a TGA file that could be misidentified.
Having implemented most of the PNG specification from scratch in the past month, I agree with all of the features highlighted by the author in the article's introduction. Although there are some minor things I don't like, overall it is a very well-designed format that has minimal ambiguity and stands the test of time.
Regarding performance, I already lost the game before it started because I'm writing Java. If I wanted to squeeze CPU time, I would be writing C/C++/asm. So I decided to aim for conciseness and reliability instead of the endless stream of vulnerabilities.
Point made, but I was actually thinking of the default Java platform support for writing PNG files (javax.imageio + zlib, which has a decent track record).
> a "PNG four byte integer", which is limited to the range 0 to 231-1, to defend against the existence of C programmers.
Kind of an odd thing to say, considering the existence and prevalence of libpng, which is written in C, and which uses setjmp() and longjmp() as part of its API. It's difficult to think of a more ill-advised and bonkers but extremely C-centric thing to do.
I guess that's one important reason why stb_image.h became so popular. Last time I tried to integrate libpng into a project I just gave up (that was on Windows, I guess on Linux it would just be an 'apt install libpng-dev').
Are there reasons to interpret `setjmp` and `longjmp` as anything other than a `C` (/hardware) representation of Effects? (In the sense of exceptions, coroutines/await, etc.)
If so, then why aren't they fundamentally quite reasonable?
Some platforms don't support full setjmp/longjmp feature set (WASM for instance). As far as I'm aware libpng also works without setjmp/longjmp support though via a build config option (it's still not fun to integrate into a project if you need to build it from source).
As of libpng 1.6.0, a so-called "simplified API" was added, which does not use setjmp/longjmp. A while back I had a C project using the old API, and I converted it to C++, and the interaction of setjmp/longjmp with exceptions was giving me headaches. I switched to the simplified API, and it was a breeze. So much less code, and no hacky C "exceptions". If you can require libpng 1.6 or newer, it's worth looking at the simplified API, if it supports your needs.
IE6 as well as some of the later versions also got the gamma wrong, so even if you used a non-transparent PNG the colors would be subtly different from the surrounding CSS colors.
Absolutely not. Not supporting more than 1-bit alpha was a huge reason to avoid doing anything serious with PNGs if you cared about IE6; gamma was like a cherry on top.
You could actually make it work with non-standard DirectX filters, but it came with its own set of drawbacks and wasn't always a viable option.
> Absolutely not. Not supporting more than 1-bit alpha was a huge reason to avoid doing anything serious with PNGs if you cared about IE6
That makes absolutely no sense, because there was no other format which could do progressive transparency on IE, and short of animation, palettised PNG is superior to GIF. And for non-photographic full-color, PNG is generally much better than JPEG.
So avoiding PNG just gave you larger files or worse results for no gain.
Your options weren't "either use PNG or some other format", but rather "either ignore IE and design things that can use 8-bit alpha thanks to PNG, or don't consider using possibilities PNG could give you when designing things at all".
So yes, you did use PNGs in some areas (illustrations etc.), but anything that would make you think "I'd have to use PNG to do that" meant "no-go because of IE". Which generally means avoiding PNGs.
When you add gamma issues on top of that, it was pretty rare to ever use PNGs for web designs if you cared about IE6.
I love this kind of introduction to "simple" formats! Thanks for sharing.
Always good to get some insight into how the basic concepts of these formats work without needing hours of learning the deep, specific knowledge that you'd only invest if you had to work directly with the format, like if you're writing a PNG lib.
It's a tiny issue, but what I like least about the PNG format is the checksum at the end of each block.
From what I gather, the checksum is there for two reasons: 1) a check on archival integrity, based on experience with its usefulness in ZIP files, and 2) a way to check for download errors early, before reaching the end-of-file, which was more important in the slow and noisy modem era of the 1990s.
However, it's still possible for a change in chunk type to go undetected, for example, if "zTXt" were transformed to "xTXt" - a one bit change.
(I think it's also possible to construct a chunk such that if the length changed to just the right value then it could be interpreted as two chunks (with a smaller length) or be merged with later chunks. This requires getting the CRCs to align just right, and even harder to have just a single bit change.)
My belief is that removing the per-block CRC32 and putting the checksum in the IEND at the very end of the PNG data stream, and using a stronger checksum - even MD5 - would be more effective at archival integrity.
This of course can't happen now. Still, I regard it as a small bit of 1990s cruft.
When I developed my own format, for non-image data, I started with PNG as a guideline, then found that dealing with the checksum, even just to always generate a valid value, was a nuisance, with seemingly no good reason to justify its overhead.
I decided to drop the checksum, with a hand-waving argument that people should use other tools to detect and even repair file corruption, depending on their specific requirements.
The check value is doubly stupid because there are two of them: zlib compression adds the Adler32 of the uncompressed data, and then the IDAT adds a CRC32 of the compressed data.
I think that per-chunk checksums can be helpful if you are manipulating PNG files. For example, imagine that you are "cleaning" a PNG by removing metadata such as location information and timezone. The checksum can help make sure that you don't mess up your copying of unchanged chunks (or at least you notice your mistake faster). Even if your code is perfect it could help detect a bitflip during processing. Admittedly minor, but why not?
I think the biggest mistake is that the checksum doesn't cover the type and length. If it did then most of your concerns would be resolved. It may also make sense to have a full-file checksum in the IEND, but the only thing that could really detect is whole chunks being dropped perfectly somehow, so not much added value; then again, 4 bytes seems worth it.
While I've never done the task you describe, it's similar across many FourCC formats so I think my experience is applicable.
I strongly suspect the input checksum won't be checked against the output data. Data ingestion might/should verify the checksum, which is then thrown away.
This is especially true if working in a language with immutable strings, or using a functional-style immutable approach, where it's easier to know the payload doesn't change.
The checksum will be recomputed on egress.
As an alternative approach, the entire chunk might be stored in a single block, and either filtered or written as a single block, with no need to change anything, so no need to recompute the checksum.
In any case, if the developer thought this was appropriate, it's easy to add any sort of checksum or hash fingerprint as part of the chunk reader API, without it being present in the file.
> doesn't cover the type and length
While it could cover type, length is harder for some use cases. If you have a seekable output file, and don't have the ability to buffer all the data in memory, you might be able to process a segment at a time, write the crc, seek to the beginning of the chunk, then write the size.
Oh! I just realized that if the CRC were in the order typecode, data, length (which is different than the presentation order in the PNG data stream) then it would be possible to include the length in the CRC.
Though I don't think including the length would improve things as I think the failure modes are identical. Maybe?
If the length comes last, how would you know how much data to read? Remember that chunks like tEXt are variable-length and not self-terminating; it relies on the outer level to signal the end of data.
'\x00\x00\x00\x0cIDATshrdluetaoin\xed?\xa6\xa4'
 |--------------| four byte length = 12
                 |--| four byte character code 'IDAT'
                     |----------| = 12 bytes of payload
                                 |-----------| = four byte CRC
I'll show the CRC is in the order {tag}, {data}, {length}:
@parent and @grandparent: The chunk CRC-32 does cover the chunk type.
> A four-byte CRC (Cyclic Redundancy Code) calculated on the preceding bytes in the chunk, including the chunk type field and chunk data fields, but not including the length field. -- https://www.w3.org/TR/2003/REC-PNG-20031110/#5Chunk-layout
It has been many years since I looked into this matter, and I seem to have forgotten that detail.
Digging through my sent box, to png-mng-misc, I see that I knew that back in 2012!
I also wrote that most of the tools I checked didn't verify the CRC:
> I tried a PNG with an IDAT chunk with an invalid CRC on various software
> on my Mac. Unless I messed up my testing, the desktop, email preview,
> OmniGraffle, and Pixen.app all used the chunk with the invalid CRC.
My thread was "I would like some insight about PNG CRC and other experience" in case someone wants to dig it up.
I was able to make a PNG with a length field which, if changed, would produce another PNG showing a different result. At least one of the PNGs had an invalid checksum, but was still displayed.
'IHDR' 13 (this is the chunk data size, excluding the 4 bytes of crc)
'xtra' 0
'IDAT' 1012 <-- this one is displayed
'IDAT' 1090 <-- this one is ignored
'IEND' 0
>"31-bit" is not a typo - PNG defines a "PNG four byte integer", which is limited to the range 0 to 231-1, to defend against the existence of C programmers.
What exactly is implied here? Silly guard against overflow?
Mostly just a joke at C's expense. When reviewing C code, any time the sign-bit gets touched I consider the danger zone to have been entered (especially in the context of parsing data). Limiting the range of ints is a good defensive programming tactic.
As another commenter points out, the real reason is about practicality - some languages like Java don't natively support unsigned ints.
Maybe I'm old, but I don't get it. I'd say signed is more dangerous, as the overflow is undefined. And if you read into an unsigned, you are just wasting range.
Yeah, I found that part a bit opaque as well. What about alignment? Does it just use 4-byte integers but one bit is unused? Isn't that a much clearer way of putting it?
Last time I was on the scene.org FTP(-style) download page, there was a .png from a party. Years ago my internet connection was damn slow, and I watched a picture of 'HAL' (the computer in a movie) build up starting from the 'raster point' at the bottom right, running to the left and up, line by line...
Maybe I calculated its bits wrong, but it was over a hundred thousand bits for a picture at 300 x XXX?
Here's a (German) comic in 12.5 kB, but I think they don't like hotlinking ^^
I've never tried that variant, nor do I think I've seen files in the wild, so IDK how widely stuff supports it. If I have need for an alpha channel, I usually reach for a PNG encoder at that point…
It also contains a simple variant without actual compression, which is actually a good alternative to BMP files; I was quite confused by the BMP specifications, so I wrote a PNG implementation instead.
You know you’re old when you still reflexively flinch just a bit when you see PNGs mentioned, as you have battle scars from working with PNG-24 files in IE6.
It seems like operations with the format would be a bit faster if the pre-compression data were just a framebuffer dump, instead of prefixing each row with a 1-byte "filter ID", possibly breaking data alignment.
I suspect even if you weren't using heuristics you would be best to pick a different static filter than "None". For example I would expect that "Up" does much better on average as you are essentially bringing in context that the compressor would struggle to line up. I think most encoders would probably only use "None" in cases where heuristics show that nothing else helps.
So it is probably mandatory because only a tiny minority of images wouldn't use a filter. So it is better to just require it to avoid one more condition in the decoder.
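For reference, the "Up" filter mentioned above is about as simple as filters get: each byte stores the difference from the byte directly above it, mod 256. A sketch (prev is the previous unfiltered scanline, or all zeros for the first row):

#include <stdint.h>
#include <stddef.h>

/* "Up" (PNG filter type 2), applied before compression. */
void png_filter_up(uint8_t *out, const uint8_t *cur, const uint8_t *prev, size_t row_bytes)
{
    for (size_t i = 0; i < row_bytes; i++)
        out[i] = (uint8_t)(cur[i] - prev[i]);
}

/* ...and the decoder just adds it back. */
void png_unfilter_up(uint8_t *cur, const uint8_t *filt, const uint8_t *prev, size_t row_bytes)
{
    for (size_t i = 0; i < row_bytes; i++)
        cur[i] = (uint8_t)(filt[i] + prev[i]);
}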
Adding 1 Byte to each line of raw pixel data is not that much. You can set it to zeros to avoid filtering. But it gives you a chance to improve the compression.
The assumption was that it improves compression of gradients and photographic data. In order for compressors to be able to use it, it has to be mandatory for decompressors. Anything that is optional becomes unusable (e.g. arithmetic coding in JPEG).
Filtering as a separate preprocessing step allowed PNG to use off-the-shelf zlib for compression without needing to modify it.
Are there formats that try to do more of a 2D compression? With png horizontal lines compress way better than vertical lines. Something like what JBIG2 does but without the focus on letters
JPEG-XL's lossless mode has something called the "squeeze transform" which operates over the image data in 2D. It's a variation of the Haar Transform, which works on a similar principle as DCT
It also has "meta-adaptive filters", which act similarly to PNG's filters except you get to encode a custom decision-tree that defines how pixels are filtered, as a function of their neighboring pixels
Oh nice! That seems to be using the tool 'figlet' (or maybe toilet, from libcaca), and the font is banner3-D, that's available here: https://github.com/xero/figlet-fonts
If you install the figlet (or toilet) tool and clone that font repo you can do a :
Take a look at the CSS scaling on the site too. It scales the <pre> block responsively so it works on mobile browsers. The powers of figlet/toilet and CSS combine for a cool look.
You can add your own chunks, and you can mark them as optional.
It wasn't spelled out in the article, but IIRC the case of each letter in the chunk type carries a flag: the first letter marks the chunk as critical vs. ancillary (safe to ignore), the second as public vs. private, the third is reserved, and the fourth says whether the chunk is safe to copy when the image data is modified.
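Concretely, each flag is just bit 5 (the ASCII lowercase bit) of the corresponding byte; a sketch of how a decoder might test them:

#include <stdint.h>

/* Bit 5 set means the letter is lowercase, which is how PNG encodes the flag. */
int chunk_is_ancillary(const uint8_t t[4])    { return t[0] & 0x20; } /* lowercase 1st: not critical */
int chunk_is_private(const uint8_t t[4])      { return t[1] & 0x20; } /* lowercase 2nd: private      */
int chunk_reserved_bit(const uint8_t t[4])    { return t[2] & 0x20; } /* must be uppercase today     */
int chunk_is_safe_to_copy(const uint8_t t[4]) { return t[3] & 0x20; } /* lowercase 4th: safe to copy */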
Oh, IE "supported" PNGs almost maliciously. If you had a True Color PNG? No problem. Paletted 8-bit? No problem. 8-but with alpha? Fine. True Color with alpha? I hope you like seeing your background color you meant to alpha out.
You had to use Microsoft's DirectX filtering CSS extensions to properly handle the alpha channel of True Color PNGs.
Not exactly: palettised PNG supports a full alpha channel, which did not work with IE.
Though you had to work to get that, as software would usually limit palettised output to GIF (if you didn't outright have to create your PNGs from GIFs).
I love this comment: PNG defines a "PNG four byte integer", which is limited to the range 0 to 2^31-1, to defend against the existence of C programmers.