If you want to know all the details, the spec itself isn't too hard to understand either, and comes with a bunch of flowcharts on how to decode the Huffman tables and the run-length codes for each DCT block:
https://www.w3.org/Graphics/JPEG/itu-t81.pdf
I highly recommend writing a JPEG en/decoder as an exercise in coding; getting to the point of being able to decode almost all the images you probably have or can find on the Internet shouldn't take more than a few hours and <1k lines of (C) code. Once you've done that, you'll be ready to take on a simple video codec, like H.261. I find it self-motivational because the results are very visible (and if you have a bug, all sorts of fun effects can appear too.)
(As you may have guessed, I've written a JPEG decoder, as well as JBIG2 (partially), H.261, and MPEG-1, each of which is roughly 1kLoC. Currently thinking of trying JPEG2000 too, which is over an order of magnitude more complex.)
The JPEG spec is a bit on the verbose side, and most of it can be skimmed easily; for example, one of the sentences near the beginning is "An encoder is an embodiment of an encoding process." There are also specifications of lossless modes and arithmetic coding, which you probably don't care about for a first exercise; the "baseline DCT" mode is the only variant that really matters. The spec is nearly 200 pages, but the diagrams and flowcharts take up a significant amount of that space.
Interestingly enough, the JPEG2000 spec is at first glance not that much longer, but it is far more terse, so the size of the spec is only a rough estimate of the complexity and work it'd take to implement.
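(If you do try the exercise: the math at the heart of baseline decoding is just an 8x8 inverse DCT applied to each dequantized block. Here's a naive, slow sketch of it in C; real decoders use fast factored IDCTs, this is just the textbook formula:)

    #include <math.h>

    #define PI 3.14159265358979323846

    /* Textbook 8x8 inverse DCT: reconstruct samples out[y][x] from
       dequantized coefficients in[v][u]; afterwards you'd add 128 and
       clamp each sample to [0,255]. */
    static void idct_8x8(const double in[8][8], double out[8][8])
    {
        for (int y = 0; y < 8; y++)
            for (int x = 0; x < 8; x++) {
                double sum = 0.0;
                for (int v = 0; v < 8; v++)
                    for (int u = 0; u < 8; u++) {
                        double cu = u ? 1.0 : 1.0 / sqrt(2.0);
                        double cv = v ? 1.0 : 1.0 / sqrt(2.0);
                        sum += cu * cv * in[v][u]
                             * cos((2 * x + 1) * u * PI / 16.0)
                             * cos((2 * y + 1) * v * PI / 16.0);
                    }
                out[y][x] = sum / 4.0;
            }
    }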
This is actually fascinating. I had always imagined (and thought I had experienced) that JPGs get corrupted by practically any modification, but it seems most modifications don't make the image undisplayable; only those that mess with the header do (for obvious reasons!).
The key is that almost any sequence of bits can be parsed as valid Huffman codes (since unused codes would be a waste of the code space). JPEG doesn’t use any length field for image data, nor does it have any integrity verification. This means you can edit or delete almost any byte in the image data segment and a decoder will still interpret it. A few blocks near the edit site will be corrupted, but it’s very likely that the bitstream will resynchronize some blocks later, so the corruption tends to be highly localized (modulo block/colour component shifting).
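If you want to try it yourself, here's a rough sketch in C (file names are just placeholders) that finds the SOS marker (0xFF 0xDA) and flips one byte somewhere in the scan data; most of the time the result still opens, just with a glitched run of blocks:

    #include <stdio.h>
    #include <stdlib.h>

    /* Toy JPEG glitcher: flip one byte in the entropy-coded data after the
       SOS marker.  The edit may occasionally land on a marker byte and
       break the file, which is part of the fun. */
    int main(void)
    {
        FILE *f = fopen("in.jpg", "rb");           /* placeholder name */
        if (!f) return 1;
        fseek(f, 0, SEEK_END);
        long n = ftell(f);
        rewind(f);
        unsigned char *buf = malloc(n);
        if (!buf || fread(buf, 1, n, f) != (size_t)n) return 1;
        fclose(f);

        long sos = -1;
        for (long i = 0; i + 1 < n; i++)
            if (buf[i] == 0xFF && buf[i + 1] == 0xDA) { sos = i; break; }
        if (sos < 0) return 1;

        buf[sos + (n - sos) / 2] ^= 0x55;          /* somewhere mid-scan */

        FILE *g = fopen("glitched.jpg", "wb");     /* placeholder name */
        if (!g) return 1;
        fwrite(buf, 1, n, g);
        fclose(g);
        return 0;
    }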
By contrast, PNG, which also uses Huffman codes for image compression via DEFLATE, has both length fields and integrity checks (CRC32), which the majority of decoders will check. Hence, almost any corruption to a PNG will render it invalid. If you fix up the length and CRC fields after tampering with the file, you can glitch PNGs too, which can produce some very interesting effects.
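For reference, a PNG chunk is a 4-byte big-endian length, a 4-byte type, the data, and then a CRC32 over the type and data. So after tampering you have to patch that last field; a sketch using zlib's crc32(), with the layout taken straight from the PNG spec:

    #include <stdint.h>
    #include <zlib.h>   /* crc32() */

    /* Recompute the CRC of one PNG chunk in memory.  'chunk' points at the
       4-byte big-endian length field; the CRC covers the type and data
       (but not the length) and sits right after the data. */
    static void fix_png_chunk_crc(unsigned char *chunk)
    {
        uint32_t len = ((uint32_t)chunk[0] << 24) | ((uint32_t)chunk[1] << 16) |
                       ((uint32_t)chunk[2] << 8)  |  (uint32_t)chunk[3];
        uint32_t crc = (uint32_t)crc32(0L, chunk + 4, len + 4);
        unsigned char *p = chunk + 8 + len;
        p[0] = crc >> 24; p[1] = crc >> 16; p[2] = crc >> 8; p[3] = crc;
    }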
It was incredibly fun to see how some changes had drastic effects, but I was also surprised by the fact that most modifications past the header either had no clear visible effect, or kept the image intact. Somehow I'd come to expect otherwise.
Not much to say aside from a really good demonstration!
It must have taken a good chunk of Javascript to convert all of the files in just the right way, as well as a lot of thinking to isolate the various parts of the JPEG format to make it easier to understand. Good job!
What's remarkable is that, like MP3, JPEG is a perceptual format designed decades ago yet still so widely used today. We've actually gotten much, much better at image and audio compression in the intervening years, but both JPEG and MP3 hang on because they're "good enough" for the majority of uses.
I do hope that new alternatives like BPG, HEIC or WebP displace JPEG someday - but that day promises to be very far into the future.
Well, BPG and HEIC (or rather HEIF, as HEIC is just the container IIRC) use HEVC, which means royalties, so unless MPEG LA were to make images using HEVC royalty-free, I can't see any scenario where it would replace JPEG.
WebP is royalty-free, but for lossy compression it's not a good enough improvement (IMO) over JPEG to replace it. For lossless compression, though, it beats PNG in both size and compression/decompression speed by a good margin.
And the other one I find very interesting is PIK, a new lossy/lossless codec that is supposedly a good improvement over JPEG while being very fast, and which can also losslessly recompress existing JPEG images into PIK with roughly 20% smaller files: https://github.com/google/pik
There is JPEG XL, and an image format from EVC, which is the proposed royalty-free video codec from MPEG. (All while VVC, the successor to HEVC, is racing ahead...)
Over 2 decades of software and hardware understand JPEG, its decoding/encoding complexity is low, and it's been around long enough that any patents on the original standard have expired. The same can't be said of the newer formats, so if you want to distribute images widely then you must use a highly compatible format. This is why I believe the other formats will always have only niche use --- remember JPEG2000? It's still around in things like PDFs and is popular for geographical applications (where the scalability and tiling features find practical use), but the immense complexity and somewhat unclear patent situation meant it never gained any widespread acceptance.
Just as a side note: I found a WebP file on a website I wanted to share in Slack. I downloaded it and tried to convert it to JPEG to put in Slack (as it didn't seem to support it). It was much harder than it should have been to convert the image type. Never had such an issue with a JPEG, PNG, GIF, etc.
It's indeed not super easy on macOS; I imagine you'd use a website to convert it or get ImageMagick through brew. On Windows I wonder if the Photos app supports it, in which case you'd hopefully just double-click the image and then save a copy, but unfortunately I cannot try that right now.
I think ideally Slack should be able to convert on the fly, just like creating low quality versions for thumbnails. It’s a paid service after all! But that doesn’t change the current predicament.
I personally still use PNG for graphics and JPG for photos even today, despite Lighthouse suggesting I add WebP versions. It's just so much easier and more portable, with so little to gain by doing otherwise.
I ended up using an online converter [1], as GIMP and ImageMagick weren't having any of it (at the time). Someone in these comments suggested using ffmpeg as an alternative.
> I think ideally Slack should be able to convert on the
> fly, just like creating low quality versions for
> thumbnails. It’s a paid service after all! But that
> doesn’t change the current predicament.
I would mostly agree with you, but these days I'm easily pleased when these types of services do the simple things well. They are only really incentivized to support the most popular image types anyway.
> I personally too use PNG for graphic, and JPG for pictures
> even today, despite Lighthouse suggesting I add WebP
> extras. It’s just so much easier and more portable with so
> little to gain by doing it.
This is what really answers why we still use them, it's convenient. To the average user, the difference between 90kB and 80kB images is not worth losing sleep over. If you were serving that image several million times a day I could imagine it's slightly more pressing.
As an aside I sometimes use my Sony Mavica (floppy disk camera) and that thing outputs valid JPEGs that I can upload to anywhere. (I use it as mostly a joke, but I also love the idea of having a limited number of pictures and keeping old tech going.)
You'll be happy to know that FFmpeg will happily convert a WebP file to a PNG and vice versa, the reason being that WebP is essentially a single-frame WebM. Silly, yes, but it works.
I don't understand why they haven't put this into a Linux package. If you want it to get widely adopted, surely you should make it easier for software to add dependencies on your image type. For example, one of my projects currently depends on libjpeg and libjpeg-dev for building.
This has thrown me off too; I drag a lot of images from my browser to my desktop when browsing.
WebP files dragged from Chrome on Mac just fail silently and never appear on the desktop, so now I have to pay extra attention every time.
I'm sure the server owners enjoy the benefit, but if it doesn't work like a JPEG and I have to change my workflow for an image format, then it sucks for the user.
> We’ve actually gotten much, much better at image and audio compression in the intervening years,
For audio, over the last 25 years we had WMA, Real Audio, Vorbis, MP3 Pro, and HE-AAC, to name some of the well-known ones, and all of them promised MP3 128Kbps quality at half the bitrate. None of them were even close, not even at ~100Kbps. It wasn't until Opus, and after much tuning of its v1.3 encoder, that anything finally managed to edge out LAME MP3 at 128Kbps with 96Kbps.
I hardly see that as much better audio compression in 25 years' time. However, we could definitely get much more (20-30%) quality out of 128Kbps or 256Kbps with AAC or MPC (MusePack, my favourite high-bitrate audio codec).
And yet during this time consumer Internet speeds went from a dismal 56Kbps to 10+Mbps, and for many it could be 100+Mbps. That is a 100x to 1000x difference.
For images, while BPG and the newer formats do get a 50% reduction over JPEG, consider that our display resolutions are now 4x to 8x what they were in the JPEG era, so a 50% reduction isn't anywhere near enough. I would like to see one that gives the same quality as current JPEG at 25% of the file size, i.e. a 100KB JPEG becoming 25KB.
Although I am not entirely sure it will matter. It isn't that the format is good enough; it's that our network bandwidth and storage have grown at a pace that makes the savings irrelevant. Compared to the unknowns and research required to achieve those file-size reductions, we have a very clear path in the 5G evolution: apart from Africa, which I am not so sure about, most of North and South America, Europe, and Asia will be able to afford a 5G mobile plan with at least 10Mbps real-world speed by 2030. And I would not be surprised if many of the more advanced nations get 100Mbps or even 300+Mbps in real-world usage.
Note: before anyone mentions data caps, I think 5G, with its much higher capacity, will change the economics of mobile data.
What is the relevance of JFIF in this? That's the file format; JPEG is the compression scheme. Yet we refer to JPEG files, not JFIF. How did it happen that we care more about, or give more credit to, the compression part?
Honestly, I don’t consider JFIF to be a good format. It’s poorly extensible and requires hacks like byte stuffing to work properly. I much prefer PNG personally - it’s highly extensible via fourcc codes and doesn’t require weird hacks.
I'm not sure why you consider byte stuffing a "hack", because it's a great way to make the codestream self-synchronising and easily decoded by hardware. It's also used a lot in video formats, because it enables easy seeking.
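(For anyone unfamiliar: in JPEG's entropy-coded data, every literal 0xFF byte is followed by a stuffed 0x00, so a 0xFF followed by anything else can only be a marker. The decoder's read loop just strips them, roughly like this sketch in C:)

    /* Return the next entropy-coded byte, stripping the 0x00 stuffed after
       a literal 0xFF; return -1 when a real marker (0xFF + non-zero) or
       the end of the buffer is reached. */
    static int next_scan_byte(const unsigned char *buf, long n, long *pos)
    {
        if (*pos >= n) return -1;
        unsigned char b = buf[(*pos)++];
        if (b != 0xFF) return b;
        if (*pos < n && buf[*pos] == 0x00) {
            (*pos)++;          /* stuffed zero: the 0xFF was data */
            return 0xFF;
        }
        return -1;             /* marker: let the caller handle it */
    }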
This is an amazingly well done article! The writing is very clear and the interactive elements are fantastic in explaining the concepts. Really impressive and it was great to demystify how JPEGs work, I had a vague idea but the details are fascinating. Would love to see another one of these about MP3!
How does this compare to PNG? I believe the latter is lossless, so presumably it's just the plain pixels with Huffman or something similar. But I'd be curious to know more details.
It's a totally different approach, a bit too much to explain in an HN comment; but generally it involves rearranging the data with a set of filters and then compressing with what is essentially gzip.
PNG filters basically difference each byte of each line of image data with a choice of [1] nothing, [2] the pixel preceding the current one, [3] the pixel immediately above on the previous line, [4] an average of the latter two, or [5] "Paeth" (a dynamic choice between the left, up, and upper-left pixels), the goal being to increase redundancy for areas of smooth gradients. Each line has its own choice of filter indicated by a prefix byte. Then it compresses the differenced data using gzip/flate/zlib (a general-purpose LZ+Huffman algorithm).
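The Paeth predictor is the only non-obvious one: it picks whichever of left (a), above (b), and upper-left (c) is closest to the estimate a+b-c. A small sketch in C, following the heuristic described in the PNG spec:

    #include <stdlib.h>   /* abs */

    /* PNG Paeth predictor: choose the neighbour (left a, above b,
       upper-left c) closest to the initial estimate p = a + b - c,
       with ties broken in the order a, b, c. */
    static unsigned char paeth(unsigned char a, unsigned char b, unsigned char c)
    {
        int p  = (int)a + (int)b - (int)c;
        int pa = abs(p - (int)a);
        int pb = abs(p - (int)b);
        int pc = abs(p - (int)c);
        if (pa <= pb && pa <= pc) return a;
        if (pb <= pc) return b;
        return c;
    }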
Yup. PNG is a 1 if JPEG were a 10 on the complexity scale (which it isn't; it's pretty simple on an absolute scale). You can understand PNG if you know a bit about algorithms, but you need a tiny bit more than high-school math to really grok JPEG.
Lossless encoding can be far simpler to understand (in some cases it's not, see FLAC).