
Unraveling the JPEG
https://parametric.press/issue-01/unraveling-the-jpeg/
======
userbinator
If you want to know all the details, the spec itself isn't too hard to
understand either, and comes with a bunch of flowcharts on how to decode the
Huffman tables and the run-length codes for each DCT block:
[https://www.w3.org/Graphics/JPEG/itu-t81.pdf](https://www.w3.org/Graphics/JPEG/itu-t81.pdf)

I highly recommend writing a JPEG en/decoder as an exercise in coding; getting
to the point of being able to decode almost all the images you probably have
or can find on the Internet shouldn't take more than a few hours and <1k lines
of (C) code. Once you've done that, you'll be ready to take on a simple
_video_ codec, like H.261. I find it self-motivating because the results are
very visible (and if you have a bug, all sorts of fun effects can appear too).

(As you may have guessed, I've written a JPEG decoder, as well as JBIG2
(partially), H.261, and MPEG-1, each of which is roughly 1kLoC. Currently
thinking of trying JPEG2000 too, which is over an order of magnitude more
complex.)
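
To give a flavour of what's involved, here's a sketch of the very first step:
walking the marker segments until the entropy-coded scan data begins. (This
assumes the whole file fits in memory and ignores the rare standalone markers
that carry no length field.)

    /* marker_walk.c -- step one of a JPEG decoder: walk the marker
       segments (APPn, DQT, DHT, SOF0, ...) until SOS, after which the
       entropy-coded scan data begins. */
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        if (argc < 2) return 1;
        FILE *f = fopen(argv[1], "rb");
        if (!f) return 1;
        fseek(f, 0, SEEK_END);
        long n = ftell(f);
        rewind(f);
        unsigned char *buf = malloc(n);
        fread(buf, 1, n, f);
        fclose(f);

        long i = 2;                                   /* skip SOI (FF D8) */
        while (i + 3 < n && buf[i] == 0xFF) {
            int marker = buf[i + 1];
            int len = (buf[i + 2] << 8) | buf[i + 3]; /* includes itself */
            printf("marker FF%02X, %d-byte segment\n", marker, len);
            if (marker == 0xDA) break;                /* SOS: scan follows */
            i += 2 + len;
        }
        free(buf);
        return 0;
    }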

~~~
grenoire
What's always daunting to me in projects like these is the time it would take
me to dig and understand the spec itself...

~~~
userbinator
The JPEG spec is a bit on the verbose side, and most of it can be skimmed
easily; for example, one of the sentences near the beginning is "An encoder is
an embodiment of an encoding process." There are also specifications of the
lossless modes and arithmetic coding, which you probably don't care about for
a first exercise; the "baseline DCT" is the only variant that really matters.
The spec is nearly 200 pages, but the diagrams and flowcharts take up a
significant amount of that space.

Interestingly enough, the JPEG2000 spec is at first glance not that much
longer, but it is far more terse, so the size of a spec is only a rough
indicator of the complexity and work it'd take to implement.

------
onemoresoop
Damn, this is the most granular JPEG editor I've seen so far. Leaving the joke
aside, that's a really fun research/demo.

------
herpderperator
This is actually fascinating. I had always imagined (and thought I had
experienced) that JPGs get corrupted by practically any modification, but it
seems most modifications don't make the image undisplayable; only those that
mess with the header do (for obvious reasons!)

~~~
nneonneo
The key is that almost any sequence of bits can be parsed as valid Huffman
codes (since unused codes would be a waste of the code space). JPEG doesn’t
use any length field for image data, nor does it have any integrity
verification. This means you can edit or delete almost any byte in the image
data segment and a decoder will still interpret it. A few blocks near the edit
site will be corrupted, but it’s very likely that the bitstream will
resynchronize some blocks later, so the corruption tends to be highly
localized (modulo block/colour component shifting).
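
You can watch the resynchronization happen with a few lines of C. This is a
toy sketch: the filename and offset are arbitrary, and it skips FF and 00
bytes so it doesn't create or destroy a marker or a stuffed byte.

    /* glitch.c -- XOR one byte inside a JPEG's entropy-coded data. A few
       blocks after the edit will be corrupted; the rest usually survive. */
    #include <stdio.h>

    int main(void)
    {
        FILE *f = fopen("photo.jpg", "r+b");    /* arbitrary filename */
        if (!f) return 1;
        int prev = 0, c;
        long pos = 0, sos = -1;
        while ((c = fgetc(f)) != EOF) {         /* find SOS marker FF DA */
            if (prev == 0xFF && c == 0xDA) { sos = pos + 1; break; }
            prev = c;
            pos++;
        }
        if (sos >= 0) {
            long target = sos + 2000;           /* arbitrary spot in the
                                                   scan data, assuming the
                                                   scan is that long */
            fseek(f, target, SEEK_SET);
            c = fgetc(f);
            if (c != EOF && c != 0xFF && c != 0x00) {
                fseek(f, target, SEEK_SET);
                fputc(c ^ 0x55, f);             /* flip some bits */
            }
        }
        fclose(f);
        return 0;
    }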

By contrast, PNG, which also uses Huffman codes for image compression via
DEFLATE, has both length fields and integrity checks (CRC32), which the
majority of decoders will check. Hence, almost any corruption to a PNG will
render it invalid. If you fix up the length and CRC fields after tampering
with the file, you can glitch PNGs too, which can produce some very
interesting effects.
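
The fix-up is mechanical, since every PNG chunk is a 4-byte big-endian length,
a 4-byte type code, the data, and then a CRC-32 computed over the type and
data. A sketch using zlib's crc32(), with chunk pointing at the length field
of a chunk whose data you've already tampered with:

    /* Recompute a PNG chunk's CRC after tampering with its data.
       Link with -lz for zlib's crc32(). */
    #include <zlib.h>

    static void fix_chunk_crc(unsigned char *chunk)
    {
        unsigned long len = ((unsigned long)chunk[0] << 24) |
                            ((unsigned long)chunk[1] << 16) |
                            ((unsigned long)chunk[2] << 8)  |
                             (unsigned long)chunk[3];
        /* The CRC covers the type code and data, not the length field */
        unsigned long crc = crc32(0L, chunk + 4, (uInt)(4 + len));
        unsigned char *p = chunk + 8 + len;   /* CRC follows the data */
        p[0] = (unsigned char)(crc >> 24);
        p[1] = (unsigned char)(crc >> 16);
        p[2] = (unsigned char)(crc >> 8);
        p[3] = (unsigned char)crc;
    }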

------
dragontamer
Not much to say aside from a really good demonstration!

It must have taken a good chunk of JavaScript to convert all of the files in
just the right way, as well as a lot of thinking to isolate the various parts
of the JPEG format and make them easier to understand. Good job!

------
sprash
This shows again that JPEG is apparently some alien technology from the
future.

People always say that nothing good can come out of a committee. JPEG proves
the opposite.

------
nneonneo
What’s remarkable is that, like MP3, JPEG is a perceptual format designed
decades ago but still so widely used today. We’ve actually gotten much, much
better at image and audio compression in the intervening years, but both JPEG
and MP3 hang on because they’re “good enough” for the majority of uses.

I do hope that new alternatives like BPG, HEIC or WebP displace JPEG someday -
but that day promises to be very far into the future.

~~~
bArray
Just as a side note: I found a WebP file on a website that I wanted to share
in Slack. I downloaded it and tried to convert it to JPEG to post there (as
Slack didn't seem to support WebP). It was much harder than it should have
been to convert between image types. I've never had such an issue with JPEG,
PNG, GIF, etc.

~~~
chronogram
So how did you end up doing it?

It’s indeed not super easy on macOS; I imagine you’d use a website to convert
it or get ImageMagick through brew. On Windows I wonder if the Photos app
supports it, in which case you’d hopefully just double-click the image and
then save a copy, but unfortunately I cannot try that right now.

I think ideally Slack should be able to convert on the fly, just like it
creates low-quality versions for thumbnails. It’s a paid service after all!
But that doesn’t change the current predicament.

I personally still use PNG for graphics and JPG for pictures even today,
despite Lighthouse suggesting I add WebP versions. It’s just so much easier
and more portable, with so little to gain by doing it.

~~~
bArray
> So how did you end up doing it?

I ended up using an online converter [1], as GIMP and ImageMagick weren't
having any of it (at the time). Someone in these comments suggested ffmpeg as
an alternative.
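
(For anyone hitting the same wall: assuming an ffmpeg build with WebP
decoding, it's a one-liner; the filenames here are just placeholders.)

    ffmpeg -i image.webp image.jpg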

> I think ideally Slack should be able to convert on the fly, just like it
> creates low-quality versions for thumbnails. It’s a paid service after all!
> But that doesn’t change the current predicament.

I would mostly agree with you, but these days I'm easily pleased when these
types of services do the simple things well. They are only really incentivized
to support the most popular image types anyway.

> I personally still use PNG for graphics and JPG for pictures even today,
> despite Lighthouse suggesting I add WebP versions. It’s just so much easier
> and more portable, with so little to gain by doing it.

This is what really answers why we still use them: convenience. To the
average user, the difference between a 90kB and an 80kB image is not worth
losing sleep over. If you were serving that image several million times a day,
I could imagine it being slightly more pressing.

As an aside, I sometimes use my Sony Mavica (floppy-disk camera), and that
thing outputs valid JPEGs that I can upload anywhere. (I use it mostly as a
joke, but I also love the idea of having a limited number of pictures and
keeping old tech going.)

[1] [https://ezgif.com/webp-to-jpg](https://ezgif.com/webp-to-jpg)

------
cmurf
What is the relevance of JFIF in this? That's the file format; JPEG is the
compression scheme. Yet we refer to JPEG files, not JFIF. How did it happen
that we care more about, or give more credit to, the compression part?

~~~
nneonneo
Honestly, I don’t consider JFIF to be a good format. It’s poorly extensible
and requires hacks like byte stuffing to work properly. I much prefer PNG
personally - it’s highly extensible via fourcc codes and doesn’t require weird
hacks.

~~~
userbinator
I'm not sure why you consider byte stuffing a "hack", because it's a great way
to make the codestream self-synchronising and easily decoded by hardware. It's
also used a lot in video formats, because it enables easy seeking.
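
The rule itself is tiny: inside the entropy-coded data, the encoder follows
every FF byte with a stuffed 00, so any FF not followed by 00 must be a
marker. A sketch of the decode side:

    /* Return the next entropy-coded data byte, un-stuffing FF 00 pairs,
       or -1 when a real marker (FF followed by non-00) is reached. */
    static int next_entropy_byte(const unsigned char **p,
                                 const unsigned char *end)
    {
        if (*p >= end) return -1;
        int c = *(*p)++;
        if (c == 0xFF) {
            if (*p >= end || **p != 0x00)
                return -1;                  /* a marker: stop decoding */
            (*p)++;                         /* skip the stuffed 00 */
        }
        return c;
    }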

------
tomduncalf
This is an amazingly well done article! The writing is very clear, and the
interactive elements are fantastic at explaining the concepts. Really
impressive, and it was great to have JPEGs demystified; I had a vague idea of
how they work, but the details are fascinating. Would love to see another one
of these about MP3!

------
xchip
Related: this is a JPEG visualizer in just 250 lines of easy-to-read Python 3
code.

[https://github.com/aguaviva/micro-jpeg-visualizer](https://github.com/aguaviva/micro-jpeg-visualizer)

------
_bxg1
How does this compare to PNG? I believe the latter is lossless, so presumably
it's just the plain pixels with Huffman or something similar. But I'd be
curious to know more details.

~~~
microcolonel
It's a totally different approach, a bit too much to explain in an HN
comment; but generally it involves rearranging the data with a set of filters
and then compressing with what is essentially gzip.

~~~
userbinator
PNG filters basically difference each byte of each line of image data with a
choice of [1] nothing, [2] the pixel preceding the current one, [3] the pixel
immediately above on the previous line, [4] an average of the latter two, or
[5] "Paeth" (a dynamic choice between the left, up, and upper-left pixels),
the goal being to increase redundancy for areas of smooth gradients. Each line
has its own choice of filter indicated by a prefix byte. Then it compresses
the differenced data using gzip/flate/zlib (a general-purpose LZ+Huffman
algorithm).
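
The Paeth predictor in particular is short enough to show in full. Here's a
sketch of it together with the corresponding unfiltering step (bpp is bytes
per pixel; prev is the previous reconstructed scanline, or NULL on the first
line):

    #include <stdlib.h>

    /* PNG filter type 4: predict from whichever of left (a), above (b),
       upper-left (c) is closest to the initial estimate a + b - c. */
    static unsigned char paeth(unsigned char a, unsigned char b,
                               unsigned char c)
    {
        int p  = a + b - c;
        int pa = abs(p - a), pb = abs(p - b), pc = abs(p - c);
        if (pa <= pb && pa <= pc) return a;
        if (pb <= pc) return b;
        return c;
    }

    /* Undo Paeth filtering in place: each reconstructed byte is the
       stored difference plus the predictor, modulo 256. */
    static void unfilter_paeth(unsigned char *line,
                               const unsigned char *prev,
                               int len, int bpp)
    {
        for (int i = 0; i < len; i++) {
            unsigned char a = i >= bpp ? line[i - bpp] : 0;
            unsigned char b = prev ? prev[i] : 0;
            unsigned char c = (prev && i >= bpp) ? prev[i - bpp] : 0;
            line[i] = (unsigned char)(line[i] + paeth(a, b, c));
        }
    }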

