
How JPEG actually works (2006) - ramijames
https://blogs.msdn.microsoft.com/devdev/2006/04/12/how-does-jpeg-actually-work/
======
userbinator
For those interested in writing their own JPEG decoder (a highly recommended
exercise --- it doesn't take all that long, actually) I suggest these two
detailed articles:

[https://www.impulseadventure.com/photo/jpeg-decoder.html](https://www.impulseadventure.com/photo/jpeg-decoder.html)

[https://www.impulseadventure.com/photo/jpeg-huffman-coding.h...](https://www.impulseadventure.com/photo/jpeg-huffman-coding.html)

Of course, there are also plenty of other "write a JPEG decoder/encoder
tutorial" articles out there, of varying quality --- which if anything proves
that it's not actually so hard. The standard contains a surprisingly simple
and elegant flowchart description of the Huffman encoder/decoder.
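To give a feel for how simple that flowchart really is, here is a minimal Python sketch of a canonical-Huffman decode loop in the same spirit: the table format (counts of codes per bit length, plus the symbol values) mirrors what a JPEG DHT segment carries, though the function names and table layout here are my own, not the standard's.

```python
def build_huffman_table(bit_counts, values):
    """Map (length, code) -> value, from counts of codes per bit length (1..16).

    Canonical assignment: codes of each length are consecutive integers,
    and moving to the next length shifts the running code left by one bit.
    """
    table = {}
    code = 0
    idx = 0
    for length, count in enumerate(bit_counts, start=1):
        for _ in range(count):
            table[(length, code)] = values[idx]
            idx += 1
            code += 1
        code <<= 1  # next code length: append a zero bit
    return table


def decode_symbol(bits, pos, table):
    """Extend the current code one bit at a time until it matches a table entry."""
    code = 0
    length = 0
    while pos < len(bits):
        code = (code << 1) | bits[pos]
        pos += 1
        length += 1
        if (length, code) in table:
            return table[(length, code)], pos
    raise ValueError("ran out of bits before finding a valid code")
```

For example, with three 2-bit codes and one 3-bit code, the bitstream `0 1 ...` decodes immediately to the second symbol. A real decoder layers bit-stuffing (0xFF 0x00 handling) and DC/AC coefficient semantics on top, but the core loop is just this.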

 _The more modern JPEG 2000 encoding based on wavelets can be pushed much
farther before visible artifacts appear, and includes a surprisingly efficient
lossless mode. Of course, all this improvement is rather moot until common
applications actually support it._

That was 12 years ago, and J2K is still pretty much unknown outside of niche
applications. I'm not sure what they mean by "surprisingly efficient lossless
mode", but decoding J2K is slower than regular JPEG by at least an order of
magnitude if not more. The lack of articles about how to write
decoders/encoders, or even just how it works in detail, is another sign of its
relative obscurity. I've been studying the standard on and off for a few
months, intending to write an article or even a decoder, but haven't gotten
around to it yet. It is certainly an order of magnitude more complex.

~~~
dunham
PDF uses jpeg2k, but I don't know of anything else that does. (Archive.org is
using both jpeg2k and jbig in their pdfs.)

~~~
userbinator
Yes, the PDFs on archive.org are the first thing to come to mind (and my first
exposure to J2K) --- not all of them use it, but you can identify them by the
surprisingly long time it takes to render each page.

------
cstrat
It is a shame how quickly the internet loses content. The first few
experiments linked in the document lead to 404 pages; even an inline image is
404'ing.

It would have been great if the author of the MSFT article had included
screenshots or something to preserve the referenced content.

I know the Wayback Machine exists, but 2006 doesn't feel that long ago even
though it really was.

~~~
nayuki
Archive.org has the experiment link (Stanford page), but is missing quite a
few images in it.
[https://web.archive.org/web/20070101025129/http://www.stanfo...](https://web.archive.org/web/20070101025129/http://www.stanford.edu:80/~esetton/experiments.htm)

------
nayuki
Coincidentally, another link on the HN front page is about the image quality
of JPEG vs. JPEG 2000 vs. JPEG XR vs. WebP vs. BPG:
[https://news.ycombinator.com/item?id=17587684](https://news.ycombinator.com/item?id=17587684)
; [https://bellard.org/bpg/](https://bellard.org/bpg/) ;
[http://xooyoozoo.github.io/yolo-octo-bugfixes/](http://xooyoozoo.github.io/yolo-octo-bugfixes/)

~~~
forgot-my-pw
Looks promising, but image formats are among the slowest-adopted standards on
the Internet. How's the encode/decode speed?

Isn't there another recent format, FLIF?

------
peterburkimsher
I want to use images to encode HTML or other arbitrary binary data, in a way
that I call Fondant.

[https://news.ycombinator.com/item?id=17461955](https://news.ycombinator.com/item?id=17461955)

I made a checkerboard pattern of 8x8 pixels, but I still lose data to colour
bleeding when re-encoding (e.g. upload to Facebook or Save to Camera Roll on
the iPhone).

Is there some way I can use the "lossless" entropy coding to avoid this
problem?

~~~
userbinator
You can perturb the input of the DCT so that a specific IDCT implementation
will generate the desired output. This technique applied to JPEG has been
known for a few years:

[https://www.virusbulletin.com/virusbulletin/2015/03/script-l...](https://www.virusbulletin.com/virusbulletin/2015/03/script-lossy-stream)

In general signal-processing terms, this is known as preemphasis/equalisation
--- since the communications channel will distort the signal in a known way,
by sending a signal distorted appropriately in the opposite direction, it will
arrive "undistorted" at the destination.

You can also consider using ECC codes, so that any small errors introduced can
be corrected.
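A toy Python illustration of that preemphasis idea: the "channel" below is a made-up stand-in that quantizes values in steps of 8 (loosely analogous to JPEG quantizing DCT coefficients), and the encoder transmits only levels that the channel maps to themselves, so the data arrives intact. All names here are hypothetical, not from any real codec.

```python
STEP = 8  # assumed quantization step of our toy lossy channel

def channel(x):
    # The known distortion: quantize to the nearest multiple of STEP.
    return round(x / STEP) * STEP

def encode_symbol(sym):
    # "Preemphasis": place each data symbol on a level that the channel
    # maps to itself, so the distortion leaves it unchanged.
    return sym * STEP

def decode_symbol(x):
    # Recover the symbol from the (possibly re-quantized) level.
    return round(x / STEP)
```

The real JPEG version of this is harder, since the DCT mixes each input pixel into all 64 coefficients of its block, but the principle — invert the known distortion ahead of time — is the same.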

~~~
peterburkimsher
Will that work for redistribution? I'd hope users could save and re-share the
photo many times and have it still be loadable.

~~~
anyfoo
To be truly resilient against any kind of (potentially unknown, even) lossy
image compression, you would need to add a lot of redundancy. QR Codes are
pretty much that, but their data needs to survive being “transmitted” through
a blurry photo with arbitrary lighting, rotation and perspective, so they
likely have a lot more redundancy than you realistically need. (EDIT: You
might hit the limit where you need QR Code levels of efficiency relatively
quickly, depending on the number of steps and how aggressive each
recompression is.)

If saving and sharing can also mean changing the image dimensions, you of
course need correspondingly more redundancy.

But yeah, that’s why we don’t choose lossy compression algorithms for binary
data. If every bit counts, there is no entropy you can lose (though a lossless
algorithm might find a more efficient representation, so to speak).
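As a concrete (if maximally inefficient) illustration of trading capacity for redundancy, here's a toy Python repetition code with majority-vote decoding; real schemes like the Reed-Solomon coding inside QR codes get far more correction per redundant bit, but the tradeoff is the same.

```python
def rep_encode(bits, r=3):
    # Send each bit r times (r odd, so majority vote is well-defined).
    return [b for b in bits for _ in range(r)]

def rep_decode(coded, r=3):
    # Majority vote over each group of r copies; corrects up to
    # (r - 1) // 2 flipped copies per group.
    out = []
    for i in range(0, len(coded), r):
        chunk = coded[i:i + r]
        out.append(1 if sum(chunk) * 2 > len(chunk) else 0)
    return out
```

With r=3 this triples the payload size to correct one flipped copy per group — which is exactly why, for bit-exact binary data, lossless compression plus targeted ECC beats trying to out-redundancy a lossy codec.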

~~~
peterburkimsher
QR codes have a maximum capacity of 7,089 numeric digits - not enough to
transmit a program or song. I'd be hoping to move about 5 MB.
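A quick back-of-the-envelope check of that gap in Python (taking 5 MB as 5 × 2^20 bytes; the 7,089-digit figure is numeric mode, and the same version-40 code at error-correction level L holds 2,953 bytes in byte mode):

```python
import math

QR_MAX_BYTES = 2953          # version 40, level L, byte mode
PAYLOAD = 5 * 1024 * 1024    # the ~5 MB target

codes_needed = math.ceil(PAYLOAD / QR_MAX_BYTES)  # -> 1776
```

So a 5 MB payload would take on the order of 1,800 maximum-size QR codes, before adding any redundancy across codes.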

~~~
snaily
Facebook, which was the use case you mentioned upthread, doesn't store 5MB
images. You'd have to design for way less than 5MB, I should imagine.

------
urvader
My thoughts exactly: "Just don't go sending JPEGs into space and expect aliens
to see them the same way."

------
kowdermeister
Instead of a wall of text, these Computerphile videos cover the topic pretty
well:

[https://www.youtube.com/watch?v=n_uNPbdenRs](https://www.youtube.com/watch?v=n_uNPbdenRs)

[https://www.youtube.com/watch?v=Q2aEzeMDHMA](https://www.youtube.com/watch?v=Q2aEzeMDHMA)

~~~
timbit42
I can read that "wall of text" in 2 minutes while you waste 20 minutes
watching those videos.

~~~
aw3c2
With 90% of the video being the image of some guy talking in a colloquial way
and wicked cool camera zooms for impact.

~~~
kowdermeister
And the rest are visualizations that do more than a text only article ever
could.

