
JPEG Committee releases a call for evidence for image compression based on AI - jonsneyers
https://jpeg.org/items/20200217_press.html
======
ksec
_JPEG XL

The JPEG XL Image Coding System (ISO/IEC 18181) has produced an open source
reference implementation available on the JPEG XL Gitlab repository. The
software is available under Apache 2, which includes a royalty-free patent
grant. Speed tests indicate the multithreaded encoder and decoder outperforms
libjpeg-turbo. Independent subjective and objective evaluation experiments
have indicated competitive performance with commonly used image coding
solutions while offering new functionalities such as lossless transcoding from
legacy JPEG format to JPEG XL. The standardization process has reached the
Draft International Standard stage._

That to me is the biggest news. I remember reading a presentation where it had
even better quality at medium bitrate than AVIF. Since JPEG does not belong to
Google or Apple, hopefully it would be a format easier to swallow from both
parties.

~~~
wmf
Half the technology in JPEG XL is from Google but hopefully people won't get
petty about that.

A bigger problem is that various companies just did a lot of work to adopt
HEIF so they're not going to be eager to churn onto JPEG XL. Generally people
wait 5-10 years before they're open to upgrading.

~~~
ksec
>Half the technology in JPEG XL is from Google

Yes but it is submitted as an open standard to an Organisation that doesn't
run or belong to Google.

Is HEIF used anywhere other than on Apple devices?

~~~
edw
Canon, Android.

~~~
zamadatix
Windows 10 has support as well.

~~~
ksec
Neither Android nor Windows supports HEIF by default.

------
Ajedi32
I find the idea of AI-based image compression rather fascinating. In theory it
seems like a great fit for AI; "perceived quality" is exactly the sort of
fuzzy metric AI tends to be good at optimizing for, and a compression
algorithm possessing an understanding of high-level visual features like
shapes, lighting, etc seems like it would have a lot of potential.

Taken to the extreme, a sufficiently advanced AI compression system could
theoretically end up storing representations of high-level features like faces
in the compressed data; which could lead to some really _bizarre_ looking
compression artifacts when the system breaks down.

~~~
derefr
> Taken to the extreme, a sufficiently advanced AI compression system could
> theoretically end up storing representations of high-level features like
> faces in the compressed data

Or, you could have a compression system created specifically for photographic
images (as opposed to a “general compressor” like LZ77), where these high-
level features are part of the (de)compressor itself.

Now, _lossy_ domain-specific compressors are pretty common; one good example
is audio “voice codecs” for telecom, in which every snippet of audio is
encoded as a phoneme-like symbol, such that noise comes out the other end as a
gobbledygook that sounds a lot like words.

But much more interesting to me is a _lossless_ domain-specific compressor,
which I’m not aware of any examples of. (Please tell me about some if you
know of any!) In such a compressor, the input would be “rebased” against the
model, such that the output looks like an instruction stream ala “start with
[this synthesized template with these parameters]; and then add point-
modifications {A, B, C, ...} to get from there back to the original input.” So
inputs that were more like something it “understood” would compress well (i.e.
just be described by a template) while inputs that were less-well-understood
wouldn’t get much from the procedural-generation phase, and would end up being
specified entirely as point-corrections, and so not really compress at all.

~~~
sdenton4
It's pretty easy to build a lossless learned encoder. Encode the image with
your model, then apply a lossless encoding to the error. You can also include
a measure of the entropy of the error in your model's loss function.

The model should make the errors small, and including the entropy measure of
the error in the loss function means that the model produces "cheaper to
correct" errors.

A nice keyword on this front is "bits back argument".
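
A minimal sketch of the "model plus losslessly coded error" idea, under loud assumptions: the "model" here is a trivial left-neighbour predictor standing in for a learned network, and `zlib` stands in for a proper entropy coder. The names `predict`, `encode` and `decode` are hypothetical, chosen only for illustration.

```python
import zlib

def predict(prev_pixel: int) -> int:
    """Stand-in 'model' (an assumption): guess each pixel equals its left neighbour."""
    return prev_pixel

def encode(pixels: bytes) -> bytes:
    """Store only the prediction error, then compress that error losslessly."""
    residual = bytearray()
    prev = 0
    for p in pixels:
        residual.append((p - predict(prev)) % 256)  # error wrapped into one byte
        prev = p
    return zlib.compress(bytes(residual), 9)

def decode(blob: bytes) -> bytes:
    """Re-run the same predictor and add the stored error back."""
    out = bytearray()
    prev = 0
    for r in zlib.decompress(blob):
        p = (predict(prev) + r) % 256
        out.append(p)
        prev = p
    return bytes(out)

# A smooth gradient row: the predictor is nearly right everywhere,
# so the residual is mostly zeros and cheap to store.
row = bytes((i // 4) % 256 for i in range(4096))
blob = encode(row)
assert decode(blob) == row  # exact reconstruction
```

The better the predictor, the more the residual concentrates around zero and the less the lossless stage has to store. A learned model would replace `predict`, but the round trip stays exact as long as encoder and decoder run the same model deterministically.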

~~~
barrkel
That's a nice theory, but what if your ML encoder uses textures to encode
perceptual equivalence efficiently that's actually got very high errors
measured by pixel? It seems to me the best ML-driven encoders would work by
inferring higher abstraction details from the image in ways which are hard to
get back to lossless - that there's an intrinsic tension there.

~~~
fwip
Then choose a different encoder, or add another layer to shift/correct the
texture.

------
leeoniya
sure, what can go wrong?

[https://www.zdnet.com/article/xerox-scanners-alter-numbers-i...](https://www.zdnet.com/article/xerox-scanners-alter-numbers-in-scanned-documents/)

~~~
basilgohar
That still baffles me to this day.

\--Edit-- I think there are differences between the case of using this for
compression and how Xerox implemented it (which they _claim_ was due to
compression).

It can be argued that most advanced codecs of today already handle compression
via very complex and advanced heuristics already comparable to what a lot of
people would confuse for AI and/or ML today, so I think it's not as spooky as
it may sound at first.

The Xerox implementation actually changed values on the page from one to
another innocuously. I would hope that the AI being talked about is not "This
6 looks more like it should be an 8, so I'm going to put one there upon
decode" and more like, "this block or unit of the image has these
characteristics, and these methods work very well to reproduce these
characteristics when next to these other blocks..." etc.

But that's the best way I can phrase it.

------
SrslyJosh
> Fake news, copyright violations, media forensics, privacy and security are
> emerging challenges in digital media. JPEG has determined that blockchain
> and distributed ledger technologies (DLT) have great potential as a
> technology component to address these challenges in transparent and
> trustable media transactions.

You've gotta be kidding me.

------
jonsneyers
Highlights:

\- Call for Evidence launched for AI-based image codecs

\- Call for Evidence launched for point cloud codecs

\- Next-gen image codec JPEG XL reaches Draft International Standard stage

\- Light-weight JPEG XS codec is looking at compression of camera raws

------
carapace
See also "Hutter Prize"
[http://www.hutter1.net/prize/index.htm](http://www.hutter1.net/prize/index.htm)
[https://en.wikipedia.org/wiki/Hutter_Prize](https://en.wikipedia.org/wiki/Hutter_Prize)

------
RcouF1uZ4gsC
I think we already have the tools to do this, but there might be issues with
fidelity.

First: Use AI to generate the text description of an image.
[http://homepages.inf.ed.ac.uk/keller/publications/jair16.pdf](http://homepages.inf.ed.ac.uk/keller/publications/jair16.pdf)
[https://www.captionbot.ai/](https://www.captionbot.ai/)

Second: Use AI to generate an image from the text description.
[https://news.developer.nvidia.com/ai-model-can-generate-imag...](https://news.developer.nvidia.com/ai-model-can-generate-images-from-natural-language-descriptions/)

~~~
mehh
What do you mean by "use AI"? Do you mean a neural network or something else?
I find the use of the broad term AI a bit weird; there is lots of stuff that
can fit under the AI category.

------
rcarmo
I was instantly reminded of this “pineapple”:

[https://twitter.com/cassmarketos/status/1229473344480673792?...](https://twitter.com/cassmarketos/status/1229473344480673792?s=21)

Of course, an actual working compression algorithm is quite likely. I recall
that recently someone was upscaling and colorizing ancient movie reels to
modern standards, and there is plenty of low-hanging fruit in fine-tuning
parameters to suit specific images, so I actually look forward to this.

I just hope they don’t name it JPEG2020, JPEG2000 was a bit of a dud.

~~~
BubRoss
Jpeg2000 was a dud because of licensing and patents, not having a year in the
name.

~~~
userbinator
...and performance (or lack thereof) --- AFAIK the baseline J2K was always
implementable freely (similar to the situation with JPEG; arithmetic
compression was specified but largely unimplemented due to patents, while
baseline JPEG was supposed to be free), but the abysmal performance relative
to JPEG for a minor increase in compression efficiency, along with the
surprising amount of complexity in the spec, made it a poor choice.

There are lots of scanned ebooks on archive.org that use J2K compression, and
the sluggishness when turning pages is extremely noticeable compared to others
which use normal JPEG.

------
rhacker
I know this is more relevant for movies, but it would be rather cool to have a
movie compressed for a year of computation, and deliverable in about 100MB,
then let it decompress for a few hours on your computer back to its original
9GB size.

~~~
0-_-0
How big is the market for that vs streaming 9 GB on demand?

------
PaulHoule
I wouldn't be surprised if you could make an AI that knows something about
what scenes look like, and something about what JPEG artifacts look like, and
that could clean up existing JPEG images.

~~~
derefr
The JPEG artifact “look” comes from the fact that JPEG data is essentially
just a matrix of point-samples (like any other raster format) which happen to
carry a gradient and angle, rather than a solid color. So, at a low sampling
resolution, you see weird discontinuities where one sample gradient (gradel?)
sharply transitions to another sample gradient, with a discontinuity in either
color or angle.

An AI that “fixes” JPEG artifacts would just be a regular
upscaling/supersampling AI—just one trained on JPEG sample-matrices instead of
BMP sample-matrices, so that it can take advantage of the extra information
each JPEG sample-point carries.

~~~
skunkworker
My understanding of lossy compression like JPEG may be limited, but wouldn't
this ML network be guessing the eigenvectors to put back into a matrix that
were removed because they weren't dominant enough?

~~~
gugagore
Unless I'm missing a connection you've made, no.

You might have been exposed to the singular-value decomposition to compress a
matrix, and representing an image as a matrix e.g. [1].

JPEG encoding is about transforming the signal into a frequency domain (think
FFTs) where it is less detrimental to throw away information.

If there is a connection to what you're saying, I expect you should try to
make some statement about the eigenvalues of the DCT matrix [2]

[1]
[http://www.math.utah.edu/~goller/F15_M2270/BradyMathews_SVDI...](http://www.math.utah.edu/~goller/F15_M2270/BradyMathews_SVDImage.pdf)

[2] [https://math.stackexchange.com/questions/962533/why-does-the...](https://math.stackexchange.com/questions/962533/why-does-the-discrete-cosine-transform-as-matrix-multiplication-work-this-way)

~~~
cvwright
I interpreted the GP as saying that the AI could try to infer what the DCT
coefficients had been prior to quantization.

I don't know enough about JPEG or about AI to know whether that would be
doable.
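
To make the two ideas in this subthread concrete, here is a rough sketch of a frequency transform followed by quantization. It is a simplification: a 1-D, 8-sample orthonormal DCT with a single uniform step size, whereas real JPEG uses 2-D 8x8 blocks and a per-frequency quantization table. The sample values and the step of 16 are arbitrary illustrations.

```python
import math

N = 8  # block length; JPEG works on 8x8 blocks, this sketch is 1-D only

def _scale(k):
    return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

def dct(x):
    """Orthonormal DCT-II: samples -> frequency coefficients."""
    return [
        _scale(k) * sum(x[n] * math.cos(math.pi * (n + 0.5) * k / N) for n in range(N))
        for k in range(N)
    ]

def idct(X):
    """Inverse of dct(): frequency coefficients -> samples."""
    return [
        sum(_scale(k) * X[k] * math.cos(math.pi * (n + 0.5) * k / N) for k in range(N))
        for n in range(N)
    ]

def quantize(X, step):
    """The lossy step: keep only the nearest multiple of `step`."""
    return [round(c / step) for c in X]

def dequantize(Q, step):
    return [q * step for q in Q]

samples = [52, 55, 61, 66, 70, 61, 64, 73]   # arbitrary illustrative values
coeffs = dct(samples)
stored = quantize(coeffs, step=16)           # what a file would carry
restored = idct(dequantize(stored, step=16)) # close to, but not equal to, samples
```

After dequantization each coefficient is only known to lie within half a step of the stored value. "Inferring the coefficients prior to quantization" would then mean picking a more plausible value inside that interval than its midpoint, which is where a learned prior could plausibly help.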

------
keenmaster
Does anyone have an estimate of how small an AI-compressed image will be
compared to an image compressed using traditional compression algorithms? On
average of course, and controlling for image quality.

~~~
0-_-0
As an example from 29 Feb 2016, for lossless compression you can go from 8 to
3 bits per color according to Table 5 here:
[https://arxiv.org/abs/1601.06759](https://arxiv.org/abs/1601.06759)

------
jiggawatts
I've been in the C#, Java, and Rust world for so long that every time I come
across C/C++ library source it _cracks me up_ that an image compressor has to
have functions to redefine basic concepts like memory allocation, assertions,
and threading.

Reminds me of a "lightweight" zip compression library that I recently used
that was literally 98% overhead and 2% actual compression code. It had its own
definitions of _everything_ : integers, integer properties, memory allocation,
streams (as in sequences of memory blocks, not even real I/O!), error
handling, progress notifications, etc... Pages and pages of conditional macros
and compiler-specific hacks. Just to call a single function that took an
array and returned an array. Ridiculous.

[https://gitlab.com/wg1/jpeg-xl/-/blob/master/include/jpegxl/...](https://gitlab.com/wg1/jpeg-xl/-/blob/master/include/jpegxl/memory_manager.h)

[https://gitlab.com/wg1/jpeg-xl/-/blob/master/include/jpegxl/...](https://gitlab.com/wg1/jpeg-xl/-/blob/master/include/jpegxl/parallel_runner.h)

[https://gitlab.com/wg1/jpeg-xl/-/blob/master/jxl/base/compil...](https://gitlab.com/wg1/jpeg-xl/-/blob/master/jxl/base/compiler_specific.h)

[https://gitlab.com/wg1/jpeg-xl/-/blob/master/jxl/base/os_spe...](https://gitlab.com/wg1/jpeg-xl/-/blob/master/jxl/base/os_specific.h)

[https://gitlab.com/wg1/jpeg-xl/-/blob/master/jxl/base/arch_s...](https://gitlab.com/wg1/jpeg-xl/-/blob/master/jxl/base/arch_specific.h)

That last one is trying to work out the NUMA topology in a portable way...
good luck with that! That's not something I expected from a JPEG library. Like
memory management, they'd actually be better off requesting thread-management
function pointers from the hosting application. Otherwise, this would not be a
"good citizen" in constrained environments such as embedded in a web server or
database engine.

Random examples of silliness from just this one file:

They ignore the first 2 cores on 3+ core systems:
[https://gitlab.com/wg1/jpeg-xl/-/blob/master/jxl/base/os_spe...](https://gitlab.com/wg1/jpeg-xl/-/blob/master/jxl/base/os_specific.cc#L266)

They'll be limited to a maximum of 64 threads (_half_ the threads on the
larger single-socket AMD EPYC 2 CPUs!):
[https://gitlab.com/wg1/jpeg-xl/-/blob/master/jxl/base/os_spe...](https://gitlab.com/wg1/jpeg-xl/-/blob/master/jxl/base/os_specific.cc#L211)
because they're not using the Processor Group Aware API:
[https://docs.microsoft.com/en-us/windows/win32/procthread/pr...](https://docs.microsoft.com/en-us/windows/win32/procthread/processor-groups)

Sure, 64-core monsters are still somewhat rare _today_, but 16-core/32-thread
desktop CPUs are now affordable and the TSMC 5nm process will likely result
in 96-128 core monstrosities _this year_. That's 384-512 threads for a
dual-socket server or workstation! If you're making an image codec library "for the
next decade", a little future-proofing might go a long way. Look at it this
way: Image decoding throughput is one of the main performance issues I have
right now with Lightroom and I work with "only" 36 Mpx images. There are 80
Mpx cameras just around the corner...

For example, there was a blog article a couple of years ago basically saying
that _the_ major Amdahl's law limitation to decoding formats like JPEG is the
Huffman bitstream decoding, which is Step #1 in the process. The author
pointed out that simply _restarting_ the Huffman bitstream a few times for a
typical PNG or JPEG file would increase the size less than 1%, but would allow
truly parallel decoding for every step.

For all the apparent effort to make JPEG-XL parallelised by default, it seems
odd that there doesn't appear to be any mention of parallelising the ANS
bitstream decoding...

~~~
janwas
JXL dev here. Thanks for raising these points. Indeed regrettable we have to
have a bunch of boilerplate.

FYI we actually do support hooks for malloc (JpegxlMemoryManagerStruct) and
threading (JpegxlParallelRunner) precisely in order to be a good citizen :)

The topology is more about #cores/sockets, used by the benchmark_xl app to
decide how many threads to spawn.

I agree adding Processor Groups makes sense, but Windows hasn't been our focus
yet.

FYI it is indeed critical to parallelize ANS streams. Each 256x256 "group" can
be decoded in parallel.
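
A rough illustration of why independent per-group streams allow parallel decoding. This is not the JPEG XL bitstream or API: `zlib` stands in for ANS, the image is 8-bit grayscale, and `split_groups`, `encode_groups` and `decode_groups` are hypothetical names. Only the 256x256 group size comes from the comment above.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

GROUP = 256  # group edge length in pixels

def split_groups(width, height, group=GROUP):
    """Yield (x, y, w, h) rectangles tiling the image; edge tiles may be smaller."""
    for y in range(0, height, group):
        for x in range(0, width, group):
            yield x, y, min(group, width - x), min(group, height - y)

def encode_groups(image, width, height):
    """Compress each group as its own self-contained stream (zlib as a stand-in)."""
    streams = []
    for x, y, w, h in split_groups(width, height):
        rows = []
        for r in range(h):
            start = (y + r) * width + x
            rows.append(image[start:start + w])
        streams.append(zlib.compress(b"".join(rows)))
    return streams

def decode_groups(streams, width, height):
    """Decode every stream independently, then paste the tiles back in place."""
    rects = list(split_groups(width, height))
    with ThreadPoolExecutor() as pool:
        tiles = list(pool.map(zlib.decompress, streams))  # no stream needs another
    out = bytearray(width * height)
    for (x, y, w, h), tile in zip(rects, tiles):
        for r in range(h):
            start = (y + r) * width + x
            out[start:start + w] = tile[r * w:(r + 1) * w]
    return bytes(out)
```

Because no group's entropy-coded data depends on another's, the decode step is a plain parallel map. The cost is that each stream restarts its statistics, which is the size-versus-parallelism trade-off raised earlier in the thread.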

~~~
jiggawatts
> Indeed regrettable we have to have a bunch of boilerplate.

Just to clarify: I'm not criticising the JPEG XL code, it seems to be quite
clean at first glance! I was merely making a general observation that would
apply to practically all portable C-interface libraries, such as ZStandard,
libpng, etc...

PS: I wonder if it would be possible to make a "portable library header" that
all such similar libraries could share to avoid the rework on both ends
(library authors and library consumers)...

> Windows hasn't been our focus yet.

That's a real worry. Not just because of the word "Windows", but what else it
implies.

To provide some constructive feedback: JPEG XL will be yet another failure
thrown on the trash heap of similarly well-intentioned formats. It'll die
along with JPEG 2000, JPEG XR, and all of the others. Don't be so arrogant as
to think that somehow _your_ format will be special and adopted by the masses.
The others were _just_ as technically advanced for their time and they all
failed.

Please, please, _please_ learn this lesson, because this history of failure
saddens me. The lack of a capable image format compatible with all client
platforms is what stops me as a photographer from sharing my images in full
quality. In 2020, still, it is effectively impossible to share images in any
other format than 8-bit sRGB JPEG across platforms. You can't _email_ a HDR,
10-bit, or wide gamut picture to anyone and expect them to be able to open it
and have it look even vaguely right. You can't post such an image on _any_
website you don't control yourself. Even if you control the website, it will
_look wrong_ for 99.9% of users OR they'll just see the 8-bit sRGB version
anyway, _even if_ they have a 10-bit HDR display.

It is literally only the walled garden of the Apple ecosystem that gets this
right. Microsoft and Google most certainly don't. As an iPhone user, I can
share a wide-gamut HEIF picture and 100% of the time it will look correct for
other iPhone users. However, Apple "cheated" a little bit. When you send one
of their HEIFs to any non-Apple-controlled app, it's silently converted to...
8-bit sRGB JPEG. It's just sad.

What everyone gets wrong with new image formats is this: They think that the
problem that needs solving is the encoder/decoder. That's step one of many
problems that _users need solved_. Image format developers assume their format
will be "adopted", but there's no _network effect_. Why would anyone use a
format they can't email, can't open on Windows 7, and can't edit in Photoshop?

If you stop with the JPEG XL library, with Windows _compilation_ an
afterthought, I absolutely guarantee that you will fail to gain traction and
the entire effort will be essentially wasted.

To get actual adoption, you would need to:

* Most importantly, develop an image decoder plugin for Windows.

* Help Firefox, Safari, and other less common browsers merge support.

* Ensure that Chromium gets support, but I assume that's a given.

* Send pull requests to add JPEG XL support for all major open source libraries that perform image processing, such as WordPress plugins and server-side image processing utilities for websites.

* Develop a plugin for all major image editing tools such as Photoshop.

* Work with Adobe on Lightroom _export_ support.

* Contact Sony, Nikon, and Canon and _help_ them with in-body encoding support.

But even that's not enough. If you want 10-bit and the wide gamut (let alone
HDR), you have to take steps to _force_ support, or it won't magically
materialise. You'll end up with _two_ JPEG XL formats: The 8-bit SDR sRGB
variety with 99.9999% adoption and the "fancy" JPEG XL that nobody uses and
_looks wrong if they try_.

IMHO the only way to do this is to forcibly _break_ applications that aren't
processing the ICC profiles properly. Much like Google's Chrome experiment
with randomising some TLS header fields to occasionally _break_ middle boxes
written by idiots that think version numbers are checked with equality, not
inequality operators.

You'd have to do something drastic, like leaving the image planes _unnamed_
and _randomised_ so that literally the only way to decode the image correctly
is via a compliant colour space transformation.

It sounds drastic, but other people have tried the easy way and failed. As I
said: It's 2020!!! Email me a non-sRGB image that I can view in my mail
client...

~~~
janwas
> I'm not criticising the JPEG XL code, it seems to be quite clean at first
> glance!

Thanks, I understood your point about each project including its own
definitions. At one point there was
[http://www.bookofhook.com/poshlib/](http://www.bookofhook.com/poshlib/),
but I'm not aware of anything similar that's maintained, complete enough but
still small.

> That's a real worry. Not just because of the word "Windows", but what else
> it implies.

We have put a great deal of thought into an adoption plan. "This time is
different" because we actually provide value to existing clients/servers with
JPEG, rather than giving them yet another bitstream to store, or lossy
transcoding that just adds more artifacts. The lossless JPEG transcoding is a
game-changer, as is the feasibility nowadays of decoding via WebAssembly -
even with SIMD for speed.

> If you stop with the JPEG XL library, with Windows compilation an
> afterthought

Good news, we are absolutely not stopping there. The project started in 2015.
We're now moving out of the research phase and into productionizing, with
integrations and plugins underway. Investing in such things too early would
have meant less R&D.

> You'd have to do something drastic, like leaving the image planes unnamed
> and randomised

Interesting idea, thanks for this suggestion.

------
bloak
Apparently they're "responsible for the popular JPEG, JPEG 2000, JPEG XR,
JPSearch, JPEG XT and more recently, the JPEG XS, JPEG Systems, JPEG Pleno and
JPEG XL families of imaging standards".

Some of those may be more popular than others:

[https://xkcd.com/2254/](https://xkcd.com/2254/)

~~~
jonsneyers
JPEG 2000 is actually quite popular, for example any movie you see in a cinema
is encoded using JPEG 2000 – they don't use video codecs for digital cinema,
but encode each frame separately, to avoid video compression artifacts. Also
in medical imaging, nearly everything is using JPEG 2000. Apple products have
supported it for a long time.

JPEG XR is also known as "Windows Media Photo" and "HD Photo" and has been
supported in the Microsoft ecosystem since Windows Vista and Internet Explorer
9.

JPEG XS, XL and Pleno are of course much newer and will still have to be
battle-tested.

But yes, none of the newer JPEG standards have reached the popularity of the
first JPEG codec, which is by far the most popular image format ever.

~~~
userbinator
_for example any movie you see in a cinema is encoded using JPEG 2000 – they
don't use video codecs for digital cinema, but encode each frame separately,
to avoid video compression artifacts._

They also use _very expensive_ dedicated hardware to decode Motion JPEG2000 at
movie framerate, because it's far more computationally intensive than regular
JPEG or even the newer video codecs like H264/H265.

~~~
codinghorror
Even on today’s hardware? That seems unlikely to me?

~~~
janwas
IIRC J2K is about 40 MPixels/s per core. 4K60p is 498 MPixel/s :)
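
The arithmetic behind those figures can be checked in a few lines. This assumes "4K" means 3840x2160 (UHD) and takes the recalled 40 MPixels/s per core at face value; the helper names are made up for the example.

```python
import math

def pixels_per_second(width, height, fps):
    """Raw pixel throughput a decoder must sustain."""
    return width * height * fps

def cores_needed(rate, per_core_rate):
    """Whole cores required, assuming perfect scaling across cores."""
    return math.ceil(rate / per_core_rate)

rate = pixels_per_second(3840, 2160, 60)   # 497,664,000 pixels/s, i.e. ~498 MPixel/s
cores = cores_needed(rate, 40e6)           # 13 cores at 40 MPixels/s each
```

So even with ideal scaling, software decoding at that per-core rate would need about 13 cores for 4K60p.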

------
The_rationalist
I wonder if this is a reaction to the issue I opened :)

[https://github.com/google/brunsli/issues/60](https://github.com/google/brunsli/issues/60)

