
Parallel streaming of progressive images - soheilpro
https://blog.cloudflare.com/parallel-streaming-of-progressive-images/
======
ZeroGravitas
> There are next-generation image formats that support progressive rendering
> better than JPEG, and compress better than WebP, but they're not supported
> in web browsers yet.

Is this now standard? Do they all do it, or just some of them?

~~~
pornel
Unfortunately, there's no format yet that's both significantly better, and
somewhat standard.

There are PIK, FLIF and FUIF, all of which have very nice progressive loading,
but they're totally experimental with no plan in sight to get them
standardized.

The closest to becoming a standard (or de-facto standard) is AVIF, an
AV1-based sibling of HEIF. The compression there is great: half the size of
WebP, so regardless of whether it's progressive or not, it's going to load
fast.

In the meantime there's still room to improve rendering of progressive JPEG.
The blockiness of the early progressive stages is mostly an artifact of the
implementation in libjpeg. The smoothing could be much better, so the early
scans could look more elegant.

~~~
microcolonel
> _It's going to load fast_

Minor nitpick: on kinda-average U.S. home internet, with a medium-low-power
laptop decoding in software, the actual time to display an image could be a
wash, because decode time eats the transfer savings, even with the
dramatically improved software AV1 decoders.

~~~
pornel
With newer codecs the CPU becomes a factor indeed. This mostly makes JS/WASM
polyfills for these formats unattractive.

However, using native decoders with proper SIMD optimizations shouldn't be too
bad. dav1d achieves 100fps in full HD on SSE3 CPUs and A11. Only mass-market
Android phones are really worrying, so it may be possible to do content
negotiation that is aware of the trade-off between the device and the network
speed.
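
A minimal sketch of what that negotiation could look like, assuming the
browser advertises AVIF support in its Accept header and the server also
receives an ECT (effective connection type) client hint; decide_format and
its thresholds are made up for illustration, not anything Cloudflare has
described:

    #include <string.h>

    /* Hypothetical chooser: prefer AVIF only when the client claims support
     * AND the connection looks slow enough that the byte savings outweigh
     * the extra software decode cost on a weak CPU. */
    const char *decide_format(const char *accept, const char *ect)
    {
        int supports_avif = accept && strstr(accept, "image/avif") != NULL;
        int slow_network  = ect && (strcmp(ect, "slow-2g") == 0 ||
                                    strcmp(ect, "2g") == 0 ||
                                    strcmp(ect, "3g") == 0);
        if (supports_avif && slow_network)
            return "image/avif";      /* transfer-bound: smaller file wins */
        if (accept && strstr(accept, "image/webp") != NULL)
            return "image/webp";
        return "image/jpeg";          /* progressive JPEG as the safe default */
    }

On a fast connection the same logic falls through to WebP or progressive
JPEG, where decode cost rather than transfer time dominates.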

~~~
microcolonel
> _dav1d achieves 100fps in full HD on SSE3 CPUs and A11_

That's with multiple threads; the actual frame latency is much worse than
1/100 s on a typical SSE3 CPU if you're just decoding a single frame/image on
a single thread. And if you decode each image with multiple threads, you'll
run into issues with throughput.

That's not to say there aren't times it's worth it, and hardware decoders will
start to become a reasonable way to do this (especially on Linux/Chrome OS,
where it should now take less time to wire things like this up thanks to v4l2
m2m devices), but regardless there will continue to be a crossover between
decode-dominant and network-dominant image loading.
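
As a rough back-of-envelope sketch of that crossover (every number below is
an illustrative assumption, not a measurement):

    #include <stdio.h>

    int main(void)
    {
        /* Illustrative assumptions, not measurements. */
        double jpeg_bytes  = 300e3;      /* progressive JPEG of a hero image */
        double avif_bytes  = 150e3;      /* ~half the size, per the claim above */
        double link_Bps    = 25e6 / 8;   /* 25 Mbit/s link, in bytes per second */
        double jpeg_decode = 0.015;      /* seconds, single-threaded */
        double avif_decode = 0.080;      /* seconds, single-threaded software */

        double jpeg_total = jpeg_bytes / link_Bps + jpeg_decode;
        double avif_total = avif_bytes / link_Bps + avif_decode;
        printf("JPEG %.0f ms vs AVIF %.0f ms\n",
               jpeg_total * 1e3, avif_total * 1e3);
        /* Here the halved transfer saves ~48 ms but decoding costs ~65 ms more,
         * so AVIF comes out slightly behind; on a slower link it flips. */
        return 0;
    }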

------
pbowyer
What's needed to be able to do this with our own nginx servers? The article is
somewhat light on how it works on Cloudflare's servers.

~~~
jgrahamc
We have a separate blog post coming on how we did it.

~~~
tuananh
The quality of Cloudflare's blog posts is top notch

------
londons_explore
The main reason _not_ to use progressive JPEGs is that they are much more
CPU-intensive to render.

Each pixel on the screen will be updated many times during the rendering
process.

With bandwidth increasing faster than CPU speeds, their use started declining
in the early 2000s, and pretty much no software uses progressive JPEGs by
default now.

~~~
pornel
Images _don't have to_ be updated many times if the data arrives fast enough.
When re-rendering is throttled, you get progressive rendering only when the
network is relatively slow, and don't spend extra CPU time when the data
arrives quickly.
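
This is essentially what libjpeg's buffered-image mode allows: absorb
whatever input has already arrived, then run one output pass over the newest
scan, so a fast network collapses to a single render. A trimmed sketch (error
handling and the return-value checks needed for a suspending data source are
omitted; rows is assumed to hold one row pointer per scanline):

    #include <stdio.h>
    #include <jpeglib.h>

    /* cinfo is assumed to already have an error handler and data source set. */
    void render_progressive(struct jpeg_decompress_struct *cinfo, JSAMPARRAY rows)
    {
        jpeg_read_header(cinfo, TRUE);
        cinfo->buffered_image = TRUE;        /* enable multi-pass decoding */
        jpeg_start_decompress(cinfo);

        while (!jpeg_input_complete(cinfo)) {
            int status;
            do {                             /* absorb everything already buffered */
                status = jpeg_consume_input(cinfo);
            } while (status != JPEG_SUSPENDED && status != JPEG_REACHED_EOI);

            /* One output pass over the newest scan, not one pass per scan. */
            jpeg_start_output(cinfo, cinfo->input_scan_number);
            while (cinfo->output_scanline < cinfo->output_height)
                jpeg_read_scanlines(cinfo, rows + cinfo->output_scanline, 1);
            jpeg_finish_output(cinfo);
            /* paint rows to the screen here, then wait for more data */
        }
        jpeg_finish_decompress(cinfo);
    }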

Even though progressive is more costly than baseline, they're both relatively
cheap. JPEG was designed when CPUs were clocked at 25 MHz.

Progressive JPEGs are significantly smaller (thanks to having a Huffman table
per scan), and they're the default in all image optimization tools and
services (MozJPEG, ImageOptim, Cloudflare's Polish, etc.).

For a long time progressive JPEGs had a slow path in libjpeg-turbo, but this
has been improved recently, and there's still room for more optimization. For
example, a DC-only preview could be decoded and rendered very cheaply
(roughly 1/64th of the cost in CPU and memory, since a single DC coefficient
stands in for each 8x8 block of pixels).
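
For illustration, here's roughly what that looks like with libjpeg's scaled
decoding, which at 1/8 scale reconstructs one pixel per 8x8 block from the DC
coefficient alone. This sketch decodes from a complete buffer; in the
streaming case the same preview would come from just the first (DC) scan of a
progressive file:

    #include <stdio.h>
    #include <jpeglib.h>

    /* Sketch: decode a 1/8-scale preview. preview must hold one row pointer
     * per output scanline (ceil(height / 8) rows). */
    void decode_dc_preview(unsigned char *data, unsigned long len,
                           JSAMPARRAY preview)
    {
        struct jpeg_decompress_struct cinfo;
        struct jpeg_error_mgr jerr;

        cinfo.err = jpeg_std_error(&jerr);
        jpeg_create_decompress(&cinfo);
        jpeg_mem_src(&cinfo, data, len);
        jpeg_read_header(&cinfo, TRUE);

        cinfo.scale_num = 1;                 /* ask for 1/8-scale output */
        cinfo.scale_denom = 8;
        jpeg_start_decompress(&cinfo);       /* output_width/height are now tiny */

        while (cinfo.output_scanline < cinfo.output_height)
            jpeg_read_scanlines(&cinfo, preview + cinfo.output_scanline, 1);

        jpeg_finish_decompress(&cinfo);
        jpeg_destroy_decompress(&cinfo);
    }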

We're planning to contribute to libjpeg to make progressive rendering more
optimized.

~~~
londons_explore
> When re-rendering is throttled, you get progressive rendering only when the
> network is relatively slow,

I suspect this is the issue... In a modern multi-process browser and OS, it's
exceptionally hard to know if it's worth making a low-res render, or if there
is more data sitting in kernel/application buffers somewhere which hasn't yet
made it to the renderer due to CPU pressure. For example, if you have an event
queue for un-gzipping and another for jpeg decompressing, you should never do
any jpeg work unless the gzip queue is empty.

Every event queue in the system needs to correctly prioritize every step in
the rendering process, otherwise work is wasted rendering a progressive jpeg
more times than necessary.

~~~
pornel
Buffering actually works in your favor, because when the client reads the
buffered data, it's more likely to get the whole file in one go.

Browsers already have delays and throttling for painting and relayout,
because incremental HTML rendering and DOM updates create lots of similar
problems.

I don't know if browsers do it, but moving progressive image decoding to a
very-low-priority thread would be an easy solution that solves 99% of the
problem (nothing gets blocked, you get as much progressive rendering as you
can spare CPU time for).
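
A sketch of that idea (the SCHED_IDLE policy is Linux-specific, and
decode_and_paint is a hypothetical stand-in for re-running the progressive
decoder on the bytes received so far):

    #define _GNU_SOURCE              /* for SCHED_IDLE (Linux-specific) */
    #include <pthread.h>
    #include <sched.h>
    #include <stdatomic.h>
    #include <unistd.h>

    /* The network side only publishes how many bytes have arrived; this thread
     * redecodes the newest prefix whenever it happens to get CPU time, so many
     * small reads coalesce into one extra render. */
    static atomic_size_t bytes_available;
    static atomic_bool   stream_done;

    extern void decode_and_paint(size_t prefix_len);  /* hypothetical renderer */

    static void *progressive_decoder(void *arg)
    {
        (void)arg;

        /* Run only when nothing else wants the CPU. */
        struct sched_param sp = { .sched_priority = 0 };
        pthread_setschedparam(pthread_self(), SCHED_IDLE, &sp);

        size_t rendered = 0;
        for (;;) {
            size_t avail = atomic_load(&bytes_available);
            if (avail > rendered) {
                decode_and_paint(avail);    /* one pass over the newest prefix */
                rendered = avail;
            } else if (atomic_load(&stream_done) &&
                       atomic_load(&bytes_available) == rendered) {
                break;                      /* stream over, final render done */
            } else {
                usleep(10 * 1000);          /* nothing new yet; back off */
            }
        }
        return NULL;
    }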

It would be hard to be absolutely optimal about this, but my point is that in
practice it's one or two extra renders, not the O(n^2) you'd expect from
synchronous redecoding on every byte received.

