Parallel streaming of progressive images (cloudflare.com)
59 points by soheilpro 7 days ago | 13 comments

> There are next-generation image formats that support progressive rendering better than JPEG, and compress better than WebP, but they're not supported in web browsers yet.

Is progressive rendering standard now? Do all of these next-generation formats support it, or just some of them?

Unfortunately, there's no format yet that's both significantly better, and somewhat standard.

There are PIK, FLIF and FUIF, all of which have very nice progressive loading, but they're totally experimental with no plan in sight to get them standardized.

The closest to becoming standard (or de-facto standard) is AVIF, an AV1-based sibling of HEIF. The compression there is great: roughly half the size of WebP, so regardless of whether it's progressive or not, it's going to load fast.

In the meantime there's still room to improve the rendering of progressive JPEG. The blockiness of the early progressive stages is mostly an artifact of the implementation in libjpeg. The smoothing could be much better, so the early scans look less crude.

> It's going to load fast

Minor nitpick: on kinda-average U.S. home internet, with a medium-low power laptop decoding in software, the actual time to load an image could be a wash because of decode time, even with the dramatically-improved software AV1 decoders.

With newer codecs the CPU becomes a factor indeed. This mostly makes JS/WASM polyfills for these formats unattractive.

However, using native decoders with proper SIMD optimizations shouldn't be too bad. dav1d achieves 100fps at full HD on SSE3-class CPUs and the Apple A11. Only mass-market Android phones are really worrying, so it may be possible to do content negotiation that is aware of the trade-off between device capability and network speed.
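To sketch what that negotiation could look like (a hypothetical helper, not anyone's production logic): serve AVIF only when the client advertises it in its Accept header and isn't flagged as decode-constrained, falling back to the cheaper-to-decode JPEG otherwise:

```python
def pick_image_format(accept_header: str, prefer_cheap_decode: bool = False) -> str:
    """Choose an image format from a request's Accept header.

    Hypothetical sketch: prefer AVIF when the client advertises it,
    unless the client is flagged as decode-constrained (e.g. a low-end
    phone), in which case the cheap-to-decode JPEG wins.
    """
    # "image/avif,image/webp,*/*;q=0.8" -> {"image/avif", "image/webp", "*/*"}
    accepted = {part.split(";")[0].strip() for part in accept_header.split(",")}
    if "image/avif" in accepted and not prefer_cheap_decode:
        return "avif"
    if "image/webp" in accepted and not prefer_cheap_decode:
        return "webp"
    return "jpeg"
```

The `prefer_cheap_decode` flag is an assumption standing in for whatever device signal is available (User-Agent heuristics, client hints, etc.).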

> dav1d achieves 100fps in full HD on SSE3 CPUs and A11

That's with multiple threads; decoding a single frame/image on a single thread, the actual frame latency is much worse than 1/100s on a typical SSE3 CPU. And if you decode each image with multiple threads, you'll run into throughput issues instead.

That's not to say there aren't times it's worth it, and hardware decoders will start to become a reasonable way to do this (especially on Linux/Chrome OS, where it should now take less time to wire things like this up thanks to v4l2 m2m devices), but regardless there will continue to be a crossover between decode-dominant and network-dominant image loading.

What's needed to be able to do this with our own nginx servers? The article is somewhat light on how it works on Cloudflare's servers.

We had to make major changes to nginx's HTTP/2 implementation. The original implementation is designed to pipe all data through as quickly as it can (read everything that is available and send it downstream immediately), which makes prioritization and coordination between multiple responses impossible.

We have a separate blog post coming on how we did it.

Cloudflare's blog posts are top notch.

The main reason not to use progressive jpegs is that they are much more CPU intensive to render.

Each pixel on the screen will be updated many times during the rendering process.

With bandwidth increasing faster than CPU speeds, their use started declining in the early 2000s, and pretty much no software uses progressive JPEGs by default now.

Images don't have to be updated many times if the data arrives fast enough. When re-rendering is throttled, you get progressive rendering only when the network is relatively slow, and don't spend extra CPU time when the data arrives quickly.

Even though progressive is more costly than baseline, both are relatively cheap. JPEG was designed when CPUs were clocked at 25 MHz.

Progressive JPEGs are significantly smaller (thanks to having a Huffman table per scan), and they're the default in all image optimization tools and services (MozJPEG, ImageOptim, Cloudflare's Polish, etc.).
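The two variants are distinguishable right in the byte stream: baseline JPEGs carry an SOF0 (0xFFC0) frame marker, progressive ones SOF2 (0xFFC2). A stdlib-only checker (a sketch that walks the marker segments and ignores some edge cases, e.g. streams with padding bytes between markers):

```python
def is_progressive_jpeg(data: bytes) -> bool:
    """Return True if a JPEG byte stream uses progressive DCT (SOF2).

    Walks marker segments from after the SOI marker: each segment is
    0xFF, a marker byte, then a big-endian length that includes the
    two length bytes themselves.
    """
    i = 2  # skip SOI (0xFFD8)
    while i + 4 <= len(data):
        if data[i] != 0xFF:
            return False  # not a well-formed marker stream
        marker = data[i + 1]
        if marker == 0xC0:  # SOF0: baseline DCT
            return False
        if marker == 0xC2:  # SOF2: progressive DCT
            return True
        length = int.from_bytes(data[i + 2:i + 4], "big")
        i += 2 + length
    return False
```

This is roughly what `file` and the optimization tools look at when they report a JPEG as progressive.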

For a long time progressive JPEGs had a slow path in libjpeg-turbo, but this has been improved recently, and there's still room for more optimization. For example, a DC-only preview could be decoded and rendered very cheaply (approximately 1/64th of the cost in CPU and memory).
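To make the 1/64 figure concrete (an illustrative sketch, not libjpeg code): the DC coefficient of each 8×8 block is proportional to the block's mean, so a DC-only pass is essentially one averaged sample per block, i.e. a 1/8 × 1/8 preview with 1/64 of the pixels:

```python
def dc_preview(pixels, w, h):
    """Cheap 1/8-scale preview: one value per 8x8 block.

    pixels is a flat list of w*h grayscale samples (w, h multiples of 8).
    The JPEG DC coefficient encodes the block mean, so a DC-only decode
    is essentially this per-block average at 1/64 the pixel count.
    """
    pw, ph = w // 8, h // 8
    out = []
    for by in range(ph):
        for bx in range(pw):
            s = 0
            for y in range(8):
                for x in range(8):
                    s += pixels[(by * 8 + y) * w + (bx * 8 + x)]
            out.append(s // 64)  # mean of the 8x8 block
    return out, pw, ph
```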

We're planning to contribute to libjpeg to make progressive rendering more optimized.

> When re-rendering is throttled, you get progressive rendering only when the network is relatively slow,

I suspect this is the issue... In a modern multi-process browser and OS, it's exceptionally hard to know if it's worth making a low-res render, or if there is more data sitting in kernel/application buffers somewhere which hasn't yet made it to the renderer due to CPU pressure. For example, if you have an event queue for un-gzipping and another for jpeg decompressing, you should never do any jpeg work unless the gzip queue is empty.

Every event queue in the system needs to correctly prioritize every step in the rendering process, otherwise work is wasted rendering a progressive jpeg more times than necessary.

Buffering actually works in your favor, because when the client reads the buffered data, it's more likely to get the whole file in one go.

Browsers already have delays and throttling for painting and relayout, because incremental rendering of HTML and DOM updates create lots of similar problems.

I don't know if browsers do it, but moving progressive image decoding to a very-low-priority thread would be an easy solution that solves 99% of the problem (nothing gets blocked, you get as much progressive rendering as you can spare CPU time).

It would be hard to be absolutely optimal about this, but my point is that in practice it's one or two extra renders, not the O(n^2) you'd expect from synchronous re-decoding on every byte received.
