Is this now standard, do they all do it or just some of them?
There are PIK, FLIF and FUIF, all of which have very nice progressive loading, but they're totally experimental with no plan in sight to get them standardized.
The closest to becoming a standard (or de-facto standard) is AVIF, an AV1-based sibling of HEIF. The compression there is great: about half the size of WebP, so regardless of whether it's progressive or not, it's going to load fast.
In the meantime there's still room to improve rendering of progressive JPEG. The blockiness of early progressive stages is mostly an artifact of implementation in libjpeg. The smoothing could be much better, so it can look more elegant.
Minor nitpick: on kinda-average U.S. home internet, with a medium-low power laptop decoding in software, the actual time to load an image could be a wash because of decode time, even with the dramatically-improved software AV1 decoders.
However, using native decoders with proper SIMD optimizations shouldn't be too bad. dav1d achieves 100fps in full HD on SSE3 CPUs and on Apple's A11. Only mass-market Android phones are really worrying, so it may be possible to do content negotiation that is aware of the trade-off between device capability and network speed.
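A minimal sketch of what that content negotiation could look like on the server. The `Accept` header is standard; `ECT` (effective connection type) and `Save-Data` are real Client Hint headers, but browser support varies, and the thresholds here are illustrative assumptions, not a recommendation:

```python
def pick_image_format(headers):
    """Choose an image MIME type from request headers.

    Sketch: serve AVIF only when the client advertises support AND
    bytes are the bottleneck (slow network or explicit data saving);
    on fast networks, decode time may dominate, so a cheaper-to-decode
    format is a safer default.
    """
    accept = headers.get("Accept", "")
    slow_network = headers.get("ECT", "4g") in ("slow-2g", "2g", "3g")
    save_data = headers.get("Save-Data", "") == "on"

    if "image/avif" in accept and (slow_network or save_data):
        # Small files matter most here; AVIF's size win pays for its decode cost.
        return "image/avif"
    if "image/webp" in accept:
        return "image/webp"
    # Progressive JPEG as the universally supported fallback.
    return "image/jpeg"
```

The same decision could equally live on the client (picking a `<source>` via JS), but the header-based version keeps markup simple.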
That's with multiple threads; the actual frame latency is much worse than 1/100s on a typical SSE3 CPU if you're just decoding a single frame/image on a single thread. And if you decode each image with multiple threads, you run into throughput issues instead.
That's not to say there aren't times it's worth it, and hardware decoders will start to become a reasonable way to do this (especially on Linux/Chrome OS, where it should now take less time to wire things like this up thanks to v4l2 m2m devices). But regardless, there will continue to be a crossover between decode-dominant and network-dominant image loading.
Each pixel on the screen will be updated many times during the rendering process.
With bandwidth increasing faster than CPU speeds, their use started declining in the early 2000s, and pretty much no software uses progressive JPEGs by default now.
Even though progressive is more costly to decode than baseline, they're both relatively cheap. JPEG was designed when CPUs were clocked at 25 MHz.
Progressive JPEGs are significantly smaller (thanks to having a Huffman table per scan), and they're the default in all image optimization tools and services (MozJPEG, ImageOptim, Cloudflare's Polish, etc.).
For a long time progressive JPEGs had a slow path in libjpeg-turbo, but this has been improved recently, and there's still room for more optimization. For example, a DC-only preview could be decoded and rendered very cheaply (approx 1/64th cost in CPU and memory).
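The 1/64th figure follows from the format itself: the DC coefficient of each 8x8 DCT block encodes (a scaled version of) the block's average, so a DC-only scan is one value per 64 pixels and needs no full IDCT to show a preview. A pure-Python sketch of the equivalent computation on raw grayscale pixels (a real decoder reads the DC terms straight from the bitstream instead):

```python
def dc_preview(pixels):
    """Build the 1/8 x 1/8 preview that a DC-only JPEG scan carries.

    `pixels` is a grayscale image as a list of rows; width and height
    are assumed to be multiples of 8 for brevity. Each preview pixel
    is one 8x8 block's mean, which is what the DC coefficient encodes.
    """
    h, w = len(pixels), len(pixels[0])
    preview = []
    for by in range(0, h, 8):
        row = []
        for bx in range(0, w, 8):
            block_sum = sum(pixels[by + y][bx + x]
                            for y in range(8) for x in range(8))
            row.append(block_sum // 64)  # block mean ~ rescaled DC term
        preview.append(row)
    return preview
```

Upscaling that tiny preview with a blur is what gives the smooth "placeholder" look, at a small fraction of a full decode's CPU and memory.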
We're planning to contribute to libjpeg to make progressive rendering more optimized.
I suspect this is the issue... In a modern multi-process browser and OS, it's exceptionally hard to know if it's worth making a low-res render, or if there is more data sitting in kernel/application buffers somewhere which hasn't yet made it to the renderer due to CPU pressure. For example, if you have an event queue for un-gzipping and another for jpeg decompressing, you should never do any jpeg work unless the gzip queue is empty.
Every event queue in the system needs to correctly prioritize every step in the rendering process, otherwise work is wasted rendering a progressive jpeg more times than necessary.
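The gzip-before-jpeg rule above can be sketched as a two-queue scheduler. This is an illustrative toy, not how any real browser structures its task queues:

```python
from collections import deque

class RenderScheduler:
    """Sketch of the priority rule: JPEG decode work runs only when
    upstream (gzip) work is fully drained, because rendering while
    more bytes are already buffered just wastes a progressive pass."""

    def __init__(self):
        self.gzip_queue = deque()
        self.jpeg_queue = deque()

    def next_task(self):
        # Upstream decompression always wins: it may be about to hand
        # the decoder more data, making the current render stale.
        if self.gzip_queue:
            return self.gzip_queue.popleft()
        if self.jpeg_queue:
            return self.jpeg_queue.popleft()
        return None
```

The hard part in practice is that "is the upstream queue empty?" spans process boundaries (network stack, browser process, renderer), which is exactly why this is difficult to get right.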
Browsers already have delays and throttling for painting and relayout, because incremental rendering of HTML and DOM updates create lots of similar problems.
I don't know if browsers do it, but moving progressive image decoding to a very-low-priority thread would be an easy solution that solves 99% of the problem (nothing gets blocked, and you get as much progressive rendering as you can spare the CPU time for).
It would be hard to be absolutely optimal about this, but my point is that in practice it's one or two extra renders, not O(n^2) you'd expect from synchronous redecoding on every byte received.
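The reason it stays at one or two extra renders is coalescing: the render thread only ever picks up the latest partial bitstream, so however many chunks arrive while it's busy, they collapse into a single pending render. A hypothetical sketch using a condition variable:

```python
import threading

class CoalescingRenderer:
    """Sketch of the low-priority-thread idea: network chunks arrive
    fast, but the render loop only picks up the *latest* snapshot,
    so n chunks cost a handful of renders, not n."""

    def __init__(self):
        self.cond = threading.Condition()
        self.latest = None   # most recent partial bitstream, or None
        self.renders = 0
        self.done = False

    def on_bytes(self, data):
        # Called by the network thread; older partials become stale.
        with self.cond:
            self.latest = data
            self.cond.notify()

    def finish(self):
        with self.cond:
            self.done = True
            self.cond.notify()

    def render_loop(self):
        # Runs on the (ideally low-priority) decode/render thread.
        while True:
            with self.cond:
                while self.latest is None and not self.done:
                    self.cond.wait()
                if self.latest is None and self.done:
                    return
                data, self.latest = self.latest, None
            self.renders += 1  # decoding and painting `data` goes here
```

If 100 chunks arrive before the render thread gets scheduled even once, it still performs a single render of the most recent state, which is the "one or two extra renders" behavior in practice.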