Show HN: SQZ – Low complexity, scalable lossy to lossless image compression (github.com/marciopais)
7 points by mpais 8 months ago | 4 comments
I'd like to share a little toy project of mine, a really simple image codec that can do lossy-to-lossless image compression with complete scalability at byte-level granularity - you can compress an image just once, even fully losslessly if needed, and then get any lossy version possible by simply stopping decompression at any offset in the compressed data.

This "encode onde, serve many" approach is especially interesting for providing downscaled low quality image previews (LQIP) in as tight a storage budget as possible, and then allowing seamless, transparent refinement as deemed necessary, without reencoding or needing to store multiple lower quality/resolution versions of the same image.

To better serve that purpose, the codec is tuned to provide as much detail as it can as early as possible in the bitstream, so that downscaling will still produce sharp, good-looking previews.

Features:

- Support for 8bpp grayscale images, and 3 internal colorspaces (YCoCg-R, Oklab, logl1) for color images
- 5/3 integer reversible DWT, 4 possible scan orders based on different space-filling curves (raster, snake, Morton/Z-order and Hilbert)
- No entropy coding, just bitplane significance coding using the Wavelet Difference Reduction (WDR) method, followed by progressive refinement
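
For anyone curious about the building blocks: the reversible YCoCg-R transform and the 5/3 integer lifting DWT are standard, well-documented constructions. A minimal C++ sketch of the idea - illustrative only, not the actual SQZ code - looks roughly like this:

    #include <cstdint>
    #include <vector>

    // Reversible YCoCg-R forward transform (lossless, standard construction).
    // Co/Cg need one extra bit of range compared to the 8-bit inputs.
    inline void rgb_to_ycocg_r(int r, int g, int b, int& y, int& co, int& cg) {
        co = r - b;
        int t = b + (co >> 1);
        cg = g - t;
        y  = t + (cg >> 1);
    }

    // Exact inverse: recovers the original RGB values bit-for-bit.
    inline void ycocg_r_to_rgb(int y, int co, int cg, int& r, int& g, int& b) {
        int t = y - (cg >> 1);
        g = cg + t;
        b = t - (co >> 1);
        r = b + co;
    }

    // One level of the LeGall 5/3 integer lifting DWT on a 1-D signal (the
    // reversible filter used for lossless JPEG 2000), done in place on the
    // interleaved samples. Even indices end up holding the low-pass band,
    // odd indices the high-pass band; subbands would be deinterleaved and
    // the low-pass band transformed again for further decomposition levels.
    void dwt53_forward(std::vector<int>& x) {
        const int n = (int)x.size();
        // Predict step: high-pass = odd - floor(average of neighbouring evens),
        // with symmetric extension at the right edge.
        for (int i = 1; i < n; i += 2) {
            int left  = x[i - 1];
            int right = (i + 1 < n) ? x[i + 1] : x[i - 1];
            x[i] -= (left + right) >> 1;
        }
        // Update step: low-pass = even + (left_high + right_high + 2) / 4,
        // again with symmetric extension at the edges.
        for (int i = 0; i < n; i += 2) {
            int left  = (i - 1 >= 0) ? x[i - 1] : ((i + 1 < n) ? x[i + 1] : 0);
            int right = (i + 1 < n) ? x[i + 1] : left;
            x[i] += (left + right + 2) >> 2;
        }
    }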

Some sample images, comparison to JPEG-XL and previous discussion can be found at https://encode.su/threads/4183-SQZ-Low-complexity-scalable-l...

Any feedback, suggestions or questions would be most welcome, thanks.




Hey this is pretty cool! I’m new to this field, so it may be a rather amateur question, but I’ve noticed your codec ends up with slightly larger file sizes (compared to JPEG XL) at what I, on average, agree is better subjective visual quality. Would it be possible to tune the file size down even further without altering the perceived quality?


The codec is extremely simple (hence why it's relatively fast even with such unoptimized code) and as such doesn't even use an entropy coding backend, which is what allows for the incredible scalability. Obviously that means it can't really match JPEG-XL when going for high quality output, but that's just a tradeoff - using context modelling would allow it to beat JXL, but then you couldn't have byte-granularity scalability. You can, however, tell it to match the exact file size of your JXL image and compare the two.

At nearly perceptually lossless quality levels, even if it doesn't match JXL, it's usually still quite acceptable, and at extremely high compression ratios (think over 250:1) it's usually better than JXL, which is important - otherwise the scalability would be a moot point.

It was designed with a particular use case in mind - simplifying the process of serving visual content in a responsive environment. Instead of encoding the same image at 4 or 5 different resolutions and then serving the appropriate one based on the requested specifications - which usually would mean either storing those multiple versions or reencoding on-the-fly on an as-needed basis - you encode the image only once, at a quality you deem matches your maximum requirement; it can even be completely lossless.

Then when you receive a request for an image at a lower quality/resolution, you can simply send only the first 1/10 of the file, or 1/8, or however much you like. You can literally just truncate the file at a random offset, and when you decode it you get a lower quality version of the image - one that is still exactly the best the codec could have done at that file size anyway. The requesting party can then simply downscale the lower quality image, hence why it's important to have good detail even at high compression ratios, so that the downscaled version will look good. The format allows for all of that, even progressive enhancement, but the provided library doesn't support all of it yet.

There are other formats with similar scalability, but they're usually a lot more complex - SQZ is basically just a fancy run-length codec, in a single header file for C/C++.


Thank you for the write up, this is some amazing stuff! It all sounds like magic to me, and that just makes me want to understand it more :)

So the use case, if I’ve understood correctly, is to simplify serving content in “responsive” environments. What environments do you see this benefiting? I assume web development, particularly e-commerce where images are crucial. However, what’s the overhead for this? How well does this “encode once, read many” style scale compared to the typical “encode many, read many” in terms of server resources?


> What environments do you see this benefiting?

Anywhere you have visual content that needs to be "consumed" in a large variety of possible viewing conditions, from low- to extremely high-DPI 5 or 6-inch smartphone screens, to 10-inch tablets, to 60-inch TVs, and all at possibly variable quality.

Suppose your application uses a carousel-like UI for viewing photographs, so you need small thumbnails (say, scaled with the correct aspect ratio to fit in 128px by 128px) for the scrollable list at the bottom, a preview (1024px) for the center of the UI, and then the actual full size image for when the user clicks the preview. Codecs designed for progressive transmission allow you to use just one high-quality version of each image and obtain lower quality versions by simply sending only part of that file, or, conversely, allow you to refine the preview you've already received by simply sending more bits from the complete file, without retransmitting everything. So you don't need 3 or however many versions of the same image encoded in different files with a format that doesn't have this scalability.
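
As a hedged sketch of the client-side bookkeeping - assuming the full file sits behind a server that honours standard HTTP Range requests, and with made-up file names and byte budgets - refinement is just appending more bytes to what you already have:

    #include <cstdio>
    #include <vector>

    // Simulates refining an already-received prefix: fetch the next chunk of the
    // same compressed file (in a real deployment this would be an HTTP request
    // with "Range: bytes=<have>-<have+extra-1>") and append it to what we hold.
    // The longer buffer decodes to a better approximation of the image.
    bool refine(const char* path, std::vector<unsigned char>& received, size_t extra) {
        FILE* f = std::fopen(path, "rb");
        if (!f) return false;
        if (std::fseek(f, (long)received.size(), SEEK_SET) != 0) { std::fclose(f); return false; }
        std::vector<unsigned char> chunk(extra);
        size_t got = std::fread(chunk.data(), 1, extra, f);
        std::fclose(f);
        received.insert(received.end(), chunk.begin(), chunk.begin() + got);
        return got > 0;
    }

    int main() {
        std::vector<unsigned char> bytes;       // starts empty
        refine("photo.sqz", bytes, 8 * 1024);   // ~8 KiB: enough for a thumbnail
        // decode(bytes), downscale to 128x128 for the carousel strip
        refine("photo.sqz", bytes, 56 * 1024);  // now ~64 KiB total: the 1024px preview
        // decode(bytes) again for the preview pane
        refine("photo.sqz", bytes, 4u << 20);   // up to 4 MiB more: the rest of the file
        // decode(bytes) one last time for the full quality image
        std::printf("have %zu bytes\n", bytes.size());
        return 0;
    }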

> However, what’s the overhead for this? How well does this “encode once, read many” style scale compare to the typical “encode many, read many” in terms of server resources?

That's the thing, you reduce the overhead with these scalable formats: either in storage, by not having to store multiple files per image, or in computation, if you had planned on transcoding on-the-fly to a lower quality and serving that, or a bit of both, as you'd expect from a CDN that caches transcoded versions.

This isn't to say that such scalable formats are without downsides. The server still needs to know how much of the full quality image to serve based on what the client requested, but this is usually trivial and much cheaper computationally. The main questions are always whether the compression ratio is heavily impacted, and whether the perceived visual quality remains acceptable when extracting heavily compressed representations for downscaled viewing, i.e., what the useful range of this scalability is. That is why I tuned SQZ to attempt to preserve as much detail as possible early in its bitstream, so that when downscaling you still get good looking images. No downscaler is magically going to recover the details that were blurred out by the heavy filtering that modern codecs do.
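
By "trivial" I mean something like the following back-of-the-envelope rule - the bits-per-pixel figure is an assumption for illustration, not a tuned SQZ recommendation:

    #include <algorithm>
    #include <cstddef>

    // One possible "how many bytes should I serve?" rule: pick a target
    // bits-per-pixel for the *displayed* resolution and clamp to the file size.
    // The 0.5 bpp default is an illustrative assumption, not a tuned value.
    size_t byte_budget(int display_w, int display_h, size_t file_size,
                       double bits_per_pixel = 0.5) {
        double bytes = display_w * display_h * bits_per_pixel / 8.0;
        return std::min(file_size, (size_t)bytes);
    }
    // e.g. byte_budget(128, 128, file_size)   -> ~1 KiB for a thumbnail
    //      byte_budget(1024, 1024, file_size) -> ~64 KiB for a preview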

If you check out the discussion in the forum I linked in the description (don't be scared by its TLD, it's a great resource in this field, an old "hidden gem" in today's frivolous web) you'll find example images and analysis describing such questions.

And please bear in mind that SQZ is just a little fun thought experiment that I decided to implement; it has no grandiose ambitions. But if you need something like this for a quick side project, and don't want to use a 250k LOC library with who knows how many dependencies, then maybe try it out and let me know what you think.



