A great read on dithering: Lucas Pope's development blogs while working on Return of the Obra Dinn.
It's an incredible dive into how he created the game's remarkable and unique look, featuring a wonderful and unexpected mathematical contribution from a forum member. If you're not familiar with the game, peek at a trailer to see what an achievement it was.
On dithering, the original PlayStation had built-in support for dithering. On CRT televisions it helped produce a better-looking image, and it's a huge part of the "look" of the system.
Wow, exciting! I tried to use ditherit to get some baseline comparison images for my post! Is there any way to both control the number of colors in the palette and have it auto-pick colors at the same time?
That feature does not currently exist, but it's a great idea. I have added it to the list (which is just the GitHub issue tracker, and contains no other items). I like that it auto-analyzes the palette when the image loads without requiring any further user input, but maybe there could be an option after it loads to "auto-detect X colors" as a little dropdown thingie.
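A hypothetical sketch of how an "auto-detect X colors" option could work: plain k-means over the image's RGB pixels with a user-chosen k. The helper name `auto_palette`, the brightness-based seeding, and the iteration cap are assumptions for illustration, not how ditherit picks its palette today.

```python
def auto_palette(pixels, k, iters=20):
    """Pick k palette colors from a list of (r, g, b) tuples with plain k-means.

    Hypothetical helper for illustration; not ditherit's implementation.
    """
    # Deterministic seeding: spread k seeds across the pixels sorted by brightness.
    ordered = sorted(pixels, key=sum)
    n = len(ordered)
    centers = [tuple(float(c) for c in ordered[(i * (n - 1)) // max(k - 1, 1)])
               for i in range(k)]
    for _ in range(iters):
        # Assignment step: each pixel joins its nearest center (squared RGB distance).
        groups = [[] for _ in range(k)]
        for p in pixels:
            j = min(range(k),
                    key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])))
            groups[j].append(p)
        # Update step: move each center to the mean of its group (keep it if empty).
        new_centers = [
            tuple(sum(ch) / len(g) for ch in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers
```

A dropdown value X would then map to something like `auto_palette(pixels, X)`; median cut or k-means++ seeding are common alternatives.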
> A pipedream would be an entirely differentiable image compression pipeline where all the steps can be fine tuned together to optimize a particular image with respect to any differentiable loss function.
Unfortunately you wouldn't have any guarantees on the output of any particular image though, just some reassurances about the expected behaviour over the training set.
In terms of the decoded image, yes - it's very unlikely you would get something substantially different from the original image. But in terms of the bitrate it's not hard to find examples where the compressed bitrate can be several standard deviations above the average bitrate on the training set - see e.g. the last example here: https://github.com/Justin-Tan/high-fidelity-generative-compr...
(Lossy) neural compression methods may also synthesize small portions of an image to avoid the compression artefacts associated with standard image codecs, so, with or without guarantees, they should definitely not be used in sensitive applications where small details can make a big difference, such as security imaging.
Yeah, neural image compression looks pretty neat! The reason I bring up JPEG is that it's so well established: if you come up with a more optimized JPEG (which I'm not really convinced is possible; again, a pipedream), you don't have to force people to transition to a new image format. In the end, methods like neural style or whatever comes after are probably the better pick, but there is a transition period.
The thing about compression is that there is no single "more optimized" knob - there's a bunch of different tradeoffs.
Want a compression algo that can compress existing images to smaller sizes than JPEG? You can already do that with neural image compression. Want a compression algo that can decode that compressed image in 0.01 seconds? You need JPEG.
Sorry, I could have been more specific. By "more optimized JPEG" I meant better perceptual quality (again, subjective) within the confines of what a JPEG decoder can understand.
To co-opt your knobs analogy, I imagine each of the steps of a complex image compression pipeline comes with its own knobs, each with its own tradeoffs. The dream here would be to tune all those knobs at the same time to optimize some sense of quality in a particular image. Of course, huge disclaimer: I’m not an image or signal processing expert. It’s also very possible that these “knobs” have already been tuned well enough that even if we optimized them for a specific image, the quality difference would not be noticeable.
Depending on what is meant by "entirely differentiable", this might be impossible without a relaxation, i.e. you can't differentiate through the quantization step (its gradient is zero almost everywhere).
There are a couple of solutions that work empirically. As you mentioned, one is a dithering-like differentiable relaxation where uniform noise is added to simulate quantization; another is to ignore the quantization operation when taking gradients, essentially treating it as an identity operation in the backward pass.
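A minimal scalar sketch of both tricks in plain Python, assuming values are quantized to integers. `noisy_quantize` is the additive-uniform-noise relaxation (its error has the same mean 0 and variance 1/12 as rounding a value whose fractional part is uniformly distributed), and `ste_grad` is the straight-through estimator. All names are hypothetical; frameworks with autodiff typically write the straight-through version as `x + stop_gradient(round(x) - x)`.

```python
import random

def quantize(x):
    """Hard rounding (Python's round, ties to even): the real, non-differentiable step."""
    return float(round(x))

def noisy_quantize(x, rng):
    """Training-time relaxation: add uniform noise in [-0.5, 0.5] instead of rounding.

    The output is x plus noise, so its derivative w.r.t. x is simply 1.
    """
    return x + rng.uniform(-0.5, 0.5)

def ste_grad(upstream_grad):
    """Straight-through estimator: pretend d round(x)/dx = 1 in the backward pass."""
    return upstream_grad

def fit_with_ste(target, w=0.2, lr=0.1, steps=50):
    """Minimise (round(w) - target)^2 by gradient descent through a hard rounding step."""
    for _ in range(steps):
        q = quantize(w)                # forward pass uses the real quantizer
        dloss_dq = 2.0 * (q - target)  # derivative of the squared error w.r.t. q
        w -= lr * ste_grad(dloss_dq)   # true gradient is 0; the STE passes it through
    return w
```

Without the straight-through trick the gradient of `round` is zero almost everywhere, so `w` would never move from its starting point.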
But how do you optimize the lossless encoding of the quantized latent space? I.e., how do you tell the encoder to produce something that can be encoded well, given that the encoding is a bunch of discrete steps?
Usually the lossless encoding is offloaded to a standard entropy coder, e.g. arithmetic coding or ANS, because these approach the theoretical minimum rate given by the source entropy pretty closely, so there wouldn't be a point in building a fancy differentiable replacement.
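To make "approach the theoretical minimum" concrete, here is a small sketch comparing Shannon entropy with the average length of a Huffman code. Huffman is only a simple stand-in for the coders named above (it is neither arithmetic coding nor ANS): it matches the entropy exactly for dyadic probabilities and stays within 1 bit per symbol otherwise, while arithmetic coding and ANS can get closer because they are not limited to whole bits per symbol. Function names are hypothetical.

```python
import heapq
import math

def entropy_bits(probs):
    """Shannon entropy H = -sum p log2 p: the lower bound on average bits per symbol."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def huffman_lengths(probs):
    """Return the Huffman code length of each symbol (index-aligned with probs)."""
    # Heap items: (probability, tie-breaker, symbol indices in this subtree).
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:
            lengths[s] += 1  # each merge adds one bit to every code below it
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths

def mean_length(probs):
    """Expected Huffman code length in bits per symbol."""
    return sum(p * n for p, n in zip(probs, huffman_lengths(probs)))
```

For a skewed source like `[0.9, 0.05, 0.05]` Huffman needs 1.1 bits per symbol against an entropy of about 0.57, which is the gap arithmetic coding and ANS close.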
That makes sense, I don't think I stated my question very clearly: how do you control/optimize the entropy of the latent space?
I.e., what stops the network from laundering all of the information needed to reconstruct the image through a super-high-entropy latent space that is hard to code but allows it to reconstruct perfectly?
e: I guess I should just get up to date by reading some papers
The objective function used in these lossy neural compression schemes usually takes the form of a rate-distortion Lagrangian - the rate term captures the expected length of the message needed to transmit the compressed information and the distortion term measures the reconstruction error. So it wouldn't be able to cheat like in your example, because this would incur a high value of the loss through the rate term.
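A toy sketch of the Lagrangian, L = R + λD, assuming the rate is the ideal code length of the latent symbols under their own empirical distribution and the distortion is MSE. Real codecs estimate R with a learned, differentiable probability model instead; `rate_bits`, `distortion`, and `rd_loss` are hypothetical names. The "laundering" latent that copies every value exactly pays for it through the rate term.

```python
import math
from collections import Counter

def rate_bits(symbols):
    """Ideal code length in bits of the latent under its own empirical distribution.

    Simplifying assumption: real neural codecs use a learned entropy model here.
    """
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c * math.log2(c / n) for c in counts.values())

def distortion(original, reconstruction):
    """Mean squared error between the original and the reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(original, reconstruction)) / len(original)

def rd_loss(symbols, original, reconstruction, lam):
    """Rate-distortion Lagrangian: L = R + lambda * D."""
    return rate_bits(symbols) + lam * distortion(original, reconstruction)

original = [0.1, 0.9, 0.15, 0.85, 0.2, 0.8, 0.05, 0.95]
exact = original                              # launders every value: 8 distinct symbols
coarse = [float(round(v)) for v in original]  # 1 bit per value: 0.0 or 1.0
```

With λ = 100 the coarse latent wins (about 9.9 bits versus 24); only when distortion is weighted far more heavily (λ = 10,000) does the high-entropy latent become the better deal, which is exactly the tradeoff λ controls.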
It's a very interesting approach, however once you have the probability distribution for each pixel, independent random sampling produces a poor dither pattern compared to Floyd-Steinberg or other error diffusion approaches.
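A sketch of that comparison on a 1-bit palette, assuming grayscale values in [0, 1] stored as nested lists: classic Floyd-Steinberg versus sampling each pixel independently with P(white) equal to its value. `local_error` (a hypothetical helper) compares 3x3 window averages, a rough proxy for what the eye sees at a distance.

```python
import random

def floyd_steinberg(img):
    """1-bit Floyd-Steinberg error diffusion on a grayscale image in [0, 1]."""
    h, w = len(img), len(img[0])
    buf = [row[:] for row in img]  # working copy that accumulates diffused error
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            old = buf[y][x]
            new = 1.0 if old >= 0.5 else 0.0
            out[y][x] = new
            err = old - new
            # Push the error onto unvisited neighbours with the 7/16, 3/16, 5/16, 1/16 weights.
            if x + 1 < w:
                buf[y][x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    buf[y + 1][x - 1] += err * 3 / 16
                buf[y + 1][x] += err * 5 / 16
                if x + 1 < w:
                    buf[y + 1][x + 1] += err * 1 / 16
    return out

def independent_sample(img, rng):
    """Treat each pixel value as P(white) and sample every pixel independently."""
    return [[1.0 if rng.random() < p else 0.0 for p in row] for row in img]

def local_error(a, b, r=1):
    """Mean squared difference of (2r+1)x(2r+1) window averages (full windows only)."""
    h, w = len(a), len(a[0])
    k = (2 * r + 1) ** 2
    total, count = 0.0, 0
    for y in range(r, h - r):
        for x in range(r, w - r):
            window = [(y + dy, x + dx) for dy in range(-r, r + 1) for dx in range(-r, r + 1)]
            sa = sum(a[i][j] for i, j in window)
            sb = sum(b[i][j] for i, j in window)
            total += ((sa - sb) / k) ** 2
            count += 1
    return total / count
```

Both methods keep the overall gray level (independent sampling only in expectation), but the independent samples are much noisier within small windows, which is the weakness described above.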
I think once you have the target distributions, maybe you can combine the sampling with some error diffusion approach. The idea is to make the sampling of neighboring pixels negatively correlated, so the colors average out at shorter length scales.
For a sledgehammer approach, you could put a blur in your loss function and try to sample from the joint probability distribution of all the pixels (i.e. sample whole images). It would probably make the calculation even more expensive, possibly even infeasible.
A differentiable error diffusion loss would dither the image with quantisation (like in the post), but then minimise the difference between the blurred dithered image and the blurred original, instead of the dithered image and the original. This would tend to distribute errors so that the average colour in an area is the same in the dithered image as the original, similar to Floyd-Steinberg.
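A sketch of that blurred loss, assuming a 3x3 box blur (full windows only) and MSE, both of which are differentiable in the pixel values. On a flat 50% gray patch, plain MSE cannot tell a checkerboard from solid black (both score 0.25), while the blurred loss strongly prefers the checkerboard. The helper names are hypothetical.

```python
def box_blur(img, r=1):
    """Average over (2r+1)x(2r+1) windows, keeping only windows fully inside the image."""
    h, w = len(img), len(img[0])
    k = (2 * r + 1) ** 2
    return [
        [sum(img[y + dy][x + dx] for dy in range(-r, r + 1) for dx in range(-r, r + 1)) / k
         for x in range(r, w - r)]
        for y in range(r, h - r)
    ]

def mse(a, b):
    """Mean squared error between two equally sized nested-list images."""
    diffs = [(p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)]
    return sum(diffs) / len(diffs)

def blurred_loss(dithered, original, r=1):
    """Compare blurred images, so only the local average colour has to match."""
    return mse(box_blur(dithered, r), box_blur(original, r))

gray = [[0.5] * 8 for _ in range(8)]
checker = [[float((x + y) % 2) for x in range(8)] for y in range(8)]
black = [[0.0] * 8 for _ in range(8)]
```

Each 3x3 window of the checkerboard averages 4/9 or 5/9, so its blurred loss is (1/18)^2, roughly 0.003, versus 0.25 for solid black.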
Although I agree with the sentiment, I am going to point out for fun that the algorithm does not have to assign probabilities less than 1. Technically, a solution like the one Floyd-Steinberg produces is in the search space. You would just need the right objective to motivate it.
Last time I did dithering was for Polyjet 3D printers. The problem is substantially different from what’s in the article.
The palette is fixed, as the colors are physically different materials. The amount of data is huge, an image is a layer and the complete model has thousands of layers, because 3D.
I implemented a 3D generalization of ordered dithering: https://en.wikipedia.org/wiki/Ordered_dithering The algorithm doesn’t have any data dependencies across voxels; the result depends only on the source data and the position of the voxel. I did it on the GPU with HLSL shaders, and it takes a few seconds to produce thousands of images.
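A sketch of what a 3D generalization of ordered dithering can look like, assuming a recursive 2x2x2 threshold cube built the same way as the 2D Bayer matrix. The corner ranking in `cube_rank` is one hypothetical choice, not necessarily the one used for the printer: even-parity corners rank first, so a 50% input produces a 3D checkerboard. As in the comment, each voxel's output depends only on its own value and position, which is what makes a per-voxel shader straightforward.

```python
def cube_rank(cx, cy, cz):
    """Rank 0..7 of a corner of the 2x2x2 base cell (hypothetical choice).

    Even-parity corners get ranks 0-3 and odd-parity corners 4-7; within each
    class, a 2x2 Bayer order on (x, y) is used. The mapping is a bijection.
    """
    parity = cx ^ cy ^ cz
    return 4 * parity + 2 * (cx ^ cy) + cx

def threshold(x, y, z, bits=2):
    """Threshold in (0, 1) for voxel (x, y, z); the pattern tiles every 2**bits voxels."""
    v = 0
    for b in range(bits):
        # Low position bits pick the most significant digit, as in the 2D Bayer recursion.
        v = v * 8 + cube_rank((x >> b) & 1, (y >> b) & 1, (z >> b) & 1)
    return (v + 0.5) / 8 ** bits

def dither_voxel(value, x, y, z, bits=2):
    """Per-voxel decision: depends only on the voxel's own value and position."""
    return 1 if value > threshold(x, y, z, bits) else 0
```

With `bits=2` the pattern repeats every 4 voxels on each axis and uses 64 distinct thresholds, so a constant input level lights up that fraction of each 4x4x4 tile.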
Fun. I never considered differentiable dithering before.
Would be interesting to see results using a content loss function as defined by Gatys (2015), as opposed to the L2 loss as given. That should hopefully capture more long-distance structures in the image rather than optimising each pixel independently.
This seems somewhat similar to the recently published GIFnets[1]. However, I believe GIFnets trains a reusable network to predict palettes and pixel assignments, while this post focuses on optimizing the "weights" (i.e. pixel values) for a single image.
I wonder if the loss functions from GIFnets could be applied to this single-image approach to potentially solve the banding problem via something a little more "perceptual" than the variance term mentioned.
That's interesting! One thing I was surprised about is that they don't address optimizing the palette and dither pattern across time (b/c most gifs are animated). This feels to me like it would be really interesting and a hard problem for traditional algorithms. They do mention it as a possibility for future work at the end tho.
They also seem to have separate losses for the palette net and the dither net instead of just adjusting both to optimize a general image quality metric (although it does look like they have some kind of perceptual loss; it's just not the only objective).
- Is this approach also learning the palette? It is kind of presented as a given here, but it is of course very important for good dithering.
- The loss function might work better on spatially downsampled images. Downsampling mixes the image colors, so with good dithering the dithered image looks more like the original. This also naturally removes the variance that is now penalized in the loss function, as it is blurred away.
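A sketch of the downsampled loss, assuming 2x2 average pooling and MSE (hypothetical helper names). It rewards any pattern whose block averages match, so on a 50% gray patch a checkerboard and horizontal stripes both score exactly zero: the variance disappears from the loss's point of view but not from the final image.

```python
def downsample(img, f=2):
    """Average-pool non-overlapping f x f blocks (image sides assumed divisible by f)."""
    h, w = len(img), len(img[0])
    return [
        [sum(img[y + dy][x + dx] for dy in range(f) for dx in range(f)) / (f * f)
         for x in range(0, w, f)]
        for y in range(0, h, f)
    ]

def downsampled_loss(dithered, original, f=2):
    """MSE between pooled images: only the block-average colour is penalised."""
    a, b = downsample(dithered, f), downsample(original, f)
    diffs = [(p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb)]
    return sum(diffs) / len(diffs)

gray = [[0.5] * 8 for _ in range(8)]
checker = [[float((x + y) % 2) for x in range(8)] for y in range(8)]
stripes = [[float(y % 2 == 0)] * 8 for y in range(8)]  # alternating white/black rows
black = [[0.0] * 8 for _ in range(8)]
```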
This blew up while I was asleep so I’ll try my best to answer now!
1. Yes, the palette is being optimized as well, which imho is what makes it different from a quantization approach.
2. That’s a good point. I cite a reference blog post that does use blur in the loss function towards the end of the post. Unfortunately, I think pure blur would still produce a noisy image: it would remove variance in the eyes of the loss function, but not in the final image. I would guess something like the example I give with purple, red, and blue pixels would still be a problem for a blurred loss.
What are the applications for dithering these days? I understand it was needed when we had 4 or 16 or 256 color limits. But now we have 8-bit/channel displays, and 10-bit is becoming popular.
Eight and ten bits are the full range of the image, but dark scenes only use a fraction of the range and often suffer from banding (and comically bad compression artifacts on certain popular streaming services). Clean, slight gradients as a backdrop often only cover a small distance in RGB, so again, very low resolution and banding is the result.
Dithering is vital. Just like dithering is vital for audio, even at 24 bits.
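A sketch of why dithering matters even at 8 bits, assuming a dark ramp that spans only two code values and a 1-LSB uniform dither added before rounding. Without dither the ramp collapses into flat bands; with dither, each block's average tracks the true value. Audio work usually prefers triangular (TPDF) dither; uniform dither keeps the sketch short. The helper names are hypothetical.

```python
import math
import random

def quantize(v):
    """Round to the nearest integer code value (round half up)."""
    return math.floor(v + 0.5)

def block_errors(values, codes, block):
    """Absolute difference between the true and the quantized mean of each block."""
    errs = []
    for i in range(0, len(values), block):
        true_mean = sum(values[i:i + block]) / block
        code_mean = sum(codes[i:i + block]) / block
        errs.append(abs(true_mean - code_mean))
    return errs

# A dark, slow ramp covering only two code steps (think 10..12 out of 255).
n = 10_000
ramp = [10 + 2 * i / n for i in range(n)]

plain = [quantize(v) for v in ramp]  # banding: long runs of a single code
rng = random.Random(42)
dithered = [quantize(v + rng.uniform(-0.5, 0.5)) for v in ramp]  # 1-LSB uniform dither
```

With uniform 1-LSB dither the expected quantized value equals the input exactly, so the remaining error is zero-mean noise instead of visible steps.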
Also keep in mind that the "10 bit" you speak of is implemented by dithering on an 8 bit panel in almost every display. Similarly, many cheaper 8 bit displays are actually 6 bit with dithering. Additionally, 10 bit is a very rare output format [1] and rarely used by applications apart from the handful of HDR games; even among content creation applications 10 bit support is uncommon, and actually utilizing it is even less common.
[1] Just because everything is output and composited in 8 bit doesn't mean 10 bit display output is entirely for naught. If you are using hardware gamma correction, which you are when you use tools like flux/redshift/... or most ICC display profiles, then 10 bit scanout of an 8 bit framebuffer still makes sense.
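A simplified sketch of the two points above, with hypothetical helper names. `frc_frames` shows a 10-bit level on an 8-bit panel by alternating the two nearest 8-bit levels over a 4-frame cycle, treating a 10-bit code c as the 8-bit level c/4 (real panels typically combine spatial and temporal patterns). `apply_lut` models a warm color-temperature shift, as an assumption, as scaling one channel by 0.8, and then quantizes to the scanout bit depth.

```python
import math

def frc_frames(code10):
    """Show a 10-bit code on an 8-bit panel by alternating levels over 4 frames.

    Simplified, hypothetical scheme: the time average equals code10 / 4.
    """
    base, rem = divmod(code10, 4)  # 8-bit level and the 2 leftover bits
    return [min(base + (1 if f < rem else 0), 255) for f in range(4)]

def apply_lut(scale, out_bits):
    """Map every 8-bit framebuffer value through a per-channel scale, then
    quantize to the scanout bit depth (round half up)."""
    out_max = 2 ** out_bits - 1
    return [math.floor(scale * (i / 255) * out_max + 0.5) for i in range(256)]
```

With 8-bit scanout the 256 input levels collapse to 205 distinct outputs, so some neighboring shades merge; with 10-bit scanout all 256 stay distinct.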
There are still plenty of 1-bit displays in the world. Consumer electronics with small OLEDs, and various low-power signage have low bit depth. Just because our modern phones and laptops have high bit depth doesn’t mean that dithering goes away.
A straightforward implementation of differentiable dithering consists of applying a large-support band-pass filter to the image (so that it becomes zero-mean), and then thresholding it at 0. Sure, you lose the property that the average colors over large regions are conserved, but the image is perfectly recognizable, even with higher contrast than the original.
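A sketch of that idea, assuming a difference of two box blurs (radius 1 minus radius 4, clamped at the borders) as the band-pass filter; the radii and helper names are assumptions. In exact arithmetic a flat region filters to zero and thresholds to black, which is the lost average color mentioned above, while edges survive as 0/1 transitions.

```python
def box_blur(img, r):
    """Mean over a (2r+1)x(2r+1) window, clamping coordinates at the image border."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            s = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    s += img[yy][xx]
            row.append(s / (2 * r + 1) ** 2)
        out.append(row)
    return out

def bandpass_dither(img, r_small=1, r_large=4):
    """Difference of two box blurs (a crude band-pass), thresholded at 0 to 1 bit."""
    fine, coarse = box_blur(img, r_small), box_blur(img, r_large)
    return [[1 if f - c > 0 else 0 for f, c in zip(rf, rc)] for rf, rc in zip(fine, coarse)]

flat = [[0.5] * 8 for _ in range(8)]
step = [[0.2 if x < 8 else 0.8 for x in range(16)] for _ in range(8)]
```

Because the filter and the threshold act locally, the output is a high-contrast outline image rather than a tone-preserving dither.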
Seems like the gains in palette information are wasted on the precise placement of pixels for dithering. Net loss IMHO, except for naive formats like bitmap.
Interesting nevertheless, but I guess we could do better by optimizing against the storage format. But then we are at the state of the art.
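A rough sketch of that cost, assuming DEFLATE (the compressor inside PNG, available as Python's `zlib`) stands in for the storage format: a plain threshold of a gradient compresses to almost nothing, while the Floyd-Steinberg version of the same gradient needs more bytes. Helper names are hypothetical, and one byte per pixel is a simplification of real bit-packed formats.

```python
import zlib

def floyd_steinberg(img):
    """1-bit Floyd-Steinberg error diffusion on a grayscale image in [0, 1]."""
    h, w = len(img), len(img[0])
    buf = [row[:] for row in img]
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            new = 1 if buf[y][x] >= 0.5 else 0
            out[y][x] = new
            err = buf[y][x] - new
            if x + 1 < w:
                buf[y][x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    buf[y + 1][x - 1] += err * 3 / 16
                buf[y + 1][x] += err * 5 / 16
                if x + 1 < w:
                    buf[y + 1][x + 1] += err * 1 / 16
    return out

def threshold(img):
    """Plain 1-bit threshold at 0.5, no dithering."""
    return [[1 if v >= 0.5 else 0 for v in row] for row in img]

def packed_size(bits_img):
    """DEFLATE-compressed size (bytes) of a 0/1 image stored one byte per pixel."""
    data = bytes(v for row in bits_img for v in row)
    return len(zlib.compress(data, 9))

gradient = [[x / 63 for x in range(64)] for _ in range(64)]
```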
Lower bit depth/palette encoding has not been a state of the art option for compressing natural images like this for decades, and nobody is claiming it is.
If you're doing this, it's either because your medium is limited (retro games or 8-bit equivalent embedded systems), because you can't afford the CPU power to decompress something more complex (unlikely these days), because lower bit depth is ideal for the rest of your image (e.g. largely UI graphics with no gradients, and just a few small graphics), or because you just don't care.
But given those reasons exist, there is value in researching better dithering algorithms. Also, to some extent, these things also apply to non-palette formats (dithering to lower bit depths), and that is still relevant today when e.g. converting HDR content to typical 8bpc (24bpp) formats.
https://forums.tigsource.com/index.php?topic=40832.msg136374...