It's an incredible dive into how he created the game's remarkable and unique look, featuring a wonderful and unexpected mathematical contribution from a forum member. If you're not familiar with the game, peek at a trailer to see what an achievement it was.
It's open source:
Neural Image Compression? https://arxiv.org/abs/1908.08988
You don't have any guarantees with this non-convex optimization.
I think most of these methods would work OK on out-of-domain data.
(Lossy) neural compression methods may also synthesize small portions of an image to avoid the compression artefacts associated with standard image codecs, so they definitely should not be used in sensitive applications where small details can make a big difference, such as security imaging, guarantees or not.
Unrelated, but I actually recognize your name from GitHub - I guess deep image compression is a pretty small space.
The thing about compression is that there is no single "more optimized" knob - there's a bunch of different tradeoffs.
Want a compression algo that can compress existing images to smaller sizes than JPEG? You can already do that with neural image compression. Want a compression algo that can decode that compressed image in 0.01 seconds? You need JPEG.
To co-opt your knobs analogy: I imagine each step of a complex image compression pipeline comes with its own knobs, each with its own tradeoffs. The dream here would be to tune all those knobs at the same time to optimize some sense of quality for a particular image. Of course, huge disclaimer: I’m not an image or signal processing expert. It’s also very possible that these “knobs” have been tuned well enough that even if we optimized them for a specific image, the quality difference would not be noticeable.
Depending on what is meant by entirely differentiable, this might be impossible without relaxation, i.e. you can't differentiate through the quantization step.
i.e. what stops the network from laundering all of the information for reconstructing the image through a super-high-entropy latent space that is hard to code but lets it reconstruct perfectly?
e: I guess I should just get up to date by reading some papers
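For what it's worth, the standard workaround in the papers is a straight-through estimator for the rounding plus an explicit rate term, so a hard-to-code latent gets penalized directly. A minimal PyTorch sketch of that idea (ToyCodec, quantize_ste and the factorized Gaussian prior are my stand-ins for the learned entropy models in the literature):

    import math
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def quantize_ste(z):
        # Straight-through estimator: hard rounding in the forward pass,
        # identity gradient in the backward pass.
        return z + (torch.round(z) - z).detach()

    class ToyCodec(nn.Module):
        # Hypothetical one-layer autoencoder, just to make the idea concrete.
        def __init__(self, dim=64, latent=16):
            super().__init__()
            self.enc = nn.Linear(dim, latent)
            self.dec = nn.Linear(latent, dim)
            self.log_scale = nn.Parameter(torch.zeros(latent))  # prior width

        def rate_bits(self, z_hat):
            # Differentiable rate proxy: code length of the quantized latent
            # under the prior, in bits. This term is what stops the network
            # from laundering everything through a high-entropy latent:
            # expensive-to-code latents cost loss directly.
            scale = self.log_scale.exp()
            nll = 0.5 * (z_hat / scale) ** 2 + self.log_scale + 0.5 * math.log(2 * math.pi)
            return nll.sum(dim=-1).mean() / math.log(2.0)

        def forward(self, x, lam=0.01):
            z_hat = quantize_ste(self.enc(x))
            return F.mse_loss(self.dec(z_hat), x) + lam * self.rate_bits(z_hat)

    loss = ToyCodec()(torch.randn(8, 64))
    loss.backward()  # gradients flow despite the hard rounding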
I think once you have the target distributions, then maybe you can combine the sampling with some error diffusion approach (rough sketch below). The idea is to make the sampling of neighboring pixels negatively correlated, so the colors average out at a shorter length scale.
For a sledgehammer approach you could try putting a blur in your loss function and sampling from the combined probability distribution of all the pixels (i.e. sampling whole images). It would probably make the calculation even more expensive, or possibly even infeasible.
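A rough NumPy sketch of the negative-correlation idea: carry the sampling residual forward Floyd-Steinberg style and let it re-weight the next pixel's distribution, so a pixel is less likely to repeat a color its neighbors just over-used. The re-weighting form and the temperature (8.0) are made up for illustration, not taken from the post:

    import numpy as np

    def sampled_error_diffusion(probs, palette, rng=None):
        # probs:   (H, W, K) per-pixel target distribution over K palette colors
        # palette: (K, 3) palette colors in [0, 1]
        rng = rng or np.random.default_rng()
        H, W, K = probs.shape
        out = np.zeros((H, W), dtype=np.int64)
        err = np.zeros((H, W, 3))
        for y in range(H):
            for x in range(W):
                # Expected color at this pixel, corrected by the diffused error.
                target = probs[y, x] @ palette + err[y, x]
                # Bias the distribution toward colors near the corrected target.
                w = probs[y, x] * np.exp(-8.0 * ((palette - target) ** 2).sum(axis=1))
                w /= w.sum()
                k = rng.choice(K, p=w)
                out[y, x] = k
                residual = target - palette[k]
                # Standard Floyd-Steinberg error weights.
                if x + 1 < W:
                    err[y, x + 1] += residual * 7 / 16
                if y + 1 < H:
                    if x > 0:
                        err[y + 1, x - 1] += residual * 3 / 16
                    err[y + 1, x] += residual * 5 / 16
                    if x + 1 < W:
                        err[y + 1, x + 1] += residual * 1 / 16
        return out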
The palette is fixed, as the colors are physically different materials. The amount of data is huge: each image is one layer, and the complete model has thousands of layers, because it's 3D.
I implemented a 3D generalization of ordered dithering (https://en.wikipedia.org/wiki/Ordered_dithering). The algorithm doesn't have any data dependencies across voxels; the result depends only on the source data and the position of the voxel. I did it on the GPU with HLSL shaders; it takes a few seconds to produce thousands of images.
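For the curious, here's a NumPy sketch of the same idea (the original was HLSL shaders). The threshold construction is one plausible 3D analogue of the Bayer bit-interleaving trick - my guess, not necessarily the poster's exact construction - but it does give each voxel a unique threshold from its position alone:

    import numpy as np

    def bayer3d(bits):
        # Threshold volume of side n = 2**bits. XOR-mix the coordinates,
        # then interleave their bits low-bit-first so nearby voxels get
        # far-apart thresholds; the mixing is bijective, so the volume is
        # a permutation of all n**3 threshold values.
        n = 1 << bits
        x, y, z = np.meshgrid(np.arange(n), np.arange(n), np.arange(n), indexing="ij")
        a, b, c = x ^ y, y ^ z, z
        v = np.zeros_like(x)
        for i in range(bits):
            v = (v << 3) | (((a >> i) & 1) << 2) | (((b >> i) & 1) << 1) | ((c >> i) & 1)
        return (v + 0.5) / n**3  # thresholds in (0, 1)

    def dither_volume(vol, levels, bits=2):
        # vol: float volume in [0, 1], quantized to `levels` evenly spaced
        # values. The threshold depends only on voxel position, so there
        # are no data dependencies - every voxel is independent, which is
        # why this maps so well onto a GPU shader.
        n = 1 << bits
        t = bayer3d(bits)
        H, W, D = vol.shape
        thr = t[np.arange(H)[:, None, None] % n,
                np.arange(W)[None, :, None] % n,
                np.arange(D)[None, None, :] % n]
        return np.floor(vol * (levels - 1) + thr).astype(np.int64)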
Would be interesting to see results using a content loss function as defined by Gatys (2015), as opposed to the L2 loss as given. That should hopefully capture more long-distance structures in the image rather than optimising each pixel independently.
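Something along these lines, assuming the dithered image is kept differentiable during optimisation (e.g. as the expected color under the soft assignments). The VGG16 cut at relu3_3 is an arbitrary choice, and the usual ImageNet mean/std normalisation is omitted for brevity:

    import torch
    import torch.nn.functional as F
    import torchvision.models as models

    # Content loss in the spirit of Gatys et al. (2015): compare feature
    # maps of a pretrained VGG instead of raw pixels, so the loss responds
    # to local structure rather than to each pixel independently.
    vgg = models.vgg16(weights=models.VGG16_Weights.DEFAULT).features[:16].eval()
    for p in vgg.parameters():
        p.requires_grad_(False)

    def content_loss(dithered, original):
        # Both inputs: (N, 3, H, W) in [0, 1].
        return F.mse_loss(vgg(dithered), vgg(original))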
This seems somewhat similar to the recently published GIFnets. However, I believe GIFnets trains a reusable network to predict palettes and pixel assignments, while this post focuses on optimising the "weights" (i.e. pixel values) for a single image.
I wonder if the loss functions from GIFnets could be applied to this single-image approach to potentially solve the banding problem via something a little more "perceptual" than the variance term mentioned.
: "GIFnets: Differentiable GIF Encoding Framework" https://arxiv.org/abs/2006.13434
- Is this approach also learning the palette? It is kind of presented as a given here, but it is of course very important for a good dithering.
- The loss function might work better on spatially downsampled images. The downsampling mixes the image colors, making the dithered image look more like the original, given a good dithering. It also naturally removes the variance that is currently penalized in the loss function, since that gets blurred away.
1. Yes, the palette is being optimized as well, which is imho what makes this different from a quantization approach.
2. That’s a good point. I cite a reference blog post towards the end of my post which does use blur in the loss function. Unfortunately I think pure blur would still produce a noisy image, as it would remove variance in the eyes of the loss function but not in the final image. I would guess something like the example I give with purple, red, and blue pixels would still be a problem for a blurred loss.
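For reference, a blurred L2 is only a few lines (kernel size and sigma here are arbitrary), and it illustrates the failure mode: any pattern whose local average matches scores well, e.g. alternating red/blue pixels standing in for purple, so the noise is hidden from the loss rather than removed from the image:

    import torch
    import torch.nn.functional as F

    def blurred_l2(dithered, original, sigma=2.0, ksize=9):
        # Gaussian-blur both images with a depthwise conv, then compare.
        xs = (torch.arange(ksize) - ksize // 2).float()
        g = torch.exp(-xs ** 2 / (2 * sigma ** 2))
        g = (g / g.sum()).to(dithered.dtype)
        k = (g[:, None] * g[None, :]).expand(dithered.shape[1], 1, ksize, ksize)
        blur = lambda img: F.conv2d(img, k, padding=ksize // 2, groups=img.shape[1])
        return F.mse_loss(blur(dithered), blur(original))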
Dithering is vital. Just like dithering is vital for audio, even at 24 bits.
Also keep in mind that the "10 bit" you speak of is implemented by dithering on an 8 bit panel in almost every display. Similarly, many cheaper 8 bit displays are actually 6 bit with dithering. Additionally, 10 bit is a very rare output format and is rarely used by applications apart from the handful of HDR games; even among content creation applications, 10 bit support is uncommon, and it actually being utilized is even less common.
Just because everything is output and composited in 8 bit doesn't mean 10 bit display output is entirely for naught. If you are using hardware gamma correction, which you are when you use tools like flux/redshift/... or most ICC display profiles, then 10 bit scanout of an 8 bit framebuffer still makes sense.
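A quick way to see why: push the 256 framebuffer levels through an example gamma adjustment (the 1.2 exponent is just a stand-in for a redshift-style curve) and count how many distinct levels survive re-quantization. At 8 bits the dark levels merge (hence banding); at 10 bits all 256 stay distinct:

    import numpy as np

    # 8-bit framebuffer levels through a gamma LUT.
    levels = np.arange(256) / 255.0
    lut_out = levels ** 1.2

    for bits in (8, 10):
        q = np.round(lut_out * (2 ** bits - 1))
        print(f"{bits}-bit scanout: {len(np.unique(q))} distinct output levels")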
Horrible huge squares/rectangles of slightly different black all over the place.
If you're doing this, it's either because your medium is limited (retro games or 8-bit equivalent embedded systems), because you can't afford the CPU power to decompress something more complex (unlikely these days), because lower bit depth is ideal for the rest of your image (e.g. largely UI graphics with no gradients, and just a few small graphics), or because you just don't care.
But given that those reasons exist, there is value in researching better dithering algorithms. To some extent these things apply to non-palette formats too (dithering to lower bit depths), and that is still relevant today, e.g. when converting HDR content to typical 8bpc (24bpp) formats.
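The bit-depth case barely needs more than the textbook trick: add roughly one LSB of triangular (TPDF) noise before rounding, the same idea as audio dither. A small sketch (dither_to_8bpc is my name for it), assuming a float image in [0, 1] such as tone-mapped HDR:

    import numpy as np

    def dither_to_8bpc(img, rng=None):
        # TPDF noise in (-1, 1) LSB: trades visible banding for fine grain.
        rng = rng or np.random.default_rng()
        noise = rng.random(img.shape) - rng.random(img.shape)
        return np.clip(np.round(img * 255.0 + noise), 0, 255).astype(np.uint8)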