So it's using a simple Median Cut quantization... which is okay, but you can get great results using an octree quantizer, and without dithering. Dithering kills your compression savings in GIF so you'd want to avoid it if possible. In any case, you can see the difference a better quantizer makes here:
Any method that makes axis-aligned subdivisions (which includes both standard median cut and octree), even in a space like HSV, is going to be pretty suboptimal. There have probably been more sophisticated things done since I looked at this in the 1990s, but Xiaolin Wu's 1992 approach [1] works quite a bit better than both (including the 1991 Wu v2 implementation of an octree method linked elsewhere in this thread). It's based on recursive PCA, dynamic programming to assign colors along the principal axis, and K-means refinement in CIE L*u*v* coordinates. Not the fastest algorithm, but the topic is "high quality".
And you can still dither the image without completely destroying compression. You just have to use an ordered dither [2].
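Roughly, an ordered dither just adds a position-dependent threshold offset before the palette lookup; because the offset depends only on pixel coordinates, it is identical from frame to frame. A minimal sketch in Python (illustrative names, fixed 4x4 Bayer matrix, brute-force nearest-color lookup):

    import numpy as np

    # Normalized 4x4 Bayer threshold matrix, centered around zero.
    BAYER_4x4 = np.array([[ 0,  8,  2, 10],
                          [12,  4, 14,  6],
                          [ 3, 11,  1,  9],
                          [15,  7, 13,  5]], dtype=np.float32) / 16.0 - 0.5

    def ordered_dither(image, palette, strength=32.0):
        """image: HxWx3 floats in 0..255, palette: Nx3 palette colors."""
        h, w, _ = image.shape
        # The offset depends only on (x, y), so static areas of an animation
        # dither identically in every frame (unlike error diffusion).
        offsets = BAYER_4x4[np.arange(h)[:, None] % 4, np.arange(w)[None, :] % 4]
        dithered = np.clip(image + strength * offsets[..., None], 0, 255)
        # Brute-force nearest-palette-color lookup, for clarity only.
        dist = ((dithered[..., None, :] - palette[None, None, :, :]) ** 2).sum(-1)
        return dist.argmin(-1)  # HxW array of palette indices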
[1] Xiaolin Wu, "Color Quantization by Dynamic Programming and Principal Analysis", ACM Transactions on Graphics 11(4): 348-372, 1992. https://dl.acm.org/citation.cfm?id=146475
Axis-aligned subdivisions are not that much of a problem, actually.
I've tested a median cut that can cut at various angles, and variants that cut out spheres, and it didn't make much difference.
The choice of which box to cut, and where to cut it, is more important. Wu's method is nice here because it exactly measures the variance of the sets after a split, while median cut just estimates it.
However, Wu's method needs lookup tables covering the whole color (hyper)cube, which requires either a large amount of RAM (a ridiculous amount for RGBA) or reduced precision, and losing bits of the input is far worse than a suboptimal division.
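For context, the lookup tables in question are 3D prefix sums of histogram moments, which let you read off the weighted variance of any axis-aligned box with a constant number of inclusion-exclusion lookups. A rough sketch of building them (illustrative Python, not any particular implementation; 5 bits per channel is the usual precision reduction):

    import numpy as np

    BITS = 5                      # 32 bins per channel; each extra bit costs 8x the RAM for RGB
    SIZE = 1 << BITS

    def build_moments(pixels):    # pixels: Nx3 uint8 array
        idx = pixels >> (8 - BITS)
        w  = np.zeros((SIZE,) * 3)                 # zeroth moment: pixel counts
        m  = np.zeros((SIZE,) * 3 + (3,))          # first moment: per-channel sums
        m2 = np.zeros((SIZE,) * 3)                 # second moment: sum of squared values
        np.add.at(w,  tuple(idx.T), 1)
        np.add.at(m,  tuple(idx.T), pixels)
        np.add.at(m2, tuple(idx.T), (pixels.astype(np.float64) ** 2).sum(1))
        # Cumulative sums along each color axis turn the histograms into
        # inclusion-exclusion tables over arbitrary axis-aligned boxes.
        for axis in range(3):
            for arr in (w, m, m2):
                np.cumsum(arr, axis=axis, out=arr)
        return w, m, m2

At 5 bits the RGB tables are tiny, but a full 8-bit RGBA cube would need 256^4 (over 4 billion) cells, which is the ridiculous case mentioned above.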
The biggest wins I've found (and implemented in libimagequant) were:
• Voronoi iteration after generating the palette (like the final step in LBG quantization); see the sketch after this list. This shifts the axis-aligned subdivisions to a local optimum (I've tried applying it after every subdivision, but it didn't matter; once in postprocessing is enough!)
• Adjusting weights/counts in the histogram based on quality feedback measured from the remapped image. If you're aiming to minimize MSE, that's not too expensive, and it's similar to gradient descent.
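For reference, the Voronoi-iteration step is essentially a few rounds of Lloyd/LBG refinement on the histogram. A rough sketch (not libimagequant's actual code; names and the fixed iteration count are illustrative):

    import numpy as np

    def voronoi_refine(palette, colors, weights, iterations=3):
        """palette: Kx3 floats, colors: Nx3 histogram entries, weights: N counts."""
        for _ in range(iterations):
            # Assign each histogram entry to its nearest palette color (its Voronoi cell).
            dist = ((colors[:, None, :] - palette[None, :, :]) ** 2).sum(-1)
            assign = dist.argmin(1)
            # Move each palette entry to the weighted mean of its cell.
            for k in range(len(palette)):
                mask = assign == k
                if weights[mask].sum() > 0:
                    palette[k] = np.average(colors[mask], axis=0, weights=weights[mask])
        return palette

After this step the cells are nearest-color regions, so the final partition is no longer strictly axis-aligned, whatever the initial subdivision was.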
> However, Wu's method needs lookup tables for the whole color (hyper)cube,
You're talking about the 1991 method. The 1992 paper I cited explicitly claims as one of its advantages that it does not require lookup tables (and thus avoids the usual step of pre-quantizing down to 5 bit color to get a reasonably-sized table).
Your two big wins sound an awful lot like "K-means". I'd believe they are responsible for the bulk of the improvement (they're also the bulk of the computation), but they mean you don't have axis-aligned partitioning anymore, which was my point.
The problem is that K-means is notoriously sensitive to the initial clustering. For unsupervised learning, I'd want to do as good a job there as possible, and applying PCA means you are picking the split direction that exactly minimizes variance in the orthogonal directions. Random restarts are also a nice way to escape local minima, but are even more expensive computationally.
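To make the PCA point concrete: the split direction is the principal axis of the colors in the box, and the cut is made perpendicular to it. A rough sketch (illustrative only; Wu's 1992 paper places the cuts with dynamic programming rather than at the median as done here):

    import numpy as np

    def pca_split(colors, weights):
        """colors: Nx3 floats, weights: N counts. Returns two index arrays."""
        mean = np.average(colors, axis=0, weights=weights)
        centered = (colors - mean) * np.sqrt(weights)[:, None]
        # Principal axis = direction of largest variance of the weighted color cloud.
        _, _, vt = np.linalg.svd(centered, full_matrices=False)
        axis = vt[0]
        proj = (colors - mean) @ axis
        cut = np.median(proj)          # simplest possible placement of the cut
        return np.where(proj <= cut)[0], np.where(proj > cut)[0]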
For apng2gif I modified Wu's method to use a 64x64x64 cube (instead of the original 32x32x32), and the results are much better. It doesn't require too much RAM, especially compared with what was available in 1992.
Ironically, the ACM version of that paper is a scan of a printout, with halftoned images, which ruins the comparison images. Is there a purely digital version of that paper out there?
Median cut quantization with a few tweaks (notably using the variance; see the post for more) actually gives very good results. From your comment, it seems octree would not provide much better results, if any. That calls for testing though (I won't, but feel free to; the infrastructure is ready for you to hack on), as improvements are very welcome.
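For anyone wondering what the variance tweak amounts to: instead of always splitting the most populated box (or the one with the longest edge), you pick the box with the largest weighted squared error. A rough sketch with hypothetical helpers (not the code from the post):

    import numpy as np

    def box_sse(colors, weights):
        """Weighted sum of squared errors of the colors in a box."""
        mean = np.average(colors, axis=0, weights=weights)
        return float((weights * ((colors - mean) ** 2).sum(1)).sum())

    def pick_box_to_split(boxes):
        """boxes: list of (colors, weights) pairs; returns index of the box to cut next."""
        return max(range(len(boxes)), key=lambda i: box_sse(*boxes[i]))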
Concerning the dithering, you can disable it (dither=none). And indeed, it kills the compression, especially for animation, which is why I added the diff_mode option (see the end of the post).
(most implementations just naively discard 2-3 bits, because of the RAM limits of the '80s and '90s)
Might it also have something to do with earlier VGA DACs having 6 bits per color channel? I seem to remember that newer video cards required setting a separate register to enable eight-bit-per-channel palette entries.
Not as much as error diffusion, if you use custom-palette ordered dithering: since the ordering stays the same from frame to frame, the transparent-pixel optimization still applies, unlike with error-diffusion dithering.
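Concretely, the optimization just replaces any index that is unchanged from the previous frame with the transparent index, so static areas compress to nearly nothing. A minimal sketch (illustrative, not any particular encoder):

    import numpy as np

    def diff_frame(prev_indices, cur_indices, transparent_index):
        """prev_indices, cur_indices: HxW arrays of palette indices for consecutive frames."""
        out = cur_indices.copy()
        out[cur_indices == prev_indices] = transparent_index
        return out

    # With error diffusion the noise pattern shifts every frame, so almost every
    # pixel "changes" and this buys very little; an ordered dither keeps static
    # areas bit-identical between frames.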
It's what Photoshop uses to create optimized GIFs. Their particular approach is patented, but alternative algorithms exist, e.g. http://bisqwit.iki.fi/story/howto/dither/jy/
This image is actually a pretty poor example for quantization; it's too easy. Full-color gradients give a much better indicator of behavior, quality, and performance.
That is pretty amazing, thanks for sharing. I am interested in different quantization methods, if only for screenshot usage (I am working on a remote control product, and the quantization used to be quite poor before I added octree). Of course, most of my code is in C#, but I can at least take a look at the algorithms, and if they are MIT licensed, I can see what it would take to convert them.
I once won a code golf "popularity contest" competition that was about dithering a grayscale image. The algorithm was perhaps interesting. More so my language of choice: Fortran.
Wu v2 is probably the best hands-off quantizer that I have seen. It's not the fastest and probably too resource intensive for embedded apps, but it is remarkably good. I would recommend it (or anything based on it) without hesitation.
I'm actually in the process of porting it to JS using typed arrays and possibly WebGL/GLSL. You can compile it pretty easily to asm.js via Emscripten for maximum pure-JS performance, but I don't know how clean its API/interface will be, or whether it will leak globals, given how the code is structured [1]
Interestingly, both of the quantized versions appear slightly brighter in all colors than the original. It's easier to see in the dithered image, but also visible in the undithered image.
http://www.leptonica.com/color-quantization.html