After some more reading, I concluded that authors did think about this possibility but wrote a wrong code.
I have mentioned that there was the hard-coded maximum number of entries. This is derived from zlib's enough.c [1], which determines the maximum possible size of 2-level table for given number of symbols, N and the maximum allowed bit length (here 15). I've verified that those numbers indeed come from this program:
for ((color_cache_size = 0; color_cache_size < 12; ++color_cache_size)); do
# Note that color_cache_size = 0 entirely disables the color cache, so no symbols
./enough $((256 + 24 + (($color_cache_size > 0) << $color_cache_size))) 8 15
done
So at the worst case, there are 256 + 24 + 2^11 = 2328 symbols possible, and the maximum table size is 2704. (Caveat: I couldn't verify values for color_cache_size >= 8 with the current version of enough.c. Probably the value was calculated with an alternative implementation using bigints.) So this bug cannot be found if any randomly constructed Huffman tree was thrown!
But enough.c states that it covers "all possible valid and complete prefix codes". In the other words it assumes that the Huffman tree has been already verified to be correct. If it's not the case, it is very easy to make much worse cases. For example `enough.c 280 8 15` returns 654, which is possible with the following tree:
So the real issue here is that the lack of tree validation before the tree construction, I believe. I'm surprised that this check was not yet implemented (I actually checked libwebp to make sure that I wasn't missing one). Given this blind spot, an automated test based on the domain knowledge is likely useless to catch this bug.
I have mentioned that there was the hard-coded maximum number of entries. This is derived from zlib's enough.c [1], which determines the maximum possible size of 2-level table for given number of symbols, N and the maximum allowed bit length (here 15). I've verified that those numbers indeed come from this program:
So at the worst case, there are 256 + 24 + 2^11 = 2328 symbols possible, and the maximum table size is 2704. (Caveat: I couldn't verify values for color_cache_size >= 8 with the current version of enough.c. Probably the value was calculated with an alternative implementation using bigints.) So this bug cannot be found if any randomly constructed Huffman tree was thrown!But enough.c states that it covers "all possible valid and complete prefix codes". In the other words it assumes that the Huffman tree has been already verified to be correct. If it's not the case, it is very easy to make much worse cases. For example `enough.c 280 8 15` returns 654, which is possible with the following tree:
But the following partial tree should be able to reach 768 entries: So the real issue here is that the lack of tree validation before the tree construction, I believe. I'm surprised that this check was not yet implemented (I actually checked libwebp to make sure that I wasn't missing one). Given this blind spot, an automated test based on the domain knowledge is likely useless to catch this bug.[1] https://github.com/madler/zlib/blob/master/examples/enough.c