The 4cc achieves a decent distribution, in the sense that you don't have to examine very many collisions to find the character you're looking for (or to conclude it isn't there).
A poor distribution is an obvious bug in hashing; if you don't suffer from that bug, you don't have to do anything. If you have the bug, it's obvious you have to change your hash calculation to avoid it. The developers of 4cc may have struggled with bugs where they had buckets that were too large for efficient searching.
Hash tables don't always need convoluted hash functions. When the keys are pointers (e.g. object identities themselves are used as keys to associate objects with additional properties), a sophisticated hash function is unnecessary; it can be as simple as extracting a few bits of the pointer, avoiding the lowest bits (which might all be zero due to alignment!).
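A minimal sketch of such a pointer hash in C, assuming 16-byte-aligned allocations and a power-of-two table (the shift amount and table size are illustrative, not from any particular implementation):

```c
#include <stddef.h>
#include <stdint.h>

#define TABLE_BITS 10
#define TABLE_SIZE ((size_t)1 << TABLE_BITS)

/* Hash a pointer by its identity: skip the lowest 4 bits, which are
 * all zero for 16-byte-aligned allocations, and use the next
 * TABLE_BITS bits as the bucket index. */
static inline size_t ptr_hash(const void *p)
{
    uintptr_t u = (uintptr_t)p;
    return (size_t)(u >> 4) & (TABLE_SIZE - 1);
}
```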
I believe that the 4cc dictionaries hit upon the key insight of hashing: calculating a numeric key from an object, which then directly identifies a small search bucket.
The Four Corner Code abandons semantics like radicals. Codes are assigned according to certain stroke patterns in the four quadrants of the character, without regard for their semantic role. The inventors hit a key insight there: that any way of calculating a hash code is valid as long as it can be followed consistently and leads to short searches. The function can look at meaningless fragments of the object (exactly as when we take the middle bits of a pointer). A character's etymology need not play any role in how it is digested. Whereas in the radical methods, you have to know that, for instance, 火 and 灬 both mean "fire" and are understood as the same radical #86. So in some sense, predecessor methods like radical indexing may have been almost-hashing. It's hard to argue that 4cc isn't.
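In data-structure terms, a 4cc dictionary behaves like a hash table with 10,000 buckets whose hash function is "read a digit off each corner". A rough C sketch of that lookup structure, with the caveat that the corner-digit classification is a procedure the human reader performs, so the digits are taken as input here, and the entry type and chaining are my own hypothetical framing:

```c
#include <string.h>

/* One chained bucket per four-digit corner code. Characters whose
 * corners happen to yield the same digits collide into the same short
 * list, which is scanned linearly -- just as a reader scans the few
 * characters printed under one code in the dictionary. */
struct entry {
    const char *character;    /* UTF-8 encoded character */
    struct entry *next;       /* same code, possibly unrelated character */
};

#define NBUCKETS 10000
static struct entry *buckets[NBUCKETS];

/* Corners in reading order: upper-left, upper-right, lower-left,
 * lower-right; each digit 0..9 names a stroke shape. */
static unsigned fourcc(int ul, int ur, int ll, int lr)
{
    return (unsigned)(ul * 1000 + ur * 100 + ll * 10 + lr);
}

static const struct entry *lookup(unsigned code, const char *ch)
{
    for (const struct entry *e = buckets[code]; e; e = e->next)
        if (strcmp(e->character, ch) == 0)
            return e;
    return NULL;   /* short chain exhausted: character absent */
}
```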
> A poor distribution is an obvious bug in hashing; if you don't suffer from that bug, you don't have to do anything.
Right, but if you never had that problem to solve, then what you have made isn't a hash table. Often you don't need a hash table: if you have keys that already have a nice distribution, you can use a simpler data structure (like, IDK, a radix tree) and get all the properties you wanted.
> The inventors hit a key insight there: that any way of calculating a hash code is valid as long as it can be consistently followed, and leads to short searches.
If they did, then I would agree you're right. But do we know that they did? Or might they have seen it as just a different way of considering radicals? (E.g. did they ever try indexing anything else that way, not just characters?)
Note that a radix tree and a hash table are not mutually exclusive. A radix tree is one way of representing a sparse table, and a sparse table can serve as the storage for a hash table.
There's a trade-off there, though: if the table is very sparse and we're using hashing anyway, we could just shrink the table so that it isn't so sparse, and make it a regular array.
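For concreteness, a minimal sketch of the radix-tree-as-sparse-table idea in C, assuming 16-bit keys (hashed or not) split into two 8-bit radix digits; second-level pages are allocated only when some key lands in their range, so empty regions of the table cost nothing:

```c
#include <stdint.h>
#include <stdlib.h>

#define RADIX_BITS 8
#define FANOUT (1u << RADIX_BITS)

/* Two-level radix structure over a 16-bit key space: a sparse table
 * that only materializes the 256-slot pages actually in use. Feed it
 * hashed keys and it acts as a hash table's backing store; feed it
 * keys that are already well distributed and no hashing is needed. */
struct page { void *slot[FANOUT]; };
static struct page *root[FANOUT];

/* Return the address of the slot for this key, allocating its page
 * on first touch; returns NULL only if allocation fails. */
static void **sparse_slot(uint16_t key)
{
    unsigned hi = key >> RADIX_BITS;
    unsigned lo = key & (FANOUT - 1);
    if (!root[hi])
        root[hi] = calloc(1, sizeof *root[hi]);
    return root[hi] ? &root[hi]->slot[lo] : NULL;
}
```

If shrinking the table makes the keys dense, the second level collapses into a handful of fully used pages and the structure is effectively the regular array described above, which is exactly the trade-off.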
The key aspect of the four corner code is that it mashes together completely unrelated characters. There's no meaning to the index itself. It's not easy to look at a four corner code and figure out which characters it aliases.