If you turn a text file containing a list of "common surname -> percentage of population" pairs into a binary tree, it will get roughly an order of magnitude bigger in memory than the raw text file.
This is a common pattern: when you add a lot of metadata for fast access, memory management, "zero-copy" transmission of the data, expires, and so on, the total size is not going to be what you'd get by concatenating all the data into a single string.
If we want, we can save more memory per key, with some tradeoffs. Example: instead of a hash table where every slot is a chain of pointer (holding the key) -> pointer (holding the value) pairs, we can store each slot as a single string of the form <len>key<ptr><len>key<ptr>... for all the colliding keys. Then tune the hash table in order to have on average 5 or 10 collisions per slot.
That is: each key as a length-prefixed string, followed by a pointer (which is of fixed size).
Maybe we'll do this in the future, and in Redis master we are already doing a lot of this on the "value" side of the business. So if you store a list of 10 elements it will use an amount of memory very similar to storing the same bytes as a comma-separated value.
But for now our reasoning is: it's not bad to be able to store 1 million keys in less than 200 MB of memory (100 MB on 32-bit systems) if an entry-level box is able to serve this data at a rate of 100k requests/second, including the networking overhead. And with hashes we have much better memory performance compared to top-level keys. So... with a few GB our users can store tens or hundreds of millions of items in a Redis server.
Better to focus on stability and clustering for now... and once we have a system that can scale transparently, we'll look at memory overhead again to check whether it's still important to reduce it, or whether, since we want to be a very high performance DB, it's just better to accept the tradeoff.