(1) you need fast lookup and fast inserts
(2) you need to preserve the lexicographic order of your elements - something that a hash table won't give you
(3) you are afraid of DoS attacks or maybe just "bad" data - hash tables can be attacked easily if the implementation is known
(4) you don't care about memory usage that much.
I compared a simple 8-bit radix tree with a standard hash table implementation - the former took roughly ten times more memory. I then changed my radix tree to be based on 4 bits (each char is simply split into two halves) and the memory usage roughly halved. Now I'm wondering whether radix tries have more room for improvement.
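The nibble-splitting scheme described above can be sketched as follows (a hypothetical minimal version, not the commenter's actual code): each key byte is split into two 4-bit halves, so every node needs only 16 child pointers instead of 256, at the cost of twice as many levels per byte.

```c
#include <stdlib.h>

/* 4-bit radix trie node: 16 child slots instead of 256, so each
 * node's pointer table is 16x smaller; the trade-off is two tree
 * levels per key byte instead of one. */
typedef struct node {
    struct node *child[16];
    int is_key;                 /* set when a complete key ends here */
} node;

static node *node_new(void) { return calloc(1, sizeof(node)); }

static void trie_insert(node *root, const unsigned char *key, size_t len) {
    node *n = root;
    for (size_t i = 0; i < len; i++) {
        unsigned nib[2] = { key[i] >> 4, key[i] & 0x0F }; /* split the byte */
        for (int j = 0; j < 2; j++) {
            if (!n->child[nib[j]]) n->child[nib[j]] = node_new();
            n = n->child[nib[j]];
        }
    }
    n->is_key = 1;
}

static int trie_lookup(const node *root, const unsigned char *key, size_t len) {
    const node *n = root;
    for (size_t i = 0; i < len; i++) {
        unsigned nib[2] = { key[i] >> 4, key[i] & 0x0F };
        for (int j = 0; j < 2; j++) {
            n = n->child[nib[j]];
            if (!n) return 0;
        }
    }
    return n->is_key;
}
```

Each empty slot still costs a pointer, which is why sparse fanout dominates the memory bill in both the 8-bit and 4-bit variants.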
So this is something to consider.
And thanks for the article - never heard of Aguri, it's beautiful.
Upd. sorted lists basically give you the same advantages, except for fast inserts
Compressed binary radix tries (also known as PATRICIA trees) give you excellent memory usage characteristics and pretty much optimal search time characteristics: both wins come from only storing the edges corresponding to significant bit differences.
Since hash tables by necessity have to allocate enough storage to keep table load down and minimize collisions, you could find a PATRICIA tree has even better memory footprint than whatever hash table you use.
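To make the "only store significant bit differences" idea concrete, here is a minimal crit-bit tree (a close cousin of PATRICIA) over 32-bit keys; this is an illustrative sketch, not code from the article. Internal nodes store only the index of the first bit where two keys diverge, so a tree of n keys has at most n-1 internal nodes regardless of key length:

```c
#include <stdint.h>
#include <stdlib.h>

/* Crit-bit node: bit == -1 marks a leaf holding a key; otherwise
 * bit is the index (MSB-first) of the critical bit that splits the
 * two subtrees. */
typedef struct cb {
    int bit;
    uint32_t key;           /* valid when bit == -1 */
    struct cb *kid[2];      /* valid when bit >= 0 */
} cb;

static int bit_of(uint32_t k, int b) { return (k >> (31 - b)) & 1; }

static cb *leaf_new(uint32_t key) {
    cb *n = calloc(1, sizeof(cb));
    n->bit = -1; n->key = key; return n;
}

static int cb_lookup(const cb *n, uint32_t key) {
    if (!n) return 0;
    while (n->bit >= 0) n = n->kid[bit_of(key, n->bit)];
    return n->key == key;
}

static cb *cb_insert(cb *n, uint32_t key) {
    if (!n) return leaf_new(key);
    /* find the leaf the new key would currently reach */
    const cb *p = n;
    while (p->bit >= 0) p = p->kid[bit_of(key, p->bit)];
    if (p->key == key) return n;               /* already present */
    /* first (most significant) differing bit between the two keys */
    uint32_t diff = key ^ p->key;
    int crit = 0;
    while (!((diff >> (31 - crit)) & 1)) crit++;
    /* descend while existing critical bits precede ours, then splice */
    cb **link = &n;
    while ((*link)->bit >= 0 && (*link)->bit < crit)
        link = &(*link)->kid[bit_of(key, (*link)->bit)];
    cb *inner = calloc(1, sizeof(cb));
    inner->bit = crit;
    inner->kid[bit_of(key, crit)] = leaf_new(key);
    inner->kid[1 - bit_of(key, crit)] = *link;
    *link = inner;
    return n;
}
```

A lookup inspects at most one bit per internal node on the path, which is where the near-optimal search time comes from.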
Of course, all data structures share the DoS problem. There's an input to a PATRICIA tree that pessimizes lookup and memory usage: the one that creates a full 32 bits of fanout for every key. In all cases, the attack degrades performance to linear search.
Of all the "mainstream" data structures, balanced binary trees are probably the hardest to attack this way.
both wins come from only storing the edges corresponding to significant bit differences.
Of course I used compressed suffixes, but it didn't help much as compared with hash tables.
you could find a PATRICIA tree has even better memory footprint than whatever hash table you use
I don't think you can demonstrate that Patricia has a better memory footprint than a good hash table. In fact, even the best radix trees are much worse.
Of course, all data structures share the DoS problem.
You can attack radix trees in terms of memory usage, while hash tables are prone to serious performance attacks.
In all cases, the attack degrades performance to linear search.
Not true. I don't think you understand these algorithms and O(n) complexity very well.
The question of how a hash table degrades when attacked depends heavily on exactly how conflict resolution is implemented. One such method, for example, is expansion and rehashing, and you can't tell for sure whether it degrades to linear search or not.
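For the chaining case specifically, the degradation is easy to demonstrate. A hypothetical toy table with a weak, attacker-predictable hash and no rehashing shows lookups walking one long chain once every key collides:

```c
#include <stdlib.h>

#define NBUCKETS 8

typedef struct entry { int key; struct entry *next; } entry;
typedef struct { entry *bucket[NBUCKETS]; } table;

/* a deliberately weak hash an attacker could target */
static unsigned weak_hash(int key) { return (unsigned)key % NBUCKETS; }

static void tab_insert(table *t, int key) {
    entry *e = malloc(sizeof *e);       /* push onto the bucket's chain */
    e->key = key;
    e->next = t->bucket[weak_hash(key)];
    t->bucket[weak_hash(key)] = e;
}

/* returns the number of chain entries inspected; 0 if not found */
static int tab_probes(const table *t, int key) {
    int probes = 0;
    for (entry *e = t->bucket[weak_hash(key)]; e; e = e->next) {
        probes++;
        if (e->key == key) return probes;
    }
    return 0;
}
```

Feeding this table keys that are all multiples of NBUCKETS makes every insert land in bucket 0, so lookups become linear in the number of keys. With open addressing or expansion-and-rehashing schemes the degradation profile differs, which is exactly the point made above.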
And radix never degrades to linear search - never.
Of all the "mainstream" data structures, balanced binary trees are probably the hardest to attack this way.
They are prone to a very specific attack where you force the tree to re-balance often.
You're ignoring table load factor in your memory comparison. You can attack any dictionary "in terms of memory". I conceded the O(n)/O(k) lookup thing before you wrote this. And why do you think "balancing" a red-black tree is expensive?
You're ignoring table load factor in your memory comparison
I believe I'm not ignoring anything: I'm measuring precise memory usage of two identical programs doing the same thing but based on different algorithms. I also believe I got the most out of radix in my tests. The 4-bit version of radix wasn't trivial, but it was reasonably fast, and its memory usage was half that of the 8-bit one, as I said before.
But even speculatively, on paper if you wish, radix consumes more memory than a good hash table implementation.
Upd. forgot to mention, I did some more optimizations as well, like compressing the pointer tables in tree nodes.
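The comment doesn't say which compression scheme was used; one common way to compress a sparse pointer table (the trick used by HAMTs and similar structures) is a 16-bit occupancy bitmap plus a dense child array, indexing children by a popcount of the bits below the wanted slot:

```c
#include <stdint.h>

/* count the set bits in a 16-bit occupancy bitmap */
static int popcount16(uint16_t x) {
    int c = 0;
    while (x) { x &= x - 1; c++; }
    return c;
}

/* Map a logical child slot s (0..15) to an index into a dense child
 * array: count how many occupied slots precede it. Returns -1 if the
 * slot is empty, i.e. no child pointer is stored at all. */
static int slot_index(uint16_t bitmap, int s) {
    if (!(bitmap & (1u << s))) return -1;
    return popcount16(bitmap & ((1u << s) - 1));
}
```

With this layout a node pays two bytes of bitmap plus one pointer per actual child, instead of sixteen pointers regardless of occupancy.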
I'm sorry, you're talking about a totally different algorithm than I am either here or in the article you're responding to. There's no such thing as a "4 bit PATRICIA" trie. Obviously, if you want to burn memory, you can increase the arity of your tree nodes to "speed up" (heh) fetches.