
The Trie: A Neglected Data Structure - bbeneschott
http://www.toptal.com/java/the-trie-a-neglected-data-structure
======
tmoertel
The original article uses Java, which makes it harder to see how simple it can
be to make and use prefix trees. For comparison, here's the Python code to
create a prefix tree from a set of words:

    
    
        def prefix_tree(words):
            d = dict()
            for word in words:
                root = d
                for c in word:
                    root = root.setdefault(c, {})
                root['$$'] = {}  # mark end of word with '$$' token
            return d
    

For example:

    
    
        >>> prefix_tree('foo bar baz'.split())
        {'b': {'a': {'r': {'$$': {}},
                     'z': {'$$': {}}}},
         'f': {'o': {'o': {'$$': {}}}}}
    

To query whether a word is present in a tree you basically compute

    
    
        reduce(dict.__getitem__, word, tree)
    

and test whether the result (a forest) has an end-of-word '$$' tree.
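
In Python 3, `reduce` lives in `functools`, and a missing prefix raises `KeyError`, so a full membership test along those lines might look like this (a sketch; the helper name `contains` is my own):

```python
from functools import reduce

def prefix_tree(words):
    d = dict()
    for word in words:
        root = d
        for c in word:
            root = root.setdefault(c, {})
        root['$$'] = {}  # mark end of word with '$$' token
    return d

def contains(tree, word):
    try:
        # Walk down one level per character.
        forest = reduce(dict.__getitem__, word, tree)
    except KeyError:
        return False  # some prefix of `word` is absent
    return '$$' in forest  # present only if a word ends here

tree = prefix_tree('foo bar baz'.split())
```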

------
leif
Tries (actually pronounced "trees" or "Patricia trees") are very memory
efficient compared to most alternatives, but really only shine in membership
query workloads. If you actually need to return the stored element, you have
to rebuild it for every query, which is probably too expensive if you have
lots of queries; in that case it would be better to store the whole object
contiguously somewhere in the data structure so you can return a reference to
it.

Even then, tries only win if the distribution is suitable to give you the
memory efficiency you are hoping for. If you don't have many common prefixes,
you're better off with just a hashtable.

~~~
zheng
Thanks. While I think the pronunciation is lame and confusing, I grow tired of
explaining to people that it is actually correct: they aren't called "trys" or
"treys".

~~~
nly
Pronunciation can be a very individual or regional thing when it comes to
technical abbreviations. I just finished watching the Channel9 GoingNative
talks (C++), and was surprised to find a couple of the most respected C++ guys
pronounce 'ptr' as 'put-er' (not 'putter') instead of 'pointer'.

Personally I'll continue to pronounce it as 'try'. It has, at least, fewer
conflicting interpretations in a programming context. When I say 'tree',
people will probably assume I mean a binary tree, not a radix tree. So, if I'm
not going at 30% light speed, I'll just say that.

~~~
leif
Mispronouncing variables is great fun. We have lots of "txn_XXX" which we
pronounce "texan_XXX", and "le_XXX" (short for "leafentry XXX", as in
"le_key"), which always make us sound like we're mocking French people.

~~~
nly
Now you're getting it. I also pronounce 'char' from C as if it's short for
charred, rather than character. Principally because I find it easier to
say 'char star' with that vocalisation. I know one guy who, if speaking
quickly, would pronounce 'char* sugar' as 'caster sugar'... come to think of
it, I'm not sure how Americans pronounce 'caster', but in the UK it's car-stir
sugar. Not 'Casper', as in the friendly ghost, but with a 't'.

I've never had a problem with people using different pronunciations. It
certainly keeps talking about code fresh.

~~~
bjeanes
cache is a fun one (cash or kay-sh?). Most Australians I know (and possibly
people from the UK too) say it as "cash"

~~~
DigitalJack
Under what rules of pronunciation would it be Kay-sh?

I recognize pronunciation is more about exceptions than rules, but even so...

------
chatman
A trie is a poor data structure in practice. There is no locality of reference
that can be exploited: to look up a single key, a separate node must be
accessed for each character, and those nodes are scattered almost at random
across memory (depending on the insertion technique). For keys inserted later
on, the number of different blocks of memory accessed for a single key is
proportional to the length of the key.

A B-Tree is better due to better exploitation of locality of reference in
memory.

~~~
swannodette
It depends on the kind of trie that you build - it's worth taking a look at
Hash Array Mapped Tries
[http://lampwww.epfl.ch/papers/idealhashtrees.pdf](http://lampwww.epfl.ch/papers/idealhashtrees.pdf)

The Clojure programming language is completely designed around several
immutable variants.

------
fogleman
Should mention that tries can be heavily compressed into DAWGs (Directed
Acyclic Word Graphs) by eliminating common subtrees. I've used this in word
games on iOS where I want a small memory footprint.

P.S. I love tries!
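
For illustration, the subtree-merging step can be sketched on the dict-of-dicts
prefix tree used elsewhere in this thread: canonicalize each subtree to a
sorted tuple and intern it, so identical suffix sets become one shared node (a
toy sketch, not a production DAWG builder):

```python
def prefix_tree(words):
    d = {}
    for word in words:
        node = d
        for c in word:
            node = node.setdefault(c, {})
        node['$$'] = {}  # end-of-word marker
    return d

def compress(trie, interned=None):
    """Collapse identical subtrees (the DAWG idea): equal suffix sets share one node."""
    if interned is None:
        interned = {}
    # Canonicalize children first, then intern this node by its frozen form.
    node = tuple(sorted((c, compress(child, interned)) for c, child in trie.items()))
    return interned.setdefault(node, node)

dawg = compress(prefix_tree(['tapping', 'topping']))
# The shared suffix "pping" is now stored once: the subtree reached via
# 't'->'a' and the one reached via 't'->'o' are the very same object.
```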

~~~
nathell
"How to Squeeze a Lexicon" by Ciura and Deorowicz [1] is a good and practical
introduction to DAWGs.

[1]:
[http://sun.aei.polsl.pl/~mciura/publikacje/lexicon.pdf](http://sun.aei.polsl.pl/~mciura/publikacje/lexicon.pdf)

------
tieTYT
I think the article was well written, and this is the standard example to use
for a Trie. But outside of this one, I never run into scenarios that make me
think, "A Trie is a good fit for this!" What are some other examples where
Tries come in handy?

~~~
tmoertel
I've used prefix trees to represent matching rules for network-routing
rulesets. They're also handy when you need to compute intersections because
you can perform a parallel depth-first search over two prefix trees and prune
from the search any nodes that don't have a corresponding node in the other
tree.

For a fun example of this last application, there was a recent Google Code Jam
problem called "Alien Languages" [1]. My solution [2] basically counts the
leaf nodes in the intersection of two prefix trees. (Note that we can compute
the count on the fly during the search and need not actually construct the
intersection.)

[1]
[http://code.google.com/codejam/contest/90101/dashboard#s=p0](http://code.google.com/codejam/contest/90101/dashboard#s=p0)

[2] [https://github.com/tmoertel/practice/blob/master/google-
code...](https://github.com/tmoertel/practice/blob/master/google-code-
jam/2009/qual-A-alien-language/alien_language.py)
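
The pruned parallel search can be sketched on the dict-of-dicts trees from the
earlier comment (the `prefix_tree` helper and '$$' end marker are assumptions
carried over from there):

```python
def prefix_tree(words):
    d = {}
    for word in words:
        node = d
        for c in word:
            node = node.setdefault(c, {})
        node['$$'] = {}  # end-of-word marker
    return d

def count_common_words(a, b):
    """Count leaves in the intersection of two prefix trees without building it."""
    n = 1 if '$$' in a and '$$' in b else 0  # a word ends here in both trees
    for c, child in a.items():
        if c != '$$' and c in b:  # prune branches missing from either tree
            n += count_common_words(child, b[c])
    return n

common = count_common_words(prefix_tree('foo bar baz'.split()),
                            prefix_tree('bar baz qux'.split()))
# common counts "bar" and "baz"
```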

------
supo
With a basic Trie, you always have this problem of how to keep the pointers at
each node. Naively, you can:

1) Keep an array of size equal to the alphabet size, indexed by the
characters, holding pointers to successive nodes.

\- but this blows the memory to O(N * alphabet_size) where N is the input size

2) Keep a linked list of pairs (character, pointer).

\- but this blows the lookup time to O(word_length * alphabet_size)

3) Keep something like an STL map<character, pointer>.

\- still suboptimal complexity of O(word_length * log alphabet_size) + it
complicates the code + it dramatically increases the constant time per
operation

Or you research a bit more and actually learn a way to properly implement a
trie, called a Ternary Search Tree:
[http://en.wikipedia.org/wiki/Ternary_search_tree](http://en.wikipedia.org/wiki/Ternary_search_tree)
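
A bare-bones ternary search tree can be sketched like this (node layout, names,
and the recursive insert are my own; each node holds one character plus
low/equal/high children):

```python
class TSTNode:
    __slots__ = ('ch', 'lo', 'eq', 'hi', 'end')

    def __init__(self, ch):
        self.ch = ch
        self.lo = self.eq = self.hi = None
        self.end = False  # True if a stored word ends at this node

def tst_insert(node, word, i=0):
    ch = word[i]
    if node is None:
        node = TSTNode(ch)
    if ch < node.ch:
        node.lo = tst_insert(node.lo, word, i)      # branch left, same character
    elif ch > node.ch:
        node.hi = tst_insert(node.hi, word, i)      # branch right, same character
    elif i + 1 < len(word):
        node.eq = tst_insert(node.eq, word, i + 1)  # match: advance to next character
    else:
        node.end = True
    return node

def tst_contains(node, word, i=0):
    if node is None:
        return False
    ch = word[i]
    if ch < node.ch:
        return tst_contains(node.lo, word, i)
    if ch > node.ch:
        return tst_contains(node.hi, word, i)
    if i + 1 < len(word):
        return tst_contains(node.eq, word, i + 1)
    return node.end

root = None
for w in 'cat cap car dog'.split():
    root = tst_insert(root, w)
```

Each node costs three pointers instead of an alphabet-sized array, which is where the memory advantage over option 1 comes from.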

~~~
anaphor
Or you decide that your problem is small enough that the poor memory
complexity doesn't matter much.

~~~
supo
Sure, but it can blow up faster than you think. Say you have 100k URL strings,
up to 128 chars each, with 64 allowed characters in alphabet.

\- storing this at byte per char = 12.8MB

\- storing this in a basic trie = 12.8 * 64 = 820MB

\- storing this in the TST = 12.8 * 3 = 38MB
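
Restating those estimates as arithmetic (following the original's simplification of counting one char-sized unit per pointer):

```python
n_strings, max_len, alphabet = 100_000, 128, 64

raw  = n_strings * max_len   # one byte per character
trie = raw * alphabet        # a full child-pointer array per node, worst case
tst  = raw * 3               # three child links per character

print(raw / 1e6, trie / 1e6, tst / 1e6)  # 12.8 819.2 38.4 (MB)
```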

~~~
anaphor
Sure, I was thinking on the order of <100 strings around 5-10 characters each
at most. I agree that the TST is much better suited for large sets of strings.
Also I think more developers should actually do the calculations even if they
only have an estimate of how large the inputs will be.

~~~
supo
Ah, in that case it goes straight to set<string> for me :-) I think in
practice you want to do that, but then drop in a few asserts to check that
things aren't getting out of control once you've forgotten about this
decision.

------
k3n
John Resig did a really cool series[1][2][3] of deep-dive posts into
optimization in JS for mobile, which covered tries (among others) -- it was
the first I'd heard of them. Fascinating read.

1\. [http://ejohn.org/blog/dictionary-lookups-in-
javascript/](http://ejohn.org/blog/dictionary-lookups-in-javascript/)

2\. [http://ejohn.org/blog/javascript-trie-performance-
analysis/](http://ejohn.org/blog/javascript-trie-performance-analysis/)

3\. [http://ejohn.org/blog/revised-javascript-dictionary-
search/](http://ejohn.org/blog/revised-javascript-dictionary-search/)

------
dnr
If you like tries, take a look at the double-array trie: it's a pretty nifty
(though somewhat complicated) way to greatly reduce the space usage of tries
by interleaving nodes in the same block of memory. It also makes it easier to
serialize a trie, so you can quickly load a pre-built one instead of re-doing
all the inserts, which is useful for static dictionaries.

[http://linux.thai.net/~thep/datrie/](http://linux.thai.net/~thep/datrie/)

~~~
GhotiFish
Thanks for the link! I wish I'd known about this implementation.

------
munificent
Tries haven't been entirely neglected. Quadtrees and octrees are their 2D and
3D analogues, and those are very common in games, renderers, and other spatial
simulations.

~~~
Scaevolus
Quadtrees and octrees are far more related to binary space partitioning than
tries.

------
pathikrit
[DAWGs]([http://en.wikipedia.org/wiki/Directed_acyclic_word_graph](http://en.wikipedia.org/wiki/Directed_acyclic_word_graph))
are a special kind of Trie where similar child trees are compressed into
single parents. I extended and modified DAWGs and came up with a nifty data
structure called ASSDAWG (Anagram Search Sorted DAWG). The way this works is
whenever a string is inserted into the DAWG, it is bucket-sorted first and
then inserted and the leaf nodes hold an additional number indicating which
permutations are valid if we reach that leaf node from root. This has 2 nifty
advantages:

1\. Since I sort the strings before insertion, and since DAWGs naturally
collapse similar subtrees, I get a high level of compression (e.g. "eat",
"ate", "tea" all become one path a-e-t with a list of numbers at the leaf node
indicating which permutations of a-e-t are valid).

2\. Searching for anagrams of a given string is now fast and trivial, as the
path from root to leaf holds all the valid anagrams of that path at the leaf
node via the permutation numbers.
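
The sort-before-insert trick can be shown without the DAWG machinery by keying
a plain dict on the sorted characters (a simplified stand-in for the poster's
structure; all names are mine):

```python
from collections import defaultdict

def build_anagram_index(words):
    index = defaultdict(list)
    for w in words:
        # All anagrams share one key: their characters in sorted order.
        index[''.join(sorted(w))].append(w)
    return index

def anagrams(index, word):
    return index.get(''.join(sorted(word)), [])

idx = build_anagram_index(['eat', 'ate', 'tea', 'bat'])
# anagrams(idx, 'tea') -> ['eat', 'ate', 'tea']
```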

------
anaphor
I just used a Trie yesterday to implement some string matching that I needed
to do in a tokenizer :) It's quite useful if you want to do string matching
and you know ahead of time the exact list of strings you need to match with.
My only issue with this article is that it explains them using Java, which I
think just obscures the explanation too much, but I guess it's a Java blog.
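
That kind of matcher can be sketched with the dict-of-dicts trie from
elsewhere in the thread: walk the input once, remembering the last position
where a stored string ended (the longest-match rule; helper names are mine):

```python
def prefix_tree(words):
    d = {}
    for word in words:
        node = d
        for c in word:
            node = node.setdefault(c, {})
        node['$$'] = {}  # end-of-word marker
    return d

def longest_match(trie, text, start=0):
    """Return the longest stored string that `text` starts with at `start`, or None."""
    node, best = trie, None
    for i in range(start, len(text)):
        node = node.get(text[i])
        if node is None:
            break  # no stored string continues this way
        if '$$' in node:
            best = text[start:i + 1]  # remember the longest hit so far
    return best

t = prefix_tree(['in', 'int', 'interface'])
```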

~~~
ArbitraryLimits
Some people say to use a perfect hash function in that case, IMO it's more
trouble than it's worth.

------
tehwalrus
The _Dasher_ input method (aimed at people with disabilities who are able to
enter data only by manipulating a pointer) uses a visualized trie with vertex
areas proportional to linguistic probability:

[http://www.inference.phy.cam.ac.uk/dasher/DasherSummary2.htm...](http://www.inference.phy.cam.ac.uk/dasher/DasherSummary2.html)

(Prof. Mackay, its author, uses it to teach information theory and
compression, among other things.)

------
andrewparker
Implementing the Trie (on paper, in C) was about 50% of my final exam on my
intro to CS undergrad class at Stanford. We had never seen Tries before that
point, but we had done a variety of trees, so we had the necessary prior
experience. It was a fun exam and a very memorable experience.

~~~
supo
You went to Stanford and had never seen a Trie in high school? There are a few
high schools in Bratislava, Slovakia that teach basic data structures in their
Intro to Programming courses.

~~~
Pxtl
North American high-school programming classes were not known for quality.
Programming was not generally a high priority item for the school boards so
teachers were often completely left on their own for curriculum, and often
were amateur programmers at best (if programmers at all).

The end result is that university CS programs are run with the assumption that
highschool students had zero prior experience with programming.

There's a reason that so many software geeks are hardcore libertarians and
Bill Gates was fighting to reform teaching into a meritocracy - the
educational system basically ignored their core skill-set and so the students
were often self-taught. Obviously schools are playing catch-up now, but you
can't change the past experiences of two generations of programmers.

~~~
rch
The computer science AP classes at my high school were taught by the typing
teacher, after she took a summer course. The instruction consisted entirely of
the slides from that course, followed by time to work independently on pretty
much whatever, in Pascal.

------
awda
Tries have hardly been neglected. "Path-compressed" tries were very recently
added to FreeBSD[0] for use in the vm subsystem.

[0]:
[http://fxr.watson.org/fxr/source/sys/pctrie.h](http://fxr.watson.org/fxr/source/sys/pctrie.h)

------
sambeau
Tries are great for caching URLs & other path-like data. I've found them to be
faster (and smaller) than hashes for these purposes in the past.

However, this was back in the days of tiny caches. I suspect they may have
lost the edge they had in 2003.

~~~
sspiff
This contradicts chatman's comment at the top, which claims a trie has poor
locality. Since you have some experience with tries, do you have an
explanation of how to implement tries with good locality properties?

~~~
sambeau
I'd take his word for it as my tests are 10 years old!

And my apologies - looking back it was Patricia trees I was using as you can
take a pointer to a mid-point in the path: useful for disk caches etc.

I was using them for small dictionaries (mainly symbol-table-size structures),
comparing them against an early super-fast hash (and, I recall, against linear
search for really small dictionaries, since I naively assumed linear search
would win for tiny dictionaries - it didn't). In all my tests then (as I say,
2003, on a 3-year-old IBM Thinkpad) the tries won.

------
nawitus
> ~O(1) access time

What's the difference between O(1) and ~O(1)?

~~~
mhurron
I've always seen ~ used for "about", so ~1h is somewhere around one hour.

I would read it as close to, but maybe not quite O(1).

~~~
msluyter
I think the point may be that the fuzziness is already built into big-O
notation. That is O(n) + C = O(n), where C is a constant.

~~~
nawitus
If only set notation was more common with big-O notation. 'O(n) + C = O(n)'
can be confusing, whereas 'f(x) ∈ O(n) ∧ f(x) ∈ O(n) + C' makes sense.

~~~
jlarocco
The set notation you gave is redundant. There's always an implied "+C" in any
O(...), so why waste space writing it out multiple times?

Furthermore, using precise set notation would, IMO, make it more difficult to
use. A big blob of set notation would not be easy to read for most people.

~~~
nawitus
>The set notation you gave is redundant. There's always an implied "+C" in any
O(...),

No. O(n) and O(n)+C are different sets. However, if an algorithm belongs to
the first set, it also belongs to the latter. The confusion comes from using
the equality sign instead of the "element of" operator.

>Furthermore, using precise set notation would, IMO, make it more difficult to
use. A big blob of set notation would not be easy to read for most people.

No, the only difference is that f(x) = O(n) becomes f(x) ∈ O(n).

~~~
jlarocco
All big-O has an implied "+C", so O(n) is equivalent to O(n)+C by definition.
Where is the confusion?

~~~
nawitus
>All big-O has an implied "+C"

Can you state this in a mathematically precise notation?

>Where is the confusion?

The confusion comes from statements like 'O(n)+C = O(n)', which are actually
false. O(n)+C and O(n) are _different_ sets. If a function belongs to the
latter set, it also belongs to the former, but that doesn't mean they're
equal.

Using the equal sign here is "abuse of notation". 'O(n)+C = O(n)' would seem
to implicate that C=0, but that's untrue. C is a constant which can be non-
zero. Therefore the whole statement doesn't really make sense unless we assign
a new definition to the equal sign.

Using set notation solves all the confusion.

'Implied +C' is pretty much nonsense. There are an infinite number of
different sets that a single function belongs to. Just because f(x) belongs to
O(n) and therefore to O(n)+C doesn't mean that 'big-O implies +C'. 'big-O'
doesn't imply anything like that. '+C' is just a special case, and there's an
infinite number of extra 'implies'.

~~~
anonymoushn
[http://en.wikipedia.org/wiki/Big_O_notation#Formal_definitio...](http://en.wikipedia.org/wiki/Big_O_notation#Formal_definition)

A function f(n) is a member of the set O(n) iff there is some M, n0 such that
for all n > n0, |f(n)| <= Mn. This means functions like lg n, n + 30,
n + 1e999, 1e999n + sqrt(n), sin(n), and 7 are all members of the set O(n).

It is difficult to give a formal proof that O(n) + C = O(n) because there
isn't really a formal notion of what is meant by O(n) + C. To me, that looks
like the result of adding a real constant to a set of functions. It is common
to see people do algorithmic analysis and replace terms in a runtime with a
set they belong to, in which case an expression like O(n) + C might appear,
but what this means is just "some function that is in O(n) plus some
constant," and it is simple to prove that the set of all functions that can be
described in this way is equal to the set O(n). O(n) is closed under the
operation of adding constants, and the "adding constants" operation is its own
inverse.
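
The definition can be spot-checked numerically for one of those examples, say
f(n) = n + 30 with witnesses M = 31, n0 = 1 (my choice of constants; a sanity
check, not a proof):

```python
def bounded_by_Mn(f, M, n0, upto=10_000):
    """Check |f(n)| <= M*n for n0 < n <= upto (a numeric spot check, not a proof)."""
    return all(abs(f(n)) <= M * n for n in range(n0 + 1, upto + 1))

# n + 30 is in O(n): the additive constant is absorbed once M is large enough,
# since n + 30 <= 31n whenever n >= 1.
assert bounded_by_Mn(lambda n: n + 30, M=31, n0=1)

# n**2 is not in O(n): for any fixed M, the bound fails once n exceeds M.
assert not bounded_by_Mn(lambda n: n * n, M=100, n0=1)
```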

------
ape4
This blog post appears to be to get the author's resume looked at. Seems to be
working.

~~~
macNchz
17 years of both Java and JavaScript, now ain't that something...

