
Introduction to Tries - mriley
http://drmcawesome.com/IntroductionToTries
======
naz
I've recently been working on directed acyclic word graphs (DAWGs). They are a
small modification to tries and can be built with (I think) the same time
complexity. While building, compute each node's distance from the end of word
(EOW), keep a list of the nodes at each EOW distance, and merge the nodes that
are equal (i.e., represent the same set of suffix strings). By sharing suffixes
as well as prefixes, they use much less space.
<http://en.wikipedia.org/wiki/Directed_acyclic_word_graph>
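A minimal Python sketch of the construction described above, under two assumptions of mine: "distance from EOW" is taken to mean a node's height (the longest path from the node down to a leaf, and every leaf ends a word), and two nodes count as "equal" when they have the same end-of-word flag and the same outgoing edges to already-merged children. Nodes are merged bottom-up, one height group at a time. `Node`, `build_trie`, `minimize` and the helpers are hypothetical names.

```python
from collections import defaultdict


class Node:
    def __init__(self):
        self.children = {}   # edge label (one character) -> Node
        self.is_end = False  # True if a word ends at this node


def build_trie(words):
    root = Node()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, Node())
        node.is_end = True
    return root


def minimize(root):
    """Merge equivalent trie nodes bottom-up (in place), producing a DAWG."""
    by_height = defaultdict(list)  # height (longest path to a leaf) -> nodes

    def height(node):
        h = 1 + max((height(c) for c in node.children.values()), default=-1)
        by_height[h].append(node)
        return h

    height(root)
    canon = {}  # node -> its canonical (merged) representative
    for h in sorted(by_height):  # leaves first, so children are already merged
        registry = {}
        for node in by_height[h]:
            # Point edges at canonical children, then compare by identity.
            node.children = {ch: canon[c] for ch, c in node.children.items()}
            signature = (node.is_end,
                         tuple(sorted((ch, id(c)) for ch, c in node.children.items())))
            canon[node] = registry.setdefault(signature, node)
    return canon[root]


def contains(root, word):
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return False
    return node.is_end


def count_nodes(root):
    """Count distinct nodes reachable from root (shared nodes counted once)."""
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(node.children.values())
    return len(seen)


words = ["tap", "taps", "top", "tops"]
trie = build_trie(words)
print(count_nodes(trie))  # 8
dawg = minimize(trie)
print(count_nodes(dawg))  # 5
```

The `a` and `o` branches end in the same suffixes ("p", "ps"), so after merging they share a single path and the node count drops from 8 to 5.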

------
carterschonwald
For an excellent example of a trie implementation, the Haskell bytestring-trie
data structure is really nice to look at. Its docs can be found at
<http://hackage.haskell.org/package/bytestring-trie> and the very readable
source code at
<http://hackage.haskell.org/packages/archive/bytestring-trie/0.2.2/doc/html/src/Data-Trie-Internal.html>.

Edit: note that it's not strictly a trie, but it can be used as one; in
general, it performs a trie-like key-value lookup.
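For comparison, here is a minimal sketch of a plain, uncompressed trie used as a key-value map, written in Python rather than Haskell. It only illustrates the trie-like key-value lookup idea and makes no claim about how bytestring-trie is implemented internally; `TrieMap` and its methods are hypothetical names.

```python
class TrieMap:
    """A plain character trie used as a key-value map (illustrative sketch)."""

    _MISSING = object()  # sentinel: no value is stored at this node

    def __init__(self):
        self.children = {}  # character -> TrieMap
        self.value = TrieMap._MISSING

    def insert(self, key, value):
        node = self
        for ch in key:
            node = node.children.setdefault(ch, TrieMap())
        node.value = value

    def lookup(self, key, default=None):
        node = self
        for ch in key:
            node = node.children.get(ch)
            if node is None:
                return default
        return default if node.value is TrieMap._MISSING else node.value

    def items_with_prefix(self, prefix):
        """Yield (key, value) pairs whose key starts with prefix, in key order."""
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return
        yield from node._walk(prefix)

    def _walk(self, path):
        if self.value is not TrieMap._MISSING:
            yield path, self.value
        for ch in sorted(self.children):
            yield from self.children[ch]._walk(path + ch)


m = TrieMap()
for k, v in [("tea", 1), ("ten", 2), ("to", 3)]:
    m.insert(k, v)
print(m.lookup("ten"))                  # 2
print(m.lookup("te"))                   # None ("te" is a prefix, not a key)
print(list(m.items_with_prefix("te")))  # [('tea', 1), ('ten', 2)]
```

The prefix query is what a hash map cannot answer without scanning every key: all keys starting with "te" live in one subtree.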

Edit: it's also worth noting that it's pretty fast. Loading 1.5 million
key-value pairs of real-world production data into the trie and making my
first query took under 30 seconds, counting the time to read the data in from
the file system and other overhead. And that's without even doing the sort of
preprocessing that can make it much, much faster (namely, lexicographically
sorting the strings I'm using as keys before doing the key-value insertions)!
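The comment doesn't say why sorting the keys helps. One plausible reason (an assumption, not something measured here or documented for bytestring-trie) is that consecutive sorted keys share long prefixes, so a bulk loader can resume each insertion at the node where the key diverges from the previous one instead of walking down from the root. A minimal Python sketch of that idea, with hypothetical names:

```python
class Node:
    __slots__ = ("children", "value")

    def __init__(self):
        self.children = {}  # character -> Node
        self.value = None   # None means "no key ends here" in this sketch


def shared_prefix_len(a, b):
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def bulk_load_sorted(pairs):
    """Build a trie from (key, value) pairs, ideally sorted by key.

    Each insertion resumes at the node where the key diverges from the
    previous key. The result is correct for any input order; sorting
    just maximizes how much of the previous path is reused.
    """
    root = Node()
    path = [root]  # path[i] = node reached after the first i chars of prev
    prev = ""
    for key, value in pairs:
        shared = shared_prefix_len(prev, key)
        del path[shared + 1:]  # keep only the nodes for the shared prefix
        node = path[shared]
        for ch in key[shared:]:
            node = node.children.setdefault(ch, Node())
            path.append(node)
        node.value = value
        prev = key
    return root


def lookup(root, key):
    node = root
    for ch in key:
        node = node.children.get(ch)
        if node is None:
            return None
    return node.value


pairs = sorted({"banana": 3, "band": 2, "ban": 1}.items())
root = bulk_load_sorted(pairs)
print(lookup(root, "band"))  # 2
```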

------
dustrider
Personally, I prefer this intro: <http://marknelson.us/1996/08/01/suffix-trees/>

Or this one, which is slightly harder:
<http://linux.thai.net/~thep/datrie/datrie.html>

------
praptak
_"First, lookup time is O(1) in the size of the trie."_

The article skims over the tricky part: the dependency on the size of the
alphabet, which you can no longer treat as insignificant in the days of
Unicode.

~~~
Robin_Message
Good luck building a Unicode trie: the branching factor would be too high,
never mind the lookup time. Instead, you'd build the trie over an encoding,
probably UTF-8 (off the top of my head), which would let you keep the branching
factor at 256. That is already rather large but doable (you can switch to Judy
arrays if the wasted space bothers you).

Does anyone know whether there is a Unicode encoding that lets you map
arbitrary ranges (so that, for example, text using only the Greek alphabet
could be stored at 1 byte per character or less)? I suppose UTF-8 is already
hard enough to decode.

~~~
shadytrees
Ternary search tries attempt to solve this problem (among others):

<http://en.wikipedia.org/wiki/Ternary_search_tree>

<http://www.strchr.com/ternary_dags>
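A minimal Python sketch of a ternary search tree as described on the linked Wikipedia page: each node holds one character plus three children, `lo` (smaller characters), `eq` (the next character of the key), and `hi` (larger characters). Each node costs three links however big the alphabet is; the price is character comparisons instead of a direct array index. The tree is unbalanced, empty keys are rejected, and the suffix-sharing DAG variant from the second link is not shown; names are hypothetical.

```python
class _TSTNode:
    __slots__ = ("ch", "lo", "eq", "hi", "end")

    def __init__(self, ch):
        self.ch = ch
        self.lo = self.eq = self.hi = None
        self.end = False  # True if a key ends at this character


class TernarySearchTree:
    def __init__(self):
        self.root = None

    def insert(self, key: str) -> None:
        if not key:
            raise ValueError("empty keys are not supported in this sketch")
        self.root = self._insert(self.root, key, 0)

    def _insert(self, node, key, i):
        ch = key[i]
        if node is None:
            node = _TSTNode(ch)
        if ch < node.ch:
            node.lo = self._insert(node.lo, key, i)      # same position, smaller char
        elif ch > node.ch:
            node.hi = self._insert(node.hi, key, i)      # same position, larger char
        elif i + 1 < len(key):
            node.eq = self._insert(node.eq, key, i + 1)  # match: next position
        else:
            node.end = True
        return node

    def contains(self, key: str) -> bool:
        node, i = self.root, 0
        while node is not None and key:
            ch = key[i]
            if ch < node.ch:
                node = node.lo
            elif ch > node.ch:
                node = node.hi
            elif i + 1 < len(key):
                node, i = node.eq, i + 1
            else:
                return node.end
        return False


tst = TernarySearchTree()
for w in ["cat", "cap", "car", "dog"]:
    tst.insert(w)
print(tst.contains("car"), tst.contains("ca"), tst.contains("cow"))  # True False False
```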

------
VolatileVoid
This is interesting. I wonder if you could use a similar algorithm for mapping
sentences: that is, construct a ternary DAG and keep inserting words (rather
than letters) into the trie.

~~~
silentbicycle
Yes, of course. You can index tries with pretty much anything, though elements
with constant-time comparison (individual bytes/chars, pointers, hashes, etc.)
will have less overhead.
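A minimal Python sketch of that idea: the same trie logic with whole words as edge labels. It uses a dict-based trie rather than the ternary DAG mentioned above, splits sentences on whitespace (a simplifying assumption), and assumes tokens are hashable and, for the sorted output of `next_tokens`, mutually comparable; `TokenTrie` is a hypothetical name.

```python
class TokenTrie:
    """Trie keyed on sequences of hashable tokens (here, words) instead of chars."""

    def __init__(self):
        self.children = {}  # token -> TokenTrie
        self.end = False    # True if an inserted sequence ends here

    def insert(self, tokens):
        node = self
        for tok in tokens:
            node = node.children.setdefault(tok, TokenTrie())
        node.end = True

    def _find(self, tokens):
        node = self
        for tok in tokens:
            node = node.children.get(tok)
            if node is None:
                return None
        return node

    def contains(self, tokens):
        node = self._find(tokens)
        return node is not None and node.end

    def next_tokens(self, tokens):
        """Tokens that have followed this prefix in some inserted sequence."""
        node = self._find(tokens)
        return sorted(node.children) if node is not None else []


trie = TokenTrie()
for sentence in ["the cat sat", "the cat ran", "the dog sat"]:
    trie.insert(sentence.split())  # whitespace tokenization
print(trie.next_tokens("the cat".split()))  # ['ran', 'sat']
print(trie.contains("the cat".split()))     # False: a prefix, not a full sentence
```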

------
bruceboughton
How is "trie" pronounced? Like "try"?

~~~
michael_dorfman
According to Wikipedia:

_Following the etymology, the inventor, Edward Fredkin, pronounces it /ˈtriː/
"tree". However, it is pronounced /ˈtraɪ/ "try" by other authors._

<http://en.wikipedia.org/wiki/Trie>

------
lt
Is it just me, or is there no RSS feed on the site?

