
Simple Trie Implementation in C++11 - xorz57
https://xorz57.netlify.com/posts/trie-implementation-in-c++11/
======
codr7
Like someone else mentioned, there's no need for Node pointers at all here,
storing Nodes by value cuts away a decent chunk of complexity and should run
plenty faster (especially when threads are linked).

I would consider switching the unordered_map to a sorted vector<pair<T,
Node>>; or even better, move the key into Node and simply store as
vector<Node>. Hash tables are relatively expensive to create in exchange for
excellent performance for large tables, and you're creating a lot of tiny
tables. A regular map would likely fall somewhere in between performance-wise
since they're cheaper to create.
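
A minimal sketch of the "key inside Node, children in a sorted vector<Node>" idea, to make the suggestion concrete. The names `Node`, `SortedVectorTrie` and `lower` are hypothetical and not taken from the post, the key type is fixed to `char` for brevity, and `std::vector<Node>` as a member of `Node` assumes a compiler with C++17 incomplete-type support for `std::vector`.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Hypothetical layout: the key lives in the node itself, and the children
// are stored by value in a vector that is kept sorted by key.
struct Node {
    char key = 0;
    bool terminal = false;
    std::vector<Node> children;  // assumes C++17 incomplete-type support
};

class SortedVectorTrie {
public:
    void insert(const std::string& word) {
        Node* node = &root_;
        for (char c : word) {
            auto it = lower(node->children, c);
            if (it == node->children.end() || it->key != c) {
                Node child;
                child.key = c;
                // Inserting at the lower_bound position keeps the vector sorted.
                it = node->children.insert(it, std::move(child));
            }
            // Safe to keep: later inserts go into this child's own vector,
            // not into the parent vector that holds it.
            node = &*it;
        }
        node->terminal = true;
    }

    bool search(const std::string& word) const {
        const Node* node = &root_;
        for (char c : word) {
            auto it = lower(node->children, c);
            if (it == node->children.end() || it->key != c) return false;
            node = &*it;
        }
        return node->terminal;
    }

private:
    // Binary search for the first child whose key is not less than c.
    template <class Children>
    static auto lower(Children& children, char c) -> decltype(children.begin()) {
        return std::lower_bound(children.begin(), children.end(), c,
                                [](const Node& n, char k) { return n.key < k; });
    }

    Node root_;
};
```

Whether this actually beats a hash table per node depends on the fan-out and on the allocator, so it would need measuring rather than assuming.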

~~~
psurge
I don't think Node can contain a std::vector<Node>, since STL containers
cannot have incomplete value types. But maybe I'm misunderstanding you?

I think absl::flat_hash_map<T, std::unique_ptr<Node>> would be worth
considering in this application. Keys and values are stored inline, so the
memory and creation costs should be comparable to std::vector -
[https://abseil.io/docs/cpp/guides/container#abslflat_hash_ma...](https://abseil.io/docs/cpp/guides/container#abslflat_hash_map-and-abslflat_hash_set)

~~~
stabbles
Not all containers though, since C++17 std::vector, std::list and
std::forward_list do support incomplete element types as long as the allocator
meets certain requirements. So std::vector<Node> is definitely OK.

~~~
psurge
Thanks for pointing this out! I didn't know that C++17 had relaxed this
restriction.

------
psurge
Aren't the second and third lines of the insert/search implementations
superfluous? Casting root to a boolean will never return false, since it is
initialized. The insert implementation doesn't update root, so if root were
nullptr, insert would be throwing away its work.

~~~
xorz57
Yes, indeed!

------
Ciberth
I really love this! I saw your github repo "forest" as well. I wish there were
more bundled resources or code examples like this. Not because I think
everything should be out there in the open, free to get. But more as a way of
retrieving examples about (maybe complex) algorithms.

Traditional books fail me in most of the cases as I want modern techniques
combined with (maybe) "old" algorithms.

I would love it if people could show me modern examples and implementations
(and reasoning) about let's say things from CLRS for example:

\- trees (red/black, splay, B, quadtrees, k-trees)

\- dynamic programming

\- hashing (extendible, linear)

\- pairing heaps, binomial queues

\- shortest distances with Dijkstra, Johnson, Bellman-Ford

\- finite state machines etc

\- string algorithms like Boyer-Moore, Knuth-Morris-Pratt

\- (...)

FYI: I had no account prior to this comment, created one just for you ;) Keep
up the good work OP!

~~~
xorz57
I am pleased to hear you like my work! I have written a Red/Black and a Splay
Tree implementation as part of the forest repository, but I wasn't satisfied
enough with those so I removed them. I am planning to implement more stuff in
the future so stay tuned!

~~~
Ciberth
Oh, it would have been cool to have a look. I have some start questions and
code from a few years back at uni but I really disliked the courses, prof and
way of programming. If you would be interested I can share them to gain some
insights or motivations on how to (maybe not) do it :)

~~~
xorz57
I feel you.

------
stabbles
Is `shared_ptr` just used for convenience? I think `unique_ptr` would be
excellent for this tree structure.

It's not too bad to traverse the tree using non-owning raw pointers.
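
A minimal sketch of that ownership split: each node owns its children through `unique_ptr`, and traversal uses plain non-owning pointers. The names `UNode` and `UniquePtrTrie` are hypothetical, and the key type is fixed to `char`; this is not the code from the post.

```cpp
#include <memory>
#include <string>
#include <unordered_map>

struct UNode {
    bool terminal = false;
    // Each child is owned by exactly one parent.
    std::unordered_map<char, std::unique_ptr<UNode>> children;
};

class UniquePtrTrie {
public:
    void insert(const std::string& word) {
        UNode* node = &root_;  // non-owning cursor
        for (char c : word) {
            std::unique_ptr<UNode>& child = node->children[c];
            if (!child) child.reset(new UNode());
            node = child.get();
        }
        node->terminal = true;
    }

    bool search(const std::string& word) const {
        const UNode* node = &root_;  // non-owning cursor
        for (char c : word) {
            // find() leaves the map untouched when the key is missing.
            auto it = node->children.find(c);
            if (it == node->children.end()) return false;
            node = it->second.get();
        }
        return node->terminal;
    }

private:
    UNode root_;  // the root is held by value, so it can never be null
};
```

The raw pointers never outlive the trie they point into, which is what makes them safe here.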

~~~
nemetroid
I don't think there's need for smart pointers at all, really. The "children"
member could just be an unordered_map<T, Node>.

Edit: on second thought, having a class contain a container of itself actually
isn't allowed (though you would think it would be in cases like this).

Edit again: apparently it depends on the container. unordered_map does not
allow instantiation with an incomplete type, but other containers do (e.g.
vector or map).

~~~
stabbles
Yes, I was playing around with it as well. It seems the support has been out
there for a while, but officially only since C++17, when
[http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n437...](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4371.html)
was accepted, so that std::vector, std::list and std::forward_list can work
with incomplete types:

    
    
       struct example {
         std::vector<example> xs;
       };

------
xorz57
I will probably include this implementation in a repository of mine called
"forest". You can check it out here
[https://github.com/xorz57/forest](https://github.com/xorz57/forest) Or you
could copy paste it and open a pull request with additional features!

------
riffraff
Doesn't using a map to implement a trie defeat the purpose? I mean, would such
an implementation make sense in some situation?

~~~
panda88888
Do you mean using an array instead of a map for storing the pointers to the
next nodes? Conceptually, an array is just a map that uses integer keys and
the identity function as the hashing function.

In practice, the map implementation trades speed for space efficiency.

For an array implementation, each node has to allocate an array of pointers
covering the entire character set. This wastes space, since trie nodes usually
only have child pointers for a subset of the character set.
Map implementation don’t have to allocate space to store pointers for the
entire character set, so it can be more space efficient. However, the hash
function probably costs an order of magnitude more cycles than array indexing
(usually just the ASCII code with an offset and some bounds checking).
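
For comparison, a minimal sketch of the array variant, restricted to the letters 'a'..'z' and assuming an ASCII-compatible encoding where those letters are contiguous. The names `ArrayNode`, `slot`, `array_insert` and `array_search` are hypothetical. Every node carries 26 pointer slots whether or not they are used, and a lookup is an offset plus a bounds check.

```cpp
#include <array>
#include <cstddef>
#include <memory>
#include <string>

constexpr std::size_t kAlphabet = 26;  // assumption: lowercase ASCII letters only

struct ArrayNode {
    bool terminal = false;
    // One slot per possible character; unused slots stay null.
    std::array<std::unique_ptr<ArrayNode>, kAlphabet> next;
};

// Maps 'a'..'z' to 0..25 and returns kAlphabet for any other character.
inline std::size_t slot(char c) {
    return (c >= 'a' && c <= 'z') ? static_cast<std::size_t>(c - 'a') : kAlphabet;
}

// Returns false without modifying the trie if the word has an unsupported character.
inline bool array_insert(ArrayNode& root, const std::string& word) {
    for (char c : word) {
        if (slot(c) == kAlphabet) return false;
    }
    ArrayNode* node = &root;
    for (char c : word) {
        std::unique_ptr<ArrayNode>& child = node->next[slot(c)];
        if (!child) child.reset(new ArrayNode());
        node = child.get();
    }
    node->terminal = true;
    return true;
}

inline bool array_search(const ArrayNode& root, const std::string& word) {
    const ArrayNode* node = &root;
    for (char c : word) {
        const std::size_t i = slot(c);
        if (i == kAlphabet || !node->next[i]) return false;
        node = node->next[i].get();
    }
    return node->terminal;
}
```

With a wider key type such as char16_t, a full array per node stops being practical, which is one situation where a map-backed node makes sense.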

~~~
stabbles
Well, an unordered_map / hash map is in the end backed by a dynamically sized
array, so "Map implementation don’t have to allocate space to store pointers
for the entire character set" might not be true for two reasons: (1) there
_is_ an additional allocation* that is not present in the std::array case for
the dynamic array and (2) there is a default size of this unordered_map when
nothing has been inserted yet, are you sure it's smaller than 256 elements?

* Well, maybe there is something like small vector optimization going on for small unordered_maps, though.

~~~
panda88888
Agreed.

For the standard implementation, the dynamically allocated array may be bigger
than 256. I would imagine, since it’s C++, it’s probably possible to set the
initial array size.
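
A small sketch of how one could check this: `std::unordered_map` has a constructor that takes a minimum bucket count, and `bucket_count()` reports what the implementation actually chose. The helper names `buckets_for_hint` and `default_buckets` are hypothetical; the default value is implementation-defined, so no particular number is assumed here.

```cpp
#include <cstddef>
#include <unordered_map>

// Number of buckets a fresh map reports when constructed with a minimum of `hint`.
// The standard only guarantees "at least hint"; the exact value varies by library.
inline std::size_t buckets_for_hint(std::size_t hint) {
    std::unordered_map<char, int> m(hint);
    return m.bucket_count();
}

// Number of buckets in a default-constructed map (implementation-defined).
inline std::size_t default_buckets() {
    std::unordered_map<char, int> m;
    return m.bucket_count();
}
```

Printing `default_buckets()` on the target standard library answers the "is it smaller than 256 elements?" question for that platform.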

------
xorz57
This implementation lacks operations like clear() or remove(). I could create
a repository if you would like to contribute.
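
One possible shape for those operations, as a sketch only: `remove()` unmarks the word and prunes nodes that no longer lead anywhere, and `clear()` drops everything below the root. The names `RNode`, `RemovableTrie` and `remove_from` are hypothetical, and the node layout (a `char`-keyed map of `unique_ptr` children) is an assumption rather than the layout from the post.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>

struct RNode {
    bool terminal = false;
    std::unordered_map<char, std::unique_ptr<RNode>> children;
};

class RemovableTrie {
public:
    void insert(const std::string& word) {
        RNode* node = &root_;
        for (char c : word) {
            std::unique_ptr<RNode>& child = node->children[c];
            if (!child) child.reset(new RNode());
            node = child.get();
        }
        node->terminal = true;
    }

    bool search(const std::string& word) const {
        const RNode* node = &root_;
        for (char c : word) {
            auto it = node->children.find(c);
            if (it == node->children.end()) return false;
            node = it->second.get();
        }
        return node->terminal;
    }

    // Returns true if the word was present and has been removed.
    bool remove(const std::string& word) {
        bool removed = false;
        remove_from(root_, word, 0, removed);
        return removed;
    }

    void clear() {
        root_.children.clear();
        root_.terminal = false;
    }

    bool empty() const { return !root_.terminal && root_.children.empty(); }

private:
    // Returns true if `node` became useless (no children, not a word end)
    // as a result of the removal, so the caller may erase it.
    static bool remove_from(RNode& node, const std::string& word,
                            std::size_t i, bool& removed) {
        if (i == word.size()) {
            if (!node.terminal) return false;  // word was never inserted
            node.terminal = false;
            removed = true;
            return node.children.empty();
        }
        auto it = node.children.find(word[i]);
        if (it == node.children.end()) return false;
        if (remove_from(*it->second, word, i + 1, removed)) node.children.erase(it);
        return removed && !node.terminal && node.children.empty();
    }

    RNode root_;
};
```

Removing "ab" from a trie holding "a" and "ab" erases only the `b` node, because the `a` node is still the end of a word.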

~~~
panda88888
I would love to contribute. Ping me when you get the repo set up.

~~~
xorz57
Sure!

------
bhaavan
I couldn't come up with such a concise and succinct implementation even if I
tried.

~~~
egwor
I tried giving this a run from CLion and it crashes if I search first, see

    #include "trie.hpp"
    #include <iostream>

    int main() {
      trie<char16_t> trie;

      // Greek
      std::cout << trie.search(u"υπολογιστής") << std::endl;
      std::cout << trie.search(u"υπολογιστης") << std::endl;

      // English
      std::cout << trie.search(u"computer") << std::endl;
      std::cout << trie.search(u"compute") << std::endl;

      // Greek
      trie.insert(u"υπολογιστής"); // crash

      // English
      trie.insert(u"computer");

      // Greek
      std::cout << trie.search(u"υπολογιστής") << std::endl;
      std::cout << trie.search(u"υπολογιστης") << std::endl;

      // English
      std::cout << trie.search(u"computer") << std::endl;
      std::cout << trie.search(u"compute") << std::endl;

      return 0;
    }

~~~
xorz57
My apologies. I updated the post! Now it works just fine!

~~~
nemetroid
There's still an issue if you insert something you've searched for first,
e.g.:

    
    
      trie<char16_t> trie;
    
      trie.insert(u"a");
      trie.search(u"ab");
      trie.insert(u"ab");
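
The post's code is not reproduced in this thread, so the cause can only be guessed at. One plausible explanation, and it is an assumption, is a lookup through `operator[]` on the children map: on an `unordered_map`, `operator[]` default-inserts a null pointer for a missing key, which a later insert may then follow. The sketch below shows only that container behaviour; `Children`, `contains_with_subscript` and `contains_with_find` are hypothetical names, with `int` standing in for the node type.

```cpp
#include <memory>
#include <unordered_map>

// Stand-in for a node's children map; int replaces the real node type.
using Children = std::unordered_map<char, std::shared_ptr<int>>;

// Lookup through operator[]: a missing key gets a default (null) entry inserted.
inline bool contains_with_subscript(Children& m, char c) {
    return m[c] != nullptr;
}

// Lookup through find(): the map is never modified.
inline bool contains_with_find(const Children& m, char c) {
    auto it = m.find(c);
    return it != m.end() && it->second != nullptr;
}
```

After `contains_with_subscript(m, 'b')` on an empty map, the map holds one entry whose pointer is null, whereas `contains_with_find` leaves it empty.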

~~~
xorz57
[https://github.com/xorz57/xorz57.netlify.com/commit/48d1c580...](https://github.com/xorz57/xorz57.netlify.com/commit/48d1c5803b3ff0ff1159fc952903aee48e4eecae)

My bad! I fixed it!

------
imedadel
On a side note, I recently started competitive programming in my college, and
while preparing for the upcoming xCPC competitions, I was wondering if I
should use "auto" and a more functional approach in C++ (instead of using int,
double, etc. and OOP). What do you think?

~~~
Ciberth
I don't like auto over int and other primitives. But auto in for loops or
with iterators seems the way to go and is a blessing!

~~~
mehrdadn
With begin()/end() it's probably fine, but in range-based for loops it's not a
great idea, since you run into problems when a conversion was intended. The
main (only?) example of this pattern in the standard is vector<bool>, so of
course people's immediate reaction is to blame this on vector<bool>. However,
that's just the example currently in the standard, and people use this pattern
for other things too. By using auto you end up writing code that can break.
Example:

    
    
      typedef std::vector<bool> Vec;
      Vec f(1, false);
      for (typename Vec::value_type &&x : f) { std::cout << typeid(x).name() << std::endl; }
      for (auto &&x : f) { std::cout << typeid(x).name() << std::endl; }
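
The same point can be checked at compile time instead of by printing type names. This is a sketch; the aliases `Vec` and `Deduced` and the helper `count_true` are hypothetical names introduced for illustration.

```cpp
#include <type_traits>
#include <utility>
#include <vector>

using Vec = std::vector<bool>;

// The type produced by dereferencing a vector<bool> iterator, which is what
// `auto &&x : f` binds to.
using Deduced = decltype(*std::declval<Vec&>().begin());

static_assert(std::is_same<Vec::value_type, bool>::value,
              "value_type is plain bool");
static_assert(std::is_same<Deduced, Vec::reference>::value,
              "dereferencing yields the proxy class vector<bool>::reference");
static_assert(!std::is_same<Deduced, bool&>::value,
              "the proxy is not a real bool&");

// Naming the element type forces the proxy-to-bool conversion on each step.
inline int count_true(const Vec& v) {
    int n = 0;
    for (bool x : v) {
        if (x) ++n;
    }
    return n;
}
```

Spelling out `bool` in the loop header is what triggers the conversion that `auto` silently skips.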

~~~
nemetroid
Do you have examples of cases other than vector<bool> where this is an issue?

~~~
mehrdadn
[https://eigen.tuxfamily.org/dox/TopicLazyEvaluation.html](https://eigen.tuxfamily.org/dox/TopicLazyEvaluation.html)

[https://stackoverflow.com/q/46321667](https://stackoverflow.com/q/46321667)

~~~
nemetroid
Thanks, those were interesting to read about. It doesn't seem like something
that would come up in the for loop case, though.

~~~
mehrdadn
I mean, you can imagine proxies for stuff beside bools and matrices...

    
    
      #include <boost/endian/conversion.hpp>  // boost::endian::endian_reverse

      template<class It> struct iterator_range { It b, e; It begin() const { return b; } It end() const { return e; } };
      template<class It> iterator_range<It> make_iterator_range(It b, It e) { iterator_range<It> r = { b, e }; return r; }
    
      template<class T> struct endian_reversed_iterator {
       T *p;
       // Proxy returned by operator*: converts on read, byte-swaps on write.
       struct reference {
        T *p;
        operator T() const { return boost::endian::endian_reverse(*p); }
        reference const &operator =(T const &other) const { *p = boost::endian::endian_reverse(other); return *this; }
       };
       endian_reversed_iterator(T *p) : p(p) { }
       reference operator *() { reference r = { p }; return r; }
       endian_reversed_iterator &operator++() { ++p; return *this; }
       bool operator!=(endian_reversed_iterator const &other) const { return p != other.p; }
      };
    
      int main() {
       unsigned arr[] = { 1, 2 };
       // auto deduces the proxy type `reference` here, not unsigned.
       for (auto x : make_iterator_range<endian_reversed_iterator<unsigned> >(&arr[0], &arr[2])) {
        ++x;  // whatever; fine for `unsigned x`, but the proxy has no operator++, so this fails to compile
       }
      }

