
Cttrie – Compile-time trie-based string matching for C++ - texec
https://smilingthax.github.io/slides/cttrie/
======
joejoint
Why not just hash the string using constexpr ?

====================

C preprocessor based compile time hash from lolengine:

[http://lolengine.net/blog/2011/12/20/cpp-constant-string-
has...](http://lolengine.net/blog/2011/12/20/cpp-constant-string-hash)

====================

#define H1(s,i,x)(x _65599u+(uint8_t)s[(i)
<strlen(s)?strlen(s)-1-(i):strlen(s)])

#define H4(s,i,x) H1(s,i,H1(s,i+1,H1(s,i+2,H1(s,i+3,x))))

#define H16(s,i,x) H4(s,i,H4(s,i+4,H4(s,i+8,H4(s,i+12,x))))

#define H64(s,i,x) H16(s,i,H16(s,i+16,H16(s,i+32,H16(s,i+48,x))))

#define H256(s,i,x) H64(s,i,H64(s,i+64,H64(s,i+128,H64(s,i+192,x))))

#define HASH(s) ((uint32_t)(H256(s,0,0)^(H256(s,0,0)>>16)))

template<size_t N> constexpr uint32_t h(char const (&s)[N]) { return HASH(s);
}

constexpr uint32_t h(const char_ s) { return HASH(s); }

uint32_t h(const std::string& s) { return h(s.c_str()); }

int main(int argc, const char __argv) {

    
    
        auto s = "keep";
        std::string x = "simple";
    
        switch(h(x))
        {
            case h("keep")  : std::cout << "keep"; break;
            case h("it")    : std::cout << "it"; break;
            case h("simple"): std::cout << "simple"; break;
    
        };

~~~
ur-whale
The hash has to look at all the chars in the string.

Does the trie have to as well?

~~~
beached_whale
The string compare has a lot of branching and looks at both strings. The hash
method one checks the hash of the test string(size_t) with the size_t of the
other string. The hash itself, say fnv1a, can be very quick(a xor and a
multiply I think per character) I think I remember testing this and it was
quicker. ill try to find the test, but it is easily done. Either way, it's the
only way to use a switch statement this way

------
chubot
If you're OK with textual code generation, you can also use something like
re2c:

[http://re2c.org/](http://re2c.org/)

And then you get the power of regexes too. re2c will match an arbitrary set of
regexes (including constant strings) by walking through the string byte-by-
byte, a single time.

A trie is basically a special case of a DFA.

~~~
wmu
"A trie is basically a special case of a DFA." The best definition of trie I
read.

------
acidx
I like this trick that I use in C. It's limited, as it only matches string
prefixes, but it works pretty well in the end:
[https://tia.mat.br/posts/2018/02/01/more_on_string_switch_in...](https://tia.mat.br/posts/2018/02/01/more_on_string_switch_in_c.html)

It allows you to write something like this:
[https://github.com/lpereira/lwan/blob/master/src/lib/lwan-
ti...](https://github.com/lpereira/lwan/blob/master/src/lib/lwan-time.c) \--
which is generated as a binary search by GCC.

~~~
convivialdingo
Something we’ve used before is preprocessor string hashing similar to [1].
Basically we hash strings at compile time to ints and use those for our case
statements. Word of caution though, there’s the possibility of collisions. It
works great on a limited set of strings like XML tags and such.

1\. [http://lolengine.net/blog/2011/12/20/cpp-constant-string-
has...](http://lolengine.net/blog/2011/12/20/cpp-constant-string-hash)

------
wyldfire
Another approach not mentioned: LLVM's StringSwitch [1]. Unfortunately it's
not designed to be used independent of LLVM, so it has a dependency or two.
Also note that it's a slightly different use case (can only return different
values, is not used for flow control).

[1] [https://github.com/llvm-
mirror/llvm/blob/master/include/llvm...](https://github.com/llvm-
mirror/llvm/blob/master/include/llvm/ADT/StringSwitch.h#L23-L41)

~~~
BeeOnRope
It is mentioned right near the end of the slides.

------
lasagnaphil
Although I understand people want to do some crazy stuff with template
metaprogramming, I’m more concerned with how it will affect build times
(you’re basically running a tree-walk interpreter in a compiler that was not
supposed to run a turing complete language... it’s going to be very slow.)

For more practical reasons, what about just using runtime interned strings?
Each unique string is hashed and allocated in heap memory only once, and to
compare the strings you just compare the hashed values (which is just uint_64
comparison). Then you can easily use switch statements to compare your
strings... As for performance, interning has an initial runtime cost for each
string, but it will probably be miniscule compared to the massive compile time
cost induced by crazy constexpr stuff.

Example of a string interning library:

[https://github.com/mattiasgustavsson/libs/blob/master/docs/s...](https://github.com/mattiasgustavsson/libs/blob/master/docs/strpool.md)

------
stargrazer
If I recall correctly,
[https://www.boost.org/doc/libs/1_68_0/libs/spirit/doc/html/i...](https://www.boost.org/doc/libs/1_68_0/libs/spirit/doc/html/index.html)
uses trie's for it's compile time recursive descent parser for writing
grammars and lexical analyzers.

------
ape4
This only an issue if you want to have many hardcoded strings in a switch. If
you want that, there is a probably a better way to organize the code.

~~~
detaro
I'd assume that in most cases where this comes up, you're not exactly in a
position to not use those strings, e.g. because they're defined in an input
format you're processing.

------
saagarjha
Interesting. The way I would have thought of doing something like this is by
laying out each "switch" case sequentially in memory, then having the
"SWITCH(string)" macro expand out to something that compute the offset
necessary, using a similar pre-computed trie, and jump to that offset. That
way fallthrough should work as expected.

------
AndyKelley
Here is my response to this: [https://andrewkelley.me/post/string-matching-
comptime-perfec...](https://andrewkelley.me/post/string-matching-comptime-
perfect-hashing-zig.html)

