
Is Prefix of String in Table?  a Journey into SIMD String Processing - trentnelson
http://trent.me/is-prefix-of-string-in-table/
======
voidmain
There is an asymptotic speedup available for this problem, to the extent that
there are lots of prefixes in the table: a finite state automaton representing
the regular expression ^(prefix1|prefix2|prefix3|...) can be matched against a
length N string in O(N) time, regardless of how many prefixes are in the
regex. This approach is also behind the Aho-Corasick algorithm, which I think
finds all occurrences of the strings but could be easily modified to find only
prefixes.

I don't know if, in practice, the constant factor speedup in the article would
beat out this asymptotic advantage for the table sizes that matter to the
author. But the article doesn't mention considering and rejecting this
approach.

~~~
trentnelson
You can't do Aho-Corasick or any form of tree/trie structure (practically)
without using pointers, and any form of pointer-chasing algorithm is going to
have a tough time competing with the lower bounds of this algorithm, when
we're dealing with 6 to 13 CPU cycles.

This data structure is specifically optimized for small sets of short strings.
If we were dealing with larger data sets we wanted to match against, then a
comparison against other contemporary techniques would be warranted.

~~~
voidmain
Yes, I agree, the article's approach will definitely win for small tables. But
readers of this article may want to solve the same problem on big tables, so
should be aware that there's an asymptotically faster solution.

(Actually, I think you could probably use the article's approach in
combination with DFA techniques to build something that's both fast and
scalable. Something like a finite state automaton doing something like what's
in the article at each node. But I don't need this and am not going to invest
in trying to figure it out or learn whether there is prior art.)

