

Regular Expression Search with Suffix Arrays - srsamarthyam
https://blog.nelhage.com/2015/02/regular-expression-search-with-suffix-arrays/

======
TheLoneWolfling
I wonder how efficient this is in the worst case, in particular for something
like [02468]+ (or anything similar: a discontinuous range repeated a bunch of
times).
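
To make the worry concrete, here is a rough Python sketch of why a discontinuous class could cost more than a literal. It is only an illustration, not the article's implementation: `class_ranges` is a hypothetical helper, and the suffix array is built naively. Each member of the class gets its own binary search and its own run in the array, and the repetition would presumably have to narrow every one of those runs again.

```python
from bisect import bisect_left, bisect_right


def build_suffix_array(text):
    # Naive construction for illustration: sort start offsets by suffix.
    return sorted(range(len(text)), key=lambda i: text[i:])


def class_ranges(text, sa, chars):
    # Hypothetical helper: for each character in the class, find the
    # half-open run [lo, hi) of sa whose suffixes start with it.
    # Materializing the first characters is O(n) and done only to keep
    # the sketch short; a real index would compare in place.
    first = [text[i] for i in sa]
    ranges = []
    for c in sorted(set(chars)):
        lo = bisect_left(first, c)
        hi = bisect_right(first, c)
        if lo < hi:
            ranges.append((lo, hi))
    return ranges


text = "204a8"
sa = build_suffix_array(text)
print(sa)                                # [1, 0, 2, 4, 3]
print(class_ranges(text, sa, "02468"))   # [(0, 1), (1, 2), (2, 3), (3, 4)]
```

The runs happen to be adjacent here only because the sample text has no odd digits; with odd digits present they would be separated, so they could not be merged into one range.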

I wonder how this would compete with a full suffix tree or trie - or better
yet, suffix DAG, which will generally take a whole lot more time to construct
but may be (much) smaller.

~~~
adamtj
> but may be (much) smaller.

The suffix array is already very small. It doesn't store the full suffixes,
just their starting offsets. It's basically a list of pointers with one pointer for each
character in the input data. The characters aren't duplicated, unlike a suffix
tree. In fact, the suffix array is just the leaf nodes of a suffix tree in
depth-first order. The internal nodes are omitted. Further, you can store it
as an array instead of a linked list, meaning you halve the number of pointers
to store.
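
As a minimal sketch of that layout (in Python, with a naive sort-based construction and hypothetical function names, not the article's code): the array holds one integer offset per character of the input, and a substring lookup is a pair of binary searches over those offsets.

```python
def build_suffix_array(text):
    # Naive construction for illustration: sort the start offsets by the
    # suffix each one denotes. Only the integers are kept, never the
    # suffix strings themselves.
    return sorted(range(len(text)), key=lambda i: text[i:])


def _bound(text, sa, pattern, upper):
    # Binary search over sa, comparing only the first len(pattern)
    # characters of each suffix against the pattern.
    lo, hi = 0, len(sa)
    m = len(pattern)
    while lo < hi:
        mid = (lo + hi) // 2
        prefix = text[sa[mid]:sa[mid] + m]
        if prefix < pattern or (upper and prefix == pattern):
            lo = mid + 1
        else:
            hi = mid
    return lo


def find_all(text, sa, pattern):
    # Suffixes that start with the pattern form one contiguous run in sa.
    start = _bound(text, sa, pattern, upper=False)
    end = _bound(text, sa, pattern, upper=True)
    return sorted(sa[start:end])


text = "banana"
sa = build_suffix_array(text)
print(sa)                          # [5, 3, 1, 0, 4, 2]
print(find_all(text, sa, "ana"))   # [1, 3]
```

The sort here is far slower than a real construction algorithm; it is only meant to show that the index is one integer per input character.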

~~~
TheLoneWolfling
I meant that a suffix DAG would be smaller than a full suffix tree; sorry that
was confusing.

I am aware that a suffix array is smaller than a suffix DAG.

------
pronoiac
Oh, cool! I've looked into suffix trees and suffix arrays, by way of
investigating how diff works. While I found suffix trees hard to understand,
this is a great explanation of suffix arrays. Suffix arrays can take much less
space - I've heard 1/4 as much space? - than suffix trees.

