

Lightweight Indexing for Small Strings - silentbicycle
http://spin.atomicobject.com/2014/01/13/lightweight-indexing-for-embedded-systems/

======
jibsen
One trick you could try: in find_longest_match, if you already have a
match, check whether the byte at offset match_maxlen matches before doing the
linear compare of all bytes up to it.

If that one byte does not match, the entire match has no chance of being
longer than the current best (in this simple case).
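
A minimal sketch of that early-reject check, assuming a brute-force scan over earlier positions. The function name `find_longest_match_sketch` and its parameters are hypothetical and are not heatshrink's actual code; only the idea of testing the byte at `match_maxlen` first comes from the comment above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not heatshrink's real function): find the longest
 * match for buf[pos..end) starting at any earlier position in
 * buf[start..pos). Returns the match length; *match_pos is only written
 * when a match of length > 0 is found. */
size_t find_longest_match_sketch(const uint8_t *buf, size_t start, size_t pos,
                                 size_t end, size_t *match_pos)
{
    assert(buf != NULL && match_pos != NULL);
    assert(start <= pos && pos <= end);

    size_t max_possible = end - pos;  /* cannot match past the end */
    size_t match_maxlen = 0;          /* best length found so far */

    for (size_t cand = start; cand < pos; cand++) {
        /* Early reject: a candidate can only beat the current best if it
         * also matches at offset match_maxlen, so test that byte first. */
        if (match_maxlen < max_possible &&
            buf[cand + match_maxlen] != buf[pos + match_maxlen]) {
            continue;
        }

        /* Full linear compare, only for candidates that survive. */
        size_t len = 0;
        while (len < max_possible && buf[cand + len] == buf[pos + len]) {
            len++;
        }

        if (len > match_maxlen) {
            match_maxlen = len;
            *match_pos = cand;
            if (len == max_possible) {
                break;  /* nothing longer is possible */
            }
        }
    }
    return match_maxlen;
}
```

The single-byte test costs one comparison per candidate and skips the inner loop whenever the candidate cannot improve on the best match so far.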

~~~
silentbicycle
That did speed things up a bit further:
[https://github.com/atomicobject/heatshrink/commit/38b8025895...](https://github.com/atomicobject/heatshrink/commit/38b8025895fd5a2b5955c3f64f79369fed965eb8)

------
ccleve
A nice trick. It could be used for generalized string search as well as
compression. And if you indexed bigrams instead of single characters, it could
be even faster.
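
A rough sketch of what a bigram variant might look like, assuming the index is a chain of "previous occurrence" links built with a last-seen table. The names `build_bigram_index` and `NO_PREV` are made up for illustration, and this is not taken from the article's code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define NO_PREV (-1)

/* Hypothetical sketch: for each position i, prev[i] is the most recent
 * j < i where the two bytes at j equal the two bytes at i, or NO_PREV.
 * prev must have room for len entries.
 * Returns 0 on success, -1 if the scratch table cannot be allocated. */
int build_bigram_index(const uint8_t *buf, size_t len, int32_t *prev)
{
    assert(buf != NULL && prev != NULL);
    assert(len <= (size_t)INT32_MAX);

    /* One "last seen" slot per possible bigram: 65536 entries, versus
     * 256 for a single-byte index. */
    int32_t *last = malloc(65536 * sizeof *last);
    if (last == NULL) {
        return -1;
    }
    for (size_t k = 0; k < 65536; k++) {
        last[k] = NO_PREV;
    }
    for (size_t i = 0; i < len; i++) {
        prev[i] = NO_PREV;  /* the final byte has no bigram and stays NO_PREV */
    }

    for (size_t i = 0; i + 1 < len; i++) {
        uint16_t key = (uint16_t)((buf[i] << 8) | buf[i + 1]);
        prev[i] = last[key];     /* link to the previous same-bigram position */
        last[key] = (int32_t)i;  /* this position is now the most recent */
    }

    free(last);
    return 0;
}
```

Following `prev` from a position visits only earlier positions that already agree on two bytes, so the chains are shorter than single-byte chains. The cost is the 65536-entry scratch table, which may be too large for the tight memory budgets the article targets.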

I especially like the clear, easy-to-understand, well-written presentation
along with links to prior art. Wouldn't it be nice if most academic papers
were written like this?

~~~
silentbicycle
Thanks! There are much better indexing algorithms if you don't have such tight
resource constraints, but it may be a useful trick to speed up inner loops in
some other algorithms.

I realized yesterday that it can also be used as the basis for a linear-time
sorting algorithm:
[https://gist.github.com/silentbicycle/8389129](https://gist.github.com/silentbicycle/8389129)

Benchmarking indicates that it's probably not competitive speed-wise compared
to counting sort (of which it is a variant), but the implementation should be
pretty easy to understand. It may be good for pedagogical purposes.
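
For reference, a minimal sketch of plain counting sort on bytes, the baseline mentioned above. This is the textbook algorithm, not the code from the gist, and the name `counting_sort_bytes` is just for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Textbook counting sort for bytes, in place: O(len + 256) time and
 * 256 counters of extra space. */
void counting_sort_bytes(uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);

    size_t counts[256] = {0};

    /* Pass 1: count how many times each byte value occurs. */
    for (size_t i = 0; i < len; i++) {
        counts[buf[i]]++;
    }

    /* Pass 2: write each value back, in ascending order, count times. */
    size_t out = 0;
    for (size_t v = 0; v < 256; v++) {
        for (size_t n = 0; n < counts[v]; n++) {
            buf[out++] = (uint8_t)v;
        }
    }
}
```

Overwriting the input like this only works when the elements are bare keys. Sorting records by a byte key would need the prefix-sum form of counting sort with a separate output array.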

