It uses the FOLLOWING character to do big jumps, after failing a vectorized forward compare (memcmp)
This is the good shit, my friends
When I had to implement it for an assignment I thought I was going mad. E.g. everybody mentions the good suffix rule can be created in O(n), but no one explained how to, or even sold their O(n²) as the real deal. Then they talk about borders and such... I had to come up with my own thing based on the Z-algorithm... and then realized that what the others were doing as well, but can't fucking spell. And yeah, if they drawn that connection they would have the O(n) connection at hand too. Btw. it matters for real life performance.
ADCBCADCBA <- haystack
CBA <- needle
So here we shift by 2 so the Bs line up.
The main point is the amount to shift depends only on the next character so it can be precomputed and stored in an array (qsBc). You always shift by 1 to line up an A, 2 to line up a B, and 3 to line up a C. Anything else you shift by 4.
i: Index of comparison, running 0..n
p: Index in pattern P, running m..0
t: Index in text T, running 0..n, t=i at cycle beginning.
Is the trick to not seek the character mismatching at T[i], but T[t+1], in front of the pattern-text-alignment?
I think that's the Sundance (?) variance or something. It's quite neat because you can do the simple BC preprocessing in O(m) and access in O(1) and still lose nothing over the extended BC, if the text is random (at least I can then prove it via expected value for character occurrences). So you simplify BC pre-processing, access times and gain one char possible shift distance.
However I am not sure how this goes with repetitive texts theoretically. And e.g. DNA is not random, although at our level of understanding it probably is practically.
In my implementation cutting GS didn't improve performance. Tho, I wrote it in Rust and then the compiler really weirdly optimize for cache misses in a not predictable manner; it mattered way too much where I put functionally equivalent code (do A, then so B wasn't the same as do B, then do A, for the most trivial and independent things).
Here he explains how to overcome its limitations in one case. A likely bug in the loop unroller. With --unwind N you can tell CBMC how far to unroll automatically, and --depth is for recursion. In this case he just unrolled it manually.
Note: should check if it works out of the box with a current version. CBMC is getting better and better lately.
I often use satabs then, which doesn't struggle with unbounded models, but has other limitations.