Hacker News new | past | comments | ask | show | jobs | submit login

I am not sure I follow either.

i: Index of comparison, running 0..n

p: Index in pattern P, running m..0

t: Index in text T, running 0..n, t=i at cycle beginning.

Is the trick to not seek the character mismatching at T[i], but T[t+1], in front of the pattern-text-alignment?

I think that's the Sundance (?) variance or something. It's quite neat because you can do the simple BC preprocessing in O(m) and access in O(1) and still lose nothing over the extended BC, if the text is random (at least I can then prove it via expected value for character occurrences). So you simplify BC pre-processing, access times and gain one char possible shift distance.

However I am not sure how this goes with repetitive texts theoretically. And e.g. DNA is not random, although at our level of understanding it probably is practically.

In my implementation cutting GS didn't improve performance. Tho, I wrote it in Rust and then the compiler really weirdly optimize for cache misses in a not predictable manner; it mattered way too much where I put functionally equivalent code (do A, then so B wasn't the same as do B, then do A, for the most trivial and independent things).




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: