Excellent! I built a string matcher based on these same principles at Intel while working on Hyperscan, but never released it. I'm pleased to see that you've discovered and publicized this independently, since now I can say the cat is out of the bag. :-)
You can use PEXT alone, without even messing with SIMD, as long as there are enough good distinguishing bits in the early part of the string (the first or last 8 bytes). The only complication is that, as in the article's example, some strings are proper prefixes or suffixes of others (which one is a problem depends on which direction you're using).
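To make the PEXT-only idea concrete, here is a minimal Python sketch under a few assumptions of my own. It is not the Hyperscan matcher. It uses a software model of PEXT (on BMI2 hardware in C you'd call `_pext_u64` from `<immintrin.h>`). It works on the first 8 bytes (the prefix direction). The hypothetical helpers `choose_mask`, `build_table`, and `find_matches` pick mask bits greedily, only from byte positions that every pattern actually defines, because bytes past a pattern's end come from whatever input follows. That restriction is exactly why a proper prefix ("foo" vs "foobar") always lands in the same bucket as the longer string, so each bucket is verified with a full comparison.

```python
def pext(x: int, mask: int) -> int:
    """Software model of BMI2 PEXT: gather the bits of x selected by mask
    into the low-order bits of the result, preserving their order."""
    result, out_bit = 0, 0
    while mask:
        low = mask & -mask              # lowest set bit of mask
        if x & low:
            result |= 1 << out_bit
        out_bit += 1
        mask &= mask - 1                # clear that bit
    return result


def load_word(data: bytes, pos: int) -> int:
    """Little-endian load of up to 8 bytes at pos, zero-padded at the end."""
    return int.from_bytes(data[pos:pos + 8].ljust(8, b"\0"), "little")


def choose_mask(patterns, max_bits=8):
    """Hypothetical greedy bit picker (an assumption, not the original method).
    Only bits inside the shortest pattern are usable: beyond a pattern's end
    the loaded bytes depend on the input text, not on the pattern."""
    usable_bytes = min(8, min(len(p) for p in patterns))
    words = [load_word(p, 0) for p in patterns]
    mask, best = 0, 1
    while bin(mask).count("1") < max_bits:
        best_bit, best_count = None, best
        for bit in range(usable_bytes * 8):
            if (mask >> bit) & 1:
                continue
            count = len({pext(w, mask | (1 << bit)) for w in words})
            if count > best_count:
                best_bit, best_count = bit, count
        if best_bit is None:
            break                       # no remaining bit splits any bucket
        mask |= 1 << best_bit
        best = best_count
    return mask


def build_table(patterns, mask):
    """Map each PEXT index to the patterns that hash there (a bucket)."""
    table = {}
    for p in patterns:
        table.setdefault(pext(load_word(p, 0), mask), []).append(p)
    return table


def find_matches(text, mask, table):
    """At each position: one load, one PEXT, then verify the bucket."""
    hits = []
    for pos in range(len(text)):
        for p in table.get(pext(load_word(text, pos), mask), ()):
            if text.startswith(p, pos):
                hits.append((pos, p))
    return hits


patterns = [b"foo", b"foobar", b"bar", b"baz"]
mask = choose_mask(patterns)
table = build_table(patterns, mask)
print(find_matches(b"xfoobarbaz", mask, table))
```

Here two mask bits are enough to separate "foo…", "bar", and "baz", while "foo" and "foobar" share a bucket and are told apart only by the verification step. A real implementation would index a flat array of 2^k entries rather than a dict.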