This is fine work, and well presented. The only flaw (which I've discussed with Trent on Twitter) is that the performance analysis runs against a single string at a time, so we can't really analyze the effects of branch prediction on realistic input: with one input repeated, the branch predictor converges to 'perfection' almost immediately. I think these effects would be small, but they really do need to be properly analyzed.
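
A minimal sketch of the kind of comparison meant here: time the same routine once over a single repeated string and once over a shuffled mix of matching and non-matching strings. Everything in it is an assumption made for illustration: `matches_prefix`, the `PREFIXES` table, and the input pool are hypothetical stand-ins, not the code under review. Python's interpreter overhead will largely hide branch-predictor effects, so this only shows the shape of the two workloads; a real measurement would use the same structure in the implementation's own language, ideally alongside hardware branch-miss counters.

```python
import random
import time

# Hypothetical prefix table, standing in for the real one.
PREFIXES = ("alpha", "alphabet", "beta", "gamma")


def matches_prefix(s):
    """Return the index of the first table entry that s starts with, or -1."""
    for i, p in enumerate(PREFIXES):
        if s.startswith(p):
            return i
    return -1


def make_mixed_inputs(n, seed=0):
    """Build n inputs drawn at random from a pool of hits and misses."""
    rng = random.Random(seed)  # fixed seed keeps runs comparable
    hits = [p + "_suffix" for p in PREFIXES]
    misses = ["zzz", "al", "", "delta_suffix"]
    pool = hits + misses
    return [rng.choice(pool) for _ in range(n)]


def time_calls(inputs, repeats=5):
    """Best-of-repeats wall time for one pass over inputs, in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for s in inputs:
            matches_prefix(s)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    n = 20_000
    single = ["alphabet_suffix"] * n   # the predictor sees one pattern only
    mixed = make_mixed_inputs(n)       # outcome varies from call to call
    print(f"single input: {time_calls(single) / n * 1e9:.1f} ns/call")
    print(f"mixed inputs: {time_calls(mixed) / n * 1e9:.1f} ns/call")
```

The mix of hits and misses in the pool is arbitrary; for a meaningful result it should be drawn from input that resembles what the routine sees in production.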