x86 processors simply run a instruction length predictor the same way they do it for branch prediction. That turns the problem into something that can be tuned. Instead of having to decode the instruction at every byte offset, you can simply decide to optimize for the 99% case with a slow path for rare combinations.