And the counterpoint is that some instructions commonly believed to be slow (e.g. the string instructions like rep movs, lods, stos) are in fact fast on modern processors.
Now I'm curious - do you have performance numbers somewhere for this? The rep instructions can be shorter in code size than a call to a str* function, so if rep is fast enough, it might make a nice optimization.
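
For anyone who wants to measure this themselves, here is a rough sketch of how a comparison could be set up. It assumes GCC or Clang on x86-64 (it uses GNU inline asm) and pits rep movsb against memcpy rather than a str* routine, just to keep it short. All the names here (copy_rep_movsb, seconds_per_copy, report) are made up for illustration, and clock() is a coarse timer, so treat any output as a ballpark figure only.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: copy n bytes with `rep movsb`.
   Assumes GCC/Clang inline asm; on non-x86-64 targets it falls back
   to memcpy so the file still builds. The ABI guarantees the direction
   flag is clear on function entry, so the copy runs forward. */
static void *copy_rep_movsb(void *dst, const void *src, size_t n)
{
    void *ret = dst;
#if defined(__x86_64__)
    __asm__ volatile("rep movsb"
                     : "+D"(dst), "+S"(src), "+c"(n)
                     :
                     : "memory");
#else
    memcpy(dst, src, n);
#endif
    return ret;
}

/* Thin wrapper so both copies share one function-pointer type. */
static void *copy_memcpy(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

typedef void *(*copy_fn)(void *, const void *, size_t);

/* Average CPU seconds per call of fn over iters copies of n bytes. */
static double seconds_per_copy(copy_fn fn, void *dst, const void *src,
                               size_t n, int iters)
{
    clock_t start = clock();
    for (int i = 0; i < iters; i++) {
        fn(dst, src, n);
        /* Compiler barrier: keeps the copy from being optimized away. */
        __asm__ volatile("" : : "r"(dst) : "memory");
    }
    return (double)(clock() - start) / CLOCKS_PER_SEC / iters;
}

/* Print one line of timings for a given size (capped at 64 KiB). */
static void report(size_t n, int iters)
{
    static unsigned char src[1 << 16], dst[1 << 16];
    if (n > sizeof src)
        n = sizeof src;
    memset(src, 0xAB, n);
    printf("%zu bytes: rep movsb %.1f ns, memcpy %.1f ns\n", n,
           1e9 * seconds_per_copy(copy_rep_movsb, dst, src, n, iters),
           1e9 * seconds_per_copy(copy_memcpy, dst, src, n, iters));
}
```

Calling report() from main with a few sizes (say 16, 256, 4096 and 65536 bytes) and a large iteration count should show whether there is a crossover point on a given machine. Results will likely depend heavily on the CPU generation and the libc in use.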