And the counterpoint is that some instructions that are commonly believed to be slow (e.g. the string instructions like rep movs, lods, stos) are in fact fast on modern processors.



Now I'm curious - do you have performance numbers somewhere for this? The rep instructions can actually be shorter than a call to str*, so if rep is actually fast enough then it might make a nice optimization.


It depends on whether the strings are aligned, on the size of the copy, and on the processor generation. There are some good answers here:

http://stackoverflow.com/a/9177369

https://stackoverflow.com/questions/12359228/reliable-inform...

https://stackoverflow.com/questions/8425022/performance-of-x...
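
If you want numbers for your own machine, here is a rough sketch of a microbenchmark that varies the two things mentioned above, copy size and alignment. It is not taken from any of the linked answers. The names rep_movsb_copy, libc_copy and seconds_per_copy are made up for illustration, the inline asm assumes GCC or Clang on x86, and clock() is a coarse timer, so treat any results as indicative only.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

typedef void *(*copy_fn)(void *dst, const void *src, size_t n);

/* Copy n bytes with a single rep movsb (GCC/Clang inline asm, x86 only).
   Assumes the direction flag is clear, as the usual ABIs require. */
void *rep_movsb_copy(void *dst, const void *src, size_t n)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    void *ret = dst;
    __asm__ volatile ("rep movsb"
                      : "+D"(dst), "+S"(src), "+c"(n)
                      :
                      : "memory");
    return ret;
#else
    return memcpy(dst, src, n);   /* fallback so the file builds elsewhere */
#endif
}

/* Wrapper so the library routine has the same function-pointer type. */
void *libc_copy(void *dst, const void *src, size_t n)
{
    return memcpy(dst, src, n);
}

/* Average CPU seconds per copy of `size` bytes, with both pointers shifted
   by `offset` bytes to vary alignment. Returns -1.0 on bad input or
   allocation failure. */
double seconds_per_copy(copy_fn fn, size_t size, size_t offset, long reps)
{
    if (size == 0 || reps <= 0)
        return -1.0;

    unsigned char *src = malloc(size + offset);
    unsigned char *dst = malloc(size + offset);
    if (!src || !dst) { free(src); free(dst); return -1.0; }
    memset(src, 0xA5, size + offset);

    volatile unsigned char sink = 0;   /* keeps the copies observable */
    clock_t start = clock();
    for (long i = 0; i < reps; i++) {
        fn(dst + offset, src + offset, size);
        sink ^= dst[offset + size - 1];
    }
    clock_t end = clock();
    (void)sink;

    free(src);
    free(dst);
    return (double)(end - start) / CLOCKS_PER_SEC / (double)reps;
}
```

Calling seconds_per_copy with rep_movsb_copy and then libc_copy over a range of sizes and offsets gives a rough comparison. Use a large reps value for small sizes, since clock() has limited resolution.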


Reasonably modern versions of GCC will emit various rep instructions in some cases.

Some code I just compiled with GCC 6.1.1 had several snippets like this emitted for zeroing with memset:

    xor eax,eax
    ...
    rep stos QWORD PTR es:[rdi],rax
and some rep movs for memcpy/memmove.
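
For anyone who wants to check this on their own toolchain, here is a minimal sketch of the kind of C source that can produce output like the above. The struct and function names are hypothetical, not from the code I compiled. Whether GCC expands these calls into rep stos / rep movs, an unrolled loop, or a library call depends on the version, the optimization level, the tuning target and the size involved. Compile with something like gcc -O2 -S -masm=intel and read the assembly; the -mstringop-strategy= option can be used to steer the choice.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical record type: a fixed, moderately large size gives the
   compiler a chance to expand memset/memcpy inline. */
struct record {
    unsigned long fields[32];   /* 256 bytes where unsigned long is 8 bytes */
};

/* Zeroing a fixed-size object: a candidate for "xor eax,eax" + "rep stos". */
void clear_record(struct record *r)
{
    memset(r, 0, sizeof *r);
}

/* Fixed-size copy: a candidate for "rep movs". */
void copy_record(struct record *dst, const struct record *src)
{
    memcpy(dst, src, sizeof *dst);
}
```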


Really? I wonder whether the timing of rep and the string instructions has improved at all.


They have operated on cache-line-sized pieces ever since the P6.



