
No, there is no such limit, but it's confusing.

Core 2 had a "loop stream detector" (LSD) that held 18 x86 instructions.

Nehalem went to a 28 u-op (not instruction) LSD.

Sandy Bridge has a 1500 u-op "decoded instruction cache" plus a 28 u-op LSD.

Haswell is the same as SB, but doubles the u-op LSD to 56 if hyperthreading is disabled.

The confusing part is why there is still a tiny loop stream detector in addition to the much larger decoded iCache. Perhaps there is a small additional power savings for code that fits the smaller LSD?

But in general, you no longer have to worry about instruction decoding speed if your loops are fewer than 1500 u-ops, with some caveats: https://www-ssl.intel.com/content/dam/www/public/us/en/docum...
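As a rough illustration of the thresholds above, here is a minimal sketch that picks which front-end structure could feed a loop of a given u-op count. It uses only the capacities quoted in this comment (28/56-u-op LSD, ~1500-u-op decoded icache). The function name `loop_source` and the uarch labels are hypothetical, and the model deliberately ignores the caveats in Intel's manual (alignment, per-window limits, etc.).

```python
# Capacities as quoted in the comment above; a simplified model only.
LSD_UOPS = {
    "Sandy Bridge": 28,
    "Haswell": 28,           # hyperthreading enabled
    "Haswell (HT off)": 56,  # LSD doubles when hyperthreading is disabled
}
DECODED_ICACHE_UOPS = 1500  # approximate decoded icache size (SB and Haswell)


def loop_source(uarch: str, loop_uops: int) -> str:
    """Hypothetical helper: which structure could serve a loop of this size."""
    if loop_uops <= LSD_UOPS[uarch]:
        return "LSD"
    if loop_uops <= DECODED_ICACHE_UOPS:
        return "decoded icache"
    return "legacy decoders"


print(loop_source("Haswell (HT off)", 40))  # fits the doubled LSD
print(loop_source("Sandy Bridge", 40))      # too big for the LSD, fits the icache
print(loop_source("Haswell", 2000))         # falls back to the decoders
```

Real behavior also depends on how the loop's code is laid out in 32-byte windows, so treat this as a first-order estimate.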




The 18-u-op limit comes from how Sandy Bridge (and Haswell) handle the decoded instruction cache. A 32-byte aligned window of code can occupy up to 3 lines in the u-op cache, each of which can hold 6 u-ops. Any 32-byte window of instructions that decodes to more than 3 x 6 = 18 u-ops cannot be stored in the u-op cache.
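The 3 x 6 = 18 arithmetic can be checked mechanically. This is a minimal sketch, assuming a hypothetical input format of `(address, uops)` pairs per instruction and assuming each instruction is charged to the 32-byte window containing its start address. It only applies the "at most 3 lines of 6 u-ops per window" rule described above, not the finer packing rules a real core uses.

```python
import math

WINDOW_BYTES = 32     # size of an aligned code window
UOPS_PER_LINE = 6     # u-ops per u-op cache line
LINES_PER_WINDOW = 3  # max u-op cache lines one window may use


def window_fits(uops_in_window: int) -> bool:
    """True if a window's u-ops fit in at most 3 lines of 6 u-ops (<= 18)."""
    return math.ceil(uops_in_window / UOPS_PER_LINE) <= LINES_PER_WINDOW


def uncacheable_windows(instrs):
    """Return start addresses of 32-byte windows that exceed 18 u-ops.

    instrs: hypothetical list of (address, uops) pairs, one per instruction.
    """
    totals = {}
    for addr, uops in instrs:
        window = addr // WINDOW_BYTES
        totals[window] = totals.get(window, 0) + uops
    return sorted(w * WINDOW_BYTES for w, n in totals.items() if not window_fits(n))


# Window at 0 holds 10 + 9 = 19 u-ops (too many); window at 32 holds 4.
print(uncacheable_windows([(0, 10), (8, 9), (32, 4)]))
```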



