
You can interleave most operations however you like without any extra latency: each op starts as soon as all of its inputs are ready, regardless of which execution unit computed them.

There are some exceptions, where a "domain crossing" (feeding a result into a very different kind of operation) costs one or more extra clock cycles. The notable case is in vector registers, when you apply an FP operation to the result of an integer operation, or vice versa. Modern CPUs don't actually hold FP values in registers in their IEEE 754 interchange format. Instead, the registers have hidden extra bits and store all values in normalized form, which allows fast operations on denormals. The downside is that if you alternate between FP and integer operations, the CPU has to insert extra conversion steps between them. The typical cost is 1-3 cycles per domain crossing.
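A minimal sketch of what this looks like in practice, assuming x86-64 with SSE2 (`<emmintrin.h>`); the function names are hypothetical. Both functions clear the sign bit of four floats. The first uses `_mm_andnot_ps` (`andnps`), a bitwise op that stays in the FP domain. The second reinterprets the register as integers and uses `_mm_and_si128` (`pand`), an integer-domain instruction. The casts themselves normally compile to no instruction. If the input came from an FP op like `mulps` and the output goes into `addps`, the integer version may pay a bypass delay on some microarchitectures. Whether it actually does, and how much, depends on the specific core.

```c
#include <emmintrin.h> /* SSE2 intrinsics; x86/x86-64 only */

/* |x| for four floats, staying in the FP domain:
   andnps computes (~sign) & x, clearing only the sign bit. */
static __m128 abs_fp_domain(__m128 x) {
    const __m128 sign = _mm_set1_ps(-0.0f); /* bit pattern 0x80000000 */
    return _mm_andnot_ps(sign, x);
}

/* Same result, but via an integer-domain instruction (pand).
   Mixing this between FP ops can cost a domain-crossing delay. */
static __m128 abs_int_domain(__m128 x) {
    const __m128i mask = _mm_set1_epi32(0x7FFFFFFF); /* all bits but sign */
    __m128i bits = _mm_castps_si128(x);              /* reinterpret only */
    return _mm_castsi128_ps(_mm_and_si128(bits, mask));
}
```

Both return identical bits, so the choice only matters for latency. When the surrounding code is FP arithmetic, the FP-domain variant is the one that avoids switching domains.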