>> Aligning the code means compiler will insert NOPs before the code you want to...

nkurz · on Jan 21, 2018

This functionality essentially already exists in the form of multi-byte NOPs: https://stackoverflow.com/questions/25545470/long-multi-byte.... Because of the way the decoder fetches instructions, any approach that requires the decoder to act conditionally on anything other than individual instruction length is likely impossible.
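To make the multi-byte NOP idea concrete, here is a minimal sketch of how an assembler might pad up to an alignment boundary. The byte sequences are the recommended multi-byte NOP encodings from Intel's manual; the names `NOPS` and `nop_padding` are made up for illustration, and real assemblers may choose different sequences.

```python
# Recommended multi-byte NOP encodings, keyed by length in bytes
# (as listed under the NOP instruction in the Intel SDM).
NOPS = {
    1: bytes.fromhex("90"),
    2: bytes.fromhex("6690"),
    3: bytes.fromhex("0f1f00"),
    4: bytes.fromhex("0f1f4000"),
    5: bytes.fromhex("0f1f440000"),
    6: bytes.fromhex("660f1f440000"),
    7: bytes.fromhex("0f1f8000000000"),
    8: bytes.fromhex("0f1f840000000000"),
    9: bytes.fromhex("660f1f840000000000"),
}


def nop_padding(address: int, alignment: int) -> bytes:
    """Hypothetical helper: NOP bytes that pad `address` up to `alignment`."""
    gap = -address % alignment  # bytes missing until the next boundary
    out = bytearray()
    while gap > 0:
        n = min(gap, 9)         # use the longest NOP that still fits
        out += NOPS[n]
        gap -= n
    return bytes(out)
```

Padding from 0x1003 to a 16-byte boundary needs 13 bytes, which this sketch emits as one 9-byte NOP followed by one 4-byte NOP, so the decoder sees two instructions rather than thirteen.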

While NOP decoding could in theory be a bottleneck, I think that would be a really rare occurrence. A hot loop is usually going to be fed from the LSD or DSB caches, so the NOPs will already be removed. It would be interesting to see a benchmark that illustrates a case where excessive alignment actually causes a slowdown.

BeeOnRope · on Jan 22, 2018

A whole other approach to alignment would be to strategically lengthen earlier instructions so that the desired alignment is achieved. This avoids adding any executed NOPs.

It's not as hard as it sounds: there is a lot of redundancy in the x86 encoding, so you can often add redundant REX prefixes, make displacements longer, add a zero displacement where none exists, and so on.
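As a rough illustration of that redundancy, the sketch below lists several byte sequences that all encode `mov eax, [rbx]` in 64-bit mode, differing only in a redundant REX prefix (0x40) and in the width of a zero displacement. The table and the `lengthen` helper are hypothetical and hand-picked for this one instruction; a real assembler would need a general encoder.

```python
# Equivalent encodings of `mov eax, [rbx]` (opcode 8B /r), keyed by length.
MOV_EAX_FROM_RBX = {
    2: bytes.fromhex("8b03"),            # ModRM mod=00: no displacement
    3: bytes.fromhex("8b4300"),          # ModRM mod=01: 8-bit displacement of 0
    4: bytes.fromhex("408b4300"),        # redundant REX prefix + disp8 of 0
    6: bytes.fromhex("8b8300000000"),    # ModRM mod=10: 32-bit displacement of 0
    7: bytes.fromhex("408b8300000000"),  # redundant REX prefix + disp32 of 0
}


def lengthen(extra: int) -> bytes:
    """Hypothetical helper: the same instruction, `extra` bytes longer."""
    want = 2 + extra
    if want not in MOV_EAX_FROM_RBX:
        # This small table has gaps; a fuller one could use other tricks.
        raise ValueError(f"no {want}-byte encoding in this table")
    return MOV_EAX_FROM_RBX[want]
```

If the next loop head sits 4 bytes short of a boundary, `lengthen(4)` swaps the 2-byte form for the 6-byte form and no NOP is executed.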

physguy1123 · on Jan 21, 2018

This is just a relative jump instruction, which on Intel can be as short as 2 bytes, although an ignop might fit into one.
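For reference, a small sketch of the 2-byte form: `JMP rel8` is opcode 0xEB followed by a signed 8-bit displacement measured from the end of the jump, so it can skip at most 127 bytes forward. Beyond that, the 5-byte `JMP rel32` (opcode 0xE9) is needed. The helper name `jmp_over` is made up for illustration.

```python
import struct


def jmp_over(skip: int) -> bytes:
    """Hypothetical helper: encode a jump over `skip` bytes that follow it."""
    if skip < 0:
        raise ValueError("this sketch only handles forward jumps")
    if skip <= 127:
        # JMP rel8: 2 bytes total, displacement counted from the next instruction.
        return b"\xeb" + struct.pack("<b", skip)
    # JMP rel32: 5 bytes total, little-endian signed 32-bit displacement.
    return b"\xe9" + struct.pack("<i", skip)
```

So skipping 8 bytes of padding or inline data costs 2 bytes of jump, while skipping 200 bytes costs 5.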

It wouldn't solve the icache problem, though. Also, writing to data that sits inline with code would be super problematic performance-wise, because it modifies icache lines that are currently being executed from.