
> Wouldn't the instruction be bigger if EAX was used? I guess it would have to have 4 byte constant then, with this encoding it's one byte. How would you clear the 2 lowest bits in the second byte of the EAX?

Indeed. Here is a list of encodings of the original instruction and its natural alternatives:

> and ah,0xfc: 80E4FC [3 bytes]

> and ax,0xfcff: 6625FFFC [4 bytes]

EDIT: Note that

> and eax,0xfffffcff: 25FFFCFFFF [5 bytes]

does something different: it also zeros the upper 32 bits of the rax register (Exercise: Why?). That matters because the next line

> movq %rax, (%rbx)

depends on those upper bits being unchanged.
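To make the difference concrete, here is a minimal Python sketch that models how each variant updates the 64-bit rax. It is not real machine code, just integer arithmetic; the helper names and the sample rax value are made up for illustration. It also gives away the exercise: in 64-bit mode, a write to a 32-bit register zero-extends into the full 64-bit register, while 8-bit and 16-bit writes leave the remaining bits alone.

```python
MASK64 = (1 << 64) - 1

def and_ah(rax, imm8):
    """Model of `and ah, imm8`: only bits 8..15 of rax can change."""
    ah = (rax >> 8) & 0xFF
    ah &= imm8
    return (rax & ~0xFF00 & MASK64) | (ah << 8)

def and_ax(rax, imm16):
    """Model of `and ax, imm16`: only bits 0..15 of rax can change."""
    ax = rax & 0xFFFF & imm16
    return (rax & ~0xFFFF & MASK64) | ax

def and_eax(rax, imm32):
    """Model of `and eax, imm32` in 64-bit mode: the 32-bit result is
    zero-extended, so bits 32..63 of rax are cleared."""
    return rax & 0xFFFFFFFF & imm32

rax = 0x1122334455667788  # arbitrary sample value (assumption)

r_ah = and_ah(rax, 0xFC)
r_ax = and_ax(rax, 0xFCFF)
r_eax = and_eax(rax, 0xFFFFFCFF)

for name, value in (
    ("and ah,0xfc", r_ah),
    ("and ax,0xfcff", r_ax),
    ("and eax,0xfffffcff", r_eax),
):
    print(f"{name:<20} rax = {value:#018x}")
```

The first two variants agree with each other and keep the upper half of rax; the third clears it, which is why it cannot be dropped in before a 64-bit store of rax.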




Since the CPU bug manifests only when "AH, BH, CH or DH" are used, compilers could at least be patched to use the 4-byte variant. But if Intel properly updates all the affected processors, that won't be needed.

(Response to wolfgke's edit: don't forget the "sign extend" variants).


We pondered reporting this directly to GCC too, so that they could implement a workaround. But despite the buzz this issue generated in the press, I believe Intel is right in saying that this bug is very unlikely to trigger.

They say it can only happen in tight loops, and although the C code is quite large, the hot path has only 17 instructions in the loop, with only sequential memory loads (since the GC is scanning the heap linearly and most OCaml values are small).

GCC actually did quite a good job here: it managed to lay out the code so that the "memory reachable" branch is the shortest one. Indeed, when scanning the major heap in a generational GC, it is reasonable to assume that most values scanned are reachable.

In this case, the loop condition will likely be predicted true and the other jumps predicted false, giving a very tight loop with no random memory loads or stores. I suspect that combination of conditions is very unlikely to occur.


Fortunately, immediate constants normally stay 32-bit even when operating on 64-bit registers, so the only cost is the REX byte.
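As a rough illustration of the size trade-off, the Python sketch below hand-assembles the byte sequences. The first three match the encodings quoted earlier in the thread. The fourth is what I take the "sign extend" remark to mean, namely `and rax, imm32` with the 32-bit immediate sign-extended to 64 bits (REX.W prefix 0x48 followed by opcode 0x25); that reading is an assumption. The function names are hypothetical and this is not a general assembler.

```python
import struct

def and_ah_imm8(imm8):
    # 80 /4 ib, ModRM 0xE4 = mod 11, reg /4 (AND), rm 100 (AH)
    return bytes([0x80, 0xE4, imm8])

def and_ax_imm16(imm16):
    # 0x66 operand-size prefix, then 25 iw (little-endian immediate)
    return bytes([0x66, 0x25]) + struct.pack("<H", imm16)

def and_eax_imm32(imm32):
    # 25 id
    return bytes([0x25]) + struct.pack("<I", imm32)

def and_rax_imm32(imm32):
    # REX.W (0x48) + 25 id; the CPU sign-extends imm32 to 64 bits
    return bytes([0x48, 0x25]) + struct.pack("<I", imm32)

def sign_extend32(imm32):
    """64-bit mask the CPU would actually apply for a sign-extended imm32."""
    if imm32 & 0x80000000:
        return imm32 | 0xFFFFFFFF00000000
    return imm32

encodings = {
    "and ah,0xfc": and_ah_imm8(0xFC),
    "and ax,0xfcff": and_ax_imm16(0xFCFF),
    "and eax,0xfffffcff": and_eax_imm32(0xFFFFFCFF),
    "and rax,0xfffffffffffffcff": and_rax_imm32(0xFFFFFCFF),
}

for asm, code in encodings.items():
    print(f"{asm:<28} {code.hex().upper():<14} [{len(code)} bytes]")
```

Because the sign-extended immediate has all of its upper 32 bits set, the rax form leaves the upper half of the register intact, at the price of one byte more than the eax form.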



