Where's the Race Condition? “works on AArch64, how can it fail on x86_64?”

layer8 · on May 27, 2022

> Which architecture’s memory model is more restrictive is not always easy to tell.

If I understand correctly, the guarantees requested by the code are already implicitly fulfilled by x86_64, but mapping them to AArch64 requires inserting machine instructions that actually give stronger guarantees than requested by the code (and than provided by x86_64). Since the code actually needs those stronger guarantees (but didn’t request them), it fails on x86_64 while working on AArch64. AArch64 is thus only more “restrictive” in the sense that it isn’t able to map the weaker guarantees in the way x86_64 is able to.

I’d say the main lesson is that the same C++-level memory orderings may translate to stronger guarantees on a target platform with a nominally weaker memory model than on another target platform. That is, the actual guarantees of compiled code aren’t necessarily monotonic in the strength of the target platform’s memory model.

josephcsible · on May 28, 2022

> AArch64 is thus only more “restrictive” in the sense that it isn’t able to map the weaker guarantees in the way x86_64 is able to.

I thought that it actually is able to map that, by using LDAPR instead of LDAR, and that the compiler just didn't for some reason.

layer8 · on May 28, 2022

You may be right; I was just assuming. Nevertheless, in general one has to expect both possibilities.

josephcsible · on May 27, 2022

tl;dr explanation of the bug: memory_order_acquire guarantees that memory accesses won't be moved from after it to before it, and memory_order_release guarantees that memory accesses won't be moved from before it to after it. The buggy code was doing atomicBase.store(nb, memory_order_release) followed by atomicEnd.load(memory_order_acquire), and relied on the order of those two not being swapped. That is not guaranteed: neither ordering stops a later acquire load from moving ahead of an earlier release store to a different location.
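A minimal sketch of that reordering as a store-buffering litmus test, assuming two threads that each store to one atomic and then load the other. The names x, y, and countBothStale are made up for illustration and are not from the buggy code. With release/acquire, both threads are allowed to read the stale 0; with memory_order_seq_cst on all four operations, that outcome is forbidden by the C++ memory model.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical litmus test; x, y and countBothStale are illustrative names.
// Runs `iterations` rounds and counts how often BOTH threads read the
// initial value 0, i.e. each load effectively ran before the other
// thread's store became visible.
template <std::memory_order StoreOrder, std::memory_order LoadOrder>
int countBothStale(int iterations) {
    assert(iterations >= 0);
    int bothStale = 0;
    for (int i = 0; i < iterations; ++i) {
        std::atomic<int> x{0};
        std::atomic<int> y{0};
        int r1 = -1;
        int r2 = -1;

        std::thread t1([&] {
            x.store(1, StoreOrder);   // like atomicBase.store(nb, ...)
            r1 = y.load(LoadOrder);   // like atomicEnd.load(...)
        });
        std::thread t2([&] {
            y.store(1, StoreOrder);
            r2 = x.load(LoadOrder);
        });
        t1.join();  // join() makes r1 and r2 safe to read afterwards
        t2.join();

        if (r1 == 0 && r2 == 0) {
            ++bothStale;
        }
    }
    return bothStale;
}
```

countBothStale<std::memory_order_release, std::memory_order_acquire>(n) may return a nonzero count, but spawning fresh threads every round makes the window small, so a zero count proves nothing. The memory_order_seq_cst instantiation must always return 0.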

Since x86_64 has a strong memory model, that code just got compiled into regular MOV instructions there. The Intel software developer's manual says "The Intel-64 memory-ordering model allows a load to be reordered with an earlier store to a different location", and that's exactly what triggers the bug there.

On AArch64, that code got compiled into STLR and LDAR instructions, which can't be reordered in that way (see https://stackoverflow.com/q/67397460/7509065 and https://stackoverflow.com/q/65466840/7509065), so the bug didn't show up there.
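For comparison, one portable way to ask for the store-then-load ordering that STLR/LDAR happened to provide is a memory_order_seq_cst fence between the two operations. This is only a sketch and not necessarily the fix the original code adopted. The thread names atomicBase, atomicEnd, and nb; the int types and the function name publishBaseThenReadEnd are assumptions.

```cpp
#include <atomic>

// Assumed shapes: the thread only names atomicBase, atomicEnd and nb,
// so the int types and the function name below are hypothetical.
std::atomic<int> atomicBase{0};
std::atomic<int> atomicEnd{0};

// Publishes nb, then reads atomicEnd. The seq_cst fence keeps the load
// from being reordered ahead of the store, which release/acquire alone
// does not guarantee.
int publishBaseThenReadEnd(int nb) {
    atomicBase.store(nb, std::memory_order_release);
    std::atomic_thread_fence(std::memory_order_seq_cst);
    return atomicEnd.load(std::memory_order_acquire);
}
```

The other thread in the handshake needs the same fence between its own store and load; a fence on one side alone is not enough. Making both operations memory_order_seq_cst is an alternative.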