> Which architecture’s memory model is more restrictive is not always easy to tell.
If I understand correctly, the guarantees requested by the code are already implicitly fulfilled by x86_64, but mapping them to AArch64 requires inserting machine instructions that actually give stronger guarantees than requested by the code (and than provided by x86_64). Since the code actually needs those stronger guarantees (but didn’t request them), it fails on x86_64 while working on AArch64. AArch64 is thus only more “restrictive” in the sense that it isn’t able to map the weaker guarantees in the way x86_64 is able to.
I’d say the main lesson is that the same C++-level memory orderings may translate to stronger guarantees on a target platform with a nominally weaker memory model than on another target platform. That is, the actual guarantees of compiled code aren’t necessarily monotonic with the strength of the target platform’s memory model.
tl;dr explanation of the bug: memory_order_acquire guarantees that memory accesses won't be moved from after it to before it, and memory_order_release guarantees that memory accesses won't be moved from before it to after it. Neither of those stops a later acquire load from being moved before an earlier release store to a different location. The buggy code was doing atomicBase.store(nb, memory_order_release) followed by atomicEnd.load(memory_order_acquire), and relied on the order of those not being swapped, which is not guaranteed.
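Since x86_64 has a strong memory model, that code just got compiled into regular MOV instructions there. The Intel software developer's manual says "The Intel-64 memory-ordering model allows a load to be reordered with an earlier store to a different location", and that's exactly what triggers the bug there. On AArch64, the same code is typically compiled to STLR and LDAR, and a load-acquire (LDAR) is not reordered before an earlier store-release (STLR), which is why the bug doesn't show up there.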
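For anyone who wants to see it concretely, here's a minimal sketch of the pattern and one possible fix. atomicBase, atomicEnd and nb are the names from the explanation above; the types, the function names and the seq_cst fix are my own assumptions for illustration, not the actual code or the actual patch.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical stand-ins for the two atomics; the real types are not given.
std::atomic<std::uint64_t> atomicBase{0};
std::atomic<std::uint64_t> atomicEnd{0};

// Buggy pattern: nothing orders the release store before the acquire load.
// On x86_64 both typically compile to plain MOV, and the CPU may perform
// the load before the store becomes visible to other cores.
std::uint64_t publish_then_read_buggy(std::uint64_t nb) {
    atomicBase.store(nb, std::memory_order_release);
    return atomicEnd.load(std::memory_order_acquire);
}

// One possible fix (an assumption, not necessarily the real one): make both
// operations seq_cst so they take part in the single total order. On x86_64
// the store is then typically emitted as XCHG (or MOV plus MFENCE).
std::uint64_t publish_then_read_fixed(std::uint64_t nb) {
    atomicBase.store(nb, std::memory_order_seq_cst);
    return atomicEnd.load(std::memory_order_seq_cst);
}
```

Note that this only helps if the other thread's matching store and load are seq_cst as well. A std::atomic_thread_fence(std::memory_order_seq_cst) between the store and the load on both sides would be another way to get the same ordering.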