Valid points, although I have another perspective on this bit:
> But in the end: yes the reordering done by the CPU is the issue
I think from a programmer's perspective, the CPU side of things is mostly beside the point (unless you're writing assembly), and focusing on it contributes to the misunderstanding and air of mystery surrounding thread safety.
At the end of the day the CPU can do anything, really. I'd argue this doesn't matter because the compiler is generating machine code, not us. What does matter is the contract between us and the compiler / language spec.
Without language-level synchronisation the code contains a data race, which is undefined behaviour in C and C++, so we will likely observe unexpected behaviour. Whether that comes from CPU reordering or from compiler optimisations doesn't matter.
I think the article somewhat misses the point by presenting the case as if the compiler were not part of the equation.
It seems like people often think they know how to do thread safety because they know, e.g., which reorderings the CPU may do. "Just need to add volatile here and we're good!" (probably wrong: in C and C++, volatile gives neither atomicity nor ordering between threads). In reality they need to understand how the language models concurrency.
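To make "how the language models concurrency" concrete, here is a minimal sketch of a one-shot handoff written against the C++11 memory model. The names (`payload`, `ready`, `producer`, `consumer`, `run_handoff`) are made up for illustration and are not the article's queue. The point is that the release store and the acquire load are a contract with the compiler: it is then the compiler's job to emit whatever fences or instructions the target CPU needs.

```cpp
#include <atomic>
#include <thread>

// Hypothetical names for illustration; not taken from the article's queue.
int payload = 0;                  // plain, non-atomic data
std::atomic<bool> ready{false};   // the language-level synchronisation point

void producer() {
    payload = 42;                                  // ordinary write
    ready.store(true, std::memory_order_release);  // publishes the write above
}

int consumer() {
    // Spin until the flag is observed; this acquire load pairs with the
    // release store in producer().
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    // The release/acquire pair makes the write to payload visible here.
    return payload;
}

// Runs one producer and one consumer and returns what the consumer read.
int run_handoff() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = 0;
    std::thread c([&seen] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    return seen;
}
```

Calling `run_handoff()` from `main` should return 42 on any conforming implementation, whatever the hardware does underneath. If `ready` were a plain or `volatile bool` instead, the program would have a data race and the standard would guarantee nothing, even on a CPU where it happens to work.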
We could translate that queue code into another language with a different concurrency model - e.g. Python - and now the behaviour is different despite the CPU doing the same fundamental reorderings.
This is true, but in practice it's pretty common to find that this sort of code seems to work fine on x64, because the compiler doesn't actually reorder things and the hardware's memory ordering is relatively strong, and then sometimes blows up on ARM (or PowerPC, though that's less commonly encountered in the wild these days), where the hardware ordering is weaker.