> No, this is definitely not how atomics with memory orderings work.

Of course it is.

> If you do two atomic reads, the CPU will do two load instructions. Neither the compiler or the CPU is allowed to optimize this.

If you perform two atomic loads on the same location back to back, an execution in which both return exactly the same value is always allowed, because nothing obliges another thread's store to land between them. The compiler may therefore pick that execution and optimise the second load away. Even under sequential consistency, this is perfectly, well, consistent. It's also perfectly valid to fold back-to-back atomic writes to the same location. See http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n445... for more discussion on the subject.

No compiler currently bothers doing that (that I know of), but that's not because they can't; it's because the effort is probably not worth it, at least at the moment.
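
To make that concrete, here is a rough sketch of the transformation I mean. The names `flag`, `sum_of_two_loads` and `sum_of_one_load` are made up for illustration; I'm not claiming any particular compiler actually emits the second form.

```cpp
#include <atomic>

// Hypothetical shared variable, only here for illustration.
std::atomic<int> flag{0};

// As written in the source: two sequentially consistent loads in a row.
int sum_of_two_loads() {
    int a = flag.load(std::memory_order_seq_cst);
    int b = flag.load(std::memory_order_seq_cst);
    return a + b;
}

// What an optimiser would be allowed to produce instead: one load, reused.
// Every result this can return is also a possible result of the function
// above (the execution where no other store lands between the two loads),
// so a conforming program cannot tell the difference.
int sum_of_one_load() {
    int a = flag.load(std::memory_order_seq_cst);
    return a + a;
}
```

The folded version only removes outcomes that were never guaranteed to happen in the first place, which is why it stays within the as-if rule.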

> I've done lots of mmio code on ARM at $work in the past few years, and we've spent a lot of time to make sure we have all our memory barriers right. They are quite tricky on ARMv8, as mmio can be either on the system bus or the main memory bus which need distinct kinds of memory barriers.

I can believe that.

> Using volatile in mmio code on modern CPUs is almost certainly a bug. Some microcontrollers may be an exception.

Not using volatile in mmio code on modern compilers is almost certainly a bug as well.
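
A minimal sketch of what I mean, assuming a 32-bit register. `fake_status_reg` is a stand-in variable so the snippet compiles anywhere (a real driver would use the mapped device address), and `read_reg`, `write_reg` and `wait_ready` are hypothetical helpers, not any particular kernel's API.

```cpp
#include <cstdint>

// Stand-in for a device register; an assumption for illustration only.
volatile std::uint32_t fake_status_reg = 0;

// volatile forces the compiler to emit exactly one load per call.
inline std::uint32_t read_reg(const volatile std::uint32_t* reg) {
    return *reg;
}

// volatile forces the compiler to emit exactly one store per call.
inline void write_reg(volatile std::uint32_t* reg, std::uint32_t value) {
    *reg = value;
}

// Polls the register up to max_polls times. Without volatile the compiler
// could hoist the load out of the loop and test a stale value forever.
bool wait_ready(const volatile std::uint32_t* reg,
                std::uint32_t ready_mask,
                unsigned max_polls) {
    for (unsigned i = 0; i < max_polls; ++i) {
        if (read_reg(reg) & ready_mask) {
            return true;
        }
    }
    return false;
}
```

Note that this only stops the compiler from dropping or merging the accesses. It does nothing about the CPU-level ordering, so the memory barriers discussed above are still needed on top of it.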