My understanding is that x86 implementations use speculation to reorder memory operations beyond what the memory model allows, and roll the work back if the reordering would otherwise become visible. This is not free in area and power, but it recovers some of the cost of the stronger memory model.
As TSO support is only a transitional aid for Apple, it is possible that they didn't bother to implement the full extent of the optimizations possible.
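To make "what's allowed by the memory model" a bit more concrete, here is a rough sketch of the classic message-passing litmus test. It is a toy, not a model of real hardware: memory is a single shared dict, reordering is modelled as swapping a thread's two operations, and the names (`WRITER`, `READER`, `tso`, `weak`, `outcomes`) are made up for illustration. The `weak` rule is only loosely in the spirit of a weaker model such as Arm's default ordering.

```python
# Message-passing litmus test. Each op is (kind, variable, value_or_register).
WRITER = [("store", "data", 1), ("store", "flag", 1)]
READER = [("load", "flag", "r1"), ("load", "data", "r2")]


def allowed_orders(thread, may_reorder):
    """Yield every execution order of a two-op thread that the model permits."""
    yield list(thread)
    first, second = thread
    if may_reorder(first[0], second[0]):
        yield [second, first]


def interleavings(a, b):
    """Yield all merges of a and b that keep each list's internal order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def outcomes(may_reorder):
    """Collect every (r1, r2) result reachable under the given reordering rule."""
    results = set()
    for w in allowed_orders(WRITER, may_reorder):
        for r in allowed_orders(READER, may_reorder):
            for trace in interleavings(w, r):
                mem = {"data": 0, "flag": 0}
                regs = {}
                for kind, var, x in trace:
                    if kind == "store":
                        mem[var] = x
                    else:
                        regs[x] = mem[var]
                results.add((regs["r1"], regs["r2"]))
    return results


def tso(first, second):
    # Under TSO, only a later load may pass an earlier store.
    return first == "store" and second == "load"


def weak(first, second):
    # Toy weak model (assumption): any two accesses to different addresses may swap.
    return True


tso_outcomes = outcomes(tso)
weak_outcomes = outcomes(weak)
```

In this toy, the result `r1 == 1, r2 == 0` (flag seen, data stale) is reachable under `weak` but not under `tso`. A core that speculatively performs the second load early has to detect that case and replay, which is where the extra area and power go.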