I think that's taking a dim view of what machine vision should be capable of. If a picture is rotated 180 degrees, upside-down text should be recognized as a flipped version of the original, i.e., the string "6op" should be read as the word "dog" rotated 180 degrees. Similarly, a machine should be able to tell when it's looking at the reflection of the sky in a lake. And setting whole scenes aside, any upside-down object (chair, dog, car, ...) is clearly recognizable as such to a human, and hence should be to a machine as well.
Upside-down objects and scenes look alien to us. We can recognize them, but not without effort. That ability comes from a combination of having seen objects upside down before and being able to reason about what we are seeing; it does not fall out of invariance for free.
The PIRL technique in question here seems useless to me, because its loss deliberately trains the representation to be invariant to the transformation, i.e., to discard exactly the information (such as the fact that an image is upside down) a human would pick up on. But that overview page https://amitness.com/2020/02/illustrated-self-supervised-lea... is gold.
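To make the contrast concrete, here's a minimal sketch, with a toy stand-in encoder and a plain cosine-similarity loss in place of PIRL's actual NCE-with-memory-bank objective. The point is just that an invariance loss pulls an image and its rotated copy toward the same embedding, throwing rotation information away, while a rotation-prediction pretext task (as in RotNet) forces the network to keep it:

    import torch
    import torch.nn.functional as F

    encoder = torch.nn.Sequential(            # toy stand-in for a real backbone
        torch.nn.Flatten(),
        torch.nn.Linear(3 * 32 * 32, 128),
    )
    rot_head = torch.nn.Linear(128, 4)        # classifies 0/90/180/270 degrees

    x = torch.randn(8, 3, 32, 32)             # a batch of images
    x_rot = torch.rot90(x, k=2, dims=(2, 3))  # the same batch upside down

    z, z_rot = encoder(x), encoder(x_rot)

    # PIRL-style invariance: pull z and z_rot together, so the learned
    # representation cannot tell "upside down" apart from "upright".
    invariance_loss = 1 - F.cosine_similarity(z, z_rot).mean()

    # Rotation-prediction pretext task: the representation must encode the
    # rotation in order to classify it, so the information is preserved.
    labels = torch.full((8,), 2)              # class 2 = rotated 180 degrees
    rotation_loss = F.cross_entropy(rot_head(z_rot), labels)

Driving invariance_loss to zero makes z_rot indistinguishable from z, which is exactly the complaint above: a model trained that way can no longer report that what it's seeing is upside down.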