The 60% accuracy is for a 50ms glimpse of a photo "mostly devoid of cultural cues". In real life you get much richer information for much more time, so it seems reasonable to conclude that 60% is a lower bound on real-world gaydar accuracy.
Unless the extra information is somehow misleading.
I'm thinking here of the example of lie detection. Humans generally do pretty well at detecting lies when they're just reading text. They do slightly less well if they can hear the person talking. And if the person is visible, they don't do any better than chance.