Second of all, GANs are most definitely not just for graphics. They've been applied to text generation, to generating adversarial examples, to data preprocessing, etc.
Third, I have no idea what you even mean by "test set" in the context of GANs. It is true that their performance is hard to measure, but that's unrelated to whatever you're talking about. It's hard to evaluate because we're usually judging the quality of the generated images, and we don't have any good way of quantifying "perceptual loss", i.e. how real an image looks.
As for the OP, GANs have been a very hot topic. Perhaps not as hot as this blog post makes them look (with nearly every paper being about them...), but I wouldn't really disagree with any of the papers posted. The only one I'm not familiar with is the "most useful" one, but the rest were all pretty great papers imo. As for Ian Goodfellow, he's a very smart guy who seems to do a pretty good job explaining things. I saw a couple of YouTube videos of him at a meetup covering his DL book, and he did a great job teaching.
Your third point is actually the point of those who disagree. It's the same reason why we have the principle of falsifiability in science.
Machine learning is typically split into supervised learning, unsupervised learning, and reinforcement learning, and GANs are usually considered part of unsupervised learning. I guess the part I don't understand is what you mean by "their concerns are valid". What are their concerns about? Whether GANs are a promising path of research? And if GANs aren't part of machine learning, what are they?
When you are trying to generate "realistic" samples of human concepts, the ultimate measure of evaluation is whether humans think that the output is realistic. So you have no choice but to ask humans to judge the quality of your results. That's a standard thing to do e.g. in text-to-speech generation, whether GANs are used or not.
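To make the human-judgment point concrete, here is a rough sketch of how ratings like that are often summarized in text-to-speech work: a mean opinion score (MOS), the average of listener ratings on a 1-5 scale. Everything in it is an assumption for illustration: the function name `mean_opinion_score` is made up, the ratings are invented, and the 95% interval uses a normal approximation that is only rough for a handful of raters.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (mean, 95% CI half-width) for a list of 1-5 ratings.

    Hypothetical helper for illustration. The interval uses a normal
    approximation (1.96 * standard error), which is crude for small panels.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate spread")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings are expected on a 1-5 scale")
    mean = statistics.mean(ratings)
    # Sample standard deviation divided by sqrt(n) gives the standard error.
    sem = statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, 1.96 * sem


# Invented ratings from eight hypothetical raters for one generated sample.
ratings = [4, 5, 3, 4, 4, 5, 2, 4]
mos, half_width = mean_opinion_score(ratings)
print(f"MOS = {mos:.2f} +/- {half_width:.2f}")
```

The number is only as good as the rating protocol behind it, which is exactly why this kind of evaluation is expensive and hard to compare across papers.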