It would be like attaching wings to a car, driving it down the highway, and then concluding that wings don't really help with speed.
Another example would be taking Pharrell's hat and placing it on some random person's head.
Context matters. Design is not the sum of its parts.
Is there a particular way you think it could be improved, or do you just think that we shouldn't try to empirically compare something like flat vs. non-flat buttons?
Says who? The author goes into detail about exactly what they were measuring. The last two headings are "When Flat Designs Can Work" and "Limitations of the Study." Can you clarify exactly what you found "significantly flawed"? I don't get what you're trying to say with your Wikipedia link.
> What works most of the time might not work in your specific case.
So what? Is it not worth knowing what works most of the time and under what assumptions?
I feel like Flat Design works for a specific use case, though you'll have to imagine one because I don't know of any, but it has been applied universally.
Many interfaces I encounter nowadays seem like every interaction is an Easter Egg, where clicking / tapping around randomly is World's Best Practice for feature discovery.
If you want to see how useful wings are, build a plane.
This plane metaphor is ridiculous and unhelpful. We're not trying to measure "how useful wings are"; we're talking about what the buttons inside the plane should look like. For that, we need two cockpits with different buttons.