> We primarily rely on crowdsourced Mean Opinion Score (MOS) evaluations based on subjective listening tests. All our MOS evaluations are aligned to the Absolute Category Rating scale, with rating scores from 1 to 5 in 0.5 point increments. We use this framework to evaluate synthesized speech along two dimensions: its naturalness and similarity to real speech from the target speaker.
They're testing whether the generated speech sounds natural, using a well-defined and reproducible experiment. That's science.
There’s no investigation of the physical or natural world going on, unless they really think they’re modeling how humans are able to talk. But they’re not; they’re trying to create a system that works no matter how unnatural it is.
> There’s no investigation of the physical or natural world going on
I just quoted them describing their observational method! Do you just not believe psychology is a science?
> unless they really think they’re modeling how humans are able to talk
I've lost you. They're not generating birdsong. What do you think WaveNet does exactly?